Bayesian Generalized Additive Models in Liesel#

This title is short and catchy, but does not convey the full range of models covered by this Python library. We could also say:

  • Bayesian Generalized Additive Models for Location, Scale, and Shape (and beyond)

  • Bayesian Structured Additive Distributional Regression

Panel of GAM summary plots

This library makes it convenient to set up generalized additive models in Liesel. It uses ryp to obtain basis and penalty matrices from the R package mgcv, and relies on formulaic to parse Wilkinson formulas, familiar to many from the formula syntax in R.
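To make the term "penalty matrix" concrete, here is a minimal, dependency-free sketch of the second-order difference penalty commonly used for P-splines. It is an illustration only: the function names are hypothetical, and the matrices that mgcv actually returns depend on the chosen basis type and will generally differ.

```python
def difference_matrix(n_coef: int, order: int = 2) -> list[list[float]]:
    """Return the (n_coef - order) x n_coef difference matrix D (hypothetical helper)."""
    if not 0 < order < n_coef:
        raise ValueError("order must satisfy 0 < order < n_coef")
    # Start from the identity matrix and difference adjacent rows `order` times.
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        rows = [
            [b - a for a, b in zip(upper, lower)]
            for upper, lower in zip(rows, rows[1:])
        ]
    return rows


def penalty_matrix(n_coef: int, order: int = 2) -> list[list[float]]:
    """Return K = D^T D, which penalizes `order`-th differences of the coefficients."""
    d = difference_matrix(n_coef, order)
    return [
        [sum(row[i] * row[j] for row in d) for j in range(n_coef)]
        for i in range(n_coef)
    ]


def quadratic_form(coef: list[float], k: list[list[float]]) -> float:
    """Return coef^T K coef, the roughness penalty for a coefficient vector."""
    return sum(
        ci * kij * cj
        for ci, krow in zip(coef, k)
        for kij, cj in zip(krow, coef)
    )


K = penalty_matrix(4)
# Coefficients on a straight line have zero second differences, so they are unpenalized.
print(quadratic_form([1.0, 2.0, 3.0, 4.0], K))
```

The matrix K is rank-deficient by construction: constant and linear coefficient sequences lie in its null space, which is why priors built on such penalties are only partially proper.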

Some technical highlights:

  • Express Bayesian models as probabilistic graphical models in Python via liesel.model

  • Build custom MCMC algorithms in Python, including Gibbs samplers, Hamiltonian Monte Carlo (HMC), the iteratively reweighted least squares sampler (IWLS), and more via liesel.goose

  • Speed up models using just-in-time compilation and automatic differentiation, since Liesel builds on JAX

  • Use statistical distributions and bijectors offered by TensorFlow Probability

Learn more in the Liesel paper:

  • Riebl, H., Wiemann, P. F. V., & Kneib, T. (2023). Liesel: A probabilistic programming framework for developing semi-parametric regression models and custom Bayesian inference algorithms (No. arXiv:2209.10975). arXiv. http://arxiv.org/abs/2209.10975

Installation#

The library can be installed from PyPI:

$ pip install liesel_gam

Since liesel_gam interfaces with R via ryp under the hood, you also need the R packages {arrow} and {svglite} to be available on your system:

$ Rscript -e "install.packages(c('arrow', 'svglite'))"
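If you want to verify the R side before running a model, a small pre-flight check along the following lines may help. This is a sketch, not part of liesel_gam: the function name is hypothetical, and it assumes that the Rscript executable found on your PATH belongs to the same R installation that ryp will use.

```python
import shutil
import subprocess

REQUIRED_R_PACKAGES = ("arrow", "svglite")


def missing_r_packages(packages=REQUIRED_R_PACKAGES):
    """Return the R packages that cannot be loaded, or None if Rscript is not on PATH."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return None
    missing = []
    for name in packages:
        # requireNamespace() returns FALSE instead of raising when a package is absent,
        # so the result is mapped to the exit status of the R process.
        expr = (
            f'quit(save = "no", status = '
            f'if (requireNamespace("{name}", quietly = TRUE)) 0 else 1)'
        )
        result = subprocess.run([rscript, "-e", expr], capture_output=True)
        if result.returncode != 0:
            missing.append(name)
    return missing


status = missing_r_packages()
if status is None:
    print("Rscript was not found on PATH.")
elif status:
    print("Missing R packages:", ", ".join(status))
else:
    print("All required R packages are available.")
```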

Demo Notebooks#

This documentation contains some notebooks that demonstrate how to put the pieces together. The API documentation further below then provides extensive information on all the individual pieces.

Check out the demos on polynomial regression and on P-splines for additional example code, for example on posterior predictive sampling.

Relevant Literature#

Fahrmeir et al. (2013) is a textbook that introduces structured additive regression concepts from the ground up. Wood (2017) is another seminal textbook on generalized additive models. The R package mgcv provides many basis functions and penalty matrices that we use in liesel_gam.

The other references are seminal papers on structured additive distributional regression.

  • Kneib, T., Klein, N., Lang, S., & Umlauf, N. (2019). Modular regression—A Lego system for building structured additive distributional regression models with tensor product interactions. TEST, 28(1), 1–39. https://doi.org/10.1007/s11749-019-00631-z

  • Umlauf, N., Klein, N., & Zeileis, A. (2018). Bamlss: Bayesian additive models for location, scale, and shape (and beyond). Journal of Computational and Graphical Statistics, 27(3), 612–627. https://doi.org/10.1080/10618600.2017.1407325

  • Klein, N., Kneib, T., Lang, S., & Sohn, A. (2015). Bayesian structured additive distributional regression with an application to regional income inequality in Germany. The Annals of Applied Statistics, 9(2), 1024–1052. https://doi.org/10.1214/15-AOAS823

API Reference#

High-level API#

AdditivePredictor

A Liesel Var that represents an additive predictor.

TermBuilder

Initializes structured additive model terms.

BasisBuilder

Initializes Basis objects from data in a PandasRegistry.

Plots#

plot_1d_smooth

Plots a posterior summary for a one-dimensional smooth.

plot_2d_smooth

Plots a posterior summary for a two-dimensional smooth function.

plot_forest

Forest plot summary of a linear or discrete effect.

plot_polys

Plot data on a map of regions defined by a dictionary of polygons.

plot_regions

Plot a summary map of a discrete spatial effect.

plot_1d_smooth_clustered

Plots a clustered smooth or linear function.

Summary#

summarise_1d_smooth

Creates a summary dataframe for a one-dimensional StrctTerm.

summarise_nd_smooth

Summarises an n-dimensional smooth.

summarise_lin

Summarises a linear term.

summarise_cluster

Summarises a discrete term represented by RITerm or MRFTerm.

summarise_regions

Summarises a discrete spatial term.

summarise_1d_smooth_clustered

Summarises a clustered smooth or linear function.

summarise_by_samples

Summarises an array of posterior samples via subsamples.

polys_to_df

Turns a polys dictionary into a dataframe appropriate for plotting.

Bases#

Basis

General basis for a structured additive term.

MRFBasis

Dedicated basis object for Markov random fields.

LinBasis

Dedicated basis object for linear effects.

Terms and Variables#

StrctTerm

General structured additive term.

StrctInteractionTerm

Anisotropic structured additive interaction term.

StrctTensorProdTerm

Anisotropic structured additive tensor product term.

LinTerm

Specialized BasisDot for general linear effects.

StrctLinTerm

Specialized StrctTerm for linear effects.

LinMixin

Mixin that adds formula metadata to linear-term classes.

IndexingTerm

Term object for memory-efficient representation of sparse bases.

RITerm

Term object for memory-efficient representation of independent random intercepts.

MRFTerm

Term object for Markov random fields.

BasisDot

Basic term variable for a dot-product basis @ coef.

ScaleIG

A variable with an Inverse Gamma prior on its square.

UserVar

A liesel.model.Var, adapted for subclassing.
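As a note on the ScaleIG entry above: placing an inverse gamma prior on the square of a scale parameter implies a density for the scale itself via a change of variables. The following is the standard derivation, not taken from the library's source; the symbols \tau, a, and b are notation introduced here for illustration.

```latex
\tau^2 \sim \mathrm{IG}(a, b)
\quad\Longrightarrow\quad
p(\tau^2) = \frac{b^a}{\Gamma(a)} \, (\tau^2)^{-a-1} \exp\!\left(-\frac{b}{\tau^2}\right)

% Change of variables with Jacobian |d\tau^2 / d\tau| = 2\tau, for \tau > 0:
p(\tau) = 2\tau \, p(\tau^2)
        = \frac{2\, b^a}{\Gamma(a)} \, \tau^{-2a-1} \exp\!\left(-\frac{b}{\tau^2}\right)
```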

Distribution#

MultivariateNormalSingular

Potentially rank-deficient multivariate Gaussian distribution used as a prior in structured additive terms.

MultivariateNormalStructured

Potentially rank-deficient multivariate Gaussian distribution for the prior used in structured tensor product terms.

StructuredPenaltyOperator

Operator for efficiently computing the pseudo log determinant and quadratic form of a structured tensor product precision matrix.
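For orientation, the rank-deficient Gaussian priors listed above usually take the following form in the structured additive regression literature. This is a sketch of the textbook formulation, not a statement about the exact parameterization implemented in liesel_gam; the symbols \boldsymbol{\beta} (coefficients), \mathbf{K} (penalty matrix), and \tau^2 (variance parameter) are notation introduced here.

```latex
p(\boldsymbol{\beta} \mid \tau^2) \propto
  \left(\frac{1}{\tau^2}\right)^{\operatorname{rk}(\mathbf{K})/2}
  \exp\!\left(-\frac{1}{2\tau^2}\,
    \boldsymbol{\beta}^\top \mathbf{K}\, \boldsymbol{\beta}\right)

% Equivalent log density, up to an additive constant, with the precision matrix K / \tau^2.
% pdet denotes the pseudo-determinant (product of the nonzero eigenvalues):
\log p(\boldsymbol{\beta} \mid \tau^2) =
  \tfrac{1}{2} \log \operatorname{pdet}\!\left(\mathbf{K} / \tau^2\right)
  - \frac{1}{2\tau^2}\, \boldsymbol{\beta}^\top \mathbf{K}\, \boldsymbol{\beta}
  + \text{const},
\qquad
\log \operatorname{pdet}\!\left(\mathbf{K} / \tau^2\right)
  = \log \operatorname{pdet}(\mathbf{K})
  - \operatorname{rk}(\mathbf{K}) \log \tau^2
```

The two ingredients in the log density, the pseudo log determinant and the quadratic form, are the quantities named in the StructuredPenaltyOperator description.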

Other#

PandasRegistry

Registry for managing variables and their transformations.

CategoryMapping

Wraps a category mapping of labels to integers.

MRFSpec

A named tuple, containing information about the Markov random field setup.

NameManager

Creates unique names.

VarIGPrior

demo_data

Generate demo data for structured additive distributional regression.

demo_data_ta

Generate demo data for anisotropic tensor products.

LinearConstraintEVD

Computes reparameterization matrices for linear constraints.

In/Out

read_bnd

Reads a .bnd file with geographical information, returns a polys dictionary.

polygon_is_closed

Validate that a polygon is closed: first vertex equals last vertex (within tolerance).
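The closedness check described above can be sketched in a few lines. This is a hypothetical stand-in with an assumed name, signature, and default tolerance; it does not reproduce the actual polygon_is_closed implementation, and it assumes a polygon given as a sequence of (x, y) pairs.

```python
import math


def ring_is_closed(vertices, tol=1e-9):
    """Return True if the first and last vertex coincide within `tol` (hypothetical helper)."""
    if len(vertices) < 2:
        return False
    (x0, y0), (x1, y1) = vertices[0], vertices[-1]
    # rel_tol=0.0 makes this a purely absolute comparison in each coordinate.
    return math.isclose(x0, x1, rel_tol=0.0, abs_tol=tol) and math.isclose(
        y0, y1, rel_tol=0.0, abs_tol=tol
    )


closed_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
open_square = closed_square[:-1]
print(ring_is_closed(closed_square), ring_is_closed(open_square))
```

An absolute tolerance is used here because coordinates near zero make relative comparisons unreliable; whether that matches the library's choice is an assumption.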

Experimental#

The API of modules, classes and functions in the experimental module is less stable than in other modules of liesel_gam. If you depend on this, expect changes in the future.

BSplineApprox

Approximate B-spline evaluations on a fixed grid.

Acknowledgements and Funding#

We are grateful to the German Research Foundation (DFG) for funding the development through grant 443179956.

University of Göttingen Funded by DFG
