VC: Varying Coefficient#

Setup and Imports#

import jax.numpy as jnp
import liesel.goose as gs
import liesel.model as lsl
import tensorflow_probability.substrates.jax.distributions as tfd

import liesel_gam as gam
# import data from R
from ryp import r, to_py

r("library(mgcv)")
r("data(columb)")
r("data(columb.polys)")

columb = to_py("columb", format="pandas").reset_index()
polys = to_py("columb.polys", format="numpy")
Loading required package: nlme
This is mgcv 1.9-3. For overview type 'help("mgcv-package")'.
gam.plot_polys(region="district", which=["crime"], df=columb, polys=polys)
columb.head()
index area home.value income crime open.space district x y label observed
0 0 0.309441 80.467003 19.531 15.725980 2.850747 0 8.827218 14.369076 0 True
1 1 0.259329 44.567001 21.232 18.801754 5.296720 1 8.332658 14.031624 1 True
2 2 0.192468 26.350000 15.956 30.626781 4.534649 2 9.012265 13.819719 2 True
3 3 0.083841 33.200001 4.477 32.387760 0.394427 3 8.460801 13.716962 3 True
4 4 0.488888 23.225000 11.252 50.731510 0.405664 4 9.007982 13.296366 4 True

Model Definition#

df = columb
tb = gam.TermBuilder.from_df(df)

loc = gam.AdditivePredictor("$\\mu$")
scale = gam.AdditivePredictor("$\\sigma$", inv_link=jnp.exp)


y = lsl.Var.new_obs(
    value=df.crime.to_numpy(),
    distribution=lsl.Dist(tfd.Normal, loc=loc, scale=scale),
    name="y",
)
loc += tb.lin("area")

# I use scale=1.0 for demo; otherwise there's no nonlinearity
smooth = tb.ps("income", scale=1.0, k=20)
loc += tb.vc("area", by=smooth)

Build and plot model#

model = lsl.Model([y])
model.plot_vars()
liesel.model.model - INFO - Converted dtype of Value(name="y_value").value
../../_images/821548d47e12d62bc672a91c832e94e6f05d83c2fb2b34893a03989931a39695.png

Run MCMC#

eb = gs.LieselMCMC(model).get_engine_builder(seed=1, num_chains=4)

eb.add_burnin(3000)
eb.add_posterior(10_000, thinning=10)

engine = eb.build()
engine.sample_all_epochs()
results = engine.get_results()
liesel.goose.builder - WARNING - No jitter functions provided for position keys '$\\beta_{0,\\sigma}$', '$\\beta_{0,\\mu}$', '$\\beta_{ps(income)}$', '$\\beta_{lin(X)}$'. The initial values for these keys won't be jittered
liesel.goose.engine - INFO - Initializing kernels...
liesel.goose.engine - INFO - Done
liesel.goose.engine - INFO - Starting epoch: BURNIN, 3000 transitions, 1000 jitted together
100%|██████████████████████████████████████████| 3/3 [00:03<00:00,  1.05s/chunk]
liesel.goose.engine - INFO - Finished epoch
liesel.goose.engine - INFO - Finished warmup
liesel.goose.engine - INFO - Starting epoch: POSTERIOR, 10000 transitions, 1000 jitted together
100%|████████████████████████████████████████| 10/10 [00:00<00:00, 10.01chunk/s]
liesel.goose.engine - INFO - Finished epoch

MCMC summary#

summary = gs.Summary(results)
summary

Parameter summary:

kernel mean sd q_0.05 q_0.5 q_0.95 sample_size ess_bulk ess_tail rhat
parameter index
$\beta_{0,\mu}$ () kernel_01 43.391140 3.366579 37.937537 43.370134 48.945009 4000 2411.448336 3362.116664 1.000702
$\beta_{0,\sigma}$ () kernel_00 2.578274 0.107992 2.404410 2.575065 2.767230 4000 3721.120953 3719.728872 0.999620
$\beta_{lin(X)}$ (0,) kernel_03 -35.269196 15.168326 -60.310472 -35.178322 -10.583267 4000 2241.893591 3284.642761 1.001789
$\beta_{ps(income)}$ (0,) kernel_02 0.021969 1.004871 -1.596882 0.028648 1.737700 4000 3908.197582 4053.404557 0.999797
(1,) kernel_02 -0.036039 1.001104 -1.660604 -0.039733 1.576322 4000 3574.884123 3621.987359 0.999895
(2,) kernel_02 0.014698 1.013416 -1.673915 0.014581 1.694197 4000 3779.472613 3564.219279 1.000904
(3,) kernel_02 0.028991 1.004531 -1.623405 0.025629 1.702308 4000 4000.345735 3908.048134 0.999788
(4,) kernel_02 -0.001196 0.965355 -1.570153 -0.006431 1.624117 4000 3796.446290 3972.726172 0.999813
(5,) kernel_02 -0.025677 1.008663 -1.701040 -0.035333 1.621115 4000 4002.483911 3724.568115 0.999961
(6,) kernel_02 -0.024267 1.008338 -1.668561 -0.006960 1.638577 4000 3876.955717 3839.455076 0.999404
(7,) kernel_02 -0.009935 0.992258 -1.650928 -0.000927 1.606830 4000 3927.522120 3910.823750 1.000244
(8,) kernel_02 0.041307 0.965528 -1.542792 0.044304 1.633256 4000 3891.836037 3888.981465 0.999798
(9,) kernel_02 -0.021953 0.980754 -1.597951 -0.020468 1.609202 4000 3897.169620 3822.694475 0.999709
(10,) kernel_02 0.004618 0.997596 -1.602746 0.001373 1.640568 4000 3690.435128 3918.456156 1.000368
(11,) kernel_02 -0.006033 0.978728 -1.610733 -0.015630 1.597220 4000 3784.525304 3918.513384 1.000226
(12,) kernel_02 0.049197 0.995806 -1.603548 0.048111 1.685111 4000 3878.068795 3914.805969 0.999967
(13,) kernel_02 -0.090835 0.992792 -1.722934 -0.076297 1.522597 4000 3710.183565 3909.855047 0.999696
(14,) kernel_02 0.154257 0.978700 -1.458221 0.136620 1.769688 4000 3716.395131 3973.648679 1.000251
(15,) kernel_02 -0.218342 0.956908 -1.826052 -0.217859 1.342845 4000 3498.112953 3890.556821 1.000828
(16,) kernel_02 0.694740 0.926384 -0.831754 0.712509 2.191217 4000 3868.663743 4008.891208 1.000165
(17,) kernel_02 0.668956 0.640261 -0.388318 0.675546 1.710092 4000 3856.886357 3900.257724 0.999956
(18,) kernel_02 -10.438582 2.636226 -14.879080 -10.376647 -6.037162 4000 3606.876497 3827.181345 0.999523

Plots#

samples = results.get_posterior_samples()
gam.plot_1d_smooth(term=model.vars["ps(income)"], samples=samples)