For GLSSMs we can evaluate the likelihood analytically with a single pass of the Kalman filter. Based on the predictions \(\hat Y_{t| t - 1}\) and associated covariance matrices \(\Psi_{t | t - 1}\) for \(t = 0, \dots, n\) produced by the Kalman filter, we can derive the Gaussian negative log-likelihood, which is obtained by evaluating the Gaussian density with that mean and covariance matrix at the observation \(Y_t\).
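Concretely, by the prediction error decomposition this is
\[
- \log p_\theta(y) = \frac{1}{2} \sum_{t = 0}^{n} \left( \log \det \left( 2 \pi \Psi_{t | t - 1} \right) + \left( Y_t - \hat Y_{t | t - 1} \right)^\top \Psi_{t | t - 1}^{-1} \left( Y_t - \hat Y_{t | t - 1} \right) \right).
\]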
For a parametrized GLSSM, that is, a model that depends on parameters \(\theta\), we can use numerical optimization to find the maximum likelihood estimator.
Caution
With these methods, the user has to take care to provide a parametrization that is unconstrained, e.g. using \(\log\) transformations for positive parameters.
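As an illustration (a minimal sketch, not the library's mle_glssm / mle_glssm_ad routines), suppose parameterized_lcm(theta, aux) builds the model from an unconstrained \(\theta = (\log \sigma^2_\varepsilon, \log \sigma^2_\eta)\) and gnll_full(y, model) returns the scaled Gaussian negative log-likelihood, as in the grid example below; then a BFGS fit could look like this:

import jax.numpy as jnp
from jax.scipy.optimize import minimize

def objective(theta):
    # theta lives on the unconstrained log scale; the model factory is
    # assumed to apply exp internally to obtain positive variances
    return gnll_full(y, parameterized_lcm(theta, aux))

theta_init = jnp.zeros(2)  # hypothetical initial guess on the log scale
result_bfgs = minimize(objective, theta_init, method="BFGS")
# result_bfgs.x is the maximum likelihood estimate of (log σ²_ε, log σ²_η)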
Implementation Details
For low-dimensional state space models, obtaining the gradient of the negative log-likelihood by automatic differentiation may be feasible; in this case, use the mle_glssm_ad method. Otherwise, the derivative-free Nelder-Mead method in mle_glssm may be preferable.
To stabilize numerical results we minimize \(-\frac{1}{(n + 1)p} \log p_\theta(y)\) instead of \(-\log p_\theta (y)\).
Here, numerical differentiation is much faster than, and just as accurate as, automatic differentiation.
import matplotlib.pyplot as plt
# 2d grid on the log scale
k = 21  # number of evaluations in each dimension
log_s2_eps, log_s2_eta = jnp.meshgrid(
    jnp.linspace(-3, 3, k) + theta[0], jnp.linspace(-3, 3, k) + theta[1]
)
# flatten
thetas = jnp.vstack([log_s2_eps.ravel(), log_s2_eta.ravel()]).T

def gnll_theta(theta):
    return gnll_full(y, parameterized_lcm(theta, aux))

nlls = vmap(gnll_theta)(thetas)

# location of minimum in nlls
i = jnp.argmin(nlls)
# location of minimum in the grid
i_eps, i_eta = i // k, i % k

plt.contourf(log_s2_eps, log_s2_eta, nlls.reshape(k, k), alpha=0.5)
plt.scatter(
    log_s2_eps[i_eps, i_eta],
    log_s2_eta[i_eps, i_eta],
    c="white",
    marker="x",
    label="min_grid",
)
plt.scatter(theta[0], theta[1], c="r", marker="x", label="true")
plt.scatter(*result_bfgs.x, c="g", marker="x", label="$\\hat\\theta$")
plt.legend()
plt.xlabel("$\\log(\\sigma^2_\\varepsilon)$")
plt.ylabel("$\\log(\\sigma^2_\\eta)$")
plt.colorbar()
plt.show()
Inference for Log-Concave State Space Models
For non-Gaussian state space models we cannot evaluate the likelihood analytically, but instead resort to simulation methods, more specifically importance sampling.
Importance sampling is performed using a surrogate Gaussian model that shares the state density \(g(x) = p(x)\) and is parameterized by synthetic observations \(z\) and their covariance matrices \(\Omega\). In this surrogate model the likelihood \(g(z)\) and the posterior distribution \(g(x|z)\) are tractable, and we can simulate from the posterior.
Having obtained \(N\) independent samples \(X^i\), \(i = 1, \dots, N\), from this surrogate posterior, we can evaluate the likelihood \(\ell\) by Monte Carlo integration:
\[
\ell = p(y) \approx g(z) \cdot \frac{1}{N} \sum_{i=1}^{N} w(X^i),
\]
where \(w(X^i) = \frac{p\left(y|X^i\right)}{g\left(z|X^i\right)}\) are the unnormalized importance sampling weights. Additionally, we use the bias correction term \(\frac{s^{2}_{w}}{2 N \bar w^{2}}\) from (Durbin and Koopman 1997), where \(s^2_w\) is the empirical variance of the weights and \(\bar w\) is their mean.
In total, writing \(\ell_g = -\log g(z)\) for the negative log-likelihood of the surrogate model, we estimate the negative log-likelihood by
\[
- \log p(y) \approx \ell_g - \log \left(\sum_{i=1}^N w(X^i) \right) + \log N - \frac{s^{2}_{w}}{2 N \bar w^{2}}
\]
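In practice the sum of weights is best accumulated on the log scale. Below is a minimal sketch of this estimator, assuming we already have the surrogate negative log-likelihood nll_g \(= \ell_g\) and an array log_w of log-weights \(\log w(X^i)\) (the module's own routines may differ):

import jax.numpy as jnp
from jax.scipy.special import logsumexp

def estimated_nll(nll_g, log_w):
    # nll_g: negative log-likelihood of the Gaussian surrogate model
    # log_w: shape (N,), unnormalized log importance sampling weights
    N = log_w.shape[0]
    w = jnp.exp(log_w)
    w_bar = jnp.mean(w)        # \bar w
    s2_w = jnp.var(w, ddof=1)  # s^2_w, empirical variance of the weights
    # -log p(y) ≈ ℓ_g - log Σ_i w(X^i) + log N - s²_w / (2 N w̄²)
    return nll_g - logsumexp(log_w) + jnp.log(N) - s2_w / (2 * N * w_bar**2)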
Implementation Details
Similar to MLE in GLSSMs, we minimize \(-\frac{1}{(n + 1)p} \log p(y)\) instead of \(-\log p(y)\).
To perform maximum likelihood estimation in a parameterized log-concave state space model we have to evaluate the likelihood many times. Evaluating the likelihood at a single \(\theta\) requires finding the Gaussian surrogate model, simulating from its posterior, and approximating the negative log-likelihood using the methods of this module.
This makes maximum likelihood estimation a computationally intensive task for these kinds of models.
For an initial guess we optimize the approximate log-likelihood with the weights component fixed at the mode; see Eq. (21) in (Durbin and Koopman 1997) for further details.
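Below is a minimal sketch of the resulting optimization, assuming the pgnll_full(theta, key) objective from the grid example below and the mode-based initial guess theta0; fixing the PRNG key across evaluations keeps the Monte Carlo objective deterministic, so a standard optimizer can be applied (the library's own helpers may differ):

import numpy as np
from scipy.optimize import minimize

key, subkey = jrn.split(key)

def objective(theta):
    # importance sampling estimate of the scaled negative log-likelihood,
    # always evaluated with the same key (common random numbers)
    return float(pgnll_full(jnp.asarray(theta), subkey))

result = minimize(objective, np.asarray(theta0), method="BFGS")
theta_hat = result.x  # corresponds to $\hat\theta$ in the plot below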
# 2d grid on the log scale
(log_sigma_min, log_r_min), (log_sigma_max, log_r_max) = jnp.min(
    jnp.vstack((theta_lc, theta0, theta_hat)), axis=0
), jnp.max(jnp.vstack((theta_lc, theta0, theta_hat)), axis=0)
k = 20  # number of evaluations in each dimension
delta = 0.5
log_sigma, log_r = jnp.meshgrid(
    jnp.linspace(log_sigma_min - delta, log_sigma_max + delta, k),
    jnp.linspace(log_r_min - delta, log_r_max + delta, k),
)
# flatten
thetas = jnp.vstack([log_sigma.ravel(), log_r.ravel()]).T

key, subkey = jrn.split(key)
nlls = vmap(pgnll_full, (0, None))(thetas, subkey)

# location of minimum in nlls
i = jnp.argmin(nlls)
# location of minimum in the grid
i_sigma, i_r = i // k, i % k

plt.contourf(log_sigma, log_r, nlls.reshape(k, k))
plt.scatter(
    log_sigma[i_sigma, i_r],
    log_r[i_sigma, i_r],
    c="white",
    marker="x",
    label="min_grid",
)
plt.scatter(theta_lc[0], theta_lc[1], c="r", marker="x", label="true")
plt.scatter(theta0[0], theta0[1], c="black", marker="x", label="$\\theta_0$")
plt.scatter(*theta_hat, c="g", marker="o", label="min_mle")
plt.legend()
plt.xlabel("$\\log(\\sigma^2_\\varepsilon)$")
plt.ylabel("$\\log(r)$")
plt.colorbar()
plt.show()
From the above picture, we see that \(\log r\) is hard to determine: the likelihood is very flat in the \(r\) direction, which explains the precision loss warning in the optimizer. Nevertheless, our estimate \(\hat\theta\) seems to have converged to a reasonable value.
References
Durbin, James, and Siem Jan Koopman. 1997. “Monte Carlo Maximum Likelihood Estimation for Non-Gaussian State Space Models.” Biometrika 84 (3): 669–84. https://doi.org/10.1093/biomet/84.3.669.