Sequential Monte Carlo

Introduction

MCMC: What works and what doesn’t, Simple Model

Stylized Example

\begin{beamerboxesrounded}{Model} Reduced form: $ (1- {\color{blue} \phi_1} L)(1-{\color{blue} \phi_2} L) y_t = (1-({\color{blue} \phi_2} - {\color{blue} \phi_3} )L) \epsilon_t. $

\vspace*{0.5cm}

Relationship of ${\color{blue} \phi}$ and ${\color{red} \theta}$: $ {\color{blue} \phi_1} = {\color{red} \theta_1^2}, \quad {\color{blue} \phi_2} = {\color{red} (1-\theta_1^2) }, \quad {\color{blue} \phi_3 - \phi_2} = {\color{red} - \theta_1 \theta_2 }. $ \end{beamerboxesrounded}

Stylized Example: Likelihood Fcn 100 Obs

Reduced form: \( (1- {\color{blue} \phi_1} L)(1-{\color{blue} \phi_2} L) y_t = (1-({\color{blue} \phi_2} - {\color{blue} \phi_3} )L) \epsilon_t. \)
Relationship of \({\color{blue} \phi}\) and \({\color{red} \theta}\): \( {\color{blue} \phi_1} = {\color{red} \theta_1^2}, \quad {\color{blue} \phi_2} = {\color{red} (1-\theta_1^2) }, \quad {\color{blue} \phi_3 - \phi_2} = {\color{red} - \theta_1 \theta_2 }. \)

\begin{center} \includegraphics[width=2in]{static/ss_weakid} \end{center}

Stylized Example: Likelihood Fcn 500 Obs

Reduced form: \( (1- {\color{blue} \phi_1} L)(1-{\color{blue} \phi_2} L) y_t = (1-({\color{blue} \phi_2} - {\color{blue} \phi_3} )L) \epsilon_t. \)
Relationship of \({\color{blue} \phi}\) and \({\color{red} \theta}\): \( {\color{blue} \phi_1} = {\color{red} \theta_1^2}, \quad {\color{blue} \phi_2} = {\color{red} (1-\theta_1^2) }, \quad {\color{blue} \phi_3 - \phi_2} = {\color{red} - \theta_1 \theta_2 }. \)

\begin{center} \includegraphics[width=2in]{static/ss_noglobalid5.pdf} \end{center}

Introduction

Review – Importance Sampling

If the \(\theta^i\)'s are draws from \(g(\cdot)\), then \[ \mathbb{E}_\pi[h] \approx \frac{ \frac{1}{N} \sum_{i=1}^N h(\theta^i) w(\theta^i)}{ \frac{1}{N} \sum_{i=1}^N w(\theta^i) }, \quad w(\theta) = \frac{f(\theta)}{g(\theta)}. \]

\begin{center} \includegraphics[width=4in]{static/is.pdf} \end{center}
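A minimal Python sketch of this self-normalized estimator (the target kernel \(f\) and proposal \(g\) below are hypothetical, chosen only for illustration):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Unnormalized target kernel f: a bimodal Gaussian mixture (illustrative only)
def f(th):
    return np.exp(-0.5 * (th + 1.0) ** 2) \
         + 0.7 * np.exp(-0.5 * ((th - 2.0) / 0.8) ** 2)

# Proposal g: N(0, 3^2), wide enough to cover the target
theta = rng.normal(0.0, 3.0, size=N)
g = np.exp(-0.5 * (theta / 3.0) ** 2) / (3.0 * np.sqrt(2.0 * np.pi))

w = f(theta) / g                      # importance weights w(theta) = f / g
h = theta                             # h(theta) = theta: estimate E_pi[theta]
print(np.sum(h * w) / np.sum(w))      # self-normalized IS estimate
\end{verbatim}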

From Importance Sampling to Sequential Importance Sampling

Illustration: the data are generated given \(\theta = [0.45, 0.45]'\), which is observationally equivalent to \(\theta = [0.89, 0.22]'\) (see the check below).
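Why are these observationally equivalent? The AR polynomial \((1-\phi_1 L)(1-\phi_2 L)\) is symmetric in \(\phi_1\) and \(\phi_2\), so swapping the two AR factors leaves the likelihood unchanged, and the MA coefficient \(\phi_2 - \phi_3 = \theta_1 \theta_2\) is (up to rounding) the same for both \(\theta\) vectors. A quick Python check of the mapping (purely illustrative):

\begin{verbatim}
def reduced_form(theta1, theta2):
    """AR factors {phi1, phi2} (unordered) and MA coefficient theta1*theta2."""
    phi1 = theta1 ** 2
    phi2 = 1.0 - theta1 ** 2
    return sorted([phi1, phi2]), theta1 * theta2

print(reduced_form(0.45, 0.45))  # ([0.2025, 0.7975], 0.2025)
print(reduced_form(0.89, 0.22))  # ([0.2079, 0.7921], 0.1958): equal up to rounding
\end{verbatim}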

Illustration: Tempered Posteriors of \(\theta_1\)

\includegraphics[width=.8\linewidth]{static/smc_ss_density.pdf} \[ \pi_n(\theta) = \frac{{\color{blue}[p(Y|\theta)]^{\phi_n}} p(\theta)}{\int {\color{blue}[p(Y|\theta)]^{\phi_n}} p(\theta) d\theta} = \frac{f_n(\theta)}{Z_n}, \quad \phi_n = \left( \frac{n}{N_\phi} \right)^\lambda \]
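The schedule itself is trivial to compute; a short Python sketch (the values of \(N_\phi\) and \(\lambda\) are illustrative):

\begin{verbatim}
import numpy as np

N_phi, lam = 100, 2.0
phi = (np.arange(N_phi + 1) / N_phi) ** lam  # phi_0 = 0 (prior), phi_{N_phi} = 1
# lam > 1 front-loads small increments early on, where tempering
# changes the bridge distributions pi_n most rapidly
\end{verbatim}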

Illustration: Posterior Draws

\begin{center} \includegraphics[width=4in]{static/smc_ss_contour.pdf} \end{center}

SMC Algorithm: A Graphical Illustration

\begin{center} \includegraphics[width=3in]{static/smc_evolution_of_particles.pdf} \end{center}

SMC Algorithm

  1. Initialization. (\(\phi_{0} = 0\)). Draw the initial particles from the prior: \(\theta^{i}_{1} \stackrel{iid}{\sim} p(\theta)\) and \(W^{i}_{1} = 1\), \(i = 1, \ldots, N\).
  2. Recursion. For \(n = 1, \ldots, N_{\phi}\),
    1. Correction. Reweight the particles from stage \(n-1\) by defining the incremental weights (a code sketch follows this outline)

      \begin{equation} \tilde w_{n}^{i} = [p(Y|\theta^{i}_{n-1})]^{\phi_{n} - \phi_{n-1}} \label{eq_smcdeftildew} \end{equation}

      and the normalized weights

      \begin{equation} \tilde{W}^{i}_{n} = \frac{\tilde w_n^{i} W^{i}_{n-1}}{\frac{1}{N} \sum_{i=1}^N \tilde w_n^{i} W^{i}_{n-1}}, \quad i = 1,\ldots,N. \end{equation}

      An approximation of \(\mathbb{E}_{\pi_n}[h(\theta)]\) is given by

      \begin{equation} \tilde{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N \tilde W_n^{i} h(\theta_{n-1}^i). \label{eq_deftildeh} \end{equation}

    2. Selection.

    3. Mutation.
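A minimal Python sketch of the correction step, under the normalization convention above (so that \(\frac{1}{N}\sum_i \tilde W_n^i = 1\)); log_lik_vals is an assumed vector holding \(\ln p(Y|\theta_{n-1}^i)\):

\begin{verbatim}
import numpy as np

def correction(log_lik_vals, W, phi_n, phi_prev):
    """Reweight stage-(n-1) particles toward pi_n; returns tilde-W_n (mean 1)."""
    log_w_tilde = (phi_n - phi_prev) * log_lik_vals  # incremental log-weights
    w = np.exp(log_w_tilde - log_w_tilde.max()) * W  # stabilized in log space
    return w / w.mean()                              # normalize: mean = 1

# E_{pi_n}[h] is then approximated by np.mean(W_tilde * h(theta))
\end{verbatim}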

SMC Algorithm

  1. Initialization.
  2. Recursion. For \(n = 1, \ldots, N_{\phi}\),
    1. Correction.

    2. Selection (Optional Resampling). Let \(\{ \hat{\theta}_n^i \}_{i=1}^N\) denote \(N\) \(iid\) draws from a multinomial distribution characterized by support points and weights \(\{\theta_{n-1}^i,\tilde{W}_n^i \}_{i=1}^N\) and set \(W_n^i=1\) (see the sketch after this outline).

      An approximation of \(\mathbb{E}_{\pi_n}[h(\theta)]\) is given by

      \begin{equation} \hat{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N W^i_n h(\hat{\theta}_{n}^i). \label{eq_defhath} \end{equation}

    3. Mutation. Propagate the particles \(\{ \hat{\theta}_n^i, W_n^i \}\) via \(N_{MH}\) steps of an MH algorithm with transition density \(\theta_n^i \sim K_n(\theta_n| \hat{\theta}_n^i; \zeta_n)\) and stationary distribution \(\pi_n(\theta)\). An approximation of \(\mathbb{E}_{\pi_n}[h(\theta)]\) is given by

      \begin{equation} \bar{h}_{n,N} = \frac{1}{N} \sum_{i=1}^N h(\theta_{n}^i) W^i_n. \label{eq_defbarh} \end{equation}
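A Python sketch of the selection and mutation steps; the random-walk MH kernel with scale \(c\) and covariance \(\Sigma\) is one common choice of \(K_n\), and log_target is an assumed callable returning \(\phi_n \ln p(Y|\theta) + \ln p(\theta)\):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def selection(theta, W_tilde):
    """Multinomial resampling from {theta^i} with weights tilde-W_n^i."""
    N = len(W_tilde)
    idx = rng.choice(N, size=N, p=W_tilde / W_tilde.sum())
    return theta[idx], np.ones(N)            # hat-theta draws, W_n^i = 1

def mutation(theta_hat, log_target, c, Sigma, n_mh=1):
    """N_MH random-walk MH steps with stationary distribution pi_n."""
    theta = theta_hat.copy()
    N, d = theta.shape
    for _ in range(n_mh):
        prop = theta + rng.multivariate_normal(np.zeros(d), c**2 * Sigma, size=N)
        log_alpha = np.array([log_target(p) - log_target(t)
                              for p, t in zip(prop, theta)])
        accept = np.log(rng.uniform(size=N)) < log_alpha
        theta[accept] = prop[accept]         # accept or keep each particle
    return theta
\end{verbatim}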

Remarks

Theoretical Properties

Theoretical Properties: Correction Step

Theoretical Properties: Selection / Resampling

Theoretical Properties: Mutation

More on Transition Kernel in Mutation Step

Adaptive Choice of \(\zeta_n = (\Sigma_n^*,c_n)\)

Adaptation of SMC Algorithm for Stylized State-Space Model

\begin{center} \includegraphics[width=2in]{static/smc_ss.pdf} \end{center}

Notes: The dashed line in the top panel indicates the target acceptance rate of 0.25.

Convergence of SMC Approximation for Stylized State-Space Model

\begin{center} \includegraphics[width=3in]{static/smc_clt_nphi100.pdf} \end{center}

Notes: The figure shows \(N \mathbb{V}[\bar\theta_j]\) for each parameter as a function of the number of particles \(N\). \(\mathbb{V}[\bar\theta_j]\) is computed based on \(N_{run}=1,000\) runs of the SMC algorithm with \(N_\phi=100\). The width of the bands is \((2\cdot 1.96) \sqrt{3/N_{run}} (N \mathbb{V}[\bar\theta_j])\).

More on Resampling

Running Time – It’s all about Mutation

How I do it – distributed memory parallelization in Fortran

Scaling could be improved with additional programming effort.

How well does this work?

Gains from Parallelization

\includegraphics[width=4.5in]{static/amdahls_law}

Application 1: Small Scale New Keynesian Model

Effect of \(\lambda\) on Inefficiency Factors \(\mbox{InEff}_N[\bar{\theta}]\)

\begin{center} \includegraphics[width=3in]{static/smc_lambda.pdf} \end{center}

Notes: The figure depicts hairs of \(\mbox{InEff}_N[\bar{\theta}]\) as a function of \(\lambda\). The inefficiency factors are computed based on \(N_{run}=50\) runs of the SMC algorithm. Each hair corresponds to a DSGE model parameter.

Number of Stages \(N_{\phi}\) vs Number of Particles \(N\)

\begin{center} \includegraphics[width=3in]{static/smc_nphi_vs_npart.pdf} \end{center}

Notes: Plot of \(\mathbb{V}[\bar{\theta}] / \mathbb{V}_\pi[\theta]\) for a specific configuration of the SMC algorithm. The inefficiency factors are computed based on \(N_{run}=50\) runs of the SMC algorithm. \(N_{blocks}=1\), \(\lambda=2\), \(N_{MH}=1\).

Number of blocks \(N_{blocks}\) in Mutation Step vs Number of Particles \(N\)

\begin{center} \includegraphics[width=3in]{static/smc_nblocks_vs_npart.pdf} \end{center}

Notes: Plot of \(\mathbb{V}[\bar{\theta}] / \mathbb{V}_\pi[\theta]\) for a specific configuration of the SMC algorithm. The inefficiency factors are computed based on \(N_{run}=50\) runs of the SMC algorithm. \(N_{\phi}=100\), \(\lambda=2\), \(N_{MH}=1\).

A Few Words on Posterior Model Probabilities

Marginal Likelihood Approximation

SMC Marginal Data Density Estimates

\begin{center}
\begin{tabular}{l@{\hspace*{0.5cm}}cc@{\hspace*{0.5cm}}cc}
\hline\hline
& \multicolumn{2}{c}{$N_{\phi}=100$} & \multicolumn{2}{c}{$N_{\phi}=400$} \\
$N$ & Mean($\ln \hat p(Y)$) & SD($\ln \hat p(Y)$) & Mean($\ln \hat p(Y)$) & SD($\ln \hat p(Y)$) \\
\hline
500   & -352.19 & (3.18) & -346.12 & (0.20) \\
1,000 & -349.19 & (1.98) & -346.17 & (0.14) \\
2,000 & -348.57 & (1.65) & -346.16 & (0.12) \\
4,000 & -347.74 & (0.92) & -346.16 & (0.07) \\
\hline
\end{tabular}
\end{center}

Notes: Table shows mean and standard deviation of log marginal data density estimates as a function of the number of particles \(N\) computed over \(N_{run}=50\) runs of the SMC sampler with \(N_{blocks}=4\), \(\lambda=2\), and \(N_{MH}=1\).
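The estimates in the table exploit the fact that the stage-\(n\) average of the incremental weights approximates the normalization-constant ratio \(Z_n/Z_{n-1}\), so the product over stages approximates \(p(Y)\) (the same identity as the CMDD formula later in these notes). A sketch of the stage-\(n\) log-increment, assuming the same inputs as the correction-step sketch above:

\begin{verbatim}
import numpy as np
from scipy.special import logsumexp

def log_mdd_increment(log_lik_vals, W, phi_n, phi_prev):
    """ln[(1/N) sum_i w~_n^i W_{n-1}^i]: stage-n contribution to ln p-hat(Y)."""
    log_w_tilde = (phi_n - phi_prev) * log_lik_vals
    return logsumexp(log_w_tilde + np.log(W)) - np.log(len(W))

# ln p-hat(Y) = sum of log_mdd_increment over n = 1, ..., N_phi
\end{verbatim}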

Generalized Data Tempering

Different Kinds of Tempering

\begin{align} \mbox{\color{red}{Likelihood Tempering:} } p_n(Y|\theta) = [p(Y|\theta)]^{\phi_n}, \quad \phi_n \uparrow 1. \label{eq:tempering.lh} \end{align}

\begin{align} \mbox{\color{blue}{Data Tempering:} } p_n(Y|\theta) = p(y_{1: \lfloor \phi_n T \rfloor}|\theta), \quad \phi_n \uparrow 1. \label{eq:tempering.data} \end{align}

cite:Cai_2019 generalize both likelihood and data tempering!

Generalized Data Tempering

Imagine one has draws from the posterior

\begin{equation} \tilde{\pi}(\theta) \propto \tilde{p}(\tilde{Y}|\theta) p(\theta), \end{equation}

where the posterior \(\tilde{\pi}(\theta)\) differs from the posterior \(\pi(\theta)\) because of:

  1. the sample (\(Y\) versus \(\tilde{Y}\)), or
  2. the model (\(p(Y|\theta)\) versus \(\tilde{p}(\tilde{Y}|\theta)\)), or
  3. both.

Define the stage-\(n\) likelihood function:

\begin{equation} p_n(Y|\theta) = [p(Y|\theta)]^{\phi_n}[\tilde{p}(\tilde{Y}|\theta)]^{1-\phi_n}, \quad \phi_n \uparrow 1. \label{eq:tempering.general} \end{equation}

\textcolor{red}{Generalized Data Tempering}: an SMC algorithm that uses this stage-\(n\) likelihood.
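In Python, the stage-\(n\) log-likelihood is just a convex combination in logs; log_lik and log_lik_tilde are assumed scalars or arrays holding \(\ln p(Y|\theta)\) and \(\ln \tilde{p}(\tilde{Y}|\theta)\):

\begin{verbatim}
def log_p_n(phi_n, log_lik, log_lik_tilde):
    """Generalized data tempering: phi_n*ln p(Y|th) + (1-phi_n)*ln p~(Y~|th)."""
    return phi_n * log_lik + (1.0 - phi_n) * log_lik_tilde
\end{verbatim}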

Some Comments

\[p_n(Y|\theta) = [p(Y|\theta)]^{\phi_n}[\tilde{p}(\tilde{Y}|\theta)]^{1-\phi_n} \]

  1. With \(\tilde{p}(\cdot)=1\): identical to likelihood tempering.

  2. With \(\tilde{p}(\cdot) = p(\cdot)\), \(Y=y_{1: \lfloor \phi_m T \rfloor}\), and \(\tilde{Y}=y_{1: \lfloor \phi_{m-1} T \rfloor}\):

    generalizes data tempering by allowing for a gradual transition between \(y_{1: \lfloor \phi_{m-1} T \rfloor}\) and \(y_{1: \lfloor \phi_m T \rfloor}\).

  3. By allowing \(Y\) to differ from \(\tilde{Y}\): incorporate data revisions between time \(\lfloor \phi_{m-1} T \rfloor\) and \(\lfloor \phi_m T \rfloor\).

  4. \(p(\cdot) \ne \tilde p(\cdot)\): one can transition between the posterior distributions of two models with the same parameters.

Evergreen Problem: How to Pick Tuning Parameters

The SMC algorithm has a number of tuning parameters:

  1. \textcolor{gray}{Number of Particles $N$: cite:Chopin2004a provides a CLT for Monte Carlo averages in $N$.}
  2. \textcolor{gray}{Hyperparameters determining mutation phase. }
  3. \(\mbox{\textcolor{blue}{The number of stages $N_{\phi}$ and the schedule $\{\phi_n\}_{n=1}^{N_{\phi}}$}}\).

This paper: choose \(\phi_n\) adaptively, with no fixed \(N_{\phi}\).

Key idea: choose \(\phi_n\) to target a desired level \(\widehat{ESS}_n^*\).

The closer the desired \(\widehat{ESS}_n^*\) is to the previous \(\widehat{ESS}_{n-1}\), the smaller the increment \(\phi_n - \phi_{n-1}\).

An implementation of this:

\begin{align*} w^i(\phi) = [p(Y|\theta^i_{n-1})]^{\phi - \phi_{n-1}}, \quad W^i(\phi) = \frac{w^i(\phi) W^i_{n-1}}{\frac{1}{N}\sum_{i=1}^N w^i(\phi) W^i_{n-1}}, \\ \widehat{ESS}(\phi) = N \big/ \left( \frac{1}{N} \sum_{i=1}^N (W^i(\phi))^2\right) \end{align*}

We will choose \(\phi\) to target a desired level of ESS:

\begin{align} f(\phi) = \widehat{ESS}(\phi) - \alpha \widehat{ESS}_{n-1} = 0, \label{eq:adaptive.alpha} \end{align}

where \(\alpha \le 1\) is a tuning constant.
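Since \(\widehat{ESS}(\phi_{n-1}) = \widehat{ESS}_{n-1}\) by the definitions above, and \(\widehat{ESS}(\phi)\) typically falls as \(\phi\) rises, the root of \(f\) can be bracketed on \([\phi_{n-1}, 1]\). A Python sketch using a standard root-finder (one possible implementation, not the paper's code):

\begin{verbatim}
import numpy as np
from scipy.optimize import brentq

def ess(phi, log_lik_vals, W, phi_prev):
    """ESS-hat of the stage weights when tempering from phi_prev to phi."""
    log_w = (phi - phi_prev) * log_lik_vals
    w = np.exp(log_w - log_w.max()) * W
    W_new = w / w.mean()
    return len(W) / np.mean(W_new ** 2)

def next_phi(log_lik_vals, W, phi_prev, ess_prev, alpha=0.95):
    """Solve ESS-hat(phi) = alpha * ESS-hat_{n-1} for the next stage."""
    f = lambda phi: ess(phi, log_lik_vals, W, phi_prev) - alpha * ess_prev
    if f(1.0) >= 0.0:        # even phi = 1 keeps the ESS high: final stage
        return 1.0
    return brentq(f, phi_prev, 1.0)
\end{verbatim}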

Assessing \(\alpha\).

In the time-honored tradition of macroeconometrics, let's estimate the cite:Smets2007 model.

Compare the accuracy (and precision) of the SMC algorithm against its speed:

Trade-Off Between Runtime and Accuracy

\begin{center} \includegraphics[width=3in]{figure_1_StdVsTime.pdf} \end{center}

Tempering Schedules

\begin{center} \includegraphics[width=3in]{figure_2_tempering_scheds_all.pdf} \end{center}

Some Comments

Illustration of Generalized Data Tempering

Scenario 1:

More Details

Let \(Y=y_{1:T}\) and \(\tilde{Y}=\tilde{y}_{1:T_1}\), and define the stage-\(n\) posterior: \[ \pi_{n}(\theta) = \frac{p(y_{1:T} | \theta)^{\phi_{n}} p(\tilde{y}_{1:T_1} | \theta)^{1-\phi_{n}} p(\theta) }{\int p(y_{1:T} | \theta)^{\phi_{n}} p(\tilde{y}_{1:T_1} | \theta)^{1-\phi_{n}}p(\theta) d\theta}. \label{eq:lttpost_2} \] The incremental weights are given by \[ \tilde{w}^i_{n}(\theta) = p(y_{1:T} | \theta)^{\phi_{n}-\phi_{n-1}} p(\tilde{y}_{1:T_1} | \theta)^{\phi_{n-1}-\phi_{n}}. \] Define the Conditional Marginal Data Density (CMDD):

\begin{align} \text{CMDD}_{2|1} =\prod_{n=1}^{N_\phi} \left( \frac{1}{N} \sum\limits_{i=1}^{N} \tilde{w}^i_{n} W^i_{n-1} \right) \label{eq:CMDD1} \end{align}

Thus:

\begin{align} \text{CMDD}_{2|1} \approx \frac{\int p(y_{1:T} | \theta) p(\theta) d\theta}{\int p(\tilde{y}_{1:T_1} | \theta) p(\theta) d\theta} = \frac{ p(y_{1:T})}{p(\tilde{y}_{1:T_1})}. \label{eq:CMDD2} \end{align}
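A Python sketch of the generalized-tempering incremental log-weight; summing its stage averages in logs (e.g., via the log_mdd_increment helper sketched earlier) yields \(\ln \text{CMDD}_{2|1}\). This is an illustrative helper, not the paper's code:

\begin{verbatim}
def gdt_log_incremental_weight(phi_n, phi_prev, log_lik_Y, log_lik_Ytilde):
    """ln w~_n = (phi_n - phi_prev) * [ln p(y_{1:T}|th) - ln p(y~_{1:T1}|th)]."""
    return (phi_n - phi_prev) * (log_lik_Y - log_lik_Ytilde)
\end{verbatim}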

An experiment

Trade-Off Between Runtime and Accuracy

\begin{center} \includegraphics[width=3in]{figure_3_all_StdVsTime.pdf} \end{center}

References

[Kohn2010] Kohn, Giordani & Strid, Adaptive Hybrid Metropolis-Hastings Samplers for DSGE Models, Working Paper, (2010).

[ChibR08] Chib & Ramamurthy, Tailored Randomized Block MCMC Methods with Application to DSGE Models, Journal of Econometrics, 155(1), 19-38 (2010).

[curdia_reis2010] Curdia & Reis, Correlated Disturbances and U.S. Business Cycles, Manuscript, Columbia University and FRB New York, (2010).

[Herbst2011a] Herbst, Gradient and Hessian-based MCMC for DSGE Models, Unpublished Manuscript, Federal Reserve Board, (2012).

[Chopin2004a] Chopin, A Sequential Particle Filter Method for Static Models, Biometrika, 89(3), 539-551 (2002).

[DelMoraletal2006] Del Moral, Doucet & Jasra, Sequential Monte Carlo Samplers, Journal of the Royal Statistical Society, Series B, 68(Part 3), 411-436 (2006).

[Creal2007] Creal, Sequential Monte Carlo Samplers for Bayesian DSGE Models, Unpublished Manuscript, Vrije Universiteit, (2007).

[DurhamGeweke2011] Durham & Geweke, Massively Parallel Sequential Monte Carlo Bayesian Inference, Unpublished Manuscript, (2011).

[Cai_2019] Cai, Del Negro, Herbst, Matlin, Sarfati & Schorfheide, Online Estimation of DSGE Models, SSRN Electronic Journal, (2019).

[Smets2007] Smets & Wouters, Shocks and Frictions in US Business Cycles: A Bayesian DSGE Approach, American Economic Review, 97, 586-608 (2007).

[Herbst_2014] Herbst & Schorfheide, Sequential Monte Carlo Sampling for DSGE Models, Journal of Applied Econometrics, 29(7), 1073-1098 (2014).