
# What the collapse of the ensemble Kalman filter tells us about particle filters

## Abstract

The ensemble Kalman filter (EnKF) is a reliable data assimilation tool for high-dimensional meteorological problems. On the other hand, the EnKF can be interpreted as a particle filter, and particle filters (PFs) collapse in high-dimensional problems. We explain that these seemingly contradictory statements offer insights into how PFs function in certain high-dimensional problems, and in particular support recent efforts in meteorology to ‘localize’ particle filters, i.e. to restrict the influence of an observation to its neighbourhood.

How to Cite: Morzfeld, M., Hodyss, D. and Snyder, C., 2017. What the collapse of the ensemble Kalman filter tells us about particle filters. Tellus A: Dynamic Meteorology and Oceanography, 69(1), p.1283809. DOI: http://doi.org/10.1080/16000870.2017.1283809

Published on 01 Jan 2017. Submitted on 31 May 2016; accepted on 19 Dec 2016.
## 1. Introduction

Many applications in geophysics require that one update the state of a numerical model with information from sparse and noisy observations; see, e.g. van Leeuwen (2009), Bocquet et al. (2010), Fournier et al. (2010) and van Leeuwen et al. (2015) for recent reviews in geophysics. Particle filters (PFs) are sequential Monte Carlo algorithms that solve such a ‘data assimilation’ problem as follows (Doucet et al., 2001; Arulampalam et al., 2002). One draws samples from a proposal density, and then attaches weights to the samples such that the weighted samples form an empirical estimate of a posterior density. The samples are often called ‘particles’ in the PF literature, or ‘ensemble members’ in the geophysics literature. The empirical estimate approximates the posterior density in the sense that weighted averages over the samples converge to expected values with respect to the posterior density. The PFs in the literature differ in their choice of proposal density, and therefore in their weights. Several particle filters can be found, e.g. in Zaritskii and Shimelevich (1975), Gordon et al. (1993), Liu and Chen (1995), Doucet (1998), Doucet et al. (2000), Chorin and Tu (2009), Weare (2009), Chorin et al. (2010), Morzfeld et al. (2012) and Vanden-Eijnden and Weare (2012).

The variance of the weights determines the efficiency of a PF. If this variance is large, then a PF ‘collapses’, i.e. the weights of all but one sample are near zero. The collapse of particle filters has been described in Snyder et al. (2008), Bickel et al. (2008), Bengtsson et al. (2008), Snyder (2011), Chorin and Morzfeld (2013) and Snyder et al. (2015). These works argue that an effective dimension determines the ensemble size required to prevent collapse. While there are various definitions of effective dimension, we explain in Section 2.4 that the huge state dimension of global weather models (hundreds of millions of state variables) and the vast number of observations (millions) cause any effective dimension to be large, which in turn implies that particle filters would require huge ensemble sizes in meteorological problems.
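The collapse can be seen concretely in a short numerical sketch (our illustration, not an experiment from the paper; all parameter choices are arbitrary): one standard-particle-filter weighting step on a toy Gaussian problem. The largest normalized weight is moderate when few components are observed, and approaches one as the number of observed components grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_weight(n_y, N_e):
    """Largest normalized weight after one standard-PF weighting step in a
    toy problem: prior x ~ N(0, I), observation y = x + eps, eps ~ N(0, I)."""
    x = rng.standard_normal((N_e, n_y))                      # particles from the prior
    y = rng.standard_normal(n_y) + rng.standard_normal(n_y)  # one realized observation
    log_w = -0.5 * np.sum((y - x) ** 2, axis=1)              # log-likelihood log p(y | x)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w.max()

# weights stay balanced in low dimension but concentrate on one particle in high dimension
low, high = max_weight(n_y=1, N_e=100), max_weight(n_y=100, N_e=100)
```

The ensemble size is fixed at 100 in both calls; only the number of observed components changes.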

It is interesting, however, that the ensemble Kalman filter (EnKF; Tippett et al., 2003; Evensen, 2006), used routinely with ensembles of 50 to a few hundred members in meteorological problems, can in fact be interpreted as a particle filter. Specifically, as shown by Papadakis et al. (2010), one can view the EnKF ensemble members as draws from a proposal density and attach weights to each ensemble member, thereby interpreting the EnKF as a particle filter. It follows from the optimality result of Snyder et al. (2015) that the particle filter interpretation of the EnKF collapses in the same way as other particle filters. The goal of this paper is to clarify this situation: why does the EnKF do so well in meteorological problems in which its particle filter interpretation collapses? Resolving this apparent contradiction reveals the significance of a certain type of particle filter collapse, and suggests improvements of particle filters for meteorological applications.

## 2. Preliminaries

We consider linear Gaussian data assimilation problems because we can make our points in this simplified setting. Moreover, linear problems are standard in the context of the collapse of particle filters and have been used by Snyder et al. (2008), Bickel et al. (2008), Bengtsson et al. (2008), Snyder (2011), Chorin and Morzfeld (2013) and Snyder et al. (2015). Specifically, we consider

$$\mathbf{x}_k = \mathbf{M}\mathbf{x}_{k-1} + \boldsymbol{\eta}_k, \qquad (1)$$

$$\mathbf{y}_k = \mathbf{H}\mathbf{x}_k + \boldsymbol{\epsilon}_k, \qquad (2)$$

where the state, $\mathbf{x}$, is a real $n_x$-dimensional vector, the linear model, $\mathbf{M}$, is an $n_x \times n_x$ matrix, the observation, $\mathbf{y}$, is a real $n_y$-dimensional vector, $\mathbf{H}$ is an $n_y \times n_x$ matrix, and $k = 1, 2, 3, \dots$ is discrete time. We further assume that the initial condition $\mathbf{x}_0$ is Gaussian, and that no observations are available at time $k = 0$. Throughout this paper, we use subscripts for time indexing and superscripts for indexing samples of random variables. For example, $\mathbf{x}_k^j$ is the $j$th sample of the state $\mathbf{x}$ at time $k$. We write $\mathbf{y}_{1:k}$ for the set of vectors $\{\mathbf{y}_1, \dots, \mathbf{y}_k\}$. The random variables $\boldsymbol{\eta}_k$ and $\boldsymbol{\epsilon}_k$ in Equations (1) and (2) represent model errors and errors in the observations, respectively; we assume that they are independent, identically distributed Gaussians with mean zero, which we write as $\boldsymbol{\eta}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})$ and $\boldsymbol{\epsilon}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$. The covariances $\mathbf{Q}$ and $\mathbf{R}$ are symmetric positive definite matrices of size $n_x \times n_x$ and $n_y \times n_y$, respectively. We exclude data assimilation problems with a deterministic model, for which $\mathbf{Q} = \mathbf{0}$, but many of our results also hold for deterministic models by simply setting $\mathbf{Q} = \mathbf{0}$ in the equations. We explain which results generalize in Section 6.
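For concreteness, the model (1)–(2) can be simulated directly. The sketch below is a minimal illustration with arbitrary toy choices of $\mathbf{M}$, $\mathbf{H}$, $\mathbf{Q}$ and $\mathbf{R}$ (none taken from the paper); meteorological values of $n_x$ and $n_y$ are of course vastly larger.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_y = 4, 2                       # toy dimensions
M = 0.9 * np.eye(n_x)                 # stable linear model
H = np.eye(n_y, n_x)                  # observe the first n_y state variables
Q = 0.1 * np.eye(n_x)                 # model error covariance
R = 0.2 * np.eye(n_y)                 # observation error covariance

x = rng.multivariate_normal(np.zeros(n_x), np.eye(n_x))  # Gaussian initial condition
xs, ys = [x], []
for k in range(1, 11):
    x = M @ x + rng.multivariate_normal(np.zeros(n_x), Q)          # Eq. (1)
    ys.append(H @ x + rng.multivariate_normal(np.zeros(n_y), R))   # Eq. (2)
    xs.append(x)
```

The trajectory `xs` and observations `ys` generated here play the roles of $\mathbf{x}_{0:k}$ and $\mathbf{y}_{1:k}$ in what follows.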

### 2.1. Filtering and smoothing densities

The goal of data assimilation is to update the state of the model (1) in view of the noisy observations in (2). Model and observations jointly define the conditional random variable ${\mathbf{x}}_{0:k}|{\mathbf{y}}_{1:k}$, which describes the state trajectory up to time k, given the observations up to time k. This random variable is specified by the ‘smoothing density’

$$p(\mathbf{x}_{0:k}|\mathbf{y}_{1:k}). \qquad (3)$$

The ‘filtering density’ is a marginal of the smoothing density,

$$p(\mathbf{x}_k|\mathbf{y}_{1:k}) = \int \cdots \int p(\mathbf{x}_{0:k}|\mathbf{y}_{1:k})\, d\mathbf{x}_0 \cdots d\mathbf{x}_{k-1}, \qquad (4)$$

and describes the state at the observation time, conditioned on all observations up to that time, but not how the system got to that state. In practice, the filtering density is often the more important of the two. Forecast models, for example, require initial conditions sampled from the filtering density; samples from the smoothing density, i.e. descriptions of past weather, are not required. On the other hand, smoothing algorithms have recently received increasing attention, especially in the context of re-analyses.

### 2.2. The localized EnKF

The EnKF uses a Monte Carlo step within the Kalman filter formalism, and recursively approximates the filtering density by an ensemble (Kalman, 1960; Kalman and Bucy, 1961; Tippett et al., 2003; Evensen, 2006). Specifically, let $\mathbf{x}_{k-1}^j$, $j = 1, \dots, N_e$, be samples of the filtering density $p(\mathbf{x}_{k-1}|\mathbf{y}_{1:k-1})$ at time $k-1$. The collection of $N_e$ samples is called the ensemble, and each sample is an ensemble member; $N_e$ is the ensemble size. The ensemble is evolved to time $k$ using the model (1) to generate the forecast ensemble

$$\hat{\mathbf{x}}_k^j = \mathbf{M}\mathbf{x}_{k-1}^j + \boldsymbol{\eta}_k^j,$$

where $\boldsymbol{\eta}_k^j$ is a sample of $\boldsymbol{\eta}_k$. Let

$$\mathbf{P}^f = \frac{1}{N_e - 1} \sum_{j=1}^{N_e} \left(\hat{\mathbf{x}}_k^j - \overline{\mathbf{x}}_k\right)\left(\hat{\mathbf{x}}_k^j - \overline{\mathbf{x}}_k\right)^T, \qquad (5)$$

be the forecast covariance, where $\overline{\mathbf{x}}_k = \frac{1}{N_e}\sum_{j=1}^{N_e} \hat{\mathbf{x}}_k^j$ is the forecast mean. The forecast covariance is usually localized and inflated; see, e.g. Houtekamer and Mitchell (2001) and Hamill et al. (2001). During localization, one decreases long-range correlations in the error covariances, typically by Schur (entrywise) multiplication of the forecast covariance with a specified $n_x \times n_x$ localization matrix $\boldsymbol{\rho}$:

$$\mathbf{P}^f \leftarrow \boldsymbol{\rho} \circ \mathbf{P}^f.$$

In atmospheric applications, localization matrices as described by Gaspari and Cohn (1999) are commonly used. During inflation, one increases the localized forecast covariance, e.g. by multiplying it by a number slightly larger than one: $\mathbf{P}^f \leftarrow (1+\alpha)\mathbf{P}^f$, $\alpha > 0$. Combined, localization and inflation are known to be key factors in making the EnKF a practical tool for meteorological data assimilation; see, e.g. Houtekamer et al. (2005), Wang et al. (2007) and Raeder et al. (2012). Here, ‘practical’ means that an EnKF with a small ensemble ($N_e = 50$–$100$) yields a small mean square error (MSE) and a comparable ensemble ‘spread’, defined as the trace of the posterior covariance divided by the system dimension $n_x$ (see Section 4).

The localized and inflated forecast covariance defines the Kalman gain

$$\mathbf{K} = \mathbf{P}^f \mathbf{H}^T \left(\mathbf{H}\mathbf{P}^f\mathbf{H}^T + \mathbf{R}\right)^{-1}, \qquad (6)$$

which is used to compute the analysis ensemble

$$\mathbf{x}_k^j = \hat{\mathbf{x}}_k^j + \mathbf{K}\left(\hat{\mathbf{y}}_k^j - \mathbf{H}\hat{\mathbf{x}}_k^j\right), \qquad (7)$$

where $\hat{\mathbf{y}}_k^j = \mathbf{y}_k + \boldsymbol{\epsilon}_k^j$ is a perturbed observation and $\boldsymbol{\epsilon}_k^j$ is a sample of $\boldsymbol{\epsilon}_k$. The sample mean and sample covariance of the analysis ensemble approximate the mean and covariance of the filtering density $p(\mathbf{x}_k|\mathbf{y}_{1:k})$. This is the ‘perturbed observations’ implementation of the localized EnKF. Other implementations of the EnKF are also available; see, e.g. Anderson (2001), Bishop et al. (2001) and Tippett et al. (2003).
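The analysis step, Equations (5)–(7), with Schur-product localization and multiplicative inflation, can be sketched in a few lines. This is a minimal illustration, not an operational implementation: the Gaussian taper used for $\boldsymbol{\rho}$ stands in for a Gaspari–Cohn function, and all sizes and parameters are toy choices.

```python
import numpy as np

rng = np.random.default_rng(2)

def enkf_analysis(X_f, y, H, R, rho, alpha):
    """Perturbed-observations EnKF analysis, Eqs. (5)-(7), with Schur-product
    localization (rho) and multiplicative inflation (1 + alpha).
    X_f is the (n_x, N_e) forecast ensemble."""
    n_x, N_e = X_f.shape
    A = X_f - X_f.mean(axis=1, keepdims=True)
    Pf = (A @ A.T) / (N_e - 1)                       # Eq. (5)
    Pf = (1.0 + alpha) * (rho * Pf)                  # localize, then inflate
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)   # Eq. (6)
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N_e).T
    return X_f + K @ (Y - H @ X_f)                   # Eq. (7), perturbed observations

# toy usage: 8 state variables, 4 observed, ensemble of 20
n_x, n_y, N_e = 8, 4, 20
dist = np.abs(np.subtract.outer(np.arange(n_x), np.arange(n_x)))
rho = np.exp(-(dist / 2.0) ** 2)     # Gaussian taper, standing in for Gaspari-Cohn
H = np.eye(n_y, n_x)
R = 0.5 * np.eye(n_y)
X_f = rng.standard_normal((n_x, N_e))
X_a = enkf_analysis(X_f, np.zeros(n_y), H, R, rho, alpha=0.05)
```

Note that localization acts on the sample covariance before the gain is formed, which is what suppresses spurious long-range correlations at small $N_e$.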

### 2.3. Particle filters

Particle filters are sequential importance sampling methods that construct an empirical estimate of the smoothing density (3), see, e.g. Doucet et al. (2001). The empirical estimate consists of a collection of weighted samples, such that expected values of functions of ${\mathbf{x}}_{0:k}|{\mathbf{y}}_{1:k}$ can be approximated by weighted averaging over the samples, see, e.g. Kalos and Whitlock (1986) and Chorin and Hald (2013). In particular, particle filters make use of a recursion of the smoothing density,

$$p(\mathbf{x}_{0:k}|\mathbf{y}_{1:k}) = p(\mathbf{x}_{0:k-1}|\mathbf{y}_{1:k-1})\, \frac{p(\mathbf{x}_k|\mathbf{x}_{k-1})\, p(\mathbf{y}_k|\mathbf{x}_k)}{p(\mathbf{y}_k|\mathbf{y}_{1:k-1})}, \qquad (8)$$

and draw samples from a proposal density which is a similar product of functions,

$$\pi(\mathbf{x}_{0:k}|\mathbf{y}_{1:k}) \propto \pi_0(\mathbf{x}_0) \prod_{l=1}^{k} \pi_l(\mathbf{x}_l|\mathbf{y}_{1:l}, \mathbf{x}_{0:l-1}). \qquad (9)$$

Thus, only the most recent factor needs to be updated when a new observation is collected. At each step $k$, the update $\pi_k(\mathbf{x}_k|\mathbf{y}_{1:k},\mathbf{x}_{0:k-1})$ is used to propose samples $\mathbf{x}_k^j$, which are appended to the samples of $\mathbf{x}_{0:k-1}^j$ to extend the trajectories to time $k$. The weights satisfy the recursion

$$w_k \propto \frac{p(\mathbf{x}_{0:k}|\mathbf{y}_{1:k})}{\pi(\mathbf{x}_{0:k}|\mathbf{y}_{1:k})} \propto w_{k-1}\, \frac{p(\mathbf{x}_k|\mathbf{x}_{k-1})\, p(\mathbf{y}_k|\mathbf{x}_k)}{\pi_k(\mathbf{x}_k|\mathbf{y}_{1:k},\mathbf{x}_{0:k-1})}, \qquad (10)$$

and account for the fact that one draws samples sequentially from the proposal density $\pi(\mathbf{x}_{0:k}|\mathbf{y}_{1:k})$, rather than directly from the smoothing density. After each step, the weights are normalized such that their sum is one, and the ensemble is ‘resampled’ (Arulampalam et al., 2002), i.e. ensemble members with small weights are deleted, and ensemble members with large weights are duplicated. Various choices for the updates $\pi_k(\mathbf{x}_k|\mathbf{y}_{1:k},\mathbf{x}_{0:k-1})$ lead to the various particle filters developed in recent years, each with different weights; see, e.g. Gordon et al. (1993), Doucet (1998), Chorin and Tu (2009), Weare (2009), Chorin et al. (2010), Morzfeld et al. (2012) and Vanden-Eijnden and Weare (2012).
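The propose–weight–resample recursion above can be sketched for a scalar version of Equations (1)–(2), with the model as proposal density. This is our illustration with arbitrary parameters, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

def resample(X, w):
    """Multinomial resampling: duplicate large-weight members, delete
    small-weight members, and reset the weights to uniform."""
    idx = rng.choice(len(w), size=len(w), p=w)
    return X[idx], np.full(len(w), 1.0 / len(w))

def pf_step(X, w, y, M_c, q_std, r_std):
    """One PF step for a scalar version of Eqs. (1)-(2), using the model
    as proposal; the weight update is Eq. (10) with pi_k = p(x_k | x_{k-1})."""
    X = M_c * X + q_std * rng.standard_normal(len(X))   # propose with the model
    log_w = np.log(w) - 0.5 * ((y - X) / r_std) ** 2    # w_k ∝ w_{k-1} p(y_k | x_k)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                        # normalize to sum to one
    return resample(X, w)

X, w = rng.standard_normal(100), np.full(100, 0.01)
for y in (0.3, -0.1, 0.5):                              # a few synthetic observations
    X, w = pf_step(X, w, y, M_c=0.9, q_std=0.3, r_std=0.5)
```

The `log_w - log_w.max()` shift is the standard trick to avoid underflow when exponentiating log-weights.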

More specifically, the ‘standard particle filter’ (Gordon et al., 1993) uses the proposal density $\pi_k = p(\mathbf{x}_k|\mathbf{x}_{k-1})$, i.e. the ensemble is generated by running the numerical model (1). The weights are the ratio of target and proposal densities and, thus, are proportional to the likelihood, $w_k \propto w_{k-1}\, p(\mathbf{y}_k|\mathbf{x}_k)$. Note that the standard particle filter does not make use of the most recent observation when generating the ensemble. A particle filter with proposal density $\pi_k = p(\mathbf{x}_k|\mathbf{y}_k,\mathbf{x}_{k-1})$, however, does make use of the most recent observation when generating the ensemble. The weights of this particle filter are given by $w_k \propto w_{k-1}\, p(\mathbf{y}_k|\mathbf{x}_{k-1})$, and it can be shown that these weights have minimum variance over all particle filters with a proposal density of the form (9) (Snyder et al., 2015). For that reason, this particle filter is called the ‘optimal particle filter’ (see, e.g. Zaritskii and Shimelevich, 1975; Liu and Chen, 1995; Doucet et al., 2000). There are, however, more general particle filters that use more complicated proposal densities than those described by Equation (9). For example, one can use ‘tempering’, i.e. techniques that ensure a more gradual transition from a proposal density to a posterior density (see, e.g. Beskos et al., 2014; Kantas et al., 2014). Such filters may have lower variance weights than the ‘optimal’ particle filter described above. For the remainder of this paper, however, we only consider ‘classical’ particle filters with proposal densities of the form (9).
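The variance reduction achieved by the optimal proposal is easy to check numerically. The sketch below (an illustrative scalar example with arbitrary parameters, not from the paper) compares the empirical variance of the log-weights under the standard proposal, $w_k \propto p(\mathbf{y}_k|\mathbf{x}_k)$, and the optimal proposal, $w_k \propto p(\mathbf{y}_k|\mathbf{x}_{k-1})$.

```python
import numpy as np

rng = np.random.default_rng(4)
M_c, q, r, N_e = 0.9, 1.0, 0.5, 10_000   # scalar model, illustrative parameters
x_prev = rng.standard_normal(N_e)        # previous analysis ensemble
y = 1.0                                  # one fixed observation

# standard PF: propose with the model, weight by p(y_k | x_k)
x_std = M_c * x_prev + q * rng.standard_normal(N_e)
log_w_std = -0.5 * ((y - x_std) / r) ** 2

# optimal PF: weight by p(y_k | x_{k-1}) = N(y; M x_{k-1}, q^2 + r^2)
log_w_opt = -0.5 * (y - M_c * x_prev) ** 2 / (q ** 2 + r ** 2)
```

Because the optimal weights integrate out the model noise, their log-variance is smaller; both variances still grow with the number of independent observed components, which is the collapse mechanism discussed next.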

### 2.4. The collapse of particle filters and effective dimensions

A particle filter collapses if only one ensemble member carries a significant weight. The collapse occurs when the variance of the weights is large, and avoiding it can require huge ensemble sizes. Indeed, it is often said that particle filters require ensemble sizes that are exponential in the ‘dimension’. However, the situation is more delicate. It was argued by Chorin and Morzfeld (2013) that several mechanisms can cause particle filters to collapse, not merely a large state dimension. These mechanisms are captured by the theory of particle filter collapse and effective dimensions: when the effective dimension is large, an even larger ensemble size is needed to avoid collapse. We briefly review this theory here because it is needed to identify the cause of the collapse of particle filters in meteorological problems.

The collapse of the ‘standard particle filter’ with proposal density ${\mathit{\pi }}_{k}=p\left({\mathbf{x}}_{k}|{\mathbf{x}}_{k-1}\right)$ is described by Snyder et al. (2008), Bickel et al. (2008) and Bengtsson et al. (2008). Specifically, in Snyder et al. (2008), it is shown that the collapse of this particle filter depends on the variance

$$\tau^2 = \operatorname{var}(\log w) = \sum_{j=1}^{n_y}\left(\frac{1}{2}\lambda_j^2 + \lambda_j\,(\mathbf{y}_k)_j^2\right),$$

where $(\mathbf{y}_k)_j$ is the $j$th element of the observation $\mathbf{y}_k$, and where $\lambda_j$ are the eigenvalues of the prior covariance in observation space, normalized by the observation error covariance, i.e. of $\operatorname{cov}(\mathbf{R}^{-1/2}\mathbf{H}\mathbf{x}_k)$. Asymptotic statements about the collapse of the standard particle filter can be made for large state dimension and large numbers of observations ($n_x, n_y \gg 1$), when $\sum_j \lambda_j \gg 1$.

In this case, collapse occurs if $(\log N_e)/\tau^2 \to 0$, i.e. the ensemble size must be exponential in $\tau^2$ to avoid collapse. Similar arguments hold for the optimal particle filter, with a slightly different definition of $\tau^2$ (Snyder, 2011; Snyder et al., 2015). The exponential scaling of the ensemble size $N_e$ with $\tau^2$ suggests that the ensemble size required to avoid collapse may be too large to be feasible in numerical weather prediction, where $n_y$ is so large – typically millions – that even moderate eigenvalues $\lambda_j$ lead to large $\tau^2$.

Effective dimensions are quantities related to ${\mathit{\tau }}^{2}$, which describe how difficult it is to solve a given data assimilation problem by a particle filter. There are several definitions of effective dimensions in the literature, some of which are reviewed by Agapiou et al. (2016). For example, in Bickel et al. (2008), an effective dimension is defined by

$$\tau_{\text{spf}} = \operatorname{trace}\left(\operatorname{cov}\left(\mathbf{R}^{-1/2}\mathbf{H}\mathbf{x}_k\right)\right) = \sum_{j=1}^{n_y}\lambda_j. \qquad (11)$$

Note that this effective dimension is a property of the data assimilation problem and the standard particle filter algorithm. In Chorin and Morzfeld (2013), effective dimensions for the standard and optimal particle filter are defined by Frobenius norms, rather than traces of covariances. The Frobenius norm of a matrix is the square root of the sum of the squares of the singular values. For the symmetric covariance matrices we consider, this is the same as the square root of the sum of the squares of the eigenvalues. For example, an effective dimension of the standard particle filter is defined by

$$\hat{\tau}_{\text{spf}}^2 = \left\|\operatorname{cov}\left(\mathbf{R}^{-1/2}\mathbf{H}\mathbf{x}_k\right)\right\|_F^2 = \sum_{j=1}^{n_y}\lambda_j^2.$$

All effective dimensions above depend on the covariance of $\mathbf{x}_k$ and on the data assimilation algorithm one uses. In Chorin and Morzfeld (2013), the covariance of $\mathbf{x}_k$ is calculated at steady state of a linear Gaussian data assimilation problem, so that the effective dimension does not vary over time. Note that the trace of the prior and posterior covariance matrices often oscillates about a steady-state value in non-linear problems as well. Indeed, the Frobenius norm, or the trace, of a (steady-state) posterior covariance matrix $\mathbf{P}$ is a natural candidate for an effective dimension of a data assimilation problem, independently of the particle filter one applies to solve it:

$$\tau_{\text{sys}}^2 = \|\mathbf{P}\|_F^2 = \sum_{j=1}^{n_x}\alpha_j^2. \qquad (12)$$

Here, $\alpha_j$ are the eigenvalues of $\mathbf{P}$, and the Frobenius norm depends on all aspects of the data assimilation problem, including the number of observations, the number of state variables, and the covariances that define the model and observation errors ($\mathbf{Q}$ and $\mathbf{R}$).

It was pointed out in Chorin et al. (2016) that none of the effective dimensions are in fact ‘dimensions’. The effective dimensions above rather describe an ‘effective variance’, and one can define further effective dimensions, for example, by

$$\ell_{\text{sys}} = \min\left\{\ell \in \mathbb{Z} : \sum_{j=1}^{\ell}\alpha_j^2 \ge (1-\kappa)\sum_{j=1}^{n_x}\alpha_j^2\right\}, \qquad (13)$$

where $\kappa$ is a predetermined small parameter (Chorin et al., 2016). Similar ideas were used in the context of ‘partial noise’ by Morzfeld and Chorin (2012). However, it is not yet clear which effective dimension will ultimately prove most useful. Nonetheless, all effective dimensions have in common that, if the effective dimension is large, an even larger ensemble size is needed to avoid collapse.
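The quantities $\tau_{\text{sys}}^2$ in (12) and $\ell_{\text{sys}}$ in (13) are straightforward to compute from a covariance matrix. The sketch below does so for a toy diagonal covariance with a rapidly decaying spectrum (our example, not from the paper).

```python
import numpy as np

def effective_dimensions(P, kappa=0.01):
    """tau_sys^2 = ||P||_F^2 (Eq. 12) and ell_sys (Eq. 13) for a
    symmetric positive definite covariance matrix P."""
    alpha2 = np.sort(np.linalg.eigvalsh(P))[::-1] ** 2   # squared eigenvalues, descending
    tau2_sys = alpha2.sum()                              # = ||P||_F^2
    cum = np.cumsum(alpha2)
    ell_sys = int(np.searchsorted(cum, (1.0 - kappa) * tau2_sys)) + 1
    return tau2_sys, ell_sys

# a rapidly decaying spectrum gives a small ell_sys even when n_x is larger
P = np.diag(2.0 ** -np.arange(10))
tau2, ell = effective_dimensions(P)
```

With eigenvalues $2^{-j}$, the squared eigenvalues $4^{-j}$ sum to roughly $4/3$, and the first four already capture $99\%$ of that sum, so $\ell_{\text{sys}} = 4$ even though $n_x = 10$.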

### 2.5. The collapse of particle filters in meteorology

We are interested in the collapse of particle filters as it may occur in meteorological problems. Meteorological problems are characterized by

• (a)   large state dimension ${n}_{x}$ (hundreds of millions), and large observation dimension ${n}_{y}$ (millions);
• (b)   large spatial domains, in the sense that there are many correlation lengths of a prior density contained within the overall system domain;
• (c)   the matrix $\mathbf{H}$ is ‘local’, in the sense that each element of the vector $\mathbf{H}\mathbf{x}$ depends only on ‘a few’ elements of $\mathbf{x}$.
These conditions are satisfied in a canonical linear example, where particle filter collapse occurs and where the required ensemble size scales exponentially with dimension (see, e.g. Snyder et al., 2008; Bickel et al., 2008; Bengtsson et al., 2008; Snyder, 2011; Chorin and Morzfeld, 2013; Snyder et al., 2015, and also Section 4). More generally, the above criteria imply that all effective dimensions are large. Since even the optimal particle filter requires large ensemble sizes if an effective dimension is large, one can expect that all particle filters collapse in meteorological problems unless the ensemble size is huge (see Chorin and Morzfeld, 2013; Snyder et al., 2015).

To understand why the above conditions imply large effective dimensions, recall that the Frobenius norm of a matrix is the square root of the sum of the squares of all of its elements, $\|\mathbf{P}\|_F^2 = \sum_{i,j} P_{i,j}^2$. Thus, if $n_x$ is huge, the effective dimension $\tau_{\text{sys}}$ is large unless each entry of $\mathbf{P}$ is tiny.

A large $\tau_{\text{sys}}$ implies that the effective dimensions $\tau_{\text{spf}}$ and $\hat{\tau}_{\text{spf}}$ are also large (see Chorin and Morzfeld, 2013). The same arguments apply to effective dimensions defined for the optimal particle filter (Snyder, 2011; Chorin and Morzfeld, 2013; Snyder et al., 2015). The effective dimension $\ell_{\text{sys}}$ is large because, by property (b), far-away variables de-correlate, so that the overall system can be thought of as a collection of loosely coupled lower dimensional ‘sub-systems’. Since the correlation lengths are small compared to the overall system domain, the system contains a large number of such subsystems, and $\ell_{\text{sys}}$ increases with their number.

We emphasize that the above statements about the collapse of particle filters in meteorology only apply to particle filters of a certain kind (see Section 2.3). The collapse may be avoided, even for meteorological problems, by a next generation of particle filters that use a more general approach, e.g. techniques that transition more smoothly from a proposal density to a posterior density. Indeed, some of these algorithms have already been proven to scale algebraically, rather than exponentially, with (effective) dimension (see, e.g. Beskos et al., 2014; Kantas et al., 2014).

## 3. The collapse of the particle filter interpretation of the EnKF

The EnKF performs well in high-dimensional meteorological applications in which particle filters are expected to collapse. On the other hand, it was shown by Papadakis et al. (2010) that the EnKF can be interpreted as a particle filter (PF-EnKF), and, thus, can be expected to collapse under the same conditions as other particle filters. We study this apparent contradiction and in particular the mechanisms that cause the collapse of the PF-EnKF in meteorological problems. This approach allows us to combine what we know about the performance of EnKF in real meteorological problems with what we know about the collapse of particle filters. In this context, we first consider the PF-EnKF as a particle filter for the smoothing density, and then make connections to the filtering density using ideas similar to marginal particle filters introduced by Klaas et al. (2005).

### 3.1. Collapse of the PF-EnKF for the smoothing density

It was pointed out by Papadakis et al. (2010) that the EnKF ensemble members can be viewed as draws from the proposal density

$$\pi_{k,\text{EnKF}}^j(\mathbf{x}_k|\mathbf{x}_{k-1}^j, \mathbf{y}_k) = \mathcal{N}(\boldsymbol{\mu}_k^j, \boldsymbol{\Sigma}_k), \qquad (14)$$

where

$$\boldsymbol{\mu}_k^j = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{M}\mathbf{x}_{k-1}^j + \mathbf{K}\mathbf{y}_k, \qquad \boldsymbol{\Sigma}_k = (\mathbf{I} - \mathbf{K}\mathbf{H})\mathbf{Q}(\mathbf{I} - \mathbf{K}\mathbf{H})^T + \mathbf{K}\mathbf{R}\mathbf{K}^T,$$

with the Kalman gain $\mathbf{K}$ defined in (6). This proposal density is of the form (9), so that the EnKF can be interpreted as a particle filter. A related construction, using kernel density estimation at each step, is discussed in Khalil et al. (2015). The weights that can be attached to the EnKF ensemble are the ratio of the smoothing and proposal densities

$$w_{k,\text{EnKF}}^j \propto w_{k-1,\text{EnKF}}^j\, \frac{p(\mathbf{x}_k^j|\mathbf{x}_{k-1}^j)\, p(\mathbf{y}_k|\mathbf{x}_k^j)}{\pi_{k,\text{EnKF}}^j(\mathbf{x}_k^j|\mathbf{x}_{k-1}^j, \mathbf{y}_k)}. \qquad (15)$$

The theory for the collapse of particle filters reviewed in Section 2.4 applies to these weights, and their variance governs the collapse of the PF-EnKF. This variance is at least as large as the corresponding variance of the optimal PF (Snyder et al., 2015). Thus, the PF-EnKF collapses whenever the optimal PF collapses. As a specific example, we will demonstrate in Section 4 that the ensemble size required to avoid the collapse of the PF-EnKF scales exponentially with the dimension ${n}_{x}$ in the canonical system studied in Snyder et al. (2008), Bickel et al. (2008), Bengtsson et al. (2008), Snyder (2011), Chorin and Morzfeld (2013) and Snyder et al. (2015).

While covariance localization and inflation of the EnKF modify the mean and covariance of the proposal density (14), these mechanisms do not avoid the collapse when sampling from the smoothing density. The reason is that, even with tuned localization and inflation, the proposal density defined by the EnKF ensemble cannot improve on the optimal proposal, and must collapse in the same way as the optimal particle filter. Thus, even a localized EnKF that is operating correctly, in the sense that it tracks the truth to within the ensemble variance, defines a proposal density that cannot prevent the collapse for small ensemble sizes in meteorological problems.
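The collapse of the PF-EnKF weights (15) can be reproduced in a toy setting. The sketch below is our illustration (not the Section 4 experiment): it applies the weights to $n_x$ independent scalar copies of Equations (1)–(2), with the exact forecast variance replacing the ensemble estimate and arbitrary parameters, and shows the largest normalized weight approaching one as $n_x$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)

def logpdf(x, mu, var):
    """Log density of N(mu, var), evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mu) ** 2 / var

def pf_enkf_max_weight(n_x, N_e=100, M_c=0.9, q2=1.0, r2=0.5):
    """Largest normalized PF-EnKF weight (Eq. 15) for n_x independent scalar
    copies of Eqs. (1)-(2); K and Sigma are the scalar versions of the gain
    (6) and the proposal covariance in (14)."""
    Pf = M_c ** 2 * 1.0 + q2                   # forecast variance (prior variance 1)
    K = Pf / (Pf + r2)                         # scalar Kalman gain
    Sigma = (1.0 - K) ** 2 * q2 + K ** 2 * r2  # scalar version of Sigma_k
    x_prev = rng.standard_normal((N_e, n_x))   # previous ensemble
    y = rng.standard_normal(n_x)               # one fixed observation vector
    mu = (1.0 - K) * M_c * x_prev + K * y      # per-member proposal mean
    x = mu + np.sqrt(Sigma) * rng.standard_normal((N_e, n_x))
    log_w = (logpdf(x, M_c * x_prev, q2)       # p(x_k | x_{k-1})
             + logpdf(x, y, r2)                # p(y_k | x_k), with H = 1
             - logpdf(x, mu, Sigma)).sum(axis=1)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return w.max()

m1, m100 = pf_enkf_max_weight(1), pf_enkf_max_weight(100)
```

Per component, the log-weight has an $O(1)$ variance; summing over independent components makes the total variance grow linearly in $n_x$, which drives the maximum weight to one for fixed $N_e$.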

### 3.2. Collapse of the PF-EnKF for the filtering density

The particle filter interpretation of the EnKF defines a sequential importance sampler for the smoothing density. However, the EnKF ensemble can also be viewed as draws from a proposal density for the filtering density. Indeed, the idea of designing particle filters directly for the filtering density, rather than the smoothing density, was explored by Klaas et al. (2005) via ‘marginal particle filters’. We connect these ideas to the EnKF and study the conditions for the collapse of the particle filter interpretation of the EnKF for the filtering density. The reason is that the PF-EnKF might collapse as a sampler for the smoothing density but perform well as a sampler for the filtering density, and sampling the filtering density may be sufficient in meteorology. Below we argue that this is not the case, i.e. the collapse of the PF-EnKF occurs, under the same conditions, for both the filtering and smoothing densities, unless the ensemble size is huge.

As before, we interpret the EnKF ensemble members as draws from a suitable proposal density $\pi_{\text{EnKF}}(\mathbf{x}_k)$. We wish to compute the weights

$$w \propto \frac{p(\mathbf{x}_k|\mathbf{y}_{1:k})}{\pi_{\text{EnKF}}(\mathbf{x}_k)}, \qquad (16)$$

such that the weighted ensemble approximates the filtering density, not the smoothing density as before. Note that the EnKF proposal density for the smoothing density in (14) is conditioned on the previous analysis state, but the EnKF proposal density for the filtering density in (16) is not; thus, the two proposal densities are not the same. The weights (16) are difficult to compute in general because the filtering density and the EnKF proposal density are not known up to a multiplicative constant (see the Appendix, and the literature on marginal particle filters, e.g. Klaas et al. (2005), for more detail). However, for the linear Gaussian problems we consider, the filtering density is also Gaussian and we write $p(\mathbf{x}_k|\mathbf{y}_{1:k}) = \mathcal{N}(\boldsymbol{\mu}_k, \mathbf{P}_k)$. The proposal density generated by the EnKF ensemble is the Gaussian

(17)
$\pi_{\text{EnKF}}(\mathbf{x}_k) = \mathcal{N}\left(\mu_{k,\text{EnKF}},\, \mathbf{P}_{k,\text{EnKF}}\right),$

where $\mu_{k,\text{EnKF}}$ is the mean of the analysis ensemble and where

(18)
$\mathbf{P}_{k,\text{EnKF}} = (\mathbf{I}-\mathbf{K}\mathbf{H})\,\mathbf{M}\mathbf{P}_{k-1,\text{EnKF}}\mathbf{M}^T(\mathbf{I}-\mathbf{K}\mathbf{H})^T + (\mathbf{I}-\mathbf{K}\mathbf{H})\,\mathbf{Q}\,(\mathbf{I}-\mathbf{K}\mathbf{H})^T + \mathbf{K}\mathbf{R}\mathbf{K}^T.$

Under our assumptions, the weights are

(19)
$w \propto \frac{\exp\left(-\frac{1}{2}(\mathbf{x}_k^j-\mu_k)^T \mathbf{P}_k^{-1}(\mathbf{x}_k^j-\mu_k)\right)}{\exp\left(-\frac{1}{2}(\mathbf{x}_k^j-\mu_{k,\text{EnKF}})^T \mathbf{P}_{k,\text{EnKF}}^{-1}(\mathbf{x}_k^j-\mu_{k,\text{EnKF}})\right)}.$

For large ensembles, the sample mean converges to the posterior mean and the sample covariance converges to the posterior covariance, so that the variance of the PF-EnKF weights for the filtering density vanishes asymptotically (see, e.g. Burgers et al., 1998).

We are interested in the collapse of the particle filter interpretation of a localized EnKF with small ensemble sizes, and study this situation by making simplifying assumptions. Specifically, we neglect errors in the ensemble mean, i.e. we set $\mu_{k,\text{EnKF}} = \mu_k$, and assume that errors in the ensemble covariance matrix are small, i.e. $\mathbf{P}_{k,\text{EnKF}} = (1+\beta)\mathbf{P}_k$, where $\beta \ge 0$ is a small number. One could also consider perturbations of the form $(1-\beta)\mathbf{P}_k$, but, in the context of importance sampling, perturbations as above are more favourable, and we wish to study a best-case scenario for the EnKF. The reason why $\beta > 0$ is favourable is that importance sampling requires that the support of the proposal density contains the support of the target density, i.e. one wants to use proposal densities with variances larger than those of the target densities.

While drastic, the above simplifications do lead to useful expansions that can accurately describe the behaviour of the weights of a localized EnKF with small ensembles (see also Section 4.3). Those simplifications require that the EnKF mean and covariance are close to correct. In typical large geophysical problems, this can be achieved for small ensemble sizes ($N_e \approx 100$) only when using covariance localization; without localization, ensemble sizes substantially larger than $n_x$ would be necessary. Thus, the subsequent arguments in this section hold only for a localized EnKF or an EnKF with a large ensemble. We further assume, for simplicity, that the posterior covariance $\mathbf{P}_k$ is full rank, so that all its eigenvalues are positive. While the rank of $\mathbf{P}_k$ may not equal the state dimension $n_x$, conditions (a)–(c) in Section 2.5 imply that the rank of $\mathbf{P}_k$ is large (we discuss below what happens when some eigenvalues are zero). In the context of the localized EnKF, our assumption that the EnKF covariance is close to correct implies that $\mathbf{P}_{k,\text{EnKF}}$ has the same rank as the posterior covariance.

Under our assumptions, the weights in Equation (19) become

(20)
$w \propto \exp\left(-\frac{1}{2}\mathbf{s}^T\mathbf{s}\right), \qquad \mathbf{s} = \sqrt{\frac{\beta}{1+\beta}}\,\mathbf{P}_k^{-1/2}\left(\mathbf{x}_k^j - \mu_k\right),$

where $\mathbf{x}_k^j \sim \mathcal{N}(\mu_k, \mathbf{P}_{k,\text{EnKF}})$, so that $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, \beta\mathbf{I})$.

The variance of the above weights can be computed from the expected values

$E\left[w\right] = \int_{\mathbb{R}^{n_x}} \exp\left(-\mathbf{s}^T\mathbf{s}/2\right)\,\frac{\exp\left(-\mathbf{s}^T\mathbf{s}/(2\beta)\right)}{(2\pi\beta)^{n_x/2}}\,\text{d}\mathbf{s} = \left(\frac{1}{1+\beta}\right)^{n_x/2},$
$E\left[w^2\right] = \int_{\mathbb{R}^{n_x}} \exp\left(-\mathbf{s}^T\mathbf{s}\right)\,\frac{\exp\left(-\mathbf{s}^T\mathbf{s}/(2\beta)\right)}{(2\pi\beta)^{n_x/2}}\,\text{d}\mathbf{s} = \left(\frac{1}{1+2\beta}\right)^{n_x/2}.$
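
These closed-form expectations are easy to verify by Monte Carlo. The sketch below (numpy; the values of $n_x$, $\beta$ and the sample count are illustrative choices of ours) draws $\mathbf{s} \sim \mathcal{N}(\mathbf{0}, \beta\mathbf{I})$ and compares the sample averages of $w$ and $w^2$ with the formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
nx, beta, N = 20, 0.05, 200_000  # illustrative dimension, error and sample count

# Draw s ~ N(0, beta * I) and form the weights w = exp(-s^T s / 2).
s = rng.normal(scale=np.sqrt(beta), size=(N, nx))
w = np.exp(-0.5 * np.sum(s**2, axis=1))

# Closed-form expectations derived above.
Ew_exact = (1.0 / (1.0 + beta)) ** (nx / 2)
Ew2_exact = (1.0 / (1.0 + 2.0 * beta)) ** (nx / 2)

print(w.mean(), Ew_exact)        # the two values agree closely
print((w**2).mean(), Ew2_exact)  # likewise
```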

One can define an 'effective number of samples' based on the variance of the weights (Arulampalam et al., 2002; Owen, 2013),

(21)
$N_{e,\text{eff}} = \frac{N_e}{G}, \qquad G = \frac{\text{var}(w)}{E(w)^2} + 1 = \frac{E(w^2)}{E(w)^2}.$

Thus, the collapse occurs when $N_{e,\text{eff}} = 1$, i.e. when $G = N_e$, and the 'quality measure' $G$ can be used to estimate the ensemble size required to avoid collapse. Under our assumptions,

$G(\beta) = \left(\frac{1+\beta}{\sqrt{1+2\beta}}\right)^{n_x} = 1 + \frac{\beta^2\, n_x}{2} + O(\beta^3).$

The approximation is obtained by expanding $G(\beta)$ around $\beta = 0$, i.e. for small errors $\beta$ in the covariance matrix. If we interpret $\beta \propto 1/\sqrt{N_e}$ as sampling error, then $G$ remains approximately constant as the dimension $n_x$ increases, provided that the ensemble size increases linearly with the dimension. Thus, the collapse of the PF-EnKF for the filtering density does not occur for ensemble sizes proportional to the dimension $n_x$.
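
Since $G(\beta)$ is available in closed form, both the quadratic approximation and the linear-in-dimension scaling can be checked directly. In the sketch below, the constant in $\beta \propto 1/\sqrt{N_e}$ and the factor 10 in $N_e = 10\,n_x$ are illustrative choices of ours:

```python
import math

def G(beta: float, nx: int) -> float:
    """Exact quality measure G(beta) = ((1 + beta) / sqrt(1 + 2 beta))**nx."""
    return ((1.0 + beta) / math.sqrt(1.0 + 2.0 * beta)) ** nx

# Quadratic approximation for small beta: G ~ 1 + beta**2 * nx / 2.
beta, nx = 0.01, 100
print(G(beta, nx), 1.0 + beta**2 * nx / 2.0)  # the two agree to O(beta**3)

# If beta**2 is proportional to 1/Ne (sampling error) and Ne grows linearly
# with nx, then beta**2 * nx is fixed and G stays (approximately) constant.
for nx in (100, 1_000, 10_000):
    Ne = 10 * nx                    # ensemble size proportional to dimension
    beta = 1.0 / math.sqrt(Ne)
    print(nx, G(beta, nx))          # roughly the same value for every nx
```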

A small $G$, however, does not guarantee that a sampling scheme is working well. In particular, $G$ may be small if the variance of the proposal density is chosen too small compared with that of the target density. In this case, one may find that the weights are well distributed; however, other indicators, such as the MSE, will show that one is sampling too narrowly in state space.

We can relax the assumption that all eigenvalues of the posterior covariance $\mathbf{P}_k$ are positive by making use of the effective dimension $\ell_{\text{sys}}$ in Equation (13). If some eigenvalues are zero, or close to zero, then one can substitute the $\ell_{\text{sys}} \le n_x$ dimensional approximation of the covariance matrix $\mathbf{P}_k$ into the above formulas. Using the pseudo-inverse of this matrix in (20), one obtains the same expression for $G$ as above, with $\ell_{\text{sys}}$ replacing $n_x$. We argued above that while $\ell_{\text{sys}}$ may be smaller than $n_x$ in meteorological problems, $\ell_{\text{sys}}$ can be expected to be large in these problems (see property (b) in Section 2.5). A large $\ell_{\text{sys}}$ implies that the collapse of the particle filter interpretation of the EnKF can only be avoided by large ensemble sizes, in particular much larger than the 50–100 ensemble members used in practice.

4.

## Performance of the EnKF during the collapse of its particle filter interpretation

We have shown that the collapse of the PF-EnKF for the filtering density can be prevented by ensemble sizes that scale linearly with the dimension $n_x$, or, more generally, with the effective dimension $\ell_{\text{sys}}$ in (13). We further argued that the large dimensions $n_x$ and $n_y$ in meteorological problems cause all particle filters for the smoothing density, including the particle filter interpretation of a localized EnKF, to collapse. On the other hand, localized EnKFs with small ensemble sizes (50 to several hundred) are used in meteorological problems with hundreds of millions of state variables and millions of observations, and produce an MSE comparable to the normalized trace of the posterior covariance matrix, see, e.g. Houtekamer et al. (2005), Wang et al. (2007) and Raeder et al. (2012). Thus, it appears that the EnKF ensemble is 'good', i.e. there is sufficient spread in the ensemble and the MSE is small, while at the same time the weights are 'bad', i.e. all but one weight are near zero. We now demonstrate that this situation indeed occurs: localized EnKFs with small ensemble sizes yield a small MSE while their particle filter interpretation collapses.

4.1.

### Scaling of MSE with dimension

The MSE is defined by

(22)
$\text{MSE} = \frac{1}{n_x}\sum_{i=1}^{n_x}\left(\left(\overline{\mathbf{x}}_k\right)_i - \left(\mathbf{x}_k^t\right)_i\right)^2, \qquad \overline{\mathbf{x}}_k = \frac{1}{N_e}\sum_{j=1}^{N_e}\mathbf{x}_k^j.$

Here, $\mathbf{x}_k^t$ is the 'true' state of the model at time $k$, $(\mathbf{x}_k^t)_i$, $i = 1,\dots,n_x$, are its $n_x$ components, and $\overline{\mathbf{x}}_k$ is the sample mean at time $k$, computed from the EnKF analysis ensemble members $\mathbf{x}_k^j$, $j = 1,\dots,N_e$. We obtain useful scaling equations under the same assumptions as above, i.e. we consider a Gaussian filtering density $\mathcal{N}(\mu_k, \mathbf{P}_k)$ and assume that the EnKF ensemble members are draws from $\mathcal{N}(\mu_k, (1+\beta)\mathbf{P}_k)$, where $\beta > 0$. As indicated above, these assumptions simplify the problem but describe a best-case scenario for the EnKF, and in particular model the localized EnKF (see Section 3.2). The scalings below are not valid for an unlocalized EnKF unless $N_e$ is sufficiently large, and thus larger than ever used in practice.
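
Equation (22) is a plain per-component average; a direct implementation could read as follows (the array layout is our assumption):

```python
import numpy as np

def mse(ensemble: np.ndarray, truth: np.ndarray) -> float:
    """MSE of the ensemble mean against the true state, per Equation (22).

    ensemble: shape (Ne, nx), one row per analysis ensemble member x_k^j
    truth:    shape (nx,), the true state x_k^t
    """
    xbar = ensemble.mean(axis=0)            # sample mean over the Ne members
    return float(np.mean((xbar - truth) ** 2))

# Tiny example: the ensemble mean is (1, 1) and the truth is (0, 0).
members = np.array([[0.0, 2.0], [2.0, 0.0]])
print(mse(members, np.zeros(2)))  # 1.0
```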

Under our assumptions, the statistics of the ensemble mean are

$\overline{\mathbf{x}}_k \sim \mathcal{N}\left(\mu_k,\; \frac{1+\beta}{N_e}\,\mathbf{P}_k\right).$

If the true state is a sample of the filtering density, one can combine the above expression for the sample mean with the expression (22) for the MSE to obtain the statistics of the MSE:

(23)
$\text{MSE} \sim \mathcal{N}\left(\frac{1+\beta+N_e}{N_e}\,\frac{\text{tr}(\mathbf{P}_k)}{n_x},\; 2\left(\frac{1+\beta+N_e}{N_e}\,\frac{\|\mathbf{P}_k\|_F}{n_x}\right)^2\right).$
If we assume that the errors in the covariance matrix are due to sampling error, we can set $\mathit{\beta }=O\left(1/\sqrt{{N}_{e}}\right)$, so that the mean of the normalized MSE is
(24)
$n_x \cdot \frac{E(\text{MSE})}{\text{tr}(\mathbf{P}_k)} = 1 + O\left(N_e^{-1}\right) + O\left(N_e^{-3/2}\right).$

Thus, even with small ensemble sizes, and even if the dimensions $n_x$ and $n_y$ are huge, the MSE can be on the order of the 'spread', defined by the trace of the posterior covariance matrix $\mathbf{P}_k$ divided by $n_x$. At the same time, the particle filter interpretation of the EnKF collapses for small, or even moderate, ensemble sizes (see Section 3.2).
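
The statistics (23) and (24) can be checked by simulation. The sketch below, with the illustrative choices $\mathbf{P}_k = \mathbf{I}$, $n_x = 50$, $N_e = 50$ and $\beta = 1/\sqrt{N_e}$, draws the ensemble mean and the truth from their assumed Gaussians and averages the normalized MSE over many trials:

```python
import numpy as np

rng = np.random.default_rng(1)
nx, Ne, trials = 50, 50, 20_000
beta = 1.0 / np.sqrt(Ne)   # interpret the covariance error as sampling error
P_trace = float(nx)        # P_k = I, so tr(P_k) = n_x

# xbar ~ N(mu, (1+beta)/Ne * P_k), truth ~ N(mu, P_k); take mu = 0, P_k = I.
xbar = rng.normal(scale=np.sqrt((1.0 + beta) / Ne), size=(trials, nx))
truth = rng.normal(size=(trials, nx))
mse_vals = np.mean((xbar - truth) ** 2, axis=1)

normalized = nx * mse_vals.mean() / P_trace
expected = (1.0 + beta + Ne) / Ne   # mean of (23); matches (24) for large Ne
print(normalized, expected)         # the two values agree closely
```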

4.2.

### Local error and MSE vs. global error and collapse of the PF-EnKF

Note that MSE describes the error one can expect on average per dimension. Thus, MSE assesses the performance of the EnKF locally, rather than globally. This assessment of performance is useful in practice. For example, errors are usually examined locally, e.g. one might state that the data assimilation is ‘working poorly in the tropics, but well in mid-latitudes’. Similarly, an EnKF user routinely asserts that data assimilation has succeeded if average errors at stations around the globe are small.

However, when the performance of the PF-EnKF is assessed by the variance of its weights, one does the opposite. The variance of the weights is due to differences between the posterior density and the proposal density generated by the EnKF ensemble. If the proposal and posterior densities are effectively high-dimensional (with large $\ell_{\text{sys}}$ and $\tau_{\text{sys}}$, see Sections 2.4 and 2.5), the differences, or errors, may be small in each dimension, corresponding to small $\lambda_j$, but the errors add up to a large overall error during the weight calculation. This leads to the collapse of the PF-EnKF, and it happens similarly for filtering and smoothing densities.

We have shown that localized EnKFs with small ensemble sizes can indeed produce a small MSE while at the same time their particle filter interpretation collapses. The success of the EnKF in meteorology, where the MSE is the gold standard, thus suggests that assessing errors locally is sufficient for these problems. There is no need to keep a 'global' variance of the weights small: the EnKF can work well even when this variance is huge, as illustrated by the collapse of its particle filter interpretation. Indeed, this collapse may occur regularly and without significant practical consequences. However, it is important to note that the particle filter interpretation of the EnKF performs poorly no matter how we evaluate its performance, by MSE or by the variance of the weights. As we describe below, this problem is caused by the fact that the PF-EnKF does not use localization during the weight calculation, which is equivalent to its 'update', or analysis. More specifically, localization is used for generating the ensemble, but not when calculating the weights (see Section 5).

4.3.

### Numerical example

We illustrate that the localized EnKF with small ensemble sizes $N_e$ can yield a small MSE while at the same time its particle filter interpretation collapses, using the canonical linear example usually studied in the context of filter collapse (Snyder et al., 2008; Bickel et al., 2008; Bengtsson et al., 2008; Snyder, 2011; Chorin and Morzfeld, 2013; Snyder et al., 2015). The example is defined by

(25)
$\mathbf{M} = \mathbf{I}, \quad \mathbf{H} = \mathbf{I}, \quad \mathbf{Q} = \mathbf{I}, \quad \mathbf{R} = \mathbf{I},$

and all effective dimensions reviewed in Section 2.4 increase with dimension $n={n}_{x}={n}_{y}$. Thus, for large dimension n, any particle filter will collapse unless the ensemble size is huge. However, the localized EnKF with small ensemble sizes yields a small MSE even when its particle filter interpretation collapses. Here, we demonstrate this scenario by computing the quality measure G in (21) for various n, which is equivalent to computing the ensemble size required to avoid collapse. Specifically, if G depends exponentially on dimension n, then the required ensemble size to avoid collapse also depends exponentially on dimension. Since all effective dimensions increase with dimension n, the scaling of G also illustrates how the required ensemble size scales with effective dimensions.

Computing G for one time step, from $k-1$ to k, requires that one knows the density of the state at $k-1$. We assume that this density is Gaussian with mean ${\mathit{\mu }}_{k-1}=\mathbf{0}$ and covariance ${\mathbf{P}}_{k-1}=\mathbf{I}$. We thus have

$p(\mathbf{x}_k|\mathbf{x}_{k-1}^j) = \mathcal{N}(\mathbf{x}_{k-1}^j, \mathbf{I}), \qquad p(\mathbf{y}_k|\mathbf{x}_k^j) = \mathcal{N}(\mathbf{x}_k^j, \mathbf{I}), \qquad p(\mathbf{x}_{k-1}|\mathbf{y}_{1:k-1}) = \mathcal{N}(\mathbf{0}, \mathbf{I}).$
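
For this identity example, the Kalman filter formulas reduce to simple closed forms: the forecast covariance is $\mathbf{M}\mathbf{P}_{k-1}\mathbf{M}^T + \mathbf{Q} = 2\mathbf{I}$, the Kalman gain is $(2/3)\mathbf{I}$, and the analysis covariance is $\mathbf{P}_k = (2/3)\mathbf{I}$. A minimal sketch of one forecast/analysis cycle (our own generic implementation, not the code used for the experiments):

```python
import numpy as np

def kalman_step(mu_prev, P_prev, y, M, H, Q, R):
    """One Kalman filter forecast/analysis cycle for a linear Gaussian model."""
    mu_f = M @ mu_prev                  # forecast mean
    P_f = M @ P_prev @ M.T + Q          # forecast covariance
    S = H @ P_f @ H.T + R               # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)    # Kalman gain
    mu_a = mu_f + K @ (y - H @ mu_f)    # analysis mean
    P_a = (np.eye(len(mu_prev)) - K @ H) @ P_f  # analysis covariance
    return mu_a, P_a

n = 4
I = np.eye(n)
y = np.ones(n)
mu_a, P_a = kalman_step(np.zeros(n), I, y, M=I, H=I, Q=I, R=I)
print(mu_a)  # (2/3) * y
print(P_a)   # (2/3) * I
```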

We consider the following algorithms:

• (1)   an unlocalized/uninflated EnKF and its PF interpretation for the filtering density, ${N}_{e}=50$,
• (2)   a localized/inflated EnKF and its PF interpretation for the filtering density, ${N}_{e}=50$,
• (3)   a localized/inflated EnKF and its PF interpretation for the filtering density, ${N}_{e}=n$,
• (4)   a localized/inflated EnKF and its PF interpretation for the smoothing density, ${N}_{e}=50$,
• (5)   an optimal PF for the smoothing density with ${N}_{e}=50$.
For each algorithm, we approximate $G$ in (21) using $10^6$ samples and repeat the calculation 100 times to average out errors in our approximations of $G$. We do this for problems with dimensions between $n = 5$ and $n = 500$.
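
The Monte Carlo approximation of $G$ can be sketched as follows for the filtering-density weights (19), assuming, as in Section 3.2, that the analysis mean is correct and that the proposal covariance is $(1+\beta)\mathbf{P}_k$; for this example, the Kalman formulas give $\mathbf{P}_k = (2/3)\mathbf{I}$. The values of $n$, $\beta$ and the sample count below are illustrative (smaller than the $10^6$ samples of the experiments):

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, N = 30, 0.1, 200_000
Pk = 2.0 / 3.0   # posterior variance per coordinate in the identity example

# Draws from the proposal N(mu_k, (1 + beta) * Pk * I); take mu_k = 0.
x = rng.normal(scale=np.sqrt((1.0 + beta) * Pk), size=(N, n))

# Log-weights log p - log pi, up to an additive constant (diagonal Gaussians).
q = np.sum(x**2, axis=1)
logw = -0.5 * q / Pk + 0.5 * q / ((1.0 + beta) * Pk)
w = np.exp(logw - logw.max())   # stable normalization

G_hat = N * np.sum(w**2) / np.sum(w) ** 2  # E(w^2)/E(w)^2 is scale-invariant
G_exact = ((1.0 + beta) / np.sqrt(1.0 + 2.0 * beta)) ** n
print(G_hat, G_exact)           # close for large N
```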

4.3.1.

#### Implementation, localization and inflation of the EnKF

We localize the forecast covariance by the identity matrix, i.e. we set all off-diagonal elements of the EnKF forecast covariance matrix to zero. Note that this is a ‘perfect’ localization for this diagonal example and therefore we are depicting a best-case scenario for the localized EnKF and its particle filter interpretation. Prior inflation is inversely proportional to the ensemble size, i.e. we multiply the forecast covariance matrix by $\left(1+1/{N}_{e}\right)$. However, we tested our results with and without inflation and found that, for this example, the impact of inflation is insignificant compared with localization.
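
For this diagonal example, localization by the identity matrix simply zeroes the off-diagonal entries of the sample forecast covariance (a Schur product with $\mathbf{I}$), and the prior inflation then scales the result by $(1 + 1/N_e)$. A sketch with a hypothetical forecast ensemble:

```python
import numpy as np

def localize_and_inflate(forecast_ensemble: np.ndarray) -> np.ndarray:
    """Diagonal localization plus prior inflation of the sample covariance."""
    Ne = forecast_ensemble.shape[0]
    C = np.cov(forecast_ensemble, rowvar=False)  # sample forecast covariance
    C_loc = np.diag(np.diag(C))                  # Schur product with I
    return (1.0 + 1.0 / Ne) * C_loc              # inflate by (1 + 1/Ne)

rng = np.random.default_rng(3)
ens = rng.normal(size=(50, 10))                  # hypothetical Ne=50, nx=10
C = localize_and_inflate(ens)
print(np.count_nonzero(C - np.diag(np.diag(C)))) # 0: off-diagonals removed
```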

The weights of the PF-EnKF for the smoothing density are computed by (15). Here, we assume that $\mathbf{x}_{k-1} \sim p(\mathbf{x}_{k-1}|\mathbf{y}_{1:k-1})$, i.e. the PF-EnKF starts with samples from the correct density, so that $w^{k-1} = 1/N_e$. The weights of the PF-EnKF for the filtering density are computed by (16). The proposal density is given by (17), and its covariance is localized by the identity matrix as before. The filtering density is Gaussian, $p(\mathbf{x}_k|\mathbf{y}_{1:k}) = \mathcal{N}(\mu_k, \mathbf{P}_k)$, and its mean and covariance are computed by the Kalman filter formulas.

4.3.2.

#### Results

The left panels of Fig. 1 show the values of $G$ for the algorithms for each experiment (dots), along with log-linear least-squares approximations and one standard deviation confidence intervals. The right panels show the scaling of the MSE with dimension, specifically the mean and two standard deviation error bars of the MSE based on the 100 experiments we perform. We observe the expected exponential scaling of $G$ for the particle filter interpretation of the localized EnKF for the smoothing density. The optimal PF is also characterized by an exponential scaling of $G$, although at a slower rate (see also Snyder et al. (2015)). The particle filter interpretations of localized or unlocalized EnKFs for the filtering density are also characterized by an exponentially increasing $G$ if the ensemble size is fixed at $N_e = 50$. However, the collapse of the particle filter interpretation of the localized EnKF for the filtering density can be avoided if the ensemble size is proportional to the dimension, as described by our theoretical considerations above. Specifically, we observe that the PF-EnKF for the filtering density produces the same $G$ independently of $n$ when $N_e = n$.

Figure 1.

Left: quality measure $G$ as a function of the dimension for the various methods. Shown are the values of $G$ for each experiment (dots), along with log-linear least-squares fits (straight lines) and one standard deviation confidence intervals (shaded regions). Right: normalized MSE as a function of the dimension for the various methods. Shown are the mean (dots) and confidence intervals (error bars) computed from 100 numerical experiments. Also shown (dashed green lines) are the mean and two standard deviation confidence intervals of the ideal MSE.

In the right panels of Fig. 1, we observe that the localized EnKFs give a small MSE, while the unlocalized EnKF and the optimal PF produce a large MSE, which is consistent with what is reported in practice for large-dimensional problems. Specifically, the localized EnKF with $N_e = 50$ gives a small MSE even when its particle filter interpretation collapses. Also shown are the mean and variance of the MSE obtained without sampling error, i.e. by an optimal algorithm. The localized EnKFs and the optimal sampling algorithm give almost identical MSE. In summary, our numerical experiments confirm our scaling of the MSE with dimension and illustrate that a localized EnKF with a small ensemble size can indeed yield a small MSE while at the same time its particle filter interpretation collapses.

5.

## The importance of localization for particle filters

We argued in Section 2.5 that the collapse of particle filters in meteorology is caused by extremely large state dimensions $n_x$ and very large numbers of observations $n_y$, which in turn imply large effective dimensions. On the other hand, we expect that meteorological problems are characterized by correlation lengths that are short compared to the overall domain size, and that the observation operator $\mathbf{H}$ is 'local' (see Section 2.5). These assumptions imply that forecast and posterior covariances are sparse; the PF-EnKF uses this sparsity when generating the ensemble, but not during the weight calculation. If one could make use of the sparsity of the data assimilation problem during the weight calculation, then one may be able to avoid this type of filter collapse of the PF-EnKF, even for small ensemble sizes. The situation is perhaps analogous to numerical linear algebra, where computations with large dense matrices are expensive, but large sparse matrices can be handled efficiently. We now explain in more detail that 'localization for particle filters' means exactly that: making use of the sparsity of forecast and posterior covariances. We start by discussing a diagonal problem, for which it is straightforward to define what localization for particle filters is, and then move on to more realistic cases.

5.1.

### Localizing particle filters for diagonal problems

In the diagonal example of Section 4.3, exploiting the sparsity of covariance matrices means making use of the fact that the system is diagonal. Since a diagonal problem decouples into independent scalar sub-problems, one can apply scalar PF-EnKFs to each sub-problem and compute the weights for each sub-problem independently. The ensemble size required for each scalar sub-problem is independent of dimension, so that the catastrophic scaling of the required ensemble size of the PF-EnKF with dimension disappears. In fact, this strategy of decoupling the problem applies to any particle filter (for smoothing or filtering densities), so that the scaling of the required ensemble size with dimension disappears for every particle filter (see also Rebeschini and van Handel, 2015).
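
The decoupling described above can be sketched as a bank of independent scalar particle filter analysis steps, one per coordinate; the bootstrap-type weights, synthetic observations and per-coordinate resampling below are illustrative assumptions of ours, not the schemes of any cited paper:

```python
import numpy as np

rng = np.random.default_rng(4)
nx, Ne = 200, 50  # many coordinates, small ensemble

# Forecast particles, coordinate-wise: x ~ N(0, 2) (prior N(0, 1) plus Q = 1).
particles = rng.normal(scale=np.sqrt(2.0), size=(Ne, nx))
y = rng.normal(scale=np.sqrt(3.0), size=nx)  # synthetic observations, R = 1

# Local weights: each coordinate gets its own likelihood and normalization.
logw = -0.5 * (y - particles) ** 2           # shape (Ne, nx)
w = np.exp(logw - logw.max(axis=0))
w /= w.sum(axis=0)

# Resample each coordinate independently: the 'local' ensemble update.
analysis = np.empty_like(particles)
for i in range(nx):
    idx = rng.choice(Ne, size=Ne, p=w[:, i])
    analysis[:, i] = particles[idx, i]

# Per-coordinate effective sample size: it depends on the scalar sub-problem,
# not on the number of coordinates nx, so it does not collapse as nx grows.
ess = 1.0 / np.sum(w**2, axis=0)
print(ess.mean())  # a sizable fraction of Ne, regardless of nx
```

Replacing the per-coordinate loop by a single global weight per particle recovers a standard particle filter, whose effective sample size does degrade as $n_x$ grows.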

We can connect the above localization of particle filters by decoupling to covariance localization in the EnKF as follows. If the EnKF is localized by the identity matrix, which is optimal for this diagonal problem, then covariance localization is equivalent to applying $n$ scalar EnKFs, one to each sub-problem. Thus, for diagonal examples, there is a one-to-one analogy between covariance localization of the EnKF and localization of its particle filter interpretation by decoupling of the problem.

5.2.

### Localizing particle filters for coupled problems

While localizing particle filters by decoupling diagonal problems is rather intuitive, the situation is less clear in 'real' meteorological problems, which are not diagonal. In this context, it is important to realize that making the weights vary over space is not enough to define a localization strategy: the weights must be local and, in addition, must be used to modify the ensemble locally. In the diagonal example, this is done by resampling each variable separately. In more general situations, this step may not be as straightforward and introduces bias, as highlighted in the mathematically rigorous paper by Rebeschini and van Handel (2015). However, the same is true for covariance localization of the EnKF. We argued that covariance localization of the EnKF and localization of particle filters by (approximate) decoupling require the same conditions, and that these conditions are satisfied in meteorological problems (see Section 2.5). The impressive performance of localized EnKFs in meteorology suggests that the bias introduced by covariance localization is small. Thus, if particle filters can be localized by making use of the same properties, then the bias introduced by localization of particle filters can also be expected to be small.

Indeed, several localization strategies for particle filters have been published recently (Reich, 2013; Poterjoy, 2015; Poterjoy and Anderson, 2016; Penny and Miyoshi, 2015; Rebeschini and van Handel, 2015; Tödter and Ahrens, 2015). Earlier work includes the study by Bengtsson et al. (2003), where localization of mixture filters is considered, as well as the paper by Lei and Bickel (2011). All of these strategies yield good results in numerical simulations and in one way or another exploit sparsity of the data assimilation problem by restricting the effects of an observation to its neighbourhood. Our study supports these efforts and connects localization of particle filters to real meteorological problems. Specifically, we draw attention to the fact that the successful performance of the localized EnKF in numerical weather prediction in conjunction with the collapse of its particle filter interpretation on the same problem indicates that localization schemes for particle filters may prevent particle filter collapse in meteorology at the cost of a small bias.

6.

## Discussion

Localizing particle filters by decoupling, as described in the context of a diagonal problem, applies similarly to block-diagonal problems: one may apply independent particle filters to each independent block, which removes any dependence of the performance of the particle filter on the dimension or on the number of blocks considered. A meteorological problem is unlikely to be block diagonal, but it may resemble a large number of loosely coupled sub-problems (see also Section 2). This makes the localization of particle filters less straightforward, as described above, but one can anticipate that localized particle filters can only be successful if the effective dimensions of each sub-problem are small enough to prevent the collapse of particle filters. Our study does not provide any estimate of the effective dimension of each sub-problem, of the number of sub-problems, or of the degree to which the various blocks are coupled. Thus, we cannot provide 'proof' that localization will indeed make particle filters effective in meteorology. Our study merely indicates that the sparsity of the data assimilation problem can be exploited by particle filters and that such strategies may prevent filter collapse in meteorology, provided that each sub-problem is manageable.

There are more analogies between covariance localization of the EnKF and localizing particle filters by (approximate) decoupling. Recall that covariance localization of the EnKF reduces or deletes long-range correlations by Schur products (see Section 2.2). The localization thus restricts the influence of an observation to its neighbourhood. The ‘localized’ weight calculation we describe for diagonal problems does precisely that. Moreover, unlocalized EnKFs with small ensemble sizes fail in high-dimensional problems, and localization makes them effective. Similarly, particle filters collapse in high-dimensional problems if the ensemble size is small; however, a localization by decoupling may make them effective.

Finally, a localization of particle filters is only useful if local errors such as MSE are of greater importance than the global errors measured by the variance of the weights. We showed in Section 4 that the MSE of a (localized) EnKF with a small ensemble can be small even when global errors, measured by the weights of the PF-EnKF, are huge. The success of EnKFs in meteorology suggests that it is sufficient to evaluate errors locally.

6.1.

### Extensions of our results

Our results concerning the PF-EnKF for the filtering density in Section 3.2, as well as the results concerning the MSE in Section 4, generalize to data assimilation problems with a 'perfect' model, for which $\mathbf{Q} = \mathbf{0}$. However, the results concerning the smoothing density cannot be carried over to data assimilation problems with deterministic models by setting $\mathbf{Q} = \mathbf{0}$, because the smoothing density $p(\mathbf{x}_{1:k}|\mathbf{y}_{1:k})$ then no longer exists. In the limit $\mathbf{Q} \to \mathbf{0}$, one can consider either the filtering density $p(\mathbf{x}_k|\mathbf{y}_{1:k})$ or the density $p(\mathbf{x}_0|\mathbf{y}_{1:k})$. Using our formulas and setting $\mathbf{Q} = \mathbf{0}$, one obtains methods for the latter density, i.e. the density that describes initial conditions conditioned on data collected at later times. The formulas for the optimal PF with $\mathbf{Q} = \mathbf{0}$ lead to a sampling algorithm for $p(\mathbf{x}_k|\mathbf{y}_{1:k})$, but this algorithm is not optimal; other, optimal algorithms can be obtained, as linear examples show.

Throughout our paper, we assume that the data assimilation problem is linear and Gaussian. In NWP practice, this is not the case. In fact, EnKF can struggle in non-linear problems (see, e.g. Hodyss, 2011; Hodyss and Campbell, 2013) and in these situations computing weights and using the particle filter interpretation of the EnKF may be particularly helpful. While our results do not directly extend to non-linear problems, the results we collect for linear problems indicate that weight calculations for the EnKF must be localized for non-linear problems as well or else the weights may not be useful in meteorology.

6.2.

### EnKF stability in time

Work towards a stability theory for the EnKF has begun (Kelly et al., 2014; Kelly et al., 2016a) and more is underway (Kelly et al., 2016b; Kelly et al., 2016c; Majda and Tong, 2016). The goal of these papers is to better understand the EnKF and its approximations by studying the stability of its estimates in time. For example, Kelly et al. (2014) interpret the EnKF as a method for approximating the mean of the filtering density, and the expected distance of an ensemble member to 'the truth' is analysed over time. It is shown that this distance is bounded as time evolves and can become small, for large $t$, if the forecast covariances are inflated. We took a different perspective and addressed the scaling of the computational requirements of the particle filter interpretation of the EnKF with dimension. We considered this scaling during a single time step, as is customary in the particle filter literature (Snyder et al., 2008; Bickel et al., 2008; Bengtsson et al., 2008; Snyder, 2011; Chorin and Morzfeld, 2013; Snyder et al., 2015). We thus neglected the accumulation of the variance of the weights over time due to the sequential approach, and our study complements recent papers on the stability of the EnKF, as well as earlier results by Del Moral (2004), Chopin (2004) and Künsch (2005) that address the accumulation of variances over time.

7.

## Summary

We summarize the main results we have collected and derived.

• (1)   We reviewed and extended the literature about the collapse of particle filters which is caused by a large ‘effective dimension’. There are several mechanisms that can cause a large effective dimension, and, thus, lead to particle filter collapse. We describe how particle filter collapse occurs in meteorology due to huge numbers of state variables and observations, and a ‘local’ structure of the system (see Section 2). Moreover, while there are several definitions of effective dimension in the literature, we argue that every effective dimension can be expected to be large in meteorology.
• (2)   We made explicit connections between the collapse of particle filters and the EnKF. Our study and our assumptions are specific to the localized EnKF; they are violated by the EnKF without localization. Specifically, we showed that the particle filter interpretation of the localized EnKF for the filtering density collapses unless the ensemble size increases linearly with dimension. This means in particular that using a localized EnKF as the proposal density does not prevent particle filter collapse in meteorology, where dimension is large.
• (3)   We have demonstrated that a localized EnKF with a small ensemble size can produce a small MSE even while its particle filter interpretation collapses. The MSE is small because it is a local measure of error, and the covariance localization of the EnKF can keep local errors small. The collapse of the particle filter interpretation of the EnKF on the other hand is caused by a global weight calculation, during which small errors at various locations add up, rather than average out. We argued that the success of the EnKF indicates that a ‘global’ assessment of error may not be required in meteorology and that assessing errors locally is sufficient.
• (4)   We have described analogies between covariance localization of the EnKF and ‘localization’ of particle filters. We first considered a diagonal problem and showed that localizing a particle filter by decoupling the problem into independent sub-problems is equivalent to localizing the EnKF with a diagonal matrix. Relaxing the assumption of diagonality provides hints about how localization for particle filters may be implemented in more realistic, coupled problems. In particular, covariance localization for the EnKF and localization of particle filters rely on the same assumptions of short correlation lengths within a large system domain, and local assessment of error. The impressive performance of the localized EnKF in meteorological problems indicates that these assumptions are indeed valid in meteorology, which implies that localization of particle filters may be feasible in meteorology without introducing large bias. Our collection of results thus corroborates the applicability of recently proposed localization schemes for particle filters.
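The contrast between global and local weights in points (1)–(4) can be illustrated with a small, self-contained numerical sketch (ours, for illustration only; it is not an experiment from the paper). For a diagonal Gaussian problem, the global importance weights, which multiply the likelihoods of all components, collapse onto a single particle, while per-component (‘local’) weights retain a healthy effective sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

def ess(logw):
    """Effective sample size computed from log-weights."""
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

# A d-dimensional 'diagonal' problem: prior x ~ N(0, I_d), observation
# y = x + noise with noise ~ N(0, I_d), so the components are independent.
d, Ne = 200, 50
x = rng.standard_normal((Ne, d))                      # prior ensemble (bootstrap proposal)
y = rng.standard_normal(d) + rng.standard_normal(d)   # one synthetic observation

loglik = -0.5 * (y - x) ** 2                          # per-component log-likelihoods, shape (Ne, d)

# Global weights multiply the likelihoods of all components and collapse:
global_ess = ess(loglik.sum(axis=1))
# Local weights treat each component as an independent sub-problem:
local_ess = np.array([ess(loglik[:, i]) for i in range(d)])

print(f"global ESS: {global_ess:.2f}, mean local ESS: {local_ess.mean():.2f}")
```

As d grows, the global effective sample size approaches 1 (small errors at many locations add up in the weights rather than average out), while the mean local effective sample size stays comparable to the ensemble size.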

## Disclosure statement

No potential conflict of interest was reported by the authors.

## Acknowledgements

We thank Prof. Alexandre J. Chorin of UC Berkeley and Lawrence Berkeley National Laboratory for interesting discussions and encouragement.

## References

1. Agapiou , S. , Papaspiliopoulos , O. , Sanz-Alonso , D. and Stuart , A. ( 2016 ). Importance sampling: Computational complexity and intrinsic dimension . Submitted; preprint available at https://arxiv.org .

2. Anderson , J. ( 2001 ). An ensemble adjustment Kalman filter for data assimilation . Mon. Weather Rev. 129 , 2884 – 2903 .

3. Arulampalam , M. , Maskell , S. , Gordon , N. and Clapp , T. ( 2002 ). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking . IEEE Trans. Signal Process. 50 ( 2 ), 174 – 188 .

4. Bengtsson , T. , Bickel , P. and Li , B. ( 2008 ). Curse of dimensionality revisited: the collapse of importance sampling in very large scale systems . IMS Collections: Probability and Statistics: Essays in Honor of David A. Freedman 2 , 316 – 334 .

5. Bengtsson , T. , Snyder , C. and Nychka , D. ( 2003 ). Toward a nonlinear ensemble filter for high-dimensional systems . J. Geophys. Res. 108 , 8775 .

6. Beskos , A. , Crisan , D. and Jasra , A. ( 2014 ). On the stability of sequential Monte Carlo methods in high dimensions . Ann. Appl. Probab. 24 ( 4 ), 1396 – 1445 .

7. Bickel , P. , Li , B. and Bengtsson , T. ( 2008 ). Sharp failure rates for the bootstrap particle filter in high dimensions . Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh 3 , 318 – 329 .

8. Bishop , C. , Etherton , B. and Majumdar , S. ( 2001 ). Adaptive sampling with the ensemble transform Kalman filter. part I: Theoretical aspects . Monthly Weather Review 129 , 420 – 436 .

9. Bocquet , M. , Pires , C. and Wu , L. ( 2010 ). Beyond Gaussian statistical modeling in geophysical data assimilation . Mon. Weather Rev. 138 , 2997 – 3023 .

10. Burgers , G. , van Leeuwen , P. J. and Evensen , G. ( 1998 ). Analysis scheme in the ensemble Kalman filter . Mon. Weather Rev. 126 , 1719 – 1724 .

11. Chopin , N. ( 2004 ). Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference . Ann. Statistics 32 ( 6 ), 2385 – 2411 .

12. Chorin , A. and Hald , O. ( 2013 ). Stochastic Tools in Mathematics and Science , 3rd ed. , Springer , New York .

13. Chorin , A. , Lu , F. , Miller , R. , Morzfeld , M. and Tu , X. ( 2016 ). Sampling, feasibility, and priors in Bayesian estimation . Discrete Continuous Dyn. Syst. 36 , 4227 – 4246 .

14. Chorin , A. and Morzfeld , M. ( 2013 ). Conditions for successful data assimilation . J. Geophys. Res. – Atmos. 118 , 11522 – 11533 .

15. Chorin , A. , Morzfeld , M. and Tu , X. ( 2010 ). Implicit particle filters for data assimilation . Commun. Appl. Math. Comput. Sci. 5 ( 2 ), 221 – 240 .

16. Chorin , A. and Tu , X. ( 2009 ). Implicit sampling for particle filters . Proc. Nat. Acad. Sci. 106 ( 41 ), 17249 – 17254 .

17. Doucet , A. ( 1998 ). On sequential Monte Carlo methods for Bayesian filtering . Technical Report . Cambridge : Department of Engineering, University Cambridge .

18. Doucet , A. , de Freitas , N. and Gordon , N. (eds.) ( 2001 ). Sequential Monte Carlo Methods in Practice . Springer , New York .

19. Doucet , A. , Godsill , S. and Andrieu , C. ( 2000 ). On sequential Monte Carlo sampling methods for Bayesian filtering . Stat. Comput. 10 , 197 – 208 .

20. Evensen , G. ( 2006 ). Data Assimilation: The Ensemble Kalman Filter . Springer , New York .

21. Fournier , A. , Hulot , G. , Jault , D. , Kuang , W. , Tangborn , W. and co-authors. ( 2010 ). An introduction to data assimilation and predictability in geomagnetism . Space Sci. Rev. 155 , 247 – 291 .

22. Gaspari , G. and Cohn , S. ( 1999 ). Construction of correlation functions in two and three dimensions . Q. J. R. Meteorol. Soc. 125 , 723 – 757 .

23. Gordon , N. , Salmond , D. and Smith , A. ( 1993 ). Novel approach to nonlinear/non-Gaussian Bayesian state estimation . IEEE Proc. Radar Signal Process. 140 ( 2 ), 107 – 113 .

24. Hamill , T. M. , Whitaker , J. and Snyder , C. ( 2001 ). Distance-dependent filtering of background covariance estimates in an ensemble Kalman filter . Mon. Weather Rev. 129 , 2776 – 2790 .

25. Hodyss , D. ( 2011 ). Ensemble state estimation for nonlinear systems using polynomial expansions in the innovation . Mon. Weather Rev. 139 , 3571 – 3588 .

26. Hodyss , D. and Campbell , W. ( 2013 ). Square root and perturbed observation ensemble generation techniques in Kalman and quadratic ensemble filtering algorithms . Mon. Weather Rev. 141 , 2561 – 2573 .

27. Houtekamer , P. and Mitchell , H. ( 2001 ). A sequential ensemble Kalman filter for atmospheric data assimilation . Mon. Weather Rev. 129 , 123 – 136 .

28. Houtekamer , P. L. , Mitchell , H. L. , Pellerin , G. , Buehner , M. , Charron , M. and co-authors. ( 2005 ). Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations . Mon. Weather Rev. 133 , 604 – 620 .

29. Kalman , R. ( 1960 ). A new approach to linear filtering and prediction theory . Trans. ASME-J. Basic Eng. 82 ( Series D ), 35 – 48 .

30. Kalman , R. and Bucy , R. ( 1961 ). New results in linear filtering and prediction theory . ASME J. Basic Eng. Ser. D 83 , 95 – 108 .

31. Kalos , M. and Whitlock , P. ( 1986 ). Monte Carlo Methods , 1st ed. , Vol. 1 . John Wiley & Sons , New York .

32. Kantas , N. , Beskos , A. and Jasra , A. ( 2014 ). Sequential Monte Carlo methods for high-dimensional inverse problems: A case study for the Navier-Stokes equations . SIAM/ASA J. Uncertainty Quantification 2 , 464 – 489 .

33. Kelly , D. , Law , K. and Stuart , A. ( 2014 ). Well-posedness and accuracy of the ensemble Kalman filter in discrete and continuous time . Nonlinearity 27 , 2579 – 2603 .

34. Kelly , D. , Majda , A. and Tong , X. ( 2016a ). Concrete ensemble Kalman filters with rigorous catastrophic filter divergence . Proc. Nat. Acad. Sci. 112 ( 34 ), 10589 – 10594 .

35. Kelly , D. , Majda , A. and Tong , X. ( 2016b ). Nonlinear stability of the ensemble Kalman filter with adaptive covariance inflation . Commun. Math. Sci.

36. Kelly , D. , Majda , A. and Tong , X. ( 2016c ). Nonlinear stability and ergodicity of ensemble based kalman filters . Nonlinearity 29 ( 2 ), 657 – 691 .

37. Khalil , M. , Sarkar , A. , Adhikari , S. and Poirel , D. ( 2015 ). The estimation of time-invariant parameters of noisy nonlinear oscillatory systems . J. Sound Vib. 344 , 81 – 100 .

38. Klaas , M. , de Freitas , N. and Doucet , A. ( 2005 ). Towards practical N2 Monte Carlo: The marginal particle filter . Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI-05) , Arlington, VA , pp. 308 – 315 .

39. Künsch , H. ( 2005 ). Recursive Monte Carlo filters: Algorithms and theoretical analysis . Ann. Statistics 33 ( 5 ), 1983 – 2021 .

40. Lei , J. and Bickel , P. ( 2011 ). A moment matching ensemble filter for nonlinear non-Gaussian data assimilation . Mon. Weather Rev. 139 , 3964 – 3973 .

41. Liu , J. and Chen , R. ( 1995 ). Blind deconvolution via sequential imputations . J. Am. Stat. Assoc. 90 ( 430 ), 567 – 576 .

42. Majda , A. and Tong , X. ( 2016 ). Robustness and accuracy of finite ensemble Kalman filters in large dimensions . arXiv , p. 1606.09321 .

43. Moral , P. D. ( 2004 ). Feynman-Kac Formulae: Genealogical and Interacting Particle Systems with Applications . Springer , New York .

44. Morzfeld , M. and Chorin , A. ( 2012 ). Implicit particle filtering for models with partial noise, and an application to geomagnetic data assimilation . Nonlinear Process. Geophys. 19 , 365 – 382 .

45. Morzfeld , M. , Tu , X. , Atkins , E. and Chorin , A. ( 2012 ). A random map implementation of implicit filters . J. Comput. Phys. 231 , 2049 – 2066 .

46. Owen , A. B. ( 2013 ). Monte Carlo Theory, Methods and Examples . Online at: http://statweb.stanford.edu/~owen/mc/

47. Papadakis , N. , Memin , E. , Cuzol , A. and Gengembre , N. ( 2010 ). Data assimilation with the weighted ensemble Kalman filter . Tellus 62A , 673 – 697 .

48. Penny , S. and Miyoshi , T. ( 2015 ). A local particle filter for high dimensional geophysical systems . Nonlinear Process. Geophys. 2 , 1631 – 1658 .

49. Poterjoy , J. ( 2015 ). A localized particle filter for high-dimensional nonlinear systems . Mon. Weather Rev. 144 , 59 – 76 .

50. Poterjoy , J. and Anderson , J. ( 2016 ). Efficient assimilation of simulated observations in a high-dimensional geophysical system using a localized particle filter . Mon. Weather Rev . 144 , 2007 – 2020 .

51. Raeder , K. , Anderson , J. L. , Collins , N. , Hoar , T. J. , Kay , J. E. and co-authors. ( 2012 ). DART/CAM: An ensemble data assimilation system for CESM atmospheric models . J. Clim. 25 , 6304 – 6317 .

52. Rebeschini , P. and van Handel , R. ( 2015 ). Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 25 ( 5 ), 2809 – 2866 .

53. Reich , S. ( 2013 ). A nonparametric ensemble transform method for Bayesian inference . SIAM J. Sci. Comput. 35 ( 4 ), A2013 – A2024 .

54. Snyder , C. ( 2011 ). Particle filters, the "optimal" proposal and high-dimensional systems . Proceedings of the ECMWF Seminar on Data Assimilation for Atmosphere and Ocean , Reading, UK , 1 – 10 .

55. Snyder , C. , Bengtsson , T. , Bickel , P. and Anderson , J. ( 2008 ). Obstacles to high-dimensional particle filtering . Mon. Weather Rev. 136 ( 12 ), 4629 – 4640 .

56. Snyder , C. , Bengtsson , T. and Morzfeld , M. ( 2015 ). Performance bounds for particle filters using the optimal proposal . Mon. Weather Rev. 143 ( 11 ), 4750 – 4761 .

57. Tippett , M. , Anderson , J. , Bishop , C. , Hamill , T. and Whitaker , J. ( 2003 ). Ensemble square root filters . Mon. Weather Rev. 131 , 1485 – 1490 .

58. Tödter , J. and Ahrens , B. ( 2015 ). A second-order exact ensemble square root filter for nonlinear data assimilation . Mon. Weather Rev. 143 , 1337 – 1367 .

59. van Leeuwen , P. ( 2009 ). Particle filtering in geophysical systems . Mon. Weather Rev. 137 , 4089 – 4114 .

60. van Leeuwen , P. , Cheng , Y. and Reich , S. ( 2015 ). Nonlinear Data Assimilation . Springer .

61. Vanden-Eijnden , E. and Weare , J. ( 2012 ). Data assimilation in the low noise regime with application to the Kuroshio . Mon. Weather Rev. 141 , 1822 – 1841 .

62. Wang , X. , Hamill , T. M. , Whitaker , J. S. and Bishop , C. H. ( 2007 ). A comparison of hybrid ensemble transform Kalman filter-optimum interpolation and ensemble square root filter analysis schemes . Mon. Weather Rev. 135 , 1055 – 1076 .

63. Weare , J. ( 2009 ). Particle filtering with path sampling and an application to a bimodal ocean current model . J. Comput. Phys. 228 , 4312 – 4331 .

64. Zaritskii , V. and Shimelevich , L. ( 1975 ). Monte Carlo technique in problems of optimal data processing . Autom. Remote Control 12 , 95 – 103 .

Appendix 1

### Weights for the particle filter interpretation of the EnKF for the filtering density of non-linear problems

The weights of the PF-EnKF when sampling the filtering density are given in (16) and can be re-written as

$w \propto \dfrac{p(\mathbf{y}_k\,|\,\mathbf{x}_k)\, p(\mathbf{x}_k\,|\,\mathbf{y}_{1:k-1})}{\pi_{\text{EnKF}}(\mathbf{x}_k)}.$

Evaluation of these weights requires integration since

(A1)
$p(\mathbf{x}_k\,|\,\mathbf{y}_{1:k-1}) = \int p(\mathbf{x}_k\,|\,\mathbf{x}_{k-1})\, p(\mathbf{x}_{k-1}\,|\,\mathbf{y}_{1:k-1})\, d\mathbf{x}_{k-1},$

and evaluating this integral is difficult unless the problem is linear and Gaussian (see above). Similarly, evaluating the density of the analysis ensemble at time k also requires integration since

(A2)
$\pi_{\text{EnKF}}(\mathbf{x}_k) = \int \pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1},\mathbf{y}_k)\, p(\mathbf{x}_{k-1}\,|\,\mathbf{y}_{1:k-1})\, d\mathbf{x}_{k-1},$

where $\pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1},\mathbf{y}_k)$ is not Gaussian. However, a basic requirement for weighted sampling (and many other sampling schemes, e.g. Markov Chain Monte Carlo) is that the target and proposal densities be known up to a multiplicative constant. This requirement is not met here, so that exact weights for the PF-EnKF when sampling the filtering density are out of reach.

Using ideas of Klaas et al. (2005) for marginal particle filters, one can develop approximate weights. These approximations rely on Monte Carlo evaluation of the integral (A1) and become exact as the ensemble size $N_e$ goes to infinity. We demonstrate here how to use these ideas. Let $\mathbf{x}_{k-1}^j$, $j=1,\dots,N_e$, be the analysis ensemble at time $k-1$, which defines an approximation to the filtering density at time $k-1$:

(A3)
$p(\mathbf{x}_{k-1}\,|\,\mathbf{y}_{1:k-1}) \approx \frac{1}{N_e}\sum_{j=1}^{N_e}\delta(\mathbf{x}_{k-1}-\mathbf{x}_{k-1}^j),$

which can be used in the integral (A1) to obtain

$p(\mathbf{x}_k\,|\,\mathbf{y}_{1:k-1}) \approx \frac{1}{N_e}\sum_{j=1}^{N_e} p(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}^j).$

The target density can thus be approximated by

(A4)
$p(\mathbf{x}_k\,|\,\mathbf{y}_{1:k}) \approx p(\mathbf{y}_k\,|\,\mathbf{x}_k)\left(\frac{1}{N_e}\sum_{j=1}^{N_e} p(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}^j)\right).$

Similarly, we obtain by Monte Carlo approximation of the integral (A2)

$\pi_{\text{EnKF}}(\mathbf{x}_k) \approx \frac{1}{N_e}\sum_{j=1}^{N_e}\pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}^j,\mathbf{y}_k).$

With these approximations, the PF-EnKF weights with respect to the marginal posterior are

(A5)
$w_{\text{EnKF}} \approx \hat{w}_{\text{EnKF}} \propto \dfrac{p(\mathbf{y}_k\,|\,\mathbf{x}_k)\sum_{j=1}^{N_e} p(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}^j)}{\sum_{j=1}^{N_e}\pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}^j,\mathbf{y}_k)}.$

However, the approximate weights are difficult to compute unless an observation arrives at every model time step. The difficulties can be illustrated by considering a problem with one additional model step between two consecutive observations. In this case, the proposal density can be written as

$\pi_{\text{EnKF}}(\mathbf{x}_k) = \iint \pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1},\mathbf{y}_k)\, p(\mathbf{x}_{k-1}\,|\,\mathbf{x}_{k-2})\, p(\mathbf{x}_{k-2}\,|\,\mathbf{y}_{1:k-2})\, d\mathbf{x}_{k-1}\, d\mathbf{x}_{k-2}.$

We can use the analysis ensemble at time $k-2$ to approximate $p(\mathbf{x}_{k-2}\,|\,\mathbf{y}_{1:k-2})$ as before, but we then must perform the integration over $\mathbf{x}_{k-1}$ (assuming there is no observation available at that time):

$\pi_{\text{EnKF}}(\mathbf{x}_k) \approx \frac{1}{N_e}\sum_{j=1}^{N_e}\int \pi_{k,\text{EnKF}}(\mathbf{x}_k\,|\,\mathbf{x}_{k-1},\mathbf{y}_k)\, p(\mathbf{x}_{k-1}\,|\,\mathbf{x}_{k-2}^j)\, d\mathbf{x}_{k-1}.$

This integration is difficult in general, which complicates the computation of the weights of the PF-EnKF for the filtering density. In fact, these weights may be impractical because observations are usually not frequently available (at least not at every step of the model). These results can be extended to the case of deterministic models by putting $p(\mathbf{x}_k\,|\,\mathbf{x}_{k-1}) = \delta(\mathbf{x}_k - \mathbf{x}_{k-1})$, where $\delta(\cdot)$ is the delta distribution.