1.

## Introduction

Data assimilation is the process of estimating the state of a system given previous information and current observations. The methods typically used for data assimilation in the ocean and atmosphere are variational schemes (Talagrand and Courtier, 1987; Daley, 1991; Courtier et al., 1998) and ensemble Kalman filters (EnKFs) (Evensen, 2003; Bishop et al., 2001). Both methods are built on linear hypotheses and have produced useful results in quasi-nonlinear situations; they are optimal when the observation and model errors are Gaussian. These methods have been successful in numerical weather prediction (NWP) (Buehner et al., 2010; Kuhl et al., 2013) but are suboptimal for nonlinear model dynamics, as the Fokker-Planck equations that govern the evolution of a pdf can only be solved exactly in certain cases. This sub-optimality has led to a proliferation of data assimilation methods, each with its own advantages (see, for example, Daley, 1991; Anderson, 2001; Bishop et al., 2001; Sondergaard and Lermusiaux, 2013; Poterjoy, 2016).

While it is well known that atmospheric and oceanic models may have non-Gaussian statistics (Morzfeld and Hodyss, 2019), computational resources limit our ability to fully resolve the data assimilation problem. Miyoshi et al. (2014) showed that ensembles need on the order of a thousand members to represent non-Gaussian prior pdfs in an EnKF for a general circulation model; however, typical ensemble sizes are on the order of a hundred (Houtekamer et al., 2014). Additionally, computational constraints lead to data assimilation systems that run at lower resolutions than the forecasting model and whose dynamics are therefore more linear. By targeting this specific application, algorithmic efficiencies may be found.

Gaussian quadrature filters explicitly assume that the conditional pdfs in the Bayesian filtering equations are Gaussian. Powerful numerical integration techniques, e.g. Gaussian quadrature and cubature, are then used to evaluate the resulting integral equations. The first of these filters appeared in the early 2000s with Ito and Xiong (2000) and Wu et al. (2006), but it was not until the cubature Kalman filter (Arasaratnam and Haykin, 2009) that Gaussian quadrature filters became popular. Since then, they have seen extensive use in radar tracking (Haykin et al., 2011), traffic flow (Liu et al., 2017), power systems (Sharma et al., 2017), etc.; however, they have not enjoyed the same popularity in the atmospheric and oceanographic sciences. This is likely due to their expense, as quadrature rules require many evaluations of the nonlinear model. The central difference filter (CDF) (Ito and Xiong, 2000) uses low-order polynomial quadrature requiring twice as many model evaluations as the size of the state space. Higher order quadrature methods require even more model evaluations. The CDF has successfully outperformed the extended Kalman filter (EKF) (Ito and Xiong, 2000), the unscented Kalman filter (UKF) (Ito and Xiong, 2000), and 4D-Var (King et al., 2016) for low dimensional problems. The nonlinear filter presented here, the Assumed Gaussian Reduced (AGR) filter, is essentially a square root version of the CDF with dynamical sampling.

The AGR filter uses low-order polynomial quadrature that takes advantage of the properties of Gaussian distributions to achieve an effective higher order of accuracy. To further reduce the computational cost of the filter, singular value sampling is used. These two techniques make the AGR filter efficient in terms of nonlinear model evaluations, giving it potential for atmospheric and oceanic applications. The algorithm for the AGR filter is similar to that of a square-root EnKF but with a different prediction step. This prediction step costs more than a typical EnKF prediction step in terms of matrix and vector operations. However, the AGR filter formulation of the prediction step is more accurate for numerical models with small fourth order derivatives, i.e., moderately nonlinear systems.

This manuscript is organized as follows: Section 2 begins with a brief review of Bayesian filtering, followed by details regarding assumptions about the associated pdfs used to arrive at a discrete filter in terms of Gaussian integrals. The evaluation of these Gaussian integrals is discussed in Section 3 in terms of low-order polynomial quadrature for scalar and multi-dimensional problems. Results are presented relating to the performance of this quadrature to help define the scenarios in which this filter should be used. The algorithm for the full AGR filter is presented in Section 4. Section 5 uses a one-dimensional Korteweg-de Vries model and a two-dimensional Boussinesq model to compare the performance of the AGR filter with that of a square root EnKF. Final remarks are in Section 6. The appendix contains the formulas used in Sections 2 and 3.

2.

We begin our discussion with a review of Bayesian filtering in order to highlight the differences between common types of nonlinear filters. The aim of Bayesian filtering is to estimate the pdf $p\left({x}_{t}|{Y}_{T}\right),$ where xt is the current state at time t and ${Y}_{T}=\left\{{y}_{1},\dots ,{y}_{t}\right\}$ contains the previous observations up to time t. The Bayesian filter is most commonly developed as a recursive filter formed by first applying Bayes’ rule to $p\left({x}_{t}|{y}_{t}\right)$ and then applying the Markovian properties of the observations, i.e. the property that observations depend only on the current state. The filter was first described in Ho and Lee (1964) and is discussed in detail in Särkkä (2013) and Chen (2003). This filter is typically divided into two steps: the first step, which we will refer to as the prediction step, computes the prior distribution using preliminary information given by the Chapman–Kolmogorov equation

(2.1)
$p\left({x}_{t}|{Y}_{T-1}\right)=\int p\left({x}_{t}|{x}_{t-1}\right)p\left({x}_{t-1}|{Y}_{T-1}\right)\mathrm{d}{x}_{t-1}.$

The second step, which we will refer to as the correction step, computes the posterior distribution

(2.2)
$p\left({x}_{t}|{Y}_{T}\right)=\frac{1}{{Z}_{t}}p\left({y}_{t}|{x}_{t}\right)p\left({x}_{t}|{Y}_{T-1}\right)$
where
${Z}_{t}=\int p\left({y}_{t}|{x}_{t}\right)p\left({x}_{t}|{Y}_{T-1}\right)\mathrm{d}{x}_{t}$
is the normalization constant. The exact solutions of (2.1) and (2.2) are unknown except in special cases. In particular, for linear state dynamics where the transition pdf $p\left({x}_{t}|{x}_{t-1}\right)$ and the measurement likelihood $p\left({y}_{t}|{x}_{t}\right)$ are Gaussian, the filter (2.2) has an exact solution given by the Kalman filter (Kalman, 1960). Otherwise (2.2) may be approximated using a particle filter (Särkkä, 2013; Poterjoy, 2016). In practice, the full pdf $p\left({x}_{t}|{Y}_{T}\right)$ is not used; instead, only its first two moments, the mean and covariance, are used. Under Gaussian assumptions this leads to what are referred to as Kalman-type filters.

To summarize the relationship between the Bayesian filter in (2.1) and (2.2) and Kalman-type filters we begin by considering the system given by

(2.3)
${x}_{t}=f\left({x}_{t-1}\right)+{w}_{t}$
with the observation process
(2.4)
${y}_{t}=H{x}_{t}+{v}_{t}$
where $x\in {\mathbb{R}}^{n}$, $y\in {\mathbb{R}}^{d}$, f is the model, H is the linear map between the state space and the observation space, w is the Gaussian model error with covariance Q, and v is the Gaussian observation error with covariance R. At time t, the mean of the predictive distribution (2.1) is given by
(2.5)
${x}_{t}^{b}=E\left[{x}_{t}|{Y}_{T-1}\right]$
(2.6)
$={\int }_{{\mathbb{R}}^{n}}{x}_{t}p\left({x}_{t}|{Y}_{T-1}\right)\mathrm{d}{x}_{t}$
(2.7)
$={\int }_{{\mathbb{R}}^{n}}f\left({x}_{t-1}\right)p\left({x}_{t-1}|{Y}_{T-1}\right)\mathrm{d}{x}_{t-1}$
where the b in ${x}_{t}^{b}$ indicates that x is the background estimate of the mean at time t. Equation (2.7) is computed using (A.3) from Appendix A, where $E\left[·\right]$ is the expectation.

Similarly, the covariance of (2.1) is given by

(2.8)
${P}_{t}^{b}=E\left[\left({x}_{t}-{x}_{t}^{b}\right){\left({x}_{t}-{x}_{t}^{b}\right)}^{\mathrm{T}}\right]$
(2.9)
$={\int }_{{\mathbb{R}}^{n}}\left({x}_{t}-{x}_{t}^{b}\right){\left({x}_{t}-{x}_{t}^{b}\right)}^{T}p\left({x}_{t}|{Y}_{T-1}\right)\mathrm{d}{x}_{t}$
(2.10)
$={\int }_{{\mathbb{R}}^{n}}\left(f\left({x}_{t-1}\right)-{x}_{t}^{b}\right){\left(f\left({x}_{t-1}\right)-{x}_{t}^{b}\right)}^{T}p\left({x}_{t-1}|{Y}_{T-1}\right)\mathrm{d}{x}_{t-1}+Q$
using (A.6). The equations for the prediction step, (2.7) and (2.10), are both a consequence of the model error w being Gaussian. To approximate the correction step, it is first assumed that the joint distribution of $\left({x}_{t},{\hat{y}}_{t}\right)$ is Gaussian, more specifically,
(2.11)
$p\left({x}_{t},{y}_{t}|{Y}_{T-1}\right)=p\left({y}_{t}|{x}_{t}\right)p\left({x}_{t}|{Y}_{T-1}\right)$
(2.12)
$=N\left(\left(\begin{array}{c}{x}_{t}^{b}\\ {\hat{y}}_{t}^{b}\end{array}\right),\left(\begin{array}{cc}{P}_{t}^{b}& {P}_{t}^{xy}\\ {\left({P}_{t}^{xy}\right)}^{T}& {P}_{t}^{y}\end{array}\right)\right)$
(2.13)
$=N\left(\left(\begin{array}{c}{x}_{t}^{b}\\ {\hat{y}}_{t}^{b}\end{array}\right),\left(\begin{array}{cc}{P}_{t}^{b}& {P}_{t}^{b}{H}^{T}\\ H{P}_{t}^{b}& H{P}_{t}^{b}{H}^{T}+R\end{array}\right)\right)$
where ${\hat{y}}_{t}^{b}$ is the estimated observation computed via (2.4) using ${x}_{t}^{b}$, ${P}_{t}^{xy}$ is the cross-covariance between ${x}_{t}^{b}$ and ${\hat{y}}_{t}^{b}$, and ${P}_{t}^{y}$ is the covariance of ${\hat{y}}_{t}^{b}.$ The estimated observations ${\hat{y}}_{t}^{b}$ and ${P}_{t}^{y}$ are computed similarly to (2.7) and (2.10). The computation of the cross covariance ${P}_{t}^{xy}$ in (2.13) may be found in the appendix (Equation (A.12)). Then it follows from (2.13) that the conditional distribution of xt given yt in (2.2) is approximated in terms of the mean ${x}_{t}^{a}$ and covariance ${P}_{t}^{a}$
$\begin{array}{c}p\left({x}_{t}|{y}_{t},{Y}_{T-1}\right)=p\left({x}_{t}|{Y}_{T}\right)\\ =N\left({x}_{t}|{Y}_{T}\right)\end{array}$
where the mean and covariance are given by the Kalman equations
(2.14)
${x}_{t}^{a}={x}_{t}^{b}+{K}_{t}\left({y}_{t}-H{x}_{t}^{b}\right)$
(2.15)
${P}_{t}^{a}=\left(I-{K}_{t}H\right){P}_{t}^{b}$
(2.16)
${K}_{t}={P}_{t}^{b}{H}^{T}{\left(H{P}_{t}^{b}{H}^{T}+R\right)}^{-1}$
where ${x}_{t}^{a}$ is the mean at the analysis (denoted by the a) at time t, y are the observations, and Kt is the Kalman gain. Note that in the above Kalman-type filter framework we have assumed that the observation operator H is linear; however, this need not be the case (Ito and Xiong, 2000; Särkkä, 2013). In general, solving (2.7) and (2.10) explicitly is intractable for large problems, including those found in the geosciences. One strategy for approximating (2.7) and (2.10) is sampling, which leads to the expressions for the sample mean and covariance used in EnKFs. Another strategy is to make the further simplifying assumption that $p\left({x}_{t-1}|{Y}_{T-1}\right)$ is Gaussian, arriving at a particular type of assumed density filter referred to in the literature as a Gaussian filter. Since EnKF filters also contain Gaussian assumptions, to differentiate these filters we will refer to Gaussian filters as Gaussian quadrature filters.

To form the basis for Gaussian quadrature filters, we will make the additional simplifying assumption

(2.17)
$p\left({x}_{t-1}|{Y}_{T-1}\right)=N\left({x}_{t-1}|{Y}_{T-1}\right),$
i.e., that our prior distribution is Gaussian. With this additional assumption, Equations (2.7) and (2.10) simplify and we arrive at the algorithm

(1) Prediction step:

(2.18)
${x}_{t}^{b}={\int }_{{\mathbb{R}}^{n}}f\left({x}_{t-1}\right)N\left({x}_{t-1}|{Y}_{T-1}\right)\mathrm{d}{x}_{t-1}$
(2.19)
${P}_{t}^{b}={\int }_{{\mathbb{R}}^{n}}\left(f\left({x}_{t-1}\right)-{x}_{t}^{b}\right){\left(f\left({x}_{t-1}\right)-{x}_{t}^{b}\right)}^{T}N\left({x}_{t-1}|{Y}_{T-1}\right)\mathrm{d}{x}_{t-1}+Q.$

(2) Correction step:

$\begin{array}{c}{K}_{t}={P}_{t}^{b}{H}^{T}{\left(R+H{P}_{t}^{b}{H}^{T}\right)}^{-1}\\ {x}_{t}^{a}={x}_{t}^{b}+{K}_{t}\left({y}_{t}-H{x}_{t}^{b}\right)\\ {P}_{t}^{a}=\left(I-{K}_{t}H\right){P}_{t}^{b}.\end{array}$

With this formulation it is easily verified that for a linear f(x) in (2.3), we arrive at the Kalman filter equations exactly. In this regard, the Gaussian quadrature filters can be seen as a nonlinear extension of the Kalman filter. Other nonlinear filters such as the extended Kalman filter or UKF (Julier et al., 2000) may also be formulated using this framework (Särkkä, 2013).
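For concreteness, the correction step above can be sketched in a few lines of linear algebra. The snippet below is a minimal illustration of the Kalman update equations (gain, analysis mean, analysis covariance); the function name and interface are ours, not from the manuscript.

```python
import numpy as np

def correction_step(x_b, P_b, y, H, R):
    """Kalman-type correction step:
    K = P_b H^T (H P_b H^T + R)^{-1},
    x_a = x_b + K (y - H x_b),
    P_a = (I - K H) P_b."""
    S = H @ P_b @ H.T + R                    # innovation covariance (symmetric)
    K = np.linalg.solve(S, H @ P_b).T        # Kalman gain; uses S = S^T
    x_a = x_b + K @ (y - H @ x_b)            # analysis mean
    P_a = (np.eye(len(x_b)) - K @ H) @ P_b   # analysis covariance
    return x_a, P_a, K
```

Solving the symmetric innovation system instead of forming an explicit inverse is the standard numerically safer choice.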

3.

## Gaussian integration

The distinct feature of Gaussian quadrature filters is the evaluation of the Gaussian integrals (2.18) and (2.19) which are multidimensional integrals of the form

(3.1)
$\mathcal{I}={\int }_{{\mathbb{R}}^{n}}F\left({x}_{t-1}\right)N\left({x}_{t-1}|{x}_{t-1}^{a},{P}_{t-1}^{a}\right)\mathrm{d}{x}_{t-1}$
where $F\left(·\right)$ is a general function and $N\left({x}_{t-1}|{x}_{t-1}^{a},{P}_{t-1}^{a}\right)$ is equivalent to $N\left({x}_{t-1}|{Y}_{T-1}\right).$ These types of filters are differentiated by the type of quadrature they use, for example, the Gauss-Hermite Kalman filter (Ito and Xiong, 2000; Wu et al., 2006), the cubature Kalman filter (Arasaratnam and Haykin, 2009), and the central difference filter (Ito and Xiong, 2000). The quadrature rules in these methods entail model evaluations and the computation of weights requiring a trade-off between cost and performance. Higher order methods provide greater numerical accuracy but require substantially more model evaluations which may be cost prohibitive. We will use low-order polynomial quadrature to balance computational cost and performance.

3.1.

### Gaussian pdf integration: scalar case

To discuss the evaluation of the Gaussian integrals of the form (3.1), we begin with the scalar case given by

$\mathcal{I}={\int }_{\mathbb{R}}F\left({x}_{t-1}\right)N\left({x}_{t-1}|{x}_{t-1}^{a},{P}_{t-1}^{a}\right)\mathrm{d}{x}_{t-1}.$

Using the change of variables ${x}_{t-1}=\sqrt{P}\eta +{x}_{t-1}^{a},$ where $\sqrt{P}$ is the square root of ${P}_{t-1}^{a},$ we arrive at the integral in standard form given by

$\mathcal{I}={\int }_{\mathbb{R}}\stackrel{˜}{F}\left(\eta \right)N\left(\eta |0,1\right)\mathrm{d}\eta$
where $\stackrel{˜}{F}\left(\eta \right)=F\left(\sqrt{P}\eta +{x}_{t-1}^{a}\right).$ This form of the Gaussian integral allows for the development of explicit formulas to evaluate it. We approximate $\stackrel{˜}{F}\left(·\right)$ by a second-degree polynomial $\gamma \left(s\right)$ given by
(3.2)
$\gamma \left(s\right)=\stackrel{˜}{F}\left(0\right)+{a}_{1}s+\frac{1}{2}{a}_{2}{s}^{2}$
where
(3.3)
${a}_{1}=\frac{\stackrel{˜}{F}\left(d\right)-\stackrel{˜}{F}\left(-d\right)}{2d}\quad \text{and}\quad {a}_{2}=\frac{\stackrel{˜}{F}\left(d\right)-2\stackrel{˜}{F}\left(0\right)+\stackrel{˜}{F}\left(-d\right)}{{d}^{2}}$
where d > 0 is the step size. Note that because of the change of variables, the first and second derivatives a1 and a2 are in the direction of $\sqrt{P}.$ Then using (3.2) in (2.18), the prior mean estimate is given by
(3.4)
${x}_{t}^{b}={\int }_{\mathbb{R}}f\left(\sqrt{P}\eta +{x}_{t-1}^{a}\right)N\left(\eta |0,1\right)\mathrm{d}\eta$
(3.5)
$={\int }_{\mathbb{R}}\left(f\left(\sqrt{P}·0+{x}_{t-1}^{a}\right)+{a}_{1}\eta +\frac{1}{2}{a}_{2}{\eta }^{2}\right)\frac{1}{\sqrt{2\pi }}{e}^{-\frac{1}{2}{\eta }^{2}}\mathrm{d}\eta$
(3.6)
$=f\left({x}_{t-1}^{a}\right)+\frac{1}{2}{a}_{2}.$

The odd term in (3.5) integrates to zero, and the mean estimate is now the previous mean propagated forward with a second-order correction term. Similarly, using (3.2) and (3.6), we may compute the prior covariance prediction (2.19) as

(3.7)
${P}_{t}^{b}={\int }_{\mathbb{R}}{\left(f\left(\sqrt{P}\eta +{x}_{t-1}^{a}\right)-{x}_{t}^{b}\right)}^{2}N\left(\eta |0,1\right)\mathrm{d}\eta +Q$
(3.8)
$={\int }_{\mathbb{R}}{\left({a}_{1}\eta +\frac{1}{2}{a}_{2}{\eta }^{2}-\frac{1}{2}{a}_{2}\right)}^{2}\frac{1}{\sqrt{2\pi }}{e}^{-\frac{1}{2}{\eta }^{2}}\mathrm{d}\eta +Q$
(3.9)
$={a}_{1}^{2}+\frac{1}{2}{a}_{2}^{2}+Q.$

The variance is now in terms of the first and second derivatives of the model. The primary cost of evaluating (3.6) and (3.9) comes from computing a1 and a2 via (3.3) which requires three evaluations of the model (2.3): $f\left({x}_{t-1}^{a}\right),f\left({x}_{t-1}^{a}-d\sqrt{P}\right),$ and $f\left({x}_{t-1}^{a}+d\sqrt{P}\right).$
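The three-point quadrature above can be sketched in code. The following is a minimal illustration of (3.3), (3.6) and (3.9) under the stated Gaussian assumption; the function name and defaults are ours, chosen for illustration.

```python
import numpy as np

def agr_predict_scalar(f, x_a, P_a, Q=0.0, d=1.0):
    """Scalar Gaussian-quadrature prediction step:
    three model evaluations give the predicted mean and variance."""
    s = np.sqrt(P_a)                       # square root of the analysis variance
    f0, fp, fm = f(x_a), f(x_a + d * s), f(x_a - d * s)
    a1 = (fp - fm) / (2.0 * d)             # first derivative in the sqrt(P) direction
    a2 = (fp - 2.0 * f0 + fm) / d**2       # second derivative in the sqrt(P) direction
    x_b = f0 + 0.5 * a2                    # predicted mean, Eq. (3.6)
    P_b = a1**2 + 0.5 * a2**2 + Q          # predicted variance, Eq. (3.9)
    return x_b, P_b
```

For f(x) = x² with x_a = 0 and P_a = 1, this returns the exact moments of a squared standard normal (mean 1, variance 2), since the quadrature is exact for quadratic models.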

One of the reasons this method is effective is that the quadrature error of the mean estimation in (3.6) is based on the fourth derivative of the model f even though we are using a second-order polynomial approximation, see (B.3) in Appendix B. This is due to the fact that odd terms drop out in Gaussian polynomial integration. Meanwhile, the quadrature error in the estimation of the covariance, see (B.6), is related to the size of the third derivative of f.

3.2.

### Non-Gaussian pdf integration

For comparison, we now consider the case of (2.7) and (2.10) without making a Gaussian assumption. To simplify notation, we will denote the prior pdf $p\left({x}_{t-1}|{x}_{t-1}^{a},{P}_{t-1}^{a}\right)$ by $p\left({x}_{t-1}\right).$ Assuming that at time t we have ${x}_{t-1}$ sampled from $p\left({x}_{t-1}\right),$ we may determine the expected error in the mean and variance at time t by propagating the samples forward and computing their error (see Section 3.3). As in the previous case, where $p\left({x}_{t-1}\right)$ is Gaussian, we will relate the error to the moments of $p\left({x}_{t-1}\right).$ This is most conveniently done through a Taylor-series expansion of (2.3). To this end, note that

(3.10)
${x}_{t}=f\left({\mu }_{t-1}\right)+\frac{\mathrm{d}f}{\mathrm{d}{x}_{t-1}}\left({x}_{t-1}-{\mu }_{t-1}\right)+\frac{1}{2}\frac{{\mathrm{d}}^{2}f}{\mathrm{d}{x}_{t-1}^{2}}{\left({x}_{t-1}-{\mu }_{t-1}\right)}^{2}+\cdots$
where ${\mu }_{t-1}$ is the true mean of $p\left({x}_{t-1}\right).$ Applying (3.10) to the expectation of xt gives
(3.11)
${\mu }_{t}=E\left[{x}_{t}\right]$
(3.12)
$={\int }_{\mathbb{R}}f\left({x}_{t-1}\right)p\left({x}_{t-1}\right)\mathrm{d}{x}_{t-1}$
(3.13)
$=f\left({\mu }_{t-1}\right)+\frac{1}{2}\frac{{\mathrm{d}}^{2}f}{\mathrm{d}{x}_{t-1}^{2}}{\sigma }_{t-1}^{2}+\cdots$
where ${\sigma }_{t-1}^{2}$ is the variance derived from $p\left({x}_{t-1}\right).$ Similarly, subtracting (3.13) from (3.10), squaring the result, and applying the expectation one obtains
(3.14)
${\sigma }_{t}^{2}=\frac{\mathrm{d}f}{\mathrm{d}{x}_{t-1}}{\sigma }_{t-1}^{2}\frac{\mathrm{d}f}{\mathrm{d}{x}_{t-1}}+\frac{1}{2}\frac{\mathrm{d}f}{\mathrm{d}{x}_{t-1}}{T}_{t-1}\frac{{\mathrm{d}}^{2}f}{\mathrm{d}{x}_{t-1}^{2}}+\frac{1}{4}\frac{{\mathrm{d}}^{2}f}{\mathrm{d}{x}_{t-1}^{2}}\left({F}_{t-1}-{\sigma }_{t-1}^{4}\right)\frac{{\mathrm{d}}^{2}f}{\mathrm{d}{x}_{t-1}^{2}}+\cdots$
where ${T}_{t-1}$ and ${F}_{t-1}$ are the third and fourth moments of $p\left({x}_{t-1}\right),$ respectively. Without the simplifying assumptions used in the Gaussian pdf case, we arrive at these infinite sums for the mean and covariance.

3.3.

### EnKF framework

To evaluate integrals of the form (2.7) and (2.10), or the expressions in (3.13) and (3.14), in an EnKF framework, statistical sampling is used. The sample mean and variance at time t are

(3.15)
${\overline{x}}_{t}=\frac{1}{k}\sum _{i=1}^{k}{x}_{t}^{\left(i\right)}$
(3.16)
${s}_{t}^{2}=\frac{1}{k-1}\sum _{i=1}^{k}{\left({x}_{t}^{\left(i\right)}-{\overline{x}}_{t}\right)}^{2}$
where k is the number of samples. The error in these estimates is well known from central limit theorem-type arguments (for example, see Hodyss et al., 2016). The error may be quantified by calculating the squared deviation about the true mean and variance:
(3.17)
$E\left({\left({\overline{x}}_{t}-{\mu }_{t}\right)}^{2}\right)=\frac{{\sigma }_{t}^{2}}{k}$
(3.18)
$E\left({\left({s}_{t}^{2}-{\sigma }_{t}^{2}\right)}^{2}\right)=\frac{1}{k}\left({F}_{t}-\frac{k-3}{k-1}{\sigma }_{t}^{4}\right).$
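A quick Monte Carlo check of the sample-mean error in (3.17) can be sketched as follows; this is a hypothetical numerical experiment of ours, not one from the manuscript. Averaging the squared deviation of the sample mean over many independent trials recovers the predicted value $\sigma_t^2/k$.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, k, trials = 4.0, 25, 20_000

# Draw `trials` independent ensembles of size k from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, k))

# Squared deviation of each sample mean about the true mean (zero);
# its average over trials should be close to sigma2 / k = 0.16, Eq. (3.17).
sq_dev = (samples.mean(axis=1) - 0.0) ** 2
print(sq_dev.mean())   # close to sigma2 / k = 0.16
```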

The AGR filter update Equations (3.6) and (3.9) are only approximating the first few terms in (3.13) and (3.14) assuming the pdf $p\left({x}_{t-1}\right)$ is Gaussian. In contrast, the sample mean (3.15) and sample covariance (3.16) are attempting to approximate the full sums in (3.13) and (3.14) without knowledge of $p\left({x}_{t-1}\right)$ which is a more difficult task.

3.4.

### Scalar example

In this example, we explore the differences in the predicted mean and covariance estimates used by the AGR filter and EnKF filters. In the scalar case, the AGR filter is full rank allowing for comparison between the error caused by the low-order polynomial approximation (3.2) versus the sampling error in an EnKF estimate. Consider the scalar model given by

(3.19)
$f\left(x\right)={c}_{1}x+{c}_{2}{x}^{2}+{c}_{3}{x}^{3}+{c}_{4}{x}^{4}$
with $p\left({x}_{0}\right)$ Gaussian and ${\mu }_{0}=0.$ This implies from (3.13) and (3.14) that the true mean and variance are given by
(3.20)
${\mu }_{1}=〈{x}_{1}〉={c}_{2}{\sigma }_{0}^{2}+\cdots$
(3.21)
${\sigma }_{1}^{2}={c}_{1}{\sigma }_{0}^{2}{c}_{1}+2{c}_{2}{\sigma }_{0}^{4}{c}_{2}+\cdots$

In this example, and in the following examples, we do not consider model error. For the EnKF case, where we approximate (3.20) and (3.21), the mean and covariance depend on c1 and c2. We set the variance P = 1, fix ${c}_{1}={c}_{3}=0,$ and let $0\le {c}_{2}\le 0.6$ and $0\le {c}_{4}\le 0.05.$ We define the true solution to this problem to be given by (3.15) with k = 50,000. In this case, we form the ensembles by random draws from a distribution with variance P. We propagate the mean estimate for the AGR filter and the ensemble for the EnKF using (3.19) and compute the error in the predicted means and covariances. The error maps of the mean estimates of (3.6) and (3.15) for the different values of c2, c4 and ensemble sizes k = 5, 10, 100 for the EnKF are shown in Fig. 1. Note that for this example the AGR filter requires only 3 model evaluations, as described in Section 3.1, whereas the EnKF requires as many model evaluations as the ensemble size.

Fig. 1.

The L2 error in the estimated (a) EnKF mean (3.15) for k = 5, (b) EnKF mean for k = 10, (c) EnKF mean for k = 100, and (d) AGR filter mean (3.6) for increasing values of c2 (horizontal axis) and c4 (vertical axis). Note that the color scales are different between the first two plots (a) & (b) and the second two (c) & (d).

In Fig. 1a and b, for a number of model evaluations similar to that of the AGR filter, the sampling error in the EnKF estimates is quite large. Note that the color bars in (a) and (b) are the same and are of a different order than those used in (c) and (d). In panels (c) and (d), the error in the EnKF estimate with k = 100 and in the AGR filter is comparable. The AGR filter quadrature error is invariant with respect to changes in c2, whereas the EnKF estimation error depends on both c2 and c4, as expected given (3.20). If c4, on which the fourth derivative depends, is sufficiently small, we expect better performance from the AGR filter estimated mean (3.6) regardless of the size of c2.

For the prior covariance estimates in Fig. 2, we see in (a) that the error in the EnKF covariance estimate with k = 100 grows with increasing c2 and c4. By comparison, the error in the AGR filter covariance in (b) is small when c4 is small and grows as the fourth-order derivative grows, as expected since the error depends on ${c}_{4}^{2}.$ The AGR filter covariance estimate is equal to or better than the EnKF estimate for small c4. For larger c4, the EnKF covariance estimate performs better. Note that this example does not include a c3 term, on which the errors of the AGR filter and the EnKF also depend.

Fig. 2.

The L2 error in (a) EnKF prior covariance estimate for k = 100 and (b) the error in the AGR filter prior covariance estimate for increasing values of c2 (horizontal axis) and c4 (vertical axis).

This example demonstrates the types of scenarios in which one might choose one type of filter over another. For small ensemble sizes, the AGR filter may be the preferable choice, as well as in the case where the model is moderately nonlinear, i.e. has small-magnitude higher order terms. For a large ensemble with large model fourth derivatives, the EnKF may provide a better estimate of the predicted mean.
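The scalar comparison above can be sketched numerically. The following is a hypothetical miniature of the experiment (a single coefficient pair and a single random draw), not the full error maps of Figs. 1 and 2; the coefficient values are ours, chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x, c2=0.3, c4=0.02):
    # Quartic model (3.19) with c1 = c3 = 0; coefficients chosen for illustration
    return c2 * x**2 + c4 * x**4

# "Truth": large-sample Monte Carlo estimate of the propagated mean, Eq. (3.15);
# for a standard normal input this is close to c2 + 3*c4 = 0.36
truth = f(rng.normal(0.0, 1.0, 50_000)).mean()

# Small-ensemble estimate of the same mean (k = 5 members)
enkf = f(rng.normal(0.0, 1.0, 5)).mean()

# AGR estimate, Eq. (3.6): f(mu) + a2/2, using three model evaluations
d = 1.0
a2 = (f(d) - 2.0 * f(0.0) + f(-d)) / d**2
agr = f(0.0) + 0.5 * a2     # = 0.32 for these coefficients; error comes from c4 only
```

The AGR error here is controlled entirely by the quartic term, while the small-ensemble estimate carries sampling error that varies from draw to draw.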

3.5.

### Gaussian pdf integration: multi-dimensional case

We will now extend the results in Section 3.1 to higher dimensions. To evaluate integrals of the form (3.1), we begin by first applying the coordinate transform ${x}_{t-1}={S}^{T}\eta +{x}_{t-1}^{a},$ where S is the square root of the covariance ${P}_{t-1}^{a}$ such that ${P}_{t-1}^{a}={S}^{T}S.$ Using this change of coordinates, we can convert (3.1) to the standard form with N(0, I), where I is the identity matrix. Then

(3.22)
$\mathcal{I}={\int }_{{\mathbb{R}}^{n}}\stackrel{˜}{F}\left(\eta \right)\frac{1}{{\left(2\pi \right)}^{n/2}}{e}^{-\frac{1}{2}|\eta {|}^{2}}\mathrm{d}\eta$
where
(3.23)
$\stackrel{˜}{F}\left(\eta \right)=F\left({S}^{T}\eta +{x}_{t-1}^{a}\right).$

Using (3.22) we can develop formulas to evaluate (2.18) and (2.19) explicitly based on polynomial quadrature. In Ito and Xiong (2000), $\stackrel{˜}{F}\left(\eta \right)$ is approximated by the function $\gamma \left(\eta \right)$ such that $\stackrel{˜}{F}\left({z}_{i}\right)=\gamma \left({z}_{i}\right)$ for points $\left\{{z}_{i}\right\}$ in ${\mathbb{R}}^{n}.$ The multivariate polynomial $\gamma \left(\eta \right)$ is given by

(3.24)
$\gamma \left(\eta \right)=\stackrel{˜}{F}\left(0\right)+\sum _{i=1}^{n}{a}_{i}{\eta }_{i}+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}{\eta }_{i}^{2}$
where ${a}_{i}\in {\mathbb{R}}^{n}$ is the ith column of a, the first-order variation or Jacobian, ${\eta }_{i}$ is the ith component of $\eta ,$ and bi is the ith column of the approximation of the second-order variation, or Hessian. The coefficients a and b may be determined using centered differencing, similar to the scalar case, via
(3.25)
${a}_{i}=\frac{\stackrel{˜}{F}\left(d{e}_{i}\right)-\stackrel{˜}{F}\left(-d{e}_{i}\right)}{2d},\quad 1\le i\le n$
where $\left\{{e}_{i}\right\}\subset {\mathbb{R}}^{n}$ are the standard unit vectors and d > 0. We approximate bi via
(3.26)
${b}_{i}=\frac{\stackrel{˜}{F}\left(d{e}_{i}\right)-2\stackrel{˜}{F}\left(0\right)+\stackrel{˜}{F}\left(-d{e}_{i}\right)}{{d}^{2}},\quad 1\le i\le n.$

Evaluating a and b requires $2n+1$ model evaluations. Note that we do not use cross derivative terms in the Hessian which would require an additional $\frac{1}{2}n\left(n-1\right)$ model evaluations to compute.

Using the polynomial in (3.24), we can write the integral (3.22) as

(3.27)
$\mathcal{I}={\int }_{{\mathbb{R}}^{n}}\gamma \left(\eta \right)\frac{1}{{\left(2\pi \right)}^{n/2}}{e}^{-1/2|\eta {|}^{2}}\mathrm{d}\eta$
and create explicit formulas for the mean and covariance:
(3.28)
${x}_{t}^{b}={\int }_{{\mathbb{R}}^{n}}\gamma \left(\eta \right)\frac{1}{{\left(2\pi \right)}^{n/2}}{e}^{-1/2|\eta {|}^{2}}\mathrm{d}\eta$
(3.29)
$=\frac{1}{{\left(2\pi \right)}^{n/2}}{\int }_{{\mathbb{R}}^{n}}\left(\stackrel{˜}{F}\left(0\right)+\sum _{i=1}^{n}{a}_{i}{\eta }_{i}+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}{\eta }_{i}^{2}\right){e}^{-1/2|\eta {|}^{2}}\mathrm{d}\eta$
(3.30)
$=f\left({x}_{t-1}^{a}\right)+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}$
and
(3.31)
${P}_{t}^{b}=Q+{\int }_{{\mathbb{R}}^{n}}\left(\gamma \left(\eta \right)-{x}_{t}^{b}\right){\left(\gamma \left(\eta \right)-{x}_{t}^{b}\right)}^{\mathrm{T}}\frac{1}{{\left(2\pi \right)}^{n/2}}{e}^{-1/2|\eta {|}^{2}}\mathrm{d}\eta$
(3.32)
$=Q+\frac{1}{{\left(2\pi \right)}^{n/2}}{\int }_{{\mathbb{R}}^{n}}\left(\sum _{i=1}^{n}{a}_{i}{\eta }_{i}+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}{\eta }_{i}^{2}-\frac{1}{2}\sum _{i=1}^{n}{b}_{i}\right)$
(3.33)
$·{\left(\sum _{i=1}^{n}{a}_{i}{\eta }_{i}+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}{\eta }_{i}^{2}-\frac{1}{2}\sum _{i=1}^{n}{b}_{i}\right)}^{\mathrm{T}}{e}^{-1/2|\eta {|}^{2}}\mathrm{d}\eta$
(3.34)
$=Q+\sum _{i=1}^{n}{a}_{i}{a}_{i}^{T}+\frac{1}{2}\sum _{i=1}^{n}{b}_{i}{b}_{i}^{T}.$

To summarize, a change of coordinates is used to transform the Gaussian integrals into standard form. We then approximate $\stackrel{˜}{F}\left(\eta \right)$ by a quadratic polynomial. Using this approximation, we create self-contained formulas for the predicted mean and covariance.

Similar to the scalar case, odd polynomial terms drop out in the polynomial quadrature. This results in the quadrature error in estimating the mean (3.30) on the order of the fourth derivative of the nonlinear model (see (B.8) in the appendix) even though our polynomial approximation (3.24) is only second order. We do not see as much benefit in the computation of the covariance as the error given by (B.11) is related to the cross terms in the Hessian approximation that were dropped in (3.24). Overall, the contribution to the filter error from the low-order polynomial quadrature is minimized for moderately nonlinear systems.
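The multidimensional prediction step can be sketched as follows. This is an illustrative implementation of (3.25), (3.26), (3.30) and (3.34) under the convention ${P}_{t-1}^{a}={S}^{T}S$ of Section 3.5, with no cross-derivative Hessian terms; the function name and interface are ours.

```python
import numpy as np

def agr_predict(f, x_a, S, Q, d=1.0):
    """Multidimensional Gaussian-quadrature prediction step.

    f maps R^n -> R^n; S satisfies P_a = S^T S, so row i of S is the
    i-th perturbation direction. Costs 2n + 1 model evaluations."""
    n = len(x_a)
    f0 = f(x_a)                                # model at the analysis mean
    A = np.empty((n, n))                       # columns a_i (first-order variations)
    B = np.empty((n, n))                       # columns b_i (diagonal Hessian terms)
    for i in range(n):
        delta = d * S.T[:, i]                  # step along the i-th sqrt(P) direction
        fp, fm = f(x_a + delta), f(x_a - delta)
        A[:, i] = (fp - fm) / (2.0 * d)        # Eq. (3.25)
        B[:, i] = (fp - 2.0 * f0 + fm) / d**2  # Eq. (3.26)
    x_b = f0 + 0.5 * B.sum(axis=1)             # predicted mean, Eq. (3.30)
    P_b = Q + A @ A.T + 0.5 * B @ B.T          # predicted covariance, Eq. (3.34)
    return x_b, P_b
```

For a linear model f(x) = Mx the second-order terms vanish and the update reduces to the exact Kalman prediction $M{P}_{t-1}^{a}{M}^{T}+Q.$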

3.5.1

#### Multidimensional example

For this example, we will again look at the effects of nonlinearity versus sampling in the AGR filter and the EnKF. We consider a variable coefficient Korteweg-de Vries (KdV) model that governs the evolution of Rossby waves in a jet flow (Hodyss and Nathan, 2002). This may be written as

(3.35)
${A}_{t}-{A}_{\mathit{xxx}}+{m}_{p}\left(x\right){A}_{x}+{m}_{g}\left(x\right)A-A{A}_{x}=0$
where
$\begin{array}{ccc}{m}_{p}\left(x\right)& =& 1-\mathrm{exp}\left(-a{x}^{2}\right)\\ {m}_{g}\left(x\right)& =& -2ax\,\mathrm{exp}\left(-a{x}^{2}\right)\end{array}$
and a = 0.0005. The derivatives vanish on the boundary, the initial condition is given by a solitary Rossby wave, and we use 512 model computational nodes. A contour plot of the true solution over time is shown in Fig. 3.

Fig. 3.

Contour plot of the wave amplitude over the domain (vertical axis) of the KdV equation over time (horizontal axis).

We begin by creating a 35,000 member ensemble that will be used as the true solution in our experiments. This ensemble was created by drawing the members from climatology and then using an EnKF to perform three assimilation cycles with observations created from an ensemble member. This was done to improve the quality of the ensemble. The eigenvalues of the resulting covariance ${P}_{0}^{b}$ of this ensemble are plotted in Fig. 4.

Fig. 4.

The 512 sorted eigenvalues of the initial background error covariance ${P}_{0}^{b}$ created from a 35,000 member climatological ensemble. The horizontal axis is the eigenvalue number and the vertical axis is the magnitude of each eigenvalue.

The eigenvalues of ${P}_{0}^{b}$ and their corresponding eigenvectors will be used to form $S=\sqrt{P}$ as needed by the AGR filter. Additionally, members for smaller ensemble sizes will be drawn randomly from the 35,000-member ensemble. Since ${P}_{0}^{b}$ has near-zero eigenvalues, we will consider only the first 250 eigen-directions; thus

${P}_{0}^{b}\approx {U}_{m}{\Sigma }_{m}{U}_{m}^{\mathrm{T}}$
where Σm is a truncated diagonal matrix with the first 250 eigenvalues of ${P}_{0}^{b}$ along the diagonal and Um is composed of the corresponding eigenvectors. The square root of ${P}_{0}^{b}$ is then given by ${S}_{m}={U}_{m}\sqrt{{\Sigma }_{m}},$ which is used in the coordinate transform (3.23). In this example, 501 model evaluations are used to compute (3.30) and (3.34); for fairness, the solution will therefore be compared to an ensemble with 500 members. Similar to the one-dimensional example, the error in the prior mean estimates of the AGR filter and the EnKF is examined as nonlinearity increases. The nonlinearity is increased by integrating the model forward for a longer time t0.
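The truncated square root ${S}_{m}={U}_{m}\sqrt{{\Sigma }_{m}}$ can be sketched as follows; a minimal illustration assuming a symmetric positive semi-definite covariance, with the function name ours. It returns the scaled leading eigenvectors so that $P\approx {S}_{m}{S}_{m}^{\mathrm{T}}.$

```python
import numpy as np

def truncated_sqrt(P, m):
    """Rank-m square root S_m = U_m sqrt(Sigma_m) of a covariance matrix,
    keeping only the m leading eigen-directions of P."""
    vals, vecs = np.linalg.eigh(P)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:m]          # indices of the m largest eigenvalues
    # Scale each retained eigenvector by the square root of its eigenvalue;
    # clip tiny negative eigenvalues arising from round-off.
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))
```

With such an S_m, the prediction step needs only 2m + 1 model evaluations instead of 2n + 1.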

One way to observe the impact of the increased nonlinearity is to look at the influence of b in (3.24). For comparison, we refer to the filter without the second-order correction term, which uses first-order polynomial quadrature, as AGR1, and to the filter with b, which uses second-order polynomial quadrature, as AGR2.

Both filters are initialized using the mean of the 35,000 member ensemble, which we consider to be the true mean. The perturbations for the 500 member ensemble are drawn from the 35,000 member ensemble and then re-centered on the true mean. The S for the AGR filters is described above. All methods are integrated forward to t0 and the prior means and covariances are computed. Fig. 5a compares the L2 error in the EnKF prior mean solution with K = 500 and the AGR filter solutions with m = 250. The AGR2 filter significantly outperforms the AGR1 filter, demonstrating the importance of the second-order correction term. The AGR2 filter outperforms the EnKF until about ${t}_{0}=0.55$ or 5501 model time steps. The AGR2 filter performs well prior to this point having half the error of the EnKF at ${t}_{0}=0.25$ or 2501 model time steps. Fig. 5b compares the covariances of the EnKF and AGR filter using the Frobenius norm given by

$||A||_{\text{FRO}}=\sqrt{\text{trace}\left({A}^{\mathrm{T}}A\right)}.$
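For reference, this norm is available directly in numerical libraries; a small NumPy check (with an arbitrary matrix, not one from the experiments) confirms the definition above agrees with the built-in:

```python
import numpy as np

# Frobenius norm from the trace definition versus NumPy's built-in
A = np.array([[3.0, 0.0], [4.0, 0.0]])
fro = np.sqrt(np.trace(A.T @ A))          # sqrt(trace(A^T A))
assert np.isclose(fro, np.linalg.norm(A, "fro"))
assert np.isclose(fro, 5.0)               # sqrt(9 + 16) = 5
```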

Fig. 5.

(a) The L2 error (vertical axis) in the estimate of the prior mean for the EnKF with K = 500, the AGR1 with m = 250, and AGR2 with m = 250 for time step length t0 (horizontal axis). (b) The error in the Frobenius norm of the corresponding covariance estimates.

The AGR1 and AGR2 filters have about the same error in their covariances and outperform the EnKF until about ${t}_{0}=1.$ For model regimes whose higher-order terms are not overly large, the AGR2 filter may therefore provide better estimation.

For large n, evaluating (3.30) and (3.34) is prohibitively expensive since it requires $2{n}_{e}+1$ model evaluations, where ne is the number of nonzero eigenvalues. To reduce the computational cost, we consider the case where only the leading m eigenvalues are kept. Ideally, m would be chosen so that the singular values capture the essential dynamics; however, in atmospheric applications this may not be possible due to computational constraints. The truncation error in the estimation of the square root Sm of ${P}_{t-1}^{a}$ is given by

((3.36))
$|S-{S}_{m}|\le \sum _{i>m}\sqrt{{\sigma }_{i}}.$

If ${P}_{t-1}^{a}$ has $n-m$ eigenvalues approaching zero, this approximation is very accurate. In other words, the extent of the correlations in ${P}_{t-1}^{a}$ determines the accuracy of this truncation. The error in evaluating (3.30) and (3.34) now comes from both quadrature and this truncation.
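The bound (3.36) is easy to check numerically; the sketch below (reading $|\cdot|$ as the spectral norm and using a random covariance, not one from the experiments) compares the truncation error to the sum of the discarded $\sqrt{{\sigma }_{i}}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 40, 10
A = rng.standard_normal((n, n))
P = A @ A.T                               # SPD covariance stand-in

vals, vecs = np.linalg.eigh(P)
order = np.argsort(vals)[::-1]            # descending eigenvalues
vals, vecs = vals[order], vecs[:, order]

S = vecs * np.sqrt(vals)                  # full square root, P = S S^T
S_m = np.zeros_like(S)
S_m[:, :m] = S[:, :m]                     # keep only the leading m directions

err = np.linalg.norm(S - S_m, 2)          # spectral norm of the truncation
bound = np.sum(np.sqrt(vals[m:]))         # sum of sqrt of discarded eigenvalues
assert err <= bound + 1e-12               # the bound of (3.36) holds
```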

We repeat the previous experiment with K = 40, introducing undersampling in the EnKF estimate, and m = 20 for the AGR estimates. Again we see the importance of the second-order correction term when comparing AGR1 and AGR2 in Fig. 6a. In (a) the AGR2 filter again has half the error of the EnKF at ${t}_{0}=0.25.$ However, due to the presence of sampling error in both of the prior mean estimates, the AGR2 continues to outperform the EnKF until about ${t}_{0}=1.55$ or 15,501 model time steps, after which the EnKF has a slight edge in performance. In (b) both the AGR1 and AGR2 estimates outperform the EnKF covariance estimates for all values of t0 shown.

Fig. 6.

(a) The L2 error (vertical axis) in the estimate of the prior mean for the EnKF with K = 40, the AGR1 with m = 20, and AGR2 with m = 20 for time step length t0 (horizontal axis). (b) The corresponding error in the Frobenius norm for the prior covariance estimates.

In both the cases with undersampling and without undersampling the AGR2 consistently outperformed AGR1 due to the inclusion of the second-order correction term b. Additionally, in both cases, there was a moderately nonlinear regime in which the AGR2 filter outperformed the EnKF. Similar to the scalar case, the AGR2 filter was found to be more sensitive to increased nonlinearity than the EnKF; however, the EnKF proved to be more sensitive to undersampling. This broadened the regime in which the AGR2 filter outperformed the EnKF.

3.5.2

#### A note on ${P}_{0}^{b}$

For this example, Sm was computed from ${P}_{0}^{b}$ for the AGR filters. This ${P}_{0}^{b}$ was created using a 35,000 member climatological ensemble. Using fewer ensemble members to create ${P}_{0}^{b}$ introduces another source of error at the starting time. For example, if ${P}_{0}^{b}$ is constructed with ${k}_{e}=40,80,160,35000$ ensemble members, the accuracy of the prior covariance estimates from the AGR2 filter with m = 20 decreases accordingly, as shown in Fig. 7. For convenience, we have included the error estimate for the EnKF in this plot. Note that the ensemble of the 40 member EnKF is drawn from the 35,000 member climatological ensemble. There are numerous strategies to develop a more accurate and higher rank ${P}_{0}^{b}$ (Clayton et al., 2013; Derber and Bouttier, 1999), which are beyond the scope of this paper.

Fig. 7.

The error in the Frobenius norm (vertical axis) of the prior covariance estimates in the AGR2 filter for m = 20 with ${P}_{0}^{b}$ computed using ${k}_{e}=40,80,160,35000$ ensemble members for time step length t0 (horizontal axis).

4.

## AGR filters

In order to utilize the mean (3.30) and covariance (3.34) updates, we develop an algorithm in the same vein as the ensemble square root filters (Whitaker and Hamill, 2002), i.e. we will update Sm keeping Pb in factored form. To begin with we note that after some algebraic manipulation and dropping Q, we may rewrite (3.34) as

((4.1))
${P}_{t}^{b}=a\left(I+{\left({a}^{\mathrm{T}}{\left(\frac{1}{2}b{b}^{\mathrm{T}}\right)}^{-1}a\right)}^{-1}\right){a}^{\mathrm{T}},$
where $a=\left[{a}_{1},\dots ,{a}_{m}\right]$ and $b=\left[{b}_{1},\dots ,{b}_{m}\right]$ are computed using the centered differencing scheme
((4.2))
${a}_{i}=\frac{f\left({S}^{T}\left(d{e}_{i}\right)+{x}_{t-1}^{a}\right)-f\left({S}^{\mathrm{T}}\left(-d{e}_{i}\right)+{x}_{t-1}^{a}\right)}{2d},1\le i\le m$
and we approximate bi via
((4.3))
${b}_{i}=\frac{f\left({S}^{T}\left(d{e}_{i}\right)+{x}_{t-1}^{a}\right)-2f\left({x}_{t-1}^{a}\right)+f\left({S}^{\mathrm{T}}\left(-d{e}_{i}\right)+{x}_{t-1}^{a}\right)}{{d}^{2}},1\le i\le m.$
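A minimal sketch of this differencing scheme, assuming S is stored as an m × n array whose rows are the scaled directions (so that ${S}^{\mathrm{T}}\left(d{e}_{i}\right)$ is d times row i) and using a toy quadratic model in place of the forecast model f:

```python
import numpy as np

def centered_ab(f, S, x_a, d):
    """Columns a_i, b_i of (4.2) and (4.3).

    S is m x n with rows as the scaled directions, so S^T(d e_i) = d * S[i];
    the scheme costs 2m + 1 evaluations of the nonlinear model f: R^n -> R^n.
    """
    fx = f(x_a)
    a = np.empty((x_a.size, S.shape[0]))
    b = np.empty_like(a)
    for i in range(S.shape[0]):
        fp = f(x_a + d * S[i])                 # f(S^T(d e_i) + x^a)
        fm = f(x_a - d * S[i])                 # f(S^T(-d e_i) + x^a)
        a[:, i] = (fp - fm) / (2.0 * d)        # centered first difference
        b[:, i] = (fp - 2.0 * fx + fm) / d**2  # centered second difference
    return a, b

# toy quadratic model: its first- and second-order behaviour is recovered exactly
f = lambda x: x + 0.5 * x**2
a, b = centered_ab(f, np.eye(3), np.zeros(3), d=1e-3)
assert np.allclose(a, np.eye(3), atol=1e-9)
assert np.allclose(b, np.eye(3), atol=1e-6)
```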

The above equations are the same as (3.25) and (3.26) but with the truncated S = Sm. Letting $\xi =\sqrt{2}{b}^{†}a,$ where ${b}^{†}$ is the Moore-Penrose pseudoinverse, and using (4.1), we have ${P}_{t}^{b}=\stackrel{˜}{a}{\stackrel{˜}{a}}^{\mathrm{T}}$ where

((4.4))
$\stackrel{˜}{a}=a\sqrt{I+{\left({\xi }^{T}\xi \right)}^{-1}}.$

Note that $\xi \in {\mathbb{R}}^{m×m}$ so the expression in (4.4) is not overly expensive to compute. To form the filter, we use the Potter method (Potter, 1963) for the Kalman square root update in reduced order form. This improves numerical robustness by ensuring $P={S}^{\mathrm{T}}S$ remains symmetric and reduces the storage required by the AGR filter, since only the square root S is stored. Proceeding, let

((4.5))
$\beta =H\stackrel{˜}{a}\in {\mathbb{R}}^{p×m}$
then
((4.6))
$Z=R+H{P}^{b}{H}^{\mathrm{T}}=R+\beta {\beta }^{\mathrm{T}},{K}_{t}=\stackrel{˜}{a}{\beta }^{\mathrm{T}}{Z}^{-1}.$

Thus,

${P}_{t}^{a}=\stackrel{˜}{a}\left(I-{\beta }^{T}{Z}^{-1}\beta \right){\stackrel{˜}{a}}^{T}.$

Letting $\eta =I-{\beta }^{\mathrm{T}}{Z}^{-1}\beta =VD{V}^{\mathrm{T}},$ we update S by

${S}_{t}=\left(\sqrt{D}+ϵI\right){V}^{\mathrm{T}}{\stackrel{˜}{a}}^{\mathrm{T}}$
where $ϵ>0$ is a tunable parameter. We have chosen to form a regularized S, which helps the conditioning of the matrix and counteracts underdispersion. Other inflation methods, such as multiplicative covariance inflation, may also be used. To summarize, the algorithm for the AGR2 filter is as follows:
1. Given ${S}_{t-1}=\left[{s}_{1},\dots ,{s}_{m}\right]$ compute ${x}_{t}^{b}$ and ai, bi for $1\le i\le m.$
2. Compute $\stackrel{˜}{a}$ as in (4.4).
3. Let $\beta =H\stackrel{˜}{a}$ then
${x}_{t}^{a}={x}_{t}^{b}+{K}_{t}\left({y}_{t}-H{x}_{t}^{b}\right)$

where

${K}_{t}=\stackrel{˜}{a}{\beta }^{T}{Z}^{-1}$
and
$Z=R+\beta {\beta }^{T}.$

4. Decompose η such that $\eta =VD{V}^{\mathrm{T}}$ where D is diagonal and V is unitary. Then ${S}_{t}=\left(\sqrt{D}+ϵI\right){V}^{\mathrm{T}}{\stackrel{˜}{a}}^{\mathrm{T}}.$
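The analysis steps above can be sketched as follows, assuming a linear observation operator H and the m × n square-root convention $P={S}^{\mathrm{T}}S$; this is an illustrative implementation, not the authors' code:

```python
import numpy as np

def agr2_analysis(x_b, a_tilde, H, R, y, eps=1e-3):
    """Steps 3-4 of the AGR2 update sketched from the text.

    a_tilde is n x m (columns from (4.4)); H is a p x n linear observation
    operator; the returned S_t is m x n so that P^a ~= S_t^T S_t.
    """
    beta = H @ a_tilde                            # (4.5)
    Z = R + beta @ beta.T                         # (4.6)
    K = a_tilde @ beta.T @ np.linalg.inv(Z)       # Kalman gain
    x_a = x_b + K @ (y - H @ x_b)                 # mean update
    eta = np.eye(beta.shape[1]) - beta.T @ np.linalg.solve(Z, beta)
    D, V = np.linalg.eigh(eta)                    # eta = V diag(D) V^T
    D = np.clip(D, 0.0, None)                     # guard round-off
    S_t = (np.diag(np.sqrt(D)) + eps * np.eye(D.size)) @ V.T @ a_tilde.T
    return x_a, S_t

# consistency check: with eps = 0, S_t^T S_t recovers a_tilde @ eta @ a_tilde^T
rng = np.random.default_rng(2)
n, m, p = 6, 3, 4
a_tilde = rng.standard_normal((n, m))
H = rng.standard_normal((p, n))
R = np.eye(p)
x_a, S_t = agr2_analysis(np.zeros(n), a_tilde, H, R,
                         rng.standard_normal(p), eps=0.0)
beta = H @ a_tilde
eta = np.eye(m) - beta.T @ np.linalg.solve(R + beta @ beta.T, beta)
assert np.allclose(S_t.T @ S_t, a_tilde @ eta @ a_tilde.T)
```

Because $\eta$ is symmetric positive definite by construction, the eigendecomposition and square root are well defined; the regularization ε simply shifts the singular values of the updated square root.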

The algorithm itself is readily implemented and requires minimal tuning of the parameter d from Equations (4.2) and (4.3). For quasi-linear systems, the second-order correction term b may be dropped, giving the AGR1 filter. In this case, we may further reduce the computational cost by using one-sided (forward) differencing instead of centered differencing. Then to evaluate (3.30) and (3.34), we use the forward differencing scheme to approximate the ${a}_{i},i=1,\dots ,m,$ i.e.,

((4.7))
${a}_{i}=\frac{f\left({S}^{T}\left(d{e}_{i}\right)+{x}_{t-1}^{a}\right)-f\left({x}_{t-1}^{a}\right)}{d},1\le i\le m$
after the coordinate change, where d > 0 is the step size. The benefit of computing ai in this manner is that it requires only m + 1 model evaluations. Note that the expression (4.7) amounts to a directional derivative determined by the truncated S. In using S, the derivative is computed in the directions of the largest change in the dynamics, while the parameter d restricts the search to a constrained set. This is a generalization of the standard derivative; in fact, it is a numerical approximation of the Jacobian under a coordinate transformation. In this way, the AGR1 may be viewed as a form of extended Kalman filter.
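A sketch of the one-sided scheme (4.7), again with a toy model standing in for f, showing that at the linearization point the columns approximate the Jacobian applied to the directions stored in S:

```python
import numpy as np

def forward_a(f, S, x_a, d):
    """One-sided differences (4.7): m + 1 model evaluations instead of 2m + 1."""
    fx = f(x_a)                                   # shared base evaluation
    return np.stack([(f(x_a + d * S[i]) - fx) / d
                     for i in range(S.shape[0])], axis=1)

f = lambda x: np.sin(x)                           # toy nonlinear model
S = 0.5 * np.eye(2)                               # m = n = 2 scaled directions
a = forward_a(f, S, np.zeros(2), d=1e-6)
# at x = 0 the Jacobian of sin is the identity, so a ~ J S^T = 0.5 I
assert np.allclose(a, 0.5 * np.eye(2), atol=1e-4)
```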

5.

## Data assimilation

In this section, we present data assimilation comparisons between the AGR2 filter, described in the previous section, and the ensemble square root filter (Tippett et al., 2003) as the example EnKF method. We use this particular filter because its correction step is most similar to that of the AGR filter while still using an ensemble estimate of the mean and covariance.

5.1.

### 1D Example

We return to the KdV model given by (3.35). As before, we use k = 40 ensemble members drawn from the 35,000 member ensemble for the EnKF, and the AGR2 filter with m = 20. This time the initial ${x}_{0}^{a}$ for the AGR2 filter is the mean of the k = 40 EnKF ensemble. Both the EnKF and the AGR2 filter use the same 32 observations at each assimilation time. We use localization and multiplicative inflation, where the correlation length scale used in the localization and the inflation factor were tuned so that the ensemble variance corresponds to the true error variance. We again consider different values of t0, the time the model is integrated forward before assimilation, to see how increasing the nonlinearity affects these two filtering algorithms. To reduce the influence of the initial conditions, we only consider assimilation cycles 200–450.

Figure 8 plots the error averaged across the data assimilation window for various t0. For smaller t0 the model integration is less nonlinear, and the AGR2 filter has about 30% less error than the EnKF. As t0 grows, the model integration becomes more nonlinear and the error in the AGR2 solution grows more rapidly than that of the EnKF; by ${t}_{0}=2.05$, or 20,501 model time steps, the AGR2 error is only about 24% less than the EnKF error.

Fig. 8.

The L2 error (vertical axis) averaged over the assimilation window for increasing t0 (horizontal axis), the time between cycles, for the EnKF and the AGR2 filter.

This cycling experiment demonstrates that the improvement in the predicted mean and covariance estimates seen in Fig. 6 leads to an improvement in the data assimilation state estimate, or analysis. It also demonstrates that increasing the nonlinearity has more of an impact on the quality of the AGR2 solution than on the EnKF solution.

5.2.

### 2D Example

We will now investigate the performance of the proposed AGR filter using a two-dimensional Boussinesq model that develops Kelvin-Helmholtz waves; specifically, we use the model developed in Hodyss et al. (2013). The governing equations are given by

((5.1))
$\begin{array}{ccc}\frac{\partial \zeta }{\partial t}& =& -\left(u\frac{\partial \zeta }{\partial x}+w\frac{\partial \zeta }{\partial z}+\frac{g}{{\theta }_{0}}\frac{\partial \theta }{\partial x}\right)+F,\\ \frac{\partial \theta }{\partial t}& =& -\left(u\frac{\partial \theta }{\partial x}+w\frac{\partial \theta }{\partial z}+w\frac{\partial {\theta }_{0}}{\partial z}\right)+H,\end{array}$
where
$u=\frac{\partial \psi }{\partial z},\quad w=-\frac{\partial \psi }{\partial x},\quad \text{and}\quad \zeta ={\nabla }^{2}\psi ,$

${\nabla }^{2}$ is the Laplacian operator, u and w are the zonal and vertical winds, respectively, θ is the potential temperature, and ζ is the vorticity. The vorticity source F and the heat source H both include sub-grid scale parameterizations; more details may be found in Hodyss et al. (2013). The buoyancy frequency of the reference state is set by the background potential temperature ${\theta }_{0}$: ${N}_{0}^{2}=\frac{g}{{\theta }_{0}}\frac{d{\theta }_{0}}{dz}={10}^{-4}\,{\mathrm{s}}^{-2}.$

${U}_{0}=\frac{V}{2}\left[1+\mathrm{tanh}\left(\mu \frac{z-{z}_{0}}{L}\right)\right]$
is the reference state for the zonal wind with $V=10\,\mathrm{m}\,{\mathrm{s}}^{-1},$ μ = 8, L = 1 km, and ${z}_{0}=0.5$ km. The boundary conditions in z are mirrored, forcing the vertical velocity to vanish. Additionally, there are sponge boundaries along the left and right sides of the channel. At time t = 0, the flow is perturbed, leading to waves that amplify as they travel and then break. For this experiment, the model was run with 128 computational nodes in the x direction and 33 nodes (unmirrored) in the z direction, so that the state vector has 8448 elements. The true solution at the end of the assimilation window may be seen in Fig. 9 for (a) the vorticity and (b) the temperature. As the waves move across the atmospheric slice, they grow and eventually shear.

Fig. 9.

The true solution of (5.1) at time t = 15000, or 200 cycles of 75 seconds, for (a) vorticity and (b) temperature. The vertical axis is the height and the horizontal axis is the distance.

During the assimilation window, the model is advanced, then the filtering is performed with 112 temperature and 112 wind observations. The observations are created by perturbing the truth via

((5.2))
${y}_{t}={y}_{t}^{\text{true}}+{R}^{1/2}{\xi }_{t}$
where R is the instrument error covariance and ${\xi }_{t}$ is white noise. For this experiment, the observation error variance is ${10}^{-2}$ for both the temperature and wind observations. A 24,000 member ensemble was created by cycling random perturbations through the model. The smaller k = 20 and k = 40 ensembles were drawn from this 24,000 member ensemble, which was also used to create ${P}_{0}^{b}$ for the AGR filter. We initialize the ${x}_{0}^{a}$ used in the AGR filter with the mean of the EnKF's k = 20 ensemble. Both the EnKF with k = 20, 40 and the AGR2 filter use the same observations for a particular t0. Again, both types of filters use localization and inflation tuned so that the ensemble variance matches the true error variance. We compute the error averaged over assimilation cycles 100–400 to reduce the influence of the initial conditions.
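The observation generation (5.2) reduces to adding scaled white noise when R is diagonal; a minimal sketch (the helper name below is illustrative, not from the paper):

```python
import numpy as np

def perturb_obs(y_true, R_diag, rng):
    """Draw y = y_true + R^{1/2} xi with diagonal R and white noise xi (5.2)."""
    return y_true + np.sqrt(R_diag) * rng.standard_normal(y_true.shape)

rng = np.random.default_rng(3)
y_true = np.zeros(224)                 # 112 temperature + 112 wind values
y = perturb_obs(y_true, 1e-2, rng)     # variance 1e-2, as in the experiment
assert abs(y.std() - 0.1) < 0.02       # sample std near sqrt(1e-2) = 0.1
```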

The error in the mean estimation plots in Fig. 10a and b demonstrates results similar to the one-dimensional KdV example. For the more linear case ${t}_{0}=75,$ the AGR2 filter significantly outperforms the EnKF. As t0 is increased, the nonlinearity increases and the AGR2 filter's advantage over the EnKF shrinks, with the two becoming comparable around ${t}_{0}=300.$ As before, the increased nonlinearity has a greater impact on the performance of the AGR2 filter than on the EnKF.

Fig. 10.

The averaged L2 error (vertical axis) across the data assimilation window in the mean estimates for particular t0 (horizontal axis) for (a) vorticity and (b) temperature.

We have presented two example problems comparing the AGR filter and the EnKF. The first example was a one-dimensional KdV model in which the AGR filter outperformed the EnKF but was more influenced by nonlinearity. In the second example, a two-dimensional Boussinesq model was considered. In this case, starting with ${t}_{0}=75,$ the AGR filter outperformed the EnKF. When ${t}_{0}=300,$ the error in the mean estimation has more than doubled and the performance of the AGR filter and the EnKF is comparable. Again we see that the AGR filter is more affected by the nonlinearity in the model than the EnKF.

6.

## Final remarks

We have presented a quadrature Kalman filter, the AGR filter, for moderately nonlinear systems. The filter uses numerical quadrature to evaluate the Bayesian formulas for optimal filtering under Gaussian assumptions. The AGR filter has the Gaussian noise assumptions and Gaussian joint distribution assumption from Kalman filtering with the added assumption that the prior distribution is Gaussian. This leads to Gaussian integrals which are evaluated using the second-order polynomial quadrature. Due to the properties of Gaussian distributions, using this polynomial achieves the same precision as a third-order polynomial quadrature. This effective higher order quadrature is key to the success of this filter.
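To make the effective third-order precision concrete, a standard quadrature argument applies: the odd central moments of a Gaussian vanish, and any symmetric node set with matching weights reproduces this cancellation exactly. For nodes $\{-d\sigma ,0,d\sigma \}$ with equal weight w on the outer nodes,

```latex
% Odd Gaussian moments vanish, and a symmetric rule reproduces the zero exactly:
\int_{-\infty}^{\infty} x^{3}\,\mathcal{N}\!\left(x;0,\sigma^{2}\right)dx = 0,
\qquad
w\,(-d\sigma)^{3} + w\,(d\sigma)^{3} = 0 ,
```

so a rule constructed to integrate second-order polynomials exactly against a Gaussian is automatically exact for third-order terms as well, with no additional model evaluations.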

In numerical tests, the AGR filter was found to outperform a comparable square-root EnKF in regions of low-to-moderate nonlinearity for a KdV model and a Boussinesq model. We expect these results to extend to more realistic atmospheric models, given that fourth and higher order terms of the model are sufficiently small. For highly nonlinear dynamical systems, the AGR filter is affected more than the square-root EnKF but may still provide performance benefit if the system is severely under-sampled as demonstrated in the scalar example in Section 3.4. It is also possible to use higher order quadrature to reduce the effect of nonlinearity but this would, of course, increase the computational costs of the filter.

While the Gaussian assumption made in this filter may seem restrictive, this assumption is commonly made, or effectively made, in data assimilation. For example, recent results indicate that it may require an ensemble with on the order of one thousand members to capture the non-Gaussian pdfs present in an EnKF for a simplified general circulation model (Miyoshi et al., 2014). This is already significantly more than the O(100) ensemble members typically used in EnKFs for full complexity atmospheric models. Effectively, a Gaussian assumption is being made due to the sample size. The computational efficiency of the AGR filter means that there is greater opportunity to pursue non-Gaussian pdfs via Gaussian mixture models (GMMs). In GMMs, a non-Gaussian distribution is approximated by a weighted sum of Gaussian distributions which, in this case, would lead to an optimally weighted ensemble of AGR filters.