Data assimilation is the process of estimating the system state given previous information and current observational information. The methods typically used for data assimilation in the ocean and atmosphere are variational schemes (Talagrand and Courtier, 1987; Daley, 1991; Courtier et al., 1998) and ensemble Kalman filters (EnKFs) (Evensen, 2003; Bishop et al., 2001). Both methods are built on linear hypotheses and have led to useful results in quasi-nonlinear situations. These methods are optimal for the case where the observations and model error are Gaussian. These methods have been successful in numerical weather prediction (NWP) (Buehner et al., 2010; Kuhl et al., 2013) but are suboptimal for nonlinear model dynamics as the Fokker-Plank equations that govern the evolution of a pdf may only be solved exactly for certain cases. This sub-optimality has led to a proliferation of methods to perform data assimilation each with their own advantages (see, for example, Daley, 1991; Anderson, 2001; Bishop et al., 2001; Sondergaard and Lermusiaux, 2013; Poterjoy, 2016).
While it is well known that atmospheric and oceanic models may have non-Gaussian statistics (Morzfeld and Hodyss, 2019), computational resources limit our ability to fully resolve the data assimilation problem. It was shown in Miyoshi et al., (2014) that ensembles need to have on the order of a thousand members to represent non-Gaussian prior pdfs in an EnKF for a general circulation model, however, typical ensemble sizes are on the order of a hundred (Houtekamer et al., 2014). Additionally, computational constraints lead to data assimilation systems using lower resolutions than the forecasting model and are therefore more linear. In targeting this specific application, algorithmic efficiencies may be found.
Gaussian quadrature filters explicitly assume conditional pdfs are Gaussian in the Bayesian filtering equations. Then powerful numerical integration techniques are used, e.g. Gaussian quadrature and cubature, to evaluate the resulting integral equations. The first of these types of filters appeared in the early 2000s with Ito and Xiong (2000) and Wu et al. (2006) but it was not until the cubature Kalman filter (Arasaratnam and Haykin, 2009) that Gaussain quadrature filters became popular. Since then, they have seen extensive use in radar tracking (Haykin et al., 2011), traffic flow (Liu et al., 2017), power systems (Sharma et al., 2017), etc.; however, they have not enjoyed the same popularity in atmospheric and oceanographic sciences. This is likely due to their expense as quadrature rules require many evaluations of the nonlinear model. The central difference filter (CDF) (Ito and Xiong, 2000) uses low-order polynomial quadrature requiring twice the number of model evaluations as the size of the state space. Higher order quadrature methods require even more model evaluations. The CDF has successfully outperformed Extended Kalman Filter (EKF) (Ito and Xiong, 2000), unscented Kalman filter (UKF) (Ito and Xiong, 2000), and 4 D-Var (King et al., 2016) for low dimensional problems. The nonlinear filter presented here, the Assumed Gaussian Reduced (AGR) filter, is essentially a square root version of the CDF with dynamical sampling.
The AGR filter uses low-order polynomial quadrature that takes advantage of the properties of Gaussian distributions to achieve an effective higher order of accuracy. To further reduce the computational costs of the filter, singular value sampling is used. These two techniques make the AGR filter efficient in terms of nonlinear model evaluations giving it potential for atmospheric and oceanic applications. The algorithm for the AGR filter is similar to that of a square-root EnKF but with a different prediction step. This prediction step will cost more computationally to perform than a typical EnKF prediction step in terms of matrix and vector operations. However, the AGR filter formulation of this prediction step will be more accurate for numerical models with small fourth order derivatives, i.e., moderately nonlinear systems.
This manuscript is organized as follows: Section 2 begins with a brief review of Bayesian filtering followed by details regarding assumptions about the associated pdfs to arrive at a discrete filter in terms of Gaussian integrals. The evaluation of these Gaussian integrals is discussed in Section 3 in terms of low-rank polynomial quadrature for scalar and multi-dimensional problems. Results are presented relating to the performance of this quadrature to help to define the scenarios in which this filter should be used. The algorithm for the full AGR filter is presented in Section 4. Section 5 uses a one-dimensional Korteweg-de Vries model and a two-dimensional Boussinesq model to compare the performance of the AGR filter versus a square root EnKF filter. Final remarks are in Section 6. The appendix contains the formulas used in Sections 2 and 3.
Linking Bayesian filtering to Gaussian quadrature filters
We begin our discussion with a review of Bayesian filtering in order to highlight the differences between common types of nonlinear filters. The aim of Bayesian filtering is to estimate the pdf where xt is the current state at time t and contains the previous observations up to time t. The Bayesian filter is most commonly developed as a recursive filter formed by first applying Bayes’ rule to and then applying the Markovian properties of the observations, i.e. the property that observations depend only on the current state. The filter was first described in Ho and Lee (1964) and is discussed detail in Särkkä (2013) and Chen (2003). This filter is typically divided into two steps: the first step, which we will refer to as the prediction step, computes the prior distribution using preliminary information given by the Chapman–Kolmogorov equation
The second step, which we will refer to as the correction step, computes the posterior distribution
Similarly, the covariance of (2.1) is given by
To form the basis for Gaussian quadrature filters, we will make the additional simplifying assumption
(1) Prediction step:
(2) Correction step:
With this formulation it is easily verified that for a linear f(x) in (2.3), we arrive at the Kalman filter equations exactly. In this regard, the Gaussian quadrature filters can be seen as a nonlinear extension of the Kalman filter. Other nonlinear filters such as the extended Kalman filter or UKF (Julier et al., 2000) may also be formulated using this framework (Särkkä, 2013).
The distinct feature of Gaussian quadrature filters is the evaluation of the Gaussian integrals (2.18) and (2.19) which are multidimensional integrals of the form
Gaussian pdf integration: scalar case
To discuss the evaluation of the Gaussian integrals of the form (3.1), we begin with the scalar case given by
Using the change of variables where is the square root of we arrive at the integral in standard form given by
The odd term in (3.5) zeros out and the mean estimate is now the previous mean propagated forward with a second-order correction term. Similarly, using (3.2) and (3.6), we may compute the prior covariance prediction (2.19) as
The variance is now in terms of the first and second derivatives of the model. The primary cost of evaluating (3.6) and (3.9) comes from computing a1 and a2 via (3.3) which requires three evaluations of the model (2.3): and
One of the reasons this method is effective is that the quadrature error of the mean estimation in (3.6) is based on the fourth derivative of the model f even though we are using a second-order polynomial approximation, see (B.3) in Appendix B. This is due to the fact that odd terms drop out in Gaussian polynomial integration. Meanwhile, the quadrature error in the estimation of the covariance, see (B.6), is related to the size of the third derivative of f.
Non-Gaussian pdf integration
For comparison, we now consider the case of (2.7) and (2.10) without making a Gaussian assumption. To simplify our notation, we will denote the prior pdf by Assume at time t we have sampled from we may then determine the expected error in the mean and variance at time t by propagating samples drawn from forward, and determining their error (see Section 3.3). As in the previous case where is Gaussian, we will relate the error to the moments of This is most conveniently done through a Taylor-series expansion of (2.3). To this end, note that
The AGR filter update Equations (3.6) and (3.9) are only approximating the first few terms in (3.13) and (3.14) assuming the pdf is Gaussian. In contrast, the sample mean (3.15) and sample covariance (3.16) are attempting to approximate the full sums in (3.13) and (3.14) without knowledge of which is a more difficult task.
In this example, we explore the differences in the predicted mean and covariance estimates used by the AGR filter and EnKF filters. In the scalar case, the AGR filter is full rank allowing for comparison between the error caused by the low-order polynomial approximation (3.2) versus the sampling error in an EnKF estimate. Consider the scalar model given by
In this example, and the following examples, we are not considering model error. For the EnKF case, where we approximate (3.20) and (3.21), the mean and covariance depend on c1 and c2. We set the variance P = 1 and and let and We define the true solution to this problem to be given by (3.15) with k = 50,000. In this case, we perform a random draw from P to form the ensembles. We propagate the mean estimate for the AGR filter and the ensemble for the EnKF using (3.19) and compute the error in the predicted means and covariances. The error map of the mean estimates of (3.6) and (3.15) for the different values of c2, c4 and ensemble sizes k = 5, 10, 100 for the EnKF are shown in Fig. 1. Note for this example the AGR filter only requires 3 model evaluations as described in Section 3.1 whereas the EnKF requires the same number of model evaluations as the ensemble size.
In Fig. 1a and (b), for a similar number of model evaluations to the AGR filter, the sampling error in the EnKF estimates are quite large. Note that the color bars in (a) and (b) are the same and are of a different order than the color bars used in (c) and (d). In panels (c) and (d), the amount error in the EnKF estimate with k = 100 and the AGR filter is comparable. The AGR filter quadrature error is invariant with respect to changes in c2, whereas the EnKF estimation error depends on both c2 and c4 as expected given (3.20). If c4, which the fourth derivative depends on, is sufficiently small we expect better performance from the AGR filter estimated mean (3.6) regardless of the size of c2.
In the prior covariance estimates in Fig. 2, we see in (a) that the error in the EnKF covariance estimate with k = 100 grows with increases in c2 and c4. By comparison, the error in the AGR filter covariance in (b) is small when c4 is small and grows as the fourth-order derivative grows as expected since the error depends on The AGR filter covariance estimation is equal to or better than the EnKF estimate for small c4. For larger c4, the EnKF covariance estimate performs better. Note for this example we do not have a c3 term which the error in the AGR filter and EnKF depends on as well.
This example demonstrates the types of scenarios where one might choose one type of filter over another. For small ensemble sizes, the AGR filter may be the preferable choice as well as for the case where the model is moderately nonlinear, i.e. small magnitude higher order terms. For a large ensemble with large model fourth derivatives, the EnKF may provide a better estimate of the predicted mean.
Gaussian pdf integration: multi-dimensional case
We will now extend the results in Section 3.1 to higher dimensions. To evaluate integrals of the form (3.1), we begin by first applying the coordinate transform where S is the square root of the covariance such that Using this change of coordinates, we can convert (3.1) to the standard form with N(0, I), where I is the identity matrix. Then
Using (3.22) we can develop formulas to evaluate (2.18) and (2.19) explicitly based on polynomial quadrature. In Ito and Xiong (2000), is approximated by the function such that for points in The multivariate polynomial is given by
Evaluating a and b requires model evaluations. Note that we do not use cross derivative terms in the Hessian which would require an additional model evaluations to compute.
To summarize, a change of coordinates is used to transform the Gaussian integrals into standard form. We then approximate by a quadratic polynomial. Using this approximation, we create self-contained formulas for the predicted mean and covariance.
Similar to the scalar case, odd polynomial terms drop out in the polynomial quadrature. This results in the quadrature error in estimating the mean (3.30) on the order of the fourth derivative of the nonlinear model (see (B.8) in the appendix) even though our polynomial approximation (3.24) is only second order. We do not see as much benefit in the computation of the covariance as the error given by (B.11) is related to the cross terms in the Hessian approximation that were dropped in (3.24). Overall, the contribution to the filter error from the low-order polynomial quadrature is minimized for moderately nonlinear systems.
For this example, we will again look at the effects of nonlinearity versus sampling in the AGR filter and the EnKF. We consider a variable coefficient Korteweg-de Vries (KdV) model that governs the evolution of Rossby waves in a jet flow (Hodyss and Nathan, 2002). This may be written as
We begin by creating a 35,000 member ensemble that will be used as the true solution in our experiments. This ensemble was created by drawing the members from climatology then using an EnKF to perform three system cycles using observations created from an ensemble member. This was done to improve the quality of the ensemble. The resulting covariance of this ensemble has eigenvalues plotted in Fig. 4.
The eigenvalues of and their corresponding eigenvectors will be used to form needed by the AGR filter. Additionally, members for smaller ensemble sizes will be drawn randomly from the 35,000-member ensemble. Since has near-zero eigenvalues, we will consider only the first 250 eigen-directions thus
One way to observe the impact of the increased nonlinearity is to look at the influence of b in (3.24). For comparison we consider the filter without the second-order correction term which uses the first-order polynomial quadrature as AGR1, and with b which uses the second-order polynomial quadrature as AGR2.
Both filters are initialized using the mean of the 35,000 member ensemble, which we consider to be the true mean. The perturbations for the 500 member ensemble are drawn from the 35,000 member ensemble and then re-centered on the true mean. The S for the AGR filters is described above. All methods are integrated forward to t0 and the prior means and covariances are computed. Fig. 5a compares the L2 error in the EnKF prior mean solution with K = 500 and the AGR filter solutions with m = 250. The AGR2 filter significantly outperforms the AGR1 filter, demonstrating the importance of the second-order correction term. The AGR2 filter outperforms the EnKF until about or 5501 model time steps. The AGR2 filter performs well prior to this point having half the error of the EnKF at or 2501 model time steps. Fig. 5b compares the covariances of the EnKF and AGR filter using the Frobenius norm given by
The AGR1 and AGR2 filters have about the same error in their covariances and outperform the EnKF until about For model regimes which do not have overly large higher order terms, the AGR2 may provide better estimation.
For large n, evaluating (3.30) and (3.34) is prohibitively expensive since it requires model evaluations where ne is the number of nonzero eigenvalues. To reduce the computational cost, we consider the case where only the leading m eigenvalues are kept. Ideally, m would be chosen so that the singular values capture the essential dynamics, however, in atmospheric applications this is may not be possible due to computational constraints. The truncation error in the estimation of the square root Sm of is given by
If has n–m eigenvalues approaching zero this estimation is very accurate. In other words, the extent of the correlations in determines the accuracy of this truncation. The error in evaluating (3.30) and (3.34) now comes from both quadrature and this truncation.
We repeat the previous experiment with K = 40 introducing undersampling for the EnKF estimate and m = 20 for the AGR estimates. Again we see the importance of the second-order correction term when comparing AGR1 and AGR2 in Fig. 6a. In (a) the AGR2 filter again has half the error of the EnKF at However, due to the presence of sampling error in both of the prior mean estimates, the AGR2 continues to outperform the EnKF until about or 15,501 model time steps after which time the EnKF has a slight edge in performance. In (b) both the AGR1 and AGR2 estimates outperform the EnKF covariance estimates for various values of t0.
In both the cases with undersampling and without undersampling the AGR2 consistently outperformed AGR1 due to the inclusion of the second-order correction term b. Additionally, in both cases, there was a moderately nonlinear regime in which the AGR2 filter outperformed the EnKF. Similar to the scalar case, the AGR2 filter was found to be more sensitive to increased nonlinearity than the EnKF; however, the EnKF proved to be more sensitive to undersampling. This broadened the regime in which the AGR2 filter outperformed the EnKF.
A note on
For this example, Sm was computed from for the AGR filters. This was created using a 35,000 member climatological ensemble. Using fewer ensemble members to create introduces another source of error at the starting time. For example, if is constructed with ensemble members, then the accuracy of the AGR2 filter for m = 20 decreases accordingly for computing the prior covariance estimates as in Fig. 7. For convenience, we have included the error estimate for the EnKF in this plot. Note that the ensemble of the 40 member EnKF is drawn from the 35,000 member climatological ensemble. There are numerous strategies to develop a more accurate and higher rank (Clayton et al., 2013; Derber and Bouttier, 1999) which are beyond the scope of this paper.
In order to utilize the mean (3.30) and covariance (3.34) updates, we develop an algorithm in the same vein as the ensemble square root filters (Whitaker and Hamill, 2002), i.e. we will update Sm keeping Pb in factored form. To begin with we note that after some algebraic manipulation and dropping Q, we may rewrite (3.34) as
Note that so the expression in (4.4) may not be overly expensive to compute. To form the filter, we use the Potter method (Potter, 1963) for the Kalman square root update in reduced order form. This will improve the numerical robustness by ensuring is symmetric and reducing the amount of storage required by the AGR filter by only storing the square root S. To form the filter, let
Letting then we update S by
- Given compute and ai, bi for
- Compute as in (4.4).
- Decompose η such that where D is diagonal and V is unitary. Then
The algorithm itself is readily implemented and requires minimal tuning of the parameter d from Equations (4.2) and (4.3). For quasi-linear systems, the second-order correction term b may be dropped giving the AGR1 filter. In this case, we may further reduce computational cost by using finite differencing instead of centered differencing. Then to evaluate (3.30) and (3.34), we use the finite differencing scheme to approximate the i.e.,
In this section, we present data assimilation comparisons between the AGR2 filter, described in the previous section, and the ensemble square root filter (Tippett et al., 2003) as the example EnKF method. We use this particular filter as the correction step as it is most similar to the AGR filter while having an ensemble estimate for the mean and covariance.
1 D Example
We return to the KdV model given by (3.35). As before we will use k = 40 ensemble members drawn from the 35,000 member ensemble for the EnKF and the AGR2 filter with m = 20. This time the initial for the AGR2 filter will be the mean of the k = 40 EnKF ensemble. Both the EnKF and AGR2 filter will use the same 32 observations at assimilation time. We use localization and multiplicative inflation wherein the correlation length scale used in the localization and the inflation factor were tuned so that the ensemble variance correspond to the true error variance. We again consider different values of t0, the time the model is integrated forward before assimilation, to see how increasing the nonlinearity affects these two filtering algorithms. To reduce the influence of the initial conditions, we will only consider assimilation cycles 200–450.
Figure 8 is a plot of the average error across a data assimilation window for various t0. For smaller t0 the model integration is less nonlinear and we can see that the AGR2 filter has about 30% less error than the EnKF. As t0 gets larger, the model integration is more nonlinear and the error in the solution of the AGR2 grows more rapidly than the EnKF and by or 20001 model time steps the AGR2 error is about 24% less than the EnKF.
This cycling experiment result demonstrates that the improvement in the predicted mean and covariance estimates seen in Fig. 6 leads to an improvement in the data assimilation state estimation or analysis. Also it demonstrates that increasing the nonlinearity has more of an impact on the quality of the AGR2 solution versus the EnKF solution.
2 D Example
We will now investigate the performance of the proposed AGR filter using a two-dimensional Boussinesq model that develops Kelvin-Helmhotz waves, specifically, we use the model developed in (Hodyss et al., 2013). The governing equations given by
is the Laplacian operator, u and w are zonal and vertical winds, respectively, θ is the potential temperature, and ζ is the vorticity. The vorticity source F and the heat source H both have sub-grid scale parameterizations, more details may be found in Hodyss et al. (2013). The buoyancy frequency of the reference state is given by the background potential temperature: And
During the assimilation window, the model is advanced, then the filtering is performed with 112 temperature and 112 wind observations. The observations are created by perturbing the truth via
The error in the mean estimation plots in Fig. 10a and b demonstrate similar results to the one-dimensional KdV example. For the more linear case the AGR2 filter significantly outperforms the EnKF. As t0 is increased, the nonlinearity increases and the AGR2 filter loses its performance advantage over the EnKF until around As before, the increased nonlinearity has a greater impact on the performance of the AGR2 filter as opposed to the EnKF.
We have presented two example problems comparing the AGR filter and the EnKF. The first example was a one-dimensional KdV model in which the AGR filter outperformed the EnKF but was more influenced by nonlinearity. In the second example, a two-dimensional Boussinesq model was considered. In this case, starting with the AGR filter out performed the EnKF. When the error in the mean estimation has more than doubled and the performance between the AGR filter and the EnKF are comparable. Again we see that the AGR filter is more affected by the nonlinearity in the model than the EnKF.
We have presented a quadrature Kalman filter, the AGR filter, for moderately nonlinear systems. The filter uses numerical quadrature to evaluate the Bayesian formulas for optimal filtering under Gaussian assumptions. The AGR filter has the Gaussian noise assumptions and Gaussian joint distribution assumption from Kalman filtering with the added assumption that the prior distribution is Gaussian. This leads to Gaussian integrals which are evaluated using the second-order polynomial quadrature. Due to the properties of Gaussian distributions, using this polynomial achieves the same precision as a third-order polynomial quadrature. This effective higher order quadrature is key to the success of this filter.
In numerical tests, the AGR filter was found to outperform a comparable square-root EnKF in regions of low-to-moderate nonlinearity for a KdV model and a Boussinesq model. We expect these results to extend to more realistic atmospheric models, given that fourth and higher order terms of the model are sufficiently small. For highly nonlinear dynamical systems, the AGR filter is affected more than the square-root EnKF but may still provide performance benefit if the system is severely under-sampled as demonstrated in the scalar example in Section 3.4. It is also possible to use higher order quadrature to reduce the effect of nonlinearity but this would, of course, increase the computational costs of the filter.
While the Gaussian assumption made in this filter may seem restrictive, this assumption is commonly made, or effectively made, in data assimilation. For example, recent results indicate that it may require an ensemble with on the order of one thousand members to capture non-Gaussianity pdfs present in an EnKF for a simplified general circulation model (Miyoshi et al., 2014). This is already significantly more than the O(100) ensemble members typically used in EnKFs for full complexity atmospheric models. Effectively, a Gaussian assumption is being made due to the sample size. The computational efficiency of the AGR filter means that there is greater opportunity to pursue non-Gaussian pdfs via Gaussian mixture models (GMMs). In GMMs a non-Gaussian distribution is approximated by a series of Gaussian distributions which, in this case, would lead to an optimally weighted ensemble of AGR filters.