# Asynchronous data assimilation with the EnKF in presence of additive model error

## Abstract

The term ‘asynchronous data assimilation’ (ADA) refers to modifications of sequential data assimilation methods that take into consideration the observation time. In Sakov et al. [Tellus A, 62, 24–29 (2010)], a simple rule has been formulated for the ADA with the ensemble Kalman filter (EnKF). To assimilate scattered in time observations, one needs to calculate ensemble forecast observations using the forecast ensemble at observation time. Using then these ensemble observations in the EnKF update matches the optimal analysis in the linear perfect model case. In this note, we generalise this rule for the case of additive model error.

How to Cite: Sakov, P. and Bocquet, M., 2018. Asynchronous data assimilation with the EnKF in presence of additive model error. Tellus A: Dynamic Meteorology and Oceanography, 70(1), p.1414545. DOI: http://doi.org/10.1080/16000870.2017.1414545
Published on 01 Jan 2018
Accepted on 20 Nov 2017. Submitted on 20 Jul 2017.
## 1. Introduction

The standard sequential data assimilation (DA) framework assumes that the time is discrete, and that observations assimilated at each time step are made at the analysis time. In practice, the time intervals between consecutive analyses can be substantial, and model states within the observation window can be quite different. It is often important in such cases for a DA method to take into account, in one way or another, the timing of observations. The method is then referred to as asynchronous, in contrast to its standard sequential formulation, which is referred to as synchronous.

The ensemble Kalman filter (EnKF, Evensen, 1994) is based on the Kalman filter (KF), which is a sequential method; however, in the linear perfect model case, the EnKF can be easily adapted to accommodate the asynchronous DA (ADA). Consider the associated minimisation problem:

((1a) )
$\begin{array}{cc}& {\mathbf{x}}_{0}^{\star }=argmin\phantom{\rule{0.277778em}{0ex}}J\left({\mathbf{x}}_{0}\right),\hfill \end{array}$
((1b) )
$\begin{array}{cc}& J\left({\mathbf{x}}_{0}\right)=‖{\mathbf{x}}_{0}-{\mathbf{x}}_{0}^{\mathrm{a}}{‖}_{{\left({\mathbf{P}}_{0}^{\mathrm{a}}\right)}^{-1}}^{2}+\sum _{i=1}^{k}{‖{\mathbf{y}}_{i}-{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}\right)‖}_{{\mathbf{R}}_{i}^{-1}}^{2},\hfill \end{array}$
((1c) )
$\begin{array}{cc}& {\mathbf{x}}_{i}={\mathcal{M}}_{i-1\to i}\left({\mathbf{x}}_{i-1}\right),\phantom{\rule{1em}{0ex}}i=1,\cdots ,k.\hfill \end{array}$
Here the lower index i refers to time ${t}_{i}$, ${\mathbf{x}}_{i}$ is the model state, ${\mathbf{x}}_{0}^{\mathrm{a}}$ – the initial state estimate, ${\mathbf{P}}_{0}^{\mathrm{a}}$ – the initial state error covariance estimate,1${\mathbf{y}}_{i}$ – observations, ${\mathcal{H}}_{i}$ and ${\mathbf{R}}_{i}$ – the corresponding observation operator and observation error covariance matrix, ${\mathcal{M}}_{i\to j}$ is the model resolvent that maps the state from time ${t}_{i}$ to ${t}_{j}$; the norm notation ${‖\mathbf{x}‖}_{\mathbf{B}}^{2}\equiv {\mathbf{x}}^{\mathrm{T}}\mathbf{B}\mathbf{x}$ is used.

In order to simplify the notations, we define $\mathrm{b}lkdiag\left({\mathbf{A}}_{1},\dots ,{\mathbf{A}}_{k}\right)$ as the block diagonal matrix (or operator) made of the square matrices (or square operators) ${\left\{{\mathbf{A}}_{i}\right\}}_{i=1,\dots ,k}$. We also define $\mathrm{v}ec\left({\mathbf{A}}_{1},\dots ,{\mathbf{A}}_{k}\right)$ as the matrix (or operator) that stacks the matrices (or operators) ${\left\{{\mathbf{A}}_{i}\right\}}_{i=1,\dots ,k}$, which are assumed to have the same number of columns. With these notations, the cost function (1b) can be written as follows:

((2) )
$\begin{array}{c}\hfill J\left({\mathbf{x}}_{0}\right)=‖{\mathbf{x}}_{0}-{\mathbf{x}}_{0}^{\mathrm{a}}{‖}_{{\left({\mathbf{P}}_{0}^{\mathrm{a}}\right)}^{-1}}^{2}+{‖\mathbf{y}-\mathcal{H}\circ \mathcal{M}\left({\mathbf{x}}_{0}\right)‖}_{{\mathbf{R}}^{-1}}^{2},\end{array}$

where

((3) )
$\begin{array}{cc}& \mathbf{y}\equiv \mathrm{v}ec\left({\mathbf{y}}_{1},\dots ,{\mathbf{y}}_{k}\right),\hfill \end{array}$
((4) )
$\begin{array}{cc}& \mathbf{R}\equiv \mathrm{b}lkdiag\left({\mathbf{R}}_{1},\cdots ,{\mathbf{R}}_{k}\right)\hfill \end{array}$

are the concatenated observation vector and corresponding observation error covariance matrix, respectively, and the forward map $\mathcal{H}\circ \mathcal{M}\left({\mathbf{x}}_{0}\right)$ relates the initial state ${\mathbf{x}}_{0}$ to the observations:

((5) )
$\begin{array}{c}\hfill \mathcal{H}\circ \mathcal{M}\left({\mathbf{x}}_{0}\right)\equiv \mathrm{v}ec\left({\left\{{\mathcal{H}}_{i}\circ {\mathcal{M}}_{0\to i}\left({\mathbf{x}}_{0}\right)\right\}}_{i=1,\dots ,k}\right).\end{array}$

Apart from the forward map (5) that yields forecast observations at times ${t}_{1},\cdots ,{t}_{k}$, the cost function (2) has the same form as that for a single DA cycle with synchronous observations. Consequently, in the linear case, when the tangent linear forward operator $\mathbf{H}\circ \mathbf{M}$ (the sensitivity of observations to the initial state) does not depend on ${\mathbf{x}}_{0}$, the cost function (2) can be minimised using existing EnKF solutions, provided that the standard (synchronous) observation functions are replaced by the generic forward function (5):

((6) )
$\begin{array}{c}\hfill \mathcal{H}\left({\mathbf{E}}^{\mathrm{f}}\right)←\mathrm{v}ec\left({\left\{{\mathcal{H}}_{i}\circ {\mathcal{M}}_{0\to i}\left({\mathbf{E}}_{0}^{\mathrm{a}}\right)\right\}}_{i=1,\dots ,k}\right),\end{array}$

where $\mathbf{E}$ is the EnKF ensemble; the upper index “f” refers to forecast; $\mathcal{H}\left({\mathbf{E}}^{\mathrm{f}}\right)$ denotes the forecast ensemble observations and ${\mathbf{E}}_{0}^{\mathrm{a}}$ is the initial ensemble. In practice, this means that to assimilate asynchronously with the EnKF one simply needs to calculate ensemble observations using the forecast ensemble at observation time (Sakov et al., 2010). There are no specific restrictions on $\mathbf{R}$ in (2) – in particular, it does not have to be block-diagonal – so, in theory, observation errors can be correlated in time within a DA cycle.
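As an illustration, the recipe (6) amounts to stacking the ensemble observations collected while the forecast ensemble is propagated through the window. The following is a minimal numpy sketch with a toy linear model; the matrices `M` and `H` and the dimensions are arbitrary illustrative choices, not part of the original formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 4, 5, 3                      # state size, ensemble size, time steps

M = 0.95 * np.eye(n)                   # toy linear model resolvent (illustrative)
H = np.eye(2, n)                       # observe the first two state components
E0 = rng.standard_normal((n, m))       # initial ensemble E_0^a

E, HE_blocks = E0, []
for i in range(1, k + 1):
    E = M @ E                          # forecast ensemble at t_i
    HE_blocks.append(H @ E)            # ensemble observations at t_i

HE = np.vstack(HE_blocks)              # the stacked vec(...) of Eq. (6)
print(HE.shape)                        # (6, 5): k blocks of 2 observations
```

The stacked array `HE` then plays the role of the synchronous ensemble observations in any standard EnKF update.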

Before being formulated in Sakov et al. (2010), the recipe (6) had been used in a scheme called 4D-LETKF (Hunt et al., 2007), where it was empirically derived from a method called 4D-EnKF (Hunt et al., 2004).

In practical terms, the ADA makes it possible to avoid storing the full ensemble at each observation time, storing only ensemble observations instead. Because in large-scale geophysical systems the model size is typically much larger than the number of observations, this yields a significant economy of resources.

Formulation (1) of the minimisation problem implies that the assimilation is performed at time ${t}_{0}$; however, one interesting feature of the EnKF in the linear perfect model case is the time invariance of the optimising ensemble transform. If the update is conducted at time ${t}_{i}$ by applying a transform $\mathbf{X}$ to a forecast ensemble ${\mathbf{E}}_{i}^{\mathrm{f}}$,

((7) )
$\begin{array}{c}\hfill {\mathbf{E}}_{i}^{a}={\mathbf{E}}_{i}^{f}\mathbf{X},\end{array}$

then, to the same effect, it can be applied at any other time ${t}_{j}$:

((8) )
$\begin{array}{c}\hfill {\mathbf{E}}_{j}^{\mathrm{a}}={\mathbf{M}}_{i\to j}{\mathbf{E}}_{i}^{\mathrm{a}}={\mathbf{M}}_{i\to j}\left({\mathbf{E}}_{i}^{\mathrm{f}}\mathbf{X}\right)=\left({\mathbf{M}}_{i\to j}{\mathbf{E}}_{i}^{\mathrm{f}}\right)\mathbf{X}={\mathbf{E}}_{j}^{\mathrm{f}}\mathbf{X}.\end{array}$

This time invariance of the ensemble transform has been used in the ensemble Kalman smoother (Evensen, 2003) and made it possible to interpret the ensemble transform as a linear combination of the ensemble trajectories in the 4D-EnKF (Hunt et al., 2004).
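The invariance (7)–(8) is a direct consequence of the associativity of matrix products. A small numpy check, with randomly generated `M_ij`, `Ei_f` and `X` standing in for the resolvent, forecast ensemble and transform:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 6
M_ij = rng.standard_normal((n, n))     # linear resolvent M_{i->j}
Ei_f = rng.standard_normal((n, m))     # forecast ensemble at t_i
X = rng.standard_normal((m, m))        # analysis ensemble transform

# Analyse at t_i, then propagate to t_j ...
Ej_a = M_ij @ (Ei_f @ X)
# ... or propagate the forecast first and transform at t_j: same result.
assert np.allclose(Ej_a, (M_ij @ Ei_f) @ X)
```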

The above framework for the ADA with the EnKF is based on two important assumptions: linearity and perfect model. While the KF is a linear method, it can be (and indeed is) successfully applied to weakly nonlinear cases, when the Jacobians $\mathbf{M}\left(\mathbf{x}\right)={\mathrm{\nabla }}_{x}\mathcal{M}\left(\mathbf{x}\right)$ and $\mathbf{H}\left(\mathbf{x}\right)={\mathrm{\nabla }}_{x}\mathcal{H}\left(\mathbf{x}\right)$ do not change much through the DA cycle and over the characteristic uncertainty range of the state. The same principle motivates the applicability of the approach (6) to the ADA with the EnKF. This is why we kept the nonlinear notation for the forward operator.2 In strongly nonlinear cases, the method is, generally, no longer applicable, and iterative minimisation algorithms (e.g. Gu and Oliver, 2007; Bocquet and Sakov, 2014) must be used.

The situation in the case with model error is somewhat similar. Various aspects of additive deterministic (possibly time-correlated) model error, and the way formulation (1) should be generalised, have been considered in Lorenc (2003), Trémolet (2006) and Carrassi and Vannitsem (2010). However, because (1c) assumes a lossless transfer of information between DA system states in time by the model, it has not been clear whether the concept of the ADA can still be useful in cases with substantial model error. Recently, an iterative method for strongly nonlinear systems with additive stochastic model error, called IEnKF-Q, has been proposed (Sakov et al., in press); it combines the propagated ensemble anomalies and the model noise ensemble anomalies into a single ensemble of anomalies to obtain an algebraically simple solution for the Gauss–Newton minimisation. The authors suggested that a similar approach could be used to generalise the concept of the ADA with the EnKF in the case of additive model error. This study proposes such a generalisation.

## 2. The minimisation problem

Below we mainly follow the formulation and approach to the solution of the minimisation problem from Sakov et al. (in press), with the aim of deriving an analogue of the recipe (6) that permits a simultaneous assimilation of observations made at different times in the case of additive model error without loss of optimality. Sakov et al. (in press) considered a single DA cycle starting at time ${t}_{0}$ with observations at ${t}_{1}$, while we consider a cycle with an arbitrary number of model steps:

((9a) )
$\begin{array}{cc}& \left\{{\mathbf{x}}_{0}^{\star },\cdots ,{\mathbf{x}}_{k}^{\star }\right\}=arg\underset{\left\{{\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k}\right\}}{min}J\left({\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k}\right),\hfill \end{array}$
((9b) )
$\begin{array}{cc}\hfill & J\left({\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k}\right)={∥{\mathbf{x}}_{0}-{\mathbf{x}}_{0}^{\mathrm{a}}∥}_{{\left({\mathbf{P}}_{0}^{\mathrm{a}}\right)}^{-1}}^{2}+\sum _{i=1}^{k}{∥{\mathbf{y}}_{i}-{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}\right)∥}_{{\mathbf{R}}_{i}^{-1}}^{2}\hfill \\ \hfill & \phantom{\rule{2em}{0ex}}\phantom{\rule{2em}{0ex}}\phantom{\rule{2em}{0ex}}\phantom{\rule{1em}{0ex}}+\sum _{i=1}^{k}{∥{\mathbf{x}}_{i}-{\mathcal{M}}_{i-1\to i}\left({\mathbf{x}}_{i-1}\right)∥}_{{\mathbf{Q}}_{i}^{-1}}^{2}.\hfill \end{array}$
Here ${\mathbf{x}}_{0}^{\mathrm{a}}$ is the initial state (the analysis from the previous cycle), ${\mathbf{P}}_{0}^{\mathrm{a}}$ – the initial state error covariance, ${\mathbf{Q}}_{i}$ – the model error covariance at ${t}_{i}$, and upper star refers to the solution of the minimisation problem.

From a statistical perspective, (9b) relates to the joint probability density function on state and observation vectors since $J\left({\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k}\right)$ coincides with $-2lnp\left({\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k},{\mathbf{y}}_{1},\cdots ,{\mathbf{y}}_{k}\right)$ up to a constant. This implies that the model noise, as random variable, is assumed Gaussian, unbiased, of covariance matrix ${\mathbf{Q}}_{i}$ at ${t}_{i}$ and uncorrelated in time (see Carrassi and Vannitsem, 2010, for a generalisation).

Let the covariances ${\mathbf{P}}_{0}^{\mathrm{a}}$ and ${\mathbf{Q}}_{i}$ be factorised by the corresponding ensemble anomalies ${\mathbf{A}}_{0}^{\mathrm{a}}$ and ${\mathbf{A}}_{i}^{\mathrm{q}}$:

((10a) )
$\begin{array}{cc}& {\mathbf{A}}_{0}^{\mathrm{a}}{\left({\mathbf{A}}_{0}^{\mathrm{a}}\right)}^{\mathrm{T}}={\mathbf{P}}_{0}^{\mathrm{a}},\phantom{\rule{1em}{0ex}}{\mathbf{A}}_{0}^{\mathrm{a}}\mathbf{1}=\mathbf{0},\hfill \end{array}$
((10b) )
$\begin{array}{cc}& {\mathbf{A}}_{i}^{\mathrm{q}}{\left({\mathbf{A}}_{i}^{\mathrm{q}}\right)}^{\mathrm{T}}={\mathbf{Q}}_{i},\phantom{\rule{1em}{0ex}}{\mathbf{A}}_{i}^{\mathrm{q}}\mathbf{1}=\mathbf{0},\phantom{\rule{1em}{0ex}}i=1,\cdots ,k,\hfill \end{array}$
where $\mathbf{1}$ is a vector with all entries equal to 1. In many applications, the representation of model errors through the anomalies ${\mathbf{A}}_{i}^{\mathrm{q}}$ is more natural and convenient than through the ${\mathbf{Q}}_{i}$. A discussion of this specific topic can be found in Section 4 of Sakov et al. (in press). Let us seek the solution of the minimisation problem (9) in the ensemble space of these anomalies:
((11a) )
$\begin{array}{cc}& {\mathbf{x}}_{0}={\mathbf{x}}_{0}^{\mathrm{a}}+{\mathbf{A}}_{0}^{\mathrm{a}}{\mathbf{w}}_{0},\hfill \end{array}$
((11b) )
$\begin{array}{cc}& {\mathbf{x}}_{i}={\mathcal{M}}_{i-1\to i}\left({\mathbf{x}}_{i-1}\right)+{\mathbf{A}}_{i}^{\mathrm{q}}{\mathbf{w}}_{i},\phantom{\rule{1em}{0ex}}i=1,\cdots ,k.\hfill \end{array}$
The weights ${\mathbf{w}}_{i}$ are similar to the control variable ${\mathbit{\eta }}_{i}$ in the ‘forcing’ formulation of the weak-constraint 4D-Var (see Equation (2) in Trémolet, 2006), although using ${\mathbf{A}}_{i}^{\mathrm{q}}{\mathbf{w}}_{i}$ instead of ${\mathbit{\eta }}_{i}$ has the convenience of diagonalising ${\mathbf{Q}}_{i}$.
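The factorisation (10) can be realised, for example, by centring and rescaling a sample drawn from the model error distribution. A numpy sketch under that assumption (the factor `L` and the sample size are illustrative; the match to $\mathbf{Q}$ is only statistical, hence the large sample and the loose tolerance):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 200_000                       # small state, large sample
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.8, 0.0],
              [0.2, 0.1, 0.6]])
Q = L @ L.T                             # a model error covariance (illustrative)

S = L @ rng.standard_normal((n, m))     # sample from N(0, Q)
A = (S - S.mean(axis=1, keepdims=True)) / np.sqrt(m - 1)   # centred anomalies

assert np.allclose(A @ np.ones(m), 0.0)        # A 1 = 0 holds exactly, Eq. (10b)
assert np.allclose(A @ A.T, Q, atol=0.02)      # A A^T ~ Q for a large sample
```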

Problem (9) then becomes equivalent to

((12a) )
$\begin{array}{cc}& \left\{{\mathbf{w}}_{0}^{\star },\cdots ,{\mathbf{w}}_{k}^{\star }\right\}=arg\underset{\left\{{\mathbf{w}}_{0},\cdots ,{\mathbf{w}}_{k}\right\}}{min}\stackrel{~}{J}\left({\mathbf{w}}_{0},\cdots ,{\mathbf{w}}_{k}\right),\hfill \end{array}$
((12b) )
$\begin{array}{cc}& \stackrel{~}{J}\left({\mathbf{w}}_{0},\cdots ,{\mathbf{w}}_{k}\right)=\sum _{i=0}^{k}{\mathbf{w}}_{i}^{\mathrm{T}}{\mathbf{w}}_{i}+\sum _{i=1}^{k}{‖{\mathbf{y}}_{i}-{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}\right)‖}_{{\mathbf{R}}_{i}^{-1}}^{2}.\hfill \end{array}$
Let us concatenate ${\mathbf{w}}_{0},\dots ,{\mathbf{w}}_{k}$ into $\mathbf{w}$:
((13) )
$\begin{array}{c}\hfill \mathbf{w}\equiv \mathrm{v}ec\left({\mathbf{w}}_{0},\dots ,{\mathbf{w}}_{k}\right);\end{array}$

then (12b) becomes

((14) )
$\begin{array}{c}\hfill \stackrel{~}{J}\left(\mathbf{w}\right)={\mathbf{w}}^{\mathrm{T}}\mathbf{w}+{‖\mathbf{y}-\mathcal{H}\left(\mathbf{x}\right)‖}_{{\mathbf{R}}^{-1}}^{2},\end{array}$

where $\mathbf{y}$ and $\mathbf{R}$ are given by (3) and (4), and

((15) )
$\begin{array}{cc}& \mathbf{x}\equiv \mathrm{v}ec\left({\mathbf{x}}_{0},\cdots ,{\mathbf{x}}_{k}\right),\hfill \end{array}$
((16) )
$\begin{array}{cc}& \mathcal{H}\left(\mathbf{x}\right)\equiv \mathrm{v}ec\left({\left\{{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}\right)\right\}}_{i=1,\cdots ,k}\right).\hfill \end{array}$

The cost function (14) is similar to the formalism developed by Lorenc (2003) or to Equation (11) of Desroziers et al. (2014), but differs in that it is not incremental and can be obtained directly.

In (14), $\mathbf{x}$ is a function of $\mathbf{w}$ because of (11). Let us expand $\mathbf{x}$ in a power series of $\mathbf{w}$ and retain only the linear term:

((17) )
$\begin{array}{c}\hfill \mathbf{x}={\mathbf{x}}^{\mathrm{f}}+\mathbf{A}\mathbf{w}+O\left({‖\mathbf{w}‖}^{2}\right),\end{array}$

where

((18) )
$\begin{array}{c}\hfill {\mathbf{x}}^{\mathrm{f}}\equiv \mathrm{v}ec\left({\left\{{\mathcal{M}}_{0\to i}\left({\mathbf{x}}_{0}^{\mathrm{a}}\right)\right\}}_{i=0,\cdots ,k}\right),\end{array}$

and

((19a) )
$\begin{array}{cc}& \mathbf{A}\equiv \mathrm{v}ec\left({\stackrel{~}{\mathbf{A}}}_{0},\cdots ,{\stackrel{~}{\mathbf{A}}}_{k}\right),\hfill \end{array}$
((19b) )
$\begin{array}{cc}& {\stackrel{~}{\mathbf{A}}}_{i}\equiv \left[{\mathbf{A}}_{i},\mathbf{0}\right],\phantom{\rule{1em}{0ex}}i=0,\cdots ,k,\hfill \end{array}$
((19c) )
$\begin{array}{cc}& {\mathbf{A}}_{i}\equiv \left\{\begin{array}{cc}{\mathbf{A}}_{0}^{\mathrm{a}},\hfill & i=0,\hfill \\ \left[{\mathbf{M}}_{i-1\to i}{\mathbf{A}}_{i-1},{\mathbf{A}}_{i}^{\mathrm{q}}\right],\hfill & i=1,\cdots ,k.\hfill \end{array}\right.\hfill \end{array}$
Here the ensemble anomalies ${\stackrel{~}{\mathbf{A}}}_{i}$ are completed by zeros so that they all have the same number of columns as ${\mathbf{A}}_{k}$. Note that the augmentation of the propagated state error anomalies and model error anomalies has also been used in the reduced rank square root filter by Verlaan and Heemink (1997) in their Equation (28). Substituting (17) into (14) yields a form convenient for Gauss–Newton minimisation:
((20) )
$\begin{array}{cc}\hfill \stackrel{~}{J}\left(\mathbf{w}\right)=& {\mathbf{w}}^{\mathrm{T}}\mathbf{w}\hfill \\ \hfill & +{‖\mathbf{y}-\mathcal{H}\left({\mathbf{x}}^{\mathrm{f}}\right)-\mathbf{Y}\mathbf{w}+O\left({‖\mathbf{w}‖}^{2}\right)‖}_{{\mathbf{R}}^{-1}}^{2},\hfill \end{array}$

where

((21) )
$\begin{array}{c}\hfill \mathbf{Y}\equiv \mathrm{v}ec\left({\left\{{\mathbf{H}}_{i}{\stackrel{~}{\mathbf{A}}}_{i}\right\}}_{i=1,\cdots ,k}\right).\end{array}$

In the nonlinear case, (20) can be minimised by Gauss–Newton iterations. This involves using the approximate Hessian obtained by neglecting nonlinear terms $O\left(‖\mathbf{w}{‖}^{2}\right)$. In the linear case, the cost function becomes quadratic, and the solution is given by:

((22a) )
$\begin{array}{cc}& {\mathbf{x}}^{\star }={\mathbf{x}}^{\mathrm{f}}+\mathbf{A}{\mathbf{w}}^{\star },\hfill \end{array}$
((22b) )
$\begin{array}{cc}& {\mathbf{A}}^{\phantom{\rule{-0.166667em}{0ex}}\star }=\mathbf{A}\mathbf{T},\phantom{\rule{1em}{0ex}}\mathbf{T}={\mathbf{D}}^{-1/2}\mathbf{U},\hfill \end{array}$
((22c) )
$\begin{array}{cc}& {\mathbf{w}}^{\star }={\mathbf{D}}^{-1}{\mathbf{Y}}^{\mathrm{T}}{\mathbf{R}}^{-1}\left[\mathbf{y}-\mathcal{H}\left({\mathbf{x}}^{\mathrm{f}}\right)\right],\hfill \end{array}$
((22d) )
$\begin{array}{cc}& \mathbf{D}\equiv \mathbf{I}+{\mathbf{Y}}^{\mathrm{T}}{\mathbf{R}}^{-1}\mathbf{Y},\hfill \end{array}$
where ${\left(·\right)}^{1/2}$ denotes the unique positive (semi)definite square root of a positive (semi)definite matrix, and $\mathbf{U}$ is an arbitrary mean-preserving orthonormal matrix: $\mathbf{U}{\mathbf{U}}^{\mathrm{T}}=\mathbf{I},\phantom{\rule{4pt}{0ex}}\mathbf{U}\mathbf{1}=\mathbf{1}$.

These equations provide the update of the ensemble at all time levels. Similarly to the perfect model case, the same coefficients ${\mathbf{w}}^{\star }$ and the same ensemble transform $\mathbf{T}$ are used over the whole assimilation window. However, due to the augmentation of the propagated ensemble anomalies with model error anomalies in (19c), the updated trajectories of the initial ensemble ${\mathbf{E}}_{0}^{\star }$ are no longer continuous in time.
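A compact numpy sketch of the linear solution (22), with $\mathbf{U}=\mathbf{I}$, $\mathbf{R}=\mathbf{I}$ and randomly generated anomalies and innovation as illustrative stand-ins; the final check confirms that the transformed anomalies reproduce the Kalman analysis covariance $\mathbf{A}{\mathbf{D}}^{-1}{\mathbf{A}}^{\mathrm{T}}$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, p = 5, 6, 3                       # state, ensemble and observation sizes
A = rng.standard_normal((n, m))         # stacked anomalies A, as in Eq. (19a)
Y = rng.standard_normal((p, m))         # observation anomalies, Eq. (21)
Rinv = np.eye(p)                        # R^{-1} (identity for simplicity)
d = rng.standard_normal(p)              # innovation y - H(x^f)

D = np.eye(m) + Y.T @ Rinv @ Y          # Eq. (22d)
w = np.linalg.solve(D, Y.T @ Rinv @ d)  # Eq. (22c)

vals, vecs = np.linalg.eigh(D)          # D is symmetric positive definite
T = vecs @ np.diag(vals ** -0.5) @ vecs.T   # T = D^{-1/2} (U = I), Eq. (22b)
A_star = A @ T                          # updated anomalies, Eq. (22b)

# The analysis covariance matches the Kalman result A D^{-1} A^T:
assert np.allclose(A_star @ A_star.T, A @ np.linalg.inv(D) @ A.T)
```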

## 3. Asynchronous DA

Based on the solution of the minimisation problem in the previous section, we can now formulate a framework (rules) for the ADA with the EnKF in presence of additive model error. According to equations (22), to update the system state, one needs to calculate the forecast observations $\mathcal{H}\left({\mathbf{x}}^{\mathrm{f}}\right)$ and the forecast ensemble observation anomalies $\mathbf{Y}$; these in turn allow one to calculate the Hessian $2\mathbf{D}$ (the approximate Hessian in the nonlinear case), the coefficients ${\mathbf{w}}^{\star }$ and the transform $\mathbf{T}$. Therefore, the essence of the ADA with the EnKF can be formulated as follows:

((23) )
$\begin{array}{cc}& \mathcal{H}\left({\mathbf{x}}^{\mathrm{f}}\right)=\mathrm{v}ec\left({\left\{{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}^{\mathrm{f}}\right)\right\}}_{i=1,\cdots ,k}\right),\hfill \end{array}$
((21) )
$\begin{array}{c}\hfill \mathbf{Y}=\mathrm{v}ec\left({\left\{{\mathbf{H}}_{i}{\stackrel{~}{\mathbf{A}}}_{i}\right\}}_{i=1,\cdots ,k}\right),\end{array}$

where ${\stackrel{~}{\mathbf{A}}}_{i}$ is defined by (19b) and (19c).

Subject to the new evolution equation for ensemble anomalies (19c), the rules (21) and (23) can be interpreted in the same way as in the perfect model case: calculate ensemble observations using the forecast ensemble at observation time.

Figure 1.

The sequence of the complete analysis cycle in both the linear (upper) and the nonlinear (lower) case. From left to right, the nodes indicate the equation to use as referred to in the paper and the yielded quantities. The second, lower chain in the nonlinear case is needed if the observation operator is nonlinear. Arrows mean that the computation of the target node content requires the content of the origin node.

The sequence of the complete linear analysis cycle is synthesised in Fig. 1 (upper panel).

## 4. Discussion

### 4.1. Increasing ensemble size

The update equations (22) have the same form as in the perfect model case; however, the propagation of the ensemble anomalies in (19c) is substantially different. According to (19c), in presence of model error, the forecast ensemble of anomalies is not only transformed with the tangent linear model, but is also augmented with the corresponding ensemble of model error anomalies. This procedure implicitly implements the forecast covariance equation of the KF

((24) )
$\begin{array}{c}\hfill {\mathbf{P}}_{i}={\mathbf{M}}_{i}{\mathbf{P}}_{i-1}{\mathbf{M}}_{i}^{\mathrm{T}}+{\mathbf{Q}}_{i},\end{array}$

so that ${\mathbf{A}}_{i}{\mathbf{A}}_{i}^{\mathrm{T}}={\mathbf{P}}_{i}$, and manifests an effective increase of the ensemble size at every time step:

((25) )
$\begin{array}{c}\hfill {m}_{i}={m}_{i-1}+{m}_{i}^{\mathrm{q}},\phantom{\rule{1em}{0ex}}i=1,\cdots ,k,\end{array}$

where ${m}_{i}$ is the size of ${\mathbf{A}}_{i}$; ${m}_{i}^{\mathrm{q}}$ is the size of ${\mathbf{A}}_{i}^{\mathrm{q}}$ and ${m}_{0}\equiv m$ is the size of ${\mathbf{A}}_{0}^{\mathrm{a}}$, i.e. the initial ensemble size. The practical choice for the ${m}_{i}^{\mathrm{q}}$ has been discussed in Section 4 of Sakov et al. (in press).
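The recursion (19c), its covariance counterpart (24) and the size growth (25) can be checked numerically. In this numpy sketch the model, the anomalies and the dimensions are arbitrary illustrative choices (the centring conditions (10) are ignored, as they do not affect the identity being verified):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m0, mq, k = 3, 4, 2, 3
M = 0.9 * np.eye(n)                        # constant tangent linear model
A = rng.standard_normal((n, m0))           # A_0^a (centring ignored here)
Aq = [rng.standard_normal((n, mq)) for _ in range(k)]   # A_i^q, i = 1..k

P = A @ A.T                                # P_0^a
for i in range(k):
    A = np.hstack([M @ A, Aq[i]])          # Eq. (19c): augment with noise anomalies
    P = M @ P @ M.T + Aq[i] @ Aq[i].T      # Eq. (24) with Q_i = A_i^q (A_i^q)^T

print(A.shape[1])                          # m_k = m_0 + k * m_q = 10, Eq. (25)
```

The identity ${\mathbf{A}}_{i}{\mathbf{A}}_{i}^{\mathrm{T}}={\mathbf{P}}_{i}$ holds exactly at every step, by construction.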

Note that this increase of the ensemble size may not be strictly necessary for implementing (24). For example, if the column space of ${\mathbf{A}}_{i}^{\mathrm{q}}$ is a subspace of the column space of ${\mathbf{M}}_{i}{\mathbf{A}}_{i-1}$, then adding the model error covariance can be done without changing the ensemble size. Yet, the augmentation of the state error and model error ensemble anomalies in (19c) is an essential part of the described minimisation procedure, regardless of whether it is possible to factorise the forecast state error covariance with a smaller ensemble or not. It makes it possible to concatenate the vectors of ensemble weights ${\mathbf{w}}_{i}$ into a single vector $\mathbf{w}$ in (13), so that each component ${\mathbf{w}}_{i}$ of this vector acts only on the columns of the augmented ensemble associated with the model error added at time ${t}_{i}$ (except that ${\mathbf{w}}_{0}$ is associated with the initial state error). In this way, the minimisation uses the same vector of coefficients $\mathbf{w}$ over the whole observation window, similarly to the perfect model case.

### 4.2. Nonlinearity and approximation of $\mathbf{M}$ and $\mathbf{H}$

Although the EnKF is a linear method targeting weakly nonlinear systems, it is still desirable to optimise it for strongly nonlinear situations, for two reasons. Firstly, it is quite common for a weakly nonlinear system to become strongly nonlinear during relatively short and infrequent transition periods. Secondly, there are a variety of EnKF-based iterative methods for strongly nonlinear systems that would benefit from such an optimisation.

The EnKF is a derivative-less method: it approximates the Jacobians ${\mathbf{M}}_{i}$ and ${\mathbf{H}}_{i}$ based on the application of the forward function ${\mathcal{H}}_{i}\circ {\mathcal{M}}_{i}$ to the ensemble of model states ${\mathbf{E}}_{i-1}$. In the linear case, the results of these approximations do not depend on the ensemble spread; in the nonlinear case, to achieve correct sampling of nonlinear effects, the magnitude of the ensemble anomalies should be of the order of the standard deviation of the state error. For a system with additive model error, this means that the forecast error anomalies should have a magnitude similar to that of samples from $\mathcal{N}\left(\mathbf{0},{\mathbf{P}}_{i}\right)$, where the forecast covariance ${\mathbf{P}}_{i}$ is given by (24).

Such sampling can be associated with the statistical estimator of the covariance,

((26) )
$\begin{array}{c}\hfill \mathbf{P}=\frac{1}{{m}_{A}-1}\mathbf{A}{\mathbf{A}}^{\mathrm{T}},\end{array}$

where $\mathbf{A}$ is an ensemble of ${m}_{A}$ anomalies. This factorisation yields anomalies with a characteristic magnitude of the order of the state error standard deviation, regardless of the ensemble size.

The minimisation procedure in Section 2 uses a different factorisation,

((27) )
$\begin{array}{c}\hfill \mathbf{P}=\mathbf{A}{\mathbf{A}}^{\mathrm{T}},\end{array}$

which simplifies the algebra. With this factorisation, the characteristic magnitude of the anomalies decreases as the ensemble size grows. To obtain anomalies of physically ‘right’ magnitude, one needs to scale them up by a factor of $\sqrt{{m}_{A}-1}$, but only if all samples in the ensemble are drawn from the same pool. If, for example, we introduce a new ensemble

$\begin{array}{c}\hfill \mathbf{B}=\left[\mathbf{A},\mathbf{0}\right],\end{array}$

then we should still scale it with $\sqrt{{m}_{A}-1}$ rather than with $\sqrt{{m}_{B}-1}$, where ${m}_{B}$ is the size of $\mathbf{B}$.
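This point is easy to verify numerically: padding an ensemble with zero columns leaves $\mathbf{A}{\mathbf{A}}^{\mathrm{T}}$ unchanged, so the rescaling must keep the divisor ${m}_{A}-1$ of the estimator (26). A numpy sketch with arbitrary illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(5)
n, mA = 3, 5
A = rng.standard_normal((n, mA))
A -= A.mean(axis=1, keepdims=True)         # centred anomalies, A 1 = 0

B = np.hstack([A, np.zeros((n, 4))])       # B = [A, 0], so m_B = 9
P_hat = A @ A.T / (mA - 1)                 # statistical estimator (26)

# Zero columns leave B B^T unchanged, so the physically consistent
# scaling for B still uses m_A, not m_B:
assert np.allclose(B @ B.T / (mA - 1), P_hat)
assert not np.allclose(B @ B.T / (B.shape[1] - 1), P_hat)
```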

Because the magnitudes of the anomalies ${\mathbf{M}}_{i}{\mathbf{A}}_{i-1}$ and ${\mathbf{A}}_{i}^{\mathrm{q}}$ in (19c) cannot be assumed to be similar, it seems technically challenging to devise a generic re-scaling procedure for switching between factorisations (26) and (27) in the course of minimisation. This leaves us with the following two main options.

• (1)   Using factorisation (27) over the whole DA cycle, which means no rescaling before or after the application of the nonlinear functions $\mathcal{M}$ and $\mathcal{H}$. This approach underestimates the ensemble spread in the approximations and shifts the minimisation towards the Newton method, away from the secant method, which is more natural for the EnKF (and generally more robust).
• (2)   Scaling the components of ${\mathbf{A}}_{i}$ appended at different time steps according to the size of these components before the application of $\mathcal{M}$ and $\mathcal{H}$, and scaling back afterwards. This approach still does not make all anomalies of ${\mathbf{A}}_{i}$ behave as samples from $\mathcal{N}\left(\mathbf{0},{\mathbf{P}}_{i}\right)$, but can be a good practical compromise. It is elaborated below.
First, we define the forecast ensembles ${\mathbf{E}}_{i}$ as
((28) )
$\begin{array}{c}\hfill {\mathbf{E}}_{i}=\left\{\begin{array}{cc}{\mathbf{E}}_{0}^{\mathrm{a}},\hfill & i=0,\hfill \\ \left[{\mathcal{M}}_{i-1\to i}\left({\mathbf{E}}_{i-1}\right),\phantom{\rule{0.277778em}{0ex}}{\mathbf{x}}_{i}^{\mathrm{f}}{\mathbf{1}}^{\mathrm{T}}+\sqrt{{m}_{i}^{\mathrm{q}}-1}\phantom{\rule{0.166667em}{0ex}}{\mathbf{A}}_{i}^{\mathrm{q}}\right],\hfill & i=1,\cdots ,k,\hfill \end{array}\right.\end{array}$

where ${\mathbf{E}}_{0}^{\mathrm{a}}$ is the initial ensemble and

((29) )
$\begin{array}{c}\hfill {\mathbf{x}}_{i}^{\mathrm{f}}\equiv {\mathcal{M}}_{i-1\to i}\left({\mathbf{E}}_{i-1}\right)\phantom{\rule{0.166667em}{0ex}}\mathbf{1}/{m}_{i-1}.\end{array}$

We then define the new anomalies ${\stackrel{~}{\mathbf{A}}}_{i}$ that yield $\mathbf{A}$ via (19a). They are composed of sub-blocks ${\mathbf{A}}_{i}^{j}$, each corresponding to the ensemble appended at time ${t}_{j}$:

((30) )
$\begin{array}{c}\hfill {\stackrel{~}{\mathbf{A}}}_{i}\equiv \left[{\mathbf{A}}_{i}^{0},\dots ,{\mathbf{A}}_{i}^{k}\right],\end{array}$

where

((31) )
$\begin{array}{c}\hfill {\mathbf{A}}_{i}^{j}\equiv \left\{\begin{array}{cc}\left({\mathbf{E}}_{i}^{0}-{\mathbf{x}}_{i}^{\mathrm{f}}{\mathbf{1}}^{\mathrm{T}}\right)/\sqrt{{m}_{0}-1},\hfill & j=0,\hfill \\ \left({\mathbf{E}}_{i}^{j}-{\mathbf{x}}_{i}^{\mathrm{f}}{\mathbf{1}}^{\mathrm{T}}\right)/\sqrt{{m}_{j}^{\mathrm{q}}-1},\hfill & 1\le j\le i,\hfill \\ \mathbf{0},\hfill & i<j\le k.\hfill \end{array}\right.\end{array}$

The submatrix ${\mathbf{E}}_{i}^{0}$ corresponds to the first ${m}_{0}$ columns of ${\mathbf{E}}_{i}$; ${\mathbf{E}}_{i}^{j}$, for $1\le j\le k$, corresponds to the submatrix from column ${m}_{j-1}+1$ to column ${m}_{j}$ of ${\mathbf{E}}_{i}$.

The forecast observation anomalies (21) are then calculated as

((32) )
$\begin{array}{c}\hfill \mathbf{Y}=\mathrm{v}ec\left({\mathbf{Y}}_{1},\dots ,{\mathbf{Y}}_{k}\right),\end{array}$

where

((33) )
$\begin{array}{c}\hfill {\mathbf{Y}}_{i}\equiv \left[{\mathbf{Y}}_{i}^{0},\dots ,{\mathbf{Y}}_{i}^{k}\right],\end{array}$

and

((34) )
$\begin{array}{c}\hfill {\mathbf{Y}}_{i}^{j}\equiv \left\{\begin{array}{cc}\left[{\mathcal{H}}_{i}\left({\mathbf{E}}_{i}^{0}\right)-{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}^{\mathrm{f}}\right){\mathbf{1}}^{\mathrm{T}}\right]/\sqrt{{m}_{0}-1},\hfill & j=0,\hfill \\ \left[{\mathcal{H}}_{i}\left({\mathbf{E}}_{i}^{j}\right)-{\mathcal{H}}_{i}\left({\mathbf{x}}_{i}^{\mathrm{f}}\right){\mathbf{1}}^{\mathrm{T}}\right]/\sqrt{{m}_{j}^{\mathrm{q}}-1},\hfill & 1\le j\le i,\hfill \\ \mathbf{0},\hfill & i<j\le k.\hfill \end{array}\right.\end{array}$

To summarise, the sequence of the complete nonlinear analysis cycle is synthesised in Fig. 1 (lower panel).

### 4.3. Observations correlated in time

The formulation of the minimisation problem (9) assumes that observations at different times are uncorrelated, so that the merged observation error covariance matrix $\mathbf{R}$ in (4) is block-diagonal; however, we see no problem in removing this restriction, that is, in allowing the observations within a DA cycle to be correlated in time.

### 4.4. Ensemble resizing

One practical implication of the proposed framework for ADA in presence of additive model error is the need for ensemble resizing at the end of a cycle to cut back the increased ensemble size due to (19c). If the augmented ensemble ${\mathbf{A}}_{k}$ has the same rank as ${\mathbf{A}}_{0}$, then the ensemble size reduction from ${m}_{k}$ to ${m}_{0}$ can be performed losslessly (that is, without changing the analysed state error covariance). In large-scale geophysical EnKF systems, the ensemble size is always much smaller than the model size, and realistic implementations of model noise can be expected, generally, to project onto the left null-space of the ensemble. In this case, the resizing will involve removing the principal components with the smallest variance (Verlaan and Heemink, 1997; Raanes et al., 2015; Sakov et al., in press).
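One way to perform such a lossless reduction is via the singular value decomposition, keeping the ${m}_{0}$ principal components. A numpy sketch under the stated rank assumption (it checks only the covariance; a practical implementation, as in the cited references, would also need to preserve the ensemble mean):

```python
import numpy as np

rng = np.random.default_rng(6)
n, m0, mk = 8, 4, 7
# An augmented ensemble of anomalies whose rank does not exceed m0:
A_k = rng.standard_normal((n, m0)) @ rng.standard_normal((m0, mk))

U, s, _ = np.linalg.svd(A_k, full_matrices=False)
A_0new = U[:, :m0] * s[:m0]                # keep the m0 principal components

# rank(A_k) <= m0, so the analysed covariance is preserved exactly:
assert np.allclose(A_0new @ A_0new.T, A_k @ A_k.T)
```

When the rank of ${\mathbf{A}}_{k}$ exceeds ${m}_{0}$, the same truncation discards the principal components with the smallest variance, as discussed above.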

### 4.5. Observation time binning

The minimisation problem (9) associates each time step with observations $\left({\mathbf{y}}_{i},{\mathbf{R}}_{i}\right)$ and model error ${\mathbf{Q}}_{i}$. In practice, observations can be distributed continuously in time; they then need to be binned into time intervals, and model error estimates need to be prescribed for each interval. Because of the overhead associated with augmenting the ensembles of model error anomalies into the ensemble anomalies by (19c), one may have to use a rather coarse time binning for the model error. It is then possible to assume that the forward function ${\mathcal{H}}_{i}\circ {\mathcal{M}}_{i}$ in (34) propagates the ensemble to the time of each particular observation, as the overhead associated with the time binning of ensemble observations can be much smaller. This is the same observation time binning strategy as the one proposed by Trémolet (2006) in his Equation (9).

4.6.

### ADA: further extending the KF

The EnKF offers a number of advantages over the KF: scalability, derivative-free formulation, ease of implementing the covariance propagation equation, robustness, and so on. Most of these advantages concern the technical side of Kalman filtering. By contrast, the ADA aims at extending the very framework of sequential DA. While in the perfect model case it is possible to modify the square root formulations of the KF to permit the ADA, similarly to the EnKF, the technique proposed in this note involves using non-square covariance factorisations of varying size within a DA cycle. We see it as yet another step in the evolution of KF-based methods towards greater universality, well beyond the original scope of the KF.

5.

## Summary

Equations (23), (21) and (19c) provide a framework for the ADA with the EnKF in presence of additive model error. Its derivation in the course of solving the minimisation problem guarantees its optimality in the linear case.

The most essential new component of the proposed framework is Equation (19c) for the propagation of ensemble anomalies. This equation entails an effective increase of the ensemble size at each time step by the size of the ensemble of model error anomalies. We argue that this increase is necessary for the optimal ADA in presence of additive model error regardless of whether the model error anomalies project onto the column space of the ensemble or not; in practice, it makes it necessary to resize the ensemble at the end of the cycle.

## Disclosure statement

No potential conflict of interest was reported by the authors.

1 The upper label ‘a’ refers to ‘analysis’ as, later on, the initial state and error covariance matrix will coincide with the outcome of a previous analysis.

2 Another reason is that in the linear case functions $\mathcal{M}$ and $\mathcal{H}$ can be affine: $\mathcal{M}\left(\mathbf{x}\right)=\mathbf{m}+\mathbf{M}\mathbf{x},\phantom{\rule{4pt}{0ex}}\mathcal{H}\left(\mathbf{x}\right)=\mathbf{h}+\mathbf{H}\mathbf{x}$.

3 In the EnKF, instead of explicit propagation of the state, it is common to use the average of the propagated ensemble: $\mathcal{M}\left(\mathbf{x}\right)←\mathcal{M}\left(\mathbf{E}\right)\mathbf{1}/m$, where m is the size of the ensemble $\mathbf{E}$.

4 The presence of $\mathbf{U}$ here symbolically accounts for the variety of the deterministic EnKF schemes.

## References

1. Bocquet, M. and Sakov, P., 2014. An iterative ensemble Kalman smoother. Q. J. R. Meteorol. Soc. 140, 1521–1535.

2. Carrassi, A. and Vannitsem, S., 2010. Accounting for model error in variational data assimilation: a deterministic formulation. Mon. Wea. Rev. 138, 3369–3386.

3. Desroziers, G., Camino, J.-T. and Berre, L., 2014. 4DEnVar: link with 4D state formulation of variational assimilation and different possible implementations. Q. J. R. Meteorol. Soc. 140, 2097–2110.

4. Evensen, G., 1994. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99, 10143–10162.

5. Evensen, G., 2003. The Ensemble Kalman Filter: theoretical formulation and practical implementation. Ocean Dyn. 53, 343–367.

6. Gu, Y. and Oliver, D. S., 2007. An iterative ensemble Kalman filter for multiphase fluid flow data assimilation. SPE J. 12, 438–446.

7. Hunt, B. R., Kalnay, E., Kostelich, E. J., Ott, E., Patil, D. J. and co-authors, 2004. Four-dimensional ensemble Kalman filtering. Tellus 56A, 273–277.

8. Hunt, B. R., Kostelich, E. J. and Szunyogh, I., 2007. Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D 230, 112–126.

9. Lorenc, A. C., 2003. Modelling of error covariances by 4D-Var data assimilation. Q. J. R. Meteorol. Soc. 129, 3167–3182.

10. Raanes, P. N., Carrassi, A. and Bertino, L., 2015. Extending the square root method to account for additive forecast noise in ensemble methods. Mon. Wea. Rev. 143, 3857–3873.

11. Sakov, P., Evensen, G. and Bertino, L., 2010. Asynchronous data assimilation with the EnKF. Tellus A 62, 24–29.

12. Sakov, P., Haussaire, J.-M. and Bocquet, M., in press. An iterative ensemble Kalman filter in presence of additive model error. Q. J. R. Meteorol. Soc. DOI: https://doi.org/10.1002/qj.3213.

13. Trémolet, Y., 2006. Accounting for an imperfect model in 4D-Var. Q. J. R. Meteorol. Soc. 132, 2483–2504.

14. Verlaan, M. and Heemink, A. W., 1997. Tidal flow forecasting using reduced rank square root filters. Stoch. Hydrol. Hydraul. 11, 349–368.
