## Context

The ensemble Kalman filter (EnKF) has been shown to be a successful data assimilation technique for filtering and forecasting in complex chaotic fluids (see Evensen, **2009**, and references therein). Thus, it has been used as a powerful tool for deterministic as well as ensemble forecast of geofluids (Houtekamer et al., **2005**; Sakov et al., **2012**). It is based on an unavoidably limited ensemble size due to the numerical cost of realistic geofluid models. As a trade-off, the noisy covariance estimates obtained from this ensemble must be regularised, primarily using the technique known as *localisation*. Localisation was shown to be necessary with a chaotic model whenever the ensemble size is smaller than the number of unstable and neutral modes of the dynamics (Bocquet and Carrassi, **2017**) and possibly still beneficial for larger ensemble size (Anderson, **2012**).

Localisation assumes that correlations between spatially distant parts of the physical system decrease at a fast rate with the physical distance, e.g. exponentially. As a consequence, one can make the assimilation of observations local or, alternatively, artificially taper distant spurious correlations that emerge from sampling errors (Hamill et al., **2001**; Houtekamer and Mitchell, **2001**). As a result, two broad types of localisation techniques have been considered so far: *domain localisation* and *covariance localisation*.

Domain localisation consists of a collection of local updates, e.g. centred on the grid points using nearby observations (Houtekamer and Mitchell, **2001**; Ott et al., **2004**). These updates can be carried out in parallel since they are assumed independent. The full updated ensemble is obtained by assembling these local updates. Moreover, the transition between the updates of two adjacent domains can be made smoother by tapering the precision of the attached observations, which leads to refined domain localisation approaches (Hunt et al., **2007**; Nerger and Gregg, **2007**). This can reduce the *imbalance* generated by assembling this collection of updates to form the global updated ensemble (Kepert, **2009**; Greybush et al., **2011**). Imbalance is defined in this study as a measure of the distance between the updated ensemble members and the model’s attractor, a discrepancy one would like to be as small as possible.

The second type of localisation is covariance localisation which is enforced through a direct tapering of all sample covariances. This is usually implemented using a Schur product of the sample covariance matrix with a correlation matrix with fast decreasing entries with the distance. The Schur product output is mathematically guaranteed to be a covariance matrix and, with a proper localisation correlation matrix, is likely to make the regularised covariance matrix full-rank.

Even though based on the same diagnostic, the two types of localisation are distinct in their philosophy, and in their algorithmic and numerical implementation. Domain localisation does not allow assimilating non-local observations such as radiances without ad hoc approximations, but the scheme is embarrassingly parallel by nature. Covariance localisation is mathematically grounded in the tapering of the background covariance only and could hence be seen as a well understood scheme, but its numerical implementation, relying on a single global analysis, is much less simple, especially for deterministic EnKFs. In practice, the two schemes have been shown to coincide in the limit where the analysis is driven by the background statistics, i.e. weak assimilation (Sakov and Bertino, **2011**). They could differ otherwise.

Note that a third route for localisation is through the statistics technique known as *shrinkage*. It consists in adding a possibly adaptively tuned full-rank covariance matrix to the background error covariance matrix (see Hannart and Naveau, **2014**, and references therein). The approach was successfully tested by Bocquet et al. (**2015**) in the case of a hybrid EnKF.

From a theoretical standpoint, the localisation schemes seem ad hoc in spite of their remarkable practical efficiency. There could be room for improvements based on theoretical considerations. For instance, localisation can be made multiscale (Buehner and Shlyaeva, **2015**) or adaptive (Anderson and Lei, **2013**; Ménétrier et al., **2015**; De La Chevrotière and Harlim, **2017**). However, these two subjects are not topics of this paper.

In this paper, we would like to revisit the perturbation update step of the EnKF when relying on covariance localisation. We especially focus on the local ensemble square root Kalman filter (LEnSRF). Traditional EnKF schemes offer a *consistent* view on the perturbations which are generated in the analysis and propagated in the forecast. By consistent, it is meant here that the sample statistics (mean and covariances) of the analysed and forecast ensembles are supposed to match those of the actual analysis and forecast distributions. This consistency in the EnKFs is often approximate as evidenced by the need for inflation. Our goal is to further improve on this consistency and offer a more coherent view on the perturbations in the EnKF.

In Section 2, we recall the principle of covariance localisation, explain and shed some new light on how the perturbations are updated. In Section 3, we discuss the consistency of the perturbation update, and we propose a new approach for this update. In addition, we discuss the numerical cost of this approach. In Section 4, we present numerical results on a simple covariance model as well as on two low-order chaotic models that show potential benefits of the new scheme. Conclusions are given in Section 5.

## Motivation

### Principle of covariance localisation

In this study, the main focus is covariance localisation within deterministic EnKFs, and in particular the LEnSRF, defined as the ensemble square root Kalman filter with covariance localisation. Nonetheless, some of the results or remarks are likely to be valid for other variants of the EnKF.

The ensemble is denoted by the matrix $\mathbf{E}$ of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{e}},$ whose columns are the ensemble members ${\left\{{\mathbf{x}}_{i}\right\}}_{i=1,\dots ,{N}_{\mathrm{e}}},$ which are state vectors of size ${N}_{\mathrm{x}}.$ The mean of the ensemble is

*sample*or

*empirical*covariance matrix based on ensemble $\mathbf{E}$ is

To fix this, covariance localisation uses a localisation (i.e. correlation) matrix $\mathit{\rho}$ of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{x}}$ and regularises the background error sample covariance matrix via a Schur product

**2012**). In practice $\mathbf{B}$ is always full-rank (and hence positive definite).

### Mean update with regularised covariances

The mean analysis in the EnKF is then typically carried out using the Kalman gain matrix

Note that an alternative way to implement the mean update is to use the *α* control variable trick, which is meant to be used in an hybrid or EnVar context (Lorenc, **2003**; Buehner, **2005**; Wang et al., **2007**), but can also be used with the LEnSRF (see sections 6.7.2.3 and 7.1.3 of Asch et al., **2016**). Nonetheless, to our knowledge, this does not simply generalise to perturbation update. Our focus in this paper is on the perturbation update, which often discriminates variants of the EnKF. This is discussed in the following sections.

### Perturbation update of deterministic EnKFs

With the local stochastic EnKF (Houtekamer and Mitchell, **2001**), the perturbation update is exclusively based on the computation of the gain Eq. (6), which is applied to each member of the ensemble and the associated perturbed observations.

The perturbation update with a local deterministic EnKF is not as straightforward since localisation must also be enforced in the square root update scheme besides the mean update. However, there are deterministic EnKFs where this operation is actually simple. In the DEnKF (Sakov and Oke, **2008**a), which stands for deterministic EnKF but is actually one member of the family, the deterministic update is an approximation of the square-root update, which is based on the gain Eq. (6) only, similarly to the stochastic EnKF. In the local serial square root Kalman filter (serial LEnSRF), the tapering of the covariances is applied entry-wise using entries of ${\mathit{\rho}}_{\text{xy}}.$ The square-root correction to the gain needed for the perturbation update, for the global as well as for the serial LEnSRF, is just a scalar and can easily be computed (Whitaker and Hamill, **2002**). Serial EnKFs, however, come with their own issues, and it is also desirable to have a competitive perturbation update for the EnSRF in matrix form. Both the local DEnKF and the serial LEnSRF can be seen as approximate implementations of the LEnSRF. Note that the local ensemble transform Kalman filter (LETKF) of Hunt et al. (**2007**) achieves the update in a more straightforward manner, but it does not rely on background error covariance matrix localisation and it uses local domains instead. Let us now recall how the perturbation update is usually enforced in the global and then local EnKF.

#### Global deterministic EnKFs

In the absence of localisation, the perturbation update of a deterministic EnKF is rigorously implemented by a transformation on the right of the prior perturbation matrix (Bishop et al., **2001**; Hunt et al., **2007**):

**2008**; Sakov and Oke,

**2008**b). The updated perturbation matrix ${\mathbf{X}}_{\mathrm{a}}$ is of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{e}}.$

The $\frac{1}{2}$ exponent in Eq. (7) denotes the square root of any diagonalisable matrix with non-negative eigenvalues that we choose to define as follows. If $\mathbf{M}=\mathbf{GD}{\mathbf{G}}^{-1},$ where $\mathbf{G}$ is an invertible matrix and $\mathbf{D}$ is the diagonal matrix containing the non-negative eigenvalues of $\mathbf{M},$ then ${\mathbf{M}}^{\frac{1}{2}}=\mathbf{G}{\mathbf{D}}^{\frac{1}{2}}{\mathbf{G}}^{-1},$ where ${\mathbf{D}}^{\frac{1}{2}}$ is the diagonal matrix containing the square root of the eigenvalues of $\mathbf{M}.$ Other choices would be possible.^{1}

Equation (7) is algebraically equivalent to the left transform:

**2016**). Note that the matrix ${\mathbf{I}}_{\mathrm{x}}+{\mathbf{P}}^{\mathrm{e}}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}$ is not necessarily symmetric. However, it is diagonalisable with non-negative eigenvalues. To see this, assume for the sake of simplicity that $\mathbf{B}$ is positive definite. Then $\mathbf{B}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}$ is similar (in the matrix sense) to ${\mathbf{B}}^{-\frac{1}{2}}\mathbf{B}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}{\mathbf{B}}^{\frac{1}{2}}={\mathbf{B}}^{\frac{1}{2}}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}{\mathbf{B}}^{\frac{1}{2}}$ which is obviously symmetric positive semi-definite. Hence, $\mathbf{B}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}$ is diagonalisable with non-negative eigenvalues and ${\mathbf{I}}_{\mathrm{x}}+\mathbf{B}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}$ is diagonalisable with positive eigenvalues. The generalisation to positive semi-definite matrices is given in Corollary 7.6.2 of Horn and Johnson (

**2012**).

Equation (8), where ${\mathbf{T}}_{\mathrm{x}}$ is of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{x}},$ is the update form which, in this paper, defines the EnSRF. When observations are assimilated one at a time, the scheme is called serial EnSRF. The EnSRF is algebraically equivalent and shares the left transform update with the adjustment EnKF (Anderson, **2001**).

From now on, we shall omit the rotation matrices $\mathbf{U}$ in Eqs. (7,8) for the sake of clarity. Nonetheless, it should be kept in mind that these degrees of freedom could be accounted for.

#### Local EnSRF

The right-transform ${\mathbf{T}}_{\mathrm{e}}$ acts in ensemble subspace. As a result, there is no way to enforce covariance localisation (defined in state space) using this approach. By contrast, the left-transform ${\mathbf{T}}_{\mathrm{x}}$ acts on state space and can thus support covariance localisation.

An approximate update formula extrapolates Eq. (8) to the local case using $\mathbf{B}=\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{e}}$ in place of ${\mathbf{P}}^{\mathrm{e}}=\mathbf{X}{\mathbf{X}}^{\top}$ (Sakov and Bertino, **2011**):

Similarly to Eq. (8), note that ${\mathbf{I}}_{\mathrm{x}}+\mathbf{B}{\mathbf{H}}^{\top}{\mathbf{R}}^{-1}\mathbf{H}$ is not necessarily symmetric. But it is diagonalisable with non-negative eigenvalues and its square root is well-defined as per the above definition of the matrix square root. Note that, contrary to domain localisation (e.g. the LETKF), Eq. (9) is applied globally and only once per assimilation cycle. This update defines the LEnSRF.

### Mode expansion of the perturbation left update

It is numerically challenging to apply Eq. (9) to high-dimensional systems since it requires the inverse square root of a hardly storable covariance matrix defined in state space. Part of a solution consists in the mode (i.e. empirical orthogonal function, EOF) expansion of $\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{e}}$ using a preliminary mode expansion of the climatological $\mathit{\rho}.$ This *modulation* was proposed by Buehner (**2005**) and later applied to localisation in the EnKF by Bishop and Hodyss (**2009**); Brankart et al. (**2011**). It is not difficult to check that the resulting modes are those on which the *α* control variable is based (Bishop et al., **2011**). The interest of a direct mode expansion of $\mathit{\rho}\xb0\left(\mathbf{X}{\mathbf{X}}^{\top}\right),$ in place of the modulation, and its potential numerical advantage is investigated in Farchi and Bocquet (**2019**).

Independently from how it was obtained, this mode expansion can be written as $\mathbf{B}\approx {\mathbf{X}}_{\mathrm{r}}{\mathbf{X}}_{\mathrm{r}}^{\top},$ where ${\mathbf{X}}_{\mathrm{r}}$ is of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{r}}.$${N}_{\mathrm{r}}$ should be large enough to capture the spatial variability of $\mathbf{B}$ and small enough to be computationally tractable and storable, typically ${N}_{\mathrm{e}}\ll {N}_{\mathrm{r}}\ll {N}_{\mathrm{x}}.$ Considering chaotic low-order models, Bocquet (**2016**) has argued that the number of modes ${N}_{\mathrm{r}}$ should typically be greater than the dimension of the unstable and neutral subspace of the model dynamics.

With such a mode expansion, the updated perturbation matrix reads:

**2016**) has shown that this update is algebraically equivalent to a formula where computations are mostly done in the ensemble ($\mathbf{X}$) or in the mode (${\mathbf{X}}_{\mathrm{r}}$) subspaces:

**2016**). For the sake of completeness and because we will use it again, we propose an alternate but rigorous proof in Appendix A of the present paper.

This update was later rediscovered in Bishop et al. (**2017**) and the principle behind it named *Gain Form of the Ensemble Transform Kalman Filter*. It is not difficult to show that their formula Eq. (25) is actually mathematically equivalent to Eq. (25) of Bocquet (**2016**). However, their formula is prone to numerical cancellation errors as opposed to Eq. (11).

As proven in Appendix A, we can go further and write this left update mainly using linear algebra in observation space as

Note that both Eq. (11) and Eq. (12) support an approximation similar to the DEnKF by Sakov and Oke (**2008**a), which yields:

**2016**) and Eq. (14) is new. These are useful update formulas since they avoid the square root and fall back to an ensemble Kalman gain.

This type of updates can make the LEnSRF numerically affordable, especially with parallelisation (Farchi and Bocquet, **2019**). It also becomes affordable when combined with an approach based on local domains (à la LETKF) by enforcing covariance localisation on a decomposition of subdomains, or enforcing covariance localisation on the vertical while domain localisation is used on the horizontal.

## A new perturbation update scheme

In Section 2, we have defined the LEnSRF and explained how it could be implemented. In this section, we focus on the perturbation update step of the LEnSRF.

### On the consistency of the perturbation update

The regularised background error covariance matrix $\mathbf{B}=\mathit{\rho}\xb0\left(\mathbf{X}{\mathbf{X}}^{\top}\right),$ which is likely to be full-rank, can be written in the form $\mathbf{B}={\mathbf{X}}_{\mathrm{r}}{\mathbf{X}}_{\mathrm{r}}^{\top}$ provided ${\mathbf{X}}_{\mathrm{r}}$ is of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{x}},$ i.e. ${N}_{\mathrm{r}}={N}_{\mathrm{x}}.$ With this $\mathbf{B},$ the theoretical analysis error covariance matrix

**X**

_{a,r}is the anomaly matrix of the

*N*

_{x}updated perturbations. It is an exact (hence consistent by definition) representation of the uncertainty since it is readily checked that

Of course, this is only theoretical, since, in practice, we can only afford to generate and propagate ${N}_{\mathrm{e}}\ll {N}_{\mathrm{x}}$ such perturbations. Since we look for ${N}_{\mathrm{e}}$ perturbations that capture most of the uncertainty of the update, it is tempting to apply the left transform ${\mathbf{T}}_{\mathrm{x}}$ to ${\widehat{\mathbf{X}}}_{\mathrm{r}},$ defined as the perturbation matrix of the ${N}_{\mathrm{e}}$ dominant modes (EOFs) of ${\mathbf{X}}_{\mathrm{r}}.$ Hence, we could propose:

**1998**) show that this update is ineffective and systematically leads to the divergence of the filter. This seems contradictory with the fact that this update captures as much uncertainty as possible, at least as measured using matrix norms.

The reason behind this apparent paradox is that in a cycled LEnSRF experiment based on Eq. (18) the localisation is essentially applied twice per cycle. Indeed, ${\widehat{\mathbf{X}}}_{\mathrm{r}}$ already captures the dominant contributions from a regularised $\mathbf{B},$ hence a first footprint of localisation. The resulting perturbations would then form an ensemble to be forecasted. The next cycle background statistics would be based on this forecast ensemble. The regularisation of the covariances would then require localisation, once again. Since localisation by Schur product is not idempotent – unless one uses a boxcar-like $\mathit{\rho}$ in which case $\mathit{\rho}$ would not be a proper correlation matrix – localisation is applied once too many. That is why Eq. (18) is not fit to a cycled LEnSRF.

In retrospect, this clarifies why Eq. (9) is well suited to a LEnSRF: localisation is applied once in each cycle. This argument also implies that the perturbations should not be blindly identified with the modes that carry most of the uncertainty. However, it is tacitly hoped that the forecast of the ensemble at the next cycle will be adequately regularised by the localisation matrix $\mathit{\rho}.$

The perturbations of the serial LEnSRF, the DEnKF and the local stochastic EnKF follow the same paradigm. By contrast, the local update perturbations of the LETKF are meant to capture most of the uncertainty within each local domain. Hence, the anomalies of the forecast ensemble are representative of the main uncertainty modes, as opposed to the other EnKF schemes. However, even though the local updated perturbations of the LETKF may offer better samples of the conditional pdf, this property could eventually fade away in the forecast because of their local validity.

Incidentally, this suggests that the LETKF could be better suited for ensemble short-term forecast, which could be investigated in a future study. Numerical clues supporting this idea are nonetheless provided at the end of Section 4.

### Improving the consistency of the perturbation update

We have just seen that the widespread view on the local EnKF perturbation update which assumes a low-rank extraction ${\mathbf{X}}_{\mathrm{a}}$ from ${\mathbf{P}}^{\mathrm{a}}$ with the hope that ${\mathbf{X}}_{\mathrm{a}}$ captures the most important directions of uncertainty: ${\mathbf{P}}^{\mathrm{a}}\approx {\mathbf{X}}_{\mathrm{a}}{\mathbf{X}}_{\mathrm{a}}^{\top},$ is only accurate for the LETKF. For the other schemes mentioned above, the perturbations do not have to coincide with the dominant modes.

For the LEnSRF update, we believe that it would be more consistent with how the perturbations are defined to look for a low-rank perturbation matrix ${\mathbf{X}}_{\mathrm{a}}$ such that

With the definition

Eq. (19) reads ${\mathbf{P}}^{\mathrm{a}}\approx {S}_{\mathit{\rho}}({\mathbf{X}}_{\mathrm{a}}).$ Our objective is to look for a solution to the optimisation problem

This problem is similar to the *weighted low-rank approximation* (WLRA) problem, which consists in solving

**2003**; Srebro and Jaakkola,

**2003**). With the identification ${\mathbf{P}}^{\mathrm{a}}\equiv \mathit{\rho}\xb0\mathbf{B}$ and imposing $\mathbf{A}$ to be symmetric positive semi-definite, our optimisation problem Eq. (21) is seen to belong to the class of WLRA problems. As opposed to the uniform case, ${[\mathit{\rho}]}_{n,m}\equiv 1,$ for which the minimiser of $\left|\right|\mathbf{X}{\mathbf{X}}^{\top}-{\mathbf{P}}^{\mathrm{a}}|{|}_{\mathrm{F}}$ simply coincides with the truncated singular value decomposition of ${\mathbf{P}}^{\mathrm{a}}$ (Eckart-Young theorem), the $\mathit{\rho}$-based problem has no simple solution.

^{2}

Hence, we expect that our problem Eq. (21) has no tractable solution. Note that the literature of the WLRA problem focuses on the non-symmetric case which would correspond for our problem to $\mathcal{L}(\mathbf{X},\mathbf{Y})=\hspace{0.17em}\text{ln}\hspace{0.17em}\left|\right|\mathit{\rho}\xb0\left(\mathbf{X}{\mathbf{Y}}^{\top}\right)-{\mathbf{P}}^{\mathrm{a}}|{|}_{\mathrm{F}}.$ By contrast, our focus is on the symmetric case, which has less degrees of freedom. Still, it is unlikely to be amenable to a convex problem. Let us see why.

The minimisation problem Eq. (21) is defined on the space of the $\mathbf{X}$ which is a convex subspace. It is equivalent to minimise $\mathcal{L}(\mathbf{X})$ or $\left|\right|{S}_{\mathit{\rho}}(\mathbf{X})-{\mathbf{P}}^{\mathrm{a}}|{|}_{\mathrm{F}}^{2},$ which is algebraic but nonetheless quartic in $\mathbf{X}$ and hence cannot be guaranteed to be convex. The problem is also equivalent to finding $\mathbf{P}$ of rank smaller or equal to ${N}_{\mathrm{e}}-1$ which minimises $\left|\right|\mathit{\rho}\xb0\mathbf{P}-{\mathbf{P}}^{\mathrm{a}}|{|}_{\mathrm{F}}^{2}.$ This function is quadratic in $\mathbf{P}.$ However, the space of the $\mathbf{P}$ of rank lower than ${N}_{\mathrm{e}}-1<{N}_{\mathrm{x}}$ is not convex. Hence our problem may have several or even an infinite number of solutions (a *variety*). For instance, there are many redundant degrees of freedom such as ${S}_{\mathit{\rho}}(\mathbf{XU})={S}_{\mathit{\rho}}(\mathbf{X})$ with $\mathbf{U}$ an ${N}_{\mathrm{e}}\times {N}_{\mathrm{e}}$ orthogonal matrix, so that the optimisation problem Eq. (21) is degenerate. The modified LEnSRF with this new update scheme follows the paradigm depicted in Fig. 1.

With a view to efficiently minimising $\mathcal{L}(\mathbf{X}),$ let us compute its gradient with respect to $\mathbf{X}$ of size ${N}_{\mathrm{x}}\times {N}_{\mathrm{e}}.$ The variation of $\mathcal{L}(\mathbf{X})$ with respect to $\mathbf{X}$ is

This yields the matrix gradient

**1995**). Note that the function $\mathcal{L}(\mathbf{X})$ may not only have many global minima, but it may also have many local minima. As a consequence it may not be possible to find a global minimum with the L-BFGS-B method.

### Parametrised minimisation

Instead of minimising $\mathcal{L}$ over $\mathbf{X}$ which has redundant degrees of freedom, we use an RQ decomposition of $\mathbf{X},$ which is obtained from a QR decomposition (Golub and van Loan, **2013**) of ${\mathbf{X}}^{\top}:$

A parametrised minimisation can easily be implemented using the function

We use this parametrised minimisation in all the numerical experiments. However, the plain method using the unparametrised minimisation on $\mathbf{X}$ works as well, although there is no guarantee to find the same local minimum because of the potential non-convexity of $\mathcal{L}(\mathbf{X}).$

In Appendix B, we address the question of the matrix norm choice in Eq. (21). In particular, we test the use of the spectral and nuclear matrix norms, and, more generally, of the Schatten *p*-norms. We found that these choices did not make much of a difference but that the choice of either the spectral or the nuclear norm, at the ends of the Schatten range, could lead to inaccurate numerical results.

Finally, coming back to the definition of $\mathcal{L}(\mathbf{X}),$ we have chosen to apply a logarithm function to the norm to level off the ups and downs of the function. Since the functions are non-convex, a quasi-Newton minimiser such as BFGS may behave differently in terms of convergence and local minima depending on the nature of the transformation. Hence, the log-transformation should not be considered totally innocuous. In practice, we found using the log-transform systematically beneficial.

###
*3.4. Forecast of the*
$\mathit{\rho}\xb0\left(\mathbf{X}{\mathbf{X}}^{\top}\right)$*representation*

Because we have offered a novel view on the posterior perturbations and how they are generated in the analysis, we now need to examine how the forecast step of the scheme is affected by this change of standpoint. If not, there would be a risk of breaking the consistency in the forecast step of the cycle.

As previously explained at the end of Section 3.1, an asset of the LETKF approach is that the updated perturbations represent the dominant modes of the posterior error covariance matrix Eq. (15). Hence, the forecast uncertainty must be approximated by the forecast of these modes. Nonetheless, by construction, the statistics of these modes before or after forecasting are only valid on local domains, i.e. for short spatial separations.

By contrast, with the new LEnSRF scheme, recognising that the posterior error covariance matrix is $\mathit{\rho}\xb0\left({\mathbf{X}}_{\mathrm{a}}{\mathbf{X}}_{\mathrm{a}}^{\top}\right)$ makes forecasting more intricate. This representation $\mathit{\rho}\xb0\left({\mathbf{X}}_{\mathrm{a}}{\mathbf{X}}_{\mathrm{a}}^{\top}\right)$ is meant to model statistics valid for larger spatial separations. How would one forecast this representation of the posterior error covariance matrix?

With the assumption that the error dynamics are linear, which would only be valid on short time scales, Bocquet (**2016**) has proposed a way to forecast $\mathit{\rho}\xb0\left({\mathbf{X}}_{\mathrm{a}}{\mathbf{X}}_{\mathrm{a}}^{\top}\right).$ First, the ${\mathbf{X}}_{\mathrm{a}}$ are assumed to be genuine physical perturbations that are forecasted by the tangent linear resolvent ${\mathbf{M}}_{k+1:k}$ from time *t _{k}* to time ${t}_{k+1}:$

The tangent linear model $\mathbf{M}{\prime}_{k}$ at *t _{k}* is defined by the expansion of the resolvent: ${\mathbf{M}}_{k+1:k}=\mathbf{I}+\mathbf{M}{\prime}_{k}({t}_{k+1}-{t}_{k})+o({t}_{k+1}-{t}_{k}).$ Second, the localisation matrix should be made time-dependent and satisfy – in the time continuum limit – the following Liouville equation:

In the case where the dynamics can be approximated as hyperbolic, and in the limit where space is continuous, a closed-form equation can be obtained for $\rho ({x}_{1},{x}_{2},t)$ (see Eq. (A14) of Bocquet, **2016**). If diffusion is present, there is no such closed-form equation. See also Kalnay et al. (**2012**); Desroziers et al. (**2016**) who have considered this issue in other contexts.

The key point is that in practice and for moderate forecast lead times, $\mathit{\rho}$ can roughly be assumed to be static. This is what will be used in the numerical experiments of Section 4. For larger ${t}_{k+1}-{t}_{k},$ one could assume at the next order approximation that the localisation length used to obtain the prior at ${t}_{k+1}$ is larger than the one used in the perturbation update new scheme at *t _{k}*, because of an effective diffusion either generated by genuine diffusion or by averaged mixing advection (as stressed in the Appendix A of Bocquet,

**2016**).

This suggests that $\mathit{\rho}\xb0\left({\mathbf{X}}_{\mathrm{f}}{\mathbf{X}}_{\mathrm{f}}^{\top}\right),$ obtained from the forecast perturbation matrix ${\mathbf{X}}_{\mathrm{f}}$ from ${\mathbf{X}}_{\mathrm{a}},$ is an acceptable approximation of the forecast error covariance matrix.

### Numerical cost of computing the gradient and the cost function

In this section, we analyse the cost of computing the cost function and the gradient. Indeed, both would be required by a quasi-Newton minimiser and both involve ${\mathbf{P}}^{\mathrm{a}}.$ In the following, $\mathit{\rho}$ will be assumed either sparse or homogeneous. These are sine qua none conditions for the feasibility of covariance localisation with high-dimensional models.

#### Bottlenecks

The cost function $\mathcal{L}(\mathbf{X})$ requires computing

As a consequence, the cost of evaluating $\mathcal{L}(\mathbf{X})$ is essentially driven by the evaluation of

Thus, we need to consider the cost of evaluating both terms in the right-hand side. The normalising factor $\left|\right|\mathbf{\Delta}|{|}_{\mathrm{F}}^{2}$ coincides with Eq. (33).

Hence, for both the cost function and its gradient, we need to evaluate a first term in the form ${\mathit{\rho}}_{2}\xb0\left(\mathbf{X}{\mathbf{X}}^{\top}\right)\xb7\mathbf{X},$ and a second term in the form $\left(\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{a}}\right)\xb7\mathbf{X}.$

#### Efficient evaluation

It can be shown that

*i*-th column of $\mathbf{X}.$ This can easily be shown by writing the matrix and vector indices explicitly (see e.g. Desroziers et al.,

**2014**).

The numerical complexity of Eq. (36) is:

- If $\mathit{\rho}$ is banded with a bandwidth of ${N}_{\mathrm{b}}:$ $\mathcal{O}\left({N}_{\mathrm{e}}{N}_{\mathrm{x}}{N}_{\mathrm{b}}\right).$ Hence, the numerical complexity of computing the first term of Eqs. (34,35) is $\mathcal{O}\left({N}_{\mathrm{e}}^{2}{N}_{\mathrm{x}}{N}_{\mathrm{b}}\right)$ in this case.
- If $\mathit{\rho}$ represents homogeneous correlations, corresponding to an invariance by translation: $\mathcal{O}\left({N}_{\mathrm{e}}{N}_{\mathrm{x}}\hspace{0.17em}\text{ln}\hspace{0.17em}{N}_{\mathrm{x}}\right).$ Hence, the numerical complexity of computing the first term of Eqs. (34,35) is $\mathcal{O}\left({N}_{\mathrm{e}}^{2}{N}_{\mathrm{x}}\hspace{0.17em}\text{ln}\hspace{0.17em}{N}_{\mathrm{x}}\right)$ in this case.

Let us now consider the complexity of computing the second term. Assuming $\mathbf{P}$ is entirely known, we have

If $\mathit{\rho}$ is banded, then the cost of the evaluation of ${\left[\mathit{\rho}\xb0\mathbf{P}\xb7\mathbf{v}\right]}_{n}$ is $\mathcal{O}\left({N}_{\mathrm{b}}\right),$ so that the cost of evaluating $\mathit{\rho}\xb0\mathbf{P}\xb7\mathbf{v}$ is $\mathcal{O}\left({N}_{\mathrm{x}}{N}_{\mathrm{b}}\right)$ and the cost of evaluating $\mathit{\rho}\xb0\mathbf{P}\xb7\mathbf{X}$ is $\mathcal{O}\left({N}_{\mathrm{x}}{N}_{\mathrm{b}}{N}_{\mathrm{e}}\right).$ This cost is acceptable, i.e. it does not departs much from $\mathcal{O}({N}_{\mathrm{x}}).$ However, it does not account for the cost of evaluating $\mathbf{P},$ which is the real issue when one considers ${\mathbf{P}}^{\mathrm{a}}.$

#### Mode expansion estimation of ${\mathbf{P}}^{\mathrm{a}}$

If we assume that we have extracted ${N}_{\mathrm{r}}$ modes stored in ${\widehat{\mathbf{X}}}_{\mathrm{a}}$ such that ${\mathbf{P}}^{\mathrm{a}}\approx {\widehat{\mathbf{X}}}_{\mathrm{a}}{\widehat{\mathbf{X}}}_{\mathrm{a}}^{\top}$ (the ${N}_{\mathrm{r}}$ largest EOFs of ${\mathbf{P}}^{\mathrm{a}}$), then the second term of Eqs. (34,35) can be written $\mathit{\rho}\xb0\left({\widehat{\mathbf{X}}}_{\mathrm{a}}{\widehat{\mathbf{X}}}_{\mathrm{a}}^{\top}\right)\xb7\mathbf{X}$ which can also be computed using Eq. (36) since, typically, ${N}_{\mathrm{e}}\u2a7d{N}_{\mathrm{r}}\ll {N}_{\mathrm{x}}.$ The cost of obtaining ${\widehat{\mathbf{X}}}_{\mathrm{a}}$ is the subject of Farchi and Bocquet (**2019**). Still assuming that we have ${\widehat{\mathbf{X}}}_{\mathrm{a}}$ such that ${\widehat{\mathbf{X}}}_{\mathrm{a}}{\widehat{\mathbf{X}}}_{\mathrm{a}}^{\top}\approx {\mathbf{P}}^{\mathrm{a}},$ the numerical complexity of computing the second term of Eqs. (34,35) becomes $\mathcal{O}\left({N}_{\mathrm{e}}{N}_{\mathrm{r}}{N}_{\mathrm{x}}{N}_{\mathrm{b}}\right)$ ($\mathcal{O}\left({N}_{\mathrm{e}}{N}_{\mathrm{r}}{N}_{\mathrm{x}}\hspace{0.17em}\text{ln}\hspace{0.17em}{N}_{\mathrm{x}}\right)$) in case (i) (case (ii)), respectively. However, note that these computations can be embarrassingly parallelised, easily alleviating the cost by a factor of ${N}_{\mathrm{e}}$ or ${N}_{\mathrm{r}}$ on a parallel computer.

Note that Eqs. (9, 10, 11) may be irrelevant in computing the required ${\widehat{\mathbf{X}}}_{\mathrm{a}}$ since they do no strictly represent a mode expansion of ${\mathbf{P}}^{\mathrm{a}}.$ Instead, a systematic, variance-driven, expansion of ${\mathbf{P}}^{\mathrm{a}},$ as studied in Farchi and Bocquet (**2019**) would be required. The alternative is to use the modulation by Bishop and Hodyss (**2009**). But it could yield a substantially larger ${N}_{\mathrm{r}}$ and might be numerically costly.

#### Local evaluation of ${\mathbf{P}}^{\mathrm{a}}$

If the observations are assumed to be local, i.e. each one of them is only correlated to nearby model variables, then the main diagonals of ${\mathbf{P}}^{\mathrm{a}}$ can be estimated using local approximations, in a way similar to the strategy followed by the LETKF. Indeed, the LETKF is able to estimate rows or columns of ${\mathbf{P}}^{\mathrm{a}}$ using local analyses. Denoting ${\mathbf{P}}_{n}^{\mathrm{a}}\equiv {\left[{\mathbf{P}}^{\mathrm{a}}\right]}_{n}$ as the *n*-th column of ${\mathbf{P}}^{\mathrm{a}},$ one has

*n*and where ${\mathbf{R}}_{n}^{-1}$ is the tapered precision matrix with respect to site

*n*.

Hence, the evaluation of ${\mathbf{P}}_{n}^{\mathrm{a}}$ is of complexity $\mathcal{O}\left({N}_{\mathrm{b}}{N}_{\mathrm{e}}^{2}+{N}_{\mathrm{e}}^{3}\right),$ so that the evaluation of the entries of ${\mathbf{P}}^{\mathrm{a}}$ required in the evaluation of $\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{a}}$ is $\mathcal{O}\left({N}_{\mathrm{x}}{N}_{\mathrm{b}}{N}_{\mathrm{e}}^{2}\right),$ and a factor less if parallelisation is enforced.

Of course, one of the primary reasons for using covariance localisation is its ability to assimilate non-local observations. Hence, the assumption of locality made here defeats one of the key purpose of using covariance localisation. Nonetheless, we shall see that even with local observations, the update scheme developed in Section 3.2 can be beneficial.

## Numerical experiments

### Properties of the new perturbations

At first, we are interested in comparing the shape of the updated perturbations from a standard scheme compared to those of the new scheme. We also wish to explore how much can $\mathcal{L}({\mathbf{X}}_{\star})$ be rendered small, i.e. if there exists ${\mathbf{X}}_{\star}$ such that $\mathit{\rho}\xb0\left({\mathbf{X}}_{\star}{\mathbf{X}}_{\star}^{\top}\right)\approx {\mathbf{P}}^{\mathrm{a}}.$ To that the end, we first consider a (random) Gaussian model of covariance $\mathbf{B}$ over a periodic one-dimensional domain for which ${N}_{\mathrm{x}}=400.$ The vector $\mathit{\sigma}$ of the standard deviations of $\mathbf{B}$ is obtained from a random draw from a log-normal distribution with Gaussian covariance matrix of correlation length ${L}_{\mathrm{v}}=10.$ The correlation matrix $\mathbf{C}$ associated to $\mathbf{B}$ is built from the piece-wise rational Gaspari–Cohn (Gaspari and Cohn, **1999**) function (hereafter referred to as the GC function) with correlation length parameter ${L}_{\mathrm{c}}=10.$ From these definitions, we have $\mathbf{B}=\mathbf{\Sigma}\mathbf{C}\mathbf{\Sigma}$ where $\mathbf{\Sigma}=\text{diag}(\mathit{\sigma})$ is the diagonal matrix of the standard deviations.

We compare the shape of ${N}_{\mathrm{e}}=8$ perturbations, whose sample covariance matrix may be regularised using a correlation matrix $\mathit{\rho}$ built with the GC function with a localisation length parameter ${L}_{\mathit{\rho}}=10.$ The perturbations are generated by

- random draws ${\mathbf{X}}_{\mathrm{e}}$ from the covariance matrix $\mathbf{B}.$ We associate to them the sample covariance matrix ${\mathbf{P}}^{\mathrm{e}}={\mathbf{X}}_{\mathrm{e}}{\mathbf{X}}_{\mathrm{e}}^{\top}$ and its regularised counterpart ${\mathbf{P}}_{\mathit{\rho}}^{\mathrm{e}}=\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{e}};$
- extracting the main ${N}_{\mathrm{e}}$ modes, $\widehat{\mathbf{X}}$ of $\mathbf{B}.$ We associate to them the sample covariance matrix $\widehat{\mathbf{P}}=\widehat{\mathbf{X}}{\widehat{\mathbf{X}}}^{\top},$ and its regularised counterpart ${\widehat{\mathbf{P}}}_{\mathit{\rho}}=\mathit{\rho}\xb0\widehat{\mathbf{P}};$
- extracting ${N}_{\mathrm{e}}$ modes using the new scheme, ${\mathbf{X}}_{\star}={S}_{\mathit{\rho}}^{\star}\left(\mathbf{B}\right).$ We associate to them the sample covariance matrix ${\mathbf{P}}^{\star}={\mathbf{X}}_{\star}{\mathbf{X}}_{\star}^{\top},$ and its regularised counterpart ${\mathbf{P}}_{\mathit{\rho}}^{\star}=\mathit{\rho}\xb0{\mathbf{P}}^{\star}.$ The starting point of the minimisation (first guess) is chosen to be $\widehat{\mathbf{X}}.$

Figure 2 displays, for a single realisation of the covariance model, the true covariance model $\mathbf{B},$ the sample covariance matrices ${\mathbf{P}}^{\mathrm{e}},\widehat{\mathbf{P}},{\mathbf{P}}^{\star},$ and the regularised sample covariance matrices ${\mathbf{P}}_{\mathit{\rho}}^{\mathrm{e}},{\widehat{\mathbf{P}}}_{\mathit{\rho}}$ and ${\mathbf{P}}_{\mathit{\rho}}^{\star}.$

For the same realisation, Fig. 3 displays the perturbations ${\mathbf{X}}_{\mathrm{e}},\widehat{\mathbf{X}},$${\mathbf{X}}_{\star}.$ We also consider a second optimal solution where the first guess is ${\mathbf{X}}_{\mathrm{e}},$ which yields another set of perturbations, ${\mathbf{X}}_{\u2022}$ in order to investigate the dependence on the starting point of the minimisation.

It is clear from Fig. 2 that ${\mathbf{P}}^{\star}$ seems unphysical with rather long-range correlations, but that ${\mathbf{P}}_{\mathit{\rho}}^{\star}$ is, as a result of its construction, a remarkably close match to $\mathbf{B}.$$\widehat{\mathbf{P}}$ seems a rather good approximation of $\mathbf{B}.$ However, it is clear that ${\widehat{\mathbf{P}}}_{\mathit{\rho}}$ has a thinner structure along the diagonal than $\mathbf{B},$ which can be seen as a manifestation of the double application of localisation. These visual impressions on a single realisation are confirmed by computing the Frobenius norm of the difference between the true covariance matrix $\mathbf{B}$ and either the sample covariance matrix or the regularised sample covariance matrix. The norm is averaged over 10^{3} realisations. The results are reported in Table 1. In particular, either ${\mathbf{P}}_{\mathit{\rho}}^{\star}$ or ${\mathbf{P}}_{\mathit{\rho}}^{\u2022}$ are a close match to $\mathbf{B},$ and their residual discrepancy to $\mathbf{B}$ as measured by these matrix norms are very small and similar, though not identical.

As seen in Fig. 3 the perturbations $\widehat{\mathbf{X}}$ are rather local and peaked functions, which could be expected since they represent the first EOFs of $\mathbf{B}.$ The perturbations ${\mathbf{X}}_{\star}$ obtained with the new scheme starting with $\widehat{\mathbf{X}}$ are much broader functions with a larger support. This is due to the weaker constraints imposed on these modes. However, they remain partially localised, in that they partly vanish on the domain. The perturbations ${\mathbf{X}}_{\u2022}$ obtained with the new scheme but starting with ${\mathbf{X}}_{\mathrm{e}}$ are also broad functions. However, as opposed to ${\mathbf{X}}_{\star},$ they do not partially vanish, and are barely local. This shows that ${S}_{\mathit{\rho}}(\mathbf{B})$ indeed represents a set of potentially distinct solutions and that the solution to which the minimisation converges captures traits of the starting perturbations.

### Accuracy of the scheme

#### Lorenz–96 model

The performance of the new scheme is tested in a mildly nonlinear configuration of the discrete 40-variable one-dimensional Lorenz–96 (L96) model (Lorenz and Emanuel, **1998**), with the standard forcing *F* = 8. The corresponding ordinary differential equations defined on a periodic domain are for $n=1,\dots ,{N}_{\mathrm{x}}=40:$

We consider twin experiments where synthetic observations are generated from the true model trajectory every $\Delta t=0.05.$ The observation operator is chosen to be $\mathbf{H}={\mathbf{I}}_{\mathrm{x}};$ in particular, the model is fully observed. The observation errors are Gaussian with distribution $\mathcal{N}(\mathbf{0},\mathbf{R})$ and observation error covariance matrix $\mathbf{R}={\mathbf{I}}_{\mathrm{x}}.$ A sparse observation network configuration will be studied in Section 4.3.

We test the following data assimilation schemes:

- The standard LETKF as defined by Hunt et al. (
**2007**). - The LEnSRF as defined in Section 2.3.2. The ${\mathbf{T}}_{\mathrm{x}}$ matrices are computed exactly in this low-order setup. Section 2.4 would be used for higher dimensional models.
- The new LEnSRF with the new updating scheme. The ${\mathbf{P}}^{\mathrm{a}}$ matrices are computed exactly in this low-order setup. The strategies defined in Section 3.5 would be used for higher dimensional models. We choose to start the minimisation of $\mathcal{L}(\mathbf{X})$ from the background perturbations, the natural incremental standpoint.

When ${N}_{\mathrm{e}}-1\u2a7d14,$ which corresponds to the size of the unstable and neutral subspace of this model, localisation is mandatory to avoid divergence of the filters (Bocquet and Carrassi, **2017**). The localisation function used to build the localisation matrix for covariance localisation (LEnSRF) or for tapering the observation error precision matrix (LETKF) is the GC function. In order to achieve a good (though approximate) match between the LETKF and the LEnSRF, the tapering of the perturbations in the LETKF, known to be equivalent to the tapering of the precision matrix, is carried out using the square root of the GC function (Sakov and Bertino, **2011**). The performance of the algorithms are mainly assessed by the time-averaged root mean square error (RMSE) between the analysis and the truth. The multiplicative inflation (in the range $\lambda \in [1,1.08]$), which is applied to the prior perturbations, and the localisation radius (in the range $r\in [4,38]$ sites) are optimally tuned so as to yield the lowest RMSE. Random rotations are applied after each update (Sakov and Oke, **2008**b). It does marginally improve the RMSE scores for large values of ${N}_{\mathrm{e}}.$

For each configuration, 10 data assimilation experiments are run. Each run is $2\times {10}^{4}$ cycle-long after a spin-up of $2\times {10}^{3}$ cycles. All statistics are averaged over these 10 runs. The results are displayed in the left column of Fig. 4.

First, the LETKF and the LEnSRF show similar RMSEs, and optimal inflation for all ensemble sizes, but the LETKF has the edge for both the RMSE and the inflation. The optimal localisation lengths for the three schemes are similar, in particular thanks to the approximate correspondence between the way the observation precision matrix is tapered in the LETKF and the way the background error covariance is tapered in the LEnSRF. Nonetheless the localisation length of the traditional LEnSRF is smaller than the other two EnKFs, especially for larger ensemble sizes.

Second, the new LEnSRF with the new update shows lower RMSEs, and significantly lower optimal inflation than the other two schemes. The improvement in the RMSE is in the range $3\%-6\%,$ which is significant in these very well-tuned and documented configurations, where such gain is very difficult to obtain.

Focusing on the multiplicative inflation requirement, we have computed the RMSE as a function of the inflation, with the localisation length optimally tuned so as to minimise the RMSE, for three ensemble sizes ${N}_{\mathrm{e}}=4,8,16.$ The results are plotted in the left column in Fig. 5.

It shows that the requirement of the new LEnSRF for inflation is actually very small. In the case ${N}_{\mathrm{e}}=8,16$ inflation is barely needed, while the extreme case ${N}_{\mathrm{e}}=4$ does show a need for inflation but much smaller than that of the LEnSRF and LETKF. This points to the robustness of the new LEnSRF.

By construction, ${S}_{\mathit{\rho}}\left({S}_{\mathit{\rho}}^{\star}({\mathbf{P}}^{\mathrm{a}})\right)$ as implicitly relied upon in the new LEnSRF is a better match to ${\mathbf{P}}^{\mathrm{a}}$ than ${S}_{\mathit{\rho}}\left({\mathbf{X}}_{\mathrm{a}}\right)$ where ${\mathbf{X}}_{\mathrm{a}}$ is defined by Eq. (9) as used in the LEnSRF. This might explain the lesser requirement for multiplicative inflation.

We speculate that this lesser need for multiplicative inflation in the new LEnSRF may also be interpreted as a reduced imbalance of the updated perturbations. If true, this implies that for the L96 model in this standard setup, the residual inflation required in the LETKF and LEnSRF does not so much originate from the sampling errors but from the imbalance generated by localisation. This, however, can only be validated on physically more complex, $2-$ or $3-$dimensional models, beyond the scope of this paper.

#### Kuramoto–Sivashinsky model

We have performed similar experiments with the Kuramoto–Sivashinsky (KS) model (Kuramoto and Tsuzuki, **1975**, **1976**; Sivashinsky, **1977**). It is defined by the partial differential equation

**2005**) with the time step $\delta t=0.50$ in time unit of the KS model. Synthetic observations are collected every $\Delta t=1$ on all collocation grid points. The observation operator is chosen to be $\mathbf{H}={\mathbf{I}}_{\mathrm{x}};$ in particular, the model is fully observed. The observation errors are Gaussian with distribution $\mathcal{N}(\mathbf{0},\mathbf{R})$ and observation error covariance matrix $\mathbf{R}={\mathbf{I}}_{\mathrm{x}}.$ Like for the L96 model experiments, the localisation matrix used in either the LEnSRFs or the LETKF is built from the GC function, and random rotations are applied after each update. The performance of the algorithms are assessed by the time-averaged analysis RMSE as well. The multiplicative inflation (in the range $\lambda \in [1,1.16]$) and the localisation radius (in the range $r\in [10,80]$ sites) are optimally tuned so as to yield the best RMSE.

For each configuration, 10 data assimilation experiments are run. Each run is $2\times {10}^{4}$ cycle-long after a spin-up of $2\times {10}^{3}$ cycles. All statistics are averaged over these 10 runs. Note that for ${N}_{\mathrm{e}}-1\u2a7d14,$ which corresponds to the size of the unstable and neutral subspace of this model, localisation is mandatory to avoid divergence of the filters. The results are displayed in the right column of Fig. 4.

The results are very similar to those of the L96 model. The LEnSRF with the new update scheme outperforms the other two schemes, with a much lower need for inflation, and optimal localisation lengths similar to that of the LEnSRF without the new update scheme. For this model, the optimal localisation length for the LETKF is however larger than for both LEnSRFs.

The requirement for multiplicative inflation is further studied similarly to the L96 case. The right column of Fig. 5 shows the RMSE as a function of multiplicative inflation for optimally tuned localisation length and for ${N}_{\mathrm{e}}=4,8,16.$ Again, it shows that the need for inflation is substantially reduced and not really needed for ${N}_{\mathrm{e}}=8,16,$ and even ${N}_{\mathrm{e}}=4,$ demonstrating the robustness of the new LEnSRF.

### Sparse and infrequent observations

Localisation schemes can behave differently in presence of sparse and inhomogeneous observations. Moreover, we have conjectured that the new perturbations update scheme could generate an ensemble with less imbalance (closer to the attractor), which could be evidenced with longer forecasts in the EnKF than those considered so far. Hence, in this section, we consider:

- A first set of experiments where the state vector entries are uniformly and randomly observed with a fixed density ${N}_{\mathrm{y}}/{N}_{\mathrm{x}},$ which is varied from 0.25 to 1. Specifically, at each observation time, ${N}_{\mathrm{y}}$ grid cells are randomly selected (without replacement) over the total ${N}_{\mathrm{x}}$ grid cells and the observation operator $\mathbf{H}$ directly yields the value of the state vector at each of these grid cells. The observations are collected every $\Delta t=0.05$ time unit.
- A second set of experiments where the observations are spatially densely observed (
$\mathbf{H}={\mathbf{I}}_{\mathrm{x}}$) but with a fixed time step which is varied from
$\Delta \mathrm{t}=0.05$ to the much less frequent
$\Delta \mathrm{t}=0.40.$ For such long forecast, the more accurate local iterative ensemble Kalman filter (IEnKF) would yield better RMSEs (Bocquet,
**2016**), but applying the new update scheme to the IEnKF with localisation is outside the scope of this paper.

We choose to focus on the L96 model and an ensemble size of ${N}_{\mathrm{e}}=8$ and $\mathbf{R}={\mathbf{I}}_{\mathrm{y}}.$ In both experiments, the localisation length is optimally tuned so as to minimise the RMSE.

For the first set of experiments, we plot in Fig. 6 the time-averaged analysis RMSE (left panel) and the optimal inflation (right panel) required to minimise this RMSE as a function of the fixed density of observations ${N}_{\mathrm{y}}/{N}_{\mathrm{x}},$ for the three EnKFs considered in the previous experiments. The localisation length is optimally tuned so as to minimise the RMSE.

The results are very similar to those obtained in the previous subsection: the new LEnSRF scheme yields a typical 5% improvement in the RMSE, while using a significantly lower multiplicative inflation. In the left panel of Fig. 8, the RMSEs of the three schemes for ${N}_{\mathrm{e}}=8,\Delta t=0.05$ and ${N}_{\mathrm{y}}/{N}_{\mathrm{x}}=0.50,$ are plotted as a function of the multiplicative inflation, while the localisation length is optimally tuned so as to minimise the RMSE. Again, this emphasises the little need for multiplicative inflation of the new LEnSRF.

For the second set of experiments, we plot in Fig. 7 the time-averaged RMSE (left panel) and the optimal inflation (right panel) required to minimise this RMSE as a function of the time interval between observations $\Delta t,$ for the three considered EnKFs. Again, the new LEnSRF yields smaller RMSEs than the classical LEnSRF and the LETKF. As $\Delta t$ increases, the multiplicative inflation required to compensate for the error generated by sampling errors increases too. This is known to be due to the increased nonlinearity of the forecast (Bocquet et al., **2015**; Raanes et al., **2019**). The optimal multiplicative inflation required by the new LEnSRF does increase with $\Delta t$ but remains significantly smaller than the one required by the other two EnKFs. Differently from the previous numerical experiments, the LETKF outperforms the classical LEnSRF and its RMSE curve gets closer to that of the new LEnSRF with larger $\Delta t.$ This supports our claim made in Sect. 3.1 that the LETKF might generate better forecast ensembles.

In the right panel of Fig. 8, the RMSEs of the three schemes for ${N}_{\mathrm{e}}=8,\Delta t=0.20$ and ${N}_{\mathrm{y}}/{N}_{\mathrm{x}}=1,$ are plotted as a function of the multiplicative inflation, while the localisation length is optimally tuned so as to minimise the RMSE. This shows that the new LEnSRF can yield good RMSE scores even with small inflation factors and for longer forecasts.

We have also computed the ratio of the analysis RMSE over the ensemble spread, as $\Delta t$ is increased, the multiplicative inflation and localisation length being tuned so as the minimise the RMSE. The new LEnSRF and the LETKF behave quite similarly with a ratio progressively increasing from 1 to 1.10 when $\Delta t$ goes from 0.05 to 0.40. Quite differently, the classical LEnSRF shows a ratio that increases from 1 to 1.30 when $\Delta t$ goes from 0.05 to 0.40. Again, this supports the idea that the forecast ensembles of the new LEnSRF and the LETKF are of better quality than those of the classical LEnSRF.

Note that we have also considered time-averaged forecast RMSE and spread for a range of forecast lead times. They follow the same trend as the analysis RMSE and analysis spread but are progressively amplified with increasing lead time.

All of these experiments have also been conducted with the KS model. The results are qualitatively very similar and yield the same conclusions for both the sparse and infrequent observation experiments.

## Conclusions

In this paper, we have looked back at the perturbation update scheme in the EnKFs based on covariance localisation. We have argued that updated perturbations in the local EnKFs based on covariance localisation do not represent the main modes of the analysis error covariance matrix, in contrast to the updated perturbations of the LETKF. In particular, we have focused on the LEnSRF. We have explained why Eq. (9) still is, on theoretical grounds, a good substitute for generating these perturbations.

Using these considerations, we have proposed a perturbation update scheme potentially more consistent in the sense that the perturbations $\mathbf{X}$ are related to the error covariance matrix by $\mathbf{P}\approx \mathit{\rho}\xb0\left(\mathbf{X}{\mathbf{X}}^{\top}\right)$ throughout the EnKF scheme. It consists in getting one solution of the minimisation problem Eq. (21). The updated perturbations are expected to be more accurate in forming short spatial separation sample covariances because less constraints are exerted on large separation sample covariances. Since we can compute the gradient of the function to be minimised, the solution can be obtained using an off-the-shelf quasi-Newton algorithm. The evaluation of the function and its gradient requires knowledge of $\mathit{\rho}\xb0{\mathbf{P}}^{\mathrm{a}},$ hence a partial knowledge of ${\mathbf{P}}^{\mathrm{a}},$ which is one difficulty of the method. Depending on the problem, its geometry and dimension, such knowledge could be obtained through mode expansion or through local estimations of ${\mathbf{P}}^{\mathrm{a}}.$

We have tested this idea and defined a new LEnSRF with the new perturbation update scheme. We have compared it numerically to the LETKF and to a vanilla LEnSRF based on an implementation of Eq. (9), using two low-order one-dimensional models: the discrete 40-variable Lorenz–96 model and a 128-variable spectral discretisation of the continuous Kuramoto–Sivashinsky model. We have shown that for both models, the requirement for residual multiplicative inflation still needed in spite of localisation is much weaker with the new LEnSRF than with both the LETKF and the LEnSRF. For large enough ensemble sizes, the new LEnSRF actually performs very well without any inflation. This weaker requirement for inflation stems from a better consistency of the analysis error covariance matrix as inferred by the updated perturbation to the actual one. We conjecture that it could be physically interpreted as a much weaker imbalance generated by the new update scheme. Moreover, there is an accuracy improvement of up to 6% in the analysis RMSE in mildly nonlinear conditions, which is significant in these very well-tuned configurations. The RMSE/spread score is shown to be closer to 1 for the LETKF and the new LEnSRF than for the vanilla LEnSRF. These results have been confirmed and further strengthened in sparse and infrequent observation network configurations.

We plan on testing this new scheme on two-dimensional models and more sophisticated physics. We also plan to study the potential benefit of such update scheme in an hybrid setup (i.e. using hybrid covariances).