Ensemble Kalman filtering with residual nudging

Covariance inflation and localization are two important techniques used to improve the performance of the ensemble Kalman filter (EnKF) by (in effect) adjusting the sample covariances of the estimates in the state space. In this work an additional auxiliary technique, called residual nudging, is proposed to monitor and, if necessary, adjust the residual norms of state estimates in the observation space. In an EnKF with residual nudging, if the residual norm of an analysis is larger than a pre-specified value, then the analysis is replaced by a new one whose residual norm is no larger than that value; otherwise the analysis is considered a reasonable estimate and no change is made. A rule for choosing the pre-specified value is suggested. Based on this rule, the corresponding new state estimates are derived explicitly in the case of linear observations. Numerical experiments with the 40-dimensional Lorenz 96 model show that introducing residual nudging to an EnKF may improve its accuracy and/or enhance its stability against filter divergence, especially in the small-ensemble scenario.


Introduction
The ensemble Kalman filter (EnKF) (Anderson, 2001; Bishop et al., 2001; Burgers et al., 1998; Evensen, 1994; Hoteit et al., 2002; Houtekamer and Mitchell, 1998; Pham, 2001; Whitaker and Hamill, 2002) is an efficient algorithm for data assimilation in high dimensional systems. Because of its runtime efficiency and simplicity of implementation, it is receiving ever-increasing attention from researchers in various fields. In many applications of the EnKF, due to limited computational resources, one is only able to run an EnKF with an ensemble size much smaller than the dimension of the state space. In such circumstances, problems often arise, noticeably in the quality of the sample covariances, including, for instance, rank deficiency, underestimation of the covariance matrices (Sacher and Bartello, 2008; Whitaker and Hamill, 2002), and spuriously large cross-covariances between independent (or uncorrelated) state variables (Hamill et al., 2001). To mitigate these problems, it is customary to introduce two auxiliary techniques, namely covariance inflation (Anderson and Anderson, 1999) and localization (Hamill et al., 2001), to the EnKF. On the one hand, covariance inflation increases the estimated sample covariances in order to compensate for the effect of underestimation, which in fact increases the robustness of the EnKF in the sense of Luo and Hoteit (2011). On the other hand, covariance localization introduces a "distance"-dependent tapering function to the elements of the sample covariances, and smooths out the spuriously large values in them. In addition, covariance localization also increases the ranks of the sample covariances (Hamill et al., 2009).

Suppose that the background ensemble X^b_k = {x^b_{k,i}}^n_{i=1} has sample mean x̄^b_k = Σ^n_{i=1} x^b_{k,i}/n. To introduce covariance inflation to the filter, let ΔX^b_k denote the ensemble of deviations with respect to x̄^b_k, and λ ≥ 1 the inflation factor; then the members of the inflated background ensemble are given by x^b_{k,i} ← x̄^b_k + λ Δx^b_{k,i}. With the incoming observation y^o_k, one updates the sample mean ŷ^b_k and sample variance p̂^b_{yy,k} of the projected ensemble Y^b_k to their analysis counterparts, ŷ^a_k and p̂^a_{yy,k}, respectively, through the formulae in Anderson (2007, Eqs. 3.2-3.3). One can verify that the sample mean and covariance of Y^a_k are ŷ^a_k and p̂^a_{yy,k}, respectively. Also note the difference between the concepts of deviations and increments. For distinction we have used Δ to denote deviations, and δ increments.
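To make the two steps above concrete, the following is a minimal sketch in Python/NumPy. The function names are ours, and the scalar-observation analysis update uses the Gaussian-product form that Anderson (2007) gives for the observation-space statistics; the deterministic shift-and-contract construction of the analysis ensemble is one common EAKF-style choice, stated here as an assumption rather than a quote from the text.

```python
import numpy as np

def inflate_ensemble(ens, lam):
    """Multiplicative covariance inflation: scale the deviations from the
    ensemble mean by the inflation factor lam (lam = 1 means no inflation)."""
    mean = ens.mean(axis=1, keepdims=True)      # ens has shape (m_x, n)
    return mean + lam * (ens - mean)

def eakf_obs_update(yb_ens, yo, r):
    """Scalar-observation update in observation space: combine the background
    statistics (yb, pb) with the observation (yo, r), then shift and contract
    the projected ensemble so its sample mean and variance match (ya, pa)."""
    yb = yb_ens.mean()
    pb = yb_ens.var(ddof=1)
    pa = 1.0 / (1.0 / pb + 1.0 / r)             # analysis variance
    ya = pa * (yb / pb + yo / r)                # analysis mean
    ya_ens = ya + np.sqrt(pa / pb) * (yb_ens - yb)
    return ya_ens, ya, pa
```

One design point worth noting: inflation rescales deviations about the mean, so the sample mean is preserved exactly while the sample covariance is multiplied by λ².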

After the above quantities are calculated, one proceeds to update the background ensemble X^b_k to the analysis ensemble X^a_k = {x^a_{k,i} = x^b_{k,i} + δx_{k,i}}^n_{i=1}, where the increment δx_{k,i} with respect to the i-th background ensemble member x^b_{k,i} is an m_x-dimensional vector, i.e., δx_{k,i} = [(δx_{k,i})_1, ..., (δx_{k,i})_{m_x}]^T, and the j-th element (δx_{k,i})_j of δx_{k,i} is given by (δx_{k,i})_j = (p^j_{xy,k}/p̂^b_{yy,k}) δy_{k,i}, with p^j_{xy,k} being the sample cross-covariance between the j-th elements of the ensemble members of X^b_k and the members of Y^b_k.

The term x^o_k = H^T_k (H_k H^T_k)^{-1} y^o_k is the minimum-norm least-squares solution of the equation H_k x = y^o_k (Engl et al., 2000, ch. 2). We refer to x^o_k as the observation inversion hereafter. With Eq. (9), the new residual norm is no larger than β √(trace(R_k)) according to Eq. (8).
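The residual-nudging step built on the observation inversion can be sketched as follows. This is a sketch under our reading of Eqs. (8)-(9): the target residual norm β √(trace(R)), the fraction coefficient, and the function name are assumptions; the key algebraic fact (the blended residual is exactly c_k times the original one, so the target is met with equality) follows directly from H x^o_k = y^o_k.

```python
import numpy as np

def residual_nudging(xa, yo, H, R, beta):
    """If the analysis residual norm ||y_o - H x_a||_2 exceeds the target
    beta * sqrt(trace(R)), blend the analysis with the observation
    inversion x_o = H^T (H H^T)^{-1} y_o so that the target norm is met."""
    target = beta * np.sqrt(np.trace(R))
    rnorm = np.linalg.norm(yo - H @ xa)
    if rnorm <= target:
        return xa                               # residual already acceptable
    xo = H.T @ np.linalg.solve(H @ H.T, yo)     # observation inversion
    ck = target / rnorm                         # fraction coefficient in (0, 1)
    return ck * xa + (1.0 - ck) * xo
```

Because y_o − H(c x_a + (1 − c) x_o) = c (y_o − H x_a), the nudged residual norm equals c_k times the original norm, i.e., exactly the target value.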

In residual nudging we only attempt to adjust the analysis when its residual norm is deemed too large, a strategy that bears some resemblance to quality control (Gandin, 1988). The main differences between residual nudging and quality control are the following.

For instance, one may adopt the Tikhonov regularization (Engl et al., 2000, ch. 4), so that the solution in Eq. (9b) is replaced by a regularized one, following the various regularization techniques in the literature (see, for example, Engl et al. 2000).

The computation of the matrix product H^T_k (H_k H^T_k)^{-1} is a non-trivial issue in large-scale problems, and is worthy of further discussion³. In general cases where the observation operator H_k is time varying, the computational cost is comparable to that of evaluating the Kalman gain.

In terms of numerical computations, one possible choice is to apply a QR factorization (Meyer, 2001, ch. 5) to H^T_k, such that H^T_k is factorized as the product of an orthogonal matrix Q and an upper-triangular matrix U, where for notational convenience we drop the time index. With some algebra, it can be shown that the product H^T_k (H_k H^T_k)^{-1} reduces to a form involving Q and U (cf. the discussion below). Further savings are possible, for instance, when H_k is sparse (Meyer, 2001, ch. 5), or when H_k is time invariant, e.g., in a static observation network.

In the latter case, one only needs to evaluate the product once and for all.

A sufficient condition to achieve residual nudging is, for example, c_k = min(1, β √(trace(R_k)) / ‖r^a_k‖₂), with the (possibly) new estimate x̃^a_k again given by Eq. (9a).

³ For the experiments to be presented later, since the dimensions of the dynamical models are relatively low, we choose to directly compute the matrix product H^T_k (H_k H^T_k)^{-1}.

The scalar AR1 model is driven by the dynamical noise u_k, which follows the Gaussian distribution with zero mean and variance 1, and is thus denoted by u_k ∼ N(u_k : 0, 1). The observation model is described by y^o_k = x_k + v_k, where v_k ∼ N(v_k : 0, 1) is the observation noise, and is uncorrelated with u_k.

In the experiment, we integrate the AR1 model forward for 10,000 steps (integration steps hereafter), with the initial value randomly drawn from the Gaussian distribution N(0, 1), and the associated initial prior variance being 1.
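A single assimilation cycle of the KF-RN in this scalar setting can be sketched as follows. The AR(1) coefficient `a` below is a placeholder (the model equation's coefficient is not restated in this excerpt); the noise variances are 1 as in the text, and the observation operator is H = 1, so the observation inversion reduces to the observation itself.

```python
import numpy as np

# Hypothetical settings: `a` is a placeholder AR(1) coefficient; dynamical
# and observation noise variances are 1; H = 1 for the scalar observation.
a, q, r = 0.9, 1.0, 1.0

def kf_rn_step(x, p, yo, beta):
    """One forecast/analysis cycle of the scalar KF, followed by residual
    nudging applied to the posterior mean only (the variance is untouched,
    which is why the KF and the KF-RN share identical spreads)."""
    xf, pf = a * x, a * a * p + q                # forecast step
    k = pf / (pf + r)                            # Kalman gain (H = 1)
    xa, pa = xf + k * (yo - xf), (1.0 - k) * pf  # analysis step
    target = beta * np.sqrt(r)                   # residual-norm target
    rnorm = abs(yo - xa)                         # H = 1: inversion is yo itself
    if rnorm > target:
        ck = target / rnorm
        xa = ck * xa + (1.0 - ck) * yo           # nudge toward the observation
    return xa, pa
```

With a large β the nudging condition never triggers and the step reduces to a plain KF cycle; with a small β the posterior mean is pulled toward the observation so that the residual meets the target exactly.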

The average spread ŝ_k and the time mean (average) spread ŝ are defined in a way similar to their counterparts with respect to the RMSE.

The Lorenz 96 model contains a driving (forcing) term F (Lorenz, 1996). Throughout this work, we choose F = 8 unless otherwise stated.

In the full observation scenario (upper left panels), for a fixed γ, the time mean RMSEs of both filters exhibit a U-turn behaviour with respect to F, achieving their minima at F = 8. This point also appears to be valid in the other observation scenarios. On the other hand, for a fixed F, the behaviour of the filters is very similar to that reported in Figs. 5 and 6, since the role of the (possibly) mis-specified variance γ is similar to that of the observation noise level coefficient β (note, though, that γ also appears in the computation of the Kalman gain). When γ is relatively small (say γ ≤ 2), the EAKF-RN tends to perform better than the normal EAKF in terms of time mean RMSE. Moreover, the normal EAKF diverges at (F = 12, γ = 0.25), while the EAKF-RN avoids the divergence. On the other hand, when γ is relatively large (say γ ≥ 6), the EAKF-RN and the normal EAKF have almost indistinguishable performance, not only for the current experiment results, but also for those in the other observation scenarios. This is largely because mistakenly over-estimating the variance γ has an effect similar to increasing β, so that the observation inversion in Eq. (9a) becomes less influential for state estimation, and the EAKF-RN has almost the same estimate as the normal EAKF.
In the 1/2 observation scenario (upper right panels), when γ is relatively small (say γ ≤ 1), the normal EAKF tends to diverge for all F. The EAKF-RN avoids filter divergence in some of these areas, though there are still two divergence cases, spotted at F = 12, with γ = 1 and γ = 2, respectively. As γ becomes larger, the performances of the normal EAKF and the EAKF-RN are very close to each other, similar to the situation in the full observation scenario. In both the 1/4 and 1/8 observation scenarios (lower panels), there are also almost no differences between the time mean RMSEs of the two filters, although the time mean RMSE of the EAKF-RN appears to be slightly lower than that of the normal EAKF in the 1/4 observation scenario for relatively small F and γ (around the lower left corners). In the 1/4 observation scenario both filters diverge at (F = 10, γ = 0.25); otherwise neither filter diverges.
We also thank the two anonymous reviewers for their constructive comments and suggestions. One reviewer points out the similarity between residual nudging and the adaptive inflation schemes in Anderson (2007; 2009).

Consider a background ensemble X^b_k = {x^b_{k,i}}^n_{i=1} with n members. The incoming observation y^o_k is obtained from the observation system, with the dimension m_y = 1. The algorithm description below mainly follows Anderson (2007).
In what follows, we do not particularly distinguish background ensembles with and without covariance inflation through different notations. Instead, we always denote the background ensemble by X^b_k, no matter whether it is inflated or not. One can tell whether a background ensemble is inflated by checking the value of λ; e.g., λ = 1 means no inflation, and λ > 1 means covariance inflation. On the other hand, suppose that the projection of X^b_k onto the observation space is Y^b_k = {y^b_{k,i} : y^b_{k,i} = H_k x^b_{k,i}}^n_{i=1}; then one can compute the sample mean ŷ^b_k and sample variance p̂^b_{yy,k} of Y^b_k. Throughout, the function min(a, b) finds the minimum between the scalars a and b.
where α is the regularization parameter, chosen according to a certain criterion. The observation inversion in Eq. (9b) can be treated as a special case of the Tikhonov regularization solution with α = 0, while the concept of residual nudging is also applicable to the general cases with α ≠ 0, following our deduction in § 2.2². In this sense, the state estimate of the EAKF-RN can be considered as a hybrid of the original EAKF estimate and the (regularized) least-squares solution of the equation H_k x = y^o_k. This point of view opens up many other possibilities, given the various types of regularization techniques in the literature.

In this factorization, U = [U^T_{m_y}, 0^T_{(m_x−m_y)×m_y}]^T, with I_{m_x} being the m_x-dimensional identity matrix, 0_{(m_x−m_y)×m_y} the (m_x − m_y) × m_y zero matrix, and U_{m_y} a non-singular, upper-triangular m_y × m_y matrix in which all elements below the main diagonal are zero.
where Q_{m_x m_y} is the matrix comprised of the first m_y columns of Q, and the inverse U^{-1}_{m_y} of the upper-triangular matrix U_{m_y} can be computed element-by-element in a recursive way (called back substitution; Meyer 2001, ch. 5). In certain circumstances, further reduction of the computational cost and/or storage can be achieved.
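Both the Tikhonov-regularized inversion and the QR-based evaluation of the observation inversion can be sketched as follows. The identity x_α = H^T (H H^T + αI)^{-1} y_o and the reduced-QR reduction H^T = QU, H H^T = U^T U are standard linear algebra stated here under our own function names, not quotes from the text.

```python
import numpy as np

def tikhonov_inversion(H, yo, alpha):
    """Regularized least-squares solution of H x = y_o:
    x_alpha = H^T (H H^T + alpha I)^{-1} y_o; alpha = 0 recovers the
    plain observation inversion of Eq. (9b)."""
    my = H.shape[0]
    return H.T @ np.linalg.solve(H @ H.T + alpha * np.eye(my), yo)

def obs_inversion_qr(H, yo):
    """Evaluate x_o = H^T (H H^T)^{-1} y_o via a reduced QR factorization
    H^T = Q U: since H H^T = U^T U, we get x_o = Q U^{-T} y_o, where the
    triangular solve plays the role of back substitution."""
    Q, U = np.linalg.qr(H.T)           # Q: (m_x, m_y), U: (m_y, m_y) upper-tri
    return Q @ np.linalg.solve(U.T, yo)
```

When H is time invariant, the factorization (or the triangular factor) can be computed once and reused at every assimilation step.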

Numerical results in a linear scalar system

Here we use a scalar, first-order autoregressive (AR1) model driven by Gaussian white noise to investigate the performance of the Kalman filter (KF; Kalman, 1960), and that of the KF with residual nudging (KF-RN), in which residual nudging is introduced to the posterior estimate of the KF in the same way as in the EAKF. The motivation for conducting this experiment is the following. With linear and Gaussian observations, the KF provides the optimal estimate in the sense of, for instance, minimum variance (Jazwinski, 1970).

Therefore, we use the KF estimate as the reference to examine the behaviour of the KF-RN under different settings, which reveals how residual nudging may affect the performance of the KF.
The average RMSE ê_k at time instant k over M repetitions of the same experiment is thus defined as ê_k = Σ^M_{j=1} e^j_k / M (M = 20 in our setting), where e^j_k denotes the RMSE at time instant k in the j-th repetition of the experiment. We also define the time mean RMSE ê as the average of ê_k over the assimilation time window with N integration steps, i.e., ê = Σ^N_{k=1} ê_k / N (N = 10000 here).

We also use the spread to measure the estimated uncertainty associated with an estimate. To this end, let P̂_k be the estimated covariance matrix with respect to the estimate x̂_k. Then the spread s_k at time instant k is defined as

s_k = √(trace(P̂_k)/m_x). (14)
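The evaluation metrics above can be sketched as small helper functions (function names are our own; the RMSE normalization by √m_x matches Eq. (14)'s convention for the spread):

```python
import numpy as np

def rmse(x_est, x_true):
    """RMSE of an estimate for an m_x-dimensional state:
    e_k = ||x_est - x_true||_2 / sqrt(m_x)."""
    x_est, x_true = np.atleast_1d(x_est), np.atleast_1d(x_true)
    return np.linalg.norm(x_est - x_true) / np.sqrt(x_est.size)

def spread(P):
    """Spread of Eq. (14): s_k = sqrt(trace(P) / m_x)."""
    P = np.atleast_2d(P)
    return np.sqrt(np.trace(P) / P.shape[0])

def time_mean(values):
    """Average over repetitions, or over the assimilation time window."""
    return float(np.mean(values))
```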

Fig. 1 shows the time mean RMSEs of the KF-RN (dash-dot lines marked by diamonds), as functions of the noise level coefficient β, at different assimilation steps Sa. Given the different orders of magnitude of β, we adopt the logarithmic scale for the x-axes. For comparison, we also plot the time mean RMSEs of the KF (solid lines) at each Sa. Since the time mean RMSEs of the KF are independent of the choice of β, they appear as horizontal lines in the plots. However, the choice of β does influence the performance of the KF-RN. As shown in all of the plots of Fig. 1, if one adopts a small β, say β = 0.01, for the KF-RN, then the resulting time mean RMSE is higher than that of the KF. This is because such a choice may force the KF-RN to rely excessively on the observations when updating the prior estimates, such that the information contents in the prior estimates are largely ignored. As β grows, the time mean RMSE of the KF-RN decreases, and eventually converges to that of the KF when β is sufficiently large, say β ≥ 3. These results are consistent with our expectation of the behaviour of a filter equipped with residual nudging, as discussed in § 2.3. It is also of interest to gain some insight into the behaviour of the fraction coefficients c_k in the KF-RN with different β. To this end, Fig. 2 plots two sample time series of c_k in the KF-RN with β = 0.1 (upper left panel) and β = 1 (lower left panel), respectively, together with their corresponding histograms (right panels). For convenience of visualization, the assimilation time window is shortened to 1000 steps (with the observations assimilated every 4 steps). At β = 0.1, c_k tends to be relatively small, with a mean value of 0.4213 and a median of 0.3027. Among the 250 c_k values, 210 are less than 1, meaning that residual nudging is effective at those steps. A histogram of c_k is shown in the upper right panel. It indicates that c_k has a U-shaped distribution, with relatively large proportions of c_k taking values that are less than 0.2, or equal to 1. On the other hand, at β = 1, c_k tends to remain close to 1, with a mean of 0.9892 and a median of 1, and only 16 out of 250 c_k values are less than 1. These features are also manifested in the histogram in the lower right panel, where one can see that c_k largely concentrates at 1.
In Table 1 we report the minimum time mean RMSEs that the KF-RN can achieve by varying the value of β at different Sa, together with the values of β at which the minima are obtained. When Sa = 1, 2, the minimum time mean RMSEs of the KF-RN, both achieved at β = 2, are (very) slightly lower than the time mean RMSEs of the KF; and the time mean RMSEs of the KF-RN become the same as those of the KF when β ≥ 3. On the other hand, when Sa = 4, 8, the minimum time mean RMSEs of the KF-RN are identical to the time mean RMSEs of the KF, and are obtained when β ≥ 2. The reason that the KF-RN can have lower time mean RMSEs than the "optimal" KF at Sa = 1, 2 might be the following. The classic filtering theory states that the KF is optimal under the minimum variance (MV) criterion (Jazwinski, 1970); that is, taking the mean of the posterior conditional pdf as the state estimate, the KF has the lowest possible expectation of squared estimation error. Note that here the expectation is taken over all possible values of the truth (i.e., by treating the truth as a random variable). Therefore, in principle one has to repeat the same experiment a sufficiently large number of times (with randomly drawn truth) in order to verify the performance of the filters under the MV criterion. For computational convenience, though, we only repeat the experiment 20 times. Thus, in our opinion, the slight out-performance of the KF-RN might be largely attributed to statistical fluctuations.

We define x_{−1} = x_{39}, x_0 = x_{40}, and x_{41} = x_1 in Eq. (15), and construct the state vector x ≡ [x_1, x_2, ..., x_{40}]^T.

Without filter divergence occurring in either filter, the performance of the EAKF and the EAKF-RN is very close, with the time mean RMSEs of the EAKF-RN slightly lower than those of the EAKF, except at (l_c = 0.1, λ = 1.15) and (l_c = 0.1, λ = 1.25). The situation in the 1/4 observation scenario is similar. Fig. 9 shows the time mean RMSEs of the normal EAKF, and Figs. 10 and 11 show the time mean RMSEs of the filters.

One reviewer suggests conducting the experiment with respect to the KF in the AR1 model. Another reviewer points out the possible combinations of the EnKF and the regularization approaches in inverse problems. Luo acknowledges partial financial support from the Research Council of Norway and industrial partners, through the project "Transient well flow modelling and modern estimation techniques for accurate production allocation".
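As a sketch of the Lorenz 96 model used in the experiments (Eq. (15)), with the cyclic boundary conditions x_{−1} = x_{39}, x_0 = x_{40}, x_{41} = x_1 handled by index rolling: the tendency form dx_j/dt = (x_{j+1} − x_{j−2}) x_{j−1} − x_j + F is the standard one from Lorenz (1996), while the fourth-order Runge-Kutta integrator is our own choice of scheme, not one stated in the text.

```python
import numpy as np

def lorenz96_tendency(x, F=8.0):
    """dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F, with the cyclic
    boundary conditions handled implicitly by np.roll."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt, F=8.0):
    """One fourth-order Runge-Kutta integration step (assumed scheme)."""
    k1 = lorenz96_tendency(x, F)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, F)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, F)
    k4 = lorenz96_tendency(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```

A quick sanity check: the uniform state x_j ≡ F is an equilibrium of the model, so its tendency vanishes and an integration step leaves it unchanged.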

Figure 2. Sample time series of the fraction coefficients c_k in the KF-RN, with β = 0.1 and β = 1, together with the corresponding histograms.
Figure 1. Time mean RMSEs of the KF and the KF-RN as functions of the noise level coefficient β in the AR1 model, with different Sa.

Figure 3. Time mean RMSEs of the normal EAKF and the EAKF-RN, as functions of the inflation factor and the half-width, in the full and 1/8 observation scenarios.
The true states (truth) {x_k}^{10000}_{k=1} are generated by drawing samples of dynamical noise from the distribution N(0, 1) and adding them to x_k to obtain x_{k+1} at the next integration step, and so on. The synthetic observations y^o_k are obtained by adding to the model states x_k samples of observation noise from the distribution N(0, 1). For convenience of comparison, we generate and store synthetic observations at every integration step. However, we choose to assimilate them every Sa integration steps, with Sa ∈ {1, 2, 4, 8}, in order to investigate the impact of Sa on filter performance. In doing so, data assimilation with different Sa, or other experiment settings (e.g., the noise level coefficient β in the KF-RN), will have identical observations at the same integration steps. For convenience, hereafter we may sometimes use the concept "assimilation step", with one assimilation step equal to Sa integration steps. In addition, we may also call Sa the assimilation step when it causes no confusion. In the KF-RN, we also choose to vary the noise level coefficient β, with β ∈ {0.01, 0.05, 0.1, 0.5, 1, 2, 3, 4, 6, 8, 10}, in order to investigate its effect on filter performance. To reduce statistical fluctuations, we repeat the experiment 20 times, each time with randomly drawn initial value and samples of dynamical and observation noise (so that the truth and the corresponding observations are produced at random). Except for the introduction of residual nudging, the KF-RN has the same configurations and experiment settings as the KF.

We use the average root mean squared error (average RMSE) to measure the accuracy of a filter estimate. For an m_x-dimensional system, the RMSE e_k of an estimate x̂_k with respect to the true state x_k is defined as e_k = ‖x̂_k − x_k‖₂ / √m_x.
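The synthetic-data generation described above can be sketched as follows. The AR(1) coefficient `a` is a placeholder (the excerpt does not restate the model equation), and the seed is arbitrary; everything else follows the text: N(0, 1) initial value, unit-variance dynamical and observation noise, observations stored at every integration step but assimilated only every Sa steps.

```python
import numpy as np

rng = np.random.default_rng(42)      # arbitrary seed
a = 0.9                              # hypothetical AR(1) coefficient (placeholder)
n_steps = 10_000

x = rng.normal()                     # initial truth drawn from N(0, 1)
truth = np.empty(n_steps)
obs = np.empty(n_steps)
for k in range(n_steps):
    truth[k] = x
    obs[k] = x + rng.normal()        # y^o_k = x_k + v_k, v_k ~ N(0, 1)
    x = a * x + rng.normal()         # dynamical noise u_k ~ N(0, 1)

# observations exist at every integration step, but are assimilated
# only every Sa steps, Sa in {1, 2, 4, 8}
Sa = 4
assim_indices = np.arange(0, n_steps, Sa)
```

Fixing the noise realizations once and subsampling by Sa is what guarantees that runs with different Sa (or different β) see identical observations at the same integration steps.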

Table 1 reports the time mean RMSEs and spreads of the KF at different assimilation steps Sa. The time mean RMSE of the KF grows as Sa increases, indicating that the estimation accuracy deteriorates as observations become less frequent. In Table 1 we do not present the time mean spreads of the KF-RN, since (cf. Luo and Hoteit, 2011) they are in fact identical to those of the KF. This is because in the KF, the forecast and update of the (estimated) covariance matrix of the system state are not influenced by the mean estimate of the system state (Jazwinski, 1970). Since residual nudging only changes the estimate of the system state (if necessary) and nothing else, it is expected that the KF and KF-RN share the same covariance matrix. This point, however, is not necessarily true in the context of ensemble filtering in a nonlinear system. For instance, if the dynamical model is nonlinear, then the background covariance at the next assimilation time is affected by the analysis mean at the current time, such that two analysis ensembles with different sample (analysis) means but identical sample (analysis) covariances may result in different sample (background) means and covariances at the next assimilation time.

The above results suggest that it may not be very meaningful to introduce residual nudging to a Bayesian filter that already performs well. In practice, though, due to the existence of various sources of uncertainties (Anderson, 2007; Luo and Hoteit, 2011), a Bayesian filter is often sub-optimal, and is even likely to suffer from divergence (Schlee et al., 1967). In such circumstances, instead of only looking into the accuracy of a filter, it may also be desirable to take the stability of the filter into account. Through the experiments below we show that equipping the EAKF with residual nudging can not only help improve its stability, but also achieve a filter accuracy that is comparable to, and sometimes even (much) better than, that of the normal EAKF, especially in the small ensemble scenario.
As shown in Table 3, the EAKF diverges in 17 of the tested cases. We then examine the impact of residual nudging on the time mean spreads of the filters in different observation scenarios. For the full and 1/8 observation scenarios, we plot the time mean spreads of the EAKF and the EAKF-RN in Fig. 4, while for the 1/2 and 1/4 observation scenarios we report them in Tables 2 and 3, in the parentheses after the RMSE values. In all the reported cases in which the EAKF does not diverge, the time mean spreads of the EAKF-RN in general do not deviate significantly from those of the EAKF. In cases where the EAKF does diverge, the EAKF-RN may still maintain positive and finite time mean spreads. The closeness of the time mean spreads of the EAKF and EAKF-RN in the former cases, though, may depend on the experiment settings, e.g., the choice of the noise level coefficient β. From our experience, however, as long as β is reasonably large (say β ≥ 2), the time mean spread of the EAKF-RN often approaches that of the EAKF. For brevity, hereafter we do not report the spread values any more.

Table 1.
Time mean RMSEs and spreads of the KF, and the minimum time mean RMSEs (over different β) of the KF-RN, in the AR1 model with different Sa. The KF and the KF-RN have identical time mean spreads; therefore only those of the KF are presented. In the bottom row we also report the ranges of β in which the minimum time mean RMSEs of the KF-RN are achieved.

Table 2.
Time mean RMSEs (spreads) of the normal EAKF and the EAKF-RN in the 1/2 observation scenario, as functions of the covariance inflation factor and the half-width of covariance localization.

Table 3.
As in Table 2, except that it is in the 1/4 observation scenario.
Figure 12. Time mean RMSEs of the EAKF, as functions of the (possibly) mis-specified driving force F and the observation noise variance γ, in different observation scenarios.
By X. Luo and I. Hoteit