A fundamental challenge in data assimilation lies in the computationally intractable process of estimating and updating the error covariance for high dimensional problems. Various methodologies, from ensemble-based methods to approaches of hybrid error covariance, have been extensively studied in the literature (Houtekamer and Zhang, 2016). Due to the limitation of computational power, the error covariance has to be estimated based upon a relatively small number of ensemble members. Rank deficiency, underestimation, and sampling errors are some notable drawbacks of these approaches. As a result, additional numerical treatments, such as covariance localisation and variance inflation, are required to improve the stability of algorithm and estimation accuracy. Some key parameters are critical for the effectiveness of these numerical treatments, for instance the radius of localisation and the magnitude of inflation. These parameters are estimated based on physical and statistical grounds and are validated through numerical experimentations (Bannister, 2008a, 2008b; Ménétrier et al., 2015).
In Gaspari and Cohn (1999), the magnitude and decay of error covariances are quantitatively outlined using correlation functions. The result has been widely used for the localisation of ensemble Kalman filters (EnKFs). Based on Schur product theorem, the element-wise multiplication of two positive definite matrices guarantees that the resulting symmetric matrix is a positive definite matrix itself. However, the underlying relationship between the parameter in a correlation function and the system model to be estimated is not clear. In this paper, we study some closely related subjects, the magnitude and decay of elements in error covariance, but taking a different approach. We derive and prove some quantitative characteristics of error covariance that are determined by various parts in a system model, including its dynamic model, the observation model and the covariances of random variables representing model uncertainties and observation error. The quantitative characteristics studied in this paper include the peak value and its location in the matrix, the decay rate from peak to bottom, the approximate sparsity pattern, etc.
The contributions of this paper consist of three parts. (i) We mathematically prove a theorem and several propositions on the quantitative characteristics of error covariance for linear Kalman filters. The results unveil some interconnections between system models and error covariance. (ii) We develop computational methods so that quantitative characteristics can be numerically computed. (iii) We demonstrate some potential applications of the discoveries through three examples. In the first example, we sketch the outline shape of error covariance using its peak and bottom value as well as the curve of a decay function. The method’s goal is not about replacing the need of estimating the error covariance in a process, such as EnKF or 4D-Var. Instead, the outline shape provides information that helps in the validation of an estimated error covariance if the full matrix is intractable or if its estimation is based on a relatively small ensemble size. In the second example, the quantitative characteristics of error covariance are used as an indicator of the upper bound for the localisation radius of an EnKF. In the third example, the results proved for linear systems are tested using a nonlinear model, the shallow water equations.
The theoretical approach in this paper is based on the use of the controllability Gramian of dynamical systems (see, for instance, Kailath (1980)). The concept has been widely used in control theory to measure the impact of a system’s inputs, such as the model uncertainty in this study, on the system’s trajectories. We prove that the error covariance of the Kalman-Bucy filter equals the difference between two controllability Gramians, in addition to the propagation of the initial background error covariance. This relationship leads to a matrix upper bound of the error covariance. Based upon the concept of duality in the control theory, each element in the upper bound matrix can be computed individually without the need of estimation of the entire error covariance matrix of a Kalman filter. This property makes the computational algorithms in this paper being component-based, i.e. the quantitative characteristics of error covariance can be computed over any given component in the matrix, such as a submatrix, a block of any shape, a row/column, etc. It is also proved that, if the observation measures a state variable, some elements in error covariance satisfy an inequality constraint.
In Section 2, the concept of controllability Gramian is introduced. The relationship between error covariance and controllability Gramian is proved. Then a matrix upper bound of error covariance is added. In Section 3.1, some quantitative characteristics of controllability Gramian, such as the location of peak covariance and the decay rate from peak to bottom, are proved. The tedious algebraic derivations in the proof are included in the Appendix at the end of the paper. In Section 3.2, an algebraic inequality of error covariance deduced from the observation model is proved. In Section 4, we introduce two computational algorithms that are component-based. Section 5 contains three examples demonstrating how to collectively apply the theorems and computational algorithms from previous sections to outline the features of the error covariance.
Error covariance and controllability Gramian
Shown in Figure 1 are two error covariance matrices of Kalman filters, one for a linear system of ordinary differential equations and the other for a Discretised shallow water equation. The x- and y- axes represent row and column indices, i and j; the z-axis represents the absolute value of the covariance, In this way, a matrix is visualised as a 3D graph. In this paper, the geometric outline of the 3D graph is referred as the shape of error covariance. Both graphs in Figure 1 have similar geometric patterns in which a large number of elements has relatively small value and the covariance has its peaks along one or multiple diagonal lines. Relative to their peak value, both matrices are approximately sparse. In this paper, the shape of error covariance is studied using a set of quantitative characteristics that include, but not limited to, the following list of quantifiable geometric properties:
- the peak value and its location,
- the decay rate from peak to bottom,
- inequality constraints of covariance elements.
For high dimensional problems, the true error covariance of a Kalman filter is computationally intractable. In mathematical analysis, a widely used idea in the estimation of an unknown variable is to find a computationally feasible upper/lower bound. In this section, we prove a relationship between the error covariance of Kalman-Bucy filters and two controllability Gramians. As a corollary of the relationship, a matrix upper bound of the error covariance is found. An advantage of this matrix upper bound is that the quantitative characteristics of its shape are computationally tractable. Let’s consider a dynamical system model defined by ordinary differential equations (ODEs)
The controllability Gramian of the system
The Gramian, a symmetric positive semidefinite matrix, is a quantitative measure of the sensitivity of system trajectories to the variation of the system’s input. If the Gramian has a set of large eigenvalues, then x(t) can reach any given state using a control input that has a relatively small L2-norm. The expressions in (3) and (4) are summarised in the following propositions.
The error covariance, P(t), of the Kalman-Busy filter (2) satisfies,
The error covariance, P(t), of the Kalman-Busy filter (2) has an upper bound matrix
Propositions 1 and 2 can be interpreted as follows. In addition to the propagation of P(0), P(t) is determined by the controllability Gramian of the system model subtracting the controllability Gramian of the Kalman filter. If we consider the model uncertainty, w, as a control variable, then its controllability Gramian represents the worst estimation error without any error correction. On the other hand, the controllability Gramian of the Kalman-Bucy filter represents the reduction of error covariance in the filtering process. A more controllable filter results in more aggressive error correction, thus smaller estimation error. Among the three terms in (9), we focus on the the second term, the controllability Gramian of the original system (1). However, the same computational algorithm that we use for the estimation of GC is also applicable to the propagation of initial error covariance, the first term in (9). It is summarised in Proposition 6. In this paper, we do not address the computation of the 3rd term in (9). Although this is a very important term that reduces the matrix upper bound using observation model, an efficient computational algorithm for this term in high dimensional spaces has yet to be found.
The inequality (10) has a discrete-time version. Consider the system model,
whereis the (discrete-time) controllability Gramian
If the space dimension is high, the controllability Gramian is computationally intractable. However, there exist numerical methods that can compute individual elements in the Gramian, i.e. component-based algorithms. They are introduced in Section 4, including Proposition 5 for linear systems, Proposition 6 for the propagation of the initial error covariance, and (34) for nonlinear systems.
Some quantitative characteristics of the shape of error covariance
In this section, we prove a decay rate for and a set of algebraic constraints for the error covariance of Kalman filters.
The decay rate of
Some quantitative characteristics of the shape of can be mathematically proved and numerically computed. The main result in this section is inspired by the approximate sparsity pattern of exponential matrices proved in Iserles (2000). The following function, is used repeatedly in the analysis,
Suppose that α and x are positive numbers. Thenis an increasing function ofand a decreasing function of. Furthermore
In the following, a matrix A is said to be banded with bandwidth if Aij = 0 whenever If s = 0, then the matrix is diagonal.
Suppose that A andin (1) are banded. For any t > 0, there exist positive constants, α, β, γ, and L so that
Note that the inequality (17) is an upper bound that approximates the decay of as increases. It implies that peaks around the diagonal in the matrix; and the value of decreases faster than exponential if moving away from the diagonal. In this theorem, it is assumed that A is banded along the main diagonal. If A consists of multiple banded submatrices, simulations show that the error covariance peaks along corresponding sub-diagonals. In the following, the right-hand side of the equation (17) is called the decay function.
The proof of this theorem involves lengthy and tedious algebraic derivations, which is given in Appendix at the end of the paper. In (17), implies that the result is meaningful only if where n is the dimension of the state space. In fact, the results in Iserles (2000), the theoretical foundation of Theorem 1, are based upon the assumption that the bandwidth of A satisfies Then L is much smaller than n. In Appendix, an analytic estimation of the parameters in (41), i.e. α, β, and γ is proved. However, the estimated value is conservative, which leads to a decay rate much slower than that of the true matrix. In Section 4, a numerical optimisation problem is formulated in (38) that allows to fit the optimal value of the parameters to the component-based estimate of the upper bound matrix (34). Showing in an example, the estimated decay function closely follows the true decay function of as increases.
Some covariance constraints deduced from observation model
An error covariance, as a matrix, satisfies some constraints such as symmetry and positive definiteness. In the following, we prove some covariance constraints resulting from the observation model. In this section, the study is focussed on the discrete-time system (11) and its Kalman Filter (12).
Suppose the diagonal elements ofare bounded by a positive number Pmax, i.e.
Different from the matrix upper bound in Theorem 1, the algebraic inequalities in Proposition 4 provide upper bounds for the individual elements in an error covariance. For example, if the sensor that measures the ith state variable, is very accurate, then Rii is very small. In this case, (20) is a tight upper bound for
We prove (20) first. Without loss of generality, let i = 1. In this case, the first state variable, is measured, i.e.
To prove (19), let’s focus on the first row of Hk. The proof for other rows is similar. Because is not a zero vector, we assume (if it is zero, we can re-arrange the index to move a nonzero element to the first position). Define an invertible matrix
Using T as a transformation we define a new state variable
The results in previous sections describe various aspects of error covariance. When applied collectively these results together give a global picture of the shape of error covariance. To summarise, we have proved the following quantitative characteristics of the shape of P(t):
- Upper bound
- Decay rate from peak
- Covariance constraints deduced from observation
Or in general
Computing the elements in
The observability Gramian of the system
The controllability Gramian has the same dimension as the error covariance. Therefore propagating the entire matrix along a trajectory is intractable for problems that have high dimensions. However, the Gramian matrix can be computed element-by-element. In control theory, given a system with state variable x and control input u
Consider (1a) and its dual system (31).
(i) The controllability Gramian of (1a) inequals the observability Gramian of (31) at t, which is
(ii) Let ei represent the ith basis vector of. Letandbe the solutions of (31) with final statesand. Then
The formula (33) implies that individual elements in GC can be computed without the need to evaluate the entire matrix, provided that the dual system, or the adjoint model, can be numerically propagated reversely in time. Integrating the adjoint model twice, and one can compute three elements in the matrix, and
Based on Definitions 1 - 2, Part (i) is obvious. To prove (ii), Note that where ei denotes the ith basis vector. Therefore, the ith row and jth column of is the integration of the inner product of and Therefore, (33) holds, where □
The idea used in Proposition 5 is also applicable to the propagation of initial error covariance.
Suppose B0 is a matrix satisfying
Decay rate approximation
In the next, we introduce a method of approximating the parameters, in the decay function (27). Their value given in the proof of Theorem 1 (Appendix) is conservative, which results in a decay rate that is much slower than that of GC. If one can compute the value of a subset of elements in which is possible as proved in Proposition 5, then we can find by using curve fitting. More specifically, assume that the set of elements
Because the elements in GC can be computed individually using (33) or its Discretisation (34), one can focus on a given area or a block in a matrix to find its shape without the need to know the covariance value outside the block, thus the term ‘component-based.’ Three examples are given in Section 5. The main computational steps are summarised as follows.
- Partition a large matrix into blocks shown in Figure 2. The elements in a matrix are represented by ‘*’ in the plot. A partition of the matrix consists of blocks of elements, which are represented by shaded areas with different colours. The partition is shown in the figure for the upper half of the matrix. The lower half has the same value due to the symmetry of error covariance.
- The peak value in each block is the average of its diagonal. The computation of diagonal elements is based on (34), the Discretisation of (33).
- Decay function is estimated using (38). To apply this formula of curve fitting, a set of elements in GC (represented by circles in Figure 2) is computed using (34).
Note that the results in Section 3.2, i.e. covariance constraints (28) deduced from observation model, are not included in these steps. The constraints can be applied to individual rows/columns separately in addition to the computed decay rate. It tends to be a more accurate upper bound but applicable to only special rows determined by the observation model.
In the following, we give three examples in which the quantitative characteristics in Section 3 are computed to approximate the upper bound matrix of the error covariance. The outline shapes of the upper bound matrix and the true error covariance are shown in plots. In the first example we demonstrate the computational algorithm. In the second example, the approximated outline shape of an upper bound matrix is used as an indicator of upper bound in the determination of localisation radius. The third example is a case study in which we explore the application of the results proved in previous sections to a nonlinear system that models a tsunami wave using the shallow water equations.
The illustrative example has n = 150 state variables. This dimension is chosen because it is small enough so that the Kalman filter is computationally tractable; and it is large enough so that one can demonstrate the sparsity of error covariance. It is defined in the form of differential equation (1). The matrices are partially shown in (39). The A matrix is banded whose bandwidth is s = 8. For many methods of PDE Discretisation, the resulting ODE system has a bandwidth less than s = 8. To generate the elements in A, we first form a matrix using random numbers with the uniform distribution in Then we add a diagonal matrix to it, where a is chosen to make sure the system is stable, i.e. placing the real parts of all eigenvalues less than zero. Assuming that model uncertainty has its impact in a local area only, the matrix q is also banded in which s = 4; thus the bandwidth of is 8. The matrix associated with observation error is r. We assume a sensor sampling rate dt = 0.125. The elements of r in the Discretised model are random numbers uniformly distributed in This range is about 2% of the standard deviation of the initial state. The observation measures x1, x50, x100, and x150. The initial error covariance is i.e. the only known information about the background error is the model uncertainty. In the simulations, the final time is tf = 10.
The σQ value above is equivalent to continuous-time white noise of model uncertainty with a standard deviation The component-based computational method makes it possible to focus on any given area in the covariance matrix. In this example we divide the matrix into ten blocks. Each block is a strip of rows above the diagonal of the matrix (Figure 2). The goal is to approximate the shape of the error covariance in each block. The number of blocks in the partition depends on various factors. Basically, increasing the number of blocks refines the approximation with the cost of higher computational load. As the first step, we apply the inequalities (28) deduced from the observation model to this problem. More specifically, we have
In the next step, we approximate the peak value of each block. We use the peak value of GC as an approximation because the impact of the initial error decays rapidly for the stable system. The computation is based on (34), the Discretisation of (33). Then the average value is used to represent the peak value over each block. This is shown in Figure 4.
Finally, we estimate the decay rate by finding the decay function of the matrix upper bound. For this purpose, it is necessary to evaluate a subset of elements for each component. As shown in Figure 2, five diagonals are evaluated for each block. Then, the solution of (38) based on the computed determines the parameter value in the decay function The computations are based on the absolute value of entries in GC. The approximated shape of error covariance is shown in Figure 5 (left). For comparison, the plot of the matrix upper bound is also shown in Figure 5 (right). The approximate shape captures the the key feature, such as peak and decay, very well.
For a more detailed comparison, the average value along the main diagonal as well as all other diagonals are shown in Figure 6 for every block. All computations are based on the absolute value of entries in covariance matrices. There are a total of ten blocks. The gap between the true error covariance and its matrix upper bound varies. Shown in Figure 7 are two enlarged plots from Figure 6. The plot on the left shows that the matrix upper bound (red) is quite close to the true error covariance (blue); and the approximated decay curve (green) is very close to both. In this block, the approximated shape is very close to the true shape of the error covariance. However, this is not always the case. The plot on the right shows a block in which the Kalman filter is very effective in making error corrections. As a result, the true error covariance is much smaller than the matrix upper bound. In this block, the matrix upper bound over estimates the true error covariance by a large amount. For improvement, one needs to compute the third term in (9), a problem that cannot be solved using the results in this paper. To summarise, the matrices of error covariance and its upper bound are computationally intractable for high dimensional problems. However, using the component-based algorithm, the decay function of the error covariance upper bound can be numerically computed over a given region.
In applications of EnKFs, finding a localisation radius is essential to the filter’s accuracy. Because an upper bound matrix decays exponentially when increases (Theorem 1), this upper bound provides valuable information about the localisation radius. The matrices, A, q, and r in this example are generated in the same way as in Example 1 with some parameter value changes. More specifically, σQ and r in (39) are assigned the following value
The value of c determines the localisation radius. The curves in Figure 8 are three correlation functions for c = 7, 15, 30.
In the simulation, we first apply the localisation with the largest radius, c = 30. Following the same approach in Example 1, we divide the covariance matrix into ten blocks. Each block is a strip of rows above the diagonal of the matrix (Figure 2). The matrix upper bound over each block at tf = 10 is computed and then compared to the average value of the EnKF error covariance along diagonals shown in Figure 2 (taking absolute value before averaging). In Figure 9, the horizontal axis in each plot represents the indices of diagonals, where the middle point is the main diagonal. The vertical axis of each plot represents the average value of the elements of error covariance along diagonals. The blue curve is the error covariance of an EnKF. The red curve is the matrix upper bound of error covariance. For problems with high dimensions, both matrices are computationally intractable. In this case, what can be computed is the approximated decay function. It is shown as the green curve in the plots. In Figure 9, Plot(1,1) (the plot located in the first row first column), Plot(1,2), Plot(2,3) and Plot(2,4) show that the average value of error covariance of the EnKF is much larger than the decay function, implying that the localisation radius is too large. Figure 10 shows the state and its EnKF estimation at t = tf, where The estimation missed all peaks located in the middle part of the plot. The relative error is greater than 0.51.
Using the correlation function with c = 15, the radius of localisation is much smaller. The computation shows that the average value of error covariance is below or close to the decay function in eight of the ten blocks. However, there are two blocks where the average value of error covariance is still much larger when it is compared to the decay function. Comparing to the previous case of c = 30, the estimated value of the final state, follows the peaks of the truth, although the filter overestimates the value at some points. The overall relative error is reduced to 0.47.
If c = 7, the localisation has the smallest radius among the three correlation functions in Figure 8. In Figure 11, all blue curves are below or close to the green line, implying that the radius of localisation is bounded by the decay function. The final state and its estimation is shown in Figure 12. It is obviously improved relative to the previous cases when the location radius is too large. In fact, the relative estimation error is less than 0.25.
In this example, the decay function provides an indicator of upper bound in the determination of localisation radius. From (9), the radius of localisation should be the radius of GC with a correction due to the term (for the simplicity in argument we assume that the impact of the initial error covariance is small). This example shows that such correction reduces the radius. Elements of the error covariance matrix resulting from a sound localisation radius are expected to lie under the estimate of the matrix upper bound.
Example 3 - the shallow water equations
The theorems proved in this paper are based on a common assumption: the system is linear. Generalising the theory to nonlinear systems is beyond the scope of this paper. Instead, the following is a case study in which we explore the application of the results proved in previous sections to a nonlinear system. It is an example adopted from Deleanu and Dumitrache (2019) in which a tsunami wave is simulated using the shallow water equations. Illustrated in Figure 13, the water wave travels a horizontal distance of m. It approaches the shore on a variable seabed depth that has a constant slope The wave propagation is modelled using the shallow water equations
The boundary conditions are
An error covariance in full dimension, 8002 × 8002, is computed using unscented Kalman Filter (UKF) (Julier et al., 2000; Julier and Uhlmann, 2004). The model uncertainty and the observation noise are introduced in the same way as that in (1),
The goal of this example is to explore if the results proved for linear systems are applicable to a nonlinear system such as the shallow water equations. We first check the results in Proposition 2 and Theorem 1. In other words, we want to check if the controllability Gramin GC provides a matrix upper bound; and if it decays at the rate that is claimed in Theorem 1. From Figure 17, the average value of (red) is located on top of the average Pij (blue). It implies that GC is an upper bound of the error covariance. A decay function (green curve) is plotted as an approximation of GC. It is computed based on four diagonals of GC in the block. For this nonlinear example, it shows that the decay curve agrees with GC (red) around the diagonal when the covariance is high. However, different from the linear examples, GC approaches zero slowly when the location is moving away from the diagonal. Nevertheless, the decay curve approaches zero anyway. As a result, the decay curve serves as a better estimate of the error covariance than GC. It is unclear if this is a general phenomenon or just a special case, a topic for further research.
In the next, we verify the results in Proposition 4, more specifically the inequalities in (20). It is a set of upper bounds determined by the observation model. As an example, we focus on the row i = 2001 and i = 2002 in The corresponding state variables, xi, are directly measured by a sensor. In Figure 18, the green line on top is the constant on the right side of the inequality (20b). The blue curve below it is the value of elements in the ith row of (only the columns are shown here). Clearly, the inequality (20b) holds for this nonlinear example. The red mark at j = 2001 in the left plot and j = 2002 in the right plot in Figure 18 is Rii, the right side of the inequality (20a). It is below the blue curve, which implies that Rii is not an upper bound of Therefore, (20a) does not hold for this nonlinear example. The difference is for i = 2001 and 0.0076 for i = 2002. In fact, this difference is relatively insignificant because the covariance at grid points without a sensor has much larger value. For instance, the row adjacent to i = 2002 in is shown in Figure 19 (green curve). Relative to it, the ith row (blue curve) is so small the different between Rii and is negligible. In other words, even though the inequality (20a) does not hold, Rii still provides a good approximation of with a small relative error. The observation model in this example is linear. We speculate that the reason why the inequality does not hold is because of the nonlinear dynamics of the system. The deep investigation of this behaviour is the subject of an on-going study.
Several quantitative characteristics are introduced and studied that can be used to outline the shape of error covariance as a 3D graph. The proved quantitative characteristics include matrix upper bounds of error covariance, the decay rate from peak to bottom of an upper bound matrix, and some inequality constraints of error covariance deduced from observation models. The concept of controllability Gramian and its computational methods play an important role in several parts of the study. These unveiled interconnections between error covariance and system models are new discoveries in the literature of Kalman filters. Although the computation of error covariance is generally intractable for high dimensional systems, computational methods developed in this paper can numerically compute the quantitative characteristics with a limited computational load. Some ideas of using the results to improve Kalman filter applications are explored in three examples, in which more questions are raised for future research than that are solved. For instance, the quantitative characteristics are collectively, but not interactively, applied. Specifically, the computational algorithm of the decay function does not use the constraints deduced from the observation model. The constraints are applied to the approximated shape of error covariance after the decay function is computed. In addition, the controllability Gramian of the Kalman-Bucy filter in (9) is a correction term. Based on the observation model, this term reduces the error of the matrix upper bound. As a problem for future research, an integration of these algorithms and constraints may lead us to computational methods of approximating error covariance with improved accuracy.