
# A dilemma of the uniqueness of weather and climate model closure parameters

## Abstract

Parameterisation schemes of subgrid-scale physical processes in atmospheric models contain so-called closure parameters. Their precise values are not generally known; thus, they are subject to fine-tuning for achieving optimal model performance. In this article, we show that there is a dilemma concerning the optimal parameter values: an identical prediction model formulation can have two different optimal closure parameter value settings depending on the level of approximation made in the data assimilation component of the prediction system. This result indicates that re-tuning of the prediction model in large-scale systems is needed not only when the prediction model undergoes a major change, but also when the data assimilation component is updated. Moreover, we advocate an accurate albeit expensive method, based on the so-called filter likelihood, for closure parameter estimation that is applicable in fine-tuning of both prediction model and data assimilation system parameters. In this article, we use a modified Lorenz-95 system as a prediction model and the extended Kalman filter and the ensemble adjustment Kalman filter for data assimilation. With this setup, we can compute the filter likelihood for the chosen parameters using the output of the two versions of the Kalman filter and apply a Markov chain Monte Carlo algorithm to explore the parameter posterior distributions.

How to Cite: Hakkarainen, J., Solonen, A., Ilin, A., Susiluoto, J., Laine, M., Haario, H. and Järvinen, H., 2013. A dilemma of the uniqueness of weather and climate model closure parameters. Tellus A: Dynamic Meteorology and Oceanography, 65(1), p.20147. DOI: http://doi.org/10.3402/tellusa.v65i0.20147
Published on 01 Dec 2013
Accepted on 19 Mar 2013. Submitted on 21 Nov 2012.

## 1. Introduction

Long-term improvements in atmospheric general circulation models (GCMs) used in numerical weather prediction and climate simulations originate from gradually refined representations of atmospheric phenomena, especially of the subgrid-scale physical processes. In these models, the processes are represented by parameterisation schemes where the subgrid-scale variability is expressed using model variables of the resolved scales. Moreover, the schemes contain so-called closure parameters. The purpose of these model parameters is to encapsulate atmospheric processes or properties that are not affordable for explicit modelling. The development process of the physical parameterisation schemes involves basically two steps: derivation of individual parameterisation schemes, and constraining of the schemes with large-scale observations (Lohmann et al., 2007). Here, large-scale observations mean observations that are available for model tuning at the GCM scale. In the first step, dedicated laboratory measurements or focused field measurement campaigns are used if a scheme cannot be derived from first principles. The second step accounts for the fact that individual schemes are often valid for small scales, while in the context of GCMs, the schemes are applied at a relatively coarse resolution over a wide range of spatial and temporal scales. Thus, the schemes of interest have to be tuned such that the model indeed simulates the key atmospheric quantities within the uncertainty of some large-scale observational constraint. In practice, the closure (or, tuning) parameters provide the necessary degrees-of-freedom so that a realistic model response can be obtained.

In GCMs, a comprehensive set of physical parameterisation schemes are assembled together and embedded in the solver of the atmospheric dynamics. Time-space truncation and numerical approximations render, however, all model components to some extent imperfect. These model imperfections lead to systematic and random simulation errors. The systematic errors appear as a climate drift in the model simulations. The geographical patterns of these errors tend to develop very early in the simulation (Jung, 2005), and the so-called systematic initial tendency errors can be used for diagnosis and attribution of model errors to some particular atmospheric processes (Klinker and Sardeshmukh, 1992). In fact, there are attempts to estimate the initial tendency model errors and thus parameterise unrepresented physical phenomena in order to reduce systematic model errors (Kaas et al., 1999; D'Andrea and Vautard, 2000). Clearly, in the model simulations, there are multitudes of errors with different sources and the errors are in mutual non-linear interaction. Thus, it is not a surprise that it is very hard to improve the general model performance by improving the realism of representation of individual physical parameterisations (Jakob, 2010). Therefore, for a given model formulation, it is natural to strive towards optimal predictive skill by tuning the available degrees-of-freedom, that is, the closure parameter values. The estimation techniques range from trial-and-error to sophisticated parameter estimation techniques (e.g. Severijns and Hazeleger, 2005; Kunz et al., 2008; Järvinen et al., 2010; Neelin et al., 2010). The closure parameter values corresponding to the optimal predictive skill naturally depend on the level of approximation of the prediction system.

There is thus a dilemma on the uniqueness of weather and climate model closure parameters: one can either anchor the parameter values to the observable truth obtained, for example, via measurements from focused field campaigns, or tune the parameter values to account for the imperfections in the prediction system. It is qualitatively obvious that these two do not necessarily coincide and a practical optimum is a compromise between the two. In this article, we demonstrate that this dilemma exists. An outline of the demonstration is as follows. We use a low-order prediction system where synthetic observations are created using a modified version of the Lorenz-95 model (Lorenz, 1995; Wilks, 2005). As a prediction model, we use the standard Lorenz-95 model where the subgrid-scale effects on resolved scales are represented by a linear bulk parameterisation containing two closure parameters. Thus, the prediction model is imperfect. Data assimilation of the synthetic observations is performed using either extended Kalman filter (EKF) (Kalman, 1960) or ensemble adjustment Kalman filter (EAKF) (Anderson, 2001). We observe that the optimal prediction model corresponds to a unique but different parameter setting depending on the choice of the data assimilation component. The conclusion is that the dilemma exists.

Since the optimal closure parameter values depend on the implementation of the prediction system, such as the choice of the data assimilation algorithm, methodology for tuning the parameters specifically for each prediction system is needed. Here, we present such a method, using a combination of the so-called filter likelihood approach, where the likelihood is formulated based on the output of the filtering methods (Singer, 2002; Hakkarainen et al., 2012) and a Markov chain Monte Carlo algorithm (Haario et al., 2001, MCMC). We show that the approach yields parameter values that are close to the ones corresponding to optimal filter accuracy. In addition, we demonstrate how the approach can be implemented via the Data Assimilation Research Testbed software environment (Anderson et al., 2009, DART).

## 2. Methods and experimentation

In this section, an introduction to parameter estimation using the filter likelihood approach is given. It is then tested using the modified Lorenz-95 model.

### 2.1. On state estimation methods

One of the most common techniques for state estimation is Kalman filtering (Kalman, 1960). The basic idea in all filters is to obtain the posterior state estimate ${x}_{k}^{\text{est}}$ and some error statistics, typically the error covariance matrix ${C}_{k}^{\text{est}}$, given the prior information ${x}_{k}^{\text{p}}$ and the observations yk at time step k.

The posterior state estimate can be transported to the next filtering time step to become the new prior state by using the forecast model. For the state vector, this is usually straightforward, but the fundamental question in filtering is how to transport the uncertainty in time. In EKF, this problem is solved by calculating the prior covariance matrix ${C}_{k}^{p}$ at time k as

(1 )
${C}_{k}^{p}={M}_{k}{C}_{k-1}^{\text{est}}{M}_{k}^{\text{T}}+{Q}_{k},$

where Qk is the model error covariance matrix and Mk is the tangent-linear forecast model. If the dimension of the state space is high, such as in large-scale atmospheric models, eq. (1) becomes impossible to compute without approximations.
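To make the prediction step concrete, eq. (1) can be sketched as below. This is a minimal illustration, not operational code: the function names and the explicit Jacobian argument are our own assumptions.

```python
import numpy as np

def ekf_predict(x_est, C_est, forecast, tangent_linear, Q):
    """One EKF prediction step: propagate state and covariance [eq. (1)].

    forecast       -- nonlinear forecast model
    tangent_linear -- its Jacobian M_k evaluated at the current estimate
    Q              -- model error covariance matrix Q_k
    """
    x_prior = forecast(x_est)
    M = tangent_linear(x_est)
    C_prior = M @ C_est @ M.T + Q  # eq. (1)
    return x_prior, C_prior
```

For a state of dimension N, the covariance propagation costs two N-by-N matrix products per step, which is why eq. (1) is infeasible without approximation in large-scale models.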

One approximative approach is ensemble filtering, where the uncertainty is transported to the next filtering time step by transporting an ensemble of states using the forecast model. The difficulty (and variety) of the ensemble methods lies in how to update the prior ensemble in the analysis step. There are many alternatives, for instance the ensemble adjustment Kalman filter (Anderson, 2001, 2010), which belongs to the family of the so-called ensemble square root filters (Tippett et al., 2003). In this study, the EAKF was chosen because it is a deterministic ensemble-based filter, that is, it does not include randomisation of the ensembles. The non-randomness is important, since in Section 2.3 we want to create a likelihood function based on the filtering outputs, and randomness in the likelihood would complicate the parameter inference (Dowd, 2011; Hakkarainen et al., 2012). Also, EAKF was chosen because it is conveniently available in the Data Assimilation Research Testbed (Anderson et al., 2009, DART) and widely applied (e.g. Schwartz et al., 2011; Torn and Davis, 2012).

In EKF, the modelling errors of the forecast model are taken into account via the error covariance matrix Qk. In EAKF (and in ensemble square root filtering in general), the neglect of unknown model errors can lead to over-confidence in the prior state and hence ignorance of the observations in the analysis, which can lead to filter divergence. This is effectively circumvented by introducing covariance inflation factors (Anderson and Anderson, 1999). In addition, the sampling errors related to the limited ensemble size (undersampling) are mostly removed by localisation (Hamill et al., 2001), where observations affect only nearby grid points.

### 2.2. On parameter estimation

Parameter estimation techniques developed for atmospheric models can be divided into two categories: online and offline methods. In online methods, the model parameters are not assumed to be static quantities but can evolve adaptively, for example, as a part of a state estimation process (e.g. Annan et al., 2005); such methods are thus computationally relatively cheap to implement. Typically, a statistical interpretation is lacking in online methods, since the rate of change of the parameters is controlled by the user. In this article, these online methods are not discussed further. Instead, we use offline methodology, where a predetermined (training) set of observations y is used for tuning the parameter values.

In Bayesian methodology (e.g. Gelman et al., 2003), the knowledge about the unknown parameters is inferred from the posterior distribution:

(2 )
$p\left(\theta \mid y\right)\propto p\left(\theta \right)p\left(y\mid \theta \right),$

which is evaluated using the prior p(θ) and the likelihood p(y|θ). The prior contains the information that we have about the parameters based on the accumulated information from the past. The likelihood function specifies how plausible the observed data are given model parameter values. Therefore, defining a proper likelihood function is the central question in parameter estimation.

The parameter estimation problem is often written in a (cost) functional form with parameters as arguments. Different numerical methods can be used for optimising the cost function and point estimates, such as the Maximum a Posteriori (MAP) estimate, can be obtained. In addition, Monte Carlo methods like MCMC can be used for producing samples from the posterior distribution instead of a single point estimate, and studying the uncertainty in the parameter values. For large-scale applications, applying MCMC is challenging, since the methods can involve thousands of repeated likelihood evaluations, but ways to improve the efficiency of MCMC for high-CPU models have been recently developed (Solonen et al., 2012).

### 2.3. Likelihood via filtering methods

In this section, the filter likelihood concept is discussed in the context of the extended Kalman filter and the ensemble adjustment Kalman filter. The approach is known in the parameter estimation of stochastic models (e.g. Singer, 2002), but less studied in connection with deterministic, chaotic systems (Hakkarainen et al., 2012). For the sake of completeness, the derivation of the likelihood computation is briefly reviewed in the Appendix.

When EKF is considered, the filter likelihood formula, i.e., the likelihood for observing y1:n given the parameters θ, can be written as

(3 )
$p\left({y}_{1:n}\mid \theta \right)=p\left({y}_{1}\mid \theta \right)\prod _{k=2}^{n}p\left({y}_{k}\mid {y}_{1:k-1},\theta \right)=\prod _{k=1}^{n}\left(2\pi \right)^{-d/2}{\mid {C}_{k}^{y}\mid }^{-1/2}\exp \left(-\frac{1}{2}{r}_{k}^{T}{\left({C}_{k}^{y}\right)}^{-1}{r}_{k}\right)\propto \exp \left(-\frac{1}{2}\sum _{k=1}^{n}\underbrace{\left[{r}_{k}^{T}{\left({C}_{k}^{y}\right)}^{-1}{r}_{k}+\log \mid {C}_{k}^{y}\mid \right]}_{={J}_{k}}\right),$

where ${r}_{k}={y}_{k}-\mathrm{ℋ}\left({x}_{k}^{p}\right)$ is the innovation vector and ${C}_{k}^{y}={H}_{k}{C}_{k}^{p}{H}_{k}^{T}+{R}_{k}$ is its error covariance matrix at time k. Operator $\mid ·\mid$ denotes the matrix determinant. In filtering, $\mathrm{ℋ}$ is the observation operator that maps from the state space to the observation space and H is its linearization. The prior state ${x}_{k}^{p}$ is given by the model, the prior covariance matrix ${C}_{k}^{p}$ is given by the Kalman filtering formulas [eq. (1)] and Rk is the observation error covariance matrix.

Essentially, the filter likelihood formula [eq. (3)] depends on the summation over the cost functions Jk, where the normalising term $log\mid {C}_{k}^{y}\mid$ has to be accounted for, because it implicitly depends on the parameters θ. The dependence comes from eq. (1) via the tangent-linear forecast model. It should be noted that eq. (3) itself does not require that ${x}_{k}^{p}$ and its covariance matrix ${C}_{k}^{p}$ are calculated using EKF. In principle, any state estimation method that produces these prior estimates could be used.
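The accumulation of eq. (3) over the filtering steps can be sketched as follows. This is a minimal illustration with hypothetical names, assuming the innovations and their covariances have already been produced by a filter run.

```python
import numpy as np

def filter_log_likelihood(innovations, covariances):
    """Evaluate log p(y_{1:n} | theta) up to an additive constant [eq. (3)].

    innovations -- sequence of innovation vectors r_k = y_k - H(x_k^p)
    covariances -- sequence of innovation covariances C_k^y = H C^p H^T + R
    """
    total = 0.0
    for r, Cy in zip(innovations, covariances):
        # J_k = r^T (C^y)^{-1} r + log|C^y|; the log-determinant term
        # must be kept because C^y depends on the parameters theta
        total += r @ np.linalg.solve(Cy, r) + np.log(np.linalg.det(Cy))
    return -0.5 * total
```

Note that the normalising term is evaluated inside the loop: dropping it would silently favour parameter values that inflate the predicted covariances.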

In EAKF, as implemented in the DART system, the discrepancy between the prior ensemble and the observations is calculated in the observation space for every single observation sequentially. The ensemble update and the weights are also calculated sequentially in the observation space. Thus, it is natural to adopt this approach also in the filter likelihood calculations. The cost function part of the filter likelihood formula [eq. (3)] can be calculated sequentially as a sum

(4 )
${J}_{k}=\sum _{l=1}^{L}\frac{{\left({y}_{l}-\overline{{z}_{l}^{p}}\right)}^{2}}{\left({\sigma }_{{y}_{l}}^{2}+{\sigma }_{{z}_{l}^{p}}^{2}\right)}+log\left({\sigma }_{{y}_{l}}^{2}+{\sigma }_{{z}_{l}^{p}}^{2}\right),$

where yl is an individual observation, $\overline{{z}_{l}^{p}}$ is the corresponding prior ensemble mean in observation space, and ${\sigma }_{{y}_{l}}^{2}$ and ${\sigma }_{{z}_{l}^{p}}^{2}$ are their variances, respectively. The prior ensemble mean $\overline{{z}_{l}^{p}}$ depends implicitly on yl−1 because of the sequential update. The summation goes through all the individual observations y1:L,k at time k. The DART system does not provide the above computation, but it can be easily added to the DART code.
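Given the per-observation means and variances, the scalar form of eq. (4) is a one-liner; the sketch below assumes (as DART does) that each prior ensemble mean and variance has already been updated by the previously assimilated observations.

```python
import numpy as np

def jk_sequential(y, z_mean, var_y, var_z):
    """Observation-space evaluation of the cost function J_k [eq. (4)].

    y      -- individual observations y_1..y_L at time k
    z_mean -- prior ensemble means in observation space (sequentially updated)
    var_y  -- observation error variances
    var_z  -- prior ensemble variances in observation space
    """
    y, z_mean, var_y, var_z = map(np.asarray, (y, z_mean, var_y, var_z))
    s = var_y + var_z  # innovation variance per observation
    return float(np.sum((y - z_mean) ** 2 / s + np.log(s)))
```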

### 2.4. Model and data

To explore the filter likelihood calculation, a modified version of the Lorenz-95 system is used (Lorenz, 1995; Wilks, 2005). The full system consists of two sets of interrelated variables, X and Y, and it is written as

(5 )
$\frac{d{X}_{i}}{dt}=-{X}_{i-1}\left({X}_{i-2}-{X}_{i+1}\right)-{X}_{i}+F-\frac{hc}{b}\sum _{j=J\left(i-1\right)+1}^{Ji}{Y}_{j}$
(6 )
$\frac{d{Y}_{j}}{dt}=-cb{Y}_{j+1}\left({Y}_{j+2}-{Y}_{j-1}\right)-c{Y}_{j}+\frac{c}{b}{F}_{Y}+\frac{hc}{b}{X}_{1+⌊\frac{j-1}{J}⌋}$

where i=1,…, I and j=1,…, JI. That is, each of the ‘slow’ state variables Xi is forced by a sum of the additional fast variables Yj. The fast variables have dynamics similar to the slow variables, but they are also coupled with the slow variables. In the model, cyclic boundary conditions, i.e., XI+1=X1 and YJI+1=Y1, are used. In this paper, the values I=40, J=8, F=FY=10, h=1 and c=b=10, adopted from Leutbecher (2010), are used.

The parametrized forecast model for the full system reads

(7 )
$\frac{d{X}_{i}}{dt}=-{X}_{i-1}\left({X}_{i-2}-{X}_{i+1}\right)-{X}_{i}+F-g\left({X}_{i},\theta \right),$

where $g\left({X}_{i},\theta \right)={\theta }_{0}+{\theta }_{1}{X}_{i}$ is the parameterization in which the effect of the missing ‘sub-grid’ scale fast variables Yj are modeled using only the single, ‘resolved’, local variable Xi.
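The right-hand side of the parameterized forecast model [eq. (7)] can be sketched as below; the function name and the use of `np.roll` for the cyclic boundary conditions are our own illustration.

```python
import numpy as np

def lorenz95_param(x, F=10.0, theta=(0.0, 0.0)):
    """Right-hand side of the parameterized forecast model [eq. (7)].

    The subgrid forcing g(X_i, theta) = theta0 + theta1 * X_i stands in
    for the coupling to the unresolved fast variables Y_j.
    Cyclic boundaries (X_{I+1} = X_1) are handled with np.roll.
    """
    theta0, theta1 = theta
    g = theta0 + theta1 * x
    # roll(x, 1)[i] = x[i-1], roll(x, 2)[i] = x[i-2], roll(x, -1)[i] = x[i+1]
    return -np.roll(x, 1) * (np.roll(x, 2) - np.roll(x, -1)) - x + F - g
```

With θ=(0, 0) this reduces to the standard Lorenz-95 tendency, so the parameterization enters purely as a perturbation of the unforced model.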

The aim of the experiment is to tune the parameters θ=(θ0, θ1) given a synthetic set of observations y1:n, computed by the full set of equations in (5) and (6) with additive noise, using the filter likelihood technique. In the experiment, one filtering step (‘24 h’) is 8 model integration steps (‘3 h’). Here the number of assimilation steps n=100 (‘days’), and hence during each run the model is integrated 100×8 times. The experiment is adopted from Hakkarainen et al. (2012) and a more comprehensive introduction is given there.

### 2.5. Experiment setup

In the experiment, the parameter values are sampled from the posterior distribution (consisting of the filter likelihood [eq. (3)] and an uninformative prior) using an adaptive Metropolis algorithm (Haario et al., 2001). The chain length is selected to be 3000 and the first 500 samples (the burn-in period) are discarded. The filter likelihood MCMC experimentation is summarised by the following pseudo-algorithm, where the superscript m on θ denotes the MCMC chain index:

Step 0: Initialise the MCMC run and select the initial value ${\theta }^{1}$.

Step 1: Propose a new candidate $\hat{\theta }$ for the parameters.

Step 2: Run the state estimation system using $\hat{\theta }$ and evaluate formula [eq. (3)] during the run.

Step 3: Accept $\hat{\theta }$ with probability $\min \left(1,\frac{p\left({y}_{1:n}\mid \hat{\theta }\right)p\left(\hat{\theta }\right)}{p\left({y}_{1:n}\mid {\theta }^{m}\right)p\left({\theta }^{m}\right)}\right)$.

Step 4: If $\hat{\theta }$ is accepted, set ${\theta }^{m+1}=\hat{\theta }$; else set ${\theta }^{m+1}={\theta }^{m}$.

Step 5: Set m=m+1 and go to Step 1 until m is equal to the chain length.

In our implementation, we use an uninformative (flat) prior, which means that the prior is a constant, $p\left(\theta \right)\propto 1$. Then, the prior term cancels out in Step 3 of the above algorithm, since $p\left(\hat{\theta }\right)/p\left({\theta }^{m}\right)=1$.

For readers not familiar with MCMC, a short introduction is in order. The MCMC algorithm works by generating candidate values from a proposal distribution, which in our case is a Gaussian centred at the current parameter value. The candidates are then either accepted or rejected according to a simple rule that contains the ratio of the posterior densities at the proposed point and at the current point, see Step 3 of the above algorithm. One can see that moves ‘upward’ (to a point with higher posterior density) are always accepted. However, moves downward can also be accepted, and the resulting random walk ends up exploring the posterior distribution without converging to a single point. The algorithm is a standard tool in statistical analysis and can be shown to yield samples from the correct target density. Here, we apply an adaptive version of the algorithm (Haario et al., 2001), where the proposal distribution is automatically tuned to better match the posterior distribution. For more details on MCMC, see Robert and Casella (2005).
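The accept/reject rule can be sketched with a plain random-walk Metropolis sampler. This is an illustration only: it omits the adaptation of Haario et al. (2001) used in the experiments, and the function names are our own.

```python
import numpy as np

def metropolis(log_post, theta0, n_samples, prop_std, seed=0):
    """Random-walk Metropolis with a Gaussian proposal (non-adaptive sketch).

    log_post -- function returning the log posterior density at theta
    """
    rng = np.random.default_rng(seed)
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    lp = log_post(theta)
    chain = [theta.copy()]
    for _ in range(n_samples - 1):
        cand = theta + prop_std * rng.standard_normal(theta.shape)
        lp_cand = log_post(cand)
        # accept with probability min(1, exp(lp_cand - lp)); comparing on
        # the log scale makes upward moves always accepted
        if np.log(rng.uniform()) < lp_cand - lp:
            theta, lp = cand, lp_cand
        chain.append(theta.copy())
    return np.array(chain)
```

In the filter likelihood setting, `log_post` would run the full assimilation over y1:n for each candidate θ, which is what makes MCMC expensive for large-scale models.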

Using the filter likelihood [eq. (3)] for MCMC, we are able to estimate both the model parameters θ and the filter-specific tuning parameters. In the experiments, for the EKF version, the model error covariance matrix is parameterised as follows

(8 )
$Q=exp\left(\lambda \right)I,$

where I is an identity matrix. The parameter λ is tuned together with the model parameters θ0 and θ1. In the EAKF version of the filter likelihood, the prior covariance inflation factor and the cut-off radius related to localisation (Gaspari and Cohn, 1999), defined in the DART setup, are tuned together with the model parameters θ.

As a validation metric for the parameters, we use the average root mean squared error (RMSE) defined as

(9 )
$r\left(\theta \right)=\frac{1}{n}\sum _{k=1}^{n}\sqrt{\frac{1}{I}\sum _{i=1}^{I}{\left({x}_{i,k}^{\text{est}}\left(\theta \right)-{x}_{i,k}^{\text{true}}\right)}^{2}}$

where ${x}_{k}^{\text{true}}$ is the true state and ${x}_{k}^{\text{est}}$ is the posterior estimate. We evaluate the RMSE on a fixed grid for θ0 and θ1, keeping the filter-specific parameters fixed.
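Eq. (9) can be sketched directly; the array layout (assimilation steps along the first axis) is our own convention.

```python
import numpy as np

def average_rmse(x_est, x_true):
    """Average RMSE over assimilation steps [eq. (9)].

    x_est, x_true -- arrays of shape (n_steps, I): the posterior
    estimates and the true states at each assimilation step.
    """
    x_est, x_true = np.asarray(x_est), np.asarray(x_true)
    # spatial RMSE per step, then average over the n steps
    return float(np.mean(np.sqrt(np.mean((x_est - x_true) ** 2, axis=1))))
```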

### 2.6. Results

The experiments start with a step in which all parameters (i.e. the filter parameters and the model parameters) are estimated together with MCMC. In Fig. 1, pairwise MCMC samples using EKF (left panel) and EAKF (right panel) are shown. It can be seen that the marginal distributions are rather Gaussian and all parameters are identified. Based on the mean values of the MCMC chains, the values −4.8824, 1.4310 and 0.1889 were chosen for λ, the prior inflation factor and the cut-off radius, respectively.

In the second step, to further underline the difference in the model parameter estimates in the usual situation where the filter parameters are fixed, we calculate the conditional θ distributions with MCMC by fixing the filter parameters to the mean values found in the previous step. For validation purposes, the RMS error [eq. (9)] is evaluated on a fixed grid to explore the parameter space for ‘all’ possible pairs of θ0 and θ1.

The results are illustrated in Fig. 2, where the 50% and 95% probability contours calculated from the EKF and EAKF MCMC chains are plotted using solid and dashed lines, respectively. The colours on the background indicate the average RMSE values [eq. (9)] for EKF (left panel) and EAKF (right panel). We observe that the closure parameter values that correspond to the lowest average RMSE values for EKF and EAKF are not the same. This implies that the optimal parameter values for EKF are not optimal for EAKF and vice versa. This shows that there is a dilemma of the uniqueness of the model closure parameters: an identical model formulation can have more than one optimal parameter value combination. This can be explained by the fact that the two data assimilation components have different systematic approximation errors, and these errors are compensated, to some extent, by the different parameter values in the forecast model. In this sense, the forecast model parameter values should be considered as filter-dependent quantities. The consequences of this result are discussed below in Section 3.

Figure 2 also depicts that the filter likelihood method captures the optimal parameter values correctly: the lowest posterior RMSE values are close to the obtained posterior distributions. In addition, it can be seen that the results produced by the EKF and EAKF versions of the filter likelihood technique are similar, but not identical. The posterior distribution obtained using EKF is closer to the parameter values that yield a low RMS error for EKF. And vice versa: the parameters estimated using EAKF are closer to the optimal values for EAKF than the ones obtained by EKF. That is, applying the filter likelihood technique seems to provide rather good filter-specific parameter estimates. In this study, the RMSE values produced by EKF are lower than those produced by EAKF (see the RMSE values in the background of Fig. 2), which was expected. In addition, it can be noted that the choice of the data assimilation component has a much larger effect on the RMSE than the exact values of the model parameters.

## 3. Discussion and conclusions

In this article, we presented a numerical experiment designed to simulate the prediction model parameter estimation problem in chaotic systems. We studied a simplified prediction system, which consists of a modified Lorenz-95 system as the prediction model, where the net effect of the fast variables is represented by a two-parameter linear scheme. Two versions of the Kalman filter were used as different data assimilation components. We have shown that the optimal prediction model tuning parameters, corresponding to the highest system accuracy, have two distinct optima depending on which version of data assimilation is applied: the extended Kalman filter or the ensemble adjustment Kalman filter. Accuracy is measured here as the RMS error of the state estimate against the known truth. Replacement of EKF with EAKF implies a major change to the prediction system, and it is only natural that the prediction model parameters need re-tuning.

In addition, we have demonstrated a method that can be used to estimate the closure parameters for each prediction system component separately. The approach is based on computing the likelihood using the output of the data assimilation system. We also demonstrate how Markov chain Monte Carlo sampling methods can be used on top of this likelihood formulation. The method recovers the optimal forecast model parameter values for both data assimilation systems. In addition, both the forecast model parameters and the data assimilation related parameters can be estimated. In our case, the data assimilation related parameters were the model error covariance matrix in EKF, and covariance inflation and localisation parameters in EAKF.

The models applied in operational weather forecasting are far more complex than the system applied here. If we assume that the behaviour discovered here scales up, it has some consequences for the tuning of large-scale systems. Traditionally, model releases are carefully fine-tuned and parallel-tested in association with forecast model changes. Changes in data assimilation are, however, typically considered independent of forecast model fine-tuning. According to the result presented here, this should not be the case. If the data assimilation component undergoes a major revision but the prediction model remains the same, then the prediction model tuning corresponds to the previous version of the data assimilation component and may thus be suboptimal. In this case, the correct procedure would be to consider model tuning, too.

Finally, this work considered a low-order prediction system. Our future work will be directed towards quantifying how this result scales up to more realistic systems. We will use the ECHAM6 prediction model together with the DART toolbox to study this question.

## 4. Acknowledgements

We would like to acknowledge the National Center for Atmospheric Research for making the Data Assimilation Research Testbed available on-line. The research has been financially supported by the Academy of Finland (project numbers 127210, 132808, 133142, 134935 and 134999) and the Centre of Excellence in Inverse Problems Research. Additionally, this work was funded by the European Commission's 7th Framework Programme, under Grant Agreement number 282672, EMBRACE project, and the Nessling Foundation.

## 5. Appendix

### A.1. Derivation of the filter likelihood formula

We present the derivation of the filter likelihood formula [eq. (3)], briefly. A more comprehensive discussion is given in Hakkarainen et al. (2012).

Fig. 1

Pairwise MCMC samples using EKF (left panel) and EAKF (right panel) version of the filter likelihood formula [eq. (3)]. In addition to the model parameters θ0 and θ1, filter related parameters are sampled too. Contours illustrate 95% and 50% probability regions. Also, the one-dimensional marginal densities are illustrated.

Let us consider the following general state space model.

(A1 )
${x}_{k}\sim p\left({x}_{k}\mid {x}_{k-1},\theta \right)$
(A2 )
${y}_{k}\sim p\left({y}_{k}\mid {x}_{k}\right)$
(A3 )
$\theta \sim p\left(\theta \right)$

of state xk, observations yk and parameters θ, respectively. Our goal is to find the posterior distribution $p\left(\theta \mid {y}_{1:n}\right)$. In filtering, the previous state estimate can be transported to the next time step's prior by using the prediction model

(A4 )
$p\left({x}_{k}\mid {y}_{1:k-1},\theta \right)=\int p\left({x}_{k}\mid {x}_{k-1},\theta \right)p\left({x}_{k-1}\mid {y}_{1:k-1},\theta \right)d{x}_{k-1}$

and in the analysis step the prior is updated by yk using the Bayes’ formula

(A5 )
$p\left({x}_{k}\mid {y}_{1:k},\theta \right)\propto p\left({y}_{k}\mid {x}_{k},\theta \right)p\left({x}_{k}\mid {y}_{1:k-1},\theta \right).$

The predictive distribution for the next observations can be obtained by

(A6 )
$p\left({y}_{k}\mid {y}_{1:k-1},\theta \right)=\int p\left({y}_{k}\mid {x}_{k},\theta \right)p\left({x}_{k}\mid {y}_{1:k-1},\theta \right)d{x}_{k}.$

Fig. 2

Average posterior RMSE values [eq. (9)] when using EKF (left panel) and EAKF (right panel). Lower values indicate better performance. Solid and dashed lines indicate the 50% and 95% probability regions as obtained with MCMC in case of EAKF and EKF, respectively. RMSE values are depicted by the background colour scheme. Note the difference in colour scales.

Using Bayes’ formula and the chain rule we obtain

(A7 )
$p\left(\theta \mid {y}_{1:n}\right)\propto p\left({y}_{1:n}\mid \theta \right)p\left(\theta \right)=p\left({y}_{n}\mid {y}_{1:n-1},\theta \right)p\left({y}_{n-1}\mid {y}_{1:n-2},\theta \right)\cdots p\left({y}_{2}\mid {y}_{1},\theta \right)p\left({y}_{1}\mid \theta \right)p\left(\theta \right),$

where $p\left({y}_{k}\mid {y}_{1:k-1},\theta \right)$ can be calculated using eq. (A6).

In extended Kalman filtering the predictive distribution is

(A8 )
${y}_{k}\mid {y}_{1:k-1},\theta \sim N\left(\mathrm{ℋ}\left({x}_{k}^{p}\right),{C}_{k}^{y}\right).$

Now, inserting this into eq. (A7) we obtain eq. (3).

## References

1. Anderson, J., Hoar, T., Raeder, K., Liu, H., Collins, N. and co-authors. The Data Assimilation Research Testbed: a community facility. B. Am. Meteorol. Soc. 2009; 90(9): 1283–1296.

2. Anderson, J. L. An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev. 2001; 129(12): 2884–2903. DOI: 10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.

3. Anderson, J. L. A non-Gaussian ensemble filter update for data assimilation. Mon. Wea. Rev. 2010; 138(11): 4186–4198.

4. Anderson, J. L. and Anderson, S. L. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Wea. Rev. 1999; 127(12): 2741–2758. DOI: 10.1175/1520-0493(1999)127<2741:AMCIOT>2.0.CO;2.

5. Annan, J. D., Lunt, D. J., Hargreaves, J. C. and Valdes, P. J. Parameter estimation in an atmospheric GCM using the Ensemble Kalman Filter. Nonlin. Processes Geophys. 2005; 12(3): 363–371.

6. D'Andrea, F. and Vautard, R. Reducing systematic errors by empirically correcting model errors. Tellus A. 2000; 52(1): 21–41.

7. Dowd, M. Estimating parameters for a stochastic dynamic marine ecological system. Environmetrics. 2011; 22(4): 501–515.

8. Gaspari, G. and Cohn, S. E. Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteorol. Soc. 1999; 125(554): 723–757.

9. Gelman, A., Carlin, J., Stern, H. and Rubin, D. Bayesian Data Analysis. 2nd ed. 2003; Boca Raton, FL: Chapman & Hall.

10. Haario, H., Saksman, E. and Tamminen, J. An adaptive Metropolis algorithm. Bernoulli. 2001; 7(2): 223–242.

11. Hakkarainen, J., Ilin, A., Solonen, A., Laine, M., Haario, H. and co-authors. On closure parameter estimation in chaotic systems. Nonlin. Processes Geophys. 2012; 19(1): 127–143.

12. Hamill, T. M., Whitaker, J. S. and Snyder, C. Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev. 2001; 129(11): 2776–2790. DOI: 10.1175/1520-0493(2001)129<2776:DDFOBE>2.0.CO;2.

13. Jakob, C. Accelerating progress in global atmospheric model development through improved parameterizations: challenges, opportunities, and strategies. B. Am. Meteorol. Soc. 2010; 91(7): 869–875.

14. Järvinen, H., Räisänen, P., Laine, M., Tamminen, J., Ilin, A. and co-authors. Estimation of ECHAM5 climate model closure parameters with adaptive MCMC. Atmos. Chem. Phys. 2010; 10(20): 9993–10002.

15. Jung, T. Systematic errors of the atmospheric circulation in the ECMWF forecasting system. Quart. J. Roy. Meteorol. Soc. 2005; 131(607): 1045–1073.

16. Kaas, E., Guldberg, A., May, W. and Déqué, M. Using tendency errors to tune the parameterisation of unresolved dynamical scale interactions in atmospheric general circulation models. Tellus A. 1999; 51(5): 612–629.

17. Kalman, R. E. A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. Ser. D. 1960; 82: 35–42.

18. Klinker, E. and Sardeshmukh, P. D. The diagnosis of mechanical dissipation in the atmosphere from large-scale balance requirements. J. Atmos. Sci. 1992; 49(7): 608–627. DOI: 10.1175/1520-0469(1992)049<0608:TDOMDI>2.0.CO;2.

19. Kunz, T., Fraedrich, K. and Kirk, E. Optimisation of simplified GCMs using circulation indices and maximum entropy production. Clim. Dynam. 2008; 30: 803–813.

20. Leutbecher, M. Predictability and ensemble forecasting with Lorenz-95 systems, Lecture notes. 2010. ECMWF meteorological training course on Predictability, diagnostics and seasonal forecasting. Online at: http://www.ecmwf.int/newsevents/training/meteorological_presentations/pdf/PR/Practice_L95.pdf.

21. LohmannU, QuaasJ, KinneS, FeichterJ. Different approaches for constraining global climate models of the anthropogenic indirect aerosol effect. B. Am. Meteorol. Soc. 2007; 88(2): 243–249.

22. LorenzE. N. Predictability: a problem partly solved. 1995. In: Proceedings of the Seminar on Predicability, European Center on Medium Range Weather Forecasting1, 1–18. Online at: http://www.ecmwf.int/publications/library/do/references/show?id=87423.

23. NeelinJ. D, BraccoA, LuoH, McWilliamsJ. C, MeyersonJ. E. Considerations for parameter optimization and sensitivity in climate models. Proc. Natl. Acad. Sci. 2010; 107(50): 21349–21354.

24. RobertC. P, CasellaG. Monte Carlo Statistical Methods (Springer Texts in Statistics). 2005; Secaucus, NJ: Springer-Verlag.

25. SchwartzC. S, LiuZ, ChenY, HuangX.-Y. Impact of assimilating microwave radiances with a limited-area ensemble data assimilation system on forecasts of typhoon morakot. Wea. Forecasting. 2011; 27(2): 424–437.

26. SeverijnsC. A, HazelegerW. Optimizing parameters in an atmospheric general circulation model. J. Climate. 2005; 18(17): 3527–3535.

27. SingerH. Parameter estimation of nonlinear stochastic differential equations: simulated maximum likelihood versus extended Kalman filter and Itô–Taylor expansion. J. Comput. Graph. Stat. 2002; 11: 972–995.

28. SolonenA, OllinahoP, LaineM, HaarioH, TamminenJ, co-authors. Efficient MCMC for climate model parameter estimation: parallel adaptive chains and early rejection. Bayesian Anal. 2012; 7(3): 715–736.

29. TippettM. K, AndersonJ. L, BishopC. H, HamillT. M, WhitakerJ. S. Ensemble square root filters. Mon. Wea. Rev. 2003; 131(7): 1485–1490. DOI: 10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO.

30. TornR. D, DavisC. A. The influence of shallow convection on tropical cyclone track forecasts. Mon. Wea. Rev. 2012; 140(7): 2188–2197.

31. WilksD. Effects of stochastic parametrizations in the Lorenz 96 system. Quart. J. Roy. Meteorol. Soc. 2005; 131(606): 389–407.