## Introduction

In atmospheric data assimilation, observations are combined with numerical model data, weighted by their respective error statistics, to provide a best estimate of the current atmospheric state, known as the *analysis*. This is achieved through comparison of observations with the numerical model equivalent of those observations. The errors associated with the observation-model comparison are the instrument error and representation error (Janjić et al., **2018**). The representation error consists of the pre-processing error, the observation operator error and the *error due to unresolved scales* that occurs when there is a mismatch between the numerical model resolution and the scales resolved by the observation. The error due to unresolved scales depends on the observation footprint, which could be smaller or larger than the model grid, depending on the observation type and choice of model. For models which contain information on scales smaller than those observed, the standard approach to account for scale-mismatch would be to average the model state over the observation area (Janjić et al., **2018**). However, for the purposes of this paper, we focus only on situations where the observation information content includes smaller scales than can be resolved by the model. In order to obtain the best analysis from these observations the representation error must be treated correctly by the data assimilation system.

Methods of accounting for uncertainty due to unresolved scales include, for example, prediction through ensemble statistics (Karspeck, **2016**; Satterfield et al., **2017**) and the use of a stochastic superparameterization (Grooms et al., **2014**). In this manuscript we will consider two approaches: the standard approach where the uncertainty due to unresolved scales is included in the observation error covariance matrix (e.g. Hodyss and Satterfield, **2016**; Fielding and Stiller, **2019**) and an alternative approach where unresolved processes are considered in state space and hence accounted for through the state error covariance (Janjić and Cohn, **2006**).

Compensating for representation error through the standard approach involves using an observation error covariance matrix that takes into account both the instrument and representation uncertainty. This can then be used within a standard variational or sequential data assimilation scheme. Estimates of the observation uncertainty may be obtained using a statistical method, to estimate the entire observation error covariance matrix (e.g. Desroziers et al., **2005**; Stewart et al., **2014**; Waller et al., **2016a**, **2016b**; Cordoba et al., **2017**). Alternatively each component of the representation error statistics can be estimated separately and then combined with the instrument error covariance. For example the error due to unresolved scales may be approximated using high resolution observations (Oke and Sakov, **2008**) or high resolution model data (Daley, **1993**; Liu and Rabier, **2002**; Waller et al., **2014**; Schutgens et al., **2016**).

The Schmidt-Kalman filter (SKF) (Schmidt, **1966**) is an example of a filter which uses the statistics of the unresolved processes in state space, without ever evaluating the unresolved state itself, to compensate for the error due to unresolved scales. This approach allows for consideration of flow-dependent correlations between the resolved errors and the unresolved processes at the cost of additional assumptions, approximations and increased computational expense. Janjić and Cohn (**2006**) have shown that the SKF can produce positive results despite the approximations and assumptions required for implementation in a geophysical context. In this paper we provide new results that determine in which observation and model uncertainty regimes the SKF performs best. In addition we compare the SKF to two other Kalman filtering approaches.

The SKF is deemed a suboptimal filter as it does not minimise the mean-square-error of its estimated states (Janjić and Cohn, **2006**). In contrast, the Kalman filter that treats all scales is deemed optimal (for linear models and Gaussian statistics) (Nichols, **2010**). In practice, suboptimal filters that do not treat all scales are often used. The analysis error covariances propagated by suboptimal filters are not representative of the true error statistics due to omitted or incorrectly specified filter components. As such, the true analysis error equations have been derived to evaluate the performance of suboptimal filters (e.g. Brown and Sage, **1971**; Asher and Reeves, **1975**; Asher et al., **1976**). In this article we reformulate previous theory on true analysis error equations to include representation error (section 4) and evaluate the performance of the SKF.

A further issue noted by Janjić and Cohn (**2006**) is the potential for the representation error to be biased. This is because the error due to unresolved scales is sequentially correlated in time and correlated with the state resolved by the model. Other authors have circumvented this bias by careful construction of their numerical model (Janjić and Cohn, **2006**). However, in operational data assimilation, most observations are biased and the innovations need to be corrected or the bias accounted for within the assimilation (Dee, **2005**). Bias correction can be incorporated into the data assimilation algorithm by augmenting the state vector with a bias term (Friedland, **1969**; Jazwinski, **1970**; Ignagni, **1981**) which can be estimated along with the state variables. This method of bias correction is commonly used with variational data assimilation systems (e.g. Derber and Wu, **1998**; Dee, **2004**; Zhu et al., **2014**; Eyre, **2016**) but has also been applied with ensemble data assimilation systems (e.g. Fertig et al., **2009**; Miyoshi et al., **2010**; Aravéquia et al., **2011**). To the best of our knowledge a bias correction scheme has yet to be implemented in conjunction with the SKF; in section 7 we introduce a bias-correcting SKF as a new method to compensate for biases due to unresolved scales.

In summary, the objective of this paper is to investigate under which model and observation error regimes the SKF is most effective. The theoretical aspects of representation error will be reviewed in section 2 with particular emphasis on the error due to unresolved scales. Section 3 details how the SKF can be used to account for error due to unresolved scales and introduces the optimal Kalman filter (OKF) and a reduced-state Kalman filter (RKF). In section 4 we state the standard true analysis error equation and reformulate it to include representation error for each filter.

To evaluate the performance of the SKF in a numerical example we use a Gaussian random walk model. The numerical experiment methodology and model formulation are described in section 5 and results are presented in section 6. Our results show that the SKF provides the largest improvement in performance compared with the RKF when there is large error variance due to unresolved scales and small instrument error variance. In section 7 we discuss observation bias correction schemes in sequential data assimilation and introduce a novel SKF with bias correction scheme. The methodology and model formulation for the numerical experiments with biased observations is discussed in section 8 and results are presented in section 9. Our results show the SKF with bias correction can simultaneously treat observation biases and compensate for the error due to unresolved scales. We summarise and draw conclusions from our results in section 10.

## Theoretical framework

In this section we introduce a theoretical framework and the notation used in this paper. We begin by describing a numerical model (section 2.1) and observations (section 2.2). In data assimilation, the error statistics used in filters may not reflect the true uncertainties they are intended to model. To help distinguish between these two sets of statistics throughout this manuscript we will define any true error statistics with a tilde (∼). Error statistics used in or obtained from filter calculations will be referred to as *perceived* error statistics and have no tilde.

The mathematical framework used to examine the error due to unresolved scales in this manuscript is to estimate the projection of some state from a high, but finite dimensional real vector space, onto a lower dimensional subspace using observations and knowledge of the system dynamics, following a similar philosophy to Liu and Rabier (**2002**) and Waller et al. (**2014**). Our approach differs from that of Janjić and Cohn (**2006**) which begins from the standpoint of infinite dimensional function spaces.

### Model configuration

In this section we introduce the perfect and forecast models. We assume that the phase-space for the large-scale dynamics is a subspace of the phase-space for the full high dimensional system. The complement of the subspace for the large-scales will correspond to the phase-space for the small-scale dynamics. The notation for the models will be in a partitioned form that separates the large and small scales. In particular, we denote the true, complete state at time *t _{k}* as ${(\begin{array}{cc}{\mathit{x}}^{l,t}& {\mathit{x}}^{s,t}\end{array})}_{k}^{T}\in {\mathbb{R}}^{{N}_{t}}$ such that ${\mathit{x}}^{l,t}\in {\mathrm{\mathbb{R}}}^{{N}_{l}},{\mathit{x}}^{s,t}\in {\mathrm{\mathbb{R}}}^{{N}_{s}}$ and ${N}_{t}={N}_{l}+{N}_{s}.$ Here, and throughout this paper, any component with a

*t*-superscript indicates that it is a true variable. The

*l*- and

*s*-superscripts correspond to the large- and small-scale processes within the complete system dynamics. (We have deviated from the resolved/unresolved nomenclature of Janjic and Cohn 2006 for clarity, since the different filters used in our experiments resolve different scales).

An ideal linear model for the true state of a finite dimensional process can be expressed through the dynamical system

In numerical weather prediction (NWP), the true models that govern the evolution of the atmosphere are unknown and have to be approximated. For our approximation of the true dynamical system (2.1), we assume that any subgrid-scale parameterizations used to approximate the contribution from the small-scale processes to the large-scale state are contained within the large-scale model (Janjić and Cohn, **2006**; Janjić et al., **2018**). Hence, the model block ${\mathbf{M}}^{ls}={\mathbf{0}}_{{N}_{l}\times {N}_{s}}$ and our approximate dynamical model describing the complete system satisfies

In (2.2) each model block has the same dimensions as its true model counterpart. The large- and small-scale model errors are given by ${\mathit{\eta}}^{l}\in {\mathrm{\mathbb{R}}}^{{N}_{l}}$ and ${\mathit{\eta}}^{s}\in {\mathrm{\mathbb{R}}}^{{N}_{s}}$ respectively. Model errors are assumed to be random and unbiased with covariance given by

Here, using $\u3008\xb7\u3009$ to indicate the mathematical expectation over the corresponding error distribution, the matrices ${\tilde{\mathbf{Q}}}^{ll}\equiv \u3008{\mathit{\eta}}^{l}{\left({\mathit{\eta}}^{l}\right)}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{{N}_{l}\times {N}_{l}},{\tilde{\mathbf{Q}}}^{ss}\equiv \u3008{\mathit{\eta}}^{s}{\left({\mathit{\eta}}^{s}\right)}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{{N}_{s}\times {N}_{s}}$ and ${\tilde{\mathbf{Q}}}^{ls}\equiv \u3008{\mathit{\eta}}^{l}{\left({\mathit{\eta}}^{s}\right)}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{{N}_{l}\times {N}_{s}}$ (with ${\tilde{\mathbf{Q}}}^{ls}={({\tilde{\mathbf{Q}}}^{sl})}^{T}$) are the true model error covariances of the large-scale, the small-scale and cross-covariances between the large- and small-scale, respectively. We note that for the purposes of this work, the model error distribution is assumed to be stationary, so that $\tilde{\mathbf{Q}}$ is not a function of time.

Analogously, the complete forecast state ${(\begin{array}{cc}{\mathit{x}}^{l,f}& {\mathit{x}}^{s,f}\end{array})}^{T}\in {\mathbb{R}}^{{N}_{t}}$ satisfies

The forecast errors can then be defined as

Here, ${\tilde{\mathbf{P}}}_{k}^{ll,f}\equiv \u3008{\mathbf{e}}_{k}^{l,f}{\left({\mathbf{e}}_{k}^{l,f}\right)}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{{N}_{l}\times {N}_{l}},{\tilde{\mathbf{P}}}_{k}^{ss,f}\equiv \u3008{\mathbf{e}}_{k}^{s,f}{\left({\mathbf{e}}_{k}^{s,f}\right)}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{{N}_{s}\times {N}_{s}}$ and ${\tilde{\mathbf{P}}}_{k}^{ls,f}\equiv \u3008{\mathbf{e}}_{k}^{l,f}{({\mathbf{e}}_{k}^{s,f})}^{T}\u3009\in {\mathbb{R}}^{{N}_{l}\times {N}_{s}}$ (with ${\tilde{\mathbf{P}}}^{ls,f}={({\tilde{\mathbf{P}}}^{sl,f})}^{T}$) are the true forecast error covariances of the large-scale, the small-scale and cross-covariances between the large- and small-scale, respectively.

This formulation of the complete finite-dimensional dynamics allows us to consider several filters with different approaches to the treatment of large- and small-scales. Moreover, we can consider the interactions between scales and the effect they have on the modelling of observations.

### Observations and their uncertainties

In this section we express the equations relating the observations, ${\mathit{y}}_{k}\in {\mathbb{R}}^{p},$ to the model state in a partitioned form and describe their uncertainties. For the rest of this section, we assume that the model state and observations are valid at the same time, and drop the time subscript, *k*. At time *t _{k}*, the observations are related to the true model state as

**2006**). We will not consider nonlinear observation operators in the remainder of this paper.

Throughout this paper, we assume that there is no pre-processing error. Hence, we will be concerned with the two cases described in sections 2.2.1 (all scales analysed) and 2.2.2 (large scales analysed) below. Case 1 shows the form of the representation error for filters that resolve all scales and is pertinent to the theoretical optimal Kalman filter discussed in section 3.2. Case 2 shows the form of the representation error for filters typically used in operational practice and is pertinent to the reduced-state Kalman filter and the Schmidt-Kalman filter discussed in sections 3.3 and 3.4 respectively.

#### Case 1: All scales analysed

In this case we assume that both the large- and small-scale states are estimated. The total observation error (observation departure from the true state), ${\mathbf{e}}^{o},$ can be expressed as

#### Case 2: Large scales analysed

In this case, we assume that only the large-scale state is estimated such that

The filter observation operator does not act on the small scales, so the term ${\mathit{\gamma}}^{s}$ is replaced by ${\mathbf{H}}^{s,t}{\mathit{x}}^{s,t},$ the error due to unresolved scales. The representation error for Case 2 is thus ${\mathit{\gamma}}^{l}+{\mathbf{H}}^{s,t}{\mathit{x}}^{s,t}$ with covariance ${\tilde{\mathbf{R}}}^{H}=\u3008({\mathit{\gamma}}^{l}+{\mathbf{H}}^{s,t}{\mathit{x}}^{s,t}){({\mathit{\gamma}}^{l}+{\mathbf{H}}^{s,t}{\mathit{x}}^{s,t})}^{T}\u3009\in {\mathrm{\mathbb{R}}}^{p\times p}.$Equations (2.10) and (2.11) are analogous to equation (1) in Janjić et al. (**2018**) with the pre-processing error omitted. The complete observation error covariance for Case 2 is given by $\tilde{\mathbf{R}}={\tilde{\mathbf{R}}}^{I}+{\tilde{\mathbf{R}}}^{H},$ where we have assumed that the representation error and instrument error are mutually uncorrelated. As in Case 1, ${\mathit{\gamma}}^{l}$ is assumed to be unbiased. However, we will see in section 2.3 that the expected value of ${\mathbf{H}}^{s,t}{\mathit{x}}^{s,t}$ is likely to be non-zero.

### Bias due to unresolved scales

Analysing only the large-scales will result in an error due to unresolved scales (section 2.2.2) that is sequentially correlated in time and correlated with the resolved state, leading to a potential bias (Janjić and Cohn, **2006**). Assuming that the large-scale observation operator is unbiased, taking the expectation of the error due to unresolved scales, (2.11), (and reintroducing the time subscript *k*) results in

*k*. Using dynamical system (2.1), repeated substitution for the equation governing ${\mathit{x}}^{s,t}$ into the expected error due to unresolved scales yields

Here the underlined terms represent the contribution from the large scales. For many non-trivial models, these terms will not be identically zero, and potentially introduce a bias even if the initial value for the small-scale state is zero, ${\mathit{x}}_{0}^{s,t}=\mathbf{0}.$ For example, Janjić and Cohn (**2006**) solved a model of non-divergent linear advection on a sphere using a truncated expansion in spherical harmonics. Introducing a shear flow results in a dynamical system where the unresolved small-scales do not directly influence the resolved large-scales, but the large-scales influence the small-scales. This yields an error and bias due to unresolved scales. Janjić and Cohn (**2006**) were able to mitigate the bias using specific initial conditions. However, this experimental freedom would not be available in less-idealized situations. Therefore, when accounting for the unresolved scales in data assimilation we must also determine and treat any bias arising. In the new results in sections 5 - 6 below, we carefully construct our model to avoid bias due to unresolved scales. However, we revisit this problem in sections 7 - 9 where we use filters with bias-correction schemes.

## Sequential linear filters and representation uncertainty

In this section we describe the general linear filtering framework that we use for data assimilation in our theoretical investigations and numerical experiments. We consider three filters in more detail: an optimal Kalman filter (OKF) that takes account of all scales; a reduced-state Kalman filter (RKF) that disregards the small-scales; and the Schmidt-Kalman filter (SKF) that provides analyses of the large-scale state through consideration of both the large- and small-scale uncertainties.

### A linear filter

A linear filter algorithm can be divided into analysis update and model prediction steps. The general form of the analysis update at time *t _{k}*, is given by

**in equation (3.1) could be either the complete state ${(\begin{array}{cc}{\mathit{x}}^{l}& {\mathit{x}}^{s}\end{array})}^{T}$ or just the large-scale state ${\mathit{x}}^{l}.$ There are various approaches to determine the gain matrix which will be discussed in sections 3.2, 3.3 and 3.4.**

*x*The perceived analysis error covariance update calcluated by the filter at time *t _{k}* is given by

**I**is the identity matrix, ${\mathbf{H}}_{k}$ is the observation operator and ${\mathbf{P}}_{k}^{f}$ is the perceived forecast error covariance. Equation (3.2) is known as the short form of the analysis error covariance update. This equation only provides the correct estimate of the analysis error covariance if the background and observation error statistics used in the filter reflect the true error statistics. The use of a suboptimal gain and the short form update equation (3.2) will result in the filter producing incorrect error statistics. The true error statistics will be derived in section 4.

For the model prediction step, the forecast state at time ${t}_{k+1}$ is evolved from the analysis at the previous time-step and is given by

**M**is a linear model. A model error term is not included in the forecast state update as linear filters estimate the mean state and we have assumed that the model error is unbiased. However, the error in the model

**M**is accounted for in the forecast error covariance update given by

**Q**are equal to their true statistics counterparts.

Equations (3.1)–(3.4) form the core components of the linear filter algorithm. In the following sections we discuss three linear filters, each based on the Kalman filter (Kalman, **1960**), that we will use in this paper. Table 1 summarizes the key vectors and matrices used in these three Kalman filters.

### The optimal kalman filter (OKF)

For the optimal Kalman filter (OKF), we assume that we are able to model the processes for all scales and know the correct error statistics for the initial state, observations and model. Therefore, the perceived error statistics for the OKF will be equivalent to the true error statistics. The OKF simultaneously updates the large- and small-scale states, ${\mathit{x}}^{l}\in {\mathbb{R}}^{{N}_{l}}$ and ${\mathit{x}}^{s}\in {\mathbb{R}}^{{N}_{s}},$ so that the analysis update takes the form

**2010**). The analysis error covariance update is calculated using (3.2) with state error covariances with the same block structure as the forecast error covariance (2.6).

As the OKF filters all scales, the total observation error is described by Case 1 (all scales analysed, section 2.2.1). Hence, the observation error covariance for the OKF is $\mathbf{R}={\mathbf{R}}^{I}+{\mathbf{R}}^{G}.$

For the OKF forecast step we use the matrix

In summary, the analysis and forecast updates for the OKF state and covariance are a partitioned form of (3.1) - (3.4). By treating all scales in the assimilation the OKF has no error due to unresolved scales in the associated observation equation. However, due to computational constraints and inadequate knowledge of small-scale processes it is not possible to apply this technique in practice. Hence, methods that approximate the influence of small-scale processes must be employed instead.

### The reduced-state kalman filter (RKF)

The suboptimal Kalman filter which estimates only the large-scale state and completely neglects the modelling of small-scale processes will be referred to as the reduced-state Kalman filter (RKF).

The analysis and forecast update equations for the RKF are simply the linear filter equations (3.1)-(3.4) where, as described in Table 1, we use the large-scale state, error covariances and observation operator. Thus, the forecast innovation is

Reduced-state methods form an attractive approach in situations where computational expense is an important consideration. However, it is necessary to approximate the representation error covariance, ${\mathbf{R}}^{H}.$ Hence, the Kalman gain for the RKF will not minimise the analysis error covariance and the filter will be suboptimal.

### The Schmidt-Kalman filter (SKF)

The Schmidt-Kalman Filter (SKF) estimates only the large-scale state, but the statistics of any unmodelled processes are used to determine the Kalman gain for the filtered state (Schmidt, **1966**; Janjić and Cohn, **2006**). A summary of the relevant equations is included in Table 1.

As only the large-scale state is estimated the forecast innovation is computed using the large-scale state, ${\mathit{x}}^{l,f},$ and observation operator, ${\mathbf{H}}^{l}.$ To determine the innovation covariance we start with the innovation (3.7) and add and subtract the term $(\begin{array}{cc}{\mathbf{H}}^{l}& {\mathbf{H}}^{s}\end{array}){(\begin{array}{cc}{\mathit{x}}^{l,t}& {\mathit{x}}^{s,t}\end{array})}^{T}.$ This allows us to write the innovation in the form

The innovation is now written in terms of the observation errors corresponding to case 1 (where all scales are analysed, see section 2.2.1), the large-scale forecast error mapped into observation space, ${\mathbf{H}}^{l}{\mathbf{e}}^{l,f},$ and the term ${\mathbf{H}}^{s}{\mathit{x}}^{s,t},$ the true small-scale state mapped into observation space. Assuming each error and ${\mathit{x}}^{s,t}$ has zero mean, taking the expectation of the outer product of (3.9) gives the true innovation covariance. The innovation covariance used by the SKF is given by

Here, we have abused our notation, to write ${\mathbf{P}}^{ls,f}$ as the perceived approximation of $\u3008-{\mathbf{e}}^{l,f}{({\mathit{x}}^{s,t})}^{T}\u3009,$ such that ${\mathbf{P}}^{sl,f}={({\mathbf{P}}^{ls,f})}^{T}.$ Using this notation for the cross-covariances of the SKF is common amongst other literature on the filter (e.g. Janjić and Cohn, **2006**; Janjić et al., **2018**). Following Janjić and Cohn (**2006**), we employ a prescribed error covariance ${\mathbf{C}}^{s}$ as a time-independent approximation of $\u3008{\mathit{x}}^{s,t}{({\mathit{x}}^{s,t})}^{T}\u3009.$ We note that as the small-scale error covariance is prescribed, the innovation covariance is an inexact approximation. The innovation covariance for the SKF is theoretically the same as the innovation covariance for the RKF (3.8) but expressed in a different form that includes contributions from the small scale processes.

The analysis state update for the SKF is given by

**2006**). As the short-form analysis error covariance update for the SKF is not symmetric, we calculate ${\mathbf{P}}^{ll,a}$ and ${\mathbf{P}}^{ls,a}$ through the short-form update only and set ${\mathbf{P}}_{k}^{sl,a}={({\mathbf{P}}_{k}^{ls,a})}^{T}.$ Thus, the SKF analysis error covariance update equations are

We note that the term $-{\mathbf{K}}_{k}^{l}{\mathbf{H}}_{k}^{s}$ will usually be non-zero for the SKF. This term couples the large-scale uncertainty to the small-scale variability. If this term were zero, the large-scale state and uncertainty estimates produced by the RKF and SKF may still differ because of the differing innovation covariances between the filters.

The SKF treatment of the forecast step has a similar philosophy to the analysis step. The state prediction (3.3) is obtained through evolving the large-scale state ${\mathit{x}}^{l}$ with the large-scale forecast model ${\mathbf{M}}^{l}:$

The large-scale and cross-covariance blocks of the forecast error covariance are calculated using the complete forecast model and model error covariance in (3.4). The SKF forecast error covariance update equations are

The prescribed small-scale error covariance ${\mathbf{C}}^{s}$ is assumed constant in time and is not updated.

The appeal of the SKF is in its ability to compensate for small-scales without estimation of the small-scale state. Practical implementation of the SKF would require the filter to be adapted to nonlinear models. However, even for linear systems, the models evolving the small-scale processes would be unknown and their influence on the error covariances would need to be quantified. Additionally, the propagation of the state cross-covariances poses a considerable computational cost.

### Discussion

The OKF, SKF and RKF represent three different approaches for dealing with observation uncertainty due to unresolved scales (see Table 1). The OKF analyses all scales, thus avoiding the error due to unresolved scales altogether, while the RKF completely disregards the small-scale processes and accounts for the error due to unresolved scales through the representation error covariance matrix. The SKF, however, takes a compromise approach where only the large-scale state is estimated, but the uncertainty in all-scales is accounted for in the estimation. Additionally, the SKF accounts for the flow-dependence of the correlations between the large-scale errors and small-scale processes (albeit approximately) through the cross-covariances in the analysis and forecast error covariances given in equations (3.13), (3.14), (3.17) and (3.18). Applications where it is a poor approximation to neglect these cross-covariances will benefit the most from using the SKF (as opposed to the RKF where these cross-covariances are neglected).

## True analysis error equations

A standard metric for assessing the quality of a data assimilation scheme is through examination of the magnitude of its analysis errors (e.g. Liu and Rabier, **2002**). Under an unrealistic and restrictive set of conditions the Kalman filter is known to be optimal in a minimum mean-square-error sense and to produce the true error statistics describing its analysis and forecast (e.g. Todling and Cohn, **1994**; Nichols, **2010**). In contrast, both the SKF and RKF described in section 3 will incorrectly estimate the true analysis and forecast error covariances due to their treatment of the small-scales in the filter calculations. In this section we extend the existing literature on the true analysis error equations to include representation error so that we may evaluate the analysis obtained through the SKF and RKF.

To obtain the true analysis error equation for a linear filter we assume that we have exact knowledge of the truth and that both the true and filter models and observation operators are linear. Under this regime, the true analysis error at time *t _{k}* has been derived by Moodey (

**2013**) and is given by

*η*is the model error (see section 2.1), ${\mathbf{e}}_{k}^{o}$ is the total observation error which will be specified for different cases in subsections 4.1 − 4.2 and ${\mathbf{K}}_{k}$ and ${\mathbf{H}}_{k}$ are the Kalman gains and observation operators for the analysis state updates respectively. Therefore, the Kalman gain is calculated from the error statistics perceived by a filter. We assume that ${\mathbf{e}}_{k-1}^{a},{\mathit{\eta}}_{k}$ and ${\mathbf{e}}_{k}^{o}$ each have zero-mean and are mutually uncorrelated. (We note that this assumption excludes a consideration of bias due to unresolved scales. However, this is considered further in section 7). Under these assumptions the true analysis error covariance is obtained through taking the expectation of the outer product of equation (4.1) with itself to give

_{k}Here we remind the reader that we have used tildes to indicate true error covariances, to help distinguish these from the covariances perceived by the filters, which may be suboptimal. Equation (4.2) is known as the Joseph-formula (Gelb, **1974**). The true analysis error covariance (4.2) is valid for any gain matrix. The Joseph-formula is equivalent to the short form analysis error covariance (3.2) for the optimal case (OKF) in exact arithmetic.

The true analysis error covariance can be calculated separately from the assimilation. In subsections 4.1 − 4.2 we use (4.1) and (4.2) to determine the true analysis error equations and error covariances for Cases 1 and 2 described in sections 2.2.1 and 2.2.2. Case 1 corresponds to analysing all scales and includes the OKF. Case 2 corresponds to analysing the large-scale state only and includes the RKF and SKF. Table 2 summarizes the matrices and vectors used in the true error calculations.

### Case 1: true analysis error covariance when all scales are filtered

To obtain the true analysis error equation we assume that we have complete knowledge of all scales as with the OKF. As in section 2.2.1 the observation error will consist of instrument error, $\mathit{\u03f5},$ and the observation operator error for large- and small-scales, ${\mathit{\gamma}}^{l}+{\mathit{\gamma}}^{s}.$ Under these assumptions the true analysis error equation will be a partitioned form of equation (4.1) and the true analysis error covariance will be a partitioned form of (4.2) with the components given in column 1 of Table 2.

### Case 2: true analysis error covariance when only large-scales are filtered

The true analysis error equation for case 2 applies to filters that estimate the large-scale state only like the RKF and SKF. Using equation (4.1) and the state gain matrices and observation operators, the large-scale analysis error for the RKF and SKF is given by

We note that the observation errors correspond to case 2 described in 2.2.2 as both the RKF and SKF filter the large-scale state only. Hence, the effect of the small-scale processes on ${\mathbf{e}}_{k}^{l,a}$ in equation (4.4) is determined through the error due to unresolved scales ${\mathbf{H}}^{s,t}{\mathit{x}}_{k}^{s,t}.$ We observe that the true large-scale error covariance may thus be written in terms of the representation error covariance as

However, the true error statistics contributing to the true analysis error covariance are unknown in practice making the use of (4.5) to evaluate filter performance unfeasible. For theoretical experiments where most error statistics can be prescribed, determining ${\tilde{\mathbf{R}}}_{k}^{H}$ requires a Monte Carlo approach due to its dependence on ${\mathit{x}}^{s,t}.$ Alternatively, a different form of the analysis error equation may be more practical.

Assuming we know the true behaviour for the small-scales we can express the true analysis error equation for the RKF and SKF as

Using the definitions of the small-scale observation operator error (2.9), the error due to unresolved scales (2.11) and the small-scale forecast error (3.9) we find that

Thus the right-hand-side of (4.7) can be evaluated without knowledge of the error due to unresolved scales specifically. Instead, this can be written in terms of the observation operator error and a small-scale forecast:

The partitioned case 2 error equation (4.9) can be used to obtain the true analysis error covariance for the SKF and RKF without knowing the full representation error covariance ${\tilde{\mathbf{R}}}^{H}.$ However, when using this form of the analysis error equation to obtain the true error statistics the correlations between ${\mathit{x}}^{s,f}$ and ${\mathbf{e}}^{s,f}$ may be non-negligible. We note that, while ${\mathit{x}}^{s,f}$ and ${\mathit{x}}^{s,t}$ will also be unknown in practice, they could be approximated offline with high-resolution models.

## Experiment methodology

### Gaussian random walk model

We now consider the methodology for numerical experiments where we apply the three filters to the simple model system

**2012**). This system uses one variable for the large-scale state,

*x*, and one variable for the small-scale state,

^{l}*x*. The large-scale state

^{s}*x*and small-scale state

^{l}*x*are random walk variables driven by the errors

^{s}*η*and

^{l}*η*whose structures are determined by the variances ${\mathrm{Q}}^{l}$ and ${\mathrm{Q}}^{s}$ respectively. There are no cross-covariances in the model error statistics. The model component ${\mathrm{M}}^{sl}$ is the contribution from the large-scale processes to the small-scale state. The observations will be taken to be the sum of the large- and small-scale states plus instrument error.

^{s}The random walk model (5.1) will first be used for a “nature run” from which observations can be created. The filters described in section 3 will then be used to assimilate these observations and the true large-scale analysis error variance calculated at the end of the assimilation window. As the RKF and SKF are suboptimal, they propagate inexact error variances. Therefore, the true error variances for the RKF and SKF are calculated using (4.9) to provide a comparison between their performances.

Through our experimental design we are able to easily control the magnitude of the observation error due to unresolved scales by adjusting ${\mathrm{Q}}^{s}.$ The relationship between ${\mathrm{Q}}^{s}$ and the error due to unresolved scales is described in section 5.3. This framework also allows for the determination of the optimal ${\mathrm{C}}^{s}$ as well as the sensitivity of the SKF to this modelled variance.

### Initial conditions and filter parameters

For our experiments, we choose the initial conditions for the true state (nature run) to be ${x}_{0}^{l}=10$ and ${x}_{0}^{s}=0$ so that the true resolved state is an order of magnitude larger than the unresolved state. Setting the small-scale true state to zero also ensures that the representation errors are initially unbiased.

We set the initial conditions for the forecast state and forecast error covariance to be

For our first set of experiments we set the model component ${\mathrm{M}}^{sl}=0$ so that the representation errors remain unbiased throughout the assimilation. The large-scale model error variance, ${\mathrm{Q}}^{l}=1,$ is used throughout our experiments while ${\mathrm{Q}}^{s}$ will vary for different experiments.

Observations are assimilated every time-step. The true observation operator is $\mathbf{H}=(\begin{array}{cc}1& 1\end{array})$ which is used by all three filters; this ensures that there is no observation operator error. Unless otherwise specified, the instrument error variance is set to ${\mathrm{R}}^{I}=0.1,$ and ${\tilde{\mathrm{R}}}_{k}^{I}={\mathrm{R}}^{I}$ so that each filter correctly accounts for the instrument error. For the RKF, we set ${\mathrm{R}}^{H}=0$ so that the filter completely ignores the small-scale processes. The prescribed small-scale error variance ${\mathrm{C}}^{s}$ is varied throughout our experiments.

To calculate the true analysis error variance with (4.9), we neglect the variance of ${x}^{s,f}$ and the correlations between ${x}^{s,f}$ and ${\mathrm{e}}^{s,f}$ as the solution for ${x}^{s,f}$ is exponentially decaying with time. This method of calculating the true analysis error covariance has been validated against a Monte Carlo approach.

### Determining the small-scale variability over the assimilation window

For the SKF and RKF, which do not update the small-scale state, the true small-scale analysis and forecast error statistics are equal at the same time-step. Setting ${\mathrm{M}}^{sl}=0,$ the true small-scale error covariance, denoted ${\tilde{\mathrm{P}}}^{ss},$ is evolved through the difference equation

For the Gaussian random walk model, we may use a scalar version of this equation, given by

For large *k*, the first term in equation (5.5) decays to zero while the second term tends to the limit ${\tilde{\mathrm{Q}}}^{s}/(1-{e}^{-1}).$ Hence, after a burn-in period the error due to unresolved scales for the SKF and RKF is primarily determined by the size of ${\tilde{\mathrm{Q}}}^{s}$ and increases each time-step.

## Numerical experiments

In this section we apply the OKF, RKF and SKF to the random walk model defined in section 5.1 with filter parameters and error statistics assumptions detailed in section 5.2.

### Determining the optimal C^{s}

Before using the SKF, we first need to approximate the matrix ${\mathrm{C}}^{s}$ (see Table 1). To find the optimal value of ${\mathrm{C}}^{s}$ over the whole assimilation window we carry out numerical experiments for a range of values of ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}.$ Both of these parameters will affect the magnitude of the true large-scale analysis error variance. For each $({\mathrm{R}}^{I},{\mathrm{Q}}^{s})$ parameter pair, we test a number of values of ${\mathrm{C}}^{s}$ to determine the value of ${\mathrm{C}}^{s}$ which gives the smallest true large-scale analysis error variance at the final assimilated observation. As we are calculating the variances only, the calculation is deterministic and the choice of noise realisation is irrelevant. For this experiment, we assimilate 15 observations. We start with ${\mathrm{C}}^{s}=0$ and increase ${\mathrm{C}}^{s}$ in steps of $\Delta {\mathrm{C}}^{s}=0.001$ until ${\mathrm{C}}^{s}=1.$

The optimal values of ${\mathrm{C}}^{s}$ that produce the minimum large-scale analysis error variance for the SKF at the final time-step are shown in Fig. 1. The optimal value of ${\mathrm{C}}^{s}$ increases as both ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}$ increase. In particular, the optimal value of ${\mathrm{C}}^{s}$ is most sensitive to any increase in ${\mathrm{Q}}^{s}$ as the error due to unresolved scales is primarily determined by this error variance in our model. While not as sensitive, we find that large ${\mathrm{R}}^{I}$ also affects the optimal value of ${\mathrm{C}}^{s}.$ This is because the optimal value of ${\mathrm{C}}^{s}$ is a function of ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{l}$ after the initial time. We also find that for small ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}$ the optimal value of ${\mathrm{C}}^{s}$ over the whole assimilation window is similar to ${\tilde{\mathrm{P}}}^{ss}$ given by equation (5.5) for the final time-step. For large ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s},$ the optimal value of ${\mathrm{C}}^{s}$ is approximately 1.4 times larger than ${\tilde{\mathrm{P}}}^{ss}$ evaluated at the final time-step.

In operational settings we would not be able to optimise ${\mathrm{C}}^{s}$ in this way. However, it may be possible to approximate part of the representation error covariance using high resolution observations (Oke and Sakov, **2008**) or model data (Waller et al., **2014**) and use these approximate representation error values to guide the choice of ${\mathrm{C}}^{s}.$ To mimic this situation in our experiments, we create an ensemble of 50,000 realizations of *x ^{s}* for the length of the assimilation windows using the random walk model (5.1) and calculate the variance, averaged over the whole ensemble and time. The variance of this ensemble will be denoted $\mathrm{S}$. The variance, $\mathrm{S}$, represents an approximation to the total small-scale variability over the assimilation window. We now compare the values of ${\mathrm{C}}^{s}$ computed in Fig. 1 with the values of S.

Figure 2(a) shows the optimal ${\mathrm{C}}^{s}$ values when ${\mathrm{R}}^{I}=0.1$ (dashed line) and ${\mathrm{R}}^{I}=0.5$ (dotted line) for different values of ${\mathrm{Q}}^{s}.$ The grey region shows all points between $\mathrm{S}$ and 2S. As ${\mathrm{Q}}^{s}$ is increased the variance $\mathrm{S}$ also increases. Both optimal ${\mathrm{C}}^{s}$ lines lie within the shaded region for nearly all ${\mathrm{Q}}^{s}.$ We note that when there is little small-scale variability (i.e. ${\mathrm{Q}}^{s}\approx 0$) the optimal ${\mathrm{C}}^{s}$ values are less than $\mathrm{S}$ but both are close to zero. Figure 2(b) shows the effect of changing ${\mathrm{C}}^{s}$ on the SKF true large-scale analysis error variance (solid line) when ${\mathrm{R}}^{I}=0.1$ and ${\mathrm{Q}}^{s}=0.35$ in comparison to the true large scale analysis error variances for the RKF and OKF. Thus, for these experiments, a reasonable rule of thumb to avoid areas where the SKF under- or overcompensates for the error due to unresolved scales, is to choose $\mathrm{S}<{\mathrm{C}}^{s}<2\mathrm{S}.$

### Comparison of the SKF with the RKF and OKF

Using the optimal values of ${\mathrm{C}}^{s}$ calculated in Fig. 1, we now carry out experiments comparing the SKF and RKF for a range of values of ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}$ relative to the OKF. The results are illustrated in terms of relative error percentage for the RKF in Fig. 3a and the SKF in Fig. 3b. The relative error percentage is defined as

To examine the performance perceived by the filter we compare it to the true performance of the filter at the final time-step. Before discussing the results, we note that the SKF perceived analysis error variance will not be a smooth field for the $({\mathrm{R}}^{I},{\mathrm{Q}}^{s})$ parameter pairs considered. This is because in section 6.1 the optimal value of ${\mathrm{C}}^{s}$ was calculated to a limited precision of 0.001.

In Fig. 4 we plot the difference between the perceived and true analysis error variance at the final time-step. We note that the magnitude of the difference between the perceived and true error variances is smallest for large ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}.$ Here, the SKF (RKF) perceived error variance is approximately 1.25 (0.5) times the size of the true error variance. The SKF perceived-minus-truth difference shown in panel (a) is always positive for non-negligible representation uncertainty. This shows the SKF is a conservative filtering strategy when compensating for observations exhibiting error due to unresolved scales. As both ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}$ are increased the SKF perceived-minus-truth difference increases. This is due to two reasons. The first reason is because the perceived analysis error variance, ${\mathrm{P}}^{ll,a},$ increases with larger ${\mathrm{C}}^{s}$ as it is calculated using the short form update (3.2) and the optimal ${\mathrm{C}}^{s}$ will be larger for higher values of ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s}.$ The second reason is because, for non-negligible representation uncertainty, the true analysis error variance, ${\tilde{\mathrm{P}}}^{ll,a},$ will decrease as ${\mathrm{C}}^{s}$ approaches its optimal value. An illustrative case is provided by Fig. 2b for high representation uncertainty and low instrument uncertainty. Figure 4(b) shows the RKF perceived-minus-truth difference. This is always negative for non-negligible representation uncertainty. This shows the RKF is an overconfident filtering strategy for observations exhibiting error due to unresolved scales. The RKF is most overconfident in regimes of low instrument uncertainty and high representation uncertainty.

## Representation error bias correction through state augmentation

Up to this point we have not considered observation bias in our numerical experiments. However, in operational data assimilation, most observations or their respective observation operators exhibit systematic errors which are referred to as biases. A common approach for correcting observation biases online in a Kalman filter algorithm is to augment the state vector with a bias term (Dee, **2005**; Fertig et al., **2009**). The bias state will then be estimated and evolved along with the state variables through the data assimilation algorithm (Friedland, **1969**).

In section 2.3 we showed that observation errors may exhibit a representation error bias when there is a contribution from the large-scale processes to the small-scale state (i.e., when ${\mathbf{M}}^{sl}\ne {\mathbf{0}}_{{N}_{s}\times {N}_{l}}$). Throughout the remainder of this manuscript we only consider bias due to unresolved scales which is linked to the state-space representation of the small-scale processes. Since we know the exact form and origin of the observation bias in this study we may treat it as a model bias. Therefore, we consider the augmented state vector ** x** with form

**2010**) but is not explored here. In operational centres the model for individual sources contributing to the bias will be unknown and models describing the total bias will be used instead. These models for the bias will be obtained from assumptions imposed on the bias such as assuming it evolves slowly or is constant in time (e.g., Lea et al.,

**2008**). In cases such as these, the bias estimate will likely be poor as the variation of the bias with the evolution of the large-scale processes will be completely unaccounted for.

We now examine how a bias correction scheme can be implemented in conjunction with the SKF (section 7.1) and the RKF (section 7.2) to correct a bias due to unresolved scales. Table 3 summarizes the components for these two filters which are then substituted into the filter equations detailed in section 3.1.

### The Schmidt-Kalman filter with observational bias correction

Bias correction through state augmentation is a common method used in operational centres but use of a bias correction scheme with the SKF, which will be denoted SKFbc, is novel.

We assume that we have knowledge of the processes for all scales. We further assume that we have a model and prior estimate for the bias. The filtered state vector for the SKFbc is given by equation (7.1) and only includes the large-scale state and the bias term. The small-scale state is split into a biased and unbiased component, i.e.

This innovation is unbiased and hence the large-scale analysis errors are also unbiased. The Kalman gain for the SKFbc consists of a large-scale gain ${\mathbf{K}}^{l}\in {\mathbb{R}}^{{N}_{l}\times p}$ and a bias estimate gain ${\mathbf{K}}^{\beta}\in {\mathbb{R}}^{{N}_{s}\times p}$ given by

**D**is given in Table 3. This increases the uncertainty the filter attributes to the forecast innovation compared with the standard SKF. The additional uncertainty is a result of the errors accrued in the estimation of the bias. The term ${\mathbf{H}}^{s}{\mathbf{C}}^{\delta}{({\mathbf{H}}^{s})}^{T}$ in the SKFbc innovation error covariance corresponds to the variability of the unbiased small-scale processes.

The SKFbc equations are obtained from augmenting the large-scale terms with bias terms and defining the cross-covariance terms appropriately. The analysis state update for the SKFbc is then

To obtain the analysis error covariance update we augment the gain (7.6) with ${\mathbf{K}}^{\delta}={\mathbf{0}}_{{N}_{s}\times p}$ and substitute into the short-form analysis error covariance update (3.2). To mirror the SKF analysis error covariance update equations (3.12)-(3.14), we express the SKFbc analysis error covariance update equations as

Since in the context of the SKFbc the complete model evolving all scales is assumed to be known, it is appropriate to update the bias term using this model (7.2). Thus, the forecast state update is given by

For the forecast error covariance we need the model for the unbiased small-scale processes. To determine this model we use the definition (7.3) together with the bias evolution equation (7.2), to give

Note that the small-scale model error ${\mathit{\eta}}_{}^{s}$ is assumed to be unbiased. To mirror the SKF forecast error covariance update equations (3.16)-(3.18), we express the SKFbc forecast error covariance updates as

The prescribed unbiased small-scale error covariance ${\mathbf{C}}^{\delta}$ is assumed constant in time and is not updated.

The SKFbc allows us to correct biases due to unresolved scales and consider the effects of the unbiased small-scale processes on the large-scale state. A key advantage in this method is that the cross-correlations between the large-scale errors and small-scale errors are retained. However, the SKF is a computationally expensive procedure. This issue is exacerbated by the use of state augmentation for bias correction.

### The reduced-state Kalman filter with observation bias correction

To save on the computational expense incurred by the SKFbc we can disregard the unbiased small-scale processes to obtain the reduced-state Kalman filter with bias correction (RKFbc). As before, we augment the large scale state vector with a bias term, so that the estimated state is given by (7.1). The observation operator is also augmented and takes the form,

As with the SKFbc, the forecast innovation (7.5) used in the analysis update (3.1) takes no account of unbiased small scale error. Therefore, a properly specified observation error covariance for the RKFbc contains both instrument error and representation error (i.e. $\mathbf{R}={\mathbf{R}}^{I}+{\mathbf{R}}^{H}$).

Similarly to the SKFbc, the Kalman gain for the RKFbc consists of a large-scale gain ${\mathbf{K}}^{l}\in {\mathbb{R}}^{{N}_{l}\times p}$ and a bias estimate gain ${\mathbf{K}}^{\beta}\in {\mathbb{R}}^{{N}_{s}\times p}$ given by

For the RKF we assumed knowledge of the large-scale processes only. Hence, the model for the bias due to unresolved scales would be unknown and further assumptions required for the observation bias correction scheme. Nevertheless, to provide a direct comparison we will use the same model as the SKFbc (7.11) for the forecast state update. This model and a consistent model error covariance matrix (see Table 3) are used for the augmented analysis error covariance update (3.2).

Comparison of the OKF column in table 1 and the RKFbc column in table 3 shows the two filters have similar components as a result of the bias correction through state augmentation approach. The key difference between the two filters is the model error covariance expressions. The OKF accounts for the uncertainty in all scales and so uses the full model error covariance. The RKFbc accounts for the uncertainty in the large-scales and the estimate of the bias. Since the model for the bias (7.2) has been assumed perfect the RKFbc will only account for large-scale model error. We note that, as no knowledge of the small-scale processes is assumed for the RKFbc, the forecast model will differ in practice from that of the OKF as the RKFbc bias forecast model would come from additional assumptions placed on the bias.

The RKFbc is a computationally cheaper alternative to the SKFbc for online bias correction that takes no account of unbiased small-scale errors, except through the choice of observation error covariance.

### True analysis error equations for bias correcting filters

The true analysis error equation for the SKFbc and RKFbc will differ from the case 2 true analysis error equation (4.4) due to the innovation (7.5). The change will only be in the large-scale part of the true analysis error equations as the small-scale state is not analysed by either filter. The large-scale true analysis error equation for the bias correction filters is obtained from subtracting ${\mathbf{K}}_{k}^{l}{\mathbf{H}}^{\beta}{\mathit{x}}_{k}^{\beta ,f}$ from the case 2 true analysis error equation (4.4) which produces

The true large-scale analysis error covariance for the bias correcting filters is then given by

The difference between the true analysis error covariance for the non-bias correcting filters and (7.19) is that ${\tilde{\mathbf{R}}}_{k}^{H}$ has been replaced with $({\mathbf{H}}_{k}^{s,t}{\mathit{x}}_{k}^{s,t}-{\mathbf{H}}_{k}^{\beta}{\mathit{x}}_{k}^{\beta ,f}){({\mathbf{H}}_{k}^{s,t}{\mathit{x}}_{k}^{s,t}-{\mathbf{H}}_{k}^{\beta}{\mathit{x}}_{k}^{\beta ,f})}^{T}$ which corresponds to the uncertainty due to unresolved scales and the uncertainty in the estimate of the bias. Similarly to (4.5), equation (7.19) is still dependent on ${\mathit{x}}^{s,t}$ and so a different form may be more suitable.

Using the definitions of the small-scale observation operator error (2.9), the error due to unresolved scales (2.11), the small-scale forecast error (3.9) and the identity (4.8) we can rewrite (7.18) as

## Experimental methodology for bias correcting filters

### Gaussian random walk model

To investigate the performance of the SKFbc and the RKFbc we will apply them to the random walk model detailed in section 5.1. To introduce a bias into the observations we will set the contribution from the large-scale processes to the small-scale state ${\mathrm{M}}^{sl}$ to be nonzero in (5.1). As in (5.1), the true observation operator for the large- and small-scale states is $\mathbf{H}=(\begin{array}{cc}1& 1\end{array})$ and consequently the observation operator for the bias state ${\mathrm{H}}^{\beta}=1$ and the unbiased small-scale state ${\mathrm{H}}^{\delta}=1.$

To calculate the true large-scale analysis error variance of the RKFbc and SKFbc we proceed as discussed in section 7.3. Noting that ${x}_{0}^{s,f}$ and ${x}_{0}^{\beta ,f}$ are forecast by the same equation and tend to the same bias for large times we may neglect the variance of ${x}_{k}^{s,f}-{x}_{k}^{\beta ,f}$ and the correlations between ${x}_{k}^{s,f}-{x}_{k}^{\beta ,f}$ and ${\mathrm{e}}_{k}^{s,f}$ as they will be small at the end of the assimilation window.

### Initial conditions and filter parameters

The random walk model with ${\mathrm{M}}^{sl}=0.05$ is used to create a reference or truth trajectory for the large- and small-scale states. For our experiments we set ${x}_{0}^{l,t}=10$ and ${x}_{0}^{s,t}={\mathrm{M}}^{sl}{x}_{0}^{l,t}/(1-\hspace{0.17em}\text{exp}\hspace{0.17em}(-1/2\left)\right).$ This choice for the small-scale truth is the limit of ${x}^{s,t}$ for the deterministic version of the random walk model (i.e. (5.1) with no model noise). Using these initial conditions, the model equivalent of the observations will be biased at each time-step. The initial prior large- and small-scale estimates are set as

## Numerical experiments with bias correcting filters

### Comparison between bias correcting filters and non-bias correcting filters

We now consider the case of assimilating biased observations with standard and bias correcting filters. Figure 5 shows the analyses created by the SKF and SKFbc when assimilating biased observations for a single realization of the background, observation and model errors where ${\mathrm{Q}}^{s}=0.3.$ As optimal modelled small-scale error variances have not been calculated for the random walk model with ${\mathrm{M}}^{sl}\ne 0,$ we set ${\mathrm{C}}^{\delta}={\mathrm{C}}^{s}=0.1.$ These are suboptimal choices for both filters which results in a small difference in the true analysis error variances between the SKFbc (SKF) and RKFbc (RKF). Panel (a) shows an almost constant offset between the solutions of the bias-correcting and non-bias correcting schemes. Furthermore, calculating the time average of the squared analysis errors we find the SKF error is over four times larger than the SKFbc error.

For this realization, the time average of the squared analysis errors for the RKFbc and the SKFbc are the same to two decimal places. However, the SKFbc does have a smaller true large-scale analysis error variance than the RKFbc and the difference increases more as ${\mathrm{C}}^{\delta}$ is more optimally chosen. The same is true for the RKF and SKF with modelled variance ${\mathrm{C}}^{s}.$

In Fig. 5b we see the bias value estimated by the SKFbc and the small-scale true model solution for a particular realization, which is dominated by noise. The bias state ${x}_{k}^{\beta ,a}$ is intended to estimate the expected value of the small-scale state evolved with the filter forecast model such that it is unaffected by small-scale noise (see (7.3)). Using (2.2) we see that $\u3008{\mathit{x}}_{k}^{s,t}\u3009=\u3008{\mathbf{M}}^{sl}{\mathit{x}}_{k-1}^{l,t}+{\mathbf{M}}^{s}{\mathit{x}}_{k-1}^{s,t}+{\mathit{\eta}}_{k}^{s}\u3009$ where the angular brackets indicate the mathematical expectation over the distribution of the small-scale model errors. Here, we have plotted ${x}_{k}^{s,t}$ which is dependent on the large-scale noise and small-scale noise (cf. (2.2)). From this panel we see that the bias estimate is consistent with the small-scale true model solution.

Additional experiments using persistence as the forecast model for the bias state with the SKFbc have been carried out. We find that the time average of the SKFbc squared analysis errors is approximately three times smaller than the time average of the SKF squared analysis errors without bias correction. Nevertheless, the mean-square analysis errors for the SKFbc with the persistence bias model are more than 50% larger than when using (7.2). Additionally, when using persistence as the forecast model for the bias state in the RKFbc we find the time average of the squared analysis errors is also approximately three times less than the SKF error. Hence, for this system it is more important to treat the bias due to unresolved scales than compensate for the unbiased error due to unresolved scales.

### Determining the optimal ${\mathrm{C}}^{\delta}$ over the assimilation window

In this section we determine the optimal ${\mathrm{C}}^{\delta}$ over the whole assimilation window.

For the SKF, we found that the choice of ${\mathrm{C}}^{s}$ was key to the performance of the filter. We follow a similar procedure to section 6.1 to find the optimal values of the unbiased small-scale error covariance ${\mathrm{C}}^{\delta}.$ Our experiments have an assimilation window of 15 time-steps with an observation assimilated at each time-step. To find the optimal ${\mathrm{C}}^{\delta}$ for given parameter values for ${\mathrm{R}}^{I}$ and ${\mathrm{Q}}^{s},$ we calculate the true large-scale analysis error variance of the SKFbc for ${\mathrm{C}}^{\delta}$ ranging from 0 to 1 in steps of $\Delta {\mathrm{C}}^{\delta}=0.001$ and save the value that produces the smallest variance at the final time-step.

Figure 6 shows the optimal ${\mathrm{C}}^{\delta}$ for different values of ${\mathrm{Q}}^{s}$ and ${\mathrm{R}}^{I}.$ The behaviour is qualitatively similar to ${\mathrm{C}}^{s}$ with the SKF shown in Fig. 1 but numerical comparison is not meaningful as a different model is used. In particular, the size of ${\mathrm{C}}^{\delta}$ is primarily determined by the magnitude of ${\mathrm{Q}}^{s}.$ However, we find that an increase in ${\mathrm{R}}^{I}$ can also result in a larger ${\mathrm{C}}^{\delta}$ being optimal. If ${\mathrm{M}}^{sl}$ is increased, the optimal ${\mathrm{C}}^{\delta}$ decreases as the uncertainty caused by the contribution from the large-scale processes to the small-scale state becomes more important.

### Comparison of the bias correction filters

In this section we evaluate the performance of the SKFbc and RKFbc relative to the OKF and examine their perceived error variances.

We now compare the SKFbc and RKFbc with the OKF in terms of relative error percentage (6.1), plotted in Fig. 7. The SKFbc provides most improvement over the RKFbc for large ${\mathrm{Q}}^{s}$ and small ${\mathrm{R}}^{I}.$ This behaviour is qualitatively similar to the comparison between the RKF and SKF with the OKF shown in Fig. 3. We have also examined the perceived and true analysis error variances for the RKFbc and SKFbc (not plotted). The results are qualitatively similar to those given in section 6.2 for the RKF and SKF. Indeed, for non-negligible representation uncertainty the SKFbc (RKFbc) is a conservative (overconfident) filtering strategy as the perceived-minus-truth difference is positive (negative).

## Summary and conclusion

Observations of the atmospheric state may contain information on spatio-temporal scales unable to be represented by a numerical model. The resulting error caused by this scale mismatch between the observations and numerical model is known as the error due to unresolved scales. To obtain accurate analyses from assimilation of these observations requires that the data assimilation algorithm correctly account for this error.

In this work we have considered the ability of linear filters to compensate for the error due to unresolved scales. We considered a finite dimensional true state which could be partitioned into a large-scale state resolved by a numerical model and a small-scale state unresolved by a numerical model. The representation error was defined in this framework and a bias due to unresolved scales was shown to occur when there is a contribution from the large-scale processes to the small-scale state.

For our experiments we considered three filters: the Schmidt-Kalman filter (Janjić and Cohn, **2006**) that analyses the large-scales but models the uncertainty on all scales; the optimal Kalman filter, which analyses all scales, and a reduced-state Kalman filter, which completely disregards the small-scale processes.

The three filters were tested numerically on a random walk model with one variable for the large-scale processes and one variable for the small-scale processes. The observations were taken to be the sum of the large- and small-scale states with added noise to simulate instrument error. To obtain the best performance from the Schmidt-Kalman filter we had to tune the modelled small-scale error covariance to compensate for the variability of the small-scale processes which grew over the first half of the assimilation window. The Schmidt-Kalman filter works best in regimes of high error due to unresolved scales and low instrument error provided a suitable approximate small-scale error covariance is used. Examination of the perceived error variances revealed the analysis uncertainty calculated by the Schmidt-Kalman filter is greater than the true analysis uncertainty when accounting for error due to unresolved scales.

The novel use of the Schmidt-Kalman filter with an observation bias correction scheme was introduced as a means to correct the bias due to unresolved scales. The Schmidt-Kalman filter with a bias correction scheme proved to be a suitable method to treat observation biases and compensate for due to unresolved scales. In our experiments we found it was more important to treat an observation bias than to compensate for an unbiased error due to unresolved scales.

An important note to make regarding these experiments was that we had complete knowledge of the small-scale processes. This allowed for minimal approximations to be made to implement the Schmidt-Kalman filter and to tune the modelled error variances. In an operational setting, where all the small-scale processes are likely to be unknown, further approximations would be required. Additionally, the Schmidt-Kalman filter is a computationally expensive method due to the augmentation and propagation of the state error covariances. This must also be addressed before the filter could be considered for large problems.