Accounting for observation uncertainty and bias due to unresolved scales with the Schmidt-Kalman filter

Data assimilation combines observations with numerical model data, to provide a best estimate of a real system. Errors due to unresolved scales arise when there is a spatio-temporal scale mismatch between the processes resolved by the observations and model. We present theory on error, uncertainty and bias due to unresolved scales for situations where observations contain information on smaller scales than can be represented by the numerical model. The Schmidt-Kalman filter, which accounts for the uncertainties in the unrepresented processes, is investigated and compared with an optimal Kalman filter that treats all scales, and a suboptimal Kalman filter that accounts for the large-scales only. The equation governing true analysis uncertainty is reformulated to include representation uncertainty for each filter. We apply the filters to a random walk model with one variable for large-scale processes and one variable for small-scale processes. Our new results show that the Schmidt-Kalman filter has the largest benefit over a suboptimal filter in regimes of high representation uncertainty and low instrument uncertainty but performs worse than the optimal filter. Furthermore, we review existing theory showing that errors due to unresolved scales often result in representation error bias. We derive a novel bias-correcting form of the Schmidt-Kalman filter and apply it to the random walk model with biased observations. We show that the bias-correcting Schmidt-Kalman filter successfully compensates for representation error biases. Indeed, it is more important to treat an observation bias than an unbiased error due to unresolved scales.


Introduction
In atmospheric data assimilation, observations are combined with numerical model data, weighted by their respective error statistics, to provide a best estimate of the current atmospheric state, known as the analysis. This is achieved through comparison of observations with the numerical model equivalent of those observations. The errors associated with the observation-model comparison are the instrument error and representation error (Janjić et al., 2018). The representation error consists of the preprocessing error, the observation operator error and the error due to unresolved scales that occurs when there is a mismatch between the numerical model resolution and the scales resolved by the observation. The error due to unresolved scales depends on the observation footprint, which could be smaller or larger than the model grid, depending on the observation type and choice of model. For models which contain information on scales smaller than those observed, the standard approach to account for scale-mismatch would be to average the model state over the observation area (Janjić et al., 2018). However, for the purposes of this paper, we focus only on situations where the observation information content includes smaller scales than can be resolved by the model. In order to obtain the best analysis from these observations the representation error must be treated correctly by the data assimilation system.
Methods of accounting for uncertainty due to unresolved scales include, for example, prediction through ensemble statistics (Karspeck, 2016; Satterfield et al., 2017) and the use of a stochastic superparameterization (Grooms et al., 2014). In this manuscript we will consider two approaches: the standard approach where the uncertainty due to unresolved scales is included in the observation error covariance matrix (e.g. Hodyss and Satterfield, 2016; Fielding and Stiller, 2019) and an alternative approach where unresolved processes are considered in state space and hence accounted for through the state error covariance (Janjić and Cohn, 2006).
Compensating for representation error through the standard approach involves using an observation error covariance matrix that takes into account both the instrument and representation uncertainty. This can then be used within a standard variational or sequential data assimilation scheme. Estimates of the observation uncertainty may be obtained using a statistical method to estimate the entire observation error covariance matrix (e.g. Desroziers et al., 2005; Stewart et al., 2014; Waller et al., 2016a, 2016b; Cordoba et al., 2017). Alternatively, each component of the representation error statistics can be estimated separately and then combined with the instrument error covariance. For example, the error due to unresolved scales may be approximated using high resolution observations (Oke and Sakov, 2008) or high resolution model data (Daley, 1993; Liu and Rabier, 2002; Waller et al., 2014; Schutgens et al., 2016).
The Schmidt-Kalman filter (SKF) (Schmidt, 1966) is an example of a filter which uses the statistics of the unresolved processes in state space, without ever evaluating the unresolved state itself, to compensate for the error due to unresolved scales. This approach allows for consideration of flow-dependent correlations between the resolved errors and the unresolved processes at the cost of additional assumptions, approximations and increased computational expense. Janjić and Cohn (2006) have shown that the SKF can produce positive results despite the approximations and assumptions required for implementation in a geophysical context. In this paper we provide new results that determine in which observation and model uncertainty regimes the SKF performs best. In addition we compare the SKF to two other Kalman filtering approaches.
The SKF is deemed a suboptimal filter as it does not minimise the mean-square-error of its estimated states (Janjić and Cohn, 2006). In contrast, the Kalman filter that treats all scales is deemed optimal (for linear models and Gaussian statistics) (Nichols, 2010). In practice, suboptimal filters that do not treat all scales are often used. The analysis error covariances propagated by suboptimal filters are not representative of the true error statistics due to omitted or incorrectly specified filter components. As such, the true analysis error equations have been derived to evaluate the performance of suboptimal filters (e.g. Brown and Sage, 1971; Asher and Reeves, 1975; Asher et al., 1976). In this article we reformulate previous theory on true analysis error equations to include representation error (section 4) and evaluate the performance of the SKF.
A further issue noted by Janjić and Cohn (2006) is the potential for the representation error to be biased. This is because the error due to unresolved scales is sequentially correlated in time and correlated with the state resolved by the model. Other authors have circumvented this bias by careful construction of their numerical model (Janjić and Cohn, 2006). However, in operational data assimilation, most observations are biased and the innovations need to be corrected or the bias accounted for within the assimilation (Dee, 2005). Bias correction can be incorporated into the data assimilation algorithm by augmenting the state vector with a bias term (Friedland, 1969; Jazwinski, 1970; Ignagni, 1981) which can be estimated along with the state variables. This method of bias correction is commonly used with variational data assimilation systems (e.g. Derber and Wu, 1998; Dee, 2004; Zhu et al., 2014; Eyre, 2016) but has also been applied with ensemble data assimilation systems (e.g. Fertig et al., 2009; Miyoshi et al., 2010; Aravéquia et al., 2011). To the best of our knowledge a bias correction scheme has yet to be implemented in conjunction with the SKF; in section 7 we introduce a bias-correcting SKF as a new method to compensate for biases due to unresolved scales.
In summary, the objective of this paper is to investigate under which model and observation error regimes the SKF is most effective. The theoretical aspects of representation error will be reviewed in section 2 with particular emphasis on the error due to unresolved scales. Section 3 details how the SKF can be used to account for error due to unresolved scales and introduces the optimal Kalman filter (OKF) and a reduced-state Kalman filter (RKF). In section 4 we state the standard true analysis error equation and reformulate it to include representation error for each filter.
To evaluate the performance of the SKF in a numerical example we use a Gaussian random walk model. The numerical experiment methodology and model formulation are described in section 5 and results are presented in section 6. Our results show that the SKF provides the largest improvement in performance compared with the RKF when there is large error variance due to unresolved scales and small instrument error variance. In section 7 we discuss observation bias correction schemes in sequential data assimilation and introduce a novel SKF with bias correction scheme. The methodology and model formulation for the numerical experiments with biased observations is discussed in section 8 and results are presented in section 9. Our results show the SKF with bias correction can simultaneously treat observation biases and compensate for the error due to unresolved scales. We summarise and draw conclusions from our results in section 10.

Theoretical framework
In this section we introduce a theoretical framework and the notation used in this paper. We begin by describing a numerical model (section 2.1) and observations (section 2.2). In data assimilation, the error statistics used in filters may not reflect the true uncertainties they are intended to model. To help distinguish between these two sets of statistics throughout this manuscript we will define any true error statistics with a tilde ($\tilde{\cdot}$). Error statistics used in or obtained from filter calculations will be referred to as perceived error statistics and have no tilde.
The mathematical framework used to examine the error due to unresolved scales in this manuscript is to estimate the projection of some state from a high, but finite dimensional real vector space, onto a lower dimensional subspace using observations and knowledge of the system dynamics, following a similar philosophy to Liu and Rabier (2002) and Waller et al. (2014). Our approach differs from that of Janjić and Cohn (2006) which begins from the standpoint of infinite dimensional function spaces.

Model configuration
In this section we introduce the perfect and forecast models. We assume that the phase-space for the large-scale dynamics is a subspace of the phase-space for the full high dimensional system. The complement of the subspace for the large-scales will correspond to the phase-space for the small-scale dynamics. The notation for the models will be in a partitioned form that separates the large and small scales. In particular, we denote the true, complete state at time $t_k$ as $(x^{l,t} \;\; x^{s,t})^T_k \in \mathbb{R}^{N_t}$ such that $x^{l,t} \in \mathbb{R}^{N_l}$, $x^{s,t} \in \mathbb{R}^{N_s}$ and $N_t = N_l + N_s$. Here, and throughout this paper, any component with a t-superscript indicates that it is a true variable. The l- and s-superscripts correspond to the large- and small-scale processes within the complete system dynamics. (We have deviated from the resolved/unresolved nomenclature of Janjić and Cohn (2006) for clarity, since the different filters used in our experiments resolve different scales.) An ideal linear model for the true state of a finite dimensional process can be expressed through the dynamical system
$$\begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix}_{k+1} = \begin{pmatrix} M^{l,t} & M^{ls,t} \\ M^{sl,t} & M^{s,t} \end{pmatrix} \begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix}_{k}, \tag{2.1}$$
such that the matrix blocks $M^{l,t} \in \mathbb{R}^{N_l \times N_l}$, $M^{ls,t} \in \mathbb{R}^{N_l \times N_s}$, $M^{s,t} \in \mathbb{R}^{N_s \times N_s}$ and $M^{sl,t} \in \mathbb{R}^{N_s \times N_l}$. From a numerical modelling perspective, this partitioned description of the dynamics would be suited to a pseudospectral discretization (e.g. Fourier modes).
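In numerical weather prediction (NWP), the true models that govern the evolution of the atmosphere are unknown and have to be approximated. For our approximation of the true dynamical system (2.1), we assume that any subgrid-scale parameterizations used to approximate the contribution from the small-scale processes to the large-scale state are contained within the large-scale model (Janjić and Cohn, 2006; Janjić et al., 2018). Hence, the model block $M^{ls} = 0_{N_l \times N_s}$ and our approximate dynamical model describing the complete system satisfies
$$\begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix}_{k+1} = \begin{pmatrix} M^{l} & 0 \\ M^{sl} & M^{s} \end{pmatrix} \begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix}_{k} + \begin{pmatrix} \eta^{l} \\ \eta^{s} \end{pmatrix}_{k+1}. \tag{2.2}$$
In (2.2) each model block has the same dimensions as its true model counterpart. The large- and small-scale model errors are given by $\eta^l \in \mathbb{R}^{N_l}$ and $\eta^s \in \mathbb{R}^{N_s}$ respectively. Model errors are assumed to be random and unbiased with covariance given by
$$\widetilde{Q} = \left\langle \begin{pmatrix} \eta^{l} \\ \eta^{s} \end{pmatrix} \begin{pmatrix} \eta^{l} \\ \eta^{s} \end{pmatrix}^{T} \right\rangle = \begin{pmatrix} \widetilde{Q}^{ll} & \widetilde{Q}^{ls} \\ \widetilde{Q}^{sl} & \widetilde{Q}^{ss} \end{pmatrix}.$$
Here, using $\langle \cdot \rangle$ to indicate the mathematical expectation over the corresponding error distribution, the matrices $\widetilde{Q}^{ll} = \langle \eta^l (\eta^l)^T \rangle$, $\widetilde{Q}^{ss} = \langle \eta^s (\eta^s)^T \rangle$ and $\widetilde{Q}^{ls} = \langle \eta^l (\eta^s)^T \rangle$ (with $\widetilde{Q}^{sl} = (\widetilde{Q}^{ls})^T$) are the true model error covariances of the large-scale, the small-scale and cross-covariances between the large- and small-scale, respectively. We note that for the purposes of this work, the model error distribution is assumed to be stationary, so that $\widetilde{Q}$ is not a function of time.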
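Analogously, the complete forecast state $(x^{l,f} \;\; x^{s,f})^T \in \mathbb{R}^{N_t}$ is evolved from the analysis $(x^{l,a} \;\; x^{s,a})^T$ at the previous time-step and satisfies
$$\begin{pmatrix} x^{l,f} \\ x^{s,f} \end{pmatrix}_{k+1} = \begin{pmatrix} M^{l} & 0 \\ M^{sl} & M^{s} \end{pmatrix} \begin{pmatrix} x^{l,a} \\ x^{s,a} \end{pmatrix}_{k}.$$
The forecast errors can then be defined as
$$e^{l,f} = x^{l,f} - x^{l,t}, \qquad e^{s,f} = x^{s,f} - x^{s,t},$$
where $e^{l,f} \in \mathbb{R}^{N_l}$ and $e^{s,f} \in \mathbb{R}^{N_s}$ are the large- and small-scale forecast errors respectively. The true forecast error covariance is denoted
$$\widetilde{P}^{f} = \left\langle \begin{pmatrix} e^{l,f} \\ e^{s,f} \end{pmatrix} \begin{pmatrix} e^{l,f} \\ e^{s,f} \end{pmatrix}^{T} \right\rangle = \begin{pmatrix} \widetilde{P}^{ll,f} & \widetilde{P}^{ls,f} \\ \widetilde{P}^{sl,f} & \widetilde{P}^{ss,f} \end{pmatrix}. \tag{2.6}$$
This formulation of the complete finite-dimensional dynamics allows us to consider several filters with different approaches to the treatment of large- and small-scales. Moreover, we can consider the interactions between scales and the effect they have on the modelling of observations.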
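As an illustration of the partitioned model structure, the following minimal Python sketch advances one large-scale and one small-scale variable with the large-to-small coupling retained and the small-to-large block set to zero, as assumed for the approximate model (2.2). The function name, the parameter values and the choice of independent model errors (zero cross-covariance) are assumptions made for the example only and are not taken from the experiments of this paper.

```python
import random


def step_partitioned_model(x_l, x_s, m_l, m_sl, m_s, q_l, q_s, rng):
    """Advance scalar large- and small-scale states by one model step.

    The small scale never feeds back on the large scale (M^{ls} = 0),
    while m_sl lets the large scale drive the small scale.
    Model errors are drawn independently (assumed zero cross-covariance).
    """
    eta_l = rng.gauss(0.0, q_l ** 0.5)  # large-scale model error
    eta_s = rng.gauss(0.0, q_s ** 0.5)  # small-scale model error
    x_l_next = m_l * x_l + eta_l
    x_s_next = m_sl * x_l + m_s * x_s + eta_s
    return x_l_next, x_s_next


# Hypothetical parameter values, chosen only to exercise the function.
rng = random.Random(0)  # fixed seed so the run is repeatable
x_l, x_s = 1.0, 0.0
trajectory = [(x_l, x_s)]
for _ in range(5):
    x_l, x_s = step_partitioned_model(x_l, x_s, 1.0, 0.2, 0.9, 0.01, 0.01, rng)
    trajectory.append((x_l, x_s))
```

Setting `m_l = m_s = 1` and `m_sl = 0` in this sketch gives two independent Gaussian random walks, one per scale.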

Observations and their uncertainties
In this section we express the equations relating the observations, $y_k \in \mathbb{R}^p$, to the model state in a partitioned form and describe their uncertainties. For the rest of this section, we assume that the model state and observations are valid at the same time, and drop the time subscript, $k$. At time $t_k$, the observations are related to the true model state as
$$y = \begin{pmatrix} H^{l,t} & H^{s,t} \end{pmatrix} \begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix} + \epsilon, \tag{2.7}$$
where $\epsilon \in \mathbb{R}^p$ is the instrument error, assumed to be random and unbiased with covariance $\widetilde{R}^I = \langle \epsilon \epsilon^T \rangle \in \mathbb{R}^{p \times p}$, and $H^{l,t} \in \mathbb{R}^{p \times N_l}$ and $H^{s,t} \in \mathbb{R}^{p \times N_s}$ are the true linear observation operators which map the large- and small-scale states into observation space respectively. The observation operator $(H^{l,t} \;\; H^{s,t})$ is the (linear) finite-dimensional counterpart to the continuum observation operator of Janjić and Cohn (2006). We will not consider nonlinear observation operators in the remainder of this paper. Throughout this paper, we assume that there is no preprocessing error. Hence, we will be concerned with the two cases described in sections 2.2.1 (all scales analysed) and 2.2.2 (large scales analysed) below. Case 1 shows the form of the representation error for filters that resolve all scales and is pertinent to the theoretical optimal Kalman filter discussed in section 3.2. Case 2 shows the form of the representation error for filters typically used in operational practice and is pertinent to the reduced-state Kalman filter and the Schmidt-Kalman filter discussed in sections 3.3 and 3.4 respectively.
2.2.1. Case 1: All scales analysed. In this case we assume that both the large- and small-scale states are estimated. The total observation error (observation departure from the true state), $e^o$, can be expressed as
$$e^o = y - \begin{pmatrix} H^{l} & H^{s} \end{pmatrix} \begin{pmatrix} x^{l,t} \\ x^{s,t} \end{pmatrix},$$
where $H^l \in \mathbb{R}^{p \times N_l}$ and $H^s \in \mathbb{R}^{p \times N_s}$ are the blocks of the observation operator used by the filter, acting on the large- and small-scale state components, respectively. Using (2.7), we rewrite $e^o$ as
$$e^o = \epsilon + \gamma^l + \gamma^s, \tag{2.9}$$
where $\gamma^l = (H^{l,t} - H^l) x^{l,t}$ is the large-scale observation operator error and $\gamma^s = (H^{s,t} - H^s) x^{s,t}$ is the small-scale observation operator error. Thus, the representation error for this case consists solely of observation operator error, $\gamma^l + \gamma^s$. The observation operator errors, $\gamma^l$ and $\gamma^s$, will each be assumed to be unbiased, so that in this case, the representation error is also unbiased. The representation error covariance for this case will be denoted by $\widetilde{R}^H = \langle (\gamma^l + \gamma^s)(\gamma^l + \gamma^s)^T \rangle \in \mathbb{R}^{p \times p}$, and the complete observation error covariance is given by
$$\widetilde{R} = \widetilde{R}^I + \widetilde{R}^H,$$
where we have assumed that the representation error and instrument error are mutually uncorrelated.
2.2.2. Case 2: Large scales analysed. In this case, we assume that only the large-scale state is estimated such that
$$e^o = y - H^l x^{l,t}, \tag{2.10}$$
where the observation operator used consists only of the block acting on the large-scales. The decomposition of $e^o$ can be obtained by setting $H^s = 0_{p \times N_s}$ in (2.9):
$$e^o = \epsilon + \gamma^l + H^{s,t} x^{s,t}. \tag{2.11}$$
The filter observation operator does not act on the small scales, so the term $\gamma^s$ is replaced by $H^{s,t} x^{s,t}$, the error due to unresolved scales. The representation error for Case 2 is thus $\gamma^l + H^{s,t} x^{s,t}$ with covariance $\widetilde{R}^H = \langle (\gamma^l + H^{s,t} x^{s,t})(\gamma^l + H^{s,t} x^{s,t})^T \rangle \in \mathbb{R}^{p \times p}$. Equations (2.10) and (2.11) are analogous to equation (1) in Janjić et al. (2018) with the pre-processing error omitted. The complete observation error covariance for Case 2 is given by
$$\widetilde{R} = \widetilde{R}^I + \widetilde{R}^H,$$
where we have assumed that the representation error and instrument error are mutually uncorrelated. As in Case 1, $\gamma^l$ is assumed to be unbiased. However, we will see in section 2.3 that the expected value of $H^{s,t} x^{s,t}$ is likely to be non-zero.
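The Case 2 decomposition of the observation error can be checked numerically. The short Python sketch below builds a scalar observation from a true large-scale state, a true small-scale state and an instrument error, then confirms that the departure from the large-scale truth splits into instrument error, large-scale observation operator error and error due to unresolved scales. The function name and all numerical values are hypothetical and serve only to illustrate the bookkeeping.

```python
def decompose_case2_observation_error(x_l_t, x_s_t, h_l_t, h_s_t, h_l, eps):
    """Return the total Case 2 observation error and its three components.

    x_l_t, x_s_t : true large- and small-scale states (scalars)
    h_l_t, h_s_t : true observation operators for each scale
    h_l          : large-scale observation operator used by the filter
    eps          : instrument error
    """
    # Observation of the true state, including both scales.
    y = h_l_t * x_l_t + h_s_t * x_s_t + eps
    # Total observation error when only the large scale is analysed.
    total = y - h_l * x_l_t
    # Large-scale observation operator error.
    gamma_l = (h_l_t - h_l) * x_l_t
    # Error due to unresolved scales.
    unresolved = h_s_t * x_s_t
    return total, eps, gamma_l, unresolved


# Hypothetical values chosen for illustration.
total, eps, gamma_l, unresolved = decompose_case2_observation_error(
    x_l_t=2.0, x_s_t=0.5, h_l_t=1.0, h_s_t=1.0, h_l=0.9, eps=0.1
)
residual = total - (eps + gamma_l + unresolved)  # zero up to rounding
```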

Bias due to unresolved scales
Analysing only the large-scales will result in an error due to unresolved scales (section 2.2.2) that is sequentially correlated in time and correlated with the resolved state, leading to a potential bias (Janjić and Cohn, 2006). Assuming that the large-scale observation operator is unbiased, taking the expectation of the observation error (2.11) (and reintroducing the time subscript $k$) results in
$$\langle e^o_k \rangle = \langle H^{s,t}_k x^{s,t}_k \rangle,$$
where $\langle \cdot \rangle$ denotes the mathematical expectation over the distribution of representation errors at time $k$. Using dynamical system (2.1), repeated substitution for the equation governing $x^{s,t}$ into the expected error due to unresolved scales yields
$$\langle H^{s,t}_k x^{s,t}_k \rangle = \left\langle H^{s,t}_k \left( \underline{M^{sl,t} x^{l,t}_{k-1}} + M^{s,t} \left( \underline{M^{sl,t} x^{l,t}_{k-2}} + M^{s,t} \left( \ldots \left( \underline{M^{sl,t} x^{l,t}_{0}} + M^{s,t} ( x^{s,t}_{0} ) \right) \ldots \right) \right) \right) \right\rangle. \tag{2.13}$$
Here the underlined terms represent the contribution from the large scales. For many non-trivial models, these terms will not be identically zero, and potentially introduce a bias even if the initial value for the small-scale state is zero, $x^{s,t}_0 = 0$. For example, Janjić and Cohn (2006) solved a model of non-divergent linear advection on a sphere using a truncated expansion in spherical harmonics. Introducing a shear flow results in a dynamical system where the unresolved small-scales do not directly influence the resolved large-scales, but the large-scales influence the small-scales. This yields an error and bias due to unresolved scales. Janjić and Cohn (2006) were able to mitigate the bias using specific initial conditions. However, this experimental freedom would not be available in less-idealized situations. Therefore, when accounting for the unresolved scales in data assimilation we must also determine and treat any bias arising. In the new results in sections 5-6 below, we carefully construct our model to avoid bias due to unresolved scales. However, we revisit this problem in sections 7-9 where we use filters with bias-correction schemes.
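The mechanism behind this bias can be illustrated with a deterministic scalar example. The Python sketch below iterates a true model in which the large scale drives the small scale, starting from a zero small-scale state, and returns the resulting small-scale state after a given number of steps. Because the recursion is linear, the same arithmetic applies to the mean state, so a non-zero mean large-scale state gives a non-zero mean error due to unresolved scales. The function name and the coefficients are assumptions for illustration, with the small-to-large coupling taken to be zero for simplicity.

```python
def small_scale_state_after(k, x_l0, x_s0, m_l, m_sl, m_s):
    """Iterate the scalar true model k times and return the small-scale state.

    Assumes no feedback from the small scale to the large scale, so the
    large scale evolves as x_l <- m_l * x_l while the small scale is
    driven by it through m_sl.
    """
    x_l, x_s = x_l0, x_s0
    for _ in range(k):
        # Both right-hand sides use the states from the previous step.
        x_l, x_s = m_l * x_l, m_sl * x_l + m_s * x_s
    return x_s


# Hypothetical coefficients: a persistent large scale forcing a damped small scale.
x_s_coupled = small_scale_state_after(3, x_l0=1.0, x_s0=0.0, m_l=1.0, m_sl=0.1, m_s=0.5)
x_s_uncoupled = small_scale_state_after(3, x_l0=1.0, x_s0=0.0, m_l=1.0, m_sl=0.0, m_s=0.5)
```

With the coupling switched off (`m_sl = 0`) the small-scale state stays at its zero initial value, whereas with the coupling on it grows away from zero even though it started there.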

Sequential linear filters and representation uncertainty
In this section we describe the general linear filtering framework that we use for data assimilation in our theoretical investigations and numerical experiments. We consider three filters in more detail: an optimal Kalman filter (OKF) that takes account of all scales; a reduced-state Kalman filter (RKF) that disregards the small-scales; and the Schmidt-Kalman filter (SKF) that provides analyses of the large-scale state through consideration of both the large- and small-scale uncertainties.

A linear filter
A linear filter algorithm can be divided into analysis update and model prediction steps. The general form of the analysis update at time $t_k$ is given by
$$x^a_k = x^f_k + K_k d^{o,f}_k, \tag{3.1}$$
where $x^a_k$ is the analysis (state estimate), $x^f_k$ is the forecast state, $K_k$ is the gain matrix and $d^{o,f}_k = y_k - H_k x^f_k$ is the innovation, defined as the observation-minus-forecast departure. In this general setting we have not defined the dimensions of the vectors and matrices in (3.1), as this will depend on the specific choice of filter. For example, the state $x$ in equation (3.1) could be either the complete state $(x^l \;\; x^s)^T$ or just the large-scale state $x^l$. There are various approaches to determine the gain matrix which will be discussed in sections 3.2, 3.3 and 3.4.
The perceived analysis error covariance update calculated by the filter at time $t_k$ is given by
$$P^a_k = (I - K_k H_k) P^f_k, \tag{3.2}$$
where $I$ is the identity matrix, $H_k$ is the observation operator and $P^f_k$ is the perceived forecast error covariance. Equation (3.2) is known as the short form of the analysis error covariance update. This equation only provides the correct estimate of the analysis error covariance if the background and observation error statistics used in the filter reflect the true error statistics. The use of a suboptimal gain and the short form update equation (3.2) will result in the filter producing incorrect error statistics. The true error statistics will be derived in section 4.
For the model prediction step, the forecast state at time $t_{k+1}$ is evolved from the analysis at the previous time-step and is given by
$$x^f_{k+1} = M x^a_k, \tag{3.3}$$
where $M$ is a linear model. A model error term is not included in the forecast state update as linear filters estimate the mean state and we have assumed that the model error is unbiased. However, the error in the model $M$ is accounted for in the forecast error covariance update given by
$$P^f_{k+1} = M P^a_k M^T + Q, \tag{3.4}$$
which will be discussed further in section 4. We note that (3.4) will only produce correct error statistics when $P^a_k$ and $Q$ are equal to their true statistics counterparts. Equations (3.1)-(3.4) form the core components of the linear filter algorithm. In the following sections we discuss three linear filters, each based on the Kalman filter (Kalman, 1960), that we will use in this paper. Table 1 summarizes the key vectors and matrices used in these three Kalman filters.
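One cycle of the linear filter (3.1)-(3.4) can be sketched in a few lines of Python for a scalar state and a scalar observation. This is a minimal sketch rather than the implementation used for the experiments: the function names are hypothetical, and the gain shown is the standard Kalman gain, which is only one of the possible choices of gain discussed in the following sections.

```python
def kalman_gain(p_f, h, r):
    """Standard scalar Kalman gain for forecast variance p_f and obs variance r."""
    return p_f * h / (h * p_f * h + r)


def analysis_update(x_f, p_f, y, h, gain):
    """Analysis state (3.1) and short-form perceived variance (3.2)."""
    innovation = y - h * x_f          # observation-minus-forecast departure
    x_a = x_f + gain * innovation
    p_a = (1.0 - gain * h) * p_f
    return x_a, p_a


def forecast_update(x_a, p_a, m, q):
    """Forecast state (3.3) and perceived forecast variance (3.4)."""
    return m * x_a, m * p_a * m + q


# One assimilation cycle with hypothetical numbers.
gain = kalman_gain(p_f=1.0, h=1.0, r=1.0)                         # 0.5
x_a, p_a = analysis_update(x_f=0.0, p_f=1.0, y=2.0, h=1.0, gain=gain)
x_f_next, p_f_next = forecast_update(x_a, p_a, m=1.0, q=0.5)
```

Passing the gain in as an argument, rather than computing it inside `analysis_update`, mirrors the point made above that the same update equations apply whichever gain a particular filter uses.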

The optimal Kalman filter (OKF)
For the optimal Kalman filter (OKF), we assume that we are able to model the processes for all scales and know the correct error statistics for the initial state, observations and model. Therefore, the perceived error statistics for the OKF will be equivalent to the true error statistics. The OKF simultaneously updates the large- and small-scale states, $x^l \in \mathbb{R}^{N_l}$ and $x^s \in \mathbb{R}^{N_s}$, so that the analysis update takes the form
$$\begin{pmatrix} x^{l,a} \\ x^{s,a} \end{pmatrix}_k = \begin{pmatrix} x^{l,f} \\ x^{s,f} \end{pmatrix}_k + \begin{pmatrix} K^{l} \\ K^{s} \end{pmatrix}_k \left( y_k - \begin{pmatrix} H^{l} & H^{s} \end{pmatrix}_k \begin{pmatrix} x^{l,f} \\ x^{s,f} \end{pmatrix}_k \right),$$
which is a partitioned form of (3.1). The gain matrix for the OKF is partitioned into large- and small-scale components $K^l \in \mathbb{R}^{N_l \times p}$ and $K^s \in \mathbb{R}^{N_s \times p}$ respectively, and is given in Table 1. This is the optimal Kalman gain which minimises the trace of the analysis error covariance (e.g. Nichols, 2010). The analysis error covariance update is calculated using (3.2) with state error covariances with the same block structure as the forecast error covariance (2.6).
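As the OKF filters all scales, the total observation error is described by Case 1 (all scales analysed, section 2.2.1). Hence, the observation error covariance for the OKF is $R = \widetilde{R}^I + \widetilde{R}^H$, where $\widetilde{R}^H$ is the Case 1 representation error covariance. For the OKF forecast step we use the matrix
$$M = \begin{pmatrix} M^{l} & 0 \\ M^{sl} & M^{s} \end{pmatrix}$$
as our forecast model in (3.3) and the partitioned model error covariance given in Table 1 in the forecast error covariance prediction (3.4).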
In summary, the analysis and forecast updates for the OKF state and covariance are a partitioned form of (3.1) -(3.4). By treating all scales in the assimilation the OKF has no error due to unresolved scales in the associated observation equation. However, due to computational constraints and inadequate knowledge of small-scale processes it is not possible to apply this technique in practice. Hence, methods that approximate the influence of small-scale processes must be employed instead.

The reduced-state Kalman filter (RKF)
The suboptimal Kalman filter which estimates only the large-scale state and completely neglects the modelling of small-scale processes will be referred to as the reduced-state Kalman filter (RKF).
The analysis and forecast update equations for the RKF are simply the linear filter equations (3.1)-(3.4) where, as described in Table 1, we use the large-scale state, error covariances and observation operator. Thus, the forecast innovation is
$$d^{o,f} = y - H^l x^{l,f} = e^o - H^l e^{l,f}, \tag{3.7}$$
where $e^o$ is the Case 2 observation error (2.11) and the second equality can be established by adding and subtracting the term $H^l x^{l,t}$. Assuming each error has zero-mean, taking the expectation of the outer product of (3.7) yields the true innovation covariance (i.e. all contributing error covariances are true error statistics). However, the innovation covariance used by the RKF is given by
$$\langle d^{o,f} (d^{o,f})^T \rangle \approx H^l P^{ll,f} (H^l)^T + R^I + R^H, \tag{3.8}$$
where the large-scale forecast error covariance, $P^{ll,f}$, instrument error covariance, $R^I$, and representation error covariance, $R^H$, are perceived error statistics. The influence of any small-scale processes is now accounted for only through the representation error covariance $R^H$. Reduced-state methods form an attractive approach in situations where computational expense is an important consideration. However, $R^H$ is not known exactly and must be approximated. Hence, the Kalman gain for the RKF will not minimise the analysis error covariance and the filter will be suboptimal.
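To make the role of the perceived representation error variance concrete, the following Python sketch performs a scalar RKF analysis in which the small scales enter only through an additive term in the observation error variance. The function name and numerical values are hypothetical, and the standard Kalman gain formula is assumed for the gain.

```python
def rkf_analysis(x_l_f, p_ll_f, y, h_l, r_i, r_h):
    """Scalar reduced-state analysis using perceived variances.

    r_i : perceived instrument error variance
    r_h : perceived representation error variance (approximated in practice)
    """
    innovation = y - h_l * x_l_f
    # Perceived innovation variance: forecast term plus inflated obs variance.
    innovation_var = h_l * p_ll_f * h_l + r_i + r_h
    gain = p_ll_f * h_l / innovation_var
    x_l_a = x_l_f + gain * innovation
    p_ll_a = (1.0 - gain * h_l) * p_ll_f   # short-form perceived variance
    return x_l_a, p_ll_a, gain


# Same forecast and observation, with and without representation error variance.
with_rep = rkf_analysis(0.0, 1.0, 2.0, 1.0, r_i=0.5, r_h=0.5)
without_rep = rkf_analysis(0.0, 1.0, 2.0, 1.0, r_i=0.5, r_h=0.0)
```

In this example neglecting the representation error variance increases the gain, so the analysis is drawn more strongly towards an observation that contains small-scale signal the filter cannot represent.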

The Schmidt-Kalman filter (SKF)
The Schmidt-Kalman filter (SKF) estimates only the large-scale state, but the statistics of any unmodelled processes are used to determine the Kalman gain for the filtered state (Schmidt, 1966; Janjić and Cohn, 2006). A summary of the relevant equations is included in Table 1.
As only the large-scale state is estimated the forecast innovation is computed using the large-scale state, $x^{l,f}$, and observation operator, $H^l$. To determine the innovation covariance we start with the innovation (3.7) and add and subtract the term $(H^l \;\; H^s)(x^{l,t} \;\; x^{s,t})^T$. This allows us to write the innovation in the form
$$d^{o,f} = e^o - H^l e^{l,f} + H^s x^{s,t}. \tag{3.9}$$
The innovation is now written in terms of the observation errors corresponding to Case 1 (where all scales are analysed, see section 2.2.1), the large-scale forecast error mapped into observation space, $H^l e^{l,f}$, and the term $H^s x^{s,t}$, the true small-scale state mapped into observation space. Assuming each error and $x^{s,t}$ has zero mean, taking the expectation of the outer product of (3.9) gives the true innovation covariance. The innovation covariance used by the SKF is given by
$$\langle d^{o,f} (d^{o,f})^T \rangle \approx H^l P^{ll,f} (H^l)^T + H^l P^{ls,f} (H^s)^T + H^s P^{sl,f} (H^l)^T + H^s C^s (H^s)^T + R^I + R^H, \tag{3.10}$$
where $R^I + R^H$ is the perceived Case 1 observation error covariance. Here, we have abused our notation, to write $P^{ls,f}$ as the perceived approximation of $\langle -e^{l,f} (x^{s,t})^T \rangle$, such that $P^{sl,f} = (P^{ls,f})^T$. Using this notation for the cross-covariances of the SKF is common amongst other literature on the filter (e.g. Janjić and Cohn, 2006; Janjić et al., 2018). Following Janjić and Cohn (2006), we employ a prescribed error covariance $C^s$ as a time-independent approximation of $\langle x^{s,t} (x^{s,t})^T \rangle$. We note that as the small-scale error covariance is prescribed, the innovation covariance is an inexact approximation. The innovation covariance for the SKF is theoretically the same as the innovation covariance for the RKF (3.8) but expressed in a different form that includes contributions from the small scale processes. The analysis state update for the SKF is given by
$$x^{l,a}_k = x^{l,f}_k + K^l_k d^{o,f}_k,$$
where $K^l_k$ is the Schmidt-Kalman large-scale gain. To obtain an analysis error covariance update equation we augment $K^l$ with $K^s = 0_{N_s \times p}$ and substitute into equation (3.2). This is justified as the unfiltered state is assumed to have a small magnitude. Large uncertainty in the small-scale state or a small magnitude observation operator for this state would also justify this assumption (Simon, 2006). As the short-form analysis error covariance update for the SKF is not symmetric, we calculate $P^{ll,a}$ and $P^{ls,a}$ through the short-form update only and set $P^{sl,a}_k = (P^{ls,a}_k)^T$. Thus, the SKF analysis error covariance update equations are
$$P^{ll,a}_k = (I - K^l_k H^l_k) P^{ll,f}_k - K^l_k H^s_k P^{sl,f}_k, \tag{3.13}$$
$$P^{ls,a}_k = (I - K^l_k H^l_k) P^{ls,f}_k - K^l_k H^s_k C^s. \tag{3.14}$$
We note that the term $-K^l_k H^s_k$ will usually be non-zero for the SKF. This term couples the large-scale uncertainty to the small-scale variability. If this term were zero, the large-scale state and uncertainty estimates produced by the RKF and SKF may still differ because of the differing innovation covariances between the filters.
The SKF treatment of the forecast step has a similar philosophy to the analysis step. The state prediction (3.3) is obtained through evolving the large-scale state $x^l$ with the large-scale forecast model $M^l$. The large-scale and cross-covariance blocks of the forecast error covariance are calculated using the complete partitioned forecast model and model error covariance in (3.4), giving
$$P^{ll,f}_{k+1} = M^l P^{ll,a}_k (M^l)^T + Q^{ll}, \tag{3.17}$$
$$P^{ls,f}_{k+1} = M^l P^{ll,a}_k (M^{sl})^T + M^l P^{ls,a}_k (M^s)^T + Q^{ls}. \tag{3.18}$$
The prescribed small-scale error covariance $C^s$ is assumed constant in time and is not updated.

Table 2. Matrices and vectors used in the true error calculations for Case 1 and 2 described in sections 4.1 and 4.2. The tildes indicate true error covariances. Case 1 corresponds to analysing all scales and includes the OKF. Case 2 corresponds to analysing the large scales only and includes the RKF and SKF. The true analysis error equation, analysis error covariance and forecast error covariance for each case are obtained by substituting the corresponding components into equations (4.1), (4.2) and (4.3) respectively.
Analysis errors, $e^a$: Case 1 (OKF): $(e^{l,a} \;\; e^{s,a})^T$; Case 2 (SKF and RKF): $(e^{l,a} \;\; e^{s,a})^T$.
Model errors, $\eta$: Case 1 (OKF): $(\eta^{l} \;\; \eta^{s})^T$; Case 2 (SKF and RKF): $(\eta^{l} \;\; \eta^{s})^T$.
Observation errors, $e^o$: Case 1 (OKF): $\epsilon + \gamma^l + \gamma^s$; Case 2 (SKF and RKF): $\epsilon + \gamma^l + H^{s,t} x^{s,t}$.
The appeal of the SKF is in its ability to compensate for small-scales without estimation of the small-scale state. Practical implementation of the SKF would require the filter to be adapted to nonlinear models. However, even for linear systems, the models evolving the small-scale processes would be unknown and their influence on the error covariances would need to be quantified. Additionally, the propagation of the state cross-covariances poses a considerable computational cost.
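A scalar version of one SKF cycle can be sketched in Python as follows. The sketch assumes the standard Schmidt-Kalman construction described above: the gain is the large-scale row of the full Kalman gain computed with a prescribed small-scale variance, the small-scale gain is set to zero, and only the large-scale variance and the cross-covariance are updated. The function names, the argument names and the numerical values are hypothetical, and the forecast step assumes a partitioned model with no small-to-large coupling.

```python
def skf_analysis(x_l_f, p_ll_f, p_ls_f, c_s, y, h_l, h_s, r):
    """Scalar Schmidt-Kalman analysis step.

    p_ll_f : perceived large-scale forecast variance
    p_ls_f : perceived large/small cross-covariance
    c_s    : prescribed (fixed) small-scale variance
    r      : perceived observation error variance (all scales analysed case)
    """
    innovation = y - h_l * x_l_f
    # Innovation variance including cross terms and the prescribed c_s.
    innovation_var = (h_l * p_ll_f * h_l + 2.0 * h_l * p_ls_f * h_s
                      + h_s * c_s * h_s + r)
    # Large-scale gain; the small-scale gain is zero by construction.
    k_l = (p_ll_f * h_l + p_ls_f * h_s) / innovation_var
    x_l_a = x_l_f + k_l * innovation
    # Short-form update of the large-scale and cross-covariance blocks only.
    p_ll_a = (1.0 - k_l * h_l) * p_ll_f - k_l * h_s * p_ls_f
    p_ls_a = (1.0 - k_l * h_l) * p_ls_f - k_l * h_s * c_s
    return x_l_a, p_ll_a, p_ls_a, k_l


def skf_forecast(x_l_a, p_ll_a, p_ls_a, m_l, m_sl, m_s, q_ll, q_ls):
    """Scalar Schmidt-Kalman forecast step; c_s is not propagated."""
    x_l_f = m_l * x_l_a
    p_ll_f = m_l * p_ll_a * m_l + q_ll
    p_ls_f = m_l * p_ll_a * m_sl + m_l * p_ls_a * m_s + q_ls
    return x_l_f, p_ll_f, p_ls_f


# Hypothetical numbers: no initial cross-covariance, unit prescribed variance.
x_l_a, p_ll_a, p_ls_a, k_l = skf_analysis(0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0, 0.0)
x_l_f, p_ll_f, p_ls_f = skf_forecast(x_l_a, p_ll_a, p_ls_a, 1.0, 0.0, 1.0, 0.1, 0.0)
```

In this example the analysis generates a negative cross-covariance from an initially uncorrelated state, which is the information a reduced-state filter discards.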

Discussion
The OKF, SKF and RKF represent three different approaches for dealing with observation uncertainty due to unresolved scales (see Table 1). The OKF analyses all scales, thus avoiding the error due to unresolved scales altogether, while the RKF completely disregards the small-scale processes and accounts for the error due to unresolved scales through the representation error covariance matrix. The SKF, however, takes a compromise approach where only the large-scale state is estimated, but the uncertainty in all scales is accounted for in the estimation. Additionally, the SKF accounts for the flow-dependence of the correlations between the large-scale errors and small-scale processes (albeit approximately) through the cross-covariances in the analysis and forecast error covariances given in equations (3.13), (3.14), (3.17) and (3.18). Applications where it is a poor approximation to neglect these cross-covariances will benefit the most from using the SKF (as opposed to the RKF where these cross-covariances are neglected).

True analysis error equations
A standard metric for assessing the quality of a data assimilation scheme is through examination of the magnitude of its analysis errors (e.g. Liu and Rabier, 2002). Under an unrealistic and restrictive set of conditions the Kalman filter is known to be optimal in a minimum mean-square-error sense and to produce the true error statistics describing its analysis and forecast (e.g. Todling and Cohn, 1994; Nichols, 2010). In contrast, both the SKF and RKF described in section 3 will incorrectly estimate the true analysis and forecast error covariances due to their treatment of the small-scales in the filter calculations. In this section we extend the existing literature on the true analysis error equations to include representation error so that we may evaluate the analysis obtained through the SKF and RKF.
To obtain the true analysis error equation for a linear filter we assume that we have exact knowledge of the truth and that both the true and filter models and observation operators are linear. Under this regime, the true analysis error at time $t_k$ has been derived by Moodey (2013) and is given by
$$e^a_k = (I - K_k H_k)(M e^a_{k-1} + \eta_k) + K_k e^o_k, \tag{4.1}$$
where $e^a_k$ is the analysis error, $\eta_k$ is the model error (see section 2.1), $e^o_k$ is the total observation error which will be specified for different cases in subsections 4.1-4.2 and $K_k$ and $H_k$ are the Kalman gains and observation operators for the analysis state updates respectively. The Kalman gain is therefore calculated from the error statistics perceived by the filter. We assume that $e^a_{k-1}$, $\eta_k$ and $e^o_k$ each have zero-mean and are mutually uncorrelated. (We note that this assumption excludes a consideration of bias due to unresolved scales. However, this is considered further in section 7.) Under these assumptions the true analysis error covariance is obtained through taking the expectation of the outer product of equation (4.1) with itself to give
$$\widetilde{P}^a_k = \langle e^a_k (e^a_k)^T \rangle = (I - K_k H_k)(M \widetilde{P}^a_{k-1} M^T + \widetilde{Q})(I - K_k H_k)^T + K_k \widetilde{R}_k K_k^T. \tag{4.2}$$
Here we remind the reader that we have used tildes to indicate true error covariances, to help distinguish these from the covariances perceived by the filters, which may be suboptimal. Equation (4.2) is known as the Joseph formula (Gelb, 1974). The true analysis error covariance (4.2) is valid for any gain matrix. The Joseph formula is equivalent to the short form analysis error covariance (3.2) for the optimal case (OKF) in exact arithmetic. The true analysis error covariance can be calculated separately from the assimilation. In subsections 4.1-4.2 we use (4.1) and (4.2) to determine the true analysis error equations and error covariances for Cases 1 and 2 described in sections 2.2.1 and 2.2.2. Case 1 corresponds to analysing all scales and includes the OKF. Case 2 corresponds to analysing the large-scale state only and includes the RKF and SKF.
Table 2 summarizes the matrices and vectors used in the true error calculations.
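The relationship between the Joseph formula and the short form update can be checked numerically. The following is a minimal sketch, assuming arbitrary illustrative values for the forecast error covariance, observation operator and observation error variance (none of these values are taken from the experiments, and the function names are hypothetical). It shows that the two forms agree for the optimal gain but not for a suboptimal gain.

```python
import numpy as np

def joseph_update(P_f, K, H, R):
    """Analysis error covariance for an arbitrary gain K (Joseph form)."""
    I_KH = np.eye(P_f.shape[0]) - K @ H
    return I_KH @ P_f @ I_KH.T + K @ R @ K.T

def short_form_update(P_f, K, H):
    """Short form update, only valid when K is the optimal gain."""
    return (np.eye(P_f.shape[0]) - K @ H) @ P_f

# Illustrative two-variable example (arbitrary assumed values)
P_f = np.array([[1.0, 0.2], [0.2, 0.5]])
H = np.array([[1.0, 1.0]])
R = np.array([[0.1]])

K_opt = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)  # optimal Kalman gain
K_sub = 0.5 * K_opt                                   # deliberately suboptimal

P_opt = joseph_update(P_f, K_opt, H, R)
P_sub = joseph_update(P_f, K_sub, H, R)
```

For the suboptimal gain the short form no longer describes the true error covariance, which is why the Joseph form is needed to evaluate filters such as the RKF and SKF.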

Case 1: true analysis error covariance when all scales are filtered
To obtain the true analysis error equation we assume that we have complete knowledge of all scales, as with the OKF. As in section 2.2.1 the observation error will consist of the instrument error and the observation operator error for the large and small scales, $c^l + c^s$. Under these assumptions the true analysis error equation will be a partitioned form of equation (4.1) and the true analysis error covariance will be a partitioned form of (4.2), with the components given in column 1 of Table 2.

Case 2: true analysis error covariance when only large-scales are filtered
The true analysis error equation for case 2 applies to filters that estimate the large-scale state only, like the RKF and SKF. Using equation (4.1) and the state gain matrices and observation operators, the large-scale analysis error for the RKF and SKF is given by equation (4.4). We note that the observation errors correspond to case 2 described in section 2.2.2, as both the RKF and SKF filter the large-scale state only. Hence, the effect of the small-scale processes on $e^{l,a}_k$ in equation (4.4) is determined through the error due to unresolved scales, $H^{s,t} x^{s,t}_k$. We observe that the true large-scale error covariance, $\tilde{P}^{ll,a}_k = \langle e^{l,a}_k (e^{l,a}_k)^T \rangle$, may thus be written in terms of the representation error covariance, as in equation (4.5). However, the true error statistics contributing to the true analysis error covariance are unknown in practice, making the use of (4.5) to evaluate filter performance infeasible. For theoretical experiments where most error statistics can be prescribed, determining $\tilde{R}^H_k$ requires a Monte Carlo approach due to its dependence on $x^{s,t}$. Alternatively, a different form of the analysis error equation may be more practical.
Assuming we know the true behaviour of the small scales, we can express the true analysis error equation for the RKF and SKF in the partitioned form (4.6), with components $e^{l,a}_k$ and $e^{s,a}_k$, where $e^{l,f}_k = M e^{l,a}_{k-1} + \eta^l_k$ and $e^{s,f}_k = M^{sl} e^{l,a}_{k-1} + M^s e^{s,a}_{k-1} + \eta^s_k$. We note that as the small-scale state is not estimated, the small-scale gain is a zero matrix of dimension $N_s \times p$. We also note that, while the large- and small-scale errors ostensibly appear uncoupled in equation (4.6), they are in fact coupled as $e^{s,a}_k$ and $e^{s,f}_k$ each depend on $x^{s,t}_k$. Adding and subtracting the term $K^l_k H^s_k e^{s,f}_k$ in the large-scale component of the analysis error, (4.6) may be rewritten as equation (4.7). Using the definitions of the small-scale observation operator error (2.9), the error due to unresolved scales (2.11) and the small-scale forecast error (3.9), we obtain the identity (4.8). Thus the right-hand side of (4.7) can be evaluated without knowledge of the error due to unresolved scales specifically. Instead, it can be written in terms of the observation operator error and a small-scale forecast, which gives equation (4.9). The partitioned case 2 error equation (4.9) can be used to obtain the true analysis error covariance for the SKF and RKF without knowing the full representation error covariance $\tilde{R}^H$. However, when using this form of the analysis error equation to obtain the true error statistics, the correlations between $x^{s,f}$ and $e^{s,f}$ may be non-negligible. We note that, while $x^{s,f}$ and $x^{s,t}$ will also be unknown in practice, they could be approximated offline with high-resolution models.

Gaussian random walk model
We now consider the methodology for numerical experiments where we apply the three filters to the simple model system (5.1), in which $\eta^l_k \sim (0, Q^l)$, $\eta^s_k \sim (0, Q^s)$ and the instrument error at time $t_k$ has zero mean and variance $R^I$ (Brown and Hwang, 2012). This system uses one variable for the large-scale state, $x^l$, and one variable for the small-scale state, $x^s$. The large-scale state $x^l$ and small-scale state $x^s$ are random walk variables driven by the errors $\eta^l$ and $\eta^s$, whose structures are determined by the variances $Q^l$ and $Q^s$ respectively. There are no cross-covariances in the model error statistics. The model component $M^{sl}$ is the contribution from the large-scale processes to the small-scale state. The observations will be taken to be the sum of the large- and small-scale states plus instrument error. The random walk model (5.1) will first be used for a "nature run" from which observations can be created. The filters described in section 3 will then be used to assimilate these observations and the true large-scale analysis error variance calculated at the end of the assimilation window. As the RKF and SKF are suboptimal, they propagate inexact error variances. Therefore, the true error variances for the RKF and SKF are calculated using (4.9) to provide a comparison between their performances.
Through our experimental design we are able to easily control the magnitude of the observation error due to unresolved scales by adjusting $Q^s$. The relationship between $Q^s$ and the error due to unresolved scales is described in section 5.3. This framework also allows for the determination of the optimal $C^s$ as well as the sensitivity of the SKF to this modelled variance.
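A nature run of this kind can be sketched as follows. Since equation (5.1) is not reproduced above, the sketch assumes a form that is consistent with the limits quoted elsewhere in the text: the large-scale state is a pure random walk and the small-scale state decays by a factor $e^{-1/2}$ per step. The function name, argument names and the default decay factor are assumptions made for illustration. The default initial states are the values $x^l_0 = 10$ and $x^s_0 = 0$ used in the experiments.

```python
import numpy as np

def nature_run(n_steps, Q_l, Q_s, R_I, M_sl=0.0, M_s=np.exp(-0.5),
               x_l0=10.0, x_s0=0.0, seed=0):
    """Simulate the two-variable truth and its observations.

    Assumed model (hypothetical reconstruction of (5.1)):
        x_l[k] = x_l[k-1] + eta_l
        x_s[k] = M_sl * x_l[k-1] + M_s * x_s[k-1] + eta_s
        y[k]   = x_l[k] + x_s[k] + instrument error
    """
    rng = np.random.default_rng(seed)
    x_l = np.empty(n_steps + 1)
    x_s = np.empty(n_steps + 1)
    x_l[0], x_s[0] = x_l0, x_s0
    for k in range(1, n_steps + 1):
        x_l[k] = x_l[k - 1] + rng.normal(0.0, np.sqrt(Q_l))
        x_s[k] = (M_sl * x_l[k - 1] + M_s * x_s[k - 1]
                  + rng.normal(0.0, np.sqrt(Q_s)))
    # One observation per time-step after the initial time
    y = x_l[1:] + x_s[1:] + rng.normal(0.0, np.sqrt(R_I), size=n_steps)
    return x_l, x_s, y

x_l, x_s, y = nature_run(n_steps=15, Q_l=1.0, Q_s=0.35, R_I=0.1)
```

Increasing `Q_s` in this sketch increases the small-scale variability seen by the observations, which is how the size of the error due to unresolved scales is controlled.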

Initial conditions and filter parameters
For our experiments, we choose the initial conditions for the true state (nature run) to be $x^l_0 = 10$ and $x^s_0 = 0$ so that the true resolved state is an order of magnitude larger than the unresolved state. Setting the small-scale true state to zero also ensures that the representation errors are initially unbiased.
We set the initial conditions for the forecast state and forecast error covariance using perturbations from the true states, $a^l \sim N(0, P^{ll,f}_0)$ and $a^s \sim N(0, P^{ss,f}_0)$. We have assumed that the initial large- and small-scale forecast errors are uncorrelated.
For our first set of experiments we set the model component $M^{sl} = 0$ so that the representation errors remain unbiased throughout the assimilation. The large-scale model error variance, $Q^l = 1$, is used throughout our experiments while $Q^s$ will vary for different experiments.
Observations are assimilated every time-step. The true observation operator is $H = (1 \;\; 1)$, which is used by all three filters; this ensures that there is no observation operator error. Unless otherwise specified, the instrument error variance is set to $R^I = 0.1$, and $\tilde{R}^I_k = R^I$ so that each filter correctly accounts for the instrument error. For the RKF, we set $R^H = 0$ so that the filter completely ignores the small-scale processes. The prescribed small-scale error variance $C^s$ is varied throughout our experiments.
To calculate the true analysis error variance with (4.9), we neglect the variance of $x^{s,f}$ and the correlations between $x^{s,f}$ and $e^{s,f}$, as the solution for $x^{s,f}$ decays exponentially with time. This method of calculating the true analysis error covariance has been validated against a Monte Carlo approach. For large $k$, the first term in equation (5.5) decays to zero while the second term tends to the limit $\tilde{Q}^s/(1 - e^{-1})$. Hence, after a burn-in period the error due to unresolved scales for the SKF and RKF is primarily determined by the size of $\tilde{Q}^s$ and levels off towards this limit rather than growing at each time-step.
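The approach to this limit can be illustrated with a short calculation. The sketch below assumes that the small-scale variance obeys the recursion $P_k = e^{-1} P_{k-1} + Q^s$, which is consistent with the quoted limit but is an assumption, since equation (5.5) is not reproduced here. The function name and argument names are hypothetical.

```python
import math

def small_scale_variance(n_steps, Q_s, P0=0.0, decay=math.exp(-1.0)):
    """Iterate the assumed variance recursion P_k = decay * P_{k-1} + Q_s."""
    P = P0
    for _ in range(n_steps):
        P = decay * P + Q_s
    return P

Q_s = 0.3
limit = Q_s / (1.0 - math.exp(-1.0))      # limit quoted in the text
after_burn_in = small_scale_variance(15, Q_s)
after_long_run = small_scale_variance(50, Q_s)
```

Under this assumption the variance is already very close to its limit after the 15 time-steps used in the experiments, regardless of the initial variance.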

Numerical experiments
In this section we apply the OKF, RKF and SKF to the random walk model defined in section 5.1 with filter parameters and error statistics assumptions detailed in section 5.2.

Determining the optimal $C^s$
Before using the SKF, we first need to approximate the matrix $C^s$ (see Table 1). To find the optimal value of $C^s$ over the whole assimilation window we carry out numerical experiments for a range of values of $R^I$ and $Q^s$. Both of these parameters will affect the magnitude of the true large-scale analysis error variance. For each $(R^I, Q^s)$ parameter pair, we test a number of values of $C^s$ to determine the value which gives the smallest true large-scale analysis error variance at the final assimilated observation. As we are calculating the variances only, the calculation is deterministic and the choice of noise realisation is irrelevant. For this experiment, we assimilate 15 observations. We start with $C^s = 0$ and increase $C^s$ in steps of $\Delta C^s = 0.001$ until $C^s = 1$. The optimal values of $C^s$ that produce the minimum large-scale analysis error variance for the SKF at the final time-step are shown in Fig. 1. The optimal value of $C^s$ increases as both $R^I$ and $Q^s$ increase. In particular, the optimal value of $C^s$ is most sensitive to increases in $Q^s$, as the error due to unresolved scales is primarily determined by this error variance in our model. While not as sensitive, we find that large $R^I$ also affects the optimal value of $C^s$. This is because the optimal value of $C^s$ is a function of $R^I$ and $Q^l$ after the initial time. We also find that for small $R^I$ and $Q^s$ the optimal value of $C^s$ over the whole assimilation window is similar to $\tilde{P}^{ss}$ given by equation (5.5) for the final time-step. For large $R^I$ and $Q^s$, the optimal value of $C^s$ is approximately 1.4 times larger than $\tilde{P}^{ss}$ evaluated at the final time-step.

Table 3. The filter matrices and vectors for the SKFbc and RKFbc. The equations for the two filters are obtained through substituting these terms into (3.1)-(3.4). Columns: RKFbc, SKFbc. Rows include the state and the state error covariance. For the RKFbc, $P = \begin{pmatrix} P^{ll} & P^{lb} \\ P^{bl} & P^{bb} \end{pmatrix} \in \mathbb{R}^{N_t \times N_t}$; for the SKFbc the state error covariance additionally contains the blocks $P^{ld}$ and $P^{bd}$.

In operational settings we would not be able to optimise $C^s$ in this way. However, it may be possible to approximate part of the representation error covariance using high resolution observations (Oke and Sakov, 2008) or model data (Waller et al., 2014) and use these approximate representation error values to guide the choice of $C^s$. To mimic this situation in our experiments, we create an ensemble of 50,000 realizations of $x^s$ for the length of the assimilation window using the random walk model (5.1) and calculate the variance, averaged over the whole ensemble and time. This variance, denoted $S$, represents an approximation to the total small-scale variability over the assimilation window. We now compare the values of $C^s$ computed in Fig. 1 with the values of $S$. Figure 2(a) shows the optimal $C^s$ values when $R^I = 0.1$ (dashed line) and $R^I = 0.5$ (dotted line) for different values of $Q^s$. The grey region shows all points between $S$ and $2S$. As $Q^s$ is increased the variance $S$ also increases. Both optimal $C^s$ lines lie within the shaded region for nearly all $Q^s$. We note that when there is little small-scale variability (i.e. $Q^s \approx 0$) the optimal $C^s$ values are less than $S$ but both are close to zero. Figure 2(b) shows the effect of changing $C^s$ on the SKF true large-scale analysis error variance (solid line) when $R^I = 0.1$ and $Q^s = 0.35$ in comparison to the true large-scale analysis error variances for the RKF and OKF. Thus, for these experiments, a reasonable rule of thumb to avoid areas where the SKF under- or overcompensates for the error due to unresolved scales is to choose $S < C^s < 2S$.
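The search over $C^s$ described in this section can be sketched for the scalar system as follows. This is an illustration under stated assumptions rather than the implementation used for Fig. 1: it uses a textbook form of the Schmidt-Kalman update (which may differ in detail from equations (3.12)-(3.18)), it assumes $M = 1$, $M^s = e^{-1/2}$, $M^{sl} = 0$ and $P^{ll,f}_0 = 1$, and it obtains the true variance by propagating the joint second moments of the large-scale error and the true small-scale state with the Joseph form, instead of using (4.9). All function names are hypothetical.

```python
import numpy as np

M_S = np.exp(-0.5)              # assumed small-scale decay factor
H = np.array([[1.0, 1.0]])      # observation = large scale + small scale

def skf_variances(C_s, Q_l=1.0, Q_s=0.35, R_I=0.1, P0_ll=1.0, n_obs=15):
    """Return (perceived, true) large-scale analysis error variance."""
    P_ll, P_ls = P0_ll, 0.0                 # perceived forecast statistics
    T = np.diag([P0_ll, 0.0])               # true moments of (e_l, x_s)
    F = np.diag([1.0, M_S])
    Q_true = np.diag([Q_l, Q_s])
    for k in range(n_obs):
        if k > 0:                           # forecast step
            P_ll, P_ls = P_ll + Q_l, M_S * P_ls
            T = F @ T @ F.T + Q_true
        # Schmidt-Kalman gain: only the large-scale state is updated
        D = P_ll + 2.0 * P_ls + C_s + R_I
        K_l = (P_ll + P_ls) / D
        # Perceived short form update, with C_s held fixed
        P_ll, P_ls = P_ll - K_l * (P_ll + P_ls), P_ls - K_l * (P_ls + C_s)
        # True update: Joseph form with the gain (K_l, 0)
        K = np.array([[K_l], [0.0]])
        A = np.eye(2) - K @ H
        T = A @ T @ A.T + R_I * (K @ K.T)
    return P_ll, T[0, 0]

def optimal_C_s(**kwargs):
    """Grid search over C_s in [0, 1] with step 0.001."""
    grid = np.arange(0.0, 1.0005, 0.001)
    true_var = np.array([skf_variances(c, **kwargs)[1] for c in grid])
    i = int(np.argmin(true_var))
    return grid[i], true_var[i]

c_opt, v_opt = optimal_C_s(Q_s=0.35, R_I=0.1)
```

With $C^s = 0$ the sketch reduces to the RKF with $R^H = 0$, so the RKF true variance is available from `skf_variances(0.0, ...)` for comparison.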

Comparison of the SKF with the RKF and OKF
Using the optimal values of $C^s$ calculated in Fig. 1, we now carry out experiments comparing the SKF and RKF for a range of values of $R^I$ and $Q^s$ relative to the OKF. The results are illustrated in terms of the relative error percentage (6.1) for the RKF in Fig. 3a and for the SKF in Fig. 3b, where $|\cdot|$ indicates the absolute value and each term is evaluated at the end of the assimilation window. In these experiments, the SKF always has a true analysis error variance smaller than or equal to the RKF. When there is no error due to unresolved scales (i.e. $Q^s = 0$) we have that $C^s = 0$ is the optimal value for $C^s$ and the SKF would reduce to the RKF. The largest relative error percentages for both the RKF and SKF occur when there is large uncertainty due to unresolved scales (large $Q^s$) and small $R^I$, and the smallest differences are when $Q^s$ is small. We also find that larger values of $R^I$ limit the difference in performance between the RKF or SKF and the OKF. Therefore, the benefits of using the SKF are most apparent when there is considerable error due to unresolved scales and small instrument error. Comparing Fig. 3a to Fig. 3b we see that, for any fixed value of $R^I$, as the uncertainty due to unresolved scales is increased the improvement of the SKF over the RKF will also increase.
To examine the performance perceived by the filter we compare it to the true performance of the filter at the final time-step. Before discussing the results, we note that the SKF perceived analysis error variance will not be a smooth field for the $(R^I, Q^s)$ parameter pairs considered. This is because in section 6.1 the optimal value of $C^s$ was calculated to a limited precision of 0.001.
In Fig. 4 we plot the difference between the perceived and true analysis error variance at the final time-step. We note that the magnitude of the difference between the perceived and true error variances is largest for large $R^I$ and $Q^s$. Here, the SKF (RKF) perceived error variance is approximately 1.25 (0.5) times the size of the true error variance. The SKF perceived-minus-truth difference shown in panel (a) is always positive for non-negligible representation uncertainty. This shows the SKF is a conservative filtering strategy when compensating for observations exhibiting error due to unresolved scales. As both $R^I$ and $Q^s$ are increased the SKF perceived-minus-truth difference increases, for two reasons. First, the perceived analysis error variance, $P^{ll,a}$, increases with larger $C^s$ as it is calculated using the short form update (3.2), and the optimal $C^s$ will be larger for higher values of $R^I$ and $Q^s$. Second, for non-negligible representation uncertainty, the true analysis error variance, $\tilde{P}^{ll,a}$, will decrease as $C^s$ approaches its optimal value. An illustrative case is provided by Fig. 2b for high representation uncertainty and low instrument uncertainty. Figure 4(b) shows the RKF perceived-minus-truth difference. This is always negative for non-negligible representation uncertainty. This shows the RKF is an overconfident filtering strategy for observations exhibiting error due to unresolved scales. The RKF is most overconfident in regimes of low instrument uncertainty and high representation uncertainty.
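The overconfidence of the RKF can be reproduced with a scalar sketch. As an illustration only, the code below assumes $M = 1$, $M^s = e^{-1/2}$, $M^{sl} = 0$, $R^H = 0$ and $P^{ll,f}_0 = 1$, and tracks the true second moments of the large-scale error and the true small-scale state directly rather than through (4.9). The function name and argument names are hypothetical.

```python
import math

def rkf_variances(Q_s, R_I=0.1, Q_l=1.0, P0_ll=1.0, n_obs=15,
                  M_s=math.exp(-0.5)):
    """Return (perceived, true) large-scale analysis error variance for the RKF."""
    P = P0_ll                              # perceived forecast variance
    T_ll, T_ls, T_ss = P0_ll, 0.0, 0.0     # true moments of (e_l, x_s)
    for k in range(n_obs):
        if k > 0:                          # forecast step
            P = P + Q_l
            T_ll, T_ls, T_ss = T_ll + Q_l, M_s * T_ls, M_s ** 2 * T_ss + Q_s
        K = P / (P + R_I)                  # gain ignores the small scales
        P = (1.0 - K) * P                  # perceived short form update
        a = 1.0 - K
        # True update: e_l <- a * e_l - K * (x_s + instrument error)
        T_ll, T_ls = (a * a * T_ll - 2.0 * a * K * T_ls + K * K * (T_ss + R_I),
                      a * T_ls - K * T_ss)
    return P, T_ll

perceived, true = rkf_variances(Q_s=0.35)
```

When `Q_s` is zero the two variances coincide, because the RKF is then the optimal filter for the system; for positive `Q_s` the perceived variance falls below the true variance.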

Representation error bias correction through state augmentation
Up to this point we have not considered observation bias in our numerical experiments. However, in operational data assimilation, most observations or their respective observation operators exhibit systematic errors which are referred to as biases. A common approach for correcting observation biases online in a Kalman filter algorithm is to augment the state vector with a bias term (Dee, 2005; Fertig et al., 2009). The bias state will then be estimated and evolved along with the state variables through the data assimilation algorithm (Friedland, 1969).
In section 2.3 we showed that observation errors may exhibit a representation error bias when there is a contribution from the large-scale processes to the small-scale state (i.e., when $M^{sl} \neq 0_{N_s \times N_l}$). Throughout the remainder of this manuscript we only consider bias due to unresolved scales which is linked to the state-space representation of the small-scale processes. Since we know the exact form and origin of the observation bias in this study we may treat it as a model bias. Therefore, we consider the augmented state vector $x$ given by (7.1), consisting of the large-scale state and a bias term (cf. (2.13)). For bias correction through state augmentation we require a prior estimate of the bias and a model to forecast it. Using (2.2), the forecast model for the bias state is given by equation (7.2), where we have assumed the model for the bias to be perfect. Random noise can be added to (7.2) to indicate that the bias evolution model is not perfect (Ménard, 2010) but this is not explored here. In operational centres the model for individual sources contributing to the bias will be unknown and models describing the total bias will be used instead. These models for the bias will be obtained from assumptions imposed on the bias such as assuming it evolves slowly or is constant in time (e.g., Lea et al., 2008). In cases such as these, the bias estimate will likely be poor as the variation of the bias with the evolution of the large-scale processes will be completely unaccounted for.
We now examine how a bias correction scheme can be implemented in conjunction with the SKF (section 7.1) and the RKF (section 7.2) to correct a bias due to unresolved scales. Table 3 summarizes the components for these two filters which are then substituted into the filter equations detailed in section 3.1.

The Schmidt-Kalman filter with observational bias correction
Bias correction through state augmentation is a common method used in operational centres but use of a bias correction scheme with the SKF, which will be denoted SKFbc, is novel.
We assume that we have knowledge of the processes for all scales. We further assume that we have a model and prior estimate for the bias. The filtered state vector for the SKFbc is given by equation (7.1) and only includes the large-scale state and the bias term. The small-scale state is split into a biased and an unbiased component, i.e. $x^s = x^b + x^d$ (7.3).
The unbiased small-scale processes, $x^d$, will be accounted for through their statistics. The full observation operator for the SKFbc is given by equation (7.4), where $H^b \in \mathbb{R}^{p \times N_s}$ and $H^d \in \mathbb{R}^{p \times N_s}$ are the linear observation operators which map the bias and unbiased small-scale states into observation space respectively. However, as with the SKF, the analysis update equation (3.1) uses a forecast innovation (7.5) that takes no account of the unbiased small scales. This innovation is unbiased and hence the large-scale analysis errors are also unbiased. The Kalman gain for the SKFbc consists of a large-scale gain $K^l \in \mathbb{R}^{N_l \times p}$ and a bias estimate gain $K^b \in \mathbb{R}^{N_s \times p}$ given by equation (7.6), where $P^{lb} \in \mathbb{R}^{N_l \times N_s}$ is the perceived cross-covariance between the large-scale errors and bias estimate errors, $P^{ld} \in \mathbb{R}^{N_l \times N_s}$ is the perceived cross-covariance between the large-scale errors and unbiased small-scale errors, $P^{bb} \in \mathbb{R}^{N_s \times N_s}$ is the perceived covariance of the bias estimate errors and $P^{bd} \in \mathbb{R}^{N_s \times N_s}$ is the perceived cross-covariance between the bias estimate errors and unbiased small-scale errors. The perceived augmented innovation covariance $D$ is given in Table 3. This increases the uncertainty the filter attributes to the forecast innovation compared with the standard SKF. The additional uncertainty is a result of the errors accrued in the estimation of the bias. The term $H^s C^d (H^s)^T$ in the SKFbc innovation error covariance corresponds to the variability of the unbiased small-scale processes. The SKFbc equations are obtained from augmenting the large-scale terms with bias terms and defining the cross-covariance terms appropriately. The analysis state update for the SKFbc is then obtained by substituting the gain (7.6) and the forecast innovation (7.5) into (3.1). To obtain the analysis error covariance update we augment the gain (7.6) with $K^d = 0_{N_s \times p}$ and substitute into the short-form analysis error covariance update (3.2).
To mirror the SKF analysis error covariance update equations (3.12)-(3.14), we express the SKFbc analysis error covariance update in terms of the blocks $P^{ll,a}$, $P^{lb,a}$, $P^{bl,a}$ and $P^{bb,a}$. Since in the context of the SKFbc the complete model evolving all scales is assumed to be known, it is appropriate to update the bias term using this model (7.2). Thus, the forecast state update is given by equation (7.11). For the forecast error covariance we need the model for the unbiased small-scale processes. To determine this model we use the definition (7.3) together with the bias evolution equation (7.2). Note that the small-scale model error $\eta^s$ is assumed to be unbiased. To mirror the SKF forecast error covariance update equations (3.16)-(3.18), we express the SKFbc forecast error covariance updates in the same partitioned form, ending with equation (7.14). The prescribed unbiased small-scale error covariance $C^d$ is assumed constant in time and is not updated. The SKFbc allows us to correct biases due to unresolved scales and consider the effects of the unbiased small-scale processes on the large-scale state. A key advantage of this method is that the cross-correlations between the large-scale errors and small-scale errors are retained. However, the SKF is a computationally expensive procedure. This issue is exacerbated by the use of state augmentation for bias correction.

The reduced-state Kalman filter with observation bias correction
To save on the computational expense incurred by the SKFbc we can disregard the unbiased small-scale processes to obtain the reduced-state Kalman filter with bias correction (RKFbc). As before, we augment the large-scale state vector with a bias term, so that the estimated state is given by (7.1). The observation operator is also augmented and takes the form (7.16). As with the SKFbc, the forecast innovation (7.5) used in the analysis update (3.1) takes no account of unbiased small-scale error. Therefore, a properly specified observation error covariance for the RKFbc contains both instrument error and representation error. Similarly to the SKFbc, the Kalman gain for the RKFbc consists of a large-scale gain $K^l \in \mathbb{R}^{N_l \times p}$ and a bias estimate gain $K^b \in \mathbb{R}^{N_s \times p}$ given by (7.17), where the perceived innovation covariance $D \in \mathbb{R}^{p \times p}$ is given in Table 3. Thus, the analysis state update for the RKFbc is obtained through substitution of the gain matrix (7.17) and forecast innovation (7.5) into the linear filter analysis state update equation (3.1). Likewise, the analysis error covariance update equation is obtained through substitution of the gain matrix (7.17) into the short form analysis error covariance update (3.2) along with the observation operator (7.16).
For the RKF we assumed knowledge of the large-scale processes only. Hence, the model for the bias due to unresolved scales would be unknown and further assumptions would be required for the observation bias correction scheme. Nevertheless, to provide a direct comparison we will use the same model as the SKFbc (7.11) for the forecast state update. This model and a consistent model error covariance matrix (see Table 3) are used for the augmented analysis error covariance update (3.2).
Comparison of the OKF column in Table 1 and the RKFbc column in Table 3 shows the two filters have similar components as a result of the bias correction through state augmentation approach. The key difference between the two filters lies in the model error covariance expressions. The OKF accounts for the uncertainty in all scales and so uses the full model error covariance. The RKFbc accounts for the uncertainty in the large scales and the estimate of the bias. Since the model for the bias (7.2) has been assumed perfect, the RKFbc will only account for large-scale model error. We note that, as no knowledge of the small-scale processes is assumed for the RKFbc, the forecast model will differ in practice from that of the OKF as the RKFbc bias forecast model would come from additional assumptions placed on the bias.
The RKFbc is a computationally cheaper alternative to the SKFbc for online bias correction that takes no account of unbiased small-scale errors, except through the choice of observation error covariance.

True analysis error equations for bias correcting filters
The true analysis error equation for the SKFbc and RKFbc will differ from the case 2 true analysis error equation (4.4) due to the innovation (7.5). The change will only be in the large-scale part of the true analysis error equations as the small-scale state is not analysed by either filter. The large-scale true analysis error equation for the bias correcting filters is given by equation (7.18). The true large-scale analysis error covariance for the bias correcting filters, $\tilde{P}^{ll,a}$, is then given by equation (7.19). The difference between the true analysis error covariance for the non-bias correcting filters and (7.19) is that $\tilde{R}^H_k$ has been replaced with a covariance term involving $H^{s,t}_k x^{s,t}_k$, which corresponds to the uncertainty due to unresolved scales and the uncertainty in the estimate of the bias. Similarly to (4.5), equation (7.19) is still dependent on $x^{s,t}$ and so a different form may be more suitable.
Using the definitions of the small-scale observation operator error (2.9), the error due to unresolved scales (2.11), the small-scale forecast error (3.9) and the identity (4.8), we can rewrite (7.18) in the partitioned form (7.20), with components $e^{l,a}$ and $e^{s,a}$, which is analogous to (4.9). In order to use (7.20) to obtain the true error statistics, the correlations between $e^{s,f}_k$ and $H^s_k x^{s,f}_k - H^b_k x^{b,f}_k$ must be considered.

Gaussian random walk model
To investigate the performance of the SKFbc and the RKFbc we will apply them to the random walk model detailed in section 5.1. To introduce a bias into the observations we will set the contribution from the large-scale processes to the small-scale state, $M^{sl}$, to be nonzero in (5.1). As in [...] and $e^{s,f}_k$ as they will be small at the end of the assimilation window.

Initial conditions and filter parameters
The random walk model with $M^{sl} = 0.05$ is used to create a reference or truth trajectory for the large- and small-scale states. For our experiments we set $x^{l,t}_0 = 10$ and $x^{s,t}_0 = M^{sl} x^{l,t}_0 / (1 - \exp(-1/2))$. This choice for the small-scale truth is the limit of $x^{s,t}$ for the deterministic version of the random walk model (i.e. (5.1) with no model noise). Using these initial conditions, the model equivalent of the observations will be biased at each time-step. The initial prior large- and small-scale estimates are set as perturbations from the truth, where $a^l \sim N(0, P^{ll,f}_0)$ and $a^s \sim N(0, P^{ss,f}_0)$ and we take $P^{ll,f}_0 = 1$ and $P^{ss,f}_0 = 0.1$. Similarly, we set the initial prior bias estimate using a perturbation with variance $P^{ss,f}_0$. We take the initial cross-covariances between the forecast errors for the large-scale and bias state errors to be zero. The modelled unbiased small-scale error variance $C^d$ for the SKFbc will be varied for our experiments. We also take the unbiased small-scale errors to be initially uncorrelated with large-scale and bias estimate forecast errors. We set the large-scale model error variance as $Q^l = 1$ throughout our experiments while $Q^s$ will be varied. Unless otherwise specified, the instrument error variance will be set to $R^I = 0.1$. For the RKFbc, we set $R^H = 0$ so that the filter completely ignores the unbiased small-scale processes.
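The choice of initial small-scale truth can be verified with a short calculation. The sketch below assumes that the deterministic small-scale model has the form $x^s_k = M^{sl} x^l + e^{-1/2} x^s_{k-1}$ with the large-scale state held fixed, which is consistent with the stated limit but is an assumption, since (5.1) is not reproduced here. Variable names are hypothetical.

```python
import math

M_sl = 0.05
x_l = 10.0
M_s = math.exp(-0.5)                    # assumed small-scale decay factor

# Fixed point stated in the text: M_sl * x_l / (1 - exp(-1/2)), roughly 1.27
x_s_limit = M_sl * x_l / (1.0 - M_s)

# Iterate the assumed deterministic model from zero and compare
x_s = 0.0
for _ in range(60):
    x_s = M_sl * x_l + M_s * x_s
```

Starting the truth at this fixed point means the small-scale contribution to the observations has a non-zero mean from the first time-step, so the observations are biased relative to the large-scale state throughout the window.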

Comparison between bias correcting filters and non-bias correcting filters
We now consider the case of assimilating biased observations with standard and bias correcting filters. Figure 5 shows the analyses created by the SKF and SKFbc when assimilating biased observations for a single realization of the background, observation and model errors where $Q^s = 0.3$. As optimal modelled small-scale error variances have not been calculated for the random walk model with $M^{sl} \neq 0$, we set $C^d = C^s = 0.1$. These are suboptimal choices for both filters, which results in a small difference in the true analysis error variances between the SKFbc (SKF) and RKFbc (RKF). Panel (a) shows an almost constant offset between the solutions of the bias-correcting and non-bias correcting schemes. Furthermore, calculating the time average of the squared analysis errors we find the SKF error is over four times larger than the SKFbc error.
For this realization, the time average of the squared analysis errors for the RKFbc and the SKFbc are the same to two decimal places. However, the SKFbc does have a smaller true large-scale analysis error variance than the RKFbc and the difference increases as $C^d$ is more optimally chosen. The same is true for the RKF and SKF with modelled variance $C^s$. In Fig. 5b we see the bias value estimated by the SKFbc and the small-scale true model solution for a particular realization, which is dominated by noise. The bias state $x^{b,a}_k$ is intended to estimate the expected value of the small-scale state evolved with the filter forecast model such that it is unaffected by small-scale noise (see (7.3)); the expectation, denoted by angular brackets, is the mathematical expectation over the distribution of the small-scale model errors. Here, we have plotted $x^{s,t}_k$, which is dependent on the large-scale noise and small-scale noise (cf. (2.2)). From this panel we see that the bias estimate is consistent with the small-scale true model solution.
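The benefit of bias correction can be illustrated with a small Monte Carlo sketch. To keep the example short it compares the simpler RKF and RKFbc pair rather than the SKF and SKFbc, and it is not the experiment shown in Fig. 5. The model form ($M = 1$, $M^s = e^{-1/2}$), the ensemble size, the number of time-steps and the forecast-then-analysis ordering are assumptions made for illustration; $M^{sl} = 0.05$, $Q^s = 0.3$, $R^I = 0.1$, $P^{ll,f}_0 = 1$ and $P^{ss,f}_0 = 0.1$ follow the values quoted above. The function name is hypothetical.

```python
import numpy as np

def compare_bias_correction(n_real=200, n_steps=50, M_sl=0.05, Q_l=1.0,
                            Q_s=0.3, R_I=0.1, seed=0):
    """Time- and ensemble-averaged squared large-scale analysis errors."""
    rng = np.random.default_rng(seed)
    M_s = np.exp(-0.5)                         # assumed decay factor
    F = np.array([[1.0, 0.0], [M_sl, M_s]])    # augmented (x_l, x_b) model
    H = np.array([[1.0, 1.0]])
    Q_aug = np.diag([Q_l, 0.0])                # bias model assumed perfect
    # Truth, started at the deterministic fixed point for x_s
    x_l = np.full(n_real, 10.0)
    x_s = np.full(n_real, M_sl * 10.0 / (1.0 - M_s))
    # RKF (scalar state) and RKFbc (augmented state) priors
    xa = x_l + rng.normal(0.0, 1.0, n_real)
    P = 1.0
    z = np.vstack([xa, x_s + rng.normal(0.0, np.sqrt(0.1), n_real)])
    Pz = np.diag([1.0, 0.1])
    se_plain = se_bc = 0.0
    for _ in range(n_steps):
        # Advance the truth and create biased observations
        x_s = M_sl * x_l + M_s * x_s + rng.normal(0.0, np.sqrt(Q_s), n_real)
        x_l = x_l + rng.normal(0.0, np.sqrt(Q_l), n_real)
        y = x_l + x_s + rng.normal(0.0, np.sqrt(R_I), n_real)
        # RKF without bias correction (persistence forecast of x_l)
        P = P + Q_l
        K = P / (P + R_I)
        xa = xa + K * (y - xa)
        P = (1.0 - K) * P
        # RKFbc: state augmented with a bias term
        z = F @ z
        Pz = F @ Pz @ F.T + Q_aug
        Kz = Pz @ H.T / (H @ Pz @ H.T + R_I)
        z = z + Kz * (y - H @ z)
        Pz = (np.eye(2) - Kz @ H) @ Pz
        se_plain += np.mean((xa - x_l) ** 2)
        se_bc += np.mean((z[0] - x_l) ** 2)
    return se_plain / n_steps, se_bc / n_steps

mse_plain, mse_bc = compare_bias_correction()
```

In this sketch the filter without bias correction absorbs the mean of the small-scale state into its large-scale analysis, so its mean-square error is expected to be noticeably larger than that of the bias correcting filter.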
Additional experiments using persistence as the forecast model for the bias state with the SKFbc have been carried out. We find that the time average of the SKFbc squared analysis errors is approximately three times smaller than the time average of the SKF squared analysis errors without bias correction. Nevertheless, the mean-square analysis errors for the SKFbc with the persistence bias model are more than 50% larger than when using (7.2). Additionally, when using persistence as the forecast model for the bias state in the RKFbc we find the time average of the squared analysis errors is also approximately three times less than the SKF error. Hence, for this system it is more important to treat the bias due to unresolved scales than to compensate for the unbiased error due to unresolved scales.

Determining the optimal $C^d$ over the assimilation window
In this section we determine the optimal $C^d$ over the whole assimilation window.
For the SKF, we found that the choice of $C^s$ was key to the performance of the filter. We follow a similar procedure to section 6.1 to find the optimal values of the unbiased small-scale error covariance $C^d$. Our experiments have an assimilation window of 15 time-steps with an observation assimilated at each time-step. To find the optimal $C^d$ for given parameter values for $R^I$ and $Q^s$, we calculate the true large-scale analysis error variance of the SKFbc for $C^d$ ranging from 0 to 1 in steps of $\Delta C^d = 0.001$ and save the value that produces the smallest variance at the final time-step. Figure 6 shows the optimal $C^d$ for different values of $Q^s$ and $R^I$. The behaviour is qualitatively similar to $C^s$ with the SKF shown in Fig. 1 but numerical comparison is not meaningful as a different model is used. In particular, the size of $C^d$ is primarily determined by the magnitude of $Q^s$. However, we find that an increase in $R^I$ can also result in a larger $C^d$ being optimal. If $M^{sl}$ is increased, the optimal $C^d$ decreases as the uncertainty caused by the contribution from the large-scale processes to the small-scale state becomes more important.

Comparison of the bias correction filters
In this section we evaluate the performance of the SKFbc and RKFbc relative to the OKF and examine their perceived error variances.
We now compare the SKFbc and RKFbc with the OKF in terms of relative error percentage (6.1), plotted in Fig. 7. The SKFbc provides most improvement over the RKFbc for large $Q^s$ and small $R^I$. This behaviour is qualitatively similar to the comparison between the RKF and SKF with the OKF shown in Fig. 3. We have also examined the perceived and true analysis error variances for the RKFbc and SKFbc (not plotted). The results are qualitatively similar to those given in section 6.2 for the RKF and SKF. Indeed, for non-negligible representation uncertainty the SKFbc (RKFbc) is a conservative (overconfident) filtering strategy as the perceived-minus-truth difference is positive (negative).

Summary and conclusion
Observations of the atmospheric state may contain information on spatio-temporal scales that cannot be represented by a numerical model. The resulting error caused by this scale mismatch between the observations and numerical model is known as the error due to unresolved scales. Obtaining accurate analyses from the assimilation of these observations requires that the data assimilation algorithm correctly accounts for this error.
In this work we have considered the ability of linear filters to compensate for the error due to unresolved scales. We considered a finite dimensional true state which could be partitioned into a large-scale state resolved by a numerical model and a small-scale state unresolved by a numerical model. The representation error was defined in this framework and a bias due to unresolved scales was shown to occur when there is a contribution from the large-scale processes to the small-scale state.
For our experiments we considered three filters: the Schmidt-Kalman filter (Janjić and Cohn, 2006), which analyses the large scales but models the uncertainty on all scales; the optimal Kalman filter, which analyses all scales; and a reduced-state Kalman filter, which completely disregards the small-scale processes.
The three filters were tested numerically on a random walk model with one variable for the large-scale processes and one variable for the small-scale processes. The observations were taken to be the sum of the large- and small-scale states with added noise to simulate instrument error. To obtain the best performance from the Schmidt-Kalman filter we had to tune the modelled small-scale error covariance to compensate for the variability of the small-scale processes which grew over the first half of the assimilation window. The Schmidt-Kalman filter works best in regimes of high error due to unresolved scales and low instrument error provided a suitable approximate small-scale error covariance is used. Examination of the perceived error variances revealed the analysis uncertainty calculated by the Schmidt-Kalman filter is greater than the true analysis uncertainty when accounting for error due to unresolved scales.
The novel use of the Schmidt-Kalman filter with an observation bias correction scheme was introduced as a means to correct the bias due to unresolved scales. The Schmidt-Kalman filter with a bias correction scheme proved to be a suitable method to treat observation biases and compensate for the error due to unresolved scales. In our experiments we found it was more important to treat an observation bias than to compensate for an unbiased error due to unresolved scales.
An important caveat regarding these experiments is that we had complete knowledge of the small-scale processes. This allowed for minimal approximations to be made to implement the Schmidt-Kalman filter and to tune the modelled error variances. In an operational setting, where all the small-scale processes are likely to be unknown, further approximations would be required. Additionally, the Schmidt-Kalman filter is a computationally expensive method due to the augmentation and propagation of the state error covariances. This must also be addressed before the filter could be considered for large problems.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
Z. Bell was supported by a University of Reading PhD studentship. S. L. Dance and J. A. Waller were supported in part by UK EPSRC grant EP/P002331/1 (DARE). We would also like to thank two anonymous referees whose comments have helped improve the quality of this paper.