^{*}

^{a}

^{a}

The relevance of climatological background error statistics for mesoscale data assimilation has been investigated with regard to basic assumptions and also with regard to the ensemble generation techniques that are applied to derive the statistics. It is found that background error statistics derived by simulation through Ensemble Data Assimilation are more realistic than the corresponding statistics derived by downscaling from larger scale ensemble data. In case perturbation of observations is used to inject a spread into the ensemble, and the ensemble is integrated over a few hours only, it was found that the derived structure functions may be contaminated by the geometry of the observing network. The effects of the assumptions of stationarity, homogeneity and isotropy, that are generally applied in the generation of background error statistics, and the implications of the background error covariance model have also been illustrated. Spatial covariances derived under these assumptions were contrasted against spatial covariances obtained by ensemble averaging only, preserving the signals from forecast errors of the day. This indicates that it is likely to be favourable to apply data assimilation with ensemble background error statistics obtained from ensemble averaging, like in ensemble Kalman filters or in hybrids between variational and ensemble data assimilation techniques.

The number of observations for initialization of Numerical Weather Prediction (NWP) models is generally much smaller than the number of model state variables. For this reason,

It was early recognized that ideal BGE statistics needs to be flow dependent since forecast errors have a strong time variability. For most operational NWP systems, however, data assimilation schemes have been based on flow independent BGE statistics, derived by averaging in time. Besides this implicit assumption of stationarity, other simplifying assumptions like horizontal homogeneity and isotropy are also generally applied.

A model for the BGE statistics of limited area forecast models was formulated by Berre (

In this paper, we will investigate the application of the Berre (

Two sets of BGE statistics based on two different time series of short-range ensemble forecasts will be compared in the present study. One time series of ensemble forecasts is obtained by downscaling of a global ECMWF ensemble that was derived through an Ensemble of Data Assimilations (EDA) with different perturbations of observations for each ensemble member. The second time series of ensemble forecasts is a high resolution ensemble derived by EDA for the HARMONIE model. The characteristics of the two sets of ensembles and their suitability for BGE statistics will first be investigated in sections 2–3 and then the derived statistics will be described and compared in sections 4–5. The effects of the assumptions of stationarity, homogeneity and isotropy will be demonstrated and discussed in

The development of high resolution models with a grid separation of a few km has made it possible to study model spectra ranging from the global scales down to scales resolved by these models (Skamarock et al.,

In case a high-resolution model is initialized with data from a coarser resolution model or data assimilation (“down-scaling experiment”), the model integration will start with no energy in the smallest resolved scales, and during the initial model integration the energy content in these scales will adjust to the larger scale input energies such that the simulated model energy spectra will approach the theoretically predicted spectra. This initial spectral adjustment or spectral spinup process may last for as long as 6–24 hours (Skamarock et al.,

The HARMONIE forecasting model is a limited area spectral model (Bubnova et al.,

We will thus calculate the second order structure function

The structure function

The HARMONIE model was applied for the month of August 2011 at a horizontal resolution of 2.5 km over the domain illustrated in

Model domain for the HARMONIE ensemble experiments (739 × 949 horizontal grid points with a 2.5 km grid resolution).

The first set of HARMONIE ensemble forecasts was generated by a direct downscaling of the ECMWF EDA ensemble forecasts at 06 UTC and 18 UTC. The HARMONIE forecasts were integrated up to +12 h. The second set of HARMONIE ensemble forecasts were generated from initial data obtained by a 3 D-Var data assimilation scheme (Fischer et al.,

As a diagnosis of the spectral characteristics of the two sets of HARMONIE ensemble forecasts, we will show structure functions for the longitudinal component of the wind in the east-west direction calculated in accordance with

Structure functions for the longitudinal component of the wind in the east-weast direction. Horizontal domain average at model level 47 (

We may first notice that the downscaling structure function curves are associated with a strong spinup during the initial 6–12 h of the forecast. This is mainly because the HARMONIE forecast model, in the downscaling experiment, was initialized with ECMWF coarse resolution data. A similar spinup effect was reported by Skamarock et al. (

Structure functions for the longitudinal component of the wind in the east-west direction. Horizontal domain average at model level 47 (

Secondly, we can notice in

Comparing the two sets of structure functions for downscaling and EDA, we may notice that EDA curves have slightly larger values for a mesoscale range of grid separations 50–200 km, see

The model for the BGE statistics in the HARMONIE data assimilation was formulated by Berre (

The assumption that spectral components for different wave numbers are statistically independent has the convenient consequence that horizontal covariances can simply be expressed as the variances for different wave components. If we furthermore assume horizontally isotropy with respect to the covariances, the horizontal covariances can be represented by 1-dimensional variance density spectra for the control variables. We will study such variance density spectra below.

The balance operator (or statistical de-correlation operator) _{s}_{b}_{u}

In summary, the model for the background error statistics include: (1) The balance operators H, M, N, P, Q, R, and S; (2) Horizontal variance density spectra for the control variables (

HARMONIE BGE statistics have been derived for a Nordic domain (

Here we will discuss the estimation of BGE statistics for 06 UTC initial data only, forecasts were run up to +12 h for the whole month of August 2011 and a 6 h data assimilation cycle was applied in the case of the HARMONIE EDA run. Both the downscaling forecast and the HARMONIE EDA run included 6 ensemble members. The following ensemble member model state differences were taken as proxies for background forecast errors in the estimation of the BGE statistics: member 1 - member 2, member 2 - member 3, member 3 - member 4, member 4 - member 5, member 5 - member 6 and member 6 - member 1. This combination of difference fields is linearly dependent, but it is kept in order to apply the same efficient weight for each ensemble member. The only effect of the linear dependency will be a scaling of the resulting statistics (a factor 2.0 for variances).

^{2}, where k* is the one-dimensional wave number. This makes the amplitude of the spectral densities proportional to rotational kinetic energy.

Horizontal spectral variance densities of +1 h, +3 h, +6 h and +12 h vorticity background error as estimated by the ensemble downscaling (top) and the Ensemble Data Assimilation (EDA, middle) techniques. Model level 35. Spectral variance densities of +12 h vorticity background error as estimated by the downscaling and EDA techniques (bottom). The spectral variance densities have been normalized with k*^{2}, where k* is the one-dimensional wave number.

With regard to the variations in the EDA spectra depending on background error forecast length, influences from at least two processes can be identified in

Horizontal spectra for unbalanced divergence and unbalanced temperature (not shown), unbalanced surface pressure (

Spectral variance densities of +1 h, +3 h, +6 h and +12 h unbalanced surface pressure background error as estimated by the ensemble downscaling (top) and the Ensemble Data Assimilation (EDA, middle) techniques. Model level 35. Spectral variance densities of +12 h unbalanced surface pressure background error as estimated by the downscaling and EDA techniques (bottom).

Spectral variance densities of +1 h, +3 h, +6 h and +12 h unbalanced humidity background error as estimated by the ensemble downscaling (top) and the Ensemble Data Assimilation (EDA, middle) techniques. Model level 35. Spectral variance densities of +12 h unbalanced humidity background error as estimated by the downscaling and EDA techniques (bottom).

From the unbalanced humidity spectral variance densities (

The spectral variance densities for vorticity and unbalanced divergence at model level 35 are compared in

Spectral variance densities of +12 h vorticity and unbalanced divergence background errors as estimated by the ensemble downscaling and the Ensemble Data Assimilation (EDA) techniques. Model level 35. The spectral variance densities have been normalized with k*^{2}, where k* is the one-dimensional wave number.

Background error variances for the different variables can be obtained by summing the spectral variance densities over horizontal wave numbers. This can be carried out for the unbalanced component as well as for the different balanced components of the different variables.

Background error standard deviations for vorticity, +3 h and +12 h, downscaling and EDA.

Standard deviations for various components of the temperature background error are shown in

Background error standard deviations for temperature, +3 h and +12 h, EDA; The component balanced with vorticity (via linearized geopotential _{b}

Balance relations with the temperature involved are strongly dependent on horizontal scale. This can be seen in

Percentages of temperature background error variances explained by vorticity and unbalanced divergence as a function of horizontal wave number. EDA +3 h and EDA +12 h.

The horizontal wave length dependencies of the balance relations with the (logarithm of) surface pressure involved are illustrated in

Percentages of surface pressure background error variances explained by vorticity and unbalanced divergence as a function of horizontal wave number. EDA +3 h and EDA +12 h.

A second likely model deficiency is also indicated in

This problem of small-scale noise raises many questions:

- The HARMONIE AROME model uses a linear grid (the shortest wave in the spectral model representation is 2 grid lengths). This was introduced together with the semi-Lagrangian advection, since aliasing due to the quadratic Eulerian advection no longer is present. But the model, and in particular the physics, include many other non-linearities of even higher order than the quadratic advection. The small-scale noise that we observe in

- It seems over-optimistic to explicitly describe the effects of convective cells on the kilometre-scale with a model using a 2.5 km grid resolution. Taking our consideration that the model can accurately describe phenomena at a horizontal scale of 5–6 grid lengths, a model grid resolution of the order of 500 m would be needed.

Balance relations with moisture involved are illustrated in

Percentages of humidity background error variances explained by vorticity, unbalanced divergence and unbalanced temperature and surface pressure as a function of horizontal wave number. Downscaling +3 h, +12 h (top), EDA +3 h, +12 h (bottom).

The model for the HARMONIE BGE statistics is based on the three assumptions of stationarity, homogeneity and isotropy with respect to horizontal covariances. In this section, we will investigate the effects of these assumptions. We will take temperature at a few model levels as examples, and we will use the HARMONIE EDA ensemble data from 2–31 August 2011, 06 UTC + 12 h only, for a demonstration.

Orography (upper left). Horizontal correlations for temperature at model level 35 (

Although the smooth statistics obtained through the assumptions of homogeneity and isotropy may be attractive from a computational stability point of view, one may ask how representative the derived stationary, homogeneous and isotropic horizontal correlations are for real forecast error correlations with a strong case-to-case variability. To illustrate this, we have selected a single case, 12 August 2011 06 UTC + 12 h, and we show in

Temperature at model level 35 (

The effects of the assumption of horizontal homogeneity are very strong in this particular case, compare

For atmospheric variables close to surface of the earth, one would expect horizontal correlations that are non-isotropic and inhomogeneous, depending on orography and on land/sea differences, for example. This has been demonstrated with the use of observational data for standalone mesoscale analysis (Häggmark et al.,

Orography of a sub-domain over the Western Baltic Sea including the east coast of mainland Sweden and the Islands of Öland and Gotland. Horizontal correlations for temperature at model level 62 (a few hundred meters above the ground) with a reference point over central Gotland. Time averaged correlations for August 2011 (upper right) and averages over ensemble members only for 12 August 2011 06UTC +12 h (lower left) and 21 August 2011 06UTC +12 h (lower right).

For individual cases, the horizontal correlations in

We will now evaluate the relevance of the background error covariance model through its impact on the forecast quality. To measure the forecast quality in a statistical sense, we have conducted a series of one-month HARMONIE 3 D-Var experiments for the period from the 13th of June 2016 until the 13th of July 2016. We use standard verification scores, Bias and Root Mean Square Error (RMSE), averaged over the last 24 days. All experiments run in a data assimilation cycling mode with a 3 h update frequency. The domain is centred over Denmark, it covers an area of 1620 × 1620 km^{2} and employs a linear grid with a 2.5 km grid resolution.

In the first series of experiments, we use identical climatological structure functions but we restrict analysis increments to a coarse horizontal resolution. Such sensitivity experiments are easy to carry out with the HARMONIE data assimilation, since the assimilation control variables are formulated in spectral space. For a grid-point representation of horizontal covariances through horizontal diffusion operators, this would, for example, require a reformulation of these diffusion operators. Thus, the observations filtered through the background error covariance influence the initial model state for a specified range of scales only. At the same time, the impact from observations is redistributed to the restricted range of spatial scales. The experiments are summarized in

Summary of the conducted experiments and their abbreviations. _{s}

_{100}

_{50}

_{10}

_{s}

The domain geometry with a linear grid allows 324 one-dimensional isotropic waves in the horizontal. An elliptic truncation is used to relate a one-dimensional isotropic wave to 2 D harmonics. For example, in _{50}_{50}

_{100}_{50}_{10}

Temperature analysis increment at model level 47 obtained from the original (upper left) and from the restricted data assimilation schemes: _{100}_{50}_{10}

Three weeks average of Bias and RMSE scores for surface pressure forecasts from _{100}_{50}_{10}

_{100}

_{50}

_{10}

The results are even more pronounced for other model variables than surface pressure with smaller scale spatial variability. _{10}_{10}

The analysis of the specific humidity at model level 25 obtained from original (left plot) and from the restricted _{10}

In _{100}_{50}_{10}

Time and domain averaged Bias and RMSE scores of 12 h accumulated precipitation for _{100}_{50}_{10}

Small scale structures of the analysis increment of course help to get a closer fit of the initial state to observations. _{100}_{50}_{50}_{,} with the analysis increment filtered to 16.2 km, provides a clearly superior +24 h forecasts. Experiment _{10}_{,} with analysis increments filtered to 81 km, has lost information relevant for moisture, and has the worst scores.

Time and domain averaged Bias and RMSE scores of RH at 850 hPa level for _{100}_{50}_{10}

Wind field verification scores (valid at 00 UTC): Bias and RMSE scores of wind speed; _{100}_{50}_{10}

To get a better understanding of what impact the background error balance model (_{100}_{,} with the analysis increment filtered to 8.1 km) and we progressively switch off different balance operators in the background error covariance model. _{100}

Summary of the conducted statistical balance experiments and their abbreviations. “+” sign denotes what balance operators are included in the background error covariance model for each of experiments. See

_{100}

Time and domain averaged Bias and RMSE scores of wind speed at the 850 hPa level for the _{100}

It is interesting to see the impact of the background error covariance model on the development of the meso-scale and synoptic scale systems. _{100}

Analysis of specific humidity on the model level 25 obtained from _{100}

From investigations performed in

_{b}_{b}_{b}

Percentage of surface pressure variance explained by vorticity and unbalanced divergence derived from the

Probably a long-time integration (see

The results presented in this paper are primarily relevant for the HARMONIE forecasting system, although some conclusions might be quite general. The presented results are furthermore obtained from experiments that are subject to certain simplifications, assumptions and known deficiencies that might influence the results:

We have noted a rather week growth of the HARMONIE ensemble spread between +3h (or even +1h) and +12h of the model integration. It may be speculated whether this problem is associated with the lateral boundary conditions (LBCs), that are of strong importance over the small HARMONIE model domain. The LBCs are taken from 3-12h forecasts of the ECMWF 4D-Var based EDA runs at 06 UTC and 18 UTC. These 3-12h forecasts are efficiently within the 12h 4D-Var assimilation windows, and could be considered as analyses influenced by observations. This may explain the weak growth of the ensemble spread over the small HARMONIE domain.

The horizontal 2-dimensional Fourier transforms used in the formulation of the background error covariance model require an extension zone in both horizontal directions when they are applied over the limited area model domain in order to obtain a bi-periodicity. In the current version the extension zone is eleven grid distances wide (27.5 km). Such a narrow extension zone may lead to strong artificial gradients close to the boundaries. We cannot exclude that a wrap-around of information in the analysis increment, due to the periodicity assumption, and artificial gradients may impact the results. The sensitivity of the forecast quality to the width of the extension zone is a subject of on-going work.

We have noticed that HARMONIE forecasts are contaminated by a high-frequency and small scale (^{– 1}.

There are several other pragmatic decisions that have been taken in order to make the work affordable. For example, only conventional observations are assimilated in the experiments for which we have presented verification scores in

In the experiments described in _{100}_{100}

The generation of structure functions in

The generation of ensemble variability through observation perturbations in the Ensemble Data Assimilation (EDA) approach and the direct generation of model space perturbations in the BRAND (B-matrix RANDom) approach are both dependent on the applied BGE statistics (the B matrix). Longer time integrations may help to establish ensemble perturbations more representative of “real” forecast errors (as in the NMC method). An iterative approach to determine new BGE statistics could also be investigated, although convergence is uncertain.

The relevance of climatological BGE statistics for mesoscale data assimilation has been investigated. We have found limitations in the usefulness of such climatological BGE statistics, both with regard to basic covariance modelling assumptions and also with regard to the simulation techniques used to estimate parameters of BGE covariance models. From the results presented and discussed in this study, we can conclude that the BGE statistics derived by simulation through Ensemble Data Assimilation are more realistic than the corresponding statistics derived by downscaling of ECMWF EDA data, since the downscaling data are associated with a model spinup, that is not completed even after 12 hours of model integration.

We have also seen that structure functions derived from an ensemble of model state differences are very sensitive to the way the ensemble is generated. A few hours of model integration with a high-resolution model are not enough to spread the variability injected on a particular scale to the entire model space. For example, when a perturbation of observations is used to inject a spread into the ensemble, the structure functions are contaminated by the geometry of the observing network. In case of spatially incomplete observing networks this might project arbitrary background error structures into the ensemble instead of sampling forecast errors developed by the model.

Furthermore, we have illustrated that the generation of BGE statistics, in particular statistical balance operators, with model simulation techniques is very sensitive to small scale noise generated during the model simulations. For this reason, one needs to restrict the use of such BGE statistics to spatial scales that are resolved properly by the model, i.e. scales larger than 3-5 Δs, with Δs being the model grid resolution.

Finally, we have illustrated the effects of the basic assumptions of stationarity, homogeneity and isotropy that are generally applied in the generation of BGE statistics. Spatial covariances derived under these assumptions were contrasted against simplified examples of spatial covariances obtained by ensemble averaging only, preserving the signals from forecast errors of the day. This indicates that it is likely to be favourable to apply data assimilation with ensemble background error statistics obtained from ensemble averaging, like in ensemble Kalman filters or in hybrids between variational and ensemble data assimilation techniques. At the same time, one should be aware that the quality and the affordable number of members of the ensemble systems for operational applications, in particular for short forecast ranges, is premature and not sufficient, despite a number of different proposed measures (for example lagging, inflation and localisation). Climatological background error statistics will furthermore continue to play an important role in the operational data assimilation in order to assure enough search directions during the minimization process (Gustafsson and Bojarova,

Many thanks to Magnus Lindskog and Martin Ridal for help with generation and access to ensemble information and many thanks to Nedjeljka Zagar for inspiration to start this study.

Abdelwaheb Hannachi was the responsible Subject Editor for the handling of this paper.