In recent years, it was demonstrated that lakes affect local weather conditions and regional climate (Krinner, 2003; Eerola et al., 2010; Samuelsson et al., 2010). With the increasing horizontal resolution of numerical weather prediction (NWP) and climate models to the kilometre scale, more and more attention is being paid to the interaction between the atmosphere and lakes. Over lakes, turbulent and radiative fluxes are dependent on the water surface temperature and the presence of ice (Mironov, 2008; Rontu et al., 2012). Their influence is essential for countries with a high percentage of lakes, such as Finland, Sweden, Canada, Norway and Russia.
To account for the effects of lakes in atmospheric models, physically sound and computationally cheap lake models (parameterisation schemes) are used. The most commonly used lake parameterisation scheme is the lake model Freshwater Lake (FLake) (Mironov, 2008), with a two-layer parametric representation of the water temperature profile. External parameters for the lake scheme are provided by the Global Lake Database (Kourzeneva et al., 2012a; Choulga et al., 2014). For NWP initialisation at the very first simulation, lake climatology is developed (Batrak, 2012; Kourzeneva et al., 2012b). FLake is widely used in NWP and climate modelling for research (Salgado and Le Moigne, 2010; Samuelsson et al., 2010; Balsamo et al., 2012; Martynov et al., 2012). It is also included in operational NWP model runs in some National Weather Service centres (Mironov et al., 2012; Rontu et al., 2012). However, it has not become a standard operational practice in NWP. There are several reasons for this, including too simplified data assimilation (DA) methods (borrowed from ocean surface state representation). Through initialisation of the prognostic lake model variables, DA techniques should provide the analysis with corrected errors which come from an unknown initial state. Currently used DA techniques do not fulfil this task, although model errors for the lake surface state may be quite large in NWP (Batrak, 2012; Kourzeneva et al., 2012b).
In-situ or remote-sensing observations of different water quantities are a significant source of information about the state of lakes. Observations of the lake water surface temperature (LWST) and ice cover are mostly important for NWP and climate modelling since these values control energy and moisture exchanges between the atmosphere and lake surface [see, e.g., Rontu et al. (2012)]. Space-borne observations contain information about LWST and ice cover. They cover large territories and have spatial resolutions from kilometres to tens of metres and temporal resolutions of several hours. However, they suffer from coverage gaps and different errors, for example, from cloud contamination or skin effects (Kheyrollah Pour et al., 2014a). In-situ data may contain measurements of different physical characteristics of water and ice, for example, the ice depth and the temperature profiles in water, ice and snow. They contain fewer errors compared with satellite observations. But they are very sparse and produced mainly for research purposes. On an operational basis, the LWST of only about 30 lakes in Finland is measured in-situ by the Finnish Environmental Institute (SYKE) [see Kheyrollah Pour et al. (2014b), Rontu et al. (2012) and Eerola et al. (2010) for details].
To combine optimally information from a model with observations, DA methods may be used. Experiments performed in Kheyrollah Pour et al. (2014b), Rontu et al. (2012) and Eerola et al. (2010), where different LWST and ice cover observations were treated by the NWP model in quite a simplified way, showed that assimilation of LWST observations may improve the description of the lake surface state and the quality of a weather forecast. For example, in a winter case study over Lake Ladoga, a better description of ice cover due to the assimilation of remote-sensing data reduced the screen level temperature forecast errors by up to 5°C for several selected stations in Finland (Kalle Eerola, personal communications 2014). DA systems are widely used in meteorology (Kalnay, 2003), in land surface modelling (Houser et al., 2010) and in oceanography (Haines, 2010). In hydrology or limnology, there are some studies devoted to DA for inland water bodies (Zhang et al., 2007; Stroud et al., 2009). However, the focus of their research is confined to specific lakes. They concentrate mainly on circulation characteristics in one large lake, using 3D lake models and DA techniques used for oceanography. Unfortunately, 3D lake models cannot be used to parameterise all lakes in an atmospheric model domain which differ in size, form and depth. Hence, oceanographic techniques are not applicable for lakes in NWP. Although lakes belong to the hydrosphere and lake modelling is close to oceanography, the atmosphere ‘sees’ the lake surface as part of a very heterogeneous land surface. Since simplified 1D or even bulk lake models are used for parameterisation purposes in NWP, methods used to assimilate LWST observations should be closer to those applied in the 1D aspect of land surface DA (Mahfouf, 1991; Rhodin et al., 1999; Hess, 2001; Mahfouf et al., 2009; de Rosnay et al., 2013).
Thus, assimilation of lake observations includes two aspects. The first aspect is to spread information in the horizontal direction, that is, to interpolate data from observational points or from the satellite image grid to the atmospheric model grid. This task is similar to the analysis of the screen level temperature, relative humidity and snow depth in land surface analysis, or to the 2D analysis of the sea surface temperature (SST) [see, e.g., Donlon et al. (2012), Homleid (2009), Brasnett (1999), Navascues (1997) and Gustafsson (1985)]. The main method used in this kind of analysis is optimal interpolation (OI) (Gandin, 1965). However, in contrast with sea water surface properties, lake water surface properties are very heterogeneous. Interpolation of LWST between two neighbouring lakes differing in size, depth and elevation is rather questionable. Perhaps the anisotropic surface conditions may be accounted for by using structure functions dependent not only on the separation in the horizontal (as for SST) but also on differences in depth and elevation. In practice this aspect also includes data quality control, processing of very dense satellite observations and questions of consistency between land-water masks. These problems are discussed in detail in Kheyrollah Pour et al. (2014b).
The second aspect is to propagate information about the lake surface state in the vertical direction, in the lake model space. In other words, to redistribute innovations (departures between observed and simulated values) of LWST in the lake water vertical profile (1D aspect). Mathematically, this task is analogous to the analysis of the soil moisture [see, e.g., Mahfouf et al. (2009) and Hess (2001)], and the same algorithms can be applied. In the case of soil moisture analysis, several methods of different complexity exist, starting from OI (Mahfouf, 1991) to variational methods and extended Kalman filtering (Rhodin et al., 1999; Hess, 2001; Mahfouf et al., 2009; de Rosnay et al., 2013). Nudging, which is also applicable, and OI are computationally cheap. Yet, their coefficients are dependent on an observational dataset to derive error statistics and on partition of the dataset according to physical situations. Also it is difficult to account for new observations. Variational methods and different versions of the extended Kalman filter (EKF) are more flexible for different physical situations and can accept new observations more readily. In Kalman filtering algorithms, the background error covariance matrix is prognostic, dependent on the physical situation, and no preliminary error statistics study is needed. However, these methods require greater computing resources. Some of these need additional simulations to compute Jacobians in finite differences. Balsamo et al. (2007) showed that the high computational cost may be reduced by running a surface model offline, using saved forcings from the atmospheric model. In this case, variables from the lowest atmospheric model level are used to force additional runs (with the perturbed initial state) of the surface model. This approach is used, for example, to assimilate satellite measurements of the soil moisture (Draper et al., 2009). For lakes, this approach also could be used. The most common lake parameterisation scheme FLake contains a maximum of 12 prognostic variables, depending on the lake regime. Therefore the control vector size is small, and the EKF technique seems a natural one to apply.
The purpose of this study is to apply the EKF approach to assimilate LWST observations into the FLake lake model. For these first experiments, the offline version of the lake model was used, with forcing from the High Resolution Limited Area Model (HIRLAM) (Undén et al., 2002). SYKE in-situ observations were assimilated, for some lakes with added observations from the Moderate Resolution Imaging Spectrometer (MODIS) on board of Terra and Aqua satellites. In Section 2, the methodology of applying EKF in FLake (mathematical formulation) is described. Section 3 is devoted to lake observations. In Section 4, numerical experiments are presented and discussed, focusing on the role of observations in different seasons. Conclusions complete the paper.
We assimilated observations into the lake model FLake (Mironov, 2008). This model is used to represent lake processes in several regional and global NWP models for both operational and research applications (Salgado and Le Moigne, 2010; Balsamo et al., 2012; Rontu et al., 2012 Mironov et al., 2012). It is a two-layer integral (bulk) model with the temperature profile in the upper mixed layer and in the underlying thermocline parameterised with the concept of self-similarity (assumed shape). It also contains snow-ice and bottom sediment modules using the same concept to describe their temperature profiles. The mixed layer depth is predicted through an equation of convective entrainment and a relaxation-type equation during the wind mixing. For the solar radiation transfer, the decay law is approximated exponentially. Using atmospheric forcings, turbulent fluxes in the atmospheric surface layer may also be calculated by the model. This possibility is useful for offline experiments.
To accurately pose the DA task for lakes (and for FLake in particular), it is important to consider different model regimes and to study the model behaviour for each of them. Firstly, we distinguish the ice period and the open water period. Secondly, in the open water period, we distinguish the mixed and non-mixed (stratified) lake regimes. Thirdly, in the non-mixed period the convective and wind-mixing regimes may be discriminated.
In this study, DA is used to analyse the prognostic variables describing the liquid water reservoir; ice and snow variables are not analysed. Bottom sediments are not considered. The analysed variables are the mean water temperature (), the bottom temperature (Tb), the mixed layer depth (h) and the shape factor (CT). Prognostic equations for the mixed and stratified regimes are described in Mironov (2008). The mixed layer temperature (TML) in FLake is calculated from a diagnostic equation (see Section 2.2).
The mean water temperature and the lake bottom temperature Tb are slowly evolving variables. Depending on the depth of a lake and the season of a year, varies on a time scale from several days to several weeks. For deep lakes, Tb changes gradually and shows only some annual variability. For shallow polymictic lakes (so shallow that their waters may mix from top to bottom several times throughout the open water period), oscillations of Tb during mixing periods follow oscillations of and have a variability of several days. The variability of h strongly depends on the season. In summer, h usually shows significant daily oscillations. In early spring and autumn, during the mixing period, it is equal to the lake depth and remains constant. The integral of the polynomially approximated temperature profile in the thermocline CT has a rather poor physical definition. During the stratified period, it usually changes gradually, with a time scale of several days. For the mixed regime, CT does not exist mathematically, because there is no thermocline. In practice, in the model code, during the mixed regime CT is kept constant and equals to a bogus value of 0.5.
Therefore, considering physical reasons and mathematical properties of the model, all lake water prognostic variables are controlled in the DA procedure. We considered separately (1) the ice and open water periods and (2) the mixed and stratified regimes for the open water period. Regarding daily oscillations of h during the stratified regime and the related model non-linearity, the time scale of these processes is equal to or shorter than the typical assimilation cycle period (1 d). Preliminary studies with perturbations of the initial state and analysis of the time evolution of different model variables showed less model sensitivity on a time scale of 1 d comparing to a time scale of several hours due to the smoothed diurnal oscillations of h. Therefore we did not distinguish the convective and wind-mixing regimes, and assumed that this non-linearity is acceptable for the assimilation procedure.
By means of sequential DA, the control variables in the lake model should be initialised using information only from the LWST observations. In other words, innovations (departures between the observed and simulated LWST) should be redistributed among control lake variables. We pose the task differently for different lake model regimes. For the ice period, when there are no observations, the DA procedure is switched off. The assimilation cycling is stopped explicitly, because the gap between observations is too large (the entire winter), making it impossible to cycle the background error covariance matrix for this period. It is unknown beforehand, when observations will appear again after the ice break-up date, at mixed or stratified model regime. In an operational implementation, the question of skipping the analysis during the ice period should be specially addressed.
For the open water period and stratified regime, four prognostic variables are included into the state vector X:
Here and Tb are the mean water and bottom temperatures (K), CT is the shape factor, is the dimensionless mixed layer depth and h and D (both in m) are the mixed layer depth and the lake depth, respectively. For the mixed regime in FLake there is only one prognostic variable . The mixed layer depth h in this case is equal to the lake depth D, Tb is equal to and CT does not exist. Therefore, the state vector consists of only one variable:
We look for the analysis vector XA, using the previous forecast values for the background vector XB. Since only LWST is observed, and in open water conditions TML is equal to LWST, there is only one value of the observed mixed layer temperature in the observation vector Y:
In the case of the stratified regime the diagnostic equation for TML from FLake may be used for the observation operator H(X):
For the mixed regime H=1.
Then, the EKF procedure may be applied in the form described in Le Moigne et al. (2009):
Here M(X) is the non-linear model operator, M is the linearised model operator matrix, B is the background error covariance matrix, R is the observational error covariance matrix, Q is the model error covariance matrix, A is the analysis error covariance matrix and K is the Kalman gain vector. In our case, the observational error covariance matrix R is reduced to the scalar value of the observational error variance. For both the mixed and stratified regimes, the linearised model operator matrix M is obtained numerically by calculating Jacobians:
Here are elements of the state vector%3B the lower index denotes the number of an element and the upper index denotes the time, from one sequential analysis cycle to another. For the stratified regime M is given as:
In the mixed regime M is reduced to:
To calculate Jacobians numerically, the state vector should be perturbed by values small enough for accurate approximation but not too small that round errors occur (Mahfouf et al., 2009). Our choice of perturbations resulted from a preliminary study with both small and large, negative and positive perturbations. We considered only the overall performance of EKF in terms of LWST evolution. For a better choice of perturbations, careful examination is needed, as was done in Balsamo et al. (2004). In our study, for the stratified regime the following perturbation values of the state vector were used:
For the mixed regime, the state vector which has only one component, was perturbed by:
The size of matrix B changes when the regime changes. In these cases we stop the EKF cycling and reinitialise the B matrix. Explicit reinitialisation helps to avoid problems caused by non-linearity and possible EKF divergence (Jacobians and variances in the matrix B which are too large) when jumping between regimes. There is one more important constraint in the application of EKF in the FLake model. In the model initialisation, ‘inverse’ water density stratification is impossible. In terms of temperature, this means that in the stratified regime (1) the mixed layer temperature and the bottom temperature should both be higher or lower than the temperature of the maximum water density (277.13 K) and (2) if both of these are higher, the bottom temperature should be lower than the mixed layer temperature, and vice versa if both are lower. The crossover situation must not result from analysis. If it happens, the analysis is skipped and matrix B is reset.
In general, it is not trivial to specify the model error covariance matrix Q, which contains model errors other than errors resulting from the initial state. In the case of offline simulations, when the lake model is driven by atmospheric forcing from a NWP model or from observations, it also contains errors from the forcing data. In our study, Q is prescribed in ad hoc manner. Variances were chosen as typical model errors squared. Typical model errors were derived from different studies, where the performance of FLake was assessed based on deep water temperature observations (Kirillin, 2010%3B Golosov et al., 2012). All correlations were prescribed to be 0.5, with signs determined using some modelling experience. For example, positive errors in the mean water temperature often lead to temperature profiles which are too sharp with mainly negative errors in the bottom temperature. Therefore, correlation between the appropriate errors is negative. Covariances are calculated accordingly. For better definitions of these values, more studies based on deep water temperature observations on lakes with different depths and mixing types are needed. For the mixed regime, Q is defined as
In our experiments, matrix B evolved from matrix Q in the first cycle, and was reset back to B=Q each time at reinitialisation. The sensitivity of the analysis system performance to Q values was studied only in terms of LWST and appeared to be small (0.1°C order of magnitude).
A good overview of lake observations, in-situ and remote-sensing, is given in Kheyrollah Pour et al. (2014a, 2014b). Both types of observations have advantages and disadvantages. Space-borne observations not only contain information about LWST and ice cover, but also about other lake water characteristics, for example, water quality parameters (Potes et al., 2011). LWST and ice cover are measured by different sensors on various satellites. In Kheyrollah Pour et al. (2014b), the MODIS data are mainly used. Different satellite observations for 263 large lakes from the ARCLake project (http://www.geos.ed.ac.uk/arclake/) (MacCallum and Merchant, 2012) were compared with operational LWST analyses in the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) product by Fiedler et al. (2013). However, remote-sensing data may contain gaps due to cloudiness. Errors due to the fractional ice cover, undetected clouds and inaccuracies in the land-water mask may be significant. Also, the remotely measured water skin temperature may differ from the bulk temperature, which is predicted by models and measured in-situ [see, e.g., Donlon et al., (2002)].
In-situ measurements may not only contain information about lake surface characteristics but also information about the vertical temperature distribution, ice depth and other properties [see, e.g., Arst et al. (2008) and Jonas et al. (2003)]. However, in-situ data are produced only for selected lakes and are used mainly for research purposes, with the exception of operational SYKE measurements. For operational in-situ measurements, the time period is problematic. Usually instruments are installed in early spring when the ice has already disappeared. This happens approximately 1 week after the ice break-up and means that these measurements alone do not provide the exact ice break-up date. Therefore, extra observations are needed. At present, both types of lake observations are widely used for model verification (Duguay et al., 2003; Kheyrollah Pour et al., 2012). In the NWP context, lake state observations are used in a very simplified way (Eerola et al., 2010; Rontu et al., 2012; Kheyrollah Pour et al., 2014b).
In our study, two sets of LWST observations were assimilated into the lake model. First, we assimilated in-situ SYKE observations for 27 lakes in Finland (see Table 1 for the lake locations and depth). SYKE observations do not contain direct information about ice cover, and during the ice period LWST is not measured. Therefore, the ‘no data’ flag indicates that LWST is not observed either because of ice cover or for other reasons. In practice, the majority of cases with the ‘no data’ flag correspond to the ice cover situation, and these periods last for several months. In our experiments, for cases with ‘no data’ flag and during the ice period, the assimilation cycling was interrupted and the model evolved without any analysis update (relying on the model state). When LWST measurements are again available, the background error matrix was reinitialised and the assimilation cycling was resumed from the current state vector.
Secondly, for four selected lakes in Finland, in-situ measurements from SYKE were combined with MODIS UW-L3 data prepared by the University of Waterloo (Kheyrollah Pour et al., 2014a) for early spring, and observations from SYKE about ice break-up dates [see Kheyrollah Pour et al. (2014b) for details]. The four lakes differ in depth, size and location: (1) deep northern Lake Inarijärvi, (2) shallow southern Lake Tuusulanjärvi, (3) large Lake Saimaa and (4) medium Lake Lappajärvi (see Table 1 for details). Time series of the LWST measured by SYKE and observed by MODIS were merged manually. MODIS data were taken for the pixels nearest to the SYKE locations and added to the time series starting from the ice break-up date until the first spring SYKE measurements date. For the period prior to the ice break-up date, the ‘no data’ flag was set to LWST, as in the original SYKE measurements. Hereafter, we refer to this type of data as ‘the merged data set’. The main aim of the experiments with the merged data set was to study the role of observations during early spring. Cross-validations were also performed with these data.
LWST is measured daily at 08.00 Eastern European Time (EET) by SYKE. One MODIS observation per day was added to the merged data set. In our experiments, all MODIS data were assigned the time 08.00 EET (note that there is almost no diurnal cycle of LWST in the early spring). Gaps in MODIS observations were filled manually with time-interpolated values. Assuming in-situ measurements provided by SYKE were of good quality and manually selected MODIS observations were carefully checked, no other quality controls were used.
The observational error variance usually consists of the instrumental error squared and the representativeness error variance. For MODIS, the error variance associated with the skin effects, undetected cloudiness and floating ice should be included. In all of our experiments, the observational error variance is equal to an ad hoc value of 1.0 K2. This is the order of magnitude of the difference between the in-situ and MODIS observations variance (Kheyrollah Pour et al., 2012). The specification of observational error statistics requires further studies. Note that the observational error variance is small compared to the background error variances (which represent the errors of the parameters defining the lake water temperature profile and may be compared with the observational error variance using values of the H vector, see Appendix).
The EKF algorithm for assimilating LWST observations was implemented in a stand-alone (1D) offline version of the lake model FLake. In the offline mode the lake model is coupled only one-way with a NWP model, being forced by atmospheric model data. Forcing data from operational runs of the HIRLAM NWP model were used. The following forcing variables were extracted from the HIRLAM 7.3 operational 6-hour forecasts1 archives: wind speed, temperature and specific humidity at the lowest model level, accumulated downward long-wave and short-wave radiation fluxes and the total snowfall. Hourly outputs were used to calculate radiation fluxes and precipitation rates. The turbulent heat and momentum fluxes were calculated with the specific flux formulation in FLake.
Experiments were performed for a 1 yr period from 3 November 2010 to 10 November 2011. For this period, observations from different lakes were selected and compiled into time series. For both the SYKE and merged data sets, observations were assigned to the valid time 08.00 EET. The EKF analysis was performed at 06.00 EET with a cycling period of 24 hours and an assimilation window2 of 4 hours (we considered all observations 2 hours before and 2 hours after the analysis time and referred them to this time). Note that in our experiments the satellite observations were put into the assimilation window artificially. Therefore, in operational implementation, the assimilation window question should be specially addressed. For cross-validation, observations were assimilated from every second day (the rest were used for validation). In this case, the cycling period was 48 hours. For all lakes, the free model run (without DA) was initialised from the typical late autumn temperature profile for a boreal lake: a lake is mixed down to the bottom, and the water temperature is equal to the screen level temperature. Numerical experiments with FLake-EKF were performed for 27 lakes assimilating SYKE observations and for four lakes assimilating merged observations. The design of experiments is summarised in Table 2.
The best way to evaluate the performance of the assimilating system is by a comparison with independent observations. Either different sources of data may be used, or cross-validation methods may be applied, when observations from the single source of data are partly assimilated, partly used for validation. A posteriori verification is useful to reveal potential problems, for example, biases in a model or in observations. The quality of the linear approximation and possibilities for simplifications (if any) may be assessed from the behaviour of the Jacobians, as well as from the evolution of the B matrix and Kalman gain vector K. However, to study in details the evolution of all the EKF components, the amount of material is large. The statistics differ depending on the season and the corresponding model regime. In this paper, the entire picture is described, focusing on the changing regimes and other physical aspects. Only a preliminary study of Jacobians, the components of the B matrix and vector K is performed. In experiments using different datasets we evaluate the performance of the EKF depending on data availability. For SYKE observations (EKF-S experiment) the impact of data is shown for all of the lakes and the behaviour of the components of the matrices and vectors M, H, B and K is briefly described. For the merged data set (EKF-M experiment), the results of cross-validation are presented. A detailed study of a posteriori statistics and the performance of the EKF assimilation system in terms of the evolution of its components is planned for the future.
A number of examples are given to provide a qualitative overview of the assimilation of SYKE observations. In the FR experiment (the free run) TML was too warm for all of the lakes in summer. This is a well-known FLake problem (Kourzeneva et al., 2012b; Martynov et al., 2012). Figure 2 shows that for Lake Inarijärvi the mixed layer temperature TML in the EKF-S experiment is much closer to observations than in the FR experiment, where it is up to 5°C warmer. In Fig. 3, comparing the EKF-S experiment with the FR experiment, no impact is seen on the ice and snow thicknesses in autumn and winter (the snow thicknesses in EKF-S and EKF-M are not shown), mainly because of the lack of SYKE observations just before the ice onset date. In late autumn and winter, the EKF-S experiment relied on a background state similar to the FR experiment. In spring, the FR experiment usually showed an ice break-up date which was too late. The early spring situation for different experiments is described in Table 3. For Lake Inarijärvi in the FR experiment the ice break-up was on June 6th. For the EKF-S experiment it was 2 weeks earlier, on May 23rd, and coincided with the beginning of SYKE observations in spring. However, the SYKE observations usually commence one or 2 weeks after the actual ice break-up date. Therefore, LWST may appear to be much higher than the melting point. For example, in Table 3 in the EKF-S experiment the observations for Lake Inarijärvi start from the already high LWST value of 7.6°C. Thus, the real ice break-up conditions were unknown from both the FR and EKF-S experiments. In Fig. 2, the late ice break-up date in the FR experiment influences the spring TML (values too low), so that TML in June is up to 10°C lower for the FR experiment than for the EKF-S experiment. The EKF-S experiment compared with FR gave much warmer TML during the whole of spring. Hence, with SYKE observations alone (without satellite observations), there was lot of uncertainty in real knowledge of the lake state, mainly in early spring but also in late autumn. The impact of the EKF assimilation on the ice and snow temperatures in autumn and winter was minor (not shown).
Through the DA, the LWST observations also influenced the mean water temperature and the bottom temperature Tb (Fig. 4 for Lake Inarijärvi and Fig. 5 for Lake Saimaa). The impact was large, and typically the summer maximum was warmer in the EKF-S experiment than in FR: for 5–7°C for and for 7–11°C for Tb. Since deep water temperature measurements were not available, it is difficult to understand which experiment is closer to the truth. For a better understanding, if Tb in the EKF-S experiment is realistic, it was compared with the observed Tb values in Lake Valkea-Kotinen (Dmitrii Mironov, personal communications 2014). Lake Valkea-Kotinen is a very small lake with the mean depth of 3 m. It is located in Southern Finland, about 100 km to the northwest of Lake Tuusulanjärvi. Both lakes are small and shallow and located in the same climate zone, and are expected to exhibit similar behaviour annually. The maximum summer Tb of Lake Valkea-Kotinen was 7°C in 2006. For Lake Tuusulanjärvi Tb was 20.5°C in the EKF-S experiment (not shown), which is too high. For Lake Saimaa Tb shown in Fig. 5b is 18.2°C, which is also too high. According to preliminary studies, the sensitivity of and Tb evolution to the Q matrix specification is high (not shown). Thus, poorly specified values in the Q matrix may be one reason for the overestimation of the deep water temperatures. Better specification needs further study, comparing the modelling results with regular deep water temperature observations. Another possible reason is the wrong timing of regimes. Both Tb and evolve slowly, depending on the lake depth. In the EKF-S experiment, Tb and were strongly influenced by memory from the early spring just after ice break up, when a lake is mixed down to its bottom. Usually this mixed period is rather short (several days). In most cases, SYKE observations started only after this period. At that moment, the lake model still showed a mixed regime, while in reality the regime was already stratified. The EKF algorithm resulted in positive corrections to the entire water column (both for and Tb) instead of making a non-mixed profile. Due to the long memory of Tb and , this resulted in their overestimation for the entire modelling period. In this way, the lack of SYKE observations in the early spring maintained errors of 6–11°C in deep water temperatures during the entire summer.
Although the state vector includes the dimensionless mixed layer depth η, the mixed layer depth h itself (connected to η by the simple equation h=D(1−η), where D is the lake depth) can be interpreted more naturally. In contrast with deep water temperatures, h is a quickly varying variable. It was also influenced by the EKF analysis. In Fig. 6a, the EKF-S experiment shows earlier autumn mixing than FR. In summer, the EKF-S experiment usually showed a deeper mixed layer. But since h changes quickly, it tends to return to its background state within one DA cycle, which neglects the assimilation impact. Results for Lake Saimaa (11 m deep) illustrate this summer process in more detail (Figs. 7–9). In Fig. 7, the h increments are mainly negative and the increments are very small. Thus, the EKF-S experiment produced colder TML values, which were closer to the observations, but the result was unstable (Fig. 8). As h quickly returned to its background state (i.e. wrong values), TML increased quite fast and again deviated from the observations. In Fig. 9b the innovations are mainly negative. Physically, in this situation the model jumps between the convective and wind-mixing regimes. During daytime in summer, due to short-wave radiation, the temperature of the upper-most water layer increases and stratification becomes stable (the wind mixing regime with the absolute value of h decreasing). During night-time, due to long-wave cooling, the temperature of the upper-most surface layer decreases and stratification becomes unstable (the convective regime with the absolute value of h increasing). In the model, these two regimes are described by different equations. The mainly negative increments of h in the EKF analysis mean that the prognostic equation for h in the stable regime may be biased. The problem is enhanced by in summer which is too warm in EKF-S. During summer, when the absolute value of h is small, the deep and upper water layers are decoupled. This results in small analysis increments (Fig. 7b), not enough to fix existing errors (see also the analysis of the tangent linear model operator M later in this section). Assimilation of more observations, for example, from remote sensing, more frequent and starting earlier in the spring, might improve the situation. A possible bias in the atmospheric forcing may also be a reason for the discrepancy: absolute values of h which are too small may come from momentum fluxes being too small. It is not easy to answer the question, whether excluding h from the state vector would be a good solution of the decoupling problem: it only evolves quickly in midsummer, but more slowly in spring and autumn. Besides, it is responsible for the timing of the mixed and stratified regimes. Figure 8 illustrates also that in the EKF experiments the observational error variance is small compared to the background error variances: the EKF-S results are very close to the observations.
The shape factor CT has no clear physical meaning, thus it is difficult to analyse its performance. It can be said that it was also influenced, and in Fig. 6b there is a significant difference between the FR and EKF-S experiments.
In order to quantify the assimilation performance with a posteriori statistics, a root mean square error (RMSE) was computed for different lakes. The RMSE was calculated for the open water period, when observations of LWST were available. The impact I (%) of the assimilation with respect to the model was calculated as:
Here RMSEmod and RMSEassim are for the free model run and for the assimilation system run, respectively. The impact for different lakes along with their geographical coordinates and mean depths is presented in Table 1. As expected, RMSEassim is much smaller than RMSEmod and the impact is high (note that the analysed LWST is not a weighted average between the observed and modelled values). A maximum impact of 98.0% is seen for the medium-depth polar Lake Kevojärvi and a minimum impact of 93.4% for the medium-depth Lake Lappajärvi located in central Finland. A dependency of the impact on the lake depth or latitude is not seen (the lakes in Table 1 are arranged by increasing mean depth).
Weights in the Kalman gain vector K define the overall performance of the analysis system and depend on the components of the background errors covariance matrix B. Behaviour of the B matrix is in turn dependent on the components of the tangent linear model operator M and on the linearised observation operator H (note that the components of H are derived analytically). The detailed study of their behaviour is planned for the future. A preliminary study was performed with SYKE observations for Lake Inarijärvi and Lake Tuusuljärvi, perturbation values and the matrix Q indicated in Section 2. In the Appendix the limits of the components of the matrix M, the vector H, the matrix B and the vector K are shown and their behaviour at different mixing regimes is briefly described. For the stratified regime, many components of the matrix M and the vector H, as well as of the matrix B and the vector K vary depending on season and lake type and often show an annual cycle. Maxima and minima appear in midsummer when stable stratification prevails, or in autumn before mixing occurs. The components of the matrix M are bounded and evolve smoothly, which means that there is a good tangent linear approximation within each model regime. Sometimes the components of the matrix M and vector H can be very large (e.g. , , ). This leads to large values in the matrix B elements (e.g. var ()). But being large, they remain bounded, and decrease when the appropriate components of the M matrix and the vector H decrease (as well as when the B matrix is reset). Some components of the matrix M, such as , remain almost constant and are the same for different lakes. Components of the Kalman gain vector K show an annual cycle or oscillations in time. All of these are stable, even when some of the B matrix components are large. For the mixed regime, all of the matrices and vectors are reduced to scalar values and remain almost constant in time (although is slightly less for shallow lakes). The almost constant values give potential for simplification of the EKF algorithm. Note that in midsummer has minimum and has maximum. This corresponds to decoupling and leads to small weights in the Kalman gain vector K and small analysis increments in . Increasing the related components of the B matrix in order to propagate information in the vertical in the decoupling situation and to increase the weights for , necessitates modification of the filtering procedure.
Experiments with the merged data set were designed to study the role of LWST information in early spring, just after ice break-up. Can this information improve the quality of the analysis? To answer this question, numerical experiments with two observational datasets, SYKE and merged (EKF-S and EKF-M experiments), were compared. The ice break-up dates in the EKF-S and EKF-M experiments did not differ much. In the EKF-M experiment, the ice break-up date was up to 7 d earlier; however, the first spring LWST was up to 9°C higher than observed (see Figs. 2 and 3, and Table 3). Thus, TML in the EKF-M experiment increased very quickly during this 7-d period. In Fig. 2, the TML difference between the EKF-M and EKF-S experiments reaches 10°C in early spring, and then diminishes within 2–3 weeks. Therefore, starting from June the effect of additional early spring observations on TML cannot be noticed. Spring mixing in the EKF-M experiment appeared also several days earlier (Fig. 6a). The summer behaviour of the temperature profile in lakes was also influenced by early spring observations. The impact was different depending on the spring mixing conditions. For Lake Saimaa, the spring mixing period lasted for 2–3 d (not shown). For this lake, the summer maxima of and Tb in the EKF-M experiment were up to 4°C and 12°C lower compared with the overestimated values in the EKF-S experiment (Fig. 5) and therefore seem to be more realistic. For the very shallow Lake Tuusulanjärvi (the mean depth is 3 m) the spring mixing period lasted for 15–20 d, and in terms of and Tb there was no difference between the EKF-S and EKF-M experiments, although in the EKF-M experiment the spring TML was higher until June (not shown). Different summer behaviour of the lake temperature profiles in the EKF-M and EKF-S experiments resulted in different autumn mixing conditions. For Lake Inarijärvi, there was earlier mixing in the EKF-M experiment (Fig. 6a). This in turn may influence the ice onset date the following year and the following winter ice conditions. The length of our experiments was not sufficient for studying these aspects and this should be investigated in future.
In the EKF-M experiments, there was also a problem of a possible bias in the prognostic equation for h in the case of stable stratification. Improved summer values of resulted in neither more stable behaviour of TML nor smaller increments of h. In summer observational departures of TML were mainly negative and increments of h were mainly positive (not shown) and had approximately the same values as in the EKF-S experiments. This means that the problem of a possible bias in the prognostic equation for h cannot be corrected by better values due to additional early spring observations and necessitates special attention. Perhaps, bias correction in the assimilation system should be considered.
For the merged data set, the assimilation system was cross-validated, assimilating every second observation and using the other data to calculate scores. Note here a longer cycling period (48 hours) was used, which means more non-linearity in the model, less accurate Jacobians and larger analysis departures (not shown). The results of the cross-validation, as well as the impact for the merged data, are presented in Table 4. Here, the bias, RMSE and the error standard deviation (ESTD) were calculated for each lake for the entire open water period. The impact of the merged observations for chosen lakes was slightly larger (ca. 1.5%) than for the SYKE data (compare Table UT0001 and 4 UT0004). In this study, a reliable time series of error statistics could not be obtained due to the small number of lakes. For individual lakes, the mainly positive bias in summer was compensated by the mainly negative bias in spring, so the overall annual bias was quite small. In the EKF-M experiment compared with FR, the absolute value of the bias decreased for Lake Inarijärvi and Lake Saimaa (and the bias even changed sign) and slightly increased for Lake Lappajärvi and Lake Tuusulanjärvi. The RMSE and ESTD are much more demonstrative scores in this type of validation, because errors do not compensate each other. For all lakes, the EKF assimilation procedure resulted in a noticeable decrease in RSME and ESTD. For example, for Lake Inarijärvi ESTD decreased from 4.59°C to 0.96°C. For Lake Tuusulanjärvi, despite the increase in bias and unrealistic temperature profiles in summer, the ESTD for LWST also noticeably decreased (from 2.80°C to 1.13°C). Note that in the EKF-S experiments, the unrealistic behaviour of and Tb did not spoil the LWST analysis.
From cross-validation, the EKF-M experiment was closer to reality than FR. These results are very promising for NWP. The role of early spring observations and errors in and Tb should be studied further in the future. These errors may be relevant for the next winter period. Also, although the simulated deep lake water properties do not influence an atmospheric model, their accurate estimation could be a useful by-product of NWP models for environmental applications (e.g. hydrological or limnological).
An algorithm to assimilate LWST observations into the lake model FLake has been developed using the EKF technique. This algorithm propagates information to the water temperature profile in the lake model space. The algorithm considers different lake model regimes: the ice period, the mixed regime and the stratified regime. To avoid the divergence of EKF, the prediction of the B-matrix is reinitialised when switching between model regimes. The algorithm was applied for two types of data: (1) instrumental measurements of LWST by SYKE and (2) the merged data set, compiled from SYKE measurements, the ice break-up dates observed by SYKE and manually picked MODIS LWST data. With the second type of data, the role of early spring observations was studied. Tests with the SYKE data were performed for 27 lakes in Finland and tests with the merged data for 4 lakes. Experiments comprised a 1 yr period (3 November 2010 to 10 November 2011) and were run offline with forcings from HIRLAM operational forecasts.
In addition to the qualitative evaluation of the results, the impact was calculated for the numerical experiments with SYKE data. Cross-validation was performed for the merged data set. In terms of LWST, for all lakes and all types of data, the EKF experiments are much closer to observations than the free model run, the impact is more than 90%. For all lakes, cross-validation shows a small overall annual bias (less than 0.9°C) for the EKF run. The ESTD with the EKF decreases significantly in comparison with the free model (1–3°C). The experiments with the merged data set demonstrated the important role of the early spring observations to improve the performance of the model in terms of the mean water temperature and the bottom temperature (in the experiments using only the SYKE data they were overestimated). For further evaluation of this behaviour, longer experiments are needed. Numerical experiments revealed a possible model bias in the prognostic equation for the mixed layer depth in the case of the wind mixing in FLake. This problem should be studied in the future. A preliminary study of the behaviour of different elements of the assimilation system was done. Components of Jacobians, the model and background error matrices, the Kalman gain vector showed the annual cycle or time oscillations, no instability was noticed. However, more tests and comparisons with deep water temperature observations are needed. Thus, the task is difficult and the results presented are quite early. Additional numerical experiments using different types of data to study potential problems in the assimilation system (the frequency of observations, the quality control of remote-sensing data) are planned.
The EKF algorithm used to assimilate LWST data into the lake model FLake showed promising results and was recognised as effective. For the operational implementation however, other parts of the lake DA system (OI analysis, the data quality control, etc.) should also be developed or improved. New sources of data are very important and welcome, in particular space-borne data such as MODIS, Advanced Along-Track Scanning Radiometer (AATSR) and Medium Resolution Imaging Spectrometer (MERIS). The EKF algorithm will be implemented into the externalised surface modelling platform SURFEX (Le Moigne, 2009; Masson et al., 2012) which is widely used in NWP, in climate applications and for monitoring and research purposes. Implementation of SURFEX with FLake and EKF into the operational NWP system HARMONIE/AROME (Seity et al., 2010) is also planned. The EKF algorithm has strong potential to initialise the lake parameterisation scheme in NWP models and to correct model errors by using LWST observations.
11The model horizontal resolution is 15 km, 60 levels in vertical.
22The following notation is used here: An analysis time is a moment in time when the analysis is performed to initialise the new forecast (increments are added to the background). A cycling period is a period between two analysis times. The (shortest) forecast length equals to the cycling period. Jacobians are calculated with the model runs for the full previous cycling period. Assimilation window is a time period around the analysis time, during which the available observations are picked to be assimilated.
The author thanks Laura Rontu (Finnish Meteorological Institute) and Homa Kheyrollah Pour (University of Waterloo) for their help with preparing the data, Jean-François Mahfouf, Alina Barbu (Météo-France) and Kalle Eerola (Finnish Meteorological Institute) for useful discussions, Suleiman Mostamandi (Russian State Hydrometeorological University) for technical help and Emily Gleeson (Met Éireann) for the careful reading and suggestions for the language revision of the manuscript. Three anonymous reviewers made many useful comments. The work was partly supported by the European Space Agency (ESA-ESRIN) Contract No. 4000101296/10/I-LG (Support to Science Element, NorthHydrology Project).
BalsamoG., BouysselF., NoilhanJ. A simplified bi-dimensional variational analysis of soil moisture from screen-level observations in a mesoscale numerical weather prediction model. J. Q. Roy. Meteorol. Soc. 2004; 130: 895–915.
BalsamoG., SalgadoR., DutraE., BoussettaS., StockdaleT., co-authors. On the contribution of lakes in predicting near-surface temperature in a global weather forecasting model. Tellus A. 2012; 64: 15829.
ChoulgaM., KourzenevaE., ZakharovaE., DoganovskyA. Estimation of mean depth of boreal lakes for use in numerical weather prediction and climate modeling. Tellus A. 2014; 66: 21295. http://dx.doi.org/10.3402/tellusa.v66.21295.
de RosnayP., DruschM., VasiljevicD., BalsamoG., AlbergelC., co-authors. A simplified extended Kalman filter for the global operational soil moisture analysis at ECMWF. Q. J. Roy. Meteorol. Soc. 2013; 139: 1199–1213.
DonlonC. J., MinnetP. J., GentemannC., NightingaleT. J., BartonI. J., co-authors. Towards improved validation of satellite sea surface skin temperature measurements for climate research. J. Clim. 2002; 15: 353–369.
DuguayC. R., FlatoG. M., JeffriesM. O., MénardP., MorrisK., co-authors. Ice-cover variability on shallow lakes at high latitudes: model simulations and observations. Hydrol. Process. 2003; 17: 3465–3483.
GustafssonN. Development of mesoscale analysis scheme for nowcasting and very short range forecasting. Proceedings of the ECMWF Workshop on High Resolution Analysis. 1985; June. 24–26183–212. Available from ECMWF, Shinfield Park, Reading, Berkshire RG2 9AX, UK Reading.
HomleidM. Sea surface temperature and sea ice concentration in HIRLAM and HARMONIE. HIRLAM Newsletter N55 part B. 2009; 55–64. Online at: http://www.hirlam.org.
Kheyrollah PourH., DuguayC. R., SolbergR., RudjordØ.Impact of satellite-based lake surface observations on the initial state of HIRLAM – part I: remotely-sensed lake water temperature observations. Tellus A. 2014a; 66: 21534. Online at: http://dx.doi.org/10.3402/tellusa.v66.
Kheyrollah PourH., RontuL., DuguayC., EerolaK., KourzenevaE. Impact of satellite-based lake surface observations on the initial state of HIRLAM – part II: analysis of lake surface temperature and ice cover. Tellus A. 2014b; 66: 21395. Online at: http://dx.doi.org/10.3402/tellusa.v66.
Kheyrollah PourH., DuguayC. R., MartynovA., BrownL. C. Simulation of surface temperature and ice cover of large northern lakes with 1-D models: a comparison with MODIS satellite data and in-situ measurements. Tellus A. 2012; 64: 17614.
Le MoigneP., BooneA., CalvetJ.-C., DecharmeB., FarouxS., co-authors. SURFEX Scientific Documentation. 2009; Météo France, France: CNRM.221. Online at: http://www.cnrm.meteo.fr/surfex/.
MahfoufJ.-F., BergaouiK., DraperC., BousselF., TailleferF., co-authors. A comparison of two off-line soil analysis schemes for assimilation of screen level observations. J. Geophys. Res. 2009; 114: D08105.
MartynovA., SushamaL., LapriseR., WingerK., DugasB. Interactive lakes in the Canadian regional climate model, version 5: the role of lakes in the regional climate of North America. Tellus A. 2012; 64: 16226.
MassonV., Le MoigneP., MartinE., FarouxS., AllasA., co-authors. The SURFEX7.2 land and ocean surface platform for coupled offline simulation of earth surface variables and fluxes. Geosci. Model Dev. 2012; 5: 3771–3851.
MironovD., RitterB., SchultzJ.-P., BuchholdM., LangeM., co-authors. Parameterisation of sea and lake ice in numerical weather prediction models of the German Weather Service. Tellus A. 2012; 64: 17330.
RhodinA., KucharskiF., CalliesU., EppelD. P., WergenW. Variational soil moisture analysis from screen-level atmospheric parameters: application to a short-range weather forecast model. Q. J. Roy. Meteorol. Soc. 1999; 125: 2427–2448.
UndénP., RontuL., JärvinenH., LynchP., CalvoJ., co-authors. The HIRLAM-5 scientific documentation. 2002. Online at: http://hirlam.org.