The importance of energy exchanges between the earth’s surface and the atmosphere is well recognized in weather forecasting. The surface heat, moisture and momentum fluxes depend not only on atmospheric conditions but also on the properties of the land cover, which in lake-rich areas are largely determined by inland water bodies. The importance of a correct description of the inland water (lake) surface state is well known in climate modelling (Duguay et al., 2006; Brown and Duguay, 2010; Krinner and Boike, 2010; Samuelsson et al., 2010; Ngai et al., 2013) and in numerical weather prediction (NWP) (Niziol, 1987; Niziol et al., 1995; Zhao et al., 2012; Eerola et al., 2014). For example, during freezing and melting, the radiative and conductive properties of the lake surface, as well as the latent and sensible heat released from lakes to the atmosphere, change dramatically, leading to a completely different surface energy balance. By affecting the surface fluxes, lakes modify the structure of the atmospheric boundary layer.
Lake surface water temperature (LSWT) is directly related to the heat fluxes; it is thus a critical variable to measure, assimilate and predict in NWP. The quality of an observation-based description of the lake surface state depends crucially on the availability and selection of observations. Obtaining reliable real-time observations on lakes, especially at high latitudes, is challenging. Satellite-based data offer the only realistic way of obtaining frequent observations with sufficient spatial resolution over large areas. The various types of satellite observations collected over lakes represent different scales and have different accuracies, depending on the observing system. Measurements based on optical wavelength signals are limited by clouds and are thus irregularly distributed in space and time. To assimilate remotely sensed and available in situ observations into an NWP model, knowledge is needed of the statistical properties of the observed LSWT and of the error characteristics of both the observations and the model background.
A correct description of the lake surface state is relevant for NWP models whose horizontal resolution is high enough to resolve even small lakes. We use the High-Resolution Limited Area Model (HIRLAM; Unden et al., 2002; Eerola, 2013) as the framework of our study. This NWP system has been applied since 1990 for numerical short-range weather forecasting over Northern Europe. It presently includes a prognostic lake temperature parameterization and an objective analysis of observed LSWT.
In the early 1990s, HIRLAM used the monthly climatological water surface temperature for both sea and lakes. The LSWT climatology was estimated by extrapolating the sea surface temperature (SST) climatology to lakes, as no suitable data for lakes were available. Both SST and LSWT were kept constant during the forecast. The first step to improve the SST analysis was to apply the European Centre for Medium-Range Weather Forecasts (ECMWF) SST analysis fields (Chelton and Wentz, 2005) in the form of pseudo observations, first with the method of successive corrections (Cressman, 1959) and later with Optimal Interpolation (OI) (Gandin, 1965). The previous analysis was used as the background for the following analysis, and the background was relaxed towards climatology to avoid drifting far from reality. An LSWT climatology over 20 Finnish lakes was obtained from the measurements of the Finnish Environment Institute (Suomen Ympäristökeskus = SYKE) and applied in the analysis, also in the form of pseudo observations (Eerola, 1995). The fractional ice cover over the sea and lakes was provided consistently with SST and LSWT.
The freshwater lake model (FLake; Mironov, 2008; Mironov et al., 2010) was implemented in the HIRLAM forecasting system in 2012 (Kourzeneva et al., 2008; Eerola et al., 2010). The model uses external data-sets on lake depth (Kourzeneva et al., 2012a) and lake climatology to initialize the prognostic variables of FLake at the very first model start (Kourzeneva et al., 2012b). At the same time, real-time in situ LSWT observations by SYKE for 27 Finnish lakes became available for the operational HIRLAM analysis (Eerola et al., 2010; Rontu et al., 2012). The OI method, with the same assumed statistical properties for the SST and LSWT fields, was applied to these observations. Currently, FLake provides the background for the OI analysis of LSWT in the operational HIRLAM. However, the prognostic FLake variables are not updated using the analysed LSWT, since this part of the data assimilation algorithm, as suggested by Kourzeneva (2014), still remains to be implemented.^{1}
Developments related to the analysis of LSWT have also taken place at the UK Met Office within the Operational Sea Surface Temperature and Ice Analysis (OSTIA) system (Donlon et al., 2012; Fiedler et al., 2014). The OSTIA system produces surface water temperature analysis fields for NWP purposes. OSTIA uses the previous analysis field as the background and treats remote-sensing and in situ observations of the surface water temperature by the OI method. The LSWT observations used in OSTIA are part of the SST products from AATSR and MetOp (AVHRR and the Infrared Atmospheric Sounding Interferometer, IASI). These products are based on SST retrieval methods and do not include lake-specific processing. In OSTIA, the LSWT analysis is produced only for large lakes (ca. 248 globally), without cross-lake interpolation. The background LSWT is backed up by lake climatology (Fiedler et al., 2014). In the OI method of OSTIA, the same statistical properties are used for both SST and LSWT.
Statistical properties of the fields and estimates of the background and observation errors are essential in the OI method. The OI (also called statistical interpolation) method was introduced by Eliassen (1954) and Gandin (1965) and has been applied for upper-air analysis in operational NWP systems (e.g. Lorenc, 1981; Hollingsworth and Lönnberg, 1986; Lönnberg and Hollingsworth, 1986; Daley, 1991). OI is one of the objective analysis methods that provide initial conditions for NWP. Such methods were developed for atmospheric models by several groups of meteorologists (Panofsky, 1949; Gilchrist and Cressman, 1954; Bergthórsson and Döös, 1955; Cressman, 1959). In general, objective analysis methods are based on the idea of minimizing the analysis error. OI is still applied for the analysis of near-surface variables such as surface water temperature, screen-level temperature and relative humidity, and snow water equivalent (Thiebaux, 1975; Julian and Thiebaux, 1975; Sattler and Huang, 2002; Donlon et al., 2012). An advantage of OI is that it can also be used for the quality control of observations.
The OI method accounts for the statistical properties of the analysed field via the autocorrelation function. The autocorrelation function is an internal property of the field itself; it can be derived from the field by statistical methods. The OI method usually applies analytical approximations of the autocorrelation function that depend on the distance between points. Often (as in HIRLAM), an exponential representation is used. The influence radius in the exponential representation sometimes becomes a tuning parameter dependent on the density of observations, although it should be connected with the real statistical properties of the fields. For LSWT fields, the autocorrelation function has never been studied, and no approximation has been proposed.
Thus, for historical reasons, the operational analysis of LSWT currently uses the autocorrelation function borrowed from the SST analysis. However, there is no reason why the statistical properties of LSWT and SST should generally be similar. The LSWT of different lakes may correlate due to similar atmospheric conditions and air–water interactions; however, the evolution of LSWT also depends on lake morphology, especially on lake depth, because of vertical mixing (Walsh et al., 1998; Toffolon et al., 2014). It is natural to expect the LSWT correlation coefficient for lakes of the same depth to be larger than for lakes whose depths differ considerably. Moreover, the autocorrelation function of the LSWT field may be time-dependent. Therefore, it is of high interest to study the statistical properties of the LSWT field, to obtain its autocorrelation function and to find an analytical approximation for it. We assume that the autocorrelation function depends not only on the horizontal distance but also on the difference in lake depth. In this study, we concentrate on the summer period. It is also of interest to study the impact of the new autocorrelation function formulation on the LSWT analysis in NWP.
This approach is in line with studies of coherence among lakes, in which correlations of different properties between lakes are investigated (e.g. Magnuson et al., 1990). Compared with these studies, NWP objective analysis is simpler, as only LSWT is considered. However, an NWP system relies on a physically based prognostic lake parameterization, which accounts for many important thermodynamic processes in lakes. Correction of the temperature profiles inside lakes, for example changes in the mixing regime, falls outside the scope of the present study. It could be treated by the Extended Kalman Filter algorithm described by Kourzeneva (2014).
Our aim is to study the LSWT autocorrelation function as an internal property of the LSWT field and to obtain an improved formulation of it for use in the objective analysis of NWP models. To this end, we calculated observation statistics depending on the distance between the observation points as well as on the lake depth differences. For the statistics, we used two data-sets with two types of LSWT observations: local and satellite-based. Local observations are provided by SYKE for different lakes in Finland. Satellite-based observations consist of Moderate Resolution Imaging Spectroradiometer (MODIS) LSWT data over Fennoscandia and Northwestern Russia. We also estimate the observation error for these two types of measurements. To examine the sensitivity of the HIRLAM objective analysis of LSWT to different analytical formulations of the empirical autocorrelation function, we compare the analysis results with independent observations. The present study builds on the work reported by Kheyrollah Pour et al. (2014b) on the use of satellite measurements for LSWT data assimilation in HIRLAM, and by Eerola et al. (2014) on the influence of the lake surface state on weather forecasts. To our knowledge, this is the first study of its kind, and it could lead to an improved definition of the LSWT autocorrelation function and thus to a better objective analysis of LSWT in NWP models.
The study is organized as follows. Satellite-based and in situ observations and general statistics are described in Section 2. The algorithm to obtain the LSWT autocorrelation function and the results follow in Section 3, and the theoretical background is provided in Appendix 1. The sensitivity of the HIRLAM LSWT analysis to the formulation of the autocorrelation function, tested in numerical experiments, is presented in Section 4. Section 5 contains the summary and outlook.
The LSWT observations were derived from the archives of NASA Land Processes Distributed Active Archive Center (LP DAAC) for the thermal remote sensing sensor MODIS, Land Surface Temperature and Emissivity (MOD/MYD11-L2, collection 5, 1 km). This data-set was developed from both Terra and Aqua satellites (Kheyrollah Pour et al., 2012; Kheyrollah Pour et al., 2014a). Both day-time and night-time LSWT observations from Terra and Aqua satellites and estimated errors for each observation were retrieved.
Seventy-one MODIS pixels were extracted, representing 44 Fennoscandian lakes larger than 6 km$^{2}$ and of various depths (Fig. 1). Lakes of different sizes and shapes over Northern Europe were chosen. The MODIS LSWT observations are represented on a grid with a resolution of 1 km × 1 km. For most lakes, one pixel was selected to represent the observation point; however, for large lakes, several pixels were chosen (e.g. 9 pixels for Lake Onega and 15 pixels for Lake Ladoga).
For this study, daily averages were calculated from all available observations at the selected locations for the five summers (June–July–August, JJA) of 2010–2014. We applied moving averages in a window of ±24 h to remove suspicious observations that deviated by more than 3 °C from the average. All observations indicating a temperature below the freezing point, which could appear in MODIS data during JJA due either to undetected clouds or to the presence of ice cover, were removed, since only water temperature was considered. The maximum possible number of daily average temperature values from the Terra and Aqua satellites combined was 32 660 (71 pixels × 5 years × 92 days × 1 daily average per day). Due to cloudiness, the number of available observations was reduced to 20 694 (63.4% of the possible maximum).
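The screening steps above can be sketched for a single MODIS pixel as follows. This is a minimal illustration, not the processing code used in the study: the input format (time in hours, LSWT in °C), the freezing point of 0 °C for fresh water, the inclusion of the observation itself in its ±24 h moving average, and the order of the two screening steps are all our assumptions.

```python
from collections import defaultdict
from statistics import mean

FREEZING_POINT_C = 0.0  # assumed freezing point of fresh water
WINDOW_H = 24.0         # half-width of the moving-average window (hours)
MAX_DEV_C = 3.0         # maximum allowed deviation from the moving average

def filter_and_average(obs):
    """obs: list of (time_in_hours, lswt_celsius) for one pixel.

    Returns {day_index: daily mean LSWT} after screening.
    """
    # 1) drop sub-freezing values (undetected cloud or ice cover)
    water = [(t, x) for t, x in obs if x >= FREEZING_POINT_C]
    # 2) drop values deviating more than MAX_DEV_C from the +-24 h moving average
    kept = []
    for t, x in water:
        neighbours = [y for s, y in water if abs(s - t) <= WINDOW_H]
        if abs(x - mean(neighbours)) <= MAX_DEV_C:
            kept.append((t, x))
    # 3) one daily average per day (day index counted from time 0)
    by_day = defaultdict(list)
    for t, x in kept:
        by_day[int(t // 24)].append(x)
    return {d: mean(v) for d, v in sorted(by_day.items())}
```

With six overpasses where one value is sub-freezing and one is an 8 °C warm outlier, both are discarded and two daily averages remain.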
Regular in situ lake water temperature measurements are collected by the Finnish Environment Institute (SYKE), which operates 32 regular lake and river water temperature measurement sites in Finland. The lake water temperature is measured every morning at 08:00 local time, close to the shore, at 20 cm below the water surface. The measurements are recorded either automatically (13 stations) or manually, and are performed only during the ice-free season (Rontu et al., 2012). With respect to the diurnal cycle, SYKE temperatures most closely represent the daily minimum surface water temperature, because solar heating has only a small impact in the morning.
In this study, we used measurements from 27 lakes (Fig. 1). The time period was the same as for the MODIS data, namely the five summers (JJA) of 2010–2014. Of the maximum of 12 420 observations (27 lakes × 5 years × 92 days × 1 measurement per day), 12 227 (98.4%) were available. No preprocessing was applied to the SYKE data.
MODIS observations cover 24 of the 27 SYKE lakes, as well as two additional northern Finnish lakes that are not included in the SYKE data-set (Orajärvi and Lokka). The main differences between the SYKE and MODIS data-sets used in this study are:
According to Table 1 and Fig. 3, the empirical distribution of the data is close to normal for both the MODIS and SYKE data-sets. For both data-sets, the mean and median are close to each other, and the skewness is –0.55 and –0.52 for MODIS and SYKE, respectively. In addition, the normal probability plot (Fig. 4) supports the normality assumption: the points fall approximately on a straight line, which indicates a normal distribution of the data. Thus, both samples (SYKE and MODIS) appear to be homogeneous and not strongly affected by the annual cycle. The variance of the MODIS data is larger than that of the SYKE data. This reflects the fact that MODIS observations cover a larger area and a larger range of depth differences, and probably contain a larger observation error. Another possible reason is that MODIS observations represent the highly variable radiative (skin) temperature at the very surface of the water, while SYKE observations represent a 20 cm deep water layer where temperature changes are smaller.
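The normality diagnostics mentioned above (mean versus median, skewness, and straightness of the normal probability plot) can be computed along the following lines. This is a sketch under our own choices: the skewness is the biased Fisher–Pearson estimate, and the straightness of the probability plot is summarized by the correlation between the sorted data and standard normal quantiles at Hazen plotting positions; the statistics in Table 1 may have been computed differently.

```python
import numpy as np
from statistics import NormalDist

def normality_summary(x):
    """Mean, median, sample skewness and probability-plot correlation."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m3 = np.mean(d**2), np.mean(d**3)
    skew = m3 / m2**1.5                       # biased (Fisher-Pearson) skewness
    # normal probability plot: sorted data against standard normal quantiles
    p = (np.arange(1, n + 1) - 0.5) / n       # Hazen plotting positions (assumed)
    q = np.array([NormalDist().inv_cdf(pi) for pi in p])
    ppcc = np.corrcoef(q, np.sort(x))[0, 1]   # close to 1 => nearly straight line
    return {"mean": x.mean(), "median": float(np.median(x)),
            "skewness": skew, "ppcc": ppcc}
```

A negative skewness, as reported for both data-sets, indicates a somewhat longer tail towards cold values.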
Lake depth values for the current study were obtained from the Global Lake Data Base (Choulga et al., 2014; Kourzeneva et al., 2012a), which has a resolution of 30 arc seconds. It contains the mean lake depth and, for several large lakes, the bathymetry. In our study area, bathymetry was available for Lake Ladoga, Lake Onega, Lake Vänern, Lake Vättern and Lake Peipsi. In the HIRLAM experiments (Section 4), the lake fraction in each grid square is based on the Global Land Cover Characteristics data-set (GLCC; Loveland et al., 2000).
In meteorology, the task of data assimilation is to provide the best possible initial value of a prognostic variable at each grid point by using all available information (objective analysis). This information comprises both observations (here provided by MODIS and SYKE) and a model state (from the previous analysis-forecast cycle) as a background field. The background field is updated using new observations; where no observations are available, the analysed value remains the same as the background.
Objective analysis by OI uses weighting factors based on the statistical properties of the analysed field. The error of each observation type, as well as the background error, is taken into account. In the basic (univariate) set-up of OI, the weight of a given observation depends on the distance between the observation and the grid point and on the distances between this observation and the other observations. The method was introduced into meteorology by Gandin (1965) and has been applied to continuous atmospheric variables such as temperature, wind and pressure. Strictly speaking, the LSWT field is discontinuous because it is defined only over lakes, not over the surrounding land. However, this type of discontinuity can be taken into account by masking land. In addition, the discontinuity problem is alleviated when the LSWT field is represented on the NWP model grid such that each grid box contains a fraction (from 0 to 1) of lake water.
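As a minimal sketch of how the univariate OI weights depend on observation–grid and observation–observation distances, the code below solves the standard OI linear system for one grid point. It assumes an exponential correlation $\mu(\rho) = \exp(-\rho/L)$ and uncorrelated observation errors; the function names are illustrative and are not HIRLAM routines.

```python
import numpy as np

def oi_weights(d_obs_obs, d_obs_grid, length_scale_km, sigma_o, sigma_b):
    """Univariate OI weights for one grid point.

    d_obs_obs : (K, K) distances between observations (km)
    d_obs_grid: (K,) distances from each observation to the grid point (km)
    Assumed correlation model: mu(rho) = exp(-rho / L).
    """
    mu_oo = np.exp(-np.asarray(d_obs_obs, dtype=float) / length_scale_km)
    mu_og = np.exp(-np.asarray(d_obs_grid, dtype=float) / length_scale_km)
    eps2 = (sigma_o / sigma_b) ** 2  # normalized observation error variance
    return np.linalg.solve(mu_oo + eps2 * np.eye(len(mu_og)), mu_og)

def oi_analysis(x_b, innovations, weights):
    """Analysis = background + weighted sum of innovations (obs - background)."""
    return x_b + float(np.dot(weights, innovations))
```

For a single observation at the grid point with equal observation and background errors, the weight is 0.5, so the analysis moves halfway from the background towards the observation. Land masking would simply skip grid points with zero lake fraction.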
Following Gandin (1965), we calculate the LSWT deviations from the norm and consider this field to be isotropic and homogeneous. In OI, information about the statistical structure of the field of the considered variable is incorporated into the procedure through the use of autocorrelation functions. In this study, we are concerned with the statistical structure of the LSWT field as represented by the autocorrelation function. Owing to the assumption that the field is homogeneous and isotropic, we may consider the autocorrelation function of LSWT to be continuous and defined within and across lakes. During the objective analysis, the OI method gives the state of LSWT at a particular time at all model grid points where the lake fraction is $>0$. The basic OI algorithm, as applied to the LSWT analysis, is described in Appendix 1, together with information about the autocorrelation and structure functions.
Determination of the autocorrelation function for LSWT with dependency on the horizontal distance and the depth difference between lakes requires a reliable and homogeneous observational network.
In this study, the MODIS and SYKE LSWT observations (see Section 2) were used to calculate the autocorrelations. In both cases, the same procedure was applied, as suggested by Gandin (1965). First, for each observation point, the mean LSWT value (the norm) and the deviations from this norm (Equation (A5)) were calculated. Then, all observation points were organized in pairs: each point with all other points. If the number of observation points is $P$, the number of pairs is $(P^{2}-P)/2$. For all pairs, the distances between the point locations were calculated, and distance categories were defined (0 to 100 km, 100 to 200 km, etc., up to the maximum distance of 1600 km). Similarly, the depth differences for all pairs and the related categories (0 to 5 m, 5 to 10 m or 0 to 10 m, 10 to 20 m, etc.) were defined. The structure function (Equation (A4)) and autocorrelation function (Equation (A6)) values were calculated for each category by averaging over all pairs and all observation times within the category. Pairs with a missing observation were excluded at the times concerned. The results were plotted as functions of distance and depth difference.
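The pairing and binning procedure above can be sketched as follows for the distance categories. Since Equations (A4) and (A6) are not reproduced here, we assume the usual Gandin forms: the structure function is the mean of $(f'_i - f'_j)^2$ and the (unnormalized) autocorrelation is the mean of $f'_i f'_j$ over all pairs and times in a category. The great-circle distance with a mean Earth radius of 6371 km is also our assumption.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)

def haversine_km(lat1, lon1, lat2, lon2):
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi, dlam = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def binned_statistics(f, lat, lon, bin_km=100.0, max_km=1600.0):
    """f: (T, P) LSWT array with NaN for missing values.

    Returns per-category structure function b, autocorrelation m, pair counts n.
    """
    dev = f - np.nanmean(f, axis=0)             # deviations from the point norms
    nbins = int(max_km // bin_km)
    s_b, s_m = np.zeros(nbins), np.zeros(nbins)
    n = np.zeros(nbins, dtype=int)
    P = f.shape[1]
    for i in range(P):
        for j in range(i + 1, P):               # (P**2 - P) / 2 pairs
            k = int(haversine_km(lat[i], lon[i], lat[j], lon[j]) // bin_km)
            if k >= nbins:
                continue
            both = ~np.isnan(dev[:, i]) & ~np.isnan(dev[:, j])  # skip missing times
            di, dj = dev[both, i], dev[both, j]
            s_b[k] += np.sum((di - dj) ** 2)
            s_m[k] += np.sum(di * dj)
            n[k] += both.sum()
    with np.errstate(invalid="ignore", divide="ignore"):
        return s_b / n, s_m / n, n              # NaN where a category is empty
```

Depth-difference categories would be handled the same way, with the absolute depth difference of each pair replacing the distance when choosing the category index.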
An estimate of the observation error variance $\sigma^{2}$ for each data-set was obtained by extrapolating the structure function (Equation (A4)) towards zero distance and reading off its intercept on the y-axis. This intercept gives twice the observation error variance, $2\sigma^{2} = b(0)$ (Chapter 2 of Gandin, 1965). The total variance of the LSWT observations within each category was calculated, and the normalized autocorrelation function $\mu$ (Equation (A9)) was obtained by using this total variance minus $\sigma^{2}$ in the denominator of Equation (A9) instead of $\overline{f'^{2}}$. This accounts for the influence of the observation errors on the normalized autocorrelation function. The same method of calculation was applied independently for the distance and the depth difference categories, but the estimate of the observation error variance was based only on distance.
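A sketch of the two steps just described follows: linear extrapolation of the structure function to zero distance, then normalization by the total variance minus the error variance. Fitting a straight line through the first three distance categories is our assumption; the text notes that the extrapolation is somewhat subjective, and the authors may have done it differently.

```python
import numpy as np

def obs_error_variance(rho_km, b, n_fit=3):
    """Extrapolate the structure function linearly to rho = 0 using the
    first n_fit categories; the intercept b(0) equals 2 * sigma**2."""
    rho = np.asarray(rho_km, dtype=float)[:n_fit]
    slope, intercept = np.polyfit(rho, np.asarray(b, dtype=float)[:n_fit], 1)
    return intercept / 2.0

def normalized_autocorrelation(m, total_var, sigma2):
    """mu = m / (total variance - observation error variance), per category."""
    return np.asarray(m, dtype=float) / (np.asarray(total_var, dtype=float) - sigma2)
```

For example, a structure function that behaves like $1.62 + 0.004\rho$ near the origin yields $\sigma^{2} = 0.81$, that is, an observation error of 0.9 °C.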
First, structure functions and autocorrelation functions of LSWT depending only on distance were obtained from the SYKE and MODIS data-sets. They are shown in Fig. 5. Compared with the SYKE-based functions, the MODIS-based structure and autocorrelation functions are less smooth and show much larger values. This is due to the larger variability of the MODIS data, as MODIS observations cover the whole of Fennoscandia whereas SYKE observations cover only Finnish lakes, and probably also to the larger observation error of the MODIS data. The SYKE-based structure function increases monotonically up to a distance of approximately 800 km and has a slight maximum beyond that. It may be interpreted as reaching saturation at this distance (for distances larger than 1100 km, we can only extrapolate its behaviour). The corresponding autocorrelation function decreases monotonically. The MODIS-based structure function also increases with distance, but only as a general tendency. Still, it may be interpreted as reaching saturation at a distance of 600–800 km. The corresponding autocorrelation function decreases in general.
The observation error for both the SYKE and MODIS observations was obtained from the corresponding structure functions by the method described in Section 3.2. According to our estimates, the observation error is 0.9 °C for the SYKE data and 1.2 °C for the MODIS data. Note that the subsequent results are sensitive to these somewhat subjective error estimates.
The normalized autocorrelation functions were calculated from both data-sets, taking into account the total variance and the observation error variance as explained in Section 3.2. The results obtained for the distance dependency are shown in the lower panel of Fig. 5 and in Table 2. Table 2 presents the normalized autocorrelation function values $\mu(\rho)$ as a function of distance $\rho$, together with the number of available observations (observation pairs at all times). The SYKE-based values decrease more monotonically, but more slowly, than the MODIS-based values. The normalized autocorrelation functions were approximated using an exponential function (see Section A.3). An iterative algorithm for least-squares estimation of nonlinear parameters (Marquardt, 1963) was applied for the fitting. Overall, the best fit was obtained with an influence radius of approximately $L_{H} = 1010$ km for the SYKE-based function and $L_{H} = 1000$ km for the MODIS-based function. However, the exponential approximation with the best overall fit has large errors at short distances and small errors at large distances; that is, the tail of the function is described better (not shown).
One way to achieve a better approximation at short distances is to choose another (non-exponential) function. Another way is to keep the exponential function but to concentrate on small and medium distances, considering large distances as less important. This means cutting off the large distances and recalculating the approximation. Based on preliminary testing, it was decided to cut the distances at 900 km for the SYKE data-set and at 800 km for the MODIS data-set (within these distances, the SYKE and MODIS data-sets still contain 89% and 76% of the available observation pairs, respectively). In this case, the best fit for the normalized autocorrelation function was obtained with a horizontal influence radius of $L_{H} = 1050$ km for the SYKE-based function and $L_{H} = 630$ km for the MODIS-based function. These functions are depicted in the lower panel of Fig. 5.
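The cut-and-refit approach can be illustrated with a minimal one-parameter Levenberg–Marquardt iteration in the spirit of Marquardt (1963). This sketch is not the fitting code used in the study: it assumes the form $\mu(\rho) = \exp(-\rho/L_H)$, fits the category values without weighting by the number of pairs, and uses an arbitrary starting guess of 500 km.

```python
import numpy as np

def fit_length_scale(rho_km, mu, cutoff_km, L0=500.0, n_iter=100):
    """Fit mu(rho) = exp(-rho / L) to values with rho <= cutoff_km using a
    minimal one-parameter Levenberg-Marquardt iteration; returns L in km."""
    rho = np.asarray(rho_km, dtype=float)
    mu = np.asarray(mu, dtype=float)
    keep = rho <= cutoff_km                  # drop the tail of the function
    rho, mu = rho[keep], mu[keep]

    def sse(L):
        return float(np.sum((mu - np.exp(-rho / L)) ** 2))

    L, lam = L0, 1e-3
    for _ in range(n_iter):
        model = np.exp(-rho / L)
        J = model * rho / L**2               # derivative of the model w.r.t. L
        r = mu - model                       # residuals
        step = np.dot(J, r) / ((1.0 + lam) * np.dot(J, J))
        if L + step > 0 and sse(L + step) < sse(L):
            L, lam = L + step, lam / 10.0    # accept: behave more like Gauss-Newton
        else:
            lam *= 10.0                      # reject: take a shorter, damped step
    return L
```

If the tail values beyond the cutoff are distorted, cutting them off lets the fit recover the scale that describes the short and medium distances, which is the motivation given above.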
The same procedure was applied to obtain the dependency of the normalized autocorrelation function on both distance and depth difference. The results are displayed in Fig. 6. To keep the exponential approximation (see Equation (A11)) and provide the best fit for the central part of the plot, the tail of the SYKE-based function was cut off at a distance of 900 km (see Fig. 6a). For this exponential function, the influence radius is $L_{H} = 1100$ km for distance and $L_{V} = 20$ m for depth difference. For the MODIS-based function, the tails were cut off at a distance of 800 km and at a depth difference of 40 m (see Fig. 6b). The influence radii of the corresponding exponential approximation are $L_{H} = 740$ km and $L_{V} = 50$ m.
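For the two-dimensional case, a quick way to estimate both scales is sketched below. Two assumptions apply: Equation (A11) is taken to have the separable form $\mu = \exp(-\rho/L_H - \Delta h/L_V)$, and the fit is done by ordinary linear least squares on $-\ln\mu$. That log-linear fit is a substitute technique, not the Marquardt algorithm used in the study, and it weights small $\mu$ values differently.

```python
import numpy as np

def fit_separable_scales(rho_km, dh_m, mu, rho_max=np.inf, dh_max=np.inf):
    """Estimate L_H (km) and L_V (m) assuming mu = exp(-rho/L_H - dh/L_V).

    -ln(mu) = rho / L_H + dh / L_V is linear in 1/L_H and 1/L_V, so it is
    solved by least squares without an intercept. Tails beyond rho_max and
    dh_max and non-positive mu values are excluded.
    """
    rho = np.asarray(rho_km, dtype=float).ravel()
    dh = np.asarray(dh_m, dtype=float).ravel()
    m = np.asarray(mu, dtype=float).ravel()
    keep = (rho <= rho_max) & (dh <= dh_max) & (m > 0)
    A = np.column_stack([rho[keep], dh[keep]])
    y = -np.log(m[keep])
    (inv_LH, inv_LV), *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 / inv_LH, 1.0 / inv_LV
```

On real, noisy category values the two estimates would differ somewhat from a Marquardt fit; the sketch only shows how the cut-off tails enter the estimation.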
For comparison, the MODIS-based normalized autocorrelation function including the tails is shown in Fig. 6c. In this case, the influence radii of the exponential approximation for distance and depth difference are $L_{H} = 1100$ km and $L_{V} = 140$ m. However, in the central part of the plot the approximation errors are very large and the fit is quite poor. The estimated parameter values, especially the depth scale, seem to lose their physical relevance, falling outside the range of the input data. Such a situation, already discussed by Gandin (1965), can arise when the observational data are too inhomogeneous (e.g. between different parts of the domain, or with respect to other characteristics).
Table 2 also shows the confidence interval and the confidence index $C_{i}$ of the calculated autocorrelation (see Section A.4 for the definitions) in each distance category of the SYKE and MODIS observations. The confidence index values are everywhere $\gg 3$, showing that the autocorrelation function values are reliable and statistically significant at the chosen significance level (0.05). The confidence interval widens and the confidence index decreases with increasing distance because of the decreasing number of lake pairs available for the calculation. The two-dimensional autocorrelation function values are also reliable (not shown).
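The definitions of Section A.4 are not reproduced here, so the following sketch uses a standard substitute: a Fisher $z$-transform confidence interval for a correlation coefficient estimated from $n$ pairs. It shows why the interval widens as fewer pairs are available; it does not compute the confidence index $C_i$. Because LSWT deviations are serially correlated in time, the effective number of independent pairs is smaller than the raw count, which this simple formula ignores.

```python
import math
from statistics import NormalDist

def correlation_ci(r, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for a correlation r from n
    independent pairs, via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    zc = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z - zc * se), math.tanh(z + zc * se)
```

For the same correlation, an interval from a few hundred pairs is much wider than one from tens of thousands, mirroring the behaviour of Table 2 at large distances.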
The HIRLAM NWP system, version 7.4 (www.hirlam.org), was used to study the sensitivity of the LSWT analysis to modifications of the autocorrelation function. We limit our study to validating the objective analysis against independent observations. Hence, we run only short (+6 h) HIRLAM forecasts to provide the background for the next analysis-forecast cycle. For the meteorological impact of the initial LSWT analysis on the forecast, we refer to the detailed discussion in a case study over the freezing Lake Ladoga by Eerola et al. (2014).
The model domain, resolution and basic set-up of the system in our experiments were the same as in Kheyrollah Pour et al. (2014b). The surface data assimilation methods and the parametrisations of atmospheric and surface processes relevant for this study were described by Eerola et al. (2014). In our experiments, only the surface data assimilation was applied, while the upper-air analysis was replaced by fields interpolated from the ECMWF analyses. Importantly, the lake model FLake, which is fully integrated in HIRLAM and provides the background for the LSWT analysis (Rontu et al., 2012), was not used in our experiments. Instead, the LSWT analysis from six hours earlier was used as the background for the next analysis. This made it possible to see the influence of observations on the analysis (and forecast) more clearly, since FLake may dominate the result via the background LSWT. However, relaxation towards climatology was necessary to prevent the analysis from drifting with time when no influencing observations are available.^{2}
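The background construction described above can be sketched as a simple nudging of the previous analysis towards climatology. The linear form and the relaxation coefficient of 0.05 per six-hour cycle are hypothetical; the actual HIRLAM relaxation scheme is not specified in this section.

```python
def next_background(prev_analysis, climatology, relax_coeff=0.05):
    """Background for the next 6-hourly cycle: the previous LSWT analysis
    nudged towards climatology by a hypothetical fraction relax_coeff."""
    return prev_analysis + relax_coeff * (climatology - prev_analysis)
```

Without influencing observations the analysis equals the background, so repeated cycles converge geometrically to climatology instead of drifting.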
In our experiments, the observation error standard deviation in the LSWT analysis was kept at 1.5 °C, based on earlier work with MODIS observations by Kheyrollah Pour et al. (2014b) and on the results of this study (Section 3.3). The background error standard deviation of 1.0 °C was retained. Data quality control rejected LSWT values that deviated from the background by more than ca. 5 °C (see Equation (3) in Kheyrollah Pour et al., 2014b). Preliminary tests showed that this tolerance value worked satisfactorily for our data-set. The preliminary tests also revealed the importance of strictly separating lake and ocean observations, so that ocean observations would not affect the analysis over lakes. Note that most of the lakes in our study were small and shallow compared with the neighbouring ocean.
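A minimal sketch of the background check and the lake/ocean separation described above follows. Equation (3) of Kheyrollah Pour et al. (2014b) is not reproduced here, so a plain absolute-difference test against the 5 °C tolerance is assumed. The dictionary keys `surface` and `lswt` and the `background_at` callable are hypothetical.

```python
def quality_control(observations, background_at, tolerance_c=5.0):
    """Background check: keep lake observations within tolerance_c of the background.

    observations : list of dicts with hypothetical keys "surface" and "lswt"
    background_at: callable returning the background LSWT at an observation
    """
    accepted = []
    for obs in observations:
        if obs["surface"] != "lake":
            continue  # keep ocean observations out of the lake analysis
        if abs(obs["lswt"] - background_at(obs)) <= tolerance_c:
            accepted.append(obs)
    return accepted
```

With a background of 17 °C, a lake observation of 18 °C passes, one of 25 °C is rejected, and an ocean observation is never used over lakes.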
Both SYKE (daily) and MODIS (max. 4 overpasses per day) observations, shown on the map in Fig. 1, were used in our experiments. The MODIS observation times do not coincide with the HIRLAM analysis times; thus, all available MODIS observations within a ±2-hour window around the analysis time were used. For independent validation, 10 observations over 8 lakes (8 MODIS and 2 SYKE; Table 3) were treated as passive (not affecting the analysis).
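The observation selection for one analysis time can be sketched as below: observations within ±2 hours are kept, and those belonging to the validation set are routed to a passive list. The record layout (keys `id` and `time`) and the `passive_ids` set are hypothetical stand-ins for the stations listed in Table 3.

```python
from datetime import datetime, timedelta

def select_for_analysis(observations, analysis_time, passive_ids,
                        half_window=timedelta(hours=2)):
    """Split observations within +-half_window of analysis_time into
    active ones (used in the analysis) and passive ones (validation only)."""
    active, passive = [], []
    for obs in observations:
        if abs(obs["time"] - analysis_time) > half_window:
            continue  # outside the time window: not used at this analysis time
        (passive if obs["id"] in passive_ids else active).append(obs)
    return active, passive
```

Passive observations would still go through the quality control, as stated in the text, but would receive no weight in the analysis.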
Four HIRLAM experiments (LH80LVNO, LH800LVNO, LH80LV20 and LH800LV20) were defined (see Table 4). In the first two experiments, LH80LVNO and LH800LVNO, the autocorrelation function depended only on distance (Equation (A10)), with length scales of $L_{H} = 80$ km and $L_{H} = 800$ km, respectively. The value $L_{H} = 80$ km has historically been used in the reference HIRLAM system for both the SST and LSWT analyses. It is a tuning value in the SST analysis, chosen to give larger weight to nearby observations than to (possibly abundant) distant ones. The same value is currently applied for lakes but without any justification. The tenfold value $L_{H} = 800$ km was chosen to correspond to the suggestions of this study (Section 3.3). In the last two experiments, LH80LV20 and LH800LV20, the depth difference between observations, or between an observation and a grid point, was also taken into account using the depth scale $L_{V} = 20$ m (Equation (A11)).
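To see how the four set-ups change the influence of a distant observation, the sketch below evaluates the OI weight of a single observation 300 km away from a grid point with a 30 m depth difference. The experiment names and scales, as well as the error standard deviations of 1.5 °C and 1.0 °C, are taken from the text. Equations (A10) and (A11) are not reproduced here, so the forms $\exp(-\rho/L_H)$ and $\exp(-\rho/L_H)\exp(-|\Delta h|/L_V)$ are assumptions, and the 300 km and 30 m values are illustrative.

```python
import math

# (L_H in km, L_V in m or None) for the four experiments described in the text.
EXPERIMENTS = {
    "LH80LVNO": (80.0, None),
    "LH800LVNO": (800.0, None),
    "LH80LV20": (80.0, 20.0),
    "LH800LV20": (800.0, 20.0),
}

def correlation(rho_km, dh_m, L_H, L_V=None):
    """Assumed forms: exp(-rho/L_H), optionally times exp(-|dh|/L_V)."""
    mu = math.exp(-rho_km / L_H)
    if L_V is not None:
        mu *= math.exp(-abs(dh_m) / L_V)
    return mu

def single_obs_weight(mu, sigma_o=1.5, sigma_b=1.0):
    """OI weight of a lone observation: mu * sb^2 / (sb^2 + so^2)."""
    return mu * sigma_b**2 / (sigma_b**2 + sigma_o**2)

for name, (L_H, L_V) in EXPERIMENTS.items():
    w = single_obs_weight(correlation(300.0, 30.0, L_H, L_V))
    print(f"{name}: weight at 300 km, 30 m depth difference = {w:.4f}")
```

With $L_H = 80$ km the distant observation is practically ignored, which matches the Lake Valday behaviour discussed below. Adding $L_V$ then damps observations from lakes of very different depth.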
Each of the four experiments covered four months, from 1 May to 31 August 2011. The first background values for 1 May were provided by earlier experiments in which FLake was used (Kheyrollah Pour et al., 2014b). We focus on the period when most of the lakes were free of ice and discuss the results for 10 May to 31 August 2011.
Three examples of the response of the analysed LSWT values to changes in the autocorrelation function are discussed. In all cases, the independent (passive) observations were compared with the objective analysis. The passive observations pass through the quality control but do not affect the analysis. The locations of the passive observations on which we focus here are shown in red in Fig. 1.
The first example is from Lake Valday, located in Russia (58.0 °N, 33.3 °E). Its mean depth is 14 m and its surface area 97.2 km$^{2}$. Here, independent MODIS observations were used for validation. In the experiments LH80LVNO and LH80LV20 with the 80 km length scale (see Fig. 7a and c), the analysis was only marginally influenced by the observations, because the lake is located far from other observation sites. Therefore, the analysed LSWT was almost totally controlled by the background, which was relaxed towards climatology. In the experiments LH800LVNO and LH800LV20 with the 800 km length scale (see Fig. 7b and d), the analysis followed the observations more closely, because now even distant observations influenced the analysis. In spring (May and early June), experiment LH800LV20, in which the depth difference was taken into account in addition to the distance, also showed improvements compared with LH800LVNO (Fig. 7b and d). This is because less weight was given to observations representing deep-lake conditions, here especially the deep parts of Lake Ladoga, which warm up slowly in spring.
The second example is Lake Rehja-Nuasjärvi, situated in north-eastern Finland (64.2$^{\circ}$N, 28.0$^{\circ}$E, Fig. 1). This is a shallow, medium-sized lake (96.4 km$^{2}$, mean depth ca. 10 m). At this location, both SYKE and MODIS observations were available and were treated as passive. We compare the analysis against the SYKE observations because they are very consistent in time.
There were many LSWT observations within 300 km of the lake. The experiments LH80LVNO and LH80LV20 with the 80 km horizontal length scale (Fig. 8a and c) fitted the independent observations better than the experiments LH800LVNO and LH800LV20 with the 800 km horizontal length scale (Fig. 8b and d), especially in spring. With the large length scale, the analysis agreed well with the observations after midsummer, but in spring and early summer it was several degrees too cold, owing to the colder observations from the north and east. The depth difference did not play a significant role here: Lake Rehja-Nuasjärvi has approximately the same mean depth as the neighbouring lakes (around 10 m), so accounting for the depth difference did not improve the analysis.
The third example is a MODIS observation point in the central part of Lake Ladoga (Fig. 1). Here, the independent MODIS observations from this pixel were used for validation. The depth at this point was 39 m, and the nearest observation point represented slightly deeper conditions (40–50 m). Over Lake Ladoga, a dense network of 15 MODIS pixels provided LSWT measurements in clear-sky conditions. Therefore, the experiments LH80LVNO and LH80LV20 with the 80 km length scale produced a reasonable analysis (Fig. 9a and c). In this case, including the depth difference in the autocorrelation function did not change the result significantly, because the neighbouring observations represented similar depths.
However, the experiments with the large length scale revealed an interesting feature. In experiment LH800LVNO, which used an 800 km length scale and ignored the depth difference, the analysed LSWT was too warm in late May and early June (Fig. 9b). This was because in this experiment the measurements from the shallow, warm lakes in Finland affected the analysis of the cool, slowly warming Lake Ladoga. Around 10 June, when the positive analysis error was largest, very warm temperatures were observed in the Finnish lakes; see for instance the observations at Lake Rehja-Nuasjärvi (Fig. 8). Taking the depth difference into account in experiment LH800LV20 improved the analysis by several degrees (Fig. 9d).
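The Lake Ladoga effect can be illustrated with a toy optimal interpolation (OI) at a single grid point, using the OI weight equations of the appendix. This is a minimal sketch under stated assumptions: the Gaussian-type correlation form, the positions, depths and increments of the two observations, and the value of the normalized observation error variance eta are all hypothetical, chosen only to mimic a deep grid point with one nearby deep-lake pixel and one distant, warm shallow lake.

```python
import numpy as np

def corr(rho_km, ddepth_m, lh_km, lv_m=None):
    """Assumed correlation exp(-(rho/LH)^2), times exp(-(delta/LV)^2) if LV is given."""
    mu = np.exp(-(rho_km / lh_km) ** 2)
    if lv_m is not None:
        mu = mu * np.exp(-(ddepth_m / lv_m) ** 2)
    return mu

def oi_increment(grid, obs, increments, eta, lh_km, lv_m=None):
    """OI analysis increment at one grid point from K observations.

    grid: (x_km, y_km, depth_m); obs: K rows of (x_km, y_km, depth_m).
    Solves sum_j mu(r_k, r_j) W_j + eta W_k = mu(r_i, r_k) for the weights W_k.
    """
    obs = np.asarray(obs, dtype=float)
    xy, depth = obs[:, :2], obs[:, 2]
    # Observation-observation distances and depth differences
    rho_oo = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    dd_oo = np.abs(depth[:, None] - depth[None, :])
    # Grid-observation distances and depth differences
    rho_go = np.linalg.norm(xy - np.asarray(grid[:2], dtype=float), axis=-1)
    dd_go = np.abs(depth - grid[2])
    lhs = corr(rho_oo, dd_oo, lh_km, lv_m) + eta * np.eye(len(obs))
    weights = np.linalg.solve(lhs, corr(rho_go, dd_go, lh_km, lv_m))
    return float(weights @ np.asarray(increments, dtype=float)), weights

# Hypothetical set-up: deep grid point (39 m), nearby deep pixel with zero increment,
# distant shallow (10 m) warm lake with a +4 degC increment
grid = (0.0, 0.0, 39.0)
obs = [(20.0, 0.0, 45.0), (300.0, 0.0, 10.0)]
increments = [0.0, 4.0]
inc_no_depth, _ = oi_increment(grid, obs, increments, eta=0.3, lh_km=800.0)
inc_depth, _ = oi_increment(grid, obs, increments, eta=0.3, lh_km=800.0, lv_m=20.0)
print(round(inc_no_depth, 2), round(inc_depth, 2))
```

With these made-up numbers, the warm shallow-lake observation pushes the deep grid point up by roughly 1.1 °C when only distance is used, but by only about 0.2 °C once the depth term down-weights it, which is qualitatively the behaviour seen between LH800LVNO and LH800LV20.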
The following conclusions can be drawn from the HIRLAM experiments:
In this study, we investigated for the first time an approach to calculating the LSWT autocorrelation function from in situ and satellite-based observations. The LSWT observation data-sets used for this purpose contained MODIS data from 44 Fennoscandian lakes (71 pixels) and SYKE in situ data from 27 Finnish lakes for five summers (JJA), 2010–2014. Only summertime observations were selected in order to ensure reliable and homogeneous data for the statistical estimates. Both sets of observations followed a Gaussian distribution with sufficient accuracy. SYKE daily measurements were available with high regularity (98.6% of all possible observations during the five years were used for the calculations, $N = 12227$). MODIS observations were preprocessed to represent the daily average LSWT at each location. For MODIS, only 63.4% ($N = 20694$) of the daily values were available for the selected pixels because of cloudy conditions.
The normalized autocorrelation functions depending on distance and depth difference were obtained, and analytic approximations to them were proposed. The MODIS observations suggested a horizontal length scale of $L_H = 740\,\mathrm{km}$ and a depth-difference scale $L_V$ of ca. 50 m. The corresponding estimates based on SYKE observations were $L_H = 1100\,\mathrm{km}$ and $L_V = 20\,\mathrm{m}$. The empirical autocorrelation function values were shown to be statistically significant. Observation error standard deviations of 0.9$^{\circ}$C for SYKE and 1.2$^{\circ}$C for MODIS data were obtained by extrapolating the corresponding structure functions to zero separation.
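The extrapolation step can be sketched as follows. If observation errors are uncorrelated between sites and share a variance $\sigma^2$, the structure function of the observed values exceeds that of the true field by $2\sigma^2$ at any non-zero separation, while the true structure function vanishes at zero separation; extrapolating the empirical curve to $\rho = 0$ therefore gives $2\sigma^2$. The quadratic-in-distance fit over the shortest bins and the helper name `obs_error_std` are assumptions for illustration, and the binned values are synthetic, not data from this study.

```python
import numpy as np

def obs_error_std(rho_km, b_obs, max_rho_km=300.0):
    """Estimate the observation error std by extrapolating b(rho) to rho = 0.

    Assumes uncorrelated observation errors with common variance sigma^2, so that
    b_obs(rho) = b_true(rho) + 2 sigma^2 for rho > 0 and b_true(0) = 0.
    The short-distance part is fitted as a + c * rho^2 (assumed shape).
    """
    rho_km = np.asarray(rho_km, dtype=float)
    b_obs = np.asarray(b_obs, dtype=float)
    near = rho_km <= max_rho_km
    slope, intercept = np.polyfit(rho_km[near] ** 2, b_obs[near], 1)
    return float(np.sqrt(intercept / 2.0))  # intercept approximates 2 sigma^2

# Synthetic bins (hypothetical): field variance 4 degC^2, L = 740 km, sigma = 1.2 degC
rho = np.arange(25.0, 1000.0, 50.0)
b_true = 2 * 4.0 * (1 - np.exp(-(rho / 740.0) ** 2))
b_obs = b_true + 2 * 1.2 ** 2
sigma_hat = obs_error_std(rho, b_obs)
print(round(sigma_hat, 2))
```

The fit recovers a value close to the prescribed 1.2 °C; with real, noisy bins the result depends on how many short-distance categories are available and on the assumed shape near the origin.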
Both the autocorrelation function formulations and the observation error estimates are directly applicable to the optimal interpolation of LSWT in the surface data assimilation of NWP models. The length scales of the LSWT autocorrelation function suggested by the observation statistics differ from those applied in the present reference HIRLAM in two respects. First, the horizontal length scale is an order of magnitude larger than the current value. Second, a vertical (depth) scale has not been applied before. To introduce it, the one-dimensional analytic exponential function, which depends only on distance, should be replaced with a two-dimensional function depending on both distance and depth difference.
Three-dimensional HIRLAM experiments were performed for the summer (May–August) of 2011 to test the sensitivity of the LSWT analysis to the formulation of the autocorrelation function. All available data, both SYKE and MODIS observations, were used as input. To reveal the impact of the new formulation of the LSWT structure function on the analysis more clearly, the background (first guess) was taken from the previous analysis rather than from the FLake lake model forecast (which is used by default in HIRLAM). The resulting analyses were compared with independent observations that were withheld from the objective analysis. Using a ten-fold value for the influence radius, i.e. $L_H = 800\,\mathrm{km}$, allowed more observations to be used and produced a meaningful analysis for remote lakes as well. Including the depth dependency through the depth scale $L_V = 20\,\mathrm{m}$ improved the analysis in cases where shallow-lake observations could lead to wrong LSWT values over deep waters (as for Lake Ladoga), or vice versa. However, in addition to the neighbouring observations, there may also be distant observations representing very different conditions; if such observations prevail, using a large length scale may degrade the result. When there were sufficiently many observations, the values of the length and depth scales played a minor role.
Several practical issues were noted when performing the HIRLAM experiments. It is important to keep ocean and lake observations well separated in the analysis to avoid spurious analysis results. Because of the variability and uncertainty of satellite observations, it is advisable to combine them with available in situ instrumental LSWT measurements, such as those from SYKE. Making such local observations available for operational NWP data assimilation, together with real-time remote sensing data, is a demanding but necessary practical task.
The LSWT statistics obtained in this study, and the resulting conclusions, are based on summertime observational data over a limited Northern European domain where lake depth variations are moderate. The results are valid for the summer season and for such a uniform environment; anything that breaks these uniform conditions may affect the conclusions. In spring, during and after ice break-up, even nearby observations may represent quite different conditions. To a lesser extent, the same happens in early winter when lakes freeze. We suggest studying the autocorrelation function for other seasons (spring, autumn) in future work.
LSWT may have quite a large diurnal cycle (Woolway et al., 2015), which should be addressed in NWP. A prognostic lake parameterization scheme is usually able to reproduce it (see e.g. Mironov, 2008). However, it has not yet been considered in LSWT data assimilation. This aspect deserves a dedicated study, which will require observations with high temporal resolution.
Lakes in areas of complex orography would be another example of a non-uniform environment, where accounting for depth differences may not be sufficient to obtain a reasonable objective analysis of LSWT. Another parameter that could influence the LSWT evolution is the light attenuation coefficient of lake water, as is known both from observations (Williamson et al., 2015) and from modelling results (Kourzeneva et al., 2012b; Kirillin, 2010). Satellite methods for measuring the transparency of inland water are being developed (Potes et al., 2011), and these observations might be used in NWP in the future. Finally, for full data assimilation over lakes in NWP, an algorithm that updates the lake model prognostic variables (such as the Extended Kalman Filter method described in Kourzeneva (2014)) should be combined with the LSWT objective analysis.
No potential conflict of interest was reported by the authors.
1 In NWP, when a prognostic parameterization of the underlying surface processes is applied, the data assimilation algorithm usually consists of two parts. First, the information from the observations is spread 'horizontally' (along the underlying surface); the OI method is usually applied for this. Then, the information is propagated 'vertically', i.e. into the prognostic parameterization model space; different advanced techniques exist for this, such as the Extended Kalman Filter.
2 The LSWT climatology used in HIRLAM is obtained by extrapolation from the ocean SST (Bringfelt et al., 1995; Alexander and Mobley, 1974). It is therefore not reliable for shallow and small lakes. Alternatives, such as the ARC-Lake data-set (MacCallum and Merchant, 2011), exist but have not been tried in HIRLAM.
In situ lake water temperature measurements for Finnish lakes were provided by the Finnish Environment Institute (SYKE). The support of the international HIRLAM programme is acknowledged.
Alexander, R. C. and Mobley, R. L. 1974. Monthly average sea-surface temperatures and ice-pack limits for 1 deg global grid. RAND Report R-1310-ARPA.
Bergthórsson, P. and Döös, B. R. 1955. Numerical weather map analysis. Tellus 7, 329–340. DOI: https://doi.org/10.1111/j.2153-3490.1955.tb01170.x
Bringfelt, B., Gustafsson, N., Vilmusenaho, P. and Järvenoja, S. 1995. Updating of the HIRLAM physiography and climate data base. Technical Report 19, HIRLAM. Online at: http://hirlam.org/
Brown, L. C. and Duguay, C. R. 2010. The response and role of ice cover in lake-climate interactions. Prog. Phys. Geogr. 34, 671–704.
Chelton, D. B. and Wentz, F. J. 2005. Global microwave satellite observations of sea surface temperature for numerical weather prediction and climate research. Bull. Amer. Meteor. Soc. 86, 1097–1115.
Choulga, M., Kourzeneva, E., Zakharova, E. and Doganovsky, A. 2014. Estimation of the mean depth of boreal lakes for use in numerical weather prediction and climate modelling. Tellus A 66, 21295. DOI: https://doi.org/10.3402/tellusa.v66.21295
Cressman, G. P. 1959. An operational objective analysis system. Mon. Wea. Rev. 87, 367–374.
Daley, R. 1991. Atmospheric Data Analysis. Cambridge Atmospheric and Space Science Series. Cambridge University Press, New York, NY.
Donlon, C. J., Martin, M., Stark, J., Roberts-Jones, J., Fiedler, E. and co-authors. 2012. The operational sea surface temperature and sea ice analysis (OSTIA) system. Remote Sens. Env. 116, 140–158.
Duguay, C. R., Prowse, T. D., Bonsal, B. R., Brown, R. D., Lacroix, M. P. and co-authors. 2006. Recent trends in Canadian lake ice cover. Hydrol. Process. 20, 781–801.
Eerola, K. 1995. Experiences with the analysis of sea surface temperature, ice coverage and snow depth. HIRLAM 3 Workshop on Soil Processes and Soil/Surface Data Assimilation. Spanish Meteorological Institute, 33–37. Online at: http://hirlam.org/
Eerola, K. 2013. Twenty-one years of verification from the HIRLAM NWP system. Wea. Forecast. 28, 270–285. DOI: https://doi.org/10.1175/WAF-D-12-00068.1
Eerola, K., Rontu, L., Kourzeneva, E., Kheyrollah Pour, H. and Duguay, C. 2014. Impact of partly ice-free Lake Ladoga on temperature and cloudiness in an anticyclonic winter situation – a case study using a limited area model. Tellus A 66, 23929. DOI: https://doi.org/10.3402/tellusa.v66.23929
Eerola, K., Rontu, L., Kourzeneva, E. and Shcherbak, E. 2010. A study on effects of lake temperature and ice cover in HIRLAM. Boreal Env. Res. 15, 130–142.
Eliassen, A. 1954. Provisional report on calculation of spatial covariance and autocorrelation of the pressure field. Inst. Weather Clim. Res., Acad. Sci., Oslo, Tech. Rep., p. 5.
Fiedler, E. K., Martin, M. J. and Roberts-Jones, J. 2014. An operational analysis of lake surface water temperature. Tellus A 66, 21247. DOI: https://doi.org/10.3402/tellusa.v66.21247
Gandin, L. 1965. Objective Analysis of Meteorological Fields. Gidrometizdat, Leningrad. Translated from Russian. Israel Program for Scientific Translations, Jerusalem.
Gilchrist, B. and Cressman, G. P. 1954. An experiment in objective analysis. Tellus 6, 309–318.
Hollingsworth, A. and Lönnberg, P. 1986. The statistical structure of short-range forecast errors as determined from radiosonde data. Part I: The wind field. Tellus A 38, 111–136.
Julian, P. R. and Thiebaux, H. J. 1975. On some properties of correlation functions used in optimum interpolation schemes. Mon. Wea. Rev. 103, 605–616.
Kheyrollah Pour, H., Duguay, C., Martynov, A. and Brown, L. C. 2012. Simulation of surface temperature and ice cover of large northern lakes with 1-D models: a comparison with MODIS satellite data and in situ measurements. Tellus A 64, 17614.
Kheyrollah Pour, H., Duguay, C. R., Solberg, R. and Rudjord, Ø. 2014a. Impact of satellite-based lake surface observations on the initial state of HIRLAM. Part I: Evaluation of MODIS/AATSR lake surface water temperature observations. Tellus A 66, 21534. DOI: https://doi.org/10.3402/tellusa.v66.21534
Kheyrollah Pour, H., Rontu, L., Duguay, C. R., Eerola, K. and Kourzeneva, E. 2014b. Impact of satellite-based lake surface observations on the initial state of HIRLAM. Part II: Analysis of lake surface temperature and ice cover. Tellus A 66, 21395. DOI: https://doi.org/10.3402/tellusa.v66.21395
Kirillin, G. 2010. Modeling the impact of global warming on water temperature and seasonal mixing regimes in small temperate lakes. Boreal Env. Res. 15, 279–293.
Kourzeneva, E. 2014. Assimilation of lake water surface temperature observations with Extended Kalman filter. Tellus A 66, 21510. DOI: https://doi.org/10.3402/tellusa.v66.21510
Kourzeneva, E., Asensio, H., Martin, E. and Faroux, S. 2012a. Global gridded dataset of lake coverage and lake depth for use in numerical weather prediction and climate modelling. Tellus A 64, 15640. DOI: https://doi.org/10.3402/tellusa.v64i0.15640
Kourzeneva, E., Martin, E., Batrak, Y. and Le Moigne, P. 2012b. Climate data for parameterisation of lakes in numerical weather prediction models. Tellus A 64, 17226. DOI: https://doi.org/10.3402/tellusa.v64i0.17226
Kourzeneva, E., Samuelsson, P., Ganbat, G. and Mironov, D. 2008. Implementation of lake model FLake into HIRLAM. HIRLAM Newsletter 54, 54–64. Online at: http://hirlam.org/
Krinner, G. and Boike, J. 2010. A study of the large-scale climatic effects of a possible disappearance of high-latitude inland water surfaces during the 21st century. Boreal Env. Res. 15, 203–217.
Lönnberg, P. and Hollingsworth, A. 1986. The statistical structure of short-range forecast errors as determined from radiosonde data. Part II: The covariance of height and wind errors. Tellus A 38, 137–161.
Lorenc, A. 1981. A global three-dimensional multivariate statistical interpolation scheme. Mon. Wea. Rev. 109, 701–721.
Loveland, T. R., Reed, B. C., Brown, J. F., Ohlen, D. O., Zhu, J. and co-authors. 2000. Development of a global land cover characteristics database and IGBP DISCover from 1-km AVHRR data. Int. J. Rem. Sens. 21, 1303–1330. Online at: http://edc2.usgs.gov/glcc/glcc.php
MacCallum, S. and Merchant, C. 2011. ARC-Lake v2.0, University of Edinburgh, School of GeoSciences / European Space Agency. Dataset 1991–2011. Online at: http://hdl.handle.net/10283/88
Magnuson, J. J., Benson, B. J. and Kratz, T. K. 1990. Temporal coherence in the limnology of a suite of lakes in Wisconsin, U.S.A. Freshwater Biol. 23, 145–159. DOI: https://doi.org/10.1111/j.1365-2427.1990.tb00259.x
Marquardt, D. 1963. An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431–441.
Mironov, D. 2008. Parameterization of lakes in numerical weather prediction. Description of a lake model. COSMO Technical Report No. 11. Deutscher Wetterdienst, Offenbach am Main, Germany, 41 pp.
Mironov, D., Heise, E., Kourzeneva, E., Ritter, B., Schneider, N. and co-authors. 2010. Implementation of the lake parameterisation scheme FLake into the numerical weather prediction model COSMO. Boreal Env. Res. 15, 218–230.
Ngai, K. L. C., Shuter, B. J., Jackson, D. A. and Chandra, S. 2013. Projecting impacts of climate change on surface water temperatures of a large subalpine lake: Lake Tahoe, U.S.A. Clim. Change 118, 841–855.
Niziol, T. A. 1987. Operational forecasting of lake effect snowfall in western and central New York. Wea. Forecasting 2, 310–321.
Niziol, T. A., Snyder, W. R. and Waldstreicher, J. S. 1995. Winter weather forecasting throughout the Eastern United States. Part IV: Lake effect snow. Wea. Forecasting 10, 61–77. DOI: https://doi.org/10.1175/1520-0434(1995)010<0061:WWFTTE>2.0.CO;2
Osipov, A. 2002. Econometrics: Educational-methodical complex for distance learning. Siberian Institute of State Service, Novosibirsk, pp. 1–173.
Panofsky, R. 1949. Objective weather-map analysis. J. Meteor. 6, 386–392.
Potes, M., Costa, M. J., da Silva, J. C. B., Silva, A. M. and Morais, M. 2011. Remote sensing of water quality parameters over Alqueva Reservoir in the south of Portugal. Int. J. Rem. Sens. 32(12), 3373–3388. DOI: https://doi.org/10.1080/01431161003747513
Rodriguez, E., Navascues, B. and Ayuso, J. 2001. The tiling surface scheme for HIRLAM5: features and latest results. Proceedings of the SRNWP/HIRLAM Workshop on Surface Processes, Turbulence and Mountain Effects, INM, Madrid, pp. 55–63.
Rontu, L., Eerola, K., Kourzeneva, E. and Vehviläinen, B. 2012. Data assimilation and parametrisation of lakes in HIRLAM. Tellus A 64, 17611. DOI: https://doi.org/10.3402/tellusa.v64i0.17611
Samuelsson, P., Kourzeneva, E. and Mironov, D. 2010. The impact of lakes on the European climate as simulated by a regional climate model. Boreal Env. Res. 15, 113–120.
Sattler, K. and Huang, X.-Y. 2002. Structure function characteristics for 2 metre temperature and relative humidity in different horizontal resolutions. Tellus A 54, 14–33.
Thiebaux, H. 1975. Experiments with correlation representation for objective analysis. Mon. Wea. Rev. 103, 617–627.
Toffolon, M., Piccolroaz, S., Majone, B., Soja, A.-M., Peeters, F. and co-authors. 2014. Prediction of surface temperature in lakes with different morphology using air temperature. Limnol. Oceanogr. 59, 2185–2202. DOI: https://doi.org/10.4319/lo.2014.59.6.2185
Unden, P., Rontu, L., Järvinen, H., Lynch, P., Calvo, J. and co-authors. 2002. HIRLAM-5 scientific documentation. HIRLAM-5 Project, c/o Per Unden, SMHI, S-601 76 Norrköping, Sweden. Online at: http://hirlam.org
Walsh, S. E., Vavrus, S. J., Foley, J. A., Fisher, V. A., Wynne, R. H. and co-authors. 1998. Global patterns of lake ice phenology and climate: Model simulations and observations. J. Geoph. Res. 103, 28825–28837.
Williamson, C. E., Overholt, E. P., Pilla, R. M., Leach, T. H., Brentrup, J. A. and co-authors. 2015. Ecological consequences of long-term browning in lakes. Sci. Rep. 5, 18666. DOI: https://doi.org/10.1038/srep18666
Woolway, R. I., Jones, I. D., Feuchtmayr, H. and Maberly, S. C. 2015. A comparison of the diel variability in epilimnetic temperature for five lakes in the English Lake District. Inland Waters 5, 139–154. DOI: https://doi.org/10.5268/IW-5.2.748
Zhao, L., Jin, J., Wang, S.-Y. and Ek, M. B. 2012. Integration of remote-sensing data with WRF to improve lake-effect precipitation simulations over the Great Lakes region. J. Geoph. Res. 117, D09102. DOI: https://doi.org/10.1029/2011JD016979
According to Gandin (1965), the analysed value at a grid-point can be expressed as follows (the vector arrow over the radius vector $r$ is omitted for simplicity):
where $f_A(r_i)$ is the analysed value of a variable $f$ (LSWT in our case) at the grid-point $r_i$, $f_B(r_i)$ is the background value of $f$ at that grid-point, and $f_O(r_k)$ and $f_B(r_k)$ are the observed and background values, respectively, at the observation point $r_k$. $K$ is the number of influencing observations, $I$ is the number of grid-points, and $W_k$ are the weights given to each observation increment $[f_O(r_k) - f_B(r_k)]$. The weights are defined by minimizing the mean square interpolation error, which leads to the following system of $K$ linear equations at each grid-point $r_i$, from which the weights can be found:
where $\mu(r_k, r_j)$ is the normalized autocorrelation function of the analysed variable between the observation points $r_k$ and $r_j$ (and, correspondingly, $\mu(r_i, r_k)$ between the grid-point $r_i$ and the observation point $r_k$), and $\eta$ is the normalized observation error variance:
Here, $\sigma^2$ is the observation error variance and $\overline{f'^2}$ is the variance of the analysed variable (LSWT) or the background error variance. The observation error is assumed to depend only on the observation type, not on the location of the observation.
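For reference, the OI relations described in this subsection can be written in the standard form of Gandin (1965), using only the symbols defined above. This is a sketch of the usual textbook formulation, given without the original equation numbers:

$$
\begin{aligned}
f_A(r_i) &= f_B(r_i) + \sum_{k=1}^{K} W_k \left[ f_O(r_k) - f_B(r_k) \right], \qquad i = 1, \dots, I, \\
\sum_{j=1}^{K} \mu(r_k, r_j)\, W_j + \eta\, W_k &= \mu(r_i, r_k), \qquad k = 1, \dots, K, \\
\eta &= \frac{\sigma^2}{\overline{f'^2}}.
\end{aligned}
$$

The term $\eta W_k$ on the diagonal is what distinguishes noisy observations from perfect ones: the larger the observation error relative to the field variance, the smaller the weights.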
The mathematical definitions of the structure and autocorrelation functions are given here following Gandin (1965). If $r_1$ and $r_2$ are two points in space where the variable is defined, the structure function $b(r_1, r_2)$ reads
where the overbar denotes a sample average (the time and category average in our case), and the deviations of $f$ from its mean value $\overline{f}$ are defined as
The autocorrelation function is defined as the mean product of the deviations from the mean of the same variable at two points:
There is a simple relationship between the structure and autocorrelation functions:
If the field is homogeneous and isotropic, the structure and autocorrelation functions between two points do not depend on the locations of $r_1$ and $r_2$ but only on the distance $\rho$ between them.
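The sample definitions above can be checked numerically for a single pair of points. The sketch below uses two synthetic daily LSWT series (hypothetical data sharing a common signal) and computes the structure function, the autocorrelation function and the normalized autocorrelation, taking deviations from each point's own time mean; the study instead averages over time and distance categories, so this is a simplified assumption. The structure function equals $m(r_1,r_1) + m(r_2,r_2) - 2m(r_1,r_2)$ exactly for the sample.

```python
import numpy as np

def pair_statistics(f1, f2):
    """Sample structure function b, autocorrelations m11, m22, m12 and normalized mu.

    Deviations are taken from each series' own time mean (simplifying assumption).
    Normalizing by sqrt(m11 * m22) equals normalizing by m(0) for a homogeneous field.
    """
    d1 = np.asarray(f1, dtype=float) - np.mean(f1)
    d2 = np.asarray(f2, dtype=float) - np.mean(f2)
    b = np.mean((d1 - d2) ** 2)          # structure function b(r1, r2)
    m11, m22 = np.mean(d1 ** 2), np.mean(d2 ** 2)
    m12 = np.mean(d1 * d2)               # autocorrelation function m(r1, r2)
    mu = m12 / np.sqrt(m11 * m22)        # normalized autocorrelation
    return b, m11, m22, m12, mu

rng = np.random.default_rng(0)
common = rng.normal(0.0, 2.0, 92)        # shared large-scale signal, one summer (synthetic)
f1 = 15.0 + common + rng.normal(0.0, 1.0, 92)
f2 = 14.0 + common + rng.normal(0.0, 1.0, 92)
b, m11, m22, m12, mu = pair_statistics(f1, f2)
print(np.isclose(b, m11 + m22 - 2 * m12), round(mu, 2))
```

With a shared signal of variance 4 and independent local noise of variance 1 at each point, the sample $\mu$ should lie near $4/5$.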
The normalized autocorrelation function is defined as:
or
Note that $m(0)$, used for normalization in Equation (A8), equals the variance $\overline{f'^2}$, because it is the autocorrelation of the variable with itself at the same location.
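Collected together, the definitions of this subsection read as follows. This is the standard textbook form (Gandin, 1965; Daley, 1991) rather than a verbatim copy of the original equations; the last two lines apply to a homogeneous, isotropic field, where $m(r_1, r_1) = m(r_2, r_2) = m(0)$:

$$
\begin{aligned}
b(r_1, r_2) &= \overline{\left[ f'(r_1) - f'(r_2) \right]^2}, \qquad f' = f - \overline{f}, \\
m(r_1, r_2) &= \overline{f'(r_1)\, f'(r_2)}, \\
b(r_1, r_2) &= m(r_1, r_1) + m(r_2, r_2) - 2\, m(r_1, r_2), \\
b(\rho) &= 2 \left[ m(0) - m(\rho) \right], \\
\mu(\rho) &= \frac{m(\rho)}{m(0)} = 1 - \frac{b(\rho)}{2\, m(0)}.
\end{aligned}
$$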
Usually in OI for NWP, an analytical approximation of the autocorrelation function is used. Very often the autocorrelation function is approximated by a Gaussian function depending on the horizontal distance. For example, the present HIRLAM system uses the following analytical approximation for the autocorrelation function for SST depending on the horizontal distance $\rho$ (Rodriguez et al., 2001):
where ${L}_{H}$ is a horizontal length scale (influence radius).
Note that an important intrinsic property of the autocorrelation function is positive definiteness. Therefore, only a positive definite analytical function, not an arbitrary one, can be fitted to the autocorrelation function obtained from observations. The exponential function is positive definite, as proved e.g. by Gandin (1965). To avoid having to prove the positive definiteness of other possible analytical representations, we still approximate our empirical results with the exponential function but optimize the length scale (or scales).
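Positive definiteness can be checked numerically for any given set of points: the matrix of correlations $\mu(r_k, r_j)$ must have no negative eigenvalues, since a negative eigenvalue would imply a negative variance for some linear combination of the field values. The sketch compares an assumed Gaussian form $\exp(-(\rho/L)^2)$ with a hypothetical top-hat function ($\mu = 1$ within distance $L$, 0 beyond), which serves only as a counterexample; the length scale and point sets are made up.

```python
import numpy as np

L = 100.0  # hypothetical length scale, km

def gaussian(rho):
    """Assumed Gaussian-type correlation exp(-(rho/L)^2)."""
    return np.exp(-(rho / L) ** 2)

def top_hat(rho):
    """Counterexample: correlation 1 within distance L, 0 beyond."""
    return (rho < L).astype(float)

def correlation_matrix(points_km, func):
    """Matrix mu(r_k, r_j) for 2-D points and an isotropic correlation func(rho)."""
    p = np.asarray(points_km, dtype=float)
    rho = np.linalg.norm(p[:, None, :] - p[None, :, :], axis=-1)
    return func(rho)

rng = np.random.default_rng(1)
random_points = rng.uniform(0.0, 500.0, size=(50, 2))
min_eig_gauss = np.linalg.eigvalsh(correlation_matrix(random_points, gaussian)).min()

# Three collinear points: A-B and B-C closer than L, A-C farther than L
line_points = [(0.0, 0.0), (80.0, 0.0), (160.0, 0.0)]
min_eig_top_hat = np.linalg.eigvalsh(correlation_matrix(line_points, top_hat)).min()
print(min_eig_gauss >= -1e-10, round(min_eig_top_hat, 3))  # True -0.414
```

The top-hat matrix for the three collinear points is [[1, 1, 0], [1, 1, 1], [0, 1, 1]], whose smallest eigenvalue is $1 - \sqrt{2}$, so such a function cannot serve as an autocorrelation function even though each entry looks plausible.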
An exponential function has a natural physical interpretation, because its scales can be related to distances and differences. For the two-dimensional autocorrelation function, a two-dimensional exponential approximation may be suggested, for example:
where $\delta$ is the vertical distance (or difference in depth) and $L_V$ is the vertical length scale.
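A minimal sketch of fitting the two scales, assuming the separable Gaussian-type form $\mu(\rho, \delta) = \exp(-(\rho/L_H)^2 - (\delta/L_V)^2)$; this form and the helper names are assumptions for illustration. Taking $-\ln \mu$ makes the problem linear in $1/L_H^2$ and $1/L_V^2$, so ordinary least squares suffices here; this is a simpler substitute for an iterative nonlinear fit such as the Marquardt (1963) algorithm. The category values are synthetic, generated with $L_H = 740$ km and $L_V = 50$ m.

```python
import numpy as np

def mu_2d(rho_km, delta_m, lh_km, lv_m):
    """Assumed two-dimensional correlation exp(-(rho/LH)^2 - (delta/LV)^2)."""
    return np.exp(-(rho_km / lh_km) ** 2 - (delta_m / lv_m) ** 2)

def fit_scales(rho_km, delta_m, mu_emp):
    """Fit LH and LV by least squares on -ln(mu) = rho^2 / LH^2 + delta^2 / LV^2.

    Only positive correlations can be used because of the logarithm.
    """
    rho_km, delta_m, mu_emp = (np.asarray(a, dtype=float) for a in (rho_km, delta_m, mu_emp))
    ok = mu_emp > 0
    design = np.column_stack([rho_km[ok] ** 2, delta_m[ok] ** 2])
    coef, *_ = np.linalg.lstsq(design, -np.log(mu_emp[ok]), rcond=None)
    return 1.0 / np.sqrt(coef[0]), 1.0 / np.sqrt(coef[1])

# Synthetic distance/depth-difference categories (hypothetical values)
rho, delta = np.meshgrid(np.arange(50.0, 800.0, 100.0), np.arange(0.0, 60.0, 10.0))
mu_emp = mu_2d(rho, delta, 740.0, 50.0)
lh, lv = fit_scales(rho.ravel(), delta.ravel(), mu_emp.ravel())
print(round(lh), round(lv))  # 740 50
```

With noisy empirical values, the logarithm weights small correlations heavily, which is one reason to prefer a direct nonlinear fit of $\mu$ itself when the data allow it.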
For the evaluation of confidence, the root mean square error $\sigma_\mu$ of the calculated $\mu$ can be estimated from
where $N$ is the sample size, in our case the number of observations within each category over all times. This equation is an approximation (Osipov, 2002) based on the assumption of a Gaussian distribution of the variable whose autocorrelation is calculated. The confidence index $C_i = |\mu|/\sigma_\mu$ is used to estimate the statistical significance of the autocorrelations. At the significance level of 0.05, a value of $C_i > 3$ is required for a reliable result. In addition, the confidence interval of $\mu$ is calculated at the chosen significance level.
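The significance test can be sketched as below. The sketch assumes the common large-sample approximation $\sigma_\mu \approx (1 - \mu^2)/\sqrt{N}$ for a correlation estimated from $N$ Gaussian-distributed values; this is an assumption, and the approximation of Osipov (2002) used in the study may differ in detail. The confidence index and the threshold $C_i > 3$ follow the text, while the interval uses a normal approximation, $\mu \pm 1.96\,\sigma_\mu$ at the 0.05 level (also an assumption). The function name and the example category are hypothetical.

```python
import math

def correlation_confidence(mu, n, z=1.96, threshold=3.0):
    """Confidence index and approximate interval for a sample correlation mu.

    Assumes sigma_mu ~ (1 - mu^2) / sqrt(n) (large-sample, Gaussian variables).
    """
    sigma_mu = (1.0 - mu ** 2) / math.sqrt(n)
    c_i = abs(mu) / sigma_mu
    interval = (mu - z * sigma_mu, mu + z * sigma_mu)
    return c_i, c_i > threshold, interval

# Hypothetical category: mu = 0.5 estimated from N = 100 observations
c_i, reliable, (lo, hi) = correlation_confidence(0.5, 100)
print(round(c_i, 2), reliable, round(lo, 3), round(hi, 3))  # 6.67 True 0.353 0.647
```

Because $\sigma_\mu$ shrinks as $1/\sqrt{N}$, even weak correlations pass the $C_i > 3$ test in well-populated categories, so the test mainly screens out sparsely sampled distance or depth bins.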