



Original Research Papers

Pattern-based statistical downscaling of East Asian Summer Monsoon precipitation


Thorsten Simon,

Meteorological Institute, University Bonn, DE

Andreas Hense,

Meteorological Institute, University Bonn, DE

Buda Su,

National Climate Centre (NCC), Beijing, CN

Tong Jiang,

National Climate Centre (NCC), Beijing, CN

Clemens Simmer,

Meteorological Institute, University Bonn, DE

Christian Ohlwein

Meteorological Institute, University Bonn, DE


This study identifies daily Meiyu-like East Asian Summer Monsoon patterns that are linked to precipitation observations in the Poyang Lake catchment. This analysis provides insight into the dynamics of strong, local precipitation events and has the potential to improve projections of precipitation from coarse-grid numerical simulations. Precipitation observations between 1960 and 1999 are taken from 13 rain gauges located in the Poyang Lake catchment, which is a sub-catchment of the Yangtze River. The analysis shows that the observations are linked to daily patterns of relative vorticity at 850 hPa (Vo850) and vertical velocity at 500 hPa (W500) taken from the ERA-40 re-analysis data set. The patterns are derived by two approaches: (a) empirical orthogonal function (EOF) analysis and (b) rotated EOF analysis. Vo850 and W500 refer to geostrophic and ageostrophic processes, respectively. A logistic regression connects the large-scale dynamics to the local observations, whereby a forward regression selects the patterns best suited as predictors for the probability of exceeding thresholds of 24 h accumulated rainfall at the gauges. The regression model is evaluated by cross-validation.

The spatial structure of the detected patterns can be interpreted in terms of well-known meso-α-scale disturbances called Southwest vortices. Overall, the proposed EOF and rotated EOF patterns are both related to physical processes and have the potential to work as predictors for exceedance rates of local precipitation in the Poyang catchment.

How to Cite: Simon, T., Hense, A., Su, B., Jiang, T., Simmer, C. and Ohlwein, C., 2013. Pattern-based statistical downscaling of East Asian Summer Monsoon precipitation. Tellus A: Dynamic Meteorology and Oceanography, 65(1), p.19749. DOI:
Published on 01 Dec 2013
Accepted on 24 Jan 2013
Submitted on 20 Sep 2012

1. Introduction

In summer, the East Asian Summer Monsoon (EASM) dominates climate and weather over the Yangtze Basin, China. The so-called Meiyu rain belt – typically occurring between early June and mid-July – stretches across the Yangtze valley and extends east to the southern parts of Korea and Japan. The Meiyu over the Yangtze is the same atmospheric phenomenon as the Changma in southern Korea and the Baiu in southern Japan (Wang, 2006).

The heavy rain events caused by the EASM and their strong variability have a high impact on water resource management, agriculture, and land use planning (Piao et al., 2010; Zhang et al., 2012), play an important role in the reinsurance industry and emergency management (Zong and Chen, 2000; Zhai et al., 2005; Kron, 2009), and even affect air quality over China (Zhao et al., 2010a). This makes the EASM a key factor influencing economic development in the region. It is therefore crucial to improve our knowledge of the processes behind the EASM and of its relation to heavy precipitation events.

Existing EASM indices can be clustered into 4–5 classes depending on definition and physical basis (Wang et al., 2008). The processes reflected by these indices are mainly the thermal contrast between the landmass and the adjacent seas (Guo, 1983; Webster and Yang, 1992) and the strength of the subtropical high-pressure system over the western North Pacific (Wang and Fan, 1999; Li and Zeng, 2002). However, all of these indices are defined on the seasonal scale and are thus less suited to reflect monsoon dynamics on the daily scale. Zhao et al. (2010b) present a daily-scale index based on the difference between the sea level pressure (SLP) anomalies of two boxes, one located over the East China Sea and one over China.

The Meiyu rain belt is associated with a quasi-stationary subtropical front separating the warm and moist air over the tropical South China Sea (SCS) from the cold and dry air formed over the mid-latitude landmass (Ding and Chan, 2005; Wang, 2006). This large-scale synoptic system is further characterized by low-level vortices, which are generated locally over the Tibetan plateau and are mainly visible at the 850 and 700 hPa pressure levels. Vortices that originate over the southeastern part of the plateau are called southwest vortices (SWVs) as they build up in southwest China (Ding and Chan, 2005).

SWV genesis can be described conceptually as follows. In spring and even more so in summer, the plateau receives strong solar radiation leading to strong convective instability given sufficient moisture supply. A mesoscale cyclonic circulation can develop, which supports convection (Wang, 2006). Additionally, baroclinic instability might instigate the generation of SWVs. The front between the warm and moist air from the south – or more precisely from the Indochinese peninsula and the SCS – and the cold and dry air from the mid-latitudes preconditions the development of SWVs (Chen and Dell'Osso, 1984). SWVs may take different tracks through China and the adjacent seas. Most SWVs move eastward and can cause heavy rainfall in the Yangtze Basin. Others might take a more northward path and can cause severe rainfall in northern China (Ding et al., 2001).

SWVs have been investigated many times over the last three decades. Comprehensive overviews of SWVs can be found in Wang (2006) and Chang (2004). Several studies focus on the analysis of SWVs during strong rainfall periods. The following model-based analyses of SWV events are of particular relevance: the formation and propagation of an SWV in July 1979 (Wang and Orlanski, 1987), the atmospheric processes during the Sichuan flood in July 1981 (Kuo et al., 1988), and the onset of the EASM in May 1992 (Chang et al., 2000). The latter study also investigates the role of orography and highlights its importance for vortex genesis and pathway (Chang et al., 2000). Tao and Ding (1981) analysed observed heavy rainfall events between 1931 and 1980 and the relevance of the Tibetan plateau for the associated atmospheric processes. During the GAME/HUBEX field experiment – an intensive hydrological and meteorological observation project over eastern China in 1998 and 1999 – strong rainfall occurred over the Yangtze valley, which could be directly linked to SWVs (Ding et al., 2001).

To quantify the relationship between daily local precipitation and regional monsoon dynamics in a probabilistic context, a statistical downscaling framework was developed. We post-process the GCM dynamics, rather than using GCM precipitation directly, for the following reasons. A grid box of global GCM output represents an area of more than 40 000 km² (Roeckner, 2003; Uppala et al., 2005). Studies investigating the energy spectra of numerical weather prediction (NWP) models found an effective resolution of 4–7 grid boxes (Skamarock, 2004; Bierdel et al., 2012). It is reasonable to assume a similar effective resolution for GCMs used for climate scenarios or re-analyses. Unresolved subgrid processes like precipitation have to be parameterised, which often leads to systematic biases (Murphy, 1999; Wilby and Wigley, 2000). The uncertainty of GCM output can only be assessed at high computational cost using ensembles (Schölzel and Hense, 2011). Furthermore, owing to the coarse horizontal resolution, local extreme events are only weakly represented in GCM output.

To overcome some of these problems, many studies choose a dynamical downscaling approach via regional climate models (RCMs) (Frei et al., 2006; Gao et al., 2008; Park et al., 2008). RCMs are able to simulate dynamics on a finer scale and offer many benefits; for example, RCMs are physically motivated and provide a physically consistent dataset with cross-correlations between the different atmospheric variables. However, this approach also introduces new disadvantages. New sources of error arise from the way boundary conditions are handled (Ebell et al., 2008; Mesinger and Veljovic, 2013) and from the choice of parameterisation sets (Bachner et al., 2008). In particular, precipitation can show significant biases (Lindau and Simmer, 2013). Due to the high computational costs, RCMs are usually integrated with a one-way nesting approach, which suppresses teleconnections (Wang et al., 2005). In the case of the EASM, relevant processes therefore cannot be represented, for example, the interaction between the Meiyu rain belt and the SST of the adjacent seas (Wang et al., 2005) or the coupling with ENSO (Wang et al., 2000).

Statistical downscaling is an alternative way to overcome some of the above-mentioned drawbacks of GCMs. Because the statistics are trained on observations, the outcome is unbiased with respect to the training data. Depending on the target quantity, an exceedance rate model (Zhai et al., 2005), a model for quantiles (Bremnes, 2004; Friederichs and Hense, 2007) or a description of the full probability density function (PDF) (Benestad et al., 2012) can be realised. Downscaling can improve the representation of extremes, especially when extreme value theory is applied (Bentzien and Friederichs, 2012). Statistical downscaling circumvents the low resolution of GCMs, as the statistics are adapted to local measurements. By using amplitudes of spatial patterns instead of time series of single grid boxes as predictors, the downscaling approach can accommodate non-local influences in space and time. Such approaches are thus termed dynamical-statistical methods (Benestad, 2004; Maraun et al., 2010).

Besides regression techniques, other methods can be applied to estimate local PDFs of precipitation. Orlowsky et al. (2010) applied an analogue resampling scheme to observations from the Yangtze valley. Cooley et al. (2007) used a Bayesian hierarchical model to assess high precipitation return levels, which can also be used for downscaling. Stochastic weather generator techniques are also very popular in the field of statistical downscaling (Wilby et al., 1998; Vrac and Naveau, 2007).

In this paper, a statistical downscaling model is proposed, which predicts the probability of exceeding local precipitation thresholds. As covariates we test spatial patterns associated with the monsoon dynamics on a daily scale. Both observational data from rain gauges in the Poyang catchment and re-analysis data (cf. Section 2) are used for the derivation of the predictors and the setup of the statistical model (cf. Section 3). The results of the downscaling approach (cf. Section 4) are discussed in Section 5, with the main focus set on the physical interpretation of the predictors. Section 6 contains concluding remarks.

2. Data

The study region in China is the catchment of the Poyang Lake, a sub-catchment of the Yangtze River, located in the Jiangxi province. This region roughly extends from 114°E to 119°E longitude and from 26°N to 30°N latitude. Thirteen rain gauge stations are available, which are located at altitudes ranging from 30 to 144 m (Fig. 1). Annual precipitation amounts vary from 1435 to 1850 mm with 22–29% occurring in the major rain season during June and July, which is associated with the Meiyu rain belt. Therefore, the focus of this study is set on this season. Daily precipitation totals from the 13 stations are available from 1960 to 1999.

Fig. 1.   

Location of the Poyang catchment (light grey area) and the 13 rain gauges (dark grey points) in China.

To link observed precipitation to different atmospheric properties, the ERA-40 re-analysis with a resolution of 1.875° × 1.875° is chosen (Uppala et al., 2005). Following the effective-resolution findings of Skamarock (2004) and Bierdel et al. (2012), the covariates of the statistical model should be related to atmospheric dynamics on scales larger than the grid resolution of ERA-40. Therefore, an ERA-40 region has been selected that generously encloses the Poyang catchment, spanning 10°N–40°N and 100°E–130°E. The Poyang catchment is not at the centre of this region; the region is shifted slightly to the southwest because atmospheric disturbances are anticipated to develop in the southwest before they propagate parallel to the Yangtze valley. The ERA-40 output is taken for the same period (1960–1999) for which station data are available. The following abbreviations will be used for the output variables: UV850 – horizontal wind field at 850 hPa, W500 – vertical velocity in pressure coordinates at 500 hPa, TCW – total column water content, Vo850 – relative vorticity at 850 hPa. The physical motivation for this pre-selection of the variables is given in the results section.
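As a quick consistency check of the domain described above, the sketch below computes the number of 1.875° grid boxes per direction, which matches the 16 × 16 boxes used in the EOF analysis, and converts the effective resolution of 4–7 grid boxes quoted in the introduction into kilometres. The spherical Earth radius of 6371 km and the reference latitude of 28°N (roughly the centre of the catchment) are assumptions made for illustration.

```python
import math

# Domain and grid spacing as stated in the text.
LAT_MIN, LAT_MAX = 10.0, 40.0    # degrees north
LON_MIN, LON_MAX = 100.0, 130.0  # degrees east
RES_DEG = 1.875                  # grid spacing used in the study

# Number of grid boxes (not grid points) in each direction.
nx = round((LON_MAX - LON_MIN) / RES_DEG)
ny = round((LAT_MAX - LAT_MIN) / RES_DEG)

# Assumption: spherical Earth with radius 6371 km.
KM_PER_DEG = math.pi * 6371.0 / 180.0  # length of one degree of latitude


def box_size_km(lat_deg):
    """Zonal and meridional size (km) of one grid box centred at lat_deg."""
    dy = RES_DEG * KM_PER_DEG
    dx = dy * math.cos(math.radians(lat_deg))
    return dx, dy


# Assumption: evaluate at 28 N, roughly the centre of the Poyang catchment.
dx, dy = box_size_km(28.0)
eff_min_km, eff_max_km = 4 * dx, 7 * dx  # effective resolution of 4-7 grid boxes

print(nx, ny)  # 16 16
print(f"grid box at 28N: {dx:.0f} km x {dy:.0f} km")
print(f"effective resolution: {eff_min_km:.0f}-{eff_max_km:.0f} km")
```

Under these assumptions a single grid box is roughly 180 km × 210 km near the catchment, so structures resolved by the re-analysis are several hundred kilometres across, larger than the catchment itself.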

3. Methods

One aim of this study is to set up a statistical model for the probability of rainfall threshold exceedance at rain gauges depending on large-scale predictors. To this end, the rainfall observations are transformed to binary time series via

$$ y = \begin{cases} 1, & R_{24h} > u, \\ 0, & R_{24h} \le u, \end{cases} \qquad (1) $$
where $R_{24h}$ is the locally observed 24 h accumulated rainfall and $u$ denotes the threshold. For the region of interest – the Poyang catchment in China – the thresholds $u \in \{1, 5, 10, 25, 50\}$ mm lead to reasonable exceedance rates (Fig. 2).
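The transformation of daily rainfall totals into binary exceedance series, and the resulting climatological exceedance rates $E(y)$ shown in Fig. 2, can be sketched as follows. The rainfall values are made up for illustration, and treating "exceeding" as the strict inequality $R_{24h} > u$ is an assumption.

```python
# Thresholds u from the text, in mm.
THRESHOLDS_MM = (1, 5, 10, 25, 50)


def exceedance_series(r24h, u):
    """Binary series y: 1 if the 24 h total exceeds u (strictly), else 0."""
    return [1 if r > u else 0 for r in r24h]


def climatological_rate(y):
    """Sample mean of a binary series, i.e. the climatological exceedance rate E(y)."""
    return sum(y) / len(y)


# Hypothetical daily totals (mm) at one gauge; not data from the study.
r24h = [0.0, 0.4, 3.2, 12.5, 0.0, 27.8, 61.0, 8.1, 1.0, 0.0]

for u in THRESHOLDS_MM:
    y = exceedance_series(r24h, u)
    print(f"u = {u:2d} mm: E(y) = {climatological_rate(y):.1f}")
```

The rate falls monotonically with the threshold (here from 0.5 at 1 mm to 0.1 at 50 mm), so the highest thresholds correspond to rare events.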

Fig. 2.   

Climatological exceedance rates conditional on the different thresholds. Each box shows the distribution of the expectation values E(y) over the 13 stations.

To derive large-scale predictors, the ERA-40 output for the selected area is processed by empirical orthogonal function (EOF) analysis (Hannachi et al., 2007). To account for cross-correlations between the different variables, the fields of different variables are combined as input to the EOF analysis, i.e. our data vectors have the spatial dimension $n_x \times n_y \times n_v$, where $n_x = n_y = 16$ is the number of grid boxes in the x and y direction and $n_v$ is the number of variables, which depends on how many variables are combined (cf. Fig. 3). To take account of the different units of the variables, each variable is scaled by its standard deviation, calculated simultaneously over space and time. The resulting EOFs all have the same length and contain sub-vectors with spatial patterns for each variable. As EOF analysis is a purely statistical construct, it does not necessarily result in physically meaningful patterns (e.g. Dommenget and Latif, 2002). In addition to EOFs, varimax rotated EOFs (vEOFs) are used as predictors. The varimax rotation generally yields more localised patterns than the EOF analysis and avoids the generation of high-order multi-poles. The varimax rotation was performed in such a way that the spatial patterns are no longer orthogonal, but the coefficient time series remain uncorrelated, as is also the case for the EOFs. Another characteristic of both methods is that the principal components, i.e. the EOF/vEOF amplitudes, tend to be normally distributed. A comprehensive overview of multivariate statistical techniques can be found in Wilks (2011) and Von Storch and Zwiers (2002). Deeper insights into EOFs are given by Jolliffe (2002).
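A minimal sketch of the combined EOF analysis described above: each variable is reduced to anomalies, scaled by one standard deviation computed jointly over space and time, and the variables are concatenated into one data vector per time step. The leading EOF is then obtained as the leading eigenvector of the sample covariance matrix by power iteration. Pure Python keeps the example self-contained; a real analysis on the full 16 × 16 × $n_v$ grid would use an SVD or eigenvalue routine from a numerical library. The toy fields and function names are hypothetical, and the varimax rotation is not shown.

```python
import math
import random


def standardize_per_variable(fields):
    """Remove the time mean at each grid point, then divide each variable by a
    single standard deviation computed jointly over space and time."""
    scaled = {}
    for name, data in fields.items():
        nt, npts = len(data), len(data[0])
        means = [sum(row[j] for row in data) / nt for j in range(npts)]
        anom = [[row[j] - means[j] for j in range(npts)] for row in data]
        sd = math.sqrt(sum(a * a for row in anom for a in row) / (nt * npts))
        scaled[name] = [[a / sd for a in row] for row in anom]
    return scaled


def combine(fields, order):
    """Concatenate the variables into one data vector (points x variables) per time step."""
    nt = len(fields[order[0]])
    return [[a for name in order for a in fields[name][t]] for t in range(nt)]


def leading_eof(x, iters=200, seed=1):
    """Leading eigenvector of the sample covariance matrix by power iteration.
    Returns (pattern, explained variance fraction, principal component series)."""
    nt, p = len(x), len(x[0])
    cov = [[sum(row[i] * row[j] for row in x) / (nt - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]  # random start, avoids orthogonality to mode 1
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    frac = lam / sum(cov[i][i] for i in range(p))
    pc = [sum(row[i] * v[i] for i in range(p)) for row in x]
    return v, frac, pc


# Hypothetical toy data: 3 grid points, one shared signal with opposite signs in
# the two variables (cyclonic vorticity together with rising motion, i.e. negative omega).
rng = random.Random(0)
weights = [1.0, 0.5, 0.0]
vo850, w500 = [], []
for _ in range(300):
    a = rng.gauss(0.0, 1.0)
    vo850.append([1e-5 * (a * wgt + rng.gauss(0.0, 0.1)) for wgt in weights])
    w500.append([0.1 * (-a * wgt + rng.gauss(0.0, 0.1)) for wgt in weights])

x = combine(standardize_per_variable({"Vo850": vo850, "W500": w500}), ["Vo850", "W500"])
eof1, frac1, pc1 = leading_eof(x)
print([round(c, 2) for c in eof1], round(frac1, 3))
```

In the toy data the leading EOF pairs Vo850 loadings of one sign with W500 loadings of the opposite sign at the same grid points; as with any EOF, the overall sign is arbitrary.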

Fig. 3.   

Comparison of different setups for the EOF analysis and the subsequent forward regression. Each box-and-whisker plot shows the distribution of the Brier skill score over all stations in the catchment. These results are based on a threshold of u = 25 mm. The letters refer to the different input sets: A – UV850, B – W500, C – TCW, D – Vo850, E – Vo850 and W500, F – Vo850, W500 and TCW. Note: the boxes are grey shaded only for visual convenience.

To combine the binary time series of rainfall events above a given threshold [eq. (1)] with the EOF/vEOF patterns, a logistic regression is applied. It assumes that the events are drawn from a Bernoulli-distributed random variable $Y \in \{0, 1\}$ with $Y \sim \mathrm{Be}(p)$, i.e. $\Pr(Y = y) = p^{y}(1 - p)^{1 - y}$. Here, $p$ is the exceedance probability of the rainfall event. The logistic regression assumes a linear model between the logit-transformed exceedance probability $p$ and the predictor values,

$$ \log\frac{p_i}{1 - p_i} = \beta_0 + \beta_1 x_{i1} + \dots + \beta_m x_{im}, \qquad (2) $$

where $p_i$ denotes the probability of threshold exceedance, $x_{i1}, \dots, x_{im}$ the predictor time series and $\beta_0, \beta_1, \dots, \beta_m$ the model parameters. The log-likelihood for a Bernoulli-distributed random variable is expressed by

$$ \ell(\beta_0, \dots, \beta_m) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \qquad (3) $$

in combination with eq. (2) (McCullagh and Nelder, 1989, chap. 4). The estimation of the model parameters β0,β1,…,βm is performed by the R function glm() (R Development Core Team, 2011), as the logistic regression is a special case of a generalised linear model (GLM).
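To illustrate the Bernoulli log-likelihood, the sketch below evaluates it and shows that, without predictors, its maximum is reached at $\beta_0 = \mathrm{logit}(\bar y)$, so the fitted probability equals the climatological exceedance rate; this is why a model without predictors reproduces climatology and has zero skill. It is written in Python rather than R, does not reproduce glm(), and uses a hypothetical binary series.

```python
import math


def sigmoid(eta):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(y, p):
    """Bernoulli log-likelihood: sum of y*log(p) + (1-y)*log(1-p)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p))


def fit_intercept_only(y):
    """Closed-form maximum-likelihood intercept when there are no predictors."""
    ybar = sum(y) / len(y)
    return math.log(ybar / (1 - ybar))


y = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]  # hypothetical series with E(y) = 0.2
beta0 = fit_intercept_only(y)
p_clim = sigmoid(beta0)
print(round(p_clim, 6))  # 0.2

# Any other constant probability gives a lower log-likelihood.
for p_const in (0.1, 0.15, 0.25, 0.3):
    assert log_likelihood(y, [p_const] * len(y)) < log_likelihood(y, [p_clim] * len(y))
```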

As the decomposition techniques (EOF and vEOF) still generate a large number of effective modes (Bretherton et al., 1999, their eq. 4), the covariates have to be selected in order to avoid over-fitting. This selection is divided into two parts. In the first part, a forward selection is performed. Each pool of potential predictors includes the principal components of one EOF or vEOF analysis. At each step, the not yet selected covariate that maximises the log-likelihood is added, until no covariates remain. In the second part, a stopping rule for the predictor chain has to be found. This is achieved by testing each model on a set of independent data via cross-validation. The predictor chain would be truncated at the point where the skill of the model no longer increases with the addition of more predictors (Wilks, 2011, chap. 7.4). More details on the application of the stopping rule are given below.
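The first part of the selection can be sketched as below: candidate principal components are added one at a time, always choosing the one that yields the largest maximised Bernoulli log-likelihood together with the predictors already selected. The logistic fit uses plain Newton-Raphson iterations as a stand-in for R's glm(), which the study used. The cross-validated stopping rule is not included; the function only returns the order of the candidates. The synthetic data, in which candidate 2 and then candidate 0 carry the signal, are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[piv] = m[piv], m[k]
        for r in range(k + 1, n):
            f = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= f * m[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][c] * x[c] for c in range(k + 1, n))) / m[k][k]
    return x


def fit_logistic(cols, y, max_iter=25, tol=1e-10):
    """Newton-Raphson fit of logit(p) = b0 + sum_j b_j x_j; returns (beta, max log-likelihood)."""
    rows = [[1.0] + [col[t] for col in cols] for t in range(len(y))]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-sum(b * z for b, z in zip(beta, r)))) for r in rows]
        grad = [sum((yt - pt) * r[i] for r, yt, pt in zip(rows, y, p)) for i in range(k)]
        info = [[sum(pt * (1 - pt) * r[i] * r[j] for r, pt in zip(rows, p)) for j in range(k)]
                for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    p = [1.0 / (1.0 + math.exp(-sum(b * z for b, z in zip(beta, r)))) for r in rows]
    ll = sum(yt * math.log(pt) + (1 - yt) * math.log(1 - pt) for yt, pt in zip(y, p))
    return beta, ll


def forward_selection(candidates, y):
    """Order candidate predictors greedily by the maximised log-likelihood."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining:
        def score(j):
            return fit_logistic([candidates[c] for c in selected + [j]], y)[1]
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical data: four standard-normal "principal components"; only
# candidates 2 (strong) and 0 (weaker) influence the exceedance probability.
rng = random.Random(42)
n = 400
pcs = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(4)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(1.5 - 1.5 * pcs[2][t] - 0.8 * pcs[0][t])) else 0
     for t in range(n)]

order = forward_selection(pcs, y)
print(order)  # candidates 2 and 0 are expected to come first
```

Because the log-likelihood on the training data never decreases when a predictor is added, this ordering alone cannot decide where to stop; that is the role of the cross-validated skill.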

To account for the temporal autocorrelation of the EASM, a four-fold cross-validation (Michaelsen, 1987; Efron and Tibshirani, 1993) is performed, with four sets each consisting of a 30 yr training period and one decade for verification. This timescale is chosen because the EASM is strongly linked to ENSO, which varies with a period of 3–7 yr (Neelin et al., 1998; Wang et al., 2000). The cross-validation includes the following steps: a) EOF analysis of ERA-40 data for the training period, b) forward regression via log-likelihood to determine the order of the covariates, c) calculation of skill scores from the independent part of the data, and d) averaging of the skill scores over the four verification decades. This strategy validates not only the statistical model itself but also the derivation of the predictor time series by EOF analysis (Von Storch and Zwiers, 2002; Hastie et al., 2008).
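The decade-wise four-fold cross-validation can be organised as in the following sketch. The function `evaluate` is a hypothetical placeholder for steps a) to c), i.e. EOF analysis on the training years, forward regression, and skill on the held-out decade; the helpers only handle the splitting and the averaging of step d).

```python
def decade_folds(years, starts=(1960, 1970, 1980, 1990)):
    """Yield (training years, verification years): each decade is held out once."""
    for start in starts:
        verify = [yr for yr in years if start <= yr < start + 10]
        train = [yr for yr in years if not start <= yr < start + 10]
        yield train, verify


def cross_validate(years, evaluate):
    """Average the skill returned by evaluate(train, verify) over the four folds.

    evaluate is a placeholder: it should redo the EOF analysis and the forward
    regression on the training years only, then score the verification decade.
    """
    scores = [evaluate(train, verify) for train, verify in decade_folds(years)]
    return sum(scores) / len(scores)


years = list(range(1960, 2000))
for train, verify in decade_folds(years):
    print(len(train), verify[0], verify[-1])
# 30 1960 1969
# 30 1970 1979
# 30 1980 1989
# 30 1990 1999
```

Redoing the EOF analysis inside every fold keeps the verification decade independent of the predictor derivation as well as of the regression coefficients.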

In the following sections, model performance is quantified by the Brier skill score (BSS) (Brier, 1950; Gneiting et al., 2007)

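$$ \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{ref}}}, \qquad \mathrm{BS} = \frac{1}{n} \sum_{i=1}^{n} (p_i - y_i)^2, \qquad (4) $$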
where $\mathrm{BS}_{\mathrm{ref}}$ is the Brier score of the climatology $E(y)$. In addition, as the probability of exceeding the 90% quantile (cf. Fig. 2) is modelled, the Winkler score (WS) is applied (Winkler, 1994; Gneiting et al., 2007),
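$$ \mathrm{WS} = \frac{1}{n} \sum_{i=1}^{n} \frac{(c - y_i)^2 - (p_i - y_i)^2}{H(p_i - c)\,(1 - c)^2 + \left[1 - H(p_i - c)\right] c^2}, \qquad (5) $$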
where $H$ is the Heaviside function, which is zero for $p_i \le c$ and one for $p_i > c$. The value $c \in (0, 1)$ serves as a reference probability. The WS is an asymmetric scoring rule and serves as an additional verification for models whose climatological exceedance rate $E(y)$ is far from 0.5 (cf. Fig. 2). Note that eq. (5) shows a special case of the WS derived from the BS; in general, the WS could also be derived from any other score. As a reference for the scoring rules, the mean probability over the training period ("climate") is chosen (Fig. 2). All scoring rules applied in this study are strictly proper, which means that the expected score is optimised if and only if the forecast probability equals the true probability of the event (Gneiting et al., 2007).
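Both scores can be computed as in the sketch below. The Brier skill score uses a constant reference forecast equal to the climatological rate. For the Winkler score, the sketch follows the general construction of Winkler (1994): each forecast's Brier-score improvement over the reference probability $c$ is divided by the largest improvement attainable on the same side of $c$, namely $(1-c)^2$ if $p_i > c$ and $c^2$ otherwise. That this normalisation is exactly the one used here is an assumption; the forecasts and observations are hypothetical.

```python
def brier_score(p, y):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def brier_skill_score(p, y, p_ref):
    """BSS = 1 - BS / BS_ref, where the reference always forecasts p_ref."""
    return 1.0 - brier_score(p, y) / brier_score([p_ref] * len(y), y)


def winkler_score(p, y, c):
    """Asymmetric Brier-based score with reference probability c (assumed form).

    A forecast equal to c scores 0 and a perfect forecast scores 1.
    """
    total = 0.0
    for pi, yi in zip(p, y):
        gain = (c - yi) ** 2 - (pi - yi) ** 2
        norm = (1.0 - c) ** 2 if pi > c else c ** 2  # Heaviside switch at p_i = c
        total += gain / norm
    return total / len(y)


# Hypothetical forecasts and outcomes; c is a climatological rate of 0.2.
y = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
p = [0.1, 0.05, 0.7, 0.2, 0.6, 0.1, 0.1, 0.05, 0.3, 0.1]
c = 0.2
print(round(brier_skill_score(p, y, c), 4))  # 0.7344
print(round(winkler_score(p, y, c), 4))      # 0.6406
```

Note how the single forecast of 0.3 for a non-event (a small miss above $c$) is divided by $(1-c)^2$ and therefore penalised only mildly, whereas improvements on the rare-event side are scaled up.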

For a more detailed validation of a statistical model, the reliability diagram is calculated, which displays the joint distribution of forecasts and observations via the calibration-refinement factorisation. For each pre-defined forecast probability bin with centre $p_k \in \{0.05, 0.15, \dots, 0.95\}$, both the conditional probability that an event has been observed, $\Pr(y = 1 \mid p_k)$, and the relative frequency of the forecast probability, $\Pr(p_k)$, are calculated. The conditional probability and the relative frequency are called the calibration curve and the refinement curve, respectively. The calibration curve of a perfectly calibrated model lies exactly on the diagonal. The model has high confidence if the refinement curve exhibits a U-shape, that is, if very low and very high probabilities are predicted in the majority of cases (Wilks, 2011).
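The calibration and refinement curves can be computed as in the sketch below, assuming ten equal-width bins whose centres are the $p_k$ listed above; the input values are hypothetical.

```python
def reliability_table(p, y, n_bins=10):
    """Return (p_k, Pr(y=1 | p_k), Pr(p_k)) per equal-width probability bin.

    p_k is the bin centre; the calibration value is None for empty bins.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for pi, yi in zip(p, y):
        k = min(int(pi * n_bins), n_bins - 1)  # p = 1.0 falls into the last bin
        counts[k] += 1
        hits[k] += yi
    return [((k + 0.5) / n_bins,
             hits[k] / counts[k] if counts[k] else None,
             counts[k] / len(p))
            for k in range(n_bins)]


p = [0.05, 0.02, 0.08, 0.55, 0.57, 0.93]
y = [0, 0, 1, 1, 0, 1]
for centre, calib, refine in reliability_table(p, y):
    if refine > 0:
        print(f"p_k = {centre:.2f}: calibration = {calib:.2f}, refinement = {refine:.2f}")
```

Plotting the second column against $p_k$ gives the calibration curve, and the third column gives the refinement curve, whose U-shape indicates a confident model.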

4. Results

Different sets of ERA-40 output variables are compared as input for the downscaling procedure, i.e. the EOF analysis and the forward regression. The variables are either well known for describing monsoon dynamics on the seasonal timescale [UV850, cf. Wang and Fan (1999)] or typical variables for downscaling precipitation [W500, TCW, Vo850, cf. Friederichs and Hense (2007)]. To consider cross-correlations between variables, they are combined as input for the EOF analysis. The resulting principal components are then tested as predictors for the logistic regression. It was found that a combination of Vo850 and W500 performs best, while no gain in skill is achieved by adding TCW to the set (Fig. 3). Models with no predictors automatically reproduce the climatological exceedance rates (Fig. 2) and therefore have zero skill. Not all tested combinations are shown in Fig. 3.

To gain more insight into the (skill) scores conditioned on the chosen threshold, the dependence of both the symmetric BSS and the asymmetric WS on the threshold is shown in Fig. 4. Each boxplot shows the distribution of (skill) scores over the 13 stations in the Poyang catchment after performing the cross-validation. Two predictors from the joint EOF analysis of the variables Vo850 and W500 are used as covariates (more details on the selection are given in the next paragraph). Therefore, the BSS boxplot for u = 25 mm in Fig. 4 is equal to the boxplot with two predictors of group E in Fig. 3. The BSS decreases for higher thresholds because events become rarer, i.e. low-probability events are considered at high thresholds (Fig. 2). In contrast, the WS increases for higher thresholds, because it accounts for the asymmetric character of the response time series at high thresholds. Note that all models in this figure are significant at the 1% level with respect to climatology, as checked by a likelihood-ratio test.
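The likelihood-ratio test mentioned above compares the maximised log-likelihood of the two-predictor model with that of the intercept-only (climatology) model. With two additional parameters the test statistic is asymptotically χ² distributed with 2 degrees of freedom, whose survival function has the closed form exp(−x/2), so a standard-library sketch suffices. The log-likelihood values below are hypothetical.

```python
import math


def lr_test_2df(ll_null, ll_full):
    """Likelihood-ratio statistic and asymptotic p-value for two extra parameters.

    For a chi-square distribution with 2 degrees of freedom, P(X > x) = exp(-x / 2).
    """
    stat = 2.0 * (ll_full - ll_null)
    return stat, math.exp(-stat / 2.0)


# Critical value at the 1% level: solve exp(-x / 2) = 0.01 for x.
CRIT_1PCT = -2.0 * math.log(0.01)  # about 9.21

# Hypothetical maximised log-likelihoods (climatology vs. two EOF predictors).
stat, p_value = lr_test_2df(ll_null=-520.0, ll_full=-480.0)
print(f"statistic = {stat:.1f}, p = {p_value:.2e}, significant at 1%: {p_value < 0.01}")
```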

Fig. 4.   

Comparison of the symmetric Brier skill score (dark grey shaded boxes) and the asymmetric Winkler skill score (light grey shaded boxes) conditioned on the different thresholds. Each box-and-whisker plot shows the distribution of one skill score over all stations in the catchment. The first and third EOF modes of the set Vo850 and W500 were used as predictors.

Two aspects of the cross-validation are analysed in detail: the effect of adding predictors to a statistical model and the spread of skill and reliability resulting from the cross-validation method. The results, shown in Fig. 5, refer to the station at Yushan (No. 58634) and a threshold of u = 25 mm. The skill increases with increasing numbers of predictors (Fig. 5a). The maximum of the cross-validation curve, from which a truncation criterion for the predictor chain would be expected, occurs between 15 and 20 predictors. However, the large skill increments from the first two selected predictors, the 1st and 3rd EOF, show the major relevance of these patterns for precipitation exceeding a specified threshold. The continued improvement of skill indicates that not all processes relevant for rainfall events can be reduced to a small number of modes. A further and even stronger indication of the major relevance of these two modes, and an argument for truncating the predictor chain after them, is that the 1st and 3rd EOF are selected first during the forward regression for nearly all stations in the catchment (Table 1) and thresholds (not shown). The reliability diagram (Fig. 5b) for the model with the 1st and 3rd EOFs as predictors is reasonably good. The calibration curve lies close to the diagonal but also exhibits some over-estimation by the model for forecast probabilities $0.6 < p_k < 0.8$. The refinement curve shows that the forecast probability lies between 0 and 0.1 in the majority of cases, but it lacks a peak in the bin for the highest forecast probabilities.

Fig. 5.   

Validation of the probability of threshold exceedance model with Vo850 and W500 as input variables of the EOF analysis. The predictand was calculated from the precipitation observations at Yushan (No. 58634) with the threshold u = 25 mm [cf. eq. (1)].

The spatial patterns of the modes with major relevance in the statistical analysis can be related to EASM dynamics. Figures 6 and 7 show the 1st and 3rd EOF, respectively. These modes were selected as first and second predictor for all stations with only one exception (Table 1): for the station in Ganzhou (No. 57993), the 6th mode was selected as second predictor. Furthermore, these two predictors led to the strongest increase of the scores (Fig. 5a). The vertical velocity field of the 1st EOF exhibits a strong pole over the Yangtze valley extending to the adjacent sea. The sign convention is chosen such that a negative spatial amplitude combined with a positive temporal amplitude indicates rising motion. A band of positive relative vorticity is similarly located; by the same sign convention, it is associated with cyclonic disturbances. Rising motion and cyclonic disturbances are typical synoptic features of SWVs as described in the introduction. However, there is a strong counter-pole in both fields over the SCS. Although this counter-pole is part of the 1st EOF, neither the relative vorticity nor the vertical velocity over the SCS can be associated with any process related to the EASM.

Fig. 6.   

The first EOF mode of ERA-40 output variables relative vorticity at 850 hPa and vertical velocity at 500 hPa. Explained variance 8.3%.

Fig. 7.   

The third EOF mode of ERA-40 output variables relative vorticity at 850 hPa and vertical velocity at 500 hPa. Explained variance 4.4%.

The 3rd EOF (Fig. 7) exhibits a combination of vertical velocity and relative vorticity over the Yangtze valley similar to the 1st EOF, but with a slightly different angle of its main axis. This belt is also located over the Yangtze valley, but it partly extends over southern Korea. It is oriented more from southwest to northeast, in contrast to the belt in the 1st EOF, which is almost W–E aligned and does not cover southern Korea. Furthermore, the pole in the 3rd EOF extends in a horseshoe-like pattern further south into the SCS. The horseshoe appears for both Vo850 (blue) and W500 (red).

Figure 8 shows the model response to the predictors derived from ERA-40 for the year 1998. This year is particularly interesting as a great flood occurred along the Yangtze River (Zong and Chen, 2000). The models for the different thresholds were trained on the period from 1960 to 1989. The good performance of the model in this case, and with it the quality of the predictors, is evident. As the same predictors were used for the models at all thresholds, the predicted probability curves do not cross artificially in this example.

Fig. 8.   

Example application of the statistical model with EOF predictors for the rain gauge at Yushan (No. 58634). The training period is from 1960 to 1989. The model is applied to 1998. Dashed lines denote climatological exceedance probabilities. Solid lines denote modelled exceedance probabilities. The intensity of the lines stands for the different thresholds, from light to dark u={1,5,10,25,50} (mm). The dots show the observed precipitation at the rain gauge.

The varimax rotation of the first 25 EOF patterns leads to another decomposition of the data. The first 25 patterns were passed to the rotation because that is the number of effective modes of the corresponding EOFs (derived by eq. 4 of Bretherton et al., 1999). Again, a forward selection is applied. Fig. 9a shows the BSS as a function of the number of predictors, in the order determined by the forward selection. In contrast to the result of the forward regression with the EOF predictors, only one predictor with major impact can be found. The second vEOF mode was selected as first covariate at all stations, after which the forward selection picks covariates in an arbitrary order that leads only to small increases in skill. The reliability diagram corresponds to the logistic model with one rotated EOF predictor (Fig. 9b). The calibration curve exhibits a slight over-estimation for forecast probabilities greater than 0.4. Like the refinement curve of the EOF model (Fig. 5), the refinement curve of the varimax model lacks a peak for high forecast probabilities. Overall, however, the reliability diagram looks reasonably good.
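The effective number of modes can be estimated from the EOF eigenvalues. The sketch below uses the estimator $N_{\mathrm{eff}} = (\sum_k \lambda_k)^2 / \sum_k \lambda_k^2$ discussed by Bretherton et al. (1999); that this is exactly the expression referred to above is an assumption, and the eigenvalue spectra are hypothetical.

```python
def effective_number_of_modes(eigenvalues):
    """N_eff = (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    return s1 * s1 / s2


# Equal variance in 25 modes gives N_eff = 25; a dominant mode lowers it.
print(effective_number_of_modes([1.0] * 25))  # 25.0
print(effective_number_of_modes([10.0] + [1.0] * 24))
```

The estimate equals the number of modes only when the variance is spread evenly across them, and it shrinks as a few modes dominate the spectrum.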

Fig. 9.   

Validation of the probability of threshold exceedance model, same as in Fig. 5, but with Vo850 and W500 as input variables of the varimax rotated EOF analysis. The predictand was calculated from the precipitation observations at Yushan (No. 58634) with the threshold u = 25 mm [cf. eq. (1)].

The spatial pattern of the selected varimax mode displays the relation between vertical velocity at 500 hPa and relative vorticity at 850 hPa already described (Fig. 10). The pattern is similar to the structure of the 1st EOF, but its eastward extension is less pronounced. In contrast to the 1st EOF, the counter-pole over the SCS has vanished. Therefore, the spatial structure is more interpretable in a synoptic sense (cf. Discussion).

Fig. 10.   

The second vEOF mode of ERA-40 output variables relative vorticity at 850 hPa and vertical velocity at 500 hPa. Explained variance 6.3%.

5. Discussion

With the combination of rising motion (W500) and cyclonic disturbances (Vo850) on the meso-α-scale, both EOF predictors (Figs. 6 and 7) and the varimax predictor (Fig. 10) are consistent with the concept of SWVs discussed above. As SWVs propagate along different paths through East China, the orientation of the main axis of the Meiyu rain belt also varies. However, the main axis of each predictor pattern coincides with potential paths of the vortices and therefore with the location of the Meiyu rain belt (Ding and Chan, 2005; Wang, 2006). The SWVs develop over the southeastern part of the Tibetan plateau, forced by enhanced radiative input. Afterwards, the vortices travel within the Meiyu belt, often causing heavy rainfall events.

Relative vorticity and vertical velocity refer to geostrophic and ageostrophic processes, respectively. The statistical link found between both dynamical patterns and local precipitation therefore has a physical interpretation. Locally available precipitable water is not sufficient for the generation of heavy rainfall events (Trenberth et al., 2003). Therefore, other processes acting on specific time scales have to play a crucial role in their generation. SWVs are one of these important processes for heavy precipitation events in the Poyang catchment. This relation is supported by the performance of statistical models linking observed precipitation occurrence to predictors that quantify the strength of SWVs.

SWVs can also be found in the horizontal wind field at 850 hPa (UV850) by analysing single events (cf. Supplementary Material). However, it was not possible to extract an SWV-like pattern from UV850 by an EOF analysis. The leading EOF mode of UV850 (explained variance of 23.3%) is related to the subtropical high over the western North Pacific and is therefore similar to the index of Wang and Fan (1999).

In order to verify the physical meaning of the spatial pattern introduced above, a simple box correlation analysis is applied [cf. Dommenget and Latif (2002)]. The vertical velocity is averaged over the grid boxes covering the Poyang catchment; this area is highlighted by a box in Fig. 11b. The gridpoint-wise correlations of relative vorticity at 850 hPa and vertical velocity at 500 hPa with this box average (Fig. 11) exhibit nearly the same pattern as the varimax predictor, with a correlation of −0.88 between both patterns. (Note: By construction, the sign of the varimax pattern is arbitrary. Therefore, the sign of the correlation is unimportant.) This finding supports the hypothesis that the varimax pattern can be linked to the synoptic processes driving (heavy) precipitation events in the Poyang catchment.
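The box correlation analysis can be sketched as follows: the field is averaged over the grid points inside the box to form an index time series, each grid point's time series is correlated with that index, and the resulting correlation map can then be compared with a predictor pattern by a spatial pattern correlation. The tiny field, the box indices and the comparison pattern are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


def box_correlation_map(field, box_idx):
    """Correlate each grid point of field (time x points) with the box-mean series."""
    index = [sum(row[j] for j in box_idx) / len(box_idx) for row in field]
    return [pearson([row[j] for row in field], index) for j in range(len(field[0]))]


# Hypothetical field: 6 time steps, 4 grid points; points 0 and 1 form the box.
s = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0]
noise = [0.5, 0.1, -0.3, 0.2, -0.4, 0.0]
field = [[st, 2.0 * st, -st, nt] for st, nt in zip(s, noise)]
corr_map = box_correlation_map(field, box_idx=[0, 1])
print([round(r, 2) for r in corr_map])

# Pattern correlation between the map and a (hypothetical) predictor pattern;
# for a varimax pattern only its magnitude matters, since the sign is arbitrary.
pattern = [0.8, 0.9, -0.7, 0.1]
print(round(pearson(corr_map, pattern), 2))
```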

Fig. 11. Gridpoint-wise correlation with the vertical velocity averaged over the Poyang catchment (box).
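The box correlation analysis can be sketched in a few lines: average the field over the grid boxes inside the box, correlate every grid point with that index, and compare the resulting map with the varimax pattern. The names `box_index`, `correlation_map` and `pattern_correlation`, the (days × grid points) layout and the use of a centred pattern correlation are assumptions for illustration; the exact definition behind the −0.88 is not given in this section.

```python
import numpy as np


def box_index(field, box_mask):
    """Area-unweighted mean over the grid points inside the box, per day."""
    return field[:, box_mask].mean(axis=1)


def correlation_map(field, index):
    """Pearson correlation of every grid point's time series with `index`."""
    fa = field - field.mean(axis=0)
    ia = index - index.mean()
    cov = fa.T @ ia / len(index)
    return cov / (fa.std(axis=0) * ia.std())


def pattern_correlation(a, b):
    """Centred spatial correlation between two maps. Its sign only matters up
    to the arbitrary sign of an EOF or varimax pattern."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

With daily fields `vo850` and `w500` and a boolean catchment mask `poyang_mask` (all hypothetical names), `correlation_map(vo850, box_index(w500, poyang_mask))` would give one panel of a map like Fig. 11, and `pattern_correlation` would then compare that map with the varimax pattern.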

Despite its higher physical relevance, the varimax predictor model has a drawback: the skill score of the one-predictor varimax model is slightly lower than that of the two-predictor EOF model. The EASM is a complex system, and a model with too many simplifications, such as the reduction to a single predictor, might not provide enough degrees of freedom to cope with its high dimensionality.
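The skill score used to compare the models is not restated in this section. Since the reference list includes Brier (1950), a plausible choice is the Brier skill score against climatology; the sketch below assumes exactly that, and the function names are hypothetical.

```python
import numpy as np


def brier_score(p, y):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((p - y) ** 2))


def brier_skill_score(p, y, p_ref=None):
    """BSS = 1 - BS / BS_ref; 1 is perfect, 0 is no better than the reference.

    By default the reference forecast is the sample climatological frequency
    of the event (in a cross-validation it should come from training data).
    """
    y = np.asarray(y, dtype=float)
    if p_ref is None:
        p_ref = np.full_like(y, y.mean())
    return 1.0 - brier_score(p, y) / brier_score(p_ref, y)
```

Under this assumption, a one-predictor and a two-predictor model evaluated on the same days are ranked by their BSS values; a negative BSS would mean that a model is worse than always forecasting the climatological exceedance rate.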

In the end, it comes down to a trade-off between amplitude (rotated EOF) and variability (EOF). A single predictor can only account for variations in the strength of the pattern. This works well for the varimax pattern (Fig. 10), as it represents an isolated process. However, a single predictor cannot capture more complex variations, such as a tilt in the main axis of the rain belt. This variability of the Meiyu belt, which can extend across East Asia in a nearly W–E direction or be tilted further north in its eastward extension, is described in detail by Ding and Chan (2005). The linear combination of the two EOF patterns (Figs. 6 and 7) can account for such variability, because the convection belts in the two EOFs are oriented at different angles.
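The argument that two patterns with differently oriented belts can represent a tilting rain belt, whereas a single pattern can only change its amplitude, can be illustrated with a toy example. The sketch below is purely hypothetical: two Gaussian "belts" through the origin at 0° and 30° stand in for the two EOFs (their real shapes are shown in Figs. 6 and 7), and the orientation of a combined field is estimated from the principal axis of its weighted second moments.

```python
import numpy as np


def ridge(x, y, angle_deg, width=0.1):
    """Hypothetical Gaussian 'rain belt' along a line through the origin."""
    th = np.deg2rad(angle_deg)
    d = -np.sin(th) * x + np.cos(th) * y  # distance across the belt axis
    return np.exp(-0.5 * (d / width) ** 2)


def main_axis_deg(field, x, y):
    """Orientation (degrees, in (-90, 90]) of the principal axis of the
    positive part of `field`, from its weighted second moments."""
    w = np.clip(field, 0.0, None)
    w = w / w.sum()
    dx = x - np.sum(w * x)
    dy = y - np.sum(w * y)
    cov = np.array([[np.sum(w * dx * dx), np.sum(w * dx * dy)],
                    [np.sum(w * dx * dy), np.sum(w * dy * dy)]])
    _, vecs = np.linalg.eigh(cov)
    vx, vy = vecs[:, -1]  # eigenvector of the largest eigenvalue
    a = np.degrees(np.arctan2(vy, vx)) % 180.0
    return float(a - 180.0 if a > 90.0 else a)


# Circular domain, so that the grid itself (approximately) favours no direction.
g = np.linspace(-1.0, 1.0, 201)
x, y = np.meshgrid(g, g)
inside = x**2 + y**2 <= 1.0
x, y = x[inside], y[inside]
e1, e2 = ridge(x, y, 0.0), ridge(x, y, 30.0)

for w1, w2 in [(1.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 1.0)]:
    print(w1, w2, round(main_axis_deg(w1 * e1 + w2 * e2, x, y), 1))
```

Scaling `e1` alone never changes the axis, which mirrors the single-predictor case; mixing `e1` and `e2` with different positive weights rotates the axis between the two belt orientations, which mirrors the two-EOF case.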

In conclusion, each predictor set comes with its own benefits and drawbacks. This is corroborated by the correlation between the linear predictors [the right-hand side of eq. (2)] of the two sets, which does not exceed 0.55. There are cases in which one method outperforms the other, and vice versa: the EOF set is able to cope with the variability of the main axis of the Meiyu belt, but is disturbed by structures over the SCS that cannot be associated with processes causing precipitation over the Yangtze valley. In contrast, the varimax pattern can be fully interpreted in terms of Meiyu dynamics, but is unable to account for the high-dimensional variability in the pattern.
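Assuming, as the reference to eq. (2) suggests, that the linear predictor of each model is η = β0 + Σk βk xk (eq. (2) itself is not reproduced in this section), the agreement between the EOF-based and the varimax-based models can be measured by correlating their η series over the same days. The helper names and the coefficients in any call are hypothetical placeholders, not the fitted values behind the 0.55 quoted above.

```python
import numpy as np


def linear_predictor(beta, X):
    """eta = beta[0] + sum_k beta[k] * X[:, k-1] for predictors of shape (n_days, K)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:  # a single predictor given as a 1-D series
        X = X[:, None]
    return beta[0] + X @ np.asarray(beta[1:], dtype=float)


def predictor_agreement(beta_a, X_a, beta_b, X_b):
    """Pearson correlation between the linear predictors of two models."""
    eta_a = linear_predictor(beta_a, X_a)
    eta_b = linear_predictor(beta_b, X_b)
    return float(np.corrcoef(eta_a, eta_b)[0, 1])
```

This correlation is unaffected by the intercepts and by a positive rescaling of either model's coefficients. A value well below 1 means that the two models often disagree on which days are most likely to exceed the threshold.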

In an application, one would probably neglect the differences in physical interpretation between EOF and rotated EOF predictors and truncate the predictor chain when it reaches its maximum skill. Note that if the leading 25 principal components are used as predictors, the skill of the model equals the skill of a model using the 25 modes of the corresponding varimax set, since both sets span the same predictor space. However, one aspect of this study was to show how the physical meaning of the leading predictors depends on the underlying method (EOF vs. rotated EOF). Furthermore, in some applications one might want to retain the physical meaning in order to make the application interpretable in terms of EASM dynamics on the daily scale.
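The equivalence of the full 25-mode EOF and varimax models follows from a short argument, sketched here under the assumption that the rotated series are obtained from the retained principal component series Z by an invertible K × K matrix T (for an orthogonal varimax rotation, T is the rotation matrix itself):

```latex
\begin{aligned}
\tilde{Z} &= Z\,T, \qquad Z,\tilde{Z} \in \mathbb{R}^{n\times K},\quad T \in \mathbb{R}^{K\times K}\ \text{invertible},\\
\beta_0\mathbf{1} + Z\boldsymbol{\beta}
  &= \beta_0\mathbf{1} + Z\,T\,T^{-1}\boldsymbol{\beta}
   = \beta_0\mathbf{1} + \tilde{Z}\,\tilde{\boldsymbol{\beta}},
  \qquad \tilde{\boldsymbol{\beta}} = T^{-1}\boldsymbol{\beta}.
\end{aligned}
```

Every linear predictor attainable with Z is therefore attainable with Z̃ and vice versa, so the maximum of the logistic likelihood, the fitted probabilities and any skill score computed from them coincide. The argument no longer holds once only a subset of the modes is selected, which is why the leading predictors of a forward selection can differ between the two methods.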

6. Conclusion

A variety of downscaling techniques has been introduced over the last two decades (e.g. Wilby et al., 1998; Benestad, 2004; Friederichs and Hense, 2007; Vrac and Naveau, 2007). It is now up to the climate research community to select physically meaningful predictors (Maraun et al., 2010; Wilks, 2011). This study presents spatial patterns that explain the dynamics of the EASM on the daily scale. Furthermore, a statistical model for the probability of local precipitation exceeding a given threshold was set up using these predictors. A cross-validation experiment was performed to verify the overall downscaling scheme: not only the relation between the response and the predictors, but also the generation of the predictor time series, i.e. the identification of atmospheric processes in the re-analysis output.
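Cross-validating the generation of the predictor time series can be made concrete: within each fold, the EOFs are computed from the training days only, and the held-out days are projected onto those patterns. The sketch assumes leave-one-year-out folds and a (days × grid points) field; `cv_predictors` is a hypothetical helper, and the logistic model itself would then be fitted on `pcs_train` and evaluated on `pcs_test`.

```python
import numpy as np


def eof_basis(train_field, n_modes):
    """Time mean and leading EOF patterns estimated from training days only."""
    mean = train_field.mean(axis=0)
    _, _, vt = np.linalg.svd(train_field - mean, full_matrices=False)
    return mean, vt[:n_modes]


def cv_predictors(field, groups, n_modes):
    """Leave-one-group-out folds with fold-specific EOF predictors.

    Yields (train_idx, test_idx, pcs_train, pcs_test). Test days never enter
    the EOF estimation; they are only projected onto the training patterns.
    """
    for g in np.unique(groups):
        test = groups == g
        train = ~test
        mean, eofs = eof_basis(field[train], n_modes)
        pcs_train = (field[train] - mean) @ eofs.T
        pcs_test = (field[test] - mean) @ eofs.T
        yield np.flatnonzero(train), np.flatnonzero(test), pcs_train, pcs_test
```

Because EOF signs and ordering may change from fold to fold, the regression coefficients have to be re-estimated in every fold. Computing the EOFs once from the full record and only refitting the regression would let information from the test year leak into the predictors.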

It is shown that downscaling procedures should not be treated as a black box, but still require a careful analysis of each step: quality assurance of observational data, model selection and predictor selection. This paper discusses two strategies for the last step. EOF analysis and varimax rotation of the EOF patterns were used to extract predictors from re-analysis output. Although both techniques lead to skilful predictors, the physical meaning of the most relevant patterns differs between the two techniques. The EOFs can account for variability in the orientation of the Meiyu belt. In contrast, the time series corresponding to the varimax pattern represents the amplitude of an isolated process.
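The varimax rotation of EOF patterns can be sketched with Kaiser's classic pairwise algorithm, which rotates two loading vectors at a time by the angle that maximises the varimax criterion in closed form. This is an illustrative sketch, not the authors' code: it applies the raw criterion (no Kaiser row normalisation) to EOF loadings scaled by the square root of their variance, which is one common convention among several, and the helper names are hypothetical.

```python
import numpy as np


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Raw varimax rotation by Kaiser's pairwise planar rotations.

    loadings: (n_points, n_modes). Returns (rotated_loadings, R) with
    rotated_loadings = loadings @ R and R orthogonal.
    """
    L = np.array(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for m in range(j + 1, k):
                x, y = L[:, j], L[:, m]
                u, v = x**2 - y**2, 2.0 * x * y
                a, b = u.sum(), v.sum()
                c = np.sum(u**2 - v**2)
                d = 2.0 * np.sum(u * v)
                # Angle that maximises the criterion for this pair of columns.
                phi = 0.25 * np.arctan2(d - 2.0 * a * b / p, c - (a**2 - b**2) / p)
                G = np.array([[np.cos(phi), -np.sin(phi)],
                              [np.sin(phi), np.cos(phi)]])
                L[:, [j, m]] = L[:, [j, m]] @ G
                R[:, [j, m]] = R[:, [j, m]] @ G
                largest = max(largest, abs(phi))
        if largest < tol:  # no pair of modes wants to rotate any more
            break
    return L, R


def varimax_criterion(L):
    """Sum over modes of the variance of the squared loadings."""
    return float(np.sum(np.var(np.asarray(L) ** 2, axis=0)))


def rotated_eofs(field, n_modes):
    """EOF loadings/scores of a (n_days, n_points) field and their varimax rotation."""
    x = field - field.mean(axis=0)
    n = x.shape[0]
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    loadings = vt[:n_modes].T * (s[:n_modes] / np.sqrt(n))  # (n_points, n_modes)
    scores = u[:, :n_modes] * np.sqrt(n)                    # unit-variance PCs
    rot_loadings, R = varimax(loadings)
    # Rotating the scores by the same orthogonal R keeps scores @ loadings.T unchanged.
    return (loadings, scores), (rot_loadings, scores @ R)
```

The rotated time series `scores @ R` play the role of the varimax predictors. Since R is orthogonal, the rotated pair reproduces exactly the same truncated field as the unrotated EOFs; only the split into individual modes changes, which is what alters their physical interpretation.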

The method presented can be extended to the downscaling of any binary event. This can be a directly binary event, such as a specific complex atmospheric phenomenon that either has or has not been observed, e.g. the transition of a tropical depression into a tropical cyclone. As in this study, any continuous variable can also be transformed into a binary time series by applying a threshold. The threshold does not necessarily correspond to a high quantile; it can be set to any other user-relevant level, such as above/below normal, which is important in seasonal forecasting. Even non-meteorological events, such as the occurrence of certain phenological phases (e.g. the flowering of cherry trees on a specific day in spring), can be predicted through logistic regression in combination with pattern-based covariates.
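The two steps named in this paragraph, turning a continuous variable into exceedances of a threshold and fitting a logistic regression on a pattern-based covariate, can be sketched as follows. The helpers `to_binary` and `fit_logistic` are hypothetical; the fit uses plain iteratively reweighted least squares (Newton–Raphson) rather than any particular statistics package, and the synthetic covariate merely stands in for a daily projection of a field onto a pattern.

```python
import numpy as np


def to_binary(x, threshold=None, quantile=None):
    """0/1 exceedance series from a continuous variable.

    Give either an absolute threshold or a sample quantile, e.g. 0.9 for
    'heavy' events or 0.5 for above/below the median ('normal').
    """
    x = np.asarray(x, dtype=float)
    if (threshold is None) == (quantile is None):
        raise ValueError("give exactly one of threshold or quantile")
    if quantile is not None:
        threshold = np.quantile(x, quantile)
    return (x > threshold).astype(int)


def fit_logistic(X, y, n_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression with an intercept, via IRLS."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        w = p * (1.0 - p)
        # Newton step: (X' W X) step = X' (y - p)
        step = np.linalg.solve(Xd.T @ (Xd * w[:, None]), Xd.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical example: a daily pattern amplitude and a rainfall-like amount.
rng = np.random.default_rng(42)
amplitude = rng.standard_normal(3000)
rain = np.exp(0.8 * amplitude + rng.standard_normal(3000))
y = to_binary(rain, quantile=0.9)  # top 10 % of days
beta = fit_logistic(amplitude, y)
prob = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * amplitude)))
```

Replacing `quantile=0.9` by `quantile=0.5` gives an above/below-median event of the kind used in seasonal forecasting, and the same fit applies unchanged to any other 0/1 series, such as "observed / not observed".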


The authors gratefully acknowledge two anonymous reviewers, whose accurate and thorough comments were most helpful in improving the manuscript. This work was funded by the NSFC/DFG joint funding programme Land Use and Water Resources Management under Changing Environmental Conditions (NSFC project number: 40911130506). T. Simon would like to thank the GSP members around Steve Sain and Doug Nychka for fruitful discussions during his visit to NCAR, which was supported by the CISL visitor programme RSVP.


  1. Bachner, S., Kapala, A. and Simmer, C. 2008. Evaluation of daily precipitation characteristics in the CLM and their sensitivity to parameterizations. Meteorol. Z. 17(4), 407–419.

  2. Benestad, R. 2004. Empirical-statistical downscaling in climate modeling. Eos 85(42), 417.

  3. Benestad, R., Nychka, D. and Mearns, L. 2012. Specification of wet-day daily rainfall quantiles from the mean value. Tellus A 64, 14981.

  4. Bentzien, S. and Friederichs, P. 2012. Generating and calibrating probabilistic quantitative precipitation forecasts from the high-resolution NWP model COSMO-DE. Weather Forecast. 27, 988–1002.

  5. Bierdel, L., Friederichs, P. and Bentzien, S. 2012. Spatial kinetic energy spectra in the convection-permitting limited-area NWP model COSMO-DE. Meteorol. Z. 21, 245–258.

  6. Bremnes, J. 2004. Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Weather Rev. 132(1), 338–347.

  7. Bretherton, C., Widmann, M., Dymnikov, V., Wallace, J. and Bladé, I. 1999. The effective number of spatial degrees of freedom of a time-varying field. J. Climate 12(7), 1990–2009.

  8. Brier, G. 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78(1), 1–3.

  9. Chang, C. 2004. East Asian Monsoon (Vol. 2). World Scientific, Singapore.

  10. Chang, C., Yi, L. and Chen, G. 2000. A numerical simulation of vortex development during the 1992 East Asian summer monsoon onset using the Navy's regional model. Mon. Weather Rev. 128(6), 1604–1631.

  11. Chen, S. and Dell'Osso, L. 1984. Numerical prediction of the heavy rainfall vortex over eastern Asia monsoon region. J. Meteorol. Soc. Jpn. 62(5), 730–747.

  12. Cooley, D., Nychka, D. and Naveau, P. 2007. Bayesian spatial modeling of extreme precipitation return levels. J. Am. Stat. Assoc. 102(479), 824–840.

  13. Ding, Y. and Chan, J. 2005. The East Asian summer monsoon: an overview. Meteorol. Atmos. Phys. 89(1), 117–142.

  14. Ding, Y., Zhang, Y., Ma, Q. and Hu, G. 2001. Analysis of the large-scale circulation features and synoptic systems in East Asia during the intensive observation period of GAME/HUBEX. J. Meteorol. Soc. Jpn. 79(1B), 277–300.

  15. Dommenget, D. and Latif, M. 2002. A cautionary note on the interpretation of EOFs. J. Climate 15(2), 216–225.

  16. Ebell, K., Bachner, S., Kapala, A. and Simmer, C. 2008. Sensitivity of summer precipitation simulated by the CLM with respect to initial and boundary conditions. Meteorol. Z. 17(4), 421–431.

  17. Efron, B. and Tibshirani, R. 1993. An Introduction to the Bootstrap (Vol. 57). Chapman & Hall/CRC, London.

  18. Frei, C., Schöll, R., Fukutome, S., Schmidli, J. and Vidale, P. 2006. Future change of precipitation extremes in Europe: intercomparison of scenarios from regional climate models. J. Geophys. Res. 111(D6), D06105.

  19. Friederichs, P. and Hense, A. 2007. Statistical downscaling of extreme precipitation events using censored quantile regression. Mon. Weather Rev. 135(6), 2365–2378.

  20. Gao, X., Shi, Y., Song, R., Giorgi, F., Wang, Y. and co-authors. 2008. Reduction of future monsoon precipitation over China: comparison between a high resolution RCM simulation and the driving GCM. Meteorol. Atmos. Phys. 100(1), 73–86.

  21. Gneiting, T., Balabdaoui, F. and Raftery, A. 2007. Probabilistic forecasts, calibration and sharpness. J. Roy. Stat. Soc. B 69(2), 243–268.

  22. Guo, Q. 1983. The summer monsoon index in East Asia and its variation (in Chinese). Acta Geogr. Sin. 38, 208–217.

  23. Hannachi, A., Jolliffe, I. and Stephenson, D. 2007. Empirical orthogonal functions and related techniques in atmospheric science: a review. Int. J. Climatol. 27(9), 1119–1152.

  24. Hastie, T., Tibshirani, R. and Friedman, J. 2008. The Elements of Statistical Learning. 2nd ed. Springer Series in Statistics, Springer, Berlin.

  25. Jolliffe, I. 2002. Principal Component Analysis. 2nd ed. Springer Series in Statistics, Springer, New York.

  26. Kron, W. 2009. Flood insurance: from clients to global financial markets. J. Flood Risk Manage. 2(1), 68–75.

  27. Kuo, Y., Cheng, L. and Bao, J. 1988. Numerical simulation of the 1981 Sichuan flood. Part I: Evolution of a mesoscale southwest vortex. Mon. Weather Rev. 116(12), 2481–2504.

  28. Li, J. and Zeng, Q. 2002. A unified monsoon index. Geophys. Res. Lett. 29(8), 115-1.

  29. Lindau, R. and Simmer, C. 2013. On correcting precipitation as simulated by the regional climate model COSMO-CLM with daily rain gauge observations. Meteorol. Atmos. Phys. 119, 31–42.

  30. Maraun, D., Wetterhall, F., Ireson, A., Chandler, R., Kendon, E. and co-authors. 2010. Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys. 48(3), RG3003.

  31. McCullagh, P. and Nelder, J. 1989. Generalized Linear Models. Chapman & Hall/CRC, London.

  32. Mesinger, F. and Veljovic, K. 2013. Limited area NWP and regional climate modeling: a test of the relaxation vs Eta lateral boundary conditions. Meteorol. Atmos. Phys. 119, 1–16.

  33. Michaelsen, J. 1987. Cross-validation in statistical climate forecast models. J. Clim. Appl. Meteorol. 26(11), 1589–1600.

  34. Murphy, J. 1999. An evaluation of statistical and dynamical techniques for downscaling local climate. J. Climate 12(8), 2256–2284.

  35. Neelin, J., Battisti, D., Hirst, A., Jin, F., Wakata, Y. and co-authors. 1998. ENSO theory. J. Geophys. Res. 103(C7), 14261–14290.

  36. Orlowsky, B., Bothe, O., Fraedrich, K., Gerstengarbe, F. and Zhu, X. 2010. Future climates from bias-bootstrapped weather analogs: an application to the Yangtze river basin. J. Climate 23(13), 3509–3524.

  37. Park, E., Hong, S. and Kang, H. 2008. Characteristics of an East Asian summer monsoon climatology simulated by the RegCM3. Meteorol. Atmos. Phys. 100(1), 139–158.

  38. Piao, S., Ciais, P., Huang, Y., Shen, Z., Peng, S. and co-authors. 2010. The impacts of climate change on water resources and agriculture in China. Nature 467(7311), 43–51.

  39. R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Online at:

  40. Roeckner, E. 2003. The atmospheric general circulation model ECHAM 5. Part I: Model description. Max-Planck-Institut für Meteorologie, Report 349, 127 pp.

  41. Schölzel, C. and Hense, A. 2011. Probabilistic assessment of regional climate change in southwest Germany by ensemble dressing. Clim. Dynam. 36(9), 2003–2014.

  42. Skamarock, W. 2004. Evaluating mesoscale NWP models using kinetic energy spectra. Mon. Weather Rev. 132(12), 3019–3032.

  43. Tao, S. and Ding, Y. 1981. Observational evidence of the influence of the Qinghai-Xizang (Tibet) plateau on the occurrence of heavy rain and severe convective storms in China. B. Am. Meteorol. Soc. 62(1), 23–30.

  44. Trenberth, K., Dai, A., Rasmussen, R. and Parsons, D. 2003. The changing character of precipitation. B. Am. Meteorol. Soc. 84(9), 1205–1218.

  45. Uppala, S., Kållberg, P., Simmons, A., Andrae, U., Bechtold, V. and co-authors. 2005. The ERA-40 re-analysis. Q. J. Roy. Meteor. Soc. 131(612), 2961–3012.

  46. Von Storch, H. and Zwiers, F. 2002. Statistical Analysis in Climate Research. Cambridge University Press, Cambridge.

  47. Vrac, M. and Naveau, P. 2007. Stochastic downscaling of precipitation: from dry events to heavy rainfalls. Water Resour. Res. 43, W07402.

  48. Wang, B. 2006. The Asian Monsoon. Springer, Berlin.

  49. Wang, B., Ding, Q., Fu, X., Kang, I., Jin, K. and co-authors. 2005. Fundamental challenge in simulation and prediction of summer monsoon rainfall. Geophys. Res. Lett. 32(15), L15711.

  50. Wang, B. and Fan, Z. 1999. Choice of South Asian summer monsoon indices. B. Am. Meteorol. Soc. 80, 629–638.

  51. Wang, B. and Orlanski, I. 1987. Study of a heavy rain vortex formed over the eastern flank of the Tibetan plateau. Mon. Weather Rev. 115(7), 1370–1393.

  52. Wang, B., Wu, R. and Fu, X. 2000. Pacific–East Asian teleconnection: how does ENSO affect East Asian climate? J. Climate 13(9), 1517–1536.

  53. Wang, B., Wu, Z., Li, J., Liu, J., Chang, C. and co-authors. 2008. How to measure the strength of the East Asian summer monsoon. J. Climate 21(17), 4449–4463.

  54. Webster, P. and Yang, S. 1992. Monsoon and ENSO: selectively interactive systems. Q. J. Roy. Meteor. Soc. 118(507), 877–926.

  55. Wilby, R. and Wigley, T. 2000. Precipitation predictors for downscaling: observed and general circulation model relationships. Int. J. Climatol. 20(6), 641–661.

  56. Wilby, R., Wigley, T., Conway, D., Jones, P., Hewitson, B. and co-authors. 1998. Statistical downscaling of general circulation model output: a comparison of methods. Water Resour. Res. 34(11), 2995–3008.

  57. Wilks, D. 2011. Statistical Methods in the Atmospheric Sciences (Vol. 100). Academic Press, Boston.

  58. Winkler, R. 1994. Evaluating probabilities: asymmetric scoring rules. Manage. Sci. 40, 1395–1405.

  59. Zhai, P., Zhang, X., Wan, H. and Pan, X. 2005. Trends in total precipitation and frequency of daily precipitation extremes over China. J. Climate 18(7), 1096–1108.

  60. Zhang, Q., Sun, P., Singh, V. and Chen, X. 2012. Spatial-temporal precipitation changes (1956–2000) and their implications for agriculture in China. Global Planet. Change 82, 86–95.

  61. Zhao, C., Wang, Y., Yang, Q., Fu, R., Cunnold, D. and co-authors. 2010a. Impact of East Asian summer monsoon on the air quality over China: view from space. J. Geophys. Res. 115, D09301.

  62. Zhao, P., Zhu, Y. and Zhang, Q. 2010b. A summer weather index in the East Asian pressure field and associated atmospheric circulation and rainfall. Int. J. Climatol. 32, 375–386.

  63. Zong, Y. and Chen, X. 2000. The 1998 flood on the Yangtze, China. Nat. Hazards 22(2), 165–184.
