## 1. Introduction

Smoothness is an important characteristic of a spatial process that measures local variability of the process. Due to the presence of strong spatial correlation in most climate variables, the values of a climate variable at nearby locations are considered simultaneously in many studies. Realistic climate models are expected to produce plausible values of climate variables, not only at each grid pixel, but also at its nearby grid pixels, in order to accurately describe spatial variation of the true process. Therefore, the smoothness is an important measure to validate climate models in terms of their ability to simulate fine scale spatial variability of climate variables. The aim of this paper is twofold: first, we seek to validate the spatial smoothness of a (temporal) long-term average climate variable, by season and by climate regions; and second, we compare the smoothness of climate model outputs from the Coupled Model Intercomparison Project Phase 5 (CMIP5) and the meteorological reanalysis data. The spatial smoothness is estimated by assuming isotropy within each climate region, and we assess the nonstationarity of the climate model outputs by assuming that the smoothness varies by region. We also assess the similarities in the smoothness across climate model ensembles. If all climate ensembles represent the same true phenomenon but deviate from the truth by random errors, the difference in the smoothness of ensemble realisations and reanalysis data should be negligible without any patterns across the climate ensembles.

We consider multidecadal averages of land surface temperatures in CMIP5 experiments. CMIP5 comprises a standard set of coordinated climate change experiments of 60 deterministic climate models. It is processed by the Working Group on Coupled Modelling of the World Climate Research Programme, who has gathered around 20 climate modelling groups from across the world. Outputs are archived in a common format and can be downloaded from the Program for Climate Model Diagnosis and Intercomparison web site (PCMDI, http://www-pcmdi.llnl.gov/). **Taylor et al. (2012a)** presented an overview of the experimental designs in CMIP5 and **Knutti and Sedláček (2013)** described detailed characteristics of climate model projections in CMIP5. The reanalysis data that we consider are from the National Centers for Environmental Prediction and the National Center for Atmospheric Research (NCEP/NCAR). The NCEP/NCAR reanalysis data set is a gridded data set representing the state of the Earth's atmosphere, incorporating observations and numerical weather prediction model output. It is provided by the Earth System Research Laboratory in the National Oceanic and Atmospheric Administration (http://www.esrl.noaa.gov/psd/). Kalnay et al. (**1996**) and Kistler et al. (**2001**) presented the details of the NCEP/NCAR reanalysis.

We introduce the concept of a locally self-similar process, and we model long-term average land surface temperature anomalies as a Gaussian locally self-similar process. The local self-similarity is a weaker (and thus more general) assumption than the Matérn covariance function (a widely used covariance model that enables modelling the spatial smoothness of a stochastic process) and is also capable of modelling spatial smoothness. The estimation of smoothness is done by optimising the composite restricted likelihood for the smoothness parameter of a locally self-similar process (Stein et al., **2004**; Lee, **2012**). The composite likelihood is a general term for any product of marginal or conditional likelihoods (Varin et al., **2011**). It has been widely used as a substitute of likelihood, when the likelihood calculation is difficult. This paper considers a product of conditional likelihoods of neighbouring observations. Since nearby observations contain most information on the local behaviour of a process, our approach balances statistical and computational efficiency in estimating the smoothness of the process. In addition to the composite likelihood approach, Gaussian Markov Random Fields form another approximation approach applicable (Rue and Held, **2005**). However, we chose the composite likelihood approach since its implementation is closer to traditional statistical practice and may be more familiar to the climate science community.

The remainder of this paper is organised as follows: Section 2 describes the statistical methodology used in the estimation of the smoothness parameter; Section 3 analyses long-term average near-surface air temperature anomalies in CMIP5 and NCEP/NCAR reanalysis by region and season; and Section 4 summarises the key findings and proposes related future work. Details on the statistical methodologies are provided in the Appendix.

## 2. Methodology

### 2.1. Locally self-similar process

Suppose that we observe a mean zero and isotropic Gaussian process, $\mathrm{\mathcal{E}}\subset {\mathbb{R}}^{3}$, for $\mathrm{\mathcal{E}}\subset {\mathbb{R}}^{3}$. We assume that the semi-variogram of *Z*, γ(·), follows a power function around the origin. That is, for all **s** and $\mathrm{\mathcal{E}}\subset {\mathbb{R}}^{3}$, with the Euclidean norm, $\mathrm{||||}$·$\mathrm{||}$:

*=(*

**θ***C*,

*H*), for the

*scale parameter*,

*C*>0, and the

*smoothness parameter*,

*H*∈(0,1) (Gneiting et al.,

**2012**). The notation $o(\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2H})$ means that $\{\gamma (\mid \mid \mathbf{s}-\mathbf{u}\mid \mid )-C\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2H}\}$/$\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2H}\to 0$, as $\mid \mid \mathbf{s}-\mathbf{u}\mid \mid \to 0$. The scale parameter controls the overall size of the variation of the process. And the larger the smoothness parameter is, the smoother the realised surfaces of

*Z*. A process that satisfies eq. (1) is called

*locally self-similar*, since a self-similar process with index

*H*satisfies $\gamma (\mid \mid \mathbf{s}-\mathbf{u}\mid \mid )$=$C\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2H}$ for all

**s**and $\mathbf{u}\in {\mathbb{R}}^{3}$ (Samorodnitsky and Taqqu,

**1994**; Genton et al.,

**2007**). The index

*H*determines the smoothness of the self-similar process, and it has been widely used as a measure of surface roughness for various natural phenomena, such as the surface of soil, surface height measurements of computer chips, etc. [see Mandelbrot and Wallis (

**1969**) and Adler (

**1981**) for some application examples]. Note that the quantity, 2

*H*, is essentially the fractal index for the mean zero isotropic Gaussian process,

*Z*(Gneiting and Schlather,

**2004**; Gneiting et al.,

**2012**).

A locally self-similar process is more general than a self-similar process, in the sense that it does not fully specify the variogram but only in a region around the origin. Indeed, mean zero Gaussian processes with many widely used parametric covariance functions are locally self-similar. These include powered exponential, generalised Cauchy or Matérn covariances (Gneiting et al., **2012**, Table 1). Among these, the most widely used, the isotropic Matérn covariance, takes the following form:

**s**and

**h**, where Γ(·) is the gamma function and ${\mathrm{\mathcal{K}}}_{\nu}$ is a Bessel function of the second kind of order

*ν*. The parameters of the Matérn covariance function are the partial sill, σ

^{2}>0, the range,

*φ*>0, and the smoothness parameter,

*ν*>0. The range parameter controls the rate of correlation decay with distance and the partial sill measures the size of variation of the process. A mean zero Gaussian process with the Matérn covariance satisfies eq. (1) with

*H*=

*ν*and

*C*=σ

^{2}2

^{−1−ν}

*φ*

^{−2ν}Γ(1−

*ν*)/Γ(1+

*ν*) when 0 <

*ν*<1. Note that our variogram model in eq. (1) is more general than the Matérn model as we only specify the variogram near the origin.

In this paper, we restrict our attention to the case when 0 < *H*<1, under which eq. (1) becomes a statistically valid variogram. Many natural phenomena satisfy this condition. For example, Tuck (**2008**, p. 14, 41) studied atmospheric variability and observed that temperature is smoother than wind speed. The scaling exponent, which is the same as 2*H* of eq. (1), of the temperature was shown to be close to but less than unity. Lovejoy and Schertzer (**1985**, p. 1235) pointed out empirically that 0 < *H*<1 in the rate of energy transfer, buoyancy, velocity, temperature fluctuations, radar reflectivity and cloud drop volumes. **North et al. (2011)** found that the spatial covariance of temperature fields based on simple energy balance climate models follows the Matérn covariance with *ν*=1, and that *ν*<1 is expected due to rough landscapes. **Sun et al. (2015)** mentioned that precipitation amounts become smoother when summed over longer periods and they showed numerically that the smoothness of long-term precipitation amounts is less than *ν=*0.5. We determine that the smoothness of multidecadal average near-surface air temperature anomalies is between zero and one in Section 3.

One thing to note is that the estimated smoothness may depend on the grid resolution of the climate models. In the estimation procedure described in Section 2.2, the relationship, eq. (1), is applied to the number (*k*=3,…,10) of neighbouring observations. As shown in Table 1, climate models in CMIP5 have various grid resolutions. In Section 3, we check the effect of spatial grid resolution on the estimated smoothness.

### 2.2. Composite likelihood

To estimate the scale and smoothness parameters of a locally self-similar process, we consider the composite restricted likelihood of * θ*. We briefly introduce the idea of composite likelihood as opposed to the likelihood method in this section. Further details on how to calculate composite restricted likelihoods are given in the Appendix.

The idea of restricted likelihood is used to estimate variogram parameters without estimating nuisance parameters such as E{*Z*(·)} or Var{*Z*(·)} (Kitanidis, **1983**). It is a marginal likelihood associated with any *N*–1 linearly independent error contrasts, mean zero linear combination of the observations. Since a locally self-similar process does not fully specify the variogram, we have neither the exact likelihood nor the restricted likelihood of * θ*. Therefore, we approximate the restricted likelihood of

*by the composite restricted likelihood, similarly to Stein et al. (*

**θ****2004**) and

**Lee (2012)**.

Let us first sketch the idea to obtain a composite likelihood. Suppose that *Z*(·) is observed at *N* locations, {**s**_{1},…,**s**_{N}}. Let *p*(·; * θ*) indicate a generic probability density function, possibly conditional density. We order the observation locations by starting from a random location,

**s**

_{1}, then selecting

**s**

_{i}to be the nearest location to any of {

**s**

_{1},…,

**s**

_{i−1}} among the remaining locations, for

*i*≥2. If there are two or more locations at equal distance from the set {

**s**

_{1},…,

**s**

_{i−1}}, we choose one randomly. The likelihood of

*is*

**θ****s**

_{i}, define

*k*locations in proximity of

**s**

_{i}, among the previously selected locations as $\{{\mathbf{s}}_{i,1},\dots ,{\mathbf{s}}_{i,k}\}$$\subset $$\{{\mathbf{s}}_{1},\dots ,{\mathbf{s}}_{i-1}\}$, for

*i*>

*k*. Since closely located observations are highly correlated and informative about the smoothness of the process, the

*composite likelihood*approximates eq. (3) by conditioning on {

**s**

_{i,1},…,

**s**

_{i,k}} only:

*k*denotes the size of the conditioning set. The composite likelihood, eq. (4), is associated with the statistical optimal property if

*Z*follows a Gaussian process. For a Gaussian probability density,

*p*, $p(Z({\mathbf{s}}_{i})\mid $$Z({\mathbf{s}}_{i,1}),\dots ,Z({\mathbf{s}}_{i,k});q)$ is the density of the error of the best linear predictor of

*Z*(

**s**

_{i}) based on $Z({\mathbf{s}}_{i,1}),\dots ,Z({\mathbf{s}}_{i,k})$. Also, the approximation in eq. (4) requires

*O*(

*k*

^{3}

*N*) operations while the likelihood requires

*O*(

*N*

^{3}) operations. It is especially beneficial for large irregularly spaced observations where the likelihood calculation is computationally demanding.

The composite restricted log-likelihood, ${\tilde{rl}}_{k}(q)$, provided in the Appendix, is defined similarly by applying the idea of the composite likelihood to the logarithm of the restricted likelihood. Our estimator, , is then defined as a value that maximises the composite restricted log-likelihood. We consider the conditioning set of size *k*=3,…,10 in Section 3. We assess the variance of by the sandwich estimator, a widely used measure of the variance of estimators from an estimating equation, $\nabla {\tilde{rl}}_{k}(q)=0$. Here, ∇ denotes the vector of partial derivatives with respect to * θ*. Then we have is asymptotically normal with asymptotic covariance matrix

See Lindsay (**1988**) and **Godambe and Heyde (2010)** for more details.

## 3. Analysis

### 3.1. Data

Climate model outputs from CMIP5 consist of 3 and 6 hourly, daily, monthly and annual mean values of more than 404 ocean, land and atmosphere related climate variables for decadal hindcasts and predictions. The NCEP/NCAR reanalysis data consist of 6 hourly, daily and monthly mean values of atmospheric variables from January 1948 to the most recent month. In this paper, we analyse the long-term average near-surface air temperatures measured at 2 m above ground at gridded locations on the Earth from 1979 to 2005, the time period common to all climate models in CMIP5 and the NCEP/NCAR reanalysis.

We analyse 191 ensemble runs from the 48 climate models in CMIP5 (experiment 3.2). Each climate model has 1–25 ensemble replicates that are initialised under different or the same initial conditions but produced by different perturbed versions of the same model (**Taylor et al., 2012b**). Ensemble replicates are treated and interpreted independently from each other, and their spatial resolutions vary from ensemble to ensemble. Table 1 lists the climate models in CMIP5 and the NCEP/NCAR reanalysis data set used in this paper, with their grid resolutions and the numbers of ensemble replicates. The climate models are numbered in ascending order of the number of grid pixels. The model number thus represents the rank of the spatial resolution of the climate model. For the climate models with the same spatial resolutions, lower model numbers are given to the ones with smaller average estimated smoothness over the regions.

We focus on the mean surface temperatures in Boreal winter (December, January, February; DJF) and summer (June, July, August; JJA), averaged over 27 yr. That is, at each location, we use multidecadal averages of land surface air temperatures during DJF and JJA. Also, we divide the land area except for Antarctica into the 21 climate regions that are used in Giorgi and Francisco (**2000**). There are two main reasons for dividing the land areas into climate regions. It is common that the smoothness varies spatially in climate variables. Also, the distance between grid points becomes smaller in regions at higher latitudes. Since the estimated smoothness parameter depends on the resolution of the observed process, dividing regions where observations are separated by similar spacing is reasonable. The climate regions are shown in Fig. 1. The sizes of the regions vary from 807 to 6735 km in the north-south and east-west directions. Each region contains from 12 to 7649 grid pixels of the ensemble outputs from CMIP5, depending on the grid resolutions of the ensembles. The minimum spacing between grid locations at the equator ranges from 83 to 417 km.

### 3.2. Models

Denote the entire study region as $\mathrm{\mathcal{D}}$. Then, partition into the climate regions, $\mathrm{\mathcal{D}}={\cup}_{r=1}^{21}{\mathrm{\mathcal{D}}}_{r}$. Let ${T}_{ijl}(\mathbf{s})$ be a multidecadal average of near-surface air temperature at grid location **s**∈$\mathrm{\mathcal{D}}$ for climate model *j*=1,…,48, ensemble replicate *l*, during DJF and JJA, for *i*=1 and *i*=2, respectively. The number of ensemble replicates varies by climate model. Let ${\mu}_{ijl}(\mathbf{s})=\text{E}\{{T}_{ijl}(\mathbf{s})\}$ be the mean of the multidecadal average and ${\mathrm{\epsilon}}_{ijl}(\mathbf{s})$ be the anomaly (residual) at location **s**∈$\mathrm{\mathcal{D}}$, such that

*µ*

_{ijl}, and make the anomaly field close to mean zero. Spherical harmonics, $\{{P}_{n}^{m}(sinL)cos(ml),$${P}_{n}^{m}(sinL)sin(ml)\mid $$n=0,1,2,\dots ,m=$$0,\dots ,min(3,n)\},$ where $-\pi /2\le L\le \pi /2$ is the latitude, $-\pi <l\le \pi $ is the longitude, and ${P}_{n}^{m}$ is the Legendre polynomial of degree

*n*and order

*m*, provide a natural basis for capturing large-scale spatial patterns (Stein,

**2007**). Because surface temperatures are closely related to altitude, we estimate

*µ*

_{ijl}by regressing on the altitude from the sea level in addition to spherical harmonics for

*n*=12, for each climate ensemble realisation in CMIP5 and the NCEP/NCAR reanalysis. The choice of

*n*=12 is made following the literature dealing with similar data sets (Jun and Stein,

**2008**; Stein,

**2008**; Jun,

**2011**,

**2014**).

After the mean filtering through regression, we assume that *ɛ*_{ijl} in eq. (5) is a mean zero, locally self-similar Gaussian process that satisfies for **s** and $\mathbf{u}\in {\mathrm{\mathcal{D}}}_{r}$, $\frac{1}{2}\text{E}{\{{\mathrm{\epsilon}}_{ijl}(\mathbf{s})-{\mathrm{\epsilon}}_{ijl}(\mathbf{u})\}}^{2}=$${C}_{ijrl}\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2{H}_{ijrl}}+$$o(\mid \mid \mathbf{s}-\mathbf{u}\mid {\mid}^{2{H}_{ijrl}}),$ as $\mid \mid \mathbf{s}-\mathbf{u}\mid \mid \to 0$, for *r*=1,…,21. The smoothness of the temperature anomalies in the NCEP/NCAR reanalysis is defined similarly. Since ${\mathrm{\epsilon}}_{ijl}(\mathbf{s})$ is a multidecadal average of temperature anomalies, its distribution may be close to a Gaussian distribution.

The top panels in Figs. 2 and 3 show the multidecadal average near-surface air temperature, *T*_{ijl}, the estimated mean, ${\mu}_{ijl}$, and the anomaly, *ɛ*_{ijl}, in the reanalysis and GFDL-CM3 data, by season. The spherical harmonics terms and the altitude capture most of the patterns in the mean, and the anomalies do not have noticeable large-scale spatial patterns. Figure 4 compares the minimum, median and maximum values of the anomalies, shown in the bottom panel of Fig. 2, by climate region and season. In all regions, the medians of the anomalies are around zero and the ranges of the anomalies are similar regardless of season and region, except for ALA and GRL. The spatial patterns of the mean and residuals displayed in Figs. 2 and 4 are similar to patterns created by other ensemble models.

### 3.3. Estimation of the smoothness

We estimate the smoothness parameter, *H*, of the anomalies of multidecadal average land surface temperature in the NCEP/NCAR reanalysis and CMIP5 by maximising the composite restricted likelihood with a conditioning set of size *k*, *k*=3,…,10. Figure 5 shows the changes in the smoothness estimates by increasing *k*. Each plot represents a climate model in CMIP5 and each curve represents a climate region in Fig. 1. Among the climate regions, WNA, SAH, NAS, AMZ and TIB are coloured. We explain the reason for choosing these specific regions after showing the estimated smoothness of the NCEP/NCAR reanalysis. The smoothness estimates become quickly stabilised as *k* increases. The estimated smoothness is always less than unity, regardless of the size of the conditioning set. Hereafter, we present the estimated smoothness using the composite restricted likelihood with a conditioning set of size *k*=5.

Figure 6 maps the smoothness estimates of the temperature anomalies in the NCEP/NCAR reanalysis of the corresponding climate regions during DJF and JJA. Similarly, a smoothness map is drawn for each of the climate model ensembles in CMIP5. The ensemble replicates of the same climate model have almost the same smoothness in all climate regions. The climate model outputs generated from the same modelling institution also have similar smoothness in all land regions. Therefore, in order to save space, Figs. 7 and 8 show the smoothness maps of 15 climate models that have distinct patterns, for DJF and JJA, respectively.

The estimated smoothness for the NCEP/NCAR reanalysis is larger than 0.5 in all regions except CNA ($\stackrel{\u02c6}{H}=0.041$) during JJA. The regions near the North Pole (WNA, NAS, NEU, GRL, ALA) have smoother surface temperature anomaly fields during DJF than during JJA, while the regions near the equator (SAH, SAS) have smoother fields during JJA. The seasonal difference in the smoothness is small in other regions. This is the reason why we coloured WNA, SAH, NAS, AMZ and TIB in Fig. 5. Among these, the first three and the last two represent the climate regions with large and small seasonal differences, respectively. This pattern appears in GCESS, CMCC-CESM, CMCC-CMS and BCC in CMIP5. The rest of the climate models in CMIP5 exhibit similar smoothness for JJA and DJF.

Figures (**6**–**8**) show quite a lot of regional variation in each smoothness map. The smoothness maps from CMIP5 also vary across climate models. The climate modelling institution and the climate region are the main factors that determine the smoothness of the surface temperature anomalies. The relative regional characteristics, however, do not change across climate models. Figure 9 plots the smoothness of the temperature anomalies in CMIP5 against the climate model number, i.e. the rank of the spatial resolution of the climate model. There are 21 curves, each of which represents a climate region. Again, the curves that represent WNA, SAH, NAS, AMZ and TIB regions are coloured. Generally, all the curves in Fig. 9 resemble each other. This implies that the climate model is the primary factor that determines the estimates for the smoothness, and the spatial resolution of the model has a weaker effect than the climate model on the estimated smoothness. Each climate model generates the relative regional characteristics well, while the average level of smoothness differs for each climate model. The climate models developed by NASA GISS, IPSL, and MOHC produce rougher temperature anomaly fields than do the other climate models over all climate regions during both seasons. The crosses at model number 22 indicate the smoothness of the NCEP/NCAR reanalysis, as the resolution of the reanalysis is similar to the resolution of climate model 22.

Some smoothness estimates are near the boundaries of the range of the smoothness parameter, *H*≈0, suggesting that the estimation failed. For the NCEP/NCAR reanalysis data, we fail to estimate the smoothness over the region CNA during JJA. In CNA, there were seven pairs of neighbouring observations out of 90 observations, which temperature anomalies differ significantly. Furthermore, the failure of the estimation of CMIP5 happens when we do not have sufficient number of data, for the models with coarse grid resolution. Climate models 1 (CMCC-CESM) and 2 (HadCM3) of CMIP5 fail to estimate the smoothness of CAM, CNA, ALA or NEU during JJA or DJF. Climate model 1 has a spatial resolution of 96×48 on the Earth, and there are only 12 and 25 observations for CAM and CNA, respectively.

Figures (**10**–**12**) show the estimated scale parameters of the multidecadal average near-surface air temperature anomalies in reanalysis and CMIP5 during JJA and DJF. The values are plotted on a logarithmic scale due to wide range of scale parameter estimates for some climate models and/or climate regions (this is for the display purposes only). The climate models developed by NASA GISS and IPSL give large estimates of the scale parameters for MED, CAS and TIB, while NASA GISS gives large estimates for AMZ, CAM, GRL and NEU. Relatively large estimates of the scale parameters occur together with relatively little smoothness. NASA GISS and IPSL produce rougher temperature anomalies in those regions. In other models and regions, the estimates of scale parameters range from 0.0002 to 1.42 (in logarithmic scale). The standard errors of the smoothness and scale parameter estimates were mostly small, except for the climate models developed by NASA GISS, MOHC, and IPSL that produce rough land surface temperature anomaly fields (not shown).

## 4. Discussion and conclusion

The smoothness of a spatial process is one of the important measures of spatial dependence. This paper estimates the spatial smoothness of multidecadal averages of land surface temperature anomalies in the NCEP/NCAR reanalysis and the CMIP5 multimodel ensembles by climate region and season. The temperature anomaly field of the NCEP/NCAR reanalysis becomes smoother if the average temperature of the field is extremely high or low. This pattern appears in some climate models from CMIP5, while the others exhibit similar smoothness during JJA and DJF. The smoothness of the multidecadal average near-surface air temperature anomalies in CMIP5 depends primarily on the modelling institution and the region. Interestingly, there are strong similarities in the smoothness between the climate models generated from the same institution, which supports observations that have been reported frequently in the literature (Tebaldi and Knutti, **2007**; Jun et al., **2008a**, **b**; Knutti et al., **2010**).

In the future, we plan to examine the smoothness of various climate variables, such as precipitation amount, pressure, wind speed, etc. This can give us more insights into the characteristics of climate regions and climate models. Also, CMIP5 is an archive of well-designed experiments of vast climate models. This paper analyses historical runs (experiment 3.2) of CMIP5 under which all forcings are implemented. Experiments 5.1–5.5 of CMIP5 are composed of simulation runs of the same climate models as in experiment 3.2 but with emissions forcings with fixed or different scenarios of the carbon cycle. The carbon cycle is an important factor that affects surface temperature, as do the presence of sulphate, clouds, interactive aerosols and greenhouse gas emissions. By comparing the smoothness between the climate models with or without forcings, we may test the effect of forcings on the local variation of the surface temperature anomalies.