Earth's weather and climate vary on a wide range of spatio-temporal scales. Whereas external orbital variations are believed to be the dominant driving force for macro-climate (on millennial time scales), weather, macro-weather (Lovejoy, 2018) and climate variations (on shorter time scales) are mainly the result of complex nonlinear interactions between very many degrees of freedom (Sura and Hannachi, 2015) and of the many climate sub-components with different time scales. The atmospheric (and climate) system is an excellent example of a high-dimensional and highly complex dynamical system. One outstanding and ubiquitous feature of the large-scale (and low-frequency) atmospheric (and climate) variability is non-Gaussianity (Franzke et al., 2007; Proistosescu et al., 2016; Pires and Hannachi, 2017; Hannachi and Iqbal, 2019). For instance, Sura and Sardeshmukh (2008) show that sea surface temperature (SST) has a non-Gaussian probability distribution function (PDF) with particular tail extrema. Many processes, e.g. subgrid scales, large-scale teleconnections and nonlinearity, can lead to various kinds of uncertainty, which can limit both our understanding and the accuracy of forecasts. Stochastic modelling can help overcome some of these problems and improve accuracy (Berner et al., 2017).

Although non-Gaussianity and nonlinearity can be considered as two distinct aspects of weather, macro-weather and climate (e.g., Sura and Sardeshmukh, 2008; Sura and Hannachi, 2015), observations do suggest that these are two inter-related characteristic features of the system (e.g., Woollings et al., 2010; Hannachi et al., 2017; and references therein). Some systems can exhibit weak (deterministic) nonlinearity but strong non-Gaussian statistics, which may be explained by a multiplicative or state-dependent noise, as detailed, e.g., in Sura and Sardeshmukh (2008) and Sardeshmukh and Sura (2009); see also Hannachi et al. (2017) for a review. In contrast, non-Gaussianity can also result from systems with strong nonlinearity and additive noise; see e.g. Hannachi et al. (2017) for a review and further references. Notwithstanding this link between nonlinearity and non-Gaussianity, it is sometimes helpful to disentangle the contributions of nonlinearity and stochastic noise to the system PDF. There is a general consensus that understanding non-Gaussian statistics of weather and climate is important for a number of reasons, not least for weather and climate prediction, planning and risk assessment. Extreme events, for instance, which are important in planning and risk assessment, depend closely on the structure of the non-Gaussian PDF. Various sources contribute to the non-Gaussianity. Sura and Hannachi (2015) provide a detailed account of the different sources and mechanisms contributing to the observed non-Gaussianity of the atmospheric large-scale and low-frequency variability. They discussed, in particular, nonlinear regime dynamics, multiplicative noise, cross-frequency coupling, jet stream meandering and nonlinear boundary layer drags. They investigated specifically non-Gaussianity in geopotential heights, jet stream latitude and SST fields, with particular focus on skewness and kurtosis.
Skewness and kurtosis are simple measures of non-Gaussianity in the time domain that do not directly reflect frequencies. A convenient way to examine nonlinearity and/or non-Gaussianity is to use higher-order (e.g., third- and fourth-order) statistics in the frequency domain, namely the bispectrum and trispectrum (e.g. Brillinger and Rosenblatt, 1967; Nikias and Raghuveer, 1987). Spectral analysis of stationary time series is well rooted in the study of time series $x(t)$, $t=0,1,2,\ldots$ (supposed to be zero-mean and unit variance for simplicity) from weather and climate and other fields. The duality between the time and spectral domains allows one to investigate the distribution of, e.g., variance and skewness as a function of spectral bins. For example, the duality between the second-order cumulant, or autocovariance function $\gamma_x(\cdot)$, and the power spectrum $\Gamma_x(f)$ reveals that the integral of the latter is precisely the variance of the time series, i.e.,

$$\gamma_x(0) = E[x(t)^2] = \int_{-1/2}^{1/2} \Gamma_x(f)\, df.$$

This means, in particular, that the power spectrum can be viewed as a decomposition of the variance by frequency bins. Likewise, the duality between the bispectrum $\Gamma_{3,x}(f_1,f_2)$ and the third-order moment or bicovariance $\gamma_x(\tau_1,\tau_2) \equiv E[x(t)x(t+\tau_1)x(t+\tau_2)]$ means that the skewness is the integral of the bispectrum,

$$E[x(t)^3] = \gamma_x(0,0) = \int_{-1/2}^{1/2}\int_{-1/2}^{1/2} \Gamma_{3,x}(f_1,f_2)\, df_1\, df_2.$$

This also implies that the bispectrum can be viewed as a decomposition of the skewness by bins of frequency pairs. This can be used to identify nonlinearly interacting pairs of spectral bins contributing to the skewness which are often attributed to phase locking between components at frequency triplets f1,f2, f1+f2 producing triadic resonance (Pires and Perdigão, 2015).

Bispectral analysis has been used in signal processing for nonlinearity detection and for the study of bicoherence in a number of different fields, including econometrics (Ashley et al., 1986; Rusticelli et al., 2008), acoustics (Richardson and Hodgkiss, 1994), and Earth sciences (Müller, 1987; Biswas et al., 1995; Hocke and Kämpfer, 2008).

Here we focus on the bispectral analysis of El Niño Southern Oscillation (ENSO), the main atmosphere–ocean interannual mode (Neelin et al., 1998; Wang et al., 2016). ENSO aspects, like nonlinearity and complexity (Timmermann, 2003; Frauen et al., 2014; Berner et al., 2018; Bianucci et al., 2018), non-Gaussianity (Burgers and Stephenson, 1999; Chunzai, 2018; Boucharel et al., 2009; Hannachi et al., 2017) and also its modelling (Kondrashov et al., 2005) have been extensively studied, both from a physical and signal-processing perspective. In particular, some ENSO bispectral analysis was performed by Timmermann et al. (2001, 2018), to fit a low-order nonlinear dynamical model of El Niño index and by Schulte et al. (2020) to compute the cross-bi-coherency and synchronization with the Indian Monsoon.

The present study aims to perform a systematic and thorough analysis of third-order statistics, in both the time and spectral domains, to improve the understanding of ENSO non-Gaussianity through skewness, nonlinearity (e.g. Hinich, 1982; Cox, 1991), and nonlinear predictability on time scales ranging from seasons to years. We will emphasize, in particular, the role of nonlinear lagged correlation at the intra-annual time scale in overcoming the Spring predictability barrier of ENSO (Duan and Wei, 2013). The bispectrum of the El Niño index and its statistical significance are thus computed, looking for the most relevant wave triads contributing to the skewness, which are associated with ENSO extremes through phase synchronization (see e.g. Jajcay et al., 2018). Then, we check how simple stochastic models driven by lagged correlated additive multiplicative (CAM) noise (Monahan, 2020) can reproduce the main bispectral features of ENSO and the El Niño/La Niña asymmetry (Martinez-Villalobos et al., 2019).

The manuscript starts with a description of the data and a synthesis of exploratory statistics (Section 2). The autocorrelation function and spectrum are then shown (Section 3). In Section 4, we present the tests of nonlinearity and non-Gaussianity based on the bicovariance function. Section 5 details the bispectrum properties, its estimation and its statistical significance, applied to ENSO. Section 6 develops a minimal bilinear stochastic model fitted to the data. A summary and conclusion are given in the last section. Most of the technical details and symbols are placed in the appendices. The details presented in the main text and appendices, some of them conventional, make the paper particularly convenient for didactic purposes and ready to apply to time series other than ENSO.



The raw data used are monthly anomalies (with respect to the 1981–2010 period) of the El Niño 3.4 sea surface temperature (SST) index, obtained by area-averaging SST over the region 5°S–5°N, 170°–120°W, taken over the 149-year period 1870–2018 and extracted from the periodically updated website https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data (Rayner et al., 2003). The raw data exhibit a positive trend of 0.19 °C/century, associated with oceanic global warming. They also reveal a decadal-scale variability (observed in the 50 yr running means) in phase with the Pacific Decadal Oscillation (PDO), also known as a long-lived El Niño pattern (Zhang et al., 1997). After removal of the linear trend, we get a detrended zero-centred time series $x_{cent}$ of the index, with a standard deviation $\sigma(x_{cent})=0.77$ °C. The skewness is $sk(x_{cent})=0.44$, associated with the prevalence of El Niños over La Niñas (Burgers and Stephenson, 1999), and the excess kurtosis (ekurt) is 0.46. The monthly dependence of the statistics is particularly striking in the skewness and excess kurtosis, ranging from a positively skewed leptokurtic probability distribution function (PDF) in NH winter (sk = 1.81, ekurt = 2.93 in January), favouring extreme El Niños, to a platykurtic, nearly unskewed PDF in NH Spring (sk = 0.01, ekurt = −0.65 in May). Spring has weaker extreme El Niños than Winter, and the extrapolation of the Spring El Niño state to the next Winter is rather unskilful, i.e. the Spring predictability barrier (Duan and Wei, 2013). The intra-seasonal variability is quite small compared to the inter-seasonal and interannual variability. Short-range persistence is quite high, leading to 1- and 2-month lag correlations of 0.92 and 0.85.

Three-monthly seasonal averages (JFM, AMJ, JAS, OND) are constructed, with an overall std σ = 0.74 °C, sk = 0.46 and ekurt = 0.41, peaking again during the NH or Boreal Winter (sk = 1.04, ekurt = 1.35 in JFM) (Stuecker et al., 2013; Stein et al., 2014). The statistics are quite similar to those obtained with more commonly used trimesters (e.g. for DJF, we get sk = 1.07, ekurt = 1.68). The remaining three seasons are much less skewed, with $sk \in [0.25, 0.28]$ and a rather small ekurt, except for the platykurtic behaviour in Boreal Spring (AMJ), with ekurt = −0.51. All the analysis that follows is performed on the standardized time series, hereafter denoted as $x(t)$, with sample size N = 596 trimesters (trm), shown in Fig. 1.

Fig. 1. 

Detrended and standardized time series in the period 1870–2018, of the three-monthly average anomalies (with respect to the annual cycle) of El Niño 3.4 index (black) and its low-pass (red) and high-pass (green) components with a cutoff frequency of 0.08 cycles per trimester roughly corresponding, respectively, to inter- and intra-triennial timescales variability.


Autocorrelation and spectrum


Autocorrelation function

We start by evaluating second-order lagged moments of the time series. The autocorrelation function $C_x(\tau) \equiv \frac{1}{N-\tau}\sum_{t=0}^{N-\tau-1} x(t)x(t+\tau)$ is shown in Fig. 2. It has oscillatory behaviour with typical ENSO timescales (3–4 years). To test the null hypothesis $H_0$ of a vanishing autocorrelation while taking the serial correlation into account, we use an approximate effective sample size $N_{eff} \approx N[1-C_x(1)]/[1+C_x(1)] \approx 66$ (Wilks, 2011). Then, $H_0$ is rejected if $|C_x(\tau)|^2 > [q_{\alpha/2}(N_{eff})]^2/\{[q_{\alpha/2}(N_{eff})]^2 + N_{eff} - 2\}$ (von Storch and Zwiers, 1999), where $q_{\alpha/2}(N_{eff})$ is the $\alpha/2$-th quantile of a Student-t PDF with $N_{eff}$ degrees of freedom. From Fig. 2, some local extremes and significant correlations also appear at decadal lags.
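The estimator and the test threshold above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Student-t quantile `q` is supplied by the caller rather than computed, and the lag-1 autocorrelation of 0.8 used in the example is a value chosen here because it reproduces $N_{eff} \approx 66$ for N = 596; it is not quoted in the text.

```python
import numpy as np

def autocorrelation(x, tau):
    """Lag-tau autocorrelation of a standardized series, 1/(N - tau) normalization."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return float(np.sum(x[: n - tau] * x[tau:]) / (n - tau))

def effective_sample_size(n, c1):
    """Approximate effective sample size N[1 - C(1)] / [1 + C(1)]."""
    return n * (1.0 - c1) / (1.0 + c1)

def squared_correlation_threshold(q, n_eff):
    """Rejection threshold for |C_x(tau)|^2 given a Student-t quantile q."""
    return q**2 / (q**2 + n_eff - 2.0)

# Illustrative values only: c1 = 0.8 is an assumption, not a number from the paper.
n_eff = effective_sample_size(596, 0.8)
threshold = squared_correlation_threshold(2.0, n_eff)  # q = 2.0 is a rough two-sided 5% value
```

Passing the quantile in explicitly keeps the sketch free of any statistics library; in practice it would come from a Student-t quantile function.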

Fig. 2. 

Autocorrelation function of El Niño 3.4 index (solid black) along with the 95% (dashed black) and 90% (dotted black) confidence interval. The autocorrelation function of the AR(5) model fitting data is also shown (solid red).


The spectrum and its estimation

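A regularly sampled stationary time series $x(t)$ can be decomposed using the discrete Fourier transform (DFT) as $X_x(f)=\sum_{t=0}^{N-1}x(t)\exp(-2\pi i f t)$, for every frequency $f$ in cycles per sampling period, and can be reconstructed through the inverse Fourier transform (FT) as $x(t)=\frac{1}{N}\sum_{k=-N/2}^{N/2}X_x\left(\frac{k}{N}\right)\exp\left(2\pi i \frac{k}{N} t\right)$ (for N even). The sample spectrum or periodogram is the DFT of the sample autocovariance function $\hat\gamma_x(\tau)$:

$$\frac{1}{N}|X_x(f)|^2 = \sum_{\tau=-(N-1)}^{N-1}\hat\gamma_x(\tau)\exp(-2\pi i f \tau), \qquad (1)$$

whereas the asymptotic spectrum is given by:

$$\Gamma_x(f) = \lim_{N\to\infty}\frac{1}{N}E\left[|X_x(f)|^2\right], \qquad (2)$$

where E(·) is the expectation operator over realizations of the stochastic process. According to the Wiener–Khintchine theorem, (2) is the FT of the autocovariance function, i.e.:

$$\Gamma_x(f) = \sum_{\tau=-\infty}^{\infty}\gamma_x(\tau)\exp(-2\pi i f \tau). \qquad (3)$$

Equation (3) yields, in particular, the variance of the time series

$$\gamma_x(0) = E[x(t)^2] = \int_{-1/2}^{1/2}\Gamma_x(f)\, df, \qquad (4)$$

where $\Gamma_x(f)\,df$ in (4) provides the contribution from the spectrum to the time series variance within the frequency interval $[f, f+df]$.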
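The variance decomposition by frequency bins can be checked numerically. The sketch below uses a synthetic white-noise series (an assumption made purely for illustration, not the El Niño data) and NumPy's FFT, whose sign convention matches the DFT defined above; summing the periodogram with a frequency step of 1/N should recover the sample variance.

```python
import numpy as np

def periodogram(x):
    """Return Fourier frequencies (cycles per sample) and |X(f)|^2 / N."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.fft(x)            # X(k/N) = sum_t x(t) exp(-2*pi*i*k*t/N)
    freqs = np.fft.fftfreq(n)    # frequencies in [-1/2, 1/2)
    return freqs, np.abs(X) ** 2 / n

# Synthetic standardized series, same length as the trimester record (N = 596).
rng = np.random.default_rng(0)
x = rng.standard_normal(596)
x = (x - x.mean()) / x.std()

freqs, pgram = periodogram(x)
# Riemann sum of the periodogram over frequency, with df = 1/N.
variance_from_spectrum = pgram.sum() / x.size
```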

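For a stationary process, the DFTs $X_x(f)$ estimated at different frequencies are nearly uncorrelated (von Storch and Zwiers, 1999), and the periodogram (1) is known to be inconsistent. Hence a smoothed estimator, which weights the sample autocovariance $\hat\gamma_x(\tau)$ by a lag window $\lambda(\tau)$, is often used:

$$\hat S_x(f) = \sum_{\tau}\lambda(\tau)\,\hat\gamma_x(\tau)\exp(-2\pi i f \tau), \qquad (5)$$
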
where $\lambda(\tau) \equiv \hat\lambda(\tau/M)$ and $\hat\lambda(\cdot)$ is a symmetric, standardized, positive lag window function (Priestley, 1981), characterized by a standardized bandwidth $b_1 \equiv 1/\int\hat\lambda(u)^2\,du$. The window scale length M is chosen on the basis of a trade-off between spectral resolution ($b_1/M$) and low values of the variance ($\mathrm{var}[\hat S_x(f)] \approx \Gamma_x(f)^2 M/(N b_1)$) and of the bias ($E[\hat S_x(f)] - \Gamma_x(f) \sim \hat\lambda''(0)\,\Gamma_x''(f)/(4\pi^2 M^2)$) of (5) (Jenkins and Watts, 1968), where the double apostrophe denotes a second derivative. The 90% confidence interval of $\Gamma_x(f)$ is given by $[\hat S_x(f)\,\nu/q_{95\%}(\chi^2_\nu),\ \hat S_x(f)\,\nu/q_{5\%}(\chi^2_\nu)]$, where $\nu = 2Nb_1/M$ is the number of degrees of freedom of the $\chi^2_\nu$ distribution.

We have also computed the maximum entropy estimator of the spectrum by fitting an autoregressive (AR) model of generic order p via the Yule–Walker equations. The order is chosen by minimizing the Akaike Information Criterion, AIC (Akaike, 1974). The goal of computing the theoretical maximum entropy spectrum is to build a null hypothesis spectrum $\Gamma_{x,0}(f)$. Therefore, in order to check objectively whether the spectral peaks of (1) and (5) are truly distinguishable from, and significantly larger than, those of $\Gamma_{x,0}(f)$ at a significance level α, the estimated spectra (1, 5) have to be compared to the threshold $\Gamma_{x,0}(f)\,q_{1-\alpha}(\chi^2_\nu)/\nu$ (Wilks, 2011), with $\nu = 2Nb_1/M$ for (5) and ν = 1 for (1), corresponding to M = N and a standardized bandwidth equal to the Nyquist frequency.
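As a rough illustration of the Yule–Walker step, the sketch below solves the Yule–Walker system for the AR coefficients from a given autocorrelation sequence. The function name and the AR(2) test coefficients (0.5, 0.3) are hypothetical choices made here to show that the solver recovers known coefficients; they are not the fitted El Niño model, and the AIC-based order selection is not reproduced.

```python
import numpy as np

def yule_walker(rho, p):
    """Solve the Yule-Walker equations R a = r for an AR(p) model.

    rho : autocorrelations rho[0] = 1, rho[1], ..., rho[p]
    Returns the coefficients a(1), ..., a(p).
    """
    rho = np.asarray(rho, dtype=float)
    lags = np.abs(np.arange(p)[:, None] - np.arange(p)[None, :])
    R = rho[lags]            # Toeplitz matrix of autocorrelations
    r = rho[1 : p + 1]
    return np.linalg.solve(R, r)

# Hypothetical AR(2) process with known coefficients, for checking only.
a1, a2 = 0.5, 0.3
rho1 = a1 / (1.0 - a2)       # exact lag-1 autocorrelation of an AR(2)
rho2 = a1 * rho1 + a2        # exact lag-2 autocorrelation
coeffs = yule_walker([1.0, rho1, rho2], p=2)
```

With sample autocorrelations in place of the exact ones, the same call would give the fitted coefficients for each candidate order p.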


El Niño index spectrum


Periodogram and smoothed spectrum

The periodogram (1) of the time series is shown in Fig. 3. There is evidence of sharp and well-separated peaks near the frequencies 0.021, 0.045, 0.071, 0.087, 0.105, and 0.168 cycles per trimester (cpt), corresponding (in the same order) to periods of 12.2 years, linked to decadal variability (Sun and Yu, 2009; Kravtsov, 2012), 5.62 years (Kim, 2002), and 3.53, 2.87, 2.38 and 1.49 years, in the range of the 'quasi-quadrennial and quasi-biennial variability' (Jiang et al., 1995). These peaks were also found by Deser et al. (2010) and are responsible for variations in ENSO frequency, intensity, propagation, and predictability (e.g., An and Wang, 2000; Fedorov and Philander, 2000; Timmermann, 2003; An and Jin, 2004; Yeh and Kirtman, 2004; Wang, 2018).

Fig. 3. 

Empirical periodogram (thin green); smoothed spectrum (thick black) and corresponding 95% confidence interval (thin black); theoretical AR(5) spectrum (red), and 10% significance level for the periodogram (red dashed) and for the smoothed spectrum (red dotted). Units are (°C)²/cpt. The top axis presents periods in years.

For the smoothed spectrum (5), we chose the Bartlett–Priestley lag window function $\hat\lambda(u)=\frac{3}{(\pi u)^2}\left[\frac{\sin(\pi u)}{\pi u}-\cos(\pi u)\right]$ $(u\neq 0)$; $\hat\lambda(0)=1$ (Priestley, 1981), for which $b_1=0.855$ and $\hat\lambda''(0)=-1.97$. In order to resolve the main spectral peaks, the bandwidth $b_1/M$ must be smaller than the minimum difference between consecutive leading frequencies, i.e. $b_1/M<0.016$ cpt, hence $M>b_1/0.016\approx 52$ trimesters. To further increase spectral resolution, we show in Fig. 3 the estimator (5) using M = 80 (20 years), corresponding to $\nu\approx 13$. In this case, the standard deviation of the smoothed estimator is about 39% of the true spectrum. The 90% confidence interval for $\Gamma_x(f)$ is $[0.52\,\hat S_x(f),\ 2.62\,\hat S_x(f)]$, which is quite wide, a consequence of the large window length M. The local maxima of the smoothed estimator lie close to the high spikes of the periodogram. Moreover, the periodogram lies well within the 95% confidence interval of the true spectrum (see thin black lines of Fig. 3).
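The quoted degrees of freedom and relative standard deviation follow directly from the variance and degrees-of-freedom expressions for the smoothed estimator. A minimal arithmetic check, using only the values stated in the text (N = 596, M = 80, $b_1$ = 0.855) and hypothetical variable names:

```python
import math

N = 596      # sample size in trimesters
M = 80       # lag-window length used for Fig. 3
b1 = 0.855   # standardized bandwidth of the Bartlett-Priestley window

bandwidth = b1 / M                    # spectral resolution in cpt
nu = 2 * N * b1 / M                   # equivalent degrees of freedom
relative_std = math.sqrt(M / (N * b1))  # std of the estimator relative to the spectrum

print(f"bandwidth = {bandwidth:.4f} cpt, nu = {nu:.2f}, relative std = {relative_std:.1%}")
```

The bandwidth comes out below the 0.016 cpt separation of the leading peaks, which is the resolution requirement stated above.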


Maximum entropy spectrum

An AR(5) model (see the AIC in Table 1) is fitted here:

where w(t) is a standard Gaussian white noise. The autocorrelation function of the AR(5) model (Fig. 2) closely captures the empirical one for lags up to 12–20 trm but misses low-frequency oscillations. The corresponding (theoretical) maximum entropy spectrum is shown in Fig. 3 showing a spectral bump in the range (3–5 years period) and with no other spectral peaks. Figure 3 also shows the thresholds of significance (at α = 10%) for the periodogram and for the smoothed spectrum.

The peaks with periods of 5.62, 3.53, and 1.49 years are significant in both the periodogram (1) and the smoothed spectrum (5), whereas the peak at 2.87 years is significant in the periodogram (1) but only marginally significant in the smoothed spectrum (5).


Bicovariance, non-Gaussianity and nonlinearity


General properties

The bicovariance function $\gamma_x(\tau_1,\tau_2)$ is a generalization of the autocovariance function, given by third-order cumulants between lagged values of $x(t)$, which for a zero-mean stationary process is written as $\gamma_x(\tau_1,\tau_2)=E[x(t)x(t+\tau_1)x(t+\tau_2)]$ and is estimated here, for $\tau_1,\tau_2\geq 0$, as:


Stationarity implies time invariance of lagged statistics, leading to several identities reflecting the symmetry of the bicovariance (Rao and Gabr, 1984), namely:

$$\gamma_x(\tau_1,\tau_2)=\gamma_x(\tau_2,\tau_1)=\gamma_x(-\tau_1,\tau_2-\tau_1)=\gamma_x(\tau_2-\tau_1,-\tau_1)=\gamma_x(-\tau_2,\tau_1-\tau_2)=\gamma_x(\tau_1-\tau_2,-\tau_2), \qquad (8)$$

leading to the partition of the plane $(\tau_1,\tau_2)$ into 6 sectors, as illustrated in Fig. 4a. We stress here that the origin of a non-vanishing bicovariance is the non-Gaussianity of the process $x(t)$. In fact, for a Gaussian (necessarily linear) process, the bicovariance vanishes because cumulants of order 3 (the equivalent of lagged skewness) or higher vanish. For finite time series, Gaussianity is rejected if the absolute value of the skewness is large enough.
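A minimal sketch of a bicorrelation estimate for a standardized series is given below. The normalization by the number of available triplets, N − max(τ1, τ2), is an assumption made here for illustration and may differ from the estimator used in the paper; the synthetic skewed series is likewise only a stand-in for the El Niño index.

```python
import numpy as np

def bicorrelation(x, tau1, tau2):
    """Sample mean of x(t) x(t+tau1) x(t+tau2) for tau1, tau2 >= 0.

    Assumes x is already standardized (zero mean, unit variance).
    """
    x = np.asarray(x, dtype=float)
    m = x.size - max(tau1, tau2)     # number of complete triplets
    return float(np.mean(x[:m] * x[tau1 : tau1 + m] * x[tau2 : tau2 + m]))

# Synthetic positively skewed series, for illustration only.
rng = np.random.default_rng(0)
x = rng.exponential(size=596)
x = (x - x.mean()) / x.std()

skewness = bicorrelation(x, 0, 0)     # zero-lag value equals the sample skewness
swapped = (bicorrelation(x, 2, 5), bicorrelation(x, 5, 2))  # symmetric in its arguments
```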

Fig. 4. 

Symmetries (a) of the bicovariance function in the delay plane (adapted from Rao and Gabr, 1984) and bicorrelation of El Niño 3.4 index (b). Thick black contours show the 5% significance level.

For a nonvanishing third-order cumulant, the bicorrelation provides a measure of predictability (for lags $\Delta\geq 0$) coming from nonlinear correlation:


Local maxima and minima of bicorrelation can thus provide sources of predictability due to non-Gaussianity and/or nonlinearity. However, we must note that part of the nonlinear correlation (9), in particular at $\tau_1=0$, is due to skewness. Hence, to eliminate that contribution, we must consider the nonlinear component or residual $x_{nl}(t)=x(t)^2-[sk_x\,x(t)+1]$ of the predictor $x(t)^2$ after removing its linear regression on $x(t)$, where $sk_x$ is a regression coefficient equal to the skewness of x.

Another advantage of bicorrelation is the Cox (1991) test of nonlinearity, also based on skewness, which we apply below. More general tests exist, which analyse nonlinear predictability originating from a set of past values (Granger and Anderson, 1978). Under the null hypothesis of linearity, the test statistic $T_{Cox}(\Delta)\equiv \mathrm{cor}[x(t+\Delta),x_{nl}(t)]^2$ must vanish for all lags. Threshold values of $T_{Cox}$ under the hypothesis of no predictability are obtained by randomly shuffling $x(t)$ and $x_{nl}(t)$ (10,000 times) and then computing the 95% quantile (denoted as $T_{Cox}^{95\%}$) of the sorted values of $T_{Cox}(0)$. Nonlinearity of El Niño is thus accepted at the 5% significance level if $T_{Cox}(\Delta)>T_{Cox}^{95\%}$.
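The residual $x_{nl}$ and the shuffle-based threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the series is synthetic, only x is permuted (which is equivalent to destroying the pairing between the two series), and the default number of shuffles is reduced from the 10,000 used in the text to keep the example fast.

```python
import numpy as np

def nonlinear_residual(x):
    """x(t)^2 minus its linear regression on x(t), for a standardized x."""
    x = np.asarray(x, dtype=float)
    sk = np.mean(x**3)                 # regression slope equals the skewness
    return x**2 - (sk * x + 1.0)

def cox_statistic(x, lag):
    """Squared correlation between x(t + lag) and x_nl(t)."""
    x = np.asarray(x, dtype=float)
    xnl = nonlinear_residual(x)
    n = x.size
    return float(np.corrcoef(x[lag:], xnl[: n - lag])[0, 1] ** 2)

def shuffle_threshold(x, n_shuffles=1000, quantile=0.95, seed=0):
    """Quantile of the zero-lag statistic after randomly permuting x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    xnl = nonlinear_residual(x)
    stats = np.empty(n_shuffles)
    for i in range(n_shuffles):
        stats[i] = np.corrcoef(rng.permutation(x), xnl)[0, 1] ** 2
    return float(np.quantile(stats, quantile))

# Synthetic standardized series (independent values), for illustration only.
rng = np.random.default_rng(1)
x = rng.exponential(size=596)
x = (x - x.mean()) / x.std()
t_cox_lag4 = cox_statistic(x, 4)
t_cox_95 = shuffle_threshold(x, n_shuffles=200)
```

By construction the residual is uncorrelated with x at zero lag, so any significant value of the statistic at positive lags reflects predictability beyond the linear one.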


Results for El Niño index



The bicorrelation (Fig. 4b), exhibits fluctuations of the order of 12–20 trimesters similar to the autocorrelation (Fig. 2). The maximum value of the bicorrelation coincides with the skewness: Cx(0,0)=0.46.

In order to test the departure of the bicorrelation from the Gaussian hypothesis, we have generated 10,000 N-sized simulations with the AR(5) model and computed the uncertainties. The 90% (95%) quantiles of $|C_{\tilde x}(0,0)|$ are equal to 0.24 (0.29), both below the observed skewness of 0.46; hence the null hypothesis of Gaussianity is rejected at the 5% significance level. For $\tau_1=\tau_2$ (diagonals of the bicovariance graph) or ($\tau_1=0$ and/or $\tau_2=0$), the 90% (95%) quantiles are ∼0.14 (0.16), and 0.10 (0.12) elsewhere. The rejection regions of the null hypothesis (at the 5% significance level) appear within thick black contours in Fig. 4b.

From inspection of Fig. 4b, we verify that the deepest bicorrelation minimum, $C_x(0,4)=-0.40$, is significant at the 5% significance level, corresponding to a nonlinear correlation (Eq. (9)) of $\mathrm{cor}[x(t+4),x(t)^2]=-0.24$ with $\tau_1=0$, $\Delta=4$ trm. This implies that an extreme El Niño or La Niña (higher-than-average $x(t)^2$) favours the occurrence of La Niña four trimesters later (negative $x(t+4)$), whereas mild conditions favour El Niño. Another local bicorrelation minimum, $C_x(8,8)=-0.15$ (10% significance level), implies that a La Niña event favours a strong El Niño or La Niña 2 years later (8 trm).


Linear and nonlinear predictability

As regards El Niño predictability skill, Fig. 5a shows the linear and nonlinear correlations, $\mathrm{cor}[x(t+\Delta),x(t)]$ and $\mathrm{cor}[x(t+\Delta),x_{nl}(t)]$, with $x_{nl}(t)$ defined in Section 4.1, for forecast lags Δ up to 20 trm. The linear correlations of El Niño 3.4 are significant at the 10% level for lags $\Delta\leq 3$ trm (Fig. 5a, thick black line), whereas the nonlinear correlations show significant values for lags $3\ \mathrm{trm}\leq\Delta\leq 5\ \mathrm{trm}$ (Fig. 5a, thick red line). Those correlations are also evaluated for the strongest El Niño season, i.e. the JFM trimester ($t+\Delta$ in JFM) (Fig. 5a, thin black line for the linear and thin red line for the nonlinear correlations). We note here the presence of the Spring predictability barrier of El Niño (Duan and Wei, 2013), i.e. the very weak linear extrapolation (persistence) of the El Niño index from the current Spring (AMJ) to the next Winter (JFM), as shown by the small value (∼0.05) of $\mathrm{cor}[x(t+\Delta),x(t)\,|\,t+\Delta=\mathrm{JFM}]$, where | means 'conditional on' and Δ = 3. However, the Spring barrier is reduced if we include the nonlinear term in the forecast. In fact, the nonlinear correlation evaluated at JFM (Fig. 5a, thin red line) for forecast lags Δ = 3–5 trm (−0.25 to −0.4) is statistically significant, e.g. the negative value of $\mathrm{cor}[x(t+3),x(t)^2\,|\,t=\mathrm{AMJ},\,t+3=\mathrm{JFM}]$ between $x(t)$ in Spring and $x(t+3)$ in the next Winter (see Fig. 5b).

Fig. 5

(a) Linear (thick black) and nonlinear (thick red) correlations ($\mathrm{cor}[x(t+\Delta),x(t)]$ and $\mathrm{cor}[x(t+\Delta),x_{nl}(t)]$ respectively) of El Niño 3.4 index. The same correlations restricted to forecast trimester JFM (black thin and red thin respectively). Horizontal dashed lines show the 10% significance level interval $[-1.64/\sqrt{N_{eff}-3},\ 1.64/\sqrt{N_{eff}-3}]$. (b) AMJ versus following-year JFM El Niño and best quadratic fit. (c) Cox test $T_{Cox}(\Delta)$ (thick) and the 95% (dotted) and 99% (dashed) confidence level thresholds of nonrejection of the linearity hypothesis.

This is corroborated by the conditional expectations of the Winter signal conditioned on the previous Spring signal. First, we have $E[x(t+3)\,|\,x(t)<-1]=-0.38$ (see left sector of Fig. 5b). We argue that strong trade winds (or persistent easterly wind bursts in the Eastern Pacific), favouring strong La Niña Spring conditions, tend to persist over some trimesters, eventually reaching the next Winter season (e.g. JFM of years 1989, 1974, and 2011).

On the other hand, we have $E[x(t+3)\,|\,-1<x(t)<1]=0.12$ (central sector of Fig. 5b), corresponding to near-climatological (Spring) conditions in the Eastern Pacific. Here, the tendency is to favour a Winter with El Niño predominance, in agreement with Boreal Winter phase-locking. In particular, strong Winter El Niños (e.g. 1983, 1998, and 2016) were preceded by quite mild Spring conditions. Finally, from $E[x(t+3)\,|\,x(t)>1]=-0.19$, strong El Niño Spring conditions (associated with westerly wind anomalies) tend to reverse in the following trimesters. This suggests that the nonlinear predictability is a consequence of the asymmetric persistence of the El Niño signal and Pacific trade winds in Spring, as a function of their intensity and phase. A possible mechanism for this is the dependence of the seasonal growth rate on the ENSO regime and on the feedbacks controlling SST (Yishuai et al., 2020), together with the seasonal dependence of easterly wind bursts from Spring to Autumn (Fan et al., 2019).

Finally, we compute the nonlinear forecast score $T_{Cox}(\Delta)$ for lags up to 80 trm (Fig. 5c). Clearly, the nonlinearity is especially significant at certain lags (4–8, 28, 36, 52, 72, 80 trm), where $T_{Cox}(\Delta)$ is even larger than the quantile $T_{Cox}^{99\%}$ of nonrejection of the linearity hypothesis. Those lags are related to phase synchronization between Fourier frequencies, namely those with periods $\tau_1$ and $\tau_2=2\tau_1$. Lags are generally close to multiples of the half-period of the shorter oscillation, i.e. $\Delta=n\tau_1/2$, $n\in\mathbb{N}$. For instance, from Fig. 3, the Fourier spectral peaks at periods $\tau_1=1/0.087=11.5$ trm and $\tau_2=1/0.045\approx 2\tau_1$ justify the peak of $T_{Cox}(\Delta)$ at lag $\Delta=5\tau_1/2\approx 29$ trm. This frequency relationship is known as quadratic phase coupling (Biswas et al., 1995), and examples in relation to decadal variability of ENSO are given in Timmermann (2003).
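The lag arithmetic in this example can be reproduced directly from the two peak frequencies quoted from Fig. 3. The short sketch below (variable names are hypothetical) lists the first few candidate lags at multiples of the half-period of the shorter oscillation:

```python
f_short = 0.087   # cpt, spectral peak with the shorter period
f_long = 0.045    # cpt, spectral peak with roughly twice that period

tau1 = 1.0 / f_short          # shorter period in trimesters
tau2 = 1.0 / f_long           # longer period in trimesters
period_ratio = tau2 / tau1    # close to 2 for quadratic phase coupling

# Candidate lags at multiples of the half-period of the shorter oscillation.
candidate_lags = [n * tau1 / 2.0 for n in range(1, 8)]
print([round(lag, 1) for lag in candidate_lags])
```

The fifth candidate is the lag near 29 trm discussed above; the ratio of the two periods is only approximately 2, as the text indicates.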




General properties


Bispectrum background

The bicovariance of the El Niño time series (Fig. 4b) exhibits certain features and periodicities in the lag-time domain. Therefore, the two-dimensional Fourier transform of the bicovariance, i.e. the bispectrum (Brillinger and Rosenblatt, 1967), can provide dual, complementary information about the most relevant spectral interactions contributing to the bicovariance and skewness of the time series, making the physical interpretation of such interactions easier.

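The bispectrum is the two-dimensional version of polyspectra (Brillinger, 1965), providing relevant information on non-Gaussian processes. The bispectrum $\Gamma_{3,x}(f_1,f_2)$ is given by:

$$\Gamma_{3,x}(f_1,f_2)=\sum_{\tau_1=-\infty}^{\infty}\sum_{\tau_2=-\infty}^{\infty}\gamma_x(\tau_1,\tau_2)\exp[-2\pi i(f_1\tau_1+f_2\tau_2)], \qquad (10)$$
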
where its real and imaginary parts are discrete Fourier transforms (DFT) of the symmetric and antisymmetric parts of the bicovariance, respectively, with the last one vanishing if the underlying stochastic process is reversible (Weiss, 1975).

For the simplest case of a purely random noise w(t), the bicovariance is a spike at the origin i.e. γw(τ1,τ2)=E(w3)δ(τ1)δ(τ2) where δ(·) is the Kronecker delta, yielding a flat, constant and real bispectrum Γ3,w(f1,f2)=E(w3).


Bispectrum and spectral components

The asymptotic bispectrum of an N-sized time series can be written in terms of the DFTs $X_x(\cdot)$ of the time series, taken at triplets of frequencies (multiples of the minimum frequency $1/N$) (Hinich, 1982) $f_1$, $f_2$ and $f_3=[(f_1+f_2+1/2)\bmod 1]-1/2$, as:

$$\Gamma_{3,x}(f_1,f_2)=\lim_{N\to\infty}\frac{1}{N}E\left[X_x(f_1)X_x(f_2)X_x(f_3)^*\right], \qquad (11)$$

where (*) stands for the complex conjugate. Eq. (11) shows that the bispectrum comes from the interaction between Fourier components at three frequencies $f_1$, $f_2$ and $f_3=f_1+f_2<1/2$ in the area outside the aliasing region, i.e. lower than the Nyquist frequency. When $f_1+f_2>1/2$, then $f_3=1-(f_1+f_2)$ becomes an aliased frequency (Hinich and Wolinsky, 1988).

Equation (11) can also be interpreted in terms of the amplitude and phase of the DFT in polar form, i.e. $X_x(f)\equiv A_x(f)e^{i\Theta_x(f)}$. By denoting $A_x(f_1)A_x(f_2)A_x(f_3)\equiv A_{x,123}$ and $e^{i[\Theta_x(f_1)+\Theta_x(f_2)-\Theta_x(f_3)]}\equiv e^{i\Theta_{x,123}}$, and applying the product expectation decomposition to (11), we get (Kovach et al., 2018):

$$\Gamma_{3,x}(f_1,f_2)=\lim_{N\to\infty}\frac{1}{N}E[A_{x,123}]\,E[e^{i\Theta_{x,123}}]+\lim_{N\to\infty}\frac{1}{N}\mathrm{cov}(A_{x,123},e^{i\Theta_{x,123}}), \qquad (12)$$

where the first r.h.s. term of (12) depends on phase synchronization of the three Fourier components and the second term is a covariance between amplitudes and phases, vanishing for linear processes. In fact, the Volterra representation of a linear process can be written as a convolution, $x(t)=\sum_k\alpha(k)w(t-k)$, where $w(t)$ is a purely random noise. Its DFT is $X_x(f)=X_\alpha(f)X_w(f)=X_\alpha(f)A_w(f)e^{i\Theta_w(f)}$, where $X_\alpha(f)$ and $X_w(f)$ are the DFTs of the sequence α(·) and of the noise, respectively. The independence between $A_w(f)$ and $\Theta_w(f)$ yields $\Gamma_{3,x}(f_1,f_2)=X_\alpha(f_1)X_\alpha(f_2)X_\alpha(f_3)^*E(w^3)$, restricted to the synchronization term (Nikias and Raghuveer, 1987), where $E(w^3)=\lim_{N\to\infty}\frac{1}{N}E[A_{w,123}]E[e^{i\Theta_{w,123}}]$, which shows the intrinsically nonlinear origin of the covariance term of (12).


Properties of the bispectrum

Like the spectrum, the bispectrum of a real signal satisfies $\Gamma_{3,x}(f_1,f_2)=\Gamma_{3,x}(-f_1,-f_2)^*$, and (11) leads to 5 identities of symmetry, namely:


This allows for a partition of the bispectrum domain (BD) into 12 polygonal regions in which the bispectrum can be reproduced from the Principal Domain (PD): the triangle of vertices (0,0), (0,1/2) and (1/3,1/3) (shown by triangle 1 in Fig. 6a). The periodicities are evident in the theoretical bispectrum in Fig. 6b (real part) and Fig. 6c (imaginary part) of a non-Gaussian AR(5) model of El Niño (presented later in Section 5.3.1).

Fig. 6. 

Principal domain (triangle 1) in the spectral plane and its symmetric replicas (a) (adapted from Rao and Gabr, 1984). Real (b) and imaginary (c) parts of the bispectrum of the NGAR(5) model. Note the symmetries associated with the 12 sectors shown in panel (a).

The reconstruction of bicovariance is obtained through the inverse FT of (10):

which leads to the skewness decomposition in terms of positive or negative contributions given by the real parts along the PD:

It is important to note here that, like the power spectrum (4), (15) implies that the element of the bispectrum Re[Γ3,x(f1,f2)]df1df2 provides the contribution to the skewness E(x3) from the bi-spectral bin [f1,f1+df1]×[f2,f2+df2].

Finally, since the square correlation is a predictability measure coming from third-order moments (see Eq. (9)), its overall sum can be distributed over the bispectral domain through the Parseval relationship:



Bispectrum estimation

The estimation of the bispectrum has been addressed by many authors (e.g. Brillinger and Rosenblatt, 1967; Raghuveer and Nikias, 1986; Nikias and Raghuveer, 1987). The empirical bispectrum or biperiodogram of a finite sample of length N is the two-dimensional DFT of the sampled bicovariance, which can also be expressed in terms of DFTs of the signal (see Section 3.1):

where $f_3=[(f_1+f_2+1/2)\bmod 1]-1/2$. The inverse FT of Eq. (17) yields the empirical bicovariance:

Like the periodogram (1), the estimator (17) is not consistent, which can be overcome by (i) smoothing the sample bispectrum (Hinich, 1982) or dividing the sample into pieces and then averaging and smoothing bispectra (Lii and Rosenblatt, 1982); (ii) using multi-tapers (Birkelund and Hanssen, 1999, 2000) or (iii) smoothing the bicovariance function (indirect-method) (Rao and Gabr, 1984), which we use here.
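The link between the raw biperiodogram and the skewness can be checked numerically. The sketch below assumes the form $X_x(f_1)X_x(f_2)X_x(f_3)^*/N$ on the grid of Fourier frequencies, with the index of $f_3$ wrapped modulo N to account for aliasing; the function name and the synthetic skewed series are hypothetical. Summing over the full bifrequency grid with steps of 1/N in each direction should return the sample third moment.

```python
import numpy as np

def biperiodogram(x):
    """Raw biperiodogram X(f1) X(f2) conj(X(f1 + f2)) / N on the Fourier grid."""
    x = np.asarray(x, dtype=float)
    n = x.size
    X = np.fft.fft(x)
    k = np.arange(n)
    k3 = (k[:, None] + k[None, :]) % n      # index of f1 + f2, wrapped (aliasing)
    return X[:, None] * X[None, :] * np.conj(X[k3]) / n

# Synthetic standardized skewed series, for illustration only.
rng = np.random.default_rng(1)
x = rng.exponential(size=128)
x = (x - x.mean()) / x.std()

B = biperiodogram(x)
# Double Riemann sum over the bifrequency grid, with df1 = df2 = 1/N.
skew_from_bispectrum = B.sum() / x.size**2
```

The real part of the sum carries the skewness and the imaginary part cancels, which is why only the real part of the bispectrum enters the skewness decomposition.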

The smoothed bispectrum is:

where the 2D lag window Λ(·) satisfies the same symmetry properties as the bicovariance (8) and is taken to be $\Lambda(\tau_1,\tau_2)=\hat\lambda(\tau_1/M_2)\,\hat\lambda(\tau_2/M_2)\,\hat\lambda(\tau_1/M_2-\tau_2/M_2)$, where $\hat\lambda(\cdot)$ is the lag window function and $M_2$ is the window length. The equivalent of the bandwidth is given by $b_2/M_2^2$, where $b_2=1/\iint[\hat\lambda(u)\hat\lambda(v)\hat\lambda(u-v)]^2\,du\,dv=1.19$ (Appendix A).


Bispectrum estimation of El Niño index


Null hypothesis bispectrum

To reproduce the observed skewness, we can construct an AR process driven by a non-Gaussian noise as a null hypothesis H0.

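The model we wish to fit is like model (6), $\tilde x(t)=\sum_{k=1}^{p}a(k)\tilde x(t-k)+\sigma_w w(t)$, but with a non-Gaussian white noise $w(t)$, $\sigma_w^2=E(x^2)\big/\int_{-1/2}^{1/2}|A(f)|^{-2}\,df$ and $A(f)=1-\sum_{k=1}^{p}a(k)e^{-2\pi i k f}$. The bispectrum of such a linear process (Nikias and Raghuveer, 1987) is:

$$\Gamma_{3,\tilde x}(f_1,f_2)=\frac{\sigma_w^3E(w^3)}{A(f_1)A(f_2)A(f_1+f_2)^*}. \qquad (20)$$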

By using the coefficients of the AR(5) model (6), we get the approximations σw2=0.275 and E(w3)=0.895, hence σw3E(w3)=0.129. This null H0 is hereafter designated NGAR(5).
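As a hedged numerical illustration of the bispectrum of a linear process driven by non-Gaussian noise, the sketch below evaluates $\sigma_w^3E(w^3)/[A(f_1)A(f_2)A(f_1+f_2)^*]$ on a regular grid. Because the fitted AR(5) coefficients are not listed in this section, a hypothetical AR(1) model with coefficient 0.5 is used instead; for that model the integral of the bispectrum has the closed form $\sigma_w^3E(w^3)/(1-a^3)$, which gives an independent check. The function name and grid size are also assumptions.

```python
import numpy as np

def linear_bispectrum(a, sigma_w, mu3, n_grid=256):
    """Bispectrum of an AR process driven by white noise with third moment mu3."""
    a = np.asarray(a, dtype=float)
    f = np.arange(n_grid) / n_grid                      # frequencies in [0, 1)
    k = np.arange(1, a.size + 1)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(f, k)) @ a  # A(f) = 1 - sum_k a(k) e^{-2 pi i k f}
    j = np.arange(n_grid)
    j3 = (j[:, None] + j[None, :]) % n_grid             # grid index of f1 + f2 (period 1)
    return sigma_w**3 * mu3 / (A[:, None] * A[None, :] * np.conj(A[j3]))

# Prefactor quoted in the text for the NGAR(5) null model.
ngar5_prefactor = 0.275**1.5 * 0.895

# Hypothetical AR(1) example with unit prefactor, not the El Nino model.
a1 = 0.5
G = linear_bispectrum([a1], sigma_w=1.0, mu3=1.0)
third_moment = G.mean().real      # integral of Re(bispectrum) over the unit square
```

Replacing the single coefficient by the five fitted AR coefficients and the unit prefactor by the quoted one would give the null bispectrum of the NGAR(5) model on the same grid.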

Figure 6 shows the real (Fig. 6b) and imaginary (Fig. 6c) parts of the bispectrum (20) of the NGAR(5) model over the global bifrequency domain. The real part is mostly positive, whereas the imaginary part is formed by dipolar structures. Both parts reflect the symmetry shown in (13) and exhibit a band of maximum absolute values in the region near $f_1+f_2\approx 0.09$ cpt and $f_1,f_2\in[0.02,0.08]$ cpt, coinciding with the frequency range where the power of the AR(5) model (6) exceeds 3 (°C)²/cpt, as seen in Fig. 3.


Empirical smoothed bispectrum

To estimate the empirical bispectrum using the smoothed estimator (19), we start by identifying the ideal window length $M_2$, according to (A4) in Appendix A. The values of the bispectral fluctuations $\sigma_{\hat S_{3,x}}$ and the average confidence interval half-size (cihs) of the bispectrum (given by the square root of the l.h.s. of (A4)) are given in Table 2 for several values of $M_2$. In order to separate peaks, the condition $\sigma_{\hat S_{3,x}}>\mathrm{cihs}$ must be satisfied. As expected, smaller bandwidths (larger $M_2$) lead to larger bispectral fluctuations and, via cihs, to larger bispectrum estimation errors. From Table 2, the largest $M_2$ for which $\sigma_{\hat S_{3,x}}>\mathrm{cihs}$ is $M_2=30$, which is used below.

Figure 7 shows the real (Fig. 7a) and imaginary (Fig. 7b) parts of the smoothed bispectrum of the El Niño 3.4 index for the most relevant part of the first quadrant.

Fig. 7. 

Real (a) and imaginary (b) parts of the empirical bispectrum using a window lag (M=30). Squared amplitude of the smoothed bispectrum (c). In panels a–c, significant regions at the 20% significance level (or lower) appear within thick contours. Real (d) and imaginary (e) parts of the standardized bispectrum deviation $T_{x,\tilde x}(f_1,f_2)$ of the El Niño index, and the corresponding sum of the squared real and imaginary parts (f). Bifrequencies for which the null hypothesis H0 is rejected at the 20% significance level (or lower) are color-shaded ($|T_{x,\tilde x}|>1.3$ for each part). Values are restricted to the most significant part of the bispectrum.

Both parts show several peaks. In particular, both present high absolute values for $f_1,f_2\in[0.04,0.07]$ cpt, as for the null NGAR(5) model (Fig. 6) but with much larger amplitude. As expected, the average diameter of the peaks is of the order of $\sqrt{b_2}/M_2=0.036$ cpt. The integral of $\mathrm{Re}(\hat S_{3,x})$ (Fig. 7a) is the estimated skewness, as in Eq. (15). According to Eq. (11), the superposition of Fourier components along the triplet of frequencies $f_1$, $f_2$, $f_3=f_1+f_2$ where $\mathrm{Re}(\hat S_{3,x})$ is positive (negative) will mostly generate extreme positive (negative) values, i.e. El Niño (La Niña) events. The integrals of the positive and negative values of $\mathrm{Re}(\hat S_{3,x})$ over the frequency domain are 0.54 and −0.08 respectively (adding up to the observed skewness 0.46). The local maxima of the real part, mostly contributing to El Niño extremes, lie near the frequency triplet ($f_1=f_2=0.05$, $f_3=0.1$ cpt) and the band ($f_1+f_2=f_3=0.165$ cpt), corresponding to local maxima of the power spectrum (Fig. 3) (e.g. quadratic phase synchronization; Jajcay et al., 2018). On the other hand, the local minima of the real part, mostly contributing to La Niña extremes, lie near the frequency triplets ($f_1=f_2=0.018$, $f_3=0.036$ cpt) and ($f_1=0.05$, $f_2=0.018$, $f_3=0.063$ cpt), which are again close to relative power maxima (see Fig. 3).

Figure 7c shows the squared bispectrum amplitude, providing the bispectral contribution to the predictability through (16), which agrees quite well with Timmermann (2003). Its maxima occur near the local extremes of the real or imaginary parts of $\hat S_{3,x}$ (Fig. 7a,b). The most relevant region for nonlinear predictability occurs for frequencies satisfying $f_1+f_2\in[0.07,0.1]$ cpt. Another maximum is observed for $f_1+f_2\in[0.16,0.18]$ cpt, producing oscillations with periods of 5–6 trimesters, and suggests a possible source of the high nonlinear predictability for lags of 5–6 trm, as diagnosed by the Cox test in Fig. 5c.


Bispectrum and bicovariance near the origin

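The bispectrum is relevant for the bicovariance behaviour near the origin. In fact, $\gamma_x(\tau_1,\tau_2)$ can be approximated by a Taylor expansion:

$$\gamma_x(\tau_1,\tau_2)=\gamma_x(0,0)+\sum_{p=1,2}\frac{\partial\gamma_x}{\partial\tau_p}\tau_p+\frac{1}{2}\sum_{p,q=1,2}\frac{\partial^2\gamma_x}{\partial\tau_p\partial\tau_q}\tau_p\tau_q+\frac{1}{6}\sum_{p,q,r=1,2}\frac{\partial^3\gamma_x}{\partial\tau_p\partial\tau_q\partial\tau_r}\tau_p\tau_q\tau_r+O(\tau_1^a\tau_2^b);\quad a+b=4$$
where derivatives are computed at $\tau_1=\tau_2=0$. Using Eq. (14), we get:
where bicovariance symmetries lead to symmetries at the origin: $\partial\gamma_x/\partial\tau_p=0$, i.e. a local bicovariance extreme (see Fig. 4b), and $\partial^2\gamma_x/\partial\tau_1\partial\tau_1=\partial^2\gamma_x/\partial\tau_2\partial\tau_2$, $\partial^3\gamma_x/\partial\tau_1\partial\tau_1\partial\tau_1=\partial^3\gamma_x/\partial\tau_2\partial\tau_2\partial\tau_2$, $\partial^3\gamma_x/\partial\tau_1\partial\tau_1\partial\tau_2=\partial^3\gamma_x/\partial\tau_2\partial\tau_2\partial\tau_1$.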

The partial derivatives at the origin, estimated with the smoothed El Niño bispectrum, yield $\partial^2C_x/\partial\tau_p\partial\tau_p=0.264$ ($p=1,2$), which explains most of the symmetric decrease of $C_x(\tau_1,\tau_2)$ near the origin (see Fig. 4b). The term $\partial^3C_x/\partial\tau_p\partial\tau_p\partial\tau_p=0.114$, however, explains the asymmetry of that decrease, which is stronger for positive than for negative lags, yielding the bicovariance minimum $C_x(0,4)=0.4$.


Bispectrum from frequency-band partitions

A coarse-grained description of the bispectrum can be achieved by classifying the triplets $f_1$, $f_2$, $f_3=f_1+f_2$ into sets of frequencies forming a partition of $[0,1/2]$. Each triplet is then characterized by the number of frequencies in each set. We consider the simple partition of the frequency interval using a cutoff frequency $0<f_{cut}<1/2$, separating low (S) and high (F) frequencies, with the corresponding decomposition $x(t)=s(t)+f(t)$ (see Fig. 1). Figure 8a shows the 4 resulting subdomains, namely SSS, SSF, SFF and FFF, yielding an expansion into 4 terms of third-order statistics, e.g.,

$$E(x^3)=\underbrace{E(s^3)}_{SSS}+\underbrace{3E(s^2f)}_{SSF}+\underbrace{3E(sf^2)}_{SFF}+\underbrace{E(f^3)}_{FFF}.\qquad(26)$$


Fig. 8. 

(a) Subdomains SSS, SSF, SFF and FFF obtained from a frequency partition using a cut-off frequency fcut=0.08 cpt (∼3 years). (b) Contributing terms to El Niño 3.4 index skewness (Eq. (26)).

Figure 8b shows the terms on the r.h.s. of Eq. (26) as a function of $f_{cut}$. A reasonable criterion of discrimination among the different components is to choose the $f_{cut}$ that maximizes the sum |SSS|+|SSF|+|SFF|+|FFF|, which takes place at $f_{cut}=0.082$ cpt (3.04 years). This yields inter-triennial (slow S) and intra-triennial (fast F) variations with respective 82% and 38% explained variance, with a well-marked scale separation lying at a local minimum of the smoothed power spectrum (see Fig. 3), and a minimum value of $E(s^3)$ (–0.066).
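The slow/fast decomposition of the third moment can be sketched as follows. This is only an illustration under stated assumptions: the sharp spectral cutoff implemented with the FFT is an assumed filter (the filter actually used is not specified in this section), the function names are hypothetical, and the factor 3 in the cross terms follows from expanding $E[(s+f)^3]$.

```python
import numpy as np

def split_slow_fast(x, f_cut):
    """Split x(t) into slow s(t) (frequencies < f_cut) and fast f(t) parts.

    f_cut is in cycles per time step (cpt). A sharp FFT cutoff is assumed.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0)
    slow_spectrum = np.where(freqs < f_cut, spectrum, 0.0)
    s = np.fft.irfft(slow_spectrum, n=x.size)
    return s, x - s

def skewness_terms(x, f_cut):
    """Four contributions to E(x^3) from x = s + f (SSS, SSF, SFF, FFF)."""
    s, f = split_slow_fast(x, f_cut)
    return {
        "SSS": np.mean(s**3),
        "SSF": 3.0 * np.mean(s**2 * f),
        "SFF": 3.0 * np.mean(s * f**2),
        "FFF": np.mean(f**3),
    }
```

For a standardized series the four terms add up to the skewness, so scanning `f_cut` and summing the absolute values of the four terms reproduces the kind of selection criterion described above.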

To see the impact of the spectral decomposition on the different terms of the skewness, we compare in Table 3 the terms of Eq. (26) derived directly from the time series with those obtained from the partial integrals of the smoothed bispectrum (Fig. 7a). Table 3 shows that the values are quite close, except for the negative value of $E(s^3)$. The underestimation obtained from the smoothed bispectrum is likely due to the weak resolution of low frequencies.

Extremes are classified according to the dominant term (SSS, SSF, SFF or FFF). From Table 3, extreme La Niña events must be explained by the slow component s(t) (e.g. 1887, 1917, 1956, 2000) or by phase synchronization of s(t) and f(t) (mostly of SFF type, e.g. 1973, 1988, 2008 and 2011) (Fig. 1). On the other hand, extreme El Niño events are mainly due to slow-fast component interactions, namely of SSF type (e.g. 1877, 1918, 1930, 1958, 2015) and SFF type (e.g. 1926, 1951, 1965, 1972, 1982–83, 1992, 1997, 2002), or to fast components only (FFF type) (e.g. 1923, 1977, 2006, 2009). However, some El Niño events (e.g. 1905, 1940, 1986–87) have occurred due to long persistence of the slow component (SSS type) (Fig. 1). Note also that there are a few cases of phase polarity between fast and slow components (e.g. 1974) that lead to a weak El Niño index.

In order to determine temporal changes of the bispectrum, we assess the third moment and its decomposition, Eq. (26), both in the full period (FULL) and along the three half-centuries: 1870–1919 (HC1), 1920–1969 (HC2) and 1970–2018 (HC3) (Fig. 1). Moreover, we evaluate the above statistics during El Niños ($x(t)>0$) and La Niñas ($x(t)<0$) to examine the variability of extremes and the corresponding spectral contributions. For a given term in Eq. (26), for instance SSS, its average $E(SSS)$ decomposes as $E(SSS)=E(SSS)^{+}+E(SSS)^{-}$, where $E(SSS)^{+}=E(SSS\mid x>0)\,\mathrm{prob}(x>0)$ and $E(SSS)^{-}=E(SSS\mid x\le0)\,\mathrm{prob}(x\le0)$, giving the contributions to $E(SSS)$ during El Niños and La Niñas, respectively. Table 4 summarizes the results.

First, the most recent half century (HC3) has, on average, the most extreme episodes of La Niña and El Niño, as observed from the high absolute values of $sk^{-}$ and $sk^{+}$. The amplification of La Niñas comes mostly from a clear increase of the self-slow interaction SSS=–0.06 and the cross interaction SFF=–0.4, whereas the amplification of El Niños comes from the enhancement of the SFF term (0.6), as compared to the previous two half centuries. This suggests changes and decadal variability of the ENSO skewness, its bicovariance and bispectrum (Wu and Hsieh, 2003), associated with changes in the preferential Fourier phase couplings (Schulte et al., 2019). This was accompanied by an ENSO regime shift near 1970 towards more nonlinearity (An and Wang, 2000; An and Jin, 2004; An, 2009).


Statistical significance of the empirical bispectrum

It is important to check the acceptance or rejection of the null bispectrum H0 of NGAR(5) (Fig. 6b,c). The spectral method used here is based on a variation of the Hinich (1982) test of linearity. We anticipate that both the local (A6) and the integrated (A7) spectral-based tests of nonlinearity in the bi-frequency domain reject the null H0, consistently with the nonlinearity Cox test in the time domain (see Section 4.2.2), and hence other sources of non-Gaussianity (e.g. deterministic nonlinearity and multiplicative noise) are necessary. Furthermore, as shown in Appendix A, the asymptotic bias (A2), variance (A3) and asymptotic Gaussian PDF (Rao and Gabr, 1984) of the smoothed estimator are not good approximations because of the small number of degrees of freedom (Neff=66) of the time series, and hence cannot be used to test the null H0. This could be alleviated by using, e.g., a very long run of a climate model simulation. We instead use a Monte-Carlo strategy, computing the statistics of the bispectrum from 10,000 surrogates of the NGAR(5) model forced by a noise prescribed by its first three moments. In order to easily obtain noise realizations, we consider noises given by polynomials of independent standard Gaussian noises, relating the coefficients of the monomial expectations to the imposed noise moments. For instance, a first trial noise, $\sigma_w w(t)=a\,w_1(t)+b\,[w_1^2(t)-1]$, where $w_1$ is a standard Gaussian white noise, leads to $\sigma_w^2=a^2+2b^2=0.275$ and $\sigma_w^3E(w^3)=8b^3+6a^2b=0.129$, yielding $a=0.5122$ and $b=0.0794$, with an excess kurtosis of 1.073.

Here, we choose $\sigma_w w(t)=a\,w_1(t)+b\,[w_2^2(t)-1]$, where $w_1,w_2$ are independent standard Gaussians, with $\sigma_w^2=a^2+2b^2=0.275$ and $\sigma_w^3E(w^3)=8b^3=0.129$, yielding $a=0.3840$ and $b=0.2525$, and an excess kurtosis of 2.475. The results are quite robust to changes in the noise model. Other possible, though less practical, noises are generated by maximum entropy constrained by the first four moments (Pires et al., 2010). We then compute the Monte-Carlo ensemble average $E[\hat S_{3,\tilde x}]$ and variance $\mathrm{var}[\hat S_{3,\tilde x}]$ of the real and imaginary parts of the smoothed bispectrum. Noise high-order moments (greater than 2) appear to influence only the high-order moments of the smoothed spectrum (e.g. skewness and kurtosis), which are not relevant for the linearity test devised here.
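The two moment constraints of the chosen noise can be solved in closed form, since $8b^3$ fixes $b$ and the variance then fixes $a$. The sketch below does this and draws realizations; the function names are hypothetical, and only the target values 0.275 and 0.129 are taken from the text.

```python
import numpy as np

def noise_coefficients(var_u, mu3_u):
    """Solve a^2 + 2 b^2 = var_u and 8 b^3 = mu3_u for u = a*w1 + b*(w2^2 - 1)."""
    b = np.cbrt(mu3_u / 8.0)
    a_squared = var_u - 2.0 * b**2
    if a_squared < 0.0:
        raise ValueError("third moment too large for the requested variance")
    return float(np.sqrt(a_squared)), float(b)

def sample_noise(a, b, n, rng):
    """Draw n values of u = a*w1 + b*(w2^2 - 1), w1 and w2 independent N(0,1)."""
    w1 = rng.standard_normal(n)
    w2 = rng.standard_normal(n)
    return a * w1 + b * (w2**2 - 1.0)

a, b = noise_coefficients(0.275, 0.129)
u = sample_noise(a, b, 200_000, np.random.default_rng(0))
```

The resulting `a` and `b` agree with the quoted values (0.3840, 0.2525) to about three decimal places, and the sample variance and third moment of `u` should approach 0.275 and 0.129 as the sample size grows.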

The deviation of the El Niño smoothed bispectrum $\hat S_{3,x}(f_1,f_2)$ (Fig. 7a,b) from that of the NGAR(5) model is assessed by the test statistic (standardized deviation) (Eq. (A6)): $T_{x,\tilde x}(f_1,f_2)\equiv\left(\hat S_{3,x}(f_1,f_2)-E[\hat S_{3,\tilde x}]\right)/\left(\mathrm{var}[\hat S_{3,\tilde x}]\right)^{1/2}$. Under H0, its real and imaginary parts are approximately standard Gaussian. We also limited the tests to frequencies with higher bispectrum amplitude, roughly corresponding to $|f_1|,|f_2|<0.2$ (as in Fig. 7a,c). Figure 7 shows the real (Fig. 7d) and imaginary (Fig. 7e) parts of $T_{x,\tilde x}(f_1,f_2)$, where significant regions (at the α=20% significance level) are color-shaded. Fig. 7d shows that most peaks of $\mathrm{Re}(\hat S_{3,x})$ (Fig. 7a) are significant. In particular, the low-frequency region (SSS type), producing La Niña events for $|f_1|,|f_2|,|f_3|<0.06$ cpt, is significant (i.e. rejecting H0). The other positive and negative bispectrum extremes discussed in Section 5.3.2 are also significant at α=10%. The imaginary part of the test (Fig. 7e) and the squared amplitude (Fig. 7f) are also highly significant in most of the relevant regions of the bispectrum, with significance levels reaching 5%. The most significant region of nonlinear predictability holds approximately for $f_1+f_2$ within [0.16, 0.18] cpt, where the bispectrum is significant at α=5%, producing oscillations with periods of the order of 5–6 trimesters. This suggests a possible source of the high nonlinear predictability for lags of 3–6 trm, as diagnosed by the Cox test (Fig. 5c), and of the nonlinear curtailing of the El Niño Spring barrier (see Fig. 5a).

The local test $T_{x,\tilde x}(f_1,f_2)$, computed on a frequency-by-frequency basis, may lead ambiguously either to the rejection or to the acceptance of linearity, depending on $f_1,f_2$ (see Fig. 7d–e). This is suggested by the fact that finite $N$-sized samples generated by the non-Gaussian NGAR(5) model led to local frequency tests $T_{\tilde x,\tilde x}(f_1,f_2)$ where linearity is falsely rejected (not shown). Therefore, in order to overcome this difficulty and enhance test robustness, we propose the integrated test of nonlinearity (A7), given by $T^{int}_{x,\tilde x}\equiv\sum_{(f_1,f_2)\in L}|T_{x,\tilde x}(f_1,f_2)|^2$ over a representative lattice $L$ (Fig. A3). We found that linearity is rejected (at the 5% level), thanks to the highly significant regions of the El Niño bispectrum (Fig. 7d–e and 7a–c).
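A minimal sketch of the local and integrated statistics is given below, assuming the smoothed bispectra of the observations and of the surrogates have already been evaluated on the same lattice of bifrequencies. The function names, the array layout, the separate standardization of real and imaginary parts, and the rank-based Monte-Carlo p-value are assumptions of this sketch; the exact definitions are those of (A6) and (A7) in Appendix A, which is not reproduced here.

```python
import numpy as np

def standardized_deviation(obs, surrogates):
    """Local statistic: deviation of obs from the surrogate mean, in units of
    the surrogate standard deviation, separately for real and imaginary parts.

    obs: complex array (..., n_bifreq); surrogates: complex (n_surr, n_bifreq).
    """
    mean = surrogates.mean(axis=0)
    sd_re = surrogates.real.std(axis=0, ddof=1)
    sd_im = surrogates.imag.std(axis=0, ddof=1)
    t_re = (obs.real - mean.real) / sd_re
    t_im = (obs.imag - mean.imag) / sd_im
    return t_re + 1j * t_im

def integrated_test(obs, surrogates):
    """Integrated statistic sum |T|^2 over the lattice and a Monte-Carlo p-value.

    The null distribution reuses the same surrogates that define the mean and
    variance, which is a simplification of this sketch.
    """
    t_obs = np.sum(np.abs(standardized_deviation(obs, surrogates)) ** 2)
    t_null = np.sum(np.abs(standardized_deviation(surrogates, surrogates)) ** 2, axis=1)
    p_value = (1 + np.sum(t_null >= t_obs)) / (1 + t_null.size)
    return float(t_obs), float(p_value)
```

With 10,000 surrogates, the smallest attainable p-value of this rank-based estimate is 1/10,001, which is ample for testing at the 5% level.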


Normalized bispectrum and bicoherency

An independent test of linearity, beyond that of the previous section, and a diagnostic of phase synchronization come from the normalized bispectrum or bicoherence spectrum (Kim and Powers, 1979; Nikias and Raghuveer, 1987; Hinich and Wolinsky, 2005; Rao et al., 2012). It is obtained by prewhitening $x(t)$ to yield a non-Gaussian white noise $y(t)$, reconstructed by the inverse FT of $X_y(f)\equiv X_x(f)[\Gamma_x(f)]^{-1/2}=\sqrt{N}\,e^{i\Theta_x(f)}$. The test is then:

$$\Gamma_{3,y}(f_1,f_2)\equiv\frac{\Gamma_{3,x}(f_1,f_2)}{\left(\Gamma_x(f_1)\Gamma_x(f_2)\Gamma_x(f_3)\right)^{1/2}}\qquad(27)$$

We stress that the phases of $X_y(f)$ and $X_x(f)$ are the same (i.e. $\Theta_x(f)$), leading to a nonvanishing correlation $\mathrm{cor}(x,y)=\sigma_x^{-1}\int_{-1/2}^{1/2}\sqrt{\Gamma_x(f)}\,df$.

A linear process, i.e. $x(t)=\sum_k\alpha(k)\,w(t-k)$ where $w$ is a white noise, yields $\Gamma_x(f)=|X_\alpha(f)|^2E(w^2)$. By using the result of Section 5.1.2, the normalized bispectrum (27) becomes $\Gamma_{3,y}(f_1,f_2)=sk(w)\,e^{i[\Theta_\alpha(f_1)+\Theta_\alpha(f_2)-\Theta_\alpha(f_3)]}$, where $\Theta_\alpha(\cdot)$ is the phase of the DFT of the sequence $\alpha(k)$. Therefore, the amplitude of $\Gamma_{3,y}$ is uniform, i.e. $|\Gamma_{3,y}(f_1,f_2)|=sk(w)$, which is precisely the Hinich (1982) null hypothesis of linearity.
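The uniform amplitude of the normalized bispectrum under linearity can be checked numerically from the theoretical spectra of a linear filter. In this sketch the filter coefficients `alpha` are arbitrary illustrative values, the function names are hypothetical, and the bispectrum convention (conjugate on the $f_3$ factor) is assumed to match the phase relation stated above.

```python
import numpy as np

def transfer(alpha, f):
    """X_alpha(f) = sum_k alpha(k) exp(-2*pi*i*k*f), k = 0, 1, ..."""
    alpha = np.asarray(alpha, dtype=float)
    f = np.atleast_1d(np.asarray(f, dtype=float))
    k = np.arange(alpha.size)
    return np.exp(-2j * np.pi * np.outer(f, k)) @ alpha

def normalized_bispectrum(alpha, var_w, mu3_w, f1, f2):
    """Theoretical normalized bispectrum of x(t) = sum_k alpha(k) w(t-k)."""
    h1 = transfer(alpha, f1)
    h2 = transfer(alpha, f2)
    h3 = transfer(alpha, np.add(f1, f2))
    bispectrum = mu3_w * h1 * h2 * np.conj(h3)
    power = [var_w * np.abs(h) ** 2 for h in (h1, h2, h3)]
    return bispectrum / np.sqrt(power[0] * power[1] * power[2])

# Illustrative filter; noise moments as quoted for the NGAR(5) null.
f1 = np.linspace(0.01, 0.24, 24)
f2 = np.linspace(0.02, 0.20, 24)
gamma_3y = normalized_bispectrum([1.0, 0.6, -0.3], 0.275, 0.129, f1, f2)
noise_skewness = 0.129 / 0.275**1.5
```

Every entry of `np.abs(gamma_3y)` equals `noise_skewness` regardless of the filter, so any significant departure from a flat squared bicoherence in data is evidence against linearity.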

In the case of El Niño, we get an estimated prewhitened non-Gaussian noise y(t) by using the more reliable theoretical maximum entropy NGAR(5) spectrum for the normalization in (27), instead of the empirical smoothed spectrum. The lag correlation of the resulting noise y(t) is very close to zero and thus can be considered a white noise. Its skewness is 0.282 and the excess kurtosis is 0.531. The correlation with the signal x(t) is quite high: 0.75, coming mainly from extreme events.

The smoothed normalized bispectrum, using the same window length $M=M_2=30$, is hereby denoted $\hat S_{3,y}$. Significant peaks (at the 20% significance level) of $\hat S_{3,y}$, in both the real (Fig. 9a) and imaginary (Fig. 9b) parts, are located nearly at the same bifrequencies as those of the non-normalized bispectrum $\hat S_{3,x}$ (Fig. 7a,b), though peaks are attenuated by normalization and a new peak appears at higher frequencies f1=f2=0.18 (2.5 years), (f3=0.36) cpt (2.8 years). The squared bicoherency $|\hat S_{3,y}(f_1,f_2)|^2$ (Fig. 9c) is clearly nonuniform, showing regions of nonacceptance of the null bispectrum of NGAR(5) and hence rejecting the linearity hypothesis.

Fig. 9. 

Real (a) and Imaginary (b) parts of the normalized smoothed bispectrum of El Niño index, (M2=30) and its squared amplitude (c). Regions statistically significant at 20% level are colour-shaded.


Stochastic modelling


The method


The motivation

ENSO has been extensively studied to look for a deeper understanding of the underlying physics and complexity (see recent reviews of Chunzai, 2018; Timmermann et al., 2018 and references therein), improve predictability (see the review of Tang et al., 2018), as well as to get accurate statistics (e.g. pdf, extremes etc.). This was done through different dynamical models (physical-based deterministic models), statistical models (e.g. linear inverse modelling by Penland (1996) and Privalsky and Muzylev, 2013) and models based on machine learning (Dijkstra et al., 2019).

However, even complex models may exhibit biases in various statistics. The present top-down approach of fitting simple stochastic models to observations attempts to learn signal-noise relationships from models, enabling parametrizations of the nonlinear and complex effects of nonobserved variables and hence reproducing a set of relevant stochastic properties (e.g. spectrum, bispectrum).

A number of stochastic univariate models of ENSO have been fitted, such as smooth transition autoregressive (STAR) models (Hall et al., 2001; Ubilava and Helmers, 2013), autoregressive conditional heteroscedasticity type models (ARCH) (Ahn and Kim, 2005), and threshold AR models (De Gooijer, 2017). This section aims at fitting a minimal univariate stochastic model for the El Niño 3.4 index driven by a multiplicative Gaussian delayed noise, able to reproduce the observed empirical spectrum, skewness and bispectrum, as well as at assessing the impact it has on predictability, compared to benchmark linear models.


The model formulation

The models fitted here belong to a class in which the simulated scalar state $\tilde x(t)$, at integer $t$, is driven by a noise $u(t)=\sigma_w w(t)$, where $w(t)$ is a standard Gaussian white noise and $\sigma_w$ is a positive constant. The simulated state $\tilde x(t)$ depends (through a function $F$) on: a) the simulated previous state values at times $t'<t$, represented in delay coordinates (Takens et al., 1981) by $\tilde{\mathbf x}(t-1)\equiv[\ldots,\tilde x(t-2),\tilde x(t-1)]^T$; b) the previous noise values at times $t'<t$, $\mathbf u(t-1)\equiv[\ldots,u(t-2),u(t-1)]^T$; and c) a parameter vector $\boldsymbol\theta$. Throughout this section, bold letters refer to vectors and italic letters to scalars. The model thus reads:

$$\tilde x(t)=F[\tilde{\mathbf x}(t-1),\mathbf u(t-1),\boldsymbol\theta]+u(t)\qquad(28)$$

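The model (28) will also be used in forecast mode, and the $\tau$-lag ($\tau\ge1$) forecast valid at time $t$, $x(t,\tau)$, is given by:

$$x(t,\tau)=F[\mathbf x(t-1,\tau-1),\boldsymbol\epsilon(t-1,\tau),\boldsymbol\theta],\qquad(29)$$
where $\mathbf x(t-1,\tau-1)\equiv[\ldots,x(t-2,\tau-2),x(t-1,\tau-1)]^T$, with $x(t,\tau)=x(t)$ for $\tau\le0$. The noise vector in (29) becomes $\boldsymbol\epsilon(t-1,\tau)\equiv[\ldots,\varepsilon(t-2,\tau-1),\varepsilon(t-1,\tau)]^T$ with $\varepsilon(t,\tau)\equiv x(t,\tau-1)-x(t,\tau')$, where $\tau'=\max(1,\tau)$. For instance, the one-step forecast is $x(t,1)=F[\mathbf x(t-1,0),\boldsymbol\epsilon(t-1,1),\boldsymbol\theta]$, where $\mathbf x(t-1,0)=\mathbf x(t-1)\equiv[\ldots,x(t-2),x(t-1)]^T$ and $\boldsymbol\epsilon(t-1,1)=\boldsymbol\epsilon(t-1)=[\ldots,\varepsilon(t-2),\varepsilon(t-1)]^T$, with $\varepsilon(t)\equiv x(t)-x(t,1)$ being the one-step forecast error, thus consistent with (28).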

The models of form (28) include the linear AR models fitted in Section 3.3.2. However, in order to reproduce nonlinear and non-Gaussian ENSO behaviour, we have considered bilinear models (Haggan and Oyetunji, 1980; Rao, 1981; Rao and Gabr, 1984), which differ from AR processes by the addition of lagged bilinear (BL) terms, denoted ARBL(p1, p2):

$$F[\tilde{\mathbf x}(t-1),\mathbf u(t-1),\boldsymbol\theta]=\sum_{k=1}^{p_1}a_k\,\tilde x(t-l_k)+\sum_{k=1}^{p_2}b_k\,\tilde x(t-r_k)\,u(t-s_k)+\alpha,\qquad(30)$$
characterized by its correlated-additive–multiplicative (CAM) noise (Usoro, 2015), where $l_k,r_k,s_k\ge1$ are lags, with $\boldsymbol\theta\equiv(a_1,\ldots,a_{p_1},b_1,\ldots,b_{p_2},\alpha)$. These models can produce non-Gaussian statistics and nonvanishing bicovariances and bispectra (Rao and Gabr, 1984). Note that the restricted case of lags $r_k=0$, $s_k=0$ (Monahan, 2020) is excluded from the model of Eq. (30), to allow inverting $u(t)$ from past values for forecasting. Note also that the case $r_k>s_k$ leads to sub-diagonal bilinear models, whereas the case $r_k\le s_k$ corresponds to diagonal/super-diagonal bilinear models. In the former case, for example, the intervening noise is independent of the most recent state, whereas the latter case corresponds to nonlinear feedbacks, having in general nonvanishing time averages due to the correlation between states and past noises. We stress that models (30) have a nonlinear Volterra development in terms of lagged noises, which in a certain way parametrizes nonlinearities that are not present in a deterministic form in the function $F$. Other types of models could be fitted, e.g. by adding quadratic terms to the deterministic forcing, but these can lead to instabilities in simulations. Another difficulty with a nonlinear deterministic part is the very wide class of possible nonlinearities: which nonlinearity to choose and, if polynomial, of what degree? This is investigated in another study.
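A simulation of a model of form (30) can be sketched as below. All coefficient values in the usage note are hypothetical (the fitted values are in Table 6, not reproduced here), the function name and argument layout are assumptions, and the AR lags are taken as $l_k=k$.

```python
import numpy as np

def simulate_arbl(a, bl_terms, alpha, sigma_w, n, n_spinup=1000, seed=0):
    """Simulate x(t) = sum_k a[k-1] x(t-k) + sum b x(t-r) u(t-s) + alpha + u(t).

    a: AR coefficients for lags 1..p1 (assumes l_k = k).
    bl_terms: list of (b, r, s) tuples with lags r, s >= 1.
    Returns the last n values of the state x and of the noise u.
    """
    rng = np.random.default_rng(seed)
    p1 = len(a)
    n_gamma = max([p1] + [max(r, s) for _, r, s in bl_terms])
    total = n_gamma + n_spinup + n
    u = sigma_w * rng.standard_normal(total)
    u[:n_gamma] = 0.0          # zero initial noise, as for the initial state
    x = np.zeros(total)
    for t in range(n_gamma, total):
        ar_part = sum(a[k] * x[t - k - 1] for k in range(p1))
        bl_part = sum(b * x[t - r] * u[t - s] for b, r, s in bl_terms)
        x[t] = ar_part + bl_part + alpha + u[t]
    return x[-n:], u[-n:]
```

For example, `simulate_arbl([0.8, -0.2], [(0.1, 2, 4), (0.1, 1, 3)], 0.0, 0.5, n=30000)` produces a long run from which the mean, autocovariance and bicovariance can be estimated after the spin-up has been discarded; with an empty `bl_terms` list the model reduces to a plain AR process.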


Model fitting

In order to optimize predictability and reproduce data statistics, we apply a hybrid fitting algorithm that minimizes a cost function $J_{hyb}$, which is the weighted sum of the normalized one-step forecast residuals $J_{fitt}$ and the normalized squared distance between a set of observed and simulated statistics (average, autocovariance and bicovariance) $J_{stat}$:

$$J_{hyb}(\theta)=c_{fitt}\,J_{fitt}(\theta)+c_{stat}\,J_{stat}(\theta,\sigma_w(\theta)),$$
where $c_{fitt}$ and $c_{stat}$ are positive weights and $\sigma_w(\theta)=\sigma_x\,(J_{fitt}(\theta))^{1/2}$ is the RMS of the forecast residuals for a model using parameters $\theta$, with $\sigma_x=1$ (observations are standardized).

The term Jfitt  is given by the one-step forecast error:

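$$J_{fitt}(\theta)=\frac{1}{(N-N_\gamma)\,\sigma_x^2}\sum_{t=N_\gamma}^{N-1}\left[x(t)-F[\mathbf x(t-1),\boldsymbol\epsilon(t-1),\boldsymbol\theta]\right]^2$$
where $N_\gamma\equiv\max_{k,k'}(l_k,r_{k'},s_{k'})$.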
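The one-step residuals entering $J_{fitt}$ require the past noise values, which are recovered recursively from the data as past one-step forecast errors. The sketch below illustrates this inversion for a bilinear model with AR lags $l_k=k$; the function names are hypothetical, the first $N_\gamma$ residuals are initialized to zero as an assumption, and normalizing by the sample variance of `x` stands in for $\sigma_x^2$.

```python
import numpy as np

def one_step_residuals(x, a, bl_terms, alpha):
    """Recover eps(t) = x(t) - F[x(t-1), eps(t-1), theta] recursively.

    a: AR coefficients for lags 1..p1; bl_terms: list of (b, r, s), r, s >= 1.
    Residuals before index n_gamma are set to zero (assumed initial condition).
    """
    x = np.asarray(x, dtype=float)
    p1 = len(a)
    n_gamma = max([p1] + [max(r, s) for _, r, s in bl_terms])
    eps = np.zeros_like(x)
    for t in range(n_gamma, x.size):
        ar_part = sum(a[k] * x[t - k - 1] for k in range(p1))
        # Past noises are replaced by the already computed forecast errors.
        bl_part = sum(b * x[t - r] * eps[t - s] for b, r, s in bl_terms)
        eps[t] = x[t] - (ar_part + bl_part + alpha)
    return eps, n_gamma

def j_fitt(x, a, bl_terms, alpha):
    """Normalized mean squared one-step forecast error."""
    eps, n_gamma = one_step_residuals(x, a, bl_terms, alpha)
    return float(np.mean(eps[n_gamma:] ** 2) / np.var(x))
```

Because all noise lags satisfy $s_k\ge1$, each residual depends only on earlier residuals, which is what makes the recursion well defined; when the data are generated by the model itself with matching initial conditions, the residuals reproduce the driving noise.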

Iterative minimization algorithms of $J_{fitt}(\theta)$ for general bilinear models are discussed in Pham and Tran (1981), Rao and Gabr (1984), Grahn (1995), Guegan and Pham (1989), Gabr (1998) and Falguerolles and Francis (1992). Traditionally, the method of moments is used to obtain implicit relationships between the parameters and lagged moments (e.g. Sesay and Rao, 1988; Tang and Mohler, 1988; Kim et al., 1990), where, in most cases, the parameters are difficult to express as a function of the moments. Here we apply a method where statistics are estimated from a long simulation of Eq. (28) with initial conditions $\tilde x(t)=w(t)=0$ for $t=-N_{past},\ldots,-N_{past}+N_\gamma$, and $w(t)\sim N(0,1)$ for $t=-N_{past}+N_\gamma+1,\ldots,N_{sim}$. Statistics are computed for $t=1,\ldots,N_{sim}$ with $N_{sim}=30{,}000$, and the initial $N_{past}=1000$ values are discarded from the statistics as spin-up.

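The term $J_{stat}$, involving the mean, autocovariance and bicovariance from observations ($S_{obs}$, $C_{obs}(\cdot)$ and $B_{obs}(\cdot,\cdot)$) and simulations ($S_{sim}$, $C_{sim}(\cdot)$ and $B_{sim}(\cdot,\cdot)$), includes the autocovariance and bicovariance terms

$$J_C(\theta,\sigma_w)\equiv\sum_{|\tau|\le\tau_{max}}\left[C_{sim}(\tau)-C_{obs}(\tau)\right]^2\frac{b_1\,\lambda(\tau)^2}{\tau_{max}\,\sigma_x^4}=\frac{b_1}{\tau_{max}\,\sigma_x^4}\int_{-1/2}^{1/2}\left[\hat S_{2,\tilde x}(f)-\hat S_{2,x}(f)\right]^2df,\qquad(35)$$
$$J_B(\theta,\sigma_w)\equiv\sum_{|\tau_1|,|\tau_2|\le\tau_{max}}\left[B_{sim}(\tau_1,\tau_2)-B_{obs}(\tau_1,\tau_2)\right]^2\frac{b_2\,\lambda(\tau_1,\tau_2)^2}{\tau_{max}^2\,\sigma_x^6}=\frac{b_2}{\tau_{max}^2\,\sigma_x^6}\int_{-1/2}^{1/2}\int_{-1/2}^{1/2}\left[\hat S_{3,\tilde x}(f_1,f_2)-\hat S_{3,x}(f_1,f_2)\right]^2df_1\,df_2,\qquad(36)$$
with $\tau_{max}=16$.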

The used window lag functions λ(τ) and λ(τ1,τ2) are scaled by M=M2=τmax. The Parseval Theorem applied to (35–36), shows that minimizing JC(θ,σw),JB(θ,σw) leads also to minimizing errors in the spectral domain.

In the analysis we compare two situations: simple fitting, with $c_{fitt}=1$, $c_{stat}=0$, and hybrid fitting, with $c_{fitt}=1$ and $c_{stat}=\Delta J_{fitt}/\Delta J_{stat}$, given by the ratio of typical variations of $J_{fitt}$ and $J_{stat}$. The optimal parameters issued from simple and hybrid fittings are hereby denoted $\theta_{fitt}$ and $\theta_{hyb}$, respectively. Note that the term $c_{stat}J_{stat}$ reduces overfitting and the domination of one term over the other; see Appendix B for the description of the minimization of $J_{hyb}(\theta)$. We compare both fittings in terms of $J_{fitt}$ and $J_{stat}$ for the AR(p1) model and several ARBL(p1, p2) models. The statistical significance is assessed through the normalized Akaike Information Criterion, $\mathrm{NAIC}=\log(J_{fitt})+2\dim\theta/(N-N_\gamma)$, which penalizes the number of model terms.


Results for El Niño index


Fitting statistics

We analyse and evaluate a sequence of models (Eq. (30)), for $p_1=5$, $l_k=k$; $k=1,\ldots,p_1$, with various $p_2$ values using lags $r_k\le6$, $s_k\le6$ (Table 5), with every new lag pair chosen to produce the largest $J_{fitt}$ decrease.

Table 5 shows that the mean of the squared residuals $J_{fitt}$ decreases with increasing model complexity. For hybrid fitting, NAIC decreases with increasing complexity (i.e., no overfitting). Moreover, the hybrid fitting is able to get improved statistics compared to those from the simple fitting ($J_{stat}(\theta_{hyb})<J_{stat}(\theta_{fitt})$), with reductions up to about one fifth for the ARBL(5,5) model. That reduction entails very small increments in the sum of residuals, i.e. $J_{fitt}(\theta_{hyb})>J_{fitt}(\theta_{fitt})$, of the order of 1%, while still keeping significant NAIC values.

The estimated parameters of the ARBL(5,5) model are shown in Table 6, with $(\alpha,\sigma_w)=(0.0626, 0.5102)$.


Simulated autocovariance and spectrum

We note that the autoregressive part of the ARBL(5,5) model is quite similar to that of the AR(5) model (Eq. (6)). The bilinear coefficients have smaller amplitude than the autoregressive coefficients. ARBL(5,5) contains two feedback terms, corresponding to $(r_k,s_k)=(2,4)$ and $(1,3)$, whose nonzero mean values come from white noise squares and lead to a nonvanishing constant $\alpha$. This partially parametrizes the bicovariance at the interseasonal scales, which is at the origin of the nonlinear reduction of the Spring Predictability Barrier (see Section 4.2.2). The model recovers quite well the empirical autocovariance function (not shown), and also the smoothed spectrum (Fig. 10) using the window lag M=30, with a particularly similar peak in the band $f\approx0.05$–$0.07$ cpt.

Fig. 10. 

Smoothed empirical (black) and ARBL(5,5) simulated (red) spectra using a window lag M = 30.


Simulated bicovariance and bispectrum

Most (∼80%) of the uncertainty in $J_{hyb}(\theta_{hyb})$ comes from the bicovariance uncertainty through $J_B(\theta,\sigma_w)$. The simulated bicovariance (Fig. 11) reproduces very well the pattern of the bicovariance inferred from observations (Fig. 4b), at least up to lags of 12 trimesters. The simulation yields a skewness equal to 0.26, which is comparable with the observed skewness (0.46). Any simpler model with only one bilinear term is unable to yield positive skewness and negative bicovariances $C_{\tilde x}(0,3)$, $C_{\tilde x}(0,4)$, which are fundamental to get skillful El Niño predictions from Spring to the next Winter.

Fig. 11. 

Bicorrelation of the ARBL(5,5) model, approaching the empirical bicorrelation of El Niño index (to be compared with Fig. 4b).

The smoothed bispectrum of the ARBL(5,5) simulation, using a window lag $M_2=30$, is shown in Fig. 12a (real part) and Fig. 12b (imaginary part), to be compared with the empirical one (Fig. 7a,b). The real part (Fig. 7a) is quite well reproduced, with negative values in the low-frequency band $|f_1+f_2|<0.06$, and positive local maxima near $f_1=f_2=0.045$ cpt and $f_1=f_2=0.085$ cpt (Fig. 12a). The decomposition of skewness (Eq. (26)) (with $f_{cut}=0.06$) yields the values −0.036, 0.062, 0.154 and 0.082 respectively for the SSS, SSF, SFF and FFF contributions, which agree approximately with observations (Table 3). The imaginary part of the simulated bispectrum (Fig. 12b) is also roughly well reproduced, showing negative values within the region $f_1+f_2<0.06$ (Fig. 7b) and positive values elsewhere. The maximum is near $f_1=f_2\approx0.07$ cpt. Note that the existence of a single peak is related to the simulated single-peak spectrum.

Fig. 12. 

Real (a) and Imaginary (b) parts of smoothed bispectrum of the ARBL(5,5) using a window lag M2=30.

We remark that the bilinear models chosen here have a linear deterministic part, with nonlinearity coming indirectly through the CAM noise. Other types of models could be fitted, e.g. by adding quadratic terms to the deterministic forcing, but these can lead to other difficulties such as instabilities and chaotic behaviour in simulations.



In order to assess the predictability impact of the inclusion of bilinear terms in ARBL models, compared to the AR(5) model, we compute the correlation skill $cor_\tau$ (Table 5) between observations and predictions for lags $\tau=1,2,3$ trimesters. Models are optimized by hybrid fitting in the training period 1870–1969 (100 years) and predictions are evaluated in the validation period 1970–2018 (49 years), where the most intense La Niñas and El Niños have been observed (see Section 5.3.4). All the tested models' predictions are skillful ($cor_\tau>0.5$) for lags up to two trimesters.

Table 5 shows that, in general, correlation skills increase with increasing complexity. For instance, the correlation skill of model ARBL(5,5) is about 2%, 7% and 9% larger than that of AR(5), respectively for lags of 1, 2 and 3 trimesters. The presence of at least two bilinear terms is suggested to be the reason behind the improvement of the 2-trimester forecasts by ARBL(5,5) with respect to AR(5), for which some extreme El Niños (e.g. 1973, 1983, 1998, 2016) are more accurately predicted. The ARBL(5,5) model has no explicit deterministic nonlinearities, which are common in physically-based models. Here, nonlinearities are parametrized through the bilinear terms. Despite the simplicity of the ARBL(5,5) model, its correlation skills ($cor_1=0.87$, $cor_2=0.62$) are not dramatically smaller, in the same period, than those of physically-based models ($cor_1=0.85$–$0.91$, $cor_2=0.68$–$0.80$) and those of a much more complex neural-network based model ($cor_1=0.92$, $cor_2=0.80$), as observed in Fig. 2a of Ham et al. (2019).


Discussion and conclusion

El Niño Southern Oscillation (ENSO) is one of the most important coupled atmosphere–ocean systems, exhibiting time scales ranging from seasons to decades and beyond, with worldwide teleconnections. Using different stochastic and/or dynamic approaches, most studies have emphasized and shown its intrinsic complexity, nonlinearity and non-Gaussianity. Most of those studies limited their investigations to the second-order statistics in addition to skewness and/or kurtosis.

Here, we follow the same line of research by performing a data-driven systematic study of the third-order stochastic moments, both in the time and spectral domains, applied to the standardized trimonthly-average El Niño 3.4 index with a trimester sampling. Within the time domain, this comprises the bicovariance γx(τ1,τ2)=E[x(t)x(t+τ1)x(t+τ2)], in addition to nonlinear correlations for testing nonlinearity and nonlinear predictability for forecasting horizons from seasons up to a few years. The study uses a 149-year period (1870–2018) time series with its statistically significant skewness of 0.46, peaking mostly at the boreal winter (1.04). The analysis of bicovariance maxima reveals high negative nonlinear correlation, implying that El Niño or La Niña extremes are likely to be followed by La Niña one year later, whereas mild conditions, on the other hand, favour El Niño occurrence.

The analysis of the nonlinear predictability, on seasonal time scales, shows that such nonlinear correlation is enhanced further when forecasts are issued at the NH Spring season (AMJ). This is linked to the persistence of many La Niñas starting in Spring up to next Winter and to the fact that strong Winter El Niños have only occurred under close climatological conditions in the previous boreal Spring. This strongly suggests that nonlinearity in the inter-seasonal timescale can contribute significantly to reduce the so-called El Niño Spring predictability barrier. Another equally important aspect is the fact that ENSO nonlinearity allows for the extension of predictability skill even for forecasting time of a few years, particularly when these forecasting time intervals satisfy phase synchronization and quadratic phase locking with certain dominant Fourier frequencies.

Similarly, within the spectral domain, the bispectrum and bicoherence have been computed. As with the power spectrum and variance, the bispectrum provides, in particular, the contribution of each bi-frequency bin to the observed skewness and squared bicovariance. This allows the detection of combinations of El Niño Fourier components that mostly contribute to ENSO extremes by phase synchronization. The bispectrum also permits a test of nonlinearity in the spectral domain. The bispectrum has been estimated by a smoothed estimator using a window lag of 30 trimesters = 7.5 years, obtained from a trade-off between bispectrum bias, variance and spectral resolution. To estimate the statistical significance of peaks, a conservative test has been adopted. The bispectrum of a non-Gaussian autoregressive null-hypothesis model NGAR(5) is tested and rejected at the 5% significance level. We first obtained the coarse-grained spectral partition of the skewness by splitting the full signal into a slow component s(t) (with periods larger than 3 years) and a fast component f(t). The skewness is then decomposed into 4 components, namely SSS=−0.066, SSF=0.263, SFF=0.185 and FFF=0.071, implying, in particular, that most El Niños result from the interaction between inter- and intra-triennial timescales. Some decadal tendency towards SFF-type El Niños is apparent from the time series, which is consistent with the observed ENSO decadal variability. Note that if a maximum of the bispectrum real part is observed at $(f_1,f_2)$, then a peak in the power spectrum is observed approximately at $f_3=f_1+f_2$. In particular, the leading bispectral maxima contributing to El Niño occur at $f_1=f_2=0.05$, $f_3=0.1$ cpt (periods of 5 and 2.5 years, inside the SSF region) and near the band $f_1+f_2=0.165$ cpt, crossing the SFF and FFF regions, with a maximum at $f_1=f_2=0.082$, $f_3=0.165$ cpt (periods of 2.9 and 1.5 years).
On the other hand, the minima contributing mostly to La Niña extremes lie near $f_1=f_2=0.018$, $f_3=0.036$ cpt (periods of 14 and 7 years) and $(f_1,f_2)=(0.05,0.018)$, $f_3=0.063$ cpt (periods of 5, 14 and 4 years), both inside the SSS region.

Lastly, a minimal stochastic model was constructed, which was able to reproduce the main features of the spectrum and bispectrum, and yielded robust improvement of the predictability skill, compared to an autoregressive AR(5) model. The model was selected from a large class of bilinear models with correlated-additive–multiplicative lagged noise. To gain predictability skill with the right stochastic properties, a hybrid fitting approach is used by minimizing a combination of forecasting squared residuals and squared deviations from empirical third-order statistics. The bilinear model yields forecast improvements, particularly at lags of 1, 2 and 3 trimesters with 2%, 7% and 9% of correlation skill increment respectively, suggested to be linked to the attenuation of El Niño predictability Spring barrier and to the more accurate prediction of super El Niños.

This study contributes to the understanding of ENSO predictability and modelling from the perspective of the bispectrum and phase synchronization. In a changing climate, this is especially relevant for the study and predictability of ENSO extremes resulting from resonant-type interactions. The study also opens the possibility of investigating other ENSO indices, such as El Niño Modoki or other Niño indices, and checking whether other processes are at play. In particular, the forecast skill of ENSO based on the developed models is of great importance for forecasting on seasonal (and longer) timescales. A systematic analysis of these topics is beyond the scope of this manuscript and is left for future research.