1.

## Introduction

Earth's weather and climate vary on a wide range of spatio-temporal scales. Whereas external orbital variations are believed to be the dominant driving force for macro-climate (on millennial time scales), weather, macro-weather (Lovejoy, 2018) and climate variations (on shorter time scales) are mainly the result of complex nonlinear interactions between very many degrees-of-freedom (Sura and Hannachi, 2015) and also due to many climate sub-components with different time scales The atmospheric (and climate) system is an excellent example of high-dimensional and highly complex dynamical system. One outstanding and ubiquitous feature of the large scale (and low frequency) atmospheric (and climate) variability is non-Gaussianity (Franzke et al., 2007; Proistosescu et al., 2016; Pires and Hannachi, 2017; Hannachi and Iqbal, 2019). For instance, Sura and Sardeshmukh (2008) show that sea surface temperature (SST) has non-Gaussian probability distribution function (PDF) with particular tail extrema. Many processes, e.g. subgrid scales, large-scale teleconnections and nonlinearity can lead to various kinds of uncertainties, which can affect the accuracy of our understanding, such as forecasts. Stochastic modelling can help overcome some of the previous problems and improve accuracies (Berner et al., 2017).

Although non-Gaussianity and nonlinearity can be considered as two distinct aspects of weather, macro-weather and climate (e.g., Sura and Sardeshmukh, 2008; Sura and Hannachi, 2015), observations do suggest that these are two inter-related characteristic features of the system (e.g., Woollings et al., 2010; Hannachi et al., 2017; and references therein). Some systems can exhibit weak (deterministic) nonlinearity, but with strong non-Gaussian statistics, which may be explained by a multiplicative or state-dependent noise as detailed, e.g. in Sura and Sardeshmukh (2008), Sardeshmukh and Sura 2009, see also Hannachi et al. (2017) for a review. In contrast, non-Gaussianity can also result from systems with strong nonlinearity and additive noise, see e.g., Hannachi et al. (2017) for a review and further references. Notwithstanding this link between nonlinearity and non-Gaussianity, it is sometimes helpful to disentangle the contributions of nonlinearity and stochastic noise to the system PDF. There is a general consensus that understanding non-Gaussian statistics of weather and climate is important for a number of reasons, not least for weather and climate prediction, planning and risk assessment. Extreme events, for instance, which are important in planning and risk assessment, depend closely on the structure of the non-Gaussian PDF. Various sources exist, which contribute to the non-Gaussianity. Sura and Hannachi (2015) provide a detailed account of the different sources and mechanisms contributing to the observed non-Gaussianity of the atmospheric large-scale and low-frequency variability. They discussed, in particular, nonlinear regime dynamics, multiplicative noise, cross-frequency coupling, jet stream meandering and nonlinear boundary layer drags. They investigated specifically non-Gaussianity in geopotential heights, jet stream latitude and SST fields, with particular focus on skewness and kurtosis. Skewness and kurtosis are simple measures of non-Gaussianity, in the time domain, that does not directly reflect frequencies. A convenient way to examine nonlinearity and/or non-Gaussianity is to use high (e.g., third and fourth) order statistics in the frequency domain, namely, e.g. bispectrum and trispectrum (e.g. Brillinger and Rosenblatt, 1967; Nikias and Raghuveer, 1987). Spectral analysis of stationary time series is well rooted in the study of time series $x\left(t\right),t=0,1,2\dots$ (supposed to be zero-mean and unit variance for simplicity) from weather and climate and other fields. The duality between the time and spectral domains allows to investigate the distribution of, e.g. variance and skewness, as a function of spectral bins. For example, the duality between the second-order cumulant, or autocovariance function ${\gamma }_{x}\left(·\right)$ and the power spectrum reveals that the integral of the latter is precisely the variance of the time series, i.e.,

$E\left({x}^{2}\right)={\gamma }_{x}\left(0\right)={\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\mathrm{\Gamma }}_{x}\left(f\right)df.$

This means, in particular, that the power spectrum can be viewed as a decomposition of the variance by frequency bins. Likewise, the duality between the bispectrum and the third-order moment or bicovariance ${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)\equiv E\left[x\left(t\right)x\left(t+{\tau }_{1}\right)x\left(t+{\tau }_{2}\right)\right]$ means that the skewness is the integral of the bispectrum,

$E\left({x}^{3}\right)={\gamma }_{x}\left(0,0\right)={\int }_{-1/2}^{1/2}{\int }_{-1/2}^{1/2}\mathit{Re}\left[{\mathrm{\Gamma }}_{3,x}\left({f}_{1},{f}_{2}\right)\right]d{f}_{1}d{f}_{2}.$

This also implies that the bispectrum can be viewed as a decomposition of the skewness by bins of frequency pairs. This can be used to identify nonlinearly interacting pairs of spectral bins contributing to the skewness which are often attributed to phase locking between components at frequency triplets producing triadic resonance (Pires and Perdigão, 2015).

Bispectral analysis has been used in signal processing to study nonlinearity detection and bicoherency in a number of different fields including econometry (Ashley et al., 1986; Rusticelli et al., 2008), acoustics (Richardson and Hodgkiss, 1994), and Earth Sciences (Müller, 1987; Biswas et al., 1995; Hocke and Kämpfer, 2008).

Here we focus on the bispectral analysis of El Niño Southern Oscillation (ENSO), the main atmosphere–ocean interannual mode (Neelin et al., 1998; Wang et al., 2016). ENSO aspects, like nonlinearity and complexity (Timmermann, 2003; Frauen et al., 2014; Berner et al., 2018; Bianucci et al., 2018), non-Gaussianity (Burgers and Stephenson, 1999; Chunzai, 2018; Boucharel et al., 2009; Hannachi et al., 2017) and also its modelling (Kondrashov et al., 2005) have been extensively studied, both from a physical and signal-processing perspective. In particular, some ENSO bispectral analysis was performed by Timmermann et al. (2001, 2018), to fit a low-order nonlinear dynamical model of El Niño index and by Schulte et al. (2020) to compute the cross-bi-coherency and synchronization with the Indian Monsoon.

The present study aims to perform a systematic and thorough analysis of third-order statistics, both in the time and spectral domains to infer and improve the understanding of ENSO non-Gaussianity through skewness, nonlinearity (e.g. Hinich, 1982; Cox, 1991), and nonlinear predictability on time scales ranging from seasons to years. We will emphasize, in particular, the role of nonlinear lagged correlation with the intra-annual time scale in overcoming of the Spring predictability barrier of ENSO (Duan and Wei, 2013). The bispectrum of El Niño index and its statistical significance are thus computed, looking for the most relevant wave-triad contributing to the skewness, which are associated with ENSO extremes by a phase synchronization, see e.g. Jajcay et al. (2018). Then, we check how simple stochastic models driven by lagged correlated additive multiplicative (CAM) noise (Monahan, 2020) can produce the main bispectral features of ENSO and El Niño/La-Niña asymmetry (Martinez-Villalobos et al., 2019).

The manuscript starts with data description with a synthesis of exploratory statistics (Section 2). Then the autocorrelation function and spectrum (Section 3) are shown. In Section 4, we present the tests of nonlinearity and non-Gaussianity based on the bicovariance function. Section 5 details the bispectrum properties and its estimation and statistical significance applied to ENSO. Section 6 develops a minimal bilinear stochastic model fitting data. A summary and conclusion are given in the last section. Most of the technical details and symbols are put in appendices. The details presented in the main text and appendices, some of them conventional, make the paper convenient particularly for didactive purposes and ready to apply to other time-series beyond ENSO.

2.

## Data

The raw data used are monthly anomalies (with respect to the 1981–2010 period) of El Niño 3.4 sea surface temperature (SST) Index, obtained by area averaged SST in the geographical region 5S–5N by 170–120 W, taken in the 119-year period 1870–2018 and extracted from the periodically updated website https://www.esrl.noaa.gov/psd/gcos_wgsp/Timeseries/Data/nino34.long.anom.data (Rayner et al., 2003). Raw data exhibit a positive trend of 0.19 °C/century, associated to oceanic global warming. It reveals also a decadal-scale variability (observed in the 50 yr running means) in phase with the Pacific Decadal Oscillation (PDO), also known as a long-lived El Niño pattern (Zhang et al., 1997). After the removal of the linear trend, we get a detrended zero-centred time-series ${x}_{\mathit{cent}}$ of the index, with a standard deviation $\sigma {\left(x}_{\mathit{cent}}\right)=0.77°\mathrm{C}.$ The skewness is $sk\left({x}_{\mathit{cent}}\right)=0.44,$ associated to the prevalence of El Niños to La Niñas (Burgers and Stephenson, 1999) and an excess kurtosis (ekurt) of $0.46.$ The monthly dependence of statistics is particularly striking in the skewness and excess kurtosis, ranging from a positively skewed leptokurtic probability distribution function (PDF) in NH winter ($sk=1.81,$$\mathit{ekurt}=2.93$ in January), favouring extreme El Niños, up to a platykurtic, nearly unskewed PDF in the NH Spring ($sk=-0.01,$$\mathit{ekurt}=-0.65$ in May). Spring has weaker extreme El Niños compared to Winter, and the extrapolation of the Spring’s El Niño state to the next Winter is rather unskilful i.e. the Spring predictability barrier (Duan and Wei, 2013). The intra-seasonal variability is quite small, compared to the inter-seasonal and interannual variability. Short-range persistence is quite high leading to 1- and 2-month lag-correlations of 0.92 and 0.85. Three-monthly seasonal averages (JFM, AMJ, JAS, OND) are constructed with an overall std $\sigma =0.74°\mathrm{C},$$sk=0.46$ and $\mathit{ekurt}=0.41,$ peaking again during the NH or Boreal Winter’s ($sk=1.04,$ in JFM) (Stuecker et al., 2013; Stein et al., 2014). The statistics are quite similar to those obtained with more commonly used trimesters (e.g. for DJF, we get $sk=1.07,$$\mathit{ekurt}=1.68.$ The remaining three seasons are much less skewed with $sk\in \left[0.25,0.28\right]$ and a rather small $\mathit{ekurt}$ except for the platykurtic behaviour in Boreal Spring (AMJ), with $\mathit{ekurt}=-0.51.$ All the analysis that follows is performed on the standardized time series, hereafter denoted as $x\left(t\right),$with sample size $N=596$ trimesters (trm), being shown in Fig. 1.

Fig. 1.

Detrended and standardized time series in the period 1870–2018, of the three-monthly average anomalies (with respect to the annual cycle) of El Niño 3.4 index (black) and its low-pass (red) and high-pass (green) components with a cutoff frequency of 0.08 cycles per trimester roughly corresponding, respectively, to inter- and intra-triennial timescales variability.

3.

## Autocorrelation and spectrum

3.1.

### Autocorrelation function

We start by evaluating second-order lagged moments of the time series. The autocorrelation function is shown in Fig. 2. It has oscillatory behaviour with typical ENSO timescales (3–4 years). In order to test the null hypothesis ${H}_{0}$ of a vanishing autocorrelation, taking into account the serial correlation, we use an approximate effective sample size ${N}_{\mathit{eff}}\approx \frac{N\left[1-{C}_{x}\left(1\right)\right]}{\left[1+{C}_{x}\left(1\right)\right]}\approx 66$ (Wilks, 2011). Then, is rejected if ${|{C}_{x}\left(\tau \right)|}^{2}>\left[{q}_{\frac{\alpha }{2}}\left({N}_{\mathit{eff}}{\right)\right]}^{2}/\left\{\left[{q}_{\frac{\alpha }{2}}\left({N}_{\mathit{eff}}{\right)\right]}^{2}+{N}_{\mathit{eff}}-2\right\}$ (von Storch and Zwiers, 1999) where ${q}_{\frac{\alpha }{2}}\left({N}_{\mathit{eff}}\right)$ is the $\frac{\alpha }{2}$-th quantile of a t-Student PDF with ${N}_{\mathit{eff}}$ degrees of freedom. From Fig. 2, some local extremes and significant correlations appear also at decadal lags.

Fig. 2.

Autocorrelation function of El Niño 3.4 index (solid black) along with the 95% (dashed black) and 90% (dotted black) confidence interval. The autocorrelation function of the AR(5) model fitting data is also shown (solid red).

3.2.

### The spectrum and its estimation

A regularly sampled stationary time-series $x\left(t\right)$ can be decomposed using discrete Fourier transform (DFT) as: ${X}_{x}\left(f\right)={\sum }_{t=0}^{N-1}x\left(t\right) \text{exp} \left(-2\pi \mathit{ift}\right),$ for every frequency $f$ in cycles per sampling period and can be reconstructed through the inverse Fourier transform (FT) as $x\left(t\right)=\frac{1}{N}{\sum }_{K=-\frac{N}{2}}^{\frac{N}{2}}{X}_{x}\left(\frac{k}{N}\right) \text{exp} \left(2\pi i\frac{k}{N}t\right)$ (for $N$ even). The sample spectrum or periodogram is the DFT of the autovariance function:

((1))
${S}_{x}\left(f\right)=\frac{1}{N}{|{X}_{x}\left(f\right)|}^{2}={\sum }_{\tau =-\left(N-1\right)}^{N-1}{C}_{x}\left(\tau \right) \text{exp} \left(-2\mathit{\text{πifτ}}\right)\ge 0$
whereas the asymptotic spectrum is given by:
((2))
${\Gamma }_{x}\left(f\right)=\underset{N\to \mathrm{\infty }}{\mathit{lim}}E\left[{S}_{x}\left(f\right)\right]$

where $E\left(·\right)$ is the expectation operator over realizations of the stochastic process. According to the Wiener–Khintchine theorem (2) is the FT of the autocovariance function i.e.:

((3))
${\gamma }_{x}\left(\tau \right)={\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\Gamma }_{x}\left(f\right) \text{exp} \left(2\mathit{\text{πifτ}}\right)df.$

Equation (3) yields, in particular, the variance of the time series

((4))
${\gamma }_{x}\left(0\right)={\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\Gamma }_{x}\left(f\right)df\approx \frac{1}{N}\left[{S}_{x}\left(0\right)+2{\sum }_{k=1}^{\frac{N}{2}}{S}_{x}\left(\frac{k}{N}\right)\right],$
where ${\mathrm{\Gamma }}_{x}\left(f\right)df$ in (4) provides the contribution from the spectrum to the time series variance within the frequency interval $\left[f,f+df\right].$

For a stationary process, the DFTs ${X}_{x}\left(f\right),$ estimated at different frequencies are nearly uncorrelated (von Storch and Zwiers, 1999), and the periodogram (1), is known to be inconsistent, and hence a smoothed estimator is often used:

((5))
${\stackrel{̂}{S}}_{x}\left(f\right)=\sum _{\tau =-\left(N-1\right)}^{N-1}{C}_{x}\left(\tau \right)\lambda \left(\tau \right) \text{exp} \left(-2\mathit{\text{πifτ}}\right)$
where $\lambda \left(\tau \right)\equiv \stackrel{̂}{\lambda }\left(\frac{\tau }{M}\right)$ where $\stackrel{̂}{\lambda }\left(·\right)$ is a symmetric standardized positive lag window function (Priestley, 1981), characterized by a standardized bandwidth ${b}_{1}\equiv \frac{1}{{\int }_{-\mathrm{\infty }}^{\mathrm{\infty }}{\stackrel{̂}{\lambda }\left(u\right)}^{2}du}.$ The window scale length $M$ is chosen, based on a trade-off between spectral resolution ${\left(b}_{1}/M\right),$ low values of the variance $\left(\mathit{var}\left[{\stackrel{̂}{S}}_{x}\left(f\right)\right]\sim \left[\frac{{{\mathrm{\Gamma }}_{x}\left(f\right)}^{2}M}{N{b}_{1}}\right]\right)$ and bias $\left(E\left[{\stackrel{̂}{S}}_{x}\left(f\right)\right]-{\mathrm{\Gamma }}_{x}\left(f\right)\sim \frac{-{\stackrel{̂}{\lambda }}^{\text{″}}\left(0\right){{\mathrm{\Gamma }}_{x}^{\text{″}}\left(f\right)}^{}}{4{\pi }^{2}{M}^{2}}\right)$ of (5) (Jenkins and Watts, 1968), where double apostrophe represents second derivatives. The 90% confidence interval of ${\mathrm{\Gamma }}_{x}\left(f\right)$ is given by where is the number of degrees of freedom of the ${\chi }_{\nu }^{2}.$

We have also computed, the maximum entropy estimator of the spectrum by fitting an autoregressive (AR) model of generic order $p,$ via the Yule–Walker equations. The order is chosen by minimizing the Akaike Information Criterion, AIC (Akaike, 1974). The goal of computing the theoretical maximum entropy spectrum is to build a null hypothesis spectrum ${\mathrm{\Gamma }}_{x,0}\left(f\right).$ Therefore, in order to objectively check if the spectral peaks of (1) and (5) are truly distinguishable and significantly larger than those of at a significance level $\alpha ,$ the estimated spectra (1,5) have to be compared to the threshold $\frac{{\mathrm{\Gamma }}_{x,0}\left(f\right){q}_{1-\alpha }\left({\chi }_{\nu }^{2}\right)}{\nu }$ (Wilks, 2011) with $\nu =\frac{2N{b}_{1}}{M}$ for (5) and $\nu =1$ for (1) corresponding to $M=N,$ and a standardized bandwidth equal to the Nyquist frequency.

3.3.

### El Niño index spectrum

3.3.1.

#### Periodogram and smoothed spectrum

The periodogram (1) of the time series is shown in Fig. 3. There is evidence of sharp and well separated peaks near the frequencies 0.021, 0.045, 0.071, 0.087, 0.105, and 0.168 cycles per trimester (cpt), corresponding (in the same order) to periods of 12.2 years, linked to decadal variability (Sun and Yu, 2009; Kravtsov, 2012), 5.62 years (Kim, 2002), and 3.53, 2.87, 2.38 and 1.49 years in the range of the ‘Quasi-quadrennial and quasi-biennial variability’ (Jiang et al., 1995). These peaks are also found by Deser et al. (2010) that are responsible for variations in the ENSO frequency, intensity, propagation, and predictability (e.g., An and Wang, 2000; Fedorov and Philander, 2000; Timmermann, 2003; An and Jin, 2004; Yeh and Kirtman, 2004; Wang, 2018).

Fig. 3.

Empirical periodogram (thin green); Smoothed spectrum (thick black) and corresponding 95% confidence interval (thin black); theoretical AR(5) spectrum (red), and 10% significance level for the periodogram (red dashed) and for the smoothed spectrum (red dotted). Units are (°C)2/cpt. The top axis presents periods in years.

For the smoothed spectrum (5), we chose the Bartlett–Priestley window-lag function: $\stackrel{̂}{\lambda }\left(u\right)=\frac{3}{\left(\pi u{\right)}^{2}}\left[\frac{ \text{sin} \left(\pi u\right)}{\pi u}- \text{cos} \left(\pi u\right)\right]\text{​​​}\left(u\ne 0\right);\stackrel{̂}{\lambda }\left(0\right)=1$ (Priestley, 1981) for which=0.855 and ${\stackrel{̂}{\lambda }}^{\text{″}}\left(0\right)=-1.97.$ In order to resolve the main spectral peaks, the bandwidth must be smaller than the minimum difference between consecutive leading frequencies, i.e. cpt, hence $M>\frac{{b}_{1}}{0.016}\sim 52$ trimesters. To further increase spectral resolution, we show in Fig. 3 the estimator (5) using (20 years), corresponding to $\nu \sim 13.$ In this case, the standard deviation of the filtered estimator is about 39% of the true spectrum. The 90% confidence interval for ${\mathrm{\Gamma }}_{x}\left(f\right)$ is $\left[0.52{\stackrel{̂}{S}}_{x}\left(f\right),2.62{\stackrel{̂}{S}}_{x}\left(f\right)\right]$ which is quite large, a consequence of the high window length $M.$ The local maxima of the filtered estimator lie close to the high spikes. Moreover, the periodogram lies well within the 95% confidence interval of the true spectrum (see thin black lines of Fig. 3).

3.3.2.

#### Maximum entropy spectrum

An AR(5) model (see the AIC in Table 1) is fitted here:

((6))
$\begin{array}{c}\stackrel{˜}{x}\left(t\right)=1.18\stackrel{˜}{x}\left(t-1\right)-0.46\stackrel{˜}{x}\left(t-2\right)-0.09\stackrel{˜}{x}\left(t-3\right)\\ +0.17\stackrel{˜}{x}\left(t-4\right)-0.12\stackrel{˜}{x}\left(t-5\right)+0.524w\left(t\right),\end{array}$
where $w\left(t\right)$ is a standard Gaussian white noise. The autocorrelation function of the AR(5) model (Fig. 2) closely captures the empirical one for lags up to 12–20 trm but misses low-frequency oscillations. The corresponding (theoretical) maximum entropy spectrum is shown in Fig. 3 showing a spectral bump in the range (3–5 years period) and with no other spectral peaks. Figure 3 also shows the thresholds of significance (at α = 10%) for the periodogram and for the smoothed spectrum.

The peaks with periods 5.62, 3.53, and 1.49 years are significant, both in the periodogram (1) and the smoothed spectrum (5) whereas the peak 2.87 years is significant in the periodogram (1) but only marginally significant in the smoothed spectrum (5).

4.

## Bicovariance, non-Gaussianity and nonlinearity

4.1.

### General properties

The bicovariance function ${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)$ is a generalization of the autocovariance function, given by third-order cumulants between lagged values of $x\left(t\right),$ which for a zero mean stationary process writes as ${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)=E\left[x\left(t\right)x\left(t+{\tau }_{1}\right)x\left(t+{\tau }_{2}\right)\right]$ and is estimated here, for ${\tau }_{1},{\tau }_{2}\ge 0$ as:

((7))
${C}_{x}\left({\tau }_{1},{\tau }_{2}\right)=\frac{1}{N-\mathit{max}\left({\tau }_{1},{\tau }_{2}\right)}\sum _{t=0}^{N-\mathit{max}\left({\tau }_{1},{\tau }_{2}\right)-1}x\left(t\right)x\left(t+{\tau }_{1}\right)x\left(t+{\tau }_{2}\right).$

Stationarity implies time invariance of lagged statistics leading to several identities reflecting the symmetry of the bicovariance (Rao and Gabr, 1984), namely:

((8))
$\begin{array}{c}{\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)={\gamma }_{x}\left({\tau }_{2},{\tau }_{1}\right)={\gamma }_{x}\left(-{\tau }_{1},{\tau }_{2}-{\tau }_{1}\right)={\gamma }_{x}\left(-{\tau }_{2},{\tau }_{1}-{\tau }_{2}\right)={\gamma }_{x}\left({\tau }_{2}-{\tau }_{1},-{\tau }_{1}\right)=\\ ={\gamma }_{x}\left({\tau }_{1}-{\tau }_{2},-{\tau }_{2}\right)\end{array}$
leading to the partition into 6 sectors of the plane $\left({\tau }_{1},{\tau }_{2}\right)$ as illustrated in Fig. 4a. We stress here that the origin of a non-vanishing bicovariance is the non-Gaussianity of the process $x\left(t\right).$ In fact, for a Gaussian, (necessarily linear) process, the bicovariance vanishes because cumulants of order 3 (the equivalent to lagged skewness) or higher vanish. For finite time series, Gaussianity is rejected if the absolute value of skewness is large enough.

Fig. 4.

Symmetries (a) of the bicovariance function in the delay plan (adapted from Rao and Gabr, 1984) and bicorrelation of El Niño 3.4 index (b). Thick black contours show the 5% significance level.

For a nonvanishing third-order cumulant, the bicorrelation provides a measure of predictability (for lags $\Delta \ge 0$), coming from nonlinear correlation:

((9))
$\mathit{cor}\left[x\left(t+\Delta \right),x\left(t\right)x\left(t-{\tau }_{1}\right)\right]=\frac{{C}_{x}\left({\tau }_{1},{\tau }_{1}+\Delta \right)}{\sqrt{\mathit{var}\left[x\left(t\right)x\left(t+{\tau }_{1}\right)\right]}}.$

Local maxima and minima of bicorrelation can thus provide sources of predictability due to non-Gaussianity and/or nonlinearity. However, we must note that part of the nonlinear correlation (9), in particular at ${\tau }_{1}=0,$ is due to skewness, and hence to eliminate that contribution, we must consider the nonlinear component or residual of the predictor $x\left(t{\right)}^{2}$ after removing the linear regression with $x\left(t\right),$ where is a regression constant equal to the skewness of $x.$

Another useful advantage of bicorrelation is the Cox (1991) test of nonlinearity, that we will apply bellow, also based on skewness. More general tests exist, which analyse nonlinear predictability originating from a set of past values (Granger and Anderson, 1978). Under the null hypothesis of linearity, the test ${T}_{\mathit{Cox}}\left(\Delta \right)\equiv \mathit{cor}\left[x\left(t+\Delta \right),{x}_{nl}\left(t\right){\right]}^{2}$ must vanish for all lags. Threshold values of ${T}_{\mathit{Cox}}$ under an inexistent predictability hypothesis are obtained by randomly shuffling $x\left(t\right)$ and ${x}_{nl}\left(t\right)$ (10,000 times) and then computing the 95% quantile (denoted as ${T}_{\mathit{Cox}-95\mathrm{%}}$) of the sorted values of ${T}_{\mathit{Cox}}\left(0\right).$ Nonlinearity of El Niño is thus accepted if ${T}_{\mathit{Cox}}\left(\Delta \right)>{T}_{\mathit{Cox}-95\mathrm{%}}$ at a 5% significance level.

4.2.

### Results for El Niño index

4.2.1.

#### Bicorrelation

The bicorrelation (Fig. 4b), exhibits fluctuations of the order of 12–20 trimesters similar to the autocorrelation (Fig. 2). The maximum value of the bicorrelation coincides with the skewness: ${C}_{x}\left(0,0\right)=0.46.$

In order to test the departure of the bicorrelation from the Gaussian hypothesis, we have generated 10,000, $N$-sized simulations with the AR(5) and computed uncertainties. The 90% (95%) quantiles of are equal to 0.24 (0.29), which are both below the observed skewness 0.46, hence the null hypothesis of Gaussianity is rejected at the 5% significance level. For ${\tau }_{1}\sim {\tau }_{2}$ (diagonals of the bicovariance graph) or ( and/or ${\tau }_{2}\sim 0$), the 95% (90%) quantiles are ∼0.14 (0.16) and 0.10 (0.12) elsewhere. The rejection regions rejecting of the null hypothesis (at the 5% significance level) appear within thick black contours in Fig. 4b.

From inspection of Fig. 4b, we verify that the deepest bicovariance minimum ${C}_{x}\left(0,4\right)=-0.40,$ is significant at the 5% significance level, corresponding to a nonlinear correlation (Eq. (9)) of $\mathit{cor}\left[x\left(t+4\right),x\left(t{\right)}^{2}\right]=-0.24$ with This implies that an extreme El Niño or La Niña (high ${x\left(t\right)}^{2}$ than average) favours the occurrence of La Niña occurrence four trimesters later (negative $x\left(t+4\right)\right),$ whereas mild conditions favour El Niño. Another local bicorrelation minimum ${C}_{x}\left(8,8\right)=-0.15$ (10% significance level) implies that La Niña event favours a strong El Niño or La Niña 2 years later (8 trm).

4.2.2.

#### Linear and nonlinear predictability

As regards El Niño predictability skill, Fig. 5a shows the linear and nonlinear correlations: $\mathit{cor}\left[x\left(t+\Delta \right),x\left(t\right)\right]$ and $\mathit{cor}\left[x\left(t+\Delta \right),{x}_{nl}\left(t\right)\right],$ with ${x}_{nl}\left(t\right)$ defined in Section 4.1 and for forecast lags $\Delta$ up to 20 trm. The linear correlations of El Niño 3.4 are significant at 10% level for lags (Fig. 5a thick black line) whereas the nonlinear correlations show significant values for lags (Fig. 5a thick red line). Those correlations are also evaluated for the stronger El Niño season i.e., the JFM trimester ($t+\Delta$ in JFM) (Fig. 5a thin black line for linear and thin red line for nonlinear correlations respectively). We note here the presence of the Spring predictability barrier of El Niño (Duan and Wei, 2013) i.e., the barely weak linear extrapolation (i.e. persistence) of El Niño index from current Spring (AMJ) to the next Winter (JFM), as shown by the small value (∼0.05) of where | means 'conditionated to' and $\Delta =3.$ However, the Spring barrier is reduced if we include the nonlinear term in the forecast. In fact, the nonlinear correlation evaluated at $\mathit{JFM}$ (Fig. 5a thin red line) for forecast lags (–0.25 to −0.4) is statistically significant, e.g. a negative value of between $x\left(t\right)$ in Spring and $x\left(t+3\right)$ in next Winter (see Fig. 5b).

Fig. 5

(a) Linear (thick black) and nonlinear (thick red) correlations ($\mathit{cor}\left[x\left(t+\Delta \right),x\left(t\right)\right]$ and $\mathit{cor}\left[x\left(t+\Delta \right),{x}_{nl}\left(t\right)\right]$ respectively) of El Niño 3.4 index. The same correlations restricted to forecast trimester JFM (black thin and red thin respectively). Horizontal dashed lines show the 10% significance level interval $\left[-\frac{1.64}{\sqrt{{N}_{\mathit{eff}}-3}},\frac{1.64}{\sqrt{{N}_{\mathit{eff}}-3}}\right].$ (b) AMJ versus following year JFM El Niño and best quadratic fitting. (c) Cox test ${T}_{\mathit{Cox}}\left(\Delta \right)$ (thick) and the 95% (dotted) and 99% (dashed) confidence level threshold of nonrejection of the linearity hypothesis.

This is corroborated by the conditional expectations of the Winter signal conditioned to the previous Spring signal. First, we have (see left sector of Fig. 5b). We argue that strong trade-winds (or persistent eastern wind bursts in the Eastern Pacific), favouring strong La Niña Spring conditions tend to persist over some trimesters eventually reaching the next Winter season (e.g. JFM of years 1989, 1974, and 2011).

On the other hand, we have (central sector of Fig. 5b) corresponding to (Spring) near climatological conditions in the Eastern Pacific. Here, the tendency is to favour a Winter with El Niño predominance in agreement with Boreal Winter phase-locking. In particular, strong Winter El Niños (e.g. 1983, 1998, and 2016) were preceded by quite mild Spring conditions. Finally, from strong El Niño Spring conditions (associated with westerly winds anomalies) tend to reverse in the next trimesters. This suggests that the nonlinear predictability is a consequence of the asymmetric persistence of El Niño signal and Pacific trade winds in Spring, as a function of their intensity and phase. A possible mechanism for this is seasonal growth rate dependence on the ENSO regime and feedbacks controlling SST (Yishuai et al., 2020), and the seasonal dependence of easterly wind bursts from Spring to Autumn (Fan et al., 2019).

Finally, we compute the nonlinear forecast score ${T}_{\mathit{Cox}}\left(\Delta \right)$ for lags up to 80 trm (Fig. 5c). Clearly, the nonlinearity is especially significant at certain lags (4–8, 28, 36, 52, 72, 80 trm) where ${T}_{\mathit{Cox}}\left(\Delta \right)$ is even larger than the quantile ${T}_{\mathit{Cox}-99\mathrm{%}}$ of nonrejection of the linearity hypothesis. Those lags are related to phase synchronization between Fourier frequencies, namely those with periods ${\tau }_{1}$ and ${\tau }_{2}=2{\tau }_{1}.$ Lags are generally close to multiples of the half-period of the shorter oscillation, i.e. For instance, from Fig. 3, the Fourier spectral peaks at periods and ${\tau }_{2}=\frac{1}{0.045}\approx 2{\tau }_{1}$ justify the peak of ${T}_{\mathit{Cox}}\left(\Delta \right)$ at lag This frequency relationship is known as quadratic phase coupling (Biswas et al., 1995) and examples in relation to decadal variability of ENSO are given in Timmermann (2003).

5.

## Bispectrum

5.1.

### General properties

5.1.1.

#### Bispectrum background

The bicovariance of El Niño time series (Fig. 4b) exhibits certain features and periodicities in the lag-time domain. Therefore, the two-dimensional Fourier transform of the bicovariance, i.e. bispectrum (Brillinger and Rosenblatt, 1967), can provide a dual complementary information about the most relevant spectral interactions contributing to the bicovariance and skewness of the time series making easier the physical interpretation of such interactions.

The bispectrum is the two-dimensional version of polyspectra (Brillinger, 1965), providing relevant information on non-Gaussian processes. The bispectrum is given by:

((10))
${\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)\equiv \sum _{{\tau }_{1}=-\mathrm{\infty }}^{\mathrm{\infty }}\sum _{{\tau }_{2}=-\mathrm{\infty }}^{\mathrm{\infty }}{\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right) \text{exp} \left[-2\pi i\left({f}_{1}{\tau }_{1}+{f}_{2}{\tau }_{2}\right)\right],$
where its real and imaginary parts are discrete Fourier transforms (DFT) of the symmetric and antisymmetric parts of the bicovariance, respectively, with the last one vanishing if the underlying stochastic process is reversible (Weiss, 1975).

For the simplest case of a purely random noise $w\left(t\right),$ the bicovariance is a spike at the origin i.e. ${\gamma }_{w}\left({\tau }_{1},{\tau }_{2}\right)=E\left({w}^{3}\right)\delta \left({\tau }_{1}\right)\delta \left({\tau }_{2}\right)$ where $\delta \left(·\right)$ is the Kronecker delta, yielding a flat, constant and real bispectrum ${\mathrm{\Gamma }}_{3,w}\left({f}_{1},{f}_{2}\right)=E\left({w}^{3}\right).$

5.1.2.

#### Bispectrum and spectral components

The asymptotic bispectrum of a N-sized time series writes in terms of DFTs ${X}_{x}\left(·\right)$ of the time series, taken at triplets of frequencies (multiples of the minimum frequency $\frac{1}{N}$) (Hinich, 1982) at ${f}_{1},{f}_{2}$ and ${f}_{3}=\left[\left({f}_{1}+{f}_{2}+\frac{1}{2}\right)mod\left(1\right)-\frac{1}{2}\right]$ as:

((11))
${\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)=\underset{N\to \mathrm{\infty }}{\mathit{lim}}\frac{1}{N}E\left[{X}_{x}\left({f}_{1}\right){X}_{x}\left({f}_{2}\right){{X}_{x}\left({f}_{3}\right)}^{\mathrm{*}}\right]$
where (*) stands for conjugate complex. Eq. (11) shows that the bispectrum comes from the interaction between Fourier components at three frequencies ${f}_{1},{f}_{2}$ and ${f}_{3}={f}_{1}+{f}_{2}<1/2$ in the area outside of the aliasing region, i.e. lower than the Nyquist frequency. When ${f}_{1}+{f}_{2}>1/2,$ then ${f}_{3}={1-\left(f}_{1}+{f}_{2}\right)$ becomes an aliased frequency (Hinich and Wolinsky, 1988).

Equation (11) can still be interpreted in terms of the amplitude and phase of the DFT in polar form, i.e. By denoting and ${e}^{i\left[{\mathrm{\Theta }}_{x}\left({f}_{1}\right)+{\mathrm{\Theta }}_{x}\left({f}_{2}\right)-{\mathrm{\Theta }}_{x}\left({f}_{3}\right)\right]}\equiv {e}^{i{\mathrm{\Theta }}_{x,123}},$ and applying the product expectation decomposition to (11), we get (Kovach et al., 2018):

((12))
where the first r.h.s. term of (12) depends on phase synchronization of the three Fourier components and the second term is a covariance between amplitudes and phases, vanishing for linear processes. In fact, the Volterra representation of a linear process writes as a convolution: $x\left(t\right)={\sum }_{k}^{}\alpha \left(k\right)w\left(t-k\right),$ where $w\left(t\right)$ is a purely random noise. Its DFT is where are DFTs, respectively of the sequence $\alpha \left(·\right)$ and of the noise. The independence between ${A}_{w}\left(f\right)$ and ${\mathrm{\Theta }}_{w}\left(f\right)$ yields restricted to the synchronization term (Nikias and Raghuveer, 1987), where showing hence the intrinsic nonlinear origin of the covariance term of (12).

5.1.3.

#### Properties of the bispectrum

Like the spectrum, the bispectrum of a real signal satisfies ${\mathrm{\Gamma }}_{3,x}\left({-f}_{1},{-f}_{2}\right)={{\mathrm{\Gamma }}_{3,x}\left({f}_{1},{f}_{2}\right)}^{\mathrm{*}},$ and (11) leads to 5 identities of symmetry, namely:

((13))
$\begin{array}{c}{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)={\Gamma }_{3,x}\left({f}_{2},{f}_{1}\right)={\Gamma }_{3,x}\left({f}_{1},-{f}_{1}-{f}_{2}\right)={\Gamma }_{3,x}\left(-{f}_{1}-{f}_{2},{f}_{2}\right)\\ ={\Gamma }_{3,x}\left(-{f}_{1}-{f}_{2},{f}_{1}\right)={\Gamma }_{3,x}\left({f}_{2},-{f}_{1}-{f}_{2}\right).\end{array}$

This allows for a partition of the bispectrum domain (BD) into 12 polygonal regions in which the bispectrum can be reproduced from the Principal Domain (PD): the triangle of vertices (0,0), (0,1/2) and (1/3,1/3) (shown by triangle 1 in Fig. 6a). The periodicities are evident in the theoretical bispectrum in Fig. 6b (real part) and Fig. 6c (imaginary part) of a non-Gaussian AR(5) model of El Niño (presented later in Section 5.3.1).

Fig. 6.

Principal domain (triangle 1) in the spectral plane and its symmetric replicas (a) (adapted from Rao and Gabr (1984). Real (b) and Imaginary (c) parts of the bispectrum of the NGAR(5) model. Note the symmetries associated with the 12 sectors shown in panel a).

The reconstruction of bicovariance is obtained through the inverse FT of (10):

((14))
${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)={\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right) \text{exp} \left[2\pi i\left({f}_{1}{\tau }_{1}+{f}_{2}{\tau }_{2}\right)\right]d{f}_{1}d{f}_{2},$
which leads to the skewness decomposition in terms of positive or negative contributions given by the real parts along the PD:
((15))
$E\left({x}^{3}\right)={\gamma }_{x}\left(0,0\right)=12{\iint }_{PD}\mathit{Re}\left[{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)\right]d{f}_{1}d{f}_{2.}$

It is important to note here that, like the power spectrum (4), (15) implies that the element of the bispectrum $\mathit{Re}\left[{\mathrm{\Gamma }}_{3,x}\left({f}_{1},{f}_{2}\right)\right]d{f}_{1}d{f}_{2}$ provides the contribution to the skewness $E\left({x}^{3}\right)$ from the bi-spectral bin $\left[{f}_{1},{f}_{1}+d{f}_{1}\right]×\left[{f}_{2},{f}_{2}+d{f}_{2}\right].$

Finally, since the square correlation is a predictability measure coming from third-order moments (see Eq. (9)), its overall sum can be distributed over the bispectral domain through the Parseval relationship:

((16))
${\sum }_{{\tau }_{1}}^{}{\sum }_{{\tau }_{2}}^{}\left[{\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right){\right]}^{2}={\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\int }_{-\frac{1}{2}}^{\frac{1}{2}}|{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right){|}^{2}d{f}_{1}d{f}_{2}=12{\iint }_{PD}|{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right){|}^{2}d{f}_{1}d{f}_{2}.$

5.2.

### Bispectrum estimation

The estimation of bispectrum has been addressed by many authors (e.g. Brillinger and Rosenblatt, 1967; Raghuvver and Nikias, 1986; Nikias and Raghuveer, 1987). The empirical bispectrum or biperiodogram of a finite sample of length $N$ is the two-dimensional DFT of the sampled bicovariance, which can also be expressed in terms of DFTs of the signal (see Section 3.1):

((17))
${S}_{3,x}\left({f}_{1},{f}_{2}\right)\equiv \sum _{{\tau }_{1}=-\left(N-1\right)}^{N-1}\sum _{{\tau }_{2}=-\left(N-1\right)}^{N}{C}_{x}\left({\tau }_{1},{\tau }_{2}\right) \text{exp} \left[-2\pi i\left({f}_{1}{\tau }_{1}+{f}_{2}{\tau }_{2}\right)\right]=\frac{1}{N}{X}_{x}\left({f}_{1}\right){X}_{x}\left({f}_{2}\right){{X}_{x}\left({f}_{3}\right)}^{\mathrm{*}}$
where ${f}_{3}=\left({f}_{1}+{f}_{2}+\frac{1}{2}\right)mod\left(1\right)-\frac{1}{2}.$ The inverse FT of Eq. (17) yields the empirical bicovariance:
((18))
${C}_{x}\left({\tau }_{1},{\tau }_{2}\right)=\frac{1}{{N}^{2}}\sum _{{k}_{1}=-\frac{N}{2}}^{\frac{N}{2}}\sum _{{k}_{2}=-\frac{N}{2}}^{\frac{N}{2}}{S}_{3,x}\left(\frac{{k}_{1}}{N},\frac{{k}_{2}}{N}\right) \text{exp} \left[2\pi i\left(\frac{{k}_{1}}{\mathrm{N}}{\tau }_{1}+\frac{{k}_{2}}{N}{\tau }_{2}\right)\right]$

Like the periodogram (1), the estimator (17) is not consistent, which can be overcome by (i) smoothing the sample bispectrum (Hinich, 1982) or dividing the sample into pieces and then averaging and smoothing bispectra (Lii and Rosenblatt, 1982); (ii) using multi-tapers (Birkelund and Hanssen, 1999, 2000) or (iii) smoothing the bicovariance function (indirect-method) (Rao and Gabr, 1984), which we use here.

The smoothed bispectrum is:

((19))
${\stackrel{̂}{S}}_{3,x}\left({f}_{1},{f}_{2}\right)=\sum _{{\tau }_{1}=-N}^{N}\sum _{{\tau }_{2}=-N}^{N}{C}_{x}\left({\tau }_{1},{\tau }_{2}\right)\Lambda \left({\tau }_{1},{\tau }_{2}\right) \text{exp} \left[-2\pi i\left({f}_{1}{\tau }_{1}+{f}_{2}{\tau }_{2}\right)\right]$
where the 2D-lag window $\mathrm{\Lambda }\left(·\right)$ satisfies similar properties of symmetry as the bicovariance (8) and is taken to be $\mathrm{\Lambda }\left({\tau }_{1},{\tau }_{2}\right)=\stackrel{̂}{\lambda }\left(\frac{{\tau }_{1}}{{M}_{2}}\right)\stackrel{̂}{\lambda }\left(\frac{{\tau }_{2}}{{M}_{2}}\right)\stackrel{̂}{\lambda }\left(\frac{{\tau }_{1}}{{M}_{2}}-\frac{{\tau }_{2}}{{M}_{2}}\right)$ where is the lag window function and ${M}_{2}$ is the window length. The equivalent to bandwidth is given by $\frac{{b}_{2}}{{M}_{2}^{2}}$ where (Appendix A).

5.3.

### Bispectrum estimation of El Niño index

5.3.1.

#### Null hypothesis bispectrum

To reproduce the observed skewness, we can construct an AR process driven by a non-Gaussian noise as a null hypothesis

The model we wish to fit is like model (6) $\stackrel{˜}{x}\left(t\right)={\sum }_{\tau =1}^{p}a\left(k\right)\stackrel{˜}{x}\left(t-k\right)+{\sigma }_{w}w\left(t\right),$ but with a non-Gaussian $w\left(t\right)$ white noise, ${\sigma }_{w}^{2}=\frac{E\left({x}^{2}\right)}{{\int }_{-\frac{1}{2}}^{\frac{1}{2}}{|A\left(f\right)|}^{-2}df}$ and $A\left(f\right)=1-{\sum }_{\tau =1}^{p}a\left(k\right){e}^{\left(-2\mathit{\text{πiτf}}\right)}.$ The bispectrum of such a linear process (Nikias and Raghuveer, 1987) is:

((20))
${\Gamma }_{3,\stackrel{˜}{x}}\left({f}_{1},{f}_{2}\right)=\frac{{\sigma }_{w}^{3}E\left({w}^{3}\right)}{A\left({f}_{1}\right)A\left({f}_{2}\right){A\left({f}_{1}+{f}_{2}\right)}^{\mathrm{*}}},$
with:
((21))
$E\left({w}^{3}\right)=\frac{E\left({x}^{3}\right)}{{\sigma }_{w}^{3}{\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\int }_{-\frac{1}{2}}^{\frac{1}{2}}{\left[A\left({f}_{1}\right)A\left({f}_{2}\right){A\left({f}_{3}\right)}^{\mathrm{*}}\right]}^{-1}d{f}_{1}d{f}_{2}}$

By using the coefficients of the AR(5) model (6), we get the approximations ${\sigma }_{w}^{2}=0.275$ and $E\left({w}^{3}\right)=0.895,$ hence ${\sigma }_{w}^{3}E\left({w}^{3}\right)=0.129.$ This null ${H}_{0}$ is hereafter designated NGAR(5).

Figure 6 shows (Fig. 6b) and imaginary (Fig. 6c) parts of the bispectrum (20) of the NGAR(5) model over the global bifrequency domain. The real part is mostly positive whereas the imaginary part is formed by dipolar structures. Both parts reflect the symmetry shown in (13) and exhibit a positive band of maximum absolute values in the region near cpt and ${f}_{1},{f}_{2}\in \left[0.02,0.08\right]$ cpt, coinciding with the frequency range where the power of the AR(5) model (6) exceeds 3 (°C)2/cpt as seen in Fig. 3.

5.3.2.

#### Empirical smoothed bispectrum

To estimate the empirical bispectrum using the smoothed estimator (19), we start by identifying the ideal window function ${M}_{2},$ according to (A4) in Appendix A. The values of the bispectral fluctuations ${\sigma }_{{\stackrel{̂}{s}}_{3,x}}$ and the average confidence interval half-size (cihs) of the bispectrum (given by the square root of the l.h.s. of (A4)) are given in Table 2 for several values of ${M}_{2}.$ In order to separate peaks, the condition ${\sigma }_{{\stackrel{̂}{s}}_{3,x}}>\mathit{cihs}$ must be satisfied. As expected, smaller bandwidths (larger ${M}_{2}$) lead to larger bispectral fluctuations and larger bispectrum estimation errors via cish. From Table 2, the largest ${M}_{2}$ around which ${\sigma }_{{\stackrel{̂}{s}}_{3,x}}>\mathit{cihs}$ is ${M}_{2}\sim 30,$ which is used below.

Figure 7 shows the real (Fig. 7a) and imaginary (Fig. 7b) parts of the smoothed bispectrum of the 3.4 El Niño index for the most relevant part of the first quadrant.

Fig. 7.

Real (a) and Imaginary (b) part of the empirical bispectrum using a window lag (M=30). Squared amplitude of the smoothed bispectrum (c). In figures a–c, significant regions at 20% significance level (or lower) appear within thick contours. Real (d) and imaginary (e) parts of the standardized bispectrum deviation of El Niño index, and corresponding sum of the squared real and imaginary parts (f). Bifrequencies for which the null hypothesis ${H}_{0}$ is rejected at 20% significance levels (or lower) are color-shaded ( for each part). Values are restricted to the most significant part of bispectrum.

Both parts show several peaks. In particular, both present high absolute values for cpt, as for null NGAR(5) model (Fig. 6) but with much larger amplitude. As expected, the average diameter of peaks is of the order of The integral of $Re\left({\stackrel{̂}{S}}_{3,x}$) (Fig. 7a) is the estimated skewness as Eq. (15). According to Eq. (11), the superposition of Fourier components along the triplet of frequencies where is positive (negative), will mostly generate extreme positive (negative) values, i.e. El Niño (La Niña) events. The integral of positive and negative values of $Re\left({\stackrel{̂}{\mathrm{S}}}_{3,x}$) over the frequency domain is 0.54 and −0.08 respectively (adding up to the observed skewness 0.46). The local maxima of the real part, mostly contributing to El Niño extremes, lie near the frequency triplet and the band corresponding to local maxima of the power spectrum (Fig. 3), (e.g. quadratic phase synchronization Jajcay et al., 2018). On the other hand, the local minima of the real part, mostly contributing to La Niña extremes, lie near the frequency triplets and which are again close to relative power maxima (see Fig. 3).

Figure 7c shows the squared bispectrum amplitude, providing the bispectral contribution to the predictability through (16), which agree quite well with Timmermann (2003). Its maxima occur near the local extremes of the real or imaginary parts of ${\stackrel{̂}{S}}_{3,x}$ (Fig. 7a,b). The most relevant region for nonlinear predictability occurs for frequencies satisfying cpt. Another maximum is observed for cpt, producing oscillations with periods of 5–6 trimesters, and suggests possible source of the high nonlinear predictability for lags 5–6 trm, as diagnosed by Cox test in Fig. 5c.

5.3.3.

#### Bispectrum and bicovariance near the origin

The bispectrum is relevant for the bicovariance behaviour near the origin. In fact, ${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)$ can be approximated by a Taylor expansion:

((22))
where derivatives are computed at ${\tau }_{1}={\tau }_{2}=0.$ Using Eq. (14), we get:
((23))
$\frac{\partial {\gamma }_{x}}{\partial {\tau }_{p}}=-{\iint }_{BD}\mathit{Im}\left[{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)\right]2\pi {f}_{p}d{f}_{1}d{f}_{2}$
((24))
$\frac{{\partial }^{2}{\gamma }_{x}}{\partial {\tau }_{p}\partial {\tau }_{q}}=-{\iint }_{BD}\mathit{Re}\left[{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)\right]{\left(2\pi \right)}^{2}{f}_{p}{f}_{q}d{f}_{1}d{f}_{2}$
((25))
$\frac{{\partial }^{3}{\gamma }_{x}}{\partial {\tau }_{p}\partial {\tau }_{q}\partial {\tau }_{r}}={\iint }_{BD}\mathit{Im}\left[{\Gamma }_{3,x}\left({f}_{1},{f}_{2}\right)\right]{\left(2\pi \right)}^{3}{f}_{p}{f}_{q}{f}_{r}d{f}_{1}d{f}_{2}.$
where bicovariance symmetries lead to symmetries at the origin: $\frac{\partial {\gamma }_{x}}{\partial {\tau }_{\mathrm{p}}}=0,$ i.e. a local bicovariance extreme (see Fig. 4b) and

The partial derivatives at the origin, estimated with the smoothed El Niño bispectrum yield which explains most of the symmetric decrease of ${C}_{x}{\left(\tau }_{1},{\tau }_{2}\right)$ near the origin (see Fig. 4b). The term $\frac{{\partial }^{3}{C}_{x}}{\partial {\tau }_{p}\partial {\tau }_{p}\partial {\tau }_{p}}=-0.114,$ however explains the asymmetry of that decrease, which is stronger for positive than negative lags yielding the bicovariance minimum ${C}_{x}\left(0,4\right)=-0.4.$

5.3.4.

#### Bispectrum from frequency-band partitions

A coarse-grained description of the bispectrum can be achieved by classifying the triplets ${f}_{1},{f}_{2},{f}_{3}={f}_{1}+{f}_{2}$ into sets of frequencies forming a partition of $\left[0,1/2\right].$ Each triplet is then characterized by the number of frequencies in each set. We consider the simple partition of the frequency interval using a cutoff frequency 0<${f}_{\mathit{cut}}$<1/2, separating low (S) and high (F) frequencies, with the corresponding decomposition $x\left(t\right)=s\left(t\right)+f\left(t\right)$ (see Fig. 1). Figure 8a shows the 4 obtained subdomains namely, SSS, SSF, SFF and FFF, yielding an expansion into 4 terms of third-order statistics, e.g.,

((26))
$E\left({x}^{3}\right)=E\left({s}^{3}\right)+3E\left({s}^{2}f\right)+3E\left({sf}^{2}\right)+E\left({f}^{3}\right)=\mathit{SSS}+\mathit{SSF}+\mathit{SFF}+\mathit{FFF},$

Fig. 8.

(a) Subdomains SSS, SSF, SFF and FFF obtained from a frequency partition using a cut-off frequency ${f}_{\mathit{cut}}=0.08$ cpt (∼3 years). (b) Contributing terms to El Niño 3.4 index skewness (Eq. (26)).

Figure 8b shows the terms in the r.h.s. of Eq. (26) as a function of ${f}_{\mathit{cut}}.$ A reasonable criterion of discrimination among the different components is to choose that maximizes the sum $|\mathit{SSS}|+|\mathit{SSF}|+|\mathit{SFF}|+|\mathit{FFF}$|, which takes place at (3.04 years). This yields inter- (slow S) and intra-triennial (fast F) variations with respective 82% and 38% explained variance, with a well-marked scale separation lying at a local minimum of the smoothed power spectrum (see Fig. 3), and a minimum value of (–0.066).

To see the impact of spectral decomposition on the different terms of skewness, we compare in Table 3 the terms of Eq. (26), derived directly from the time series to those obtained from the partial integrals of the smoothed bispectrum (Fig. 7a). Table 3 shows that the values are quite close, except for the negative value of $E\left({s}^{3}\right).$ The underestimation obtained from the smoothed bispectrum is suggested to be due to the weak resolution of low frequencies.

Extremes are classified according to the dominant term (SSS, SSF, SFF or FFF). From Table 3, extreme events of La Niña must be explained by the slow component $s\left(t\right)$ (e.g. 1887, 1917, 1956, 2000) or by phase synchronization of and $f\left(t\right)$ (mostly of SFF type, e.g. 1973, 1988, 2008, and 2011) (Fig. 1). On the other hand, extreme events of El Niño, are mainly due to slow-fast component interactions, namely of SSF type (e.g. 1877, 1918, 1930, 1958, 2015) and SFF type (e.g. 1926, 1951, 1965, 1972, 1982–83, 1992, 1997, 2002) or from fast components only (FFF type) (e.g. 1923, 1977, 2006, 2009). However, some El Niño events (e.g. 1905, 1940, 1986–87), have occurred due to long persistence of the slow component (SSS type) (Fig. 1). Note also that there are few cases of phase polarity between fast and slow components (e.g. 1974) that lead to weak El Niño index.

In order to determine temporal changes of the bispectrum, we assess the third moment and its decomposition, Eq. (26), both in the full period (FULL) and along the three half-centuries: 1870–1919 (HC1), 1920–1969 (HC2 and 1970–2018 (HC3) (Fig. 1). Moreover, we evaluate the above statistics during El Niños $\left(x\left(t\right)>0\right)$ and La Niñas $\left(x\left(t\right)<0\right)$ to examine the variability of extremes and corresponding spectral contributions. For a given term in Eq. (26), for instance, SSS, its average $E\left(\mathit{SSS}\right)$ decomposes as: $E\left(\mathit{SSS}\right)={E\left(\mathit{SSS}\right)}_{+}+{E\left(\mathit{SSS}\right)}_{-},$ where and giving the contributions to $E\left(\mathit{SSS}\right)$ during El Niños and La Niñas, respectively. Table 4 summarizes the results.

First, the most recent half century (HC3) has on average, the most extreme episodes of La Niña and El Niño, as observed from the high absolute values of ${sk}_{-}$ and ${sk}_{+}.$ The amplification of La Ninãs comes mostly from a clear increase of self-slow interaction SSS=–0.06 and cross interaction SFF=–0.4 whereas amplification of El Niños comes from the enhancement of the SFF term (0.6), as compared to the previous two half centuries. This suggests changes and decadal variability of the ENSO skewness, its bicovariance and bispectrum (Wu and Hsieh, 2003), associated to changes in the preferential Fourier phase couplings (Schulte et al., 2019). This was accompanied by an ENSO regime shift, near 1970 towards more nonlinearity (An and Wang, 2000; An and Jin, 2004; An, 2009).

5.3.5.

#### Statistical significance of the empirical bispectrum

It is important to check the acceptance or rejection of the null bispectrum ${H}_{0}$ of NGAR(5) (Fig. 6b,c). The spectral method used here is based on a variation of the Hinich (1982) test of linearity. We anticipate that both the local (A6) and the integrated (A7) spectral-based tests of nonlinearity in the bi-frequency domain reject the null ${H}_{0}$ in consistency with the nonlinearity Cox test in the time domain (see Section 4.2.2) and hence other sources of non-Gaussianity (e.g. deterministic nonlinearity and multiplicative noise) shall be necessary. Furthermore, as shown in Appendix A the asymptotic bias (A2), variance (A3) as well as the asymptotic Gaussian PDF (Rao and Gabr, 1984) of the smoothed estimator are not good approximations because of the small number of degrees freedom ${\left(N}_{\mathit{eff}}=66\right)$ of the time series and hence cannot be used to test the null ${H}_{0}.$ This could be alleviated by using, e.g. a very long run of a climate model simulation. We use a Monte-Carlo strategy by computing the statistics of bispectrum by generating 10,000 surrogates of the NGAR(5) model forced by a noise prescribed by its first three moments. In order to easily obtain noise realizations, we consider noises produced by polynomial independent standard Gaussian noises, by relating the coefficients of monomial expectations to the imposed noise moments. For instance, the first trial noise: ${\sigma }_{w}w\left(t\right)=a{w}_{1}\left(t\right)+b\left[{w}_{1}^{2}\left(t\right)-1\right]$ where ${w}_{1}$ is a standard Gaussian white noise, leads to ${\sigma }_{w}^{2}={a}^{2}+2{b}^{2}=0.275$ and ${\sigma }_{w}^{3}E\left({w}^{3}\right)=8{b}^{3}+6{a}^{2}b=0.129,$ yielding $a=0.5122,$ and $b=0.0794,$ and its excess kurtosis is 1.073.

Here, we choose: ${\sigma }_{w}w\left(t\right)=a{w}_{1}\left(t\right)+b\left[{w}_{2}^{2}\left(t\right)-1\right]$ where ${w}_{1},{w}_{2}$ are independent standard Gaussians with ${\sigma }_{w}^{2}={a}^{2}+2{b}^{2}=0.275$ and ${\sigma }_{w}^{3}E\left({w}^{3}\right)=8{b}^{3}=0.129,$ yielding $a=0.3840,$ and $b=0.2525$ and a excess kurtosis of 2.475. The results are quite robust to changes in the noise model. Other possible, though less practical, noises are generated by maximum entropy constrained by the four first moments (Pires et al., 2010). We then compute the Monte-Carlo ensemble average $E\left[{\stackrel{̂}{S}}_{3,\stackrel{˜}{x}}\right]$ and variance $\mathit{var}\left[{\stackrel{̂}{S}}_{3,\stackrel{˜}{x}}\right],$ of the real and imaginary parts of the smoothed bispectrum. Noise high-order moments (greater than 2) appear only to influence the high-order moments of the smoothed spectrum (e.g. skewness and kurtosis) which are not relevant for the linearity test devised here.

The deviation of El Niño smoothed bispectrum ${\stackrel{̂}{S}}_{3,x}\left({f}_{1},{f}_{2}\right)$ (Fig. 7a,b) from that of NGAR(5) model is assessed by the test statistic (standardized deviation) (Eq. (A6)): Under ${H}_{0},$ its real and imaginary parts are approximately standard Gaussian. We also limited the tests to frequencies with higher bispectrum amplitude, roughly corresponding to (as in Fig. 7a,c). Figure 7 shows the real (Fig. 7d) and imaginary (Fig. 7e) parts of ${T}_{x,\stackrel{˜}{x}}\left({f}_{1},{f}_{2}\right)$ where significant regions (at $\alpha =20\mathrm{%}$ significance level). are color-shaded. Fig. 7d shows that most peaks of $Re\left({\stackrel{̂}{S}}_{3,x}\right)$ (Fig. 7a) are significant. In particular, the low-frequency region (SSS type), producing La Niña events for $|{f}_{1}|,|{f}_{2}|,|{f}_{3}|<0.06$ cpt, is significant (i.e. rejecting ${H}_{0}$). The other positive and negative bispectrum extremes discussed in Section 5.3.2 are also significant at $\alpha =10$%. The imaginary part of the test (Fig. 7e) and the squared amplitude (Fig. 7f) are also highly significant in most of the relevant regions of the bispectrum with significance levels reaching 5%. The most significant region of nonlinear predictability holds approximately for within cpt where bispectrum is significant at $\alpha =5\mathrm{%},$ producing oscillations with periods of the order 5–6 trimesters. This suggests a possible source of the high nonlinear predictability for lags 3–6 trm, as diagnosed by Cox test (Fig. 5c), and for the nonlinear curtailing of El Niño Spring barrier (see Fig. 5a).

The local test computed on a frequency basis may lead, ambiguously, either to the rejection or to the acceptance of linearity, depending on ${f}_{1},{f}_{2}$ (see Fig. 7d–e). This is suggested by the fact that finite N-sized samples generated by the non-Gaussian NGAR(5) model led to local frequency tests ${T}_{\stackrel{˜}{x},\stackrel{˜}{x}}\left({f}_{1},{f}_{2}\right)$ where linearity is falsely rejected (not shown). Therefore, in order to overcome this difficulty and enhance test robustness, we propose the integrated test of nonlinearity (A7) given by over a representative lattice $L$ (Fig. A3). We found that nonlinearity cannot be rejected (at 5% level), thanks to the highly significant regions of El Niño bispectrum (Fig. 7d–e and 7a–c).

5.3.6.

#### Normalized bispectrum and bicoherency

An independent test of linearity, beyond that of previous section and diagnostic of phase synchronization comes from the normalized bispectrum or bicoherence spectrum (Kim and Powers, 1979; Nikias and Raghuveer, 1987; Hinich and Wolinsky, 2005; Rao et al., 2012). It is obtained by prewhitening $x\left(t\right)$ to yield a non-Gaussian white noise $y\left(t\right),$ and reconstructed by inverse FT of ${X}_{y}\left(f\right)\equiv \frac{{X}_{x}\left(f\right)}{{\left[{\mathrm{\Gamma }}_{x}\left(f\right)\right]}^{\frac{1}{2}}}={\sqrt{N}e}^{i\left[{\mathrm{\Theta }}_{x}\left(f\right)\right]}.$ The test is then:

((27))

We stress that the phases of ${X}_{y}\left(f\right)$ and are the same (i.e. ${\mathrm{\Theta }}_{x}\left(f\right)$), leading to a nonvanishing correlation

A linear process i.e. $x\left(t\right)={\sum }_{k}^{}\alpha \left(k\right)w\left(t-k\right)$ where $w$ is a white noise yields By using the result of Section 5.1.2, the normalized bispectrum (27) becomes ${\mathrm{\Gamma }}_{3,y}\left({f}_{1},{f}_{2}\right)=sk\left(w\right){e}^{i\left[{\mathrm{\Theta }}_{\alpha }\left({f}_{1}\right)+{\mathrm{\Theta }}_{\alpha }\left({f}_{2}\right)-{\mathrm{\Theta }}_{\alpha }\left({f}_{3}\right)\right]}$ where ${\mathrm{\Theta }}_{\alpha }\left(·\right)$ is the phase of the DFT of sequence $\alpha \left(k\right).$ Therefore, the amplitude of is uniform, i.e. which is precisely the Hinich (1982) null hypothesis of linearity.

In the case of El Niño, we get an estimated prewhitened non-Gaussian noise $y\left(t\right)$ by using the more reliable theoretical maximum entropy NGAR(5) spectrum for the normalization in (27), instead of the empirical smoothed spectrum. The lag correlation of the resulting noise $y\left(t\right)$ is very close to zero and thus can be considered a white noise. Its skewness is $0.282$ and the excess kurtosis is $0.531.$ The correlation with the signal $x\left(t\right)$ is quite high: 0.75, coming mainly from extreme events.

The smoothed normalized bispectrum, using the same window length ${M=M}_{2}=30,$ is hereby denoted ${\stackrel{̂}{S}}_{3,y}.$ Significant peaks (at 20% significance level) of ${\stackrel{̂}{S}}_{3,y},$ both of the real (Fig. 9a) and imaginary (Fig. 9b) parts are located nearly at the same bifrequencies as the non-normalized bispectrum ${\stackrel{̂}{S}}_{3,x}$ (Fig. 7a,b) though peaks are attenuated by normalization and a new peak appears at higher frequencies ${f}_{1}={f}_{2}=0.18$ (2.5 years), ${\left(f}_{3}=0.36\right)$ cpts (2.8 years). The squared bicoherency (Fig. 9c) is clearly nonuniform, showing regions of nonacceptance of the null bispectrum of NGAR(5) and hence rejecting the linearity hypothesis.

Fig. 9.

Real (a) and Imaginary (b) parts of the normalized smoothed bispectrum of El Niño index, (M2=30) and its squared amplitude (c). Regions statistically significant at 20% level are colour-shaded.

6.

## Stochastic modelling

6.1.

### The method

6.1.1.

#### The motivation

ENSO has been extensively studied to look for a deeper understanding of the underlying physics and complexity (see recent reviews of Chunzai, 2018; Timmermann et al., 2018 and references therein), improve predictability (see the review of Tang et al., 2018), as well as to get accurate statistics (e.g. pdf, extremes etc.). This was done through different dynamical models (physical-based deterministic models), statistical models (e.g. linear inverse modelling by Penland (1996) and Privalsky and Muzylev, 2013) and models based on machine learning (Dijkstra et al., 2019).

However, even complex models may exhibit biases of various statistics. The present top-down approach of fitting simple stochastic models to observations, attempts learn signal-noise relationships from models, enabling parametrizations of the nonlinear and complex effects of nonobserved variables and hence reproducing a set of relevant stochastic properties (e.g. spectrum, bispectrum).

A number of stochastic univariate models of ENSO have been fitted such as: smooth transition autoregressive (STAR) models (Hall et al., 2001; Ubilava and Helmers, 2013), autoregressive conditional heteroscedasticity type models (ARCH) (Ahn and Kim, 2005), and threshold AR models (De Gooijer, 2017). This section aims at fitting a minimal univariate stochastic model for El Niño 3.4. index driven by a multiplicative Gaussian delayed noise, able to reproduce the observed empirical spectrum, skewness and bispectrum as well as assess the impact it has on predictability, compared to benchmark linear models.

6.1.2.

#### The model formulation

The models fitted here, belong to a class in which the simulated scalar state at integer $t,$ is driven by a noise where $w\left(t\right)$ is a standard Gaussian white noise and ${\sigma }_{w}$ is a positive constant. The simulated state $\stackrel{˜}{x}\left(t\right)$ depends (through a function F) on: a) the simulated previous state values at $t<0,$ represented in delay coordinate (Takens et al., 1981) by $\stackrel{˜}{\mathbit{x}}\left(t-1\right)\equiv {\left[\dots \stackrel{˜}{x}\left(t-2\right),\stackrel{˜}{x}\left(t-1\right)\right]}^{T};$ b) the previous noise values at $t<0:$$\mathbit{u}\left(t-1\right)\equiv {\left[\dots u\left(t-2\right),u\left(t-1\right)\right]}^{T}$ and c) a parameter vector $\mathbit{\theta }.$ Throughout this section bold letters refer to vectors and italic to scalars. The model writes thus as:

((28))

The model (28) will also be used in forecast mode, and the $\tau$-lag forecast valid at time at $t,$$x\left(t,\tau \right),$ is given by:

((29))
where $\mathbit{x}\left(t-1,\tau -1\right)\equiv {\left[\dots ,x\left(t-2,\tau -2\right),x\left(t-1,\tau -1\right)\right]}^{T},$ with $x\left(t\mathrm{\prime },\tau \right)=x\left(t\mathrm{\prime }\right)$ for $\tau \le 0.$ The noise vector in (29) becomes $\mathbit{ϵ}\left(t-1,\tau \right)\equiv {\left[\dots ,\epsilon \left(t-2,\tau -1\right),\epsilon \left(t-1,\tau \right)\right]}^{T}$ with $\epsilon \left({t}^{\mathrm{\prime }},\tau \mathrm{\prime }\right)\equiv x\left({t}^{\mathrm{\prime }},\tau \mathrm{\prime }\mathrm{\prime }-1\right)-x\left({t}^{\mathrm{\prime }},\tau \mathrm{\prime }\mathrm{\prime }\right),$ where For instance the one-step forecast is where and $\mathbit{ϵ}\left(t-1,1\right)=\mathbit{ϵ}\left(t-1\right)={\left[\dots ,\epsilon \left(t-2\right),\epsilon \left(t-1\right)\right]}^{T}$ with $\epsilon \left({t}^{\mathrm{\prime }}\right)\equiv x\left({t}^{\mathrm{\prime }}\right)-x\left({t}^{\mathrm{\prime }},1\right)$ is the one-step error forecasts, being thus consistent with (28).

The models of form (28) include AR linear models, fitted in Section 3.3.2. However, in order to reproduce nonlinear and non-Gaussian ENSO behaviour, we have considered bilinear models (Haggan and Oyetunji, 1980; Rao and Gabr, 1984; Rao 1981), which differ from AR processes by the addition to them of lagged bilinear (BL) terms, denoted ARBL(p1, p2):

((30))
characterized by its correlated-additive–multiplicative (CAM) noise (Usoro, 2015), where are lags, with $\mathbit{\theta }\equiv \left({a}_{1},\dots ,{a}_{{p}_{1}},{b}_{1},\dots ,{b}_{{p}_{2}},\alpha \right).$ These models can produce non-Gaussian statistics and nonvanishing bicovariances and bispectra (Rao and Gabr, 1984). Note that the restricted case of lags (Monahan, 2020) are excluded from model of Eq. (30), to allow inverting $u\left(t\right)$ from past values for forecasting. Note also that the case ${r}_{k}>{s}_{k}$ leads to sub-diagonal bilinear models whereas the case ${r}_{k}\le {s}_{k}$ corresponds to diagonal/super-diagonal bilinear models. In the former case, for example, the intervening noise is independent of the most recent state whereas the latter case corresponds to nonlinear feedbacks, having in general nonvanishing time averages due to correlation between states and past noises. We stress that models (30) have a nonlinear Volterra development in terms of lagged noises, which in a certain way parametrizes nonlinearities which are not present in a deterministic form in function $F.$ However, other type of models could be fitted, e.g. adding quadratic terms in the deterministic forcing but which can lead to instabilities in simulations. Another difficulty with nonlinear deterministic part, is the very wide class of nonlinearity: which nonlinearity: if polynomial-what degree? However, we investigate it in another study.

6.1.3.

#### Model fitting

In order to optimize predictability and reproduce data statistics, we apply a hybrid fitting algorithm that minimizes a cost function which is the weighted sum of the normalized one-step forecast residuals and the normalized squared distance between a set of observed and simulated statistics (average, autocovariance and bi-covariance) ${J}_{\mathit{stat}}:$

((31))
${J}_{\mathit{hyb}}\left(\mathbit{\theta }\right)={c}_{\mathit{fitt}}{J}_{\mathit{fitt}}\left(\mathbit{\theta }\right)+{c}_{\mathit{stat}}{J}_{\mathit{stat}}\left(\mathbit{\theta },{\sigma }_{w}\left(\mathbit{\theta }\right)\right)$
where ${c}_{\mathit{fitt}}$ and ${c}_{\mathit{stat}}$ are positive weights and ${\sigma }_{w}\left(\mathbit{\theta }\right)={\sigma }_{x}{\left({J}_{\mathit{fitt}}\left(\mathbit{\theta }\right)\right)}^{1/2}$ is the RMS of the forecast residuals for a model using parameters $\mathbit{\theta }$ with ${\sigma }_{x}=1$ (observations are standardized).

The term is given by the one-step forecast error:

((32))
where ${N}_{\gamma }\equiv {\mathit{max}}_{k,k\mathrm{\prime }}\left({l}_{k},{r}_{k\mathrm{\prime }},{s}_{k\mathrm{\prime }}\right).$

Iterative minimization algorithms of ${J}_{\mathit{fitt}}\left(\mathbit{\theta }\right)$ for general bilinear models are discussed in Pham and Tran (1981), (Rao and Gabr 1984), Grahn 1995, Guegan and Pham (1989), Gabr 1998 and Falguerolles and Francis (1992). Traditionally, the method of moments is used to obtain implicit relationships between the parameters and lagged moments (e.g. Sesay and Rao, 1988; Tang and Mohler, 1988; Kim et al., 1990), where, in most cases parameters are difficult to be expressed as a function of moments. Here we apply a method where statistics are estimated from a long simulation of (Eq. (28)) with initial conditions: and $w\left(t\right)\sim N\left(0,1\right);t={-N}_{\mathit{past}}+{N}_{\gamma }+1,$…, Statistics are computed for $t=1,\dots ,{N}_{\mathit{sim}}$ with ${N}_{\mathit{sim}}=30,000$ and the initial ${N}_{\mathit{past}}=1000$ values are discarded from statistics as spin-up.

The term ${J}_{\mathit{stat}},$ involving the mean, autocovariance and bicovariance from observations, (${S}_{\mathit{obs}},{C}_{\mathit{obs}}\left(·\right),$ and ${B}_{\mathit{obs}}\left(·,·\right)$), and simulations (${S}_{\mathit{sim}},$ and ${B}_{\mathit{sim}}\left(·,·\right)\right),$ is

((33))
${J}_{\mathit{stat}}\left(\mathbit{\theta },{\sigma }_{w}\right)={J}_{S}\left(\mathbit{\theta },{\sigma }_{w}\right)+{J}_{C}\left(\mathbit{\theta },{\sigma }_{w}\right)+{J}_{B}\left(\mathbit{\theta },{\sigma }_{w}\right),$
where
((34))
((35))
and
((36))
with ${\tau }_{\mathit{max}}=16.$

The used window lag functions $\lambda \left(\tau \right)$ and $\lambda \left({\tau }_{1},{\tau }_{2}\right)$ are scaled by ${M={M}_{2}=\tau }_{\mathit{max}}.$ The Parseval Theorem applied to (35–36), shows that minimizing ${J}_{C}\left(\mathbit{\theta },{\sigma }_{w}\right),$${J}_{B}\left(\mathbit{\theta },{\sigma }_{w}\right)$ leads also to minimizing errors in the spectral domain.

In the analysis we compare two situations: ${c}_{\mathit{fitt}}=1,{c}_{\mathit{stat}}=0$ (simple fitting) and the hybrid fitting where /${\Delta J}_{\mathit{stat}},$ given by the ratio of typical variations of ${J}_{\mathit{fitt}}$ and ${J}_{\mathit{stat}}$ (hybrid fitting). The optimal parameters issued from simple and hybrid fittings are hereby denoted ${\mathbit{\theta }}_{\mathit{fitt}}$ and ${\mathbit{\theta }}_{\mathit{hyb}},$ respectively. Note that the term ${c}_{\mathit{stat}}{J}_{\mathit{stat}}$ reduces overfitting and the domination of one term over the other, see Appendix B for the description of the minimization of ${J}_{\mathit{hyb}}\left(\mathbit{\theta }\right).$ We compare both fittings in terms of ${J}_{\mathit{fitt}}$ and ${J}_{\mathit{stat}}$ for the AR(p1) model and several ARBL(p1, p2) models. The statistical significance is given through the normalized Akaike Information Criterion: $\mathit{NAIC}=\mathrm{log}\left({J}_{\mathit{fitt}}\right)-2dim\mathbit{\theta }/\left(N-{N}_{\gamma }\right)$ that penalizes the number of model terms.

6.2.

### Results for El Niño index

6.2.1.

#### Fitting statistics

We analyse and evaluate a sequence of models (Eq. (30)), for ${p}_{1}=5,$${l}_{k}=k;k=1,\dots ,{p}_{1},$ with various ${p}_{2}$ values using lags ${r}_{k}\le 6,{s}_{k}\le 6$ (Table 5) with every new lag pair producing the largest ${J}_{\mathit{fitt}}$ decrease.

Table 5 shows that the mean residual squares decreases with increasing model complexity. For hybrid, NAIC decreases with increasing complexity (i.e., no overfitting). Moreover, the hybrid fitting is able to get improved statistics compared to that from the simple fitting ${\left(J}_{\mathit{stat}}\left({\mathbit{\theta }}_{\mathit{hyb}}\right)<{J}_{\mathit{stat}}\left({\mathbit{\theta }}_{\mathit{fitt}}\right)\right),$ with reductions up to about one fifth for the ARBL(5,5) model. That reduction entails very tiny increments in the sum of residuals i.e. ${J}_{\mathit{fitt}}\left({\mathbit{\theta }}_{\mathit{hyb}}\right)>{J}_{\mathit{fitt}}\left({\mathbit{\theta }}_{\mathit{fitt}}\right),$ of the order of 1%, but still keeping significant NAIC values.

The estimated of the ARBL(5,5) model are shown in Table 6 and

6.2.2.

#### Simulated autocovariance and spectrum

We note that the autoregressive part of the ARBL(5,5) model is quite similar to that of AR(5) model (Eq. (6)). The bilinear coefficients have smaller amplitude than the autoregressive coefficients. ARBL(5,5) contains two feedback terms corresponding to whose nonzero mean values come from white noise squares and leading to a nonvanishing constant $\alpha .$ This parametrizes partially the bicovariance at the interseasonal scales which is at the origin of the nonlinear reduction of the Spring Predictability Barrier (see Section 4.2.2). The model recovers quite well the empirical autocovariance function (not shown), and also the smoothed spectrum (Fig. 10) using the window lag $M=30,$ with particularly similar peak in the band $f\sim 0.05-0.07$ cpt.

Fig. 10.

Smoothed empirical (black) and ARBL(5,5) simulated (red) spectra using a window lag M = 30.

6.2.3.

#### Simulated bicovariance and bispectrum

Most (∼80%) uncertainty in ${J}_{\mathit{hyb}}\left({\mathbit{\theta }}_{\mathit{hyb}}\right)$ comes from the bicovariance uncertainty through ${J}_{B}\left(\mathbit{\theta },{\sigma }_{w}\right).$ The simulated bicovariance (Fig. 11) reproduces very well the same pattern of the bicovariance inferred from observations (Fig. 4b), at least up to lags 12 trimesters. The simulation yields a skewness equal to 0.26, which is comparable with the observed skewness (0.46). Any simpler model with only one bilinear term is unable to yield positive skewness and negative bicovariances which are fundamental to get skillful El Niño predictions from Spring to the next Winter.

Fig. 11.

Bicorrelation of the ARBL(5,5) model, approaching the empirical bicorrelation of El Niño index (to be compared with Fig. 4b).

The smoothed bispectrum of the ARBL(5,5) simulation, using a window lag ${M}_{2}=30,$ is shown in Fig. 12a (real part) and Fig. 12b (imaginary part), to be compared with the empirical one (Fig. 7a,b). The real part (Fig. 7a) is quite well reproduced, with negative values in the low-frequency band $|{f}_{1}+{f}_{2}|<0.06,$ and positive local maxima near and (Fig. 12a). The decomposition of skewness (Eq. (26)), (with ${f}_{\mathit{cut}}=0.06$) yields the values −0.036, 0.062, 0.154 and 0.082 respectively for the SSS, SSF, SFF and SSS contributions, which agree approximately with observations (Table 3). The imaginary part of the simulated bispectrum (Fig. 12b) is also roughly well reproduced, showing negative values within the region ${f}_{1}+{f}_{2}<0.06$ (Fig. 7b) and positive elsewhere. The maximum is near ${f}_{1}={f}_{2}\sim 0.07$ cpts. Note that the existence of a single peak is related to the simulated single-peak spectrum.

Fig. 12.

Real (a) and Imaginary (b) parts of smoothed bispectrum of the ARBL(5,5) using a window lag M2=30.

We shall remark that the bilinear models chosen here, have a linear deterministic part with nonlinearity coming indirectly through the CAM noise. However, other type of models could be fitted, e.g. adding quadratic terms in the deterministic forcing but which can lead to other difficulties like instabilities and chaotic behaviour in simulations.

6.2.4.

#### Predictability

In order to assess the predictability impact of the inclusion of bilinear terms in ARBL models, compared to the AR(5) model, we compute the correlation skill ${\mathit{cor}}_{\tau },$ (Table 5) between observations and predictions for lags for $\tau =1,2,3$ trimesters. Models are optimized by hybrid fitting in the training period 1870–1969 (100 years) and predictions are evaluated in the validation period 1970–2018 (49 years), where the most intense La Niñas and El Niños have been observed (see Section 5.3.4). All the tested models’ predictions are skillful (${\mathit{cor}}_{\tau }$>0.5) for lags up to two trimesters.

Table 5 shows that, in general correlation skills increase with increasing complexity. For instance, the correlation skill of model ARBL(5,5) is about 2%, 7% and 9% larger than AR(5), respectively for lags of 1, 2 and 3 trimesters. The presence of at least two bilinear terms suggests the reason behind the improvement of the 2-trimester forecasts by ARBL(5,5), with respect to AR(5), for which some extreme El Niños (e.g. 1973, 1983, 1998, 2016) are more accurately predicted. The ARBL(5,5) model has no explicit deterministic nonlinearities, which are common to physically-based models. Here, nonlinearities are parametrized through the bilinear terms. Despite the simplicity of the ARBL(5,5) model, its correlation skills are not dramatically smaller, in the same period than those of physically-based models and to those of a much more complex neural-network based model as observed in Fig. 2a of (Ham et al., 2019).

7.

## Discussion and conclusion

El Niño Southern Oscillation (ENSO) is one of the most important coupled atmosphere–ocean system, exhibiting time scales ranging from seasons to decades and beyond, with a particularly worldwide teleconnection. Using different stochastic and/or dynamic approaches, most studies have emphasized and shown its intrinsic complexity, nonlinearity and non-Gaussianity. Most of those studies limited their investigations to the second order statistics in addition to skewness and/or kurtosis.

Here, we follow the same line of research by performing a data-driven systematic study of the third-order stochastic moments, both in the time and spectral domains, applied to the standardized trimonthly-average El Niño 3.4 index with a trimester sampling. Within the time domain, this comprises the bicovariance ${\gamma }_{x}\left({\tau }_{1},{\tau }_{2}\right)=E\left[x\left(t\right)x\left(t+{\tau }_{1}\right)x\left(t+{\tau }_{2}\right)\right],$ in addition to nonlinear correlations for testing nonlinearity and nonlinear predictability for forecasting horizons from seasons up to a few years. The study uses a 149-year period (1870–2018) time series with its statistically significant skewness of 0.46, peaking mostly at the boreal winter (1.04). The analysis of bicovariance maxima reveals high negative nonlinear correlation, implying that El Niño or La Niña extremes are likely to be followed by La Niña one year later, whereas mild conditions, on the other hand, favour El Niño occurrence.

The analysis of the nonlinear predictability, on seasonal time scales, shows that such nonlinear correlation is enhanced further when forecasts are issued at the NH Spring season (AMJ). This is linked to the persistence of many La Niñas starting in Spring up to next Winter and to the fact that strong Winter El Niños have only occurred under close climatological conditions in the previous boreal Spring. This strongly suggests that nonlinearity in the inter-seasonal timescale can contribute significantly to reduce the so-called El Niño Spring predictability barrier. Another equally important aspect is the fact that ENSO nonlinearity allows for the extension of predictability skill even for forecasting time of a few years, particularly when these forecasting time intervals satisfy phase synchronization and quadratic phase locking with certain dominant Fourier frequencies.

Similarly, within the spectral domain the bispectrum and bicoherence have been computed. As with power spectrum and variance, the bispectrum provides, in particular, the contribution of each bi-frequency bin to the observed skewness and squared bicovariance. This warrants the detection of combinations of El Niño Fourier components that mostly contribute to ENSO extremes by phase synchronization. The bispectrum also permits a test of nonlinearity in the spectral domain. The bispectrum has been estimated by a smoothed estimator using a window lag of 30 trimesters = 7.5 years, obtained from a trade-off between bispectrum bias, variance, and spectral resolution. To estimate the statistical significance of peaks, a conservative test has been adopted. The bispectrum of a non-Gaussian autoregressive null-hypothesis model NGAR(5) is tested and rejected at 5% significance level. We first obtained the coarse-grained spectral partition of the skewness by splitting the full signal into a slow component $s\left(t\right)$ (with periods larger than 3 years) and a fast component $f\left(t\right).$ The skewness is then decomposed into 4 components, namely $\mathit{SSS}=-0.066,$$\mathit{SSF}=0.263,$ and $\mathit{FFF}=0.071,$ implying, in particular, that most El Niños result from interaction between inter- and intra-triennial timescales. Some decadal tendency towards SFF-type El Niños is apparent from the time series, which is consistent with the observed ENSO decadal variability. Note that if a maximum of the bispectrum real part is observed at $\left({f}_{1},{f}_{2}\right)$ then a peak in the power spectrum is observed approximately at ${f}_{3}={f}_{1}+{f}_{2}.$ In particular, the leading bispectral maxima contributing to El Niño occurs at cpt (periods of 5 and 2.5 years inside the SSF region) and near the band crossing the SFF and FFF regions, with a maximum at (periods of 2.9 and 1.5 years). On the other hand, minima contributing mostly to La Niña extremes, lie near (periods of 14 and 7 years) and (periods of 5, 7 and 4 years), both inside the SSS region.

Lastly, a minimal stochastic model was constructed, which was able to reproduce the main features of the spectrum and bispectrum, and yielded robust improvement of the predictability skill, compared to an autoregressive AR(5) model. The model was selected from a large class of bilinear models with correlated-additive–multiplicative lagged noise. To gain predictability skill with the right stochastic properties, a hybrid fitting approach is used by minimizing a combination of forecasting squared residuals and squared deviations from empirical third-order statistics. The bilinear model yields forecast improvements, particularly at lags of 1, 2 and 3 trimesters with 2%, 7% and 9% of correlation skill increment respectively, suggested to be linked to the attenuation of El Niño predictability Spring barrier and to the more accurate prediction of super El Niños.

This study contributes to the understanding of ENSO predictability and modelling from the perspective of the bispectrum and phase synchronization. In a changing climate, this is especially relevant for the study and predictability of ENSO extremes resulting from resonant-type interaction. The study also provides the possibility to investigate other ENSO indices such as El-Nino Modoki or other Nino indices, and check whether other processes are at play. In particular, forecast skill of ENSO, based on the developed models are of great importance for seasonal (and longer) timescales forecasting. A systematic analysis of these topics is beyond the scope of this manuscript and is left for future research.