A growing set of optimization and regression techniques, based upon sparse representations of signals, for building models from data sets has received widespread attention recently with the advent of compressed sensing. This paper deals with the parameterization of the Lorenz-96 (

Global weather and climate models are the best available tools for projecting likely climate changes. However, current climate model projections are still considerably uncertain, owing to errors in the mathematical models, the “curse of dimensionality”, model initialization, inadequate observations, and computational errors. Multi-layer models are adequate to capture the dynamics of large-scale atmospheric phenomena. Solving for small-scale processes with the help of large eddy simulation (LES) models, on a high-resolution grid embedded within the low-resolution grid (with grid spacing of a few tens of meters, so that the clouds and turbulence represented by parameterization schemes can be simulated explicitly), is very useful, but its applicability remains limited by its high computational cost. In this case, learning methods are used to develop parameterization schemes that utilize the explicitly simulated clouds and turbulence. In practice, low-resolution models resolve the large-scale processes while the effects of the small-scale processes are often parameterized in terms of the large-scale variables.

In

The objective of this paper is to develop a data-driven method to approximate the effects of small-scale processes with simplified functions called parameterizations which are characterized by compressed sensing (CS). Such parameterization schemes are inherently approximate, and the development of schemes which produce realistic simulations is a central challenge of model development. Most parameterization schemes depend critically on various parameters whose values cannot be determined a priori but must instead be estimated through data. Recent developments in optimization and sparse representations in high-dimensional spaces have provided powerful new techniques.

This paper deals with the parameterization of the Lorenz-96 model, a two-time-scale system of simplified ordinary differential equations describing advection, damping and forcing. We assume that the model expressing the resolved atmospheric variables (the known physics, or the resolved modes) is well understood. Furthermore, Lorenz, in introducing this multiscale model, showed that the small-scale features of the atmosphere act like forcing. We present a CS and sparse recovery approach for constructing approximations to the Lorenz-96 model system that blends physics-informed candidate functions with functions that enforce sparse structure. The ultimate aim is to adapt the mathematical methods developed here to realistic models that can be solved computationally to examine turbulent atmospheric circulations. In the past few years, data-driven parameterization of the Lorenz-96 model using machine learning and deep learning techniques (

The aim of any parameterization is to develop enhanced models that capture the essence of dynamics and produce reliable predictions. Besides deterministic parameterization, stochastic parameterization was applied to the Lorenz-96 model with several kinds of noise and provided promising results (

The paper is organized as follows. In Section 2, we introduce the 40-dimensional Lorenz-96 model, which was originally suggested by (

Let us now turn to a simple atmospheric model, which nonetheless exhibits many of the difficulties arising in realistic models, to gain insight into predictability and data assimilation. The Lorenz-96 model was originally introduced in (

Equation (1) describes the dynamics of some atmospheric quantity X_k in the k-th sector of a latitude circle.

The slow variables X_k (shown in the outer ring) and the fast variables Z_{1,1}, …, Z_{J,1} (inner ring) associated with sector k.

A recreation of a similar image by Wilks (

Each X_k is coupled to its neighbours X_{k+1}, X_{k−1}, and X_{k−2} by (1). The variables are cyclic: X_{k+K} = X_k, Z_{j,k+K} = Z_{j,k}, and Z_{j+J,k} = Z_{j,k+1}; so, for example, with K = 8, X_9 = X_1, Z_{1,9} = Z_{1,1}, and Z_{J+1,1} = Z_{1,2}. Even though this model is not a simplification of any atmospheric system, it is designed to satisfy three basic properties: it contains a quadratic, discrete advection-like term that conserves the total energy (i.e., it does not contribute to dE/dt), linear damping terms, and a constant forcing term.

The original _{j,k}_{k}_{j,k}

We consider 40-dimensional slow variable components and 400-dimensional fast variable components, i.e., K = 40 sectors with J = 10 fast variables per sector. With
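To make this setup concrete, the following sketch integrates a standard two-scale Lorenz-96 system with a fourth-order Runge-Kutta scheme. The sizes K and J and the constants h, c, b, F below are illustrative choices for a quick demonstration, not necessarily the values used in this paper.

```python
import random

# Two-scale Lorenz-96: slow variables X coupled to fast variables Z.
K, J = 8, 4                          # small demo sizes
h, c, b, F = 1.0, 10.0, 10.0, 10.0   # typical coupling/forcing constants

def rhs(X, Z):
    # slow tendencies: advection, damping, forcing, coupling to fast sum
    dX = [X[(k - 1) % K] * (X[(k + 1) % K] - X[(k - 2) % K])
          - X[k] + F - (h * c / b) * sum(Z[k]) for k in range(K)]
    def z(j, k):
        # cyclic indexing: Z_{j+J,k} = Z_{j,k+1}
        return Z[(k + j // J) % K][j % J]
    dZ = [[c * b * z(j + 1, k) * (z(j - 1, k) - z(j + 2, k))
           - c * Z[k][j] + (h * c / b) * X[k]
           for j in range(J)] for k in range(K)]
    return dX, dZ

def step(X, Z, dt):
    def shift(X, Z, dX, dZ, s):
        return ([x + s * d for x, d in zip(X, dX)],
                [[v + s * d for v, d in zip(r, dr)] for r, dr in zip(Z, dZ)])
    a = rhs(X, Z)
    bb = rhs(*shift(X, Z, *a, dt / 2))
    cc = rhs(*shift(X, Z, *bb, dt / 2))
    dd = rhs(*shift(X, Z, *cc, dt))
    X = [x + dt / 6 * (p + 2 * q + 2 * r + s)
         for x, p, q, r, s in zip(X, a[0], bb[0], cc[0], dd[0])]
    Z = [[v + dt / 6 * (p + 2 * q + 2 * r + s)
          for v, p, q, r, s in zip(rz, ra, rb, rc, rd)]
         for rz, ra, rb, rc, rd in zip(Z, a[1], bb[1], cc[1], dd[1])]
    return X, Z

random.seed(0)
X = [F + 0.1 * random.random() for _ in range(K)]
Z = [[0.1 * random.random() for _ in range(J)] for _ in range(K)]
for _ in range(500):                 # integrate to t = 0.5 with dt = 0.001
    X, Z = step(X, Z, 0.001)
```

The small time step is needed because the fast variables evolve on a time scale roughly c times shorter than the slow ones.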

According to (…), if the initial state X_{t_0} is known, all future states X_t, t > t_0, are determined; however, small errors in X_{t_0} can lead to very large differences in the values of X_t.

Although individual orbits of chaotic dynamical systems are by definition unpredictable, the average behavior of typical trajectories can often be given a precise statistical description. Anosov (

Many statistical properties, such as the Central Limit Theorem and the Law of the Iterated Logarithm, known for independent identically distributed stochastic processes, have been shown to hold for stochastic processes generated by chaotic dynamical systems. One of the better-known techniques for establishing such results was developed by Young (

Among the features listed above, sensitivity to initial conditions is widely known to be associated with chaotic dynamical systems, and a quantitative measure of this sensitivity is given by the Lyapunov exponents.

Considering the phase space, which is assumed to be ℝ^{d}

where

When the chaotic system is dissipative, solutions will eventually reside on an attractor. The dimension of this attractor, i.e., the number of “active” degrees of freedom, is also known as the Kaplan-Yorke dimension D_KY, estimated from the Lyapunov spectrum as

D_KY = j + (λ_1 + … + λ_j) / |λ_{j+1}|,

where j is the largest index for which λ_1 + … + λ_j ≥ 0.

The Kolmogorov-Sinai entropy h_KS measures the mean rate at which the dynamics produce information. By Pesin’s identity, h_KS equals the sum of the positive Lyapunov exponents.
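Given a Lyapunov spectrum sorted in decreasing order, both quantities follow directly; the spectrum below is made up for illustration and assumes a dissipative system (the full sum of exponents is negative).

```python
def kaplan_yorke(lams):
    # lams: Lyapunov exponents in decreasing order, summing to < 0 overall
    partial, j = 0.0, 0
    for lam in lams:
        if partial + lam < 0:
            break
        partial += lam
        j += 1
    # j is the largest index with a non-negative partial sum
    return j + partial / abs(lams[j])

def pesin_entropy(lams):
    # Pesin identity: KS entropy equals the sum of the positive exponents
    return sum(lam for lam in lams if lam > 0)

spectrum = [2.3, 1.1, 0.0, -0.9, -3.0]   # made-up spectrum for illustration
d_ky = kaplan_yorke(spectrum)            # 4 + 2.5/3.0 ≈ 4.83
h_ks = pesin_entropy(spectrum)           # 2.3 + 1.1 = 3.4
```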

The complexity of the dynamics of the Lorenz-96 model varies considerably with different values of the constant forcing term F.

In our numerical simulations, we begin from random initial conditions and use a fourth-order Runge-Kutta time integration with a time step of 0.01. Typically, we integrate forward for 500 time units before starting the calculation of Lyapunov exponents, to ensure that all transients have decayed. At this point, we begin integrating the tangent-space equations, with a Gram-Schmidt reorthonormalization update every
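A minimal sketch of this procedure, restricted to the largest exponent and to the single-scale model dX_k/dt = X_{k-1}(X_{k+1} - X_{k-2}) - X_k + F with F = 8 (an assumption for the demo; the paper computes the full spectrum of the two-scale system via tangent-space Gram-Schmidt), uses the two-trajectory renormalization method:

```python
import math, random

K, F = 40, 8.0                     # single-scale Lorenz-96 (demo values)
dt, renorm_every, d0 = 0.01, 10, 1e-8

def rhs(x):
    return [x[(k - 1) % K] * (x[(k + 1) % K] - x[(k - 2) % K]) - x[k] + F
            for k in range(K)]

def rk4(x):
    k1 = rhs(x)
    k2 = rhs([xi + dt / 2 * k for xi, k in zip(x, k1)])
    k3 = rhs([xi + dt / 2 * k for xi, k in zip(x, k2)])
    k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6 * (a + 2 * b + 2 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

random.seed(0)
x = [F + 0.01 * random.random() for _ in range(K)]
for _ in range(2000):              # discard transients (20 time units here;
    x = rk4(x)                     # the paper uses 500)
y = list(x)
y[0] += d0                         # perturbed companion trajectory
log_sum, steps = 0.0, 4000
for n in range(steps):
    x, y = rk4(x), rk4(y)
    if (n + 1) % renorm_every == 0:
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
        log_sum += math.log(d / d0)
        y = [a + d0 / d * (b - a) for a, b in zip(x, y)]   # renormalize
lam1 = log_sum / (steps * dt)      # positive for F = 8: chaotic regime
```

The accumulated logarithms of the separation growth, divided by the total integration time, converge to the largest Lyapunov exponent.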

Summary of Lyapunov Exponents.

Quantity | Value
---|---
Largest Lyapunov exponent λ_1 | 2.3098
Error doubling time | 0.3 time units
Number of strictly positive Lyapunov exponents | 14
Number of neutral (zero) Lyapunov exponents | 1
Number of strictly negative Lyapunov exponents | 26
Kaplan-Yorke dimension | 29.4694
Kolmogorov-Sinai entropy | 14.8409

Time evolution of Lyapunov exponents for the Lorenz-96 system.

Typically, the PDFs are computed using bin-counting, also known as the histogram method. For estimating the PDF of a continuous variable, the sample range is divided into bins, and the number of samples n_i falling into bin i, normalized by the total number of samples and the bin width, gives the estimate.

Smoother estimates of the PDFs can be obtained by using a Gaussian product kernel function, which results in the estimate of the PDF known as the kernel density estimate (KDE),

where the estimate is evaluated for the slow variables X_1, X_8, X_19 and X_31, simulated as described above in Section 2. The convergence of the marginal PDFs is shown in

PDFs of the slow variables X_1, X_8, X_19 and X_31.
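A sketch of the Gaussian kernel density estimate, using Silverman’s rule-of-thumb bandwidth (an assumption here; the paper does not specify its bandwidth choice), evaluated on synthetic standard-normal samples:

```python
import math, random

def kde(samples, x):
    # Gaussian KDE evaluated at a single point x
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    h = 1.06 * std * n ** (-0.2)          # Silverman's rule of thumb
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) \
        / (n * h * math.sqrt(2 * math.pi))

random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]
p0 = kde(data, 0.0)   # close to the standard normal density at 0 (≈ 0.4)
```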

The strongly mixing character of the slow dynamics can be inferred from the correlation function: strong mixing implies that the correlation function decays quickly to zero or near-zero values. The discrete version of the auto-correlation functions (ACFs) for the slow variables is given by

where

and these formulae can be given similarly for the fast variables. The ACF of the slow variable X_1 decays very quickly and then oscillates around zero, illustrating the strongly mixing character of the dynamics.

ACF of the slow variable X_1.
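The discrete ACF above can be computed directly; the sketch below checks it on a synthetic AR(1) series (an illustration, not the Lorenz-96 data), whose ACF decays geometrically.

```python
import random

def acf(x, max_lag):
    # sample autocorrelation function; ACF(0) = 1 by construction
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t + tau] - mean) for t in range(n - tau))
            / ((n - tau) * var) for tau in range(max_lag + 1)]

random.seed(2)
series = [0.0]
for _ in range(5000):                      # AR(1) with coefficient 0.8
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))
r = acf(series, 5)                         # r[0] = 1, r[1] ≈ 0.8, decaying
```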

The objective of this section is to compare model-based and data-driven tools for dimensional reduction and for reducing uncertainties in prediction. It is impossible to effectively “learn” from high-dimensional data unless there is some kind of implicit or explicit low-dimensional structure; this can be made mathematically precise in different ways. The methods presented below provide the low-dimensional structure either because of the presence of multiple scales in the model or because the measurement matrix is chosen to satisfy, with high probability, the so-called “Restricted Isometry Property (RIP)” condition introduced in compressed sensing (

Compressed sensing theory addresses the problem of recovering high-dimensional but sparse vectors from a limited number of measurements (the number of observed measurements m can be much smaller than the ambient dimension n) by ℓ_1-minimization.

This parameterization was proposed by Wilks (

where _{k}

And _{k}

where

Over the past two decades, researchers have focused on sparsity as one type of low-dimensional structure. Given the recent advances in both compressed sensing (

Compressed sensing (CS) is a technique for sampling and reconstructing signals as sparse signals, that is, the reconstructed signals have few non-zero components. More precisely,

In the measurement model above, V ∈ ℝ^m is the vector of measurements and S ∈ ℝ^n is the signal to be reconstructed,

where Θ (=: Θ(X)) ∈ ℝ^{m×n} with

In general, this problem cannot be solved uniquely. However, if

where ‖S‖_0 denotes the ℓ_0 norm of S, i.e., the number of its non-zero components. The ℓ_0 minimization method can exactly reconstruct the original signal in the absence of noise using a properly chosen sensing matrix Θ whenever the signal is sufficiently sparse. However, the ℓ_0 minimization problem (15) is non-convex and combinatorial, and is NP-hard.

Hence, instead of problem (15) we consider its ℓ_1 convex relaxation, which may be stated as (

where ‖·‖_1 represents the ℓ_1 norm, which is a convex function; hence problem (16) is a convex optimization problem, and it can accurately approximate the signal. The ℓ_1 minimization in (16) is closely related to the Lasso problem (

where

Even though there are several objectives in compressed sensing, we are mainly interested in estimating a good sensing matrix and efficiently reconstructing the

We now enforce this sparse structure within learning. One of the related problems in the context of sparsity is “dictionary learning”. We refer to the columns of the

In this paper, the effects of unresolved processes (2) on the resolved modes (1) in the Lorenz-96 model are denoted by ^{n}

where the _{1}(_{2}(_{n}^{T}

In particular, this method searches for a sparse approximation of the function mapping _{1}(_{2}(_{n}(^{T}^{n}

where Θ(X) = [θ_1(X), θ_2(X), …, θ_p(X)] ∈ ℝ^{1×p} is the dictionary of candidate functions and S_k = [s_{1k}, s_{2k}, …, s_{pk}]^T ∈ ℝ^p is the corresponding coefficient vector with entries s_{jk}.

To learn the function mapping, a data matrix X ∈ ℝ^{m×n} is built from m snapshots X_1, X_2, …, X_m of the n state variables.

First, we construct an augmented dictionary matrix Θ(X) ∈ ℝ^{m×p}.

The dictionary matrix Θ(

Sparse regression is performed to approximately solve

to determine the coefficient vectors S_1, S_2, …, S_n, collected as S = [S_1 S_2 … S_n] ∈ ℝ^{p×n}, where each column S_i determines which candidate functions are active.

The goal of the sparse optimization to solve the regression problem (22) can be defined using the Lasso form (17) for

which uses an ℓ_1 norm constraint on the coefficient vector S_k.

In the Lasso method, given a collection of

for some Lagrange multiplier λ; in our computations we take λ = 10^{-3}.
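As a concrete illustration of solving the Lasso problem (not the solver used in the paper), iterative soft-thresholding (ISTA) alternates a gradient step on the least-squares term with the soft-thresholding proximal step for the ℓ_1 penalty. The matrix, data and λ below are toy values.

```python
# ISTA for the Lasso: minimize 0.5*||Theta S - V||_2^2 + lam*||S||_1
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def soft(x, t):
    # soft-thresholding operator, the prox of t*|.|
    return max(x - t, 0.0) if x > 0 else min(x + t, 0.0)

def ista(Theta, V, lam, step, iters):
    p = len(Theta[0])
    S = [0.0] * p
    for _ in range(iters):
        r = [u - v for u, v in zip(matvec(Theta, S), V)]   # residual
        grad = [sum(Theta[i][j] * r[i] for i in range(len(V)))
                for j in range(p)]                         # Theta^T r
        S = [soft(s - step * g, step * lam) for s, g in zip(S, grad)]
    return S

Theta = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [2.0, 0.0, 2.0]                # generated by the sparse vector [2, 0]
S = ista(Theta, V, lam=0.1, step=1 / 3, iters=500)   # converges to [1.95, 0]
```

The step size 1/3 is the reciprocal of the largest eigenvalue of Θ^TΘ here, which guarantees convergence; the recovered coefficient is shrunk slightly (1.95 rather than 2) by the ℓ_1 penalty.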

Having introduced the data-driven methods for model reduction in the presence of time-scale separation, let us now turn to a simple atmospheric model, which nonetheless exhibits many of the difficulties arising in realistic models, to gain insight into predictability and data assimilation. The dynamics of the Lorenz-96 equation are shown in (1) and (2). Let _{k}_{j,k}_{k}

The resolved modes in (1) are now modified as:

Our goal is to use the methods discussed in Section 3 to form parameterizations, in other words, express _{k}_{1}, …, _{K}_{train}_{test}

Using polynomial regression to solve stochastic parameterizations in the Lorenz-96 system was proposed by Wilks (_{k}_{k}

The polynomial obtained from this method is:

The goal of compressed sensing in this context is to find a sparse coefficient matrix such that the dictionary combination approximates the unresolved tendencies in the ℓ_2 norm. Every column of Θ is a basis function of the resolved variables X_1, …, X_K. Compressed sensing then yields an individual equation for each unresolved term in terms of the resolved variables.

A computationally less expensive approach is to find a solution of order 2. The 40-dimensional CS model is used for the comparison of distributions, but we present below the single equation with averaged coefficients that will be used later for filtering:

This parameterization shows that every _{k}_{k}_{i}X_{j}_{k}

We then explore solutions of order 3. Given

This shows that every _{k}_{k}_{k}

Thus, our Θ matrix contains the basis functions _{k}

The biases are calculated by the compressed sensing algorithm as [2.173746, 1.393404, …, –2.973427] and their mean is –0.0243. The problem with this parameterization is that there is too much variation in the biases.

We investigate other ways to modify the biases. In the following model, the biases are set strictly to zero while keeping the other coefficients unchanged. If we average the coefficients, we get:

In the next model, the biases are set strictly to the average of the biases while keeping the other coefficients unchanged. If we average the coefficients, we get the same equation as (30). Both of these models perform equally well in predicting the trajectory of the system and significantly better than the raw 40-dimensional model. The model where the biases are set to the average bias performs better in the Kullback-Leibler divergence measure, but does not outperform linear regression.

A small variability was introduced in the biases with the goal of outperforming linear regression in terms of the Kullback-Leibler divergence measure. The biases were selected by sampling from a normal distribution 𝒩(μ, σ²), where

We will use (32) as our compressed sensing model in the later sections.

The equations of the resolved variables associated with the vector field denoted by 𝒢_{k}^{th}_{k}_{k}

Section 2.1.2 shows that the largest Lyapunov exponent is λ_1 ≈ 2.3, so the error-doubling time is ln 2 / λ_1 ≈ 0.3 time units.

where _{k}_{k}

Evaluation of each method.

METHOD | MSPE | AVERAGE K-L DIVERGENCE
---|---|---
Regression | 0.03606 | 0.06729
Compressed sensing (raw) | 5.9993 | 1.9891
Compressed sensing [1] (biases set to zero) | 0.03398 | 0.10099
Compressed sensing [2] (biases set to average bias) | 0.03440 | 0.07351
Compressed sensing [3] (noise added to average bias) | 0.03387 | 0.05919

This section will compare the probability density functions (PDFs) of _{k}_{test}_{k}

The Kullback-Leibler divergence was introduced by Kullback and Leibler (

where _{k}_{k}
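For histogram PDFs on a common set of bins, the discrete Kullback-Leibler divergence can be computed as below (a sketch; p and q are made-up distributions, and q is assumed positive wherever p is).

```python
import math

def kl_divergence(p, q):
    # discrete KL divergence; zero terms where p is zero, by convention
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.2, 0.5, 0.3]
q = [0.25, 0.45, 0.30]
d = kl_divergence(p, q)   # small but positive; zero iff p == q
```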

So far, we have investigated the deterministic part of stochastic parameterization. The goal of this subsection is to investigate the stochastic part, to further improve the regression and compressed sensing models explored so far. This is done by obtaining the autoregressive noise parameters φ and σ_e

The values for _{e}

where _{k}_{k}_{i}

_{k}_{i}
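A sketch of fitting an AR(1) residual model r_t = φ r_{t−1} + σ_e ε_t by lag-one least squares (the synthetic series below uses made-up parameters, not the paper’s residuals):

```python
import random

def fit_ar1(r):
    # lag-1 least-squares estimate of phi, then sigma_e from the residuals
    num = sum(r[t] * r[t - 1] for t in range(1, len(r)))
    den = sum(r[t - 1] ** 2 for t in range(1, len(r)))
    phi = num / den
    resid = [r[t] - phi * r[t - 1] for t in range(1, len(r))]
    sigma_e = (sum(e ** 2 for e in resid) / (len(resid) - 1)) ** 0.5
    return phi, sigma_e

random.seed(3)
r = [0.0]                                  # synthetic AR(1): phi = 0.9,
for _ in range(5000):                      # sigma_e = 0.2
    r.append(0.9 * r[-1] + 0.2 * random.gauss(0.0, 1.0))
phi_hat, sig_hat = fit_ar1(r)
```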

AR evaluation of residuals.

METHOD | φ | σ_e | σ_e/√(1−φ²) | MSPE | AVE. K-L DIVERGENCE
---|---|---|---|---|---
Regression | 0.9453 | 0.2265 | 0.6945 | 0.02624 | 0.07692
Compressed sensing | 0.9981 | 0.1710 | 2.775 | 0.02066 | 0.07142

Numerical simulations are done using the 4th-order Runge-Kutta method, and the AR(1) residual term r_k(t_0) is set to 0 for all k.

A comparison of

The goal of this section was to create a deterministic and AR model to improve the prediction of resolved variables. Based on the results shown, compressed sensing improves the prediction significantly compared to regression. The coefficients of the deterministic model as well as the values of _{e}

The second component of this paper deals with nonlinear filtering or data assimilation (DA). The main focus of filtering is to combine computational models with sensor data (observations) to predict the dynamics of large-scale evolving systems. Filtering provides a recursive algorithm for estimating a signal or state of a random dynamical system based on noisy measurements. The signal, represented by a Markov process, cannot be accessed or observed directly and is to be “filtered” from the trajectory of the observation process, which is statistically related to the signal. Suppose we make a forecast about the behavior of a complex system at a future time, with some uncertainties (randomness), and near-continuous data are available from remote sensing instrumentation networks. Then, as new information becomes available through observations, it is natural to ask how best to incorporate this new information into the prediction. Data assimilation can be simply defined as the combination of observations with model output to produce an optimal estimate of the state of a dynamical system.

The standard notation for the discrete DA problem from time t_k to t_{k+1} can be formulated as follows:

where _{k}

where

Data assimilation methods consist of _{k}_{k}^{t}_{k}

It is well documented that particle filters (

For the Kalman filter (

where

where _{k}

where

The idea behind EnKF (

where _{j}

where Γ and _{j}
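A minimal sketch of the perturbed-observation EnKF analysis step for a directly observed scalar component (illustrative only; `Gamma` plays the role of the observation noise covariance Γ, and the ensemble values are synthetic):

```python
import random

def enkf_update(ensemble, obs, Gamma):
    # perturbed-observation EnKF analysis for one observed scalar
    n = len(ensemble)
    mean = sum(ensemble) / n
    P = sum((x - mean) ** 2 for x in ensemble) / (n - 1)   # forecast variance
    K = P / (P + Gamma)                                     # Kalman gain
    # each member assimilates a perturbed copy of the observation
    return [x + K * (obs + random.gauss(0.0, Gamma ** 0.5) - x)
            for x in ensemble]

random.seed(4)
forecast = [random.gauss(1.0, 0.5) for _ in range(40)]      # 40 members
analysis = enkf_update(forecast, obs=2.0, Gamma=0.1)
mean_f = sum(forecast) / len(forecast)
mean_a = sum(analysis) / len(analysis)
var_f = sum((v - mean_f) ** 2 for v in forecast) / 39
var_a = sum((v - mean_a) ** 2 for v in analysis) / 39
```

The analysis ensemble is pulled toward the observation in proportion to the Kalman gain, and its spread shrinks accordingly.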

In this section, we compare the filtering results of Wilks’ (polynomial) parameterization and the compressed sensing parameterization. For each filtering implementation, 40 ensemble members are used and observations are assimilated every 20 time steps. We observe that the compressed sensing dynamics provides better prediction results compared to the other parameterization methods. There are several software packages that can be used for filtering problems. Here, we benefit from the open source code provided in (

Here, we analyze the filtering results of Wilks’ parameterization with auto-regressive models. Instead of using random noise for the prediction, we use the auto-regressive modelling of noise obtained in Section 4. While keeping the parametrized model the same, we add the modelled noise to the ensemble dynamics as follows:

where 𝒢_{k}

_{k}_{i}

True trajectories, observations and EnKF with Wilks’ parametrized model.

In addition to the compressed sensing method in Section 4, we consider auto-regressive modelling of noise. While keeping the parametrized model the same, we add the modelled noise to the ensemble dynamics as follows:

where 𝒢_{k}

_{k}_{i}

True trajectories, observations and EnKF with compressed sensing model.

In

Depiction of Lorenz-96 true and EnKF prediction trajectories for 40 components (Auto-regressive Wilks’ parameterization).

Depiction of Lorenz-96 true and EnKF prediction trajectories for 40 components (Auto-regressive compressed sensing).

Parameterization methods and MSPE results including random noise and modelled noise with autoregression.

METHOD | MSPE
---|---
Wilks’ parameterization | 0.01006
Compressed sensing | 0.01005
Autoregressive Wilks’ parameterization | 0.00940
Autoregressive compressed sensing | 0.00940

The sheer number of calculations required to directly solve large-scale global weather and climate models becomes computationally overwhelming. Hence, in this paper, using the multi-scale Lorenz-96 system as a test problem, we introduced a data-driven stochastic-parameterization framework in which the equations of the small scales are integrated and incorporated using compressed sensing to reduce the cost of data assimilation. We presented a data-driven sparse recovery (SPR) technique, in which the unknown or unresolved physics is learned from data, because the governing equations of the unresolved physics are too computationally expensive to solve. Since the atmospheric model considered in this paper is inherently chaotic, it has several positive Lyapunov exponents, and any small error in the initial conditions grows in finite time.

Random matrices with independent and identically distributed (i.i.d.) entries satisfy the RIP with high probability, and the RIP plays an important role in our calculations. We provided a characterization of the dynamical system based on the Lyapunov exponents for the

For parameterization of _{k}_{k}

Specifically, a reduced-order ensemble Kalman filter (EnKF) was applied to the stochastically parametrized model. For the EnKF, particle locations were corrected based on their respective distances from the observation (the innovation). The magnitude of the correction was proportional to the error covariance and inversely proportional to the observation noise covariance (the Kalman gain). In between observations, particles in the EnKF ensemble evolve according to the original signal dynamics, so particle trajectories are expected to deviate from the truth as time moves further away from the last observation correction.

Our results show that the compressed sensing model produces the best deterministic parameterization results. After incorporating auto-regression to model residuals, the stochastic parameterizations outperform deterministic parameterizations in terms of prediction accuracy. Compressed sensing with auto-regression produces the best stochastic parameterization results.

Solving for small-scale processes with the help of LES models, on a high-resolution grid embedded within the low-resolution grid (with grid spacing of a few tens of meters, so that the clouds and turbulence represented by parameterization schemes can be simulated explicitly), is very useful, but its applicability remains limited by its high computational cost. A long-term goal is to create a data-driven scheme in which LES models embedded in selected grid columns of a global model explicitly simulate subgrid-scale processes that are represented by parameterization schemes in the other columns. The result is a seamless algorithm which uses a small-scale grid only on a few selected columns, simulates interactively within a running global simulation, and is far cheaper than eddy-resolving simulation throughout the global domain.

The additional file for this article can be found as follows:

Appendix A and B. DOI:

The authors acknowledge partial support for this work from Natural Sciences and Engineering Research Council (NSERC) Discovery grant 50503-10802, TECSIS/Fields-CQAM Laboratory for Inference and Prediction and NSERC-CRD grant 543433-19. The authors would like to thank Professor Peter Baxendale, Department of Mathematics, University of Southern California, for stimulating discussions, contributions and suggestions on many topics of this paper.

The authors have no competing interests to declare.