The Atmospheric Boundary Layer (ABL) is the base of life on the Earth. The ABL includes fair-weather clouds related to thermals and stratocumulus clouds that fill the upper portion of a well-mixed moist boundary layer (Garratt, 1994). Fog, a form of stratocumulus cloud that touches the ground, is also reported as a special feature of the ABL (Stull, 2012). Continuously increasing atmospheric pollution and long-term exposure to it pose many challenges for global public health, including cardiovascular and cardio-respiratory diseases. There is therefore a pressing need to monitor accurately the atmospheric variables that allow air-pollution behaviour to be predicted, so that early warnings can be issued to protect the population. One of the variables of greatest interest is the dynamical height of the ABL, the lowest layer of the troposphere, which is directly influenced by the Earth's surface through both natural and anthropogenic emissions (Koffi et al., 2016).
Many authors have used soft-computing and artificial-intelligence-based models to study the effect of the ABL on both the atmosphere and human beings (Terradellas et al., 2005). These models use techniques such as neural networks, fuzzy logic, deep learning and the Adaptive Neuro-Fuzzy Inference System (ANFIS) (Niska et al., 2004; Freeman et al., 2018), which are used to investigate, simulate and analyse complex phenomena in an attempt to solve real-world problems. Accurate modelling and prediction of ABL height are of great significance to pollution control boards, meteorological departments and atmospheric scientists studying the climatic conditions of local or remote areas (Caughey, 1984; Singal et al., 1994; Levi et al., 2020; Liu et al., 2020). For example, ABL height is important in calculating the pollution-carrying capacity of a specific area. Air-quality assessments at the local or regional scale are required for a variety of purposes such as emission control, air-quality forecasting and the implementation of legislation. A key input to these assessment models is the meteorological data required to compute the transport, dispersion and removal of pollutants. ABL height determines the volume available for the diffusion of contaminants and is also an essential parameter in atmospheric flow models, e.g. for the wind profile (Koffi et al., 2016). ABL height is also vital to air-space studies, the wind structure of an area, atmospheric stability classification, etc. ABL height can be measured by two kinds of methods: in-situ or direct methods (such as tethered balloons, masts and rawinsondes) and remote sensing methods (such as LIDAR, SODAR and wind profilers) (Bradley, 2007; Lee and Pal, 2017). SOnic Detection And Ranging (SODAR) is one such remote sensing instrument (Beyrich, 1997; Emeis et al., 2008).
This instrument transmits an acoustic wave into the atmosphere, where it is reflected back to the antenna by inhomogeneities in the atmospheric structure (Gilman et al., 1946). The SODAR output is plotted as an echogram, which represents the signal returned from the atmosphere. These data have been used for the prediction of ABL height using time-series prediction models. Time-series models can be classified as linear or non-linear. Auto Regressive (AR), Auto Regressive Moving-Average (ARMA) and Auto Regressive Integrated Moving-Average (ARIMA) models and their variants are linear models: they fit a predefined mathematical equation to a univariate time series (Selvin et al., 2017) and fail to cope with latent dynamics in the data. Advanced soft-computing techniques such as the Artificial Neural Network (ANN), the Adaptive Network-based Fuzzy Inference System (ANFIS), the Genetic Algorithm and the Fuzzy Inference System are non-linear and have been successfully applied to model parameters such as ABL height, pollutants and temperature (Chelani, 2005; Rehman and Mohandes, 2008; Ettouney et al., 2009; Paoli et al., 2010; Kumar and Jha, 2013; Vivas et al., 2020). However, these models are less capable of identifying hidden patterns and the underlying dynamics of the data. Deep learning algorithms can identify such hidden patterns and underlying dynamics through a self-learning process (Selvin et al., 2017; Zaidi et al., 2020). In the case of ABL height, the data generated are enormous and highly non-linear (Vivas et al., 2020). Unlike the other algorithms, deep learning models can provide good predictions by analysing the interactions and hidden patterns within the data. Long Short-Term Memory (LSTM) is a special type of Recurrent Neural Network (RNN) (Hochreiter and Schmidhuber, 1997; Kawakami, 2008).
LSTM is used in deep learning methods; it is trained with a gradient-based learning algorithm and was designed to prevent the decay of the error flowing backwards through time.
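To make the contrast with the non-linear models concrete, the sketch below shows what "fitting a predefined equation" means for the simplest linear model, an AR(1) process x[t] = c + phi * x[t-1], estimated by ordinary least squares and then iterated forward. It is an illustrative assumption written in Python, not a model used in this paper, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Fit x[t] = c + phi * x[t-1] by ordinary least squares; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three points to fit AR(1)")
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def forecast_ar1(c, phi, last_value, steps):
    """Iterate the fitted equation forward from the last observed value."""
    out, x = [], last_value
    for _ in range(steps):
        x = c + phi * x
        out.append(x)
    return out


if __name__ == "__main__":
    # Noise-free series generated by x[t] = 10 + 0.5 * x[t-1], starting at 100.
    series = [100.0, 60.0, 40.0, 30.0, 25.0, 22.5]
    c, phi = fit_ar1(series)
    print(c, phi)                              # recovers c = 10, phi = 0.5
    print(forecast_ar1(c, phi, series[-1], 2))
```

Whatever the data, the model can only ever reproduce this one fixed linear form, which is the limitation the text attributes to AR-type models.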
In the area of meteorological parameters, time-series analyses using neural network models and deep learning algorithms have used different input variables to predict weather data, and time-series data have been used for a wide range of weather variables. Vivas et al. (2020) applied deep learning to detect ABL height from atmospheric LiDAR signals in Colombia. Rehman and Mohandes (2008) used an artificial neural network to predict global solar radiation from air temperature and relative humidity for Saudi Arabia over 1998 to 2002. Zhao et al. (2019) applied the LSTM method to predict radar sea clutter and found its performance superior to that of a neural network; however, for long-term prediction, the model did not perform as well as for short-term prediction.
In the present work, an LSTM model has been implemented to predict future values of ABL height from SODAR data. The seasonal and annual variability of ABL height have been studied to identify significant differences. The factors and principles that affect the model's performance are also considered, and the performance of the model has been quantified on SODAR ABL height data. Further, to illustrate the model's performance, a time-series comparison of annual and seasonal ABL height variations is presented. This work can support air-quality studies and the analysis of seasonal variability in the Delhi region. Section 2 briefly describes the Delhi region and the SODAR ABL height data, together with the ABL height LSTM model and its architecture. Section 3 begins with the general results of the LSTM network and then describes the analysis and prediction of annual and seasonal ABL height. Sections 4 and 5 present the discussion and conclusions on annual and seasonal ABL height prediction.
Data and methodology
Delhi lies in northern India, about 715 feet above sea level, and has a semi-arid (steppe) climate with scorching summers, heavy rainfall in the monsoon months and cold winters (Ramachandran et al., 2012; Kumar et al., 2017b). Dust storms occur in summer and foggy mornings in winter. The temperature gradually rises to 46 °C in summer and falls to 2 °C in winter (Roy et al., 2011). In the winter months, temperature inversions and low wind speeds are the leading causes of the accumulation of airborne pollutants in Delhi. Delhi, the capital of India, was chosen as the site for the SODAR installation because it experiences large changes in atmospheric conditions (Figure 1).
The SODAR system was developed at CSIR-NPL in 1973 and has been modified from time to time. It is installed on the roof of the main building of CSIR-NPL, New Delhi. Table 1 shows the design specifications of the SODAR system used to measure ABL height (Gera et al., 2011; Kumar et al., 2019). ABL height is obtained from the echogram (Fig. 2) by visual inspection (Kumar et al., 2017a, 2017b). In this study, the data set has an hourly resolution (one value per hour) and covers one year (1 Dec 2018–30 Nov 2019).
Basics of the LSTM network
An LSTM network predicts the state at the next moment from the data at the previous moment (Zhao et al., 2019). As Fig. 3 shows, the most significant difference between an ordinary RNN and an LSTM is the hidden unit. In an LSTM network, the hidden layer loops back on itself, which can be seen as multiple copies of the same neural network, each module passing information to the next. This compensates for the inability of an ordinary RNN to learn long-distance relationships (Hochreiter and Schmidhuber, 1997), and the LSTM has become a widely used part of the current deep learning field for solving time-series prediction problems. A neural network contains one input layer, one output layer and intermediate layers called hidden layers (Soni and Parmar, 2020). The output of the input layer forms the input of the first hidden layer, and the output of each hidden layer forms the input of the next hidden layer (Kumar et al., 2015). An LSTM is a type of RNN in which the usual hidden layers are replaced with LSTM cells, composed of gates that control the flow of information (Selvin et al., 2017; Zhao et al., 2019). An LSTM cell consists of an input gate (controls how much new input is written to the cell state), the cell state (runs through the entire network and can have information added or removed with the help of the gates), a forget gate (controls how much of the cell state is reset or forgotten) and an output gate (controls how much of the cell state is passed to the hidden state). The details of the LSTM architecture are described in Hochreiter and Schmidhuber (1997) and Kawakami (2008).
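The gate descriptions above can be made concrete with a single forward step of an LSTM cell. The sketch below is a minimal pure-Python illustration under stated assumptions: it uses the common formulation with a forget gate, sigmoid gate activations and tanh for the candidate and the output, and the parameter layout (one weight matrix and bias per gate acting on the concatenation [h_prev, x]) is hypothetical. MATLAB's Deep Learning Toolbox stores its weights differently, and training by BPTT is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_cell_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell with input, forget and output gates.

    x: input vector at time t; h_prev, c_prev: hidden and cell state at t-1.
    params: dict mapping 'input', 'forget', 'candidate', 'output' to (W, b),
    where each W has len(h_prev) rows and len(h_prev) + len(x) columns.
    """
    z = list(h_prev) + list(x)  # concatenation [h_{t-1}, x_t]

    def layer(name, activation):
        W, b = params[name]
        return [activation(s + bias) for s, bias in zip(matvec(W, z), b)]

    i = layer("input", sigmoid)        # how much new content to write
    f = layer("forget", sigmoid)       # how much of the old cell state to keep
    g = layer("candidate", math.tanh)  # candidate values for the cell state
    o = layer("output", sigmoid)       # how much of the cell state to expose

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


if __name__ == "__main__":
    # Hypothetical 1-unit cell with all weights and biases set to zero:
    # every gate outputs 0.5 and the candidate is tanh(0) = 0,
    # so the new cell state is half the old one.
    zero = ([[0.0, 0.0]], [0.0])
    params = {name: zero for name in ("input", "forget", "candidate", "output")}
    h, c = lstm_cell_step([2.0], [0.0], [1.0], params)
    print(h, c)
```

Unrolling this step over an hourly series, feeding each (h, c) into the next call, gives the chain of "copies of the same network" described in the text.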
SODAR ABL height prediction model
Figure 4 shows the block diagram of the development of the ABL height LSTM prediction model, covering the training and testing of the network. The block diagram shows that the LSTM prediction network continuously passes the ABL height characteristics from the current step to the next step and predicts the next point from the previous point; in this way, it predicts the temporal ABL height. When creating the LSTM prediction model, the input data are a set of temporal ABL heights measured by the SODAR system.
For the ABL height prediction problem, the input data are a temporal series of ABL heights. As the series progresses, the hidden layer at the previous point affects the hidden layer at the next point. This feature is very helpful for ABL height, which has a specific nonlinear relationship between earlier and later values. Therefore, the LSTM network was configured for sequence-to-sequence regression and trained with Back Propagation Through Time (BPTT). The values change with time and show some periodic variation. These observations suggest that an LSTM neural network model is well suited to this nonlinear parameter.
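In sequence-to-sequence regression of this kind, a common way to build the training targets (an assumption here, since the exact preparation is not given) is to pair each value with the next one, so that at every step the network sees x[t] and learns to output x[t+1]. A minimal Python sketch with a hypothetical helper name:

```python
def make_seq2seq_pairs(series):
    """Return (inputs, targets) where targets[t] is the value one step after inputs[t]."""
    if len(series) < 2:
        raise ValueError("need at least two points to form input/target pairs")
    inputs = list(series[:-1])   # x[0] .. x[n-2]
    targets = list(series[1:])   # x[1] .. x[n-1], i.e. one hour ahead for hourly data
    return inputs, targets


if __name__ == "__main__":
    hourly_abl_m = [250.0, 400.0, 900.0, 1300.0]  # made-up values for illustration
    X, Y = make_seq2seq_pairs(hourly_abl_m)
    print(X)  # [250.0, 400.0, 900.0]
    print(Y)  # [400.0, 900.0, 1300.0]
```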
To check the generalisation capacity and accuracy of the model, the first 90% of the input data are selected as the training set and the last 10% as the test set, so that the model is checked against the temporal pattern of the convection and inversion periods of the ABL. The root-mean-square error (RMSE), relative RMSE (rRMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) are selected as performance measures of the prediction model.
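The chronological 90/10 split can be sketched as below. The split is done by position rather than at random, so the test set is the most recent part of the series. The helper name is hypothetical, and integer arithmetic is used so that, for example, 2160 hourly values give exactly 1944 training and 216 test points.

```python
def chronological_split(series, train_percent=90):
    """Split a time series into leading training and trailing test parts without shuffling."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    n_train = len(series) * train_percent // 100
    return list(series[:n_train]), list(series[n_train:])


if __name__ == "__main__":
    series = list(range(2160))  # stand-in for the 2160 hourly ABL heights
    train, test = chronological_split(series)
    print(len(train), len(test))  # 1944 216
```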
In this paper, the pre-defined deep learning models available in the Deep Learning Toolbox of MATLAB R2019a are used for training. These pre-defined networks are retrained and fine-tuned on augmented training and validation sets of SODAR ABL height data. For a better fit, and to prevent training from diverging, the data are standardised to zero mean and unit variance (Unal et al., 2003). The networks are trained with the Adam optimiser ('adam') (Kingma and Ba, 2014). The LSTM network architecture parameters are given in Table 2.
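Standardisation to zero mean and unit variance is usually done with statistics computed on the training set only, which are then applied to the test set; the inverse transform converts the network's predictions back to metres. The sketch below assumes that convention and uses the sample standard deviation (the source does not say which variant was used). It is an illustration, not the toolbox code used in the paper, and the function names are hypothetical.

```python
import statistics


def fit_standardiser(train):
    """Return (mean, sample standard deviation) of the training data."""
    return statistics.mean(train), statistics.stdev(train)


def standardise(values, mu, sigma):
    """Scale values with training statistics (applied to both train and test sets)."""
    return [(v - mu) / sigma for v in values]


def destandardise(values, mu, sigma):
    """Map standardised values (e.g. network outputs) back to metres."""
    return [v * sigma + mu for v in values]


if __name__ == "__main__":
    train = [300.0, 600.0, 900.0, 1200.0]  # illustrative ABL heights in m
    mu, sigma = fit_standardiser(train)
    z = standardise(train, mu, sigma)
    print(z)
    print(destandardise(z, mu, sigma))
```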
The performance of the prediction model is quantified on the test set by estimating the RMSE, rRMSE, MAE and MAPE. The lower the calculated error, the higher the accuracy of the prediction model (Adnan et al., 2020; Kumar et al., 2021).
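Because the formulas are not reproduced in this text, the sketch below uses the common textbook definitions of the four measures. The definition of rRMSE varies between studies; taking it as RMSE divided by the mean observed value, in per cent, is an assumption, as is the function name.

```python
import math


def error_metrics(observed, predicted):
    """Return RMSE and MAE (same units as the data), and rRMSE and MAPE (per cent)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed values, so they must be non-zero (true for ABL heights).
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    # Assumed rRMSE definition: RMSE relative to the mean observed value.
    rrmse = 100.0 * rmse / (sum(observed) / n)
    return {"RMSE": rmse, "rRMSE": rrmse, "MAE": mae, "MAPE": mape}


if __name__ == "__main__":
    print(error_metrics([100.0, 200.0], [110.0, 190.0]))
    # approximately: RMSE 10, MAE 10, MAPE 7.5, rRMSE 6.67
```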
The input data set contains nearly one year of data points. When the complete data set is used in each training iteration, the gradient cannot be corrected and the network cannot converge to the global optimum. The mini-batch size is therefore chosen carefully so that the whole data set passes through the network in every epoch during training and no data are discarded. The experiment on the ABL height prediction models uses 3 months of data (1 April 2019 to 30 June 2019), because the data available in this period are continuous and highly dynamic; the change in ABL height is also largest in May. Two types of prediction are presented in this paper. In the first, called Prediction-1, the network is updated with its previously predicted values as input; in the second, called Prediction-2, the network is updated with the observed values instead of the predicted values. One complete loop in which the input ABL height data set passes through the LSTM network once and returns once is called an epoch.
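Prediction-1 and Prediction-2 correspond to what are often called closed-loop and open-loop forecasting. The sketch below writes both loops around a hypothetical one-step predictor `predict_next(history)`, which stands in for the trained LSTM (its state update is represented here simply by the growing history). A persistence forecast is used only to show the mechanics; it is not the paper's model.

```python
def prediction_1(predict_next, history, horizon):
    """Closed loop: each forecast is appended to the history and fed back as the next input."""
    hist = list(history)
    out = []
    for _ in range(horizon):
        y_hat = predict_next(hist)
        out.append(y_hat)
        hist.append(y_hat)  # the network is updated with its own prediction
    return out


def prediction_2(predict_next, history, observed):
    """Open loop: after each forecast the network is updated with the observed value instead."""
    hist = list(history)
    out = []
    for actual in observed:
        out.append(predict_next(hist))
        hist.append(actual)  # the network is updated with the measurement
    return out


if __name__ == "__main__":
    persistence = lambda hist: hist[-1]  # hypothetical stand-in for the trained LSTM
    print(prediction_1(persistence, [1.0, 2.0, 3.0], 3))                # [3.0, 3.0, 3.0]
    print(prediction_2(persistence, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # [3.0, 4.0, 5.0]
```

Because Prediction-1 feeds errors back into its own inputs, its accuracy can be expected to degrade as the horizon grows, whereas Prediction-2 is corrected by each new measurement.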
The ABL height LSTM model was evaluated experimentally to find the optimal parameters. Two parameters were tested, the number of hidden neurons and the number of epochs, with a fixed data set (2160 data points). To avoid overfitting, the networks were trained with different numbers of hidden neurons (2, 5, 10, 20, 30, 50, 100, 35, 28 and 32) and different numbers of epochs (250, 500 and 750). The training progress of the network with 32 hidden neurons and 500 epochs, which gave the best result among all combinations, is shown in Fig. 5.
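The search over hidden neurons and epochs amounts to an exhaustive grid of 10 × 3 = 30 configurations, keeping the one with the lowest test error. In the sketch below, `evaluate(hidden, epochs)` is a hypothetical callback that would train the network and return its test RMSE; the toy scorer in the example is made up purely to exercise the loop and says nothing about the real results.

```python
from itertools import product

HIDDEN_NEURONS = [2, 5, 10, 20, 30, 50, 100, 35, 28, 32]
EPOCHS = [250, 500, 750]


def grid_search(evaluate, hidden_sizes=HIDDEN_NEURONS, epoch_counts=EPOCHS):
    """Score every (hidden, epochs) pair and return the best pair with all scores."""
    scores = {(h, e): evaluate(h, e) for h, e in product(hidden_sizes, epoch_counts)}
    best = min(scores, key=scores.get)  # lowest error wins
    return best, scores


if __name__ == "__main__":
    def toy_rmse(hidden, epochs):
        # Made-up stand-in for "train the LSTM, then compute test RMSE".
        return abs(hidden - 32) + abs(epochs - 500) / 100

    best, scores = grid_search(toy_rmse)
    print(best, len(scores))  # (32, 500) 30
```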
The results of testing are shown in Table 3 and Fig. 6. As the number of epochs and hidden neurons increases, the error on both the test set and the training set decreases. However, Table 3 shows that beyond a certain number of epochs and hidden neurons the error increases again, i.e. the prediction accuracy stops improving.
To check the uncertainties in the LSTM model, different parameters have been calculated; they are presented in Table 4 and Fig. 7. The test values and predicted values follow the same pattern. Figure 7 shows a line plot of the temporal average of the test ABL height data and the predicted ABL height data. In the LSTM model, Prediction-1 (network updated with predicted values) reaches a maximum of 1650 m and Prediction-2 (network updated with observed values) reaches a maximum of 1700 m, compared with a highest SODAR ABL height of 1775 m.
A Non-linear Auto Regressive (NAR) model, based on the predefined model in MATLAB R2019a, is also used to predict ABL height. The maximum error values obtained for each model are shown in Table 5. The Prediction-2 model values are more accurate than those of the NAR model. The NAR model uses information from previous lags to predict future values, but ABL height is highly dynamic from day to day. This makes learning difficult for the NAR architecture, which fails to capture the dynamic changes accurately.
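A NAR model predicts x[t] as a non-linear function f(x[t-1], ..., x[t-d]) of d past values, usually with a small feed-forward network as f. The sketch below only builds the lagged input matrix on which such a network is trained; the number of delays d is a hypothetical parameter, because the delays and network size of the MATLAB model are not stated.

```python
def lagged_design(series, delays):
    """Build NAR training pairs: rows [x[t-delays], ..., x[t-1]] with target x[t]."""
    if delays < 1 or delays >= len(series):
        raise ValueError("delays must be at least 1 and smaller than the series length")
    rows, targets = [], []
    for t in range(delays, len(series)):
        rows.append(list(series[t - delays:t]))
        targets.append(series[t])
    return rows, targets


if __name__ == "__main__":
    X, y = lagged_design([1, 2, 3, 4, 5], delays=2)
    print(X)  # [[1, 2], [2, 3], [3, 4]]
    print(y)  # [3, 4, 5]
```

With a fixed window of d lags, the model has no memory beyond those d hours, which is one way to read the difficulty the text describes with rapidly changing ABL heights.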
Analysis and prediction of annual ABL height
The ABL is a zone with nearly constant potential temperature and specific humidity with height. ABL height determines the volume available for the dispersion of pollutants and characterises the structure of the lower atmosphere: the higher the mixing height, the greater the dispersion rate, and vice versa. The SODAR echograms show that the ABL height changes continuously, so box plots are used to interpret the data (Frigge et al., 1989; Williamson et al., 1989). A box plot uses the median, the approximate quartiles and the lowest and highest data points to convey the level, spread and symmetry of a distribution of data values. Each box has a central mark indicating the median, while the bottom and top edges of the box indicate the 25th and 75th percentiles of the data. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted individually using the '+' symbol. Figure 8 shows the annual variation of the temporal ABL height and the monthly averages; the vertical bars denote ±1 standard deviation (σ) from the temporal average. Figure 8 and Table 6 present the temporal average SODAR data for about one year (December 2018 to November 2019).
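The box-plot quantities described above can be computed as in the sketch below. It assumes the usual rule that points more than 1.5 times the interquartile range beyond the box are outliers (the default whisker length of MATLAB's `boxplot`). Python's `statistics.quantiles` may interpolate quartiles slightly differently from MATLAB, so exact values can differ; the function name is hypothetical.

```python
import statistics


def box_plot_stats(values, whisker=1.5):
    """Median, quartiles, whisker ends and outliers using the IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]  # drawn as '+'
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme non-outlier points
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


if __name__ == "__main__":
    print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```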
The annual ABL height data have been used to retrain the LSTM model to predict the annual temporal ABL height. The Prediction-2 model gives the lowest RMSE (187.71 m). As shown in Fig. 9, the Prediction-1 model gives a good result for 30 days, with an RMSE of 329.55 m. During the convection period, the predicted ABL height is lower than observed, which increases the error. The annual data set has also been applied to the NAR model, which gave a higher RMSE (261.80 m) than the Prediction-2 model.
Comparison and prediction of seasonal ABL height
The total data set has been classified into four seasons based on the meteorology of northern India (Soni et al., 2011; Ramachandran et al., 2012; Kumar et al., 2017b), namely winter (December-January-February), pre-monsoon (March-April-May), monsoon (June-July-August-September) and post-monsoon (October-November), for the analysis of seasonal ABL height. The temporal seasonal variation of mixing height over the whole year of observation is shown in Fig. 10. The convection period is longest during the monsoon and shortest during the post-monsoon. The maximum value of mixing height (about 1510 m) is observed in the pre-monsoon season, while the minimum value (around 1315 m) is found in the winter season. The monthly average ABL height at different hours is given in Table 6; ABL height is highest in May (pre-monsoon season) and lowest in January (winter season).
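The seasonal classification above, and the hour-by-hour averages used in Table 6, can be sketched as below. The month-to-season mapping follows the text; the record format of (timestamp, height in metres) pairs and the function names are assumptions for illustration.

```python
from datetime import datetime

# Season definitions for northern India as listed in the text.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "pre-monsoon", 4: "pre-monsoon", 5: "pre-monsoon",
    6: "monsoon", 7: "monsoon", 8: "monsoon", 9: "monsoon",
    10: "post-monsoon", 11: "post-monsoon",
}


def split_by_season(records):
    """records: iterable of (datetime, height_m); returns season -> heights in input order."""
    seasons = {}
    for ts, height in records:
        seasons.setdefault(SEASON_OF_MONTH[ts.month], []).append(height)
    return seasons


def seasonal_hourly_mean(records):
    """Mean height for each (season, hour of day), i.e. a seasonal diurnal cycle."""
    sums = {}
    for ts, height in records:
        key = (SEASON_OF_MONTH[ts.month], ts.hour)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + height, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


if __name__ == "__main__":
    records = [  # made-up values for illustration
        (datetime(2019, 5, 1, 12), 1500.0),
        (datetime(2019, 5, 2, 12), 1520.0),
        (datetime(2019, 1, 15, 12), 1300.0),
    ]
    print(split_by_season(records))
    print(seasonal_hourly_mean(records))
```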
ABL height is positively correlated with temperature and wind speed and negatively correlated with relative humidity (Kumar et al., 2017b). In all seasons, temperature and wind speed influence the ABL height positively and relative humidity influences it negatively, owing to the increase or decrease in solar heating. The convective boundary layer height rises and falls during the daytime as the surface temperature changes. Variations in surface temperature govern the existence of atmospheric convection and therefore strongly affect the ABL height. In all seasons, the height is generally high between 11:00 and 14:00, and the mixing height starts decreasing in the evening as solar heating decreases. The ABL height during the convection period is 1510 m, 1485 m, 1395 m and 1315 m in the pre-monsoon, monsoon, post-monsoon and winter seasons, respectively.
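The sign of relationships like these is normally judged with the Pearson correlation coefficient between ABL height and each meteorological variable over matching time steps. The sketch below is a plain implementation for illustration; the pairing of variables and any numbers passed to it are assumptions, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up hourly values: a positive r for temperature, a negative r for humidity.
    abl_m = [300.0, 700.0, 1200.0, 1400.0]
    temp_c = [24.0, 29.0, 34.0, 36.0]
    rh_pct = [80.0, 60.0, 40.0, 35.0]
    print(pearson_r(abl_m, temp_c), pearson_r(abl_m, rh_pct))
```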
All the seasonal data are used for the prediction of seasonal ABL height. Both LSTM models are retrained and tested with the seasonal data to predict seasonal ABL height. Figure 11 shows the seasonal ABL height prediction from the Prediction-1 model; as the number of prediction days increases, the accuracy of the Prediction-1 model decreases. The seasonal data are also used to retrain and test the NAR model, which gives a higher RMSE.
The four seasonal data sets of ABL height, covering different atmospheric conditions, are selected to train the LSTM model. The trained prediction model is then used to predict the ABL height of each season for comparison with the annual prediction (Figures 9 and 11). The predicted ABL height is comparable with the SODAR data, especially over the transition periods (from inversion to convection or vice versa). Table 7 shows the errors of ABL height prediction using the LSTM network, which are lower for the winter season and higher for the annual data. In all conditions, the error of Prediction-2 is lower than that of Prediction-1 and the NAR model, although the error of Prediction-1 is not much higher than that of Prediction-2. The LSTM network model provides higher prediction accuracy than the NAR model. The LSTM models simulate the seasonal ABL height reasonably well, as inferred from the periodicity in the ABL height time series. They can also provide reliable ABL height simulations and predictions at other sites where the yearly ABL height pattern is broadly similar. The ABL height is highest during the convection period (daytime), when it does not follow any periodicity, and the errors are highest during this period. These LSTM models are useful to pollution regulatory bodies for controlling atmospheric pollution.
A neural-network-based temporal ABL height prediction is presented in this paper. The LSTM neural network architecture is capable of capturing the hidden dynamics of the temporal ABL height and of making predictions. The key point of this approach is the decomposition of the temporal data into seasonal subsets in addition to the annual prediction of ABL height. Using an LSTM instead of a classical neural network made it possible to obtain better accuracy for both the seasonal and the aggregate annual predictions. The proposed model is applied to ABL height data measured by the SODAR system at CSIR-NPL, Delhi, and the predictions are in good agreement with the actual measurements. The optimal results, obtained with a short training time, were achieved with 32 neurons and 500 epochs, which provided accurate predictions that support long-term prediction. The LSTM model lays the foundation for the next step towards accurate prediction of ABL height. Two types of LSTM prediction model have been applied to predict annual and seasonal ABL height, and both provide better accuracy than the NAR model. In future work, these LSTM networks will be applied to other environmental parameters to determine atmospheric conditions.