Development of novel hybrid machine learning models for monthly thunderstorm frequency prediction over Bangladesh

Abstract
Thunderstorm frequency (TSF) prediction with higher accuracy is of great significance under climate extremes for reducing potential damages. However, TSF prediction has received little attention because a thunderstorm event is a combination of intricate and unique weather scenarios with high instability, making it difficult to predict. To close this gap, we proposed two novel hybrid machine learning models through hybridization of the data pre-processing technique ensemble empirical mode decomposition (EEMD) with two state-of-the-art models, namely artificial neural network (ANN) and support vector machine (SVM), for TSF prediction at three categories over Bangladesh. We demarcated the yearly TSF datasets, recorded at 28 sites for the period 1981–2016, into three categories: high (March–June), moderate (July–October), and low (November–February) TSF months. The performance of the proposed EEMD-ANN and EEMD-SVM hybrid models was compared with classical ANN, SVM, and autoregressive integrated moving average (ARIMA) models. The EEMD-ANN and EEMD-SVM hybrid models showed 8.02–22.48% higher performance precision in terms of root mean square error compared to the other models at the high-, moderate-, and low-frequency categories. Eleven out of 21 input parameters were selected based on the random forest variable importance analysis. The sensitivity analysis results showed that each input parameter contributed positively to building the best model of each category, and that thunderstorm days are the most contributing parameter influencing TSF prediction. The proposed hybrid models outperformed the conventional models; EEMD-ANN is the most skillful for high TSF prediction, and EEMD-SVM for moderate and low TSF prediction. The findings indicate the potential of hybridization of EEMD with the conventional models for improving prediction precision. The hybrid models developed in this work can be adopted for TSF prediction in Bangladesh as well as in different parts of the world.


Development of novel hybrid machine learning
models for monthly thunderstorm frequency
prediction over Bangladesh
Md. Abul Kalam Azad
Begum Rokeya University
A R M Towfiqul Islam ( towfiq_dm@brur.ac.bd )
Begum Rokeya University https://orcid.org/0000-0001-5779-1382
Md. Siddiqur Rahman
Begum Rokeya University
Kurratul Ayen
Begum Rokeya University
Research Article
Keywords: Thunderstorm, Hybrid model, Ensemble empirical mode decomposition, Sensitivity analysis,
Random Forest, Bangladesh
Posted Date: February 10th, 2021
DOI: https://doi.org/10.21203/rs.3.rs-204328/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. 
Version of Record: A version of this preprint was published at Natural Hazards on April 15th, 2021. See
the published version at https://doi.org/10.1007/s11069-021-04722-9.

Md. Abul Kalam Azad¹*, Abu Reza Md. Towfiqul Islam¹*, Md. Siddiqur Rahman¹, Kurratul Ayen¹
¹ Department of Disaster Management, Begum Rokeya University, Rangpur 5400, Bangladesh
* Corresponding author: towfiq_dm@brur.ac.bd; suborno19@gmail.com
ORCID: 0000-0001-5779-1382
Tel: +880-2-58616687
Fax: +880-2-58617946
Submission: January, 2021
Abstract
Accurate thunderstorm frequency (TSF) prediction is of great significance under climate extremes for reducing potential damages. However, TSF prediction has received little attention because a thunderstorm event is a combination of intricate and unique weather scenarios with high instability, making it difficult to predict. To close this gap, we proposed novel hybrid machine learning models through hybridization of the data pre-processing technique Ensemble Empirical Mode Decomposition (EEMD) with two state-of-the-art models, namely artificial neural network (EEMD-ANN) and support vector machine (EEMD-SVM), for TSF prediction at three categories of yearly frequencies over Bangladesh. We demarcated the yearly TSF datasets, recorded at 28 sites for the period 1981-2016, into three categories: high (March-June), moderate (July-October), and low (November-February) TSF months. The performance of the proposed EEMD-ANN and EEMD-SVM hybrid models was compared with classical ANN, SVM, and Autoregressive Integrated Moving Average (ARIMA) models. The EEMD-ANN and EEMD-SVM hybrid models showed 8.02%-22.48% higher performance precision in terms of root mean square error (RMSE) compared to the other models at the high, moderate, and low-frequency categories. Eleven out of 21 input parameters were selected based on the Random Forest (RF) variable importance analysis. The sensitivity analysis results showed that each input parameter contributed positively to building the best model of each category and that thunderstorm days are the most contributing parameter influencing TSF prediction. The proposed hybrid models outperformed the conventional models; EEMD-ANN is the most skillful for high TSF prediction, and EEMD-SVM for moderate and low TSF prediction. The findings indicate the potential of hybridization of EEMD with the conventional models for improving prediction precision. The hybrid models developed in this work can be adopted for TSF prediction in Bangladesh as well as in different parts of the world.
Keywords: Thunderstorm; Hybrid model; Ensemble empirical mode decomposition; Sensitivity analysis; Random Forest; Bangladesh
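The three-category demarcation described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the month groupings come from the abstract, while the names `TSF_CATEGORIES`, `tsf_category`, and `split_by_category`, and the `(year, month, tsf)` record layout, are hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Month groupings as stated in the abstract (1 = January ... 12 = December).
TSF_CATEGORIES = {
    "high": (3, 4, 5, 6),        # March-June
    "moderate": (7, 8, 9, 10),   # July-October
    "low": (11, 12, 1, 2),       # November-February
}


def tsf_category(month: int) -> str:
    """Return the TSF category name for a calendar month number."""
    for name, months in TSF_CATEGORIES.items():
        if month in months:
            return name
    raise ValueError(f"month must be in 1..12, got {month}")


def split_by_category(records):
    """Group (year, month, tsf) records by TSF category.

    The record layout is an assumption made for this sketch.
    """
    groups = defaultdict(list)
    for year, month, tsf in records:
        groups[tsf_category(month)].append((year, month, tsf))
    return dict(groups)
```

Splitting the monthly series this way would let a separate model be fitted and evaluated for each category, which is how the abstract reports its results.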
1. Introduction
Thunderstorms are spectacular mesoscale phenomena that affect the environment and pose a severe threat to life, the economy, agriculture, and infrastructure. A thunderstorm event results from turbulent convective activity, which may bring about heavy rainfall, lightning, hail, tornadoes, and thunder (Islam et al., 2020). Thunderstorms occur in almost every region of the world because of meteorological instability and strong moisture convergence, which cause severe convection. A thunderstorm usually lasts less than an hour and ranges in size from a few kilometers to a few hundred kilometers (Saha and Quadir 2016). It is now well acknowledged that the climate system is getting warmer, which has implications for thunderstorm occurrences (Allen et al., 2014; Trenberth et al., 2007). The frequency of severe thunderstorms is likely to increase in the 21st century due to increasing convective instability (Rädler et al., 2019). Therefore, it is essential to predict the number of thunderstorm events that occur in a particular period under changing meteorological conditions in a given location. Predicting the number of thunderstorm events could provide insights into future thunderstorm incidents under the climate change scenario.
Thunderstorm frequency (TSF) can be defined as the number of thunderstorm occurrences in a given location over a day, month, season, or year. Worldwide, TSF is estimated at nearly 45,000 per day and 16 million per year (Siddiqui and Rashid 2008). Many parts of South Asia experience higher TSF during the summer months (March-May), when high temperatures prevailing at lower levels create a volatile atmosphere. Each year Bangladesh and its surroundings witness high TSF, especially during the pre-monsoon and early months of the monsoon season; however, thunderstorms occur in all seasons. Spatially, TSF is highest in the northeastern part and lower in the southeastern and northwestern parts of Bangladesh (Mannan et al., 2016). Before 1981, the country endured thunderstorm strikes on about nine days in May, which later rose to 12 days. Besides, thunderstorm-associated disasters cause severe damage to agricultural yields, infrastructure, and lives, both on the ground and in aviation. Due to the exorbitant impact of thunderstorms on human life and the economy, the Government of Bangladesh declared thunderstorms a natural disaster on 17 May 2016 (Wahiduzzaman et al., 2020). On the other hand, thunderstorms bring crucial rainfall during the dry season, which benefits the country's crop production and cleans the air of dust, haze, and pollutants. A TSF prediction model can help prepare and design a more useful crop calendar adapted to thunderstorm events. Besides, a TSF prediction model is essential for policymakers to adopt a mitigation plan for reducing the potential damage and casualties from thunderstorms.
Thunderstorm prediction is a challenging task because of the small spatiotemporal extent of the event, which is a combination of very complex, unique, and highly unstable weather scenarios. Despite the challenges, many researchers have attempted to predict thunderstorms worldwide, e.g., Jacovides and Yonetani (1990) in Cyprus; Mills and Colquhoun (1998) in Australia; Haklander and Delden (2003) in the Netherlands; Manzato (2007) in Italy; Zhen-hui et al. (2013) in China; Ali et al. (2011) in Malaysia; Litta et al. (2013) and Meher et al. (2019) in India; Collins and Tissot (2015) in the USA; Dowdy (2016) in the temperate and tropical regions; Osuri et al. (2017) in the Indian monsoon region; Rädler et al. (2019) in Europe; Chen et al. (2020) in Taiwan; Kulikov et al. (2020) in Russia; Bouttier and Marchal (2020) in Western Europe; and Islam et al. (2020) in Bangladesh. A variety of approaches have been taken in those studies. For example, Collins and Tissot (2015) used and compared an ANN and an MLR model for thunderstorm prediction within 400 km² of South Texas; Rädler et al. (2019) used an ensemble of 14 regional climate models, such as the AR-CHaMo and EURO-CORDEX models, to assess changes in thunderstorm frequency. Most of the studies have focused on Numerical Weather Prediction (NWP) modeling or on forecasting a single thunderstorm event on an hourly basis using convective indices. However, studies focused on predicting monthly TSF based on convective indices and other thunderstorm-related parameters are still scarce in the literature (Islam et al., 2020). In the present study, we have employed machine learning models, including Artificial Neural Network (ANN) and Support Vector Machine (SVM), incorporated with Ensemble Empirical Mode Decomposition (EEMD), together with Auto-Regressive Integrated Moving Average (ARIMA) modeling, to predict the monthly TSF over Bangladesh.
Among the machine learning models, ANN is a powerful model that can identify complex inherent nonlinear relationships between responses and predictors. Therefore, ANN models have drawn attention in the thunderstorm forecasting community (Manzato, 2007; Collins and Tissot, 2015; Litta et al., 2013). SVM is also a useful prediction technique that has been used before in thunderstorm prediction (Qiu et al., 2010; Zhen-hui et al., 2013). Time series models like ARIMA are widely used because they can characterize nonlinear data; this model was also applied previously in thunderstorm prediction (Islam et al., 2020). However, these models are not always efficient enough to predict a target dataset accurately. For this reason, many researchers have developed techniques that combine several types of methods to obtain more accurate predictions (Chen and Letchford, 2007; Gao and Stensrud, 2014; Solari et al., 2017; Suparta and Putro, 2018; Bouttier and Marchal, 2020; Kamangir et al., 2020). Hybrid EEMD-integrated machine learning models have been successfully applied in different fields of study, e.g., runoff (Tan et al., 2018); streamflow forecasting (Zhang et al., 2015); rainfall forecasting (Johny et al., 2020); wind speed forecasting (Yu, 2020); and groundwater level (Gong et al., 2018). However, TSF prediction has received little attention in the existing literature due to its complicated nature and unique weather features with high instability, which make it difficult to predict. Our work fills this research gap. Therefore, hybrid EEMD-ANN and EEMD-SVM models, which combine EEMD with an ANN and an SVM model, are proposed as effective methods to predict monthly TSF. In this study, widely used convective indices and thunderstorm-related variables were used as input parameters. The EEMD-ANN and EEMD-SVM prediction results were compared with three conventional prediction methods, i.e., ANN, SVM, and ARIMA, based on five performance evaluation metrics, i.e., Coefficient of determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Index of Agreement (IA), along with the Taylor diagram. Even though machine learning models can solve prediction problems with reasonable accuracy, their predictive capability relies significantly on the input data quality. In such a case, sensitivity analysis can help identify which input
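The five evaluation metrics named above can be sketched in plain Python. The text here does not give the exact formulas, so several choices are assumptions: R² is computed as 1 − SS_res/SS_tot (some studies use the squared Pearson correlation instead), IA follows Willmott's index of agreement, MAPE skips months whose observed TSF is zero, and `rmse_improvement` is one plausible reading of a "percent higher precision in terms of RMSE". The function names are hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return R2, RMSE, MAE, MAPE (%) and IA for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n

    # Assumption: months with zero observed TSF are skipped, since the
    # percentage error is undefined there.
    nonzero = [(o, p) for o, p in zip(observed, predicted) if o != 0]
    mape = (100.0 * sum(abs((o - p) / o) for o, p in nonzero) / len(nonzero)
            if nonzero else float("nan"))

    # Assumption: R2 defined as 1 - SS_res / SS_tot.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    # Willmott's index of agreement.
    ia_den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                 for o, p in zip(observed, predicted))
    ia = 1.0 - ss_res / ia_den if ia_den > 0 else float("nan")

    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape, "IA": ia}


def rmse_improvement(rmse_baseline, rmse_hybrid):
    """Percent reduction in RMSE of a hybrid model relative to a baseline."""
    return 100.0 * (rmse_baseline - rmse_hybrid) / rmse_baseline
```

Lower RMSE, MAE, and MAPE and higher R² and IA indicate a better fit, so a comparison between a hybrid and a conventional model would call `evaluate` on each model's test-set predictions and compare the resulting dictionaries.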

Frequently Asked Questions (14)
Q1. What have the authors contributed in "Development of novel hybrid machine learning models for monthly thunderstorm frequency prediction over bangladesh" ?

This paper predicts the number of thunderstorm events in a particular period under changing meteorological conditions in a given location.

The authors anticipate that due to low surface temperature and soil moisture, the winter season (November to February) is the least favorable for thunderstorm formation.

Second, the coupling of the preprocessing technique with a machine learning algorithm, the division of the training and testing datasets, and the model selection criteria are vital factors affecting the overall performance of the hybrid models.

The deep cyan contours represent the Pearson correlation coefficient, green contours represent the centered RMS error in the simulated field, and violet contours represent the standard deviation of the simulated pattern.

One probable reason for the improved performance of EEMD-ANN can be the method's capability to solve complex and nonlinear problems (Phuong et al., 2017).

Although parameters like DP, KI, RH, ST, and WS50 have low sensitivity values, they help achieve better prediction accuracy.

It can be said that the proposed methodology can not only predict the complicated thunderstorm frequency over Bangladesh reasonably well, but can also capture extreme climatic events.

All the other convective parameters, e.g., TT, CPRCP, CRR, KI, and the meteorological parameters, e.g., PRCP, RH, ST, WS50, have positively contributed to the best model building.

Uncertainty increases in low TSF months (winter) because of the low SST, the northeast wind flow from the BoB, and lower vapor flux availability.

Since the mean value of Gaussian white noise is equal to zero, the IMFs obtained are integrated and averaged as the final result: $\overline{IMF} = \frac{1}{N} \sum_{m=1}^{N} C_{j,m}$, where $C_{j,m}$ represents the $j$th IMF from the $m$th time and $N$ denotes the number of added white noise sequences.
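The averaging step of EEMD described above can be illustrated with a short Python sketch. It covers only the final ensemble mean, not the sifting that extracts the IMFs, and it assumes that every noise-added run yields the same number of IMFs of the same length, which a real decomposition does not guarantee. The function name `ensemble_mean_imfs` and the nested-list layout `trials[m][j]` are hypothetical.

```python
def ensemble_mean_imfs(trials):
    """Average IMFs over noise-added runs.

    trials[m][j] is the j-th IMF (a list of samples) obtained from the
    m-th noise-added copy of the signal. Returns one averaged IMF per j,
    i.e. (1/N) * sum over m of C[j][m], sample by sample.
    """
    n_trials = len(trials)
    if n_trials == 0:
        raise ValueError("at least one trial is required")
    n_imfs = len(trials[0])
    # Assumption for this sketch: all runs produced the same number of IMFs.
    if any(len(run) != n_imfs for run in trials):
        raise ValueError("all trials must contain the same number of IMFs")

    averaged = []
    for j in range(n_imfs):
        length = len(trials[0][j])
        averaged.append([
            sum(trials[m][j][t] for m in range(n_trials)) / n_trials
            for t in range(length)
        ])
    return averaged
```

Because the added white noise has zero mean, its contribution to each averaged sample shrinks as the number of runs grows, which is the motivation for taking the ensemble mean.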

The application of machine learning algorithms in thunderstorm prediction brings new promise for forthcoming studies concerning both operational predictors and meteorological research that intend to examine observed and future variations in the frequencies of severe extreme events (Yasen et al., 2017; Taszarek et al., 2019).
