
Showing papers in "Geoscience frontiers in 2021"


Journal ArticleDOI
Wengang Zhang1, Chongzhi Wu1, Haiyi Zhong1, Yongqin Li1, Lin Wang1 
TL;DR: Novel data-driven extreme gradient boosting (XGBoost) and random forest ensemble learning methods are applied for capturing the relationships between the USS and various basic soil parameters to predict undrained shear strength of soft clays.
Abstract: Accurate assessment of undrained shear strength (USS) for soft sensitive clays is a great concern in geotechnical engineering practice. This study applies novel data-driven extreme gradient boosting (XGBoost) and random forest (RF) ensemble learning methods for capturing the relationships between the USS and various basic soil parameters. Based on the soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays using the two machine learning methods above, where five feature variables including the preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL) and natural water content (W) are adopted. To reduce the dependence on the rule of thumb and inefficient brute-force search, the Bayesian optimization method is applied to determine the appropriate model hyper-parameters of both XGBoost and RF. The developed models are comprehensively compared with three comparison machine learning methods and two transformation models with respect to predictive accuracy and robustness under 5-fold cross-validation (CV). It is shown that the XGBoost-based and RF-based methods outperform these approaches. In addition, the XGBoost-based model provides feature importance ranks, which makes it a promising tool for the prediction of geotechnical parameters and enhances the interpretability of the model.
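As a concrete illustration of the workflow this abstract describes, the following minimal sketch tunes an XGBoost regressor with Bayesian optimization under 5-fold CV and prints a feature importance ranking. It is a hedged example: Optuna's TPE sampler stands in for the paper's Bayesian optimizer, and the five feature columns (PS, VES, LL, PL, W) hold synthetic values rather than the TC304 data.

```python
# Hedged sketch: Bayesian hyper-parameter optimization of XGBoost under 5-fold CV.
# Synthetic data stand in for the TC304 soil database; the USS formula is illustrative.
import numpy as np
import optuna
import xgboost as xgb
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 300
# Five feature variables per the abstract: PS, VES, LL, PL, W (synthetic ranges)
X = rng.uniform([50, 20, 30, 15, 20], [400, 300, 90, 40, 80], size=(n, 5))
uss = 0.25 * X[:, 0] + 0.10 * X[:, 1] - 0.20 * X[:, 3] + rng.normal(0, 5, n)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    model = xgb.XGBRegressor(**params)
    # 5-fold CV score, mirroring the paper's comparison protocol
    return cross_val_score(model, X, uss, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print("best hyper-parameters:", study.best_params)

best = xgb.XGBRegressor(**study.best_params).fit(X, uss)
# Feature importance ranks, analogous to the interpretability claim in the abstract
for name, imp in zip(["PS", "VES", "LL", "PL", "W"], best.feature_importances_):
    print(f"{name}: {imp:.3f}")
```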

367 citations


Journal ArticleDOI
TL;DR: In this paper, an overview of the current scenario of arsenic contamination of groundwater in various countries across the globe with an emphasis on the Indian Peninsula is presented and the corrective measures available include removing arsenic from groundwater using filters, exploring deeper or alternative aquifers, treatment of the aquifer itself, dilution method by artificial recharge to groundwater, conjunctive use and installation of nano-filter, among other procedures.
Abstract: More than 2.5 billion people on the globe rely on groundwater for drinking, and providing high-quality drinking water has become one of the major challenges of human society. Although groundwater is considered safe, high concentrations of heavy metals like arsenic (As) can pose potential human health concerns and hazards. In this paper, we present an overview of the current scenario of arsenic contamination of groundwater in various countries across the globe with an emphasis on the Indian Peninsula. With several newly affected regions reported during the last decade, a significant increase has been observed in the global scenario of arsenic contamination. It is estimated that nearly 108 countries are affected by arsenic contamination in groundwater (with concentrations beyond the maximum permissible limit of 10 ppb recommended by the World Health Organization). The highest numbers among these are from Asia (32) and Europe (31), followed by regions like Africa (20), North America (11), South America (9) and Australia (4). More than 230 million people worldwide, which include 180 million from Asia, are at risk of arsenic poisoning. Southeast Asian countries, Bangladesh, India, Pakistan, China, Nepal, Vietnam, Burma, Thailand and Cambodia, are the most affected. In India, 20 states and 4 Union Territories have so far been affected by arsenic contamination in groundwater. An attempt to evaluate the correlation between arsenic poisoning and aquifer type shows that groundwater extracted from unconsolidated sedimentary aquifers, particularly those located within the younger orogenic belts of the world, is the worst affected. More than 90% of arsenic pollution is inferred to be geogenic. We infer that alluvial sediments are the major source of arsenic contamination in groundwater, and we postulate a strong relation with plate tectonic processes, mountain building, erosion and sedimentation. Prolonged consumption of arsenic-contaminated groundwater results in severe health issues like skin, lung, kidney and bladder cancer; coronary heart disease; bronchiectasis; hyperkeratosis and arsenicosis. Since the major source of arsenic in groundwater is of geogenic origin, the extent of pollution is complexly linked with the aquifer geometry and aquifer properties of a region. Therefore, remedial measures are to be designed based on the source mineral, climatological and hydrogeological scenario of the affected region. The corrective measures available include removing arsenic from groundwater using filters, exploring deeper or alternative aquifers, treatment of the aquifer itself, dilution by artificial recharge to groundwater, conjunctive use, and installation of nano-filters, among other procedures. The vast majority of people affected by arsenic contamination in the Asian countries are the poor who live in rural areas and are not aware of arsenic poisoning and treatment protocols. Therefore, creating awareness and providing proper medical care to these people remains a great challenge. Very few policy actions have been taken at the international level over the past decade to reduce arsenic contamination in drinking water, with the goal of preventing toxic impacts on human health. We recommend that the United Nations Environment Programme (UNEP) and WHO take stock of the global arsenic poisoning situation and launch a global drive to create awareness among people, medical professionals, health workers and administrators on this global concern.

337 citations


Journal ArticleDOI
TL;DR: The methodology and solution-oriented results presented in this paper will assist the regional as well as local authorities and the policy-makers for mitigating the risks related to floods and also help in developing appropriate mitigation measures to avoid potential damages.
Abstract: Floods are one of nature's most destructive disasters because of the immense damage they cause to land, buildings, and human lives. It is difficult to forecast the areas that are vulnerable to flash flooding due to the dynamic and complex nature of flash floods. Therefore, earlier identification of flash flood susceptible sites can be performed using advanced machine learning models for managing flood disasters. In this study, we applied and assessed two new hybrid ensemble models, namely Dagging and Random Subspace (RS), coupled with Artificial Neural Network (ANN), Random Forest (RF), and Support Vector Machine (SVM), three state-of-the-art machine learning models, for modelling flood susceptibility maps at the Teesta River basin, the northern region of Bangladesh. The application of these models includes twelve flood influencing factors with 413 current and former flooding points, which were transferred into a GIS environment. The information gain ratio and multicollinearity diagnostic tests were employed to determine the association between the occurrences and flood influential factors. For validating and comparing the predictive ability of these models, statistical appraisal measures such as the Friedman, Wilcoxon signed-rank, and paired t-tests, as well as the Receiver Operating Characteristic (ROC) curve, were employed. The value of the Area Under the Curve (AUC) of ROC was above 0.80 for all models. For flood susceptibility modelling, the Dagging model performs best, followed by the RF, ANN, SVM, and RS models, and then the several benchmark models. The approach and solution-oriented outcomes outlined in this paper will assist state and local authorities as well as policy makers in reducing flood-related threats and will also assist in the implementation of effective mitigation strategies to mitigate future damage.
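A minimal sketch of the Random Subspace idea from this abstract, built with scikit-learn's BaggingClassifier wrapped around ANN, SVM, and RF base learners; Dagging, which trains members on disjoint folds, has no direct scikit-learn counterpart and is omitted, and the twelve factors and 413 points are synthetic stand-ins for the Teesta basin data.

```python
# Hedged sketch: Random Subspace-style ensembles via BaggingClassifier
# (scikit-learn >= 1.2 uses the `estimator=` keyword). Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Twelve influencing factors and 413 points, mirroring the abstract's setup
X, y = make_classification(n_samples=413, n_features=12, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=1)

bases = {
    "ANN": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=1)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=1)),
    "RF": RandomForestClassifier(random_state=1),
}
for name, base in bases.items():
    # Random Subspace: each member sees a random half of the factors,
    # and all samples (bootstrap=False)
    rs = BaggingClassifier(estimator=base, n_estimators=10,
                           max_features=0.5, bootstrap=False, random_state=1)
    rs.fit(Xtr, ytr)
    auc = roc_auc_score(yte, rs.predict_proba(Xte)[:, 1])
    print(f"RS-{name}: AUC = {auc:.3f}")
```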

195 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the capabilities of seven advanced machine learning techniques (MLTs), including, Support Vector Machine (SVM), Random Forest (RF), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), Quadratic Discriminant Analysis (QDA), Linear Discrimination Analysis (LDA), and Naive Bayes (NB), for landslide susceptibility modeling and comparison of their performances.
Abstract: The current study aimed at evaluating the capabilities of seven advanced machine learning techniques (MLTs), including Support Vector Machine (SVM), Random Forest (RF), Multivariate Adaptive Regression Spline (MARS), Artificial Neural Network (ANN), Quadratic Discriminant Analysis (QDA), Linear Discriminant Analysis (LDA), and Naive Bayes (NB), for landslide susceptibility modeling and comparing their performances. Coupling machine learning algorithms with spatial data types for landslide susceptibility mapping is a vitally important issue. This study was carried out using GIS and R open source software at Abha Basin, Asir Region, Saudi Arabia. First, a total of 243 landslide locations were identified at Abha Basin to prepare the landslide inventory map using different data sources. All the landslide areas were randomly separated into two groups with a ratio of 70% for training and 30% for validating purposes. Twelve landslide variables were generated for landslide susceptibility modeling, which include altitude, lithology, distance to faults, normalized difference vegetation index (NDVI), landuse/landcover (LULC), distance to roads, slope angle, distance to streams, profile curvature, plan curvature, slope length (LS), and slope-aspect. The area under curve (AUC-ROC) approach has been applied to evaluate, validate, and compare the MLTs' performance. The results indicated that AUC values for the seven MLTs range from 89.0% for QDA to 95.1% for RF. Our findings showed that the RF (AUC = 95.1%) and LDA (AUC = 94.17%) produced the best performances in comparison to the other MLTs. The outcome of this study and the landslide susceptibility maps would be useful for environmental protection.
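The comparison protocol in this abstract (70/30 split, AUC-ROC scoring) reduces to a few lines of scikit-learn; the hedged sketch below uses the library counterparts of five of the seven MLTs (MARS and a tuned ANN are omitted, as MARS has no core scikit-learn implementation) on synthetic data rather than the Abha Basin inventory.

```python
# Hedged sketch: train/test split and AUC comparison across several classifiers.
# Synthetic data stand in for the 243-landslide, 12-variable inventory.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=486, n_features=12, random_state=7)
Xtr, Xte, ytr, yte = train_test_split(X, y, train_size=0.7, random_state=7)

models = {
    "SVM": SVC(probability=True, random_state=7),
    "RF": RandomForestClassifier(n_estimators=200, random_state=7),
    "QDA": QuadraticDiscriminantAnalysis(),
    "LDA": LinearDiscriminantAnalysis(),
    "NB": GaussianNB(),
}
for name, m in models.items():
    m.fit(Xtr, ytr)
    auc = roc_auc_score(yte, m.predict_proba(Xte)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```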

166 citations


Journal ArticleDOI
TL;DR: Two novel deep learning algorithms, the recurrent neural network (RNN) and the convolutional neural network (CNN), are applied to generate national-scale landslide susceptibility maps of Iran.
Abstract: The identification of landslide-prone areas is an essential step in landslide hazard assessment and mitigation of landslide-related losses. In this study, we applied two novel deep learning algorithms, the recurrent neural network (RNN) and convolutional neural network (CNN), for national-scale landslide susceptibility mapping of Iran. We prepared a dataset comprising 4069 historical landslide locations and 11 conditioning factors (altitude, slope degree, profile curvature, distance to river, aspect, plan curvature, distance to road, distance to fault, rainfall, geology and land use) to construct a geospatial database and divided the data into the training and the testing dataset. We then developed RNN and CNN algorithms to generate landslide susceptibility maps of Iran using the training dataset. We calculated the receiver operating characteristic (ROC) curve and used the area under the curve (AUC) for the quantitative evaluation of the landslide susceptibility maps using the testing dataset. Better performance in both the training and testing phases was provided by the RNN algorithm (AUC = 0.88) than by the CNN algorithm (AUC = 0.85). Finally, we calculated areas of susceptibility for each province and found that 6% and 14% of the land area of Iran are very highly and highly susceptible to future landslide events, respectively, with the highest susceptibility in Chaharmahal and Bakhtiari Province (33.8%). About 31% of the cities of Iran are located in areas with high and very high landslide susceptibility. The results of the present study will be useful for the development of landslide hazard mitigation strategies.

166 citations


Journal ArticleDOI
TL;DR: This research aims to develop six hybrid models of extreme gradient boosting (XGB) which are optimized by gray wolf optimization (GWO), particle swarm optimization (PSO), social spider optimization (SSO), the sine cosine algorithm (SCA), multi verse optimization (MVO) and moth flame optimization (MFO) for estimation of the TBM penetration rate (PR).
Abstract: A reliable and accurate prediction of the tunnel boring machine (TBM) performance can assist in minimizing the relevant risks of high capital costs and in scheduling tunneling projects. This research aims to develop six hybrid models of extreme gradient boosting (XGB) which are optimized by gray wolf optimization (GWO), particle swarm optimization (PSO), social spider optimization (SSO), the sine cosine algorithm (SCA), multi verse optimization (MVO) and moth flame optimization (MFO), for estimation of the TBM penetration rate (PR). To do this, a comprehensive database with 1286 data samples was established, where seven parameters including the rock quality designation, the rock mass rating, Brazilian tensile strength (BTS), rock mass weathering, the uniaxial compressive strength (UCS), revolution per minute and thrust force per cutter (TFC) were set as inputs and the TBM PR was selected as the model output. Together with the mentioned six hybrid models, four single models, i.e., artificial neural network, random forest regression, XGB and support vector regression, were also built to estimate the TBM PR for comparison purposes. These models were designed by conducting several parametric studies on their most important parameters, and their performance capacities were then assessed through the use of the root mean square error, coefficient of determination, mean absolute percentage error, and a10-index. Results of this study confirmed that the best predictive model of PR is the PSO-XGB technique, with system errors of 0.1453 and 0.1325, R2 of 0.951 and 0.951, mean absolute percentage errors of 4.0689 and 3.8115, and a10-indexes of 0.9348 and 0.9496 in the training and testing phases, respectively. The developed hybrid PSO-XGB can be introduced as an accurate, powerful and applicable technique in the field of TBM performance prediction. By conducting sensitivity analysis, it was found that UCS, BTS and TFC have the strongest impacts on the TBM PR.
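To make the metaheuristic-plus-XGB idea concrete, here is a hedged, minimal particle swarm optimization loop (our own toy PSO, not the paper's implementation) that searches two XGBoost hyper-parameters against a cross-validated error fitness; synthetic data stand in for the 1286-sample TBM database, and the swarm settings are illustrative.

```python
# Hedged sketch: minimal PSO over (max_depth, learning_rate) of an XGBoost regressor.
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=7, noise=0.1, random_state=0)

bounds = np.array([[2.0, 10.0],    # max_depth
                   [0.01, 0.3]])   # learning_rate

def fitness(pos):
    model = xgb.XGBRegressor(max_depth=int(round(pos[0])),
                             learning_rate=float(pos[1]),
                             n_estimators=200, verbosity=0)
    # Negative RMSE, so that larger fitness is better
    return cross_val_score(model, X, y, cv=3,
                           scoring="neg_root_mean_squared_error").mean()

rng = np.random.default_rng(0)
n_particles, n_iter = 8, 10
pos = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random((2, n_particles, 2))
    # Standard PSO update: inertia + cognitive + social terms
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, bounds[:, 0], bounds[:, 1])
    vals = np.array([fitness(p) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

print("best (max_depth, learning_rate):", gbest)
```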

140 citations


Journal ArticleDOI
TL;DR: By using machine learning and deep learning techniques, the proposed landslide identification method shows outstanding robustness and great potential in tackling the landslide identification problem.
Abstract: Landslide identification is critical for risk assessment and mitigation. This paper proposes a novel machine-learning and deep-learning method to identify natural-terrain landslides using integrated geodatabases. First, landslide-related data are compiled, including topographic data, geological data and rainfall-related data. Then, three integrated geodatabases are established; namely, the Recent Landslide Database (RecLD), the Relict Landslide Database (RelLD) and the Joint Landslide Database (JLD). After that, five machine learning and deep learning algorithms, including logistic regression (LR), support vector machine (SVM), random forest (RF), boosting methods and convolutional neural network (CNN), are utilized and evaluated on each database. A case study in Lantau, Hong Kong, is conducted to demonstrate the application of the proposed method. From the results of the case study, CNN achieves an identification accuracy of 92.5% on RecLD, and outperforms the other algorithms due to its strengths in feature extraction and multi-dimensional data processing. Boosting methods come second in terms of accuracy, followed by RF, LR and SVM. By using machine learning and deep learning techniques, the proposed landslide identification method shows outstanding robustness and great potential in tackling the landslide identification problem.

124 citations


Journal ArticleDOI
TL;DR: In this paper, the teaching-learning-based optimization (TLBO) and satin bowerbird optimizer (SBO) algorithms were applied to optimize the adaptive neuro-fuzzy inference system (ANFIS) model for landslide susceptibility mapping.
Abstract: As threats of landslide hazards have become gradually more severe in recent decades, studies on landslide prevention and mitigation have attracted widespread attention in relevant domains. A hot research topic has been the ability to predict landslide susceptibility, which can be used to design schemes of land exploitation and urban development in mountainous areas. In this study, the teaching-learning-based optimization (TLBO) and satin bowerbird optimizer (SBO) algorithms were applied to optimize the adaptive neuro-fuzzy inference system (ANFIS) model for landslide susceptibility mapping. In the study area, 152 landslides were identified and randomly divided into two groups as training (70%) and validation (30%) datasets. Additionally, a total of fifteen landslide influencing factors were selected. The relative importance and weights of the various influencing factors were determined using the step-wise weight assessment ratio analysis (SWARA) method. Finally, the comprehensive performance of the two models was validated and compared using various indexes, such as the root mean square error (RMSE), processing time, convergence, and area under receiver operating characteristic curves (AUROC). The results demonstrated that the AUROC values of the ANFIS, ANFIS-TLBO and ANFIS-SBO models with the training data were 0.808, 0.785 and 0.755, respectively. In terms of the validation dataset, the ANFIS-SBO model exhibited a higher AUROC value of 0.781, while the AUROC values of the ANFIS-TLBO and ANFIS models were 0.749 and 0.681, respectively. Moreover, the ANFIS-SBO model showed lower RMSE values for the validation dataset, indicating that the SBO algorithm had a better optimization capability. Meanwhile, the processing time and convergence of the ANFIS-SBO model were far superior to those of the ANFIS-TLBO model. Therefore, both the ensemble models proposed in this paper can generate adequate results, and the ANFIS-SBO model is recommended as the more suitable model for landslide susceptibility assessment in the study area considered due to its excellent accuracy and efficiency.

109 citations


Journal ArticleDOI
TL;DR: The ability of factor optimization methods to improve the performance of landslide susceptibility models is confirmed and the resultant hybrid models GeoDetector-RF and RFE-RF have high reliability and predictability.
Abstract: The present study aims to develop two hybrid models to optimize the factors and enhance the predictive ability of landslide susceptibility models. For this, a landslide inventory map was created with 406 historical landslides and 2030 non-landslide points, which was randomly divided into two datasets for model training (70%) and model testing (30%). 22 factors were initially selected to establish a landslide factor database. We applied the GeoDetector and the recursive feature elimination method (RFE) for factor optimization, to reduce information redundancy and collinearity in the data. Thereafter, the frequency ratio method, multicollinearity test, and interactive detector were used to analyze and evaluate the optimized factors. Subsequently, the random forest (RF) model was used to create a landslide susceptibility map with the original and optimized factors. The resultant hybrid models GeoDetector-RF and RFE-RF were evaluated and compared by the area under the receiver operating characteristic curve (AUC) and accuracy. The accuracies of the two hybrid models (0.868 for GeoDetector-RF and 0.869 for RFE-RF) were higher than that of the RF model (0.860), indicating that the hybrid models with factor optimization have high reliability and predictability. Both RFE-RF and GeoDetector-RF had higher AUC values (0.863 and 0.860, respectively) than RF (0.853). These results confirm the ability of factor optimization methods to improve the performance of landslide susceptibility models.
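The RFE-RF hybrid maps directly onto scikit-learn primitives. The following hedged sketch runs recursive feature elimination around a random forest and compares cross-validated AUC before and after factor optimization; the 22 synthetic factors stand in for the paper's database, and the retained-factor count of 10 is an illustrative choice.

```python
# Hedged sketch: RFE wrapped around a random forest (an RFE-RF-style hybrid).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# 22 candidate factors, as in the abstract; data are synthetic stand-ins
X, y = make_classification(n_samples=600, n_features=22, n_informative=8,
                           random_state=3)
rf = RandomForestClassifier(n_estimators=200, random_state=3)

# Recursively drop the least important factors until 10 remain (illustrative)
selector = RFE(rf, n_features_to_select=10).fit(X, y)
X_opt = selector.transform(X)

for label, data in [("all 22 factors", X), ("RFE-optimized factors", X_opt)]:
    auc = cross_val_score(rf, data, y, cv=5, scoring="roc_auc").mean()
    print(f"{label}: AUC = {auc:.3f}")
```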

103 citations


Journal ArticleDOI
TL;DR: The results reveal that MLP can provide acceptable performance but is not robust, and that the standard RNN can perform better but its robustness is slightly affected when there are significant time lags between PWP changes and rainfall.
Abstract: Knowledge of pore-water pressure (PWP) variation is fundamental for slope stability. A precise prediction of PWP is difficult due to complex physical mechanisms and in situ natural variability. To explore the applicability and advantages of recurrent neural networks (RNNs) on PWP prediction, three variants of RNNs, i.e., standard RNN, long short-term memory (LSTM) and gated recurrent unit (GRU) are adopted and compared with a traditional static artificial neural network (ANN), i.e., multi-layer perceptron (MLP). Measurements of rainfall and PWP of representative piezometers from a fully instrumented natural slope in Hong Kong are used to establish the prediction models. The coefficient of determination (R2) and root mean square error (RMSE) are used for model evaluations. The influence of input time series length on the model performance is investigated. The results reveal that MLP can provide acceptable performance but is not robust. The uncertainty bounds of RMSE of the MLP model range from 0.24 kPa to 1.12 kPa for the selected two piezometers. The standard RNN can perform better but the robustness is slightly affected when there are significant time lags between PWP changes and rainfall. The GRU and LSTM models can provide more precise and robust predictions than the standard RNN. The effects of the hidden layer structure and the dropout technique are investigated. The single-layer GRU is accurate enough for PWP prediction, whereas a double-layer GRU brings extra time cost with little accuracy improvement. The dropout technique is essential to overfitting prevention and improvement of accuracy.
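A minimal sketch of the sequence-to-one setup this abstract describes: a single-layer GRU with dropout mapping a fixed-length window of rainfall and PWP readings to the next PWP value. Everything here is illustrative, from the window length and layer sizes to the synthetic series standing in for the Hong Kong piezometer records.

```python
# Hedged sketch: single-layer GRU for next-step PWP prediction from a sliding window.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
T = 2000
rain = np.clip(rng.normal(0, 1, T), 0, None)
# Synthetic PWP: lagged, smoothed response to rainfall plus noise
pwp = np.convolve(rain, np.exp(-np.arange(50) / 10), mode="full")[:T]
pwp += rng.normal(0, 0.05, T)

window = 24  # input time series length, a key setting per the abstract
X = np.stack([np.stack([rain[i:i + window], pwp[i:i + window]], axis=-1)
              for i in range(T - window)])
y = pwp[window:]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 2)),
    tf.keras.layers.GRU(32),        # single-layer GRU, per the abstract's finding
    tf.keras.layers.Dropout(0.2),   # dropout against overfitting
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, validation_split=0.2, verbose=0)

pred = model.predict(X[-100:], verbose=0).ravel()
rmse = float(np.sqrt(np.mean((pred - y[-100:]) ** 2)))
print(f"RMSE on the last 100 windows: {rmse:.3f}")
```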

97 citations


Journal ArticleDOI
TL;DR: The results revealed that random forest (RF) classifier is a promising and optimum model for landslide susceptibility in the study area with a very high value of area under curve, lower value of mean absolute error, and higher value of Kappa index.
Abstract: Hazards and disasters have always had negative impacts on the way of life. Landslide is an overwhelming natural as well as man-made disaster that causes loss of natural resources and human properties throughout the world. The present study aimed to assess and compare the prediction efficiency of different models of landslide susceptibility in the Kysuca river basin, Slovakia. In this regard, the fuzzy decision-making trial and evaluation laboratory combined with the analytic network process (FDEMATEL-ANP), the Naive Bayes (NB) classifier, and the random forest (RF) classifier were considered. Initially, a landslide inventory map was produced with 2000 landslide and non-landslide points, randomly divided with a ratio of 70%:30% for training and testing, respectively. The geospatial database for assessing the landslide susceptibility was generated with the help of 16 landslide conditioning factors, allowing for topographical, hydrological, lithological, and land cover factors. The ReliefF method was considered for determining the significance of the selected conditioning factors and their inclusion in the model building. Consequently, the landslide susceptibility maps (LSMs) were generated using the FDEMATEL-ANP, Naive Bayes (NB) classifier, and random forest (RF) classifier models. Finally, the area under curve (AUC) and different arithmetic evaluations were used for validating and comparing the results and models. The results revealed that the random forest (RF) classifier is a promising and optimum model for landslide susceptibility in the study area with a very high value of area under curve (AUC = 0.954), lower values of mean absolute error (MAE = 0.1238) and root mean square error (RMSE = 0.2555), and higher values of Kappa index (K = 0.8435) and overall accuracy (OAC = 92.2%).

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper presented a machine learning approach based on the C5.0 decision tree (DT) model and the K-means cluster algorithm to produce a regional landslide susceptibility map.
Abstract: Machine learning algorithms are an important measure with which to perform landslide susceptibility assessments, but most studies use GIS-based classification methods to conduct susceptibility zonation. This study presents a machine learning approach based on the C5.0 decision tree (DT) model and the K-means cluster algorithm to produce a regional landslide susceptibility map. Yanchang County, a typical landslide-prone area located in northwestern China, was taken as the area of interest to introduce the proposed application procedure. A landslide inventory containing 82 landslides was prepared and subsequently randomly partitioned into two subsets: training data (70% landslide pixels) and validation data (30% landslide pixels). Fourteen landslide influencing factors were considered in the input dataset and were used to calculate the landslide occurrence probability based on the C5.0 decision tree model. Susceptibility zonation was implemented according to the cut-off values calculated by the K-means cluster algorithm. The validation results of the model performance analysis showed that the AUC (area under the receiver operating characteristic (ROC) curve) of the proposed model was the highest, reaching 0.88, compared with traditional models (support vector machine (SVM) = 0.85, Bayesian network (BN) = 0.81, frequency ratio (FR) = 0.75, weight of evidence (WOE) = 0.76). The landslide frequency ratio and frequency density of the high susceptibility zones were 6.76/km2 and 0.88/km2, respectively, which were much higher than those of the low susceptibility zones. The top 20% interval of landslide occurrence probability contained 89% of the historical landslides but only accounted for 10.3% of the total area. Our results indicate that the distribution of high susceptibility zones was more focused without containing more “stable” pixels. Therefore, the obtained susceptibility map is suitable for application to landslide risk management practices.
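The two-stage recipe in this abstract, a tree classifier for occurrence probability followed by K-means to pick zonation cut-offs, can be sketched as below. Scikit-learn's CART-style DecisionTreeClassifier stands in for C5.0 (which has no core Python implementation), and the fourteen factors are synthetic stand-ins for the Yanchang County data.

```python
# Hedged sketch: tree-based landslide probability + K-means-derived zonation cut-offs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Fourteen influencing factors, per the abstract; data are synthetic
X, y = make_classification(n_samples=1000, n_features=14, random_state=5)
tree = DecisionTreeClassifier(max_depth=5, random_state=5).fit(X, y)
prob = tree.predict_proba(X)[:, 1]   # landslide occurrence probability

# Five susceptibility zones from K-means clusters on the probability values
km = KMeans(n_clusters=5, n_init=10, random_state=5).fit(prob.reshape(-1, 1))
centers = np.sort(km.cluster_centers_.ravel())
cutoffs = (centers[:-1] + centers[1:]) / 2   # midpoints between cluster centres
zones = np.digitize(prob, cutoffs)           # 0 = very low ... 4 = very high
print("cut-off values:", np.round(cutoffs, 3))
print("pixels per zone:", np.bincount(zones, minlength=5))
```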

Journal ArticleDOI
TL;DR: This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
Abstract: This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on the diaphragm wall deflections in a braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). The results of the ELMs were then compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.

Journal ArticleDOI
TL;DR: Multiple hybrid machine-learning models were developed to address parameter optimization limitations and enhance the spatial prediction of landslide susceptibility models to confirm the ability of metaheuristic algorithms to improve model performance.
Abstract: In this study, we developed multiple hybrid machine-learning models to address parameter optimization limitations and enhance the spatial prediction of landslide susceptibility models. We created a geographic information system database, and our analysis results were used to prepare a landslide inventory map containing 359 landslide events identified from Google Earth, aerial photographs, and other validated sources. A support vector regression (SVR) machine-learning model was used to divide the landslide inventory into training (70%) and testing (30%) datasets. The landslide susceptibility map was produced using 14 causative factors. We applied the established gray wolf optimization (GWO) algorithm, bat algorithm (BA), and cuckoo optimization algorithm (COA) to fine-tune the parameters of the SVR model to improve its predictive accuracy. The resultant hybrid models, SVR-GWO, SVR-BA, and SVR-COA, were validated in terms of the area under curve (AUC) and root mean square error (RMSE). The AUC values for the SVR-GWO (0.733), SVR-BA (0.724), and SVR-COA (0.738) models indicate their good prediction rates for landslide susceptibility modeling. SVR-COA had the greatest accuracy, with an RMSE of 0.21687, and SVR-BA had the least accuracy, with an RMSE of 0.23046. The three optimized hybrid models outperformed the SVR model (AUC = 0.704, RMSE = 0.26689), confirming the ability of metaheuristic algorithms to improve model performance.

Journal ArticleDOI
TL;DR: It is concluded that the DBPGA model is an excellent alternative tool for predicting flash flood susceptibility for other regions prone to flash floods.
Abstract: Flash floods are responsible for loss of life and considerable property damage in many countries. Flood susceptibility maps contribute to flood risk reduction in areas that are prone to this hazard if appropriately used by land-use planners and emergency managers. The main objective of this study is to prepare an accurate flood susceptibility map for the Haraz watershed in Iran using a novel modeling approach (DBPGA) based on a Deep Belief Network (DBN) with Back Propagation (BP) algorithm optimized by the Genetic Algorithm (GA). For this task, a database comprising ten conditioning factors and 194 flood locations was created using the One-R Attribute Evaluation (ORAE) technique. Various well-known machine learning and optimization algorithms were used as benchmarks to compare the prediction accuracy of the proposed model. Statistical metrics including sensitivity, specificity, accuracy, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC) were used to assess the validity of the proposed model. The result shows that the proposed model has the highest goodness-of-fit (AUC = 0.989) and prediction accuracy (AUC = 0.985), and based on the validation dataset it outperforms benchmark models including LR (0.885), LMT (0.934), BLR (0.936), ADT (0.976), NBT (0.974), REPTree (0.811), ANFIS-BAT (0.944), ANFIS-CA (0.921), ANFIS-IWO (0.939), ANFIS-ICA (0.947), and ANFIS-FA (0.917). We conclude that the DBPGA model is an excellent alternative tool for predicting flash flood susceptibility for other regions prone to flash floods.

Journal ArticleDOI
TL;DR: In data-scarce environments, this research showed that utilizing GANs to generate supplementary samples is promising because it can improve the predictive capability of common landslide prediction models.
Abstract: In recent years, landslide susceptibility mapping has substantially improved with advances in machine learning. However, challenges remain in landslide mapping due to the limited availability of inventory data. In this paper, a novel method that improves the performance of machine learning techniques is presented. The proposed method creates synthetic inventory data using Generative Adversarial Networks (GANs) for improving the prediction of landslides. In this research, landslide inventory data of 156 landslide locations were identified in Cameron Highlands, Malaysia, taken from previous projects the authors worked on. Elevation, slope, aspect, plan curvature, profile curvature, total curvature, lithology, land use and land cover (LULC), distance to the road, distance to the river, stream power index (SPI), sediment transport index (STI), terrain roughness index (TRI), topographic wetness index (TWI) and vegetation density are the geo-environmental factors considered in this study, based on suggestions from previous works on Cameron Highlands. To show the capability of GANs in improving landslide prediction models, this study tests the proposed GAN model against benchmark models, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Trees (DT), Random Forest (RF) and Bagging ensemble models with ANN and SVM models. These models were validated using the area under the receiver operating characteristic curve (AUROC). The DT, RF, SVM, ANN and Bagging ensemble achieved AUROC values of 0.90, 0.94, 0.86, 0.69 and 0.82 for the training data, and 0.76, 0.81, 0.85, 0.72 and 0.75 for the test data, respectively. When using additional samples, the same models achieved AUROC values of 0.92, 0.94, 0.88, 0.75 and 0.84 for the training data and 0.78, 0.82, 0.82, 0.78 and 0.80 for the test data, respectively. Using the additional samples improved the test accuracy of all the models except SVM. As a result, in data-scarce environments, this research showed that utilizing GANs to generate supplementary samples is promising because it can improve the predictive capability of common landslide prediction models.
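For readers unfamiliar with GAN-based augmentation of tabular inventories, the following is a deliberately small sketch of the adversarial training loop: the generator/discriminator sizes, training length, and the 156-row synthetic "inventory" are all illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch: minimal tabular GAN producing supplementary inventory samples.
import tensorflow as tf
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(156, 15)).astype("float32")  # 156 rows, 15 factors

latent = 8
G = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(latent,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(15),                 # one output per factor column
])
D = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(15,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                  # real/fake logit
])
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-3)
d_opt = tf.keras.optimizers.Adam(1e-3)

for step in range(1000):
    z = tf.random.normal((64, latent))
    with tf.GradientTape() as dt:
        fake = G(z)
        # Discriminator: label real rows 1, generated rows 0
        d_loss = bce(tf.ones((156, 1)), D(real)) + bce(tf.zeros((64, 1)), D(fake))
    d_opt.apply_gradients(zip(dt.gradient(d_loss, D.trainable_variables),
                              D.trainable_variables))
    with tf.GradientTape() as gt:
        g_loss = bce(tf.ones((64, 1)), D(G(z)))  # generator tries to fool D
    g_opt.apply_gradients(zip(gt.gradient(g_loss, G.trainable_variables),
                              G.trainable_variables))

synthetic = G(tf.random.normal((100, latent))).numpy()  # supplementary samples
print(synthetic.shape)
```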

Journal ArticleDOI
TL;DR: A machine learning model is developed to predict the TBM performance in a real-time manner, and it is found that missing a key parameter can significantly reduce the accuracy of the model, while supplementing a parameter that is highly correlated with the missing one can improve the prediction.
Abstract: Predicting the performance of a tunnel boring machine is vitally important to avoid any possible accidents during tunnel boring. The prediction is not straightforward due to the uncertain geological conditions and the complex rock-machine interactions. Based on the big data obtained from the 72.1 km long tunnel in the Yin-Song Diversion Project in China, this study developed a machine learning model to predict the TBM performance in a real-time manner. The total thrust and the cutterhead torque during a stable period in a boring cycle were predicted in advance by using the machine-returned parameters in the rising period. A long short-term memory model was developed and its accuracy was evaluated. The results show that the variation in the total thrust and cutterhead torque with various geological conditions can be well reflected by the proposed model. This real-time prediction shows superior performance to the classical theoretical model, in which only a single value can be obtained based on the single measurement of the rock properties. To improve the accuracy of the model, a filtering process was proposed. Results indicate that filtering out the unnecessary parameters can enhance both the accuracy and the computational efficiency. Finally, data deficiency was discussed by assuming that a parameter was missing. It is found that missing a key parameter can significantly reduce the accuracy of the model, while supplementing a parameter that is highly correlated with the missing one can improve the prediction.

Journal ArticleDOI
TL;DR: The results show that the ‘ensemble’ GBRT machine learning model yielded the most promising results for the spatial prediction of shallow landslides, with a 95% probability of landslide detection and 87% prediction efficiency.
Abstract: This paper introduces three machine learning (ML) algorithms, the ‘ensemble’ Random Forest (RF), the ‘ensemble’ Gradient Boosted Regression Tree (GBRT) and the MultiLayer Perceptron neural network (MLP) and applies them to the spatial modelling of shallow landslides near Kvam in Norway. In the development of the ML models, a total of 11 significant landslide controlling factors were selected. The controlling factors relate to the geomorphology, geology, geo-environment and anthropogenic effects: slope angle, aspect, plan curvature, profile curvature, flow accumulation, flow direction, distance to rivers, water content, saturation, rainfall and distance to roads. It is observed that slope angle was the most significant controlling factor in the ML analyses. The performance of the three ML models was evaluated quantitatively based on the Receiver Operating Characteristic (ROC) analysis. The results show that the ‘ensemble’ GBRT machine learning model yielded the most promising results for the spatial prediction of shallow landslides, with a 95% probability of landslide detection and 87% prediction efficiency.

Journal ArticleDOI
TL;DR: The maximum likelihood age (MLA) method as discussed by the authors was developed for fission track thermochronology by Galbraith and Laslett (1993) to estimate the maximum depositional age (MDA) of siliciclastic rocks.
Abstract: In a recent review published in this journal, Coutts et al. (2019) compared nine different ways to estimate the maximum depositional age (MDA) of siliciclastic rocks by means of detrital geochronology. Their results show that among these methods three are positively and six negatively biased. This paper investigates the cause of these biases and proposes a solution to it. A simple toy example shows that it is theoretically impossible for the reviewed methods to find the correct depositional age in even a best case scenario: the MDA estimates drift to ever smaller values with increasing sample size. The issue can be solved using a maximum likelihood model that was originally developed for fission track thermochronology by Galbraith and Laslett (1993). This approach parameterises the MDA estimation problem with a binary mixture of discrete and continuous distributions. The 'Maximum Likelihood Age' (MLA) algorithm converges to a unique MDA value, unlike the ad hoc methods reviewed by Coutts et al. (2019). It successfully recovers the depositional age for the toy example, and produces sensible results for realistic distributions. This is illustrated with an application to a published dataset of 13 sandstone samples that were analysed by both LA-ICPMS and CA-TIMS U–Pb geochronology. The ad hoc algorithms produce unrealistic MDA estimates that are systematically younger for the LA-ICPMS data than for the CA-TIMS data. The MLA algorithm does not suffer from this negative bias. The MLA method is a purely statistical approach to MDA estimation. Like the ad hoc methods, it does not readily accommodate geological complications such as post-depositional Pb-loss, or analytical issues causing erroneously young outliers. The best approach in such complex cases is to re-analyse the youngest grains using more accurate dating techniques. The results of the MLA method are best visualised on radial plots. Both the model and the plots have applications outside detrital geochronology, for example to determine volcanic eruption ages.
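The binary-mixture idea can be written schematically (in our own notation, not the paper's exact parameterisation): for n measured ages $z_i$ with analytical uncertainties $s_i$, the MLA model maximizes a likelihood of the form

$$\mathcal{L}(\gamma,\pi,\mu,\sigma)\;=\;\prod_{i=1}^{n}\Big[\,\pi\,f_{\mathrm{d}}(z_i\mid\gamma,s_i)\;+\;(1-\pi)\,f_{\mathrm{c}}(z_i\mid\mu,\sigma,s_i)\Big],$$

where $f_{\mathrm{d}}$ is the discrete component centred on the depositional age $\gamma$, $f_{\mathrm{c}}$ is a continuous distribution of older ages truncated below at $\gamma$, and $\pi$ is the mixing proportion. Maximizing over $(\gamma,\pi,\mu,\sigma)$ yields a unique MDA estimate $\hat{\gamma}$, in contrast to the sample-size-dependent drift of the ad hoc estimators.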

Journal ArticleDOI
TL;DR: The results indicate that ML models outperform empirical prediction formulations with lower prediction error and the predicted correlations between input and output variables using five ML models show great agreement with the physical explanation.
Abstract: The compression index Cc is an essential parameter in geotechnical design, for which the effectiveness of correlations is still a challenge. This paper suggests a novel modelling approach using machine learning (ML) techniques. The performance of five commonly used ML algorithms, i.e. back-propagation neural network (BPNN), extreme learning machine (ELM), support vector machine (SVM), random forest (RF) and evolutionary polynomial regression (EPR), in predicting Cc is comprehensively investigated. A database with a total of 311 datasets including three input variables, i.e. initial void ratio e0, liquid limit water content wL and plasticity index Ip, and one output variable Cc is first established. A genetic algorithm (GA) is used to optimize the hyper-parameters in the five ML algorithms, and the average prediction error for the 10-fold cross-validation (CV) sets is set as the fitness function in the GA for enhancing the robustness of the ML models. The results indicate that the ML models outperform empirical prediction formulations with lower prediction error. RF yields the lowest error, followed by BPNN, ELM, EPR and SVM. If the ranges of the input variables in the database are large enough, the BPNN and RF models are recommended to predict Cc. Furthermore, if the distribution of the input variables is continuous, the RF model is the best one. Otherwise, the EPR model is recommended if the ranges of the input variables are small. The predicted correlations between input and output variables using the five ML models show great agreement with the physical explanation.

Journal ArticleDOI
TL;DR: In this article, the authors classified the PM particles into coarse (2.5-10 μm), fine (0.1-2.1 μm) and ultrafine (1.5μm) classes according to their source of emission, geography, and local meteorology.
Abstract: Air pollution by particulate matter (PM) is one of the main threats to human health, particularly in large cities where pollution levels are continually exceeded. According to their source of emission, geography, and local meteorology, the pollutant particles vary in size and composition. These particles are conditioned by the aerodynamic diameter and thus classified as coarse (2.5–10 μm), fine (0.1–2.5 μm), and ultrafine (<0.1 μm).

Journal ArticleDOI
TL;DR: In this paper, a deep learning algorithm viz. convolutional neural network (CNN) and three popular machine learning techniques, i.e., random forest model (RF), artificial neural network model (ANN), and bagging model, were employed to prepare landslide susceptibility maps (LSMs).
Abstract: Landslide is considered one of the most severe threats to human life and property in the hilly areas of the world. The number of landslides and the level of damage across the globe have been increasing over time. Therefore, landslide management is essential to maintain the natural and socio-economic dynamics of the hilly region. The Rorachu river basin, one of the most landslide-prone areas of Sikkim, was selected for the present study. The prime goal of the study is to prepare landslide susceptibility maps (LSMs) using computer-based advanced machine learning techniques and compare the performance of the models. To properly understand the existing spatial relation with the landslide, twenty factors, including triggering and causative factors, were selected. A deep learning algorithm, viz. the convolutional neural network (CNN) model, and three popular machine learning techniques, i.e., the random forest (RF), artificial neural network (ANN), and bagging models, were employed to prepare the LSMs. Two separate datasets, for training and validation, were designed by randomly taking landslide and non-landslide points. A ratio of 70:30 was considered for the selection of both training and validation points. Multicollinearity was assessed by tolerance and variance inflation factor, and the role of individual conditioning factors was estimated using the information gain ratio. The results reveal that there is no severe multicollinearity among the landslide conditioning factors, and the triggering factor rainfall appeared as the leading cause of the landslides. Based on the final prediction values of each model, the LSMs were constructed and successfully partitioned into five distinct classes, namely very low, low, moderate, high, and very high susceptibility. The susceptibility class-wise distribution of landslides shows that more than 90% of the landslide area falls under the higher landslide susceptibility grades. The precision of the models was examined using the area under the curve (AUC) of the receiver operating characteristics (ROC) curve and statistical measures like root mean square error (RMSE) and mean absolute error (MAE). In both datasets (training and validation), the CNN model achieved the maximum AUC values of 0.903 and 0.939, respectively. The lowest values of RMSE and MAE also reveal the better performance of the CNN model. So, it can be concluded that all the models performed well, but the CNN model outperformed the other models in terms of precision.

Journal ArticleDOI
TL;DR: In this paper, the authors used Bayesian regularization back propagation (BRBP) neural network, classification and regression trees (CART), a statistical model (STM) using the evidence belief function (EBF), and their ensemble models (EMs) for three time periods (2000, 2014, and 2017).
Abstract: Bangladesh experiences frequent hydro-climatic disasters such as flooding. These disasters are believed to be associated with land use changes and climate variability. However, identifying the factors that lead to flooding is challenging. This study mapped flood susceptibility in the northeast region of Bangladesh using Bayesian regularization back propagation (BRBP) neural network, classification and regression trees (CART), a statistical model (STM) using the evidence belief function (EBF), and their ensemble models (EMs) for three time periods (2000, 2014, and 2017). The accuracy of machine learning algorithms (MLAs), STM, and EMs were assessed by considering the area under the curve—receiver operating characteristic (AUC-ROC). Evaluation of the accuracy levels of the aforementioned algorithms revealed that EM4 (BRBP-CART-EBF) outperformed (AUC > 90%) standalone and other ensemble models for the three time periods analyzed. Furthermore, this study investigated the relationships among land cover change (LCC), population growth (PG), road density (RD), and relative change of flooding (RCF) areas for the period between 2000 and 2017. The results showed that areas with very high susceptibility to flooding increased by 19.72% between 2000 and 2017, while the PG rate increased by 51.68% over the same period. The Pearson correlation coefficient for RCF and RD was calculated to be 0.496. These findings highlight the significant association between floods and causative factors. The study findings could be valuable to policymakers and resource managers as they can lead to improvements in flood management and reduction in flood damage and risks.

Journal ArticleDOI
TL;DR: It was found that the MPMR model outperformed ANFIS-PSO and ANN-PSO, and that MPMR can be used as a reliable soft computing technique for the non-linear problem of settlement of shallow foundations on soils.
Abstract: This research focuses on the application of three soft computing techniques, including Minimax Probability Machine Regression (MPMR), Particle Swarm Optimization based Artificial Neural Network (ANN-PSO) and Particle Swarm Optimization based Adaptive Network Fuzzy Inference System (ANFIS-PSO), to study shallow foundation reliability based on settlement criteria. Soil is a heterogeneous medium, and the involvement of its attributes in the geotechnical behaviour of the soil-foundation system makes the prediction of settlement of shallow foundations a complex engineering problem. This study explores the feasibility of soft computing techniques against the deterministic approach. The settlement of a shallow foundation depends on the parameters γ (unit weight), e0 (void ratio) and Cc (compression index). These soil parameters are taken as input variables, while the settlement of the shallow foundation is the output. To assess the performance of the models, different performance indices, i.e. RMSE, VAF, R2, Bias Factor, MAPE, LMI, U95, RSR, NS, RPD, etc., were used. From the analysis of the results, it was found that the MPMR model outperformed ANFIS-PSO and ANN-PSO. Therefore, MPMR can be used as a reliable soft computing technique for the non-linear problem of settlement of shallow foundations on soils.

Journal ArticleDOI
Jiayao Chen1, Tongjun Yang, Dongming Zhang1, Hongwei Huang1, Yu Tian 
TL;DR: A framework for classifying multiple rock structures based on the geological images of tunnel face using convolutional neural networks (CNN), namely Inception-ResNet-V2 (IRV2) is presented, which exhibits the best performance in terms of various indicators, such as precision, recall, F-score, and testing time per image.
Abstract: The automated interpretation of rock structure can improve the efficiency, accuracy, and consistency of the geological risk assessment of the tunnel face. Because of the high uncertainties in geological images as a result of different regional rock types, as well as in-situ conditions (e.g., temperature, humidity, and construction procedure), previous automated methods have limited performance in classifying the rock structure of the tunnel face during construction. This paper presents a framework for classifying multiple rock structures based on geological images of the tunnel face using convolutional neural networks (CNN), namely Inception-ResNet-V2 (IRV2). A prototype recognition system is implemented to classify 5 types of rock structures: mosaic, granular, layered, block, and fragmentation structures. The proposed IRV2 network is trained on over 35,000 of the 42,400 images extracted from over 150 sections of tunnel faces and tested on the remaining 7400 images. Furthermore, different hyperparameters of the CNN model are investigated to identify the most efficient parameter settings. Among all the discussed models, i.e., ResNet-50, ResNet-101, and Inception-v4, Inception-ResNet-V2 exhibits the best performance in terms of various indicators, such as precision, recall, F-score, and testing time per image. Meanwhile, a model trained on a large database can capture object features more comprehensively, leading to higher accuracy. Compared with the original image classification method, the sub-image method is closer to reality in terms of both accuracy and error divergence. The experimental results reveal that the proposed method is optimal and efficient for automated classification of rock structure using geological images of the tunnel face.
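Keras ships Inception-ResNet-V2 as a built-in application, so the transfer-learning backbone of such a classifier can be sketched compactly. The five-class head, the frozen-backbone first stage, and the directory layout below are illustrative assumptions, not the paper's exact training setup.

```python
# Hedged sketch: Inception-ResNet-V2 backbone with a five-class rock-structure head.
import tensorflow as tf

NUM_CLASSES = 5  # mosaic, granular, layered, block, fragmentation

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained backbone for a first training stage

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(299, 299, 3)),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0),  # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Hypothetical directory layout: one sub-folder of tunnel-face sub-images per class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "tunnel_face_images/train", image_size=(299, 299), batch_size=32)
model.fit(train_ds, epochs=5)
```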

Journal ArticleDOI
TL;DR: The authors investigated the influence of incomplete landslide data on national scale statistical landslide susceptibility modeling for China and concluded that ignoring landslide inventory-based incompleteness can entail misleading modelling results and that the application of non-linear mixed-effect models can reduce the propagation of such biases into the final results for very large areas.
Abstract: China is one of the countries where landslides caused the most fatalities in the last decades. The threat that landslide disasters pose to people might even be greater in the future, due to climate change and the increasing urbanization of mountainous areas. A reliable national-scale rainfall-induced landslide susceptibility model is therefore of great relevance in order to identify regions more and less prone to landsliding as well as to develop suitable risk mitigating strategies. However, relying on imperfect landslide data is inevitable when modelling landslide susceptibility for such a large research area. The purpose of this study is to investigate the influence of incomplete landslide data on national-scale statistical landslide susceptibility modeling for China. In this context, it is aimed to explore the benefit of mixed effects modelling to counterbalance associated bias propagations. Six influencing factors including lithology, slope, soil moisture index, mean annual precipitation, land use and geological environment regions were selected based on an initial exploratory data analysis. Three sets of influencing variables were designed to represent different solutions to deal with spatially incomplete landslide information: Set 1 (disregards the presence of incomplete landslide information), Set 2 (excludes factors related to the incompleteness of landslide data), and Set 3 (accounts for factors related to the incompleteness via random effects). The variable sets were then introduced in a generalized additive model (GAM: Set 1 and Set 2) and a generalized additive mixed effect model (GAMM: Set 3) to establish three national-scale statistical landslide susceptibility models: models 1, 2 and 3. The models were evaluated using the area under the receiver operating characteristics curve (AUROC) given by spatially explicit and non-spatial cross-validation. The spatial prediction patterns produced by the models were also investigated. The results show that the landslide inventory incompleteness had a substantial impact on the outcomes of the statistical landslide susceptibility models. The cross-validation results provided evidence that the three established models performed well in predicting model-independent landslide information, with median AUROCs ranging from 0.8 to 0.9. However, although Model 1 reached the highest AUROCs within non-spatial cross-validation (median of 0.9), it was not associated with the most plausible representation of landslide susceptibility. The Model 1 modelling results were inconsistent with geomorphological process knowledge and reflected to a large extent the underlying data bias. The Model 2 susceptibility maps provided a less biased picture of landslide susceptibility. However, a lower predicted likelihood of landslide occurrence still existed in areas known to be underrepresented in terms of landslide data (e.g., the Kuenlun Mountains in the northern Tibetan Plateau). The non-linear mixed-effects model (Model 3) reduced the impact of these biases best by introducing bias-describing variables as random effects. Among the three models, Model 3 was selected as the best national-scale susceptibility model for China as it produced the most plausible portrayal of rainfall-induced landslide susceptibility and the highest spatially explicit predictive performance (median spatial cross-validation AUROC of 0.84) compared to the other two models (median AUROCs of 0.81 and 0.79, respectively).
We conclude that ignoring landslide inventory-based incompleteness can entail misleading modelling results and that the application of non-linear mixed-effect models can reduce the propagation of such biases into the final results for very large areas.

Journal ArticleDOI
TL;DR: The spatially explicit deep learning neural network models are successful in capturing the heterogeneity of spatial patterns of flood probability in the Golestan Province, and the resulting probability maps can be used for the development of mitigation plans in response to the future floods.
Abstract: Flood probability maps are essential for a range of applications, including land use planning and developing mitigation strategies and early warning systems. This study describes the potential application of two architectures of deep learning neural networks, namely convolutional neural networks (CNN) and recurrent neural networks (RNN), for spatially explicit prediction and mapping of flash flood probability. To develop and validate the predictive models, a geospatial database that contained records for the historical flood events and geo-environmental characteristics of the Golestan Province in northern Iran was constructed. The step-wise weight assessment ratio analysis (SWARA) was employed to investigate the spatial interplay between floods and different influencing factors. The CNN and RNN models were trained using the SWARA weights and validated using the receiver operating characteristics technique. The results showed that the CNN model (AUC = 0.832, RMSE = 0.144) performed slightly better than the RNN model (AUC = 0.814, RMSE = 0.181) in predicting future floods. Further, these models demonstrated an improved prediction of floods compared to previous studies that used different models in the same study area. This study showed that the spatially explicit deep learning neural network models are successful in capturing the heterogeneity of spatial patterns of flood probability in the Golestan Province, and the resulting probability maps can be used for the development of mitigation plans in response to future floods. The general policy implication of our study suggests that the design, implementation, and verification of flood early warning systems should be directed to approximately 40% of the land area, characterized by high and very high susceptibility to flooding.

Journal ArticleDOI
TL;DR: An intelligent framework for predicting the advancing speed during earth pressure balance (EPB) shield tunnelling using five artificial intelligence models based on machine and deep learning techniques reveals that the main thrust, penetration, foam volume, and grouting volume have strong correlations with advancing speed.
Abstract: This paper introduces an intelligent framework for predicting the advancing speed during earth pressure balance (EPB) shield tunnelling. Five artificial intelligence (AI) models based on machine and deep learning techniques—back-propagation neural network (BPNN), extreme learning machine (ELM), support vector machine (SVM), long-short term memory (LSTM), and gated recurrent unit (GRU)—are used. Five geological and nine operational parameters that influence the advancing speed are considered. A field case of shield tunnelling in Shenzhen City, China is analyzed using the developed models. A total of 1000 field datasets are adopted to establish intelligent models. The prediction performance of the five models is ranked as GRU > LSTM > SVM > ELM > BPNN. Moreover, the Pearson correlation coefficient (PCC) is adopted for sensitivity analysis. The results reveal that the main thrust (MT), penetration (P), foam volume (FV), and grouting volume (GV) have strong correlations with advancing speed (AS). An empirical formula is constructed based on the high-correlation influential factors and their corresponding field datasets. Finally, the prediction performances of the intelligent models and the empirical method are compared. The results reveal that all the intelligent models perform better than the empirical method.
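The Pearson-correlation sensitivity screening mentioned in this abstract reduces to a single pandas call. In the hedged sketch below, the four high-correlation parameters carry synthetic values (with an artificial linear link to advancing speed) purely to show the ranking mechanics, not the Shenzhen field data.

```python
# Hedged sketch: PCC-based sensitivity ranking of operational parameters vs. AS.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000  # same count as the field datasets in the abstract, values synthetic
df = pd.DataFrame({
    "main_thrust": rng.normal(12000, 1500, n),
    "penetration": rng.normal(10, 2, n),
    "foam_volume": rng.normal(300, 40, n),
    "grouting_volume": rng.normal(5, 1, n),
})
# Advancing speed as an illustrative noisy combination of the factors
df["advancing_speed"] = (0.002 * df.main_thrust + 1.5 * df.penetration
                         + 0.01 * df.foam_volume + rng.normal(0, 3, n))

pcc = df.corr(method="pearson")["advancing_speed"].drop("advancing_speed")
print(pcc.sort_values(ascending=False))  # rank factors by correlation with AS
```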

Journal ArticleDOI
TL;DR: In this paper, a simple equation based on an analytical solution is proposed to calculate groundwater heads inside and outside of an excavation pit with a waterproof curtain (hereafter referred to as a closed barrier) in a confined aquifer.
Abstract: When pumping is conducted in a confined aquifer inside an excavation pit (waterproof curtain), the direction of the groundwater seepage outside the excavation changes from horizontal to vertical owing to the existence of the curtain barrier. There is no analytical calculation method for the groundwater head distribution induced by dewatering inside an excavation. This paper first analyses the mechanism of the blocking effects of a closed barrier in a confined aquifer. Then, a simple equation based on an analytical solution is proposed to calculate groundwater heads inside and outside of the excavation pit with a waterproof curtain (hereafter referred to as a closed barrier) in a confined aquifer. The distribution of groundwater head is derived for two conditions: (i) pumping with a constant water head, and (ii) pumping with a constant flow rate. The proposed calculation equation is verified by both numerical simulation and experimental results. The comparisons demonstrate that the proposed model can be applied in the engineering practice of excavation.

Journal ArticleDOI
TL;DR: In this article, a base numerical simulation model of a typical depleted high-temperature gas reservoir was established to simulate the geothermal energy exploitation processes via recycling CO2 and water, with a view to investigating whether and/or under which conditions CO2 is more suitable than water for geothermal energy exploitation.
Abstract: CO2 can be used as an alternative injectant to exploit geothermal energy from depleted high-temperature gas reservoirs due to its high mobility and unique thermal properties. However, there has been a lack of systematic analysis of the heat mining mechanism and performance of CO2, as well as of the problems that may occur during geothermal energy exploitation at specific gas reservoir conditions. In this paper, a base numerical simulation model of a typical depleted high-temperature gas reservoir was established to simulate the geothermal energy exploitation processes via recycling CO2 and water, with a view to investigating whether and/or under which conditions CO2 is more suitable than water for geothermal energy exploitation. The problems that may occur during the CO2-based geothermal energy exploitation were also analyzed, along with proposed feasible solutions. The results indicate that, for a depleted low-permeability gas reservoir with dimensions of 1000 m × 500 m × 50 m and a temperature of 150 °C using a single injection-production well group for 40 years of operation, the heat mining rate of CO2 can be up to 3.8 MW at a circulation flow rate of 18 kg s−1 due to its high mobility along the flow path in the gas reservoir, while the heat mining rate of water is only about 2 MW due to limitations on injectivity and mobility. The reservoir physical properties and the injection-production scheme have some effects on the heat mining rate, but CO2 always performs better than water at most reservoir and operation conditions, even under high water saturation. The main problems for CO2 circulation are wellbore corrosion and salt precipitation, which can occur when the reservoir has high water saturation and high salinity; serious salt precipitation can reduce formation permeability and result in a decline of the CO2 heat mining rate (e.g. up to 24% reduction). It is proposed to apply a low-salinity water slug before CO2 injection to reduce the damage caused by salt precipitation. For high-permeability gas reservoirs with high water saturation and high salinity, the superiority of CO2 as a heat transmission fluid becomes obscure and water injection is recommended.