(Open Access) Predictive Modeling Approach for Surface Water Quality: Development and Comparison of Machine Learning Models (2021) | Muhammad Izhar Shah

Citations

Journal Article•DOI•

Data-driven soft computing modeling of groundwater quality parameters in southeast Nigeria: comparing the performances of different algorithms

Johnbosco C. Egbueri, J. C. Agbasi

25 Jan 2022-Environmental Science and Pollution Research

36 citations

Journal Article•DOI•

Comparative Assessment of Individual and Ensemble Machine Learning Models for Efficient Analysis of River Water Quality

Abdulaziz Alqahtani, Muhammad Izhar Shah, Ali Aldrees, Muhammad Faisal Javed

21 Jan 2022-Sustainability

TL;DR: This paper compares individual supervised ML models, such as gene expression programming (GEP) and artificial neural network (ANN), with an ensemble learning model, random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and total dissolved solids (TDS) in the Upper Indus River basin, Pakistan.

Abstract: The prediction accuracies of machine learning (ML) models may not only be dependent on the input parameters and training dataset, but also on whether an ensemble or individual learning model is selected. The present study is based on the comparison of individual supervised ML models, such as gene expression programming (GEP) and artificial neural network (ANN), with that of an ensemble learning model, i.e., random forest (RF), for predicting river water salinity in terms of electrical conductivity (EC) and dissolved solids (TDS) in the Upper Indus River basin, Pakistan. The projected models were trained and tested by using a dataset of seven input parameters chosen on the basis of significant correlation. Optimization of the ensemble RF model was achieved by producing 20 sub-models in order to choose the accurate one. The goodness-of-fit of the models was assessed through well-known statistical indicators, such as the coefficient of determination (R²), mean absolute error (MAE), root mean squared error (RMSE), and Nash–Sutcliffe efficiency (NSE). The results demonstrated a strong association between inputs and modeling outputs, where R² value was found to be 0.96, 0.98, and 0.92 for the GEP, RF, and ANN models, respectively. The comparative performance of the proposed methods showed the relative superiority of the RF compared to GEP and ANN. Among the 20 RF sub-models, the most accurate model yielded the R² equal to 0.941 and 0.938, with 70 and 160 numbers of corresponding estimators. The lowest RMSE values of 1.37 and 3.1 were yielded by the ensemble RF model on training and testing data, respectively. The results of the sensitivity analysis demonstrated that HCO₃⁻ is the most effective variable followed by Cl⁻ and SO₄²⁻ for both the EC and TDS. The assessment of the models on external criteria ensured the generalized results of all the aforementioned techniques. Conclusively, the outcome of the present research indicated that the RF model with selected key parameters could be prioritized for water quality assessment and management.
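
The four goodness-of-fit indicators named in this abstract can be computed from paired observed and predicted values. The Python sketch below is illustrative only and is not the authors' code: the function name `goodness_of_fit` is hypothetical, and R² is assumed here to be the squared Pearson correlation, since the abstract does not say which definition was used.

```python
import math


def goodness_of_fit(observed, predicted):
    """Return R^2, MAE, RMSE, and NSE for paired observed/predicted values.

    Assumption: R^2 is the squared Pearson correlation coefficient.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [o - p for o, p in zip(observed, predicted)]

    # Sums of squares used by several indicators.
    ss_res = sum(e * e for e in errors)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    if ss_obs == 0 or ss_pred == 0:
        raise ValueError("R^2 and NSE are undefined for constant series")

    covariance = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )

    return {
        "r2": covariance ** 2 / (ss_obs * ss_pred),
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance.
        "nse": 1.0 - ss_res / ss_obs,
    }


if __name__ == "__main__":
    # Made-up numbers, not data from the study.
    print(goodness_of_fit([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```

Under this definition, a prediction that is offset from the observations by a constant keeps R² at 1 while NSE drops, which is one reason to report both.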

16 citations

Estimation of Water Quality Parameters with Data-Driven Models (In Press)

Mohammad Taghi Sattari, Ali Rezazadeh Joudi, Andrew Kusiak

01 Jan 2016

TL;DR: The authors note that advances in data mining methods, such as neural networks (NNs), fuzzy inference methods, support vector machines (SVMs), and k-nearest neighbors (k-NN), have made it possible to solve complex problems in high dimensions.

Abstract: (Journal AWWA, April 2016, 108:4; © 2016 American Water Works Association) Assessment of surface water quality is important in the management of water resources (Dogan et al. 2009). Water quality in rivers is paramount to the well-being of nature and humans, and surface water quality is usually related to the type of surrounding industries, agriculture, and human activities. Water is withdrawn from the hydrologic cycle to meet various needs and then is returned (Banejad & Olyaie 2011). Given the essential role of rivers to agricultural, industrial, and urban needs, it is necessary to regularly monitor and evaluate water quality in rivers. As rivers pass through different regions, changes in water quality and the level of hydrochemical parameters are observed in these regions. Because of the gradual decline in water quality over time, regulatory bodies in various countries have made decisions to mitigate the damage. Ecologically acceptable water management calls for accurate modeling, forecasting, and analyzing water quality in rivers (Durdu 2010). Numerous models have been developed for management of water quality, such as QUAL2E, Water Quality Analysis Simulation, and the US Army Corps of Engineers’ Hydrologic Engineering Center-5Q (Chen et al. 2003). Using these models is time-consuming and expensive; therefore, development of cost-effective models is encouraged. Because of the propensity of varied standards for water quality, different parameters are used as quality indicators. The quantity of ammonia, cadmium, chemical oxygen demand, chlorine, copper, dissolved phosphorus, lead, nitrogen dioxide, suspended solids, total nitrogen, total phosphorus, zinc, sodium, sodium adsorption ratio, sulfate ions, bicarbonate ions, electrical conductivity (EC), total dissolved solids (TDS), and pH is frequently measured at water quality monitoring stations.
EC and TDS levels in water are two of the main parameters used to determine quality of drinking and agricultural water because they directly represent the total concentration of salt in water. High EC and TDS values are not desirable in water used for irrigation because salt affects plant growth through osmosis (Phocaides 2000). Advances in data science and data mining methods such as neural networks (NNs), fuzzy inference methods, support vector machines (SVMs), and k-nearest neighbors (k-NN), have made it possible to solve complex problems in high dimensions. The general principle behind these methods lies in exploring hidden relationships in large volumes of data and building models that reflect physical processes governing the system under study. A data-derived model represents a relationship between input variables and output variables. Such a model can be highly accurate because it captures relationships of any kind that are expressed in data, including the underlying physics and chemistry.

14 citations

Journal Article•DOI•

A Study of Assessment and Prediction of Water Quality Index Using Fuzzy Logic and ANN Models

Roman Trach, Yu. A. Trach, Agnieszka Kiersnowska, Anna Markiewicz, Marzena Lendo-Siwicka, Konstantin Rusakov

07 May 2022-Sustainability

TL;DR: This paper proposes a modification of the Ukrainian method for assessing the water quality index (WQI) that takes into account the level of negative impact of the most dangerous chemical elements, models WQI assessment using fuzzy logic, and creates an artificial neural network model for predicting the WQI.

Abstract: Various human activities have been the main causes of surface water pollution. The uneven distribution of industrial enterprises in the territories of the main river basins of Ukraine does not always allow the real state of the water quality to be assessed. This article has three purposes: (1) the modification of the Ukrainian method for assessing the WQI, taking into account the level of negative impact of the most dangerous chemical elements, (2) the modeling of WQI assessment using fuzzy logic and (3) the creation of an artificial neural network model for the prediction of the WQI. The fuzzy logic model used four input variables and calculated one output variable (WQI). In the final stage of the study, six ANN models were analyzed, which differed from each other in various loss function optimizers and activation functions. The optimal results were shown using an ANN with the softmax activation function and Adam’s loss function optimizer (MAPE = 9.6%; R² = 0.964). A comparison of the MAPE and R² indicators of the created ANN model with other models for assessing water quality showed that the level of agreement between the forecast and target data is satisfactory. The novelty of this study is in the proposal to modify the WQI assessment methodology which is used in Ukraine. At the same time, the phased and joint use of mathematical tools such as the fuzzy logic method and the ANN allows one to effectively evaluate and predict WQI values, respectively.
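
The MAPE figure quoted in this abstract is a mean absolute percentage error. As a minimal sketch of how it is usually computed, assuming the common definition (absolute error divided by the observed value, averaged, in percent), the hypothetical helper below is not taken from the paper:

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: errors are scaled by the observed values, so every
    observed value must be non-zero.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Made-up values for illustration; each prediction is off by 10%.
    print(mape([100.0, 200.0], [90.0, 220.0]))
```

Because each error is divided by the observed value, MAPE is scale-free, which makes it convenient for comparing models across studies that use different units.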

11 citations

Journal Article•DOI•

Multigene Expression Programming Based Forecasting the Hardened Properties of Sustainable Bagasse Ash Concrete.

Muhammad Nasir Amin¹, Kaffayatullah Khan¹, Fahid Aslam², Muhammad Izhar Shah³, Muhammad Faisal Javed³, Muhammad Ali Musarat⁴, Kseniia Usanova⁵

King Faisal University¹, Salman bin Abdulaziz University², COMSATS Institute of Information Technology³, Universiti Teknologi Petronas⁴, Saint Petersburg State Polytechnic University⁵

28 Sep 2021-Materials

TL;DR: In this paper, an improved form of supervised machine learning, i.e., multigene expression programming (MEP), has been used to propose models for the compressive strength (fc'), splitting tensile strength (fSTS), and flexural strength of sustainable bagasse ash concrete (BAC).

Abstract: The application of multiphysics models and soft computing techniques is gaining enormous attention in the construction sector due to the development of various types of concrete. In this research, an improved form of supervised machine learning, i.e., multigene expression programming (MEP), has been used to propose models for the compressive strength (fc'), splitting tensile strength (fSTS), and flexural strength (fFS) of sustainable bagasse ash concrete (BAC). The training and testing of the proposed models have been accomplished by developing a reliable and comprehensive database from published literature. Concrete specimens with varying proportions of sugarcane bagasse ash (BA), as a partial replacement of cement, were prepared, and the developed models were validated by utilizing the results obtained from the tested BAC. Different statistical tests evaluated the accurateness of the models, and the results were cross-validated employing a k-fold algorithm. The modeling results achieve correlation coefficient (R) and Nash-Sutcliffe efficiency (NSE) above 0.8 each with relative root mean squared error (RRMSE) and objective function (OF) less than 10 and 0.2, respectively. The MEP model leads in providing reliable mathematical expression for the estimation of fc', fSTS and fFS of BA concrete, which can reduce the experimental workload in assessing the strength properties. The study's findings indicated that MEP-based modeling integrated with experimental testing of BA concrete and further cross-validation is effective in predicting the strength parameters of BA concrete.
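
The k-fold cross-validation mentioned in this abstract partitions the samples into k folds and holds each fold out once for testing. The sketch below shows one simple way to generate such splits. It is an illustration under stated assumptions, not the authors' procedure: the function name `k_fold_indices` is hypothetical, folds are contiguous, and no shuffling is applied.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds each take one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_indices = list(range(start, stop))
        train_indices = list(range(0, start)) + list(range(stop, n_samples))
        yield train_indices, test_indices
        start = stop


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

Each sample lands in exactly one test fold, so averaging a metric over the folds uses every observation once for evaluation. In practice the indices are often shuffled first, for example when the database is ordered by source publication.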

9 citations
