Journal Article

A novel bagging ensemble approach for predicting summertime ground-level ozone concentration.

01 Feb 2019-Journal of The Air & Waste Management Association (Taylor & Francis)-Vol. 69, Iss: 2, pp 220-233

TL;DR: This study examines the feasibility of using an ensemble model with seven meteorological parameters as input variables to predict the surface-level O3 concentration; bagged random forest predicted ground-level ozone better, with a higher Nash-Sutcliffe coefficient, than conventional models.
Abstract: Ozone pollution is a major air quality issue, e.g. for the protection of human health and vegetation. Formation of ground-level ozone is a complex photochemical phenomenon involving numerous intricate factors, most of which are interrelated. Machine learning techniques can be adopted to predict ground-level ozone. The main objective of the present study is to develop a state-of-the-art ensemble bagging approach to model summertime ground-level ozone in an industrial area comprising a hazardous waste management facility. The study examines the feasibility of using an ensemble model with seven meteorological parameters as input variables to predict the surface-level O3 concentration. Multilayer perceptron, RTree, REPTree, and random forest were employed as the base learners. The error measures used to check the performance of each model include IoAd, R2, and PEP. The model results were validated against an independent test data set. Bagged random forest predicted ground-level ozone better, with a higher Nash-Sutcliffe coefficient of 0.93. This study helps to fill the current research gap in big data analysis for air pollutant prediction.
Implications: The main focus of this paper is to model the summertime ground-level O3 concentration in an industrial area comprising a hazardous waste management facility. A comparison was made between the base classifiers and the ensemble classifiers. Most conventional models can predict the average concentrations well. In this case the peak concentrations are of importance, as they have serious effects on human health and the environment. The models developed should also be homoscedastic.
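The bagging idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a one-split regression stump stands in for the study's base learners (multilayer perceptron, RTree, REPTree, random forest), a single input feature replaces the seven meteorological parameters, and the names `fit_stump` and `bagged_fit` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature (hypothetical base learner)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every x in this replica is identical: predict the mean
        constant = mean(ys)
        return lambda x: constant
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Train one base learner per bootstrap replica and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap replica: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean([model(x) for model in models])


# Toy data (invented for illustration): ozone rising with temperature.
temperature = [20 + i for i in range(20)]
ozone = [30 + 2 * i for i in range(20)]
predict = bagged_fit(temperature, ozone)
print(predict(25), predict(35))
```

Averaging over replicas smooths the step-shaped output of any single stump, which is the variance-reduction effect bagging is meant to provide.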
Citations

Journal Article
Abstract: Hydrogen sulfide (H2S) is regarded as a broad-spectrum poison associated with severe health consequences. Among the available treatment options, photocatalytic technology may be effectively applied to the production of hydrogen gas through the splitting of H2S molecules and the addition of 79.9 kJ mol−1 of energy. As a result, advanced photo-reactive media may provide a win-win strategy to treat the parent pollutant (H2S) while producing hydrogen gas. This review encompasses both TiO2 and non-TiO2 catalysts capable of operating under ultraviolet, visible, and solar light irradiation. The performances of photocatalysts are assessed in terms of quantum yield, space-time yield, and other operational variables, including mode of operation, irradiation time, and relative humidity. The concept of space velocity is used to compare photocatalysts in reference to benchmark parameters for the treatment of H2S. This review addresses current limitations and future prospects of the application of photocatalytic technology to efficiently mitigate H2S pollution.

45 citations


Journal Article
01 May 2020-Sustainability
TL;DR: This study critically investigates, analyses, and summarizes the existing soft computing modeling approaches in air quality modeling, and reviews and discusses artificial neural network (ANN), support vector machine (SVM), evolutionary ANN and SVM, the fuzzy logic model, neuro-fuzzy systems, the deep learning model, ensemble, and other hybrid models.
Abstract: Air quality models simulate the atmospheric environment systems and provide increased domain knowledge and reliable forecasting. They provide early warnings to the population and reduce the number of measuring stations. Due to the complexity and non-linear behavior associated with air quality data, soft computing models became popular in air quality modeling (AQM). This study critically investigates, analyses, and summarizes the existing soft computing modeling approaches. Among the many soft computing techniques in AQM, this article reviews and discusses artificial neural network (ANN), support vector machine (SVM), evolutionary ANN and SVM, the fuzzy logic model, neuro-fuzzy systems, the deep learning model, ensemble, and other hybrid models. Besides, it sheds light on employed input variables, data processing approaches, and targeted objective functions during modeling. It was observed that many advanced, reliable, and self-organized soft computing models like functional network, genetic programming, type-2 fuzzy logic, genetic fuzzy, genetic neuro-fuzzy, and case-based reasoning are rarely explored in AQM. Therefore, the partially explored and unexplored soft computing techniques can be appropriate choices for research in the field of air quality modeling. The discussion in this paper will help to determine the suitability and appropriateness of a particular model for a specific modeling context.

11 citations


Journal Article
28 Feb 2021-Atmosphere
TL;DR: This study selected a set of research works in the field of air quality prediction and concentrated on the datasets used in them, finding that meteorological datasets were used in 94.6% of the papers and that combinations of various datasets have been used since 2009.
Abstract: Air pollution and its consequences negatively affect the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools to combat this problem. To predict air quality with maximum accuracy, it is crucial to consider the dataset types as well as the implemented models and the quantity of data. This study selected a set of research works in the field of air quality prediction and concentrates on exploring the datasets utilised in them. The most significant findings are: (1) meteorological datasets were used in 94.6% of the papers, far more than any other dataset type, and were complemented with others such as temporal and spatial data; (2) combinations of various datasets have been used since 2009; and (3) open data have been used since 2012: 32.3% of the studies used open data, and 63.4% of the studies did not provide the data.

3 citations


Journal Article
Adnan Om. Abuassba, Zhang Dezheng, Hazrat Ali, Fan Zhang +1 more
TL;DR: The proposed ensemble framework consists of four stages: objectives, data preparing, model training, and model testing, which is comprehensive to design diverse ensembles and can be used for a wide variety of machine learning tasks.
Abstract: Ensemble is a technique that combines basic models in a strategic manner to achieve better accuracy rates. Diversity, combination methods, and selection topology are among the main factors to determine the ensemble performance. Consequently, it is a challenging task to design an efficient ensemble scheme. Even though numerous paradigms have been proposed to classify ensemble schemes, there is still much room for improvement. This paper proposes a general framework for creating ensembles in the context of classification. Specifically, the ensemble framework consists of four stages: objectives, data preparing, model training, and model testing. It is comprehensive to design diverse ensembles. The proposed ensemble approach can be used for a wide variety of machine learning tasks. We validate our approach on real world datasets. The experimental results show the efficiency of the proposed approach.

1 citation


Posted Content
TL;DR: A machine learning approach applied to the forecast of the day-ahead maximum value of the ozone concentration for several geographical locations in southern Switzerland suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables, which are described in the ozone photochemistry literature.
Abstract: The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers wishing to reduce the impact of pollution on public health through active measures (e.g. temporary traffic closures). In this study, we present a machine learning approach applied to the forecast of the day-ahead maximum value of the ozone concentration for several geographical locations in southern Switzerland. Due to the low density of measurement stations and to the complex orography of the use case terrain, we adopted feature selection methods instead of explicitly restricting relevant features to a neighbourhood of the prediction sites, as common in spatio-temporal forecasting methods. We then used Shapley values to assess the explainability of the learned models in terms of feature importance and feature interactions in relation to ozone predictions; our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables. Finally, we show how weighting observations helps in increasing the accuracy of the forecasts for specific ranges of ozone's daily peak values.

References

Journal Article
J.E. Nash, J.V. Sutcliffe
Abstract: The principles governing the application of the conceptual model technique to river flow forecasting are discussed. The necessity for a systematic approach to the development and testing of the model is explained and some preliminary ideas suggested.

17,307 citations


"A novel bagging ensemble approach f..." refers background in this paper

  • ...R2 is calculated since it is sensitive to the differences in observed and modeled means and variances (Nash and Sutcliffe 1970)....
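The efficiency criterion introduced by Nash and Sutcliffe, referred to as R2 in the excerpt above and commonly written NSE, compares the squared residuals of the model with the variance of the observations about their mean. A minimal sketch, with the function name `nash_sutcliffe` chosen here for illustration:

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - sum((O - P)^2) / sum((O - mean(O))^2)."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Invented example values, not data from the paper.
observed = [42.0, 55.0, 61.0, 48.0, 70.0]
predicted = [40.0, 57.0, 60.0, 50.0, 66.0]
print(nash_sutcliffe(observed, predicted))
```

A perfect model scores 1, and a model that always predicts the observed mean scores 0, so the value is sensitive to errors in both the modeled mean and the modeled variance.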



Book Chapter
Thomas G. Dietterich
21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
Abstract: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
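The "(weighted) vote" mentioned in the abstract can be illustrated with a short sketch. The function name `weighted_vote` and the example labels and weights are assumptions made for illustration; the chapter itself covers several schemes for choosing the weights.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Return the class label with the largest total weight across classifiers."""
    if weights is None:
        weights = [1.0] * len(predictions)  # unweighted majority vote
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three hypothetical classifiers label one data point.
votes = ["high", "low", "low"]
print(weighted_vote(votes))                    # plain majority
print(weighted_vote(votes, [0.7, 0.2, 0.2]))   # first classifier trusted more
```

With equal weights the majority label wins; with unequal weights a single trusted classifier can outvote the others.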

4,767 citations


"A novel bagging ensemble approach f..." refers methods in this paper

  • ...Ensemble methods have the advantage of reducing these key shortcomings of standard learning algorithms (Dietterich 2000)....



Journal Article
Jerome H. Friedman
TL;DR: It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure.
Abstract: Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to current "pseudo"-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point evaluated at the current step. It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
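The procedure in this abstract can be sketched for squared-error loss, where the pseudo-residuals reduce to the ordinary residuals y - F(x). This is a simplified illustration, not Friedman's reference algorithm: the one-split stump base learner, the fixed learning rate, and the names `fit_stump` and `stochastic_gradient_boost` are assumptions.

```python
import random


def fit_stump(xs, rs):
    """Least-squares one-split stump on a single feature (hypothetical base learner)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # only one distinct x value: predict the mean residual
        m = sum(rs) / len(rs)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def stochastic_gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1,
                              subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    f0 = sum(ys) / n                      # initial constant model
    preds = [f0] * n
    learners = []
    k = max(1, min(n, round(subsample * n)))
    for _ in range(n_rounds):
        # Pseudo-residuals for squared loss: negative gradient = y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        # Random subsample drawn WITHOUT replacement at each iteration.
        idx = rng.sample(range(n), k)
        h = fit_stump([xs[i] for i in idx], [residuals[i] for i in idx])
        learners.append(h)
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in learners)


xs = list(range(20))
ys = [2.0 * x for x in xs]
model = stochastic_gradient_boost(xs, ys)
print(model(3), model(15))
```

Note the contrast with bagging: boosting subsamples without replacement and fits each learner to the current residuals, whereas bagging draws with replacement and fits each learner to the original targets.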

4,111 citations


"A novel bagging ensemble approach f..." refers methods in this paper

  • ...Bagging decreases the residual error between the observed and predicted values by creating bootstrapped replica data sets (Friedman 2002)....



Journal Article
Cort J. Willmott
01 Jul 1981-Physical Geography
Abstract: Traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates are criticized. In particular, it is suggested that the correlation coefficien...

3,248 citations


"A novel bagging ensemble approach f..." refers background in this paper

  • ...It is calculated as the ratio between the mean squared error and the potential error (Willmott 1981)....
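Willmott's index of agreement, as it is commonly stated, is one minus the ratio of the summed squared error to the "potential error". A minimal sketch, with the function name `index_of_agreement` chosen for illustration and the example values invented:

```python
def index_of_agreement(observed, predicted):
    """d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


observed = [42.0, 55.0, 61.0, 48.0, 70.0]
predicted = [40.0, 57.0, 60.0, 50.0, 66.0]
print(index_of_agreement(observed, predicted))
```

The index ranges from 0 to 1, with 1 indicating perfect agreement between observed and predicted values.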



Book
24 Nov 1999
Abstract: Overview of the Chemistry of Polluted and Remote Atmospheres. The Atmospheric System. Spectroscopy and Photochemistry: Fundamentals. Photochemistry of Important Atmospheric Species. Kinetics and Atmospheric Chemistry. Rates and Mechanisms of Gas-Phase Reactions in Irradiated Organic-NOx-Air Mixtures. Chemistry of Inorganic Nitrogen Compounds. Acid Deposition: Formation and Fates of Inorganic and Organic Acids in the Troposphere. Particles in the Troposphere. Airborne Polycyclic Aromatic Hydrocarbons and Their Derivatives: Atmospheric Chemistry and Toxicological Implications. Analytical Methods and Typical Atmospheric Concentrations for Gases and Particles. Homogeneous and Heterogeneous Chemistry in the Stratosphere. Scientific Basis for Control of Halogenated Organics. Global Tropospheric Chemistry and Climate Change. Indoor Air Pollution: Sources, Levels, Chemistry, and Fates. Applications of Atmospheric Chemistry: Air Pollution Control Strategies and Risk Assessments for Tropospheric Ozone and Associated Photochemical Oxidants, Acids, Particles, and Hazardous Air Pollutants. Appendix I: Enthalpies of Formation of Some Gaseous Molecules, Atoms, and Free Radicals at 298 K. Appendix II: Bond Dissociation Energies. Appendix III: Running the OZIPR Model. Appendix IV: Some Relevant Web Sites. Appendix V: Pressures and Temperatures for Standard Atmosphere. Appendix VI: Answers to Selected Problems. Subject Index.

2,010 citations


Performance Metrics
No. of citations received by the paper in previous years

Year    Citations
2022    1
2021    4
2020    2
2019    1