Journal Article

A novel bagging ensemble approach for predicting summertime ground-level ozone concentration.

01 Feb 2019-Journal of The Air & Waste Management Association (Taylor & Francis)-Vol. 69, Iss: 2, pp 220-233
TL;DR: This study examines the feasibility of using an ensemble model with seven meteorological parameters as input variables to predict the surface-level O3 concentration, and finds that bagged random forest predicted ground-level ozone better, with a higher Nash-Sutcliffe coefficient, than conventional models.
Abstract: Ozone pollution is a major air quality issue, notably for the protection of human health and vegetation. Formation of ground-level ozone is a complex photochemical phenomenon involving numerous interrelated factors. Machine learning techniques can be adopted to predict ground-level ozone. The main objective of the present study is to develop a state-of-the-art ensemble bagging approach to model summertime ground-level ozone in an industrial area comprising a hazardous waste management facility. The study examines the feasibility of using an ensemble model with seven meteorological parameters as input variables to predict the surface-level O3 concentration. Multilayer perceptron, RTree, REPTree, and random forest were employed as the base learners. The error measures used to check the performance of each model include IoAd, R2, and PEP. The model results were validated against an independent test data set. Bagged random forest predicted ground-level ozone better, with a higher Nash-Sutcliffe coefficient of 0.93. This study helps bridge the current research gap in big data analysis for air pollutant prediction.
Implications: The main focus of this paper is to model the summertime ground-level O3 concentration in an industrial area comprising a hazardous waste management facility. The base classifiers were compared with the ensemble classifiers. Most conventional models can predict average concentrations well. In this case the peak concentrations are of importance, as they have serious effects on human health and the environment. The models developed should also be homoscedastic.
Citations
Journal Article
TL;DR: The proposed ensemble framework consists of four stages: objectives, data preparing, model training, and model testing, which is comprehensive to design diverse ensembles and can be used for a wide variety of machine learning tasks.

6 citations

Journal Article
TL;DR: This study selected a set of research works in the field of air quality prediction and explored the datasets used in them, finding that meteorological datasets were used in 94.6% of the papers and that combinations of different datasets have been used since 2009.
Abstract: Air pollution and its consequences are negatively impacting the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools to combat this problem. To predict air quality with maximum accuracy, it is crucial to consider the dataset types as well as the implemented models and the quantity of data. This study selected a set of research works in the field of air quality prediction and concentrates on exploring the datasets utilised in them. The most significant findings of this research work are: (1) meteorological datasets were used in 94.6% of the papers, far ahead of the other datasets, and were complemented with others such as temporal data, spatial data, and so on; (2) combinations of different datasets have been used since 2009; and (3) open data have been used since 2012: 32.3% of the studies used open data, and 63.4% of the studies did not provide the data.

5 citations

Posted Content
TL;DR: A machine learning approach applied to the forecast of the day-ahead maximum value of the ozone concentration for several geographical locations in southern Switzerland suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables, which are described in the ozone photochemistry literature.
Abstract: The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers wishing to reduce the impact of pollution on public health through active measures (e.g. temporary traffic closures). In this study, we present a machine learning approach applied to the forecast of the day-ahead maximum value of the ozone concentration for several geographical locations in southern Switzerland. Due to the low density of measurement stations and to the complex orography of the use case terrain, we adopted feature selection methods instead of explicitly restricting relevant features to a neighbourhood of the prediction sites, as common in spatio-temporal forecasting methods. We then used Shapley values to assess the explainability of the learned models in terms of feature importance and feature interactions in relation to ozone predictions; our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables. Finally, we show how weighting observations helps in increasing the accuracy of the forecasts for specific ranges of ozone's daily peak values.

4 citations

Journal Article
TL;DR: This paper presents a machine learning approach applied to forecasts of the day-ahead maximum value of ozone concentration for several geographical locations in southern Switzerland. Feature selection methods were adopted instead of explicitly restricting relevant features to a neighborhood of the prediction sites, as is common in spatio-temporal forecasting methods.

3 citations

References
Journal Article
TL;DR: This article discusses the principles governing the application of the conceptual model technique to river flow forecasting, explains the necessity for a systematic approach to the development and testing of the model, and suggests some preliminary ideas.

19,601 citations


"A novel bagging ensemble approach f..." refers background in this paper

  • ...R2 is calculated since it is sensitive to the differences in observed and modeled means and variances (Nash and Sutcliffe 1970)....
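
The quoted passage names the efficiency measure of Nash and Sutcliffe (1970). A minimal sketch of how that coefficient is usually computed is given here, assuming the "R2" in the snippet refers to the Nash-Sutcliffe efficiency, that is, one minus the ratio of the residual sum of squares to the variance of the observations about their mean. The function name `nash_sutcliffe` and the sample values are hypothetical and not taken from the paper.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - sum((o - p)^2) / sum((o - mean(o))^2).

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the model against the observations.
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Spread of the observations about their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - residual / spread


# Illustrative values only (not data from the study).
obs = [1.0, 2.0, 3.0]
print(nash_sutcliffe(obs, obs))              # a perfect model scores 1.0
print(nash_sutcliffe(obs, [2.0, 2.0, 2.0]))  # predicting the mean scores 0.0
```

A value of 1 indicates a perfect match, while 0 means the model does no better than always predicting the observed mean, which is why the measure reacts to differences in both means and variances.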


Book Chapter
21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
Abstract: Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. The original ensemble method is Bayesian averaging, but more recent algorithms include error-correcting output coding, Bagging, and boosting. This paper reviews these methods and explains why ensembles can often perform better than any single classifier. Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
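
The (weighted) vote described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `weighted_vote`, the class labels, and the weights are hypothetical, and each classifier is assumed to have already produced one label for the data point.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Return the label with the largest total weight.

    predictions: one label per ensemble member (hypothetical inputs).
    weights: optional per-member weights; an unweighted vote if omitted.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions) or not predictions:
        raise ValueError("need one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three members vote; the labels and weights are made up for illustration.
print(weighted_vote(["high", "low", "low"]))                   # low
print(weighted_vote(["high", "low", "low"], [0.7, 0.2, 0.2]))  # high
```

With equal weights this reduces to a plain majority vote; unequal weights let a more trusted member outvote several weaker ones.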

5,679 citations


"A novel bagging ensemble approach f..." refers methods in this paper

  • ...Ensemble methods have the advantage of reducing these key shortcomings of standard learning algorithms (Dietterich 2000)....


Journal Article
TL;DR: It is shown that both the approximation accuracy and execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure.

5,355 citations


"A novel bagging ensemble approach f..." refers methods in this paper

  • ...Bagging decreases the residual error between the observed and predicted values by creating bootstrapped replica data sets (Friedman 2002)....
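
The bootstrapped replica data sets mentioned in this passage can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the study bagged multilayer perceptron, RTree, REPTree, and random forest learners, whereas this sketch swaps in a one-split regression stump on a single input as a stand-in base learner. The names `fit_stump` and `bagged_fit`, the parameters `n_models` and `seed`, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Stand-in base learner: a one-split regression stump on a single input."""
    overall = mean(ys)
    best = (float("inf"), None, overall, overall)
    # Try each distinct x (except the largest) as a split threshold.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    if t is None:  # all x values identical: fall back to a constant prediction
        return lambda x: overall
    return lambda x: lm if x <= t else rm


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit one base learner per bootstrap replica and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap replica: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Toy step-shaped data, made up for illustration.
xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [10.0] * 5
predict = bagged_fit(xs, ys)
print(predict(0.0), predict(4.5), predict(9.0))
```

Averaging over replicas smooths the hard step a single stump would produce, which is the variance-reduction effect that bagging relies on.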


Journal Article
TL;DR: This paper criticizes traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates, in particular the use of the correlation coefficient.
Abstract: Traditional methods of evaluating geographic models by statistical comparisons between observed and simulated variates are criticized. In particular, it is suggested that the correlation coefficien...

3,761 citations


"A novel bagging ensemble approach f..." refers background in this paper

  • ...It is calculated as the ratio between the mean squared error and the potential error (Willmott 1981)....
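
A minimal sketch of the measure described in this passage follows, assuming "it" refers to Willmott's index of agreement (the IoAd of the abstract) in its commonly cited form: one minus the ratio of the squared error to the potential error, where the potential error sums (|P - mean(O)| + |O - mean(O)|)^2 over all pairs. The function name `index_of_agreement` and the sample values are hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement, d = 1 - squared error / potential error.

    Hypothetical helper for illustration; not the paper's code.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Largest squared error the pair could show about the observed mean.
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    if potential_error == 0:
        raise ValueError("potential error is zero")
    return 1.0 - squared_error / potential_error


# Illustrative values only (not data from the study).
obs = [1.0, 2.0, 3.0]
print(index_of_agreement(obs, obs))              # perfect agreement: 1.0
print(index_of_agreement(obs, [1.0, 2.0, 4.0]))  # 1 - 1/13
```

Because the same number of terms appears in the numerator and denominator, the ratio of sums equals the ratio of the mean squared error to the mean potential error.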


Book
24 Nov 1999
TL;DR: This book gives a detailed overview of the chemistry of polluted and remote atmospheres, from photochemistry and gas-phase kinetics to particles in the troposphere and air pollution control strategies, and includes an appendix on running the OZIPR model.
Abstract: Overview of the Chemistry of Polluted and Remote Atmospheres. The Atmospheric System. Spectroscopy and Photochemistry: Fundamentals. Photochemistry of Important Atmospheric Species. Kinetics and Atmospheric Chemistry. Rates and Mechanisms of Gas-Phase Reactions in Irradiated Organic-NOx-Air Mixtures. Chemistry of Inorganic Nitrogen Compounds. Acid Deposition: Formation and Fates of Inorganic and Organic Acids in the Troposphere. Particles in the Troposphere. Airborne Polycyclic Aromatic Hydrocarbons and Their Derivatives: Atmospheric Chemistry and Toxicological Implications. Analytical Methods and Typical Atmospheric Concentrations for Gases and Particles. Homogeneous and Heterogeneous Chemistry in the Stratosphere. Scientific Basis for Control of Halogenated Organics. Global Tropospheric Chemistry and Climate Change. Indoor Air Pollution: Sources, Levels, Chemistry, and Fates. Applications of Atmospheric Chemistry: Air Pollution Control Strategies and Risk Assessments for Tropospheric Ozone and Associated Photochemical Oxidants, Acids, Particles, and Hazardous Air Pollutants. Appendix I: Enthalpies of Formation of Some Gaseous Molecules, Atoms, and Free Radicals at 298 K. Appendix II: Bond Dissociation Energies. Appendix III: Running the OZIPR Model. Appendix IV: Some Relevant Web Sites. Appendix V: Pressures and Temperatures for Standard Atmosphere. Appendix VI: Answers to Selected Problems. Subject Index.

2,051 citations