A comparison of machine learning methods for ozone pollution prediction

doi:10.1186/s40537-023-00748-x

Open Access, Journal Article

Fouzi Harrou, +1 more

- 15 May 2023 -

Journal of Big Data

- Vol. 10, Iss: 1, pp 1-31

TLDR

In this paper, the authors evaluate the predictive performance of nineteen machine learning models for ozone pollution prediction and investigate the use of time-lagged measurements to improve prediction accuracy, showing that dynamic models using time-lagged data outperform static and reduced machine learning models.

Abstract:

Precise and efficient ozone ($\mathrm{O}_3$) concentration prediction is crucial for weather monitoring and environmental policymaking due to the harmful effects of high $\mathrm{O}_3$ pollution levels on human health and ecosystems. However, the complexity of $\mathrm{O}_3$ formation mechanisms in the troposphere presents a significant challenge in modeling $\mathrm{O}_3$ accurately and quickly, especially in the absence of a process model. Data-driven machine-learning techniques have demonstrated promising performance in modeling air pollution, mainly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how incorporating features using Random Forest affects $\mathrm{O}_3$ concentration prediction and investigate using time-lagged measurements to improve prediction accuracy. Air pollution and meteorological data collected at King Abdullah University of Science and Technology are used. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. Incorporating time-lagged data improves the accuracy of machine learning models by 300% and 200% compared to static and reduced models, respectively, under the RMSE metric. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its practical use. The Diebold-Mariano test, a statistical test used to compare the forecasting accuracy of models, is also conducted.
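The idea of a "dynamic" model built on time-lagged measurements, and of comparing forecasts by RMSE and a Diebold-Mariano statistic, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the series is synthetic rather than the KAUST data, the two predictors are trivial baselines (overall mean and persistence) rather than any of the nineteen models in the study, and the helper names `make_lagged`, `rmse`, and `diebold_mariano` are hypothetical. The Diebold-Mariano statistic is shown in its simplest one-step form under squared-error loss, without the autocorrelation correction a full implementation would normally include.

```python
import math
import random


def make_lagged(series, n_lags):
    """Return (X, y): each row of X holds the n_lags values preceding y."""
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    y = series[n_lags:]
    return X, y


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def diebold_mariano(y_true, pred_a, pred_b):
    """Simplified DM statistic for one-step forecasts, squared-error loss.

    Negative values favour pred_a. The variance of the loss differential is
    the plain sample variance (no autocorrelation correction): an assumption
    made to keep the sketch short.
    """
    d = [(t - a) ** 2 - (t - b) ** 2 for t, a, b in zip(y_true, pred_a, pred_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


# Synthetic, strongly autocorrelated series standing in for hourly O3 readings.
rng = random.Random(0)
series = [50.0]
for _ in range(499):
    series.append(50.0 + 0.9 * (series[-1] - 50.0) + rng.gauss(0.0, 2.0))

X, y = make_lagged(series, n_lags=3)

# "Static" baseline: ignores history and always predicts the in-sample mean.
mean_level = sum(series) / len(series)
static_pred = [mean_level] * len(y)

# "Dynamic" baseline: predicts the most recent lagged value (persistence).
dynamic_pred = [row[-1] for row in X]

static_rmse = rmse(y, static_pred)
dynamic_rmse = rmse(y, dynamic_pred)
dm_stat = diebold_mariano(y, dynamic_pred, static_pred)

print(f"static RMSE:  {static_rmse:.3f}")
print(f"dynamic RMSE: {dynamic_rmse:.3f}")
print(f"DM statistic (dynamic vs static): {dm_stat:.2f}")
```

In practice the rows of `X` would be passed, alongside meteorological covariates, to a regression model; the persistence baseline is used here only so the sketch runs without third-party libraries.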
