Posted Content

A data-driven approach to the forecasting of ground-level ozone concentration.

TL;DR: A machine learning approach is applied to forecasting the day-ahead maximum ozone concentration at several geographical locations in southern Switzerland; the analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables that are described in the ozone photochemistry literature.
Abstract: The ability to forecast the concentration of air pollutants in an urban region is crucial for decision-makers wishing to reduce the impact of pollution on public health through active measures (e.g. temporary traffic closures). In this study, we present a machine learning approach applied to the forecast of the day-ahead maximum value of the ozone concentration for several geographical locations in southern Switzerland. Due to the low density of measurement stations and to the complex orography of the use case terrain, we adopted feature selection methods instead of explicitly restricting relevant features to a neighbourhood of the prediction sites, as common in spatio-temporal forecasting methods. We then used Shapley values to assess the explainability of the learned models in terms of feature importance and feature interactions in relation to ozone predictions; our analysis suggests that the trained models effectively learned explanatory cross-dependencies among atmospheric variables. Finally, we show how weighting observations helps in increasing the accuracy of the forecasts for specific ranges of ozone's daily peak values.
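The abstract mentions using Shapley values to assess feature importance and interactions of the trained models. Below is a minimal, hypothetical sketch of that kind of analysis, assuming a Python environment with the shap and xgboost libraries and synthetic stand-ins for the atmospheric features and daily ozone maxima; it does not reproduce the paper's data, features, or models.

```python
# Illustrative only: Shapley-value inspection of a day-ahead ozone peak regressor.
# Feature names and the synthetic target are assumptions, not the paper's data.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "temp_max": rng.normal(25, 5, 500),          # hypothetical daily max temperature
    "global_radiation": rng.normal(200, 50, 500),
    "no2": rng.normal(30, 10, 500),
    "wind_speed": rng.normal(2, 1, 500),
})
# Synthetic stand-in for the day-ahead ozone daily maximum.
y = 2.0 * X["temp_max"] + 0.2 * X["global_radiation"] - 0.5 * X["no2"] + rng.normal(0, 5, 500)

model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)       # exact Shapley values for tree ensembles
shap_values = explainer.shap_values(X)      # (n_samples, n_features) attribution matrix
shap.summary_plot(shap_values, X)           # global importance and value-dependence view
```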
Citations
Journal ArticleDOI
TL;DR: In this paper, a multi-source and multivariate ozone prediction model based on fuzzy cognitive maps (FCMs) and evidential reasoning theory, built from the perspective of spatio-temporal fusion and termed ERC-FCM, is proposed.

8 citations

Journal ArticleDOI
01 Nov 2022 - Heliyon
TL;DR: In this article, a prediction method based on a KNN-Prophet-LSTM hybrid model is established using daily pollutant concentration data for Wuhan from January 1, 2014 to May 3, 2021, taking both temporal and spatial characteristics into account.

2 citations

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate the predictive performance of nineteen machine learning models for ozone pollution prediction and investigate the use of time-lagged measurements to improve prediction accuracy, showing that dynamic models using time-lagged data outperform static and reduced machine learning models.
Abstract: Precise and efficient ozone (O₃) concentration prediction is crucial for weather monitoring and environmental policymaking due to the harmful effects of high O₃ pollution levels on human health and ecosystems. However, the complexity of O₃ formation mechanisms in the troposphere presents a significant challenge in modeling O₃ accurately and quickly, especially in the absence of a process model. Data-driven machine-learning techniques have demonstrated promising performance in modeling air pollution, mainly when a process model is unavailable. This study evaluates the predictive performance of nineteen machine learning models for ozone pollution prediction. Specifically, we assess how incorporating features using Random Forest affects O₃ concentration prediction and investigate using time-lagged measurements to improve prediction accuracy. Air pollution and meteorological data collected at King Abdullah University of Science and Technology are used. Results show that dynamic models using time-lagged data outperform static and reduced machine learning models. Incorporating time-lagged data improves the accuracy of machine learning models by 300% and 200% under the RMSE metric compared to static and reduced models, respectively. Importantly, the best dynamic model with time-lagged information requires only 0.01 s, indicating its practical use. The Diebold-Mariano test, a statistical test used to compare the forecasting accuracy of models, is also conducted.
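As an illustration of the dynamic (time-lagged) versus static (same-day) feature setup described above, the following sketch builds lagged predictors with pandas and compares two random-forest regressors on synthetic data; the variable names, lag choices, and parameters are assumptions, not the cited study's pipeline.

```python
# Illustrative comparison of a static model (same-day features only) against a
# dynamic model that also sees time-lagged measurements. Synthetic data only.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "o3": 50 + 10 * np.sin(np.arange(n) / 24) + rng.normal(0, 3, n),
    "temp": 20 + 8 * np.sin(np.arange(n) / 24 - 0.5) + rng.normal(0, 1, n),
})

# Add lagged measurements (1..3 steps back) as extra predictors.
for lag in (1, 2, 3):
    df[f"o3_lag{lag}"] = df["o3"].shift(lag)
    df[f"temp_lag{lag}"] = df["temp"].shift(lag)
df = df.dropna()

X_static = df[["temp"]]
X_dynamic = df.drop(columns="o3")
y = df["o3"]

split = int(0.8 * len(df))
for name, X in [("static", X_static), ("dynamic", X_dynamic)]:
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X.iloc[:split], y.iloc[:split])
    rmse = mean_squared_error(y.iloc[split:], model.predict(X.iloc[split:])) ** 0.5
    print(name, "RMSE:", round(rmse, 2))
```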
References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
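A small illustrative sketch of the sparsity property this abstract describes, using scikit-learn on synthetic data: the L1-penalized lasso sets several coefficients exactly to zero, whereas an L2-penalized ridge fit only shrinks them. Penalty strengths are arbitrary choices for the illustration.

```python
# Illustrative only: lasso (L1) produces exact zeros, ridge (L2) does not.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Only the first three features actually matter in this synthetic target.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(0, 0.5, 200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("lasso coefficients:", np.round(lasso.coef_, 2))   # several exact zeros
print("ridge coefficients:", np.round(ridge.coef_, 2))   # small but non-zero
```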

40,785 citations


"A data-driven approach to the forec..." refers methods in this paper

  • ...Random Forests and Quantile Random Forests The random forest (RF) algorithm independently fits several decision trees, each trained on a different dataset created from the original one through random re-sampling of the observations and by keeping only a fraction of the overall features, chosen at random (Hastie, Tibshirani and Friedman, 2009). The final prediction of the RF is then a (possibly weighted) average of the trees' responses. One important variant of RF algorithms are Quantile Regression Forests (QRF); the main difference from RF is that QRF keeps the value of all the observations in the fitted trees' nodes, not just their mean, and assesses the conditional distribution based on this information. In this paper, we have used the Matlab TreeBagger class, which implements the QRF algorithm described in Meinshausen (2014). Tree-based boosting algorithms Boosting algorithms employ additive training: starting from a constant model, at each iteration a new tree or any other so-called "weak learner" h_k(x) is added to the overall model F_k(x), so that F_{k+1}(x) = F_k(x) + η h_k(x), where η ≤ 1 is a hyper-parameter denoting the learning rate, which helps reduce over-fitting....

    [...]

  • ...Instead of penalizing using the L2 norm, the LASSO (Least Absolute Shrinkage and Selection Operator) regression (Tibshirani, 1996) penalizes using the L1 norm, such that some of the elements of β̂ could be set to zero....

    [...]
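The additive boosting update quoted in the first excerpt above, F_{k+1}(x) = F_k(x) + η·h_k(x), can be sketched as follows with shallow regression trees as weak learners. This is an illustration of the update rule only, not the paper's implementation (which relies on established libraries such as the Matlab TreeBagger class for QRF); the data and hyper-parameters are invented.

```python
# Illustrative additive boosting: F_{k+1}(x) = F_k(x) + eta * h_k(x).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, 400)

eta = 0.1                      # learning rate, eta <= 1, mitigates over-fitting
F = np.full_like(y, y.mean())  # F_0: constant model
trees = []

for k in range(100):
    residuals = y - F                                   # pseudo-residuals for squared loss
    h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F = F + eta * h.predict(X)                          # F_{k+1} = F_k + eta * h_k
    trees.append(h)

def predict(X_new):
    """Evaluate the boosted ensemble: constant model plus the weak learners."""
    return y.mean() + eta * sum(t.predict(X_new) for t in trees)

print("train RMSE:", round(float(np.sqrt(np.mean((y - F) ** 2))), 3))
```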

Proceedings ArticleDOI
13 Aug 2016
TL;DR: XGBoost, as discussed by the authors, introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
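A brief, hypothetical usage sketch of the XGBoost library on synthetic data containing missing entries, reflecting the sparsity-aware split finding the abstract describes; the parameter choices are arbitrary and not taken from the cited paper.

```python
# Illustrative XGBoost usage; missing values (NaN) are handled natively.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[rng.random(X.shape) < 0.1] = np.nan                      # inject sparse / missing entries
y = 2.0 * np.nan_to_num(X[:, 0]) + np.nan_to_num(X[:, 1]) + rng.normal(0, 0.3, 500)

dtrain = xgb.DMatrix(X, label=y)                           # NaN treated as missing by default
params = {"objective": "reg:squarederror", "max_depth": 4, "eta": 0.1}
booster = xgb.train(params, dtrain, num_boost_round=200)

print("train RMSE:", float(np.sqrt(np.mean((booster.predict(dtrain) - y) ** 2))))
```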

14,872 citations

Journal ArticleDOI
TL;DR: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research, and a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods.
Abstract: Chapter 11 includes more case studies in other areas, ranging from manufacturing to marketing research. Chapter 12 concludes the book with some commentary about the scientific contributions of MTS. The Taguchi method for design of experiment has generated considerable controversy in the statistical community over the past few decades. The MTS/MTGS method seems to lead another source of discussions on the methodology it advocates (Montgomery 2003). As pointed out by Woodall et al. (2003), the MTS/MTGS methods are considered ad hoc in the sense that they have not been developed using any underlying statistical theory. Because the “normal” and “abnormal” groups form the basis of the theory, some sampling restrictions are fundamental to the applications. First, it is essential that the “normal” sample be uniform, unbiased, and/or complete so that a reliable measurement scale is obtained. Second, the selection of “abnormal” samples is crucial to the success of dimensionality reduction when OAs are used. For example, if each abnormal item is really unique in the medical example, then it is unclear how the statistical distance MD can be guaranteed to give a consistent diagnosis measure of severity on a continuous scale when the larger-the-better type S/N ratio is used. Multivariate diagnosis is not new to Technometrics readers and is now becoming increasingly more popular in statistical analysis and data mining for knowledge discovery. As a promising alternative that assumes no underlying data model, The Mahalanobis–Taguchi Strategy does not provide sufficient evidence of gains achieved by using the proposed method over existing tools. Readers may be very interested in a detailed comparison with other diagnostic tools, such as logistic regression and tree-based methods. Overall, although the idea of MTS/MTGS is intriguing, this book would be more valuable had it been written in a rigorous fashion as a technical reference. There is some lack of precision even in several mathematical notations. Perhaps a follow-up with additional theoretical justification and careful case studies would answer some of the lingering questions.

11,507 citations

Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
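The two recommended tests are available in SciPy; the following sketch applies them to hypothetical accuracy scores of three models over twelve data sets (the numbers are invented for illustration only).

```python
# Illustrative use of the Wilcoxon signed-rank test (two models) and the
# Friedman test (several models over multiple data sets) on synthetic scores.
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare

rng = np.random.default_rng(0)
# Hypothetical accuracy of three models on 12 data sets.
scores_a = rng.uniform(0.70, 0.90, 12)
scores_b = scores_a + rng.normal(0.02, 0.01, 12)
scores_c = scores_a + rng.normal(0.00, 0.02, 12)

stat, p = wilcoxon(scores_a, scores_b)                       # paired, two models
print("Wilcoxon p-value:", round(p, 4))

stat, p = friedmanchisquare(scores_a, scores_b, scores_c)    # k models, N data sets
print("Friedman p-value:", round(p, 4))
```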

10,306 citations

Journal ArticleDOI
TL;DR: The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.
Abstract: Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the ...
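A sketch of the interval score mentioned in the TL;DR, for a central (1 − α) prediction interval [l, u] and an observation x, following the standard definition IS_α(l, u; x) = (u − l) + (2/α)(l − x)·1{x < l} + (2/α)(x − u)·1{x > u}; the implementation below is illustrative and the example numbers are invented.

```python
# Illustrative interval score: rewards narrow intervals, penalizes missed coverage.
import numpy as np

def interval_score(lower, upper, obs, alpha=0.1):
    """Negatively oriented interval score for a central (1 - alpha) interval."""
    lower, upper, obs = map(np.asarray, (lower, upper, obs))
    width = upper - lower
    below = (2.0 / alpha) * np.clip(lower - obs, 0.0, None)   # penalty if obs < lower
    above = (2.0 / alpha) * np.clip(obs - upper, 0.0, None)   # penalty if obs > upper
    return width + below + above

# Example: a 90% interval of [40, 80] against an observed value of 95
# gives 40 + (2/0.1) * 15 = 340.
print(interval_score(40.0, 80.0, 95.0, alpha=0.1))
```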

4,644 citations