Proceedings Article

Assessment Of Spatial Hazard And Impact Of PM10 Using Machine Learning

TL;DR: For spatial hazard modeling of PM10, predictive machine learning models such as Mixture Discriminant Analysis (MDA), Bagged Classification and Regression Trees (Bagged CART), and Random Forest (RF) are used, with accuracies of 0.87, 0.92, and 0.93, respectively. Because these models cannot give accurate results with large samples, the eXtreme Gradient Boosting (XGBoost) method is used to overcome this limitation.
Abstract: Air pollution is one of the major threats to human health and the environment. When some substances in the atmosphere exceed a certain concentration, they become harmful to the ecological system and to the normal conditions of human existence. Particulate matter (PM) refers to small solid or liquid particles floating in the air. These small particles can move deep into the respiratory tract, including the lungs, which can lead to coughing, asthma attacks, high blood pressure, heart attack, stroke, and so on. Particulate matter is considered the air pollutant of greatest concern to health. A first step toward understanding the seriousness of the issue is therefore to monitor PM concentration. For spatial hazard modeling of PM10, predictive machine learning models such as Mixture Discriminant Analysis (MDA), Bagged Classification and Regression Trees (Bagged CART), and Random Forest (RF) are used, with accuracies of 0.87, 0.92, and 0.93, respectively. However, these models cannot give accurate results with large samples, and the eXtreme Gradient Boosting (XGBoost) method is used to overcome this limitation.
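The abstract compares models by accuracy. As a minimal sketch of what such a score measures, the snippet below computes classification accuracy as the fraction of correctly predicted hazard classes. The function name `accuracy` and the sample labels are illustrative assumptions, not data or code from the paper.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical hazard classes for eight held-out locations (not from the paper).
observed = ["high", "low", "low", "high", "low", "high", "low", "low"]
predicted = ["high", "low", "high", "high", "low", "high", "low", "low"]

score = accuracy(observed, predicted)
print(score)  # 7 of 8 labels agree, so this prints 0.875
```

An accuracy of 0.93 would mean that 93 of every 100 evaluated locations received the correct hazard class under this definition.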
Citations
Journal Article
TL;DR: In this paper, the authors used remote sensing data such as elevation, slope, road density, Soil Adjusted Vegetation Index, Normalized Difference Vegetation Index, built-up index, land surface temperature, and wind speed.

16 citations

Proceedings Article
19 Aug 2022
TL;DR: In this paper, the authors applied the XGBoost algorithm to predict pathogenic infections from a big data repository of leukemia patients with fever of unknown origin (FUO) and compared the performance with other machine learning algorithms.
Abstract: Discovering the source of a patient's fever without clinically localised signs can be a daunting task for doctors. For leukaemia patients with fever of unknown origin in particular, quickly discovering the source of the fever is a formidable challenge, as fever can arise in this population in many different situations. In this paper, we applied the XGBoost algorithm to predict pathogenic infections from a big data repository of leukemia patients with fever of unknown origin (FUO) and compared the performance with other machine learning algorithms. Our results illustrate that these machine learning algorithms achieve good performance. In particular, XGBoost obtains the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.8376 and an F1-score of 0.7034. Compared with existing literature, our experiment provides new insights for doctors to determine the cause of fever in leukemia patients.

1 citation

References
Journal Article
TL;DR: The caret package, short for classification and regression training, contains numerous tools for developing predictive models using the rich set of models available in R to simplify model training and tuning across a wide variety of modeling techniques.
Abstract: The caret package, short for classification and regression training, contains numerous tools for developing predictive models using the rich set of models available in R. The package focuses on simplifying model training and tuning across a wide variety of modeling techniques. It also includes methods for pre-processing training data, calculating variable importance, and model visualizations. An example from computational chemistry is used to illustrate the functionality on a real data set and to benchmark the benefits of parallel processing with several types of models.

5,144 citations


"Assessment Of Spatial Hazard And Im..." refers background or methods in this paper

  • ...The caret kit, short for classification and regression training, includes numerous resources to use the rich collection of models available in R to create predictive models [9]....

    [...]

  • ...The goal is to: (i) reduce syntactic discrepancies between many of the building and model prediction functions, (ii) develop a collection of semi-automated, rational approaches to optimize tuning parameter values for most of these models [9]....

    [...]

  • ...Here the R language has a rich collection of modeling functions for both classification and regression, so many that monitoring the syntactic complexities of each function becomes increasingly difficult [9]....

    [...]

Journal Article
01 Nov 2007-Ecology
TL;DR: High classification accuracy is observed in all applications, as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods.
Abstract: Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

3,368 citations


"Assessment Of Spatial Hazard And Im..." refers methods in this paper

  • ...Here RF model applies a subset of data to any decision tree, which is chosen randomly [3]....

    [...]

Proceedings Article
04 Aug 1996
TL;DR: Results of applying Breiman's bagging and Freund and Schapire's boosting to a system that learns decision trees, tested on a representative collection of datasets, show that boosting gives the greater benefit.
Abstract: Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees and testing on a representative collection of datasets. While both approaches substantially improve predictive accuracy, boosting shows the greater benefit. On the other hand, boosting also produces severe degradation on some datasets. A small change to the way that boosting combines the votes of learned classifiers reduces this downside and also leads to slightly better results on most of the datasets considered.

1,597 citations
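The bagging half of the abstract above (replicated bootstrap samples combined by voting) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system evaluated in the paper: the base learner is a one-feature threshold "stump" rather than a full decision tree, and the names `fit_stump`, `bagged_stumps`, `predict`, and the toy data are all hypothetical. Boosting, which reweights training instances instead of resampling them, is not shown.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a threshold classifier on (x, label) pairs: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    # Candidate thresholds: every observed x, plus -inf so "always predict 1" is possible.
    candidates = [float("-inf")] + sorted({x for x, _ in points})
    for threshold in candidates:
        errors = sum(1 for x, y in points if (1 if x > threshold else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # same size as the original data
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Combine the stumps by majority vote."""
    votes = Counter(1 if x > threshold else 0 for threshold in models)
    return votes.most_common(1)[0][0]


# Toy, linearly separable data (assumed for illustration only).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
ensemble = bagged_stumps(data, n_models=25)
print(predict(ensemble, 0.95), predict(ensemble, 0.05))
```

Using an odd number of models avoids tied votes in the two-class case. Individual stumps differ because each bootstrap sample omits some points, and the vote averages out that variation.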

Journal Article
TL;DR: A random forest model incorporating aerosol optical depth data, meteorological fields, and land use variables to estimate daily 24 h averaged ground-level PM2.5 concentrations over the conterminous United States in 2011 is developed.
Abstract: To estimate PM2.5 concentrations, many parametric regression models have been developed, while nonparametric machine learning algorithms are used less often and national-scale models are rare. In this paper, we develop a random forest model incorporating aerosol optical depth (AOD) data, meteorological fields, and land use variables to estimate daily 24 h averaged ground-level PM2.5 concentrations over the conterminous United States in 2011. Random forests are an ensemble learning method that provides predictions with high accuracy and interpretability. Our results achieve an overall cross-validation (CV) R2 value of 0.80. Mean prediction error (MPE) and root mean squared prediction error (RMSPE) for daily predictions are 1.78 and 2.83 μg/m3, respectively, indicating a good agreement between CV predictions and observations. The prediction accuracy of our model is similar to those reported in previous studies using neural networks or regression models on both national and regional scales. In addition, the

379 citations
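The validation metrics named in this abstract (CV R2, MPE, RMSPE) can be computed from paired observations and held-out predictions. The sketch below is illustrative only and rests on two assumptions that the abstract does not confirm: R2 is taken as the coefficient of determination, 1 - SS_res / SS_tot, and MPE is taken as the mean absolute difference between observed and predicted values. The function name `cv_metrics` and the sample values are hypothetical.

```python
import math


def cv_metrics(observed, predicted):
    """Return (r2, mpe, rmspe) for paired observed and predicted values.

    Assumptions (not confirmed by the source): r2 is 1 - SS_res / SS_tot,
    and mpe is the mean absolute prediction error.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; r2 is undefined")
    r2 = 1 - ss_res / ss_tot
    mpe = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    rmspe = math.sqrt(ss_res / n)
    return r2, mpe, rmspe


# Hypothetical daily concentrations in μg/m3 (not data from the paper).
obs = [10.0, 12.0, 8.0, 14.0]
pred = [11.0, 11.0, 9.0, 13.0]
r2, mpe, rmspe = cv_metrics(obs, pred)
print(r2, mpe, rmspe)
```

In a cross-validation setting, `pred` would hold the predictions made for each sample while it was in the held-out fold, so the metrics reflect out-of-sample agreement.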


"Assessment Of Spatial Hazard And Im..." refers methods in this paper

  • ...Random forests are an integrated learning approach that provides high accuracy and interpretability predictions [11] Better decision taking and disaggregation each node several classification algorithms are used....

    [...]

Journal Article
TL;DR: Taking advantage of a novel application of modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies.

331 citations


"Assessment Of Spatial Hazard And Im..." refers methods in this paper

  • ...The predictive performance of random forest models was much higher than the other two standard regression models, explaining the majority of spatial variation in daily PM10 [15]....

    [...]