Journal ArticleDOI

Machine learning-based estimation of ground-level NO2 concentrations over China.

TL;DR: Zhang et al., as mentioned in this paper, proposed a machine learning method for estimating ground-level NO2 concentrations throughout China from the tropospheric NO2 column concentrations from the TROPOspheric Monitoring Instrument (TROPOMI) and multisource geographic data from 2018 to 2020.
About: This article is published in Science of The Total Environment. The article was published on 2022-02-10. It has received 13 citations to date. The article focuses on the topics: Environmental science & Ground level.
Citations
Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, developed a novel artificial intelligence approach by integrating spatio-temporally weighted information into the missing extra-trees and deep forest models to first fill the satellite data gaps and increase data availability by 49% and then derive daily 1 km surface NO2 concentrations over mainland China with full spatial coverage (100%) for the period 2019-2020 by combining surface NO2 measurements, satellite tropospheric NO2 columns derived from TROPOMI and OMI, atmospheric reanalysis, and model simulations.
Abstract: Nitrogen dioxide (NO2) at the ground level poses a serious threat to environmental quality and public health. This study developed a novel, artificial intelligence approach by integrating spatiotemporally weighted information into the missing extra-trees and deep forest models to first fill the satellite data gaps and increase data availability by 49% and then derive daily 1 km surface NO2 concentrations over mainland China with full spatial coverage (100%) for the period 2019–2020 by combining surface NO2 measurements, satellite tropospheric NO2 columns derived from TROPOMI and OMI, atmospheric reanalysis, and model simulations. Our daily surface NO2 estimates have an average out-of-sample (out-of-city) cross-validation coefficient of determination of 0.93 (0.71) and root-mean-square error of 4.89 (9.95) μg/m3. The daily seamless high-resolution and high-quality dataset “ChinaHighNO2” allows us to examine spatial patterns at fine scales such as the urban–rural contrast. We observed systematic large differences between urban and rural areas (28% on average) in surface NO2, especially in provincial capitals. Strong holiday effects were found, with average declines of 22 and 14% during the Spring Festival and the National Day in China, respectively. Unlike North America and Europe, there is little difference between weekdays and weekends (within ±1 μg/m3). During the COVID-19 pandemic, surface NO2 concentrations decreased considerably and then gradually returned to normal levels around the 72nd day after the Lunar New Year in China, which is about 3 weeks longer than the tropospheric NO2 column, implying that the former can better represent the changes in NOx emissions.

42 citations
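
The abstract above reports two cross-validation scores, out-of-sample and out-of-city. The difference lies in how records are assigned to folds, which the following minimal sketch illustrates. It is not the authors' code: the function names `sample_folds` and `city_folds` and the city labels are hypothetical, and the abstract does not state the fold count used.

```python
import random
from collections import defaultdict


def sample_folds(n_records, k, seed=0):
    """Out-of-sample CV: shuffle record indices and deal them into k folds."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def city_folds(cities, k, seed=0):
    """Out-of-city CV: all records from one city land in the same fold."""
    by_city = defaultdict(list)
    for i, city in enumerate(cities):
        by_city[city].append(i)
    names = sorted(by_city)
    random.Random(seed).shuffle(names)
    folds = [[] for _ in range(k)]
    for j, name in enumerate(names):
        folds[j % k].extend(by_city[name])
    return folds


# Hypothetical records: the city each monitoring record belongs to.
cities = ["Beijing", "Beijing", "Wuhan", "Wuhan", "Chengdu", "Chengdu",
          "Xi'an", "Xi'an", "Harbin", "Harbin"]
s_folds = sample_folds(len(cities), k=5)
c_folds = city_folds(cities, k=5)
```

With city-level folds the model is always tested on cities it never saw during training, which is a harder test of spatial prediction. That is consistent with the lower out-of-city score (0.71 versus 0.93) reported in the abstract.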

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors took advantage of big data and artificial-intelligence technologies to generate seamless daily maps of three major ambient pollutants (NO2, SO2, and CO) across China from 2013 to 2020 at a uniform spatial resolution of 10 km.
Abstract: Gaseous pollutants at the ground level seriously threaten the urban air quality environment and public health. There are few estimates of gaseous pollutants that are spatially and temporally resolved and continuous across China. This study takes advantage of big data and artificial-intelligence technologies to generate seamless daily maps of three major ambient pollutant gases, i.e., NO2, SO2, and CO, across China from 2013 to 2020 at a uniform spatial resolution of 10 km. Cross-validation between our estimates and ground observations illustrated a high data quality on a daily basis for surface NO2, SO2, and CO concentrations, with mean coefficients of determination (root-mean-square errors) of 0.84 (7.99 µg m−3), 0.84 (10.7 µg m−3), and 0.80 (0.29 mg m−3), respectively. We found that the COVID-19 lockdown had sustained impacts on gaseous pollutants, where surface CO recovered to its normal level in China on around the 34th day after the Lunar New Year, while surface SO2 and NO2 rebounded more than 2 times slower due to more CO emissions from residents' increased indoor cooking and atmospheric oxidation capacity. Surface NO2, SO2, and CO reached their peak annual concentrations of 21.3 ± 8.8 µg m−3, 23.1 ± 13.3 µg m−3, and 1.01 ± 0.29 mg m−3 in 2013, then continuously declined over time by 12 %, 55 %, and 17 %, respectively, until 2020. The declining rates were more prominent from 2013 to 2017 due to the sharper reductions in anthropogenic emissions but have slowed down in recent years. Nevertheless, people still suffer from high-frequency risk exposure to surface NO2 in eastern China, while surface SO2 and CO have almost reached the World Health Organization (WHO) recommended short-term air quality guidelines (AQG) level since 2018, benefiting from the implemented stricter “ultra-low” emission standards. This reconstructed dataset of surface gaseous pollutants will benefit future (especially short-term) air pollution and environmental health-related studies.

18 citations

Journal ArticleDOI
TL;DR: In this paper, a random forest model was developed to estimate ground-level NO2 concentrations in China at a monthly time scale based on ground-level observed NO2 concentrations, tropospheric NO2 column concentration data from the Ozone Monitoring Instrument (OMI), and meteorological covariates.
Abstract: Nitrogen dioxide (NO2) is a major air pollutant with serious environmental and human health impacts. A random forest model was developed to estimate ground-level NO2 concentrations in China at a monthly time scale based on ground-level observed NO2 concentrations, tropospheric NO2 column concentration data from the Ozone Monitoring Instrument (OMI), and meteorological covariates (the MAE, RMSE, and R2 of the model were 4.16 µg/m3, 5.79 µg/m3, and 0.79, respectively, and the MAE, RMSE, and R2 of the cross-validation were 4.3 µg/m3, 5.82 µg/m3, and 0.77, respectively). On this basis, this article analyzed the spatial and temporal variation in NO2 population exposure in China from 2005 to 2020, which effectively filled the gap in the long-term NO2 population exposure assessment in China. NO2 population exposure over China has significant spatial aggregation, with high values mainly distributed in large urban clusters in the north, east, south, and provincial capitals in the west. The NO2 population exposure in China shows a continuous increasing trend before 2012 and a continuous decreasing trend after 2012. The change in NO2 population exposure in western and southern cities is more influenced by population density compared to northern cities. NO2 pollution in China has substantially improved from 2013 to 2020, but Urumqi, Lanzhou, and Chengdu still maintain high NO2 population exposure. In these cities, the Environmental Protection Agency (EPA) could reduce NO2 population exposure through more monitoring instruments and limiting factory emissions.

6 citations
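
The abstract above summarizes model quality with MAE, RMSE, and R2. As a reminder of what those figures measure, here is a minimal sketch of the three metrics. The input values are made up for illustration. R2 is computed as one minus the ratio of residual to total sum of squares, which is an assumption, since the abstract does not say which definition the authors used.

```python
import math


def regression_metrics(observed, predicted):
    """Return (MAE, RMSE, R2) for paired observed and predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Made-up monthly NO2 values in µg/m3, for illustration only.
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 43.0, 47.0]
mae, rmse, r2 = regression_metrics(observed, predicted)
```

MAE and RMSE carry the units of the target (µg/m3 here), and RMSE penalizes large errors more heavily, so it is never smaller than MAE.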

Journal ArticleDOI
TL;DR: In this article, the authors used remote-sensing datasets from the Atmospheric Infrared Sounder (AIRS) and Ozone Monitoring Instrument (OMI) to analyze the spatio-temporal variations of four trace gases, namely methane (CH4), ozone (O3), carbon monoxide (CO), and nitrogen dioxide (NO2), over the India region during 2006-2015 and took four seasons (i.e., winter, spring, summer, and winter) to interpret the seasonal variation.
Abstract: India is one of the largest contributors to anthropogenic emissions during the recent decade associated with its rapid economic growth in India. Trace gases are important components in the climate change process and due to that climate change, there will be a change in their atmospheric concentrations as the climate is sensitive to Earth’s; therefore, proper assessment of trace gases is necessary for ongoing sudden changes in climate. In this study, we used remote-sensing datasets from the Atmospheric Infrared Sounder (AIRS) and Ozone Monitoring Instrument (OMI) to analyze the spatio-temporal variations of four trace gases, like methane (CH4), ozone (O3), carbon monoxide (CO), and nitrogen dioxide (NO2) over India region during 2006–2015 and taken four seasons (i.e., winter, spring, summer, and winter) to interpret the seasonal variation. The project focuses on the temporal pattern of pollutant trace gases i.e., monthly, seasonal, and annual mean variations of trace gases, trend analysis of trace gases, and a comparison of the seasonal behavior of the trace gases by trend analysis was assessed. Higher concentrations of CO show east-to-west, CH4 show north-to-south, and O3 south-to-north gradient, indicating the variations in trace gases due to the impact of emissions and local meteorology. On the other hand, due to immense population density, huge traffic emissions, tremendous, polluted air, and overgrown industrial activities, total NO2 concentrations shoot up over Delhi, Lucknow, and Kolkata. Now as a result of seasonal variation in the long-range transport of air parcels and biomass burning activities, all trace gases shown significant seasonal variations in the spring season and substantially reduced in the summer season. However, in the winter season, O3 concentration evaluates minimum due to less amount of heat on cold days which leads to the reduction of O3 formation. Due to trace gases, all are significant to get regional climate variability. 
This study takes 2006 as the base year and investigates the behavior of the gases over 2007–2015 to show the increases and decreases of all trace gases across four seasons in the 11 most populated cities of India.

2 citations

Journal ArticleDOI
TL;DR: In this paper, the authors note that the tropospheric vertical column density of NO2 (Trop NO2 VCD) can be obtained using satellite remote sensing, but that it is affected by uncertainties such as the cloud fraction, terrain reflectivity, and aerosol optical depth.
Abstract: The tropospheric vertical column density of NO2 (Trop NO2 VCD) can be obtained using satellite remote sensing, but it has been discovered that the Trop NO2 VCD is affected by uncertainties such as the cloud fraction, terrain reflectivity, and aerosol optical depth. A certain error occurs in terms of data inversion accuracy, necessitating additional ground observation verification. This study uses surface NO2 mass concentrations from the China National Environmental Monitoring Center (CNEMC) sites in Jiangsu Province, China in 2019 and the Trop NO2 VCD measured by MAX-DOAS, respectively, to verify the Trop NO2 VCD product (daily and monthly average data), that comes from the TROPOspheric Monitoring Instrument (TROPOMI) and Ozone Monitoring Instrument (OMI). The results show that the spatial distributions of NO2 in TROPOMI and OMI exhibit a similar tendency and seasonality, showing the characteristics of being high in spring and winter and low in summer and autumn. On the whole, the concentration of NO2 in the south of Jiangsu Province is higher than that in the north. The Pearson correlation coefficient (r) between the monthly average TROPOMI VCD NO2 and the CNEMC NO2 mass concentration is 0.9, which is greater than the r (0.78) between OMI and CNEMC; the r (0.69) between TROPOMI and the MAX-DOAS VCD NO2 is greater than the r (0.59) between OMI and the MAX-DOAS. As such, the TROPOMI is better than the previous generation of OMI at representing the spatio-temporal distribution of NO2 in the regional scope. On the other hand, the uncertainties of the satellite products provided in this study can constrain regional air quality forecasting models and top-down emission inventory estimation.

2 citations
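
The comparison in the abstract above rests on the Pearson correlation coefficient (r) between satellite columns and ground measurements. A minimal sketch of that statistic follows. The two series are made-up monthly means used only to exercise the function, not values from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up monthly means: satellite column (arbitrary units) and surface NO2 (µg/m3).
satellite_vcd = [1.0, 2.0, 3.0, 4.0]
surface_no2 = [12.0, 19.0, 33.0, 38.0]
r = pearson_r(satellite_vcd, surface_no2)
```

Because r is unaffected by scale and offset, it can compare a column density with a surface mass concentration even though the two quantities have different units.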

References
Journal ArticleDOI
TL;DR: ERA-Interim, as discussed by the authors, is the latest global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF); the project was conducted in part to prepare for a new atmospheric reanalysis to replace ERA-40, which will extend back to the early part of the twentieth century.
Abstract: ERA-Interim is the latest global atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). The ERA-Interim project was conducted in part to prepare for a new atmospheric reanalysis to replace ERA-40, which will extend back to the early part of the twentieth century. This article describes the forecast model, data assimilation method, and input datasets used to produce ERA-Interim, and discusses the performance of the system. Special emphasis is placed on various difficulties encountered in the production of ERA-40, including the representation of the hydrological cycle, the quality of the stratospheric circulation, and the consistency in time of the reanalysed fields. We provide evidence for substantial improvements in each of these aspects. We also identify areas where further work is needed and describe opportunities and objectives for future reanalysis projects at ECMWF. Copyright © 2011 Royal Meteorological Society

22,055 citations

Proceedings ArticleDOI
TL;DR: This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

13,333 citations

Journal ArticleDOI
TL;DR: This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood, and develops more direct approximations and shows that they exhibit nearly identical results to boosting.
Abstract: Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.

6,598 citations
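
The abstract above describes boosting as applying a classifier sequentially to reweighted training data and then taking a weighted majority vote. The sketch below shows that loop using textbook discrete AdaBoost with one-dimensional threshold stumps. It illustrates the general procedure only, not the additive-logistic variants the paper derives, and the toy data and function names are hypothetical.

```python
import math


def fit_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    points = sorted(set(xs))
    thresholds = [points[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for t in thresholds:
        for polarity in (1, -1):
            preds = [polarity if x > t else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Fit `rounds` stumps, reweighting the training data after each one."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote weight
        ensemble.append((alpha, t, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        updated = []
        for w, x, y in zip(weights, xs, ys):
            pred = polarity if x > t else -polarity
            updated.append(w * math.exp(-alpha * y * pred))
        total = sum(updated)
        weights = [w / total for w in updated]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps."""
    score = sum(a * (p if x > t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy labels (+1/-1) that no single threshold can separate.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
train_preds = [predict(model, x) for x in xs]
```

On this toy set a single stump always misclassifies at least two points, while the weighted vote of three stumps fits all six. This is a small instance of the improvement the abstract attributes to boosting.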

Journal ArticleDOI
TL;DR: It is found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit, and that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference.
Abstract: Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer. The purpose of the study is to improve our understanding of why and when these algorithms, which use perturbation, reweighting, and combination techniques, affect classification error. We provide a bias and variance decomposition of the error to show how different methods and variants influence these two terms. This allowed us to determine that Bagging reduced variance of unstable methods, while boosting methods (AdaBoost and Arc-x4) reduced both the bias and variance of unstable methods but increased the variance for Naive-Bayes, which was very stable. We observed that Arc-x4 behaves differently than AdaBoost if reweighting is used instead of resampling, indicating a fundamental difference. Voting variants, some of which are introduced in this paper, include: pruning versus no pruning, use of probabilistic estimates, weight perturbations (Wagging), and backfitting of data. We found that Bagging improves when probabilistic estimates in conjunction with no-pruning are used, as well as when the data was backfit. We measure tree sizes and show an interesting positive correlation between the increase in the average tree size in AdaBoost trials and its success in reducing the error. We compare the mean-squared error of voting methods to non-voting methods and show that the voting methods lead to large and significant reductions in the mean-squared errors. Practical problems that arise in implementing boosting algorithms are explored, including numerical instabilities and underflows. 
We use scatterplots that graphically show how AdaBoost reweights instances, emphasizing not only “hard” areas but also outliers and noise.

2,686 citations

Journal ArticleDOI
TL;DR: In this article, the authors evaluated four statistical models (Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)) for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Abstract: The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models—Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS)—for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model. To test, we applied these techniques to four tree species common in the eastern United States: loblolly pine (Pinus taeda), sugar maple (Acer saccharum), American beech (Fagus grandifolia), and white oak (Quercus alba). When the four techniques were assessed with Kappa and fuzzy Kappa statistics, RF and BT were superior in reproducing current importance value (a measure of basal area in addition to abundance) distributions for the four tree species, as derived from approximately 100,000 USDA Forest Service’s Forest Inventory and Analysis plots. Future estimates of suitable habitat after climate change were visually more reasonable with BT and RF, with slightly better performance by RF as assessed by Kappa statistics, correlation estimates, and spatial distribution of importance values. Although RTA did not perform as well as BT and RF, it provided interpretive models for species whose distributions were captured well by our current set of predictors. MARS was adequate for predicting current distributions but unacceptable for future climate. 
We consider RTA, BT, and RF modeling approaches, especially when used together to take advantage of their individual strengths, to be robust for predictive mapping and recommend their inclusion in the ecological toolbox.

1,879 citations