Proceedings Article

Twitter data analysis to understand societal response to air quality

18 Jul 2018, pp. 82-90
TL;DR: Using machine learning algorithms, it is determined that health concerns dominated public response when air quality degraded, with the strongest increase in concern being in New Delhi, where pollution levels are the highest amongst the three cities studied.
Abstract: Air quality is recognized to be a major risk factor for human health globally. Critical to addressing this important public health issue is the effective dissemination of air quality data, information about adverse health effects, and the necessary mitigation measures. The ability of people to understand air quality information and take actions to protect their health is not clear. Recent studies have shown that even when the public gets data on air quality and understands its importance, they do not exhibit pro-environmental behavior to address the problem. All existing studies on public attitude and response to air quality are based on offline studies, with a limited number of survey participants and over a limited number of geographical locations. For a larger survey size and a global set of locations, we analyzed Twitter data collected over a period of nearly two years. We identify a limited number of hashtags (3) whose tweet frequency best correlates with local air quality (PM2.5) in three major cities around the world: Paris, London, and New Delhi. Using tweets with just these three hashtags, we determined that people's response to air quality in the three cities was nearly identical when considering relative changes in air pollution. Using machine learning algorithms, we determined that health concerns dominated public response when air quality degraded, with the strongest increase in concern being in New Delhi, where pollution levels are the highest amongst the three cities studied. The public call for political solutions when air quality worsens is consistent with similar findings from offline surveys in other cities. Our approach will allow for global analysis of public response to air quality and help public health officials respond appropriately.
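The abstract describes picking hashtags whose tweet frequency tracks local PM2.5. A minimal sketch of that kind of selection follows, assuming a Pearson correlation over daily values; the paper's actual statistic is not stated here, and the hashtag names and numbers are hypothetical placeholders, not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily PM2.5 readings for one city (illustrative only).
pm25 = [35.0, 60.0, 120.0, 210.0, 90.0, 40.0]

# Hypothetical daily tweet counts per candidate hashtag (illustrative only).
daily_tweets = {
    "#hashtag_a": [3, 5, 11, 20, 8, 4],
    "#hashtag_b": [7, 6, 7, 8, 6, 7],
}

# Score each hashtag by how closely its tweet volume follows PM2.5.
scores = {tag: pearson(counts, pm25) for tag, counts in daily_tweets.items()}
best_tag = max(scores, key=scores.get)
```

Ranking by a single correlation score is the simplest possible choice; a real analysis would also have to handle missing days, time lags, and differences in Twitter usage between cities.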
Citations
Journal Article
01 Dec 2018
TL;DR: In this paper, a survey is conducted of a representative sample of Seoul Metropolitan Area residents to determine how attention to the air pollution issue drives attitudes and, in turn, how such attitudes may be associated with specific pollution-reduction actions.
Abstract: The lack of a coherent policy to address seasonal air pollution in Northeast Asia is partly due to the complexity of the issue (it is both domestic and transboundary in nature) and partly due to media frames that emerge in response to seasonal fluctuations. To provide a better understanding of these factors as well as their potential impact on public opinion and behavior, this paper claims that the varying narratives conveyed in the Korean media have an impact on and provide a basis for assessing Koreans’ opinions about the problem of air pollution, both transboundary and domestic. Invoking the extant literature that connects media effects with public opinion about air pollution, the analysis proceeds in two stages. First, based on an analysis of Yonhap News Agency reporting, it is shown that the emerging media-based frames are dominated by China and health-related content. Second, and in light of these frames, a survey is conducted of a representative sample of Seoul Metropolitan Area residents to determine how attention to the air pollution issue drives attitudes and, in turn, how such attitudes may be associated with specific pollution-reduction actions. Consistent with the media-based frames, it is shown that one’s attentiveness to the air pollution issue increases the importance one places on reducing Korea’s air pollution. Knowledge about air pollution also decreases one’s satisfaction with both Korea’s and China’s air pollution-reduction efforts. Knowledge about air pollution does not affect Koreans’ decision to minimize exposure to air pollution outdoors (by wearing masks) or indoors (by using air purifiers), but it does increase the likelihood that one will simply stay indoors, indicating that health concerns are paramount for the average citizen.

8 citations


Cites background from "Twitter data analysis to understand..."

  • ...(2015)], particularly social media dominated by health-related matters (Gurajala and Matthews 2018)....


Journal Article
TL;DR: In this paper, the authors used a multilayer classification model with an embedding layer as the first layer and a bidirectional long short-term memory (BiLSTM) layer as the second.
Abstract: Social media platforms are one of the prominent new-age methods used by the public for spreading awareness or drawing attention to an issue or concern. This study demonstrates how the public's Twitter responses can be used for qualitative monitoring of air pollution in an urban area. Tweets discussing air quality in Delhi, India, were extracted during 2019-2020 using a machine learning technique based on a self-attention network. These tweets were cleaned, sorted, and classified into three classes, viz. poor air quality, good air quality, and noise or neutral tweets. The present study used a multilayer classification model with an embedding layer as the first layer and a bidirectional long short-term memory (BiLSTM) layer as the second. A method was then devised for estimating PM2.5 concentration from the tweets using 'spaCy' similarity analysis of classified tweets and data extracted from Continuous Ambient Air Quality Monitoring Stations (CAAQMS) in Delhi for the study period. The accuracy of this estimation was found to be high (80-99%) for extreme air quality conditions (extremely good or severe) and lower during moderate variations in air quality. Application of this methodology depended on perceivable changes in air quality, Twitter engagement, and environmental consciousness among the public.

1 citation

References
Journal Article
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

37,861 citations

Posted Content
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.

28,898 citations

Book Chapter
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

8,658 citations


"Twitter data analysis to understand..." refers background in this paper

  • ...sparse data) [17] as is the case with tweets....


01 Jan 2008
TL;DR: A simple procedure is proposed, which usually gives reasonable results and is suitable for beginners who are not familiar with SVM.
Abstract: Support vector machine (SVM) is a popular technique for classification. However, beginners who are not familiar with SVM often get unsatisfactory results since they miss some easy but significant steps. In this guide, we propose a simple procedure, which usually gives reasonable results.

7,069 citations


"Twitter data analysis to understand..." refers methods in this paper

  • ...We use a Linear Support Vector Classifier (SVC) as it has been shown to be as accurate as a non-linear model when the feature set is large, as is the case here [14]....
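The excerpt above mentions a linear support vector classifier over a large text feature set. As a rough illustration of what a linear SVM on bag-of-words features does, here is a minimal pure-Python sketch trained by full-batch subgradient descent on the regularized hinge loss. It is not the paper's implementation: the training procedure, the hyperparameters (`lam`, `lr`, `epochs`), the helper names, and the toy tweets are all assumptions made for illustration.

```python
def bag_of_words(docs):
    """Map whitespace-tokenized documents to word-count vectors."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for w in d.split():
            row[index[w]] += 1.0
        rows.append(row)
    return rows, index


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent.

    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, index, doc):
    """Return +1 or -1 for a new document; unseen words are ignored."""
    score = b + sum(w[index[t]] for t in doc.split() if t in index)
    return 1 if score >= 0 else -1


# Hypothetical toy tweets: +1 = health concern, -1 = political concern.
docs = [
    "smog cough asthma",
    "air smog mask cough",
    "government ban cars",
    "policy government action air",
]
labels = [1, 1, -1, -1]

X, index = bag_of_words(docs)
w, b = train_linear_svm(X, labels)
train_predictions = [predict(w, b, index, d) for d in docs]
```

The decision function is a single weighted sum over word counts, which is why a linear model stays cheap even when the vocabulary is large.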


Proceedings Article
01 Jan 1998
TL;DR: It is found that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
Abstract: Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically comparing their classification performance on five text corpora. We find that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better at larger vocabulary sizes--providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.

3,601 citations


"Twitter data analysis to understand..." refers background in this paper

  • ...Naïve Bayes, however, has some well recognized problems, particularly the assumption of feature-independence ([22])....
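The feature-independence assumption mentioned in this excerpt is easiest to see in code. The sketch below is a minimal multinomial naive Bayes classifier with Laplace smoothing, written in pure Python; the function names, the smoothing value `alpha`, and the toy tweets are hypothetical and are not taken from either paper.

```python
from collections import Counter
from math import log


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed per-class word probabilities."""
    vocab = sorted({w for d in docs for w in d.split()})
    log_prior, log_like = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like


def predict(model, doc):
    """Return the class with the highest posterior log-score."""
    log_prior, log_like = model
    scores = {}
    for c in log_prior:
        # Independence assumption: each word contributes its own
        # log-likelihood term, with no interaction between words.
        scores[c] = log_prior[c] + sum(
            log_like[c][w] for w in doc.split() if w in log_like[c]
        )
    return max(scores, key=scores.get)


# Hypothetical toy tweets (illustrative only).
docs = [
    "smog cough asthma",
    "air smog mask cough",
    "government ban cars",
    "policy government action air",
]
labels = ["health", "health", "politics", "politics"]

model = train_multinomial_nb(docs, labels)
health_guess = predict(model, "cough mask smog")
politics_guess = predict(model, "government policy")
```

Because the per-word terms are simply summed, strongly correlated words are counted as if they were independent evidence, which is the weakness the excerpt refers to.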
