Discovering explanatory models to identify relevant tweets on Zika
Citations
What Are People Tweeting about Zika? An Exploratory Study Concerning Symptoms, Treatment, Transmission, and Prevention
Sentiment Analysis of Twitter Data: A Hybrid Approach
Dynamics of Health Agency Response and Public Engagement in Public Health Emergency: A Case Study of CDC Tweeting Patterns During the 2016 Zika Epidemic
Identifying Key Topics Bearing Negative Sentiment on Twitter: Insights Concerning the 2015-2016 Zika Epidemic
What do College Undergraduates Know about Zika and What Precautions Are They Willing to Take to Prevent its Spread
References
The measurement of observer agreement for categorical data
The scree test for the number of factors
Interrater reliability: the kappa statistic
AIC model selection using Akaike weights
Sentiment Analysis of Twitter Data
Frequently Asked Questions (11)
Q2. How many features were generated by the POS tagger?
Component 1 comprised topical features generated by n-grams, such as 'birth defects', 'cdc', and 'microcephaly', whereas component 2 comprised lexical features generated by the POS tagger, such as 'adverb', 'pronoun', and 'verb'.
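A minimal sketch (not the authors' code) of how such component loadings can be inspected; the feature names and the random matrix below are placeholder assumptions standing in for a tweet-by-feature count matrix that stacks n-gram and POS-tag features:

```python
# Sketch: inspect which features load on each principal component,
# assuming a combined n-gram + POS-tag count matrix X (placeholder data).
import numpy as np
from sklearn.decomposition import PCA

feature_names = ["birth defects", "cdc", "microcephaly",   # topical n-grams
                 "adverb", "pronoun", "verb"]               # POS-tag counts
X = np.random.rand(200, len(feature_names))                 # placeholder matrix

pca = PCA(n_components=2).fit(X)
for i, component in enumerate(pca.components_, start=1):
    # Features with the largest absolute loadings dominate the component.
    top = np.argsort(np.abs(component))[::-1][:3]
    print(f"Component {i}:", [feature_names[j] for j in top])
```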
Q3. What is the purpose of this study?
This study focuses on extracting features using Part-of-Speech (POS) tagging and n-gram techniques, and on identifying, through model selection, the set of features that improves the classification results.
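An illustrative sketch of this style of feature extraction (the general technique, not the authors' exact pipeline), assuming NLTK with the 'punkt' and 'averaged_perceptron_tagger' resources installed:

```python
# Sketch: POS-tag counts (lexical features) plus bigram counts (topical
# features) extracted from a single tweet.
from collections import Counter
import nltk  # assumes 'punkt' and 'averaged_perceptron_tagger' are downloaded

def extract_features(tweet):
    tokens = nltk.word_tokenize(tweet.lower())
    pos_counts = Counter(tag for _, tag in nltk.pos_tag(tokens))     # lexical
    bigrams = Counter(" ".join(g) for g in nltk.ngrams(tokens, 2))   # topical
    return {**pos_counts, **bigrams}

print(extract_features("CDC warns that Zika may cause birth defects"))
```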
Q4. What were the features removed from the tweets?
As part of pre-processing, URLs, hashtags, and stopwords were removed from the tweets, since these terms appear frequently in tweets and do not help the classifier learn to distinguish Zika-related tweets.
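A minimal pre-processing sketch along these lines (the regular expressions and the small stopword list below are illustrative assumptions, not the authors' code):

```python
# Sketch: strip URLs, hashtags, and stopwords from a tweet before feature extraction.
import re

STOPWORDS = {"the", "is", "a", "an", "of", "to", "and", "in", "that"}  # assumed list

def preprocess(tweet):
    tweet = re.sub(r"https?://\S+", "", tweet)   # remove URLs
    tweet = re.sub(r"#\w+", "", tweet)           # remove hashtags
    tokens = [t for t in tweet.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("Zika is spreading in the region #zika https://example.com"))
```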
Q5. What did the authors find useful in classifying the tweets?
The authors also observed that the Stepwise model contains 15 POS-tag features, indicating that lexical components were useful in discriminating between relevant and non-relevant tweets.
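For context, a bare-bones sketch of forward stepwise selection by AIC (the general technique only; the DataFrame, column names, and use of logistic regression are assumptions, not the authors' code):

```python
# Sketch: greedily add the feature that most lowers AIC, assuming a pandas
# DataFrame `df` of candidate feature columns and a binary target column.
import statsmodels.api as sm

def forward_stepwise(df, target, candidates):
    selected, best_aic, improved = [], float("inf"), True
    while improved and candidates:
        improved = False
        for feat in list(candidates):
            exog = sm.add_constant(df[selected + [feat]])
            model = sm.Logit(df[target], exog).fit(disp=0)
            if model.aic < best_aic:
                best_aic, best_feat, improved = model.aic, feat, True
        if improved:
            selected.append(best_feat)
            candidates.remove(best_feat)
    return selected, best_aic
```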
Q6. What features were excluded from the tweet?
From these 25 POS features, two were excluded, namely 'existential verbal' and 'proper noun verbal', as none of the tweets contained those two features.
Q7. What was the ground truth of the study?
The ground truth consisted of 100 annotated tweets across 15 categories, which were then compared with the labels from crowdsourced annotators by calculating Kappa values for each category.
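A small sketch of how such agreement can be computed with Cohen's kappa (the labels below are illustrative, not the study's data):

```python
# Sketch: kappa between ground-truth labels and one annotator for a single category.
from sklearn.metrics import cohen_kappa_score

ground_truth = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
crowd_labels = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]
print(cohen_kappa_score(ground_truth, crowd_labels))  # 1.0 would be perfect agreement
```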
Q8. What is the model for a probabilistic framework?
The Stepwise model was chosen as the best model based on the Akaike weights (w), which give the relative likelihood of each model within a probabilistic framework [18].
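The standard Akaike-weight computation is w_i = exp(-Δ_i/2) / Σ_k exp(-Δ_k/2), where Δ_i = AIC_i - min(AIC) [18]. A short sketch (the AIC values below are illustrative):

```python
# Sketch: Akaike weights from a list of candidate-model AIC values.
import numpy as np

def akaike_weights(aic_values):
    aic = np.asarray(aic_values, dtype=float)
    delta = aic - aic.min()                 # AIC differences from the best model
    rel_likelihood = np.exp(-0.5 * delta)   # relative likelihood of each model
    return rel_likelihood / rel_likelihood.sum()

print(akaike_weights([212.4, 215.1, 230.0]))  # illustrative AIC values
```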
Q9. How did the authors improve the relevance classifier?
From this study, the authors were able not only to improve the performance of the relevance classifier compared with state-of-the-art classifiers [8], but also to extract meaningful, explanatory features for classification, as compared with the complete set of unigram features (1,000 in total) used for classification in their earlier study [8].
Q10. How many features were used to determine the correlation between the principal components?
Since the correlation between the individual features is low, it is highly unlikely that any single feature has a high correlation with the principal components.
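A sketch of how these correlations can be checked (the placeholder matrix and dimensions below are assumptions, not the study's data):

```python
# Sketch: pairwise feature correlations and each feature's correlation with
# the first principal-component scores.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)                      # placeholder feature matrix
corr = np.corrcoef(X, rowvar=False)              # feature-by-feature correlations
off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
print("mean |r| between features:", np.abs(off_diag).mean())

scores = PCA(n_components=2).fit_transform(X)    # component scores per tweet
for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], scores[:, 0])[0, 1]
    print(f"feature {j}: r with component 1 = {r:.2f}")
```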
Q11. What is the method for generating features from tweets?
The classifiers used the whole tweet to generate a set of features, whereas in this study the authors extracted features based on natural language techniques to build a simpler model with fewer features.