Open Access · Proceedings Article
Dealing with big data: The case of Twitter
A.P.J. van den Bosch, E.F. Tjong Kim Sang, and one other author
Vol. 3, pp. 121-134
TLDR
This paper shows how the data was collected and stored, and how the usefulness of this tweet analysis resource was assessed through three case studies: relating word frequency to real-life events, finding words related to a topic, and gathering information about conversations.
Abstract:
As data sets keep growing, computational linguists are experiencing more big data problems: challenging demands on storage and processing caused by very large data sets. Social media data is one example: including metadata, the messages posted on Twitter in 2012 comprise more than 250 terabytes of structured text. Handling data volumes like this requires parallel computing architectures with appropriate software tools. In this paper we present our experiences in working with such a big data set, a collection of two billion Dutch tweets. We show how we collected and stored the data. Next, we describe searching the data with the Hadoop framework and visualizing the search results. To determine the usefulness of this tweet analysis resource, we performed three case studies based on the data: relating word frequency to real-life events, finding words related to a topic, and gathering information about conversations. All three case studies are presented in this paper. Access to this current and expanding tweet data set is offered via the website twiqs.nl.
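The first case study, relating word frequency to real-life events, can be approximated by counting how many tweets per day contain a word and dividing by that day's total tweet volume. The sketch below illustrates this in the map/reduce style that Hadoop uses, simulated in a single Python process. It is not the authors' implementation: the input format (one JSON tweet per line with Twitter's classic `created_at` and `text` fields), the regex tokenizer, the choice to count tweets containing the word rather than raw token occurrences, and the function names `map_tweet`, `reduce_counts`, and `daily_relative_frequency` are all hypothetical choices made for illustration.

```python
"""Minimal local sketch of a MapReduce-style daily word-frequency count.

Assumptions (not taken from the paper): each input line is a JSON tweet with
Twitter's classic "created_at" and "text" fields, tokenization is a simple
lowercase regex split, and the map/reduce phases run in-process rather than
as Hadoop tasks on a cluster.
"""
import json
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

TOKEN_RE = re.compile(r"\w+")
# Twitter's classic timestamp format, e.g. "Sat Feb 04 10:00:00 +0000 2012".
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"


def map_tweet(line: str, query: str) -> Iterator[Tuple[str, Tuple[int, int]]]:
    """Map phase: emit (day, (hit, 1)) for one JSON-encoded tweet."""
    tweet = json.loads(line)
    day = datetime.strptime(tweet["created_at"], TWITTER_TIME).date().isoformat()
    tokens = TOKEN_RE.findall(tweet["text"].lower())
    # A tweet counts once if it contains the query word at least once.
    hit = 1 if query.lower() in tokens else 0
    yield day, (hit, 1)


def reduce_counts(
    pairs: Iterable[Tuple[str, Tuple[int, int]]]
) -> Dict[str, Tuple[int, int]]:
    """Reduce phase: sum query hits and total tweets per day."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for day, (hit, n) in pairs:
        totals[day][0] += hit
        totals[day][1] += n
    return {day: (h, n) for day, (h, n) in totals.items()}


def daily_relative_frequency(lines: Iterable[str], query: str) -> Dict[str, float]:
    """Fraction of each day's tweets that contain `query`, ordered by day."""
    pairs = (pair for line in lines for pair in map_tweet(line, query))
    counts = reduce_counts(pairs)
    return {day: h / n for day, (h, n) in sorted(counts.items())}


if __name__ == "__main__":
    # Invented toy tweets, for illustration only.
    sample = [
        '{"created_at": "Sat Feb 04 10:00:00 +0000 2012", "text": "Elfstedentocht gaat door?"}',
        '{"created_at": "Sat Feb 04 11:30:00 +0000 2012", "text": "Lekker schaatsen vandaag"}',
        '{"created_at": "Sun Feb 05 09:15:00 +0000 2012", "text": "Geen elfstedentocht dit jaar"}',
    ]
    print(daily_relative_frequency(sample, "elfstedentocht"))
    # prints {'2012-02-04': 0.5, '2012-02-05': 1.0}
```

Normalizing by daily tweet volume is one reasonable design choice, since raw counts also rise and fall with overall Twitter activity. On a real cluster, the map and reduce functions would run as separate tasks over the stored tweet files, with the framework grouping the emitted pairs by day between the two phases.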
Citations
Journal Article · DOI
Social Media data: Challenges, opportunities and limitations in urban studies
TL;DR: A comprehensive and descriptive framework for the study of urban phenomena through LBSN data is the main contribution of this study.
Journal Article · DOI
Signaling sarcasm
TL;DR: It is hypothesized that explicit markers such as hashtags are the digital extralinguistic equivalent of non-verbal expressions that people employ in live interaction when conveying sarcasm.
Journal Article · DOI
Too Far to Care? Measuring Public Attention and Fear for Ebola Using Twitter
TL;DR: Spatial and social distance are important predictors of public attention to worldwide crises such as epidemics and need to be taken into account when communicating about human tragedies.
Journal Article · DOI
Big data and social media: A scientometrics analysis
TL;DR: Thematic analysis shows that the subject has remained an important and well-developed research field, and that research could achieve better results by merging with "big data analytics" and "twitter", topics that are important in this field but not yet well developed.
Journal Article
Extracting Actionable Information from Microtexts
TL;DR: This dissertation proposes a semi-automatic method for extracting actionable information from microtexts and suggests a method which facilitates the definition of relevance for an analyst’s context and the use of this definition to analyze new data.