Dealing with big data: The case of Twitter

Open Access Proceedings Article

Vol. 3, pp. 121-134

TL;DR

This paper describes how a collection of two billion Dutch tweets was collected and stored, and evaluates the usefulness of this tweet analysis resource through three case studies: relating word frequency to real-life events, finding words related to a topic, and gathering information about conversations.

Abstract:

As data sets keep growing, computational linguists are experiencing more big data problems: challenging demands on storage and processing caused by very large data sets. An example of this is dealing with social media data: including metadata, the messages of the social media site Twitter in 2012 comprise more than 250 terabytes of structured text. Handling data volumes like this requires parallel computing architectures with appropriate software tools. In this paper we present our experiences in working with such a big data set, a collection of two billion Dutch tweets. We show how we collected and stored the data. Next we deal with searching in the data using the Hadoop framework and visualizing search results. In order to determine the usefulness of this tweet analysis resource, we have performed three case studies based on the data: relating word frequency to real-life events, finding words related to a topic, and gathering information about conversations. The three case studies are presented in this paper. Access to this current and expanding tweet data set is offered via the website twiqs.nl.
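The first case study, relating word frequency to real-life events, can be pictured as a map and reduce computation of the kind Hadoop runs in parallel. The following is a minimal single-machine sketch, not the authors' implementation: the record layout (date, text), the sample tweets, and the function names `map_tweet`, `reduce_counts`, and `daily_frequency` are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical tweet records as (ISO date, text) pairs.
# The actual storage format used for the tweet collection is not given here.
TWEETS = [
    ("2012-04-30", "Fijne koninginnedag allemaal"),
    ("2012-04-30", "koninginnedag in Amsterdam"),
    ("2012-05-01", "weer aan het werk"),
    ("2012-05-01", "foto's van Koninginnedag"),
]


def map_tweet(date, text, query):
    """Map step: emit (date, 1) for each tweet containing the query word."""
    tokens = text.lower().split()
    if query.lower() in tokens:
        yield date, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per date key."""
    ordered = sorted(pairs)  # stands in for the shuffle/sort phase
    return {
        date: sum(count for _, count in group)
        for date, group in groupby(ordered, key=lambda pair: pair[0])
    }


def daily_frequency(tweets, query):
    """Run the map step over all tweets, then reduce to per-day counts."""
    pairs = [kv for date, text in tweets for kv in map_tweet(date, text, query)]
    return reduce_counts(pairs)


counts = daily_frequency(TWEETS, "koninginnedag")
print(counts)
```

Plotting such per-day counts over time is one way a peak in word use could be lined up with an event; in a real cluster the map and reduce steps would run on separate partitions of the data rather than in one process.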

Citations

Journal Article

Social Media data: Challenges, opportunities and limitations in urban studies

Pablo Martí, +2 more

01 Mar 2019

Computers, Environment and Urban Systems

TL;DR: A comprehensive and descriptive framework for the study of urban phenomena through LBSN data is the main contribution of this study.

Journal Article

Signaling sarcasm

F.A. Kunneman, +3 more

01 Jul 2015

Information Processing and Management

TL;DR: It is hypothesized that explicit markers such as hashtags are the digital extralinguistic equivalent of non-verbal expressions that people employ in live interaction when conveying sarcasm.

Journal Article

Too Far to Care? Measuring Public Attention and Fear for Ebola Using Twitter

Liza G. G. van Lent, +4 more

13 Jun 2017

Journal of Medical Internet Research

TL;DR: Spatial and social distance are important predictors of public attention to worldwide crises such as epidemics and need to be taken into account when communicating about human tragedies.

Journal Article

Big data and social media: A scientometrics analysis

Hossein Jelvehgaran Esfahani, +2 more

TL;DR: Thematic analysis shows that the subject nearly maintained an important and well-developed research field and for better results the research can merge with “big data analytics” and “twitter” that are important topics in this field but not developed well.

Journal Article

Extracting Actionable Information from Microtexts

Ali Hürriyetoğlu

01 Jan 2019

arXiv: Computation and Language

TL;DR: This dissertation proposes a semi-automatic method for extracting actionable information from microtexts and suggests a method which facilitates the definition of relevance for an analyst’s context and the use of this definition to analyze new data.

References

Proceedings Article

Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment

Andranik Tumasjan, +3 more

TL;DR: It is found that the mere number of messages mentioning a party reflects the election result, and joint mentions of two parties are in line with real world political ties and coalitions.

Proceedings Article

WTF: the who to follow service at Twitter

Pankaj Gupta, +5 more

TL;DR: An architectural overview of WTF is provided, and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.

Book Chapter

Using Statistics in Lexical Analysis

Kenneth Church, +3 more

TL;DR: The computational tools available for studying machine-readable corpora are at present still rather primitive and use these corpora and the basic concordancing tool mentioned above to fill in detailed syntactic descriptions (prompting a move, towards more thorough descriptions of lexical syntax).

Proceedings Article

Recognizing Named Entities in Tweets

Xiaohua Liu, +3 more

TL;DR: This work proposes to combine a K-Nearest Neighbors classifier with a linear Conditional Random Fields model under a semi-supervised learning framework to tackle the challenges of Named Entities Recognition for tweets.

Book

Lexical acquisition: Exploiting on-line resources to build a lexicon.

Uri Zernik

TL;DR: This book discusses Lexical Acquisition Through Symbol Recirculation, Lexical Representation, and Lexicons for Broad Coverage Semantics, which are concerned with the acquisition of semantic meaning in the Lexical Knowledge-Base.
