Open Access Proceedings Article

Named Entity Recognition in Tweets: An Experimental Study

TL;DR
The novel T-NER system doubles F1 score compared with the Stanford NER system, and leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.
Abstract
People tweet more than 100 million times daily, yielding a noisy, informal, but sometimes informative corpus of 140-character messages that mirrors the zeitgeist in an unprecedented manner. The performance of standard NLP tools is severely degraded on tweets. This paper addresses this issue by re-building the NLP pipeline beginning with part-of-speech tagging, through chunking, to named-entity recognition. Our novel T-NER system doubles F1 score compared with the Stanford NER system. T-NER leverages the redundancy inherent in tweets to achieve this performance, using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision. LabeledLDA outperforms co-training, increasing F1 by 25% over ten common entity types. Our NLP tools are available at: http://github.com/aritter/twitter_nlp
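To make the distant-supervision idea concrete, the sketch below shows one simplified way to derive weak type labels for tweet tokens by matching candidate spans against Freebase-style type dictionaries. The dictionary contents and function names are illustrative assumptions, not the actual T-NER implementation, which instead models each entity's dictionary memberships jointly with LabeledLDA rather than taking hard dictionary matches.

```python
# Minimal sketch of dictionary-based distant supervision (illustrative only;
# T-NER uses LabeledLDA over Freebase dictionaries, not hard matching as here).
from typing import Dict, List, Set, Tuple

# Hypothetical Freebase-style dictionaries: surface form -> candidate entity types.
TYPE_DICTIONARIES: Dict[str, Set[str]] = {
    "seattle": {"GEO"},
    "yankees": {"SPORTS-TEAM"},
    "justin bieber": {"PERSON"},
}

def weak_labels(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int, Set[str]]]:
    """Return (start, end, types) for every span whose surface form appears in a dictionary."""
    lowered = [t.lower() for t in tokens]
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            surface = " ".join(lowered[start:end])
            if surface in TYPE_DICTIONARIES:
                spans.append((start, end, TYPE_DICTIONARIES[surface]))
    return spans

print(weak_labels("Heading to Seattle to watch the Yankees".split()))
# [(2, 3, {'GEO'}), (6, 7, {'SPORTS-TEAM'})]
```

Spans matching several dictionaries would receive multiple candidate types; disambiguating those ambiguous memberships is precisely what the paper's LabeledLDA model is for.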


Citations
Proceedings Article

SemEval-2017 Task 4: Sentiment Analysis in Twitter

TL;DR: Crowdsourcing on Amazon Mechanical Turk was used to label a large Twitter training dataset, along with additional Twitter and SMS test sets, for two subtasks: A, an expression-level subtask, and B, a message-level subtask.
Journal Article

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

TL;DR: This paper presents a systematic framework that decomposes big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and discusses the prevalent Hadoop framework for addressing big data challenges.
Proceedings Article

Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters

TL;DR: This work systematically evaluates the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy on Twitter and achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks.
Journal Article

Processing Social Media Messages in Mass Emergency: A Survey

TL;DR: This paper surveys the state of the art in computational methods for processing social media messages, highlighting both their contributions and shortcomings, and methodically examines a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries.
Journal Article

A Survey of Techniques for Event Detection in Twitter

TL;DR: This paper surveys techniques for event detection from Twitter streams aimed at finding real-world occurrences that unfold over space and time, and highlights the need for public benchmarks to evaluate the performance of different detection approaches and various features.
References
Journal Article

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
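As a rough illustration of the kind of topic model described above, the following sketch fits LDA on a toy corpus with gensim; the corpus, topic count, and training settings are placeholder assumptions, not settings from either paper.

```python
# Toy LDA example with gensim (illustrative assumptions: tiny corpus, 2 topics).
from gensim import corpora, models

texts = [
    ["nba", "finals", "game", "lakers", "score"],
    ["election", "senate", "vote", "campaign"],
    ["lakers", "trade", "draft", "nba"],
    ["vote", "poll", "election", "debate"],
]

dictionary = corpora.Dictionary(texts)                # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]   # bag-of-words counts per document

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)
for topic_id, topic in lda.print_topics(num_words=4):
    print(topic_id, topic)                            # top words per inferred topic
```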
Proceedings Article

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
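Since T-NER's segmentation stage is likewise built on conditional random fields, a minimal sequence-labeling sketch may help; the sklearn-crfsuite library, the feature set, and the tiny training example below are assumptions for illustration, not the setup used in either paper.

```python
# Hedged CRF sequence-labeling sketch using sklearn-crfsuite (illustrative only).
import sklearn_crfsuite

def token_features(tokens, i):
    """Simple per-token features; real systems add orthographic, cluster, and gazetteer features."""
    return {
        "word.lower": tokens[i].lower(),
        "word.istitle": tokens[i].istitle(),
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# A single toy training tweet with BIO-style entity tags (hypothetical example).
train_sents = [["Yess", "its", "official", "Nintendo", "announced", "today"]]
train_tags = [["O", "O", "O", "B-ENTITY", "O", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, train_tags)
print(crf.predict(X))
```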
Report

Building a Large Annotated Corpus of English: The Penn Treebank

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.