A Dependency Parser for Tweets
Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, Noah A. Smith
pp. 1001-1012
TLDR
A new dependency parser for English tweets, TWEEBOPARSER, which builds on several contributions: new syntactic annotations for a corpus of tweets, with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data.
Abstract
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions.
Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
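The "unlabeled attachment accuracy" figure reported in the abstract is the fraction of tokens whose predicted syntactic head matches the gold head. A minimal sketch of the metric (a hypothetical helper, not code from the TWEEBOPARSER release; head indices here are invented for illustration):

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head index matches the gold head index."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Example: a 5-token sentence; head index 0 denotes the root.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]  # one attachment error at token 4
print(unlabeled_attachment_score(gold, pred))  # 0.8
```

An "over 80%" score thus means more than four out of five tokens receive the correct head, without requiring the dependency label to match.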
Citations
Journal ArticleDOI
A survey on opinion mining and sentiment analysis
Kumar Satish Ravi, Vadlamani Ravi
TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.
Proceedings ArticleDOI
Two/Too Simple Adaptations of Word2Vec for Syntax Problems
TL;DR: Two simple modifications to the models in the popular Word2Vec tool are presented, in order to generate embeddings more suited to tasks involving syntax.
Proceedings Article
Contextualized Sarcasm Detection on Twitter
David Bamman, Noah A. Smith
TL;DR: By including extra-linguistic information from the context of an utterance on Twitter — such as properties of the author, the audience and the immediate communicative environment — this work is able to achieve gains in accuracy compared to purely linguistic features in the detection of this complex phenomenon, while also shedding light on features of interpersonal interaction that enable sarcasm in conversation.
Proceedings Article
Target-dependent twitter sentiment classification with rich automatic features
Duy Tin Vo, Yue Zhang
TL;DR: This paper shows that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features from a tweet using distributed word representations and neural pooling functions.
Proceedings ArticleDOI
That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets
William Yang Wang, Diyi Yang
TL;DR: In quantitative analysis, it is shown that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost the performance; that leveraging large lexical embeddings to create additional training instances significantly improves the lexical model; and that incorporating frame-semantic embeddings achieves the best overall performance.
References
ReportDOI
Building a large annotated corpus of English: The Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal ArticleDOI
Class-based n-gram models of natural language
Peter Fitzhugh Brown, Peter Vincent Desouza, Robert Leroy Mercer, Vincent J. Della Pietra, Jenifer C. Lai
TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based or semantically based groupings, depending on the nature of the underlying statistics.
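The class-based bigram factorization behind this model estimates p(w_i | w_{i-1}) as p(w_i | c(w_i)) * p(c(w_i) | c(w_{i-1})), where c(w) is the word's class. A toy sketch of that factorization (the class names and probability values below are invented for illustration, not drawn from the paper):

```python
# Hypothetical hard clustering of words into classes.
word_class = {"good": "ADJ-ish", "great": "ADJ-ish", "movie": "NOUN-ish"}

# Hypothetical emission probabilities p(word | class).
p_word_given_class = {
    ("good", "ADJ-ish"): 0.6,
    ("great", "ADJ-ish"): 0.4,
    ("movie", "NOUN-ish"): 1.0,
}

# Hypothetical class-transition probabilities p(class | previous class).
p_class_given_class = {("NOUN-ish", "ADJ-ish"): 0.7}

def class_bigram_prob(prev_word, word):
    """p(word | prev_word) under the class-based bigram factorization."""
    c_prev, c = word_class[prev_word], word_class[word]
    return p_word_given_class[(word, c)] * p_class_given_class[(c, c_prev)]

print(class_bigram_prob("good", "movie"))  # 0.6 * ... no: 1.0 * 0.7 = 0.7
```

Because parameters are shared across all words in a class, the model generalizes to word pairs never seen together in training, which is why the induced clusters ("Brown clusters") are widely reused as features in downstream taggers and parsers.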
Proceedings Article
Word Representations: A Simple and General Method for Semi-Supervised Learning
TL;DR: This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Proceedings ArticleDOI
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms
TL;DR: Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
Proceedings Article
Named Entity Recognition in Tweets: An Experimental Study
TL;DR: The novel T-NER system doubles the F1 score compared with the Stanford NER system, leveraging the redundancy inherent in tweets to achieve this performance and using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.