Open Access Proceedings Article (DOI)

A Dependency Parser for Tweets

TL;DR
A new dependency parser for English tweets, TWEEBOPARSER, which builds on several contributions: new syntactic annotations for a corpus of tweets, with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data.
Abstract
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions. Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
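The abstract reports unlabeled attachment accuracy (UAS), the standard dependency-parsing metric: the fraction of tokens whose predicted head matches the gold head. A minimal sketch of that computation, with illustrative head-index lists rather than TweeboParser's actual data structures:

```python
# Unlabeled attachment score (UAS): fraction of tokens whose predicted
# head index matches the gold head index. The parser in the abstract
# reports over 80% UAS on the Tweebank test set.
# The data layout here is illustrative, not TweeboParser's API.

def uas(gold_heads, pred_heads):
    """Compute UAS given parallel lists of head indices (0 = root)."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted sequences must align")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Example: a 4-token tweet where the parser gets 3 of 4 heads right.
gold = [2, 0, 2, 3]   # head of token i (1-based token indices, 0 = root)
pred = [2, 0, 2, 2]
print(uas(gold, pred))  # 0.75
```

Note that this sketch assumes every token participates in the tree; the paper's annotation conventions allow some tweet tokens to be excluded from the syntactic analysis, which a full evaluation script would need to handle.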


Citations
Journal Article (DOI)

A survey on opinion mining and sentiment analysis

TL;DR: A rigorous survey on sentiment analysis is presented, covering the tasks, approaches, and applications discussed in over one hundred articles published in the last decade.
Proceedings Article (DOI)

Two/Too Simple Adaptations of Word2Vec for Syntax Problems

TL;DR: Two simple modifications to the models in the popular Word2Vec tool are presented, in order to generate embeddings more suited to tasks involving syntax.
Proceedings Article

Contextualized Sarcasm Detection on Twitter

TL;DR: By including extra-linguistic information from the context of an utterance on Twitter — such as properties of the author, the audience and the immediate communicative environment — this work is able to achieve gains in accuracy compared to purely linguistic features in the detection of this complex phenomenon, while also shedding light on features of interpersonal interaction that enable sarcasm in conversation.
Proceedings Article

Target-dependent twitter sentiment classification with rich automatic features

TL;DR: This paper shows that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features from a tweet using distributed word representations and neural pooling functions.
Proceedings Article (DOI)

That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets

TL;DR: In quantitative analysis, it is shown that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost the performance; that leveraging large lexical embeddings to create additional training instances significantly improves the lexical model; and that incorporating frame-semantic embeddings achieves the best overall performance.
References
Report (DOI)

Building a large annotated corpus of English: the Penn Treebank

TL;DR: As a result of this grant, the researchers have published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, including a fully hand-parsed version of the classic Brown corpus.
Journal Article (DOI)

Class-based n-gram models of natural language

TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models extract classes with the flavor of either syntactically based or semantically based groupings, depending on the nature of the underlying statistics.
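The model summarized above factors the bigram probability through word classes: p(w_i | w_{i-1}) is approximated as p(w_i | c(w_i)) · p(c(w_i) | c(w_{i-1})). A minimal sketch of that factorization, using tiny hand-written probability tables rather than trained clusters:

```python
# Class-based bigram sketch: p(w_i | w_{i-1}) is factored as
# p(w_i | c(w_i)) * p(c(w_i) | c(w_{i-1})).
# The class assignments and probabilities below are illustrative,
# not the output of any actual clustering.

word_class = {"good": "ADJ-ish", "great": "ADJ-ish", "movie": "NOUN-ish"}
p_word_given_class = {("good", "ADJ-ish"): 0.6,
                      ("great", "ADJ-ish"): 0.4,
                      ("movie", "NOUN-ish"): 1.0}
p_class_given_class = {("ADJ-ish", "NOUN-ish"): 0.7}

def class_bigram_prob(prev_word, word):
    """Probability of `word` following `prev_word` under the class model."""
    c_prev, c = word_class[prev_word], word_class[word]
    return (p_word_given_class[(word, c)]
            * p_class_given_class.get((c_prev, c), 0.0))

print(class_bigram_prob("good", "movie"))  # 0.7
```

Because classes are far fewer than words, the class-transition table is much smaller than a full word-bigram table, which is what makes the class-based model attractive for sparse data such as tweets.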
Proceedings Article

Word Representations: A Simple and General Method for Semi-Supervised Learning

TL;DR: This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Proceedings Article (DOI)

Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms

TL;DR: Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
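The perceptron algorithms evaluated above use a simple structured update: after decoding a sequence, add the feature counts of the gold analysis and subtract those of the predicted one. A minimal sketch, with illustrative feature names (the feature templates here are not the paper's):

```python
# Structured-perceptron update sketch: reward gold features, penalize
# predicted features. Feature strings below are illustrative only.
from collections import defaultdict

def perceptron_update(weights, gold_feats, pred_feats):
    """In-place update: weights += gold features - predicted features."""
    for f, v in gold_feats.items():
        weights[f] += v
    for f, v in pred_feats.items():
        weights[f] -= v
    return weights

w = defaultdict(float)
gold = {"tag=NN&word=tweet": 1.0, "prev=DT&tag=NN": 1.0}
pred = {"tag=VB&word=tweet": 1.0, "prev=DT&tag=VB": 1.0}
perceptron_update(w, gold, pred)
print(w["tag=NN&word=tweet"], w["tag=VB&word=tweet"])  # 1.0 -1.0
```

When gold and predicted analyses agree, the two loops cancel and the weights are unchanged; in practice the final weights are usually averaged over all updates to reduce overfitting.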
Proceedings Article

Named Entity Recognition in Tweets: An Experimental Study

TL;DR: The novel T-NER system doubles the F1 score achieved by the Stanford NER system, leveraging the redundancy inherent in tweets and using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.