A Dependency Parser for Tweets
Lingpeng Kong, Nathan Schneider, Swabha Swayamdipta, Archna Bhatia, Chris Dyer, Noah A. Smith
pp. 1001-1012
TLDR
A new dependency parser for English tweets, TWEEBOPARSER, which builds on several contributions: new syntactic annotations for a corpus of tweets, with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data.
Abstract
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled attachment accuracy on our new, high-quality test set and measure the benefit of our contributions.
Our dataset and parser can be found at http://www.ark.cs.cmu.edu/TweetNLP.
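The "unlabeled attachment accuracy" figure reported in the abstract is the fraction of tokens whose predicted syntactic head matches the gold head. A minimal sketch of the metric (a hypothetical helper, not code from the TWEEBOPARSER release; head indices here are invented for illustration):

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head index matches the gold head index."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# Example: a 5-token sentence; head index 0 denotes the root.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]  # one attachment error at token 4
print(unlabeled_attachment_score(gold, pred))  # 0.8
```

An "over 80%" score thus means more than four out of five tokens receive the correct head, without requiring the dependency label to match.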
Citations
Journal ArticleDOI
A survey on opinion mining and sentiment analysis
Kumar Satish Ravi, Vadlamani Ravi
TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.
Proceedings ArticleDOI
Two/Too Simple Adaptations of Word2Vec for Syntax Problems
TL;DR: Two simple modifications to the models in the popular Word2Vec tool are presented, in order to generate embeddings more suited to tasks involving syntax.
Proceedings Article
Contextualized Sarcasm Detection on Twitter
David Bamman, Noah A. Smith
TL;DR: By including extra-linguistic information from the context of an utterance on Twitter — such as properties of the author, the audience and the immediate communicative environment — this work is able to achieve gains in accuracy compared to purely linguistic features in the detection of this complex phenomenon, while also shedding light on features of interpersonal interaction that enable sarcasm in conversation.
Proceedings Article
Target-dependent twitter sentiment classification with rich automatic features
Duy Tin Vo, Yue Zhang
TL;DR: This paper shows that competitive results can be achieved without the use of syntax, by extracting a rich set of automatic features from a tweet using distributed word representations and neural pooling functions.
Proceedings ArticleDOI
That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets
William Yang Wang, Diyi Yang
TL;DR: In quantitative analysis, it is shown that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost the performance; that leveraging large lexical embeddings to create additional training instances significantly improves the lexical model; and that incorporating frame-semantic embeddings achieves the best overall performance.
References
ReportDOI
Building a large annotated corpus of English: The Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Journal ArticleDOI
Class-based n-gram models of natural language
Peter Fitzhugh Brown, Peter Vincent Desouza, Robert Leroy Mercer, Vincent J. Della Pietra, Jenifer C. Lai
TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based or semantically based groupings, depending on the nature of the underlying statistics.
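The class-based bigram factorization behind this model estimates p(w_i | w_{i-1}) as p(w_i | c(w_i)) * p(c(w_i) | c(w_{i-1})), where c(w) is the word's class. A toy sketch of that factorization (the class names and probability values below are invented for illustration, not drawn from the paper):

```python
# Hypothetical hard clustering of words into classes.
word_class = {"good": "ADJ-ish", "great": "ADJ-ish", "movie": "NOUN-ish"}

# Hypothetical emission probabilities p(word | class).
p_word_given_class = {
    ("good", "ADJ-ish"): 0.6,
    ("great", "ADJ-ish"): 0.4,
    ("movie", "NOUN-ish"): 1.0,
}

# Hypothetical class-transition probabilities p(class | previous class).
p_class_given_class = {("NOUN-ish", "ADJ-ish"): 0.7}

def class_bigram_prob(prev_word, word):
    """p(word | prev_word) under the class-based bigram factorization."""
    c_prev, c = word_class[prev_word], word_class[word]
    return p_word_given_class[(word, c)] * p_class_given_class[(c, c_prev)]

print(class_bigram_prob("good", "movie"))  # 0.6 * ... no: 1.0 * 0.7 = 0.7
```

Because parameters are shared across all words in a class, the model generalizes to word pairs never seen together in training, which is why the induced clusters ("Brown clusters") are widely reused as features in downstream taggers and parsers.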
Proceedings Article
Word Representations: A Simple and General Method for Semi-Supervised Learning
TL;DR: This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Proceedings ArticleDOI
Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms
TL;DR: Experimental results on part-of-speech tagging and base noun phrase chunking are given, in both cases showing improvements over results for a maximum-entropy tagger.
Proceedings Article
Named Entity Recognition in Tweets: An Experimental Study
TL;DR: The novel T-NER system doubles the F1 score compared with the Stanford NER system, leveraging the redundancy inherent in tweets to achieve this performance and using LabeledLDA to exploit Freebase dictionaries as a source of distant supervision.