Open Access · Proceedings Article
What to do about bad language on the internet
Jacob Eisenstein
pp. 359–369
TL;DR: A critical review of the NLP community's response to the landscape of bad language is offered, along with a quantitative analysis of the lexical diversity of social media text and its relationship to other corpora.
Abstract: The rise of social media has brought computational linguistics into ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax. This paper surveys the landscape of bad language and offers a critical review of the NLP community's response, which has largely followed two paths: normalization and domain adaptation. Each approach is evaluated in the context of theoretical and empirical work on computer-mediated communication. In addition, the paper presents a quantitative analysis of the lexical diversity of social media text and its relationship to other corpora.
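The lexical diversity analysis mentioned in the abstract can be illustrated with a minimal sketch: comparing corpora by type-token ratio (TTR) over equal-sized samples, so that corpora of different lengths are comparable. The sample texts and sample size below are invented for illustration and are not the paper's data.

```python
# Hypothetical sketch of a lexical diversity comparison via type-token
# ratio (TTR). The toy "tweets" and "news" samples are invented.

def type_token_ratio(tokens, sample_size=1000):
    """TTR (distinct types / total tokens) over a fixed-size prefix,
    so corpora of different lengths can be compared fairly."""
    sample = tokens[:sample_size]
    return len(set(sample)) / len(sample)

tweets = "omg lol gonna b late 2nite srsly smh".split()
news = "the senate voted on the budget measure on tuesday".split()

print(type_token_ratio(tweets))
print(type_token_ratio(news))
```

On toy samples like these, the nonstandard spellings in the social media text inflate the number of distinct types, which is one intuition behind measuring how social media vocabulary differs from edited corpora.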
Citations
Proceedings Article
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
TL;DR: This work systematically evaluates the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy on Twitter and achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks.
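The word-cluster idea in this entry can be sketched in a few lines: words are mapped to coarse cluster IDs (as in Brown clustering), so an unseen spelling can share features with forms observed in training. The cluster table, bit-strings, and prefix lengths below are invented for illustration, not the paper's actual clusters.

```python
# Hypothetical sketch: augmenting a tagger's feature set with word-cluster
# IDs so that spelling variants ("gonna", "gunna") share features.
# The cluster bit-strings here are invented.
clusters = {"gonna": "0110", "gunna": "0110", "going": "0110", "cat": "1011"}

def features(word):
    """Return a feature set for one token: surface form plus
    cluster-prefix features at two granularities."""
    feats = {"word=" + word.lower()}
    bits = clusters.get(word.lower())
    if bits:
        # Prefixes of the cluster bit-string act as coarser cluster IDs.
        for k in (2, 4):
            feats.add("cluster:%d=%s" % (k, bits[:k]))
    return feats
```

Here `features("gonna")` and `features("gunna")` overlap only in their cluster features, which is exactly what lets a tagger generalize across spelling variants it never saw.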
Journal ArticleDOI
Survey of the state of the art in natural language generation: core tasks, applications and evaluation
Albert Gatt, Emiel Krahmer, +1 more
TL;DR: This article surveys the state of the art in natural language generation, offering an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organized.
Proceedings ArticleDOI
BERTweet: A pre-trained language model for English Tweets
TL;DR: BERTweet is the first public large-scale pre-trained language model for English Tweets; it shares the BERT-base architecture and is trained using the RoBERTa pre-training procedure.
Journal ArticleDOI
Predicting crime using Twitter and kernel density estimation
TL;DR: This article uses Twitter-specific linguistic analysis and statistical topic modeling to automatically identify discussion topics across a major city in the United States and shows that the addition of Twitter data improves crime prediction performance versus a standard approach based on kernel density estimation.
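The kernel density estimation baseline in this entry can be sketched directly: the density of past incidents at a location is the mean of Gaussian kernels centered on historical incident coordinates. The coordinates and bandwidth below are invented, and the paper's Twitter topic features are not modeled here.

```python
# Hedged sketch of the KDE baseline only: 2-D Gaussian kernel density
# over historical incident locations. All data points are invented.
import math

def kde_density(point, incidents, bandwidth=1.0):
    """Mean Gaussian kernel value at `point` over past incident locations."""
    x, y = point
    total = 0.0
    for ix, iy in incidents:
        d2 = (x - ix) ** 2 + (y - iy) ** 2
        total += math.exp(-d2 / (2 * bandwidth ** 2))
    # Normalizer for a 2-D Gaussian kernel, averaged over incidents.
    norm = 2 * math.pi * bandwidth ** 2 * len(incidents)
    return total / norm

past_crimes = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)]
hotspot = kde_density((0.05, 0.1), past_crimes)  # near a cluster of incidents
far = kde_density((10.0, 10.0), past_crimes)     # far from all incidents
```

Locations near past incident clusters score high; the paper's contribution is that adding Twitter-derived topic features on top of such a baseline improves prediction.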
Posted Content
Language (Technology) is Power: A Critical Survey of "Bias" in NLP
TL;DR: The authors survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing bias is an inherently normative process.
References
Proceedings Article
Joint Inference of Named Entity Recognition and Normalization for Tweets
TL;DR: A novel graphical model is proposed to simultaneously conduct NER and named entity normalization (NEN) on multiple tweets; it introduces a binary random variable for each pair of words with the same lemma across similar tweets.
Contextual Bearing on Linguistic Variation in Social Media
TL;DR: This article investigates the writing conventions that different groups of users adopt to express themselves in microtexts, finding that different populations exhibit different amounts of shortened English terms and different shortening styles.
Language change and digital media: A review of conceptions and evidence
Tore Kristiansen, Nikolas Coupland, Barbara Soukup, Sylvia Moosmüller, Frans Gregersen, Peter Garrett, Charlotte Selleck, Pirkko Nuolijärvi, Johanna Vaattovaara, Jan-Ola Östman, Leila Mattfolk, Philipp Stoeckle, Christoph Hare Svenstrup, Stephen Pax Leonard, Kristján Árnason, Tadhg Ó hIfearnáin, Noel Ó Murchadha, Loreta Vaicekauskienė, Stefan Grondelaers, Roeland van Hout, Helge Sandøy, Mats Thelander, Elen Robert, Jannis Androutsopoulos, Peter Auer, Helmut Spiekermann, Allan Bell, Dirk Speelman, Jane Stuart-Smith, +28 more
Proceedings Article
Dude, srsly?: The Surprisingly Formal Nature of Twitter's Language
TL;DR: Twitter's language is surprisingly conservative, and less informal than SMS and online chat; Twitter users appear to be developing linguistically unique styles.
Proceedings Article
From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0
Jennifer Foster, Özlem Çetinoğlu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan, Josef van Genabith, +6 more
TL;DR: It is found that the Wall-Street-Journal-trained statistical parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy.