Open Access · Proceedings Article

What to do about bad language on the internet

TLDR
The paper offers a critical review of the NLP community's response to the landscape of bad language and presents a quantitative analysis of the lexical diversity of social media text and its relationship to other corpora.
Abstract
The rise of social media has brought computational linguistics in ever-closer contact with bad language: text that defies our expectations about vocabulary, spelling, and syntax. This paper surveys the landscape of bad language, and offers a critical review of the NLP community’s response, which has largely followed two paths: normalization and domain adaptation. Each approach is evaluated in the context of theoretical and empirical work on computer-mediated communication. In addition, the paper presents a quantitative analysis of the lexical diversity of social media text, and its relationship to other corpora.
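The abstract refers to a quantitative analysis of lexical diversity but does not state the measures used. Purely as an illustration, the sketch below computes two common proxies: the type-token ratio (distinct words divided by total words) and the out-of-vocabulary (OOV) rate against a reference word list. The tokenizer, the function names, the word list, and the toy sentences are all hypothetical and are not the paper's method.

```python
import re
from typing import List, Optional, Set

# Hypothetical tokenizer: lowercase word-like runs, keeping @mentions,
# #hashtags, and simple contractions ("i'm") whole.
TOKEN_RE = re.compile(r"[#@]?\w+(?:'\w+)?")

def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())

def type_token_ratio(tokens: List[str], sample_size: Optional[int] = None) -> float:
    """Distinct types / total tokens. Raw TTR falls as a corpus grows, so
    corpora should be compared over samples of equal size (sample_size)."""
    if sample_size is not None:
        tokens = tokens[:sample_size]
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def oov_rate(tokens: List[str], vocabulary: Set[str]) -> float:
    """Fraction of tokens that do not appear in a reference vocabulary."""
    if not tokens:
        return 0.0
    return sum(t not in vocabulary for t in tokens) / len(tokens)

# Toy data (hypothetical): a tiny "dictionary" and two renderings of one message.
dictionary = {"i", "am", "going", "to", "meet", "you", "tomorrow",
              "talk", "about", "the", "budget"}
edited = tokenize("I am going to meet you tomorrow to talk about the budget")
social = tokenize("ima meet u tmrw 2 talk bout da budget lol")

print(f"OOV rate: edited={oov_rate(edited, dictionary):.2f}, "
      f"social={oov_rate(social, dictionary):.2f}")
# prints "OOV rate: edited=0.00, social=0.70"
```

A dictionary-based OOV rate treats every nonstandard spelling ("tmrw", "da") as unknown, which is one reason such counts can overstate how "bad" social media text is; the choice of reference vocabulary largely determines the result.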



Citations
Posted Content

Char2Subword: Extending the Subword Embedding Space Using Robust Character Compositionality.

TL;DR: This paper proposes a character-based subword module (char2subword) that learns the subword embedding table of pre-trained models such as BERT, to alleviate the problem of infrequent spelling sequences.
Book Chapter

PolSentiLex: Sentiment Detection in Socio-Political Discussions on Russian Social Media

TL;DR: The authors present PolSentiLex, a freely available Russian-language sentiment lexicon designed to detect sentiment in user-generated content on social and political issues. It was generated from a database of posts and comments by the top 2,000 LiveJournal bloggers over one year (~1.5 million posts and 20 million comments).
Dissertation

Language change and evolution in Online Social Networks

TL;DR: The results demonstrate that the grammatical context in which innovations emerge also plays an essential role in diffusion dynamics; this indicates that the adoption of new words is enabled by a complex interplay of network and linguistic factors.
Dissertation

Inferring Aspect-Specific Opinion Structure in Product Reviews

David Carter
TL;DR: The dissertation implements a semi-supervised co-training machine classification method for identifying both product aspects and the sentiments expressed about them. The method performed above the mean in identifying the restaurant aspects about which people expressed opinions, even when co-training began with only half of the labelled training data.
Proceedings Article

Measuring and Modeling Language Change

TL;DR: This tutorial provides an overview of techniques and datasets from the quantitative social sciences and the digital humanities that are not well known in the computational linguistics community, including vector autoregressive models, multiple-comparisons corrections for hypothesis testing, and causal inference.