Empath: Understanding Topic Signals in Large-Scale Text

doi:10.1145/2858036.2858535

Open Access, Proceedings Article

Ethan Fast, +2 more

- pp 4647-4657

TLDR

Empath is a tool that generates and validates new lexical categories on demand from a small set of seed terms. It draws connotations between words and phrases by learning a neural embedding across more than 1.8 billion words of modern fiction.

Abstract:

Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.
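The pipeline the abstract describes (seed terms, embedding lookup, then a validation filter) can be sketched as follows. This is a minimal illustration, not Empath's implementation: the three-dimensional vectors are invented, the names `TOY_EMBEDDING`, `expand_category`, and `validate` are hypothetical, and the `accept` callback is only a stand-in for the crowd-powered filter.

```python
import math

# Toy vectors invented for illustration; Empath's real embedding is
# learned from a large fiction corpus and has far more dimensions.
TOY_EMBEDDING = {
    "bleed":  [0.9, 0.1, 0.0],
    "punch":  [0.8, 0.2, 0.1],
    "stab":   [0.85, 0.15, 0.05],
    "wound":  [0.8, 0.1, 0.2],
    "senate": [0.0, 0.9, 0.1],
    "tweet":  [0.1, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def expand_category(seeds, embedding, top_n=3):
    """Rank non-seed words by similarity to the summed seed vectors."""
    dims = len(next(iter(embedding.values())))
    query = [sum(embedding[s][i] for s in seeds) for i in range(dims)]
    candidates = [w for w in embedding if w not in seeds]
    ranked = sorted(candidates,
                    key=lambda w: cosine(query, embedding[w]),
                    reverse=True)
    return ranked[:top_n]

def validate(candidates, accept):
    """Keep only the candidates the (hypothetical) reviewer accepts."""
    return [w for w in candidates if accept(w)]

# Seeds from the abstract's example for the category "violence".
candidates = expand_category(["bleed", "punch"], TOY_EMBEDDING)
# Stand-in for human judgment: reject the one off-topic candidate.
violence = validate(candidates, accept=lambda w: w != "tweet")
print(candidates, violence)
```

Summing the seed vectors before ranking is one simple way to query an embedding with several seeds; the abstract does not say which combination rule Empath uses, so treat that choice as an assumption.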

Citations

Proceedings Article

A New Chatbot for Customer Service on Social Media

Anbang Xu, +4 more

TL;DR: A new conversational system that automatically generates responses to user requests on social media, integrates state-of-the-art deep learning techniques, and is trained on nearly 1M Twitter conversations between users and agents from over 60 brands.

Journal Article

A survey of multimodal sentiment analysis

Mohammad Soleymani, +8 more

- 01 Sep 2017 -

Image and Vision Computing

TL;DR: The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.

Posted Content

Measuring Emotions in the COVID-19 Real World Worry Dataset

Bennett Kleinberg, +2 more

- 08 Apr 2020 -

arXiv: Computation and Language

TL;DR: This paper presents the first ground-truth dataset of emotional responses to COVID-19, in which participants indicated their emotions and expressed them in text, and suggests that emotional responses correlate with linguistic measures.

Proceedings Article

Gender and Representation Bias in GPT-3 Generated Stories

Li Lucy, +1 more

TL;DR: The authors found that stories generated by GPT-3 exhibit many known gender stereotypes, with feminine characters more likely to be associated with family and appearance, and described as less powerful than masculine characters, even when associated with high power verbs in a prompt.

Journal Article

Linguistic Signals under Misinformation and Fact-Checking: Evidence from User Comments on Social Media

Shan Jiang, +1 more

TL;DR: It is found that linguistic signals in user comments vary significantly with the veracity of posts, e.g., more misinformation-awareness signals and extensive emoji and swear word usage with falser posts, and that these signals can help to detect misinformation.

References

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

Journal Article

WordNet: a lexical database for English

George A. Miller

- 01 Nov 1995 -

Communications of The ACM

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.

Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher, +6 more

TL;DR: Introduces a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.

Proceedings Article

Thumbs up? Sentiment Classification using Machine Learning Techniques

Bo Pang, +2 more

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.

Journal Article

The psychological meaning of words: LIWC and computerized text analysis methods

Yla R. Tausczik, +1 more

- 01 Mar 2010 -

Journal of Language and Social Psycholog...

TL;DR: The Linguistic Inquiry and Word Count (LIWC) system as discussed by the authors is a text analysis system that counts words in psychologically meaningful categories to detect meaning in a wide variety of experimental settings, including to show attentional focus, emotionality, social relationships, thinking styles and individual differences.
