Open Access Proceedings Article

Empath: Understanding Topic Signals in Large-Scale Text

TL;DR
Empath is a tool that can generate and validate new lexical categories on demand from a small set of seed terms, which draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction.
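The category-generation step can be illustrated with a nearest-neighbour query in an embedding space. The sketch below is only an illustration under stated assumptions: the tiny three-dimensional vectors are invented, and ranking vocabulary words by cosine similarity to the mean of the seed vectors is one plausible query strategy, not necessarily Empath's exact procedure. The crowd-powered validation step is not modelled. The names `EMBEDDING`, `cosine`, and `expand_category` are hypothetical.

```python
import math

# Hypothetical toy embedding. A real system would learn vectors from a large
# corpus; these 3-d vectors are invented purely for illustration.
EMBEDDING = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "kick": [0.75, 0.25, 0.1],
    "flower": [0.0, 0.9, 0.3],
    "garden": [0.1, 0.8, 0.4],
    "laptop": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_category(seeds, embedding, k=2):
    """Return the k non-seed words closest to the mean of the seed vectors.

    Assumption: averaging seed vectors is an illustrative choice; the
    candidates would still need human (e.g. crowd) validation afterwards.
    """
    dim = len(next(iter(embedding.values())))
    centroid = [sum(embedding[s][i] for s in seeds) / len(seeds) for i in range(dim)]
    candidates = [w for w in embedding if w not in seeds]
    candidates.sort(key=lambda w: cosine(embedding[w], centroid), reverse=True)
    return candidates[:k]


# Expand a "violence"-like category from two seed terms.
print(expand_category(["bleed", "punch"], EMBEDDING))  # ['stab', 'kick']
```

With these invented vectors, "stab" and "kick" sit closest to the seed centroid, while "flower", "garden", and "laptop" score far lower.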
Abstract
Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category "violence"). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.
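The reported agreement with LIWC is a correlation coefficient. The abstract does not say exactly which quantities were correlated, so the sketch below only shows how a Pearson r between two lists of category scores computed over the same documents could be obtained. The score values and the names `pearson_r`, `lexicon_a`, and `lexicon_b` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-document scores for one category (e.g. the share of words
# matching a "violence" word list) produced by two different lexicons.
lexicon_a = [0.012, 0.000, 0.031, 0.007, 0.020]
lexicon_b = [0.010, 0.001, 0.027, 0.009, 0.018]
print(round(pearson_r(lexicon_a, lexicon_b), 3))
```

A value near 1 means the two lexicons rank and scale documents similarly for that category. It does not mean the two word lists are identical.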


Citations
Proceedings Article

A New Chatbot for Customer Service on Social Media

TL;DR: A new conversational system that automatically generates responses to users' requests on social media. It integrates state-of-the-art deep learning techniques and is trained on nearly 1M Twitter conversations between users and agents from over 60 brands.
Journal Article

A survey of multimodal sentiment analysis

TL;DR: The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.
Posted Content

Measuring Emotions in the COVID-19 Real World Worry Dataset

TL;DR: This paper presents the first ground-truth dataset of emotional responses to COVID-19, collected by asking participants to indicate their emotions and express them in text, and suggests that these emotional responses correlate with linguistic measures.
Proceedings Article

Gender and Representation Bias in GPT-3 Generated Stories

Li Lucy et al.
TL;DR: The authors found that stories generated by GPT-3 exhibit many known gender stereotypes: feminine characters are more likely to be associated with family and appearance, and are described as less powerful than masculine characters, even when associated with high-power verbs in a prompt.
Journal Article

Linguistic Signals under Misinformation and Fact-Checking: Evidence from User Comments on Social Media

TL;DR: The authors find that linguistic signals in user comments vary significantly with the veracity of posts (e.g., falser posts attract more misinformation-awareness signals and more extensive emoji and swear-word usage), and that these signals can help detect misinformation.
References
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
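Negative sampling replaces the full softmax with a small binary task: push the true context word's score up and the scores of k sampled "noise" words down, for the objective log σ(v'_O · v_I) + Σ_i log σ(−v'_i · v_I). Noise words are drawn from the unigram distribution raised to the 3/4 power. The sketch below evaluates that loss for a single training pair in plain Python. It is not an optimized trainer, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def neg_sampling_loss(v_in, v_out_pos, v_out_negs):
    """Negated skip-gram negative-sampling objective for one (input, context) pair.

    Objective: log sigma(v'_O . v_I) + sum_i log sigma(-v'_i . v_I).
    Numerically naive: very large dot products can underflow in log().
    """
    objective = math.log(sigmoid(dot(v_out_pos, v_in)))
    for v_neg in v_out_negs:
        objective += math.log(sigmoid(-dot(v_neg, v_in)))
    return -objective


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to the 3/4 power, normalized to probabilities."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


print(neg_sampling_loss([0.5, -0.2], [0.4, 0.1], [[-0.3, 0.2], [0.1, -0.5]]))
print(noise_distribution({"the": 16, "cat": 1}))
```

The 3/4 power flattens the unigram distribution, so rare words are sampled as negatives somewhat more often than their raw frequency would suggest.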
Journal Article

WordNet: a lexical database for English

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

TL;DR: Introduces a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.
Proceedings Article

Thumbs up? Sentiment Classification using Machine Learning Techniques

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Journal Article

The psychological meaning of words: LIWC and computerized text analysis methods

TL;DR: The Linguistic Inquiry and Word Count (LIWC) system is a text analysis program that counts words in psychologically meaningful categories. It has been used to detect meaning in a wide variety of experimental settings, including attentional focus, emotionality, social relationships, thinking styles, and individual differences.
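The dictionary-counting idea behind LIWC can be sketched as follows: tokenize the text, then report the percentage of tokens that match each category's word list, where an entry ending in "*" matches any word with that prefix. The two-category dictionary below is invented for illustration and is not LIWC's actual word list. `CATEGORIES`, `matches`, and `category_percentages` are hypothetical names.

```python
import re

# Hypothetical mini-dictionary in the spirit of LIWC. A trailing "*" matches
# any word with that prefix. These are NOT the real LIWC category lists.
CATEGORIES = {
    "posemo": ["happ*", "love", "good"],
    "social": ["friend*", "talk*", "we"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def matches(word, entry):
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories):
    """Percentage of tokens in `text` that fall into each category."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in categories}
    result = {}
    for name, entries in categories.items():
        hits = sum(1 for t in tokens if any(matches(t, e) for e in entries))
        result[name] = 100.0 * hits / len(tokens)
    return result


print(category_percentages("We love talking with happy friends", CATEGORIES))
```

In this example, "love" and "happy" count toward the first category, and "we", "talking", and "friends" count toward the second. Each category score is then divided by the six tokens in the sentence.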