Empath: Understanding Topic Signals in Large-Scale Text
Ethan Fast, Binbin Chen, Michael S. Bernstein
pp. 4647-4657
TL;DR: Empath is a tool that generates and validates new lexical categories on demand from a small set of seed terms. It draws connotations between words and phrases by learning a neural embedding over more than 1.8 billion words of modern fiction.
Abstract:
Human language is colored by a broad range of topics, but existing text analysis tools only focus on a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (like "bleed" and "punch" to generate the category violence). Empath draws connotations between words and phrases by deep learning a neural embedding across more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories we have generated from common topics in our web dataset, like neglect, government, and social media. We show that Empath's data-driven, human validated categories are highly correlated (r=0.906) with similar categories in LIWC.
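The category-generation step described in the abstract can be pictured as nearest-neighbor search in an embedding space. What follows is a minimal sketch under stated assumptions, not Empath's implementation. The three-dimensional `TOY_EMBEDDING` vectors are hypothetical stand-ins for the embedding Empath learns from fiction. Candidates are ranked by cosine similarity to the mean of the seed vectors, which is an assumed aggregation. The crowd-powered filter is omitted. `expand_category` and `score_text` are illustrative helper names rather than Empath's API. Scoring text as the fraction of tokens that fall in each category is one plausible way to analyze text across categories.

```python
import math
from collections import Counter

# Hypothetical toy embedding: hand-made 3-d vectors standing in for the
# neural embedding Empath learns from modern fiction (assumption).
TOY_EMBEDDING = {
    "bleed": [0.9, 0.1, 0.0],
    "punch": [0.8, 0.2, 0.1],
    "stab": [0.85, 0.15, 0.05],
    "kick": [0.75, 0.25, 0.1],
    "hug": [0.1, 0.9, 0.2],
    "smile": [0.05, 0.85, 0.3],
    "vote": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_category(seeds, embedding, k):
    """Return the seeds plus the k non-seed words closest to the seed centroid.

    Averaging the seed vectors is an assumption of this sketch.
    """
    vecs = [embedding[s] for s in seeds]
    centroid = [sum(dim) / len(vecs) for dim in zip(*vecs)]
    candidates = [w for w in embedding if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(embedding[w], centroid), reverse=True)
    # Empath would pass these candidates through a crowd-powered filter; skipped here.
    return list(seeds) + ranked[:k]


def score_text(text, categories):
    """Fraction of whitespace-separated tokens that belong to each category."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return {name: sum(counts[w] for w in words) / total for name, words in categories.items()}


violence = expand_category(["bleed", "punch"], TOY_EMBEDDING, k=2)
print(violence)  # → ['bleed', 'punch', 'stab', 'kick']
print(score_text("he would stab and kick and then smile", {"violence": violence}))  # → {'violence': 0.25}
```

With these toy vectors, the seeds "bleed" and "punch" pull in "stab" and "kick", while "hug", "smile", and "vote" rank lower. A real pipeline would send such candidates to crowd workers before accepting them into the category.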
Citations
Proceedings Article
A New Chatbot for Customer Service on Social Media
TL;DR: A new conversational system that automatically generates responses to user requests on social media, integrates state-of-the-art deep learning techniques, and is trained on nearly 1M Twitter conversations between users and agents from over 60 brands.
Journal Article
A survey of multimodal sentiment analysis
Mohammad Soleymani, David Garcia, Brendan Jou, Björn Schuller, Shih-Fu Chang, Maja Pantic
TL;DR: The thesis is that multimodal sentiment analysis holds significant untapped potential: the arrival of complementary data streams can improve on, and go beyond, text-based sentiment analysis.
Posted Content
Measuring Emotions in the COVID-19 Real World Worry Dataset
TL;DR: This paper presents the first ground-truth dataset of emotional responses to COVID-19, in which participants indicated their emotions and expressed them in text, and suggests that these emotional responses correlate with linguistic measures.
Proceedings Article
Gender and Representation Bias in GPT-3 Generated Stories
Li Lucy, David Bamman
TL;DR: The authors found that stories generated by GPT-3 exhibit many known gender stereotypes, with feminine characters more likely to be associated with family and appearance, and described as less powerful than masculine characters, even when associated with high power verbs in a prompt.
Journal Article
Linguistic Signals under Misinformation and Fact-Checking: Evidence from User Comments on Social Media
Shan Jiang, Christo Wilson
TL;DR: The authors find that linguistic signals in user comments vary significantly with the veracity of posts (e.g., more misinformation-awareness signals and more extensive emoji and swear-word usage on falser posts), and that these signals can help to detect misinformation.
References
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Journal Article
WordNet: a lexical database for English
TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Proceedings Article
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, Christopher Potts
TL;DR: This paper introduces a Sentiment Treebank with fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences, which presents new challenges for sentiment compositionality, along with the Recursive Neural Tensor Network.
Proceedings Article
Thumbs up? Sentiment Classification using Machine Learning Techniques
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Journal Article
The psychological meaning of words: LIWC and computerized text analysis methods
TL;DR: The Linguistic Inquiry and Word Count (LIWC) system is a text analysis program that counts words in psychologically meaningful categories. It can detect meaning in a wide variety of experimental settings, revealing attentional focus, emotionality, social relationships, thinking styles, and individual differences.