Open Access Proceedings Article

A Latent Variable Model for Geographic Lexical Variation

TL;DR: A multi-level generative model that reasons jointly about latent topics and geographical regions is presented; it recovers coherent topics and their regional variants while identifying geographic areas of linguistic consistency.
Abstract
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
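As a rough illustration of the kind of multi-level generative process the abstract describes, the sketch below samples a geotagged document whose words come from region-specific variants of shared topics. This is not the authors' exact model: the sizes, Dirichlet priors, and the Gaussian-over-coordinates region assumption are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper
n_regions, n_topics, vocab_size, doc_len = 3, 4, 50, 20

# Each region is a Gaussian over (latitude, longitude)
region_means = rng.uniform(-90, 90, size=(n_regions, 2))
region_cov = np.eye(2) * 5.0

# Base topic-word distributions, plus a region-specific variant of each topic
base_topics = rng.dirichlet(np.ones(vocab_size), size=n_topics)
regional_topics = np.stack([
    rng.dirichlet(base_topics[t] * 50 + 0.1, size=n_regions)
    for t in range(n_topics)
])  # shape: (n_topics, n_regions, vocab_size)

def generate_document():
    """Sample one geotagged document from the sketched model."""
    r = rng.integers(n_regions)                        # latent region
    location = rng.multivariate_normal(region_means[r], region_cov)
    theta = rng.dirichlet(np.ones(n_topics))           # doc-topic mixture
    words = [rng.choice(vocab_size, p=regional_topics[z, r])
             for z in rng.choice(n_topics, size=doc_len, p=theta)]
    return location, words

loc, words = generate_document()
```

Inverting this process — inferring the latent region and topics from raw text — is what lets a model of this family predict an author's location.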



Citations
Journal ArticleDOI

Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

TL;DR: This is the largest study of language and personality by an order of magnitude; it finds striking variations in language with personality, gender, and age.
Proceedings Article

You Are What You Tweet: Analyzing Twitter for Public Health

TL;DR: This work applies the recently introduced Ailment Topic Aspect Model to over one and a half million health-related tweets and discovers mentions of more than a dozen ailments, including allergies, obesity, and insomnia, suggesting that Twitter has broad applicability for public health research.
Journal ArticleDOI

stm: An R Package for Structural Topic Models

TL;DR: This paper demonstrates how to use the R package stm for structural topic modeling, which allows researchers to flexibly estimate a topic model that includes document-level metadata.
Posted Content

Simplifying Graph Convolutional Networks

TL;DR: In this paper, the authors reduce the complexity of GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers, which corresponds to a fixed low-pass filter followed by a linear classifier.
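The "fixed low-pass filter followed by a linear classifier" reduction can be sketched in a few lines: repeatedly apply the symmetrically normalized adjacency matrix to the node features with no nonlinearities in between, then hand the smoothed features to any linear model. The tiny 3-node graph below is illustrative only.

```python
import numpy as np

def sgc_features(adj, features, k=2):
    """Simplified GCN propagation: apply the normalized adjacency k times
    (a fixed low-pass filter), with no nonlinearities between hops."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 A_hat D^-1/2
    x = features
    for _ in range(k):
        x = s @ x
    return x  # feed into any linear classifier, e.g. logistic regression

# Toy path graph 0 - 1 - 2 with identity features
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
feats = np.eye(3)
smoothed = sgc_features(adj, feats, k=2)
```

Because all the weight matrices collapse into the classifier, training reduces to fitting a single linear model on precomputed features.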
Proceedings ArticleDOI

Open domain event extraction from twitter

TL;DR: This work describes TwiCal, the first open-domain event-extraction and categorization system for Twitter, and presents a novel approach for discovering important event categories and classifying extracted events based on latent variable models.
References
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
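For reference, the LDA generative process that the geographic model above builds on can be written in a few lines. Hyperparameter values and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab_size, doc_len = 3, 30, 15
alpha, beta = 0.5, 0.1   # hypothetical Dirichlet hyperparameters

# Topic-word distributions: phi_k ~ Dirichlet(beta)
phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

# Generative process for one document
theta = rng.dirichlet(np.full(n_topics, alpha))   # doc-topic proportions
z = rng.choice(n_topics, size=doc_len, p=theta)   # one topic per word
w = np.array([rng.choice(vocab_size, p=phi[zi]) for zi in z])
```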
Book

Pattern Recognition and Machine Learning

TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Journal ArticleDOI

Pattern Recognition and Machine Learning

Radford M. Neal
01 Aug 2007
TL;DR: This book covers a broad range of topics for regular factorial designs, presents all of the material in a very mathematical fashion, and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Journal ArticleDOI

Regularization Paths for Generalized Linear Models via Coordinate Descent

TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
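The core update behind this method — cyclic coordinate descent with soft-thresholding — fits in a short function. This is a minimal sketch of the lasso case only, without the warm starts, covariance updates, or elastic-net penalty of the full algorithm; the data and alpha grid below are illustrative.

```python
import numpy as np

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]           # partial residual excluding j
            rho = X[:, j] @ r / n
            # Soft-threshold: each coordinate update has a closed form
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Regularization path over a decreasing alpha grid
path = [lasso_coordinate_descent(X, y, a) for a in (1.0, 0.1, 0.01)]
```

Larger alpha values zero out more coefficients; as alpha shrinks, the estimates approach the least-squares fit.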