Open Access Proceedings Article

A Latent Variable Model for Geographic Lexical Variation

TL;DR: A multi-level generative model that reasons jointly about latent topics and geographical regions is presented; it recovers coherent topics and their regional variants while identifying geographic areas of linguistic consistency.
Abstract
The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.
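As a rough illustration of the kind of multi-level generative process the abstract describes, the sketch below samples a geotagged document whose words come from region-specific variants of shared topics. This is not the authors' exact model: the sizes, Dirichlet priors, and the Gaussian-over-coordinates region assumption are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, not taken from the paper
n_regions, n_topics, vocab_size, doc_len = 3, 4, 50, 20

# Each region is a Gaussian over (latitude, longitude)
region_means = rng.uniform(-90, 90, size=(n_regions, 2))
region_cov = np.eye(2) * 5.0

# Base topic-word distributions, plus a region-specific variant of each topic
base_topics = rng.dirichlet(np.ones(vocab_size), size=n_topics)
regional_topics = np.stack([
    rng.dirichlet(base_topics[t] * 50 + 0.1, size=n_regions)
    for t in range(n_topics)
])  # shape: (n_topics, n_regions, vocab_size)

def generate_document():
    """Sample one geotagged document from the sketched model."""
    r = rng.integers(n_regions)                        # latent region
    location = rng.multivariate_normal(region_means[r], region_cov)
    theta = rng.dirichlet(np.ones(n_topics))           # doc-topic mixture
    words = [rng.choice(vocab_size, p=regional_topics[z, r])
             for z in rng.choice(n_topics, size=doc_len, p=theta)]
    return location, words

loc, words = generate_document()
```

Inverting this process — inferring the latent region and topics from raw text — is what lets a model of this family predict an author's location.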



Citations
Journal ArticleDOI

Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach

TL;DR: This is the largest study of language and personality by an order of magnitude; it finds striking variations in language with personality, gender, and age.
Proceedings Article

You Are What You Tweet: Analyzing Twitter for Public Health

TL;DR: This work applies the recently introduced Ailment Topic Aspect Model to over one and a half million health-related tweets and discovers mentions of more than a dozen ailments, including allergies, obesity, and insomnia, suggesting that Twitter has broad applicability for public health research.
Journal ArticleDOI

stm: An R Package for Structural Topic Models

TL;DR: This paper demonstrates how to use the R package stm for structural topic modeling, which allows researchers to flexibly estimate a topic model that includes document-level metadata.
Posted Content

Simplifying Graph Convolutional Networks

TL;DR: In this paper, the authors reduce the complexity of GCN by successively removing nonlinearities and collapsing weight matrices between consecutive layers, which corresponds to a fixed low-pass filter followed by a linear classifier.
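The "fixed low-pass filter followed by a linear classifier" reduction can be sketched in a few lines: repeatedly apply the symmetrically normalized adjacency matrix to the node features with no nonlinearities in between, then hand the smoothed features to any linear model. The tiny 3-node graph below is illustrative only.

```python
import numpy as np

def sgc_features(adj, features, k=2):
    """Simplified GCN propagation: apply the normalized adjacency k times
    (a fixed low-pass filter), with no nonlinearities between hops."""
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    s = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]   # D^-1/2 A_hat D^-1/2
    x = features
    for _ in range(k):
        x = s @ x
    return x  # feed into any linear classifier, e.g. logistic regression

# Toy path graph 0 - 1 - 2 with identity features
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
feats = np.eye(3)
smoothed = sgc_features(adj, feats, k=2)
```

Because all the weight matrices collapse into the classifier, training reduces to fitting a single linear model on precomputed features.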
Proceedings ArticleDOI

Open domain event extraction from twitter

TL;DR: This work describes TwiCal, the first open-domain event-extraction and categorization system for Twitter, and presents a novel approach for discovering important event categories and classifying extracted events based on latent variable models.
References
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
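For reference, the LDA generative process that the geographic model above builds on can be written in a few lines. Hyperparameter values and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_topics, vocab_size, doc_len = 3, 30, 15
alpha, beta = 0.5, 0.1   # hypothetical Dirichlet hyperparameters

# Topic-word distributions: phi_k ~ Dirichlet(beta)
phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)

# Generative process for one document
theta = rng.dirichlet(np.full(n_topics, alpha))   # doc-topic proportions
z = rng.choice(n_topics, size=doc_len, p=theta)   # one topic per word
w = np.array([rng.choice(vocab_size, p=phi[zi]) for zi in z])
```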
Book

Pattern Recognition and Machine Learning

TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Journal ArticleDOI

Pattern Recognition and Machine Learning

Radford M. Neal
01 Aug 2007
TL;DR: This book covers a broad range of topics for regular factorial designs, presents all of the material in a very mathematical fashion, and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Journal ArticleDOI

Regularization Paths for Generalized Linear Models via Coordinate Descent

TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
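The core update behind this method — cyclic coordinate descent with soft-thresholding — fits in a short function. This is a minimal sketch of the lasso case only, without the warm starts, covariance updates, or elastic-net penalty of the full algorithm; the data and alpha grid below are illustrative.

```python
import numpy as np

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/2n)||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iters):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]           # partial residual excluding j
            rho = X[:, j] @ r / n
            # Soft-threshold: each coordinate update has a closed form
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_w + 0.01 * rng.normal(size=100)

# Regularization path over a decreasing alpha grid
path = [lasso_coordinate_descent(X, y, a) for a in (1.0, 0.1, 0.01)]
```

Larger alpha values zero out more coefficients; as alpha shrinks, the estimates approach the least-squares fit.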