
Showing papers by "Jacob Eisenstein" published in 2010


Proceedings Article
09 Oct 2010
TL;DR: A multi-level generative model that reasons jointly about latent topics and geographical regions is presented; it recovers coherent topics and their regional variants while identifying geographic areas of linguistic consistency.
Abstract: The rapid growth of geotagged social media raises new computational possibilities for investigating geographic linguistic variation. In this paper, we present a multi-level generative model that reasons jointly about latent topics and geographical regions. High-level topics such as "sports" or "entertainment" are rendered differently in each geographic region, revealing topic-specific regional distinctions. Applied to a new dataset of geotagged microblogs, our model recovers coherent topics and their regional variants, while identifying geographic areas of linguistic consistency. The model also enables prediction of an author's geographic location from raw text, outperforming both text regression and supervised topic models.

691 citations


Proceedings Article
06 Jun 2010
TL;DR: The long-term goal is to develop joint sociolinguistic models that explain the social basis of linguistic variation by combining large linguistic corpora with explicit representations of social network structures.
Abstract: Language use is overlaid on a network of social connections, which exerts an influence on both the topics of discussion and the ways that these topics can be expressed (Halliday, 1978). In the past, efforts to understand this relationship were stymied by a lack of data, but social media offers exciting new opportunities. By combining large linguistic corpora with explicit representations of social network structures, social media provides a new window into the interaction between language and society. Our long-term goal is to develop joint sociolinguistic models that explain the social basis of linguistic variation.

33 citations


01 Jan 2010
TL;DR: A Bayesian generative model of how demographic social factors influence lexical choice is proposed and applied to a corpus of geo-tagged Twitter messages originating from mobile phones, cross-referenced against U.S. Census demographic data.
Abstract: We propose a Bayesian generative model of how demographic social factors influence lexical choice. We apply the method to a corpus of geo-tagged Twitter messages originating from mobile phones, cross-referenced against U.S. Census demographic data. Our method discovers communities jointly defined by linguistic and demographic properties.

18 citations