Book Chapter DOI

Topic Lifecycle on Social Networks: Analyzing the Effects of Semantic Continuity and Social Communities

TL;DR: In this article, a word-embedding-based approach, combined with the temporal concurrency of hashtag usage, is used to cluster different hashtags together, forming topics (semantically and temporally related groups of hashtags).
Abstract: Topic lifecycle analysis on Twitter, a branch of study that investigates Twitter topics from their birth through their lifecycle to their death, has gained substantial mainstream research attention. In the literature, topics are often treated as one of (a) hashtags (independent of other hashtags), (b) a burst of keywords in a short time span, or (c) a latent concept space captured by advanced text analysis methodologies, such as Latent Dirichlet Allocation (LDA). The first two approaches cannot recognize topics where different users use different hashtags to express the same concept (semantically related), while the third approach misses the user’s explicit intent expressed via hashtags. In our work, we use a word-embedding-based approach, together with the temporal concurrency of hashtag usage, to cluster different hashtags, thus forming topics (semantically and temporally related groups of hashtags). We present a novel analysis of topic lifecycles with respect to communities. We characterize the participation of social communities in the topic clusters, and analyze the lifecycle of topic clusters with respect to such participation. We derive first-of-their-kind insights into the complex evolution of topics over communities and time: the temporal morphing of topics over hashtags within communities, how hashtags die in some communities but morph into other hashtags in other communities (that is, it is a community-level phenomenon), and how specific communities adopt specific hashtags. Our work is fundamental in the space of modeling and understanding topic lifecycles in communities: it redefines our understanding of topic lifecycles and shows that the social boundaries of topic lifecycles are deeply intertwined with community behavior.
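To make the topic-formation step concrete, below is a minimal sketch (not the authors' implementation) of clustering hashtags by combining embedding-based semantic similarity with temporal co-occurrence of their usage. The embedding vectors, usage days, equal weighting, and cluster count are all illustrative assumptions; the clustering call assumes scikit-learn 1.2+ for the metric argument.

```python
# Sketch: form "topics" as clusters of hashtags that are semantically close
# (via word embeddings) and used in overlapping time windows.
# All embeddings and timestamps below are toy placeholders, not real data.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical per-hashtag embedding vectors (e.g. averaged word vectors of tweets).
embeddings = {
    "#demonetisation": np.array([0.9, 0.1, 0.0]),
    "#notebandi":      np.array([0.8, 0.2, 0.1]),
    "#oscars":         np.array([0.0, 0.9, 0.3]),
    "#academyawards":  np.array([0.1, 0.8, 0.4]),
}
# Hypothetical sets of days (epoch day numbers) on which each hashtag was used.
usage_days = {
    "#demonetisation": {100, 101, 102},
    "#notebandi":      {101, 102, 103},
    "#oscars":         {200, 201},
    "#academyawards":  {200, 202},
}

tags = list(embeddings)
vecs = np.stack([embeddings[t] for t in tags])
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
semantic_sim = vecs @ vecs.T  # cosine similarity between hashtag embeddings

def temporal_overlap(a, b):
    """Jaccard overlap of the days on which two hashtags appear."""
    da, db = usage_days[a], usage_days[b]
    return len(da & db) / len(da | db)

temporal_sim = np.array([[temporal_overlap(a, b) for b in tags] for a in tags])

# Combine the two signals: a hashtag pair must be close in meaning *and* in time.
combined = 0.5 * semantic_sim + 0.5 * temporal_sim
distance = 1.0 - combined

clusters = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(distance)

for tag, label in zip(tags, clusters):
    print(label, tag)
```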
Citations
Book Chapter DOI
TL;DR: It is empirically shown that homophily grows linearly with increasing familiarity, reaches a peak, and subsequently falls, indicating that familiarity correlates with similarity up to a point, beyond which similarity occurs for other reasons.
Abstract: We perform a first-of-its-kind characterization of topical homophily - familiarity co-occurring with topic-participation similarity of user pairs - by correlating topic participation similarity and degree of familiarity of users on Twitter. We quantify the similarity of a user pair by measuring their distributions of participation in topics, wherein topics are defined as clusters of hashtags formed from semantically related user-generated content. We examine the topic participation similarity of users against different degrees of familiarity: edges, shared neighbors, and structural communities. We apply varying degrees of relaxation in identifying topics, and characterize the correlation of topical similarity with the degree of familiarity over this range of relaxation. We empirically substantiate the characteristics of topical homophily over the varying relaxation of identified topics. We empirically show that homophily grows linearly with increasing familiarity, reaches a peak, and subsequently falls, indicating that familiarity correlates with similarity up to a point, beyond which similarity occurs for other reasons.

4 citations
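As an illustration of the quantity studied in this citing work, the sketch below (hypothetical data, not the paper's code) computes the cosine similarity of users' topic-participation distributions and compares familiar (connected) pairs against the rest.

```python
# Sketch: topic-participation similarity of user pairs vs. familiarity.
# Users, topic ids, and edges below are made up for illustration only.
from collections import Counter
import numpy as np

# Hypothetical user -> list of topic ids the user posted in.
participation = {
    "alice": [0, 0, 1, 2],
    "bob":   [0, 1, 1],
    "carol": [2, 2, 3],
}
topics = sorted({t for ts in participation.values() for t in ts})

def topic_distribution(user):
    """Normalized histogram of a user's participation over topics."""
    counts = Counter(participation[user])
    vec = np.array([counts.get(t, 0) for t in topics], dtype=float)
    return vec / vec.sum()

def topical_similarity(u, v):
    """Cosine similarity of two users' topic-participation distributions."""
    a, b = topic_distribution(u), topic_distribution(v)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical follower edges standing in for "familiarity".
edges = {("alice", "bob")}
pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
familiar = [topical_similarity(u, v) for u, v in pairs if (u, v) in edges]
others = [topical_similarity(u, v) for u, v in pairs if (u, v) not in edges]
print("mean similarity, familiar pairs:", np.mean(familiar))
print("mean similarity, other pairs:   ", np.mean(others))
```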

References
Journal Article DOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations
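For a concrete sense of what LDA produces, here is a minimal sketch of fitting a two-topic model on a toy corpus with scikit-learn's implementation; the cited paper's own variational EM code is not shown here.

```python
# Sketch: fit LDA on a tiny toy corpus and inspect topics and document mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market trading shares prices",
    "election vote parliament government policy",
    "market prices inflation economy trading",
    "government election policy campaign vote",
]

# LDA works on raw term counts, not tf-idf.
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic: each row of components_ is an (unnormalized) topic-word distribution.
terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))

# Per-document topic mixtures (the "explicit representation of a document").
print(lda.transform(X).round(2))
```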

Proceedings Article DOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

30,558 citations
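The word-analogy behavior described above can be reproduced with pre-trained GloVe vectors; the sketch below uses gensim's downloader (it fetches the vectors over the network) rather than the original GloVe training code, and the exact neighbors returned depend on the chosen vector set.

```python
# Sketch: word-analogy queries with pre-trained GloVe vectors via gensim.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # downloads pre-trained GloVe word vectors

# "king" - "man" + "woman" should land near "queen" via vector arithmetic.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Plain nearest-neighbor similarity queries work the same way.
print(glove.most_similar("twitter", topn=5))
```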

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Journal Article DOI
TL;DR: This work proposes a heuristic method that is shown to outperform all other known community detection methods in terms of computation time, and the quality of the communities detected is very good, as measured by the so-called modularity.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

13,519 citations
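As a quick illustration of modularity-based (Louvain) community detection, the sketch below runs networkx's built-in implementation (networkx 2.8+) on a toy graph; it is not the paper's original code.

```python
# Sketch: Louvain community detection and modularity score on a small toy graph.
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Toy graph: two dense groups joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2),   # group A
    (3, 4), (3, 5), (4, 5),   # group B
    (2, 3),                   # bridge
])

communities = louvain_communities(G, seed=0)
print("communities:", communities)
print("modularity: ", round(modularity(G, communities), 3))
```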

Journal Article DOI
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations