Book Chapter DOI

Topic Lifecycle on Social Networks: Analyzing the Effects of Semantic Continuity and Social Communities

TL;DR: In this article, a word-embedding-based approach, combined with the temporal concurrency of hashtag usage, is used to cluster different hashtags together, forming topics (semantically and temporally related groups of hashtags).
Abstract: Topic lifecycle analysis on Twitter, a branch of study that investigates Twitter topics from their birth through their lifecycle to their death, has gained substantial mainstream research attention. In the literature, topics are often treated as one of (a) hashtags (independent of other hashtags), (b) a burst of keywords in a short time span, or (c) a latent concept space captured by advanced text analysis methodologies, such as Latent Dirichlet Allocation (LDA). The first two approaches cannot recognize topics where different users use different hashtags to express the same concept (semantically related), while the third approach misses the user’s explicit intent expressed via hashtags. In our work, we use a word-embedding-based approach, together with the temporal concurrency of hashtag usage, to cluster different hashtags, thus forming topics (semantically and temporally related groups of hashtags). We present a novel analysis of topic lifecycles with respect to communities. We characterize the participation of social communities in the topic clusters, and analyze the lifecycle of topic clusters with respect to such participation. We derive first-of-their-kind insights into the complex evolution of topics over communities and time: the temporal morphing of topics over hashtags within communities, how hashtags die in some communities but morph into other hashtags in other communities (that is, it is a community-level phenomenon), and how specific communities adopt specific hashtags. Our work is fundamental in the space of modeling and understanding topic lifecycles in communities: it redefines our understanding of topic lifecycles and shows that the social boundaries of topic lifecycles are deeply intertwined with community behavior.
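To make the topic-formation step concrete, below is a minimal sketch (not the authors' implementation) of clustering hashtags by combining embedding-based semantic similarity with temporal co-occurrence of their usage. The embedding vectors, usage days, equal weighting, and cluster count are all illustrative assumptions; the clustering call assumes scikit-learn 1.2+ for the metric argument.

```python
# Sketch: form "topics" as clusters of hashtags that are semantically close
# (via word embeddings) and used in overlapping time windows.
# All embeddings and timestamps below are toy placeholders, not real data.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical per-hashtag embedding vectors (e.g. averaged word vectors of tweets).
embeddings = {
    "#demonetisation": np.array([0.9, 0.1, 0.0]),
    "#notebandi":      np.array([0.8, 0.2, 0.1]),
    "#oscars":         np.array([0.0, 0.9, 0.3]),
    "#academyawards":  np.array([0.1, 0.8, 0.4]),
}
# Hypothetical sets of days (epoch day numbers) on which each hashtag was used.
usage_days = {
    "#demonetisation": {100, 101, 102},
    "#notebandi":      {101, 102, 103},
    "#oscars":         {200, 201},
    "#academyawards":  {200, 202},
}

tags = list(embeddings)
vecs = np.stack([embeddings[t] for t in tags])
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
semantic_sim = vecs @ vecs.T  # cosine similarity between hashtag embeddings

def temporal_overlap(a, b):
    """Jaccard overlap of the days on which two hashtags appear."""
    da, db = usage_days[a], usage_days[b]
    return len(da & db) / len(da | db)

temporal_sim = np.array([[temporal_overlap(a, b) for b in tags] for a in tags])

# Combine the two signals: a hashtag pair must be close in meaning *and* in time.
combined = 0.5 * semantic_sim + 0.5 * temporal_sim
distance = 1.0 - combined

clusters = AgglomerativeClustering(
    n_clusters=2, metric="precomputed", linkage="average"
).fit_predict(distance)

for tag, label in zip(tags, clusters):
    print(label, tag)
```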
Citations
Book Chapter DOI
TL;DR: It is empirically shown that homophily grows linearly with increasing familiarity, reaches a peak, and subsequently falls, indicating that familiarity correlates with similarity up to a point, beyond which similarity occurs for other reasons.
Abstract: We perform a first-of-its-kind characterization of topical homophily - familiarity co-occurring with topic-participation similarity of user pairs - by correlating topic participation similarity and degree of familiarity of users on Twitter. We quantify the similarity of a user pair by measuring their distributions of participation in topics, wherein topics are defined as clusters of hashtags formed from semantically related user-generated content. We examine the topic participation similarity of users against different degrees of familiarity: edges, shared neighbors, and structural communities. We apply varying degrees of relaxation in identifying topics, and characterize the correlation of topical similarity with the degree of familiarity over this range of relaxation. We empirically substantiate the characteristics of topical homophily over the varying relaxation of identified topics. We empirically show that homophily grows linearly with increasing familiarity, reaches a peak, and subsequently falls, indicating that familiarity correlates with similarity up to a point, beyond which similarity occurs for other reasons.

4 citations
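As an illustration of the quantity studied in this citing work, the sketch below (hypothetical data, not the paper's code) computes the cosine similarity of users' topic-participation distributions and compares familiar (connected) pairs against the rest.

```python
# Sketch: topic-participation similarity of user pairs vs. familiarity.
# Users, topic ids, and edges below are made up for illustration only.
from collections import Counter
import numpy as np

# Hypothetical user -> list of topic ids the user posted in.
participation = {
    "alice": [0, 0, 1, 2],
    "bob":   [0, 1, 1],
    "carol": [2, 2, 3],
}
topics = sorted({t for ts in participation.values() for t in ts})

def topic_distribution(user):
    """Normalized histogram of a user's participation over topics."""
    counts = Counter(participation[user])
    vec = np.array([counts.get(t, 0) for t in topics], dtype=float)
    return vec / vec.sum()

def topical_similarity(u, v):
    """Cosine similarity of two users' topic-participation distributions."""
    a, b = topic_distribution(u), topic_distribution(v)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical follower edges standing in for "familiarity".
edges = {("alice", "bob")}
pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
familiar = [topical_similarity(u, v) for u, v in pairs if (u, v) in edges]
others = [topical_similarity(u, v) for u, v in pairs if (u, v) not in edges]
print("mean similarity, familiar pairs:", np.mean(familiar))
print("mean similarity, other pairs:   ", np.mean(others))
```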

References
Journal Article DOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations
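For a concrete sense of what LDA produces, here is a minimal sketch of fitting a two-topic model on a toy corpus with scikit-learn's implementation; the cited paper's own variational EM code is not shown here.

```python
# Sketch: fit LDA on a tiny toy corpus and inspect topics and document mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market trading shares prices",
    "election vote parliament government policy",
    "market prices inflation economy trading",
    "government election policy campaign vote",
]

# LDA works on raw term counts, not tf-idf.
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic: each row of components_ is an (unnormalized) topic-word distribution.
terms = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:4]
    print(f"topic {k}:", ", ".join(terms[i] for i in top))

# Per-document topic mixtures (the "explicit representation of a document").
print(lda.transform(X).round(2))
```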

Proceedings Article DOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.

30,558 citations
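The word-analogy behavior described above can be reproduced with pre-trained GloVe vectors; the sketch below uses gensim's downloader (it fetches the vectors over the network) rather than the original GloVe training code, and the exact neighbors returned depend on the chosen vector set.

```python
# Sketch: word-analogy queries with pre-trained GloVe vectors via gensim.
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")  # downloads pre-trained GloVe word vectors

# "king" - "man" + "woman" should land near "queen" via vector arithmetic.
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Plain nearest-neighbor similarity queries work the same way.
print(glove.most_similar("twitter", topn=5))
```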

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Journal Article DOI
TL;DR: This work proposes a heuristic method that is shown to outperform all other known community detection methods in terms of computation time, and the quality of the communities detected is very good, as measured by the so-called modularity.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks.

13,519 citations
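As a quick illustration of modularity-based (Louvain) community detection, the sketch below runs networkx's built-in implementation (networkx 2.8+) on a toy graph; it is not the paper's original code.

```python
# Sketch: Louvain community detection and modularity score on a small toy graph.
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Toy graph: two dense groups joined by a single bridge edge.
G = nx.Graph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2),   # group A
    (3, 4), (3, 5), (4, 5),   # group B
    (2, 3),                   # bridge
])

communities = louvain_communities(G, seed=0)
print("communities:", communities)
print("modularity: ", round(modularity(G, communities), 3))
```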

Journal Article DOI
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations