Book Chapter

A Semantic Continuity Based Analysis of Topic Lifecycle on Social Networks

TL;DR: This paper jointly uses the temporal concurrency of the hashtags in given tweets and the latent concept space addressed by the tweet content to identify groups of hashtags that represent a concept space (a “topic”) addressed by many tweets.
Abstract: Analyzing the lifecycle of topics present in user-generated text content has emerged as a mainstream subject of social network research. The literature currently identifies topics on Twitter, a prominent online social network, in one of three ways: as individual hashtags, as a burst of keywords within a short span of time, or as latent concept spaces obtained from sophisticated text analysis techniques such as Latent Dirichlet Allocation (LDA). The first two approaches fail to recognize that topics are not restricted to individual hashtags and are likely to span (semantically related) keywords, while the third does not capture the topics users intend to express via hashtags. In the current paper, we propose a novel methodology that addresses these shortcomings. We jointly use the temporal concurrency of the hashtags contained in given tweets and the latent concept space addressed by the tweet content to identify groups of hashtags that represent a concept space (a “topic”) addressed by many tweets. A given topic is thus represented by different sets of representative hashtags at different times: the usage rates of the hashtags change, so that some hashtags gain prominence over others over time. Unlike the literature, where the lifecycle analysis of a topic typically consists of analyzing a single hashtag, we analyze and characterize the lifecycle of a topic as a combination of multiple semantically and temporally related hashtags. We derive novel insights about the lifecycle of topics: their inception and continuity over time (expressed through different hashtags), and how topics morph from one set of hashtags to another before eventually dying down.
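The joint use of temporal concurrency and concept-space similarity described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each hashtag is summarized by the set of time windows in which it appears and by a topic-distribution vector (for example, averaged LDA topic proportions of its tweets), and it links two hashtags when both the Jaccard overlap of their active windows and the cosine similarity of their vectors reach hypothetical thresholds. Groups are the connected components of the resulting graph; `group_hashtags` and its parameters are illustrative names.

```python
import math
from itertools import combinations


def jaccard(a, b):
    """Overlap of two sets of active time windows (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Cosine similarity of two topic-distribution vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def group_hashtags(windows, topics, t_time=0.3, t_topic=0.8):
    """Group hashtags that are both temporally concurrent and semantically close.

    windows: dict hashtag -> set of time-window ids in which it was used
    topics:  dict hashtag -> list of topic proportions (same length for all)
    t_time, t_topic: hypothetical thresholds, not taken from the paper
    """
    parent = {h: h for h in windows}

    def find(h):
        # Union-find lookup with path halving.
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    for a, b in combinations(sorted(windows), 2):
        concurrent = jaccard(windows[a], windows[b]) >= t_time
        similar = cosine(topics[a], topics[b]) >= t_topic
        if concurrent and similar:
            parent[find(a)] = find(b)

    groups = {}
    for h in windows:
        groups.setdefault(find(h), set()).add(h)
    # Sorted lists, largest group first, for deterministic output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

For example, a hashtag active in windows {1, 2, 3} and another active in {2, 3, 4} (Jaccard 0.5) with similar topic vectors end up in one group, while a hashtag with a similar topic vector but disjoint active windows stays on its own.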
Citations
Journal Article
TL;DR: This work jointly models the hashtags present and the semantic concepts embedded in the content, which in turn helps identify hashtag groups that define a “topic” (a concept space) used by a large number of tweets.
Abstract: Topic lifecycle analysis on social networks aims to analyze and track how topics are born from user-generated content and how they evolve. Twitter researchers have no agreed-upon definition of topics; topics on Twitter are typically derived in the form of (a) frequently used hashtags, (b) keywords showing sudden trends of large occurrence in a short span of time (“bursty keywords”), or (c) concepts latent within the tweets that are grouped using variations of semantic clustering techniques. In the current paper, we jointly model the hashtags present and the semantic concepts embedded in the content, which in turn helps us identify hashtag groups that define a “topic” (a concept space) used by a large number of tweets. We observe that, within a given cluster, different hashtags are more prominent than the others at different times. We further observe that the participation and influence levels of different users play important roles in determining which hashtag is more prominent than the others at a given time. We thus observe that topics often morph from one into another (via the morphing of the dominant hashtags representing the same semantic concept space) rather than becoming extinct outright, which is a novel insight about topic lifecycles. We further present novel observations about the role of users in determining the lifecycle of discussion topics on Twitter. We infer that topic lifecycles are governed by user interests rather than by user influence, which is a key observation of our work.
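The observation that different hashtags in a cluster dominate at different times, so that a topic morphs rather than dies, can be made concrete with a small sketch. It assumes per-window usage counts for the hashtags of one cluster (a hypothetical data layout; the function names are also illustrative), picks the most-used hashtag in each window, and reports the windows where the dominant hashtag changes.

```python
def dominant_hashtags(counts):
    """Return the most-used hashtag per time window.

    counts: list of dicts, one per consecutive time window, mapping
            hashtag -> number of tweets using it in that window
            (hypothetical layout, not taken from the paper).
    Ties are broken alphabetically; windows with no usage yield None.
    """
    result = []
    for window in counts:
        active = {h: c for h, c in window.items() if c > 0}
        if not active:
            result.append(None)
        else:
            # Highest count first, then alphabetical order as a tie-breaker.
            result.append(min(active, key=lambda h: (-active[h], h)))
    return result


def morph_points(counts):
    """List (window_index, old_hashtag, new_hashtag) where dominance shifts."""
    shifts = []
    previous = None
    for i, h in enumerate(dominant_hashtags(counts)):
        if h is not None and previous is not None and h != previous:
            shifts.append((i, previous, h))
        if h is not None:
            # Silent windows do not reset the last dominant hashtag.
            previous = h
    return shifts
```

Carrying the last dominant hashtag across silent windows is a design choice: it treats a temporary lull as continuity of the same topic rather than a death and rebirth.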

Cites background or results from this book chapter:

  • ...This approach is different from our earlier work [4], where the artifact used to perform hashtag clustering was Latent Dirichlet allocation (LDA) [5]....


  • ...We presented a preliminary version of our findings in an initial report [4]; however, the definition of topics used LDA and not the concept of word-embedding, as well as the influence of users in the lifecycle of topics, has not been studied in the literature....


References
Journal Article
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations


This book chapter cites the reference above for background:

  • ...And in the third approach, the latent semantic concepts of given tweets—often identified with sophisticated text-to-topic assignment techniques such as Latent Dirichlet Allocation (LDA) [3]—are treated as topics, and the tweets that address these spaces are said to belong to these topics....


  • ...Note that, LDA [3] is traditionally modeled as a joint distribution as follows:...

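The excerpt above breaks off before its formula. For reference, the standard joint distribution from the LDA paper in this entry, for one document of $N$ words with topic proportions $\theta$, topic assignments $\mathbf{z}$ and words $\mathbf{w}$, is given below; this is the textbook form and not necessarily the exact notation the chapter uses.

$$
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)
$$

Here $\alpha$ is the parameter of the Dirichlet prior on $\theta$, and $\beta$ holds the per-topic word probabilities.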

Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations
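The generative story in this abstract (Dirichlet-distributed topic proportions per document, then a topic and a word drawn for each position) can be sketched with Python's standard library. The Dirichlet draw uses the standard construction from normalized Gamma samples; the vocabulary, topic-word table and prior in the usage example are made up for illustration. This shows only the model's sampling process, not the variational inference the paper describes.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, beta, vocab, rng):
    """Generate one document under the LDA generative process.

    alpha: Dirichlet prior over K topics (length K)
    beta:  K rows, each a probability distribution over vocab
    Returns (theta, topic_assignments, words).
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topics, words = [], []
    for _ in range(n_words):
        # Draw a topic for this position, then a word from that topic.
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        w = rng.choices(vocab, weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words
```

For example, `generate_document(10, [0.5, 0.5], beta, ["goal", "match", "vote", "poll"], random.Random(0))` with one topic putting all its mass on the first two words and the other on the last two yields ten words, each consistent with its sampled topic.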

Journal Article
TL;DR: An interval-based temporal logic is introduced, together with a computationally effective reasoning algorithm based on constraint propagation, which is notable in offering a delicate balance between space and time.
Abstract: An interval-based temporal logic is introduced, together with a computationally effective reasoning algorithm based on constraint propagation. This system is notable in offering a delicate balance between

7,466 citations


This book chapter cites the reference above for background:

  • ...Allen [1] created an exhaustive list of temporal relationships that can exist between a pair of time periods....

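Allen's list of relationships between two time periods can be made concrete by classifying a pair of intervals from their endpoints. The sketch below assumes intervals given as (start, end) pairs with start < end and uses the conventional names for Allen's 13 relations. In a setting like the chapter's, the active period of each hashtag could plausibly serve as such an interval, but that mapping is an assumption, not the chapter's code.

```python
def allen_relation(a, b):
    """Return Allen's relation of interval a to interval b.

    a, b: (start, end) tuples; assumes start < end for both intervals.
    """
    (a1, a2), (b1, b2) = a, b
    # Disjoint or touching intervals.
    if a2 < b1:
        return "before"
    if b2 < a1:
        return "after"
    if a2 == b1:
        return "meets"
    if b2 == a1:
        return "met-by"
    # From here on the intervals share a stretch of positive length.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    # Remaining case: partial overlap, distinguished by which starts first.
    return "overlaps" if a1 < b1 else "overlapped-by"
```

Each relation except "equals" has an inverse obtained by swapping the arguments, which is why the function returns 13 distinct names.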

Journal Article
TL;DR: In this paper, an interval-based temporal logic is introduced, together with a computationally effective reasoning algorithm based on constraint propagation, which is notable in offering a delicate balance between time and space.
Abstract: An interval-based temporal logic is introduced, together with a computationally effective reasoning algorithm based on constraint propagation. This system is notable in offering a delicate balance between

7,362 citations

Proceedings Article
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found the two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap between influence inferred from the number of followers and that inferred from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets, and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet reaches an average of 1,000 users regardless of the number of followers of the original tweet's author. Once retweeted, a tweet gets retweeted almost instantly on the next hops, signifying fast diffusion of information after the first retweet. To the best of our knowledge, this work is the first quantitative study of the entire Twittersphere and of information diffusion on it.

6,108 citations
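The comparison of ranking users by follower count and by PageRank can be illustrated on a toy follower graph. The sketch below is a plain power-iteration PageRank in which an edge u → v means u follows v, so rank flows from followers to the accounts they follow; users who follow nobody spread their rank uniformly. The damping factor 0.85 is a conventional default rather than a value from the paper, and follow lists are assumed to contain no duplicates.

```python
def pagerank(follows, damping=0.85, iters=100, tol=1e-12):
    """Power-iteration PageRank over a follower graph.

    follows: dict user -> list of users they follow (edge u -> v),
             assumed to contain no duplicate entries.
    damping: conventional default, not a value from the paper.
    """
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        # Rank held by users who follow nobody is spread uniformly.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u, outs in follows.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        converged = sum(abs(new[u] - rank[u]) for u in users) < tol
        rank = new
        if converged:
            break
    return rank


def follower_counts(follows):
    """Number of distinct followers per user."""
    counts = {u: 0 for u in follows}
    for outs in follows.values():
        for v in set(outs):
            counts[v] = counts.get(v, 0) + 1
    return counts
```

On a four-user graph where `carol` has three followers and `alice` two, both rankings place `carol` first and `alice` second, mirroring (on a toy scale) the similarity the abstract reports between the two rankings.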