scispace - formally typeset
Search or ask a question
Book ChapterDOI

Modeling Topical Information Diffusion over Microblog Networks

TL;DR: This work designs a predictive topical spreading activation model (TopSPA) that utilizes the social connection structures of users, along with their topic affinities, to model the information flow and empirically validate the model on multiple social event datasets on Twitter.
Abstract: Traditional information spread and activation models on social networks, fail to take user interests towards specific content (topics) into account. To this, we propose a predictive topical spreading activation model (TopSPA). Following cues from the well-known spreading activation (SPA) model, we design the TopSPA algorithm to include the affinity of users to given topics. TopSPA utilizes the social connection structures of users, along with their topic affinities, to model the information flow. We use topic-based skew in energy seeding and energy propagation resistance in the network to form our overall information diffusion model. We empirically validate our model on multiple social event datasets on Twitter, predicting information diffusion over the social graph with a high accuracy.
Citations
More filters
References
More filters
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Proceedings ArticleDOI
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing.We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Proceedings Article
16 May 2010
TL;DR: An in-depth comparison of three measures of influence, using a large amount of data collected from Twitter, is presented, suggesting that topological measures such as indegree alone reveals very little about the influence of a user.
Abstract: Directed links in social media could represent anything from intimate friendships to common interests, or even a passion for breaking news or celebrity gossip. Such directed links determine the flow of information and hence indicate a user's influence on others — a concept that is crucial in sociology and viral marketing. In this paper, using a large amount of data collected from Twitter, we present an in-depth comparison of three measures of influence: indegree, retweets, and mentions. Based on these measures, we investigate the dynamics of user influence across topics and time. We make several interesting observations. First, popular users who have high indegree are not necessarily influential in terms of spawning retweets or mentions. Second, most influential users can hold significant influence over a variety of topics. Third, influence is not gained spontaneously or accidentally, but through concerted effort such as limiting tweets to a single topic. We believe that these findings provide new insights for viral marketing and suggest that topological measures such as indegree alone reveals very little about the influence of a user.

3,041 citations

Proceedings ArticleDOI
09 Feb 2011
TL;DR: It is concluded that word-of-mouth diffusion can only be harnessed reliably by targeting large numbers of potential influencers, thereby capturing average effects and that predictions of which particular user or URL will generate large cascades are relatively unreliable.
Abstract: In this paper we investigate the attributes and relative influence of 1.6M Twitter users by tracking 74 million diffusion events that took place on the Twitter follower graph over a two month interval in 2009. Unsurprisingly, we find that the largest cascades tend to be generated by users who have been influential in the past and who have a large number of followers. We also find that URLs that were rated more interesting and/or elicited more positive feelings by workers on Mechanical Turk were more likely to spread. In spite of these intuitive results, however, we find that predictions of which particular user or URL will generate large cascades are relatively unreliable. We conclude, therefore, that word-of-mouth diffusion can only be harnessed reliably by targeting large numbers of potential influencers, thereby capturing average effects. Finally, we consider a family of hypothetical marketing strategies, defined by the relative cost of identifying versus compensating potential "influencers." We find that although under some circumstances, the most influential users are also the most cost-effective, under a wide range of plausible assumptions the most cost-effective performance can be realized using "ordinary influencers"---individuals who exert average or even less-than-average influence.

1,834 citations