Book Chapter

A socio-temporal hashtag recommendation system for twitter

TL;DR: This work modifies an existing state-of-the-art natural language processing (NLP) technique and integrates socio-temporal techniques throughout the process to model a novel hashtag recommendation system.
Abstract: Hashtag recommendation systems on Twitter have largely focused on analyzing the text content of tweets. In this work, we modify an existing state-of-the-art natural language processing (NLP) technique and integrate socio-temporal techniques deeply into the overall process to model a novel hashtag recommendation system. The social aspect of the system makes use of the hashtags generated by individuals familiar to the user, as well as the hashtags the user has used in the past (the user's profile). The temporal aspect ages the tweets, ensuring that more recent hashtags receive higher weights during recommendation. The NLP technique is modified to produce an initial score based on the text embedding of hashtags; a socio-temporal function and a burst function are then applied to generate a final relevance score for each hashtag with respect to a given tweet. The hashtags with the top-K relevance scores are recommended to the user.
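The scoring pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the exponential half-life ageing in `temporal_weight`, the per-usage social weights, the multiplicative way the text, socio-temporal, and burst scores are combined, and all function and parameter names. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# One past usage of a hashtag: (social_weight, age_hours).
# social_weight is a hypothetical closeness value, e.g. 1.0 for the user's
# own profile and something smaller for a familiar user.
Usage = Tuple[float, float]


def temporal_weight(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Assumed ageing rule: the weight halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def socio_temporal_score(usages: List[Usage], half_life_hours: float = 24.0) -> float:
    """Sum of social weights, each discounted by the age of the usage."""
    return sum(w * temporal_weight(age, half_life_hours) for w, age in usages)


def recommend(
    text_scores: Dict[str, float],   # initial NLP/embedding score per hashtag
    usages: Dict[str, List[Usage]],  # past usages by the user and familiar users
    burst: Dict[str, float],         # burst score per hashtag (0.0 if absent)
    k: int = 3,
) -> List[str]:
    """Return the k hashtags with the highest final relevance score.

    The multiplicative combination below is an assumption, not the paper's formula.
    """
    final = {}
    for tag, base in text_scores.items():
        st = socio_temporal_score(usages.get(tag, []))
        final[tag] = base * (1.0 + st) * (1.0 + burst.get(tag, 0.0))
    return sorted(final, key=final.get, reverse=True)[:k]


if __name__ == "__main__":
    text_scores = {"#ai": 0.5, "#ml": 0.5, "#nlp": 0.4}
    usages = {"#ai": [(1.0, 48.0)], "#ml": [(1.0, 0.0)]}
    print(recommend(text_scores, usages, burst={}, k=2))
```

In this toy setup, two hashtags with equal text scores are separated only by how recently they were used, which is the behaviour the temporal aspect is meant to provide.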
References
Proceedings Article
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing. We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet. To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Proceedings Article
09 Feb 2011
TL;DR: This work develops the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with the authors' similarity measure and presents a simple model that reliably predicts the shape of attention by using information about only a small number of participants.
Abstract: Online content exhibits rich temporal dynamics, and diverse realtime user generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention remain largely unexplored. We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.

1,041 citations

Proceedings Article
06 Jun 2010
TL;DR: TwitterMonitor, a system that performs trend detection over the Twitter stream and provides meaningful analytics that synthesize an accurate description of each topic on Twitter in real time, is presented.
Abstract: We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e., 'trends') on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic. Users interact with the system by ordering the identified trends using different criteria and submitting their own description for each trend. We discuss the motivation for trend detection over social media streams and the challenges that lie therein. We then describe our approach to trend detection, as well as the architecture of TwitterMonitor. Finally, we lay out our demonstration scenario.

942 citations

Proceedings Article
12 Aug 2012
TL;DR: A model in which information can reach a node via the links of the social network or through the influence of external sources is presented and an efficient model parameter fitting technique is developed and applied to the emergence of URL mentions in the Twitter network.
Abstract: Social networks play a fundamental role in the diffusion of information. However, there are two different ways in which information reaches a person in a network. Information reaches us through connections in our social networks, as well as through the influence of external out-of-network sources, like the mainstream media. While most present models of information adoption in networks assume information only passes from node to node via the edges of the underlying network, the recent availability of massive online social media data allows us to study this process in more detail. We present a model in which information can reach a node via the links of the social network or through the influence of external sources. We then develop an efficient model parameter fitting technique and apply the model to the emergence of URL mentions in the Twitter network. Using a complete one month trace of Twitter we study how information reaches the nodes of the network. We quantify the external influences over time and describe how these influences affect the information adoption. We discover that the information tends to "jump" across the network, which can only be explained as an effect of an unobservable external influence on the network. We find that only about 71% of the information volume in Twitter can be attributed to network diffusion, and the remaining 29% is due to external events and factors outside the network.

515 citations

Proceedings Article
01 Oct 2014
TL;DR: A convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal that outperforms a number of baselines on a document recommendation task and is useful for other tasks as well.
Abstract: We describe a convolutional neural network that learns feature representations for short textual posts using hashtags as a supervised signal. The proposed approach is trained on up to 5.5 billion words predicting 100,000 possible hashtags. As well as strong performance on the hashtag prediction task itself, we show that its learned representation of text (ignoring the hashtag labels) is useful for other tasks as well. To that end, we present results on a document recommendation task, where it also outperforms a number of baselines.

199 citations