scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

Classification of information in a twitter stream

TL;DR: A mechanism that can be used to categorize the information flowing across the Twitter streams is proposed and used on a real dataset and the results are reported.
Abstract: Social media has become an important mode of transfer of information across the individuals. Twitter is one such social media platform where individuals and corporations share information with public. Tweeting patterns can give important information about the context. In this paper we have proposed a mechanism that can be used to categorize the information flowing across the twitter streams. We have used the proposed mechanism on a real dataset and reported our results. We also look at the velocity of retweets and its growth and decay during the life of a tweet.
References
More filters
Proceedings ArticleDOI
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing.We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations

Posted Content
TL;DR: E elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision are demonstrated.
Abstract: Commonly used evaluation measures including Recall, Precision, F-Measure and Rand Accuracy are biased and should not be used without clear understanding of the biases, and corresponding identification of chance or base case levels of the statistic. Using these measures a system that performs worse in the objective sense of Informedness, can appear to perform better under any of these commonly used measures. We discuss several concepts and measures that reflect the probability that prediction is informed versus chance. Informedness and introduce Markedness as a dual measure for the probability that prediction is marked versus chance. Finally we demonstrate elegant connections between the concepts of Informedness, Markedness, Correlation and Significance as well as their intuitive relationships with Recall and Precision, and outline the extension from the dichotomous case to the general multi-class case.

5,092 citations

Proceedings ArticleDOI
12 Aug 2007
TL;DR: It is found that people use microblogging to talk about their daily activities and to seek or share information and the user intentions associated at a community level are analyzed to show how users with similar intentions connect with each other.
Abstract: Microblogging is a new form of communication in which users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web. Twitter, a popular microblogging tool has seen a lot of growth since it launched in October, 2006. In this paper, we present our observations of the microblogging phenomena by studying the topological and geographical properties of Twitter's social network. We find that people use microblogging to talk about their daily activities and to seek or share information. Finally, we analyze the user intentions associated at a community level and show how users with similar intentions connect with each other.

3,025 citations


"Classification of information in a ..." refers background in this paper

  • ...[4] Categorizes different types of users on twitter in terms of information source, friends, information seekers etc....

    [...]

Journal ArticleDOI
TL;DR: The derivation and use of a measure of similarity between two hierarchical clusterings, Bk, is derived from the matching matrix, [mij], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters in each tree.
Abstract: This article concerns the derivation and use of a measure of similarity between two hierarchical clusterings. The measure, Bk , is derived from the matching matrix, [mij ], formed by cutting the two hierarchical trees and counting the number of matching entries in the k clusters in each tree. The mean and variance of Bk are determined under the assumption that the margins of [mij ] are fixed. Thus, Bk represents a collection of measures for k = 2, …, n – 1. (k, Bk ) plots are found to be useful in portraying the similarity of two clusterings. Bk is compared to other measures of similarity proposed respectively by Baker (1974) and Rand (1971). The use of (k, Bk ) plots for studying clustering methods is explored by a series of Monte Carlo sampling experiments. An example of the use of (k, Bk ) on real data is given.

1,376 citations


"Classification of information in a ..." refers background in this paper

  • ...834 which is close to 1 that means these clusters are closer to benchmark clusters[8]....

    [...]

Proceedings ArticleDOI
20 Aug 2010
TL;DR: It is found that, amongst content features, URLs and hashtags have strong relationships with retweetability and the number of followers and followees as well as the age of the account seem to affect retweetability, while, interestingly, thenumber of past tweets does not predict retweetability of a user's tweet.
Abstract: Retweeting is the key mechanism for information diffusion in Twitter. It emerged as a simple yet powerful way of disseminating information in the Twitter social network. Even though a lot of information is shared in Twitter, little is known yet about how and why certain information spreads more widely than others. In this paper, we examine a number of features that might affect retweetability of tweets. We gathered content and contextual features from 74M tweets and used this data set to identify factors that are significantly associated with retweet rate. We also built a predictive retweet model. We found that, amongst content features, URLs and hashtags have strong relationships with retweetability. Amongst contextual features, the number of followers and followees as well as the age of the account seem to affect retweetability, while, interestingly, the number of past tweets does not predict retweetability of a user's tweet. We believe that this research would inform the design of sensemaking and analytics tools for social media streams.

1,192 citations


"Classification of information in a ..." refers background in this paper

  • ...al.[2] conclude that there are specific factors that impact whether a particular tweet would be retweeted or not....

    [...]