scispace - formally typeset
Search or ask a question
Journal ArticleDOI

Understanding latent interactions in online social networks

TL;DR: A detailed measurement study on Renren, the largest OSN in China, is performed and it is found that latent interactions are much more prevalent and frequent than active events, are nonreciprocal in nature, and that profile popularity is correlated with page views of content rather than with quantity of content updates.
Abstract: Popular online social networks (OSNs) like Facebook and Twitter are changing the way users communicate and interact with the Internet. A deep understanding of user interactions in OSNs can provide important insights into questions of human social behavior and into the design of social platforms and applications. However, recent studies have shown that a majority of user interactions on OSNs are latent interactions, that is, passive actions, such as profile browsing, that cannot be observed by traditional measurement techniques.In this article, we seek a deeper understanding of both active and latent user interactions in OSNs. For quantifiable data on latent user interactions, we perform a detailed measurement study on Renren, the largest OSN in China with more than 220 million users to date. All friendship links in Renren are public, allowing us to exhaustively crawl a connected graph component of 42 million users and 1.66 billion social links in 2009. Renren also keeps detailed, publicly viewable visitor logs for each user profile. We capture detailed histories of profile visits over a period of 90 days for users in the Peking University Renren network and use statistics of profile visits to study issues of user profile popularity, reciprocity of profile visits, and the impact of content updates on user popularity. We find that latent interactions are much more prevalent and frequent than active events, are nonreciprocal in nature, and that profile popularity is correlated with page views of content rather than with quantity of content updates. Finally, we construct latent interaction graphs as models of user browsing behavior and compare their structural properties, evolution, community structure, and mixing times against those of both active interaction graphs and social graphs.
Citations
More filters
Posted Content
TL;DR: A strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality are observed, but it is shown that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure.
Abstract: We study the structure of the social graph of active Facebook users, the largest social network ever analyzed. We compute numerous features of the graph including the number of users and friendships, the degree distribution, path lengths, clustering, and mixing patterns. Our results center around three main observations. First, we characterize the global structure of the graph, determining that the social network is nearly fully connected, with 99.91% of individuals belonging to a single large connected component, and we confirm the "six degrees of separation" phenomenon on a global scale. Second, by studying the average local clustering coefficient and degeneracy of graph neighborhoods, we show that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure. Third, we characterize the assortativity patterns present in the graph by studying the basic demographic and network properties of users. We observe clear degree assortativity and characterize the extent to which "your friends have more friends than you". Furthermore, we observe a strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality, but we do not find any strong gender homophily. We compare our results with those from smaller social networks and find mostly, but not entirely, agreement on common structural network characteristics.

938 citations


Cites background from "Understanding latent interactions i..."

  • ...Two recent studies of Renren [15] and MSN messenger [14] included 42 million and 180 million individuals respectively....

    [...]

Proceedings ArticleDOI
01 Nov 2010
TL;DR: This paper presents an initial study to quantify and characterize spam campaigns launched using accounts on online social networks, and analyzes a large anonymized dataset of asynchronous "wall" messages between Facebook users to detect and characterize coordinated spam campaigns.
Abstract: Online social networks (OSNs) are popular collaboration and communication tools for millions of users and their friends. Unfortunately, in the wrong hands, they are also effective tools for executing spam campaigns and spreading malware. Intuitively, a user is more likely to respond to a message from a Facebook friend than from a stranger, thus making social spam a more effective distribution mechanism than traditional email. In fact, existing evidence shows malicious entities are already attempting to compromise OSN account credentials to support these "high-return" spam campaigns. In this paper, we present an initial study to quantify and characterize spam campaigns launched using accounts on online social networks. We study a large anonymized dataset of asynchronous "wall" messages between Facebook users. We analyze all wall messages received by roughly 3.5 million Facebook users (more than 187 million messages in all), and use a set of automated techniques to detect and characterize coordinated spam campaigns. Our system detected roughly 200,000 malicious wall posts with embed- ded URLs, originating from more than 57,000 user accounts. We find that more than 70% of all malicious wall posts advertise phishing sites. We also study the characteristics of malicious accounts, and see that more than 97% are compromised accounts, rather than "fake" accounts created solely for the purpose of spamming. Finally, we observe that, when adjusted to the local time of the sender, spamming dominates actual wall post activity in the early morning hours, when normal users are asleep.

436 citations

Posted Content
TL;DR: These efforts to detect, characterize, and understand Sybil account activity in the Renren Online Social Network (OSN) are described and it is shown that Sybils can effectively avoid existing community-based Sybil detectors.
Abstract: Sybil accounts are fake identities created to unfairly increase the power or resources of a single malicious user. Researchers have long known about the existence of Sybil accounts in online communities such as file-sharing systems, but have not been able to perform large scale measurements to detect them or measure their activities. In this paper, we describe our efforts to detect, characterize and understand Sybil account activity in the Renren online social network (OSN). We use ground truth provided by Renren Inc. to build measurement based Sybil account detectors, and deploy them on Renren to detect over 100,000 Sybil accounts. We study these Sybil accounts, as well as an additional 560,000 Sybil accounts caught by Renren, and analyze their link creation behavior. Most interestingly, we find that contrary to prior conjecture, Sybil accounts in OSNs do not form tight-knit communities. Instead, they integrate into the social graph just like normal users. Using link creation timestamps, we verify that the large majority of links between Sybil accounts are created accidentally, unbeknownst to the attacker. Overall, only a very small portion of Sybil accounts are connected to other Sybils with social links. Our study shows that existing Sybil defenses are unlikely to succeed in today's OSNs, and we must design new techniques to effectively detect and defend against Sybil attacks.

354 citations


Cites background or methods from "Understanding latent interactions i..."

  • ...In Figure 5, the degree distribution for the entire Renren population (the Actual Degree Distribution line) is not biased: the data comes from a complete snapshot of Renren gathered in 2008 [Jiang et al. 2010]....

    [...]

  • ...Unlike Facebook, Renren pro.les display a list of the nine most recent visitors [Jiang et al. 2010]....

    [...]

  • ...We use the PKU regional network on Renren as our target social graph, since its size is reasonable (170,000 nodes) and its properties have been studied by prior work [Jiang et al. 2010]....

    [...]

  • ...Figure 5 shows the degree distribution of all 667,723 Sybil accounts compared to the degree distribution of a complete, 42 million node snapshot of the Renren social graph collected in 2008 [Jiang et al. 2010]....

    [...]

Journal ArticleDOI
TL;DR: This paper defines three types of Sybil attacks: SA-1, SA-2, and SA-3 according to the Sybil attacker's capabilities, and presents some Sybil defense schemes, including social graph-based Sybil detection (SGSD), behavior classification-basedSybil Detection (BCSD), and mobile Sybil Detection with the comprehensive comparisons.
Abstract: The emerging Internet-of-Things (IoT) are vulnerable to Sybil attacks where attackers can manipulate fake identities or abuse pseudoidentities to compromise the effectiveness of the IoT and even disseminate spam. In this paper, we survey Sybil attacks and defense schemes in IoT. Specifically, we first define three types Sybil attacks: SA-1, SA-2, and SA-3 according to the Sybil attacker’s capabilities. We then present some Sybil defense schemes, including social graph-based Sybil detection (SGSD), behavior classification-based Sybil detection (BCSD), and mobile Sybil detection with the comprehensive comparisons. Finally, we discuss the challenging research issues and future directions for Sybil defense in IoT.

308 citations


Cites background from "Understanding latent interactions i..."

  • ...A tendency is to explore trustworthiness to facilitate the Sybil detection to SA-1....

    [...]

Proceedings Article
14 Aug 2013
TL;DR: A detection approach that groups "similar" user clickstreams into behavioral clusters, by partitioning a similarity graph that captures distances between clickstream sequences, and shows that it provides very high detection accuracy on clickstream traces.
Abstract: Fake identities and Sybil accounts are pervasive in today's online communities. They are responsible for a growing number of threats, including fake product reviews, malware and spam on social networks, and astroturf political campaigns. Unfortunately, studies show that existing tools such as CAPTCHAs and graph-based Sybil detectors have not proven to be effective defenses. In this paper, we describe our work on building a practical system for detecting fake identities using server-side clickstream models. We develop a detection approach that groups "similar" user clickstreams into behavioral clusters, by partitioning a similarity graph that captures distances between clickstream sequences. We validate our clickstream models using ground-truth traces of 16,000 real and Sybil users from Renren, a large Chinese social network with 220M users. We propose a practical detection system based on these models, and show that it provides very high detection accuracy on our clickstream traces. Finally, we worked with collaborators at Renren and LinkedIn to test our prototype on their server-side data. Following positive results, both companies have expressed strong interest in further experimentation and possible internal deployment.

292 citations


Cites background from "Understanding latent interactions i..."

  • ...The above analysis shows that Sybil sessions have very different characteristics compared to normal user sessions....

    [...]

References
More filters
Journal ArticleDOI
15 Oct 1999-Science
TL;DR: A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.
Abstract: Systems as diverse as genetic networks or the World Wide Web are best described as networks with complex topology. A common property of many large networks is that the vertex connectivities follow a scale-free power-law distribution. This feature was found to be a consequence of two generic mechanisms: (i) networks expand continuously by the addition of new vertices, and (ii) new vertices attach preferentially to sites that are already well connected. A model based on these two ingredients reproduces the observed stationary scale-free distributions, which indicates that the development of large networks is governed by robust self-organizing phenomena that go beyond the particulars of the individual systems.

33,771 citations


"Understanding latent interactions i..." refers background in this paper

  • ...Figure 5 plots the complementary cumulative distribution function (CCDF) of user social degrees in the Renren network, and shows that Renren’s network structure roughly follows a powerlaw [2]....

    [...]

Journal ArticleDOI
TL;DR: A thorough exposition of community structure, or clustering, is attempted, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists.
Abstract: The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i. e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a similar role like, e. g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.

9,057 citations


"Understanding latent interactions i..." refers background in this paper

  • ...Community detection is a well-studied area, and there are many different algorithms for locating communities [Fortunato 2010]....

    [...]

  • ...Community detection is a well-studied area, and there are many different algorithms for locating communities [Fortunato 2010]....

    [...]

Journal ArticleDOI
TL;DR: This work proposes a principled statistical framework for discerning and quantifying power-law behavior in empirical data by combining maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios.
Abstract: Power-law distributions occur in many situations of scientific interest and have significant consequences for our understanding of natural and man-made phenomena. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov-Smirnov (KS) statistic and likelihood ratios. We evaluate the effectiveness of the approach with tests on synthetic data and give critical comparisons to previous approaches. We also apply the proposed methods to twenty-four real-world data sets from a range of different disciplines, each of which has been conjectured to follow a power-law distribution. In some cases we find these conjectures to be consistent with the data, while in others the power law is ruled out.

8,753 citations

Journal ArticleDOI
TL;DR: A thorough exposition of the main elements of the clustering problem can be found in this paper, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.

8,432 citations

Proceedings ArticleDOI
26 Apr 2010
TL;DR: In this paper, the authors have crawled the entire Twittersphere and found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks.
Abstract: Twitter, a microblogging service less than three years old, commands more than 41 million users as of July 2009 and is growing fast. Twitter users tweet about any topic within the 140-character limit and follow others to receive their tweets. The goal of this paper is to study the topological characteristics of Twitter and its power as a new medium of information sharing.We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks [28]. In order to identify influentials on Twitter, we have ranked users by the number of followers and by PageRank and found two rankings to be similar. Ranking by retweets differs from the previous two rankings, indicating a gap in influence inferred from the number of followers and that from the popularity of one's tweets. We have analyzed the tweets of top trending topics and reported on their temporal behavior and user participation. We have classified the trending topics based on the active period and the tweets and show that the majority (over 85%) of topics are headline news or persistent news in nature. A closer look at retweets reveals that any retweeted tweet is to reach an average of 1,000 users no matter what the number of followers is of the original tweet. Once retweeted, a tweet gets retweeted almost instantly on next hops, signifying fast diffusion of information after the 1st retweet.To the best of our knowledge this work is the first quantitative study on the entire Twittersphere and information diffusion on it.

6,108 citations