Showing papers by "Facebook" published in 2011


Posted Content
TL;DR: A strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality are observed, but it is shown that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure.
Abstract: We study the structure of the social graph of active Facebook users, the largest social network ever analyzed. We compute numerous features of the graph including the number of users and friendships, the degree distribution, path lengths, clustering, and mixing patterns. Our results center around three main observations. First, we characterize the global structure of the graph, determining that the social network is nearly fully connected, with 99.91% of individuals belonging to a single large connected component, and we confirm the "six degrees of separation" phenomenon on a global scale. Second, by studying the average local clustering coefficient and degeneracy of graph neighborhoods, we show that while the Facebook graph as a whole is clearly sparse, the graph neighborhoods of users contain surprisingly dense structure. Third, we characterize the assortativity patterns present in the graph by studying the basic demographic and network properties of users. We observe clear degree assortativity and characterize the extent to which "your friends have more friends than you". Furthermore, we observe a strong effect of age on friendship preferences as well as a globally modular community structure driven by nationality, but we do not find any strong gender homophily. We compare our results with those from smaller social networks and find mostly, but not entirely, agreement on common structural network characteristics.

938 citations
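
The two local measures named in this abstract, the local clustering coefficient and the "your friends have more friends than you" comparison, can be illustrated on a toy graph. The following is a minimal sketch, not the authors' code; the four-node graph and the function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def mean_friend_degree(adj, node):
    """Average degree of a node's friends."""
    neighbors = adj[node]
    return sum(len(adj[n]) for n in neighbors) / len(neighbors)

# Hypothetical undirected graph: a triangle a-b-c plus a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

avg_clustering = sum(local_clustering(adj, n) for n in adj) / len(adj)
# Count the nodes whose friends have, on average, more friends than they do.
paradox_count = sum(1 for n in adj if mean_friend_degree(adj, n) > len(adj[n]))
print(avg_clustering, paradox_count)
```

In this toy graph three of the four nodes have fewer friends than their friends do on average, because the one high-degree node appears in everyone else's friend list.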


Proceedings ArticleDOI
09 Feb 2011
TL;DR: In this article, a supervised random walk algorithm is proposed to predict the occurrence of links in Facebook social networks by combining the information from the network structure with node and edge level attributes to guide a random walk on the graph.
Abstract: Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.

929 citations
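
The idea of letting edge attributes guide a random walk can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's algorithm: the logistic strength function, the restart probability `alpha`, the single edge feature, and the fixed `weights` are all hypothetical, and the supervised step that learns the weights is omitted.

```python
import math

def edge_strength(features, weights):
    """Assumed logistic strength function of the edge features."""
    z = sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))

def walk_with_restart(edge_features, weights, source, alpha=0.15, iters=200):
    """Approximate visit probabilities of a walk that restarts at `source`."""
    nodes = sorted({n for edge in edge_features for n in edge})
    out = {n: {} for n in nodes}
    for (u, v), feats in edge_features.items():
        out[u][v] = edge_strength(feats, weights)
    p = {n: 1.0 if n == source else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += alpha  # restart mass
        for u in nodes:
            total = sum(out[u].values())
            if total == 0.0:
                nxt[source] += (1 - alpha) * p[u]  # dead end: return to source
                continue
            for v, s in out[u].items():
                # Move along each edge in proportion to its strength.
                nxt[v] += (1 - alpha) * p[u] * s / total
        p = nxt
    return p

# Hypothetical graph: s is tied strongly to a and weakly to b; c and d are candidates.
undirected = {("s", "a"): [2.0], ("s", "b"): [-2.0], ("a", "c"): [0.0], ("b", "d"): [0.0]}
edge_features = {}
for (u, v), feats in undirected.items():
    edge_features[(u, v)] = feats
    edge_features[(v, u)] = feats

scores = walk_with_restart(edge_features, weights=[1.0], source="s")
print(sorted(scores, key=scores.get, reverse=True))
```

Candidate c, reached through the strong tie, receives a higher visit probability than d. In the supervised setting described in the abstract, the weights would be learned so that nodes that later receive links score higher than those that do not.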


Proceedings ArticleDOI
07 May 2011
TL;DR: Longitudinal surveys matched to server logs from 415 Facebook users reveal that receiving messages from friends is associated with increases in bridging social capital, but that other uses are not, and using the site to passively consume news assists those with lower social fluency draw value from their connections.
Abstract: Though social network site use is often treated as a monolithic activity, in which all time is equally social and its impact the same for all users, we examine how Facebook affects social capital depending upon: (1) types of site activities, contrasting one-on-one communication, broadcasts to wider audiences, and passive consumption of social news, and (2) individual differences among users, including social communication skill and self-esteem. Longitudinal surveys matched to server logs from 415 Facebook users reveal that receiving messages from friends is associated with increases in bridging social capital, but that other uses are not. However, using the site to passively consume news assists those with lower social fluency draw value from their connections. The results inform site designers seeking to increase social connectedness and the value of those connections.

659 citations


Patent
27 May 2011
TL;DR: A method for tagging digital media is described, which includes selecting digital media, selecting a region within the digital media, associating a person or entity with the selected region, and sending a notification of the association.
Abstract: A method for tagging digital media is described. The method includes selecting digital media and selecting a region within the digital media. The method may further include associating a person or entity with the selected region and sending a notification of the association to the person or entity or to a different person or entity. The method may further include sending advertising with the notification.

506 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.
Abstract: In a social network, nodes correspond to people or other social entities, and edges correspond to social links between them. In an effort to preserve privacy, the practice of anonymization replaces names with meaningless unique identifiers. We describe a family of attacks such that even from a single anonymized copy of a social network, it is possible for an adversary to learn whether edges exist or not between specific targeted pairs of nodes.

427 citations


Proceedings ArticleDOI
12 Jun 2011
TL;DR: The reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort are described and the application's requirements for consistency, availability, partition tolerance, data model and scalability are discussed.
Abstract: Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability. We explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies. We discuss the motivations behind our design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. We offer these observations on the deployment as a model for other companies who are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.

410 citations


Patent
08 Feb 2011
TL;DR: In this paper, a method is described for tracking information about the activities of users of a social networking system while on another domain, including maintaining a profile for each of one or more users of the social networking system.
Abstract: In one embodiment, a method is described for tracking information about the activities of users of a social networking system while on another domain. The method includes maintaining a profile for each of one or more users of the social networking system, each profile identifying a connection to one or more other users of the social networking system and including information about the user. The method additionally includes receiving one or more communications from a third-party website having a different domain than the social network system, each message communicating an action taken by a user of the social networking system on the third-party website. The method additionally includes logging the actions taken on the third-party website in the social networking system, each logged action including information about the action. The method further includes correlating the logged actions with one or more advertisements presented to the one or more users on the third-party website as well as correlating the logged actions with a user of the social networking system.

375 citations


Proceedings ArticleDOI
11 Apr 2011
TL;DR: This paper presents a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system and shows the effectiveness of RCFile in satisfying the four requirements.
Abstract: MapReduce-based data warehouse systems are playing important roles of supporting big data analytics to understand quickly the dynamics of user behavior trends and their needs in typical Web service providers and social network sites (e.g., Facebook). In such a system, the data placement structure is a critical factor that can affect the warehouse performance in a fundamental way. Based on our observations and analysis of Facebook production systems, we have characterized four requirements for the data placement structure: (1) fast data loading, (2) fast query processing, (3) highly efficient storage space utilization, and (4) strong adaptivity to highly dynamic workload patterns. We have examined three commonly accepted data placement structures in conventional databases, namely row-stores, column-stores, and hybrid-stores in the context of large data analysis using MapReduce. We show that they are not very suitable for big data processing in distributed systems. In this paper, we present a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system. With intensive experiments, we show the effectiveness of RCFile in satisfying the four requirements. RCFile has been chosen in Facebook data warehouse system as the default option. It has also been adopted by Hive and Pig, the two most widely used data analysis systems developed in Facebook and Yahoo!

285 citations
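
As the name Record Columnar File suggests, the layout splits a table horizontally into row groups and stores each group column by column. The following is a toy in-memory sketch of that idea only; the function names and sample rows are hypothetical, and it does not reflect the actual RCFile on-disk format or its Hadoop implementation.

```python
def to_row_groups(rows, group_size):
    """Partition rows into row groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # zip(*chunk) transposes the chunk: one sequence per column.
        groups.append([list(col) for col in zip(*chunk)])
    return groups

def read_column(groups, col_index):
    """Read one column, touching only that column's values in each row group."""
    values = []
    for group in groups:
        values.extend(group[col_index])
    return values

# Hypothetical table of (user_id, country, score) rows.
rows = [(1, "us", 10.0), (2, "br", 20.0), (3, "in", 30.0)]
groups = to_row_groups(rows, group_size=2)
print(groups)
print(read_column(groups, 1))
```

Keeping all columns of a row inside one row group means a row can be reassembled without leaving that group, while a query that needs a single column can skip the others.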


Patent
Aaron Sittig1, Mark Zuckerberg1
27 Dec 2011
TL;DR: In this paper, a system, method, and computer program for generating a social timeline is provided, where a plurality of data items associated with at least one relationship between users associated with a social network is received, each data item having an associated time.
Abstract: A system, method, and computer program for generating a social timeline is provided. A plurality of data items associated with at least one relationship between users associated with a social network is received, each data item having an associated time. The data items are ordered according to the at least one relationship. A social timeline is generated according to the ordered data items.

263 citations


Patent
30 Jun 2011
TL;DR: A method includes receiving a first user action relating to a first topic from a first user, identifying the first topic based on the first user action, identifying one or more second posts that relate to the first topic, and transmitting to the first user the second posts or information associated with them in a structured document.
Abstract: In one embodiment, a method includes receiving a first user action relating to a first topic from a first user, identifying the first topic based on the first user action, identifying one or more second posts that relate to the first topic, and transmitting to the first user one or more of the second posts or information associated with the second posts in a structured document for display to the first user, the structured document further comprising one or more interactive elements that enable the first user to interact with the one or more second posts or to respective second users that declared the second posts.

234 citations


Patent
18 Nov 2011
TL;DR: In this paper, a social networking system provides relevant third-party content objects to users by matching user location, interests, and other social information with the content, location, and timing associated with content objects.
Abstract: A social networking system provides relevant third-party content objects to users by matching user location, interests, and other social information with the content, location, and timing associated with the content objects. Content objects are provided based on relevance scores specific to a user. Relevance scores may be calculated based on the user's previous interactions with content object notifications, or based on interests that are common between the user and his or her connections in the social network. Context search is also provided for a user, wherein a list of search results is ranked according to the relevance scores of the content objects associated with the search results. Notifications may also be priced and distributed to users based on their relevance. In this way, the system can provide notifications that are relevant to users' interests and current circumstances, increasing the likelihood that they will find content objects of interest.

Proceedings ArticleDOI
Tao Stein1, Erdong Chen1, Karan Mangla1
10 Apr 2011
TL;DR: The design of the Facebook Immune System is outlined, the challenges the system has faced and overcome, and the challenges it continues to face.
Abstract: Popular Internet sites are under attack all the time from phishers, fraudsters, and spammers. They aim to steal user information and expose users to unwanted spam. The attackers have vast resources at their disposal. They are well-funded, with full-time skilled labor, control over compromised and infected accounts, and access to global botnets. Protecting our users is a challenging adversarial learning problem with extreme scale and load requirements. Over the past several years we have built and deployed a coherent, scalable, and extensible realtime system to protect our users and the social graph. This Immune System performs realtime checks and classifications on every read and write action. As of March 2011, this is 25B checks per day, reaching 650K per second at peak. The system also generates signals for use as feedback in classifiers and other components. We believe this system has contributed to making Facebook the safest place on the Internet for people and their information. This paper outlines the design of the Facebook Immune System, the challenges we have faced and overcome, and the challenges we continue to face.
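
The two load figures quoted in this abstract can be related with simple arithmetic. The sketch below uses only the numbers from the abstract (25B checks per day, 650K per second at peak); the variable names are illustrative.

```python
checks_per_day = 25e9      # from the abstract
peak_per_second = 650e3    # from the abstract
seconds_per_day = 24 * 60 * 60

# Average rate implied by the daily total.
average_per_second = checks_per_day / seconds_per_day
# How far the peak exceeds that average.
peak_to_average = peak_per_second / average_per_second
print(round(average_per_second), round(peak_to_average, 2))
```

The daily total works out to roughly 289,000 checks per second on average, so the stated peak is a little more than twice the average rate.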

Patent
11 Aug 2011
TL;DR: A user of a social networking system requests to check in at a place near the user's current location, and the social networking system generates a list of places near the user's current location and ranks the places in the list.
Abstract: In one embodiment, a user of a social networking system requests to check in a place near the user's current location. The social networking system generates a list of places near the user's current location, ranks the places in the list of places near the user's current location by a distance between each place and the user's current location, as well as activity of the user and the user's social contacts for each place, and returns the ranked list to the user.
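
A ranking that combines distance with check-in activity could look like the sketch below. The abstract does not give a scoring formula, so the linear score, the `activity_weight` parameter, the field names, and the sample places are all assumptions made for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def rank_places(user_loc, places, activity_weight=0.1):
    """Sort places by distance, discounted by user and friend activity (assumed formula)."""
    def score(place):
        distance = haversine_km(user_loc[0], user_loc[1], place["lat"], place["lon"])
        activity = place["user_checkins"] + place["friend_checkins"]
        return distance - activity_weight * activity  # lower is better
    return sorted(places, key=score)

# Hypothetical user location and nearby places.
user_loc = (37.4847, -122.1477)
places = [
    {"name": "cafe", "lat": 37.4847, "lon": -122.1477, "user_checkins": 0, "friend_checkins": 0},
    {"name": "gym", "lat": 37.4897, "lon": -122.1477, "user_checkins": 4, "friend_checkins": 6},
    {"name": "museum", "lat": 37.5347, "lon": -122.1477, "user_checkins": 1, "friend_checkins": 1},
]
ranked_names = [p["name"] for p in rank_places(user_loc, places)]
print(ranked_names)
```

With these sample values the gym, about half a kilometre away but frequently visited by the user and the user's contacts, ranks ahead of the closer cafe that nobody has checked in to.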

Journal ArticleDOI
TL;DR: A learning framework that combines elements of the well-known PAC and mistake-bound models is introduced, designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to.
Abstract: We introduce a learning framework that combines elements of the well-known PAC and mistake-bound models. The KWIK (knows what it knows) framework was designed particularly for its utility in learning settings where active exploration can impact the training examples the learner is exposed to, as is true in reinforcement-learning and active-learning problems. We catalog several KWIK-learnable classes as well as open problems, and demonstrate their applications in experience-efficient reinforcement learning.
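
In the KWIK setting a learner either commits to a prediction or admits "I don't know" and is then shown the answer. A memorization learner for a deterministic target over a small input set is perhaps the simplest illustration. The sketch below is a toy under stated assumptions: the class name, the `target` function, and the input stream are hypothetical, and `None` stands in for "I don't know".

```python
class MemorizationKWIK:
    """Toy KWIK-style learner for a deterministic target over a finite input set."""

    def __init__(self):
        self.memory = {}
        self.dont_know_count = 0

    def predict(self, x):
        """Return the remembered label, or None to signal "I don't know"."""
        if x in self.memory:
            return self.memory[x]
        self.dont_know_count += 1
        return None

    def observe(self, x, y):
        """Record the true label revealed after an "I don't know"."""
        self.memory[x] = y

def target(x):
    """Hypothetical deterministic target function."""
    return x % 3 == 0

learner = MemorizationKWIK()
mistakes = 0
for x in [1, 2, 3, 1, 3, 2, 6, 6]:
    guess = learner.predict(x)
    if guess is None:
        learner.observe(x, target(x))
    elif guess != target(x):
        mistakes += 1
print(learner.dont_know_count, mistakes)
```

The learner never makes a wrong prediction, and the number of "I don't know" responses is bounded by the number of distinct inputs, which is the kind of guarantee the framework is built around.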

Patent
Yael Maguire1
31 May 2011
TL;DR: A wireless device includes an RF interface, logic circuitry, power circuitry, an impedance matching transformer, and a transducer that is configured to produce an audio signal based on the output analog signal of the logic circuitry.
Abstract: A wireless device includes an RF interface, logic circuitry, power circuitry, an impedance matching transformer, and a transducer. The RF interface is configured to receive an RF signal and provide an output data signal derived from the RF signal. The logic circuitry is configured to receive the output data signal and provide an output analog signal. The power circuitry is coupled to the RF interface and configured to provide DC operating power derived from the RF signal to the RF interface and the logic circuitry. The impedance matching transformer has an input coupled to the logic circuitry and an output. The transducer is coupled to the output of the impedance matching transformer and is configured to produce an audio signal based on the output analog signal.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: YSmart, a correlation-aware SQL-to-MapReduce translator that applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query, can significantly reduce redundant computations, I/O operations and network transfers compared to existing translators.
Abstract: MapReduce has become an effective approach to big data analytics in large cluster systems, where SQL-like queries play important roles to interface between users and systems. However, based on our Facebook daily operation results, certain types of queries are executed at an unacceptably low speed by Hive (a production SQL-to-MapReduce translator). In this paper, we demonstrate that existing SQL-to-MapReduce translators that operate in a one-operation-to-one-job mode and do not consider query correlations cannot generate high-performance MapReduce programs for certain queries, due to the mismatch between complex SQL structures and the simple MapReduce framework. We propose and develop a system called YSmart, a correlation-aware SQL-to-MapReduce translator. YSmart applies a set of rules to use the minimal number of MapReduce jobs to execute multiple correlated operations in a complex query. YSmart can significantly reduce redundant computations, I/O operations and network transfers compared to existing translators. We have implemented YSmart with intensive evaluation for complex queries on two Amazon EC2 clusters and one Facebook production cluster. The results show that YSmart can outperform Hive and Pig, two widely used SQL-to-MapReduce translators, by more than four times for query execution.

Proceedings ArticleDOI
30 Mar 2011
TL;DR: A new testing framework for cloud recovery is proposed: FATE (Failure Testing Service) and DESTINI (Declarative Testing Specifications).
Abstract: As the cloud era begins and failures become commonplace, failure recovery becomes a critical factor in the availability, reliability and performance of cloud services. Unfortunately, recovery problems still take place, causing downtimes, data loss, and many other problems. We propose a new testing framework for cloud recovery: FATE (Failure Testing Service) and DESTINI (Declarative Testing Specifications). With FATE, recovery is systematically tested in the face of multiple failures. With DESTINI, correct recovery is specified clearly, concisely, and precisely. We have integrated our framework to several cloud systems (e.g., HDFS [33]), explored over 40,000 failure scenarios, wrote 74 specifications, found 16 new bugs, and reproduced 51 old bugs.

Patent
07 Oct 2011
TL;DR: In this article, an automated agent, such as an instant message robot, is used to facilitate introduction of a chat participant to a small group of other chat participants in a chat room.
Abstract: An automated agent, such as an instant message robot, is used to facilitate introduction of a chat participant to a small group of other chat participants in a chat room. To do so, for example, a BOT may present a chat participant who desires to be introduced to a small group of chat participants in a chat room with a series of multiple-choice questions, identify a subset of chat participants based on responses to the multiple-choice questions, and provide introductions among the chat participants in the subset to facilitate conversation therebetween. For example, the introductions provided by the BOT may indicate areas of mutual interest among chat participants in the subset, similar responses to one or more multiple-choice questions, and/or diverse responses to one or more multiple-choice questions.

Patent
14 Apr 2011
TL;DR: In this article, a user device requests a web page from a web server of a third-party website, which is separate from a social networking system, and the user device then renders the web page with the personalized content contained in a frame and displays the rendered web page and the frame to the user.
Abstract: A user device requests a web page from a web server of a third-party website, which is separate from a social networking system. The web server from the third-party website sends a markup language document for the requested web page to the user device which includes an instruction for a browser application running on the user device to incorporate information obtained from the social networking system within the web page. Based on the instruction in the received markup language document, the user device requests personalized content from the social networking system, which generates the requested personalized content based on social information about the user. The user device then renders the web page with the personalized content contained in a frame and displays the rendered web page and the frame to the user.

Proceedings Article
14 Jun 2011
TL;DR: Results show that a deep learner did beat previously published results and reached human-level performance, and the hypothesis is that intermediate levels of representation, because they can be shared across tasks and examples from different but related distributions, can yield even more benefits.
Abstract: Recent theoretical and empirical work in statistical machine learning has demonstrated the potential of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple levels of representation. The hypothesis evaluated here is that intermediate levels of representation, because they can be shared across tasks and examples from different but related distributions, can yield even more benefits. Comparative experiments were performed on a large-scale handwritten character recognition setting with 62 classes (upper case, lower case, digits), using both a multi-task setting and perturbed examples in order to obtain out-of-distribution examples. The results agree with the hypothesis, and show that a deep learner did beat previously published results and reached human-level performance.

Patent
Blake Groves1, W. Karl Renner1
07 Jan 2011
TL;DR: Instant messaging (IM) entities may be invited to an electronic calendar event using an instant message; selecting the IM entities as invitees may include dragging and dropping their names from a buddy list of an IM application to an event from an electronic calendar application, or vice versa.
Abstract: Instant messaging (IM) entities may be invited to an electronic calendar event using an instant message. Selecting the IM entities as invitees to the event may include dragging and dropping names of the IM entities from a buddy list of an IM application to an event from an electronic calendar application, or vice versa. A method of inviting an entity to a calendar event includes providing a calendar event from a calendar application and recognizing, by the calendar application, an IM entity as an invitee to the event.

Patent
David W. Nesbitt1
28 Feb 2011
TL;DR: Techniques are described for presenting a route in a manner that emphasizes the route and provides context information.
Abstract: Techniques are provided for presenting a route in a manner that emphasizes the route and provides context information. For example, a vivid color or vivid colors may be used to display the route, and pastel colors or other desaturated colors may be used for non-route context information. This may result in a map in which the vivid colors of the route stand out over the faded style of the non-route context information to emphasize the route. In this manner, the map may both emphasize the route and provide context information for the route.

Patent
11 Jul 2011
TL;DR: An indication of a plurality of product categories is received from a buyer, along with an offer amount associated with those categories; a subset of products is selected for each category, and the buyer's offer is evaluated.
Abstract: Systems and methods are provided wherein an indication of a plurality of product categories is received, each product category being associated with a plurality of products. For example, the indication of the plurality of product categories may be received from a buyer. Buyer offer information, including an indication of an offer amount associated with the plurality of product categories, is also received. A subset of the plurality of products is selected for each of the product categories, and an indication of the selected products is provided. The buyer's offer may then be evaluated. If the buyer's offer is accepted, the selected products may be provided to the buyer in exchange for payment of the offer amount.

Patent
21 Nov 2011
TL;DR: In this paper, a geo-social networking system maintains a data store of shared space, wherein each shared space comprises one or more content objects, a location, and privacy settings.
Abstract: In one embodiment, a geo-social networking system maintains a data store of shared space, wherein each shared space comprises one or more content objects, a location, and one or more privacy settings. The geo-social networking system allows a user read-access to a shared space based on privacy settings associated with the shared space. The geo-social networking system allows a user write-access to a shared space if the user is at the location associated with the shared space.

Patent
09 Sep 2011
TL;DR: While in a locked state, a mobile electronic device analyzes sensor data to estimate whether an unlock operation is imminent and, in response to a positive determination, initializes the camera subsystem so that the camera is ready to capture a face as soon as the user directs the camera lens to his or her face.
Abstract: In one embodiment, while a mobile electronic device is in a first operation state, it receives sensor data from one or more sensors of the mobile electronic device. The mobile electronic device in a locked state analyzes the sensor data to estimate whether an unlock operation is imminent, and in response to a positive determination, initializes the camera subsystem so that the camera is ready to capture a face as soon as the user directs the camera lens to his or her face. In particular embodiments, the captured image is utilized by a facial recognition algorithm to determine whether the user is authorized to use the mobile device. In particular embodiments, the captured facial recognition image may be leveraged for use on a social network.

Patent
07 Mar 2011
TL;DR: In this paper, a geo-social networking system determines a user's current location, generates a list of places near the user's location, and ranks the list based on distance, relevancy and a configurable rule set, and automatically checks in the user at the top ranked place.
Abstract: In one embodiment, a geo-social networking system determines a user's current location, generates a list of places near the user's current location, ranks the list of places based on distance, relevancy and a configurable rule set, and automatically checks in the user at the top ranked place.

Proceedings Article
05 Jul 2011
TL;DR: A new measure for the analysis of personal networks is proposed, based on the way in which an individual divides his or her attention across contacts, to contrast people who focus a large fraction of their interactions on a small set of close friends with people who disperse their attention more widely.
Abstract: An individual's personal network — their set of social contacts — is a basic object of study in sociology. Studies of personal networks have focused on their size (the number of contacts) and their composition (in terms of categories such as kin and co-workers). Here we propose a new measure for the analysis of personal networks, based on the way in which an individual divides his or her attention across contacts. This allows us to contrast people who focus a large fraction of their interactions on a small set of close friends with people who disperse their attention more widely. Using data from Facebook, we find that this balance of attention is a relatively stable property of an individual over time, and that it displays interesting variation across both different groups of people and different modes of interaction. In particular, activities based on communication involve a much higher focus of attention than activities based simply on observation, and these two types of modalities also exhibit different forms of variation in interaction patterns both within and across groups. Finally, we contrast the amount of attention paid by individuals to their most frequent contacts with the rate of change in the identities of these contacts, providing a measure of churn for this set.
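
One simple way to quantify how a person divides attention across contacts is the share of interactions that go to the k most-contacted people. The sketch below is an illustration under that assumption and is not necessarily the exact measure defined in the paper; the function name and the sample interaction logs are hypothetical.

```python
from collections import Counter

def top_k_attention(interactions, k):
    """Fraction of all interactions directed at the k most-contacted people."""
    counts = Counter(interactions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total

# Hypothetical interaction logs: each entry names the contact being interacted with.
focused = ["ann"] * 8 + ["bob", "cat"]
dispersed = ["ann", "bob", "cat", "dan", "eve"] * 2

focused_share = top_k_attention(focused, k=1)
dispersed_share = top_k_attention(dispersed, k=1)
print(focused_share, dispersed_share)
```

Both sample users have ten interactions, but the first directs 80% of them at a single contact while the second spreads them evenly, which is the contrast the abstract describes.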

Proceedings ArticleDOI
Huan-Kai Peng1, Jiang Zhu1, Dongzhen Piao1, Rong Yan2, Ying Zhang1 
11 Dec 2011
TL;DR: This paper proposes modeling the retweet patterns using conditional random fields (CRF) with three types of user-tweet features: content influence, network influence and temporal decay factor, and demonstrates that CRF can improve prediction effectiveness by incorporating social relationships compared to the baselines that do not.
Abstract: Among the most popular micro-blogging services, Twitter recently introduced its reblogging service called retweet to allow a user to repopulate another user's content for his followers. It quickly became one of the most prominent features on Twitter and an important means for secondary content promotion. However, it remains unclear what motivates users to retweet and whether the retweeting decisions are predictable based on a user's tweeting history and social relationships. In this paper, we propose modeling the retweet patterns using conditional random fields with three types of user-tweet features: content influence, network influence and temporal decay factor. We also investigate approaches to partition the social graphs and construct the network relations for retweet prediction. Our experiments demonstrate that CRF can improve prediction effectiveness by incorporating social relationships compared to the baselines that do not.

Proceedings ArticleDOI
21 May 2011
TL;DR: This work proposes a new type of annotation, interrupt-related annotations, and generates 96,821 such annotations for the Linux kernel with little manual effort; these were used to automatically detect 9 real OS concurrency bugs (7 of which were previously unknown).
Abstract: Concurrency bugs in an operating system (OS) are detrimental as they can cause the OS to fail and affect all applications running on top of the OS. Detecting OS concurrency bugs is challenging due to the complexity of the OS synchronization, particularly with the presence of the OS specific interrupt context. Existing dynamic concurrency bug detection techniques are designed for user level applications and cannot be applied to operating systems. To detect OS concurrency bugs, we proposed a new type of annotations - interrupt related annotations - and generated 96,821 such annotations for the Linux kernel with little manual effort. These annotations have been used to automatically detect 9 real OS concurrency bugs (7 of which were previously unknown). Two of the key techniques that make the above contributions possible are: (1) using a hybrid approach to extract annotations from both code and comments written in natural language to achieve better coverage and accuracy in annotation extraction and bug detection; and (2) automatically propagating annotations to caller functions to improve annotating and bug detection. These two techniques are general and can be applied to non-OS code, code written in other programming languages such as Java, and for extracting other types of specifications.

Proceedings ArticleDOI
25 Jul 2011
TL;DR: It is shown that the throughput, response time, and power consumption of a high-core-count processor operating at a low clock rate and very low power consumption can perform well when compared to a platform using faster but fewer commodity cores.
Abstract: Scaling data centers to handle task-parallel workloads requires balancing the cost of hardware, operations, and power. Low-power, low-core-count servers reduce costs in one of these dimensions, but may require additional nodes to provide the required quality of service or increase costs by under-utilizing memory and other resources. We show that the throughput, response time, and power consumption of a high-core-count processor operating at a low clock rate and very low power consumption can perform well when compared to a platform using faster but fewer commodity cores. Specific measurements are made for a key-value store, Memcached, using a variety of systems based on three different processors: the 4-core Intel Xeon L5520, 8-core AMD Opteron 6128 HE, and 64-core Tilera TILEPro64.