Showing papers by "Facebook published in 2010"


Proceedings Article
21 Jun 2010
TL;DR: A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
Abstract: We consider the fully automated recognition of actions in uncontrolled environment. Most existing work relies on domain knowledge to construct complex handcrafted features from inputs. In addition, the environments are usually assumed to be controlled. Convolutional neural networks (CNNs) are a type of deep models that can act directly on the raw inputs, thus automating the process of feature construction. However, such models are currently limited to handle 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both spatial and temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation is obtained by combining information from all channels. We apply the developed model to recognize human actions in real-world environment, and it achieves superior performance without relying on handcrafted features.

4,087 citations


Journal ArticleDOI
TL;DR: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.
Abstract: Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure. Cassandra aims to run on top of an infrastructure of hundreds of nodes (possibly spread across different data centers). At this scale, small and large components fail continuously. The way Cassandra manages the persistent state in the face of these failures drives the reliability and scalability of the software systems relying on this service. While in many ways Cassandra resembles a database and shares many design and implementation strategies therewith, Cassandra does not support a full relational data model; instead, it provides clients with a simple data model that supports dynamic control over data layout and format. Cassandra system was designed to run on cheap commodity hardware and handle high write throughput while not sacrificing read efficiency.

2,870 citations


Posted Content
TL;DR: In this article, the authors demonstrate how to use Mechanical Turk for conducting behavioral research and lower the barrier to entry for researchers who could benefit from this platform, and illustrate the mechanics of putting a task on Mechanical Turk including recruiting subjects, executing the task, and reviewing the work submitted.
Abstract: Amazon’s Mechanical Turk is an online labor market where requesters post jobs and workers choose which jobs to do for pay. The central purpose of this paper is to demonstrate how to use this website for conducting behavioral research and lower the barrier to entry for researchers who could benefit from this platform. We describe general techniques that apply to a variety of types of research and experiments across disciplines. We begin by discussing some of the advantages of doing experiments on Mechanical Turk, such as easy access to a large, stable, and diverse subject pool, the low cost of doing experiments and faster iteration between developing theory and executing experiments. We will discuss how the behavior of workers compares to experts and to laboratory subjects. Then, we illustrate the mechanics of putting a task on Mechanical Turk including recruiting subjects, executing the task, and reviewing the work that was submitted. We also provide solutions to common problems that a researcher might face when executing their research on this platform including techniques for conducting synchronous experiments, methods to ensure high quality work, how to keep data private, and how to maintain code security.

2,755 citations


Proceedings ArticleDOI
13 Apr 2010
TL;DR: This work proposes a simple algorithm called delay scheduling, which achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness.
Abstract: As organizations start to use data-intensive cluster computing systems like Hadoop and Dryad for more applications, there is a growing need to share clusters between users. However, there is a conflict between fairness in scheduling and data locality (placing tasks on nodes that contain their input data). We illustrate this problem through our experience designing a fair scheduler for a 600-node Hadoop cluster at Facebook. To address the conflict between locality and fairness, we propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting other jobs launch tasks instead. We find that delay scheduling achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness. In addition, the simplicity of delay scheduling makes it applicable under a wide variety of scheduling policies beyond fair sharing.

1,514 citations
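
The delay scheduling idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Job` class, the `assign_slot` function, the use of a skip counter in place of wall-clock waiting, and the "fewest running tasks first" fairness order are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending: dict          # task id -> set of node names holding its input data
    running: int = 0       # number of tasks currently running
    skips: int = 0         # how many times this job has passed up a slot


def assign_slot(jobs, node, max_skips):
    """Pick a task for a free slot on `node`.

    Returns (job name, task id, is_local), or None if every job declines.
    """
    # Assumed fairness order: the job with the fewest running tasks goes first.
    for job in sorted(jobs, key=lambda j: j.running):
        if not job.pending:
            continue
        local = [t for t, nodes in job.pending.items() if node in nodes]
        if local:
            task, is_local = local[0], True
            job.skips = 0
        elif job.skips >= max_skips:
            # The job has waited long enough; accept a non-local task.
            task, is_local = next(iter(job.pending)), False
        else:
            # Delay: let a later job in the fairness order use this slot.
            job.skips += 1
            continue
        del job.pending[task]
        job.running += 1
        return job.name, task, is_local
    return None
```

A job that has no input data on the free node yields the slot up to `max_skips` times before it accepts a non-local task, which is the trade-off between fairness and locality that the abstract describes.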


Proceedings ArticleDOI
10 Apr 2010
TL;DR: It is found that directed communication is associated with greater feelings of bonding social capital and lower loneliness, but has only a modest relationship with bridging social capital, which is primarily related to overall friend network size.
Abstract: Previous research has shown a relationship between use of social networking sites and feelings of social capital. However, most studies have relied on self-reports by college students. The goals of the current study are to (1) validate the common self-report scale using empirical data from Facebook, (2) test whether previous findings generalize to older and international populations, and (3) delve into the specific activities linked to feelings of social capital and loneliness. In particular, we investigate the role of directed interaction between pairs (such as wall posts, comments, and "likes") and consumption of friends' content, including status updates, photos, and friends' conversations with other friends. We find that directed communication is associated with greater feelings of bonding social capital and lower loneliness, but has only a modest relationship with bridging social capital, which is primarily related to overall friend network size. Surprisingly, users who consume greater levels of content report reduced bridging and bonding social capital and increased loneliness. Implications for designs to support well-being are discussed.

972 citations


Proceedings ArticleDOI
01 Mar 2010
TL;DR: This paper presents Hive, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.
Abstract: The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop [1] is a popular open-source map-reduce implementation which is being used in companies like Yahoo, Facebook etc. to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this paper, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language - HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop. In addition, HiveQL enables users to plug in custom map-reduce scripts into queries. The language includes a type system with support for tables containing primitive types, collections like arrays and maps, and nested compositions of the same. The underlying IO libraries can be extended to query data in custom formats. Hive also includes a system catalog - Metastore - that contains schemas and statistics, which are useful in data exploration, query optimization and query compilation. In Facebook, the Hive warehouse contains tens of thousands of tables and stores over 700TB of data and is being used extensively for both reporting and ad-hoc analyses by more than 200 users per month.

959 citations


Posted Content
TL;DR: An algorithm based on Supervised Random Walks is developed that naturally combines the information from the network structure with node and edge level attributes and outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
Abstract: Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions are we missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.

903 citations


Proceedings ArticleDOI
26 Apr 2010
TL;DR: Using user-supplied address data and the network of associations between members of the Facebook social network, an algorithm is introduced that predicts the location of an individual from a sparse set of located users with performance that exceeds IP-based geolocation.
Abstract: Geography and social relationships are inextricably intertwined; the people we interact with on a daily basis almost always live near us. As people spend more time online, data regarding these two dimensions -- geography and social relationships -- are becoming increasingly precise, allowing us to build reliable models to describe their interaction. These models have important implications in the design of location-based services, security intrusion detection, and social media supporting local communities. Using user-supplied address data and the network of associations between members of the Facebook social network, we can directly observe and measure the relationship between geography and friendship. Using these measurements, we introduce an algorithm that predicts the location of an individual from a sparse set of located users with performance that exceeds IP-based geolocation. This algorithm is efficient and scalable, and could be run on a network containing hundreds of millions of users.

785 citations


Proceedings ArticleDOI
Doug Beaver1, Sanjeev Kumar1, Harry C. Li1, Jason Sobel1, Peter Vajgel1 
04 Oct 2010
TL;DR: This paper describes Haystack, an object storage system optimized for Facebook's Photos application, which provides a less expensive and higher performing solution than the previous approach, which leveraged network attached storage appliances over NFS.
Abstract: This paper describes Haystack, an object storage system optimized for Facebook's Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data. Users upload one billion new photos (∼60 terabytes) each week and Facebook serves over one million images per second at peak. Haystack provides a less expensive and higher performing solution than our previous approach, which leveraged network attached storage appliances over NFS. Our key observation is that this traditional design incurs an excessive number of disk operations because of metadata lookups. We carefully reduce this per photo metadata so that Haystack storage machines can perform all metadata lookups in main memory. This choice conserves disk operations for reading actual data and thus increases overall throughput.

473 citations
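
The key observation in the Haystack abstract, keeping all per-photo metadata in main memory so that each read costs a single data access, can be illustrated with a toy append-only store. This is only a sketch under assumptions: the `Volume` class, its `(offset, size)` index layout, and the use of a generic file object are hypothetical and are not taken from the paper.

```python
import io


class Volume:
    """Append-only blob store with an in-memory index (illustrative layout)."""

    def __init__(self, store):
        self.store = store   # any seekable binary file object
        self.index = {}      # photo id -> (offset, size), held in memory

    def append(self, photo_id, data):
        # New photos are always written at the end of the store.
        self.store.seek(0, io.SEEK_END)
        offset = self.store.tell()
        self.store.write(data)
        self.index[photo_id] = (offset, len(data))

    def read(self, photo_id):
        # The metadata lookup touches only memory ...
        offset, size = self.index[photo_id]
        # ... so the store is accessed once, for the photo bytes themselves.
        self.store.seek(offset)
        return self.store.read(size)
```

For experimentation the store can be an `io.BytesIO()`; a file opened in `"w+b"` mode behaves the same way.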


Proceedings ArticleDOI
06 Jun 2010
TL;DR: This paper presents how Scribe, Hadoop and Hive together form the cornerstones of the log collection, storage and analytics infrastructure at Facebook and enabled us to implement a data warehouse that stores more than 15PB of data and loads more than 60TB of new data every day.
Abstract: Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook - both engineering and non-engineering. Apart from ad hoc analysis of data and creation of business intelligence dashboards by analysts across the company, a number of Facebook's site features are also based on analyzing large data sets. These features range from simple reporting applications like Insights for the Facebook Advertisers, to more advanced kinds such as friend recommendations. In order to support this diversity of use cases on the ever increasing amount of data, a flexible infrastructure that scales up in a cost effective manner is critical. We have leveraged, authored and contributed to a number of open source technologies in order to address these requirements at Facebook. These include Scribe, Hadoop and Hive which together form the cornerstones of the log collection, storage and analytics infrastructure at Facebook. In this paper we will present how these systems have come together and enabled us to implement a data warehouse that stores more than 15PB of data (2.5PB after compression) and loads more than 60TB of new data (10TB after compression) every day. We discuss the motivations behind our design choices, the capabilities of this solution, the challenges that we face in day-to-day operations and future capabilities and improvements that we are working on.

455 citations


Patent
Chris Cheah1
17 Jun 2010
TL;DR: In this article, an information management and distribution system facilitates the controlled exchange of contact information over a network, which can support one or more of creation and design, rolodex, exchange, and update features.
Abstract: An information management and distribution system is disclosed. The information management and distribution system facilitates the controlled exchange of contact information over a network. The system can support one or more of creation and design, rolodex, exchange, and update features. In one embodiment, the information management and distribution system can include a networked server system accessible by remote user devices via the network, and at least one database maintained by the networked server system and storing content information and exchange settings of registered users.

Journal ArticleDOI
01 Sep 2010
TL;DR: A sharing framework tailored to MapReduce is proposed that transforms a batch of queries into a new batch that will be executed more efficiently, by merging jobs into groups and evaluating each group as a single query.
Abstract: Large-scale data analysis lies in the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context of cloud computing, designed to serve long running queries (jobs) which can be processed in batch mode. Taking into account that different jobs often perform similar work, there are many opportunities for sharing. In principle, sharing similar work reduces the overall amount of work, which can lead to reducing monetary charges incurred while utilizing the processing infrastructure. In this paper we propose a sharing framework tailored to MapReduce. Our framework, MRShare, transforms a batch of queries into a new batch that will be executed more efficiently, by merging jobs into groups and evaluating each group as a single query. Based on our cost model for MapReduce, we define an optimization problem and we provide a solution that derives the optimal grouping of queries. Experiments in our prototype, built on top of Hadoop, demonstrate the overall effectiveness of our approach and substantial savings.

Patent
26 Mar 2010
TL;DR: In this paper, an online social network is provided in which members control who may view their personal information and who may communicate with them by setting visibility and contactability preferences; access is denied when the measure of relatedness between two members is greater than the other member's preference.
Abstract: An online social network is provided in which members of the online social network control who may view their personal information and who may communicate with them. The members control who may view their personal information by setting a visibility preference. A member may not view another member's full personal profile if the measure of relatedness between the two is greater than the visibility preference of the other member. The members also control who may communicate with them by setting a contactability preference. A member may not communicate with another member if the measure of relatedness between the two is greater than the contactability preference of the other member.
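
The visibility rule in the abstract above can be sketched as follows. The abstract does not define the "measure of relatedness"; this example assumes it is the number of friendship hops between two members (degrees of separation), and the function names and data layout are hypothetical.

```python
from collections import deque


def degrees_of_separation(friends, src, dst):
    """Shortest number of friendship hops from src to dst, or None if unconnected."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        member, dist = queue.popleft()
        for other in friends.get(member, ()):
            if other == dst:
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


def may_view(friends, viewer, owner, visibility):
    """Deny access when relatedness exceeds the owner's visibility preference."""
    distance = degrees_of_separation(friends, viewer, owner)
    return distance is not None and distance <= visibility[owner]
```

The same check with a separate contactability preference would cover the communication rule described in the abstract.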

Patent
Srinivas P. Narayanan1, Alex Li1, Chad Eugene Little1, Namita Gupta1, Peter Xiu Deng1 
19 Apr 2010
TL;DR: In this paper, the authors define a plurality of edges that each define a connection between a corresponding pair of nodes including a first set and a second set of edges, each edge from the first set defining a relationship between a pair of user nodes and representing a social relationship between the users corresponding to the user nodes.
Abstract: In one embodiment, a system includes one or more computing systems that implement a social networking environment and are operable to access stored information including a plurality of nodes including a first set of user nodes that each correspond to a respective user and a second set of concept nodes that each correspond to a respective concept. The stored information further includes a plurality of edges that each define a connection between a corresponding pair of nodes including a first set and a second set of edges. Each edge from the first set defining a connection between a pair of user nodes and representing a social relationship between the users corresponding to the user nodes. Each edge from the second set defining a connection between a user node and a concept node and representing an interest of the user of the user node with respect to the corresponding concept node.

Patent
30 Mar 2010
TL;DR: In this article, search results are generated in response to a query, and are marked based on frequency of clicks on the search results by members of social network who are within a predetermined degree of separation from the member who submitted the query.
Abstract: Search results, including sponsored links and algorithmic search results, are generated in response to a query, and are marked based on frequency of clicks on the search results by members of social network who are within a predetermined degree of separation from the member who submitted the query. The markers are visual tags and comprise either a text string or an image.

Patent
19 Apr 2010
TL;DR: In this article, the authors present a method for maintaining access to a data store of information corresponding to nodes and edges; receiving a user-generated character string comprising one or more characters of text entered by a user in an input form as they are entered by the user; searching the stored information for matches between the user generated character string and existing nodes; determining whether or not a match between the UGC string and an existing node exists; and when it is determined that at least one match exists, generating an edge between the node corresponding to the user and the node for which the
Abstract: In one embodiment, a method includes maintaining access to a data store of information corresponding to nodes and edges; receiving a user-generated character string comprising one or more characters of text entered by a user in an input form as they are entered by the user; searching the stored information for matches between the user-generated character string and existing nodes; determining whether or not a match between the user-generated character string and an existing node exists; and when it is determined that at least one match exists, generating an edge between the node corresponding to the user and the node for which the best match is determined; and when it is determined that no match between the user-generated character string and an existing node exists, generating a new node based on the user-generated character string, and generating an edge between the node corresponding to the user and the new node.

Proceedings Article
06 Dec 2010
TL;DR: This work provides a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.
Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which "offline" data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
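
The abstract does not name the estimator it uses. One common way to evaluate a policy from logged data when the logging policy is unknown is to estimate the logging probabilities from the data and apply clipped importance weighting; the sketch below illustrates that general technique under this assumption. The function names, the frequency-based propensity estimate, and the clipping threshold `tau` are all illustrative choices.

```python
from collections import Counter


def estimate_propensities(logs):
    """Empirical frequency of each logged action per context.

    logs: list of (context, action, reward) tuples.
    """
    pair_counts = Counter((x, a) for x, a, _ in logs)
    context_counts = Counter(x for x, _, _ in logs)
    return {(x, a): n / context_counts[x] for (x, a), n in pair_counts.items()}


def offline_value(logs, policy, tau=0.05):
    """Clipped importance-weighted estimate of a policy's average reward."""
    propensity = estimate_propensities(logs)
    total = 0.0
    for x, a, r in logs:
        # Only logged events where the policy agrees with the logged action count.
        if policy(x) == a:
            # Clipping at tau bounds the weight of rarely logged actions.
            total += r / max(propensity[(x, a)], tau)
    return total / len(logs)
```

A larger `tau` lowers variance at the cost of bias, since rarely logged actions are then under-weighted.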

Patent
09 Jun 2010
TL;DR: In this paper, a social networking service encourages users to post content to a communication channel with varying levels of accessibility to other users, and users can select how content will be published and control the accessibility of uploaded content using a privacy setting for each content item.
Abstract: A social networking service encourages users to post content to a communication channel with varying levels of accessibility to other users. Users may select how content will be published and control the accessibility of uploaded content using a privacy setting for each content item that the user posts. The privacy setting defines, or identifies, the set of connections who may view the posted content item. The posted content item is placed in a particular communication channel in the social networking service, such as a newsfeed or stream, where the content item can be viewed by those who are permitted to view it according to its associated privacy setting. Varying granularities of privacy settings provide flexibility for content accessibility on a social networking service.

Patent
01 Mar 2010
TL;DR: In this article, a social CAPTCHA is presented to authenticate a member of the social network, which includes one or more challenge questions based on information available in the social networks, such as the user's activities and/or connections.
Abstract: A social CAPTCHA is presented to authenticate a member of the social network. The social CAPTCHA includes one or more challenge questions based on information available in the social network, such as the user's activities and/or connections in the social network. The social information selected for the social CAPTCHA may be determined based on affinity scores associated with the member's connections, so that the challenge question relates to information that the user is more likely to be familiar with. A degree of difficulty of challenge questions may be determined and used for selecting the CAPTCHA based on a degree of suspicion.

Patent
Ding Zhou1, Pierre Moreels1
29 Oct 2010
TL;DR: In this paper, user profile information for a user of a social networking system is inferred based on the profile information of the user's connections in the social networking system, including age, gender, education, affiliations, location, and the like.
Abstract: User profile information for a user of a social networking system is inferred based on information about user profile of the user's connections in the social networking system. The inferred user profile attributes may include age, gender, education, affiliations, location, and the like. To infer a value of a user profile attribute, the system may determine an aggregate value based on the attributes of the user's connections. A confidence score may also be associated with the inferred attribute value. The set of connections analyzed to infer a user profile attribute may depend on the attribute, the types of connections, and the interactions between the user and the connections. The inferred attribute values may be used to update the user's profile and to determine information relevant to the user to be presented to the user (e.g., targeting advertisements to the user based on the user's inferred attributes).
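
The aggregate-plus-confidence idea in the abstract above can be sketched with a simple majority vote. The abstract does not say which aggregate is used, so the vote, the confidence definition (share of connections agreeing with the winner), and the function name are assumptions for illustration.

```python
from collections import Counter


def infer_attribute(connection_values):
    """Infer an attribute from connections' values.

    Returns (value, confidence); missing values are given as None and ignored.
    """
    known = [v for v in connection_values if v is not None]
    if not known:
        return None, 0.0
    value, count = Counter(known).most_common(1)[0]
    return value, count / len(known)
```

A numeric attribute such as age would more naturally use a mean or median, and the set of connections could be filtered by connection type or interaction level as the abstract suggests.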

Proceedings ArticleDOI
22 Mar 2010
TL;DR: The P* algorithm, a best-first search method based on a novel hierarchical partition tree index and three effective heuristic evaluation functions are devised to evaluate probabilistic path queries efficiently.
Abstract: Path queries such as "finding the shortest path in travel time from my hotel to the airport" are heavily used in many applications of road networks. Currently, simple statistic aggregates such as the average travel time between two vertices are often used to answer path queries. However, such simple aggregates often cannot capture the uncertainty inherent in traffic. In this paper, we study how to take traffic uncertainty into account in answering path queries in road networks. To capture the uncertainty in traffic such as the travel time between two vertices, the weight of an edge is modeled as a random variable and is approximated by a set of samples. We propose three novel types of probabilistic path queries using basic probability principles: (1) a probabilistic path query like "what are the paths from my hotel to the airport whose travel time is at most 30 minutes with a probability of at least 90%?"; (2) a weight-threshold top-k path query like "what are the top-3 paths from my hotel to the airport with the highest probabilities to take at most 30 minutes?"; and (3) a probability-threshold top-k path query like "what are the top-3 shortest paths from my hotel to the airport whose travel time is guaranteed by a probability of at least 90%?" To evaluate probabilistic path queries efficiently, we develop three efficient probability calculation methods: an exact algorithm, a constant factor approximation method and a sampling based approach. Moreover, we devise the P* algorithm, a best-first search method based on a novel hierarchical partition tree index and three effective heuristic evaluation functions. An extensive empirical study using real road networks and synthetic data sets shows the effectiveness of the proposed path queries and the efficiency of the query evaluation methods.
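
The first query type above asks for the probability that a path's travel time stays within a limit when each edge weight is represented by a set of samples. A brute-force version of that calculation is sketched below. It assumes that edge weights are independent and that each sample is equally likely, neither of which is stated in the abstract, and it is not one of the paper's three methods.

```python
from itertools import product


def path_probability(edge_samples, limit):
    """Probability that the total travel time along a path is at most `limit`.

    edge_samples: one list of sampled travel times per edge on the path.
    Enumerates every combination of samples, so cost grows exponentially
    with path length; suitable only for short paths.
    """
    combos = list(product(*edge_samples))
    hits = sum(1 for combo in combos if sum(combo) <= limit)
    return hits / len(combos)
```

For a two-edge path with samples `[10, 20]` and `[5, 15]` and a limit of 30 minutes, three of the four combinations qualify, so the function returns 0.75; a query with a 90% probability threshold would reject this path.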

Patent
19 May 2010
TL;DR: Methods of and apparatuses for providing human interaction with a computer, including human control of three dimensional input devices, force feedback, and force input, are described in this paper.
Abstract: Methods of and apparatuses for providing human interaction with a computer, including human control of three dimensional input devices, force feedback, and force input.

Patent
31 Mar 2010
TL;DR: In this article, a social networking system facilitates a user's creation of a group of other users from among the user's connections in the user's social network, and suggests further connections based on a similarity of the suggested connections with one or more of the connections who have been added to the group.
Abstract: A social networking system facilitates a user's creation of a group of other users from among the user's connections in the user's social network. The created groups may be used, for example, to publish information to certain user-defined groups or to define privacy settings or other access rights to the user's content according to such user-defined groups. When a user adds connections to a group, the social networking system determines suggested connections that have not been added to the group, based on a similarity of the suggested connections with one or more of the connections who have been added to the group. These suggested connections are then presented to the user to facilitate the creation of the group. Both positive and negative feedback may be used to generate a useful set of suggestions, which may be updated as the user further defines the group.

Proceedings Article
16 May 2010
TL;DR: An approach to determine the ethnic breakdown of a population based solely on people's names and data provided by the U.S. Census Bureau is demonstrated to be able to predict the ethnicities of individuals as well as the ethnicity of an entire population better than natural alternatives.
Abstract: We propose an approach to determine the ethnic breakdown of a population based solely on people's names and data provided by the U.S. Census Bureau. We demonstrate that our approach is able to predict the ethnicities of individuals as well as the ethnicity of an entire population better than natural alternatives. We apply our technique to the population of U.S. Facebook users and uncover the demographic characteristics of ethnicities and how they relate. We also discover that while Facebook has always been diverse, diversity has increased over time leading to a population that today looks very similar to the overall U.S. population. We also find that different ethnic groups relate to one another in an assortative manner, and that these groups have different profiles across demographics, beliefs, and usage of site features.

Patent
James Wang1, Jennifer Burge1, Lars Backstrom1, Florin Ratiu1, Daniel Ferrante1 
16 Aug 2010
TL;DR: In this article, the system determines the likelihood that the user will connect to each candidate user if suggested to do so, and also computes the value to the social networking system if the user does connect to the candidate user.
Abstract: To suggest new connections to a user of a social networking system, the system generates a set of candidate users to whom the user has not already formed a connection. The system determines the likelihood that the user will connect to each candidate user if suggested to do so, and it also computes the value to the social networking system if the user does connect to the candidate user. Then, the system computes an expected value score for each candidate user based on the corresponding likelihood and the value. The candidate users are ranked and the suggestions are provided to the user based on the candidate users' expected value scores. The social networking system can suggest other actions to a user in addition to forming a new connection with other users.
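
The ranking step in the abstract above can be sketched as follows. The abstract says only that the score is "based on" the likelihood and the value; multiplying the two, as in a standard expected value, is an assumption, and the function name and tuple layout are hypothetical.

```python
def rank_candidates(candidates):
    """Rank candidate connections by expected value.

    candidates: iterable of (user, likelihood of connecting, value if connected).
    Returns a list of (user, score) sorted from highest to lowest score.
    """
    scored = [(user, likelihood * value) for user, likelihood, value in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under this scoring, a candidate the user is very likely to accept can still rank below a less likely candidate whose connection is worth more to the system.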

Patent
Timothy A. Kendall1, Ding Zhou1
16 Mar 2010
TL;DR: In this article, a social network targets advertisements to its members using inferential ad targeting, where a member's connections in the social network that satisfy the targeting criteria are leveraged to infer a targeted interest.
Abstract: A social network targets advertisements to its members using inferential ad targeting. An inferential ad enables advertisers to reach members that do not meet targeting criteria for lack of information. A member's connections in the social network that satisfy the targeting criteria are leveraged to infer a targeted interest. An inferential ad is selected from a candidate set to be presented to the member. Varying complexities of targeting criteria, secondary inferential targeting criteria, and scopes of inference provide flexibility for inferential ad targeting in a social network.

Posted Content
TL;DR: In this paper, the authors provide a sound and consistent foundation for the use of non-random exploration data in contextual bandit or partially labeled settings where only the value of a chosen action is learned.
Abstract: We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned. The primary challenge in a variety of settings is that the exploration policy, in which "offline" data is logged, is not explicitly known. Prior solutions here require either control of the actions during the learning process, recorded random exploration, or actions chosen obliviously in a repeated manner. The techniques reported here lift these restrictions, allowing the learning of a policy for choosing actions given features from historical data where no randomization occurred or was logged. We empirically verify our solution on two reasonably sized sets of real-world data obtained from Yahoo!.
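
One common way to evaluate a policy from logs when the logging policy is unknown is to estimate the logging propensities from the data and then apply a clipped inverse-propensity estimate. The sketch below illustrates that general idea only. The abstract does not spell out its estimator, so the discrete contexts, the frequency-based propensity estimate, the clipping threshold `tau`, and all function names here are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Tuple

# One logged event: (context, action taken, observed reward).
Event = Tuple[Hashable, Hashable, float]


def estimate_propensities(logged: List[Event]) -> Dict[Tuple[Hashable, Hashable], float]:
    """Estimate P(action | context) by empirical frequency in the logs."""
    pair_counts = Counter((x, a) for x, a, _ in logged)
    context_counts = Counter(x for x, _, _ in logged)
    return {(x, a): n / context_counts[x] for (x, a), n in pair_counts.items()}


def offline_value(logged: List[Event],
                  policy: Callable[[Hashable], Hashable],
                  propensities: Dict[Tuple[Hashable, Hashable], float],
                  tau: float = 0.05) -> float:
    """Clipped inverse-propensity estimate of the policy's average reward."""
    total = 0.0
    for x, a, r in logged:
        if policy(x) == a:
            # Clip small propensities at tau to keep the variance bounded.
            total += r / max(propensities[(x, a)], tau)
    return total / len(logged)
```

Only events where the evaluated policy agrees with the logged action contribute, and each is reweighted by how rarely the logger took that action in that context.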

Patent
Erick Tseng1
22 Dec 2010
TL;DR: In this article, a social networking system provides relevant third-party content objects to users by matching user location, interests, and other social information with the content, location, and timing associated with content objects.
Abstract: A social networking system provides relevant third-party content objects to users by matching user location, interests, and other social information with the content, location, and timing associated with the content objects. Content objects are provided based on relevance scores specific to a user. Relevance scores may be calculated based on the user's previous interactions with content object notifications, or based on interests that are common between the user and his or her connections in the social network. Context search is also provided for a user, wherein a list of search results is ranked according to the relevance scores of the content objects associated with the search results. Notifications may also be priced and distributed to users based on their relevance. In this way, the system can provide notifications that are relevant to users' interests and current circumstances, increasing the likelihood that they will find content objects of interest.

Patent
Erick Tseng1
24 Sep 2010
TL;DR: In this article, a social networking system automatically tags one or more users to an image file by creating a list of potential matches and selecting a subset of potential matches based on location.
Abstract: In one embodiment, a social networking system automatically tags one or more users to an image file by creating a list of potential matches, selecting a subset of potential matches based on location, asking a first user to confirm the subset of potential matches, and tagging one or more matched users to the image file.

Patent
22 Dec 2010
TL;DR: In response to a first user's search query, one or more sponsored web pages are identified, each associated with a hyperlink, and the response includes a visual tag (or a reference to one) for a hyperlink if the sponsored web page has been accessed by at least one second user connected to the first user within a threshold degree of separation.
Abstract: Particular embodiments access a search query submitted by a first user; identify one or more sponsored web pages in response to the search query, wherein each sponsored web page is associated with a hyperlink; determine whether one or more of the sponsored web pages has been accessed by one or more second users, wherein the one or more second users are connected in a graph structure to the first user within a threshold degree of separation; and send a response comprising a hyperlink for at least one of the sponsored web pages in response to the search query, wherein the response further includes a visual tag or a reference to the visual tag for the hyperlink if the sponsored web page has been accessed by at least one of the one or more second users.
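
The "threshold degree of separation" check in this abstract can be illustrated with a breadth-first search over the social graph. This is a hedged sketch under stated assumptions: the adjacency-dict graph, the `visits` mapping from page to the set of users who accessed it, and the function names are all hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def within_degrees(graph: Dict[Hashable, Iterable[Hashable]],
                   start: Hashable,
                   max_degree: int) -> Set[Hashable]:
    """Return users reachable from start in at most max_degree hops (excluding start)."""
    seen = {start}
    reachable: Set[Hashable] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand past the threshold
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                reachable.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return reachable


def tag_results(pages: List[str],
                visits: Dict[str, Set[Hashable]],
                graph: Dict[Hashable, Iterable[Hashable]],
                user: Hashable,
                max_degree: int = 2) -> List[Tuple[str, bool]]:
    """Pair each sponsored page with a flag: did any nearby user access it?"""
    nearby = within_degrees(graph, user, max_degree)
    return [(page, bool(visits.get(page, set()) & nearby)) for page in pages]
```

A page flagged `True` would be the one whose hyperlink carries the visual tag in the response.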