Author

Igor Kabiljo

Other affiliations: Google
Bio: Igor Kabiljo is an academic researcher from Facebook. The author has contributed to research in topics: Hash function & Social graph. The author has an h-index of 10 and has co-authored 11 publications receiving 300 citations. Previous affiliations of Igor Kabiljo include Google.

Papers
Proceedings ArticleDOI
13 Aug 2016
TL;DR: A novel, theoretically sound reordering algorithm based on recursive graph bisection is designed and implemented, and a significant improvement in the compression rate of graphs and indexes over existing heuristics is shown.
Abstract: Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.

87 citations
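The recursive-bisection idea described in the abstract can be sketched in a few lines. This is only a toy illustration: the split heuristic below (a degree sort) is a placeholder I chose for brevity, not the paper's swap-based optimization of a log-gap compression cost. The recursive structure — split the vertex set, recurse on each half, concatenate — is the part being illustrated.

```python
# Toy sketch of compression-friendly reordering by recursive bisection.
# The real algorithm optimizes a log-gap cost with local vertex swaps;
# here we only show the recursive split-recurse-concatenate skeleton.

def reorder(vertices, adj):
    """Return vertices in an order produced by recursive bisection."""
    if len(vertices) <= 2:
        return list(vertices)
    # Placeholder split: sort by degree so similar vertices land in the
    # same half (a stand-in for the paper's cost-driven swapping).
    ordered = sorted(vertices, key=lambda v: len(adj.get(v, ())))
    mid = len(ordered) // 2
    return reorder(ordered[:mid], adj) + reorder(ordered[mid:], adj)

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
order = reorder(list(adj), adj)
new_id = {v: i for i, v in enumerate(order)}  # compact relabeling
```

With a better split heuristic, vertices with overlapping neighborhoods receive nearby IDs, which shrinks the gaps between consecutive neighbor IDs and so improves delta-encoded compression.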

Patent
26 Nov 2013
TL;DR: In this article, the online system receives requests to generate predictor models for predicting whether a user is likely to take an action of a particular action type, along with criteria for identifying successful and failure instances of that action type.
Abstract: Online systems generate predictors for predicting actions of users of the online system. The online system receives requests to generate predictor models for predicting whether a user is likely to take an action of a particular action type. The request specifies the type of action and criteria for identifying successful and failure instances of the action type. The online system collects data including successful and failure instances of the action type. The online system generates one or more predictors of different types using the collected data. The online system evaluates and compares the performance of the different predictors and selects a predictor based on the performance. The online system returns to the requester a handle for accessing the selected predictor.

67 citations
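The train-compare-select flow the patent describes can be sketched as follows. Everything here is a stand-in: the two "predictor types" are trivial classifiers I invented for illustration, and the returned "handle" is simplified to a name plus a callable.

```python
# Hypothetical sketch of the select-best-predictor flow: train several
# model types on collected success/failure instances, compare them on
# held-out data, and hand back a handle to the winner.

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def train_threshold(data):        # stand-in "predictor type" 1
    cut = sum(x for x, _ in data) / len(data)
    return lambda x: x >= cut

def train_majority(data):         # stand-in "predictor type" 2
    pos = sum(y for _, y in data) >= len(data) / 2
    return lambda x: pos

def build_predictor(train, holdout, trainers):
    models = {name: fit(train) for name, fit in trainers.items()}
    best = max(models, key=lambda n: accuracy(models[n], holdout))
    return best, models[best]     # simplified "handle": name + callable

train = [(0.1, False), (0.2, False), (0.8, True), (0.9, True)]
holdout = [(0.15, False), (0.85, True)]
handle, model = build_predictor(
    train, holdout, {"threshold": train_threshold, "majority": train_majority}
)
```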

Journal ArticleDOI
Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita
01 Aug 2017
TL;DR: The Social Hash Partitioner as mentioned in this paper is a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called probabilistic fanout.
Abstract: We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called probabilistic fanout. This choice allows a simple local search heuristic to achieve comparable solution quality to the best existing hypergraph partitioners. Our algorithm is arbitrarily scalable due to a careful design that controls computational complexity, space complexity, and communication. In practice, we commonly process hypergraphs with billions of vertices and hyperedges in a few hours. We explain how the algorithm's scalability, both in terms of hypergraph size and bucket count, is limited only by the number of machines available. We perform an extensive comparison to existing distributed hypergraph partitioners and find that our approach is able to optimize hypergraphs roughly 100 times bigger on the same set of machines. We call the resulting tool Social Hash Partitioner, and accompanying this paper, we open-source the most scalable version based on recursive bisection.

52 citations
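A small sketch of the two objectives named in the abstract: plain fanout counts the distinct buckets a hyperedge touches, while probabilistic fanout is the expected number of buckets queried when each vertex of the hyperedge is needed independently with probability p. The per-bucket form 1 - (1-p)^n below is my reading of that objective; treat it as an assumption, not the paper's exact formula.

```python
from collections import Counter

def fanout(hyperedge, bucket_of):
    """Plain fanout: number of distinct buckets the hyperedge touches."""
    return len({bucket_of[v] for v in hyperedge})

def prob_fanout(hyperedge, bucket_of, p=0.5):
    """Probabilistic fanout: expected number of buckets queried when
    each vertex is needed independently with probability p."""
    per_bucket = Counter(bucket_of[v] for v in hyperedge)
    return sum(1 - (1 - p) ** n for n in per_bucket.values())

bucket_of = {"a": 0, "b": 0, "c": 1}
e = ["a", "b", "c"]
plain = fanout(e, bucket_of)        # 2 distinct buckets touched
smooth = prob_fanout(e, bucket_of)  # smooth objective for local search
```

Because the probabilistic version changes smoothly as vertices move between buckets, a simple local-search move ("would relocating v lower the expected fanout?") gets a useful gradient-like signal even when the integer fanout would not change.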

Proceedings Article
16 Mar 2016
TL;DR: The framework uses a two-level scheme to decouple compute-intensive optimization from relatively low-overhead dynamic adaptation to optimize the operations of large social networks, such as Facebook's Social Graph.
Abstract: How objects are assigned to components in a distributed system can have a significant impact on performance and resource usage. Social Hash is a framework for producing, serving, and maintaining assignments of objects to components so as to optimize the operations of large social networks, such as Facebook's Social Graph. The framework uses a two-level scheme to decouple compute-intensive optimization from relatively low-overhead dynamic adaptation. The optimization at the first level occurs on a slow timescale, and in our applications is based on graph partitioning in order to leverage the structure of the social network. The dynamic adaptation at the second level takes place frequently to adapt to changes in access patterns and infrastructure, with the goal of balancing component loads. We demonstrate the effectiveness of Social Hash with two real applications. The first assigns HTTP requests to individual compute clusters with the goal of minimizing the (memory-based) cache miss rate; Social Hash decreased the cache miss rate of production workloads by 25%. The second application assigns data records to storage subsystems with the goal of minimizing the number of storage subsystems that need to be accessed on multiget fetch requests; Social Hash cut the average response time in half on production workloads for one of the storage systems at Facebook.

36 citations
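The two-level scheme in the abstract separates a slowly recomputed assignment (from graph partitioning) from a small, frequently updated map used for load balancing. A minimal sketch of that decoupling, with illustrative names rather than Facebook's actual API:

```python
# Level 1 (slow timescale): object -> bucket, produced by expensive
# graph partitioning and recomputed rarely.
object_to_bucket = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}

# Level 2 (fast timescale): bucket -> serving component, a tiny table
# that can be updated frequently to rebalance load.
bucket_to_component = {0: "clusterA", 1: "clusterB"}

def route(obj):
    return bucket_to_component[object_to_bucket[obj]]

served_by = route("u2")            # resolves via both levels

# Rebalancing touches only the small second-level map; the expensive
# partitioning output stays untouched.
bucket_to_component[1] = "clusterC"
```

The payoff of the split is that objects the partitioner grouped together (e.g. friends in the social graph) always move as a bucket, so dynamic rebalancing cannot destroy the locality the slow optimization bought.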

Proceedings ArticleDOI
TL;DR: In this article, the authors extend the theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes.
Abstract: Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.

33 citations


Cited by
Journal ArticleDOI
TL;DR: This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data; it categorizes summarization approaches by the type of graphs taken as input and further organizes each category by core methodology.
Abstract: While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind, and the challenges of, graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.

186 citations

Proceedings Article
27 Mar 2017
TL;DR: This work argues that data-driven QoE optimization should be cast as a real-time exploration and exploitation (E2) process rather than as a prediction problem, and presents Pytheas, a framework that addresses the resulting architectural and algorithmic challenges using a group-based E2 mechanism.
Abstract: Content providers are increasingly using data-driven mechanisms to optimize quality of experience (QoE). Many existing approaches formulate this process as a prediction problem of learning optimal decisions (e.g., server, bitrate, relay) based on observed QoE of recent sessions. While prediction-based mechanisms have shown promising QoE improvements, they are necessarily incomplete as they: (1) suffer from many known biases (e.g., incomplete visibility) and (2) cannot respond to sudden changes (e.g., load changes). Drawing a parallel from machine learning, we argue that data-driven QoE optimization should instead be cast as a real-time exploration and exploitation (E2) process rather than as a prediction problem. Adopting E2 in network applications, however, introduces key architectural (e.g., how to update decisions in real time with fresh data) and algorithmic (e.g., capturing complex interactions between session features vs. QoE) challenges. We present Pytheas, a framework which addresses these challenges using a group-based E2 mechanism. The insight is that application sessions sharing the same features (e.g., IP prefix, location) can be grouped so that we can run E2 algorithms at a per-group granularity. This naturally captures the complex interactions and is amenable to real-time control with fresh measurements. Using an end-to-end implementation and a proof-of-concept deployment in CloudLab, we show that Pytheas improves video QoE over a state-of-the-art prediction-based system by up to 31% on average and 78% at the 90th percentile of per-session QoE.

105 citations
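The group-based E2 idea — sessions sharing a feature key share one exploration/exploitation instance — can be sketched with a per-group epsilon-greedy bandit. Epsilon-greedy here is a stand-in for whatever E2 algorithm Pytheas actually runs, and the feature key and decision names are illustrative.

```python
import random
from collections import defaultdict

class GroupBandit:
    """Toy epsilon-greedy bandit shared by all sessions in one group."""

    def __init__(self, decisions, eps=0.1):
        self.decisions, self.eps = decisions, eps
        self.sum = defaultdict(float)   # total observed QoE per decision
        self.cnt = defaultdict(int)     # observations per decision

    def choose(self):
        if random.random() < self.eps or not self.cnt:
            return random.choice(self.decisions)            # explore
        return max(self.cnt, key=lambda d: self.sum[d] / self.cnt[d])

    def update(self, decision, qoe):    # feed back a fresh measurement
        self.sum[decision] += qoe
        self.cnt[decision] += 1

groups = defaultdict(lambda: GroupBandit(["serverA", "serverB"]))
key = ("10.0.0.0/8", "NYC")             # illustrative: IP prefix + location
g = groups[key]
d = g.choose()
g.update(d, qoe=3.7)
```

Running one bandit per group keeps the decision logic cheap enough for real-time control while still letting sessions with similar feature/QoE interactions pool their measurements.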

Proceedings ArticleDOI
11 Jul 2018
TL;DR: It is shown that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes.
Abstract: There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the work on the Hyperlink graph uses distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 13 important graph problems. We also present the optimizations and techniques that we used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems that we consider, this is the first time they have been solved on graphs at this scale. We provide a publicly-available benchmark suite containing our implementations.

100 citations

Posted Content
TL;DR: This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data; it categorizes summarization approaches by the type of graphs taken as input and further organizes each category by core methodology.
Abstract: While advances in computing resources have made processing enormous amounts of data possible, human ability to identify patterns in such data has not scaled accordingly. Efficient computational methods for condensing and simplifying data are thus becoming vital for extracting actionable insights. In particular, while data summarization techniques have been studied extensively, only recently has summarizing interconnected data, or graphs, become popular. This survey is a structured, comprehensive overview of the state-of-the-art methods for summarizing graph data. We first broach the motivation behind, and the challenges of, graph summarization. We then categorize summarization approaches by the type of graphs taken as input and further organize each category by core methodology. Finally, we discuss applications of summarization on real-world graphs and conclude by describing some open problems in the field.

90 citations
