Author

Sergey Pupyrev

Bio: Sergey Pupyrev is an academic researcher from Facebook. The author has contributed to research in topics: Planar graph & Book embedding. The author has an h-index of 14 and has co-authored 84 publications receiving 778 citations. Previous affiliations of Sergey Pupyrev include Ural Federal University & Association for Computing Machinery.


Papers
Proceedings ArticleDOI
13 Aug 2016
TL;DR: A novel, theoretically sound reordering algorithm based on recursive graph bisection is designed and implemented, and a significant improvement in the compression rate of graphs and indexes over existing heuristics is shown.
Abstract: Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.
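
The reordering idea admits a compact illustration. Below is a minimal sketch of recursive bisection for vertex relabeling, assuming the graph is given as an adjacency dict; the split is a simple BFS-order halving used purely as a placeholder, whereas the paper's algorithm chooses bisections that minimize a compression-oriented cost, so this shows the shape of the method rather than the authors' implementation.

    # Minimal sketch: reorder vertices by recursive bisection (illustrative).
    from collections import deque

    def bfs_order(graph, vertices):
        # Visit the given vertex subset in BFS order, restarting as needed.
        subset, seen, order = set(vertices), set(), []
        for start in vertices:
            if start in seen:
                continue
            seen.add(start)
            queue = deque([start])
            while queue:
                v = queue.popleft()
                order.append(v)
                for u in graph.get(v, ()):
                    if u in subset and u not in seen:
                        seen.add(u)
                        queue.append(u)
        return order

    def recursive_bisection(graph, vertices):
        # Halve the set, recurse, and concatenate, so vertices placed in the
        # same half receive nearby new IDs.
        if len(vertices) <= 2:
            return list(vertices)
        ordered = bfs_order(graph, vertices)
        mid = len(ordered) // 2
        return (recursive_bisection(graph, ordered[:mid]) +
                recursive_bisection(graph, ordered[mid:]))

    # Usage: relabel[v] is the new, locality-friendly ID of vertex v.
    graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
    relabel = {v: i for i, v in enumerate(recursive_bisection(graph, list(graph)))}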

87 citations

Book ChapterDOI
21 Sep 2011
TL;DR: A new and efficient algorithm is provided that solves a variant of the metro-line crossing minimization problem; the resulting method creates aesthetically pleasing edge routes that give an overview of the global graph structure, while still drawing each edge separately, without intersecting graph nodes, and with few crossings.
Abstract: We propose a new approach to edge bundling. At the first stage we route the edge paths so as to minimize a weighted sum of the total length of the paths together with their ink. As this problem is NP-hard, we provide an efficient heuristic that finds an approximate solution. The second stage then separates edges belonging to the same bundle. To achieve this, we provide a new and efficient algorithm that solves a variant of the metro-line crossing minimization problem. The method creates aesthetically pleasing edge routes that give an overview of the global graph structure, while still drawing each edge separately, without intersecting graph nodes, and with few crossings.
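
To make the stage-one objective concrete, the snippet below scores a set of routed polylines by a weighted sum of total path length and ink, where ink counts each drawn segment once even when several paths share it; the weights alpha and beta and the segment-level notion of ink are illustrative assumptions, not the paper's exact formulation.

    # Sketch of the routing cost: alpha * total length + beta * ink.
    import math

    def routing_cost(paths, alpha=1.0, beta=1.0):
        # paths: list of polylines, each a list of (x, y) points.
        segments = [(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)]
        total_length = sum(math.dist(a, b) for a, b in segments)
        unique = {tuple(sorted(s)) for s in segments}  # shared ink counted once
        ink = sum(math.dist(a, b) for a, b in unique)
        return alpha * total_length + beta * ink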

62 citations

Book ChapterDOI
21 Sep 2010
TL;DR: The method modifies the layout produced by the Sugiyama scheme by bundling some of the edges together; the bundles are created by a new algorithm based on minimizing the total ink needed to draw the graph edges.
Abstract: We show how to improve the Sugiyama scheme by edge bundling. Our method modifies the layout produced by the Sugiyama scheme by bundling some of the edges together. The bundles are created by a new algorithm based on minimizing the total ink needed to draw the graph edges. We give several implementations that vary in quality of the resulting layout and execution time. To diminish the number of edge crossings inside of the bundles, we apply a metro-line crossing minimization technique. The method preserves the Sugiyama style of the layout and creates a more readable view of the graph.
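
The decision at the heart of ink-driven bundling can be phrased as a simple test: merge two edges only when routing them through a shared point needs less ink than drawing them apart. A minimal sketch, using the endpoint centroid as the shared point (the paper picks bundles and meeting points more carefully):

    # Illustrative ink-saving test for merging two edges of a layered drawing.
    import math

    def ink_separate(e1, e2):
        # Each edge is a pair of (x, y) endpoints.
        return math.dist(*e1) + math.dist(*e2)

    def ink_bundled(e1, e2):
        # Route all four endpoints through their centroid, a crude stand-in
        # for the optimal meeting point of the bundle.
        pts = [e1[0], e1[1], e2[0], e2[1]]
        cx = sum(p[0] for p in pts) / 4.0
        cy = sum(p[1] for p in pts) / 4.0
        return sum(math.dist(p, (cx, cy)) for p in pts)

    def should_bundle(e1, e2):
        return ink_bundled(e1, e2) < ink_separate(e1, e2)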

52 citations

Journal ArticleDOI
Igor Kabiljo, Brian Karrer, Mayank Pundir, Sergey Pupyrev, Alon Shalita
01 Aug 2017
TL;DR: The Social Hash Partitioner, presented in this paper, is a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called probabilistic fanout.
Abstract: We design and implement a distributed algorithm for balanced k-way hypergraph partitioning that minimizes fanout, a fundamental hypergraph quantity also known as the communication volume and (k - 1)-cut metric, by optimizing a novel objective called probabilistic fanout. This choice allows a simple local search heuristic to achieve comparable solution quality to the best existing hypergraph partitioners. Our algorithm is arbitrarily scalable due to a careful design that controls computational complexity, space complexity, and communication. In practice, we commonly process hypergraphs with billions of vertices and hyperedges in a few hours. We explain how the algorithm's scalability, both in terms of hypergraph size and bucket count, is limited only by the number of machines available. We perform an extensive comparison to existing distributed hypergraph partitioners and find that our approach is able to optimize hypergraphs roughly 100 times bigger on the same set of machines. We call the resulting tool Social Hash Partitioner, and accompanying this paper, we open-source the most scalable version based on recursive bisection.
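
The probabilistic-fanout objective has a short closed form: if each vertex of a hyperedge is accessed independently with probability p, a bucket holding n of its vertices is touched with probability 1 - (1 - p)^n. A minimal sketch, with p and the data layout assumed for illustration:

    # Expected buckets touched, summed over hyperedges, for an assignment of
    # vertices to buckets when each vertex is accessed with probability p.
    from collections import Counter

    def probabilistic_fanout(hyperedges, assignment, p=0.5):
        # hyperedges: iterable of vertex lists; assignment: vertex -> bucket.
        total = 0.0
        for edge in hyperedges:
            per_bucket = Counter(assignment[v] for v in edge)
            total += sum(1.0 - (1.0 - p) ** n for n in per_bucket.values())
        return total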

52 citations

Book ChapterDOI
24 Sep 2014
TL;DR: The results indicate that increasing the number of crossings negatively impacts accuracy and performance time, and that the impact is significant for small graphs but not for large graphs.
Abstract: Reducing the number of edge crossings is considered one of the most important graph drawing aesthetics. While real-world graphs tend to be large and dense, most of the earlier work on evaluating the impact of edge crossings utilizes relatively small graphs that are manually generated and manipulated. We study the effect of increased edge crossings on task performance in automatically generated layouts for graphs from different datasets, with different sizes, and with different densities. The results indicate that increasing the number of crossings negatively impacts accuracy and performance time, and that the impact is significant for small graphs but not for large graphs. We also quantitatively evaluate the impact of edge crossings on crossing angles and stress in automatically constructed graph layouts. We find a moderate correlation between minimizing stress and minimizing the number of crossings.
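
For reference, the stress of a layout $$x_1, \dots, x_n$$ is commonly defined as $$\sum_{i<j} w_{ij}\,(\lVert x_i - x_j\rVert - d_{ij})^2$$ with $$w_{ij} = d_{ij}^{-2}$$, where $$d_{ij}$$ is the graph-theoretic distance between vertices $$i$$ and $$j$$; the study presumably uses this standard measure or a close variant, so low stress means on-screen distances track graph distances.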

51 citations


Cited by
01 Nov 2004
TL;DR: In this article, the authors presented a new map representing the structure of all of science, based on journal articles from both the natural and social sciences, which provides a bird's-eye view of today's scientific landscape.
Abstract: This paper presents a new map representing the structure of all of science, based on journal articles, including both the natural and social sciences. Similar to cartographic maps of our world, the map of science provides a bird's-eye view of today's scientific landscape. It can be used to visually identify major areas of science, their size, similarity, and interconnectedness. In order to be useful, the map needs to be accurate on a local and on a global scale. While our recent work has focused on the former aspect, this paper summarizes results on how to achieve structural accuracy. Eight alternative measures of journal similarity were applied to a data set of 7,121 journals covering over 1 million documents in the combined Science Citation and Social Science Citation Indexes. For each journal similarity measure we generated two-dimensional spatial layouts using the force-directed graph layout tool, VxOrd. Next, mutual information values were calculated for each graph at different clustering levels to give a measure of structural accuracy for each map. The best co-citation and inter-citation maps according to local and structural accuracy were selected and are presented and characterized. These two maps are compared to establish robustness. The inter-citation map is then used to examine linkages between disciplines. Biochemistry appears as the most interdisciplinary discipline in science.
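
The structural-accuracy computation mentioned above is a mutual-information comparison between two groupings of the same journals; a minimal sketch of one common instantiation (labels and data are illustrative, not the paper's pipeline):

    # Mutual information (in nats) between two labelings of the same items.
    from collections import Counter
    from math import log

    def mutual_information(labels_a, labels_b):
        n = len(labels_a)
        pa, pb = Counter(labels_a), Counter(labels_b)
        joint = Counter(zip(labels_a, labels_b))
        # I(A; B) = sum over joint cells of p(a,b) * log(p(a,b) / (p(a) p(b))).
        return sum((c / n) * log(c * n / (pa[a] * pb[b]))
                   for (a, b), c in joint.items())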

702 citations

Journal ArticleDOI
TL;DR: A hierarchical taxonomy of techniques is derived by systematically categorizing and tagging publications, and the representation of time is identified as the major distinguishing feature for dynamic graph visualizations.
Abstract: Dynamic graph visualization focuses on the challenge of representing the evolution of relationships between entities in readable, scalable and effective diagrams. This work surveys the growing number of approaches in this discipline. We derive a hierarchical taxonomy of techniques by systematically categorizing and tagging publications. While static graph visualizations are often divided into node-link and matrix representations, we identify the representation of time as the major distinguishing feature for dynamic graph visualizations: either graphs are represented as animated diagrams or as static charts based on a timeline. Evaluations of animated approaches focus on dynamic stability for preserving the viewer's mental map or, in general, compare animated diagrams to timeline-based ones. A bibliographic analysis provides insights into the organization and development of the field and its community. Finally, we identify and discuss challenges for future research. We also provide feedback from experts, collected with a questionnaire, which gives a broad perspective of these challenges and the current state of the field.

276 citations

Journal Article
TL;DR: In this paper, the authors show that the minimum degree greedy algorithm achieves a performance ratio of (Δ+2)/3 for approximating independent sets in graphs with degree bounded by Δ.
Abstract: The minimum-degree greedy algorithm, or Greedy for short, is a simple and well-studied method for finding independent sets in graphs. We show that it achieves a performance ratio of (Δ+2)/3 for approximating independent sets in graphs with degree bounded by Δ. The analysis yields a precise characterization of the size of the independent sets found by the algorithm as a function of the independence number, as well as a generalization of Turán's bound. We also analyze the algorithm when run in combination with a known preprocessing technique, and obtain an improved $(2\bar{d}+3)/5$ performance ratio on graphs with average degree $\bar{d}$, improving on the previous best $(\bar{d}+1)/2$ of Hochbaum. Finally, we present an efficient parallel and distributed algorithm attaining the performance guarantees of Greedy.
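
The algorithm under analysis is short enough to state directly; a minimal sketch, assuming an adjacency-dict graph representation:

    # Minimum-degree greedy: repeatedly move a minimum-degree vertex into the
    # independent set, then delete it and its neighbors from the graph.
    def greedy_mis(graph):
        g = {v: set(ns) for v, ns in graph.items()}
        independent = []
        while g:
            v = min(g, key=lambda u: len(g[u]))
            independent.append(v)
            removed = g[v] | {v}
            for u in removed:
                g.pop(u, None)
            for u in g:
                g[u] -= removed
        return independent

    # Usage: on the path 0-1-2-3 the greedy choice yields a maximum
    # independent set of size 2.
    assert len(greedy_mis({0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}})) == 2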

235 citations

DOI
01 Jan 2014
TL;DR: A hierarchical taxonomy of techniques is derived by systematically categorizing and tagging techniques in dynamic graph visualization, and the representation of time is identified as the major distinguishing feature for dynamic graph visualizations.
Abstract: Dynamic graph visualization focuses on the challenge of representing the evolution of relationships between entities in readable, scalable, and effective diagrams. This work surveys the growing number of approaches in this discipline. We derive a hierarchical taxonomy of techniques by systematically categorizing and tagging publications. While static graph visualizations are often divided into node-link and matrix representations, we identify the representation of time as the major distinguishing feature for dynamic graph visualizations: either graphs are represented as animated diagrams or as static charts based on a timeline. Evaluations of animated approaches focus on dynamic stability for preserving the viewer's mental map or, in general, compare animated diagrams to timeline-based ones. Finally, we identify and discuss challenges for future research.

230 citations

Proceedings ArticleDOI
01 Feb 2020
TL;DR: A set of real-world, production-scale DNNs for personalized recommendation, coupled with relevant performance metrics for evaluation, is presented, and an in-depth analysis is conducted that underpins future system design and optimization for at-scale recommendation.
Abstract: The widespread application of deep learning has changed the landscape of computation in data centers. In particular, personalized recommendation for content ranking is now largely accomplished using deep neural networks. However, despite their importance and the amount of compute cycles they consume, relatively little research attention has been devoted to recommendation systems. To facilitate research and advance the understanding of these workloads, this paper presents a set of real-world, production-scale DNNs for personalized recommendation coupled with relevant performance metrics for evaluation. In addition to releasing a set of open-source workloads, we conduct in-depth analysis that underpins future system design and optimization for at-scale recommendation: Inference latency varies by 60% across three Intel server generations, batching and co-location of inference jobs can drastically improve latency-bounded throughput, and diversity across recommendation models leads to different optimization strategies.

217 citations