Proceedings Article

Fast and accurate estimation of shortest paths in large graphs

TL;DR: This paper presents a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves, leading to near-exact shortest-path approximations in real-world graphs.
Abstract: Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e., the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes the corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques - implemented within a fully functional RDF graph database system - over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.
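The abstract does not spell out the index itself, so the following is only a minimal sketch of the general idea of answering path queries from precomputed landmark information. It assumes an undirected, unweighted graph stored as an adjacency dict, and it keeps one BFS tree per landmark. The names `build_index` and `approx_path` are hypothetical and are not taken from the paper.

```python
from collections import deque

def bfs_parents(adj, source):
    """BFS from source; returns {node: parent} for every reachable node."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def path_to_root(parent, node):
    """Follow parent pointers from node up to the BFS source (the landmark)."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def build_index(adj, landmarks):
    """Assumed index layout: one BFS parent map per landmark."""
    return {l: bfs_parents(adj, l) for l in landmarks}

def approx_path(index, s, t):
    """Shortest s-t path found by joining tree paths through any landmark.

    The result is always a valid path, so its length is an upper bound
    on the true distance. Returns None if no landmark reaches both nodes.
    """
    best = None
    for parent in index.values():
        if s not in parent or t not in parent:
            continue
        up = path_to_root(parent, s)    # s ... landmark
        down = path_to_root(parent, t)  # t ... landmark
        # Both tree paths share a suffix; cut it at the lowest common ancestor.
        while len(up) >= 2 and len(down) >= 2 and up[-2] == down[-2]:
            up.pop()
            down.pop()
        candidate = up + down[-2::-1]   # s ... ancestor ... t
        if best is None or len(candidate) < len(best):
            best = candidate
    return best
```

Returning the node sequence rather than only a number is what makes the shortcut at the common ancestor possible, which illustrates how path information can tighten a pure distance estimate.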
Citations
Posted Content
TL;DR: This work proposes a new exact method for shortest-path distance queries on large-scale networks that can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods.
Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.
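The pruning idea described in this abstract can be sketched as follows. This is a simplified illustration for undirected, unweighted graphs, not the authors' implementation: it omits the bit-parallel breadth-first searches, and the descending-degree vertex order is an assumption made here. The names `pruned_labeling` and `query` are hypothetical.

```python
from collections import deque

def query(labels, u, v):
    """Distance from the hubs shared by both labels; inf if there is none."""
    lu, lv = labels[u], labels[v]
    if len(lu) > len(lv):
        lu, lv = lv, lu  # iterate over the smaller label
    return min((d + lv[h] for h, d in lu.items() if h in lv),
               default=float("inf"))

def pruned_labeling(adj, order=None):
    """Build distance labels with one pruned BFS per vertex."""
    if order is None:
        # Assumed heuristic: process high-degree vertices first.
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            # Prune: earlier labels already certify a distance <= dist[u],
            # so neither u nor anything behind it needs this root as a hub.
            if query(labels, root, u) <= dist[u]:
                continue
            labels[u][root] = dist[u]
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    return labels
```

After construction, `query(labels, u, v)` returns the exact distance for any pair, while the pruning keeps most labels far smaller than a full distance table.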

278 citations


Cites methods from "Fast and accurate estimation of sho..."

  • ...However, some of these methods take milliseconds to answer queries [15, 38, 30], which is about three orders of magnitude slower than other methods....

    [...]

Proceedings Article
22 Jun 2013
TL;DR: This article proposes a new exact method for shortest-path distance queries on large-scale networks whose key ingredient is pruning during breadth-first searches.
Abstract: We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.

270 citations

Journal Article
TL;DR: This survey reviews selected approaches, algorithms, and results on shortest-path queries from these fields, with the main focus lying on the tradeoff between the index size and the query time.
Abstract: We consider the point-to-point (approximate) shortest-path query problem, which is the following generalization of the classical single-source (SSSP) and all-pairs shortest-path (APSP) problems: we are first presented with a network (graph). A so-called preprocessing algorithm may compute certain information (a data structure or index) to prepare for the next phase. After this preprocessing step, applications may ask shortest-path or distance queries, which should be answered as fast as possible. Due to its many applications in areas such as transportation, networking, and social science, this problem has been considered by researchers from various communities (sometimes under different names): algorithm engineers construct fast route planning methods; database and information systems researchers investigate materialization tradeoffs, query processing on spatial networks, and reachability queries; and theoretical computer scientists analyze distance oracles and sparse spanners. Related problems are considered for compact routing and distance labeling schemes in networking and distributed computing and for metric embeddings in geometry as well. In this survey, we review selected approaches, algorithms, and results on shortest-path queries from these fields, with the main focus lying on the tradeoff between the index size and the query time. We survey methods for general graphs as well as specialized methods for restricted graph classes, in particular for those classes with arguable practical significance such as planar graphs and complex networks.

249 citations


Cites background from "Fast and accurate estimation of sho..."

  • ...Many implementations focus on the triangulation part, providing good estimates for long-range distances by carefully selecting landmarks [Potamias et al. 2009; Das Sarma et al. 2010; Gubichev et al. 2010; Tretyakov et al. 2011; Cao et al. 2011; Qiao et al. 2011; Qiao et al. 2012; Cheng et al. 2012]....

    [...]

  • ...…triangulation part, providing good estimates for long-range distances by carefully selecting landmarks [Potamias et al. 2009; Das Sarma et al. 2010; Gubichev et al. 2010; Tretyakov… (footnote 5: Whether or not many of these degree sequences actually obey power laws is a controversial question [Faloutsos et al.…)...

    [...]
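The landmark triangulation mentioned in these excerpts rests on the triangle inequality. As a sketch, assuming an undirected graph with shortest-path distance $d$ and a precomputed landmark set $L$ (both symbols are notation introduced here, not taken from the survey), every query pair $(s, t)$ satisfies:

$$\max_{l \in L} \bigl| d(s,l) - d(l,t) \bigr| \;\le\; d(s,t) \;\le\; \min_{l \in L} \bigl( d(s,l) + d(l,t) \bigr)$$

The upper bound is exact whenever some landmark lies on a shortest path between $s$ and $t$, which is why careful landmark selection matters for long-range estimates.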

Proceedings Article
20 May 2012
TL;DR: A novel labeling scheme, referred to as Highway-Centric Labeling, for answering distance queries in a large sparse graph that empowers the distance labeling with a highway structure and leverages a novel bipartite set cover framework/algorithm.
Abstract: The distance query, which asks the length of the shortest path from a vertex $u$ to another vertex $v$, has applications ranging from link analysis, semantic web and other ontology processing, to social network operations. Here, we propose a novel labeling scheme, referred to as Highway-Centric Labeling, for answering distance queries in a large sparse graph. It empowers the distance labeling with a highway structure and leverages a novel bipartite set cover framework/algorithm. Highway-centric labeling provides better labeling size than the state-of-the-art $2$-hop labeling, theoretically and empirically. It also offers both exact distance and approximate distance with bounded accuracy. A detailed experimental evaluation on both synthetic and real datasets demonstrates that highway-centric labeling can outperform the state-of-the-art distance computation approaches in terms of both index size and query time.

98 citations


Cites background or methods from "Fast and accurate estimation of sho..."

  • ...further generalize the Sketch method to discover the shortest path (not only the distance) in large graphs [26]....

    [...]

  • ...However, they generally do not have the spatial and planar-like properties a road network has [26]....

    [...]

  • ..., and the need for this basic graph operator have recently attracted much interest in the database community [47, 26]....

    [...]

  • ...Compared with online search algorithms [22, 26], the distance labeling approach can provide a much faster query result....

    [...]

Journal Article
TL;DR: A taxonomy of privacy and security attacks in OSNs is introduced, existing solutions to mitigate those attacks are overviewed, and challenges still to overcome are outlined.

88 citations


Cites methods from "Fast and accurate estimation of sho..."

  • ...Canal efficiently computes an approximate max-flow path (trading accuracy for speed-up) using an existing landmark routing-based algorithm [Tsuchiya, 1988; Gubichev et al., 2010]....

    [...]

References
01 Jan 2005

19,250 citations

Book
01 Jan 1968
TL;DR: The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid.
Abstract: A fuel pin hold-down and spacing apparatus for use in nuclear reactors is disclosed. Fuel pins forming a hexagonal array are spaced apart from each other and held-down at their lower end, securely attached at two places along their length to one of a plurality of vertically disposed parallel plates arranged in horizontally spaced rows. These plates are in turn spaced apart from each other and held together by a combination of spacing and fastening means. The arrangement of this invention provides a strong vibration free hold-down mechanism while avoiding a large pressure drop to the flow of coolant fluid. This apparatus is particularly useful in connection with liquid cooled reactors such as liquid metal cooled fast breeder reactors.

17,939 citations

Journal Article
TL;DR: This article illustrates how a programming language shapes a style of programming, using a language similar to Logo in which programs are developed to draw geometric pictures.
Abstract: The primary purpose of a programming language is to assist the programmer in the practice of her art. Each language is either designed for a class of problems or supports a different style of programming. In other words, a programming language turns the computer into a ‘virtual machine’ whose features and capabilities are unlimited. In this article, we illustrate these aspects through a language similar to Logo. Programs are developed to draw geometric pictures using this language.

5,749 citations

Book Chapter
11 Nov 2007
TL;DR: The extraction of the DBpedia datasets is described, along with how the resulting information is published on the Web for human and machine consumption and how DBpedia could serve as a nucleus for an emerging Web of open data.
Abstract: DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human and machine consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.

4,828 citations


"Fast and accurate estimation of sho..." refers background in this paper

  • ...INTRODUCTION Graphs are routinely used in the modern digital world in a number of settings, such as online social networks (like LinkedIn, Facebook, MySpace), synthesized entity-relationships in large-scale knowledge repositories [1, 25], biological interaction models [10, 12, 13], transportation networks [20], the massive hyperlink graph between documents of the World Wide Web, XML data, and many more....

    [...]

Journal Article
TL;DR: In this paper, the basic problem of interconnecting a given set of terminals with a shortest possible network of direct links is considered, and a set of simple and practical procedures are given for solving this problem both graphically and computationally.
Abstract: The basic problem considered is that of interconnecting a given set of terminals with a shortest possible network of direct links. Simple and practical procedures are given for solving this problem both graphically and computationally. It develops that these procedures also provide solutions for a much broader class of problems, containing other examples of practical interest.

4,395 citations


"Fast and accurate estimation of sho..." refers background in this paper

  • ...INTRODUCTION Graphs are routinely used in the modern digital world in a number of settings, such as online social networks (like LinkedIn, Facebook, MySpace), synthesized entity-relationships in large-scale knowledge repositories [1, 25], biological interaction models [10, 12, 13], transportation networks [20], the massive hyperlink graph between documents of the World Wide Web, XML data, and many more....

    [...]