
Efficient computation of distance labeling for decremental updates in large dynamic graphs

- 01 Sep 2017 -

- Vol. 20, Iss: 5, pp 915-937


TLDR

This paper proposes maintenance algorithms based on distance labeling, which can handle decremental updates efficiently and can speed up index re-computation by up to an order of magnitude compared with the state-of-the-art method, Pruned Landmark Labeling (PLL).

Abstract:

Since today's real-world graphs, such as social network graphs, are evolving all the time, it is of great importance to perform graph computations and analysis in these dynamic graphs. Due to the fact that many applications such as social network link analysis with the existence of inactive users need to handle failed links or nodes, decremental computation and maintenance for graphs is considered a challenging problem. Shortest path computation is one of the most fundamental operations for managing and analyzing large graphs. A number of indexing methods have been proposed to answer distance queries in static graphs. Unfortunately, there is little work on answering such queries for dynamic graphs. In this paper, we focus on the problem of computing the shortest path distance in dynamic graphs, particularly on decremental updates (i.e., edge deletions). We propose maintenance algorithms based on distance labeling, which can handle decremental updates efficiently. By exploiting properties of distance labeling in original graphs, we are able to efficiently maintain distance labeling for new graphs. We experimentally evaluate our algorithms using eleven real-world large graphs and confirm the effectiveness and efficiency of our approach. More specifically, our method can speed up index re-computation by up to an order of magnitude compared with the state-of-the-art method, Pruned Landmark Labeling (PLL).


University of Huddersfield Repository

Qin, Yongrui, Sheng, Quan Z., Falkner, Nickolas J. G., Yao, Lina and Parkinson, Simon

Efficient Computation of Distance Labeling for Decremental Updates in Large Dynamic Graphs

Original Citation

Qin, Yongrui, Sheng, Quan Z., Falkner, Nickolas J. G., Yao, Lina and Parkinson, Simon (2016)

Efficient Computation of Distance Labeling for Decremental Updates in Large Dynamic Graphs.

World Wide Web Journal. ISSN 1386-145X

This version is available at http://eprints.hud.ac.uk/id/eprint/29768/



Efficient Computation of Distance Labeling for Decremental Updates in Large Dynamic Graphs

Yongrui Qin · Quan Z. Sheng · Nickolas J.G. Falkner · Lina Yao · Simon Parkinson



Keywords Shortest Path · Graph Computation · Distance Labeling · Dynamic Graph

Yongrui Qin and Simon Parkinson are with the School of Computing and Engineering, University of Huddersfield, UK

Quan Z. Sheng and Nickolas J.G. Falkner are with the School of Computer Science, The University of Adelaide, Australia

Lina Yao is with the School of Computer Science and Engineering, The University of New South Wales, Australia

Corresponding Author: Yongrui Qin, E-mail: y.qin2@hud.ac.uk


1 Introduction

Recent years have witnessed the fast emergence of massive graph data in many application domains, such as the World Wide Web, linked data technology, online social networks, and Web of Things. In a graph, one of the most fundamental problems is the computation of the shortest path or distance between any given pair of vertices. For instance, distances or the numbers of links between web pages in a large web graph can be considered a robust measure of web page relevancy, especially in relevance feedback analysis in web search [21]. In RDF graphs of linked data, the shortest path distance from one entity to another is important for ranking entity relationships and keyword querying [18,15]. For online social networks, the shortest path distance can be used to measure the closeness centrality between users [22,23].

A large body of indexing techniques has recently been proposed to process exact shortest path distance queries in graphs [9,24,8,7,2,26,16]. Among them, a significant portion of indexes are based on 2-hop distance labeling, which was originally proposed by Cohen et al. [11]. The 2-hop distance labeling pre-computes a label for each vertex so that the shortest path distance between any two vertices can be computed using only their labels. These labeling indexes, such as [9,7,2,16], prove to be efficient when processing large graphs with up to hundreds of millions of edges.
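To make the label-based query concrete, the following is a minimal Python sketch of how a distance can be derived from two 2-hop labels. It assumes each label is stored as a dictionary mapping hub vertices to distances; the function name `query_distance` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from math import inf


def query_distance(label_s, label_t):
    """Return the minimum of d(s, h) + d(h, t) over hubs h shared by both labels.

    Each label is assumed to be a dict {hub: distance}. If the two labels
    share no hub, the vertices are treated as disconnected and inf is returned.
    """
    # Iterate over the smaller label and probe the larger one.
    if len(label_s) > len(label_t):
        label_s, label_t = label_t, label_s
    best = inf
    for hub, d_to_hub in label_s.items():
        d_from_hub = label_t.get(hub)
        if d_from_hub is not None:
            best = min(best, d_to_hub + d_from_hub)
    return best


# Hypothetical labels for the path graph a - b - c, where b serves as the
# common hub and every vertex also stores itself at distance 0.
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
}

d_ac = query_distance(labels["a"], labels["c"])  # via hub b: 1 + 1
d_ab = query_distance(labels["a"], labels["b"])  # via hub b: 1 + 0
```

Storing labels as hash maps keeps the sketch short; an implementation tuned for speed would more likely keep each label as a sorted array of hubs and merge the two arrays.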

Motivation. The above-mentioned approaches generally assume that graphs are static. However, in reality, many graphs are subject to constant changes. For example, it is reported that in the fourth quarter of 2012, Facebook reached 1.056 billion users, a 24.97% increase from the same period in 2011 [14]. Around April 2013, DBpedia, one of the most popular RDF graphs, released its version 3.9. In this new release, the number of concepts in the English edition increased from 3.7 million to 4.0 million things compared with its last release in June 2012 (see http://wiki.dbpedia.org/). Similarly, the emerging social Web of Things also supports the need for dynamic graph data management because smart things are normally moving and their connectivity could be intermittent, leading to frequent and unpredictable changes in the corresponding graph models [10,25].

We believe that it is imperative to design novel algorithms that can update shortest path indexes efficiently for large dynamic graphs. Existing shortest path indexing techniques based on 2-hop labeling may take up to hundreds of seconds to pre-compute the whole shortest path index for a graph with millions of edges. For larger graphs, it can take up to thousands of seconds [2,16]. Applying indexing techniques designed for static graphs directly to dynamic graphs may lead to inefficiency. This is because if only a small part of the graph is changed, e.g., only a deletion of an existing edge occurs, a significant proportion of the shortest paths are likely to remain unchanged and the index for the original graph may contain a large amount of correct distance information. In such a case, simply recomputing the 2-hop distance index from scratch would unnecessarily waste computing resources.

An alternative is to maintain dynamic all-pairs shortest paths (APSP). Many approaches have been proposed to maintain dynamic APSP data structures. For example, in [12,13], a dynamic algorithm for general directed graphs with non-negative edge weights was proposed with a computational complexity of $O(n^2 \log^3 n)$ amortized time per update, where $n$ is the number of vertices. However, this time bound is comparable to recomputing all-pairs shortest paths from scratch, which makes the algorithm inefficient for handling changes in graphs. Recently, an algorithm for maintaining dynamic all-pairs $(1 + \epsilon)$-approximate shortest paths for directed graphs with polynomial weights was proposed in [5]. The total update complexity is $O(mn/\epsilon)$, where $n$ is the number of vertices and $m$ is the number of edges. Unfortunately, it only applies to dynamic approximate shortest path problems.

Incremental updates (i.e., edge insertions) of 2-hop labeling in large dynamic graphs have been recently investigated in [3]. However, the problem of supporting decremental updates (i.e., edge deletions) of 2-hop labeling remains unsolved and is considered a challenging problem [3]. Decremental updates are very useful for many real-world problems, such as outdated web links in a web graph or obsolete user profiles in a social network. Clearly, decremental maintenance is a fundamental and important operation on graph data to support efficient web link analysis and social network analysis.

Contributions. To address the deficiency of existing shortest path indexing techniques, this paper proposes a generic framework to update shortest path indexes efficiently for dynamic graphs where edge deletions are allowed. As an initial attempt on this challenging issue, we focus on unweighted, undirected graphs. Similar to other distance labeling based indexing methods [2,16], our method can be extended to weighted and/or directed graphs. We highlight our main contributions in the following:

– We present the concept of well-ordering 2-hop distance labeling and identify its important properties that can be utilized to design update algorithms for shortest path indexes in dynamic graphs.

– We analyze cases of shortest path index maintenance in dynamic graphs with decremental updates. We develop the corresponding theorems as well as novel algorithms to enable efficient updates without reconstruction of distance labeling for the entire graph.

– We conduct extensive experiments on eleven real-world large graphs to verify the efficiency and effectiveness of our method. Compared with the state-of-the-art technique [2], which is designed for static graphs, our method is on average an order of magnitude faster.

The rest of this paper is organized as follows. In Section 2, we review the related work. In Section 3, we present some preliminaries on 2-hop distance labeling. We then present the framework and the details of our approach in Section 4. In Section 5, we report the results of an extensive experimental study using eleven large real-world graphs. Finally, we present some concluding remarks in Section 6.

2 Related Work

In this section, we review the major techniques that are most closely related to our work.

Distance labeling has been an active research area in recent years. In [9], Cheng and Yu exploit the strongly connected components property and graph partitioning to pre-compute a 2-hop distance cover. However, the graph partitioning process introduces high cost because it has to find vertex separators recursively. Hierarchical hub labeling (HHL), proposed by Abraham et al. [1], is based on a partial order of vertices. Smaller labelings can be obtained by computing labelings for different partial orders of vertices. In [17], Jin et al. propose highway-centric labeling (HCL), which uses a spanning tree as a highway; based on the highway, a 2-hop labeling is generated for fast distance computation.

Very recently, Pruned Landmark Labeling (PLL) [2] was proposed by Akiba et al. to pre-compute 2-hop distance labels for vertices by performing a breadth-first search from every vertex. The key is to prune vertices that have obtained correct distance information during breadth-first searches, which helps reduce the search space and the sizes of labels. Further, query performance is also improved as the number of label entries per vertex is reduced. IS-Label (or ISL) was developed by Fu et al. in [16] to pre-compute 2-hop distance labels for large graphs in memory-constrained environments. ISL is based on the idea of an independent set of vertices in a large graph. By recursively removing an independent set of vertices from the original graph, and by augmenting edges that preserve distance information after the removal of vertices in the independent set, the remaining graph keeps the distance information for all remaining vertices in the graph. Apart from the 2-hop distance labeling technique, a multi-hop distance labeling approach [7] has also been studied, which can reduce the overall size of labels at the cost of increased distance querying time.
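The pruning idea behind PLL can be sketched as follows. This is a simplified illustration for unweighted, undirected graphs, assuming an adjacency-list dictionary `adj` and a caller-supplied vertex ordering `order`; these names, the helper `query`, and the toy graph are hypothetical, and the sketch omits optimizations such as bit-parallel labeling.

```python
from collections import deque
from math import inf


def query(labels, s, t):
    """Distance implied by the current labels (inf if no common hub yet)."""
    ls, lt = labels[s], labels[t]
    return min((d + lt[h] for h, d in ls.items() if h in lt), default=inf)


def pruned_landmark_labeling(adj, order):
    """Build 2-hop labels by running one pruned BFS per vertex, in `order`."""
    labels = {v: {} for v in adj}
    for root in order:
        dist = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            d = dist[v]
            # Prune: labels from earlier roots already certify a distance
            # no larger than d, so v needs no entry for this root and the
            # search does not expand v.
            if query(labels, root, v) <= d:
                continue
            labels[v][root] = d
            for w in adj[v]:
                if w not in dist:
                    dist[w] = d + 1
                    queue.append(w)
    return labels


# Toy undirected graph (assumed for illustration only).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1, 4], 4: [3]}
# Assumed ordering heuristic: highest degree first.
order = sorted(adj, key=lambda v: -len(adj[v]))
labels = pruned_landmark_labeling(adj, order)
d_04 = query(labels, 0, 4)
```

Ordering vertices by descending degree is one common heuristic for choosing early roots; other orderings also yield correct labels but may produce larger ones.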

Tree decomposition approaches have been recently investigated [24,4] for answering distance queries in graphs. Wei proposes TEDI [24], which first decomposes a graph into a tree and forms a tree decomposition. A tree decomposition of a graph is a tree with each vertex associated with a set of vertices in the graph, which is also called a bag. The shortest paths among vertices in the same bag are pre-computed and stored in bags. For any given source and target vertices, a bottom-up operation along the tree can be executed to find the shortest path. An improved TEDI index is further proposed by Akiba et al. in [4] that exploits a core-fringe structure to improve index performance. However, due to the large size of some bags in the decomposed tree, the construction time for a large graph is costly and thus such indexing approaches cannot scale well.


Figures

Table 6 Index Size Comparisons (with bit-parallel, all in KB)

Table 5 Index Size Comparisons (all in KB)

Fig. 5 Breakdown of maintenance times (with bit-parallel)

Citations

Scaling Distance Labeling on Small-World Networks
Wentao Li, +5 more (Proceedings Article)
TL;DR: Scale distance labeling on small-world networks by proposing a Parallel Shortest-distance Labeling (PSL) scheme and further reducing the index size by exploiting graph and label properties and near-linear speedup in a multi-core environment.

Dynamic Hub Labeling for Road Networks
Mengxuan Zhang, +5 more (Proceedings Article)
TL;DR: In this paper, the authors adopt the state-of-the-art tree decomposition-based hub labeling as the underlying index, and design efficient algorithms to incrementally maintain the index.

Efficient 2-Hop Labeling Maintenance in Dynamic Small-World Networks
Mengxuan Zhang, +3 more (Proceedings Article)
TL;DR: The authors adopt the state-of-the-art Parallel Shortest Distance Labeling (PSL) as the underlying 2-hop labeling construction method, and design algorithms to support efficient update of the index given edge weight change (increase and decrease) in the network.

Hub Labeling for Shortest Path Counting
Yikai Zhang, +1 more (Proceedings Article)
TL;DR: This work proposes a hub labeling scheme based on hub pushing and discusses several graph reduction techniques to reduce the index size and proves several theoretical results on the performance of the scheme for some special graph classes.

A Highly Scalable Labelling Approach for Exact Distance Queries in Complex Networks
Muhammad Farhan, +3 more (Proceedings Article)
TL;DR: The authors propose a scalable algorithm for constructing minimal distance labelling and a querying framework that supports fast distance-bounded search on a sparsified graph, which can handle networks with billions of vertices and billions of edges.

References

Reachability and distance queries via 2-hop labels
Edith Cohen, +3 more (Proceedings Article)
TL;DR: In this paper, the authors propose a new data structure for representing all distances in a graph, which is distributed in the sense that it may be viewed as assigning labels to the vertices, such that a query involving vertices u and v may be answered using only the labels of u and v.

A new approach to dynamic all pairs shortest paths
Camil Demetrescu, +1 more (Journal Article, Journal of the ACM, 01 Nov 2004)
TL;DR: A fully dynamic algorithm for general directed graphs with non-negative real-valued edge weights that supports any sequence of operations in $O(n^2 \log^3 n)$ amortized time per update and unit worst-case time per distance query, where n is the number of vertices.

Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling
Takuya Akiba, +2 more (Posted Content, arXiv: Data Structures and Algorithms, 17 Apr 2013)
TL;DR: This work proposes a new exact method for shortest-path distance queries on large-scale networks that can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods.

Fast exact shortest-path distance queries on large networks by pruned landmark labeling
Takuya Akiba, +2 more (Proceedings Article)
TL;DR: In this article, a new exact method for shortest-path distance queries on large-scale networks is proposed, where the key ingredient introduced here is pruning during breadth-first searches.

Hierarchical hub labelings for shortest paths
Ittai Abraham, +3 more (Book Chapter)
TL;DR: This work studies hierarchical hub labelings for computing shortest paths to lead to faster preprocessing algorithms, making the labeling approach practical for a wider class of graphs.

Related Papers (5)

SIEF: Efficiently Answering Distance Queries for Failure Prone Graphs
Yongrui Qin, +2 more

Reachability and Distance Queries via 2-Hop Labels
Edith Cohen, +3 more (SIAM Journal on Computing, 01 May 2003)

IS-Label: an independent-set based labeling scheme for point-to-point distance querying
Ada Wai-Chee Fu, +3 more

Reachability queries on large dynamic graphs: a total order approach
Andy Diwen Zhu, +3 more

Fast exact shortest-path distance queries on large networks by pruned landmark labeling
Takuya Akiba, +2 more

Frequently Asked Questions (21)

Q1. What are the contributions mentioned in the paper "Efficient computation of distance labeling for decremental updates in large dynamic graphs" ?

In this paper, the authors focus on the problem of computing the shortest path distance in dynamic graphs, particularly on decremental updates ( i. e., edge deletions ). The authors propose maintenance algorithms based on distance labeling, which can handle decremental updates efficiently. The authors experimentally evaluate their algorithms using eleven real-world large graphs and confirm the effectiveness and efficiency of their approach.

Q2. What future works have the authors mentioned in the paper "Efficient computation of distance labeling for decremental updates in large dynamic graphs" ?

Their future work will further investigate several aspects of maintaining distance labeling indexes for large dynamic graphs. The authors also plan to extend their work to efficiently update distance labeling in memory and computing resource constrained environments. The first one centers on how to further speed up the decremental maintenance. The authors will investigate possible ways to maintain auxiliary information and redundant label entries that could be useful to reduce the relabeling efforts when an update occurs.

Q3. What is the key to reducing distance information in graphs?

By recursively removing an independent set of vertices from the original graph, and by augmenting edges that preserve distance information after the removal of vertices in the independent set, the remaining graph keeps the distance information for all remaining vertices in the graph.

Q4. What is the fundamental problem in a graph?

In a graph, one of the most fundamental problems is the computation of the shortest path or distance between any given pair of vertices.

Q5. what is the shortest path between s and t in graph G?

After the deletion of edge (u, v) from graph G, for any vertex s, t in G′, if dG′(s, t) > dist(s, t, L), and suppose a shortest path between s and t in G is πG(s, t), then the authors must have uv ∈ πG(s, t) or vu ∈ πG(s, t).
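The condition above implies a simple filter: a pair (s, t) can only see its distance grow after deleting edge (u, v) if some shortest path between s and t in the original graph runs through that edge. The following Python sketch illustrates this necessary condition on a toy graph. It assumes a small, connected, unweighted, undirected graph and computes all distances by naive BFS; the helper names `bfs`, `pairs_through_edge`, and `delete_edge` are hypothetical, and this is not the paper's maintenance algorithm.

```python
from collections import deque


def bfs(adj, src):
    """Unweighted single-source distances from src."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def pairs_through_edge(adj, u, v):
    """Pairs (s, t), s < t, with some shortest path in G using edge (u, v).

    Assumes G is connected, so every distance lookup succeeds.
    """
    dist = {x: bfs(adj, x) for x in adj}
    pairs = set()
    for s in adj:
        for t in adj:
            if s < t:
                d = dist[s][t]
                # The edge is traversed either as u -> v or as v -> u.
                if (dist[s][u] + 1 + dist[v][t] == d
                        or dist[s][v] + 1 + dist[u][t] == d):
                    pairs.add((s, t))
    return pairs


def delete_edge(adj, u, v):
    """Return a copy of adj without the undirected edge (u, v)."""
    return {x: [y for y in nbrs if {x, y} != {u, v}] for x, nbrs in adj.items()}


# Toy example: the 4-cycle 0 - 1 - 2 - 3 - 0, deleting edge (0, 1).
G = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
candidates = pairs_through_edge(G, 0, 1)
G2 = delete_edge(G, 0, 1)
changed = {
    (s, t)
    for s in G for t in G
    if s < t and bfs(G2, s).get(t) != bfs(G, s)[t]
}
```

In this example the candidate set is a strict superset of the pairs whose distance actually changes, since pairs such as (0, 2) have an alternative shortest path that avoids the deleted edge.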

Q6. How long does it take to pre-compute the whole shortest path index for a?

Existing shortest path indexing techniques based on 2-hop labeling may take up to hundreds of seconds to pre-compute the whole shortest path index for a graph with millions of edges.

Q7. What is the key to reducing the size of labels?

The key is to prune vertices that have obtained correct distance information during breadth-first searches, which helps reduce the search space and sizes of labels.

Q8. How many neighbors can be labeled in a batch mode?

To exploit parallel computing during the labeling for these first t roots of BFSs, the bit-parallel technique will be able to label up to a fixed number of neighbors (e.g., up to 32 or 64 neighbors) in a batch mode when processing one vertex.

Q9. What is the main idea behind decremental maintenance?

Decremental maintenance is a fundamental and important operation on graph data to support efficient web link analysis and social network analysis.

Q10. What is the important factor in determining web page relevancy?

For instance, distances or the numbers of links between web pages in a large web graph can be considered a robust measure of web page relevancy, especially in relevance feedback analysis in web search [21].

Q11. Why do the authors need to perform a large number of BFSs to find alternative shortest?

Due to lack of alternative shortest paths information, the authors have to perform a large number of BFSs to discover alternative shortest paths in order to maintain the index.

Q12. what is the shortest path between w and v?

Proof: Since w ∈ PA(u), the authors must have that dG(r, v) = dG(r, u) + 1, which means that any shortest path between w and u, denoted as pwu, plus edge (u, v) in the original graph must also be a shortest path between w and v.

Q13. Why is it important to maintain dynamic all-pairs shortest paths?

This is because if only a small part of the graph is changed, i.e., only a deletion of an existing edge occurs, a significant proportion of the shortest paths are likely to remain unchanged and the index for the original graph may contain a large amount of correct distance information.

Q14. What is the way to improve the performance of the index?

A possible way to further improve performance on decremental maintenance would be to introduce auxiliary information on the labeling or even redundant label entries in the labeling index.

Q15. Why is the construction time for a large graph expensive?

Due to the large size of some bags in the decomposed tree, the construction time for a large graph is costly and thus such indexing approaches cannot scale well.

Q16. How do the authors prune v from the rest of the current BFS process?

At a later stage, the authors run BFS rooted at u (Note that at the beginning of this BFS, r has been pruned since a BFS rooted at r has been completed) and if d3 >= d1 + d2, the authors prune v from the rest of the current BFS process.

Q17. What is the way to maintain distance labels?

To support fast incremental updates, outdated distance labels are kept, which will not affect the distance computation in the updated graphs in the incremental case.

Q18. What is the average speedup ratio for a bit-parallel algorithm?

When bit-parallel is applied, the average update times (AUT-bp) are even smaller, though the average speedup ratio is not as large as in the instances without bit-parallel. This could be due to the faster indexing processes with bit-parallel, which leave less room for speeding up the maintenance processes.

Q19. What are the popular graphing techniques?

A large body of indexing techniques have been recently proposed to process exact shortest path distance queries in graphs [9,24,8,7,2,26,16].

Q20. What is the key to an improved TEDI index?

An improved TEDI index is further proposed by Akiba et al. in [4] that exploits a core-fringe structure to improve index performance.

Q21. What is the way to construct a well-ordering 2-hop distance labeling?

Figure 1 shows an example graph with 11 vertices and Table 1 shows a well-ordering 2-hop distance labeling result L for the graph (L can be constructed by PLL [2] using the same vertex ordering as that specified in the table).
