Proceedings ArticleDOI

Parallel Subgraph Counting for Multicore Architectures

TL;DR: A novel multicore parallel algorithm for computing the frequency of small subgraphs on a large network using a state-of-the-art data structure, the g-trie, which allows for a very efficient sequential search and paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.
Abstract: Computing the frequency of small subgraphs on a large network is a computationally hard task. This is, however, an important graph mining primitive, with several applications, and here we present a novel multicore parallel algorithm for this task. At the core of our methodology lies a state-of-the-art data structure, the g-trie, which represents a collection of subgraphs and allows for a very efficient sequential search. Our implementation was done using Pthreads and can run on any multicore personal computer. We employ a diagonal work sharing strategy to dynamically and effectively divide work among threads during the execution. We assess the performance of our Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, we obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of our algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.

Summary (3 min read)

Introduction

  • A typical motif discovery algorithm will need to count all subgraphs of a certain size both in the original network and in an ensemble of similar randomized networks [5].
  • Multicore architectures are, however, much more common and readily available to a typical practitioner, with multicores being pervasive even on personal computers.
  • The authors' main contribution in this paper is precisely a novel parallel algorithm for subgraph counting geared towards multicores.
  • Section II formalizes the problem being tackled and talks about related work.

A. Problem Definition

  • Two occurrences are considered different if they have at least one node or edge that they do not share.
  • Figure 1 gives an example of a subgraph frequency computation, detailing the subgraph occurrences found (these are given as sets of nodes).
  • Note also how the authors distinguish occurrences: other possible frequency concepts do exist [10], but here they resort to the standard definition.

A. The G-Trie Data Structure

  • Instead of storing strings and identifying common prefixes, it stores subgraphs and identifies common subtopologies.
  • Like a classical string trie, it is a multiway tree, and each tree node contains information about a single subgraph vertex and its connections to the vertices stored in ancestor tree nodes.
  • The same concept can be easily applied to directed subgraphs by also storing the direction of each connection.
  • This capability is the main strength of a g-trie, not only because the authors compress the information (avoiding redundant storage), but also because, when they are matching a specific node in the g-trie, they are, at the same time, matching all possible descendant subgraphs stored in the g-trie.
  • Given the space constraints, the authors refer the reader to [9] for a detailed explanation of how a g-trie can be created.

B. Subgraph Counting with a G-Trie

  • The algorithm depicted in Figure 3 details how the authors can use g-tries for counting subgraphs sequentially.
  • The authors use the information stored in the g-trie to heavily constrain the search.
  • Essentially, from the current partial match, the authors look for the vertex that is both connected, in the current g-trie node, to the vertex being added and, at the same time, has the smallest number of neighbors in the network, which are the potential candidates for that position (lines 14 and 15).
  • For the sake of illustration, the authors will now exemplify how one occurrence is found.

IV. PARALLEL G-TRIE ALGORITHM

  • One of the most important aspects of their sequential algorithm is that it originates completely independent search tree branches.
  • In fact, each call to count(T, Vused) produces one different branch, and knowing the g-trie node T and the already matched vertices Vused is enough for continuing the search from that point.
  • The problem with this static strategy is that the generated search tree is highly irregular and unbalanced.
  • To achieve a scalable approach, an efficient dynamic sharing mechanism, that redistributes work during execution time, is required.
  • In their parallel approach the authors keep this crucial feature of the algorithm and do not artificially introduce explicit queues during the normal execution of the algorithm.

A. Overall View

  • The authors allocate one thread per core, with each thread being initially assigned an equal amount of vertices.
  • When a thread P finishes its allotted computation, it requests new work from another active thread Q, which responds by first stopping its computation.
  • Both threads then resume their execution, starting at the bottom (meaning the lowest levels of the g-trie) of their respective work trees.
  • The execution starts at the bottom so that only one Vused is necessary, taking advantage of the common subtopology of ancestor and descendant nodes in the same path.
  • The authors will now describe in more detail the various components of their algorithm.

B. Parallel Subgraph Frequency Counting

  • Figure 4 depicts their parallel counting algorithm.
  • At each step, the thread computes the vertex thread_id positions after the previous one (line 13).
  • The authors do this in a round-robin fashion because it generally provides a more equitable initial division than simply allocating continuous intervals to each thread. (The authors' implementation, along with test data, can be consulted at http://www.dcc.fc.up.pt/gtries/.)
  • The authors' intuition was verified empirically by observing that the threads would ask for work sooner if continuous intervals were used.
  • Initially, the authors kept in each g-trie node a shared array Fr[1..num_threads] where the threads would update the array at the position of their thread_id.

E. Work Resuming

  • After the threads have shared work, they resume it and proceed with the computation.
  • If the thread receives a work request, work sharing is performed (line 7).
  • After work sharing is performed (lines 8 and 9), the thread continues its computation with the new work tree (line 10) and the current execution is discarded (line 11).
  • The thread first checks if it has arrived at a desired subgraph (line 12) and increases its frequency in that case (line 13).
  • Otherwise, the thread calls parallelCount with the new vertex added to Vused for each child of the g-trie node (lines 15 and 16).

V. RESULTS

  • The authors' experimental results were gathered on a 64-core machine.
  • In Table II the authors show the size of the subgraphs and the resulting number of all possible subgraphs of that type and size that will be counted in that network.
  • The sequential time and the obtained speedups for 8, 16, 32 and 64 cores are shown in Tables III and IV.
  • Nevertheless, pbzip had a performance similar to their algorithm, with near-linear speedup up to 32 cores and with a speedup of around 50 for 64 cores, further substantiating the idea that, with a different architecture, their algorithm could still present near-linear speedup with more than 32 cores.
  • The authors can also observe that as the network size increases, the performance slightly degrades.

VI. CONCLUSION

  • In this paper the authors presented a scalable algorithm to count subgraph frequencies for multicore architectures.
  • The sequential version already performed significantly better than competing algorithms, making it a solid base for improvement.
  • To the best of their knowledge, their parallel algorithm is the fastest available method for shared memory environments and allows practitioners to take advantage of either their personal multicore machines or more dedicated computing resources.
  • The authors also intend to explore several variations on the g-tries algorithm, like, for instance, using different base graph data-structures or using sampling to obtain approximate results.
  • Finally, to give their work a more practical context, the authors will use their implementation in real world scenarios.


Parallel Subgraph Counting for Multicore Architectures
David Aparício, Pedro Ribeiro, Fernando Silva
CRACS & INESC-TEC LA,
Faculdade de Ciências, Universidade do Porto
R. Campo Alegre, 1021/1055, 4169-007 Porto, Portugal
Email: {daparicio, pribeiro, fds}@dcc.fc.up.pt
Abstract—Computing the frequency of small subgraphs on
a large network is a computationally hard task. This is,
however, an important graph mining primitive, with several
applications, and here we present a novel multicore parallel
algorithm for this task. At the core of our methodology lies a
state-of-the-art data structure, the g-trie, which represents a
collection of subgraphs and allows for a very efficient sequential
search. Our implementation was done using Pthreads and
can run on any multicore personal computer. We employ a
diagonal work sharing strategy to dynamically and effectively
divide work among threads during the execution. We assess
the performance of our Pthreads implementation on a set of
representative networks from various domains and with diverse
topological features. For most networks, we obtain a speedup
of over 50 for 64 cores and an almost linear speedup up
to 32 cores, showcasing the flexibility and scalability of our
algorithm. This paves the way for the usage of such counting
algorithms on larger subgraph and network sizes without the
obligatory access to a cluster.
Keywords-Parallel Algorithms, Adaptive Load Balancing,
Complex Networks, Graph Mining, G-Tries
I. INTRODUCTION
Complex Networks are a ubiquitous representation of
systems in many domains [1]. Mining features from these
networks is, thus, a very important task with general appli-
cability [2]. One such feature is the number of occurrences
of subgraphs. This frequency computation lies at the core
of several graph metrics, such as graphlet degree distribu-
tions [3] or network motifs [4]. For instance, motifs are over-
represented subgraphs, appearing more often than expected.
A typical motif discovery algorithm will need to count all
subgraphs of a certain size both in the original network and
in an ensemble of similar randomized networks [5].
Computing the frequency of subgraphs is, however, a
computationally hard task, closely related to subgraph iso-
morphism, which is one of the classical NP-complete prob-
lems [6]. This means that, as we increase the size of either
the subgraphs or the network being analyzed, the execution
time increases exponentially. Nevertheless, improving the
execution time of subgraph counting can have a broad
impact. For example, even increasing by just one node the
size of the subgraphs may lead to the discovery of new
motifs, providing new insight into a network.
One way to make the subgraph counting algorithms faster
is using parallelism. Still, work in this area is very scarce and
the vast majority of the existing algorithms are sequential in
their nature. We have previous work on the parallelization
of subgraph frequency computation, but it was focused on
using MPI in distributed environments [7], [8]. Multicore
architectures are, however, much more common and readily
available to a typical practitioner, with multicores being
pervasive even on personal computers.
Our main contribution in this paper is precisely a novel
parallel algorithm for subgraph counting geared towards
multicores. As a basis, we use our own state-of-the-art g-trie data
structure, which is the core of one of the fastest sequential
algorithms for subgraph counting [9]. G-Tries are able to
store a collection of graphs, identifying common substruc-
tures, and provide an efficient method to search for those
graphs as subgraphs of another larger network. This search
induces a highly unbalanced search tree with independent
tree branches. We use one thread per core and schedule
work dynamically based on a diagonal splitting work sharing
strategy to try to ensure a fair division of the work. With
this technique, we achieve very good performance up to 64
cores and an almost linear speedup up to 32 cores. To the
best of our knowledge, this constitutes the fastest multicore
algorithm for subgraph counting.
The remainder of this paper is organized as follows.
Section II formalizes the problem being tackled and talks
about related work. Section III describes the g-trie data
structure and its sequential subgraph counting algorithm.
Section IV details our parallel approach. Section V shows
our experimental results on a series of representative net-
works. Finally, section VI concludes our paper and gives
some possible directions for future work.
II. SUBGRAPH COUNTING PROBLEM
A. Problem Definition
We start by more formally defining the exact problem we
are tackling in this paper:
Definition 1 (General Subgraph Counting Problem):
Given a set of subgraphs S and a graph G, determine the
exact count of all induced occurrences of subgraphs of S
in G. Two occurrences are considered different if they have
at least one node or edge that they do not share. Other
nodes and edges can overlap.

Figure 1. An example subgraph counting output, with detailed subgraph
occurrences.
Figure 1 gives an example of a subgraph frequency com-
putation, detailing the subgraph occurrences found (these
are given as sets of nodes). Note also how we distinguish
occurrences: other possible frequency concepts do exist [10],
but here we resort to the standard definition.
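To make the definition concrete in code, the sketch below (added for illustration; it is not the paper's algorithm, and all names and toy graphs are assumptions) counts the induced occurrences of a single undirected query subgraph H in a data graph G by brute force: every K-vertex subset of G is enumerated once and tested against H over all permutations. Because an induced occurrence is fully determined by its vertex set, counting matching subsets gives exactly the frequency of Definition 1; this exponential strategy is what the g-trie machinery of Section III is designed to avoid.

/*
 * Illustrative brute-force counter for Definition 1 (NOT the paper's
 * algorithm, only a literal reading of the definition). It counts induced
 * occurrences of one undirected query subgraph H (K vertices) in a data
 * graph G (N vertices, adjacency matrix).
 */
#include <stdbool.h>
#include <stdio.h>

#define N 8   /* |V(G)|, toy value */
#define K 3   /* size of the query subgraph */

static bool G[N][N];  /* adjacency matrix of the data graph */
static bool H[K][K];  /* adjacency matrix of the query subgraph */

/* Does some permutation map H exactly onto the subgraph of G induced by set[]? */
static bool isomorphic(const int set[K], int perm[K], bool used[K], int depth) {
    if (depth == K) {
        for (int a = 0; a < K; a++)
            for (int b = 0; b < K; b++)
                if (H[a][b] != G[set[perm[a]]][set[perm[b]]])
                    return false;
        return true;
    }
    for (int i = 0; i < K; i++) {
        if (used[i]) continue;
        used[i] = true; perm[depth] = i;
        bool ok = isomorphic(set, perm, used, depth + 1);
        used[i] = false;
        if (ok) return true;
    }
    return false;
}

/* Enumerate every K-vertex subset of G exactly once; each match is one occurrence. */
static long count_occurrences(int set[K], int start, int depth) {
    if (depth == K) {
        int perm[K]; bool used[K] = {false};
        return isomorphic(set, perm, used, 0) ? 1 : 0;
    }
    long total = 0;
    for (int v = start; v < N; v++) {
        set[depth] = v;
        total += count_occurrences(set, v + 1, depth + 1);
    }
    return total;
}

int main(void) {
    /* Toy data: G contains the 4-cycle 0-1-2-3-0; H is a 3-vertex path. */
    for (int i = 0; i < 4; i++) { G[i][(i + 1) % 4] = G[(i + 1) % 4][i] = true; }
    H[0][1] = H[1][0] = H[1][2] = H[2][1] = true;
    int set[K];
    printf("occurrences of H in G: %ld\n", count_occurrences(set, 0, 0)); /* prints 4 */
    return 0;
}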
B. Related Work
Sequential subgraph counting algorithms can be divided
into three different conceptual approaches. Network-centric
methods are based upon the enumeration of all sets of k
connected nodes, followed by isomorphism tests to deter-
mine the subgraph type of each occurrence. Examples of
this strategy include ESU [11], Kavosh [12] and FaSE [13].
By contrast, subgraph-centric methods, such as the one by
Grochow and Kellis [14], only search for one subgraph type
at a time, individually computing their frequency. G-Tries
provide a set-centric approach, standing conceptually in the
middle [9]. They allow the search of a customized set of
subgraphs: not necessarily all possible subgraphs of a certain
size (as network-centric methods) but also not only one
subgraph at a time (as subgraph-centric methods). These
algorithms provide exact results, and here we will also
concentrate on exact frequency computation, but we should
note that there exist some sampling alternatives for providing
approximate results. Some examples are Rand-ESU [11],
Randomized g-tries [15] and GUISE [16].
Regarding parallel approaches, the available work is
scarcer. We provided a distributed memory parallel approach
for both ESU [7] and g-tries [8], using MPI for commu-
nication. Our work here differs because we instead aim
for a shared memory environment with multiple cores. A
different parallel approach is the one by Wang et al. [17],
which employs a static pre-division of work and limits the
analysis to a single network and a fixed number of cores
(32). In our work, we use dynamic load balancing and do a
more thorough study of the scalability of our approach. A
subgraph-centric parallel algorithm using map-reduce was
developed by Afrati et al. [18], where they enumerate only
one individual subgraph at a time. By contrast, we use
a g-trie based set-centric approach and aim for a differ-
ent target platform (multicores). For more specific types
of subgraphs there are other parallel algorithms such as
Sahad [19] (a Hadoop subgraph-centric method for tree sub-
graphs), Fascia [20] (a multicore subgraph-centric method
for approximate count of non-induced tree-like subgraphs)
or ParSE [21] (approximate count for subgraphs that can be
partitioned in two by a cut-edge), but our work stands apart
by aiming at a completely general set of subgraphs.
III. SEQUENTIAL G-TRIE ALGORITHM
A. The G-Trie Data Structure
A g-trie is similar in concept to a prefix tree. However,
instead of storing strings and identifying common prefixes, it
stores subgraphs and identifies common subtopologies. Like
a classical string trie, it is a multiway tree, and each tree node
contains information about a single subgraph vertex and its
connections to the vertices stored in ancestor tree nodes.
Descendants of a tree node share a common topology with
a path from the root to a node defining a single subgraph.
Figure 2 gives an example of a g-trie with the 6 undirected
subgraphs previously mentioned stored in its leaves. The same
concept can be easily applied to directed subgraphs by also
storing the direction of each connection.
Figure 2. A g-trie representing a set of 6 undirected subgraphs. Each
g-trie node adds a new vertex (in black) to the already existing ones in
the ancestor nodes (white vertices). Clauses of the form X < Y indicate
symmetry breaking conditions.

In order to obtain a unique g-trie representation for
a certain subgraph collection, we employ a customized
canonical form that tries to ensure that the g-trie is as
compact as possible, that is, that we identify as many
common subtopologies as possible. This capability is the
main strength of a g-trie, not only because we compress the
information (avoiding redundant storage), but also because,
when we are matching a specific node in the g-trie, we are, at
the same time, matching all possible descendant subgraphs
stored in the g-trie. In order to avoid symmetries in the stored
graphs, g-tries also keep symmetry breaking conditions of
the form X < Y, indicating that the vertex in position X
should have a graph index smaller than the vertex in position
Y. Given the space constraints, we refer the reader to [9]
for a detailed explanation of how a g-trie can be created.
Nevertheless, in the following section we will explain how
it can be used to compute the frequency of subgraphs, so that
afterwards we can explain how we parallelize the process.
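As a rough illustration of the above (a sketch added here, not the authors' code), a g-trie node can be pictured as the following C structure: it records how the newly added vertex connects to the vertices of its ancestors, the symmetry breaking conditions of the form X < Y, its children, and a frequency counter for the subgraph it completes. Field names and bounds are assumptions.

#include <stdbool.h>

#define MAX_DEPTH 16   /* illustrative bound on the size of stored subgraphs */

/* One possible layout for a g-trie node (illustrative only). Each node adds one
 * vertex to the subgraph spelled by the path from the root down to this node. */
typedef struct gtrie_node {
    int depth;                       /* position of the vertex added by this node */
    bool conn[MAX_DEPTH];            /* conn[i]: is the new vertex adjacent to the
                                        vertex stored at ancestor position i?
                                        (a directed g-trie would keep in/out flags) */
    int num_conds;                   /* symmetry breaking conditions of the form X < Y */
    int cond_x[MAX_DEPTH];
    int cond_y[MAX_DEPTH];
    bool is_leaf;                    /* the path root..here is a complete stored subgraph */
    long frequency;                  /* occurrences found for that subgraph */
    int num_children;
    struct gtrie_node **children;    /* descendants sharing this common subtopology */
} gtrie_node;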
B. Subgraph Counting with a G-Trie
In order to avoid ambiguities in the description, from now
on we will use the term node to refer to the g-trie tree nodes,
and vertex to refer to a node in the graphs. The algorithm
depicted in Figure 3 details how we can use g-tries for
counting subgraphs sequentially.
1: procedure COUNTALL(T, G)
2:   for all vertex v of G do
3:     for all children c of T.root do
4:       COUNT(c, {v})

5: procedure COUNT(T, Vused)
6:   V ← MATCHINGVERTICES(T, Vused)
7:   for all vertex v of V do
8:     if T.isLeaf then
9:       T.frequency++
10:    else
11:      for all children c of T do
12:        COUNT(c, Vused ∪ {v})

13: function MATCHINGVERTICES(T, Vused)
14:   Vconn ← vertices connected to the vertex being added
15:   m ← vertex of Vconn with smallest neighborhood
16:   Vcand ← neighbors of m that respect both
17:           connections to ancestors and
18:           symmetry breaking conditions
19:   return Vcand

Figure 3. Algorithm for computing the frequency of subgraphs of g-trie T in graph G.
The core idea of the algorithm is to search for a set
of vertices (Vused) that match to a path in the g-trie, thus
corresponding to an occurrence of the subgraph represented
by that path. We use the information stored in the g-trie to
heavily constrain the search. In the beginning, all vertices
are possible candidates for the initial g-trie root node (lines
2 to 4). Then, we find the set of vertices that fully match
with the current g-trie node (line 6) and we traverse that
set. If we are at a leaf, we have found an occurrence
and increment the respective frequency (line 9). If not, we
continue recursively to the other possible g-trie descendants.
Function matchingVertices() gives some detail on
how we efficiently find matches for the current g-trie node.
Essentially, from the current partial match, we look for the
vertex that is both connected, in the current g-trie node, to
the vertex being added and, at the same time, has the smallest
number of neighbors in the network, which are the potential
candidates for that position (lines 14 and 15). From those
vertices, we take the ones that have the exact set of needed
connections with the already matched vertices and respect
the symmetry breaking conditions stored in the g-trie node
(lines 16 to 18).
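Connecting the pseudocode to an imperative implementation, the following C sketch (illustrative only, building on the gtrie_node layout pictured earlier; graph_adj() and graph_neighbors() are assumed helpers over the data graph G) mirrors the count()/matchingVertices() recursion: candidates are drawn from the neighbourhood of the most constrained already-matched vertex and filtered by the stored connections and by the X < Y conditions.

#include <stdbool.h>

extern bool graph_adj(int u, int v);                 /* assumed: is (u,v) an edge of G? */
extern const int *graph_neighbors(int v, int *deg);  /* assumed: neighbour list of v in G */

/* Sketch of COUNT(T, Vused) from Figure 3 (not the authors' code). */
static void count(gtrie_node *t, int used[], int n_used) {
    /* matchingVertices, lines 14-15: among already matched vertices that must be
     * connected to the new one, pick the one with the fewest neighbours in G. */
    int best = -1, best_deg = 0;
    for (int i = 0; i < n_used; i++) {
        if (!t->conn[i]) continue;
        int d; graph_neighbors(used[i], &d);
        if (best < 0 || d < best_deg) { best = i; best_deg = d; }
    }
    if (best < 0) return;  /* simplification: stored subgraphs are assumed connected */
    int deg; const int *cand = graph_neighbors(used[best], &deg);

    for (int c = 0; c < deg; c++) {            /* lines 7 and 16-18 */
        int v = cand[c];
        bool ok = true;
        for (int i = 0; ok && i < n_used; i++) {
            if (v == used[i]) ok = false;                             /* vertex already matched */
            else if (t->conn[i] != graph_adj(used[i], v)) ok = false; /* exact connections */
        }
        for (int k = 0; ok && k < t->num_conds; k++) {                /* symmetry breaking X < Y */
            int x = (t->cond_x[k] == n_used) ? v : used[t->cond_x[k]];
            int y = (t->cond_y[k] == n_used) ? v : used[t->cond_y[k]];
            if (x >= y) ok = false;
        }
        if (!ok) continue;
        if (t->is_leaf) {
            t->frequency++;                                           /* line 9 */
        } else {
            used[n_used] = v;
            for (int ch = 0; ch < t->num_children; ch++)              /* lines 11-12 */
                count(t->children[ch], used, n_used + 1);
        }
    }
}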
For the sake of illustration, we will now exemplify how one occurrence is found. We use the notation (X, k) to denote that vertex k is matched to X in the g-trie node. Consider Figures 1 and 2 and take for instance the occurrence {2, 3, 7, 6} of subgraph type T6. Looking at the respective g-trie leaf, we can see that the only path leading to this occurrence will be (A, 3)(B, 7)(C, 2)(D, 6). A path like (A, 2)(B, 3)(C, 7)(D, 6) could not happen because when adding (C, 7) there would be no matching g-trie node regarding the connections. A path like (A, 7)(B, 3)(C, 6)(D, 2) could not happen either because, even if it corresponded to valid connections, it would break symmetry conditions. In particular, T6 imposes the condition A < B and, in this case, 7 is not smaller than 3. These two simple mechanisms (verifying connections and symmetry conditions) form the basis of how a g-trie is able to highly constrain and limit the candidates it is searching and, at the same time, guarantee that each occurrence is found only once.
IV. PARALLEL G-TRIE ALGORITHM
One of the most important aspects of our sequential algo-
rithm is that it originates completely independent search tree
branches. In fact, each call to count(T, Vused) produces
one different branch, and knowing the g-trie node T and
the already matched vertices Vused is enough for continuing
the search from that point. Each of these calls can thus be
thought of as a work unit and, when designing our parallel
algorithm, we aimed to provide a balanced division of work
units per resource during execution time.
As we can see in Figure 3, each vertex in the input graph
G is given as a candidate for the root node (line 2). A naive
approach would be to simply divide these initial work units
among the available computing resources. The problem with
this static strategy is that the generated search tree is highly
irregular and unbalanced. A few of the vertices may take
most of the computing time, leading to some resources being
busy processing them for a long time while others are idle.
To achieve a scalable approach, an efficient dynamic sharing
mechanism, that redistributes work during execution time, is
required.

Another important factor in the sequential algorithm’s
performance is that there is no explicit queue of unprocessed
work units. Instead, the recursive stack implicitly stores the
work tree, with the two loops over vertices and nodes
(lines 7 and 11) being responsible for generating new work
units that are recursively processed (line 12). In our parallel
approach we keep this crucial feature of the algorithm and do
not artificially introduce explicit queues during the normal
execution of the algorithm. These queues would introduce
a serious overhead both on the execution time and on the
needed memory, significantly deteriorating the sequential
algorithm’s performance. Our goal is, therefore, to scale up
our original efficient algorithm, providing the best possible
overall running time.
Since we want the end users to take advantage of their
personal multicore machines, our target is a shared memory
architecture. For that purpose we chose Pthreads, due to its
portability and flexibility. (Our implementation, along with
test data, can be consulted at http://www.dcc.fc.up.pt/gtries/.)
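A minimal Pthreads skeleton along these lines (illustrative only; names and toy sizes are assumptions, and the real implementation adds the dynamic work sharing described below) spawns one worker per core and hands each thread its round-robin share of starting vertices:

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_THREADS 4           /* one worker per core; toy value */

static int num_vertices = 10;   /* stand-in for |V(G)| */

static void process_root_vertex(int tid, int v) {
    /* Stand-in for expanding vertex v under the children of the g-trie root. */
    printf("thread %d starts a search tree at vertex %d\n", tid, v);
}

static void *worker(void *arg) {
    int tid = (int)(intptr_t)arg;
    /* Round-robin initial division: thread tid takes vertices tid, tid + NUM_THREADS, ...,
     * i.e. roughly |V(G)|/NUM_THREADS starting vertices per thread. */
    for (int v = tid; v < num_vertices; v += NUM_THREADS)
        process_root_vertex(tid, v);
    return NULL;
}

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (long t = 0; t < NUM_THREADS; t++)
        if (pthread_create(&threads[t], NULL, worker, (void *)(intptr_t)t) != 0) {
            perror("pthread_create");
            return EXIT_FAILURE;
        }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}

Compiled with -pthread, each worker simply walks its own slice of the root-level candidates; everything that follows in this section deals with what happens when those slices turn out to be unbalanced.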
A. Overall View
We allocate one thread per core, with each thread being
initially assigned an equal amount of vertices. When a
thread P finishes its allotted computation, it requests new
work from another active thread Q, which responds by first
stopping its computation. Q then builds a representation of
its state, bottom-up, to enable sharing. Q proceeds by di-
viding the unprocessed work units in a round-robin fashion,
achieving a diagonal split of the entire work tree, allowing
it to keep half of the work units and giving the other half
to P . Both threads then resume their execution, starting at
the bottom (meaning the lowest levels of the g-trie) of their
respective work trees. When all vertices for a certain g-trie
node are computed, the thread moves up in the work tree.
The execution starts at the bottom so that only one Vused is
necessary, taking advantage of the common subtopology of
ancestor and descendant nodes in the same path. When there
is no more work, the threads terminate and the computed
frequencies are aggregated. We will now describe in more
detail the various components of our algorithm.
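The components described in the next subsections act on a small amount of per-thread state. One possible way to organize it is sketched below (field names are illustrative, not taken from the released code); note that the explicit work tree is only materialized when another thread asks for work.

#include <pthread.h>
#include <stdbool.h>

#define MAX_GTRIE_DEPTH 16   /* illustrative bound */

/* Unprocessed work at one level of a thread's work tree: the g-trie node
 * being expanded and the candidate vertices not yet explored at that level. */
typedef struct {
    gtrie_node *node;
    int *pending_vertices;
    int num_pending;
} work_level;

/* Per-thread state (one instance per core). */
typedef struct {
    int thread_id;
    long *freq;                           /* thread-private Fr[1..num_gtrie_nodes] */
    int used[MAX_GTRIE_DEPTH];            /* Vused: currently matched graph vertices */
    work_level levels[MAX_GTRIE_DEPTH];   /* explicit work tree, built only on request */
    int num_levels;
    volatile bool work_requested;         /* raised by an idle thread asking for work */
    volatile bool idle;                   /* true while this thread is itself requesting */
    pthread_mutex_t lock;                 /* protects the hand-off of the shared half */
} thread_state;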
B. Parallel Subgraph Frequency Counting
Figure 4 depicts our parallel counting algorithm. All
threads start by executing parallelCountAll() with
an initially empty work tree W (line 2). The first vertex
that a thread computes is that of position thread_id (lines
3 and 5). At each step, the thread computes the vertex
thread_id positions after the previous one (line 13). Every
vertex is used as a candidate for the g-trie root node by some
thread (lines 11 and 12). This division gives approximately
|V(G)|/num_threads vertices for each thread to initially
explore. We do this in a round-robin fashion because it
generally provides a more equitable initial division than
1: procedure PARALLELCOUNTALL(T, G)
2:   W ← ∅
3:   i ← thread_id
4:   while i ≤ |V(G)| do
5:     v ← V(G)[i]
6:     if WORKREQUEST(P) then
7:       W.ADDWORK()
8:       (W_Q, W_P) ← SPLITWORK(W)
9:       GIVEWORK(W_P, P)
10:      RESUMEWORK(W_Q)
11:    for all children c of T.root do
12:      PARALLELCOUNT(c, {v})
13:    i ← i + thread_id
14:  ASKFORWORK()

15: procedure PARALLELCOUNT(T, Vused)
16:   V ← MATCHINGVERTICES(T, Vused)
17:   for all vertex v of V do
18:     if WORKREQUEST(P) then
19:       W.ADDWORK()
20:       return
21:     if T.isLeaf then
22:       thread_freq[T]++
23:     else
24:       for all children c of T do
25:         PARALLELCOUNT(c, Vused ∪ {v})

Figure 4. Parallel algorithm for computing the frequency of subgraphs of g-trie T in graph G.
simply allocating continuous intervals to each thread, due
to the way we use the symmetry breaking conditions. Our
intuition was verified empirically by observing that the
threads would ask for work sooner if continuous intervals
were used. When a thread Q receives a work request from
P (line 6) it needs to stop its computation, save what it still
had left to do (line 7), divide the work tree (line 8), give P
some work (line 9) and resume the remaining work (line 10).
On the other hand, if a thread finishes its initially assigned
work, it issues a work request to get new work (line 14).
parallelCount() remains almost the same as the
sequential version, except for now attending work requests
and storing subgraph frequencies differently. If the thread
receives a work request while computing matches, it first
adds them to the work tree W and then stops the current
execution (lines 18 to 20) to compute the current state and
build the work tree. In the sequential version we simply
needed to increase the frequency of a certain subgraph in the
g-trie structure. As for the parallel version, multiple threads
may be computing frequencies for the same subgraph, using
different vertices from the input graph, and so they need
to coordinate their frequency storing. Initially, we kept in
each g-trie node a shared array Fr[1..num_threads] where
the threads would update the array at the position of their
thread_id. In the end, the global frequencies would be
obtained by summing the values in the array. This resulted
in significant false sharing due to too many threads updating
those arrays simultaneously, and became a severe bottleneck.

Figure 5. The constructed work tree of a thread Q and its division by diagonal splitting when a work request is received from thread P .
Our solution was to create thread private arrays indexing
g-trie nodes, i.e. Fr[1..num_gtrieNodes], which impacted
very favorably our efficiency. In our testing phase with a
24-core machine, we had cases with speedups below 5 that,
only with this change, went to a speedup of over 22, thus
converting a modest into an almost linear speedup.
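The difference between the two storage schemes can be sketched as follows (illustrative code; NUM_THREADS as in the earlier skeleton). In the first, discarded variant the per-thread counters of a g-trie node sit next to each other in memory, so writes from different cores keep invalidating the same cache line; in the adopted variant each thread owns a whole array indexed by g-trie node, and the totals are summed once after the threads terminate.

#define NUM_THREADS 4   /* number of worker threads, as before */

/* Variant 1 (initial, later discarded): one shared slot per thread inside every
 * g-trie node. Neighbouring slots share cache lines, causing false sharing. */
typedef struct { long Fr[NUM_THREADS]; } node_counters;

/* Variant 2 (adopted): thread-private arrays indexed by g-trie node. */
static long *thread_freq[NUM_THREADS];   /* thread_freq[tid][node_id] */

/* Final aggregation, performed once after all threads have finished. */
static long total_frequency(int node_id) {
    long total = 0;
    for (int t = 0; t < NUM_THREADS; t++)
        total += thread_freq[t][node_id];
    return total;
}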
The matchingVertices() procedure remains the
same as the sequential version, the only difference being
that Vused is now thread local, with threads computing a
different set of vertices.
C. Work Request
A work request is performed when some thread P has
completed its assigned work. Since there is no efficient way
of predicting exactly how much computation each active
thread still has in its work tree, it asks a random thread
Q for more work. Note that this kind of random polling
has been established as an efficient heuristic for dynamic
load balancing [22]. If Q sends some unprocessed work,
then P executes the resumeWork() procedure. If Q does
not have any work to share, P proceeds by asking another
random thread. The computation is over when all threads
are requesting work and thus no more work units remain to
be processed.
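A sketch of this random polling loop, on top of the thread_state layout pictured earlier, could look as follows; all_threads_idle() and wait_for_reply() stand in for termination detection and for the blocking hand-off of the shared work, whose exact synchronization is not detailed here and is therefore an assumption of the sketch.

#include <stdbool.h>
#include <stdlib.h>

extern thread_state threads[];   /* per-thread state, as sketched above */
extern int num_threads;

extern bool all_threads_idle(void);                        /* assumed termination test */
extern work_level *wait_for_reply(int victim, int *nlev);  /* assumed blocking hand-off */

/* Called by thread my_id when its own work tree is exhausted. Returns a work
 * tree received from some other thread, or NULL when the computation is over. */
static work_level *ask_for_work(int my_id, int *num_levels) {
    threads[my_id].idle = true;
    while (!all_threads_idle()) {
        int victim = rand() % num_threads;                 /* random polling */
        if (victim == my_id || threads[victim].idle)
            continue;                                      /* pick someone that may have work */
        threads[victim].work_requested = true;             /* victim polls this flag */
        work_level *share = wait_for_reply(victim, num_levels);
        if (share != NULL) {
            threads[my_id].idle = false;
            return share;                                  /* got half of the victim's tree */
        }
        /* victim had nothing left to give: ask another random thread */
    }
    return NULL;                                           /* every thread is idle: done */
}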
D. Work Sharing
When a thread Q receives a work request it builds a work
tree representing its current recursive state. In Figure 5 we
show a resulting work tree and its division with a caller
thread P. The yellow colored circles constitute Vused and
the yellow colored squares form the g-trie path up to the
current level. The other nodes and vertices are still left to
be explored and are split in a round-robin fashion. This
division results in two work trees with approximately the
same number of work units. This does not, however, imply
that the two halves represent the same amount of computation,
given the irregularity of the search tree they will induce, but,
nevertheless, they constitute our best guess of a fair division
across all levels.
As said before, we only build an explicit work tree
when a work request is received. In that situation, a thread
saves the current and the other unexplored vertices for
the current node and moves up in the recursive tree. This
process is repeated up to the top level, effectively populating
the work tree with the unprocessed work units, i.e., the
unexplored g-trie nodes and network vertices. This is a
very fast operation and it is done by stopping the execution
of the recursive parallelCount() calls and adding the
work to the work tree (line 19 in Figure 4) until we get to
parallelCountAll() and add the remaining nodes and
vertices of the top level (line 7). We also store the current
g-trie path and network vertices (Vused).
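In code, the diagonal split amounts to dealing out, at every level of the work tree, the unexplored candidate vertices alternately between the two threads, so each receives about half of the work units at every depth; the current g-trie path and Vused are handed over as well so that the requester can resume from the bottom. The sketch below (illustrative only) assumes preallocated output arrays and the work_level type pictured earlier.

/* Split the explicit work tree W (one work_level per g-trie depth) between its
 * owner Q and the requesting thread P. Both halves keep the same g-trie nodes;
 * the pending vertices of every level are dealt out alternately (diagonal split). */
static void split_work(const work_level *W, int num_levels,
                       work_level *keep,   /* Q's half */
                       work_level *give) { /* P's half */
    for (int l = 0; l < num_levels; l++) {
        keep[l].node = give[l].node = W[l].node;
        keep[l].num_pending = give[l].num_pending = 0;
        for (int i = 0; i < W[l].num_pending; i++) {
            int v = W[l].pending_vertices[i];
            if (i % 2 == 0)
                keep[l].pending_vertices[keep[l].num_pending++] = v;
            else
                give[l].pending_vertices[give[l].num_pending++] = v;
        }
    }
}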
E. Work Resuming
After the threads have shared work, they resume it
and proceed with the computation. The work tree W
is traversed in a bottom-up fashion (lines 2 to 5) and
the vertices of each level are computed (line 6). If the
thread receives a work request, work sharing is performed
(line 7). There is no call to addWork() since the work
1: procedure RESUMEWORK(W)
2:   ORDERBYLOWEST(W)
3:   for all level L of W do
4:     depth ← L.depth − 1
5:     Vused ← active vertices[1..depth]
6:     for all vertices v of L.nodes do
7:       if WORKREQUEST(P) then
8:         (W_Q, W_P) ← SPLITWORK(W)
9:         GIVEWORK(W_P, P)
10:        RESUMEWORK(W_Q)
11:        return
12:      if L.T.isLeaf then
13:        thread_freq[T]++
14:      else
15:        for all children c of L.T do
16:          PARALLELCOUNT(c, Vused ∪ {v})
17:  ASKFORWORK()

Figure 6. Algorithm for resuming work after sharing is performed.

Citations
Proceedings ArticleDOI
04 Oct 2015
TL;DR: Arabesque is presented, the first distributed data processing platform for implementing graph mining algorithms that automates the process of exploring a very large number of subgraphs and defines a high-level filter-process computational model that simplifies the development of scalableGraph mining algorithms.
Abstract: Distributed data processing platforms such as MapReduce and Pregel have substantially simplified the design and deployment of certain classes of distributed graph analytics algorithms. However, these platforms do not represent a good match for distributed graph mining problems, as for example finding frequent subgraphs in a graph. Given an input graph, these problems require exploring a very large number of subgraphs and finding patterns that match some "interestingness" criteria desired by the user. These algorithms are very important for areas such as social networks, semantic web, and bioinformatics.In this paper, we present Arabesque, the first distributed data processing platform for implementing graph mining algorithms. Arabesque automates the process of exploring a very large number of subgraphs. It defines a high-level filter-process computational model that simplifies the development of scalable graph mining algorithms: Arabesque explores subgraphs and passes them to the application, which must simply compute outputs and decide whether the subgraph should be further extended. We use Arabesque's API to produce distributed solutions to three fundamental graph mining problems: frequent subgraph mining, counting motifs, and finding cliques. Our implementations require a handful of lines of code, scale to trillions of subgraphs, and represent in some cases the first available distributed solutions.

208 citations


Cites methods from "Parallel Subgraph Counting for Mult..."

  • ...For motifs, [29] proposes a multicore parallel approach, while [34] develops methods for approximate motif counting on a tightly coupled HPC system using MPI....

    [...]

Proceedings ArticleDOI
08 Oct 2018
TL;DR: RStream is the first single-machine, out-of-core mining system that leverages disk support to store intermediate data and demonstrates that RStream outperforms all of them, running on a 10-node cluster, e.g., by at least a factor of 1.7×, and can process large graphs on an inexpensive machine.
Abstract: Graph mining is an important category of graph algorithms that aim to discover structural patterns such as cliques and motifs in a graph. While a great deal of work has been done recently on graph computation such as PageRank, systems support for scalable graph mining is still limited. Existing mining systems such as Arabesque focus on distributed computing and need large amounts of compute and memory resources.We built RStream, the first single-machine, out-of-core mining system that leverages disk support to store intermediate data. At its core are two innovations: (1) a rich programming model that exposes relational algebra for developers to express a wide variety of mining tasks; and (2) a runtime engine that implements relational algebra efficiently with tuple streaming. A comparison between RStream and four state-of-the-art distributed mining/Datalog systems--Arabesque, ScaleMine, DistGraph, and BigDatalog -- demonstrates that RStream outperforms all of them, running on a 10-node cluster, e.g., by at least a factor of 1.7×, and can process large graphs on an inexpensive machine.

81 citations


Cites background from "Parallel Subgraph Counting for Mult..."

  • ...Recently, a body of algorithms have been developed to leverage parallel [28, 12, 59, 64], distributed systems (such as Map/Reduce) [35, 19, 41, 44, 71, 6, 36, 82, 18], or GPUs [37]....

    [...]

Journal ArticleDOI
TL;DR: An unbiased graphlet estimation framework that is fast with large speedups compared to the state of the art; parallel with nearly linear speedups; accurate with less than 1% relative error; scalable and space efficient for massive networks with billions of edges; and effective for a variety of real-world settings.
Abstract: Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms ; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this paper, we propose an unbiased graphlet estimation framework that is: (a) fast with large speedups compared to the state of the art; (b) parallel with nearly linear speedups; (c) accurate with less than 1% relative error; (d) scalable and space efficient for massive networks with billions of edges; and (e) effective for a variety of real-world settings as well as estimating global and local graphlet statistics (e.g., counts). On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is vastly more accurate than the existing methods while using less data. Moreover, it takes a few seconds on billion edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date.

37 citations


Cites background from "Parallel Subgraph Counting for Mult..."

  • ...As an aside, there have been a few distributed memory [59] and shared memory [60], [61] exact algorithms....

    [...]

Posted Content
TL;DR: This survey aims to provide a comprehensive overview of the existing methods for subgraph counting, identifying and describing the main conceptual approaches, giving insight on their advantages and limitations, and providing pointers to existing implementations.
Abstract: Computing subgraph frequencies is a fundamental task that lies at the core of several network analysis methodologies, such as network motifs and graphlet-based metrics, which have been widely used to categorize and compare networks from multiple domains. Counting subgraphs is however computationally very expensive and there has been a large body of work on efficient algorithms and strategies to make subgraph counting feasible for larger subgraphs and networks. This survey aims precisely to provide a comprehensive overview of the existing methods for subgraph counting. Our main contribution is a general and structured review of existing algorithms, classifying them on a set of key characteristics, highlighting their main similarities and differences. We identify and describe the main conceptual approaches, giving insight on their advantages and limitations, and provide pointers to existing implementations. We initially focus on exact sequential algorithms, but we also do a thorough survey on approximate methodologies (with a trade-off between accuracy and execution time) and parallel strategies (that need to deal with an unbalanced search space).

37 citations


Cites background or methods from "Parallel Subgraph Counting for Mult..."

  • ...SM-Gtries [12] 2014 SM Vertices Subgraph-trees DFS Diagonal W-W [145]...

    [...]

  • ...In this strategy, an idle worker asks a random worker for work [10, 12]....

    [...]

  • ...This strategy achieves a balanced work-division during runtime, and the penalty caused by worker communication is negligible [10, 12]....

    [...]

  • ...Algorithms that employ this strategy [10, 12, 151, 152] perform an initial static work division....

    [...]

  • ...To avoid the cost of synchronization and of storing partial results, most subgraph counting algorithms traverse the search space in a depth-first fashion [3, 10, 12, 56, 151– 153, 172, 190]....

    [...]

Posted Content
TL;DR: This paper presents the first efficient distributed implementation for color coding that goes beyond tree queries, and applies to any query graph of treewidth 2, which is the first step into the realm of color coding for queries that require superlinear worst case running time.
Abstract: The problem of counting occurrences of query graphs in a large data graph, known as subgraph counting, is fundamental to several domains such as genomics and social network analysis. Many important special cases (e.g. triangle counting) have received significant attention. Color coding is a very general and powerful algorithmic technique for subgraph counting. Color coding has been shown to be effective in several applications, but scalable implementations are only known for the special case of {\em tree queries} (i.e. queries of treewidth one). In this paper we present the first efficient distributed implementation for color coding that goes beyond tree queries: our algorithm applies to any query graph of treewidth $2$. Since tree queries can be solved in time linear in the size of the data graph, our contribution is the first step into the realm of colour coding for queries that require superlinear running time in the worst case. This superlinear complexity leads to significant load balancing problems on graphs with heavy tailed degree distributions. Our algorithm structures the computation to work around high degree nodes in the data graph, and achieves very good runtime and scalability on a diverse collection of data and query graph pairs as a result. We also provide theoretical analysis of our algorithmic techniques, showing asymptotic improvements in runtime on random graphs with power law degree distributions, a popular model for real world graphs.

27 citations


Cites methods from "Parallel Subgraph Counting for Mult..."

  • ...Based on the above intuition, we apply dynamic programming to count the number colorful matches of Q....

    [...]

References
Journal ArticleDOI
25 Oct 2002-Science
TL;DR: Network motifs, patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks, are defined and may define universal classes of networks.
Abstract: Complex networks are studied across many fields of science. To uncover their structural design principles, we defined “network motifs,” patterns of interconnections occurring in complex networks at numbers that are significantly higher than those in randomized networks. We found such motifs in networks from biochemistry, neurobiology, ecology, and engineering. The motifs shared by ecological food webs were distinct from the motifs shared by the genetic networks of Escherichia coli and Saccharomyces cerevisiae or from those found in the World Wide Web. Similar motifs were found in networks that perform information processing, even though they describe elements as different as biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This

6,992 citations


"Parallel Subgraph Counting for Mult..." refers background in this paper

  • ...This frequency computation lies at the core of several graph metrics, such as graphlet degree distributions [3] or network motifs [4]....

    [...]

Proceedings ArticleDOI
03 May 1971
TL;DR: It is shown that any recognition problem solved by a polynomial time-bounded nondeterministic Turing machine can be “reduced” to the problem of determining whether a given propositional formula is a tautology.
Abstract: It is shown that any recognition problem solved by a polynomial time-bounded nondeterministic Turing machine can be “reduced” to the problem of determining whether a given propositional formula is a tautology. Here “reduced” means, roughly speaking, that the first problem can be solved deterministically in polynomial time provided an oracle is available for solving the second. From this notion of reducible, polynomial degrees of difficulty are defined, and it is shown that the problem of determining tautologyhood has the same polynomial degree as the problem of determining whether the first of two given graphs is isomorphic to a subgraph of the second. Other examples are discussed. A method of measuring the complexity of proof procedures for the predicate calculus is introduced and discussed.

6,675 citations


"Parallel Subgraph Counting for Mult..." refers background in this paper

  • ...computationally hard task, closely related to subgraph isomorphism, which is one of the classical NP-complete problems [6]....

    [...]

Journal ArticleDOI
TL;DR: A modularity matrix plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations, and a spectral measure of bipartite structure in networks and a centrality measure that identifies vertices that occupy central positions within the communities to which they belong are proposed.
Abstract: We consider the problem of detecting communities or modules in networks, groups of vertices with a higher-than-average density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the benefit function known as "modularity" over possible divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a number of possible algorithms for detecting community structure, as well as several other results, including a spectral measure of bipartite structure in networks and a centrality measure that identifies vertices that occupy central positions within the communities to which they belong. The algorithms and measures proposed are illustrated with applications to a variety of real-world complex networks.

4,559 citations


"Parallel Subgraph Counting for Mult..." refers background in this paper

  • ...73 No Coauthorships of scientists working on network experiments [24] Newman(1)...

    [...]

Proceedings ArticleDOI
21 Aug 2005
TL;DR: Differences in the behavior of liberal and conservative blogs are found, with conservative blogs linking to each other more frequently and in a denser pattern.
Abstract: In this paper, we study the linking patterns and discussion topics of political bloggers. Our aim is to measure the degree of interaction between liberal and conservative blogs, and to uncover any differences in the structure of the two communities. Specifically, we analyze the posts of 40 "A-list" blogs over the period of two months preceding the U.S. Presidential Election of 2004, to study how often they referred to one another and to quantify the overlap in the topics they discussed, both within the liberal and conservative communities, and also across communities. We also study a single day snapshot of over 1,000 political blogs. This snapshot captures blogrolls (the list of links to other blogs frequently found in sidebars), and presents a more static picture of a broader blogosphere. Most significantly, we find differences in the behavior of liberal and conservative blogs, with conservative blogs linking to each other more frequently and in a denser pattern.

2,800 citations


"Parallel Subgraph Counting for Mult..." refers background in this paper

  • ...76 Yes Network of hyperlinks between weblogs on US politics [23] Newman(1) netsc 1,589 2,742 1....

    [...]

Proceedings ArticleDOI
21 Aug 2005
TL;DR: A new graph generator is provided, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Abstract: How do real graphs evolve over time? What are "normal" growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time.Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n)).Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.

2,548 citations


"Parallel Subgraph Counting for Mult..." refers background in this paper

  • ...94 No Traffic flows between routers [26] SNAP(2) company 8,497 6,724 0....

    [...]

Frequently Asked Questions (2)
Q1. What are the future works in "Parallel Subgraph Counting for Multicore Architectures"?

For example, the authors are in the process of building a large co-authorship network and plan to explore its structure using their algorithm. 

This is, however, an important graph mining primitive, with several applications, and here the authors present a novel multicore parallel algorithm for this task. The authors assess the performance of their Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, the authors obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of their algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.