# Parallel Subgraph Counting for Multicore Architectures

## Summary (3 min read)

### Introduction

- A typical motif discovery algorithm will need to count all subgraphs of a certain size both in the original network and in an ensemble of similar randomized networks [5].
- Multicore architectures are, however, much more common and readily available to a typical practitioner, with multicores being pervasive even on personal computers.
- The authors' main contribution in this paper is a novel parallel algorithm for subgraph counting geared towards multicores.
- Section II formalizes the problem being tackled and talks about related work.

### A. Problem Definition

- Two occurrences are considered different if they have at least one node or edge that they do not share.
- Figure 1 gives an example of a subgraph frequency computation, detailing the subgraph occurrences found (these are given as sets of nodes).
- Note also how the authors distinguish occurrences: other possible frequency concepts do exist [10], but here they resort to the standard definition.
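Under this standard frequency concept, each occurrence is a distinct set of nodes inducing the pattern. A minimal brute-force sketch (toy graph of our own, not the paper's Figure 1) makes the definition concrete for triangles:

```python
from itertools import combinations

# Hypothetical 4-node graph as an adjacency-set dict (not from the paper).
graph = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}

def count_triangles(g):
    """Count triangle occurrences: each occurrence is a distinct set of
    3 nodes that are pairwise connected, matching the standard frequency
    definition where occurrences differ in at least one node."""
    count = 0
    for a, b, c in combinations(sorted(g), 3):
        if b in g[a] and c in g[a] and c in g[b]:
            count += 1
    return count
```

On this toy graph the occurrences are {1,2,3} and {2,3,4}, so the frequency is 2. Enumerating all node subsets like this is exponential; the g-trie approach described next exists precisely to prune this search.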

### A. The G-Trie Data Structure

- Instead of storing strings and identifying common prefixes, it stores subgraphs and identifies common subtopologies.
- Like a classical string trie, it is a multiway tree, and each tree node contains information about a single subgraph vertex and its connections to the vertices stored in ancestor tree nodes.
- The same concept can be easily applied to directed subgraphs by also storing the direction of each connection.
- This capability is the main strength of a g-trie, not only because the authors compress the information (avoiding redundant storage), but also because, when they are matching a specific node in the g-trie, they are, at the same time, matching all possible descendant subgraphs stored in the g-trie.
- Given the space constraints, the authors refer the reader to [9] for a detailed explanation of how a g-trie can be created.
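The prefix-sharing idea can be sketched in a few lines. The class and function names below are our own, not the paper's; each node stores one subgraph vertex as a row of connections to its ancestors, and inserting a subgraph reuses any existing node with the same row:

```python
class GTrieNode:
    """One g-trie node: a single subgraph vertex described by its
    connections to the vertices stored in ancestor nodes."""
    def __init__(self, conn):
        self.conn = conn          # conn[i] is True iff connected to ancestor i
        self.children = []        # nodes extending this partial subgraph
        self.is_graph = False     # True when root->here spells a full subgraph

def gtrie_insert(root_children, rows):
    """Insert a subgraph given as one connection row per vertex, reusing
    nodes with identical rows so common subtopologies are stored once."""
    level, node = root_children, None
    for row in rows:
        node = next((c for c in level if c.conn == row), None)
        if node is None:
            node = GTrieNode(row)
            level.append(node)
        level = node.children
    node.is_graph = True

# A triangle and a 3-vertex path share their first two vertices,
# so the trie stores that common subtopology only once.
root = []
gtrie_insert(root, [(), (True,), (True, True)])   # triangle
gtrie_insert(root, [(), (True,), (False, True)])  # path
```

After both insertions the trie has a single shared two-node prefix and two leaves, which is exactly the compression the authors describe.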

### B. Subgraph Counting with a G-Trie

- The algorithm depicted in Figure 3 details how the authors can use g-tries for counting subgraphs sequentially.
- The authors use the information stored in the g-trie to heavily constrain the search.
- Essentially, from the current partial match, the authors pick the already-matched vertex that is connected, in the current g-trie node, to the vertex being added and that has the fewest neighbors in the network; those neighbors are the potential candidates for the new position (lines 14 and 15).
- For the sake of illustration, the authors will now exemplify how one occurrence is found.
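A much-simplified stand-in for the counting recursion of Figure 3 can illustrate the candidate-selection idea. This sketch keeps only two ingredients from the summary: candidates for the next position come from the neighbors of an already-matched, connected vertex with the fewest neighbors, and a candidate must match the connection row exactly (edges and non-edges, for induced occurrences). Unlike the paper's algorithm, which uses symmetry-breaking conditions, this toy version deduplicates occurrences by node set:

```python
def count_occurrences(g, rows):
    """Count induced occurrences of a pattern given as ancestor-connection
    rows (g-trie style). Simplified sketch; not the paper's algorithm."""
    found = set()

    def extend(used):
        depth = len(used)
        if depth == len(rows):
            found.add(frozenset(used))   # dedupe by node set
            return
        row = rows[depth]
        # Pick, among matched vertices connected to the new position,
        # the one with the fewest neighbors; its neighbors are the candidates.
        anchors = [used[i] for i, c in enumerate(row) if c]
        if anchors:
            pivot = min(anchors, key=lambda v: len(g[v]))
            candidates = g[pivot]
        else:
            candidates = set(g)          # first vertex: no constraint yet
        for v in candidates:
            if v in used:
                continue
            if all((v in g[used[i]]) == bool(c) for i, c in enumerate(row)):
                extend(used + [v])

    extend([])
    return len(found)

# Same hypothetical 4-node graph used for illustration throughout.
graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
```

On this toy graph the triangle pattern has 2 induced occurrences and the 3-vertex path also has 2 ({1,2,4} and {1,3,4}).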

### IV. PARALLEL G-TRIE ALGORITHM

- One of the most important aspects of their sequential algorithm is that it originates completely independent search tree branches.
- In fact, each call to count(T, Vused) produces a different branch, and knowing the g-trie node T and the already matched vertices Vused is enough to continue the search from that point.
- The problem with this static strategy is that the generated search tree is highly irregular and unbalanced.
- To achieve a scalable approach, an efficient dynamic sharing mechanism that redistributes work during execution is required.
- In their parallel approach the authors keep this crucial feature of the algorithm and do not artificially introduce explicit queues during the normal execution of the algorithm.

### A. Overall View

- The authors allocate one thread per core, with each thread being initially assigned an equal number of vertices.
- When a thread P finishes its allotted computation, it requests new work from another active thread Q, which responds by first stopping its computation.
- Both threads then resume their execution, starting at the bottom (meaning the lowest levels of the g-trie) of their respective work trees.
- The execution starts at the bottom so that only one Vused is necessary, taking advantage of the common subtopology of ancestor and descendant nodes in the same path.
- The authors will now describe in more detail the various components of their algorithm.
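The sharing step can be sketched as splitting the donor's stack of unexplored candidates, level by level, between the two threads. This is a highly simplified sketch: the frame layout and the halving policy are our assumptions, and the real algorithm has both threads resume at the bottom of their work trees so that a single Vused suffices:

```python
def share_work(stack):
    """Split a donor's stack of (g-trie node, unexplored candidates)
    frames in half at every level, producing two independent work trees.
    Sketch only; frame layout is our assumption, not the paper's."""
    donor, thief = [], []
    for node, cands in stack:
        half = len(cands) // 2
        donor.append((node, cands[half:]))  # donor keeps one half
        thief.append((node, cands[:half]))  # requester takes the other
    return donor, thief
```

Because every frame is split, both resulting work trees span the same g-trie path, which is what lets each thread restart from the lowest level with one shared Vused.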

### B. Parallel Subgraph Frequency Counting

- Figure 4 depicts their parallel counting algorithm.
- At each step, the thread computes the vertex threadid positions after the previous one (line 13).
- The authors do this in a round-robin fashion because it generally provides a more equitable initial division than assigning contiguous intervals of vertices. (The authors' implementation, along with test data, is available at http://www.dcc.fc.up.pt/gtries/.)
- The authors intuition was verified empirically by observing that the threads would ask for work sooner if continuous intervals were used.
- Initially, the authors kept in each g-trie node a shared array Fr[1..numthreads], where each thread would update the position given by its threadid.
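Both ideas in this subsection are easy to sketch: the round-robin initial division of vertices, and per-thread frequency counters that are merged once at the end so the hot path needs no locks. Function names are ours, for illustration:

```python
def initial_division(num_vertices, num_threads):
    """Round-robin initial division: thread t starts at vertex t and then
    strides by num_threads, instead of taking a contiguous interval."""
    return {t: list(range(t, num_vertices, num_threads))
            for t in range(num_threads)}

def merge_frequencies(per_thread):
    """Sum per-thread frequency counters (in the spirit of the Fr array
    indexed by threadid) into the final totals, once, at the end."""
    total = {}
    for fr in per_thread:
        for subgraph, count in fr.items():
            total[subgraph] = total.get(subgraph, 0) + count
    return total
```

With 10 vertices and 3 threads, thread 0 gets vertices 0, 3, 6, 9, thread 1 gets 1, 4, 7, and thread 2 gets 2, 5, 8, interleaving cheap and expensive regions of the network.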

### E. Work Resuming

- After the threads have shared work, they resume it and proceed with the computation.
- If the thread receives a work request, work sharing is performed (line 7).
- After work sharing is performed (lines 8 and 9), the thread continues its computation with the new work tree (line 10) and the current execution is discarded (line 11).
- The thread first checks if it has arrived at a desired subgraph (line 12) and increases its frequency in that case (line 13).
- Otherwise, the thread calls parallelCount with the new vertex added to Vused for each child of the g-trie node (lines 15 and 16).
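The control flow described above can be sketched on a toy tree (not a real g-trie; the names and the work-request flag are our stand-ins): check for a pending work request first, record a match when at a desired subgraph, then recurse into each child.

```python
class Node:
    """Toy work-tree node; a non-None label marks a desired subgraph."""
    def __init__(self, label=None, children=()):
        self.label = label
        self.children = list(children)

def parallel_count(node, freq, work_requested):
    """Control-flow sketch of the resumed computation loop."""
    if work_requested():              # a work request arrived (line 7)
        return                        # here the real algorithm shares work
    if node.label is not None:        # arrived at a desired subgraph (line 12)
        freq[node.label] = freq.get(node.label, 0) + 1
    for child in node.children:       # recurse for each child (lines 15-16)
        parallel_count(child, freq, work_requested)

tree = Node(children=[
    Node("triangle"),
    Node(children=[Node("path"), Node("triangle")]),
])
freq = {}
parallel_count(tree, freq, lambda: False)  # no work request in this run
```

Polling a flag at the top of every call is what lets the donor thread answer a request promptly without explicit work queues.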

### V. RESULTS

- The authors' experimental results were gathered on a 64-core machine.
- In Table II, the authors show, for each network, the subgraph size used and the resulting total number of subgraph occurrences of that type and size that will be counted in that network.
- The sequential time and the obtained speedups for 8, 16, 32 and 64 cores are shown in Tables III and IV.
- Nevertheless, pbzip had a performance similar to their algorithm, with near-linear speedup up to 32 cores and with a speedup of around 50 for 64 cores, further substantiating the idea that, with a different architecture, their algorithm could still present near-linear speedup with more than 32 cores.
- The authors can also observe that as the network size increases, the performance slightly degrades.
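The speedups in Tables III and IV follow the usual definitions; as a quick reference (illustrative numbers, not the paper's measurements):

```python
def speedup(t_seq, t_par):
    """Classic speedup: sequential time divided by parallel time."""
    return t_seq / t_par

def efficiency(t_seq, t_par, cores):
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup(t_seq, t_par) / cores
```

A speedup of around 50 on 64 cores, as reported for most networks, corresponds to a parallel efficiency of roughly 0.78.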

### VI. CONCLUSION

- In this paper the authors presented a scalable algorithm to count subgraph frequencies for multicore architectures.
- The sequential version already performed significantly better than competing algorithms, making it a solid base for improvement.
- To the best of their knowledge, their parallel algorithm is the fastest available method for shared memory environments and allows practitioners to take advantage of either their personal multicore machines or more dedicated computing resources.
- The authors also intend to explore several variations on the g-tries algorithm, like, for instance, using different base graph data-structures or using sampling to obtain approximate results.
- Finally, to give their work a more practical context, the authors will use their implementation in real world scenarios.





##### Frequently Asked Questions

###### Q. What contributions have the authors mentioned in the paper "Parallel Subgraph Counting for Multicore Architectures"?

This is, however, an important graph mining primitive, with several applications, and here the authors present a novel multicore parallel algorithm for this task. The authors assess the performance of their Pthreads implementation on a set of representative networks from various domains and with diverse topological features. For most networks, the authors obtain a speedup of over 50 for 64 cores and an almost linear speedup up to 32 cores, showcasing the flexibility and scalability of their algorithm. This paves the way for the usage of such counting algorithms on larger subgraph and network sizes without the obligatory access to a cluster.