Accelerating large graph algorithms on the GPU using CUDA

doi:10.1007/978-3-540-77220-0_21

Book Chapter

Pawan Harish, +1 more

- pp 197-208

TLDR

This work presents a few fundamental graph algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, implemented in CUDA for large graphs on the G80 line of Nvidia GPUs.

Abstract:

Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers have been reported but are accessible only to a few. Today's Graphics Processing Units (GPUs) offer high computation power at a low price, but they have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, using CUDA on large graphs. We can compute the single-source shortest path on a 10-million-vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
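To illustrate the kind of data-parallel traversal the abstract refers to, the following is a minimal CPU-side sketch in Python of a level-synchronous breadth-first search. It is not taken from the chapter: the function name `bfs_levels`, the compressed adjacency arrays `row_offsets` and `col_indices`, and the boolean frontier array are assumptions chosen for illustration. The loop over vertices stands in for launching one GPU thread per vertex, and each pass of the outer loop stands in for one kernel launch.

```python
def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS over a graph in compressed adjacency form.

    row_offsets[v]..row_offsets[v + 1] delimits the slice of col_indices
    holding the neighbours of vertex v (hypothetical layout for this sketch).
    Returns a list of hop counts from `source`; unreachable vertices get -1.
    """
    n = len(row_offsets) - 1
    cost = [-1] * n
    frontier = [False] * n
    cost[source] = 0
    frontier[source] = True
    level = 0

    # Each pass of this loop plays the role of one kernel launch.
    while any(frontier):
        next_frontier = [False] * n
        # Each iteration plays the role of one thread handling vertex v.
        for v in range(n):
            if not frontier[v]:
                continue
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if cost[u] == -1:  # not yet reached
                    cost[u] = level + 1
                    next_frontier[u] = True
        frontier = next_frontier
        level += 1
    return cost


# Small directed example: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
example_offsets = [0, 2, 3, 4, 5, 5, 5]
example_edges = [1, 2, 3, 3, 4]
example_cost = bfs_levels(example_offsets, example_edges, 0)
```

Scanning every vertex at every level does more total work than a queue-based sequential BFS, but each pass is uniform and maps easily onto many threads. That trade-off is one way a sequentially optimal algorithm can differ from the fastest GPU formulation.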

Citations

Proceedings Article

Rodinia: A benchmark suite for heterogeneous computing

Shuai Che, +6 more

TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.

Proceedings Article

Analyzing CUDA workloads using a detailed GPU simulator

Ali Bakhoda, +4 more

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.

Proceedings Article

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn, +4 more

TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.

Proceedings Article

Scalable GPU graph traversal

Duane Merrill, +2 more

TL;DR: This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity.

Proceedings Article

A quantitative study of irregular programs on GPUs

Martin Burtscher, +2 more

TL;DR: This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures.

References

Proceedings Article

Linear algebra operators for GPU implementation of numerical algorithms

Jens Krüger, +1 more

TL;DR: This work proposes a stream model for arithmetic operations on vectors and matrices that exploits the intrinsic parallelism and efficient communication on modern GPUs and introduces a framework for the implementation of linear algebra operators on programmable graphics processors (GPUs), thus providing the building blocks for the design of more complex numerical algorithms.

Journal Article

A fast algorithm for finding dominators in a flowgraph

Thomas Lengauer, +1 more

- 01 Jan 1979 -

ACM Transactions on Programming Language...

TL;DR: A fast algorithm for finding dominators in a flowgraph is presented, which beat the straightforward algorithm and the bit vector algorithm on all but the smallest graphs tested.

Proceedings Article

GPU Cluster for High Performance Computing

Zhe Fan, +3 more

TL;DR: A parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and the dispersion of airborne contaminants in the Times Square area of New York City are simulated.

Journal Article

Linear algebra operators for GPU implementation of numerical algorithms

Jens Krüger, +1 more

- 01 Jul 2003 -

ACM Transactions on Graphics

TL;DR: In this paper, the focus is on the acceleration of techniques for numerical computing on the graphics chip, in particular, for the case of a single-input single-output (SISO) processor.

Proceedings Article

Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

David A. Bader, +1 more

TL;DR: This paper presents fast parallel implementations of three fundamental graph theory problems, breadth-first search, st-connectivity and shortest paths for unweighted graphs, on multithreaded architectures such as the Cray MTA-2, and reports impressive results, both for algorithm execution time and parallel performance.

Related Papers (5)

Scalable GPU graph traversal

Accelerating CUDA graph algorithms at maximum warp

Pregel: a system for large-scale graph processing

R-MAT: A Recursive Model for Graph Mining

Gunrock: a high-performance graph processing library on the GPU