Book Chapter
Accelerating large graph algorithms on the GPU using CUDA
Pawan Harish, P. J. Narayanan
pp. 197-208
TL;DR: This work presents CUDA implementations of a few fundamental graph algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, on large graphs using the G80 line of Nvidia GPUs.
Abstract:
Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms - including breadth-first search, single-source shortest path, and all-pairs shortest path - using CUDA on large graphs. We can compute the single-source shortest path on a 10 million vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU costing $600. In some cases, the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
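To make the idea of running breadth-first search on a SIMD-style processor array more concrete, here is a minimal sketch of a level-synchronous, vertex-parallel BFS. It is an illustration under stated assumptions, not the authors' implementation: the compressed adjacency layout and the names `bfs_levels`, `row_offsets`, `col_indices`, and `frontier` are hypothetical, and the per-vertex loop is executed sequentially in Python where a GPU version would assign one thread to each vertex.

```python
def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS over a compressed adjacency list (assumed layout).

    row_offsets[v]..row_offsets[v+1] index the neighbours of v in col_indices.
    Returns the hop count from `source` for every vertex, or -1 if unreachable.
    """
    n = len(row_offsets) - 1
    cost = [-1] * n            # -1 marks an unvisited vertex
    frontier = [False] * n     # one flag per vertex, as a GPU mask array would be
    frontier[source] = True
    cost[source] = 0
    level = 0

    # Each pass of the while loop stands in for one kernel launch.
    while any(frontier):
        next_frontier = [False] * n
        # Each iteration of this loop stands in for one GPU thread.
        for v in range(n):
            if not frontier[v]:
                continue  # vertices outside the frontier do no work this level
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if cost[u] == -1:
                    # Concurrent writers would all store the same value.
                    cost[u] = level + 1
                    next_frontier[u] = True
        frontier = next_frontier
        level += 1
    return cost


# Small directed example: 0->1, 0->2, 1->3, 2->3, and an isolated vertex 4.
example_offsets = [0, 2, 3, 4, 4, 4]
example_edges = [1, 2, 3, 3]
example_cost = bfs_levels(example_offsets, example_edges, 0)
```

Because every level scans all vertices, this sketch does more total work than a queue-based sequential BFS. That trade of extra work for uniform, independent per-vertex tasks is one way to read the abstract's remark that the optimal sequential algorithm is not always the fastest on the GPU.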
Citations
Proceedings Article
Rodinia: A benchmark suite for heterogeneous computing
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W. Sheaffer, Sang-Ha Lee, Kevin Skadron
TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Proceedings Article
Analyzing CUDA workloads using a detailed GPU simulator
TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings Article
A scalable processing-in-memory accelerator for parallel graph processing
TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Proceedings Article
Scalable GPU graph traversal
TL;DR: This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity.
Proceedings Article
A quantitative study of irregular programs on GPUs
TL;DR: This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures.
References
Proceedings Article
Linear algebra operators for GPU implementation of numerical algorithms
Jens Krüger, Rüdiger Westermann
TL;DR: This work proposes a stream model for arithmetic operations on vectors and matrices that exploits the intrinsic parallelism and efficient communication on modern GPUs and introduces a framework for the implementation of linear algebra operators on programmable graphics processors (GPUs), thus providing the building blocks for the design of more complex numerical algorithms.
Journal Article
A fast algorithm for finding dominators in a flowgraph
Thomas Lengauer, Robert E. Tarjan
TL;DR: A fast algorithm for finding dominators in a flowgraph is presented, which beat the straightforward algorithm and the bit vector algorithm on all but the smallest graphs tested.
Proceedings Article
GPU Cluster for High Performance Computing
TL;DR: A parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and the dispersion of airborne contaminants in the Times Square area of New York City are simulated.
Journal Article
Linear algebra operators for GPU implementation of numerical algorithms
Jens Krüger, Rüdiger Westermann
TL;DR: In this paper, the focus is on the acceleration of techniques for numerical computing on the graphics chip, in particular, for the case of a single-input single-output (SISO) processor.
Proceedings Article
Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2
David A. Bader, Kamesh Madduri
TL;DR: This paper presents fast parallel implementations of three fundamental graph theory problems, breadth-first search, st-connectivity and shortest paths for unweighted graphs, on multithreaded architectures such as the Cray MTA-2, and reports impressive results, both for algorithm execution time and parallel performance.