Book ChapterDOI

Accelerating large graph algorithms on the GPU using CUDA

TLDR
This work presents a few fundamental graph algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, implemented in CUDA for large graphs on the G80 line of Nvidia GPUs.
Abstract
Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Implementations that run in practical time on high-end computers have been reported, but such machines are accessible only to a few. Today's Graphics Processing Units (GPUs) offer high computational power at a low price, but they have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, using CUDA on large graphs. We can compute the single-source shortest path on a 10-million-vertex graph in 1.5 seconds using the Nvidia 8800GTX GPU, which costs $600. In some cases, the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
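To make the idea of running breadth-first search on a SIMD processor array more concrete, here is a minimal sketch of a level-synchronous, vertex-parallel BFS. It is not the authors' code, and the abstract does not specify their kernel design. The compressed sparse row (CSR) layout, the function name `bfs_level_synchronous`, and the arrays `row_offsets`, `col_indices`, `frontier`, and `cost` are all assumptions chosen for illustration. The sketch is written in sequential Python; the loop over vertices marks the work that one GPU thread per vertex would perform, and each pass of the outer loop corresponds to one kernel launch.

```python
def bfs_level_synchronous(row_offsets, col_indices, source):
    """Return the hop distance from `source` to every vertex (-1 if unreachable).

    The graph is assumed to be in CSR form: the neighbours of vertex v are
    col_indices[row_offsets[v] : row_offsets[v + 1]].
    """
    n = len(row_offsets) - 1
    cost = [-1] * n            # -1 marks a vertex that has not been reached
    frontier = [False] * n     # one flag per vertex, as a GPU mask array would be
    frontier[source] = True
    cost[source] = 0
    level = 0

    # Each pass of this loop stands in for one kernel launch (one BFS level).
    while any(frontier):
        next_frontier = [False] * n
        # On a GPU, each value of v would be handled by its own thread.
        for v in range(n):
            if not frontier[v]:
                continue  # threads for vertices outside the frontier do nothing
            for e in range(row_offsets[v], row_offsets[v + 1]):
                u = col_indices[e]
                if cost[u] == -1:
                    cost[u] = level + 1
                    next_frontier[u] = True
        frontier = next_frontier
        level += 1
    return cost


# Hypothetical 5-vertex graph: 0->1, 0->2, 1->3, 2->3; vertex 4 is isolated.
row_offsets = [0, 2, 3, 4, 4, 4]
col_indices = [1, 2, 3, 3]
distances = bfs_level_synchronous(row_offsets, col_indices, source=0)
print(distances)  # [0, 1, 1, 2, -1]
```

This per-vertex formulation scans all vertices at every level, so it can do more total work than a queue-based sequential BFS. That trade-off is one way to read the abstract's remark that the optimal sequential algorithm is not always the fastest choice on the GPU.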


Citations
Proceedings ArticleDOI

Rodinia: A benchmark suite for heterogeneous computing

TL;DR: This characterization shows that the Rodinia benchmarks cover a wide range of parallel communication patterns, synchronization techniques and power consumption, and has led to some important architectural insight, such as the growing importance of memory-bandwidth limitations and the consequent importance of data layout.
Proceedings ArticleDOI

Analyzing CUDA workloads using a detailed GPU simulator

TL;DR: In this paper, the performance of non-graphics applications written in NVIDIA's CUDA programming model is evaluated on a microarchitecture performance simulator that runs NVIDIA's parallel thread execution (PTX) virtual instruction set.
Proceedings ArticleDOI

A scalable processing-in-memory accelerator for parallel graph processing

TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.
Proceedings ArticleDOI

Scalable GPU graph traversal

TL;DR: This work presents a BFS parallelization focused on fine-grained task management constructed from efficient prefix sum that achieves an asymptotically optimal O(|V|+|E|) work complexity.
Proceedings ArticleDOI

A quantitative study of irregular programs on GPUs

TL;DR: This paper defines two measures of irregularity called control-flow irregularity and memory-access irregularity, and investigates, using performance-counter measurements, how irregular GPU kernels differ from regular kernels with respect to these measures.
References
Proceedings ArticleDOI

Linear algebra operators for GPU implementation of numerical algorithms

TL;DR: This work proposes a stream model for arithmetic operations on vectors and matrices that exploits the intrinsic parallelism and efficient communication on modern GPUs and introduces a framework for the implementation of linear algebra operators on programmable graphics processors (GPUs), thus providing the building blocks for the design of more complex numerical algorithms.
Journal ArticleDOI

A fast algorithm for finding dominators in a flowgraph

TL;DR: A fast algorithm for finding dominators in a flowgraph is presented, which beat the straightforward algorithm and the bit vector algorithm on all but the smallest graphs tested.
Proceedings ArticleDOI

GPU Cluster for High Performance Computing

TL;DR: A parallel flow simulation using the lattice Boltzmann model (LBM) on a GPU cluster and the dispersion of airborne contaminants in the Times Square area of New York City are simulated.
Journal ArticleDOI

Linear algebra operators for GPU implementation of numerical algorithms

TL;DR: In this paper, the focus is on the acceleration of techniques for numerical computing on the graphics chip, in particular, for the case of a single-input single-output (SISO) processor.
Proceedings ArticleDOI

Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2

TL;DR: This paper presents fast parallel implementations of three fundamental graph theory problems, breadth-first search, st-connectivity and shortest paths for unweighted graphs, on multithreaded architectures such as the Cray MTA-2, and reports impressive results, both for algorithm execution time and parallel performance.