Proceedings Article

Fast, Scalable Parallel Comparison Sort on Hybrid Multicore Architectures

TLDR
This work presents a hybrid comparison-based sorting algorithm that uses a many-core GPU and a multi-core CPU to perform sorting, and shows that its performance gains can also be obtained on other hybrid CPU+GPU platforms.
Abstract
Sorting has been a topic of immense research value since the inception of computer science. Hybrid computing on multicore architectures involves computing simultaneously on a tightly coupled heterogeneous collection of devices. In this work, we consider a multicore CPU together with a many-core GPU as our experimental hybrid platform, and we present a hybrid comparison-based sorting algorithm that uses both devices to perform sorting. The algorithm is broadly based on splitting the input list according to a large number of splitters and then creating independent sublists. Because the splitters order the sublists relative to one another, sorting each sublist independently sorts the entire original list. On a CPU+GPU platform consisting of an Intel i7 980 and an Nvidia GTX 580, our algorithm achieves a 20% gain over the best known comparison sort result, published by Davidson et al. [InPar 2012]. On the same platform, our results are better by 40% on average than a similar GPU-only algorithm proposed by Leischner et al. [IPDPS 2010]. Our results also show that the algorithm and its implementation scale with the size of the input, and that such performance gains can be obtained on other hybrid CPU+GPU platforms.
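The splitter-based idea described in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' implementation: the function name `splitter_sort`, the parameters `num_buckets` and `oversample`, and the random-sampling strategy for choosing splitters are all assumptions made for illustration. In a hybrid setting the independent sublists would presumably be distributed between the CPU and the GPU; here they are simply sorted one after another.

```python
import bisect
import random


def splitter_sort(data, num_buckets=8, oversample=4, seed=0):
    """Sort a list by partitioning it around sampled splitters.

    Hypothetical sketch of a sample-sort style scheme; the names and
    parameters are illustrative and not taken from the paper.
    """
    if len(data) <= num_buckets:
        return sorted(data)

    # Draw a random sample and pick evenly spaced elements as splitters.
    rng = random.Random(seed)
    sample_size = min(len(data), num_buckets * oversample)
    sample = sorted(rng.sample(data, sample_size))
    step = len(sample) / num_buckets
    splitters = [sample[int(i * step)] for i in range(1, num_buckets)]

    # Route each element to the sublist bounded by its neighbouring splitters.
    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        buckets[bisect.bisect_right(splitters, x)].append(x)

    # The sublists are independent: every element of bucket i is <= every
    # element of bucket i + 1, so sorting each one and concatenating the
    # results yields the fully sorted list.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because no sublist depends on another after the partitioning step, each call to `sorted(bucket)` in the final loop could in principle run on a different device or thread.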


Citations
Proceedings Article

Can GPUs sort strings efficiently

TL;DR: This paper presents a fast and efficient string sort on the GPU that is built on the available radix sort and achieves a speedup of up to 10 over current GPU methods, especially on large datasets.
Journal Article

Kepler GPU accelerated recursive sorting using dynamic parallelism

TL;DR: This paper focuses on the performance gain obtained on the Kepler graphics processing units (GPUs) for multi‐key quicksort and the GPU implementation of string sorting algorithm using singleton elements in the literature.
Proceedings Article

String sorting on multi and many-threaded architectures: A comparative study

TL;DR: This paper presents a comparative study of the most popular and efficient string sorting algorithms implemented on CPU and GPU machines, together with an efficient parallel multi-key quicksort implementation that uses a ternary search tree to increase the speedup and efficiency of sorting large sets of string data.
Proceedings Article

Architecture- and workload- aware heterogeneous algorithms for sparse matrix vector multiplication

TL;DR: This paper considers a class of sparse matrices that exhibit a scale-free nature, identifies a scheme that works well for such matrices, and uses simple and effective mechanisms to determine the appropriate amount of work to be allotted to the CPU and the GPU.
Proceedings Article

Applications of Ear Decomposition to Efficient Heterogeneous Algorithms for Shortest Path/Cycle Problems

TL;DR: The applicability of an ear decomposition of graphs to problems such as all-pairs shortest paths and minimum cost cycle basis is studied, and it is shown that the resulting solutions are scalable in terms of both memory usage and speedup over the best known current implementations.
References
Proceedings Article

Designing efficient sorting algorithms for manycore GPUs

TL;DR: The design of high-performance parallel radix sort and merge sort routines for manycore GPUs, taking advantage of the full programmability offered by CUDA, is described; these routines are the fastest GPU sort and the fastest comparison-based sort reported in the literature.
Proceedings Article

Scan primitives for GPU computing

TL;DR: Using the scan primitives, this work shows novel GPU implementations of quicksort and sparse matrix-vector multiply, and analyzes the performance of the scan primitives, several sort algorithms that use the scan primitives, and a graphical shallow-water fluid simulation using the scan framework for a tridiagonal matrix solver.
Book

Vector models for data-parallel computing

TL;DR: A model of parallelism that extends and formalizes the Data-Parallel model on which the Connection Machine and other supercomputers are based is described, and it is argued that data-parallel models are not only practical and can be applied to a surprisingly wide variety of problems, they are also well suited for very-high-level languages and lead to a concise and clear description of algorithms and their complexity.
Proceedings Article

GPUTeraSort: high performance graphics co-processor sorting for large database management

TL;DR: Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.