Journal Article
Work-time optimal k-merge algorithms on the PRAM
TLDR
This work proves that $\Omega(n \log k)$ work is required to solve the k-merge problem on the PRAM models, and designs a work-time optimal CREW-PRAM k-merge algorithm that runs in $\Theta(\log\log n + \log k)$ time and performs $\Theta(n \log k)$ work.
Abstract:
For $2 \le k \le n$, the k-merge problem is to merge a collection of k sorted sequences of total length n into a single sorted sequence. The k-merge problem is fundamental because it generalizes both merging (k = 2) and sorting (k = n, where every sequence has length 1). The main contribution of this work is a set of simple and intuitive work-time optimal algorithms for the k-merge problem on three PRAM models, which settles the status of the k-merge problem. We first prove that $\Omega(n \log k)$ work is required to solve the k-merge problem on the PRAM models. We then show that, provided the amount of work is bounded by $O(n \log k)$, the EREW-PRAM requires $\Omega(\log n)$ time, while both the CREW-PRAM and the CRCW-PRAM require $\Omega(\log\log n + \log k)$ time. Our first k-merge algorithm runs in $\Theta(\log n)$ time and performs $\Theta(n \log k)$ work on the EREW-PRAM. Finally, we design a work-time optimal CREW-PRAM k-merge algorithm that runs in $\Theta(\log\log n + \log k)$ time and performs $\Theta(n \log k)$ work. This latter algorithm is also work-time optimal on the CRCW-PRAM model. Our algorithms completely settle the status of the k-merge problem on the three main PRAM models.
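One standard way to see why $\Omega(n \log k)$ work is needed is a counting argument for comparison-based algorithms. This is a sketch, not necessarily the proof used in the paper, and it assumes for simplicity that the elements are distinct, that k divides n, and that every input sequence has length n/k. The sorted output is determined by how the k sequences interleave, and the number of possible interleavings is a multinomial coefficient. Using $n! \ge (n/e)^n$ and $m! \le m^m$:

```latex
\frac{n!}{\bigl((n/k)!\bigr)^{k}}
  \;\ge\; \frac{(n/e)^{n}}{(n/k)^{n}}
  \;=\; \left(\frac{k}{e}\right)^{n},
\qquad\text{so}\qquad
\log_2 \frac{n!}{\bigl((n/k)!\bigr)^{k}} \;\ge\; n\log_2 k \;-\; n\log_2 e .
```

Each comparison has two outcomes, so any comparison-based algorithm needs at least this many comparisons in the worst case. For $k \ge 8$ we have $\log_2 e < \tfrac{1}{2}\log_2 k$, which gives at least $\tfrac{n}{2}\log_2 k$ comparisons; for $2 \le k < 8$ the bound $\Omega(n \log k) = \Omega(n)$ holds simply because all n elements must be written to the output. Since each processor performs $O(1)$ comparisons per step, the same bound applies to the work of a comparison-based PRAM algorithm.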
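As a sequential point of reference (this is not the paper's EREW or CREW algorithm), the k-merge problem can be solved with a min-heap that holds at most one candidate from each sequence. Each of the n output steps costs $O(\log k)$ heap comparisons, so the total work is $O(n \log k)$, matching the work lower bound in the abstract. The function name `k_merge` is hypothetical and chosen for illustration.

```python
import heapq
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def k_merge(seqs: Sequence[Sequence[T]]) -> List[T]:
    """Merge k sorted sequences into one sorted list (sequential sketch).

    The heap never holds more than k entries, so every push/pop costs
    O(log k) comparisons and the whole merge does O(n log k) work.
    """
    # Seed the heap with the first element of every non-empty sequence.
    # Entries are (value, sequence index, position); the sequence index
    # breaks ties, so positions are never compared and equal values from
    # earlier sequences are emitted first.
    heap = [(s[0], i, 0) for i, s in enumerate(seqs) if s]
    heapq.heapify(heap)

    out: List[T] = []
    while heap:
        value, i, j = heapq.heappop(heap)  # smallest remaining candidate
        out.append(value)
        if j + 1 < len(seqs[i]):
            # Refill from the same sequence, keeping one candidate per sequence.
            heapq.heappush(heap, (seqs[i][j + 1], i, j + 1))
    return out


if __name__ == "__main__":
    print(k_merge([[1, 4, 9], [2, 3, 10], [], [5, 6, 7, 8]]))
    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

With k = 2 this is ordinary merging, and with k = n singleton sequences it degenerates into heapsort, which mirrors the abstract's point that k-merge generalizes both problems. Python's standard library also provides `heapq.merge`, which performs the same heap-based merge lazily over sorted iterables.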
Citations
Proceedings Article
Efficient hardware data mining with the Apriori algorithm on FPGAs
TL;DR: This work introduces an efficient "systolic injection" method for intelligently reporting unpredictably generated mid-array results to a controller without any chance of collision or excessive stalling in the Apriori algorithm.
Journal Article
Accelerating data mining workloads: current approaches and future challenges in system architecture design
TL;DR: Experiments have shown that heterogeneous architectures employing GPUs or FPGAs can achieve significant application speedups over homogeneous CPU-based systems while also increasing performance per watt.
Journal Article
A Work Efficient Parallel Algorithm for Exact Euclidean Distance Transform
Manduhu Manduhu, Mark W. Jones
TL;DR: This algorithm is the first fully parallelized work-time optimal algorithm realized on GPUs, and experimental results show that it outperforms prior state-of-the-art GPU algorithms.
Proceedings Article
Design of a hardware accelerator for density based clustering applications
TL;DR: This paper proposes a hardware accelerator for density-based clustering applications and shows that, when integrated with general-purpose processors, the accelerator speeds up kernel execution times by at least 300X.
Journal Article
An efficient parallel sorting compatible with the standard qsort
Duhu Man, Yasuaki Ito, Koji Nakano
TL;DR: The main contribution of this paper is an efficient parallel sorting routine, "psort", whose interface is compatible with the standard qsort in the C Standard Library.