Journal Article

Work-time optimal k-merge algorithms on the PRAM

TLDR
This work proves that Ω(n log k) work is required to solve the k-merge problem on the PRAM models, and designs a work-time optimal CREW-PRAM k-merge algorithm that runs in Θ(log log n + log k) time and performs Θ(n log k) work.
Abstract
For 2 ≤ k ≤ n, the k-merge problem is to merge a collection of k sorted sequences of total length n into a new sorted sequence. The k-merge problem is fundamental as it provides a common generalization of both merging and sorting. The main contribution of this work is to give simple and intuitive work-time optimal algorithms for the k-merge problem on three PRAM models, thus settling the status of the k-merge problem. We first prove that Ω(n log k) work is required to solve the k-merge problem on the PRAM models. We then show that the EREW-PRAM requires Ω(log n) time, and that both the CREW-PRAM and the CRCW-PRAM require Ω(log log n + log k) time, provided that the amount of work is bounded by O(n log k). Our first k-merge algorithm runs in Θ(log n) time and performs Θ(n log k) work on the EREW-PRAM. Finally, we design a work-time optimal CREW-PRAM k-merge algorithm that runs in Θ(log log n + log k) time and performs Θ(n log k) work. This latter algorithm is also work-time optimal on the CRCW-PRAM model. Our algorithms completely settle the status of the k-merge problem on the three main PRAM models.
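To make the problem statement concrete, the following is a minimal sequential sketch of a k-merge in Python. It is not the paper's PRAM algorithm: it uses a standard binary-heap merge, which performs O(n log k) comparisons on a single processor and so matches the work bound quoted above only in the sequential sense. The function name `k_merge` and the integer element type are assumptions made for illustration.

```python
import heapq
from typing import List, Sequence


def k_merge(sequences: Sequence[Sequence[int]]) -> List[int]:
    """Merge k sorted sequences of total length n into one sorted list.

    Sequential heap-based sketch (hypothetical helper, not the paper's
    parallel algorithm). Each of the n elements passes through a heap of
    size at most k, giving O(n log k) comparisons.
    """
    # The heap holds one (value, sequence index, position) entry for the
    # current head of every non-empty input sequence.
    heap = [(seq[0], i, 0) for i, seq in enumerate(sequences) if len(seq) > 0]
    heapq.heapify(heap)

    merged: List[int] = []
    while heap:
        value, i, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        # Replace the extracted head with its successor, if one exists.
        if nxt < len(sequences[i]):
            heapq.heappush(heap, (sequences[i][nxt], i, nxt))
    return merged
```

The two extremes of k show why the abstract calls k-merge a common generalization: with k = 2 the call is ordinary merging of two sorted sequences, and with k = n (every input sequence has length 1) it sorts the n elements.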

