Journal Article
Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs
Changkyu Kim, Tim Kaldewey, Victor W. Lee, Eric Sedlar, Anthony D. Nguyen, Nadathur Satish, Jatin Chhugani, Andrea Di Blas, Pradeep Dubey
- Vol. 2, Iss: 2, pp 1378-1389
TL;DR: This paper re-examines two popular join algorithms to determine whether the latest computer architecture trends shift the tide that has favored hash join for many years, and offers multicore implementations of hash join and sort-merge join that consistently outperform all previously reported results.
Abstract:
Join is an important database operation. As computer architectures evolve, the best join algorithm may change. This paper re-examines two popular join algorithms, hash join and sort-merge join, to determine whether the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantage of the latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second, which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes, from 64K to 128M tuples, and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation, which achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that current architectural trends of wider SIMD, more cores, and smaller memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multicore implementations of hash join and sort-merge join which consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.
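For readers unfamiliar with the two algorithms the abstract compares, the following is a minimal single-threaded sketch of an equi-join done both ways. It is only an illustration of the textbook techniques, not the paper's parallel, SIMD-optimized implementation; the function names `hash_join` and `sort_merge_join` and the `(key, payload)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join on the key: build a hash table on one input, probe it with the other."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    out = []
    for key, payload in probe:
        # .get avoids inserting empty entries for keys that have no match.
        for build_payload in table.get(key, ()):
            out.append((key, build_payload, payload))
    return out


def sort_merge_join(left, right):
    """Equi-join on the key: sort both inputs, then merge runs of equal keys."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Locate the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lp in left[i:i_end]:
                for _, rp in right[j:j_end]:
                    out.append((lk, lp, rp))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    s = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]
    # Both strategies produce the same set of joined tuples.
    assert sorted(hash_join(r, s)) == sorted(sort_merge_join(r, s))
    print(sorted(hash_join(r, s)))
```

The hash variant touches memory in a data-dependent pattern while probing, whereas the sort-merge variant scans both sorted inputs sequentially; that difference is the kind of property the abstract's discussion of SIMD width and memory bandwidth turns on.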
Citations
Journal Article
Sensitive protein alignments at tree-of-life scale using DIAMOND.
TL;DR: This paper presents an improved version of DIAMOND that greatly exceeds previous search performance and harnesses supercomputing to perform tree-of-life scale protein alignments in hours.
Proceedings Article
FAST: fast architecture sensitive tree search on modern CPUs and GPUs
Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, Pradeep Dubey
TL;DR: FAST is an extremely fast architecture sensitive layout of the index tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware, and achieves a 6X performance improvement over uncompressed index search for large keys on CPUs.
Proceedings Article
Design and evaluation of main memory hash join algorithms for multi-core CPUs
TL;DR: A very simple hash join algorithm is competitive with the other, more complex methods; its performance improves dramatically as the skew in the input data increases, and it quickly starts to outperform all other algorithms.
Proceedings Article
Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware
TL;DR: Through the analysis, light is shed on how modern hardware affects the implementation of data operators and the fastest implementation of radix join to date is provided, reaching close to 200 million tuples per second.
Proceedings Article
Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
Nadathur Satish, Changkyu Kim, Jatin Chhugani, Anthony D. Nguyen, Victor W. Lee, Daehyun Kim, Pradeep Dubey
TL;DR: This paper presents a competitive analysis of comparison and non-comparison based sorting algorithms on two modern architectures - the latest CPU and GPU architectures, and proposes novel CPU radix sort and GPU merge sort implementations which are 2X faster than previously published results.
References
Book
Randomized Algorithms
TL;DR: This book introduces the basic concepts in the design and analysis of randomized algorithms and presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications.
Proceedings Article
Sorting networks and their applications
TL;DR: To achieve high throughput rates today's computers perform several operations simultaneously; not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing operations are done concurrently.
Journal Article
Data parallel algorithms
W. Daniel Hillis, Guy L. Steele
TL;DR: The success of data parallel algorithms—even on problems that at first glance seem inherently serial—suggests that this style of programming has much wider applicability than was previously thought.
Journal Article
Larrabee: a many-core x86 architecture for visual computing
Larry D. Seiler, Doug Carmean, Eric Sprangle, Tom Forsyth, Michael Abrash, Pradeep Dubey, Stephen Junkins, Adam T. Lake, Jeremy Sugerman, Robert Dale Cavin, Roger Espasa, Ed Grochowski, Toni Juan, Pat Hanrahan
TL;DR: This article consists of a collection of slides from the author's conference presentation, some of the topics discussed include: architecture convergence; Larrabee architecture; and graphics pipeline.
Proceedings Article
GPUTeraSort: high performance graphics co-processor sorting for large database management
TL;DR: Overall, the results indicate that using a GPU as a co-processor can significantly improve the performance of sorting algorithms on large databases.