Open Access, Proceedings Article

Practical Skew Handling in Parallel Joins

TLDR
This work developed, implemented, and experimented with four new skew-handling parallel join algorithms. One of them, called virtual processor range partitioning, was the clear winner in high-skew cases, while traditional hybrid hash join was the clear winner in low-skew and no-skew cases.
Abstract
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
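The sample-then-choose idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the sample is turned into a decision, so the skew statistic (share of the sample held by its most frequent join key), the `threshold` value, and the function names `estimate_skew` and `choose_join_algorithm` are all assumptions made for illustration. Only the two algorithm names come from the abstract.

```python
import random
from collections import Counter


def estimate_skew(join_keys, sample_size=1000, seed=0):
    """Fraction of a random sample occupied by its most frequent join key.

    Assumed skew statistic for illustration; the paper may use another.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(join_keys))
    if n == 0:
        return 0.0
    sample = rng.sample(join_keys, n)  # sampling without replacement
    _, top_count = Counter(sample).most_common(1)[0]
    return top_count / n


def choose_join_algorithm(join_keys, threshold=0.05, sample_size=1000, seed=0):
    """Pick a join algorithm from a small sample of the join column.

    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    skew = estimate_skew(join_keys, sample_size, seed)
    if skew > threshold:
        return "virtual_processor_range_partitioning"
    return "hybrid_hash"


if __name__ == "__main__":
    uniform = list(range(10_000))                # every key appears once
    skewed = [7] * 9_000 + list(range(1_000))    # one key dominates
    print(choose_join_algorithm(uniform))
    print(choose_join_algorithm(skewed))
```

The point of the design is that the sample is cheap relative to the join, so the common non-skewed case pays almost nothing and still runs the standard hybrid hash join.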

