Open Access Proceedings Article

Applicability of GPU Computing for Efficient Merge in In-Memory Databases

TL;DR
It is found that the maximum potential merge speedup is limited, since only two of its four stages are likely to benefit from parallelization. The paper therefore presents a parallel dictionary slice merge algorithm as well as an alternative parallel merge algorithm for GPUs that achieves up to 40% more throughput than its CPU implementation.
Abstract
Column-oriented in-memory databases typically use dictionary compression to reduce overall storage space and to allow fast lookup and comparison. However, updates carry a high performance cost, since the dictionary used for compression has to be recreated each time records are created, updated, or deleted. This matters for TPC-C-like workloads, in which around 45% of all queries are transactional modifications. A technique called differential updates can be used to allow faster modifications: in addition to the main storage, the database maintains a delta storage that accommodates modifying queries. During the merge process, the modifications in the delta are merged into the main storage in parallel to the normal operation of the database. Current hardware and software trends suggest that this problem can be tackled by massively parallelizing the merge process. One approach to massive parallelism is the GPU, which offers orders of magnitude more cores than modern CPUs. We therefore analyze the feasibility of a parallel GPU merge implementation and its potential speedup. We found that the maximum potential merge speedup is limited, since only two of its four stages are likely to benefit from parallelization. We present a parallel dictionary slice merge algorithm as well as an alternative parallel merge algorithm for GPUs that achieves up to 40% more throughput than its CPU implementation. In addition, we propose a parallel duplicate removal algorithm that achieves up to 27 times the throughput of the CPU implementation.
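The differential-update scheme described in the abstract can be illustrated with a minimal, sequential sketch. This is not the paper's GPU implementation; the class and method names (`Column`, `insert`, `merge`) are illustrative assumptions. It shows the essential structure: a sorted, duplicate-free dictionary plus an integer attribute vector as the main storage, an uncompressed delta for modifications, and a merge step that rebuilds the dictionary, removes duplicates, and remaps all value IDs.

```python
class Column:
    """Dictionary-compressed column with a write-optimized delta storage.

    Main storage: a sorted, duplicate-free dictionary plus a vector of
    integer value IDs pointing into it. Delta storage: raw appended values.
    (Illustrative sketch only; names and layout are assumptions, not the
    paper's actual data structures.)
    """

    def __init__(self, values):
        self.dictionary = sorted(set(values))            # sorted, no duplicates
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.value_ids = [index[v] for v in values]      # compressed attribute vector
        self.delta = []                                  # uncompressed delta storage

    def insert(self, value):
        # Modifications go to the delta; the main dictionary stays untouched,
        # so inserts avoid the cost of rebuilding the dictionary each time.
        self.delta.append(value)

    def merge(self):
        # Merge process: build the new dictionary from main + delta values
        # (duplicate removal happens here), then remap every value ID
        # against the new dictionary. In the paper, stages like this
        # remapping are the candidates for massive parallelization.
        merged = sorted(set(self.dictionary) | set(self.delta))
        index = {v: i for i, v in enumerate(merged)}
        old = self.dictionary
        self.value_ids = ([index[old[i]] for i in self.value_ids]
                          + [index[v] for v in self.delta])
        self.dictionary, self.delta = merged, []
```

Reading a value is then just `column.dictionary[column.value_ids[row]]`; the merge leaves the logical contents unchanged while restoring a single compressed main storage.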


Citations
Proceedings Article

A Memory Bandwidth-Efficient Hybrid Radix Sort on GPUs

TL;DR: In this paper, a pipelined heterogeneous sorting algorithm for GPUs was proposed, achieving a 2.32-fold improvement over the state-of-the-art CPU-based radix sort running 16 threads.
Journal Article

Efficient co-processor utilization in database query processing

TL;DR: This paper presents a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co-)processor, and uses these execution models to distribute a workload of database operators across the available (co-)processing devices.
Proceedings Article

A GPU-based index to support interactive spatio-temporal queries over historical data

TL;DR: The results show that the GPU-based index obtains interactive, sub-second response times for queries over large data sets and leads to at least two orders of magnitude speedup over spatial indexes implemented in existing open-source and commercial database systems.
Book Chapter

Automatic selection of processing units for coprocessing in databases

TL;DR: This paper presents a framework that automatically learns and adapts execution models for arbitrary algorithms on any (co)processor to find break-even points and support scheduling decisions and demonstrates its applicability for three common use cases in modern database systems.
Patent

Hash Table and Radix Sort Based Aggregation

TL;DR: Aggregation in an in-memory database includes: receiving, by at least one processor having a plurality of threads, input having records stored in random access memory; distributing, by the at least one processor, the input into portions, one of the plurality of threads having an assigned portion; aggregating the records in the assigned portion based on locality of keys in the records; and outputting, by the processor, the aggregated records into a global hash table.
References
Book

An introduction to parallel algorithms

TL;DR: This book provides an introduction to the design and analysis of parallel algorithms, emphasizing the application of the PRAM model of parallel computation, with all its variants, to algorithm analysis.

GPU Computing

TL;DR: The background, hardware, and programming model for GPU computing is described, the state of the art in tools and techniques are summarized, and four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications are presented.
Journal Article

The log-structured merge-tree (LSM-tree)

TL;DR: The log-structured merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period.
Book Chapter

C-store: a column-oriented DBMS

TL;DR: Preliminary performance data on a subset of TPC-H is presented and it is shown that the system the team is building, C-Store, is substantially faster than popular commercial products.