Proceedings ArticleDOI

Streaming sorting network based BWT acceleration on FPGA for lossless compression

TLDR
A novel BWT accelerator based on the streaming sorting network that achieves a 14.3X speedup compared with the state-of-the-art work when the data block size is 4KB, and a lossless data compression system based on this accelerator.
Abstract
The Burrows-Wheeler Transform (BWT) has received special attention due to its effectiveness in lossless data compression algorithms. Because BWT is a time-consuming task, an efficient hardware accelerator that yields high throughput is required in real-time applications. This paper presents a novel BWT accelerator based on the streaming sorting network. The streaming sorting network performs suffix sorting of a large amount of data, which is the most difficult task in BWT. Our BWT accelerator is implemented on a NetFPGA board. Experimental results show that it achieves a 14.3X speedup compared with the state-of-the-art work when the data block size is 4KB. Furthermore, we design and implement a lossless data compression system based on the proposed BWT accelerator. The hardware system is composed of the Burrows-Wheeler Transform module, the move-to-front encoding module, the run-length encoding module, and the canonical Huffman encoding module. We evaluate the system performance on a NetFPGA board at a frequency of 155MHz. The throughput of the system reaches 179 MB/s on board when we use only one streaming sorting network for a 4KB block. The system throughput can be linearly improved up to 537 MB/s in simulation on a Virtex UltraScale xcvu440 chip if we use three streaming sorting networks to compute BWT.
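
As a rough software illustration of the first two stages named in the abstract (the BWT followed by move-to-front encoding), the sketch below may help. It is not the paper's hardware design: Python's built-in `sorted` stands in for the streaming sorting network, rotations of the block are sorted rather than suffixes with a sentinel, and the function names `bwt` and `move_to_front` are chosen here for illustration only.

```python
def bwt(block: bytes) -> tuple[bytes, int]:
    """Naive BWT: sort all rotations of the block.

    Returns the last column of the sorted rotation matrix and the row
    index at which the original block appears. The sort is the costly
    step that a hardware accelerator would target.
    """
    n = len(block)
    if n == 0:
        return b"", 0
    doubled = block + block  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last_column = bytes(doubled[i + n - 1] for i in order)
    return last_column, order.index(0)


def move_to_front(data: bytes) -> list[int]:
    """Move-to-front encoding over the 256-symbol byte alphabet."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        # Move the symbol just seen to the front of the list.
        alphabet.insert(0, alphabet.pop(idx))
    return out


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    print(transformed, primary)        # b'nnbaaa' 3
    print(move_to_front(transformed))  # [110, 0, 99, 99, 0, 0]
```

The BWT groups equal symbols into runs, so the move-to-front output contains many zeros, which is what makes the subsequent run-length and Huffman stages effective.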

Citations
Proceedings ArticleDOI

An FPGA-Based BWT Accelerator for Bzip2 Data Compression

TL;DR: This paper analyzes the bottleneck of the BWT acceleration and presents a novel design to map the anti-sequential suffix sorting algorithm to FPGAs, which can achieve ~2x speedup compared to the best CPU implementation using standard large Corpus benchmarks.
Proceedings ArticleDOI

Lempel-Ziv-Oberhumer: A Critical Evaluation of Lossless Algorithm and Its Applications

Shiv Preet, +1 more
TL;DR: This paper reviews an algorithmic technique that, even though it adds one extra layer of processing, is still able to achieve the desired compression for a multitude of formats.
Patent

Data storage device

TL;DR: In this article, the authors proposed a method to improve the performance of the HDD-based HDD model by using the HPD model to identify the most important HDD features.

Hardware Architecture for Inplace Compute of Burrows-Wheeler Transform in a Single Iteration

TL;DR: A hardware architecture that implements an in-place algorithm to compute the Burrows-Wheeler transform (BWT), using a register-based character buffer in a scan-chain configuration, such that the BWT is computed from right to left as characters are loaded.

O(N) Memory-Free Hardware Architecture for Burrows-Wheeler Transform

TL;DR: In this article, a hardware architecture for the Burrows-Wheeler transform (BWT) scheme is presented, where the core idea is to have a memory-free strategy that does not involve any software overhead during BWT operation.
References

A Block-sorting Lossless Data Compression Algorithm

TL;DR: Describes a block-sorting, lossless data compression algorithm and an implementation of it, and compares the performance of that implementation with widely available data compressors running on the same hardware.
Proceedings ArticleDOI

Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA

TL;DR: This paper proposes a streaming permutation network (SPN) by "folding" the classic Clos network and proves that the SPN is programmable to realize all the interconnection patterns in the bitonic sorting network.

Block Sorting Text Compression -- Final Report

TL;DR: This report investigates the block sorting compression algorithm, in particular trying to understand its operation and limitations, with a compression approaching that of the currently best compressors while being much faster than other compressors of comparable performance.
Proceedings ArticleDOI

An FPGA-based parallel sorting architecture for the Burrows Wheeler transform

TL;DR: In this article, a parallel sorting block is used to implement the BWT transform on a field programmable gate array (FPGA) device providing good performance improvements compared with other reported implementations on FPGAs.
Proceedings ArticleDOI

Prototyping of efficient hardware algorithms for data compression in future communication systems

TL;DR: The feasibility and VLSI implementation of this scalable BWT architecture in simulating and prototyping its systolic, highly utilized hardware structure with Virtex FPGAs is discussed.