Streaming sorting network based BWT acceleration on FPGA for lossless compression

doi:10.1109/FPT.2017.8280152

Proceedings Article

Baofu Zhao, +3 more

pp. 247-250

TLDR

A novel BWT accelerator based on the streaming sorting network that achieves a 14.3X speedup over the state-of-the-art work when the data block size is 4KB, together with a lossless data compression system built on this accelerator.

Abstract:

The Burrows-Wheeler Transform (BWT) has received special attention due to its effectiveness in lossless data compression algorithms. Because BWT is a time-consuming task, an efficient hardware accelerator that can yield high throughput is required in real-time applications. This paper presents a novel BWT accelerator based on the streaming sorting network. The streaming sorting network performs the suffix sorting of large amounts of data, which is the most difficult task in BWT. Our BWT accelerator is implemented on a NetFPGA board. Experimental results show that it achieves a 14.3X speedup compared with the state-of-the-art work when the data block size is 4KB. Furthermore, we design and implement a lossless data compression system based on the proposed BWT accelerator. The hardware system is composed of the Burrows-Wheeler Transform module, the move-to-front encoding module, the run-length encoding module, and the canonical Huffman encoding module. We evaluate the system performance on a NetFPGA board at a frequency of 155MHz. The throughput of the system reaches 179 MB/s on board when we use only one streaming sorting network for a 4KB block. The system throughput can be improved linearly up to 537 MB/s in simulation on a Virtex UltraScale xcvu440 chip if we use three streaming sorting networks to compute BWT.
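To make the first three stages of the pipeline named in the abstract concrete, here is a minimal software sketch in Python. It is not the paper's hardware design: a naive sort of all rotations stands in for the streaming sorting network, the run-length output format of (symbol, count) pairs is an assumption chosen for readability, and the canonical Huffman stage is omitted. All function names are hypothetical.

```python
def bwt(block: bytes):
    """Naive BWT: sort every rotation of the block (software stand-in for
    the sorting network) and return the last column plus the row index
    of the original block, which the inverse transform needs."""
    n = len(block)
    if n == 0:
        return b"", 0
    doubled = block + block  # rotation i is doubled[i:i + n]
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT by following a stable sort of the last column."""
    n = len(last)
    order = sorted(range(n), key=lambda i: last[i])  # sorted() is stable
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes):
    """Move-to-front: emit each byte's position in a recency list,
    then move that byte to the front of the list."""
    table = list(range(256))
    out = []
    for b in data:
        idx = table.index(b)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def rle_encode(symbols):
    """Run-length encode into (symbol, count) pairs (assumed format)."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([s, 1])
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    last, primary = bwt(b"banana")
    print(last, primary)
    print(rle_encode(mtf_encode(last)))
```

The sort dominates the cost of this sketch, which is consistent with the abstract's remark that suffix sorting is the most difficult task in BWT. The BWT groups equal bytes together, so the move-to-front stage produces runs of zeros that the run-length stage can shorten.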

Citations

An FPGA-Based BWT Accelerator for Bzip2 Data Compression

Lempel-Ziv-Oberhumer: A Critical Evaluation of Lossless Algorithm and Its Applications

Data Storage Device

Hardware Architecture for Inplace Compute of Burrows-Wheeler Transform in a Single Iteration

O(N) Memory-Free Hardware Architecture for Burrows-Wheeler Transform

References

A Block-sorting Lossless Data Compression Algorithm

Energy and Memory Efficient Mapping of Bitonic Sorting on FPGA

Block Sorting Text Compression -- Final Report

An FPGA-based parallel sorting architecture for the Burrows Wheeler transform

Prototyping of efficient hardware algorithms for data compression in future communication systems
