Open Access Proceedings Article

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

TL;DR
This work designs a new error-controlled lossy compression algorithm for large-scale scientific data, significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions, and derives a series of multilayer prediction formulas and their unified formula in the context of data compression.
Abstract
Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
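To make the core prediction-plus-quantization idea concrete, the following is a minimal 1D sketch in C under simplifying assumptions (a single preceding-neighbor predictor and a fixed absolute error bound eb); the paper's actual compressor uses multilayer, multidimensional prediction, an adaptive quantization code range, and variable-length encoding of the codes.

```c
#include <math.h>
#include <stdio.h>

/* Minimal 1D sketch (assumptions: one preceding-neighbor predictor and a
 * fixed absolute error bound eb; the real compressor predicts along multiple
 * dimensions, adapts the code range, and entropy-codes the result).
 * Each point is predicted from the preceding DECOMPRESSED value, and the
 * prediction error is quantized into integer bins of width 2*eb, which
 * guarantees |data[i] - decomp[i]| <= eb. */
static void compress_1d(const double *data, double *decomp, int *codes,
                        int n, double eb)
{
    for (int i = 0; i < n; i++) {
        double pred = (i == 0) ? 0.0 : decomp[i - 1]; /* decompressed, not raw */
        double diff = data[i] - pred;
        int bin = (int)floor(diff / (2.0 * eb) + 0.5); /* quantization code */
        codes[i] = bin;
        decomp[i] = pred + 2.0 * eb * bin;             /* what the decoder sees */
    }
}

int main(void)
{
    double data[8] = {1.00, 1.02, 1.05, 1.04, 1.10, 1.20, 1.18, 1.15};
    double decomp[8];
    int codes[8];
    compress_1d(data, decomp, codes, 8, 0.01);
    for (int i = 0; i < 8; i++)
        printf("x=%.3f  code=%+d  x'=%.3f  |err|=%.4f\n",
               data[i], codes[i], decomp[i], fabs(data[i] - decomp[i]));
    return 0;
}
```

Because the quantization codes cluster around zero for smooth data, a subsequent variable-length (e.g., Huffman) pass compresses them well; a full compressor would also store verbatim any point whose code falls outside the chosen range.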


Citations
Proceedings Article

Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets

TL;DR: Evaluation results confirm that the new adaptive solution significantly improves the rate-distortion of lossy compression at fairly high compression ratios.
Proceedings Article

Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data

TL;DR: This paper conducts a comprehensive study on state-of-the-art lossy compression, including ZFP, SZ, and ISABELA, using real and representative HPC datasets, and proposes a sampling-based estimation method that extrapolates the reduction ratio from data samples to help domain scientists make more informed data-reduction decisions.
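As a rough illustration of the sampling idea described above (not the paper's exact estimation model), one can compress a handful of evenly spaced blocks and extrapolate the whole-dataset reduction ratio from them; the run-length "compressor" below is only a stand-in to keep the example self-contained.

```c
#include <stdio.h>

/* Stand-in byte-level run-length "compressor": returns the encoded size in
 * bytes, counting each run as a (count, value) pair. */
static size_t rle_size(const unsigned char *buf, size_t n)
{
    size_t out = 0, i = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && buf[i + run] == buf[i] && run < 255) run++;
        out += 2;
        i += run;
    }
    return out;
}

/* Estimate ratio = original bytes / compressed bytes from `nsamples` blocks
 * of `block` bytes taken at evenly spaced offsets. */
static double estimate_ratio(const unsigned char *data, size_t n,
                             size_t block, size_t nsamples)
{
    size_t orig = 0, comp = 0;
    for (size_t s = 0; s < nsamples; s++) {
        size_t off = (n / nsamples) * s;
        size_t len = (off + block <= n) ? block : n - off;
        orig += len;
        comp += rle_size(data + off, len);
    }
    return comp ? (double)orig / (double)comp : 0.0;
}

int main(void)
{
    static unsigned char data[1 << 16];
    for (size_t i = 0; i < sizeof data; i++)
        data[i] = (unsigned char)(i / 64);       /* smooth, compressible data */
    printf("estimated ratio: %.2f\n", estimate_ratio(data, sizeof data, 1024, 8));
    printf("exact ratio:     %.2f\n",
           (double)sizeof data / (double)rle_size(data, sizeof data));
    return 0;
}
```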
Proceedings Article

Full-state quantum circuit simulation by using data compression

TL;DR: This study develops a hybrid solution that combines lossless compression with a tailored lossy compression method using adaptive error bounds at each timestep of the simulation; it reduces the memory requirement of simulating the 61-qubit Grover's search algorithm and suggests that the techniques can increase the simulation size by 2 to 16 qubits for general quantum circuits.
Book Chapter

Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales

TL;DR: This chapter describes methods and tools that various groups are developing to enable experimental exploration of algorithmic, software, and system-design alternatives that have major implications for the design of supercomputer systems.
References
Book

The Mathematical Theory of Finite Element Methods

TL;DR: In this book, the construction of finite element spaces and their approximation properties in Sobolev spaces are studied, together with operator-interpolation theory for n-dimensional variational problems.
Journal Article

A universal algorithm for sequential data compression

TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
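For context, here is a minimal sketch of the sliding-window parsing idea behind this universal code (not the paper's exact parsing or coding scheme): the longest match among previously seen bytes is replaced by an (offset, length, next byte) triple. Window size and output format are illustrative choices.

```c
#include <stdio.h>
#include <string.h>

#define WINDOW 4096

/* Greedy sliding-window parse: for each position, find the longest match
 * starting in the preceding WINDOW bytes and emit (offset, length, next). */
static void lz77_parse(const unsigned char *in, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = (i > WINDOW) ? i - WINDOW : 0;
        for (size_t j = start; j < i; j++) {
            size_t len = 0;
            while (i + len < n && in[j + len] == in[i + len] && len < 255)
                len++;                        /* matches may run into the lookahead */
            if (len > best_len) { best_len = len; best_off = i - j; }
        }
        int next = (i + best_len < n) ? in[i + best_len] : -1;  /* trailing literal */
        printf("(offset=%zu, length=%zu, next=%d)\n", best_off, best_len, next);
        i += best_len + 1;
    }
}

int main(void)
{
    const char *s = "abracadabra abracadabra";
    lz77_parse((const unsigned char *)s, strlen(s));
    return 0;
}
```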
Journal Article

Fixed-rate compressed floating-point arrays

TL;DR: A fixed-rate, near-lossless compression scheme that maps small blocks of 4^d values in d dimensions to a fixed, user-specified number of bits per block, thereby allowing read and write random access to compressed floating-point data at block granularity.
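The reason a fixed rate enables random access is simple offset arithmetic: if every block of 4^d values occupies exactly the same number of bits, the position of any block can be computed directly rather than by decoding its predecessors. The sketch below illustrates this for a 2D array; the names and layout are hypothetical, not the zfp library API.

```c
#include <stdio.h>

/* Map a 2D element coordinate (x, y) to its 4x4 block index for an array of
 * width nx (nx assumed a multiple of 4 for simplicity). */
static size_t block_index_2d(size_t x, size_t y, size_t nx)
{
    return (y / 4) * (nx / 4) + (x / 4);
}

/* With a fixed number of bits per block, the bit offset of any block is a
 * simple product, so a reader can seek straight to it. */
static size_t block_bit_offset(size_t block_index, size_t bits_per_block)
{
    return block_index * bits_per_block;
}

int main(void)
{
    size_t nx = 512, bits_per_block = 256;    /* e.g. 16 bits/value * 16 values */
    size_t idx = block_index_2d(100, 37, nx);
    printf("element (100,37) lives in block %zu at bit offset %zu\n",
           idx, block_bit_offset(idx, bits_per_block));
    return 0;
}
```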
Journal Article

Fast and Efficient Compression of Floating-Point Data

TL;DR: This work proposes a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications, and achieves state-of-the-art compression rates and speeds.
Proceedings Article

Fast Error-Bounded Lossy HPC Data Compression with SZ

TL;DR: This paper proposes a novel HPC data compression method that works very effectively on large-scale HPC data sets, evaluates it using 13 real-world HPC applications across different scientific domains, and compares it with many other state-of-the-art compression methods.