Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization
Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello
pp. 1129-1139
TL;DR: This work designs a new error-controlled lossy compression algorithm for large-scale scientific data that significantly improves the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions, and derives a series of multilayer prediction formulas and their unified formula in the context of data compression.
Abstract:
Today's HPC applications produce extremely large amounts of data, making data storage and analysis increasingly challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that, to guarantee the error bounds, the prediction during compression must be based on the preceding decompressed values rather than the original ones, which may in turn degrade prediction accuracy. We explore the best layer for the prediction by considering the impact of compression errors on prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. Because of the uneven distribution produced by our quantization encoder, variable-length encoding then reduces the data size significantly. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). On average, our solution outperforms the second-best solution by more than a 2x increase in the compression factor and a 3.8x reduction in the normalized root mean squared error, with reasonable error bounds and user-desired bit-rates.
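The core loop described above (predict each point from already-decompressed neighbors, then quantize the prediction error into intervals of width twice the error bound) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' SZ implementation. Assumed: a 2D grid of floats; an n-layer predictor written in a binomial form that is our reading of the unified formula (for n = 1 it reduces to V(i-1,j) + V(i,j-1) - V(i-1,j-1), the classic Lorenzo predictor); neighbors outside the grid treated as 0; a fixed interval count (`radius`) instead of the paper's adaptive encoder; unpredictable points kept verbatim; and no Huffman or other variable-length stage, whose benefit is only estimated through the Shannon entropy of the codes. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Grid = List[List[float]]


def predict_nlayer(recon: Grid, i: int, j: int, layers: int = 1) -> float:
    """Assumed unified n-layer 2D predictor (hypothetical reading):
    sum over (k1, k2) in [0, n]^2, (k1, k2) != (0, 0), of
    (-1)^(k1 + k2 + 1) * C(n, k1) * C(n, k2) * V(i - k1, j - k2).
    Neighbors outside the grid count as 0 (simplifying assumption)."""
    total = 0.0
    for k1 in range(layers + 1):
        for k2 in range(layers + 1):
            r, c = i - k1, j - k2
            if (k1 == 0 and k2 == 0) or r < 0 or c < 0:
                continue
            sign = 1.0 if (k1 + k2) % 2 == 1 else -1.0
            total += sign * math.comb(layers, k1) * math.comb(layers, k2) * recon[r][c]
    return total


def compress(data: Grid, eb: float, radius: int = 32768,
             layers: int = 1) -> Tuple[List[int], List[float]]:
    """Predict each point from already-reconstructed neighbors, then map the
    residual to one of 2*radius - 1 intervals of width 2*eb (code 0 = unpredictable)."""
    rows, cols = len(data), len(data[0])
    recon = [[0.0] * cols for _ in range(rows)]  # what the decompressor will see
    codes: List[int] = []
    unpredictable: List[float] = []
    for i in range(rows):
        for j in range(cols):
            x = data[i][j]
            pred = predict_nlayer(recon, i, j, layers)
            q = round((x - pred) / (2 * eb))
            if abs(q) < radius:
                value = pred + 2 * eb * q
                if abs(value - x) <= eb:  # guard against floating-point round-off
                    codes.append(q + radius)
                    recon[i][j] = value
                    continue
            codes.append(0)
            unpredictable.append(x)  # stored verbatim in this sketch
            recon[i][j] = x
    return codes, unpredictable


def decompress(codes: List[int], unpredictable: List[float], shape: Tuple[int, int],
               eb: float, radius: int = 32768, layers: int = 1) -> Grid:
    rows, cols = shape
    recon = [[0.0] * cols for _ in range(rows)]
    code_iter, raw_iter = iter(codes), iter(unpredictable)
    for i in range(rows):
        for j in range(cols):
            code = next(code_iter)
            if code == 0:
                recon[i][j] = next(raw_iter)
            else:  # same prediction as the compressor, so same reconstructed value
                pred = predict_nlayer(recon, i, j, layers)
                recon[i][j] = pred + 2 * eb * (code - radius)
    return recon


def error_metrics(orig: Grid, recon: Grid) -> Dict[str, float]:
    a = [v for row in orig for v in row]
    b = [v for row in recon for v in row]
    value_range = max(a) - min(a)  # assumes a non-constant field
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    return {
        "max_abs_err": max(abs(x - y) for x, y in zip(a, b)),
        "rmse": rmse,
        "nrmse": rmse / value_range,
        "psnr": 20 * math.log10(value_range / rmse) if rmse > 0 else math.inf,
    }


def code_entropy_bits(codes: List[int]) -> float:
    """Shannon entropy in bits per point: a lower bound on the average length
    of a prefix code (e.g. Huffman) over these codes."""
    n = len(codes)
    return -sum(c / n * math.log2(c / n) for c in Counter(codes).values())


if __name__ == "__main__":
    # Illustrative smooth field, not data from the paper.
    field = [[math.sin(0.1 * i) * math.cos(0.05 * j) for j in range(64)] for i in range(64)]
    eb = 1e-3
    codes, raw = compress(field, eb)
    m = error_metrics(field, decompress(codes, raw, (64, 64), eb))
    print(f"max error {m['max_abs_err']:.2e} (bound {eb}), NRMSE {m['nrmse']:.2e}, PSNR {m['psnr']:.1f} dB")
    print(f"unpredictable: {len(raw)}, code entropy: {code_entropy_bits(codes):.2f} bits/value")
```

Because `predict_nlayer` reads the reconstructed grid rather than the original data, `decompress` can repeat every prediction exactly, so each point satisfies |x - x'| <= eb. Rerunning with `layers=2` is one way to observe how compression error feeding back into wider stencils affects the hitting rate (the fraction of nonzero codes), which is the trade-off the abstract describes.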
Citations
Proceedings Article
Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets
TL;DR: Evaluation results confirm that the new adaptive solution can significantly improve the rate-distortion of lossy compression at fairly high compression ratios.
Proceedings Article
Understanding and Modeling Lossy Compression Schemes on HPC Scientific Data
Tao Lu, Qing Liu, Xubin He, Huizhang Luo, E. Suchyta, Jong Choi, Norbert Podhorszki, Scott Klasky, Mathew Wolf, Tong Liu, Zhenbo Qiao
TL;DR: This paper conducts a comprehensive study of state-of-the-art lossy compressors, including ZFP, SZ, and ISABELA, using real and representative HPC datasets, and proposes a sampling-based estimation method that extrapolates the reduction ratio from data samples to help domain scientists make more informed data reduction decisions.
Proceedings Article
Full-state quantum circuit simulation by using data compression
Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Hal Finkel, Yuri Alexeev, Frederic T. Chong
TL;DR: This study develops a hybrid solution that combines lossless compression with a tailored lossy compression method using adaptive error bounds at each timestep of the simulation. It reduces the memory requirement of simulating the 61-qubit Grover's search algorithm, and the results suggest that the techniques can increase the simulation size by 2 to 16 qubits for general quantum circuits.
Journal Article
Use cases of lossy compression for floating-point data in scientific data sets
Franck Cappello, Sheng Di, Sihuan Li, Xin Liang, Ali Murat Gok, Dingwen Tao, Chun Hong Yoon, Xin-Chuan Wu, Yuri Alexeev, Frederic T. Chong
TL;DR: The architectural and technological trends of systems used for scientific computing call for a significant reduction of scientific data sets, which are composed mainly of floating-point data, and this article...
Book Chapter
Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales
Ian Foster
TL;DR: Methods and tools that various groups are developing are described to enable experimental exploration of algorithmic, software, and system design alternatives that have major implications for the design of various elements of supercomputer systems.
References
Book
The Mathematical Theory of Finite Element Methods
TL;DR: This book studies the construction of finite element spaces in Sobolev spaces, in the context of operator-interpolation theory for n-dimensional variational problems.
Journal Article
A universal algorithm for sequential data compression
Jacob Ziv, A. Lempel
TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
Journal Article
Fixed-rate compressed floating-point arrays
TL;DR: A fixed-rate, near-lossless compression scheme that maps small blocks of 4^d values in d dimensions to a fixed, user-specified number of bits per block, thereby allowing read and write random access to compressed floating-point data at block granularity.
Journal Article
Fast and Efficient Compression of Floating-Point Data
Peter Lindstrom, Martin Isenburg
TL;DR: This work proposes a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications, and achieves state-of-the-art compression rates and speeds.
Proceedings Article
Fast Error-Bounded Lossy HPC Data Compression with SZ
Sheng Di, Franck Cappello
TL;DR: This paper proposes a novel HPC data compression method that works very effectively on large-scale HPC data sets, evaluates it using 13 real-world HPC applications across different scientific domains, and compares it with many other state-of-the-art compression methods.