MBZip: Multiblock Data Compression
TLDR
MBZip is a synergistic mechanism that compresses multiple data blocks into one single block (called a zipped block), both at the LLC and DRAM, and improves system performance by 21.9% on average, with a maximum of 90.3%, on a 4-core system.

Abstract
Compression techniques at the last-level cache (LLC) and the DRAM play an important role in improving system performance by increasing their effective capacities. A compressed block in DRAM also reduces the transfer time over the memory bus to the caches, reducing the latency of an LLC miss. Usually, compression is achieved by exploiting data patterns present within a block. But applications can exhibit data locality that spreads across multiple consecutive data blocks. We observe that there is significant opportunity for compressing multiple consecutive data blocks into one single block, both at the LLC and DRAM. Our studies using 21 SPEC CPU applications show that, at the LLC, around 25% (on average) of the cache blocks can be compressed into one single cache block when grouped together in groups of 2 to 8 blocks. In DRAM, more than 30% of the columns residing in a single DRAM page can be compressed into one DRAM column when grouped together in groups of 2 to 6. Motivated by these observations, we propose a mechanism, namely, MBZip, that compresses multiple data blocks into one single block (called a zipped block), both at the LLC and DRAM. At the cache, MBZip includes a simple tag structure to index into these zipped cache blocks, and the indexing does not incur any redirectional delay. At the DRAM, MBZip does not need any changes to the address computation logic and works seamlessly with the conventional/existing logic. MBZip is a synergistic mechanism that coordinates these zipped blocks at the LLC and DRAM. Further, we also explore silent writes at the DRAM and show that certain writes need not access the memory when blocks are zipped. MBZip improves system performance by 21.9% on average, with a maximum of 90.3%, on a 4-core system.
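The core idea above, checking whether a group of consecutive blocks compresses into one block-sized zipped block, can be sketched in a few lines. This is a minimal illustration, not the paper's mechanism: it uses zlib as a stand-in for whatever hardware-friendly compressor MBZip would employ, and `BLOCK_SIZE` and `can_zip` are hypothetical names chosen for the example.

```python
import zlib

BLOCK_SIZE = 64  # bytes per cache block / DRAM column (a typical size; assumption)

def can_zip(blocks):
    """Return True if the concatenation of consecutive blocks compresses
    into a single block of BLOCK_SIZE bytes.

    zlib stands in here for a hardware compressor; MBZip itself would use
    a latency-friendly scheme, so this only illustrates the feasibility test.
    """
    payload = b"".join(blocks)
    return len(zlib.compress(payload)) <= BLOCK_SIZE

# Four consecutive blocks with highly regular contents (each a run of one
# byte value) easily zip into a single 64-byte block.
group = [bytes([i]) * BLOCK_SIZE for i in range(4)]
print(can_zip(group))  # → True
```

Groups with regular, redundant contents (runs, narrow values, shared base values) pass this test, which matches the abstract's observation that roughly 25% of LLC blocks and over 30% of DRAM columns are zippable in groups of 2 to 8.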
Citations
Proceedings ArticleDOI
Safecracker: Leaking Secrets through Compressed Caches
TL;DR: This paper offers the first security analysis of cache compression, a promising technique likely to appear in future processors, and finds that cache compression is insecure because the compressibility of a cache line reveals information about its contents.
Journal ArticleDOI
Optimized Lossless Embedded Compression for Mobile Multimedia Applications
TL;DR: Evaluation metrics are proposed to assess lossless embedded compression (LEC) algorithms under realistic design considerations for mobile multimedia scenarios, and an optimized LEC implementation based on these metrics is introduced for contemporary multimedia applications on mobile devices.
Proceedings ArticleDOI
CABLE: a CAche-based link encoder for bandwidth-starved manycores
TL;DR: This work presents CABLE, a novel CAche-Based Link Encoder that enables point-to-point link compression between coherent caches, re-purposing the data already stored in the caches as a massive and scalable dictionary for data compression.
Journal ArticleDOI
MemSZ: Squeezing Memory Traffic with Lossy Compression
TL;DR: MemSZ introduces a low-latency, parallel design of the Squeeze (SZ) algorithm offering aggressive compression ratios, up to 16:1 in the authors' implementation, and improves execution time, energy, and memory traffic by up to 15%, 9%, and 64%, respectively.
Journal ArticleDOI
Compacted CPU/GPU Data Compression via Modified Virtual Address Translation
Larry Seiler, Daqi Lin, Cem Yuksel, +2 more
TL;DR: A method is presented to reduce the footprint of compressed data by using modified virtual address translation to permit random access to the data; an important property of this method is that compression, decompression, and reallocation are managed automatically by the new hardware without operating system intervention.
References
Journal ArticleDOI
The gem5 simulator
Nathan Binkert, Bradford M. Beckmann, Gabriel Black, Steven K. Reinhardt, Ali G. Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, David A. Wood, +15 more
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Journal ArticleDOI
SPEC CPU2006 benchmark descriptions
TL;DR: On August 24, 2006, the Standard Performance Evaluation Corporation (SPEC) announced CPU2006, which replaces CPU2000; the SPEC CPU benchmarks are widely used in both industry and academia.
Journal ArticleDOI
SPEC CPU2000: measuring CPU performance in the New Millennium
TL;DR: CPU2000 is a new CPU benchmark suite with 19 applications that have never before been in a SPEC CPU suite, spanning high-performance numeric computing, Web servers, and graphical subsystems.
Journal ArticleDOI
Symbiotic jobscheduling for a simultaneous multithreaded processor
Allan Snavely, Dean M. Tullsen, +1 more
TL;DR: It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
Journal ArticleDOI
System-Level Performance Metrics for Multiprogram Workloads
Stijn Eyerman, Lieven Eeckhout, +1 more
TL;DR: The authors develop multiprogram performance metrics in a top-down fashion starting from system-level objectives, proposing two metrics: average normalized turnaround time, a user-oriented metric, and system throughput, a system-oriented metric.