Showing papers on "Cache invalidation" published in 2020

Open Access

Journal Article

Leaper: a learned prefetcher for cache invalidation in LSM-tree based storage engines

Lei Yang¹, Hong Wu², Tieying Zhang², Xuntao Cheng², Feifei Li², Lei Zou¹, Yujie Wang², Rongyao Chen², Jianying Wang², Gui Huang²

Peking University¹, Alibaba Group²

01 Jul 2020

TL;DR: This paper proposes Leaper, a machine learning method that predicts hot records in an LSM-tree storage engine and prefetches them into the cache without being disturbed by background operations, and implements it as a lightweight plug-in for the state-of-the-art X-Engine.

Abstract: Frequency-based cache replacement policies that work well on page-based database storage engines are no longer sufficient for the emerging LSM-tree (Log-Structured Merge-tree) based storage engines. Due to the append-only and copy-on-write techniques applied to accelerate writes, the state-of-the-art LSM-tree adopts mutable record blocks and issues frequent background operations (i.e., compaction and flush) to reorganize records in possibly every block. As a side effect, such operations invalidate the corresponding cache entries for each involved record, causing sudden drops in cache hit rates and spikes in access latency. Given the observation that existing methods cannot address this cache invalidation problem, we propose Leaper, a machine learning method to predict hot records in an LSM-tree storage engine and prefetch them into the cache without being disturbed by background operations. We implement Leaper in a state-of-the-art LSM-tree storage engine, X-Engine, as a lightweight plug-in. Evaluation results show that Leaper eliminates about 70% of cache invalidations and 99% of latency spikes with at most 0.95% overhead, as measured on real-world workloads.
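
The invalidation problem and the prefetching remedy described in this abstract can be illustrated with a toy model. The sketch below is not Leaper's implementation: the `Store` and `BlockCache` classes, the `misses_after_compaction` helper, and the use of raw access counts as a stand-in for the learned hot-record predictor are all assumptions made for illustration.

```python
from collections import Counter


class Store:
    """Toy LSM-like store (hypothetical): records live in numbered blocks."""

    def __init__(self, records, block_size=2):
        self.block_size = block_size
        self.next_block_id = 0
        self.disk = {}       # block id -> {key: value}
        self.location = {}   # key -> block id
        self._write_blocks(records)

    def _write_blocks(self, records):
        items = sorted(records.items())
        for start in range(0, len(items), self.block_size):
            block_id = self.next_block_id
            self.next_block_id += 1
            self.disk[block_id] = dict(items[start:start + self.block_size])
            for key in self.disk[block_id]:
                self.location[key] = block_id

    def compact(self, cache):
        """Rewrite every record into fresh blocks; cached old blocks become invalid."""
        records = {k: v for block in self.disk.values() for k, v in block.items()}
        old_ids = list(self.disk)
        self.disk.clear()
        self._write_blocks(records)
        for block_id in old_ids:
            cache.blocks.pop(block_id, None)


class BlockCache:
    """Toy block cache that also tracks per-key access counts."""

    def __init__(self):
        self.blocks = {}
        self.misses = 0
        self.access_counts = Counter()

    def get(self, store, key):
        self.access_counts[key] += 1
        block_id = store.location[key]
        if block_id not in self.blocks:
            self.misses += 1
            self.blocks[block_id] = dict(store.disk[block_id])
        return self.blocks[block_id][key]

    def prefetch_hot(self, store, top_n):
        # Assumption: access counts stand in for a learned predictor.
        for key, _ in self.access_counts.most_common(top_n):
            block_id = store.location[key]
            self.blocks[block_id] = dict(store.disk[block_id])


def misses_after_compaction(prefetch):
    """Count cache misses on hot keys right after a compaction."""
    store = Store({"a": 1, "b": 2, "c": 3, "d": 4})
    cache = BlockCache()
    for key in ("a", "a", "b"):
        cache.get(store, key)
    store.compact(cache)
    if prefetch:
        cache.prefetch_hot(store, top_n=1)
    cache.misses = 0
    cache.get(store, "a")
    cache.get(store, "b")
    return cache.misses
```

In this toy setup, calling `misses_after_compaction(False)` shows the miss caused by compaction, while `misses_after_compaction(True)` shows that reloading the predicted hot block beforehand avoids it.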

37 citations

Journal Article

RIVA: Robust Integrity Verification Algorithm for High-Speed File Transfers

Batyr Charyyev¹, Engin Arslan²

Stevens Institute of Technology¹, University of Nevada, Reno²

14 Jan 2020-IEEE Transactions on Parallel and Distributed Systems

TL;DR: Robust Integrity Verification Algorithm (RIVA) is proposed to strengthen the integrity of file transfers by forcing checksum computation tasks to read files directly from disk by invalidating memory mappings of file pages after their transfer.

Abstract: End-to-end integrity verification is designed to protect file transfers against silent data corruption by comparing checksums of files at the source and destination end points using cryptographic hash functions such as MD5 and SHA1. However, existing implementations of end-to-end integrity verification for file transfers fall short of detecting undetected disk errors that cause inconsistency between disk and cache memory. In this article, we propose Robust Integrity Verification Algorithm (RIVA) to strengthen the integrity of file transfers by forcing checksum computation tasks to read files directly from disk. RIVA achieves this by invalidating memory mappings of file pages after their transfer such that when the file is read again for checksum calculation, it will be fetched from disk and silent disk errors will be captured. We design and conduct extensive fault resilience experiments to evaluate the robustness of integrity verification algorithms against undetected disk write errors. The results indicate that while the state-of-the-art integrity verification algorithms fail to detect the injected errors for almost all file sizes, RIVA captures all of them with the help of cache invalidation. We further run statistical analysis to assess the probability of missing silent disk errors and find that RIVA reduces the likelihood by 10 to 15 orders of magnitude compared to the existing approaches. Finally, enforcing disk reads in integrity verification introduces an inevitable overhead in exchange for increased robustness against silent disk errors, but RIVA keeps its overhead below 15 percent in most cases by running transfer, cache invalidation, and checksum computation processes concurrently for different portions of the same file.
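
The idea of dropping cached pages before checksumming can be sketched as follows. The abstract does not say which system call RIVA uses, so the choice of `posix_fadvise` with `POSIX_FADV_DONTNEED` is an assumption, as are the helper names `write_and_sync`, `checksum_from_disk`, and `demo`. The advice is only a hint to the kernel and is unavailable on some platforms, so this is a minimal sketch rather than a guaranteed cache bypass.

```python
import hashlib
import os
import tempfile


def write_and_sync(path, data):
    """Write data and flush it to disk so cached pages are clean and droppable."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def checksum_from_disk(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file after asking the kernel to drop its cached pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            try:
                # Assumption: this advisory call stands in for RIVA's invalidation step.
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # Advice not supported here; the read may be served from cache.
        digest = hashlib.sha1()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
        return digest.hexdigest()
    finally:
        os.close(fd)


def demo():
    """Transfer stand-in: write a file, then verify it against the source checksum."""
    data = b"riva" * 4096
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "transfer.bin")
        write_and_sync(path, data)
        return checksum_from_disk(path) == hashlib.sha1(data).hexdigest()
```

The `fsync` before the advice matters in this sketch because dirty pages are not dropped by `POSIX_FADV_DONTNEED`.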

12 citations

Journal Article

The self modifying code (SMC)-aware processor (SAP): a security look on architectural impact and support

Marcus Botacin¹, Marco Antonio Alves Zanata¹, André Grégio¹

Federal University of Paraná¹

01 Sep 2020-Journal of Computer Virology and Hacking Techniques

TL;DR: This work revisits SMC impact on hardware internals and discusses the implementation of an SMC detector at distinct architectural points, and considers three detection approaches: existing hardware counters; block invalidation by the cache coherence protocol; and the use of Memory Management Unit information to control SMC execution.

Abstract: Self-modifying code (SMC) consists of code snippets that modify themselves at runtime. Malware uses SMC to hide payloads and achieve persistence. Software-based SMC detection solutions impose performance penalties for real-time monitoring and do not benefit from runtime architectural information (cache invalidation or pipeline flush, for instance). We revisit SMC impact on hardware internals and discuss the implementation of an SMC detector at distinct architectural points. We consider three detection approaches: (i) existing hardware counters; (ii) block invalidation by the cache coherence protocol; (iii) the use of Memory Management Unit (MMU) information to control SMC execution. We compare the identified instrumentation points to highlight their strong and weak points. We also compare them to previous SMC detectors’ implementations.

9 citations

Journal Article

A Cache Invalidation Strategy Based on Publish/Subscribe for Named Data Networking

Yuanzhi Kan¹, Quan Zheng¹, Jian Yang¹, Xiaobin Tan¹

University of Science and Technology of China¹

01 Jan 2020-IEEE Access

TL;DR: This paper proposes a novel strategy of cache invalidation, called PIOR (Proactive Invalidation with Optional Renewing), to provide strong consistency for NDN, and conducts extensive simulations to evaluate the achievable performance.

Abstract: Named Data Networking (NDN) aims to improve the efficiency of data delivery for the Internet. One of the typical characteristics of NDN is ubiquitous caching, that is to say, each network participant in NDN is capable of caching contents. This caching feature is beneficial for enhancing the data availability but also raises a problem of cache consistency. In this paper, we propose a novel strategy of cache invalidation, called PIOR (Proactive Invalidation with Optional Renewing), to provide strong consistency for NDN. PIOR is based on a lightweight publish/subscribe model, actively publishing the updated contents to the router nodes to guarantee the copy validity. We also conceive customized publish/subscribe rules to relieve the unbearable burden on the server imposed by the excessive publishing traffic. The advantage of PIOR lies in simple deployment and compatibility, since the invalidation process of PIOR is independent of the inherent process of NDN. We conduct extensive simulations over a real topology to evaluate the achievable performance of PIOR. The simulation results show that PIOR is able to achieve a high hit ratio and low server load at the low cost of network management.
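
A publish/subscribe invalidation scheme of the general kind described here can be sketched as below. The abstract gives no message formats or rules, so the `Publisher` and `Router` classes, the version counter, and the reading of "optional renewing" as "attach the new data to the notice" are all assumptions for illustration, not PIOR's actual design.

```python
class Router:
    """Hypothetical caching node that receives update notices."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # content name -> (version, data)

    def on_publish(self, content_name, version, data=None):
        """Renew the cached copy if data is attached, otherwise invalidate it."""
        if content_name not in self.cache:
            return
        if data is not None:
            self.cache[content_name] = (version, data)
        else:
            del self.cache[content_name]


class Publisher:
    """Hypothetical content server that tracks which routers cache what."""

    def __init__(self):
        self.contents = {}     # content name -> (version, data)
        self.subscribers = {}  # content name -> set of routers

    def fetch(self, router, content_name):
        """Serve content to a router and subscribe it to future updates."""
        version, data = self.contents[content_name]
        router.cache[content_name] = (version, data)
        self.subscribers.setdefault(content_name, set()).add(router)
        return data

    def update(self, content_name, data, renew=False):
        """Store a new version and notify every subscribed router."""
        version = self.contents.get(content_name, (0, None))[0] + 1
        self.contents[content_name] = (version, data)
        for router in self.subscribers.get(content_name, ()):
            router.on_publish(content_name, version, data if renew else None)


def demo():
    server = Publisher()
    edge = Router("edge-1")
    server.update("/video/1", "v1")
    server.fetch(edge, "/video/1")
    server.update("/video/1", "v2")              # plain invalidation
    invalidated = "/video/1" not in edge.cache
    server.fetch(edge, "/video/1")
    server.update("/video/1", "v3", renew=True)  # invalidation with renewal
    renewed = edge.cache["/video/1"] == (3, "v3")
    return invalidated, renewed
```

Because routers never serve a copy older than the last published version, this toy model keeps caches consistent at the cost of one notice per subscriber per update.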

4 citations

Journal Article

Minimizing CPU Utilization Requirements to Monitor an ATLAS Data Transfer System

G. Leventis¹, Jorn Schumacher¹, M. Dönszelmann²

CERN¹, Radboud University Nijmegen²

10 Feb 2020-Journal of Instrumentation

TL;DR: The paper details the challenges of properly using atomics and how they are overcome in the implementation of the FELIX monitoring system, and demonstrates that atomics can be useful for efficient computations in a multi-threaded environment.

Abstract: The ATLAS experiment at LHC will use a PC-based read-out component called FELIX to connect its front-end electronics to the Data Acquisition System. FELIX translates custom front-end protocols to Ethernet and vice versa. Currently, FELIX makes use of parallel multi-threading to achieve the data rate requirements. In order to establish the FELIX operation conditions, monitoring of its parameters is necessary. This includes, but is not limited to, data counters and rates as well as compute resource utilisation. However, for these statistics to be of practical use, the parallel threads are required to intercommunicate. The FELIX monitoring implementation prior to this research utilized thread-safe queues to which data was pushed from the parallel threads. A central thread would extract and combine the queue contents. Enabling statistics would deteriorate the throughput to less than a fifth of the baseline performance. To minimize this performance hit to the greatest extent, we take advantage of the CPU's microarchitecture features and reduce concurrency. The focus is on hardware-supported atomic operations. When a thread performs an atomic operation, the other threads see it as happening instantaneously. They are used to complement and/or replace parallel computing lock mechanisms. The aforementioned queue system gets replaced with sets of C/C++ atomic variables and corresponding atomic functions, hereinafter referred to as atomics. Three implementations are tested. Implementation I has one set of atomic variables being updated by all the parallel threads. Implementation II has a set of atomic variables for every thread. These sets are periodically accumulated by a central thread. Implementation III is the same as implementation II, but appropriate measures are taken to eliminate any concurrency implications. The compiler used during the measurements is GCC, which supports the hardware (microarchitecture) optimizations for atomics. 
Implementations I and II resulted in negligible differences compared to the original one. Some benchmarks even show deterioration of the performance. Implementation III (concurrency and cache optimized) yields a performance improvement of up to six-fold compared to the original implementation. Achieved throughput is significantly closer to what is desirable. Similarly structured software applications could benefit from the results of this research, especially Implementation III. The results presented demonstrate that atomics can be useful for efficient computations in a multi-threaded environment. However, from the results, it is clear that concurrency, cache invalidation, and proper usage of the system's microarchitecture need to be taken into account in this programming model. The paper details the challenges of properly using atomics and how they are overcome in the implementation of the FELIX monitoring system.
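
The per-thread layout of Implementation II (one counter set per worker, summed by a central thread) can be sketched as below. This shows only the data layout: Python has no hardware atomics or cache-line control, so it does not reproduce the microarchitectural effects the paper measures. The class `PerThreadCounters`, the counter names, and `run_workers` are hypothetical. In C/C++, one common way to avoid cache invalidation between threads is to align each set to its own cache line, though the abstract does not say which measures Implementation III takes.

```python
import threading


class PerThreadCounters:
    """One counter set per worker thread; no set is written by two threads."""

    def __init__(self, n_threads):
        self.sets = [{"blocks": 0, "bytes": 0} for _ in range(n_threads)]

    def record(self, thread_index, n_bytes):
        # Each worker touches only its own set, so writers never contend.
        counters = self.sets[thread_index]
        counters["blocks"] += 1
        counters["bytes"] += n_bytes

    def accumulate(self):
        """Central step: sum all per-thread sets into one total."""
        total = {"blocks": 0, "bytes": 0}
        for counters in self.sets:
            for name, value in counters.items():
                total[name] += value
        return total


def run_workers(n_threads=4, events_per_thread=1000, n_bytes=64):
    """Run the workers to completion, then accumulate their counters once."""
    counters = PerThreadCounters(n_threads)

    def worker(index):
        for _ in range(events_per_thread):
            counters.record(index, n_bytes)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counters.accumulate()
```

The sketch accumulates after the workers finish to keep the result deterministic; the system described in the abstract accumulates periodically while the threads are still running.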

2 citations

Patent

Low latency dirty RAM for cache invalidation speed improvement

Lai Leon King Nok, Ma Qian, Mirza Jimshed B

21 May 2020

TL;DR: This patent presents a technique for improving cache performance by maintaining indicators of whether cache entries are dirty in a random access memory (RAM) that has a lower latency to the cache controller than the cache memory that stores the cache entries.

Abstract: A technique for improving performance of a cache is provided. The technique involves maintaining indicators of whether cache entries are dirty in a random access memory (“RAM”) that has a lower latency to a cache controller than the cache memory that stores the cache entries. When a request to invalidate one or more cache entries is received by the cache controller, the cache controller checks the RAM to determine whether any cache entries are dirty and thus should be written out to a backing store. Using the RAM removes the need to check the actual cache memory for whether cache entries are dirty, which reduces the latency associated with performing such checks and thus with performing cache invalidations.
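
The control flow described in this abstract can be modeled in software as a rough sketch. The `CacheController` class, its field names, and the `cache_data_reads` counter are hypothetical; real hardware would hold the dirty indicators in a dedicated low-latency RAM rather than a Python set.

```python
class CacheController:
    """Toy controller that keeps dirty indicators apart from the cache data."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # address -> data
        self.cache_memory = {}              # address -> data (slow to probe)
        self.dirty_ram = set()              # addresses of dirty entries (fast)
        self.cache_data_reads = 0           # data reads done during invalidation

    def fill(self, address):
        """Load a clean copy from the backing store."""
        self.cache_memory[address] = self.backing_store[address]

    def write(self, address, data):
        """Write to the cache and mark the entry dirty in the side structure."""
        self.cache_memory[address] = data
        self.dirty_ram.add(address)

    def invalidate(self, addresses):
        """Consult only the dirty indicators to decide what must be written back."""
        for address in addresses:
            if address in self.dirty_ram:
                self.cache_data_reads += 1
                self.backing_store[address] = self.cache_memory[address]
                self.dirty_ram.discard(address)
            self.cache_memory.pop(address, None)


def demo():
    controller = CacheController({0: "a", 1: "b", 2: "c"})
    for address in (0, 1, 2):
        controller.fill(address)
    controller.write(1, "B")
    controller.invalidate([0, 1, 2])
    return controller.backing_store, controller.cache_data_reads
```

In the demo, three entries are invalidated but only the single dirty one requires reading cache data for write-back.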

Patent

Address translation cache invalidation in a microprocessor

Chatterjee Debapriya¹, Cockcroft Bryant¹, Larry Scott Leitner, Schumann John A, Yokum Karen

IBM¹

26 Nov 2020

TL;DR: This patent describes a method and an information handling system with a plurality of processors connected by a cross-processor network, in which each processor preferably has a filter construct with an outgoing filter list that identifies logical partition identifications (LPIDs) exclusively assigned to that processor and/or an incoming filter list that identifies LPIDs on that processor and at least one additional processor in the system.

Abstract: A method and an information handling system having a plurality of processors connected by a cross-processor network, where each of the plurality of processors preferably has a filter construct having an outgoing filter list that identifies logical partition identifications (LPIDs) that are exclusively assigned to that processor and/or an incoming filter list that identifies LPIDs on that processor and at least one additional processor in the system. In operation, if the LPID of the outgoing translation invalidation instruction is on the outgoing filter list, the address translation invalidation instruction is acknowledged on behalf of the system. If the LPID of the incoming invalidation instruction does not match any LPID on the incoming filter list, then the translation invalidation instruction is acknowledged, and if the LPID of the incoming invalidation instruction matches any LPID on the incoming filter list, then the invalidation instruction is sent into the respective processor.
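
The filtering decisions in this abstract reduce to two membership tests, sketched below. The function names and the returned action strings are hypothetical, and the `"broadcast"` action for an outgoing LPID that is not on the outgoing list is an assumption, since the abstract does not state what happens in that case.

```python
def handle_outgoing(lpid, outgoing_filter):
    """Decide what to do with a translation invalidation leaving this processor."""
    if lpid in outgoing_filter:
        # LPID is exclusive to this processor: acknowledge on behalf of the system.
        return "ack_on_behalf_of_system"
    # Assumption: otherwise the instruction goes out on the cross-processor network.
    return "broadcast"


def handle_incoming(lpid, incoming_filter):
    """Decide what to do with a translation invalidation arriving from the network."""
    if lpid in incoming_filter:
        # LPID is present on this processor: pass the invalidation inside.
        return "send_into_processor"
    # LPID is not present here: acknowledge without disturbing the processor.
    return "ack"


def demo():
    outgoing_filter = {7}      # LPIDs assigned only to this processor
    incoming_filter = {3, 9}   # LPIDs shared with at least one other processor
    return [
        handle_outgoing(7, outgoing_filter),
        handle_outgoing(3, outgoing_filter),
        handle_incoming(3, incoming_filter),
        handle_incoming(5, incoming_filter),
    ]
```

Both filters serve the same purpose in this sketch: invalidations that cannot affect a processor are acknowledged early instead of being processed.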

Patent

Fast cache invalidation response using cache class attributes

Gupta Jay, Padmanabhan Gosagan, Mittal Devesh, Agarwal Kaushal

04 Jun 2020

TL;DR: A memory management unit responds to an invalidate-by-class command by identifying a marker for the class of cache entries that the command is meant to invalidate.

Abstract: A memory management unit responds to an invalidate-by-class command by identifying a marker for a class of cache entries that the invalidate-by-class command is meant to invalidate. The memory management unit stores the active marker as a retired marker and then sets the active marker to the next available marker. Thereafter, the memory management unit sends an acknowledgement signal (e.g., to the operating system) while invalidating the cache entries having the class and the retired marker in the background. By correlating markers with classes of cache entries, the memory management unit can more quickly respond to class invalidation requests.
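
The marker scheme in this abstract can be modeled as a small sketch. The `MarkerMMU` class is hypothetical, and two details are assumptions not stated in the abstract: the "next available marker" is taken to be the previous marker plus one, and lookups treat entries carrying a retired marker as misses even before the background sweep removes them.

```python
class MarkerMMU:
    """Toy model of invalidate-by-class using per-class markers."""

    def __init__(self):
        self.active_marker = {}  # cache class -> current marker
        self.retired = []        # (class, marker) pairs awaiting cleanup
        self.entries = {}        # key -> (class, marker, value)

    def insert(self, key, cache_class, value):
        """Tag a new entry with the active marker of its class."""
        marker = self.active_marker.setdefault(cache_class, 0)
        self.entries[key] = (cache_class, marker, value)

    def lookup(self, key):
        """Return the value only if the entry carries its class's active marker."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        cache_class, marker, value = entry
        if marker != self.active_marker.get(cache_class):
            return None  # Assumption: stale markers read as misses.
        return value

    def invalidate_by_class(self, cache_class):
        """Retire the active marker, pick the next one, and acknowledge at once."""
        marker = self.active_marker.setdefault(cache_class, 0)
        self.retired.append((cache_class, marker))
        self.active_marker[cache_class] = marker + 1  # Assumption: next marker.
        return "ack"

    def background_sweep(self):
        """Later, physically remove entries tagged with retired markers."""
        retired = set(self.retired)
        self.entries = {
            key: entry for key, entry in self.entries.items()
            if (entry[0], entry[1]) not in retired
        }
        self.retired.clear()


def demo():
    mmu = MarkerMMU()
    mmu.insert("k1", "A", "x")
    mmu.insert("k2", "B", "y")
    ack = mmu.invalidate_by_class("A")
    before = (mmu.lookup("k1"), mmu.lookup("k2"), len(mmu.entries))
    mmu.background_sweep()
    return ack, before, len(mmu.entries)
```

The acknowledgement costs a constant amount of work regardless of how many entries belong to the class, which is the source of the faster response the abstract claims.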
