Showing papers on "Cache invalidation published in 2022"


Journal ArticleDOI
TL;DR: In this paper, a data placement strategy based on an improved reservoir sampling algorithm is proposed to solve the problem of intermediate data skew in the shuffle stage of Spark: a data skew measurement model is used to classify intermediate data into skewed and non-skewed data, and coarse-grained and fine-grained placement algorithms are designed accordingly.

20 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used conditional probability to characterize the interactive relationship between existence and validity and developed an analytical model that evaluates the performance (hit probability and server load) of four different invalidation schemes with LRU replacement under arbitrary invalidation frequency distribution.
Abstract: Caching contents close to end-users can improve the network performance, while causing the problem of guaranteeing consistency. Specifically, solutions are classified into validation and invalidation, the latter of which can provide strong cache consistency strictly required in some scenarios. To date, little work has covered the analysis of cache invalidation. In this work, by using conditional probability to characterize the interactive relationship between existence and validity, we develop an analytical model that evaluates the performance (hit probability and server load) of four different invalidation schemes with LRU replacement under arbitrary invalidation frequency distribution. The model allows us to theoretically identify some key parameters that affect our metrics of interest and gain some common insights on parameter settings to balance the performance of cache invalidation. Compared with other cache invalidation models, our model can achieve higher accuracy in predicting the cache hit probability. We also conduct extensive simulations that demonstrate the achievable performance of our model.
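
As a rough illustration of the modeling idea only (the notation below is assumed for this sketch, not taken from the paper), conditional probability lets the hit probability be split into residency under LRU and validity under invalidations:

    % Illustrative decomposition with assumed symbols: P_in(i) is the probability that
    % item i is resident under LRU, Pr[valid | in] the probability that it has not been
    % invalidated since it was cached, and lambda_i the request rate of item i.
    \[
      P_{\mathrm{hit}}(i) = P_{\mathrm{in}}(i)\,\Pr[\text{valid} \mid \text{in}],
      \qquad
      \text{server load} \approx \sum_i \lambda_i \bigl(1 - P_{\mathrm{hit}}(i)\bigr).
    \]

The load expression ignores invalidation-message overhead; the four schemes compared in the paper differ in how invalidations shape residency and the conditional validity term.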

7 citations


Proceedings ArticleDOI
20 Feb 2022
TL;DR: V-Cache as mentioned in this paper is a 3D stacked product that attaches additional cache onto a high-performance processor through hybrid bonding, a technology that offers significant bandwidth and power benefits over state-of-the-art uBump based approaches.
Abstract: AMD's V-Cache is a 3D stacked product that attaches additional cache onto a high-performance processor through hybrid bonding, a technology that offers significant bandwidth and power benefits over state-of-the-art uBump based approaches. V-Cache expands Zen3's on-die L3 Cache from 32MB to 96MB, providing up to 2TB/s of bandwidth and 15% average gaming performance uplift. This paper describes the hybrid bonding technology components, provides insight into the V-Cache's architecture and design, discusses the associated DFT implications, and offers measured performance results.

5 citations


Journal ArticleDOI
TL;DR: This work presents Contention Analysis in Shared Hierarchies using Thefts, or CASHT, a framework for capturing cache contention information both offline and online; it uses thefts to complement more familiar cache statistics and trains a learning model based on Gradient-boosting Trees to predict the best way to partition the last-level cache.
Abstract: Cache management policies should consider workloads’ contention behavior when managing a shared cache. Prior art makes estimates about shared cache behavior by adding extra logic or time to isolate per workload cache statistics. These approaches provide per-workload analysis but do not provide a holistic understanding of the utilization and effectiveness of caches under the ever-growing contention that comes standard with scaling cores. We present Contention Analysis in Shared Hierarchies using Thefts, or CASHT, a framework for capturing cache contention information both offline and online. CASHT takes advantage of cache statistics made richer by observing a consequence of cache contention: inter-core evictions, or what we call THEFTS. We use thefts to complement more familiar cache statistics to train a learning model based on Gradient-boosting Trees (GBT) to predict the best ways to partition the last-level cache. GBT achieves 90+% accuracy with trained models as small as 100 B and at least 95% accuracy at 1 kB model size when predicting the best way to partition two workloads. CASHT employs a novel run-time framework for collecting thefts-based metrics despite partition intervention, and enables per-access sampling rather than set sampling that could add overhead but may not capture true workload behavior. Coupling CASHT and GBT for use as a dynamic policy results in a very lightweight and dynamic partitioning scheme that performs within a margin of error of Utility-based Cache Partitioning at 1/8 the overhead.
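
As a hedged sketch of the learning step only (not the CASHT implementation), the fragment below trains a gradient-boosting classifier on per-workload-pair cache counters, including a theft count, to predict how many ways of a 16-way last-level cache to give one workload; the feature layout and the synthetic data are assumptions.

    # Illustrative GBT partition predictor, not CASHT itself; features and labels are synthetic.
    from sklearn.ensemble import GradientBoostingClassifier
    import numpy as np

    rng = np.random.default_rng(0)

    # One row per co-scheduled workload pair: [acc_A, miss_A, thefts_A, acc_B, miss_B, thefts_B]
    X = rng.random((500, 6))
    # Label: number of LLC ways (out of 16) to give workload A under the best partition.
    y = rng.integers(1, 16, size=500)

    model = GradientBoostingClassifier(n_estimators=50, max_depth=3)
    model.fit(X, y)

    # At run time, feed the latest sampled counters to choose the partition.
    sample = rng.random((1, 6))
    ways_for_a = int(model.predict(sample)[0])
    print(f"workload A gets {ways_for_a}/16 ways, workload B gets {16 - ways_for_a}/16")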

4 citations


Proceedings ArticleDOI
14 Apr 2022
TL;DR: In this paper, cache shaping is proposed to preserve user privacy against cache-based website fingerprinting attacks; it produces dummy cache activities by introducing dummy I/O operations across multiple processes, hiding fingerprints when a user visits websites.
Abstract: Cache-based website fingerprinting attacks can infer which website a user visits by measuring CPU cache activities. Studies have shown that an attacker can achieve high accuracy with a low sampling rate by monitoring cache occupancy of the entire Last Level Cache. Although a defense has been proposed, it was not effective when an attacker adapts and retrains a classifier with defended data. In this paper, we propose a new defense, referred to as cache shaping, to preserve user privacy against cache-based website fingerprinting attacks. Our proposed defense produces dummy cache activities by introducing dummy I/O operations and implementing with multiple processes, which hides fingerprints when a user visits websites. Our experimental results over large-scale datasets collected from multiple web browsers and operating systems show that our defense remains effective even if an attacker retrains a classifier with defended cache traces. We demonstrate the efficacy of our defense in the closed-world setting and the open-world setting by leveraging deep neural networks as classifiers.
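
A minimal sketch of the dummy-I/O idea, assuming a hypothetical scratch file and worker count; it only shows how several processes could generate background cache and file activity while the user browses, and it is not the authors' implementation.

    # Toy cache-shaping noise generator (illustrative assumptions throughout).
    import multiprocessing as mp
    import os
    import random
    import time

    NOISE_FILE = "/tmp/cache_noise.bin"   # hypothetical scratch file
    BLOCK = 4096
    BLOCKS = 2048                         # ~8 MiB of noise data
    WORKERS = 4
    DURATION_S = 10

    def noise_worker(seed: int) -> None:
        random.seed(seed)
        deadline = time.time() + DURATION_S
        with open(NOISE_FILE, "rb") as f:
            while time.time() < deadline:
                # Dummy I/O at a random offset; the data read is discarded.
                f.seek(random.randrange(BLOCKS) * BLOCK)
                f.read(BLOCK)
                time.sleep(random.uniform(0, 0.002))  # jitter the access pattern

    if __name__ == "__main__":
        with open(NOISE_FILE, "wb") as f:
            f.write(os.urandom(BLOCK * BLOCKS))
        procs = [mp.Process(target=noise_worker, args=(i,)) for i in range(WORKERS)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()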

4 citations


Journal ArticleDOI
TL;DR: In this paper, a cache implementation scheme for NDN is proposed to solve the problem of insufficient cache space on programmable switches and to realize the practical application of NDN, with a cache server replacing the memory space of the programmable switch.
Abstract: This work proposes NFD.P4, a cache implementation scheme in Named Data Networking (NDN), to solve the problem of insufficient cache space on programmable switches and realize the practical application of NDN. We transplant the cache function of NDN.P4 to the NDN Forwarding Daemon (NFD) cache server, which replaces the memory space of the programmable switch.

3 citations


Journal ArticleDOI
TL;DR: In this paper, an aging-based Least Frequently Used (LFU) algorithm is used that considers both the size and frequency of data simultaneously; the priority and expiry age of the data in the cache memory are managed based on these two factors.
Abstract: Fast access to data from a Data Warehouse (DW) is a need for today’s Business Intelligence (BI). In the era of Big Data, the cache is regarded as one of the most effective techniques to improve the performance of accessing data. DW has been widely used by several organizations to manage data and use it for Decision Support System (DSS). Many methods have been used to optimize the performance of fetching data from DW. The query cache method is one of those methods that plays an effective role in optimization. The proposed work is based on a cache-based mechanism that helps DW in two aspects: the first one is to reduce the execution time by directly accessing records from cache memory, and the second is to save cache memory space by eliminating non-frequent data. Our target is to fill the cache memory with the most used data. To achieve this goal, an aging-based Least Frequently Used (LFU) algorithm is used by considering the size and frequency of data simultaneously. The priority and expiry age of the data in the cache memory are managed by dealing with both the size and frequency of data. LFU sets priorities and counts the age of data placed in cache memory. The entry with the lowest age count and priority is eliminated first from the cache block. Ultimately, the proposed cache mechanism efficiently utilizes cache memory and fills a large performance gap between the main DW and the business user query.
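
The size-plus-frequency-plus-aging idea can be sketched as follows; the scoring formula and the aging step are assumptions chosen for illustration (close to a GDSF-style policy), not the paper's exact algorithm.

    # Toy aging-based LFU cache: frequent, small entries score highest; a global
    # aging clock keeps once-popular but stale entries evictable.
    class AgingLFUCache:
        def __init__(self, capacity_bytes: int):
            self.capacity = capacity_bytes
            self.used = 0
            self.clock = 0.0              # global aging value
            self.entries = {}             # key -> [value, size, freq, score]

        def _score(self, size, freq):
            # Assumed scoring: frequency per byte, offset by the aging clock.
            return self.clock + freq / max(size, 1)

        def get(self, key):
            entry = self.entries.get(key)
            if entry is None:
                return None
            entry[2] += 1                 # bump frequency
            entry[3] = self._score(entry[1], entry[2])
            return entry[0]

        def put(self, key, value, size):
            while self.used + size > self.capacity and self.entries:
                victim = min(self.entries, key=lambda k: self.entries[k][3])
                # Aging step: advance the clock to the evicted score so new
                # entries are not starved by old high-frequency residents.
                self.clock = self.entries[victim][3]
                self.used -= self.entries[victim][1]
                del self.entries[victim]
            self.entries[key] = [value, size, 1, self._score(size, 1)]
            self.used += size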

3 citations


Proceedings ArticleDOI
11 May 2022
TL;DR: This study examines the access patterns of, and the potential for network traffic reduction by, the federated storage cache known as the Southern California Petabyte Scale Cache, to explore the predictability of cache usage and the potential for more general in-network data caching.
Abstract: Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group are often working on related research topics that require similar data objects. Thus, there is a significant amount of data sharing possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and potential for network traffic reduction by this caching system, we aim to explore the predictability of cache usage and the potential for more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during a part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.

3 citations


Journal ArticleDOI
TL;DR: In this paper, a machine learning method is proposed to predict the blocks that will be requested in the future and prevent erroneous replacement decisions, which improves the efficiency of cache management and performance.
Abstract: This study proposes a cache replacement policy technique to increase the cache hit rate. This policy can improve the efficiency of cache management and performance. Heuristic cache replacement policies are mechanisms that are designed empirically in advance to determine what needs to be replaced. This study explains why the heuristic policy does not achieve a high accuracy for certain patterns of data. A machine learning method is proposed to predict the blocks that need to be requested in the future to prevent erroneous decisions. The core operation of the proposed method is that when a cache miss occurs, the machine learning model predicts a future block reference sequence that is based on the block reference sequence of the input sequence. The predicted block is added to the prediction buffer and the predicted block is removed from the non-access buffer if it exists in the non-access buffer. After filling the prediction buffer, the conventional replacement policy can be replaced with a time complexity of O(1) by replacing the block with a non-access buffer. The proposed method improves the least recently used (LRU) algorithm by 77%, the least frequently used (LFU) algorithm by 65%, and the adaptive replacement cache (ARC) by 77% and shows a hit rate similar to that of state-of-the-art research. The proposed method reinforces the existing heuristic policy and enables a consistent performance for LRU- and LFU-friendly workloads.
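
One plausible reading of the buffer bookkeeping described above is sketched below; predict_next_blocks() is a hypothetical stand-in for the paper's learned model, and the buffer handling is an interpretation of the abstract rather than the authors' code.

    # Sketch of the prediction-buffer / non-access-buffer flow on each access.
    from collections import deque

    def predict_next_blocks(history, k=4):
        # Placeholder predictor: naively guess the next k sequential block IDs.
        return [history[-1] + i for i in range(1, k + 1)] if history else []

    class PredictiveCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.cache = set()
            self.non_access = deque()     # cached blocks not expected to be reused
            self.prediction = set()       # blocks the model expects to be requested
            self.history = []

        def access(self, block):
            self.history.append(block)
            self.prediction.discard(block)
            if block in self.cache:
                if block in self.non_access:
                    self.non_access.remove(block)
                return True               # hit
            # Miss: refresh predictions and pull predicted blocks out of the
            # non-access buffer so they are not chosen as victims.
            for b in predict_next_blocks(self.history):
                self.prediction.add(b)
                if b in self.non_access:
                    self.non_access.remove(b)
            if len(self.cache) >= self.capacity:
                victim = self.non_access.popleft() if self.non_access else next(iter(self.cache))
                self.cache.discard(victim)
            self.cache.add(block)
            if block not in self.prediction:
                self.non_access.append(block)
            return False                  # miss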

3 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a Hybrid Short Long History Table-based Cache Instruction Prefetcher (HCIP) for the L1-I cache, which makes use of a hybrid configuration of two history-based prefetcher tables: the Long History Table (LST) and the Short History Table (SHT).
Abstract: In modern applications, instruction cache misses have become a performance constraint, and numerous prefetchers have been developed to conceal memory latency. Today's client and server workloads have large instruction working sets, which typically still fit in the Last Level Cache (LLC). However, the Level 1 Instruction (L1-I) cache has a high miss rate, which typically prevents the processor front-end from receiving instructions. Instruction prefetching is a latency hiding method that allows the LLC to send instructions to the L1-I cache. In order to design a high-performance cache architecture, prefetching instructions into the L1-I cache is a fundamental approach. When developing an efficient and effective prefetcher, accuracy and coverage are the most important parameters to be considered. This paper proposes a novel Hybrid Short Long History Table-based Cache Instruction Prefetcher (HCIP) for the L1-I cache. HCIP makes use of a hybrid configuration of two history-based prefetcher tables: the Long History Table (LST) and the Short History Table (SHT). The PRE+PC table used in HCIP is the transitive closure of the control flow graph. In contrast to PIPS and NOPREF, HCIP achieves a maximum coverage of 67% for the majority of the benchmarks evaluated.
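
A toy illustration of the general short-table/long-table idea (table keys, sizes, and the preference rule are assumptions, not the HCIP design):

    # Toy two-table instruction prefetcher: a short-history table keyed by the last
    # miss address and a long-history table keyed by the last three miss addresses.
    from collections import deque

    class HistoryTablePrefetcher:
        def __init__(self):
            self.short_table = {}          # last miss          -> next miss
            self.long_table = {}           # last three misses  -> next miss
            self.recent = deque(maxlen=3)

        def train(self, miss_addr):
            if self.recent:
                self.short_table[self.recent[-1]] = miss_addr
            if len(self.recent) == 3:
                self.long_table[tuple(self.recent)] = miss_addr
            self.recent.append(miss_addr)

        def predict(self):
            # Prefer the longer (more specific) history when it has a match.
            if len(self.recent) == 3 and tuple(self.recent) in self.long_table:
                return self.long_table[tuple(self.recent)]
            if self.recent and self.recent[-1] in self.short_table:
                return self.short_table[self.recent[-1]]
            return None

On every L1-I miss, such a prefetcher would be trained with the miss address and a prefetch issued for predict()'s result, if any.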

Journal ArticleDOI
TL;DR: In this article, a cache control mechanism is proposed to improve energy efficiency by adjusting the cache hierarchy to each application based on the cache usage behaviors of individual applications, achieving significant energy savings at the cost of a small performance degradation.
Abstract: As the number of cores on a processor increases, cache hierarchies contain more cache levels and a larger last level cache (LLC). Thus, the power and energy consumption of the cache hierarchy becomes non-negligible. Meanwhile, because the cache usage behaviors of individual applications can be different, it is possible to achieve higher energy efficiency of the computing system by determining the appropriate cache configurations for individual applications. This paper proposes a cache control mechanism to improve energy efficiency by adjusting a cache hierarchy to each application. Our mechanism first bypasses and disables a less-significant cache level, then partially disables the LLC, and finally adjusts the associativity if it suffers from a large number of conflict misses. The mechanism can achieve significant energy savings at the cost of a small performance degradation. The evaluation results show that our mechanism improves energy efficiency by 23.9% and 7.0% on average over the baseline and the cache-level bypassing mechanisms, respectively. In addition, even if LLC resource contention occurs, the proposed mechanism is still effective for improving energy efficiency.

Proceedings ArticleDOI
13 Jun 2022
TL;DR: A pervasive cache replacement framework is proposed that uses deep reinforcement learning to automatically learn the relationship between the probability distribution of different replacement policies and the workload distribution; it outperforms several state-of-the-art approaches.
Abstract: In the past few decades, much research has been conducted on the design of cache replacement policies. Prior work frequently relies on manually-engineered heuristics to capture the most common cache access patterns, or predict the reuse distance and try to identify the blocks that are either cache-friendly or cache-averse. Researchers are now applying recent advances in machine learning to guide cache replacement policy, augmenting or replacing traditional heuristics and data structures. However, most existing approaches depend on a specific environment, which restricts their application; e.g., most of the approaches only consider the on-chip cache setting where program counters (PCs) are available. Moreover, those approaches with attractive hit rates are usually unable to deal with modern irregular workloads, due to the limited features used. In contrast, we propose a pervasive cache replacement framework to automatically learn the relationship between the probability distribution of different replacement policies and workload distribution by using deep reinforcement learning. We train an end-to-end cache replacement policy only on the past requested addresses through two simple and stable cache replacement policies. Furthermore, the overall framework can be easily plugged into any scenario that requires a cache. Our simulation results on 8 production storage traces run against 3 different cache configurations confirm that the proposed cache replacement policy is effective and outperforms several state-of-the-art approaches.
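
To make "learn which replacement policy fits the workload" concrete, here is a heavily simplified stand-in: a per-window epsilon-greedy bandit that chooses between LRU and FIFO based on observed hit rates. It is not the authors' deep-reinforcement-learning framework; every detail below is an assumption for illustration.

    # Simplified policy-selection sketch (bandit over two base replacement policies).
    import random
    from collections import OrderedDict

    def run_window(policy, cache, capacity, window):
        hits = 0
        for addr in window:
            if addr in cache:
                hits += 1
                if policy == "LRU":
                    cache.move_to_end(addr)     # refresh recency only under LRU
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)   # evict head: LRU or FIFO order
                cache[addr] = True
        return hits / len(window)

    def bandit_replacement(trace, capacity=64, window_size=256, eps=0.1):
        stats = {"LRU": [0.0, 1e-9], "FIFO": [0.0, 1e-9]}   # reward sum, count
        cache = OrderedDict()
        for start in range(0, len(trace), window_size):
            window = trace[start:start + window_size]
            if random.random() < eps:
                policy = random.choice(list(stats))          # explore
            else:
                policy = max(stats, key=lambda p: stats[p][0] / stats[p][1])  # exploit
            hit_rate = run_window(policy, cache, capacity, window)
            stats[policy][0] += hit_rate
            stats[policy][1] += 1
        return stats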

Proceedings ArticleDOI
01 May 2022
TL;DR: CASY (CPU Cache Allocation SYstem), a system that performs CPU cache allocation for serverless functions using Intel CAT technology, is proposed, implemented, and integrated into the OpenWhisk FaaS platform.
Abstract: Function as a Service (FaaS) has become a key service in the cloud. It enables customers to conceive their application as a collection of minimal serverless functions interacting with each other. FaaS platforms abstract all the management complexity away from the client. This emerging paradigm is also attractive because of its billing model. Clients are charged based on the execution time of functions, allowing finer-grained pricing. Therefore, executing functions as fast as possible is very important to lower the cost. Several research studies have investigated runtime optimization in FaaS environments, but none have explored CPU cache allocation. Indeed, CPU cache contention is a well-known issue in software and FaaS is not exempt from this issue. Various hardware improvements have been made to address the CPU cache partitioning problem. Among other things, Intel has implemented a new technology in their new processors that allows cache partitioning: Cache Allocation Technology (CAT). This technology allows allocating cache ways to processes, and the usage of the cache by each process will be limited to the allocated amount. In this paper, we propose CASY (CPU Cache Allocation SYstem), a system that performs CPU cache allocation for serverless functions using the Intel CAT technology. CASY uses machine learning to build a cache usage profile for functions and uses this profile to predict the cache requirements based on the function's input data. Because the CPU's cache size is small, CASY integrates an allocation algorithm which ensures that the cache loads are balanced on all cache ways. We implemented our system and integrated it into the OpenWhisk FaaS platform. Our evaluations show an 11% decrease in execution time for some serverless functions without degrading the performance of other functions.
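
On Linux, CAT is exposed through the resctrl filesystem, so an allocator along these lines is conceivable; the group name, way mask, and usage below are assumptions for a minimal sketch and not CASY's actual allocator.

    # Minimal resctrl sketch: create a control group, restrict it to a few L3 ways,
    # and move one worker process into it. Requires root and a mounted resctrl
    # filesystem (mount -t resctrl resctrl /sys/fs/resctrl); details vary by kernel/CPU.
    import os

    RESCTRL = "/sys/fs/resctrl"

    def allocate_ways(group: str, l3_mask: str, pid: int) -> None:
        """Restrict `pid` to the L3 ways selected by `l3_mask` (e.g. "3" = two ways)."""
        path = os.path.join(RESCTRL, group)
        os.makedirs(path, exist_ok=True)
        # Schemata lines follow the resctrl format: L3:<cache_id>=<capacity bitmask>.
        with open(os.path.join(path, "schemata"), "w") as f:
            f.write(f"L3:0={l3_mask}\n")
        with open(os.path.join(path, "tasks"), "w") as f:
            f.write(str(pid))

    # Hypothetical usage for one serverless worker process:
    # allocate_ways("fn-imgresize", "3", worker_pid)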

Posted ContentDOI
TL;DR: In this article, a data structure that facilitates cache blocking is considered and a range of kernel grouping configurations for an FR-based Euler solver are examined; the most performant configuration leads to a speedup of approximately 2.81x in practice.

Proceedings ArticleDOI
01 Nov 2022
TL;DR: In this paper, the authors considered the coded caching problem with shared caches, where users share the caches and each user gets access to only one cache; the number of users connected to each cache is assumed to be known at the server during the placement phase.
Abstract: This work considers the coded caching problem with shared caches, where users share the caches, and each user gets access only to one cache. The number of users connected to each cache is assumed to be known at the server during the placement phase. We focus on the schemes derived using placement delivery arrays (PDAs). The PDAs were originally designed to address the sub-packetization bottleneck of coded caching in a dedicated cache setup. We observe that in the setup of this paper, permuting the columns of the PDA results in schemes with different performances for the same problem, but the sub-packetization level remains the same. This is contrary to what was observed for dedicated cache networks. We propose a procedure to identify the ordering of columns that gives the best performance possible from the PDA employed in the given problem. Further, the performance gain achieved by reordering the columns of the PDA is illustrated using certain classes of PDAs.

Journal ArticleDOI
TL;DR: In this article, a write-optimized edge storage system based on concurrent micro-write merging is proposed to solve the problems of frequent contention on cache blocks, massive fragmentation caused by merging, and cache pollution due to cache updating.

Journal ArticleDOI
TL;DR: A Dynamic Cooperative Cache Management Scheme (DCCMS) based on social and popular data is proposed, which improves cache efficiency and operates in a dynamic environment; simulation results show that the proposed DCCMS scheme improves cache performance compared with other state-of-the-art approaches.
Abstract: Vehicular Named Data Network (VNDN) is considered a strong paradigm to deploy in vehicular applications. In VNDN, each node has its own cache, but the limited cache size directly affects performance in a highly dynamic environment that requires massive and fast content delivery. To mitigate these issues, cooperative caching plays an efficient role in VNDN. Most studies regarding cooperative caching focus on content replacement and caching algorithms and implement these methods in a static environment rather than a dynamic environment. In addition, few existing approaches have addressed cache diversity and latency in VNDN. This paper proposes a Dynamic Cooperative Cache Management Scheme (DCCMS) based on social and popular data, which improves cache efficiency and operates in a dynamic environment. We designed a two-level dynamic caching scheme, in which we choose the right caching node that frequently communicates with other nodes, keep a copy of the most popular content, and distribute it to the requester’s node when needed. The main intention of DCCMS is to improve cache performance in terms of latency, server load, cache hit ratio, average hop count, cache utilization, and diversity. The simulation results show that our proposed DCCMS scheme improves cache performance compared with other state-of-the-art approaches.

Proceedings ArticleDOI
09 Jun 2022
TL;DR: The fundamental idea behind the approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees, and it provides fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + ε.
Abstract: There is a huge and growing gap between the speed of accesses to data stored in main memory vs cache. Thus, cache misses account for a significant portion of runtime overhead in virtually every program and minimizing them has been an active research topic for decades. The primary and most classical formal model for this problem is that of Cache-conscious Data Placement (CDP): given a commutative cache with constant capacity k and a sequence Σ of accesses to data elements, the goal is to map each data element to a cache line such that the total number of cache misses over Σ is minimized. Note that we are considering an offline single-threaded setting in which Σ is known a priori. CDP has been widely studied since the 1990s. In POPL 2002, Petrank and Rawitz proved a notoriously strong hardness result: They showed that for every k ≥ 3, CDP is not only NP-hard but also hard-to-approximate within any non-trivial factor unless P=NP. As such, all subsequent works gave up on theoretical improvements and instead focused on heuristic algorithms with no theoretical guarantees. In this work, we present the first-ever positive theoretical result for CDP. The fundamental idea behind our approach is that real-world instances of the problem have specific structural properties that can be exploited to obtain efficient algorithms with strong approximation guarantees. Specifically, the access graphs corresponding to many real-world access sequences are sparse and tree-like. This was already well-known in the community but has only been used to design heuristics without guarantees. In contrast, we provide fixed-parameter tractable algorithms that provably approximate the optimal number of cache misses within any factor 1 + ε, assuming that the access graph of a specific degree dε is sparse, i.e. sparser real-world instances lead to tighter approximations. Our theoretical results are accompanied by an experimental evaluation in which our approach outperforms past heuristics over small caches with a handful of lines. However, the approach cannot currently handle large real-world caches and making it scalable in practice is a direction for future work.

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, the authors discuss the functioning of the MESI cache coherence protocol for CMP in which each processor has both private and shared caches, and they discuss how coherency and consistency are maintained in the cache.
Abstract: The chip multiprocessor (CMP) uses a cache coherence protocol to maintain coherency between multiple copies of shared data. Cache coherence is the uniformity of shared data among multiple caches, so writes to a particular location in a cache should update other copies of the same data in other caches. However, coherency does not specify the order in which updates occur, or when updates to all caches become visible to other requests. In addition to coherency, consistency is equally important to ensure that writes to different locations will be seen in order. An efficient cache coherence protocol maintains coherency between data and maintains consistency by faster retrieval of shared data. In this paper, we discuss how coherency and consistency are maintained in the MESI cache coherence protocol. MESI is popularly implemented in various commercial products. We discuss the functioning of the directory protocol and the MESI cache coherence protocol for CMP in which each processor has both private and shared caches.
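
For readers who want the state machine in front of them, here is a compact per-line MESI transition table as seen by one cache (simplified: data-transfer and write-back actions are noted only in comments):

    # Simplified per-line MESI transitions for a single cache.
    # Events: local read/write ("PrRd"/"PrWr") and snooped bus requests ("BusRd"/"BusRdX").
    MESI = {
        ("I", "PrRd"):  "S",   # read miss: fetch; goes to "E" instead if no other sharer
        ("I", "PrWr"):  "M",   # write miss: fetch exclusive (BusRdX), then modify
        ("I", "BusRd"): "I",
        ("I", "BusRdX"): "I",
        ("S", "PrRd"):  "S",
        ("S", "PrWr"):  "M",   # must invalidate the other sharers first
        ("S", "BusRd"): "S",
        ("S", "BusRdX"): "I",
        ("E", "PrRd"):  "E",
        ("E", "PrWr"):  "M",   # silent upgrade: no bus transaction needed
        ("E", "BusRd"): "S",
        ("E", "BusRdX"): "I",
        ("M", "PrRd"):  "M",
        ("M", "PrWr"):  "M",
        ("M", "BusRd"): "S",   # supply/write back dirty data, then share
        ("M", "BusRdX"): "I",  # supply/write back dirty data, then invalidate
    }

    def next_state(state: str, event: str) -> str:
        return MESI[(state, event)]

    # Example: a line read by this core (I -> S or E), then written (-> M),
    # is invalidated when another core issues BusRdX for the same line.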

Proceedings ArticleDOI
26 Jun 2022
TL;DR: In this paper, the privacy of the users' demands was taken into consideration, i.e., each user, while retrieving its own demanded file, cannot obtain any information on the demands of the other users.
Abstract: Hachem et al. formulated a multiaccess coded caching model which consists of a central server connected to K users via an error-free shared link, and K cache-nodes. Each cache-node is equipped with a local cache and each user can access L neighbouring cache-nodes in a cyclic wraparound fashion. In this paper, we take the privacy of the users’ demands into consideration, i.e., each user, while retrieving its own demanded file, cannot obtain any information on the demands of the other users. By storing some private keys at the cache-nodes, we develop a novel transformation approach to turn any non-private coded caching scheme (satisfying some constraints) into a private one.

Proceedings ArticleDOI
10 Sep 2022
TL;DR: Evaluation results confirm the capability of CaType in identifying side channel defects with great precision, efficiency, and scalability.
Abstract: Cache side-channel attacks exhibit severe threats to software security and privacy, especially for cryptosystems. In this paper, we propose CaType, a novel refinement type-based tool for detecting cache side channels in crypto software. Compared to previous works, CaType provides the following advantages: (1) For the first time CaType analyzes cache side channels using refinement type over x86 assembly code. It reveals several significant and effective enhancements with refined types, including bit-level granularity tracking, distinguishing different effects of variables, precise type inferences, and high scalability. (2) CaType is the first static analyzer for crypto libraries in consideration of blinding-based defenses. (3) From the perspective of implementation, CaType uses cache layouts of potential vulnerable control-flow branches rather than cache states to suppress false positives. We evaluate CaType in identifying side channel vulnerabilities in real-world crypto software, including RSA, ElGamal, and (EC)DSA from OpenSSL and Libgcrypt. CaType captures all known defects, detects previously-unknown vulnerabilities, and reveals several false positives of previous tools. In terms of performance, CaType is 16X faster than CacheD and 131X faster than CacheS when analyzing the same libraries. These evaluation results confirm the capability of CaType in identifying side channel defects with great precision, efficiency, and scalability.

Proceedings ArticleDOI
28 Mar 2022
TL;DR: This work introduces a hybrid approach, warping cache simulation, that aims to achieve applicability to real-world cache models and problem-size-independent runtimes; it focuses on programs in the polyhedral model, which allows reasoning about the sequence of memory accesses analytically.
Abstract: Techniques to evaluate a program's cache performance fall into two camps: 1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is proportional to the number of memory accesses performed by the program under analysis. 2. Relying on implicit workload characterizations such as the polyhedral model, analytical approaches often achieve problem-size-independent runtimes, but so far have been limited to idealized cache models. We introduce a hybrid approach, warping cache simulation, that aims to achieve applicability to real-world cache models and problem-size-independent runtimes. Like prior analytical approaches, we focus on programs in the polyhedral model, which allows us to reason about the sequence of memory accesses analytically. Combining this analytical reasoning with information about the cache behavior obtained from explicit cache simulation allows us to soundly fast-forward the simulation. By this process of warping, we accelerate the simulation so that its cost is often independent of the number of memory accesses.

Proceedings ArticleDOI
25 Mar 2022
TL;DR: In this article, a cache replacement strategy based on user behaviour analysis for file systems (LFU-UB) is proposed, where a log analysis module is built to clean the user access record information and mine association rules, and the association parameters are then transmitted to the computing model.
Abstract: Common cache eviction strategies aim to improve the hit ratio of files in specific scenarios. In real scenarios, different users' behaviours often show great differences, and a general cache replacement strategy cannot comprehensively achieve good performance. Considering these problems, this paper designs a cache replacement strategy based on user behaviour analysis for file systems (LFU-UB). First, a log analysis module is built to clean the user's access record information and mine association rules, and then the association parameters are transmitted to the computing model. Then several small files with the lowest priority are selected through the cache replacement module. Finally, resources with the lowest priority are replaced by new resources. The effectiveness of the LFU-UB strategy is demonstrated by comparison experiments in a storage environment of massive small files; it achieves a higher hit ratio than general cache strategies and can effectively reduce the cache load.

Journal ArticleDOI
TL;DR: This article proves that CARL is optimal under certain statistical assumptions, and proves miss curve convexity, which is useful for optimizing shared cache, and sub-partitioning monotonicity, which simplifies lease compilation.
Abstract: Data movement is a common performance bottleneck, and its chief remedy is caching. Traditional cache management is transparent to the workload: data that should be kept in cache are determined by the recency information only, while the program information, i.e., future data reuses, is not communicated to the cache. This has changed in a new cache design named Lease Cache. The program control is passed to the lease cache by a compiler technique called Compiler Assigned Reference Lease (CARL). This technique collects the reuse interval distribution for each reference and uses it to compute and assign the lease value to each reference. In this article, we prove that CARL is optimal under certain statistical assumptions. Based on this optimality, we prove miss curve convexity, which is useful for optimizing shared cache, and sub-partitioning monotonicity, which simplifies lease compilation. We evaluate the potential using scientific kernels from PolyBench and show that compiler insertions of up to 34 leases in program code achieve similar or better cache utilization (in variable size cache) than the optimal fixed-size caching policy, which has been unattainable with automatic caching but now within the potential of cache programming for all tested programs and most cache sizes.
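
As a rough sketch of how a lease can be scored from a reference's reuse-interval distribution (assumed notation, not the paper's exact algorithm): for a reference $r$ whose next reuse occurs after $t$ accesses with probability $P_r(t)$, a lease $\ell$ yields

    \[
      \mathrm{hits}_r(\ell) = \sum_{t \le \ell} P_r(t),
      \qquad
      \mathrm{occupancy}_r(\ell) = \sum_{t \le \ell} t\,P_r(t)
        + \ell \Bigl(1 - \sum_{t \le \ell} P_r(t)\Bigr),
    \]

and leases can then be grown greedily for the reference with the largest marginal gain $\Delta\mathrm{hits}_r / \Delta\mathrm{occupancy}_r$ until the expected total occupancy matches the cache size.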

Proceedings ArticleDOI
14 Mar 2022
TL;DR: Wang et al., as discussed in this paper, propose to unify the temporal and spatial locality of user applications by employing the visibility graph technique for directing cache management, which can improve cache hits by more than 2.8% and the overall I/O latency by 20.2% on average.
Abstract: To ensure better I/O performance of solid-state drives (SSDs), a dynamic random access memory (DRAM) is commonly equipped as a cache to absorb overwrites or writes, instead of directly flushing them onto underlying SSD cells. This paper focuses on the management of the small amount of cache inside SSDs. First, we propose to unify both factors of temporal and spatial locality of user applications by employing the visibility graph technique for directing cache management. Next, we propose to support batch adjustment of adjacent or nearby (hot) cached data pages by referring to the connection situations in the visibility graph of all cached pages. Finally, we propose to evict the buffered data pages in batches, to maximize the internal flushing parallelism of SSD devices without worsening I/O congestion. The trace-driven simulation experiments show that our proposal improves cache hits by more than 2.8% and the overall I/O latency by 20.2% on average, in contrast to conventional cache schemes inside SSDs.
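
The (natural) visibility graph is a standard construction from time-series analysis; the sketch below builds it over a sequence of per-page access counts, with the final "hot page" scoring being an assumption for illustration rather than the paper's rule.

    # Natural visibility graph over a value sequence (e.g., per-page access heat).
    # Points i < j are connected if every point k between them lies strictly below
    # the straight line joining (i, values[i]) and (j, values[j]).
    def visibility_graph(values):
        n = len(values)
        edges = {i: set() for i in range(n)}
        for i in range(n):
            for j in range(i + 1, n):
                visible = all(
                    values[k] < values[j] + (values[i] - values[j]) * (j - k) / (j - i)
                    for k in range(i + 1, j)
                )
                if visible:
                    edges[i].add(j)
                    edges[j].add(i)
        return edges

    # Assumed use: pages whose nodes are highly connected are treated as "hot"
    # and adjusted or evicted together in batches.
    heat = [3, 1, 4, 1, 5, 9, 2, 6]
    graph = visibility_graph(heat)
    hot_pages = sorted(graph, key=lambda i: len(graph[i]), reverse=True)[:3]
    print(hot_pages)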

Journal ArticleDOI
Zhidu Li, Fuxiang Li, Tong Tang, Hong Zhang, Jin Yang 
TL;DR: Considering the difference between global and local video popularities and the time-varying characteristics of video popularity, a two-stage caching scheme is proposed in this paper to push popular videos closer to users and minimize the average initial buffer delay.

Proceedings ArticleDOI
01 Oct 2022
TL;DR: In this article, the authors propose a new conflict-based cache covert channel named NTP+NTP, which achieves cache conflicts without cache set priming for the first time.
Abstract: Modern x86 processors feature many prefetch instructions that developers can use to enhance performance. However, with some prefetch instructions, users can more directly manipulate cache states which may result in powerful cache covert channel and side channel attacks. In this work, we reverse-engineer the detailed cache behavior of PREFETCHNTA on various Intel processors. Based on the results, we first propose a new conflict-based cache covert channel named NTP+NTP. Prior conflict-based channels often require priming the cache set in order to cause cache conflicts. In contrast, in NTP+NTP, the data of the sender and receiver can compete for one specific way in the cache set, achieving cache conflicts without cache set priming for the first time. As a result, NTP+NTP has higher bandwidth than prior conflict-based channels such as Prime+Probe. The channel capacity of NTP+NTP is 302 KB/s. Second, we found that PREFETCHNTA can also be used to boost the performance of existing side channel attacks that utilize cache replacement states, making those attacks much more efficient than before.

Journal ArticleDOI
TL;DR: In this article, a new cache management scheme, Weight-aware Cache (WaC), which reflects I/O weights in cache allocation and reclamation, is proposed to achieve application-level proportionality.
Abstract: Virtualization technology has enabled server consolidation where multiple servers are co-located on a single physical machine to improve resource utilization. In such systems, proportional I/O sharing is critical to meet the SLO (Service-Level Objectives) of the applications running in each virtual instance. However, previous studies focus on block-level I/O proportionality without considering the upper-layer I/O caches, which handle I/O requests on behalf of the underlying storage devices, thereby failing to achieve application-level proportional I/O sharing. To overcome this limitation, we propose a new cache management scheme, Weight-aware Cache (WaC), which reflects the I/O weights on cache allocation and reclamation. Specifically, WaC prioritizes higher-weighted applications in the lock acquisition process of cache allocation by re-ordering the lock waiting queue based on I/O weight. Additionally, WaC keeps the number of cache entries of each application proportional to its I/O weight, through weight-aware cache reclamation. To verify the efficacy of our scheme, we implement and evaluate WaC on both the page cache and bcache. The experimental results demonstrate that our scheme improves I/O proportionality with negligible overhead in various cases.
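
The reclamation side of the idea fits in a few lines: evict from whichever application currently holds the most cache relative to its weight. This is a simplified illustration of weight-proportional reclamation, not the WaC implementation.

    # Weight-aware reclamation sketch: the victim application is the one whose
    # cache share most exceeds its weight-proportional target.
    def pick_victim_app(entries_per_app: dict, weights: dict) -> str:
        """entries_per_app: app -> cached entry count; weights: app -> I/O weight."""
        total_entries = sum(entries_per_app.values())
        total_weight = sum(weights.values())

        def overuse(app):
            target = total_entries * weights[app] / total_weight
            return entries_per_app[app] - target

        return max(entries_per_app, key=overuse)

    # Example: app "a" (weight 1) holds as many entries as app "b" (weight 3),
    # so reclamation should start with "a".
    print(pick_victim_app({"a": 500, "b": 500}, {"a": 1, "b": 3}))  # -> "a"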

Journal ArticleDOI
01 Oct 2022
TL;DR: In this article, a flexible time-based eviction model is proposed to derive the average system cost function that measures the system's cost due to the service of aging content in addition to the regular cache miss cost.
Abstract: We introduce a framework and provably-efficient schemes for ‘fresh’ caching at the (front-end) local cache of content that is subject to ‘dynamic’ updates at the (back-end) database. We start by formulating the hard-cache-constrained problem for this setting, which quickly becomes intractable due to the limited cache. To bypass this challenge, we first propose a flexible time-based-eviction model to derive the average system cost function that measures the system’s cost due to the service of aging content in addition to the regular cache miss cost. Next, we solve the cache-unconstrained case, which reveals how the refresh dynamics and popularity of content affect optimal caching. Then, we extend our approach to a soft-cache-constrained version, where we can guarantee that the cache use is limited with arbitrarily high probability. The corresponding solution reveals the interesting insight that ‘whether to cache an item or not in the local cache?’ depends primarily on its popularity level and channel reliability, whereas ‘how long the cached item should be held in the cache before eviction?’ depends primarily on its refresh rate. Moreover, we investigate the cost-cache saving trade-offs and prove that substantial cache gains can be obtained while also asymptotically achieving the minimum cost as the database size grows.
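
As a stylized illustration of the cost structure only (assumed notation, not the paper's exact model): with a time-based eviction timer $T_n$ for item $n$, the average cost splits into a miss term and an aging term,

    \[
      C_n(T_n) = c_{\mathrm{miss}} \Pr\bigl[\text{miss on item } n\bigr]
               + c_{\mathrm{age}} \Pr\bigl[\text{hit on a stale copy of item } n\bigr],
    \]

where, loosely, the miss probability falls and the staleness probability rises as $T_n$ grows. Consistent with the insight above, the best holding time is then governed primarily by the item's refresh rate, while the decision to cache it at all hinges on its popularity and channel reliability.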