
Showing papers on "Cache algorithms published in 2017"


Proceedings Article
01 Feb 2017
TL;DR: This paper presents the Compute Cache architecture that enables in-place computation in caches, which uses emerging bit-line SRAM circuit technology to repurpose existing cache elements and transforms them into active very large vector computational units.
Abstract: This paper presents the Compute Cache architecture that enables in-place computation in caches. Compute Caches uses emerging bit-line SRAM circuit technology to repurpose existing cache elements and transforms them into active very large vector computational units. Also, it significantly reduces the overheads in moving data between different levels in the cache hierarchy. Solutions to satisfy new constraints imposed by Compute Caches, such as operand locality, are discussed. Also discussed are simple solutions to problems in integrating them into a conventional cache hierarchy while preserving properties such as coherence, consistency, and reliability. Compute Caches increase performance by 1.9× and reduce energy by 2.4× for a suite of data-centric applications, including text and database query processing, cryptographic kernels, and in-memory checkpointing. Applications with a larger fraction of Compute Cache operations could benefit even more, as our micro-benchmarks indicate (54× throughput, 9× dynamic energy savings).
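For intuition, the operations Compute Caches performs directly on SRAM bit-lines are bulk, cache-line-wide logical operations and searches. The sketch below is only a software illustration of those operation semantics, not of the circuit technique; the line size and helper names are chosen for the example.

```python
# Software illustration of the bulk, cache-line-wide operations that
# in-cache computing targets (logical ops and search over wide vectors).
# Names and sizes here are illustrative only.

LINE_BYTES = 64  # a typical cache-line size

def bulk_and(a: bytes, b: bytes) -> bytes:
    """Element-wise AND over two equally sized byte vectors."""
    assert len(a) == len(b)
    return bytes(x & y for x, y in zip(a, b))

def bulk_search(haystack: bytes, pattern: bytes) -> list:
    """Return offsets of cache-line-sized chunks equal to `pattern`."""
    assert len(pattern) == LINE_BYTES
    return [off for off in range(0, len(haystack), LINE_BYTES)
            if haystack[off:off + LINE_BYTES] == pattern]

if __name__ == "__main__":
    data = bytes([0xAB] * LINE_BYTES) * 4 + bytes(LINE_BYTES)
    print(bulk_search(data, bytes([0xAB] * LINE_BYTES)))  # -> [0, 64, 128, 192]
    print(bulk_and(data[:LINE_BYTES], bytes([0x0F] * LINE_BYTES))[:4])
```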

225 citations


Proceedings Article
16 Aug 2017
TL;DR: Cloak, a new technique that uses hardware transactional memory to prevent adversarial observation of cache misses on sensitive code and data, provides strong protection against all known cache-based side-channel attacks with low performance overhead.
Abstract: Cache-based side-channel attacks are a serious problem in multi-tenant environments, for example, modern cloud data centers. We address this problem with Cloak, a new technique that uses hardware transactional memory to prevent adversarial observation of cache misses on sensitive code and data. We show that Cloak provides strong protection against all known cache-based side-channel attacks with low performance overhead. We demonstrate the efficacy of our approach by retrofitting vulnerable code with Cloak and experimentally confirming immunity against state-of-the-art attacks. We also show that by applying Cloak to code running inside Intel SGX enclaves we can effectively block information leakage through cache side channels from enclaves, thus addressing one of the main weaknesses of SGX.

194 citations


Proceedings ArticleDOI
25 Jun 2017
TL;DR: This work considers a system where a local cache maintains a collection of N dynamic content items that are randomly requested by local users and shows that an asymptotically optimal policy updates a cached item in proportion to the square root of the item's popularity.
Abstract: We consider a system where a local cache maintains a collection of N dynamic content items that are randomly requested by local users. A capacity-constrained link to a remote network server limits the ability of the cache to hold the latest version of each item at all times, making it necessary to design an update policy. Using an age of information metric, we show under a relaxed problem formulation that an asymptotically optimal policy updates a cached item in proportion to the square root of the item's popularity. We then show experimentally that a physically realizable policy closely approximates the asymptotically optimal policy.
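A minimal numeric sketch of the square-root rule: given item popularities and a total update-rate budget for the cache-to-server link (both inputs are assumptions for the example), each item's update rate is set proportional to the square root of its popularity and normalized to the budget.

```python
import math

def sqrt_update_rates(popularities, total_rate):
    """Allocate per-item update rates proportional to sqrt(popularity),
    normalized so they sum to the link's total update budget."""
    weights = [math.sqrt(p) for p in popularities]
    scale = total_rate / sum(weights)
    return [scale * w for w in weights]

# Example: Zipf-like popularities for N = 5 items, budget of 10 updates/s.
pops = [1 / k for k in range(1, 6)]
print([round(r, 3) for r in sqrt_update_rates(pops, 10.0)])
```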

169 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the cache placement problem in fog radio access networks (Fog-RANs), by taking into account flexible physical-layer transmission schemes and diverse content preferences of different users.
Abstract: To deal with the rapid growth of high-speed and/or ultra-low latency data traffic for massive mobile users, fog radio access networks (Fog-RANs) have emerged as a promising architecture for next-generation wireless networks. In Fog-RANs, the edge nodes and user terminals possess storage, computation and communication functionalities to various degrees, which provide high flexibility for network operation, i.e., from fully centralized to fully distributed operation. In this paper, we study the cache placement problem in Fog-RANs, by taking into account flexible physical-layer transmission schemes and diverse content preferences of different users. We develop both centralized and distributed transmission aware cache placement strategies to minimize users’ average download delay subject to the storage capacity constraints. In the centralized mode, the cache placement problem is transformed into a matroid constrained submodular maximization problem, and an approximation algorithm is proposed to find a solution within a constant factor to the optimum. In the distributed mode, a belief propagation-based distributed algorithm is proposed to provide a suboptimal solution, with iterative updates at each BS based on locally collected information. Simulation results show that by exploiting caching and cooperation gains, the proposed transmission aware caching algorithms can greatly reduce the users’ average download delay.
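The centralized placement step fits the standard greedy template for matroid-constrained submodular maximization: repeatedly add the (file, node) placement with the largest marginal reduction in average download delay until every node's storage is full. The sketch below shows that generic template with a caller-supplied, hypothetical marginal_gain oracle and unit-size files; it is not the paper's exact formulation.

```python
def greedy_placement(files, nodes, capacity, marginal_gain):
    """Greedy template for matroid-constrained submodular maximization.

    files, nodes : iterables of identifiers
    capacity[n]  : number of (unit-size) files node n can store
    marginal_gain(placement, f, n) : gain (e.g., delay reduction) of
        adding file f to node n given the current placement, a set of
        (f, n) pairs. Supplied by the caller; hypothetical here.
    """
    placement = set()
    load = {n: 0 for n in nodes}
    while True:
        best, best_gain = None, 0.0
        for n in nodes:
            if load[n] >= capacity[n]:
                continue
            for f in files:
                if (f, n) in placement:
                    continue
                g = marginal_gain(placement, f, n)
                if g > best_gain:
                    best, best_gain = (f, n), g
        if best is None:  # no remaining placement improves the objective
            return placement
        placement.add(best)
        load[best[1]] += 1
```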

126 citations


Proceedings ArticleDOI
24 Jun 2017
TL;DR: This paper proposes to alter the line replacement algorithm of the shared cache, to prevent a process from creating inclusion victims in the caches of cores running other processes, and calls it SHARP (Secure Hierarchy-Aware cache Replacement Policy).
Abstract: In cache-based side channel attacks, a spy that shares a cache with a victim probes cache locations to extract information on the victim's access patterns. For example, in evict+reload, the spy repeatedly evicts and then reloads a probe address, checking if the victim has accessed the address in between the two operations. While there are many proposals to combat these cache attacks, they all have limitations: they either hurt performance, require programmer intervention, or can only defend against some types of attacks. This paper makes the following observation for an environment with an inclusive cache hierarchy: when the spy evicts the probe address from the shared cache, the address will also be evicted from the private cache of the victim process, creating an inclusion victim. Consequently, to disable cache attacks, this paper proposes to alter the line replacement algorithm of the shared cache, to prevent a process from creating inclusion victims in the caches of cores running other processes. By enforcing this rule, the spy cannot evict the probe address from the shared cache and, hence, cannot glimpse any information on the victim's access patterns. We call our proposal SHARP (Secure Hierarchy-Aware cache Replacement Policy). SHARP efficiently defends against all existing cross-core shared-cache attacks, needs only minimal hardware modifications, and requires no code modifications. We implement SHARP in a cycle-level full-system simulator. We show that it protects against real-world attacks, and that it introduces negligible average performance degradation.
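A minimal sketch of the replacement rule described above, assuming the shared-cache controller can tell (e.g., through core-valid bits) whether a candidate line is still held in some other core's private cache; the bookkeeping and the fallback behavior of the real SHARP design are simplified away.

```python
def sharp_pick_victim(candidate_lines, requesting_core, cached_in_private):
    """Pick an LLC victim that does not create an inclusion victim in
    another core's private cache.

    candidate_lines         : lines of the set, ordered from LRU to MRU
    cached_in_private(line) : set of core ids whose private caches hold `line`
    Returns a line to evict, or None if every candidate would victimize
    another core (a fallback policy would then be needed).
    """
    for line in candidate_lines:  # prefer the LRU-most safe candidate
        owners = cached_in_private(line)
        if not owners or owners == {requesting_core}:
            return line
    return None
```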

109 citations


Journal ArticleDOI
TL;DR: This work investigates the problem of developing optimal joint routing and caching policies in a network supporting in-network caching with the goal of minimizing expected content-access delay and identifies the structural property of the user-cache graph that makes the problem NP-complete.
Abstract: In-network content caching has been deployed in both the Internet and cellular networks to reduce content-access delay. We investigate the problem of developing optimal joint routing and caching policies in a network supporting in-network caching with the goal of minimizing expected content-access delay. Here, needed content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access content, users must thus decide whether to route their requests to a cache or to the back-end server. In addition, caches must decide which content to cache. We investigate two variants of the problem, where the paths to the back-end server can be considered as either congestion-sensitive or congestion-insensitive, reflecting whether or not the delay experienced by a request sent to the back-end server depends on the request load, respectively. We show that the problem of optimal joint caching and routing is NP-complete in both cases. We prove that under the congestion-insensitive delay model, the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify the structural property of the user-cache graph that makes the problem NP-complete. For the congestion-sensitive delay model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both cases within a $(1-1/e)$ factor from the optimal, and demonstrate a greedy solution that is numerically shown to be within 1% of optimal for small problem sizes. Through trace-driven simulations, we evaluate the performance of our greedy solutions to joint caching and routing, which show up to 50% reduction in average delay over the solution of optimized routing to least recently used caches.

107 citations


Posted Content
TL;DR: In this article, the authors studied the cache placement problem in fog-RANs, by taking into account flexible physical-layer transmission schemes and diverse content preferences of different users, and developed both centralized and distributed transmission aware cache placement strategies to minimize users' average download delay subject to the storage capacity constraints.
Abstract: To deal with the rapid growth of high-speed and/or ultra-low latency data traffic for massive mobile users, fog radio access networks (Fog-RANs) have emerged as a promising architecture for next-generation wireless networks. In Fog-RANs, the edge nodes and user terminals possess storage, computation and communication functionalities to various degrees, which provides high flexibility for network operation, i.e., from fully centralized to fully distributed operation. In this paper, we study the cache placement problem in Fog-RANs, by taking into account flexible physical-layer transmission schemes and diverse content preferences of different users. We develop both centralized and distributed transmission aware cache placement strategies to minimize users' average download delay subject to the storage capacity constraints. In the centralized mode, the cache placement problem is transformed into a matroid constrained submodular maximization problem, and an approximation algorithm is proposed to find a solution within a constant factor to the optimum. In the distributed mode, a belief propagation based distributed algorithm is proposed to provide a suboptimal solution, with iterative updates at each BS based on locally collected information. Simulation results show that by exploiting caching and cooperation gains, the proposed transmission aware caching algorithms can greatly reduce the users' average download delay.

94 citations


Proceedings ArticleDOI
25 Jun 2017
TL;DR: In this article, the authors considered a basic caching system, where a single server with a database of N files (e.g. movies) is connected to a set of K users through a shared bottleneck link.
Abstract: We consider a basic caching system, where a single server with a database of N files (e.g. movies) is connected to a set of K users through a shared bottleneck link. Each user has a local cache memory with a size of M files. The system operates in two phases: a placement phase, where each cache memory is populated up to its size from the database, and a following delivery phase, where each user requests a file from the database, and the server is responsible for delivering the requested contents. The objective is to design the two phases to minimize the load (peak or average) of the bottleneck link. We characterize the rate-memory tradeoff of the above caching system within a factor of 2.00884 for both the peak rate and the average rate (under uniform file popularity), where the best proved characterization in the current literature gives a factor of 4 and 4.7 respectively. Moreover, in the practically important case where the number of files (N) is large, we exactly characterize the tradeoff for systems with no more than 5 users, and characterize the tradeoff within a factor of 2 otherwise. We establish these results by developing novel information theoretic outer-bounds for the caching problem, which improves the state of the art and gives tight characterization in various cases.
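For reference, the classic Maddah-Ali–Niesen placement-and-delivery scheme (often the baseline such constant-factor characterizations are contrasted with, though not the scheme analyzed in this paper) achieves a peak delivery rate of

$R_{\mathrm{MN}}(M) = K\left(1-\frac{M}{N}\right)\cdot\frac{1}{1+KM/N}$ for integer $t = KM/N$,

with memory sharing between neighboring integer points used for non-integer $t$.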

81 citations


Journal ArticleDOI
TL;DR: Simulation results show that the proposed method outperforms the existing counterparts with a higher hit ratio and lower delay in delivering video contents, and leveraging the backward induction method, the optimal strategy of each player in the game model is proposed.
Abstract: To improve the performance of mobile video delivery, caching layered videos at a site close to mobile end users (e.g., at the edge of mobile service provider's backbone) was advocated because cached videos can be delivered to mobile users with a high quality of experience, e.g., a short latency. How to optimally cache layered videos based on caching price, the available capacity of cache nodes, and the social features of mobile users, however, is still a challenging issue. In this paper, we propose a novel edge caching scheme to cache layered videos. First, a framework to cache layered videos is presented in which a cache node stores layered videos for multiple social groups, formed by mobile users based on their requests. Due to the limited capacity of the cache node, these social groups compete with each other for the number of layers they request to cache, aiming at maximizing their utilities while all mobile users in each group share the cost involved in the cache of video contents. Second, a Stackelberg game model is developed to study the interaction among multiple social groups and the cache node, and a noncooperative game model is introduced to analyze the competition among mobile users in different social groups. Third, leveraging the backward induction method, the optimal strategy of each player in the game model is proposed. Finally, simulation results show that the proposed method outperforms the existing counterparts with a higher hit ratio and lower delay in delivering video contents.

74 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: A novel probabilistic information flow graph is proposed to model the interaction between the victim program, the attacker program and the cache architecture, and a new metric, the Probability of Attack Success (PAS), is derived, which gives a quantitative measure for evaluating a cache’s resilience against a given class of cache side-channel attacks.
Abstract: Security-critical data can leak through very unexpected side channels, making side-channel attacks very dangerous threats to information security. Of these, cache-based side-channel attacks are some of the most problematic. This is because caches are essential for the performance of modern computers, but an intrinsic property of all caches – the different access times for cache hits and misses – is the property exploited to leak information in time-based cache side-channel attacks. Recently, different secure cache architectures have been proposed to defend against these attacks. However, we do not have a reliable method for evaluating a cache's resilience against different classes of cache side-channel attacks, which is the goal of this paper. We first propose a novel probabilistic information flow graph (PIFG) to model the interaction between the victim program, the attacker program and the cache architecture. From this model, we derive a new metric, the Probability of Attack Success (PAS), which gives a quantitative measure for evaluating a cache's resilience against a given class of cache side-channel attacks. We show the generality of our model and metric by applying them to evaluate nine different cache architectures against all four classes of cache side-channel attacks. Our new methodology, model and metric can help verify the security provided by different proposed secure cache architectures, and compare them in terms of their resilience to cache side-channel attacks, without the need for simulation or taping out a chip. CCS Concepts: • Security and privacy → Side-channel analysis and countermeasures; • General and reference → Evaluation; • Computer systems organization → Processors and memory architectures.

72 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: Banshee is a new DRAM cache design that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency, and reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy.
Abstract: Placing the DRAM in the same package as a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs optimize mainly for cache hit latency and do not consider bandwidth efficiency as a first-class design constraint. Hence, as we show in this paper, these designs are suboptimal for use with in-package DRAM. We propose a new DRAM cache design, Banshee, that optimizes for both in-package and off-package DRAM bandwidth efficiency without degrading access latency. Banshee is based on two key ideas. First, it eliminates the tag lookup overhead by tracking the contents of the DRAM cache using TLBs and page table entries, which is efficiently enabled by a new lightweight TLB coherence protocol we introduce. Second, it reduces unnecessary DRAM cache replacement traffic with a new bandwidth-aware frequency-based replacement policy. Our evaluations show that Banshee significantly improves performance (15% on average) and reduces DRAM traffic (35.8% on average) over the best-previous latency-optimized DRAM cache design. CCS Concepts: • Computer systems organization → Multicore architectures; Heterogeneous (hybrid) systems.
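A toy sketch of the flavor of bandwidth-aware, frequency-based replacement: a candidate page replaces a cached victim only when its observed access count exceeds the victim's by some margin, which suppresses replacement traffic for pages with little reuse. The counters, threshold, and class name below are invented for the illustration; Banshee's actual mechanism (sampling plus TLB/PTE-based tracking) is more involved.

```python
from collections import defaultdict

class FrequencyAdmission:
    """Bandwidth-aware admission sketch: replace a cached page only when
    the requesting page has been accessed noticeably more often."""

    def __init__(self, threshold=2):
        self.count = defaultdict(int)   # per-page access counters
        self.threshold = threshold      # illustrative margin

    def access(self, page):
        self.count[page] += 1

    def should_replace(self, candidate, victim):
        # Admit only clearly hotter pages, limiting replacement traffic.
        return self.count[candidate] >= self.count[victim] + self.threshold
```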

Journal ArticleDOI
TL;DR: A popularity prediction-based cooperative cache replacement mechanism, which predicts and ranks popular content during a period of time is put forward, which aims to lower the cache replacement overhead and reduce the cache redundancy.
Abstract: Information centric networking (ICN) has been recently proposed as a prominent solution for content delivery in vehicular ad hoc networks. By caching the data packets in vehicular unused storage space, vehicles can obtain replicas of contents from other vehicles instead of the original content provider, which reduces the access pressure on the content provider and increases the response speed of content requests. In this paper, we propose a community similarity and population-based cache policy in an ICN vehicle-to-vehicle scenario. First, a dynamic probability caching scheme is designed by evaluating the community similarity and privacy rating of vehicles. Then, a caching vehicle selection method with hop numbers based on content popularity is proposed to reduce the cache redundancy. Moreover, to lower the cache replacement overhead, we put forward a popularity prediction-based cooperative cache replacement mechanism, which predicts and ranks popular content during a period of time. Simulation results show that our proposed mechanisms perform markedly well, reducing the average time delay while increasing the cache hit ratio and the cache hit distance.

Proceedings Article
12 Jul 2017
TL;DR: This work designs a new caching algorithm for web applications called hyperbolic caching, which decays item priorities at variable rates and continuously reorders many items at once and introduces the notion of a cost class in order to measure the costs and manipulate the priorities of all items belonging to a related group.
Abstract: Today's web applications rely heavily on caching to reduce latency and backend load, using services like Redis or Memcached that employ inflexible caching algorithms. But the needs of each application vary, and significant performance gains can be achieved with a tailored strategy, e.g., incorporating cost of fetching, expiration time, and so forth. Existing strategies are fundamentally limited, however, because they rely on data structures to maintain a total ordering of the cached items. Inspired by Redis's use of random sampling for eviction (in lieu of a data structure) and recent theoretical justification for this approach, we design a new caching algorithm for web applications called hyperbolic caching. Unlike prior schemes, hyperbolic caching decays item priorities at variable rates and continuously reorders many items at once. By combining random sampling with lazy evaluation of the hyperbolic priority function, we gain complete flexibility in customizing the function. For example, we describe extensions that incorporate item cost, expiration time, and windowing. We also introduce the notion of a cost class in order to measure the costs and manipulate the priorities of all items belonging to a related group. We design a hyperbolic caching variant for several production systems from leading cloud providers. We implement our scheme in Redis and the Django web framework. Using real and simulated traces, we show that hyperbolic caching reduces miss rates by ∼10-20% over competitive baselines tailored to the application, and improves end-to-end throughput by ∼5-10%.
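A compact sketch of the basic (cost-free) hyperbolic priority: an item's priority is its hit count divided by its time in the cache, priorities are evaluated lazily, and eviction takes the minimum over a small random sample instead of maintaining a total order. Class and parameter names are illustrative; the cost, expiration, and windowing extensions are omitted.

```python
import random
import time

class HyperbolicCache:
    """Eviction by sampling: evict the sampled item with the lowest
    hits / time-in-cache priority (the basic hyperbolic function)."""

    def __init__(self, capacity, sample_size=64):
        self.capacity = capacity
        self.sample_size = sample_size
        self.items = {}  # key -> (value, hit_count, insert_time)

    def _priority(self, key, now):
        _, hits, t0 = self.items[key]
        return hits / max(now - t0, 1e-9)

    def get(self, key):
        if key in self.items:
            value, hits, t0 = self.items[key]
            self.items[key] = (value, hits + 1, t0)
            return value
        return None

    def put(self, key, value):
        if key in self.items:  # update value, keep hit count and age
            _, hits, t0 = self.items[key]
            self.items[key] = (value, hits, t0)
            return
        if len(self.items) >= self.capacity:
            now = time.monotonic()
            sample = random.sample(list(self.items),
                                   min(self.sample_size, len(self.items)))
            victim = min(sample, key=lambda k: self._priority(k, now))
            del self.items[victim]
        self.items[key] = (value, 1, time.monotonic())
```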

Proceedings ArticleDOI
14 Oct 2017
TL;DR: Cache Automaton as discussed by the authors extends a conventional last-level cache architecture with components to accelerate two phases in NFA processing: state-match and state-transition, which is made efficient using a sense-amplifier cycling technique that exploits spatial locality in symbol matches.
Abstract: Finite State Automata are widely used to accelerate pattern matching in many emerging application domains like DNA sequencing and XML parsing. Conventional CPUs and compute-centric accelerators are bottlenecked by memory bandwidth and irregular memory access patterns in automata processing. We present Cache Automaton, which repurposes last-level cache for automata processing, and a compiler that automates the process of mapping large real world Non-Deterministic Finite Automata (NFAs) to the proposed architecture. Cache Automaton extends a conventional last-level cache architecture with components to accelerate two phases in NFA processing: state-match and state-transition. State-matching is made efficient using a sense-amplifier cycling technique that exploits spatial locality in symbol matches. State-transition is made efficient using a new compact switch architecture. By overlapping these two phases for adjacent symbols we realize an efficient pipelined design. We evaluate two designs, one optimized for performance and the other optimized for space, across a set of 20 diverse benchmarks. The performance-optimized design provides a speedup of 15× over Micron's DRAM-based Automata Processor (AP) and a 3840× speedup over processing in a conventional x86 CPU. The proposed design utilizes on average 1.2 MB of cache space across benchmarks, while consuming 2.3 nJ of energy per input symbol. Our space-optimized design can reduce the cache utilization to 0.72 MB, while still providing a speedup of 9× over the AP. CCS Concepts: • Hardware → Emerging architectures; • Theory of computation → Formal languages and automata theory.

Journal ArticleDOI
TL;DR: This letter studies the optimization for cache content placement to minimize the backhaul load subject to cache capacity constraints for caching enabled small cell networks with heterogeneous file and cache sizes.
Abstract: In this letter, we study the optimization for cache content placement to minimize the backhaul load subject to cache capacity constraints for caching enabled small cell networks with heterogeneous file and cache sizes. Multicast content delivery is adopted to reduce the backhaul rate exploiting the independence among maximum distance separable coded packets.

Proceedings ArticleDOI
Meng Xu, Linh Thi Xuan Phan, Hyon-Young Choi, Insup Lee
01 Apr 2017
TL;DR: In this paper, the authors present vCAT, a novel design for dynamic shared cache management on multicore virtualization platforms based on Intel's cache allocation technology (CAT), which achieves strong isolation at both task and VM levels through cache partition virtualization.
Abstract: This paper presents vCAT, a novel design for dynamic shared cache management on multicore virtualization platforms based on Intel's Cache Allocation Technology (CAT). Our design achieves strong isolation at both task and VM levels through cache partition virtualization, which works in a similar way as memory virtualization, but has challenges that are unique to cache and CAT. To demonstrate the feasibility and benefits of our design, we provide a prototype implementation of vCAT, and we present an extensive set of microbenchmarks and performance evaluation results on the PARSEC benchmarks and synthetic workloads, for both static and dynamic allocations. The evaluation results show that (i) vCAT can be implemented with minimal overhead, (ii) it can be used to mitigate shared cache interference, which could otherwise increase task WCET by up to 7.2×, (iii) static management in vCAT can increase system utilization by up to 7× compared to a system without cache management, and (iv) dynamic management substantially outperforms static management in terms of schedulable utilization (an increase of up to 3× in our multi-mode example use case).
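As background for what vCAT virtualizes: Intel CAT partitions the shared LLC with per-class capacity bitmasks that must be contiguous runs of cache ways. The helper below only computes such contiguous way-masks for a list of partition sizes; it is a loose illustration, not the paper's hypervisor mechanism, and the associativity value is just an example.

```python
def contiguous_way_masks(way_counts, total_ways=20):
    """Pack cache partitions into contiguous way bitmasks, as Intel CAT
    requires (class-of-service capacity masks must be contiguous).
    Returns integers whose binary forms are the per-partition masks.
    total_ways=20 is just an example LLC associativity."""
    if sum(way_counts) > total_ways:
        raise ValueError("partitions exceed available cache ways")
    masks, base = [], 0
    for ways in way_counts:
        masks.append(((1 << ways) - 1) << base)
        base += ways
    return masks

# Example: three partitions of 8, 8, and 4 ways.
print([bin(m) for m in contiguous_way_masks([8, 8, 4])])
```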

Proceedings ArticleDOI
19 Mar 2017
TL;DR: This paper proposes an optimization framework for cache placement and delivery schemes which explicitly accounts for the heterogeneity of the cache sizes, and characterize explicitly the optimal caching scheme, for the case where the sum of the users' cache sizes is smaller than or equal to the library size.
Abstract: Coded caching can improve fundamental limits of communication, utilizing storage memory at individual users. This paper considers a centralized coded caching system, introducing heterogeneous cache sizes at the users, i.e., the users' cache memories are of different size. The goal is to design cache placement and delivery policies that minimize the worst-case delivery load on the server. To that end, the paper proposes an optimization framework for cache placement and delivery schemes which explicitly accounts for the heterogeneity of the cache sizes. We also characterize explicitly the optimal caching scheme, for the case where the sum of the users' cache sizes is smaller than or equal to the library size.

Proceedings ArticleDOI
24 Jun 2017
TL;DR: This paper proposes Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp, and uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization.
Abstract: Long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought by a warp load instruction is used only once, which is classified as streaming data; (2) data brought by a warp load is reused multiple times within the same warp, called intra-warp locality; (3) data brought by a warp is reused multiple times but across different warps, called inter-warp locality; and (4) some data exhibit a mix of intra- and inter-warp locality. Furthermore, each load instruction exhibits consistently the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization. Using an extensive set of simulations we show that APCM improves performance of GPUs by 34% for cache sensitive applications while saving 27% of energy consumption over a baseline GPU.
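A simplified software sketch of the per-load classification APCM relies on: observe which warps touch the data brought in by each load PC and label the PC as streaming, intra-warp, inter-warp, or mixed. The structures and trace format are invented for the illustration, and unlike APCM the sketch monitors all warps rather than one exemplary warp.

```python
from collections import defaultdict

def classify_loads(trace):
    """trace: iterable of (load_pc, warp_id, address) in program order.
    Returns a locality label per load PC, in the spirit of APCM's
    per-load classification (simplified)."""
    warps = defaultdict(lambda: defaultdict(set))   # pc -> addr -> warp ids
    counts = defaultdict(lambda: defaultdict(int))  # pc -> addr -> accesses
    for pc, warp, addr in trace:
        warps[pc][addr].add(warp)
        counts[pc][addr] += 1

    labels = {}
    for pc in warps:
        intra = any(c > len(warps[pc][a]) for a, c in counts[pc].items())
        inter = any(len(w) > 1 for w in warps[pc].values())
        if intra and inter:
            labels[pc] = "mixed"
        elif intra:
            labels[pc] = "intra-warp"
        elif inter:
            labels[pc] = "inter-warp"
        else:
            labels[pc] = "streaming"
    return labels

# Example: PC 0x10 streams, PC 0x20 is reused across warps.
trace = [(0x10, 0, 100), (0x10, 1, 200), (0x20, 0, 300), (0x20, 1, 300)]
print(classify_loads(trace))
```

Given such labels, APCM would bypass the cache for streaming loads and pin lines for loads with reuse, per the characterization above.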

Proceedings Article
12 Jul 2017
TL;DR: Web application performance heavily relies on the hit rate of DRAM key-value caches, and Memshare provides a resource sharing model that guarantees reserved memory to different applications while dynamically pooling and sharing the remaining memory to optimize overall hit rate.
Abstract: Web application performance heavily relies on the hit rate of DRAM key-value caches. Current DRAM caches statically partition memory across applications that share the cache. This results in under utilization and limits cache hit rates. We present Memshare, a DRAM key-value cache that dynamically manages memory across applications. Memshare provides a resource sharing model that guarantees reserved memory to different applications while dynamically pooling and sharing the remaining memory to optimize overall hit rate. Key-value caches are typically memory capacity bound, which leaves cache server CPU and memory bandwidth idle. Memshare leverages these resources with a log-structured design that allows it to provide better hit rates than conventional caches by dynamically repartitioning memory among applications. We implemented Memshare and ran it on a week-long trace from a commercial memcached provider. Memshare increases the combined hit rate of the applications in the trace from 84.7% to 90.8%, and it reduces the total number of misses by 39.7% without significantly affecting cache throughput or latency. Even for single-tenant applications, Memshare increases the average hit rate of the state-of-the-art key-value cache by an additional 2.7%.

Journal ArticleDOI
TL;DR: The CLCE replication scheme reduces the redundant caching of contents; hence improves the cache space utilization and LFRU approximates the least frequently used scheme coupled with the least recently used scheme and is practically implementable for rapidly changing cache networks like ICNs.
Abstract: To cope with the ongoing changing demands of the internet, ‘in-network caching’ has been presented as an application solution for two decades. With the advent of information-centric network (ICN) architecture, ‘in-network caching’ becomes a network level solution. Some unique features of the ICNs, e.g., rapidly changing cache states, higher request arrival rates, smaller cache sizes, and other factors, impose diverse requirements on the content eviction policies. In particular, eviction policies should be fast and lightweight. In this paper, we propose cache replication and eviction schemes, conditional leave copy everywhere (CLCE) and least frequent recently used (LFRU), which are well suited for the ICN type of cache networks (CNs). The CLCE replication scheme reduces the redundant caching of contents; hence it improves the cache space utilization. LFRU approximates the least frequently used scheme coupled with the least recently used scheme and is practically implementable for rapidly changing cache networks like ICNs.
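One loose reading of the LFRU eviction rule (evict the least frequently used item among the recently used ones) can be sketched as follows; the window size and bookkeeping are illustrative, and the paper's actual design is more structured than this single-structure sketch.

```python
from collections import OrderedDict

class LFRUSketch:
    """Loose sketch of 'least frequent recently used' eviction: among the
    W least-recently-used cached items, evict the one with the lowest
    access frequency. W, the structure, and the names are illustrative."""

    def __init__(self, capacity, window=4):
        self.capacity = capacity
        self.window = window
        self.items = OrderedDict()  # key -> [value, freq]; order = recency

    def get(self, key):
        if key not in self.items:
            return None
        value, freq = self.items.pop(key)
        self.items[key] = [value, freq + 1]  # move to the MRU end
        return value

    def put(self, key, value):
        if key in self.items:
            _, freq = self.items.pop(key)
            self.items[key] = [value, freq + 1]
            return
        if len(self.items) >= self.capacity:
            oldest = list(self.items)[:self.window]  # the LRU-most W keys
            victim = min(oldest, key=lambda k: self.items[k][1])
            del self.items[victim]
        self.items[key] = [value, 1]
```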

Posted Content
TL;DR: This paper investigates multi-layer caching where both base station and users are capable of storing content data in their local cache and analyzes the performance of edge-caching wireless networks under two notable uncoded and coded caching strategies.
Abstract: Edge-caching has received much attention as an efficient technique to reduce delivery latency and network congestion during peak-traffic times by bringing data closer to end users. Existing works usually design caching algorithms separately from physical layer design. In this paper, we analyse edge-caching wireless networks by taking into account the caching capability when designing the signal transmission. Particularly, we investigate multi-layer caching where both base station (BS) and users are capable of storing content data in their local cache and analyse the performance of edge-caching wireless networks under two notable uncoded and coded caching strategies. Firstly, we propose a coded caching strategy that is applied to arbitrary values of cache size. The required backhaul and access rates are derived as a function of the BS and user cache size. Secondly, closed-form expressions for the system energy efficiency (EE) corresponding to the two caching methods are derived. Based on the derived formulas, the system EE is maximized via precoding vectors design and optimization while satisfying a predefined user request rate. Thirdly, two optimization problems are proposed to minimize the content delivery time for the two caching strategies. Finally, numerical results are presented to verify the effectiveness of the two caching methods.

Proceedings ArticleDOI
01 Feb 2017
TL;DR: These results show that formalizing cache replacement yields practical benefits, and propose that practical policies should replace lines based on their economic value added (EVA), the difference of their expected hits from the average.
Abstract: Much prior work has studied cache replacement, but a large gap remains between theory and practice. The design of many practical policies is guided by the optimal policy, Belady's MIN. However, MIN assumes perfect knowledge of the future that is unavailable in practice, and the obvious generalizations of MIN are suboptimal with imperfect information. What, then, is the right metric for practical cache replacement? We propose that practical policies should replace lines based on their economic value added (EVA), the difference of their expected hits from the average. Drawing on the theory of Markov decision processes, we discuss why this metric maximizes the cache's hit rate. We present an inexpensive implementation of EVA and evaluate it exhaustively. EVA outperforms several prior policies and saves area at iso-performance. These results show that formalizing cache replacement yields practical benefits.
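A rough numeric sketch of the EVA idea under simplifying assumptions: given, for each line age, the probability that a line's current lifetime ends with a hit or an eviction at that age, a line's EVA is its expected hit minus the opportunity cost of the cache space it keeps occupying (the cache's average hit rate per line per unit time, times its expected remaining lifetime). The recurrence below is a simplified reconstruction for intuition, not the paper's implementation.

```python
def eva_by_age(hit_prob, evict_prob):
    """Simplified per-age EVA. hit_prob[a] / evict_prob[a]: probability
    that a line's lifetime ends at age a with a hit / an eviction.
    eva[a] = P(hit | age >= a) - g * E[remaining lifetime | age >= a],
    where g is the average hit rate per line per unit time (the
    opportunity cost of occupying cache space)."""
    end_prob = [h + e for h, e in zip(hit_prob, evict_prob)]
    avg_lifetime = sum(a * p for a, p in enumerate(end_prob))
    g = sum(hit_prob) / avg_lifetime  # hits per line per unit time

    eva = [0.0] * len(end_prob)
    hits_tail = time_tail = mass_tail = 0.0
    for a in reversed(range(len(end_prob))):
        hits_tail += hit_prob[a]
        time_tail += a * end_prob[a]
        mass_tail += end_prob[a]
        if mass_tail > 0:
            p_hit = hits_tail / mass_tail
            remaining = time_tail / mass_tail - a
            eva[a] = p_hit - g * remaining
    return eva

# Toy distribution: most hits happen young; lines that age past the reuse
# window get low or negative EVA and would be preferred eviction victims.
print(eva_by_age([0.0, 0.5, 0.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.1, 0.2]))
```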

Journal ArticleDOI
TL;DR: A dynamic adaptive replacement policy (DARP) in the shared last-level cache for the DRAM/PCM hybrid main memory is proposed and results have shown that the DARP improved the memory access efficiency by 25.4%.
Abstract: The increasing demand on the main memory capacity is one of the main big data challenges. Dynamic random access memory (DRAM) does not represent the best choice for a main memory, due to high power consumption and low density. However, nonvolatile memory, such as the phase-change memory (PCM), represents an additional choice because of its low power consumption and high-density characteristics. Nevertheless, high access latency and limited write endurance currently prevent the PCM from replacing the DRAM outright. Therefore, a hybrid memory, which combines both the DRAM and the PCM, has become a good alternative to the traditional DRAM memory. The disadvantages of both DRAM and PCM are challenges for the hybrid memory. In this paper, a dynamic adaptive replacement policy (DARP) in the shared last-level cache for the DRAM/PCM hybrid main memory is proposed. The DARP distinguishes the cache data into the PCM data and the DRAM data; then, the algorithm adopts different replacement policies for each data type. Specifically, for the PCM data, the least recently used (LRU) replacement policy is adopted, and for the DRAM data, the DARP is employed according to the process behavior. Experimental results have shown that the DARP improved the memory access efficiency by 25.4%.
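A skeleton of the type-aware victim selection the abstract describes: candidate lines in a set are split by the memory that backs them, PCM-backed candidates are ranked by plain LRU, and DRAM-backed candidates by a caller-supplied adaptive policy. How the two per-type candidates are arbitrated is an assumption of this sketch (it simply prefers a DRAM-backed victim, on the reasoning that a DRAM refill is cheaper than a PCM refill); the paper's behavior-driven policy is more nuanced.

```python
def pick_victim(set_lines, is_pcm_backed, lru_victim, adaptive_victim):
    """Type-aware replacement skeleton for a DRAM/PCM hybrid main memory.

    set_lines        : candidate lines of the cache set
    is_pcm_backed(l) : True if line l caches data resident in PCM
    lru_victim       : policy applied to PCM-backed lines (plain LRU)
    adaptive_victim  : policy applied to DRAM-backed lines
    Preferring a DRAM-backed victim is an assumption of this sketch."""
    dram = [l for l in set_lines if not is_pcm_backed(l)]
    pcm = [l for l in set_lines if is_pcm_backed(l)]
    if dram:
        return adaptive_victim(dram)
    return lru_victim(pcm)
```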

Proceedings ArticleDOI
24 Jun 2017
TL;DR: Jenga is proposed, a reconfigurable cache hierarchy that dynamically and transparently specializes itself to applications, and builds virtual cache hierarchies out of heterogeneous, distributed cache banks using simple hardware mechanisms and an OS runtime.
Abstract: Caches are traditionally organized as a rigid hierarchy, with multiple levels of progressively larger and slower memories. Hierarchy allows a simple, fixed design to benefit a wide range of applications, since working sets settle at the smallest (i.e., fastest and most energy-efficient) level they fit in. However, rigid hierarchies also add overheads, because each level adds latency and energy even when it does not fit the working set. These overheads are expensive on emerging systems with heterogeneous memories, where the differences in latency and energy across levels are small. Significant gains are possible by specializing the hierarchy to applications. We propose Jenga, a reconfigurable cache hierarchy that dynamically and transparently specializes itself to applications. Jenga builds virtual cache hierarchies out of heterogeneous, distributed cache banks using simple hardware mechanisms and an OS runtime. In contrast to prior techniques that trade energy and bandwidth for performance (e.g., dynamic bypassing or prefetching), Jenga eliminates accesses to unwanted cache levels. Jenga thus improves both performance and energy efficiency. On a 36-core chip with a 1 GB DRAM cache, Jenga improves energy-delay product over a combination of state-of-the-art techniques by 23% on average and by up to 85%.

Proceedings ArticleDOI
14 Oct 2017
TL;DR: The technique is demonstrated using a placement, promotion, and bypass optimization that outperforms state-of-the-art policies using a low overhead and the accuracy of the multiperspective technique is superior to previous work.
Abstract: The disparity between last-level cache and memory latencies motivates the search for efficient cache management policies. Recent work in predicting reuse of cache blocks enables optimizations that significantly improve cache performance and efficiency. However, the accuracy of the prediction mechanisms limits the scope of optimization. This paper introduces multiperspective reuse prediction, a technique that predicts the future reuse of cache blocks using several different types of features. The accuracy of the multiperspective technique is superior to previous work. We demonstrate the technique using a placement, promotion, and bypass optimization that outperforms state-of-the-art policies using a low overhead. On a set of single-thread benchmarks, the technique yields a geometric mean 9.0% speedup over LRU, compared with 5.1% for Hawkeye and 6.3% for Perceptron. On multi-programmed workloads, the technique gives a geometric mean weighted speedup of 8.3% over LRU, compared with 5.2% for Hawkeye and 5.8% for Perceptron. CCS Concepts: • Computer systems organization → Multicore architectures; • Hardware → Static memory.
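The flavor of a multi-feature reuse predictor can be shown with a small perceptron-style sketch: several features of the access (e.g., PC bits, address bits, insertion state) each index a table of weights, the weights are summed, and the sign of the sum predicts whether the block will be reused; on a hit or eviction, the contributing weights are nudged. Table sizes, features, and thresholds below are invented for the illustration, not the paper's exact feature set or training rules.

```python
class ReusePredictor:
    """Perceptron-style reuse predictor over several hashed features
    (a simplified sketch of multi-feature prediction)."""

    def __init__(self, n_features=3, table_size=256, threshold=0):
        self.tables = [[0] * table_size for _ in range(n_features)]
        self.table_size = table_size
        self.threshold = threshold

    def _indices(self, features):
        return [hash(f) % self.table_size for f in features]

    def predict_reuse(self, features):
        """features: one value per perspective, e.g. (pc, addr >> 6, depth)."""
        total = sum(t[i] for t, i in zip(self.tables, self._indices(features)))
        return total >= self.threshold

    def train(self, features, was_reused, step=1, clamp=31):
        delta = step if was_reused else -step
        for t, i in zip(self.tables, self._indices(features)):
            t[i] = max(-clamp, min(clamp, t[i] + delta))

# Usage: predict at insertion; train when the block later hits or is evicted.
p = ReusePredictor()
feats = (0x400812, 0x7FFE12 >> 6, 2)
print(p.predict_reuse(feats))
p.train(feats, was_reused=True)
```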

Proceedings ArticleDOI
05 Jun 2017
TL;DR: Extensive evaluation shows that compared with existing wireless network caching algorithms, the proposed algorithms significantly improve data caching fairness, while keeping the contention induced latency similar to the best existing algorithms.
Abstract: Edge devices (e.g., smartphones, tablets, connected vehicles, IoT nodes) with sensing, storage and communication resources are increasingly penetrating our environments. Many novel applications can be created when nearby peer edge devices share data. Caching can greatly improve the data availability, retrieval robustness and latency. In this paper, we study the unique issue of caching fairness in edge environment. Due to distinct ownership of peer devices, caching load balance is critical. We consider fairness metrics and formulate an integer linear programming problem, which is shown as summation of multiple Connected Facility Location (ConFL) problems. We propose an approximation algorithm leveraging an existing ConFL approximation algorithm, and prove that it preserves a 6.55 approximation ratio. We further develop a distributed algorithm where devices exchange data reachability and identify popular candidates as caching nodes. Extensive evaluation shows that compared with existing wireless network caching algorithms, our algorithms significantly improve data caching fairness, while keeping the contention induced latency similar to the best existing algorithms.

Journal ArticleDOI
TL;DR: It is found that although room for substantial improvement exists when comparing performance to that of a perfect “oracle” policy, such improvements are unlikely to be achievable in practice.
Abstract: The ephemeral content popularity seen with many content delivery applications can make indiscriminate on-demand caching in edge networks highly inefficient, since many of the content items that are added to the cache will not be requested again from that network. In this paper, we address the problem of designing and evaluating more selective edge-network caching policies. The need for such policies is demonstrated through an analysis of a dataset recording YouTube video requests from users on an edge network over a 20-month period. We then develop a novel workload modelling approach for such applications and apply it to study the performance of alternative edge caching policies, including indiscriminate caching and cache on $k$th request for different $k$. The latter policies are found able to greatly reduce the fraction of the requested items that are inserted into the cache, at the cost of only modest increases in cache miss rate. Finally, we quantify and explore the potential room for improvement from use of other possible predictors of further requests. We find that although room for substantial improvement exists when comparing performance to that of a perfect “oracle” policy, such improvements are unlikely to be achievable in practice.
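The cache-on-$k$th-request policies evaluated here amount to an admission filter placed in front of any eviction policy: an item is inserted only once it has been requested $k$ times. A minimal sketch (the request bookkeeping is kept deliberately naive and the names are illustrative):

```python
class KthRequestFilter:
    """Admit an item into the edge cache only on its k-th request.
    Request counting is unbounded here for simplicity; a real deployment
    would age or bound the counters."""

    def __init__(self, k=2):
        self.k = k
        self.seen = {}

    def should_admit(self, item_id):
        self.seen[item_id] = self.seen.get(item_id, 0) + 1
        return self.seen[item_id] >= self.k

# On a miss: serve the request from the origin; insert the item into the
# cache only if should_admit(item_id) returns True.
```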

Journal ArticleDOI
TL;DR: An integer programming problem is formulated to minimize the average download time under the constraint of cache size at each SBS, and it is shown that finding the optimal caching placement strategy is NP-hard.
Abstract: To alleviate the pressure brought by the explosion of mobile video traffic on present cellular networks, small cell base stations (SBSs) with caching ability are introduced. In this letter, we consider the caching strategy of scalable video coding streaming over a heterogeneous wireless network containing SBSs. We formulate an integer programming problem to minimize the average download time under the constraint of cache size at each SBS, and show that finding the optimal caching placement strategy is NP-hard. A heuristic solution is proposed based on convex programming relaxation, which reveals the structural properties of cache allocation for each video. Simulation results demonstrate that our proposed caching strategies achieve significant performance gains compared with the conventional caching policy.

Proceedings ArticleDOI
24 Jun 2017
TL;DR: DICE is proposed, a dynamic design that can adapt between spatial indexing and TSI, depending on the compressibility of the data, and low-cost Cache Index Predictors (CIP) that can accurately predict the cache indexing scheme on access in order to avoid probing both indices for retrieving a given cache line.
Abstract: This paper investigates compression for DRAM caches. As the capacity of DRAM cache is typically large, prior techniques on cache compression, which solely focus on improving cache capacity, provide only a marginal benefit. We show that more performance benefit can be obtained if the compression of the DRAM cache is tailored to provide higher bandwidth. If a DRAM cache can provide two compressed lines in a single access, and both lines are useful, the effective bandwidth of the DRAM cache would double. Unfortunately, it is not straightforward to compress DRAM caches for bandwidth. The typically used Traditional Set Indexing (TSI) maps consecutive lines to consecutive sets, so the multiple compressed lines obtained from the set are from spatially distant locations and unlikely to be used within a short period of each other. We can change the indexing of the cache to place consecutive lines in the same set to improve bandwidth; however, when the data is incompressible, such spatial indexing reduces effective capacity and causes significant slowdown. Ideally, we would like to have spatial indexing when the data is compressible and TSI otherwise. To this end, we propose Dynamic-Indexing Cache comprEssion (DICE), a dynamic design that can adapt between spatial indexing and TSI, depending on the compressibility of the data. We also propose low-cost Cache Index Predictors (CIP) that can accurately predict the cache indexing scheme on access in order to avoid probing both indices for retrieving a given cache line. Our studies with a 1GB DRAM cache, on a wide range of workloads (including SPEC and Graph), show that DICE improves performance by 19.0% and reduces energy-delay-product by 36% on average. DICE is within 3% of a design that has double the capacity and double the bandwidth. DICE incurs a storage overhead of less than 1KB and does not rely on any OS support.

Proceedings ArticleDOI
04 Apr 2017
TL;DR: This paper proposes a holistic cache management technique called Kill-the-PC (KPC) that overcomes the weaknesses of traditional prefetching and replacement policy algorithms and removes the need to propagate the PC through the entire on-chip cache hierarchy while providing a holistic cache management approach with better performance.
Abstract: Data prefetching and cache replacement algorithms have been intensively studied in the design of high performance microprocessors. Typically, the data prefetcher operates in the private caches and does not interact with the replacement policy in the shared Last-Level Cache (LLC). Similarly, most replacement policies do not consider demand and prefetch requests as different types of requests. In particular, program counter (PC)-based replacement policies cannot learn from prefetch requests since the data prefetcher does not generate a PC value. PC-based policies can also be negatively affected by compiler optimizations. In this paper, we propose a holistic cache management technique called Kill-the-PC (KPC) that overcomes the weaknesses of traditional prefetching and replacement policy algorithms. KPC cache management has three novel contributions. First, a prefetcher which approximates the future use distance of prefetch requests based on its prediction confidence. Second, a simple replacement policy provides similar or better performance than current state-of-the-art PC-based prediction using global hysteresis. Third, KPC integrates prefetching and replacement policy into a whole system which is greater than the sum of its parts. Information from the prefetcher is used to improve the performance of the replacement policy and vice-versa. Finally, KPC removes the need to propagate the PC through the entire on-chip cache hierarchy while providing a holistic cache management approach with better performance than state-of-the-art PC- and non-PC-based schemes. Our evaluation shows that KPC provides 8% better performance than the best combination of existing prefetcher and replacement policy for multi-core workloads.