
Showing papers on "Smart Cache published in 2013"


Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of state-of-the-art techniques aiming to address caching issues, with particular focus on reducing cache redundancy and improving the availability of cached content.

343 citations


Journal ArticleDOI
TL;DR: This work studies the problem of en-route caching and investigates whether caching in only a subset of nodes along the delivery path can achieve better performance in terms of cache and server hit rates, and proposes a centrality-based caching algorithm that consistently achieves better gain across both synthetic and real network topologies with different structural properties.

235 citations


Proceedings ArticleDOI
03 Nov 2013
TL;DR: This paper instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses.
Abstract: This paper examines the workload of Facebook's photo-serving stack and the effectiveness of the many layers of caching it employs. Facebook's image-management infrastructure is complex and geographically distributed. It includes browser caches on end-user systems, Edge Caches at ~20 PoPs, an Origin Cache, and for some kinds of images, additional caching via Akamai. The underlying image storage layer is widely distributed, and includes multiple data centers. We instrumented every Facebook-controlled layer of the stack and sampled the resulting event stream to obtain traces covering over 77 million requests for more than 1 million unique photos. This permits us to study traffic patterns, cache access patterns, geolocation of clients and servers, and to explore correlation between properties of the content and accesses. Our results (1) quantify the overall traffic percentages served by different layers: 65.5% browser cache, 20.0% Edge Cache, 4.6% Origin Cache, and 9.9% Backend storage, (2) reveal that a significant portion of photo requests are routed to remote PoPs and data centers as a consequence both of load-balancing and peering policy, (3) demonstrate the potential performance benefits of coordinating Edge Caches and adopting S4LRU eviction algorithms at both Edge and Origin layers, and (4) show that the popularity of photos is highly dependent on content age and conditionally dependent on the social-networking metrics we considered.
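The S4LRU eviction policy mentioned in point (3) is a segmented LRU with four levels. As a rough illustration, here is a minimal Python sketch of quadruply-segmented LRU under the common description of the policy (misses enter the lowest segment, hits promote one level, overflow demotes); the equal per-segment budget and the OrderedDict bookkeeping are simplifying assumptions, not Facebook's implementation.

```python
from collections import OrderedDict

class S4LRU:
    """Minimal sketch of quadruply-segmented LRU (S4LRU): four LRU segments;
    misses enter the lowest segment, hits promote a key one segment up, and
    overflow demotes tail items one segment down (out of the cache at level 0)."""

    def __init__(self, capacity, levels=4):
        self.seg_cap = max(1, capacity // levels)           # equal per-segment budget (assumption)
        self.segs = [OrderedDict() for _ in range(levels)]  # segs[0] is the lowest level

    def _find(self, key):
        for lvl, seg in enumerate(self.segs):
            if key in seg:
                return lvl
        return None

    def _insert(self, lvl, key, value):
        seg = self.segs[lvl]
        seg[key] = value
        seg.move_to_end(key)                                 # MRU position of this segment
        while len(seg) > self.seg_cap:
            old_key, old_val = seg.popitem(last=False)       # evict this segment's LRU item
            if lvl > 0:
                self._insert(lvl - 1, old_key, old_val)      # demote to the next lower segment
            # items pushed out of segment 0 leave the cache entirely

    def get(self, key):
        lvl = self._find(key)
        if lvl is None:
            return None
        value = self.segs[lvl].pop(key)
        self._insert(min(lvl + 1, len(self.segs) - 1), key, value)  # promote on hit
        return value

    def put(self, key, value):
        lvl = self._find(key)
        if lvl is not None:
            self.segs[lvl].pop(key)
            self._insert(min(lvl + 1, len(self.segs) - 1), key, value)
        else:
            self._insert(0, key, value)                      # misses start in the lowest segment
```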

225 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors that eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency.
Abstract: Recent research advocates using large die-stacked DRAM caches to break the memory bandwidth wall. Existing DRAM cache designs fall into one of two categories --- block-based and page-based. The former organize data in conventional blocks (e.g., 64B), ensuring low off-chip bandwidth utilization, but co-locate tags and data in the stacked DRAM, incurring high lookup latency. Furthermore, such designs suffer from low hit ratios due to poor temporal locality. In contrast, page-based caches, which manage data at larger granularity (e.g., 4KB pages), allow for reduced tag array overhead and fast lookup, and leverage high spatial locality at the cost of moving large amounts of data on and off the chip. This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors. Footprint Cache allocates data at the granularity of pages, but identifies and fetches only those blocks within a page that will be touched during the page's residency in the cache --- i.e., the page's footprint. In doing so, Footprint Cache eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency. Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache. Compared to a state-of-the-art block-based design, our design improves performance by 13% while reducing dynamic energy of stacked DRAM by 24%.

207 citations


Proceedings ArticleDOI
09 Jun 2013
TL;DR: By caching only popular content, MPC is able to cache less content while, at the same time, achieving a higher cache hit ratio and outperforming the existing default caching strategy in CCN.
Abstract: Content Centric Networking (CCN) has recently emerged as a promising architecture to deliver content at large scale. It is based on named data, where a packet address names the content and not its location. The premise is then to cache content on the network nodes along the delivery path. An important feature of CCN is therefore to manage the cache of the nodes. In this paper, we present Most Popular Content (MPC), a new caching strategy adapted to CCN networks. By caching only popular content, we show through extensive simulation experiments that MPC is able to cache less content while, at the same time, achieving a higher cache hit ratio and outperforming the existing default caching strategy in CCN.
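To make the MPC idea concrete, the sketch below shows a popularity-threshold admission decision at a single CCN node: content is admitted to the cache only once its local request count crosses a threshold. The threshold value, the counter handling, and the LRU replacement are illustrative assumptions rather than the paper's exact mechanism, which also involves additional coordination between nodes that is omitted here.

```python
from collections import defaultdict, OrderedDict

class MPCNode:
    """Sketch of a popularity-threshold caching decision in the spirit of MPC."""

    def __init__(self, capacity, threshold=3):
        self.capacity = capacity
        self.threshold = threshold            # illustrative popularity threshold
        self.popularity = defaultdict(int)    # request counts per content name
        self.store = OrderedDict()            # cached content, LRU-ordered

    def on_interest(self, name):
        """Called for every incoming request (Interest) for `name`."""
        self.popularity[name] += 1
        if name in self.store:
            self.store.move_to_end(name)
            return self.store[name]           # cache hit
        return None                           # miss: forward upstream

    def on_data(self, name, data):
        """Called when the Data packet for `name` flows back through the node."""
        if self.popularity[name] >= self.threshold:   # admit only popular content
            self.store[name] = data
            self.store.move_to_end(name)
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)        # evict LRU entry
```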

197 citations


Proceedings ArticleDOI
12 Feb 2013
TL;DR: A novel buffer cache architecture is presented that subsumes the functionality of caching and journaling by making use of non-volatile memory such as PCM or STT-MRAM and shows that this scheme improves I/O performance by 76% on average and up to 240% compared to the existing Linux buffer cache with ext4 without any loss of reliability.
Abstract: Journaling techniques are widely used in modern file systems as they provide high reliability and fast recovery from system failures. However, journaling reduces the performance benefit of buffer caching, as it accounts for the bulk of the storage writes in real system environments. In this paper, we present a novel buffer cache architecture that subsumes the functionality of caching and journaling by making use of non-volatile memory such as PCM or STT-MRAM. Specifically, our buffer cache supports what we call the in-place commit scheme. This scheme avoids logging, but still provides the same journaling effect by simply altering the state of the cached block to frozen. As a frozen block still performs the function of caching, we show that in-place commit does not degrade cache performance. We implement our scheme on Linux 2.6.38 and measure the throughput and execution time of the scheme with various file I/O benchmarks. The results show that our scheme improves I/O performance by 76% on average and up to 240% compared to the existing Linux buffer cache with ext4 without any loss of reliability.
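The in-place commit idea can be sketched as a small state machine over cached blocks: committing flips dirty blocks to frozen instead of writing a log, and frozen versions are only written back at checkpoint time. The sketch below is a minimal model under that description; the state names, the separate committed map, and the storage interface are assumptions for illustration, not the paper's Linux implementation.

```python
# Minimal sketch of an "in-place commit" NVRAM buffer cache: commit freezes
# cached blocks instead of journaling them, and a write to a frozen block
# updates a fresh working copy so the committed version survives until it is
# checkpointed to the file system.

NORMAL, DIRTY, FROZEN = "normal", "dirty", "frozen"

class NVBufferCache:
    def __init__(self):
        self.state = {}        # block_no -> NORMAL | DIRTY | FROZEN
        self.working = {}      # block_no -> latest data (serves reads and writes)
        self.committed = {}    # block_no -> frozen version awaiting checkpoint

    def write(self, block_no, data):
        # If the block is frozen, its committed version already lives in
        # self.committed, so updating the working copy never disturbs it.
        self.working[block_no] = data
        self.state[block_no] = DIRTY

    def commit(self):
        # In-place commit: no journal writes; dirty blocks simply become frozen.
        for block_no, st in self.state.items():
            if st == DIRTY:
                self.committed[block_no] = self.working[block_no]
                self.state[block_no] = FROZEN

    def checkpoint(self, storage):
        # Write committed versions to their home locations, then thaw the blocks
        # (blocks re-dirtied since the commit simply stay dirty).
        for block_no, data in self.committed.items():
            storage.write_block(block_no, data)   # hypothetical storage interface
            if self.state.get(block_no) == FROZEN:
                self.state[block_no] = NORMAL
        self.committed.clear()
```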

171 citations


Proceedings ArticleDOI
12 Aug 2013
TL;DR: This paper designs five different hash-routing schemes which efficiently exploit in-network caches without requiring network routers to maintain per-content state information and shows that such schemes can increase cache hits by up to 31% in comparison to on-path caching, with minimal impact on the traffic dynamics of intra-domain links.
Abstract: Hash-routing has been proposed in the past as a mapping mechanism between object requests and cache clusters within enterprise networks. In this paper, we revisit hash-routing techniques and apply them to Information-Centric Networking (ICN) environments, where network routers have cache space readily available. In particular, we investigate whether hash-routing is a viable and efficient caching approach when applied outside enterprise networks, but within the boundaries of a domain. We design five different hash-routing schemes which efficiently exploit in-network caches without requiring network routers to maintain per-content state information. We evaluate the proposed hash-routing schemes using extensive simulations over real Internet domain topologies and compare them against various on-path caching mechanisms. We show that such schemes can increase cache hits by up to 31% in comparison to on-path caching, with minimal impact on the traffic dynamics of intra-domain links.
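The core of hash-routing is that every router in the domain derives the same responsible cache node from the content name alone, so no per-content state is needed. Below is a minimal sketch of a symmetric variant; the SHA-1 hash, modulo mapping over a fixed router list, and dictionary caches are illustrative assumptions, and the paper's five schemes differ in how request and response paths are deflected.

```python
import hashlib

def responsible_cache(content_name, routers):
    """Map a content name to exactly one in-domain cache node, statelessly."""
    digest = hashlib.sha1(content_name.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(routers)
    return routers[index]

def handle_request(content_name, routers, caches, origin_fetch):
    """Deflect the request to the responsible node; cache the response there."""
    node = responsible_cache(content_name, routers)
    cache = caches[node]
    if content_name in cache:              # off-path cache hit inside the domain
        return cache[content_name]
    data = origin_fetch(content_name)      # miss: fetch toward the origin server
    cache[content_name] = data             # response is routed back via the same node
    return data
```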

142 citations


Journal ArticleDOI
TL;DR: This paper focuses on cache pollution attacks, where the adversary's goal is to disrupt cache locality to increase link utilization and cache misses for honest consumers, and illustrates that existing proactive countermeasures are ineffective against realistic adversaries.

139 citations


Proceedings ArticleDOI
01 May 2013
TL;DR: This work focuses on the cache allocation problem, namely how to distribute the cache capacity across routers under a constrained total storage budget for the network, formulates it as a content placement problem, and obtains the exact optimal solution by a two-step method.
Abstract: Content-Centric Networking (CCN) is a promising framework for evolving the current network architecture, advocating ubiquitous in-network caching to enhance content delivery. Consequently, in CCN, each router has storage space to cache frequently requested content. In this work, we focus on the cache allocation problem: namely, how to distribute the cache capacity across routers under a constrained total storage budget for the network. We formulate this problem as a content placement problem and obtain the exact optimal solution by a two-step method. Through simulations, we use this algorithm to investigate the factors that affect the optimal cache allocation in CCN, such as the network topology and the popularity of content. We find that a highly heterogeneous topology tends to put most of the capacity over a few central nodes. On the other hand, heterogeneous content popularity has the opposite effect, by spreading capacity across far more nodes. Using our findings, we make observations on how network operators could best deploy CCN cache capacity.

133 citations


Journal ArticleDOI
TL;DR: In this article, the authors considered the secure caching problem with the additional goal of minimizing information leakage to an external wiretapper and showed that security can be introduced at a negligible cost, particularly for a large number of files and users.
Abstract: Caching is emerging as a vital tool for alleviating the severe capacity crunch in modern content-centric wireless networks. The main idea behind caching is to store parts of popular content in end-users' memory and leverage the locally stored content to reduce peak data rates. By jointly designing content placement and delivery mechanisms, recent works have shown order-wise reduction in transmission rates in contrast to traditional methods. In this work, we consider the secure caching problem with the additional goal of minimizing information leakage to an external wiretapper. The fundamental cache memory vs. transmission rate trade-off for the secure caching problem is characterized. Rather surprisingly, these results show that security can be introduced at a negligible cost, particularly for a large number of files and users. It is also shown that the rate achieved by the proposed caching scheme with secure delivery is within a constant multiplicative factor from the information-theoretic optimal rate for almost all parameter values of practical interest.

125 citations


Proceedings ArticleDOI
09 Jul 2013
TL;DR: A practical OS-level cache management scheme for multi-core real-time systems that provides predictable cache performance, addresses the aforementioned problems of existing software cache partitioning, and efficiently allocates cache partitions to schedule a given task set is proposed.
Abstract: Many modern multi-core processors sport a large shared cache with the primary goal of enhancing the statistical performance of computing workloads. However, due to resulting cache interference among tasks, the uncontrolled use of such a shared cache can significantly hamper the predictability and analyzability of multi-core real-time systems. Software cache partitioning has been considered as an attractive approach to address this issue because it does not require any hardware support beyond that available on many modern processors. However, the state-of-the-art software cache partitioning techniques face two challenges: (1) the memory co-partitioning problem, which results in page swapping or waste of memory, and (2) the availability of a limited number of cache partitions, which causes degraded performance. These are major impediments to the practical adoption of software cache partitioning. In this paper, we propose a practical OS-level cache management scheme for multi-core real-time systems. Our scheme provides predictable cache performance, addresses the aforementioned problems of existing software cache partitioning, and efficiently allocates cache partitions to schedule a given task set. We have implemented and evaluated our scheme in Linux/RK running on the Intel Core i7 quad-core processor. Experimental results indicate that, compared to the traditional approaches, our scheme is up to 39% more memory-space efficient and uses up to 25% fewer cache partitions while maintaining cache predictability. Our scheme also yields a significant utilization benefit that increases with the number of tasks.
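Software cache partitioning of the kind discussed here is typically built on page coloring: the OS controls which physical pages a task gets, and a page's color is determined by the physical-address bits that select the cache set above the page offset. The sketch below illustrates that mapping; the cache geometry and the allocation helper are illustrative assumptions, not the paper's Linux/RK implementation.

```python
# Sketch of OS-level page coloring for a shared last-level cache. Pages of
# different colors index disjoint cache sets and therefore cannot conflict.
# Example geometry (assumed): an 8 MB, 16-way LLC with 4 KB pages -> 128 colors.

CACHE_SIZE = 8 * 1024 * 1024
ASSOCIATIVITY = 16
PAGE_SIZE = 4 * 1024

NUM_COLORS = CACHE_SIZE // (ASSOCIATIVITY * PAGE_SIZE)   # 128 cache colors

def cache_color(phys_addr):
    """Color = low-order bits of the physical page frame number."""
    page_frame = phys_addr // PAGE_SIZE
    return page_frame % NUM_COLORS

def allocate_page(free_pages_by_color, task_colors):
    """Give a task a free page only from its assigned colors, so tasks with
    disjoint color sets cannot evict each other's cache lines."""
    for color in task_colors:
        if free_pages_by_color[color]:
            return free_pages_by_color[color].pop()
    raise MemoryError("no free page in the task's cache partition")
```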

Proceedings ArticleDOI
12 Aug 2013
TL;DR: It is shown via trace-driven simulation, that intra-AS cache cooperation improves the system caching performance and reduces considerably the traffic load on the AS gateway links, which is very appealing from an ISP's perspective.
Abstract: The default caching scheme in CCN results in a high redundancy along the symmetric request-response path, and makes the caching system inefficient. Since it was first proposed, much work has been done to improve the general caching performance of CCN. Most new caching schemes attempt to reduce the on-path redundancy by passing information on content redundancy and popularity between nodes. In this paper, we tackle the problem from a different perspective. Instead of curbing the redundancy through special caching decisions in the beginning, we take an orthogonal approach by pro-actively eliminating redundancy via an independent intra-AS procedure. We propose an intra-AS cache cooperation scheme, to effectively control the redundancy level within the AS and allow neighbour nodes in an AS to collaborate in serving each other's requests. We show via trace-driven simulation that intra-AS cache cooperation improves the system caching performance and reduces considerably the traffic load on the AS gateway links, which is very appealing from an ISP's perspective.

Proceedings ArticleDOI
06 May 2013
TL;DR: A novel cache management algorithm for flash-based disk cache, named Lazy Adaptive Replacement Cache (LARC), which filters out seldom-accessed blocks and prevents them from entering the cache, improving performance and extending SSD lifetime at the same time.
Abstract: The increasing popularity of flash memory has changed storage systems. Flash-based solid state drives (SSD) are now widely deployed as caches for magnetic hard disk drives (HDD) to speed up data intensive applications. However, existing cache algorithms focus exclusively on performance improvements and ignore the write endurance of SSD. In this paper, we propose a novel cache management algorithm for flash-based disk cache, named Lazy Adaptive Replacement Cache (LARC). LARC can filter out seldom accessed blocks and prevent them from entering the cache. This avoids cache pollution and keeps popular blocks in cache for a longer period of time, leading to a higher hit rate. Meanwhile, LARC reduces the number of cache replacements and thus incurs less write traffic to SSD, especially for read-dominant workloads. In this way, LARC improves performance and extends SSD lifetime at the same time. LARC is self-tuning and has low overhead. It has been extensively evaluated by both trace-driven simulations and a prototype implementation in flashcache. Our experiments show that LARC outperforms state-of-the-art algorithms and reduces write traffic to SSD by up to 94.5% for read-dominant workloads and by 11.2-40.8% for write-dominant workloads.
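The key mechanism in LARC is lazy admission through a ghost queue of block IDs: a block is written into the SSD cache only when it is accessed again while its ID is still in the ghost queue, so one-shot blocks never cause SSD writes. The sketch below captures that admission path; the fixed ghost-queue size and plain LRU eviction are simplifications (the paper tunes the ghost size adaptively).

```python
from collections import OrderedDict

class LARC:
    """Sketch of Lazy Adaptive Replacement Cache admission: blocks enter the
    SSD cache only on a second recent access; first accesses only record the
    block ID in a ghost LRU queue."""

    def __init__(self, capacity, ghost_capacity=None):
        self.capacity = capacity
        self.ghost_capacity = ghost_capacity or capacity // 2   # fixed size (assumption)
        self.cache = OrderedDict()   # block_id -> data (resident on SSD)
        self.ghost = OrderedDict()   # block_id -> None (IDs only, no data)

    def access(self, block_id, fetch_from_hdd):
        if block_id in self.cache:                     # SSD hit
            self.cache.move_to_end(block_id)
            return self.cache[block_id]

        data = fetch_from_hdd(block_id)                # miss is served from the HDD
        if block_id in self.ghost:                     # second access: admit to SSD
            del self.ghost[block_id]
            self.cache[block_id] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)         # LRU eviction from the SSD cache
        else:                                          # first access: remember the ID only
            self.ghost[block_id] = None
            self.ghost.move_to_end(block_id)
            if len(self.ghost) > self.ghost_capacity:
                self.ghost.popitem(last=False)
        return data
```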

Patent
05 Dec 2013
TL;DR: In this article, a cache and/or storage module may be configured to reduce write amplification in a cache storage, which may occur due to an over-permissive admission policy, or it may arise due to the write-once properties of the storage medium.
Abstract: A cache and/or storage module may be configured to reduce write amplification in a cache storage. Cache layer write amplification (CLWA) may occur due to an over-permissive admission policy. The cache module may be configured to reduce CLWA by configuring admission policies to avoid unnecessary writes. Admission policies may be predicated on access and/or sequentiality metrics. Flash layer write amplification (FLWA) may arise due to the write-once properties of the storage medium. FLWA may be reduced by delegating cache eviction functionality to the underlying storage layer. The cache and storage layers may be configured to communicate coordination information, which may be leveraged to improve the performance of cache and/or storage operations.

Proceedings ArticleDOI
17 Jun 2013
TL;DR: An intuitive performance model for cache-coherent architectures is developed and used to develop several optimal and optimized algorithms for complex parallel data exchanges that beat the performance of the highly-tuned vendor-specific Intel OpenMP and MPI libraries.
Abstract: Most multi-core and some many-core processors implement cache coherency protocols that heavily complicate the design of optimal parallel algorithms. Communication is performed implicitly by cache line transfers between cores, complicating the understanding of performance properties. We developed an intuitive performance model for cache-coherent architectures and demonstrate its use with the currently most scalable cache-coherent many-core architecture, Intel Xeon Phi. Using our model, we develop several optimal and optimized algorithms for complex parallel data exchanges. All algorithms that were developed with the model beat the performance of the highly-tuned vendor-specific Intel OpenMP and MPI libraries by up to a factor of 4.3. The model can be simplified to satisfy the tradeoff between complexity of algorithm design and accuracy. We expect that our model can serve as a vehicle for advanced algorithm design.

Proceedings ArticleDOI
18 Nov 2013
TL;DR: An efficient compiler framework for cache bypassing on GPUs is proposed and efficient algorithms that judiciously select global load instructions for cache access or bypass are presented.
Abstract: Graphics Processing Units (GPUs) have become ubiquitous for general purpose applications due to their tremendous computing power. Initially, GPUs employed only scratchpad memory as on-chip memory. Though scratchpad memory benefits many applications, it is not ideal for general purpose applications with irregular memory accesses. Hence, GPU vendors have introduced caches in conjunction with scratchpad memory in recent generations of GPUs. The caches on GPUs are highly configurable: the programmer or the compiler can explicitly control cache access or bypass for global load instructions. This configurability opens up opportunities for optimizing cache performance. In this paper, we propose an efficient compiler framework for cache bypassing on GPUs. Our objective is to efficiently utilize the configurable cache and improve the overall performance for general purpose GPU applications. In order to achieve this goal, we first characterize GPU cache utilization and develop performance metrics to estimate cache reuse and memory traffic. Next, we present efficient algorithms that judiciously select global load instructions for cache access or bypass. Finally, we integrate our techniques into an automatic compiler framework that leverages the PTX instruction set architecture. Experimental evaluation demonstrates that, compared to cache-all and bypass-all solutions, our techniques achieve considerable performance improvement.

Proceedings ArticleDOI
Tian Luo1, Siyuan Ma1, Rubao Lee1, Xiaodong Zhang1, Deng Liu2, Li Zhou3 
07 Oct 2013
TL;DR: The design and implementation of S-CAVE, a hypervisor-based SSD caching facility, which effectively manages a storage cache in a Multi-VM environment by collecting and exploiting runtime information from both VMs and storage devices is presented.
Abstract: A unique challenge for SSD storage caching management in a virtual machine (VM) environment is to accomplish the dual objectives: maximizing utilization of shared SSD cache devices and ensuring performance isolation among VMs. In this paper, we present our design and implementation of S-CAVE, a hypervisor-based SSD caching facility, which effectively manages a storage cache in a Multi-VM environment by collecting and exploiting runtime information from both VMs and storage devices. Due to a hypervisor's unique position between VMs and hardware resources, S-CAVE does not require any modification to guest OSes, user applications, or the underlying storage system. A critical issue to address in S-CAVE is how to allocate limited and shared SSD cache space among multiple VMs to achieve the dual goals. This is accomplished in two steps. First, we propose an effective metric to determine the demand for SSD cache space of each VM. Next, by incorporating this cache demand information into a dynamic control mechanism, S-CAVE is able to efficiently provide a fair share of cache space to each VM while achieving the goal of best utilizing the shared SSD cache device. In accordance with the constraints of all the functionalities of a hypervisor, S-CAVE incurs minimum overhead in both memory space and computing time. We have implemented S-CAVE in vSphere ESX, a widely used commercial hypervisor from VMWare. Our extensive experiments have shown its strong effectiveness for various data-intensive applications.

Proceedings ArticleDOI
07 Dec 2013
TL;DR: The Decoupled Compressed Cache (DCC) is proposed, which exploits spatial locality to improve both the performance and energy-efficiency of cache compression and nearly doubles the benefits of previous compressed caches with similar area overhead.
Abstract: In multicore processor systems, last-level caches (LLCs) play a crucial role in reducing system energy by i) filtering out expensive accesses to main memory and ii) reducing the time spent executing in high-power states. Cache compression can increase effective cache capacity and reduce misses, improve performance, and potentially reduce system energy. However, previous compressed cache designs have demonstrated only limited benefits due to internal fragmentation and limited tags. In this paper, we propose the Decoupled Compressed Cache (DCC), which exploits spatial locality to improve both the performance and energy-efficiency of cache compression. DCC uses decoupled super-blocks and non-contiguous sub-block allocation to decrease tag overhead without increasing internal fragmentation. Non-contiguous sub-blocks also eliminate the need for energy-expensive re-compaction when a block's size changes. Compared to earlier compressed caches, DCC increases normalized effective capacity to a maximum of 4 and an average of 2.2 for a wide range of workloads. A further optimized Co-DCC (Co-Compacted DCC) design improves the average normalized effective capacity to 2.6 by co-compacting the compressed blocks in a super-block. Our simulations show that DCC nearly doubles the benefits of previous compressed caches with similar area overhead. We also demonstrate a practical DCC design based on a recent commercial LLC design.

Proceedings ArticleDOI
01 Oct 2013
TL;DR: An analytical model is proposed to evaluate the performance of different caching decision policies in terms of the server-hit rate and expected round-trip time, and it is shown that PopCache yields the lowest expected round-trip time compared with three benchmark caching decision policies.
Abstract: Due to a mismatch between downloading and caching content, the network may not gain significant benefit from the sophisticated in-network caching of information-centric networking (ICN) architectures when using a basic caching mechanism. This paper aims to seek an effective caching decision policy to improve content dissemination in ICN. We propose PopCache, a caching decision policy with respect to content popularity, that allows an individual ICN router to cache content more or less in accordance with the popularity characteristic of the content. We propose an analytical model to evaluate the performance of different caching decision policies in terms of the server-hit rate and expected round-trip time. The analysis, confirmed by simulation results, shows that PopCache yields the lowest expected round-trip time compared with three benchmark caching decision policies (always, fixed probability, and path-capacity-based probability), and that PopCache provides a server-hit rate comparable to the lowest of these.

Proceedings ArticleDOI
14 Apr 2013
TL;DR: In this paper, the authors proposed a distributed and uncoordinated off-path caching architecture to overcome the problem of uncooperative caches in information-centric networks (ICN).
Abstract: Information-centric network (ICN), which is one of the prominent Internet re-design architectures, relies on in-network caching for its fundamental operation. However, previous works argue that the performance of in-network caching is highly degraded with the current cache-along-default-path design, which causes popular objects to be cached redundantly in many places. Thus, it would be beneficial to have a distributed and uncoordinated design. Although cooperative caches could be an answer to this, previous research showed that they are generally infeasible due to excessive signaling burden, protocol complexity, and a need for fault tolerance. In this work we illustrate the ICN caching problem, and propose a novel architecture to overcome the problem of uncooperative caches. Our design possesses the cooperation property intrinsically. We utilize controlled off-path caching to achieve an almost 9-fold increase in cache efficiency, and around a 20% increase in server load reduction when compared to the classic on-path caching used in ICN proposals.

Proceedings ArticleDOI
03 Dec 2013
TL;DR: A coordinated cache and bank coloring scheme that is designed to prevent cache and bank interference simultaneously is presented and implemented in the Linux kernel.
Abstract: In commercial-off-the-shelf (COTS) multi-core systems, the execution times of tasks become hard to predict because of contention on shared resources in the memory hierarchy. In particular, a task running in one processor core can delay the execution of another task running in another processor core. This is due to the fact that tasks can access data in the same cache set shared among processor cores or in the same memory bank in the DRAM memory (or both). Such cache and bank interference effects have motivated the need to create isolation mechanisms for resources accessed by more than one task. One popular isolation mechanism is cache coloring that divides the cache into multiple partitions. With cache coloring, each task can be assigned exclusive cache partitions, thereby preventing cache interference from other tasks. Similarly, bank coloring allows assigning exclusive bank partitions to tasks. While cache coloring and some bank coloring mechanisms have been studied separately, interactions between the two schemes have not been studied. Specifically, while memory accesses to two different bank colors do not interfere with each other at the bank level, they may interact at the cache level. Similarly, two different cache colors avoid cache interference but may not prevent bank interference. Therefore it is necessary to coordinate cache and bank coloring approaches. In this paper, we present a coordinated cache and bank coloring scheme that is designed to prevent cache and bank interference simultaneously. We also developed color allocation algorithms for configuring a virtual memory system to support our scheme which has been implemented in the Linux kernel. In our experiments, we observed that the execution time can increase by 60% due to inter-task interference when we use only cache coloring. Our coordinated approach can reduce this figure down to 12% (an 80% reduction).
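The coordination problem arises because cache color and bank color are derived from different physical-address bits, so an allocator has to satisfy both constraints at once. The sketch below illustrates one way to express that; the bit layout and color counts are illustrative assumptions, since real DRAM bank-address mappings are platform specific and differ from this simplified model.

```python
# Sketch of coordinated cache-and-bank coloring: each physical page carries a
# (cache_color, bank_color) pair, and the allocator only hands a task pages
# matching both of its assigned color sets.

PAGE_SIZE = 4 * 1024
NUM_CACHE_COLORS = 32      # from LLC set-index bits above the page offset (assumed)
NUM_BANK_COLORS = 16       # from DRAM bank-address bits above the page offset (assumed)

def page_colors(phys_addr):
    pfn = phys_addr // PAGE_SIZE
    cache_color = pfn % NUM_CACHE_COLORS
    bank_color = (pfn // NUM_CACHE_COLORS) % NUM_BANK_COLORS
    return cache_color, bank_color

def allocate_for_task(free_pages, task_cache_colors, task_bank_colors):
    """Pick a free page that avoids both cache-set and bank sharing with other tasks."""
    for addr in free_pages:
        c, b = page_colors(addr)
        if c in task_cache_colors and b in task_bank_colors:
            free_pages.remove(addr)
            return addr
    raise MemoryError("no page satisfies both color constraints")
```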

Patent
26 Jun 2013
TL;DR: In this article, the authors present a computer implemented method, system, and computer program product for cache management comprising recording metadata of IO sent from the server to a storage array, calculating a distribution of a server cache based on metadata, receiving an IO directed to the storage array and revising an allocation of the server cache to the plurality of storage mediums based on the calculated distribution and the IO.
Abstract: A computer implemented method, system, and computer program product for cache management comprising recording metadata of IO sent from the server to a storage array, calculating a distribution of a server cache based on the metadata, receiving an IO directed to the storage array, and revising an allocation of the server cache to the plurality of storage mediums based on the calculated distribution and the IO.

Proceedings ArticleDOI
07 Jul 2013
TL;DR: This paper proposes a novel caching approach that can achieve a significantly larger reduction in peak rate compared to previously known caching schemes, and argues that the performance of the proposed scheme is within a constant factor from the information-theoretic optimum for all values of the problem parameters.
Abstract: Caching is a technique to reduce peak traffic rates by prefetching popular content in memories at the end users. This paper proposes a novel caching approach that can achieve a significantly larger reduction in peak rate compared to previously known caching schemes. In particular, the improvement can be on the order of the number of end users in the network. Conventionally, cache memories are exploited by delivering requested contents in part locally rather than through the network. The gain offered by this approach, which we term local caching gain, depends on the local cache size (i.e., the cache available at each individual user). In this paper, we introduce and exploit a second, global, caching gain, which is not utilized by conventional caching schemes. This gain depends on the aggregate global cache size (i.e., the cumulative cache available at all users), even though there is no cooperation among the caches. To evaluate and isolate these two gains, we introduce a new, information-theoretic formulation of the caching problem focusing on its basic structure. For this setting, the proposed scheme exploits both local and global caching gains, leading to a multiplicative improvement in the peak rate compared to previously known schemes. Moreover, we argue that the performance of the proposed scheme is within a constant factor from the information-theoretic optimum for all values of the problem parameters.
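For reference, the peak delivery rate achieved by this style of coded caching, for N files and K users each caching M files' worth of content, is commonly written as the product of the uncoded rate with the local caching gain and a global caching gain (stated here from the coded-caching literature; see the paper for the precise formulation and the constant-factor optimality argument):

```latex
R(M) \;=\; \underbrace{K\left(1-\frac{M}{N}\right)}_{\text{uncoded rate}\,\times\,\text{local gain}}
\cdot
\underbrace{\frac{1}{1+\frac{KM}{N}}}_{\text{global caching gain}},
\qquad M \in \left\{0,\ \tfrac{N}{K},\ \tfrac{2N}{K},\ \dots,\ N\right\}.
```

The global gain factor shrinks with the aggregate cache size KM, which is why the improvement over conventional schemes can be on the order of the number of users.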

Proceedings ArticleDOI
01 Sep 2013
TL;DR: Key microarchitectural features of mobile computing platforms that are crucial to the performance of smart phone applications are explored to guide the design of future smart phone platforms toward lower power consumption through simpler architectures while achieving high performance.
Abstract: In this paper, we explore key microarchitectural features of mobile computing platforms that are crucial to the performance of smart phone applications. We create and use a selection of representative smart phone applications, which we call MobileBench, to aid in this analysis. We also evaluate the effectiveness of the current memory subsystem on mobile platforms. Furthermore, by instrumenting the Android framework, we perform energy characterization for MobileBench on an existing Samsung Galaxy S III smart phone. Based on our energy analysis, we find that application cores on modern smart phones consume a significant amount of energy. This motivates our detailed performance analysis centered on the application cores. Based on our detailed performance studies, we reach several key findings. (i) Using a more sophisticated tournament branch predictor can improve branch prediction accuracy, but this does not translate into observable performance gain. (ii) Smart phone applications show distinct TLB capacity needs. Larger TLBs can improve performance by an avg. of 14%. (iii) The current L2 cache on most smart phone platforms experiences poor utilization because of the fast-changing memory requirements of smart phone applications. Using a more effective cache management scheme improves L2 cache utilization by as much as 29.3% and by an avg. of 12%. (iv) Smart phone applications are prefetching-friendly. Using a simple stride prefetcher can improve performance across MobileBench applications by an avg. of 14%. (v) Lastly, the memory bandwidth requirements of MobileBench applications are moderate and well under the current smart phone memory bandwidth capacity of 8.3 GB/s. With these insights into smart phone application characteristics, we hope to guide the design of future smart phone platforms toward lower power consumption through simpler architectures while achieving high performance.

Proceedings ArticleDOI
Xiaoyan Zhu1, Haotian Chi1, Ben Niu1, Weidong Zhang1, Zan Li1, Hui Li1 
01 Dec 2013
TL;DR: A novel collaborative system, MobiCache, which combines k-anonymity with caching to protect users' location privacy while improving the cache hit ratio, and an enhanced-DSA to further improve user privacy as well as the cache hit ratio.
Abstract: Location-Based Services (LBSs) are becoming increasingly popular in our daily life. In some scenarios, multiple users may seek data of the same interest from an LBS server simultaneously or one by one, and they may need to provide their exact locations to the untrusted LBS server in order to enjoy such a location-based service. Unfortunately, this breaches users' location privacy and security. To address this problem, we propose a novel collaborative system, MobiCache, which combines k-anonymity with caching to protect users' location privacy while improving the cache hit ratio. Different from traditional k-anonymity, our Dummy Selection Algorithm (DSA) chooses dummy locations which have not been queried before to increase the cache hit ratio. We also propose an enhanced-DSA to further improve the user's privacy as well as the cache hit ratio by assigning dummy locations which can make more contributions to the cache hit ratio. Evaluation results show that the proposed DSA can increase the cache hit ratio and the enhanced-DSA can further improve the cache hit ratio as well as the user's privacy.
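The dummy-selection idea can be sketched as a simple preference rule: when building a k-anonymous query, prefer dummy locations whose answers are not yet cached, so that the responses returned for the dummies populate the cache for later users. The snippet below is a minimal sketch of that rule; the candidate-cell model and the uniform random choice are assumptions for illustration, not the paper's DSA or enhanced-DSA.

```python
import random

def select_dummies(real_cell, candidate_cells, cached_cells, k):
    """Build a k-anonymous query that favors previously unqueried (uncached) cells."""
    uncached = [c for c in candidate_cells if c not in cached_cells and c != real_cell]
    cached = [c for c in candidate_cells if c in cached_cells and c != real_cell]
    random.shuffle(uncached)
    random.shuffle(cached)
    dummies = (uncached + cached)[: k - 1]   # fall back to cached cells if needed
    query = dummies + [real_cell]
    random.shuffle(query)                    # hide which of the k cells is the real one
    return query

# Example: select_dummies("cell_17", [f"cell_{i}" for i in range(100)], {"cell_3"}, k=5)
```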

Proceedings Article
26 Jun 2013
TL;DR: It is found that the chief benefit of the flash cache is its size, not its persistence, and for some workloads a large flash cache allows using minuscule amounts of RAM for file caching, leaving more memory available for application use.
Abstract: Flash memory has recently become popular as a caching medium. Most uses to date are on the storage server side. We investigate a different structure: flash as a cache on the client side of a networked storage environment. We use trace-driven simulation to explore the design space. We consider a wide range of configurations and policies to determine the potential that client-side caches might offer and how best to arrange them. Our results show that the flash cache writeback policy does not significantly affect performance. Write-through is sufficient; this greatly simplifies cache consistency handling. We also find that the chief benefit of the flash cache is its size, not its persistence. Cache persistence offers additional performance benefits at system restart at essentially no runtime cost. Finally, for some workloads a large flash cache allows using minuscule amounts of RAM for file caching (e.g., 256 KB), leaving more memory available for application use.

Proceedings ArticleDOI
01 Oct 2013
TL;DR: In-memory object caches, such as memcached, are critical to the success of popular web sites, by reducing database load and improving scalability, but unfortunately cache configuration is poorly understood.
Abstract: Large-scale in-memory object caches such as memcached are widely used to accelerate popular web sites and to reduce burden on backend databases. Yet current cache systems give cache operators limited information on what resources are required to optimally accommodate the present workload. This paper focuses on a key question for cache operators: how much total memory should be allocated to the in-memory cache tier to achieve desired performance? We present our Mimir system: a lightweight online profiler that hooks into the replacement policy of each cache server and produces graphs of the overall cache hit rate as a function of memory size. The profiler enables cache operators to dynamically project the cost and performance impact from adding or removing memory resources within a distributed in-memory cache, allowing "what-if" questions about cache performance to be answered without laborious offline tuning. Internally, Mimir uses a novel lock-free algorithm and lookup filters for quickly and dynamically estimating the hit rate of LRU caches. Running Mimir as a profiler requires minimal changes to the cache server, thanks to a lean API. Our experiments show that Mimir produces dynamic hit rate curves with over 98% accuracy and 2-5% overhead on request latency and throughput when Mimir is run in tandem with memcached, suggesting online cache profiling can be a practical tool for improving provisioning of large caches.
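Hit-rate-versus-size curves for LRU caches of the kind Mimir produces can, in principle, be derived from stack (reuse) distances: a reference hits in an LRU cache of size S exactly when its stack distance is at most S. The sketch below shows the classical exact computation for illustration only; Mimir itself replaces this kind of per-reference list scan with a lock-free, bucketed estimator as described above.

```python
from collections import Counter

def hit_rate_curve(trace, sizes):
    """Exact LRU hit-rate curve from stack distances (illustrative, not Mimir's algorithm)."""
    stack = []                       # LRU stack: most recently used key at the end
    dist_hist = Counter()            # stack-distance histogram
    total = 0
    for key in trace:
        total += 1
        if key in stack:
            depth = len(stack) - stack.index(key)   # 1 = most recently used
            dist_hist[depth] += 1
            stack.remove(key)
        stack.append(key)
    # An LRU cache holding `size` items hits every reference whose stack
    # distance is at most `size`.
    return {size: sum(cnt for d, cnt in dist_hist.items() if d <= size) / total
            for size in sizes}

# Example: hit_rate_curve(["a", "b", "a", "c", "b", "a"], sizes=[1, 2, 3])
```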

Proceedings ArticleDOI
30 Jun 2013
TL;DR: This paper analyzes the added write pressures that cache workloads place on flash devices and proposes optimizations at both the cache and flash management layers to improve endurance while maintaining or increasing cache hit rate.
Abstract: Flash memory is widely used for its fast random I/O access performance in a gamut of enterprise storage applications. However, due to the limited endurance and asymmetric write performance of flash memory, minimizing writes to a flash device is critical for both performance and endurance. Previous studies have focused on flash memory as a candidate for primary storage devices; little is known about its behavior as a Solid State Cache (SSC) device. In this paper, we propose HEC, a High Endurance Cache that aims to improve overall device endurance via reduced media writes and erases while maximizing cache hit rate performance. We analyze the added write pressures that cache workloads place on flash devices and propose optimizations at both the cache and flash management layers to improve endurance while maintaining or increasing cache hit rate. We demonstrate the individual and cumulative contributions of cache admission policy, cache eviction policy, flash garbage collection policy, and flash device configuration on a) hit rate, b) overall writes, and c) erases as seen by the SSC device. Through our improved cache and flash optimizations, 83% of the analyzed workload ensembles achieved increased or maintained hit rate with write reductions up to 20x, and erase count reductions up to 6x.

Proceedings ArticleDOI
28 Jun 2013
TL;DR: This paper investigates the current state of side-channel vulnerabilities involving the CPU cache, and identifies the shortcomings of traditional defenses in a Cloud environment, and develops a mitigation technique applicable for Cloud security.
Abstract: As Cloud services become more commonplace, recent work has uncovered vulnerabilities unique to Cloud systems. Specifically, the paradigm promotes a risk of information leakage across virtual machine isolation via side-channels. In this paper, we investigate the current state of side-channel vulnerabilities involving the CPU cache, and identify the shortcomings of traditional defenses in a Cloud environment. We explore why solutions to non-Cloud cache-based side-channels cease to work in Cloud environments, and develop a mitigation technique applicable for Cloud security. Applying this solution to a canonical Cloud environment, we demonstrate the validity of this Cloud-specific, cache-based side-channel mitigation technique. Furthermore, we show that it can be implemented as a server-side approach to improve security without inconveniencing the client. Finally, we conduct a comparison of our solution to the current state-of-the-art.

Proceedings ArticleDOI
17 Jun 2013
TL;DR: This framework unifies existing cache miss rate prediction techniques such as Smith's associativity model, Poisson variants, and hardware way-counter based schemes and shows how to adapt LRU way-counters to work when the number of sets in the cache changes.
Abstract: We develop a reuse distance/stack distance based analytical modeling framework for efficient, online prediction of cache performance for a range of cache configurations and replacement policies (LRU, PLRU, RANDOM, NMRU). Our framework unifies existing cache miss rate prediction techniques such as Smith's associativity model, Poisson variants, and hardware way-counter based schemes. We also show how to adapt LRU way-counters to work when the number of sets in the cache changes. As an example application, we demonstrate how results from our models can be used to select, based on workload access characteristics, last-level cache configurations that aim to minimize energy-delay product.