
Showing papers on "Cache" published in 2016


Proceedings Article
15 Feb 2016
TL;DR: Deep Compression as mentioned in this paper proposes a three-stage pipeline: pruning, quantization, and Huffman coding to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing; finally, we apply Huffman coding. After the first two steps we retrain the network to fine-tune the remaining connections and the quantized centroids. Pruning reduces the number of connections by 9x to 13x; quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49x, from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU and mobile GPU, the compressed network has 3x to 4x layerwise speedup and 3x to 7x better energy efficiency.
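
The first two pipeline stages described above can be illustrated with a short sketch: magnitude pruning followed by k-means weight sharing into 2^5 = 32 centroids (the 5-bit setting mentioned in the abstract). This is a minimal NumPy illustration, not the authors' code; the sparsity level and iteration count are arbitrary, and the Huffman-coding stage (compressing the resulting index stream) is omitted.

```python
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Zero out the smallest-magnitude weights (illustrative threshold pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

def quantize_shared_weights(weights: np.ndarray, bits: int = 5, iters: int = 20):
    """Cluster the non-zero weights into 2**bits shared centroids (weight sharing).
    Returns the centroid table and an index matrix referencing it."""
    nz = weights[weights != 0]
    # Linear centroid initialization over the weight range.
    centroids = np.linspace(nz.min(), nz.max(), 2 ** bits)
    for _ in range(iters):  # a few Lloyd iterations
        assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for k in range(len(centroids)):
            if np.any(assign == k):
                centroids[k] = nz[assign == k].mean()
    # Map every weight position to its nearest centroid index.
    idx = np.argmin(np.abs(weights[..., None] - centroids), axis=-1)
    return centroids, idx

W = np.random.randn(256, 256).astype(np.float32)
W_pruned = prune_by_magnitude(W, sparsity=0.9)       # far fewer connections
codebook, codes = quantize_shared_weights(W_pruned)  # 5-bit indices instead of 32-bit floats
W_reconstructed = codebook[codes] * (W_pruned != 0)  # what inference would use
```

In the full method, the surviving connections and the centroids would then be fine-tuned by retraining before the index stream is Huffman coded.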

7,256 citations


Journal ArticleDOI
TL;DR: This paper presents a content-centric transmission design in a cloud radio access network by incorporating multicasting and caching, and reformulates an equivalent sparse multicast beamforming (SBF) problem, transformed into the difference of convex programs and effectively solved using the convex-concave procedure algorithms.
Abstract: This paper presents a content-centric transmission design in a cloud radio access network by incorporating multicasting and caching. Users requesting the same content form a multicast group and are served by the same cluster of base stations (BSs) cooperatively. Each BS has a local cache, and it acquires the requested contents either from its local cache or from the central processor via backhaul links. We investigate the dynamic content-centric BS clustering and multicast beamforming with respect to both channel condition and caching status. We first formulate a mixed-integer nonlinear programming problem of minimizing the weighted sum of backhaul cost and transmit power under the quality-of-service constraint for each multicast group. Theoretical analysis reveals that all the BSs caching a requested content can be included in the BS cluster of this content, regardless of the channel conditions. Then, we reformulate an equivalent sparse multicast beamforming (SBF) problem. By adopting smoothed $\ell_0$-norm approximation and other techniques, the SBF problem is transformed into a difference of convex programs and effectively solved using the convex-concave procedure algorithms. Simulation results demonstrate the significant advantage of the proposed content-centric transmission. The effects of heuristic caching strategies are also evaluated.

468 citations


Book ChapterDOI
07 Jul 2016
TL;DR: The Flush+Flush attack as mentioned in this paper relies only on the execution time of the flush instruction, which depends on whether data is cached or not; because it makes no memory accesses, it causes no cache misses and evades miss-based detection.
Abstract: Research on cache attacks has shown that CPU caches leak significant information. Proposed detection mechanisms assume that all cache attacks cause more cache hits and cache misses than benign applications and use hardware performance counters for detection. In this article, we show that this assumption does not hold by developing a novel attack technique: the Flush+Flush attack. The Flush+Flush attack only relies on the execution time of the flush instruction, which depends on whether data is cached or not. Flush+Flush does not make any memory accesses, contrary to any other cache attack. Thus, it causes no cache misses at all and the number of cache hits is reduced to a minimum due to the constant cache flushes. Therefore, Flush+Flush attacks are stealthy, i.e., the spy process cannot be detected based on cache hits and misses, nor by state-of-the-art detection mechanisms. The Flush+Flush attack runs at a higher frequency and thus is faster than any existing cache attack. With 496 KB/s in a cross-core covert channel it is 6.7 times faster than any previously published cache covert channel.
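
The attack itself requires native access to clflush and a cycle counter, so it cannot be expressed in a high-level sketch; the snippet below only models the receiver's decision rule in a Flush+Flush covert channel, classifying flush latencies against a calibrated threshold. The latency distributions are synthetic placeholders, not measurements from any CPU.

```python
import random

# Synthetic flush-latency model (cycles). On real hardware these values would be
# measured with a cycle counter around a clflush of a shared cache line; flushing
# a line that is currently cached takes longer than flushing an uncached one.
def flush_latency(line_is_cached: bool) -> float:
    return random.gauss(170, 6) if line_is_cached else random.gauss(140, 6)

def calibrate_threshold(samples: int = 10_000) -> float:
    cached = sorted(flush_latency(True) for _ in range(samples))
    uncached = sorted(flush_latency(False) for _ in range(samples))
    # Place the threshold between the tails of the two distributions.
    return (cached[samples // 100] + uncached[-samples // 100]) / 2

def receive_bits(latencies, threshold):
    # Sender transmits '1' by touching the shared line (cached -> slow flush),
    # '0' by leaving it uncached (fast flush). The receiver only ever flushes.
    return [1 if t > threshold else 0 for t in latencies]

thr = calibrate_threshold()
observed = [flush_latency(bit == 1) for bit in (1, 0, 1, 1, 0)]
print(receive_bits(observed, thr))   # -> [1, 0, 1, 1, 0] with high probability
```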

416 citations


Journal ArticleDOI
TL;DR: Preliminary efforts on developing and optimizing applications on the TaihuLight system are reported, focusing on key application domains, such as earth system modeling, ocean surface wave modeling, atomistic simulation, and phase-field simulation.
Abstract: The Sunway TaihuLight supercomputer is the world's first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xeon Phi), the computing power of TaihuLight is provided by a homegrown many-core SW26010 CPU that includes both the management processing elements (MPEs) and computing processing elements (CPEs) in one chip. With 260 processing elements in one CPU, a single SW26010 provides a peak performance of over three TFlops. To alleviate the memory bandwidth bottleneck in most applications, each CPE comes with a scratch pad memory, which serves as a user-controlled cache. To support the parallelization of programs on the new many-core architecture, in addition to the basic C/C++ and Fortran compilers, the system provides a customized Sunway OpenACC tool that supports the OpenACC 2.0 syntax. This paper also reports our preliminary efforts on developing and optimizing applications on the TaihuLight system, focusing on key application domains, such as earth system modeling, ocean surface wave modeling, atomistic simulation, and phase-field simulation.

394 citations


Proceedings ArticleDOI
12 Mar 2016
TL;DR: CATalyst, a pseudo-locking mechanism which uses CAT to partition the LLC into a hybrid hardware-software managed cache, is presented, and it is shown that LLC side channel attacks can be defeated.
Abstract: Cache side channel attacks are serious threats to multi-tenant public cloud platforms. Past work showed how secret information in one virtual machine (VM) can be extracted by another co-resident VM using such attacks. Recent research demonstrated the feasibility of high-bandwidth, low-noise side channel attacks on the last-level cache (LLC), which is shared by all the cores in the processor package, enabling attacks even when VMs are scheduled on different cores. This paper shows how such LLC side channel attacks can be defeated using a performance optimization feature recently introduced in commodity processors. Since most cloud servers use Intel processors, we show how the Intel Cache Allocation Technology (CAT) can be used to provide a system-level protection mechanism to defend from side channel attacks on the shared LLC. CAT is a way-based hardware cache-partitioning mechanism for enforcing quality-of-service with respect to LLC occupancy. However, it cannot be directly used to defeat cache side channel attacks due to the very limited number of partitions it provides. We present CATalyst, a pseudo-locking mechanism which uses CAT to partition the LLC into a hybrid hardware-software managed cache. We implement a proof-of-concept system using Xen and Linux running on a server with Intel processors, and show that LLC side channel attacks can be defeated. Furthermore, CATalyst only causes very small performance overhead when used for security, and has negligible impact on legacy applications.

360 citations


Book ChapterDOI
07 Jul 2016
TL;DR: This work shows that caches can be forced into fast cache eviction to trigger the Rowhammer bug with only regular memory accesses, and demonstrates a fully automated attack that requires nothing but a website with JavaScript to trigger faults on remote hardware.
Abstract: A fundamental assumption in software security is that a memory location can only be modified by processes that may write to this memory location. However, a recent study has shown that parasitic effects in DRAM can change the content of a memory cell without accessing it, but by accessing other memory locations in a high frequency. This so-called Rowhammer bug occurs in most of today's memory modules and has fatal consequences for the security of all affected systems, e.g., privilege escalation attacks. All studies and attacks related to Rowhammer so far rely on the availability of a cache flush instruction in order to cause accesses to DRAM modules at a sufficiently high frequency. We overcome this limitation by defeating complex cache replacement policies. We show that caches can be forced into fast cache eviction to trigger the Rowhammer bug with only regular memory accesses. This makes it possible to trigger the Rowhammer bug in highly restricted and even scripting environments. We demonstrate a fully automated attack that requires nothing but a website with JavaScript to trigger faults on remote hardware. Thereby, we can gain unrestricted access to the systems of website visitors. We show that the attack works on off-the-shelf systems. Existing countermeasures fail to protect against this new Rowhammer attack.

296 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed and analyzed cache-based content delivery in a three-tier heterogeneous network (HetNet), where base stations (BSs), relays, and device-to-device (D2D) pairs are included.
Abstract: Caching popular multimedia content is a promising way to unleash the ultimate potential of wireless networks. In this paper, we propose and analyze cache-based content delivery in a three-tier heterogeneous network (HetNet), where base stations (BSs), relays, and device-to-device (D2D) pairs are included. We advocate proactively caching popular content in the relays and parts of the users with caching ability when the network is off-peak. The cached content can be reused for frequent access to offload the cellular network traffic. The node locations are first modeled as mutually independent Poisson point processes (PPPs) and the corresponding content access protocol is developed. The average ergodic rate and outage probability in the downlink are then analyzed theoretically. We further derive the throughput and the delay based on the multiclass processor-sharing queue model and the continuous-time Markov process. According to the critical condition of the steady state in the HetNet, the maximum traffic load and the global throughput gain are investigated. Moreover, impacts of some key network characteristics, e.g., the heterogeneity of multimedia contents, node densities, and the limited caching capacities, on the system performance are elaborated on to provide valuable insight.
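
As context for the system model above, the sketch below samples homogeneous Poisson point processes for two of the node tiers inside a circular region; the densities and radius are arbitrary illustration values, not the paper's parameters.

```python
import numpy as np

def sample_ppp(density: float, radius: float, rng: np.random.Generator):
    """Draw node locations of a homogeneous PPP with the given density
    (nodes per unit area) inside a disk of the given radius."""
    area = np.pi * radius ** 2
    n = rng.poisson(density * area)            # node count is Poisson-distributed
    r = radius * np.sqrt(rng.uniform(size=n))  # sqrt gives uniform density over the disk
    theta = rng.uniform(0, 2 * np.pi, size=n)
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))

rng = np.random.default_rng(0)
bs = sample_ppp(density=1e-5, radius=1000.0, rng=rng)      # base stations
relays = sample_ppp(density=5e-5, radius=1000.0, rng=rng)  # cache-enabled relays
print(len(bs), len(relays))
```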

293 citations


Posted Content
TL;DR: This article proposes an extension to neural network language models to adapt their prediction to the recent history, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation.
Abstract: We propose an extension to neural network language models to adapt their prediction to the recent history. Our model is a simplified version of memory augmented networks, which stores past hidden activations as memory and accesses them through a dot product with the current hidden activation. This mechanism is very efficient and scales to very large memory sizes. We also draw a link between the use of external memory in neural networks and cache models used with count-based language models. We demonstrate on several language model datasets that our approach performs significantly better than recent memory augmented networks.
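
The cache mechanism described above admits a compact sketch: store past hidden activations together with the words that followed them, score each stored slot by a dot product with the current hidden state, and interpolate the resulting distribution with the base model. The temperature `theta` and mixing weight `lam` below are illustrative hyperparameters, not values from the paper.

```python
import numpy as np

def cache_distribution(h_t, past_hiddens, past_words, vocab_size, theta=0.3):
    """Cache distribution: weight each stored (h_i, w_i) pair by
    exp(theta * h_t . h_i) and accumulate the mass on the word w_i."""
    scores = np.exp(theta * past_hiddens @ h_t)   # one score per memory slot
    p_cache = np.zeros(vocab_size)
    np.add.at(p_cache, past_words, scores)        # scatter-add onto word ids
    return p_cache / p_cache.sum()

def mix(p_model, p_cache, lam=0.2):
    """Interpolate the base LM distribution with the cache distribution."""
    return (1 - lam) * p_model + lam * p_cache

# Toy example: 4 memory slots, hidden size 8, vocabulary of 10 words.
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))   # past hidden activations
w = np.array([3, 7, 3, 1])        # words that followed them
h_t = rng.standard_normal(8)      # current hidden activation
p_model = np.full(10, 0.1)        # uniform base LM for illustration
print(mix(p_model, cache_distribution(h_t, H, w, 10)))
```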

264 citations


Journal ArticleDOI
TL;DR: This work proposes an online coded caching scheme termed coded least-recently sent (LRS) and simulates it for a demand time series derived from the dataset made available by Netflix for the Netflix Prize, showing that the proposed coded LRS algorithm significantly outperforms the popular least-recently used caching algorithm.
Abstract: We consider a basic content distribution scenario consisting of a single origin server connected through a shared bottleneck link to a number of users each equipped with a cache of finite memory. The users issue a sequence of content requests from a set of popular files, and the goal is to operate the caches as well as the server such that these requests are satisfied with the minimum number of bits sent over the shared link. Assuming a basic Markov model for renewing the set of popular files, we characterize approximately the optimal long-term average rate of the shared link. We further prove that the optimal online scheme has approximately the same performance as the optimal offline scheme, in which the cache contents can be updated based on the entire set of popular files before each new request. To support these theoretical results, we propose an online coded caching scheme termed coded least-recently sent (LRS) and simulate it for a demand time series derived from the dataset made available by Netflix for the Netflix Prize. For this time series, we show that the proposed coded LRS algorithm significantly outperforms the popular least-recently used caching algorithm.

249 citations


Journal ArticleDOI
TL;DR: It is shown that the multicast-aware caching problem is NP-hard, and solutions with performance guarantees are developed using randomized-rounding techniques; trace-driven results show that, in the presence of massive demand for delay-tolerant content, combining caching and multicast can indeed reduce energy costs.
Abstract: The landscape toward 5G wireless communication is currently unclear, and, despite the efforts of academia and industry in evolving traditional cellular networks, the enabling technology for 5G is still obscure. This paper puts forward a network paradigm toward next-generation cellular networks, targeting to satisfy the explosive demand for mobile data while minimizing energy expenditures. The paradigm builds on two principles, namely caching and multicast. On one hand, caching policies disperse popular content files at the wireless edge, e.g., pico-cells and femto-cells, hence shortening the distance between content and requester. On the other hand, due to the broadcast nature of the wireless medium, requests for identical files occurring at nearby times are aggregated and served through a common multicast stream. To better exploit the available cache space, caching policies are optimized based on multicast transmissions. We show that the multicast-aware caching problem is NP-hard and develop solutions with performance guarantees using randomized-rounding techniques. Trace-driven numerical results show that in the presence of massive demand for delay-tolerant content, combining caching and multicast can indeed reduce energy costs. The gains over existing caching schemes are 19% when users tolerate a delay of three minutes, increasing further with the steepness of the content access pattern.

241 citations


Proceedings ArticleDOI
10 Aug 2016
TL;DR: This work demonstrates how to solve key challenges to perform the most powerful cross-core cache attacks Prime+Probe, Flush+Reload, Evict+Reload, and Flush+Flush on non-rooted ARM-based devices without any privileges.
Abstract: In the last 10 years, cache attacks on Intel x86 CPUs have gained increasing attention among the scientific community and powerful techniques to exploit cache side channels have been developed. However, modern smartphones use one or more multi-core ARM CPUs that have a different cache organization and instruction set than Intel x86 CPUs. So far, no cross-core cache attacks have been demonstrated on non-rooted Android smartphones. In this work, we demonstrate how to solve key challenges to perform the most powerful cross-core cache attacks Prime+Probe, Flush+Reload, Evict+Reload, and Flush+Flush on non-rooted ARM-based devices without any privileges. Based on our techniques, we demonstrate covert channels that outperform state-of-the-art covert channels on Android by several orders of magnitude. Moreover, we present attacks to monitor tap and swipe events as well as keystrokes, and even derive the lengths of words entered on the touchscreen. Eventually, we are the first to attack cryptographic primitives implemented in Java. Our attacks work across CPUs and can even monitor cache activity in the ARM TrustZone from the normal world. The techniques we present can be used to attack hundreds of millions of Android devices.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: The In-Memory PoInter Chasing Accelerator (IMPICA) leverages the logic layer within 3D-stacked memory for linked data structure traversal, addressing the key challenges of achieving high parallelism in the presence of serial accesses in pointer chasing and of effectively performing virtual-to-physical address translation on the memory side without requiring expensive accesses to the CPU's memory management unit.
Abstract: Pointer chasing is a fundamental operation, used by many important data-intensive applications (e.g., databases, key-value stores, graph processing workloads) to traverse linked data structures. This operation is both memory bound and latency sensitive, as it (1) exhibits irregular access patterns that cause frequent cache and TLB misses, and (2) requires the data from every memory access to be sent back to the CPU to determine the next pointer to access. Our goal is to accelerate pointer chasing by performing it inside main memory, thereby avoiding inefficient and high-latency data transfers between main memory and the CPU. To this end, we propose the In-Memory PoInter Chasing Accelerator (IMPICA), which leverages the logic layer within 3D-stacked memory for linked data structure traversal.

Book ChapterDOI
19 Sep 2016
TL;DR: This work presents CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems, designed as a lightweight patch to existing cloud systems that requires no new hardware support and no hypervisor, operating system, or application modifications.
Abstract: We present CloudRadar, a system to detect, and hence mitigate, cache-based side-channel attacks in multi-tenant cloud systems. CloudRadar operates by correlating two events: first, it exploits signature-based detection to identify when the protected virtual machine (VM) executes a cryptographic application; at the same time, it uses anomaly-based detection techniques to monitor the co-located VMs to identify abnormal cache behaviors that are typical during cache-based side-channel attacks. We show that correlation in the occurrence of these two events offers strong evidence of side-channel attacks. Compared to other work on side-channel defenses, CloudRadar has the following advantages: first, CloudRadar focuses on the root causes of cache-based side-channel attacks and hence is hard to evade using metamorphic attack code, while maintaining a low false positive rate. Second, CloudRadar is designed as a lightweight patch to existing cloud systems, which does not require new hardware support or any hypervisor, operating system, or application modifications. Third, CloudRadar provides real-time protection and can detect side-channel attacks within the order of milliseconds. We demonstrate a prototype implementation of CloudRadar in the OpenStack cloud framework. Our evaluation suggests CloudRadar achieves negligible performance overhead with high detection accuracy.

Journal ArticleDOI
TL;DR: Tractable expressions for both effective capacity and energy efficiency performance are derived and show that the proposed cluster content caching structure can improve QoS guarantees with a lower cost of local storage.
Abstract: In cloud radio access networks (C-RANs), a substantial amount of data must be exchanged in both backhaul and fronthaul links, which causes high power consumption and poor quality of service (QoS) experience for real-time services. To solve this problem, a cluster content caching structure is proposed in this paper, which takes full advantage of distributed caching and centralized signal processing. In particular, redundant traffic on the backhaul can be reduced because the cluster content cache provides a part of the required content objects for remote radio heads (RRHs) connected to a common edge cloud. Tractable expressions for both effective capacity and energy efficiency performance are derived, which show that the proposed structure can improve QoS guarantees with a lower cost of local storage. Furthermore, to fully explore the potential of the proposed cluster content caching structure, the joint design of resource allocation and RRH association is optimized, and two distributed algorithms are accordingly proposed. Simulation results verify the accuracy of the analytical results and show the performance gains achieved by cluster content caching in C-RANs.

Proceedings Article
01 Jan 2016
TL;DR: In this article, the undocumented mapping of memory addresses to DRAM channels, ranks, and banks is reverse engineered and used to build DRAMA attacks, a new class of attacks that exploit the DRAM row buffer shared even across processors.
Abstract: In cloud computing environments, multiple tenants are often co-located on the same multi-processor system. Thus, preventing information leakage between tenants is crucial. While the hypervisor enforces software isolation, shared hardware, such as the CPU cache or memory bus, can leak sensitive information. For security reasons, shared memory between tenants is typically disabled. Furthermore, tenants often do not share a physical CPU. In this setting, cache attacks do not work and only a slow cross-CPU covert channel over the memory bus is known. In contrast, we demonstrate a high-speed covert channel as well as the first side-channel attack working across processors and without any shared memory. To build these attacks, we use the undocumented DRAM address mappings. We present two methods to reverse engineer the mapping of memory addresses to DRAM channels, ranks, and banks. One uses physical probing of the memory bus, the other runs entirely in software and is fully automated. Using this mapping, we introduce DRAMA attacks, a novel class of attacks that exploit the DRAM row buffer that is shared, even in multi-processor systems. Thus, our attacks work in the most restrictive environments. First, we build a covert channel with a capacity of up to 2Mbps, which is three to four orders of magnitude faster than memory-bus-based channels. Second, we build a side-channel template attack that can automatically locate and monitor memory accesses. Third, we show how using the DRAM mappings improves existing attacks and in particular enables practical Rowhammer attacks on DDR4.
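
The mapping functions recovered in this line of work are typically XORs of physical-address bits; the sketch below evaluates such functions for a hypothetical mapping. The bit masks shown are made up for illustration and do not correspond to any specific memory controller or DIMM.

```python
# Hypothetical DRAM addressing functions: each output bit (channel/rank/bank bit)
# is the XOR (parity) of a set of physical-address bits selected by a mask.
HYPOTHETICAL_FUNCS = {
    "channel": [0x0000_2000],                           # XOR of bit 13 (example only)
    "rank":    [0x0004_0000],                           # XOR of bit 18 (example only)
    "bank":    [0x0002_4000, 0x0004_8000, 0x0009_0000], # three example bank bits
}

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def dram_coordinates(phys_addr: int, funcs=HYPOTHETICAL_FUNCS):
    """Map a physical address to (channel, rank, bank) values under the given functions."""
    coords = {}
    for name, masks in funcs.items():
        bits = [parity(phys_addr & m) for m in masks]
        coords[name] = sum(b << i for i, b in enumerate(bits))
    return coords

# Two addresses share a row buffer only if all their channel/rank/bank values
# agree -- the property that DRAMA covert and side channels exploit.
a, b = 0x3F8D2000, 0x3F8D6000
print(dram_coordinates(a), dram_coordinates(b))
```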

Journal ArticleDOI
01 Dec 2016
TL;DR: This paper analyzes three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information, and how the detection systems behave with a modified version of one of the spy processes.
Abstract: Highlights: Three methods for detecting a class of cache-based side-channel attacks are proposed. A new tool (quickhpc) for probing hardware performance counters at a higher temporal resolution than the existing tools is presented. The first method is based on correlation; the other two use machine learning techniques and reach a minimum F-score of 0.93. A smarter attack is devised that is capable of circumventing the first method. In this paper we analyze three methods to detect cache-based side-channel attacks in real time, preventing or limiting the amount of leaked information. Two of the three methods are based on machine learning techniques, and all three of them can successfully detect an attack in about one fifth of the time required to complete it. We did not observe any false positives in our test environment, and the overhead caused by the detection systems is negligible. We also analyze how the detection systems behave with a modified version of one of the spy processes. With some optimization, we are confident these systems can be used in real-world scenarios.
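
A minimal sketch of the correlation idea behind the first method: sample per-process cache-miss counts in short time slices (here supplied as precomputed lists rather than read from hardware counters) and flag a process whose miss trace tracks the victim's activity trace too closely. The threshold and the toy traces are invented for illustration.

```python
import numpy as np

def correlation_alarm(victim_activity, suspect_misses, threshold=0.8):
    """Return (alarm, r): alarm is True if the suspect's cache-miss time series
    correlates strongly with the victim's activity, suggesting a spy process."""
    v = np.asarray(victim_activity, dtype=float)
    s = np.asarray(suspect_misses, dtype=float)
    r = np.corrcoef(v, s)[0, 1]          # Pearson correlation coefficient
    return r > threshold, r

# Toy traces: the spy probes the cache exactly when the victim encrypts.
victim = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
spy    = [5, 90, 85, 8, 92, 6, 4, 88, 95, 91, 7, 5]
benign = [40, 42, 39, 41, 38, 40, 43, 39, 41, 40, 42, 38]
print(correlation_alarm(victim, spy))     # (True, r close to 1)
print(correlation_alarm(victim, benign))  # (False, r well below the threshold)
```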

Proceedings ArticleDOI
TL;DR: In this article, the authors considered a cache-aided wireless network with a library of files and showed that the sum degrees-of-freedom (sum-DoF) of the network is within a factor of 2 of the optimum under one-shot linear schemes.
Abstract: We consider a system comprising a library of $N$ files (e.g., movies) and a wireless network with $K_T$ transmitters, each equipped with a local cache of size of $M_T$ files, and $K_R$ receivers, each equipped with a local cache of size of $M_R$ files. Each receiver will ask for one of the $N$ files in the library, which needs to be delivered. The objective is to design the cache placement (without prior knowledge of receivers' future requests) and the communication scheme to maximize the throughput of the delivery. In this setting, we show that the sum degrees-of-freedom (sum-DoF) of $\min\left\{\frac{K_T M_T+K_R M_R}{N},K_R\right\}$ is achievable, and this is within a factor of 2 of the optimum, under one-shot linear schemes. This result shows that (i) the one-shot sum-DoF scales linearly with the aggregate cache size in the network (i.e., the cumulative memory available at all nodes), (ii) the transmitters' and receivers' caches contribute equally in the one-shot sum-DoF, and (iii) caching can offer a throughput gain that scales linearly with the size of the network. To prove the result, we propose an achievable scheme that exploits the redundancy of the content at transmitters' caches to cooperatively zero-force some outgoing interference and availability of the unintended content at receivers' caches to cancel (subtract) some of the incoming interference. We develop a particular pattern for cache placement that maximizes the overall gains of cache-aided transmit and receive interference cancellations. For the converse, we present an integer optimization problem which minimizes the number of communication blocks needed to deliver any set of requested files to the receivers. We then provide a lower bound on the value of this optimization problem, hence leading to an upper bound on the linear one-shot sum-DoF of the network, which is within a factor of 2 of the achievable sum-DoF.
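
To make the achievable sum-DoF expression above concrete, here is a small worked instance with arbitrary parameter values:

$$\min\left\{\frac{K_T M_T + K_R M_R}{N},\, K_R\right\} = \min\left\{\frac{4\cdot 50 + 8\cdot 25}{100},\, 8\right\} = \min\{4,\, 8\} = 4$$

for $N = 100$, $K_T = 4$, $M_T = 50$, $K_R = 8$, $M_R = 25$. Doubling every cache size ($M_T = 100$, $M_R = 50$) lifts the value to $\min\{8, 8\} = 8$, illustrating point (i): the one-shot sum-DoF grows linearly with the aggregate cache size until the $K_R$ ceiling binds.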

Proceedings ArticleDOI
11 Sep 2016
TL;DR: In this article, when the cache contents and the user demands are fixed, the authors connect the caching problem to an index coding problem and show the optimality of the MAN scheme under the conditions that the cache placement phase is restricted to be uncoded (i.e., pieces of the files can only be copied into the user's cache) and the number of users is no more than the number of files.
Abstract: Caching is an effective way to reduce peak-hour network traffic congestion by storing some contents at the user's local cache. Maddah-Ali and Niesen (MAN) initiated a fundamental study of caching systems by proposing a scheme (with uncoded cache placement and linear network coding delivery) that is provably optimal to within a factor of 4.7. In this paper, when the cache contents and the user demands are fixed, we connect the caching problem to an index coding problem and show the optimality of the MAN scheme under the conditions that (i) the cache placement phase is restricted to be uncoded (i.e., pieces of the files can only be copied into the user's cache), and (ii) the number of users is no more than the number of files. As a consequence, further improvements to the MAN scheme are only possible through the use of coded cache placement.
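
For reference (this is the standard MAN result with uncoded placement, quoted here for context rather than taken from this paper): with $K$ users, $N \geq K$ files, per-user cache size $M$ files, and $t = KM/N$ an integer, the MAN scheme delivers any demand at rate

$$R_{\mathrm{MAN}}(M) = \frac{K\,(1 - M/N)}{1 + KM/N}$$

files over the shared link. For example, $K = 4$ users, $N = 4$ files, and $M = 2$ give $R = 4 \cdot \tfrac{1}{2} / (1 + 2) = 2/3$, compared with $K(1 - M/N) = 2$ for conventional uncoded delivery with the same caches.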

Journal ArticleDOI
TL;DR: A review of the caching problem in ICN, with a focus on on-path caching, is provided and a detailed analysis of the existing caching policies and forwarding mechanisms that complement these policies is given.
Abstract: Information-centric networking (ICN), an alternative to the host-centric model of the current Internet infrastructure, focuses on the distribution and retrieval of content instead of the transfer of information between specific endpoints. In order to achieve this, ICN is based on the paradigm of publish-subscribe and the concepts of naming and in-network caching. Current approaches to ICN employ caches within networks to minimize the latency of information retrieval. Content may be distributed either in caches along the delivery path(s), on-path caching, or in any cache within a network, off-path caching. While approaches to off-path caching are comparable to traditional approaches for content replication and Web caching, approaches to on-path caching are specific to the ICN area. The purpose of this paper is to provide a review of the caching problem in ICN, with a focus on on-path caching. To this end, a detailed analysis of the existing caching policies and forwarding mechanisms that complement these policies is given. A number of criteria, such as the caching model and level of operation and the evaluation parameters used in the evaluation of the existing caching policies, are employed to derive a taxonomy for on-path caching and highlight the trends and evaluation issues in this area. A discussion driven by the advantages and disadvantages of the existing caching policies and the challenges and open questions in on-path caching is finally held.
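
As a concrete instance of on-path caching, the sketch below implements the common Leave Copy Everywhere (LCE) baseline on top of small LRU content stores: an object fetched from upstream is cached at every router on the delivery path. LCE is chosen here purely for illustration; the survey covers many alternative placement policies.

```python
from collections import OrderedDict

class Router:
    """An ICN router with a small LRU content store."""
    def __init__(self, name: str, capacity: int = 4):
        self.name = name
        self.store = OrderedDict()          # content name -> object, LRU order
        self.capacity = capacity

    def lookup(self, content: str):
        if content in self.store:
            self.store.move_to_end(content) # refresh LRU position
            return self.store[content]
        return None

    def insert(self, content: str, obj):
        self.store[content] = obj
        self.store.move_to_end(content)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least-recently used

def fetch(path, producer, content):
    """Forward a request along `path` toward `producer`; on the way back,
    cache the object at every on-path router (Leave Copy Everywhere)."""
    for i, router in enumerate(path):
        obj = router.lookup(content)
        if obj is not None:
            hit_at = router.name
            break
    else:
        obj, hit_at, i = producer[content], "producer", len(path)
    for router in path[:i]:                 # routers between requester and the hit
        router.insert(content, obj)
    return obj, hit_at

path = [Router("edge"), Router("aggregation"), Router("core")]
producer = {"/videos/a": b"...", "/videos/b": b"..."}
print(fetch(path, producer, "/videos/a")[1])  # 'producer' on the first request
print(fetch(path, producer, "/videos/a")[1])  # 'edge' afterwards: an on-path hit
```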

Proceedings ArticleDOI
14 Mar 2016
TL;DR: This paper shows how to give applications the illusion of high-speed forwarding, large rule tables, and fast updates by combining the best of hardware and software processing.
Abstract: Software-Defined Networking (SDN) allows control applications to install fine-grained forwarding policies in the underlying switches. While Ternary Content Addressable Memory (TCAM) enables fast lookups in hardware switches with flexible wildcard rule patterns, the cost and power requirements limit the number of rules the switches can support. To make matters worse, these hardware switches cannot sustain a high rate of updates to the rule table. In this paper, we show how to give applications the illusion of high-speed forwarding, large rule tables, and fast updates by combining the best of hardware and software processing. Our CacheFlow system "caches" the most popular rules in the small TCAM, while relying on software to handle the small amount of "cache miss" traffic. However, we cannot blindly apply existing cache-replacement algorithms, because of dependencies between rules with overlapping patterns. Rather than cache large chains of dependent rules, we "splice" long dependency chains to cache smaller groups of rules while preserving the semantics of the policy. Experiments with our CacheFlow prototype---on both real and synthetic workloads and policies---demonstrate that rule splicing makes effective use of limited TCAM space, while adapting quickly to changes in the policy and the traffic demands.
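
The core difficulty named above, rule dependency, can be made concrete with a small sketch: a rule cannot be cached in the TCAM unless every higher-priority rule whose match pattern overlaps it is accounted for. CacheFlow's splicing replaces those rules with narrow cover rules; the version below only computes the naive dependent set, and the rule format is a simplified single-field wildcard match rather than real OpenFlow rules.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    priority: int      # higher number = matched first
    pattern: str       # e.g. "10**" -- '*' is a wildcard bit

def overlaps(a: str, b: str) -> bool:
    """Two wildcard patterns overlap if some packet header matches both."""
    return all(x == y or x == "*" or y == "*" for x, y in zip(a, b))

def dependents(rule: Rule, table: list[Rule]) -> set[Rule]:
    """Higher-priority rules that must be considered before caching `rule`;
    otherwise packets belonging to them would wrongly hit `rule` in the TCAM."""
    return {r for r in table
            if r.priority > rule.priority and overlaps(r.pattern, rule.pattern)}

table = [
    Rule(priority=3, pattern="101*"),
    Rule(priority=2, pattern="10**"),
    Rule(priority=1, pattern="1***"),   # popular low-priority catch-all rule
]
# Caching the popular rule naively would drag in its whole dependency chain:
print(dependents(table[2], table))      # -> both higher-priority overlapping rules
```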

Proceedings ArticleDOI
12 Mar 2016
TL;DR: This work develops a low-cost mechanism, called ChargeCache, that enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips, based on the key observation that a recently-accessed row has more charge and thus the following access to the same row can be performed faster.
Abstract: DRAM latency continues to be a critical bottleneck for system performance. In this work, we develop a low-cost mechanism, called ChargeCache, that enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips. Our mechanism is based on the key observation that a recently-accessed row has more charge and thus the following access to the same row can be performed faster. To exploit this observation, we propose to track the addresses of recently-accessed rows in a table in the memory controller. If a later DRAM request hits in that table, the memory controller uses lower timing parameters, leading to reduced DRAM latency. Row addresses are removed from the table after a specified duration to ensure rows that have leaked too much charge are not accessed with lower latency. We evaluate ChargeCache on a wide variety of workloads and show that it provides significant performance and energy benefits for both single-core and multi-core systems.
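
A sketch of the bookkeeping described above, kept deliberately abstract: the memory controller remembers recently-accessed row addresses with a timestamp, serves later requests to those rows with reduced timing parameters, and expires entries after a fixed caching duration. The capacity, duration, and timestamps below are placeholders, not the paper's parameters.

```python
from collections import OrderedDict

class HighlyChargedRowTable:
    """Tracks rows accessed within the last `duration_ns` nanoseconds."""
    def __init__(self, capacity: int = 128, duration_ns: int = 1_000_000):
        self.entries = OrderedDict()        # row address -> last access time
        self.capacity = capacity
        self.duration_ns = duration_ns

    def access(self, row: int, now_ns: int) -> str:
        # Drop entries whose charge may have leaked too much by now.
        for r, t in list(self.entries.items()):
            if now_ns - t > self.duration_ns:
                del self.entries[r]
        hit = row in self.entries
        self.entries[row] = now_ns
        self.entries.move_to_end(row)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        # A hit means the row was activated recently, so it is highly charged
        # and could be accessed with lowered timing parameters.
        return "use lowered timings" if hit else "use nominal timings"

table = HighlyChargedRowTable()
print(table.access(row=0x1A2B, now_ns=0))           # nominal
print(table.access(row=0x1A2B, now_ns=200_000))     # lowered: recently accessed
print(table.access(row=0x1A2B, now_ns=5_000_000))   # nominal again: entry expired
```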

Proceedings ArticleDOI
25 Mar 2016
TL;DR: A software-based defense, ANVIL, is developed, which thwarts all known rowhammer attacks on existing systems and is shown to be low-cost and robust, and experiments indicate that it is an effective approach for protecting existing and future systems from even advanced rowhammer attacks.
Abstract: Ensuring the integrity and security of the memory system is critical. Recent studies have shown serious security concerns due to "rowhammer" attacks, where repeated accesses to a row of memory cause bit flips in adjacent rows. Recent work by Google's Project Zero has shown how to leverage rowhammer-induced bit-flips as the basis for security exploits that include malicious code injection and memory privilege escalation. Being an important security concern, industry has attempted to defend against rowhammer attacks. Deployed defenses employ two strategies: (1) doubling the system DRAM refresh rate and (2) restricting access to the CLFLUSH instruction that attackers use to bypass the cache to increase memory access frequency (i.e., the rate of rowhammering). We demonstrate that such defenses are inadequate: we implement rowhammer attacks that both avoid using the CLFLUSH instruction and cause bit flips with a doubled refresh rate. Our next-generation CLFLUSH-free rowhammer attack bypasses the cache by manipulating cache replacement state to allow frequent misses out of the last-level cache to DRAM rows of our choosing. To protect existing systems from more advanced rowhammer attacks, we develop a software-based defense, ANVIL, which thwarts all known rowhammer attacks on existing systems. ANVIL detects rowhammer attacks by tracking the locality of DRAM accesses using existing hardware performance counters. Our detector identifies the rows being frequently accessed (i.e., the aggressors), then selectively refreshes the nearby victim rows to prevent hammering. Experiments running on real hardware with the SPEC2006 benchmarks show that ANVIL has less than a 1% false positive rate and an average slowdown of 1%. ANVIL is low-cost and robust, and our experiments indicate that it is an effective approach for protecting existing and future systems from even advanced rowhammer attacks.
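
A schematic of the detection loop described above, operating on a stream of sampled DRAM row accesses. In the real system the samples come from hardware performance counters; here they are just a list, and the interval size, threshold, and adjacency model are invented for illustration.

```python
from collections import Counter

def anvil_like_detector(sampled_rows, interval_size=1000, hammer_threshold=300):
    """Scan samples in fixed-size intervals; rows accessed suspiciously often
    are treated as aggressors and their neighboring rows are refreshed."""
    refreshed = []
    for start in range(0, len(sampled_rows), interval_size):
        counts = Counter(sampled_rows[start:start + interval_size])
        for row, hits in counts.items():
            if hits >= hammer_threshold:       # likely rowhammer aggressor
                # Simplified adjacency: real adjacency depends on the DRAM mapping.
                victims = (row - 1, row + 1)
                refreshed.extend(victims)      # issue selective refreshes
    return refreshed

# Toy trace: row 500 is hammered, interleaved with ordinary traffic.
trace = [r for i in range(400) for r in (500, 501 + i % 7, 500)]
print(sorted(set(anvil_like_detector(trace))))  # [499, 501]
```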

Journal ArticleDOI
TL;DR: In this article, the authors explore the energy efficiency (EE) potential of cache-enabled wireless access networks, identify the key factors that contribute most to the EE gain from caching, and derive a closed-form expression for the approximated EE.
Abstract: Caching popular contents at base stations (BSs) can reduce the backhaul cost and improve the network throughput. Yet whether locally caching at the BSs can improve the energy efficiency (EE), a major goal for fifth generation cellular networks, remains unclear. Due to the entangled impact of various factors on EE such as interference level, backhaul capacity, BS density, power consumption parameters, BS sleeping, content popularity, and cache capacity, another important question is what are the key factors that contribute more to the EE gain from caching. In this paper, we attempt to explore the potential of EE of the cache-enabled wireless access networks and identify the key factors. By deriving closed-form expression of the approximated EE, we provide the condition when the EE can benefit from caching, find the optimal cache capacity that maximizes the network EE, and analyze the maximal EE gain brought by caching. We show that caching at the BSs can improve the network EE when power efficient cache hardware is used. When local caching has EE gain over not caching, caching more contents at the BSs may not provide higher EE. Numerical and simulation results show that the caching EE gain is large when the backhaul capacity is stringent, interference level is low, content popularity is skewed, and when caching at pico BSs instead of macro BSs.

Book ChapterDOI
17 Aug 2016
TL;DR: This paper argues that shared resources such as the CPU, memory, and even the network adapter provide subtle side-channels to malicious parties, and that these side-channels indeed leak fine-grained, sensitive information and enable key recovery attacks on the cloud.
Abstract: Cloud services keep gaining popularity despite the security concerns. While non-sensitive data is easily trusted to the cloud, security-critical data and applications are not. The main concern with the cloud is the shared resources like the CPU, memory and even the network adapter that provide subtle side-channels to malicious parties. We argue that these side-channels indeed leak fine-grained, sensitive information and enable key recovery attacks on the cloud. Even further, as a quick scan in one of the Amazon EC2 regions shows, a high percentage – 55% – of users run outdated, leakage-prone libraries, leaving them vulnerable to mass surveillance.

Journal ArticleDOI
14 Jul 2016
TL;DR: Glimpse is a continuous, real-time object recognition system for camera-equipped mobile devices that captures full-motion video, locates objects of interest, recognizes and labels them, and tracks them from frame to frame for the user.
Abstract: Excerpted from "Glimpse: Continuous, Real-Time Object Recognition on Mobile Devices" from Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems with permission. http://dx.doi.org/10.1145/2809695.2809711 © ACM 2015. Glimpse is a continuous, real-time object recognition system for camera-equipped mobile devices. Glimpse captures full-motion video, locates objects of interest, recognizes and labels them, and tracks them from frame to frame for the user. Because the algorithms for object recognition entail significant computation, Glimpse runs them on server machines. When the latency between the server and mobile device is higher than a frame-time, this approach lowers object recognition accuracy. To regain accuracy, Glimpse uses an active cache of video frames on the mobile device. A subset of the frames in the active cache are used to track objects on the mobile, using (stale) hints about objects that arrive from the server from time to time. To reduce network bandwidth usage, Glimpse computes trigger frames to send to the server for recognition.

Proceedings ArticleDOI
10 Apr 2016
TL;DR: It is proved that the learning regret of PopCaching is sublinear in the number of content requests; therefore, it converges fast and asymptotically achieves the optimal cache hit rate.
Abstract: This paper presents a novel cache replacement method — Popularity-Driven Content Caching (PopCaching). PopCaching learns the popularity of content and uses it to determine which content it should store and which it should evict from the cache. Popularity is learned in an online fashion, requires no training phase and hence, it is more responsive to continuously changing trends of content popularity. We prove that the learning regret of PopCaching (i.e., the gap between the hit rate achieved by PopCaching and that by the optimal caching policy with hindsight) is sublinear in the number of content requests. Therefore, PopCaching converges fast and asymptotically achieves the optimal cache hit rate. We further demonstrate the effectiveness of PopCaching by applying it to a movie.douban.com dataset that contains over 38 million requests. Our results show significant cache hit rate lift compared to existing algorithms, and the improvements can exceed 40% when the cache capacity is limited. In addition, PopCaching has low complexity.
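
PopCaching's learning component is more elaborate than this, but the cache-decision skeleton it plugs into can be sketched as follows: each content gets an online popularity score (here a simple exponentially decayed request counter, standing in for the learned predictor), and on a miss the least-popular stored item is evicted only if the new item scores higher. The half-life and capacity are illustrative values.

```python
import math

class PopularityCache:
    def __init__(self, capacity: int, half_life_s: float = 3600.0):
        self.capacity = capacity
        self.decay = math.log(2) / half_life_s
        self.score = {}        # content -> (decayed popularity score, last update time)
        self.store = set()     # contents currently cached

    def _bump(self, content, now):
        s, t = self.score.get(content, (0.0, now))
        self.score[content] = (s * math.exp(-self.decay * (now - t)) + 1.0, now)

    def request(self, content, now) -> str:
        self._bump(content, now)
        if content in self.store:
            return "hit"
        if len(self.store) >= self.capacity:
            coldest = min(self.store, key=lambda c: self.score[c][0])
            if self.score[coldest][0] >= self.score[content][0]:
                return "miss (not cached)"   # newcomer not popular enough yet
            self.store.discard(coldest)      # evict the least popular item
        self.store.add(content)
        return "miss (cached)"

cache = PopularityCache(capacity=2)
for c in ["a", "b", "a", "a", "c", "a", "c", "c", "b"]:
    print(c, cache.request(c, now=0.0))
```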

Proceedings ArticleDOI
10 Apr 2016
TL;DR: This paper proposes an Age-Based Threshold (ABT) policy which caches all contents requested more times than a threshold N(τ), and shows that ABT is asymptotically hit-rate optimal in the many-contents regime, which allows the first characterization of the optimal performance of a caching system in a dynamic context.
Abstract: This paper addresses a fundamental limitation for the adoption of caching for wireless access networks due to small population sizes. This shortcoming is due to two main challenges: making timely estimates of varying content popularity and inferring popular content from small samples. We propose a framework which alleviates such limitations. To timely estimate varying popularity in the context of a single cache, we propose an Age-Based Threshold (ABT) policy which caches all contents requested more times than a threshold N(τ), where τ is the content age. We show that ABT is asymptotically hit-rate optimal in the many-contents regime, which allows us to obtain the first characterization of the optimal performance of a caching system in a dynamic context. We then address small sample sizes, focusing on L local caches and one global cache. On the one hand, we show that the global cache learns L times faster by aggregating all requests from local caches, which improves hit rates. On the other hand, aggregation washes out local characteristics of correlated traffic, which penalizes hit rate. This motivates coordination mechanisms which combine global learning of popularity scores in clusters and a Least-Recently-Used (LRU) policy with prefetching.
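
The single-cache policy described above can be written down almost directly: track each content's age and request count, and admit it once its count exceeds an age-dependent threshold N(τ). The threshold function and capacity handling below are placeholders; the paper derives the asymptotically optimal threshold.

```python
class AgeBasedThreshold:
    def __init__(self, capacity: int, threshold=lambda age: 2 + age // 100):
        self.capacity = capacity
        self.threshold = threshold   # N(tau): requests needed at content age tau
        self.first_seen = {}         # content -> time of its first request
        self.count = {}              # content -> number of requests so far
        self.cache = set()

    def request(self, content, now) -> bool:
        self.first_seen.setdefault(content, now)
        self.count[content] = self.count.get(content, 0) + 1
        if content in self.cache:
            return True              # hit
        age = now - self.first_seen[content]
        if self.count[content] > self.threshold(age) and len(self.cache) < self.capacity:
            self.cache.add(content)  # admit: requested more than N(age) times
        return False                 # miss

abt = AgeBasedThreshold(capacity=100)
hits = sum(abt.request(c, now=t) for t, c in enumerate(["x", "y", "x", "x", "x", "z", "x"]))
print(hits)  # the popular content "x" is admitted once it crosses the threshold
```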

Proceedings ArticleDOI
10 Apr 2016
TL;DR: This paper proposes utility-driven caching, which associates with each content a utility that is a function of the corresponding content hit probability, and develops online algorithms that can be used by service providers to implement various caching policies based on arbitrary utility functions.
Abstract: In any caching system, the admission and eviction policies determine which contents are added and removed from a cache when a miss occurs. Usually, these policies are devised so as to mitigate staleness and increase the hit probability. Nonetheless, the utility of having a high hit probability can vary across contents. This occurs, for instance, when service level agreements must be met, or if certain contents are more difficult to obtain than others. In this paper, we propose utility-driven caching, where we associate with each content a utility, which is a function of the corresponding content hit probability. We formulate optimization problems where the objectives are to maximize the sum of utilities over all contents. These problems differ according to the stringency of the cache capacity constraint. Our framework enables us to reverse engineer classical replacement policies such as LRU and FIFO, by computing the utility functions that they maximize. We also develop online algorithms that can be used by service providers to implement various caching policies based on arbitrary utility functions.
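
Schematically, and paraphrasing rather than reproducing the paper's notation, the optimization problems mentioned above take the form

$$\max_{h_1,\dots,h_N} \sum_{i=1}^{N} U_i(h_i) \quad \text{s.t.} \quad \sum_{i=1}^{N} h_i \le B, \qquad 0 \le h_i \le 1,$$

where $h_i$ is the hit probability of content $i$, $U_i(\cdot)$ its utility function, and $B$ a cache-capacity budget (the stringency of this constraint is what distinguishes the problem variants). Reverse engineering a policy such as LRU or FIFO then amounts to finding the utility functions under which that policy's hit probabilities solve this program.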

Journal ArticleDOI
18 Jun 2016
TL;DR: This paper explains how a cache replacement algorithm can nonetheless learn from Belady's algorithm by applying it to past cache accesses to inform future cache replacement decisions, and shows that the implementation is surprisingly efficient.
Abstract: Belady's algorithm is optimal but infeasible because it requires knowledge of the future. This paper explains how a cache replacement algorithm can nonetheless learn from Belady's algorithm by applying it to past cache accesses to inform future cache replacement decisions. We show that the implementation is surprisingly efficient, as we introduce a new method of efficiently simulating Belady's behavior, and we use known sampling techniques to compactly represent the long history information that is needed for high accuracy. For a 2MB LLC, our solution uses a 16KB hardware budget (excluding replacement state in the tag array). When applied to a memory-intensive subset of the SPEC 2006 CPU benchmarks, our solution improves performance over LRU by 8.4%, as opposed to 6.2% for the previous state-of-the-art. For a 4-core system with a shared 8MB LLC, our solution improves performance by 15.0%, compared to 12.0% for the previous state-of-the-art.
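
For contrast with the hardware mechanism described above, Belady's MIN policy itself is easy to simulate offline when the whole trace is known: on a miss, evict the line whose next reference lies farthest in the future. The sketch below is the textbook algorithm (the oracle the predictor learns to imitate), not the paper's sampled, hardware-friendly reconstruction.

```python
from collections import defaultdict
from bisect import bisect_right

def belady_hits(trace, capacity):
    """Simulate Belady's optimal (MIN) replacement on a fully known access trace."""
    next_uses = defaultdict(list)           # address -> sorted positions in trace
    for pos, addr in enumerate(trace):
        next_uses[addr].append(pos)

    def next_use(addr, pos):
        uses = next_uses[addr]
        i = bisect_right(uses, pos)
        return uses[i] if i < len(uses) else float("inf")

    cache, hits = set(), 0
    for pos, addr in enumerate(trace):
        if addr in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            # Evict the cached line that is re-referenced farthest in the future.
            victim = max(cache, key=lambda a: next_use(a, pos))
            cache.remove(victim)
        cache.add(addr)
    return hits

trace = ["A", "B", "C", "A", "B", "D", "A", "B", "C", "D"]
print(belady_hits(trace, capacity=2))  # 3 hits; MIN is optimal, so no policy does better here
```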

Book ChapterDOI
TL;DR: The Flush+Flush attack has a performance close to state-of-the-art side channels in existing cache attack scenarios while reducing cache misses significantly below the border of detectability; this is the first work discussing the stealthiness of cache attacks from both the attacker and the defender perspective.
Abstract: Research on cache attacks has shown that CPU caches leak significant information. Recent attacks either use the Flush+Reload technique on read-only shared memory or the Prime+Probe technique without shared memory, to derive encryption keys or eavesdrop on user input. Efficient countermeasures against these powerful attacks that do not cause a loss of performance are a challenge. In this paper, we use hardware performance counters as a means to detect access-based cache attacks. Indeed, existing attacks cause numerous cache references and cache misses and can subsequently be detected. We propose a new criterion that uses these events for ad-hoc detection. These findings motivate the development of a novel attack technique: the Flush+Flush attack. The Flush+Flush attack only relies on the execution time of the flush instruction, which depends on whether the data is cached or not. Like Flush+Reload, it monitors when a process loads read-only shared memory into the CPU cache. However, Flush+Flush does not have a reload step, thus causing no cache misses compared to typical Flush+Reload and Prime+Probe attacks. We show that the significantly lower impact on the hardware performance counters therefore evades detection mechanisms. The Flush+Flush attack has a performance close to state-of-the-art side channels in existing cache attack scenarios, while reducing cache misses significantly below the border of detectability. Our Flush+Flush covert channel achieves a transmission rate of 496 KB/s, which is 6.7 times faster than any previously published cache covert channel. To the best of our knowledge, this is the first work discussing the stealthiness of cache attacks both from the attacker and the defender perspective.