
Showing papers on "Cache published in 2018"


Posted Content
TL;DR: This paper describes practical attacks that combine methodology from side-channel attacks, fault attacks, and return-oriented programming to read arbitrary memory from the victim's process, and shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms.
Abstract: Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes: it can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, static analysis, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing/side-channel attacks. These attacks represent a serious threat to actual systems, since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.

576 citations


Journal ArticleDOI
TL;DR: A novel caching scheme is proposed that strictly improves the state of the art by exploiting commonality among user demands, and the rate-memory tradeoff is fully characterized for a decentralized setting in which users fill their cache content without any coordination.
Abstract: We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a prefetching phase. In a following delivery phase, each user requests a file from the database, and the server needs to deliver users’ demands as efficiently as possible by taking into account their cache contents. We focus on an important and commonly used class of prefetching schemes, where the caches are filled with uncoded data. We provide the exact characterization of the rate-memory tradeoff for this problem, by deriving both the minimum average rate (for a uniform file popularity) and the minimum peak rate required on the bottleneck link for a given cache size available at each user. In particular, we propose a novel caching scheme, which strictly improves the state of the art by exploiting commonality among user demands. We then demonstrate the exact optimality of our proposed scheme through a matching converse, by dividing the set of all demands into types, and showing that the placement phase in the proposed caching scheme is universally optimal for all types. Using these techniques, we also fully characterize the rate-memory tradeoff for a decentralized setting, in which users fill out their cache content without any coordination.
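
For reference, the peak-rate side of this tradeoff under uncoded prefetching is commonly stated in the closed form below (notation assumed here: N files, K users, a per-user cache of M files, and integer r = KM/N; consult the paper for the precise statement and the average-rate counterpart):

\[
R^{*}_{\text{peak}}\Big(M=\tfrac{rN}{K}\Big) \;=\; \frac{\binom{K}{r+1}-\binom{K-\min(K,N)}{r+1}}{\binom{K}{r}}, \qquad r\in\{0,1,\dots,K\},
\]

with the full tradeoff obtained as the lower convex envelope of these corner points.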

378 citations


Proceedings ArticleDOI
16 Apr 2018
TL;DR: In this paper, the authors investigated the problem of dynamic service caching in MEC-enabled dense cellular networks and proposed an efficient online algorithm, called OREO, which jointly optimizes service caching and task offloading to address service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination.
Abstract: Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the network edge, thereby meeting the latency requirements of many emerging mobile applications and saving backhaul network bandwidth. Although many existing works have studied computation offloading policies, service caching is an equally, if not more, important design topic of MEC, yet it receives much less attention. Service caching refers to caching application services and their related databases/libraries in the edge server (e.g. an MEC-enabled BS), thereby enabling corresponding computation tasks to be executed. Because only a small number of application services can be cached in a resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the edge computing performance. In this paper, we investigate the extremely compelling but much less studied problem of dynamic service caching in MEC-enabled dense cellular networks. We propose an efficient online algorithm, called OREO, which jointly optimizes dynamic service caching and task offloading to address a number of key challenges in MEC systems, including service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination. Our algorithm is developed based on Lyapunov optimization and Gibbs sampling, works online without requiring future information, and achieves provable close-to-optimal performance. Simulation results show that our algorithm can effectively reduce computation latency for end users while keeping energy consumption low.
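
As a rough illustration of the Gibbs-sampling flavor of such a decision step (not the paper's actual OREO algorithm; the cost function, temperature schedule, and service set below are hypothetical), a base station could explore cache configurations as follows:

```python
import math
import random

def gibbs_cache_step(cached, services, cache_capacity, cost_fn, temperature=1.0):
    """One Gibbs-sampling move over the service-caching decision.

    cached: set of currently cached service ids
    cost_fn: maps a candidate cached set to a scalar system cost
             (e.g. expected latency plus weighted energy); hypothetical here.
    """
    s = random.choice(services)              # pick a service to reconsider
    candidate = set(cached)
    if s in candidate:
        candidate.remove(s)                  # propose evicting it
    elif len(candidate) < cache_capacity:
        candidate.add(s)                     # propose caching it
    else:
        return cached                        # no feasible toggle this round

    delta = cost_fn(candidate) - cost_fn(cached)
    # Accept worse configurations with a probability that shrinks as the cost
    # increase grows, which lets the sampler escape local minima.
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return candidate
    return cached
```

Repeating this step while gradually lowering the temperature concentrates the sampler on low-cost configurations; the paper couples such sampling with Lyapunov-based virtual queues to handle long-term energy constraints, which this sketch omits.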

326 citations


Posted Content
TL;DR: A language close to the mathematics of deep learning called Tensor Comprehensions offering both imperative and declarative styles, a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, and a compilation cache populated by an autotuner are contributed.
Abstract: Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn't offer optimal performance for a user's particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, (3) a compilation cache populated by an autotuner [Abstract cutoff]

318 citations


Posted Content
TL;DR: In this paper, the authors investigated the problem of dynamic service caching in MEC-enabled dense cellular networks and proposed an efficient online algorithm, called OREO, which jointly optimizes service caching and task offloading to address service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination.
Abstract: Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the network edge, thereby meeting the latency requirements of many emerging mobile applications and saving backhaul network bandwidth. Although many existing works have studied computation offloading policies, service caching is an equally, if not more, important design topic of MEC, yet it receives much less attention. Service caching refers to caching application services and their related databases/libraries in the edge server (e.g. an MEC-enabled BS), thereby enabling corresponding computation tasks to be executed. Because only a small number of application services can be cached in a resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the edge computing performance. In this paper, we investigate the extremely compelling but much less studied problem of dynamic service caching in MEC-enabled dense cellular networks. We propose an efficient online algorithm, called OREO, which jointly optimizes dynamic service caching and task offloading to address a number of key challenges in MEC systems, including service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination. Our algorithm is developed based on Lyapunov optimization and Gibbs sampling, works online without requiring future information, and achieves provable close-to-optimal performance. Simulation results show that our algorithm can effectively reduce computation latency for end users while keeping energy consumption low.

249 citations


Proceedings ArticleDOI
05 Nov 2018
TL;DR: DNNBuilder, which includes an automatic design space exploration tool that generates optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity, is designed and demonstrated.
Abstract: Building a high-performance FPGA accelerator for Deep Neural Networks (DNNs) often requires RTL programming, hardware verification, and precise resource allocation, all of which can be time-consuming and challenging to perform even for seasoned FPGA developers. To bridge the gap between fast DNN construction in software (e.g., Caffe, TensorFlow) and slow hardware implementation, we propose DNNBuilder for building high-performance DNN hardware accelerators on FPGAs automatically. Novel techniques are developed to meet the throughput and latency requirements for both cloud and edge devices. A number of novel techniques including high-quality RTL neural network components, a fine-grained layer-based pipeline architecture, and a column-based cache scheme are developed to boost throughput, reduce latency, and save FPGA on-chip memory. To address the limited-resource challenge, we design an automatic design space exploration tool to generate optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity. DNNBuilder is demonstrated on four DNNs (Alexnet, ZF, VGG16, and YOLO) on two FPGAs (XC7Z045 and KU115) corresponding to edge and cloud computing, respectively. The fine-grained layer-based pipeline architecture and the column-based cache scheme contribute to 7.7x and 43x reductions in latency and BRAM utilization compared to conventional designs. We achieve the best performance (up to 5.15x faster) and efficiency (up to 5.88x more efficient) compared to published FPGA-based classification-oriented DNN accelerators for both edge and cloud computing cases. We reach 4218 GOPS for running an object detection DNN, which is, to the best of our knowledge, the highest reported throughput. DNNBuilder can provide millisecond-scale real-time performance for processing HD video input and deliver higher efficiency (up to 4.35x) than GPU-based solutions.

244 citations


Journal ArticleDOI
TL;DR: In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown, providing a simple, yet practical asynchronous caching approach.
Abstract: Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
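
A minimal sketch of the kind of tabular Q-learning update such a cache controller might run (the state encoding, action set, reward, and parameters here are illustrative assumptions, not the paper's exact formulation):

```python
import random
from collections import defaultdict

class QCacheAgent:
    """Toy Q-learning cache controller: a state could encode the observed
    popularity profile, and an action which subset of files to prefetch."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:   # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # reward could be cache hits minus a cache-refreshing cost (assumed)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The paper additionally introduces a linear function approximation of this scheme for scalability, which replaces the table above with a parameterized value estimate.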

241 citations


Journal ArticleDOI
TL;DR: An efficient and secure service-oriented authentication framework supporting network slicing and fog computing for 5G-enabled IoT services is proposed, in which session keys are negotiated among users, local fogs, and IoT servers to guarantee secure access of service data in fog caches and remote servers with low latency.
Abstract: 5G network is considered as a key enabler in meeting continuously increasing demands for the future Internet of Things (IoT) services, including high data rate, numerous devices connection, and low service latency. To satisfy these demands, network slicing and fog computing have been envisioned as the promising solutions in service-oriented 5G architecture. However, security paradigms enabling authentication and confidentiality of 5G communications for IoT services remain elusive, but indispensable. In this paper, we propose an efficient and secure service-oriented authentication framework supporting network slicing and fog computing for 5G-enabled IoT services. Specifically, users can efficiently establish connections with 5G core network and anonymously access IoT services under their delegation through proper network slices of 5G infrastructure selected by fog nodes based on the slice/service types of accessing services. The privacy-preserving slice selection mechanism is introduced to preserve both configured slice types and accessing service types of users. In addition, session keys are negotiated among users, local fogs and IoT servers to guarantee secure access of service data in fog cache and remote servers with low latency. We evaluate the performance of the proposed framework through simulations to demonstrate its efficiency and feasibility under 5G infrastructure.

228 citations


Proceedings ArticleDOI
02 Jun 2018
TL;DR: The Neural Cache architecture re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for deep neural networks, and can fully execute convolutional, fully connected, and pooling layers in-cache.
Abstract: This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques for in-situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that the proposed architecture can improve inference latency by 18.3X over a state-of-the-art multi-core CPU (Xeon E5) and 7.7X over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4X over the CPU (2.2X over the GPU), while reducing power consumption by 50% over the CPU (53% over the GPU).

215 citations


Proceedings Article
01 Aug 2018
TL;DR: It is shown for the first time that hardware translation lookaside buffers (TLBs) can be abused to leak fine-grained information about a victim's activity even when CPU cache activity is guarded by state-of-the-art cache side-channel protections, such as CAT and TSX.
Abstract: To stop side channel attacks on CPU caches that have allowed attackers to leak secret information and break basic security mechanisms, the security community has developed a variety of powerful defenses that effectively isolate the security domains. Of course, other shared hardware resources exist, but the assumption is that unlike cache side channels, any channel offered by these resources is insufficiently reliable and too coarse-grained to leak general-purpose information. This is no longer true. In this paper, we revisit this assumption and show for the first time that hardware translation lookaside buffers (TLBs) can be abused to leak fine-grained information about a victim's activity even when CPU cache activity is guarded by state-of-the-art cache side-channel protections, such as CAT and TSX. However, exploiting the TLB channel is challenging, due to unknown addressing functions inside the TLB and the attacker's limited monitoring capabilities which, at best, cover only the victim's coarse-grained data accesses. To address the former, we reverse engineer the previously unknown addressing function in recent Intel processors. To address the latter, we devise a machine learning strategy that exploits high-resolution temporal features about a victim's memory activity. Our prototype implementation, TLBleed, can leak a 256-bit EdDSA secret key from a single capture after 17 seconds of computation time with a 98% success rate, even in presence of state-of-the-art cache isolation. Similarly, using a single capture, TLBleed reconstructs 92% of RSA keys from an implementation that is hardened against FLUSH+RELOAD attacks.

208 citations


Journal ArticleDOI
TL;DR: This work proposes a new cooperative edge caching architecture for 5G networks in which mobile edge computing resources are utilized to enhance edge caching capability, introduces a new vehicular caching cloud concept, and proposes a vehicle-aided edge caching scheme.
Abstract: Along with modern wireless networks being content-centric, the demand for rich multimedia services has been growing at a tremendous pace, which brings significant challenges to mobile networks in terms of the need for massive content delivery. Edge caching has emerged as a promising approach to alleviate the heavy burden on data transmission through caching and forwarding contents at the edge of networks. However, existing studies always treat storage and computing resources separately, and neglect the mobility characteristic of both the content caching nodes and end users. Driven by these issues, in this work, we propose a new cooperative edge caching architecture for 5G networks, where mobile edge computing resources are utilized for enhancing edge caching capability. In the architecture, we focus on mobility-aware hierarchical caching, where smart vehicles are taken as collaborative caching agents for sharing content cache tasks with base stations. To further utilize the caching resource of smart vehicles, we introduce a new vehicular caching cloud concept, and propose a vehicle-aided edge caching scheme, where the caching and computing resources at the wireless network edge are jointly scheduled. Numerical results indicate that the proposed scheme minimizes content access latency and improves caching resource utilization.

Proceedings ArticleDOI
20 Oct 2018
TL;DR: This paper provides the key insight that randomized mapping can be accomplished efficiently by accessing the cache with an encrypted address, as encryption would cause the lines that map to the same set of a conventional cache to get scattered to different sets.
Abstract: Modern processors share the last-level cache between all the cores to efficiently utilize the cache space. Unfortunately, such sharing makes the cache vulnerable to attacks whereby an adversary can infer the access pattern of a co-running application by carefully orchestrating evictions using cache conflicts. Conflict-based attacks can be mitigated by randomizing the location of the lines in the cache. Unfortunately, prior proposals for randomized mapping require storage-intensive tables and are effective only if the OS can classify the applications into protected and unprotected groups. The goal of this paper is to mitigate conflict-based attacks while incurring negligible storage and performance overheads, and without relying on OS support. This paper provides the key insight that randomized mapping can be accomplished efficiently by accessing the cache with an encrypted address, as encryption would cause the lines that map to the same set of a conventional cache to get scattered to different sets. This paper proposes CEASE, a design that uses Low-Latency Block-Cipher (LLBC) to translate the physical line-address into an encrypted line-address, and accesses the cache with this encrypted line-address. We analyze efficient designs for LLBC that can perform encryption and decryption within two cycles. We also propose CEASER, a design that periodically changes the encryption key and performs dynamic-remapping to improve robustness. CEASER provides strong security (tolerates 100+ years of attack), has low performance overhead (1% slowdown), requires a storage overhead of less than 24 bytes for the newly added structures, and does not need any OS support.
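
The core idea of indexing the cache with an encrypted line address can be sketched as follows (the keyed hash below is only a software stand-in for the paper's Low-Latency Block-Cipher, and the set count is hypothetical):

```python
import hashlib

NUM_SETS = 1024  # hypothetical last-level-cache set count

def encrypt_line_address(line_addr: int, key: bytes) -> int:
    """Stand-in for a low-latency block cipher: a keyed pseudorandom mapping
    of the physical line address. Changing `key` re-scatters every line."""
    digest = hashlib.blake2b(line_addr.to_bytes(8, "little"), key=key).digest()
    return int.from_bytes(digest[:8], "little")

def cache_set_index(line_addr: int, key: bytes) -> int:
    # Lines that would collide in a conventional cache are spread across sets,
    # which is what breaks eviction-set construction for conflict-based attacks.
    return encrypt_line_address(line_addr, key) % NUM_SETS
```

Periodically rotating the key and remapping resident lines, as CEASER does, further limits how long any eviction set learned by an attacker stays useful.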

Journal ArticleDOI
TL;DR: Big data analytics is proposed to advance edge caching capability, which is considered a promising approach to improve network efficiency and alleviate the high demand for radio resources in future networks.
Abstract: The unprecedented growth of wireless data traffic not only challenges the design and evolution of the wireless network architecture, but also brings about profound opportunities to drive and improve future networks. Meanwhile, the evolution of communications and computing technologies can make the network edge, such as BSs or UEs, become intelligent and rich in terms of computing and communications capabilities, which intuitively enables big data analytics at the network edge. In this article, we propose to explore big data analytics to advance edge caching capability, which is considered as a promising approach to improve network efficiency and alleviate the high demand for the radio resource in future networks. The learning-based approaches for network edge caching are discussed, where a vast amount of data can be harnessed for content popularity estimation and proactive caching strategy design. An outlook of research directions, challenges, and opportunities is provided and discussed in depth. To validate the proposed solution, a case study and a performance evaluation are presented. Numerical studies show that several gains are achieved by employing learning-based schemes for edge caching.

Proceedings ArticleDOI
20 Oct 2018
TL;DR: DAWG is a generic mechanism for secure way partitioning of set-associative structures, including memory caches, that can be implemented on a processor with minimal modifications to modern operating systems.
Abstract: Software side channel attacks have become a serious concern with the recent rash of attacks on speculative processor architectures. Most attacks that have been demonstrated exploit the cache tag state as their exfiltration channel. While many existing defense mechanisms that can be implemented solely in software have been proposed, these mechanisms appear to patch specific attacks, and can be circumvented. In this paper, we propose minimal modifications to hardware to defend against a broad class of attacks, including those based on speculation, with the goal of eliminating the entire attack surface associated with the cache state covert channel. We propose DAWG, Dynamically Allocated Way Guard, a generic mechanism for secure way partitioning of set associative structures including memory caches. DAWG endows a set associative structure with a notion of protection domains to provide strong isolation. When applied to a cache, unlike existing quality of service mechanisms such as Intel's Cache Allocation Technology (CAT), DAWG fully isolates hits, misses, and metadata updates across protection domains. We describe how DAWG can be implemented on a processor with minimal modifications to modern operating systems. We describe a non-interference property that is orthogonal to speculative execution and therefore argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG. Finally, we evaluate the performance impact of DAWG on the cache subsystem.
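
A simplified model of the way-partitioning idea: on a miss, the victim is chosen only from the ways owned by the requesting protection domain, so one domain's hits, fills, and evictions never disturb another's. The domain-to-way assignment and replacement policy below are illustrative assumptions, not DAWG's actual interface:

```python
class WayPartitionedSet:
    """One set of a way-partitioned cache: each protection domain owns a
    fixed subset of the ways and can only hit in, or evict from, those ways."""

    def __init__(self, ways_per_domain):
        # e.g. {"victim": [0, 1, 2, 3], "attacker": [4, 5, 6, 7]}
        self.ways_per_domain = ways_per_domain
        num_ways = max(w for ways in ways_per_domain.values() for w in ways) + 1
        self.tags = [None] * num_ways
        self.lru = {d: list(ways) for d, ways in ways_per_domain.items()}

    def access(self, domain, tag):
        for way in self.ways_per_domain[domain]:
            if self.tags[way] == tag:                 # hit, confined to own ways
                self.lru[domain].remove(way)
                self.lru[domain].append(way)
                return True
        victim = self.lru[domain].pop(0)              # evict only an owned way
        self.tags[victim] = tag
        self.lru[domain].append(victim)
        return False
```

In the real design the metadata (replacement state) is also partitioned per domain, which is what distinguishes DAWG from quality-of-service way allocation such as Intel CAT.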

Proceedings ArticleDOI
15 Oct 2018
TL;DR: DeepCache is a principled cache design for deep learning inference in continuous mobile vision; it improves model execution efficiency by exploiting temporal locality in input video streams and propagates regions of reusable results by exploiting the model's internal structure.
Abstract: We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache benefits model execution efficiency by exploiting temporal locality in input video streams. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the input of a model, DeepCache discovers video temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; into the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache eschews applying video heuristics to model internals which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires zero developer's manual effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%. DeepCache reduces system energy consumption by 20% on average.

Posted Content
TL;DR: DAWG, Dynamically Allocated Way Guard, is proposed as a generic mechanism for secure way partitioning of set-associative structures including memory caches; the authors describe a non-interference property that is orthogonal to speculative execution and argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG.
Abstract: Software side channel attacks have become a serious concern with the recent rash of attacks on speculative processor architectures. Most attacks that have been demonstrated exploit the cache tag state as their exfiltration channel. While many existing defense mechanisms that can be implemented solely in software have been proposed, these mechanisms appear to patch specific attacks, and can be circumvented. In this paper, we propose minimal modifications to hardware to defend against a broad class of attacks, including those based on speculation, with the goal of eliminating the entire attack surface associated with the cache state covert channel. We propose DAWG, Dynamically Allocated Way Guard, a generic mechanism for secure way partitioning of set associative structures including memory caches. DAWG endows a set associative structure with a notion of protection domains to provide strong isolation. When applied to a cache, unlike existing quality of service mechanisms such as Intel's Cache Allocation Technology (CAT), DAWG fully isolates hits, misses, and metadata updates across protection domains. We describe how DAWG can be implemented on a processor with minimal modifications to modern operating systems. We describe a non-interference property that is orthogonal to speculative execution and therefore argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG. Finally, we evaluate the performance impact of DAWG on the cache subsystem.

Proceedings Article
11 Jul 2018
TL;DR: Varys fully protects against all L1/L2 cache timing attacks and significantly raises the bar for page table side-channel attacks; a set of minor hardware extensions is also proposed that holds the potential to extend Varys' security guarantees to the L3 cache and further improve its performance.
Abstract: Numerous recent works have experimentally shown that Intel Software Guard Extensions (SGX) are vulnerable to cache timing and page table side-channel attacks which could be used to circumvent the data confidentiality guarantees provided by SGX. Existing mechanisms that protect against these attacks either incur high execution costs, are ineffective against certain attack variants, or require significant code modifications. We present Varys, a system that protects unmodified programs running in SGX enclaves from cache timing and page table side-channel attacks. Varys takes a pragmatic approach of strict reservation of physical cores to security-sensitive threads, thereby preventing the attacker from accessing shared CPU resources during enclave execution. The key challenge that we are addressing is that of maintaining the core reservation in the presence of an untrusted OS. Varys fully protects against all L1/L2 cache timing attacks and significantly raises the bar for page table side-channel attacks--all with only 15% overhead on average for Phoenix and PARSEC benchmarks. Additionally, we propose a set of minor hardware extensions that hold the potential to extend Varys' security guarantees to L3 cache and further improve its performance.

Journal ArticleDOI
TL;DR: The authors propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history, and the probability distribution over generated words is updated online depending on the translation history retrieved from the memory.
Abstract: Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time. Experiments on multiple domains with different topics and styles show the effectiveness of the proposed approach with negligible impact on the computational cost.
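
One common way such a cache is combined with the base model at decoding time is a gated mixture of the NMT distribution and a distribution induced by attention over the stored history; the NumPy sketch below (with hypothetical shapes and a fixed gate) illustrates that general pattern rather than the paper's exact equations:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cache_augmented_distribution(p_model, query, cache_keys, cache_values, gate):
    """p_model: base NMT distribution over the vocabulary, shape (V,)
    query: current decoder hidden state, shape (d,)
    cache_keys: stored hidden states from the translation history, shape (M, d)
    cache_values: word distributions associated with those states, shape (M, V)
    gate: scalar in [0, 1] trading off the model against the history."""
    scores = cache_keys @ query                      # similarity of query to history
    attn = softmax(scores)                           # attention over cached entries
    p_cache = attn @ cache_values                    # history-induced distribution
    return (1.0 - gate) * p_model + gate * p_cache   # online-adapted prediction
```

In the paper the gate itself is typically learned from the current state rather than fixed, so the model can ignore the cache when the history is unhelpful.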

Journal ArticleDOI
TL;DR: In this article, the authors proposed a proactive caching scheme for UAV-enabled content-centric communication systems, where a UAV is dispatched to serve a group of ground nodes (GNs) with random and asynchronous requests for files drawn from a given set.
Abstract: Wireless communication enabled by unmanned aerial vehicles (UAVs) has emerged as an appealing technology for many application scenarios in future wireless systems. However, the limited endurance of UAVs greatly hinders the practical implementation of UAV-enabled communications. To overcome this issue, this paper proposes a novel scheme for UAV-enabled communications by utilizing the promising technique of proactive caching at the users. Specifically, we focus on content-centric communication systems, where a UAV is dispatched to serve a group of ground nodes (GNs) with random and asynchronous requests for files drawn from a given set. With the proposed scheme, at the beginning of each operation period, the UAV proactively transmits the files to a subset of selected GNs that cooperatively cache all the files. As a result, when requested, a file can be retrieved by each GN either directly from its local cache or from its nearest neighbor that has cached the file via device-to-device communications. It is revealed that there exists a fundamental trade-off between the file caching cost, which is the total time required for the UAV to transmit the files to their designated caching GNs, and the file retrieval cost, which is the average time required for serving one file request. To characterize this trade-off, we formulate an optimization problem to minimize the weighted sum of the two costs, via jointly designing the file caching policy, the UAV trajectory, and communication scheduling. As the formulated problem is NP-hard in general, we propose efficient algorithms to find high-quality approximate solutions for it. Numerical results are provided to corroborate our study and show the great potential of proactive caching for overcoming the endurance issue in UAV-enabled communications.

Journal ArticleDOI
TL;DR: A survey of cache management strategies in ICN is presented along with their contributions and limitations, and their performance is evaluated in a simulation network environment with respect to cache hit, stretch ratio, and eviction operations.
Abstract: Information-Centric Networking (ICN) is an appealing architecture that has received a remarkable interest from the research community thanks to its friendly structure. Several projects have proposed innovative ICN models to cope with the Internet practice, which moves from host-centrism to receiver-driven communication. A worth mentioning component of these novel models is in-network caching, which provides flexibility and pervasiveness for the upturn of swiftness in data distribution. Because of the rapid Internet traffic growth, cache deployment and content caching have been unanimously accepted as conspicuous ICN issues to be resolved. In this article, a survey of cache management strategies in ICN is presented along with their contributions and limitations, and their performance is evaluated in a simulation network environment with respect to cache hit, stretch ratio, and eviction operations. Some unresolved ICN caching challenges and directions for future research in this networking area are also discussed.

Journal ArticleDOI
TL;DR: The proposed optimization framework reveals the caching performance upper bound for general adaptive video streaming systems, while the proposed algorithm provides some design guidelines for the edge servers to select the cached representations in practice based on both the video popularity and content information.
Abstract: Caching at mobile edge servers can smooth temporal traffic variability and reduce the service load of base stations in mobile video delivery. However, the assignment of multiple video representations to distributed servers is still a challenging question in the context of adaptive streaming, since any two representations from different videos or even from the same video will compete for the limited caching storage. Therefore, it is important, yet challenging, to optimally select the cached representations for each edge server in order to effectively reduce the service load of base station while maintaining a high quality of experience (QoE) for users. To address this, we study a QoE-driven mobile edge caching placement optimization problem for dynamic adaptive video streaming that properly takes into account the different rate-distortion (R–D) characteristics of videos and the coordination among distributed edge servers. Then, by the optimal caching placement of representations for multiple videos, we maximize the aggregate average video distortion reduction of all users while minimizing the additional cost of representation downloading from the base station, subject not only to the storage capacity constraints of the edge servers, but also to the transmission and initial startup delay constraints of the users. We formulate the proposed optimization problem as an integer linear program to provide the performance upper bound, and as a submodular maximization problem with a set of knapsack constraints to develop a practically feasible cost benefit greedy algorithm. The proposed algorithm has polynomial computational complexity and a theoretical lower bound on its performance. Simulation results further show that the proposed algorithm is able to achieve a near-optimal performance with very low time complexity. Therefore, the proposed optimization framework reveals the caching performance upper bound for general adaptive video streaming systems, while the proposed algorithm provides some design guidelines for the edge servers to select the cached representations in practice based on both the video popularity and content information.
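
The general shape of such a cost-benefit greedy step is: repeatedly pick the representation whose marginal gain per unit of consumed storage is largest until nothing else fits. The gain function and capacity model below are placeholders for the paper's distortion-reduction objective, not its exact formulation:

```python
def greedy_cache_placement(representations, capacity, marginal_gain):
    """representations: list of (rep_id, size) tuples
    capacity: storage budget of one edge server
    marginal_gain(rep_id, cached): gain of adding rep_id given the current cache,
    e.g. aggregate distortion reduction minus download cost (placeholder)."""
    cached, used = set(), 0
    while True:
        best, best_ratio = None, 0.0
        for rep_id, size in representations:
            if rep_id in cached or used + size > capacity:
                continue
            ratio = marginal_gain(rep_id, cached) / size   # benefit per byte
            if ratio > best_ratio:
                best, best_ratio = (rep_id, size), ratio
        if best is None:
            return cached            # no remaining representation improves things
        cached.add(best[0])
        used += best[1]
```

Because the objective is submodular, this kind of greedy selection carries the theoretical lower bound on performance mentioned in the abstract.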

Journal ArticleDOI
TL;DR: This paper proposes an optimal offloading with caching-enhancement scheme (OOCS) for the femto-cloud and mobile edge computing scenarios, considering the case where multiple mobile users offload duplicated computation tasks to the network edge and share the computation results among them.
Abstract: Computation offloading is a proven successful paradigm for enabling resource-intensive applications on mobile devices. Moreover, in view of emerging mobile collaborative applications, the offloaded tasks can be duplicated when multiple users are in the same proximity. This motivates us to design a collaborative offloading scheme and cache the popular computation results that are likely to be reused by other mobile users. In this paper, we consider the scenario where multiple mobile users offload duplicated computation tasks to the network edge, and share the computation results among them. Our goal is to develop the optimal fine-grained collaborative offloading strategies with caching enhancements to minimize the overall execution delay at the mobile terminal side. To this end, we propose an optimal offloading with caching-enhancement scheme (OOCS) for the femto-cloud scenario and the mobile edge computing scenario, respectively. Simulation results show that, compared to six alternative solutions in the literature, our single-user OOCS can reduce execution delay by up to 42.83% and 33.28% for single-user femto-cloud and single-user mobile edge computing, respectively. Our multi-user OOCS can further reduce delay by 11.71% compared to single-user OOCS through users' cooperation.

Journal ArticleDOI
TL;DR: A cross-entropy-based dynamic content caching scheme is proposed to cache contents at the edge of VCNs based on the requests of vehicles and the cooperation among RSUs, and the performance of the proposed scheme is evaluated through extensive simulation experiments.
Abstract: Vehicular content networks (VCNs), which distribute medium-volume contents to vehicles in a fully distributed manner, represent the key enabling technology of vehicular infotainment applications. In VCNs, the road-side units (RSUs) cache replicas of contents on the edge of networks to facilitate the timely content delivery to driving-through vehicles when requested. However, due to the limited storage at RSUs and soaring content size for distribution, RSUs can only selectively cache content replicas. The edge caching scheme in RSUs, therefore, becomes a fundamental issue in VCNs. This paper addresses the issue by developing an edge caching scheme in RSUs. Specifically, we first analyze the features of vehicular content requests based on the content access pattern, vehicle's velocity, and road traffic density. A model is then proposed to determine whether and where to obtain the replica of content when the moving vehicle requests it. After this, a cross-entropy-based dynamic content caching scheme is proposed accordingly to cache the contents at the edge of VCNs based on the requests of vehicles and the cooperation among RSUs. Finally, the performance of the proposed scheme is evaluated by extensive simulation experiments.
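
The generic cross-entropy method behind such a scheme samples candidate cache placements from a probability vector, keeps the best-scoring samples, and moves the probabilities toward that elite set. The sketch below is this generic method with hypothetical parameters and a placeholder utility, not the paper's exact RSU formulation:

```python
import numpy as np

def cross_entropy_caching(num_contents, cache_size, score_fn,
                          iters=50, samples=200, elite_frac=0.1, smooth=0.7):
    """score_fn(placement): utility of a boolean placement vector, e.g. expected
    hits weighted by request rate; size feasibility is left to score_fn to
    penalize (placeholder assumptions)."""
    p = np.full(num_contents, cache_size / num_contents)   # inclusion probabilities
    for _ in range(iters):
        batch = np.random.rand(samples, num_contents) < p  # sample placements
        scores = np.array([score_fn(b) for b in batch])
        elite = batch[np.argsort(scores)[-int(samples * elite_frac):]]
        p = smooth * p + (1 - smooth) * elite.mean(axis=0) # move toward the elites
    # final placement: cache the contents with the highest inclusion probability
    return np.argsort(p)[-cache_size:]
```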

Proceedings ArticleDOI
17 Jun 2018
TL;DR: The AoI optimal policy is derived, which depends only on the square root of the source popularity, and an AoS near-optimal rate allocation policy is proposed that is proportional to the cube root of both the source update rate and the source popularity.
Abstract: We consider a cache refresh system where a local server is connected to multiple remote sources and maintains local copies of the data items at the sources. The data at each source is updated randomly and independently without notifying the local server, while the local server refreshes the corresponding cached data periodically. The freshness of the local cache is measured by two different freshness metrics, age of synchronization (AoS) and age of information (AoI). We address the following problem: given a constrained total refresh rate, how does the local server allocate the refresh rate for each source to maintain overall data freshness? We derive the AoI optimal policy which depends only on the square root of the source popularity. For a large refresh rate, we propose an AoS near-optimal rate allocation policy that is proportional to the cube root of both the source update rate and the source popularity. For small refresh rates, we also prove that the square root law with respect to the popularity minimizes both AoS and AoI.
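
One way to read the square-root law stated here: with popularity p_i for source i and a total refresh budget F, allocating refresh rates proportionally to the square root of popularity gives

\[
f_i \;=\; F\,\frac{\sqrt{p_i}}{\sum_{j}\sqrt{p_j}}, \qquad \sum_i f_i = F,
\]

where the notation and the proportional form of the constrained optimum are assumptions for illustration; the paper derives the exact AoI-optimal policy and the cube-root AoS allocation separately.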

Journal ArticleDOI
Wai-Xi Liu, Jie Zhang, Zhongwei Liang, Ling-Xi Peng, Cai Jun
TL;DR: A lightweight caching scheme is proposed that integrates cache placement and cache replacement: caching based on popularity prediction and cache capacity (CPC).
Abstract: In information-centric networking, accurately predicting content popularity can improve the performance of caching. Therefore, based on software defined networking (SDN), this paper proposes Deep-Learning-based Content Popularity Prediction (DLCPP) to achieve popularity prediction. DLCPP adopts the switch's computing resources and links in the SDN to build a distributed and reconfigurable deep learning network. For DLCPP, we initially determine the metrics that can reflect changes in content popularity. Second, each network node collects the spatial-temporal joint distribution data of these metrics. Then, the data are used as input to stacked auto-encoders (SAE) in DLCPP to extract the spatiotemporal features of popularity. Finally, we transform popularity prediction into a multi-classification problem by discretizing the content popularity into multiple classes. The Softmax classifier is used to achieve the content popularity prediction. Some challenges for DLCPP are also addressed, such as determining the structure of the SAE, realizing the neuron function on an SDN switch, and deploying DLCPP on an OpenFlow-based SDN. At the same time, we propose a lightweight caching scheme that integrates cache placement and cache replacement: caching based on popularity prediction and cache capacity (CPC). Extensive experiments demonstrate the good performance of DLCPP, which achieves 2.1%-15% and 5.2%-40% accuracy improvements over neural network and autoregressive baselines, respectively. Benefiting from DLCPP's better prediction accuracy, CPC yields a steady improvement in caching performance over other dominant cache management frameworks.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: A Federated learning based Proactive Content Caching (FPCC) scheme is proposed, which does not require gathering users' data centrally for training, and which outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
Abstract: Content caching is a promising approach in edge computing to cope with the explosive growth of mobile data on 5G networks, where contents are typically placed on local caches for fast and repetitive data access. Due to the capacity limit of caches, it is essential to predict the popularity of files and cache those popular ones. However, the fluctuating popularity of files makes the prediction a highly challenging task. To tackle this challenge, many recent works propose learning-based approaches which gather the users' data centrally for training, but they bring a significant issue: users may not trust the central server and thus hesitate to upload their private data. In order to address this issue, we propose a Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training. The FPCC is based on a hierarchical architecture in which the server aggregates the users' updates using federated averaging, and each user performs training on its local data using hybrid filtering on stacked autoencoders. The experimental results demonstrate that, without gathering users' private data, our scheme still outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
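
The federated-averaging aggregation step at the server can be sketched as follows (the client update format is a hypothetical assumption; the paper additionally runs hybrid filtering on stacked autoencoders at each client, which this sketch does not show):

```python
import numpy as np

def federated_average(client_updates):
    """client_updates: list of (num_samples, weights) pairs, where weights
    is a list of NumPy arrays holding one client's locally trained model."""
    total_samples = sum(n for n, _ in client_updates)
    num_layers = len(client_updates[0][1])
    # Weight each client's parameters by how much local data it trained on.
    return [sum(n * w[layer] for n, w in client_updates) / total_samples
            for layer in range(num_layers)]

# e.g. two clients with a single-layer toy model:
global_model = federated_average([
    (120, [np.array([0.2, 0.4])]),
    (80,  [np.array([0.6, 0.0])]),
])
print(global_model)   # the next global model broadcast to all users
```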

Journal ArticleDOI
TL;DR: This paper proposes a scheme called cooperative caching based on mobility prediction (CCMP) for VCCNs, and designs a cache replacement policy based on content popularity to guarantee that only popular contents are cached at a set of mobile nodes that may visit the same hot spot areas repeatedly.
Abstract: Vehicular content centric networks (VCCNs) emerge as a strong candidate to be deployed in information-rich applications of vehicular communications. Due to vehicles' mobility, it becomes rather inefficient to establish end-to-end connections in VCCNs. Consequently, content packets are usually sent back to the requesting node via different paths in VCCNs. To improve network performance of VCCNs, node mobility should be exploited for vehicles to serve as relays and to carry data for delivery. In this paper, we propose a scheme called cooperative caching based on mobility prediction (CCMP) for VCCNs. The main idea of CCMP is to cache popular contents at a set of mobile nodes that may visit the same hot spot areas repeatedly. In our CCMP scheme, we use prediction based on partial matching to predict mobile nodes' probability of reaching different hot spot regions based on their past trajectories. Vehicles with longer sojourn time in a hot region can provide more services and should be preferred as caching nodes. To solve the problem of limited buffer at each node, we design a cache replacement policy based on content popularity to guarantee that only popular contents are cached. We evaluate CCMP through the opportunistic network environment simulator and show its advantages in success ratio and content access delay compared to other state-of-the-art schemes.
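
A toy order-2 prediction-by-partial-matching style predictor over hot-spot visit histories, to illustrate the kind of mobility prediction CCMP relies on (the context length, backoff rule, and region labels are assumptions, not the paper's exact model):

```python
from collections import defaultdict

class PPMMobilityPredictor:
    """Counts region transitions conditioned on the last k visited regions and
    backs off to shorter contexts when a context has never been seen."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, trajectory):
        for i in range(1, len(trajectory)):
            for k in range(1, self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(trajectory[i - k:i])
                self.counts[context][trajectory[i]] += 1

    def predict(self, recent):
        # back off from the longest context to the shortest one with data
        for k in range(self.max_order, 0, -1):
            nxt = self.counts[tuple(recent[-k:])]
            if nxt:
                total = sum(nxt.values())
                return {region: c / total for region, c in nxt.items()}
        return {}

predictor = PPMMobilityPredictor()
predictor.train(["A", "B", "C", "A", "B", "C", "A", "B"])
print(predictor.predict(["A", "B"]))   # high probability of region "C" next
```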

Journal ArticleDOI
TL;DR: This paper addresses the system modeling, large-scale optimization, and framework design of hierarchical edge caching in device-to-device aided mobile networks, and investigates the maximum capacity of the network infrastructure in terms of offloading network traffic, reducing system costs, and supporting content requests from mobile users locally.
Abstract: The explosive growth of content requests from mobile users is stretching the capability of current mobile networking technologies to satisfy users’ demands with acceptable quality of service. An effective approach to address this challenge, which has not yet been thoroughly studied, is to offload network traffic by caching popular content at the edges (e.g., mobile devices and base stations) of mobile networks, thus reducing the massive duplication of content downloads. In this paper, we address the system modeling, large-scale optimization, and framework design of hierarchical edge caching in device-to-device aided mobile networks. In particular, taking into account the analysis of social behavior and preference of mobile users, heterogeneous cache sizes, and the derived system topology, we investigate the maximum capacity of the network infrastructure in terms of offloading network traffic, reducing system costs, and supporting content requests from mobile users locally. Our proposed framework has a low complexity and can be applied in practical engineering implementation. Trace-based simulation results demonstrate the effectiveness of the proposed framework.

Proceedings ArticleDOI
07 Aug 2018
TL;DR: DEEPCACHE is a novel framework for content caching that can significantly boost cache performance; applied to existing cache policies such as LRU and k-LRU, it significantly boosts the number of cache hits.
Abstract: In this paper, we present DEEPCACHE, a novel framework for content caching, which can significantly boost cache performance. Our framework is based on powerful deep recurrent neural network models. It comprises two main components: i) an Object Characteristics Predictor, which builds upon a deep LSTM Encoder-Decoder model to predict the future characteristics of an object (such as object popularity) -- to the best of our knowledge, we are the first to propose an LSTM Encoder-Decoder model for content caching; ii) a caching policy component, which accounts for predicted information of objects to make smart caching decisions. In our thorough experiments, we show that applying the DEEPCACHE framework to existing cache policies, such as LRU and k-LRU, significantly boosts the number of cache hits.

Proceedings ArticleDOI
23 Apr 2018
TL;DR: This work designs a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage and the total cost of ownership, while providing latency and queries-per-second comparable to MyRocks on a server with a much larger amount of DRAM.
Abstract: Popular SSD-based key-value stores consume a large amount of DRAM in order to provide high-performance database operations. However, DRAM can be expensive for data center providers, especially given recent global supply shortages that have resulted in increasing DRAM costs. In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage, and to reduce the total cost of ownership, while providing comparable latency and queries-per-second (QPS) as MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth. We design novel solutions to these challenges, including using small block sizes with a partitioned index, aligning blocks post-compression to reduce read bandwidth, utilizing dictionary compression, implementing an admission control policy for which objects get cached in NVM to control its durability, as well as replacing interrupts with a hybrid polling mechanism. We implemented MyNVM and measured its performance in Facebook's production environment. Our implementation reduces the size of the DRAM cache from 96 GB to 16 GB, and incurs a negligible impact on latency and queries-per-second compared to MyRocks. Finally, to the best of our knowledge, this is the first study on the usage of NVM devices in a commercial data center environment.
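
One of the listed techniques, admission control for the NVM cache, can be illustrated with a simple "admit on second touch" filter that bounds the write rate to the device. This is a generic pattern under stated assumptions (block-key type, probation window), not Facebook's actual policy:

```python
from collections import OrderedDict

class SecondTouchAdmission:
    """Admit a block into the NVM cache only if it was recently requested
    before, so one-hit-wonder blocks never consume NVM write endurance."""

    def __init__(self, probation_size=100_000):
        self.probation = OrderedDict()        # recently seen, not yet admitted
        self.probation_size = probation_size

    def should_admit(self, block_key) -> bool:
        if block_key in self.probation:
            del self.probation[block_key]     # second touch: admit to NVM
            return True
        self.probation[block_key] = True      # first touch: remember only
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # drop the oldest probation entry
        return False
```

Filtering admissions this way trades a slightly lower hit rate for a much lower write amplification, which matters because, as the abstract notes, NVM wears out quickly under a high write bandwidth.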