
Showing papers on "Cache published in 2018"


Posted Content
TL;DR: This paper describes practical attacks that combine methodology from side-channel attacks, fault attacks, and return-oriented programming to read arbitrary memory from the victim's process, and shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms.
Abstract: Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes: it can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, static analysis, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing/side-channel attacks. These attacks represent a serious threat to actual systems, since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.

576 citations


Journal ArticleDOI
TL;DR: A novel caching scheme is proposed that strictly improves the state of the art by exploiting commonality among user demands, and the rate-memory tradeoff is fully characterized for a decentralized setting in which users fill their cache content without any coordination.
Abstract: We consider a basic cache network, in which a single server is connected to multiple users via a shared bottleneck link. The server has a database of files (content). Each user has an isolated memory that can be used to cache content in a prefetching phase. In a following delivery phase, each user requests a file from the database, and the server needs to deliver users’ demands as efficiently as possible by taking into account their cache contents. We focus on an important and commonly used class of prefetching schemes, where the caches are filled with uncoded data. We provide the exact characterization of the rate-memory tradeoff for this problem, by deriving both the minimum average rate (for a uniform file popularity) and the minimum peak rate required on the bottleneck link for a given cache size available at each user. In particular, we propose a novel caching scheme, which strictly improves the state of the art by exploiting commonality among user demands. We then demonstrate the exact optimality of our proposed scheme through a matching converse, by dividing the set of all demands into types, and showing that the placement phase in the proposed caching scheme is universally optimal for all types. Using these techniques, we also fully characterize the rate-memory tradeoff for a decentralized setting, in which users fill out their cache content without any coordination.
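
For reference, the peak-rate side of this tradeoff under uncoded prefetching is commonly stated in the closed form below (notation assumed here: N files, K users, a per-user cache of M files, and integer r = KM/N; consult the paper for the precise statement and the average-rate counterpart):

\[
R^{*}_{\text{peak}}\Big(M=\tfrac{rN}{K}\Big) \;=\; \frac{\binom{K}{r+1}-\binom{K-\min(K,N)}{r+1}}{\binom{K}{r}}, \qquad r\in\{0,1,\dots,K\},
\]

with the full tradeoff obtained as the lower convex envelope of these corner points.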

378 citations


Proceedings ArticleDOI
16 Apr 2018
TL;DR: In this paper, the authors investigated the problem of dynamic service caching in MEC-enabled dense cellular networks and proposed an efficient online algorithm, called OREO, which jointly optimizes service caching and task offloading to address service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination.
Abstract: Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the network edge, thereby meeting the latency requirements of many emerging mobile applications and saving backhaul network bandwidth. Although many existing works have studied computation offloading policies, service caching is an equally, if not more, important design topic of MEC, yet it receives much less attention. Service caching refers to caching application services and their related databases/libraries in the edge server (e.g. an MEC-enabled BS), thereby enabling corresponding computation tasks to be executed. Because only a small number of application services can be cached in a resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the edge computing performance. In this paper, we investigate the extremely compelling but much less studied problem of dynamic service caching in MEC-enabled dense cellular networks. We propose an efficient online algorithm, called OREO, which jointly optimizes dynamic service caching and task offloading to address a number of key challenges in MEC systems, including service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination. Our algorithm is developed based on Lyapunov optimization and Gibbs sampling, works online without requiring future information, and achieves provable close-to-optimal performance. Simulation results show that our algorithm can effectively reduce computation latency for end users while keeping energy consumption low.
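
As a rough illustration of the Gibbs-sampling flavor of such a decision step (not the paper's actual OREO algorithm; the cost function, temperature schedule, and service set below are hypothetical), a base station could explore cache configurations as follows:

```python
import math
import random

def gibbs_cache_step(cached, services, cache_capacity, cost_fn, temperature=1.0):
    """One Gibbs-sampling move over the service-caching decision.

    cached: set of currently cached service ids
    cost_fn: maps a candidate cached set to a scalar system cost
             (e.g. expected latency plus weighted energy); hypothetical here.
    """
    s = random.choice(services)              # pick a service to reconsider
    candidate = set(cached)
    if s in candidate:
        candidate.remove(s)                  # propose evicting it
    elif len(candidate) < cache_capacity:
        candidate.add(s)                     # propose caching it
    else:
        return cached                        # no feasible toggle this round

    delta = cost_fn(candidate) - cost_fn(cached)
    # Accept worse configurations with a probability that shrinks as the cost
    # increase grows, which lets the sampler escape local minima.
    if delta <= 0 or random.random() < math.exp(-delta / temperature):
        return candidate
    return cached
```

Repeating this step while gradually lowering the temperature concentrates the sampler on low-cost configurations; the paper couples such sampling with Lyapunov-based virtual queues to handle long-term energy constraints, which this sketch omits.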

326 citations


Posted Content
TL;DR: A language close to the mathematics of deep learning called Tensor Comprehensions offering both imperative and declarative styles, a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, and a compilation cache populated by an autotuner are contributed.
Abstract: Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding, ranking user preferences, ad placement, etc. Competing frameworks for building these networks such as TensorFlow, Chainer, CNTK, Torch/PyTorch, Caffe1/2, MXNet and Theano, explore different tradeoffs between usability and expressiveness, research or production orientation and supported hardware. They operate on a DAG of computational operators, wrapping high-performance libraries such as CUDNN (for NVIDIA GPUs) or NNPACK (for various CPUs), and automate memory allocation, synchronization, distribution. Custom operators are needed where the computation does not fit existing high-performance library calls, usually at a high engineering cost. This is frequently required when new operators are invented by researchers: such operators suffer a severe performance penalty, which limits the pace of innovation. Furthermore, even if there is an existing runtime call these frameworks can use, it often doesn't offer optimal performance for a user's particular network architecture and dataset, missing optimizations between operators as well as optimizations that can be done knowing the size and shape of data. Our contributions include (1) a language close to the mathematics of deep learning called Tensor Comprehensions, (2) a polyhedral Just-In-Time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, also providing optimizations such as operator fusion and specialization for specific sizes, (3) a compilation cache populated by an autotuner [Abstract cutoff]

318 citations


Posted Content
TL;DR: In this paper, the authors investigated the problem of dynamic service caching in MEC-enabled dense cellular networks and proposed an efficient online algorithm, called OREO, which jointly optimizes service caching and task offloading to address service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination.
Abstract: Mobile Edge Computing (MEC) pushes computing functionalities away from the centralized cloud to the network edge, thereby meeting the latency requirements of many emerging mobile applications and saving backhaul network bandwidth. Although many existing works have studied computation offloading policies, service caching is an equally, if not more, important design topic of MEC, yet it receives much less attention. Service caching refers to caching application services and their related databases/libraries in the edge server (e.g. an MEC-enabled BS), thereby enabling corresponding computation tasks to be executed. Because only a small number of application services can be cached in a resource-limited edge server at the same time, which services to cache has to be judiciously decided to maximize the edge computing performance. In this paper, we investigate the extremely compelling but much less studied problem of dynamic service caching in MEC-enabled dense cellular networks. We propose an efficient online algorithm, called OREO, which jointly optimizes dynamic service caching and task offloading to address a number of key challenges in MEC systems, including service heterogeneity, unknown system dynamics, spatial demand coupling and decentralized coordination. Our algorithm is developed based on Lyapunov optimization and Gibbs sampling, works online without requiring future information, and achieves provable close-to-optimal performance. Simulation results show that our algorithm can effectively reduce computation latency for end users while keeping energy consumption low.

249 citations


Proceedings ArticleDOI
05 Nov 2018
TL;DR: DNNBuilder, which includes an automatic design space exploration tool that generates optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity, is designed and demonstrated.
Abstract: Building a high-performance FPGA accelerator for Deep Neural Networks (DNNs) often requires RTL programming, hardware verification, and precise resource allocation, all of which can be time-consuming and challenging to perform even for seasoned FPGA developers. To bridge the gap between fast DNN construction in software (e.g., Caffe, TensorFlow) and slow hardware implementation, we propose DNNBuilder for building high-performance DNN hardware accelerators on FPGAs automatically. Novel techniques are developed to meet the throughput and latency requirements for both cloud and edge devices. A number of novel techniques including high-quality RTL neural network components, a fine-grained layer-based pipeline architecture, and a column-based cache scheme are developed to boost throughput, reduce latency, and save FPGA on-chip memory. To address the limited-resource challenge, we design an automatic design space exploration tool to generate optimized parallelism guidelines by considering external memory access bandwidth, data reuse behaviors, FPGA resource availability, and DNN complexity. DNNBuilder is demonstrated on four DNNs (Alexnet, ZF, VGG16, and YOLO) on two FPGAs (XC7Z045 and KU115) corresponding to edge and cloud computing, respectively. The fine-grained layer-based pipeline architecture and the column-based cache scheme contribute to 7.7x and 43x reductions in latency and BRAM utilization compared to conventional designs. We achieve the best performance (up to 5.15x faster) and efficiency (up to 5.88x more efficient) compared to published FPGA-based classification-oriented DNN accelerators for both edge and cloud computing cases. We reach 4218 GOPS for running an object detection DNN, which is, to the best of our knowledge, the highest reported throughput. DNNBuilder can provide millisecond-scale real-time performance for processing HD video input and deliver higher efficiency (up to 4.35x) than GPU-based solutions.

244 citations


Journal ArticleDOI
TL;DR: In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown, providing a simple, yet practical asynchronous caching approach.
Abstract: Small basestations (SBs) equipped with caching units have potential to handle the unprecedented demand growth in heterogeneous networks. Through low-rate, backhaul connections with the backbone, SBs can prefetch popular files during off-peak traffic hours, and service them to the edge at peak periods. To intelligently prefetch, each SB must learn what and when to cache, while taking into account SB memory limitations, the massive number of available contents, the unknown popularity profiles, as well as the space-time popularity dynamics of user file requests. In this paper, local and global Markov processes model user requests, and a reinforcement learning (RL) framework is put forth for finding the optimal caching policy when the transition probabilities involved are unknown. Joint consideration of global and local popularity demands along with cache-refreshing costs allow for a simple, yet practical asynchronous caching approach. The novel RL-based caching relies on a Q-learning algorithm to implement the optimal policy in an online fashion, thus, enabling the cache control unit at the SB to learn, track, and possibly adapt to the underlying dynamics. To endow the algorithm with scalability, a linear function approximation of the proposed Q-learning scheme is introduced, offering faster convergence as well as reduced complexity and memory requirements. Numerical tests corroborate the merits of the proposed approach in various realistic settings.
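
A minimal sketch of the kind of tabular Q-learning update such a cache controller might run (the state encoding, action set, reward, and parameters here are illustrative assumptions, not the paper's exact formulation):

```python
import random
from collections import defaultdict

class QCacheAgent:
    """Toy Q-learning cache controller: a state could encode the observed
    popularity profile, and an action which subset of files to prefetch."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)          # (state, action) -> estimated value
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:   # occasional exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # reward could be cache hits minus a cache-refreshing cost (assumed)
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

The paper additionally introduces a linear function approximation of this scheme for scalability, which replaces the table above with a parameterized value estimate.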

241 citations


Journal ArticleDOI
TL;DR: An efficient and secure service-oriented authentication framework supporting network slicing and fog computing for 5G-enabled IoT services is proposed, in which session keys are negotiated among users, local fogs, and IoT servers to guarantee secure access of service data in fog caches and remote servers with low latency.
Abstract: 5G network is considered as a key enabler in meeting continuously increasing demands for the future Internet of Things (IoT) services, including high data rate, numerous devices connection, and low service latency. To satisfy these demands, network slicing and fog computing have been envisioned as the promising solutions in service-oriented 5G architecture. However, security paradigms enabling authentication and confidentiality of 5G communications for IoT services remain elusive, but indispensable. In this paper, we propose an efficient and secure service-oriented authentication framework supporting network slicing and fog computing for 5G-enabled IoT services. Specifically, users can efficiently establish connections with 5G core network and anonymously access IoT services under their delegation through proper network slices of 5G infrastructure selected by fog nodes based on the slice/service types of accessing services. The privacy-preserving slice selection mechanism is introduced to preserve both configured slice types and accessing service types of users. In addition, session keys are negotiated among users, local fogs and IoT servers to guarantee secure access of service data in fog cache and remote servers with low latency. We evaluate the performance of the proposed framework through simulations to demonstrate its efficiency and feasibility under 5G infrastructure.

228 citations


Proceedings ArticleDOI
02 Jun 2018
TL;DR: The Neural Cache architecture re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for deep neural networks, and can fully execute convolutional, fully connected, and pooling layers in-cache.
Abstract: This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques for in-situ arithmetic in SRAM arrays, efficient data mapping, and reduced data movement are proposed. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that the proposed architecture can improve inference latency by 18.3X over a state-of-the-art multi-core CPU (Xeon E5) and 7.7X over a server-class GPU (Titan Xp) for the Inception v3 model. Neural Cache improves inference throughput by 12.4X over the CPU (2.2X over the GPU), while reducing power consumption by 50% over the CPU (53% over the GPU).

215 citations


Proceedings Article
01 Aug 2018
TL;DR: It is shown for the first time that hardware translation lookaside buffers (TLBs) can be abused to leak fine-grained information about a victim's activity even when CPU cache activity is guarded by state-of-the-art cache side-channel protections, such as CAT and TSX.
Abstract: To stop side channel attacks on CPU caches that have allowed attackers to leak secret information and break basic security mechanisms, the security community has developed a variety of powerful defenses that effectively isolate the security domains. Of course, other shared hardware resources exist, but the assumption is that unlike cache side channels, any channel offered by these resources is insufficiently reliable and too coarse-grained to leak general-purpose information. This is no longer true. In this paper, we revisit this assumption and show for the first time that hardware translation lookaside buffers (TLBs) can be abused to leak fine-grained information about a victim's activity even when CPU cache activity is guarded by state-of-the-art cache side-channel protections, such as CAT and TSX. However, exploiting the TLB channel is challenging, due to unknown addressing functions inside the TLB and the attacker's limited monitoring capabilities which, at best, cover only the victim's coarse-grained data accesses. To address the former, we reverse engineer the previously unknown addressing function in recent Intel processors. To address the latter, we devise a machine learning strategy that exploits high-resolution temporal features about a victim's memory activity. Our prototype implementation, TLBleed, can leak a 256-bit EdDSA secret key from a single capture after 17 seconds of computation time with a 98% success rate, even in presence of state-of-the-art cache isolation. Similarly, using a single capture, TLBleed reconstructs 92% of RSA keys from an implementation that is hardened against FLUSH+RELOAD attacks.

208 citations


Journal ArticleDOI
TL;DR: This work proposes a new cooperative edge caching architecture for 5G networks in which mobile edge computing resources are utilized to enhance edge caching capability, introduces a new vehicular caching cloud concept, and proposes a vehicle-aided edge caching scheme.
Abstract: Along with modern wireless networks being content-centric, the demand for rich multimedia services has been growing at a tremendous pace, which brings significant challenges to mobile networks in terms of the need for massive content delivery. Edge caching has emerged as a promising approach to alleviate the heavy burden on data transmission through caching and forwarding contents at the edge of networks. However, existing studies always treat storage and computing resources separately, and neglect the mobility characteristic of both the content caching nodes and end users. Driven by these issues, in this work, we propose a new cooperative edge caching architecture for 5G networks, where mobile edge computing resources are utilized for enhancing edge caching capability. In the architecture, we focus on mobility-aware hierarchical caching, where smart vehicles are taken as collaborative caching agents for sharing content cache tasks with base stations. To further utilize the caching resource of smart vehicles, we introduce a new vehicular caching cloud concept, and propose a vehicle-aided edge caching scheme, where the caching and computing resources at the wireless network edge are jointly scheduled. Numerical results indicate that the proposed scheme minimizes content access latency and improves caching resource utilization.

Proceedings ArticleDOI
20 Oct 2018
TL;DR: This paper provides the key insight that randomized mapping can be accomplished efficiently by accessing the cache with an encrypted address, as encryption would cause the lines that map to the same set of a conventional cache to get scattered to different sets.
Abstract: Modern processors share the last-level cache between all the cores to efficiently utilize the cache space. Unfortunately, such sharing makes the cache vulnerable to attacks whereby an adversary can infer the access pattern of a co-running application by carefully orchestrating evictions using cache conflicts. Conflict-based attacks can be mitigated by randomizing the location of the lines in the cache. Unfortunately, prior proposals for randomized mapping require storage-intensive tables and are effective only if the OS can classify the applications into protected and unprotected groups. The goal of this paper is to mitigate conflict-based attacks while incurring negligible storage and performance overheads, and without relying on OS support. This paper provides the key insight that randomized mapping can be accomplished efficiently by accessing the cache with an encrypted address, as encryption would cause the lines that map to the same set of a conventional cache to get scattered to different sets. This paper proposes CEASE, a design that uses Low-Latency Block-Cipher (LLBC) to translate the physical line-address into an encrypted line-address, and accesses the cache with this encrypted line-address. We analyze efficient designs for LLBC that can perform encryption and decryption within two cycles. We also propose CEASER, a design that periodically changes the encryption key and performs dynamic-remapping to improve robustness. CEASER provides strong security (tolerates 100+ years of attack), has low performance overhead (1% slowdown), requires a storage overhead of less than 24 bytes for the newly added structures, and does not need any OS support.
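
The core idea of indexing the cache with an encrypted line address can be sketched as follows (the keyed hash below is only a software stand-in for the paper's Low-Latency Block-Cipher, and the set count is hypothetical):

```python
import hashlib

NUM_SETS = 1024  # hypothetical last-level-cache set count

def encrypt_line_address(line_addr: int, key: bytes) -> int:
    """Stand-in for a low-latency block cipher: a keyed pseudorandom mapping
    of the physical line address. Changing `key` re-scatters every line."""
    digest = hashlib.blake2b(line_addr.to_bytes(8, "little"), key=key).digest()
    return int.from_bytes(digest[:8], "little")

def cache_set_index(line_addr: int, key: bytes) -> int:
    # Lines that would collide in a conventional cache are spread across sets,
    # which is what breaks eviction-set construction for conflict-based attacks.
    return encrypt_line_address(line_addr, key) % NUM_SETS
```

Periodically rotating the key and remapping resident lines, as CEASER does, further limits how long any eviction set learned by an attacker stays useful.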

Journal ArticleDOI
TL;DR: Big data analytics is proposed to advance edge caching capability, which is considered a promising approach to improve network efficiency and alleviate the high demand for radio resources in future networks.
Abstract: The unprecedented growth of wireless data traffic not only challenges the design and evolution of the wireless network architecture, but also brings about profound opportunities to drive and improve future networks. Meanwhile, the evolution of communications and computing technologies can make the network edge, such as BSs or UEs, become intelligent and rich in terms of computing and communications capabilities, which intuitively enables big data analytics at the network edge. In this article, we propose to explore big data analytics to advance edge caching capability, which is considered as a promising approach to improve network efficiency and alleviate the high demand for the radio resource in future networks. The learning-based approaches for network edge caching are discussed, where a vast amount of data can be harnessed for content popularity estimation and proactive caching strategy design. An outlook of research directions, challenges, and opportunities is provided and discussed in depth. To validate the proposed solution, a case study and a performance evaluation are presented. Numerical studies show that several gains are achieved by employing learning-based schemes for edge caching.

Proceedings ArticleDOI
20 Oct 2018
TL;DR: DAWG is a generic mechanism for secure way partitioning of set-associative structures, including memory caches, that can be implemented on a processor with minimal modifications to modern operating systems.
Abstract: Software side channel attacks have become a serious concern with the recent rash of attacks on speculative processor architectures. Most attacks that have been demonstrated exploit the cache tag state as their exfiltration channel. While many existing defense mechanisms that can be implemented solely in software have been proposed, these mechanisms appear to patch specific attacks, and can be circumvented. In this paper, we propose minimal modifications to hardware to defend against a broad class of attacks, including those based on speculation, with the goal of eliminating the entire attack surface associated with the cache state covert channel. We propose DAWG, Dynamically Allocated Way Guard, a generic mechanism for secure way partitioning of set associative structures including memory caches. DAWG endows a set associative structure with a notion of protection domains to provide strong isolation. When applied to a cache, unlike existing quality of service mechanisms such as Intel's Cache Allocation Technology (CAT), DAWG fully isolates hits, misses, and metadata updates across protection domains. We describe how DAWG can be implemented on a processor with minimal modifications to modern operating systems. We describe a non-interference property that is orthogonal to speculative execution and therefore argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG. Finally, we evaluate the performance impact of DAWG on the cache subsystem.
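
A simplified model of the way-partitioning idea: on a miss, the victim is chosen only from the ways owned by the requesting protection domain, so one domain's hits, fills, and evictions never disturb another's. The domain-to-way assignment and replacement policy below are illustrative assumptions, not DAWG's actual interface:

```python
class WayPartitionedSet:
    """One set of a way-partitioned cache: each protection domain owns a
    fixed subset of the ways and can only hit in, or evict from, those ways."""

    def __init__(self, ways_per_domain):
        # e.g. {"victim": [0, 1, 2, 3], "attacker": [4, 5, 6, 7]}
        self.ways_per_domain = ways_per_domain
        num_ways = max(w for ways in ways_per_domain.values() for w in ways) + 1
        self.tags = [None] * num_ways
        self.lru = {d: list(ways) for d, ways in ways_per_domain.items()}

    def access(self, domain, tag):
        for way in self.ways_per_domain[domain]:
            if self.tags[way] == tag:                 # hit, confined to own ways
                self.lru[domain].remove(way)
                self.lru[domain].append(way)
                return True
        victim = self.lru[domain].pop(0)              # evict only an owned way
        self.tags[victim] = tag
        self.lru[domain].append(victim)
        return False
```

In the real design the metadata (replacement state) is also partitioned per domain, which is what distinguishes DAWG from quality-of-service way allocation such as Intel CAT.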

Proceedings ArticleDOI
15 Oct 2018
TL;DR: DeepCache is a principled cache design for deep learning inference in continuous mobile vision; it improves model execution efficiency by exploiting temporal locality in input video streams and propagates regions of reusable results by exploiting the model's internal structure.
Abstract: We present DeepCache, a principled cache design for deep learning inference in continuous mobile vision. DeepCache benefits model execution efficiency by exploiting temporal locality in input video streams. It addresses a key challenge raised by mobile vision: the cache must operate under video scene variation, while trading off among cacheability, overhead, and loss in model accuracy. At the input of a model, DeepCache discovers video temporal locality by exploiting the video's internal structure, for which it borrows proven heuristics from video compression; into the model, DeepCache propagates regions of reusable results by exploiting the model's internal structure. Notably, DeepCache eschews applying video heuristics to model internals which are not pixels but high-dimensional, difficult-to-interpret data. Our implementation of DeepCache works with unmodified deep learning models, requires zero developer's manual effort, and is therefore immediately deployable on off-the-shelf mobile devices. Our experiments show that DeepCache saves inference execution time by 18% on average and up to 47%. DeepCache reduces system energy consumption by 20% on average.

Posted Content
TL;DR: DAWG, Dynamically Allocated Way Guard, is proposed as a generic mechanism for secure way partitioning of set-associative structures including memory caches; the authors describe a non-interference property that is orthogonal to speculative execution and argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG.
Abstract: Software side channel attacks have become a serious concern with the recent rash of attacks on speculative processor architectures. Most attacks that have been demonstrated exploit the cache tag state as their exfiltration channel. While many existing defense mechanisms that can be implemented solely in software have been proposed, these mechanisms appear to patch specific attacks, and can be circumvented. In this paper, we propose minimal modifications to hardware to defend against a broad class of attacks, including those based on speculation, with the goal of eliminating the entire attack surface associated with the cache state covert channel. We propose DAWG, Dynamically Allocated Way Guard, a generic mechanism for secure way partitioning of set associative structures including memory caches. DAWG endows a set associative structure with a notion of protection domains to provide strong isolation. When applied to a cache, unlike existing quality of service mechanisms such as Intel's Cache Allocation Technology (CAT), DAWG fully isolates hits, misses, and metadata updates across protection domains. We describe how DAWG can be implemented on a processor with minimal modifications to modern operating systems. We describe a non-interference property that is orthogonal to speculative execution and therefore argue that existing attacks such as Spectre Variant 1 and 2 will not work on a system equipped with DAWG. Finally, we evaluate the performance impact of DAWG on the cache subsystem.

Proceedings Article
11 Jul 2018
TL;DR: Varys fully protects against all L1/L2 cache timing attacks and significantly raises the bar for page table side-channel attacks; a set of minor hardware extensions is also proposed that holds the potential to extend Varys' security guarantees to the L3 cache and further improve its performance.
Abstract: Numerous recent works have experimentally shown that Intel Software Guard Extensions (SGX) are vulnerable to cache timing and page table side-channel attacks which could be used to circumvent the data confidentiality guarantees provided by SGX. Existing mechanisms that protect against these attacks either incur high execution costs, are ineffective against certain attack variants, or require significant code modifications. We present Varys, a system that protects unmodified programs running in SGX enclaves from cache timing and page table side-channel attacks. Varys takes a pragmatic approach of strict reservation of physical cores to security-sensitive threads, thereby preventing the attacker from accessing shared CPU resources during enclave execution. The key challenge that we are addressing is that of maintaining the core reservation in the presence of an untrusted OS. Varys fully protects against all L1/L2 cache timing attacks and significantly raises the bar for page table side-channel attacks--all with only 15% overhead on average for Phoenix and PARSEC benchmarks. Additionally, we propose a set of minor hardware extensions that hold the potential to extend Varys' security guarantees to L3 cache and further improve its performance.

Journal ArticleDOI
TL;DR: The authors propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history, and the probability distribution over generated words is updated online depending on the translation history retrieved from the memory.
Abstract: Existing neural machine translation (NMT) models generally translate sentences in isolation, missing the opportunity to take advantage of document-level information. In this work, we propose to augment NMT models with a very light-weight cache-like memory network, which stores recent hidden representations as translation history. The probability distribution over generated words is updated online depending on the translation history retrieved from the memory, endowing NMT models with the capability to dynamically adapt over time. Experiments on multiple domains with different topics and styles show the effectiveness of the proposed approach with negligible impact on the computational cost.
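
One common way such a cache is combined with the base model at decoding time is a gated mixture of the NMT distribution and a distribution induced by attention over the stored history; the NumPy sketch below (with hypothetical shapes and a fixed gate) illustrates that general pattern rather than the paper's exact equations:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cache_augmented_distribution(p_model, query, cache_keys, cache_values, gate):
    """p_model: base NMT distribution over the vocabulary, shape (V,)
    query: current decoder hidden state, shape (d,)
    cache_keys: stored hidden states from the translation history, shape (M, d)
    cache_values: word distributions associated with those states, shape (M, V)
    gate: scalar in [0, 1] trading off the model against the history."""
    scores = cache_keys @ query                      # similarity of query to history
    attn = softmax(scores)                           # attention over cached entries
    p_cache = attn @ cache_values                    # history-induced distribution
    return (1.0 - gate) * p_model + gate * p_cache   # online-adapted prediction
```

In the paper the gate itself is typically learned from the current state rather than fixed, so the model can ignore the cache when the history is unhelpful.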

Journal ArticleDOI
TL;DR: In this article, the authors proposed a proactive caching scheme for UAV-enabled content-centric communication systems, where a UAV is dispatched to serve a group of ground nodes (GNs) with random and asynchronous requests for files drawn from a given set.
Abstract: Wireless communication enabled by unmanned aerial vehicles (UAVs) has emerged as an appealing technology for many application scenarios in future wireless systems. However, the limited endurance of UAVs greatly hinders the practical implementation of UAV-enabled communications. To overcome this issue, this paper proposes a novel scheme for UAV-enabled communications by utilizing the promising technique of proactive caching at the users. Specifically, we focus on content-centric communication systems, where a UAV is dispatched to serve a group of ground nodes (GNs) with random and asynchronous requests for files drawn from a given set. With the proposed scheme, at the beginning of each operation period, the UAV proactively transmits the files to a subset of selected GNs that cooperatively cache all the files. As a result, when requested, a file can be retrieved by each GN either directly from its local cache or from its nearest neighbor that has cached the file via device-to-device communications. It is revealed that there exists a fundamental trade-off between the file caching cost, which is the total time required for the UAV to transmit the files to their designated caching GNs, and the file retrieval cost, which is the average time required for serving one file request. To characterize this trade-off, we formulate an optimization problem to minimize the weighted sum of the two costs, via jointly designing the file caching policy, the UAV trajectory, and communication scheduling. As the formulated problem is NP-hard in general, we propose efficient algorithms to find high-quality approximate solutions for it. Numerical results are provided to corroborate our study and show the great potential of proactive caching for overcoming the endurance issue in UAV-enabled communications.

Journal ArticleDOI
TL;DR: A survey of cache management strategies in ICN is presented along with their contributions and limitations, and their performance is evaluated in a simulation network environment with respect to cache hit, stretch ratio, and eviction operations.
Abstract: Information-Centric Networking (ICN) is an appealing architecture that has received a remarkable interest from the research community thanks to its friendly structure. Several projects have proposed innovative ICN models to cope with the Internet practice, which moves from host-centrism to receiver-driven communication. A worth mentioning component of these novel models is in-network caching, which provides flexibility and pervasiveness for the upturn of swiftness in data distribution. Because of the rapid Internet traffic growth, cache deployment and content caching have been unanimously accepted as conspicuous ICN issues to be resolved. In this article, a survey of cache management strategies in ICN is presented along with their contributions and limitations, and their performance is evaluated in a simulation network environment with respect to cache hit, stretch ratio, and eviction operations. Some unresolved ICN caching challenges and directions for future research in this networking area are also discussed.

Journal ArticleDOI
TL;DR: The proposed optimization framework reveals the caching performance upper bound for general adaptive video streaming systems, while the proposed algorithm provides some design guidelines for the edge servers to select the cached representations in practice based on both the video popularity and content information.
Abstract: Caching at mobile edge servers can smooth temporal traffic variability and reduce the service load of base stations in mobile video delivery. However, the assignment of multiple video representations to distributed servers is still a challenging question in the context of adaptive streaming, since any two representations from different videos or even from the same video will compete for the limited caching storage. Therefore, it is important, yet challenging, to optimally select the cached representations for each edge server in order to effectively reduce the service load of base station while maintaining a high quality of experience (QoE) for users. To address this, we study a QoE-driven mobile edge caching placement optimization problem for dynamic adaptive video streaming that properly takes into account the different rate-distortion (R–D) characteristics of videos and the coordination among distributed edge servers. Then, by the optimal caching placement of representations for multiple videos, we maximize the aggregate average video distortion reduction of all users while minimizing the additional cost of representation downloading from the base station, subject not only to the storage capacity constraints of the edge servers, but also to the transmission and initial startup delay constraints of the users. We formulate the proposed optimization problem as an integer linear program to provide the performance upper bound, and as a submodular maximization problem with a set of knapsack constraints to develop a practically feasible cost benefit greedy algorithm. The proposed algorithm has polynomial computational complexity and a theoretical lower bound on its performance. Simulation results further show that the proposed algorithm is able to achieve a near-optimal performance with very low time complexity. Therefore, the proposed optimization framework reveals the caching performance upper bound for general adaptive video streaming systems, while the proposed algorithm provides some design guidelines for the edge servers to select the cached representations in practice based on both the video popularity and content information.
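
The general shape of such a cost-benefit greedy step is: repeatedly pick the representation whose marginal gain per unit of consumed storage is largest until nothing else fits. The gain function and capacity model below are placeholders for the paper's distortion-reduction objective, not its exact formulation:

```python
def greedy_cache_placement(representations, capacity, marginal_gain):
    """representations: list of (rep_id, size) tuples
    capacity: storage budget of one edge server
    marginal_gain(rep_id, cached): gain of adding rep_id given the current cache,
    e.g. aggregate distortion reduction minus download cost (placeholder)."""
    cached, used = set(), 0
    while True:
        best, best_ratio = None, 0.0
        for rep_id, size in representations:
            if rep_id in cached or used + size > capacity:
                continue
            ratio = marginal_gain(rep_id, cached) / size   # benefit per byte
            if ratio > best_ratio:
                best, best_ratio = (rep_id, size), ratio
        if best is None:
            return cached            # no remaining representation improves things
        cached.add(best[0])
        used += best[1]
```

Because the objective is submodular, this kind of greedy selection carries the theoretical lower bound on performance mentioned in the abstract.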

Journal ArticleDOI
TL;DR: This paper proposes an optimal offloading with caching-enhancement scheme (OOCS) for the femto-cloud and mobile edge computing scenarios, considering the case where multiple mobile users offload duplicated computation tasks to the network edge and share the computation results among them.
Abstract: Computation offloading is a proven successful paradigm for enabling resource-intensive applications on mobile devices. Moreover, in view of emerging mobile collaborative applications, the offloaded tasks can be duplicated when multiple users are in the same proximity. This motivates us to design a collaborative offloading scheme and cache the popular computation results that are likely to be reused by other mobile users. In this paper, we consider the scenario where multiple mobile users offload duplicated computation tasks to the network edge, and share the computation results among them. Our goal is to develop the optimal fine-grained collaborative offloading strategies with caching enhancements to minimize the overall execution delay at the mobile terminal side. To this end, we propose an optimal offloading with caching-enhancement scheme (OOCS) for the femto-cloud scenario and the mobile edge computing scenario, respectively. Simulation results show that, compared to six alternative solutions in the literature, our single-user OOCS can reduce execution delay by up to 42.83% and 33.28% for single-user femto-cloud and single-user mobile edge computing, respectively. Our multi-user OOCS can further reduce delay by 11.71% compared to single-user OOCS through users' cooperation.

Journal ArticleDOI
TL;DR: A cross-entropy-based dynamic content caching scheme is proposed to cache contents at the edge of VCNs based on the requests of vehicles and the cooperation among RSUs, and the performance of the proposed scheme is evaluated through extensive simulation experiments.
Abstract: Vehicular content networks (VCNs), which distribute medium-volume contents to vehicles in a fully distributed manner, represent the key enabling technology of vehicular infotainment applications. In VCNs, the road-side units (RSUs) cache replicas of contents on the edge of networks to facilitate the timely content delivery to driving-through vehicles when requested. However, due to the limited storage at RSUs and soaring content size for distribution, RSUs can only selectively cache content replicas. The edge caching scheme in RSUs, therefore, becomes a fundamental issue in VCNs. This paper addresses the issue by developing an edge caching scheme in RSUs. Specifically, we first analyze the features of vehicular content requests based on the content access pattern, vehicle's velocity, and road traffic density. A model is then proposed to determine whether and where to obtain the replica of content when the moving vehicle requests it. After this, a cross-entropy-based dynamic content caching scheme is proposed accordingly to cache the contents at the edge of VCNs based on the requests of vehicles and the cooperation among RSUs. Finally, the performance of the proposed scheme is evaluated by extensive simulation experiments.
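
The generic cross-entropy method behind such a scheme samples candidate cache placements from a probability vector, keeps the best-scoring samples, and moves the probabilities toward that elite set. The sketch below is this generic method with hypothetical parameters and a placeholder utility, not the paper's exact RSU formulation:

```python
import numpy as np

def cross_entropy_caching(num_contents, cache_size, score_fn,
                          iters=50, samples=200, elite_frac=0.1, smooth=0.7):
    """score_fn(placement): utility of a boolean placement vector, e.g. expected
    hits weighted by request rate; size feasibility is left to score_fn to
    penalize (placeholder assumptions)."""
    p = np.full(num_contents, cache_size / num_contents)   # inclusion probabilities
    for _ in range(iters):
        batch = np.random.rand(samples, num_contents) < p  # sample placements
        scores = np.array([score_fn(b) for b in batch])
        elite = batch[np.argsort(scores)[-int(samples * elite_frac):]]
        p = smooth * p + (1 - smooth) * elite.mean(axis=0) # move toward the elites
    # final placement: cache the contents with the highest inclusion probability
    return np.argsort(p)[-cache_size:]
```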

Proceedings ArticleDOI
17 Jun 2018
TL;DR: The AoI optimal policy is derived, which depends only on the square root of the source popularity, and an AoS near-optimal rate allocation policy is proposed that is proportional to the cube root of both the source update rate and the source popularity.
Abstract: We consider a cache refresh system where a local server is connected to multiple remote sources and maintains local copies of the data items at the sources. The data at each source is updated randomly and independently without notifying the local server, while the local server refreshes the corresponding cached data periodically. The freshness of the local cache is measured by two different freshness metrics, age of synchronization (AoS) and age of information (AoI). We address the following problem: given a constrained total refresh rate, how does the local server allocate the refresh rate for each source to maintain overall data freshness? We derive the AoI optimal policy which depends only on the square root of the source popularity. For a large refresh rate, we propose an AoS near-optimal rate allocation policy that is proportional to the cube root of both the source update rate and the source popularity. For small refresh rates, we also prove that the square root law with respect to the popularity minimizes both AoS and AoI.
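
One way to read the square-root law stated here: with popularity p_i for source i and a total refresh budget F, allocating refresh rates proportionally to the square root of popularity gives

\[
f_i \;=\; F\,\frac{\sqrt{p_i}}{\sum_{j}\sqrt{p_j}}, \qquad \sum_i f_i = F,
\]

where the notation and the proportional form of the constrained optimum are assumptions for illustration; the paper derives the exact AoI-optimal policy and the cube-root AoS allocation separately.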

Journal ArticleDOI
Wai-Xi Liu, Jie Zhang, Zhongwei Liang, Ling-Xi Peng, Cai Jun
TL;DR: A lightweight caching scheme is proposed that integrates cache placement and cache replacement: caching based on popularity prediction and cache capacity (CPC).
Abstract: In information-centric networking, accurately predicting content popularity can improve the performance of caching. Therefore, based on software defined networking (SDN), this paper proposes Deep-Learning-based Content Popularity Prediction (DLCPP) to achieve popularity prediction. DLCPP adopts the switch's computing resources and links in the SDN to build a distributed and reconfigurable deep learning network. For DLCPP, we initially determine the metrics that can reflect changes in content popularity. Second, each network node collects the spatial-temporal joint distribution data of these metrics. Then, the data are used as input to stacked auto-encoders (SAE) in DLCPP to extract the spatiotemporal features of popularity. Finally, we transform popularity prediction into a multi-classification problem by discretizing the content popularity into multiple classes. The Softmax classifier is used to achieve the content popularity prediction. Some challenges for DLCPP are also addressed, such as determining the structure of the SAE, realizing the neuron function on an SDN switch, and deploying DLCPP on an OpenFlow-based SDN. At the same time, we propose a lightweight caching scheme that integrates cache placement and cache replacement: caching based on popularity prediction and cache capacity (CPC). Extensive experiments demonstrate the good performance of DLCPP, which achieves 2.1%-15% and 5.2%-40% accuracy improvements over neural network and autoregressive baselines, respectively. Benefiting from DLCPP's better prediction accuracy, CPC yields a steady improvement in caching performance over other dominant cache management frameworks.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: A Federated learning based Proactive Content Caching (FPCC) scheme is proposed, which does not require gathering users' data centrally for training, and which outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
Abstract: Content caching is a promising approach in edge computing to cope with the explosive growth of mobile data on 5G networks, where contents are typically placed on local caches for fast and repetitive data access. Due to the capacity limit of caches, it is essential to predict the popularity of files and cache those popular ones. However, the fluctuating popularity of files makes the prediction a highly challenging task. To tackle this challenge, many recent works propose learning-based approaches which gather the users' data centrally for training, but they bring a significant issue: users may not trust the central server and thus hesitate to upload their private data. In order to address this issue, we propose a Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training. The FPCC is based on a hierarchical architecture in which the server aggregates the users' updates using federated averaging, and each user performs training on its local data using hybrid filtering on stacked autoencoders. The experimental results demonstrate that, without gathering users' private data, our scheme still outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
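
The federated-averaging aggregation step at the server can be sketched as follows (the client update format is a hypothetical assumption; the paper additionally runs hybrid filtering on stacked autoencoders at each client, which this sketch does not show):

```python
import numpy as np

def federated_average(client_updates):
    """client_updates: list of (num_samples, weights) pairs, where weights
    is a list of NumPy arrays holding one client's locally trained model."""
    total_samples = sum(n for n, _ in client_updates)
    num_layers = len(client_updates[0][1])
    # Weight each client's parameters by how much local data it trained on.
    return [sum(n * w[layer] for n, w in client_updates) / total_samples
            for layer in range(num_layers)]

# e.g. two clients with a single-layer toy model:
global_model = federated_average([
    (120, [np.array([0.2, 0.4])]),
    (80,  [np.array([0.6, 0.0])]),
])
print(global_model)   # the next global model broadcast to all users
```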

Journal ArticleDOI
TL;DR: This paper proposes a scheme called cooperative caching based on mobility prediction (CCMP) for VCCNs, and designs a cache replacement policy based on content popularity to guarantee that only popular contents are cached at a set of mobile nodes that may visit the same hot spot areas repeatedly.
Abstract: Vehicular content centric networks (VCCNs) emerge as a strong candidate to be deployed in information-rich applications of vehicular communications. Due to vehicles' mobility, it becomes rather inefficient to establish end-to-end connections in VCCNs. Consequently, content packets are usually sent back to the requesting node via different paths in VCCNs. To improve network performance of VCCNs, node mobility should be exploited for vehicles to serve as relays and to carry data for delivery. In this paper, we propose a scheme called cooperative caching based on mobility prediction (CCMP) for VCCNs. The main idea of CCMP is to cache popular contents at a set of mobile nodes that may visit the same hot spot areas repeatedly. In our CCMP scheme, we use prediction based on partial matching to predict mobile nodes' probability of reaching different hot spot regions based on their past trajectories. Vehicles with longer sojourn time in a hot region can provide more services and should be preferred as caching nodes. To solve the problem of limited buffer at each node, we design a cache replacement policy based on content popularity to guarantee that only popular contents are cached. We evaluate CCMP through the opportunistic network environment simulator and show its advantages in success ratio and content access delay compared to other state-of-the-art schemes.
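
A toy order-2 prediction-by-partial-matching style predictor over hot-spot visit histories, to illustrate the kind of mobility prediction CCMP relies on (the context length, backoff rule, and region labels are assumptions, not the paper's exact model):

```python
from collections import defaultdict

class PPMMobilityPredictor:
    """Counts region transitions conditioned on the last k visited regions and
    backs off to shorter contexts when a context has never been seen."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, trajectory):
        for i in range(1, len(trajectory)):
            for k in range(1, self.max_order + 1):
                if i - k < 0:
                    break
                context = tuple(trajectory[i - k:i])
                self.counts[context][trajectory[i]] += 1

    def predict(self, recent):
        # back off from the longest context to the shortest one with data
        for k in range(self.max_order, 0, -1):
            nxt = self.counts[tuple(recent[-k:])]
            if nxt:
                total = sum(nxt.values())
                return {region: c / total for region, c in nxt.items()}
        return {}

predictor = PPMMobilityPredictor()
predictor.train(["A", "B", "C", "A", "B", "C", "A", "B"])
print(predictor.predict(["A", "B"]))   # high probability of region "C" next
```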

Journal ArticleDOI
TL;DR: This paper addresses the system modeling, large-scale optimization, and framework design of hierarchical edge caching in device-to-device aided mobile networks, and investigates the maximum capacity of the network infrastructure in terms of offloading network traffic, reducing system costs, and supporting content requests from mobile users locally.
Abstract: The explosive growth of content requests from mobile users is stretching the capability of current mobile networking technologies to satisfy users’ demands with acceptable quality of service. An effective approach to address this challenge, which has not yet been thoroughly studied, is to offload network traffic by caching popular content at the edges (e.g., mobile devices and base stations) of mobile networks, thus reducing the massive duplication of content downloads. In this paper, we address the system modeling, large-scale optimization, and framework design of hierarchical edge caching in device-to-device aided mobile networks. In particular, taking into account the analysis of social behavior and preference of mobile users, heterogeneous cache sizes, and the derived system topology, we investigate the maximum capacity of the network infrastructure in terms of offloading network traffic, reducing system costs, and supporting content requests from mobile users locally. Our proposed framework has a low complexity and can be applied in practical engineering implementation. Trace-based simulation results demonstrate the effectiveness of the proposed framework.

Proceedings ArticleDOI
07 Aug 2018
TL;DR: DEEPCACHE is a novel framework for content caching that can significantly boost cache performance; applied to existing cache policies such as LRU and k-LRU, it significantly boosts the number of cache hits.
Abstract: In this paper, we present DEEPCACHE, a novel framework for content caching, which can significantly boost cache performance. Our framework is based on powerful deep recurrent neural network models. It comprises two main components: i) an Object Characteristics Predictor, which builds upon a deep LSTM Encoder-Decoder model to predict the future characteristics of an object (such as object popularity) -- to the best of our knowledge, we are the first to propose an LSTM Encoder-Decoder model for content caching; ii) a caching policy component, which accounts for predicted information of objects to make smart caching decisions. In our thorough experiments, we show that applying the DEEPCACHE framework to existing cache policies, such as LRU and k-LRU, significantly boosts the number of cache hits.

Proceedings ArticleDOI
23 Apr 2018
TL;DR: This work designs a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage and the total cost of ownership, while providing latency and queries-per-second comparable to MyRocks on a server with a much larger amount of DRAM.
Abstract: Popular SSD-based key-value stores consume a large amount of DRAM in order to provide high-performance database operations. However, DRAM can be expensive for data center providers, especially given recent global supply shortages that have resulted in increasing DRAM costs. In this work, we design a key-value store, MyNVM, which leverages an NVM block device to reduce DRAM usage, and to reduce the total cost of ownership, while providing comparable latency and queries-per-second (QPS) as MyRocks on a server with a much larger amount of DRAM. Replacing DRAM with NVM introduces several challenges. In particular, NVM has limited read bandwidth, and it wears out quickly under a high write bandwidth. We design novel solutions to these challenges, including using small block sizes with a partitioned index, aligning blocks post-compression to reduce read bandwidth, utilizing dictionary compression, implementing an admission control policy for which objects get cached in NVM to control its durability, as well as replacing interrupts with a hybrid polling mechanism. We implemented MyNVM and measured its performance in Facebook's production environment. Our implementation reduces the size of the DRAM cache from 96 GB to 16 GB, and incurs a negligible impact on latency and queries-per-second compared to MyRocks. Finally, to the best of our knowledge, this is the first study on the usage of NVM devices in a commercial data center environment.
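
One of the listed techniques, admission control for the NVM cache, can be illustrated with a simple "admit on second touch" filter that bounds the write rate to the device. This is a generic pattern under stated assumptions (block-key type, probation window), not Facebook's actual policy:

```python
from collections import OrderedDict

class SecondTouchAdmission:
    """Admit a block into the NVM cache only if it was recently requested
    before, so one-hit-wonder blocks never consume NVM write endurance."""

    def __init__(self, probation_size=100_000):
        self.probation = OrderedDict()        # recently seen, not yet admitted
        self.probation_size = probation_size

    def should_admit(self, block_key) -> bool:
        if block_key in self.probation:
            del self.probation[block_key]     # second touch: admit to NVM
            return True
        self.probation[block_key] = True      # first touch: remember only
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)  # drop the oldest probation entry
        return False
```

Filtering admissions this way trades a slightly lower hit rate for a much lower write amplification, which matters because, as the abstract notes, NVM wears out quickly under a high write bandwidth.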