
Showing papers on "Cache published in 2020"


Journal ArticleDOI
TL;DR: A single edge server that assists a mobile user in executing a sequence of computation tasks is considered, and a mixed integer non-linear programming (MINLP) is formulated that jointly optimizes the service caching placement, computation offloading decisions, and system resource allocation.
Abstract: In mobile edge computing (MEC) systems, edge service caching refers to pre-storing the necessary programs for executing computation tasks at MEC servers. Service caching effectively reduces the real-time delay/bandwidth cost of acquiring and initializing service applications when computation tasks are offloaded to the MEC servers. The limited caching space at resource-constrained edge servers calls for careful design of caching placement to determine which programs to cache over time. This is in general a complicated problem that is highly correlated with the computation offloading decisions of computation tasks, i.e., whether or not to offload a task for edge execution. In this paper, we consider a single edge server that assists a mobile user (MU) in executing a sequence of computation tasks. In particular, the MU can upload and run its customized programs at the edge server, while the server can selectively cache the previously generated programs for future reuse. To minimize the computation delay and energy consumption of the MU, we formulate a mixed integer non-linear programming (MINLP) problem that jointly optimizes the service caching placement, computation offloading decisions, and system resource allocation (e.g., CPU processing frequency and transmit power of the MU). To tackle the problem, we first derive closed-form expressions for the optimal resource allocation solutions, and subsequently transform the MINLP into an equivalent pure 0-1 integer linear programming (ILP) problem that is much simpler to solve. To further reduce the complexity of solving the ILP, we exploit the underlying structures of the caching causality and task dependency models, and accordingly devise a reduced-complexity alternating minimization technique to update the caching placement and offloading decisions alternately. Extensive simulations show that the proposed joint optimization techniques achieve substantial resource savings for the MU compared to other representative benchmark methods.
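The alternating-minimization idea described above can be illustrated with a toy sketch: fix the caching placement and pick each task's cheaper execution mode, then fix the offloading decisions and cache the programs with the largest savings. The cost model, variable names, and greedy cache update below are illustrative assumptions, not the paper's actual formulation.

```python
# Toy alternating minimization over caching placement and offloading
# decisions (illustrative cost model, not the paper's MINLP).
def alternating_minimization(local_cost, offload_cost, init_cost, cache_size, iters=10):
    n = len(local_cost)
    cached = set()  # programs currently cached at the edge server
    offload = [False] * n
    for _ in range(iters):
        # Step 1: fix caching, pick the cheaper execution mode per task.
        offload = [offload_cost[i] + (0 if i in cached else init_cost[i])
                   < local_cost[i] for i in range(n)]
        # Step 2: fix offloading, cache the programs with the largest savings.
        savings = [(init_cost[i] if offload[i] else 0, i) for i in range(n)]
        savings.sort(reverse=True)
        cached = {i for s, i in savings[:cache_size] if s > 0}
    total = sum(offload_cost[i] + (0 if i in cached else init_cost[i])
                if offload[i] else local_cost[i] for i in range(n))
    return offload, cached, total
```

The two steps mirror the paper's structure: each is easy given the other, and iterating drives the joint cost down.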

182 citations


Proceedings Article
01 Jan 2020
TL;DR: Cache Telepathy, as discussed by the authors, exploits the cache side channel to steal the architecture of deep neural networks (DNNs) from the number of tiled generalized matrix multiply (GEMM) calls and the dimensions of the matrices used in those calls.
Abstract: Deep Neural Networks (DNNs) are fast becoming ubiquitous for their ability to attain good accuracy in various machine learning tasks. A DNN's architecture (i.e., its hyper-parameters) broadly determines the DNN's accuracy and performance, and is often confidential. Attacking a DNN in the cloud to obtain its architecture can potentially provide major commercial value. Further, attaining a DNN's architecture facilitates other, existing DNN attacks. This paper presents Cache Telepathy: a fast and accurate mechanism to steal a DNN's architecture using the cache side channel. Our attack is based on the insight that DNN inference relies heavily on tiled GEMM (Generalized Matrix Multiply), and that DNN architecture parameters determine the number of GEMM calls and the dimensions of the matrices used in the GEMM functions. Such information can be leaked through the cache side channel. This paper uses Prime+Probe and Flush+Reload to attack VGG and ResNet DNNs running OpenBLAS and Intel MKL libraries. Our attack is effective in helping obtain the architectures by very substantially reducing the search space of target DNN architectures. For example, for VGG using OpenBLAS, it reduces the search space from more than $10^{35}$ architectures to just 16.
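The core observation — that GEMM call counts and matrix dimensions expose hyper-parameters — can be sketched for a fully connected network, where each layer's forward pass is one GEMM of shape (batch × in_features) times (in_features × out_features). The helper and dimensions below are hypothetical illustrations, not the attack code.

```python
# Illustrative sketch: observed GEMM dimensions directly reveal the layer
# widths of a fully connected network (one GEMM call per layer).
def layer_widths_from_gemm(observed_dims):
    """observed_dims: list of (m, k, n) per GEMM call = (batch, in, out)."""
    widths = [observed_dims[0][1]]   # input width from the first call's k
    for m, k, n in observed_dims:
        widths.append(n)             # each call reveals one layer's output width
    return widths

dims = [(32, 784, 128), (32, 128, 64), (32, 64, 10)]
# → [784, 128, 64, 10]
```

The actual attack recovers these dimensions indirectly through cache-access patterns of the tiled GEMM loops, which is what shrinks the architecture search space so drastically.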

138 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a blockchain empowered distributed content caching framework where vehicles perform content caching and base stations maintain the permissioned blockchain, and exploited the advanced DRL approach to design an optimal content caching scheme with taking mobility into account.
Abstract: Vehicular Edge Computing (VEC) is a promising paradigm that enables huge amounts of data and multimedia content to be cached in proximity to vehicles. However, the high mobility of vehicles and dynamic wireless channel conditions make it challenging to design an optimal content caching policy. Further, with much sensitive personal information, vehicles may not be willing to cache their contents with an untrusted caching provider. Deep Reinforcement Learning (DRL) is an emerging technique for solving problems with high-dimensional and time-varying features. Permissioned blockchain is able to establish a secure and decentralized peer-to-peer transaction environment. In this paper, we integrate DRL and permissioned blockchain into vehicular networks for intelligent and secure content caching. We first propose a blockchain empowered distributed content caching framework where vehicles perform content caching and base stations maintain the permissioned blockchain. Then, we exploit an advanced DRL approach to design an optimal content caching scheme that takes mobility into account. Finally, we propose a new block verifier selection method, Proof-of-Utility (PoU), to accelerate the block verification process. Security analysis shows that our proposed blockchain empowered content caching can achieve security and privacy protection. Numerical results based on a real dataset from Uber indicate that the DRL-inspired content caching scheme significantly outperforms two benchmark policies.

136 citations


Proceedings ArticleDOI
06 Jul 2020
TL;DR: In this article, the authors consider cooperation among edge nodes and investigate cooperative service caching and workload scheduling in mobile edge computing and develop an iterative algorithm named ICE to solve this problem.
Abstract: Mobile edge computing is beneficial for reducing service response time and core network traffic by pushing cloud functionalities to network edge. Equipped with storage and computation capacities, edge nodes can cache services of resource-intensive and delay-sensitive mobile applications and process the corresponding computation tasks without outsourcing to central clouds. However, the heterogeneity of edge resource capacities and mismatch of edge storage and computation capacities make it difficult to fully utilize both the storage and computation capacities in the absence of edge cooperation. To address this issue, we consider cooperation among edge nodes and investigate cooperative service caching and workload scheduling in mobile edge computing. This problem can be formulated as a mixed integer nonlinear programming problem, which has non-polynomial computation complexity. Addressing this problem faces challenges of sub-problem coupling, computation-communication tradeoff, and edge node heterogeneity. We develop an iterative algorithm named ICE to solve this problem. It is designed based on Gibbs sampling, which has provably near-optimal performance, and the idea of water filling, which has polynomial computation complexity. Simulation results demonstrate that our algorithm can jointly reduce the service response time and the outsourcing traffic, compared with the benchmark algorithms.
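The Gibbs-sampling component of such an approach can be sketched in miniature: repeatedly revisit one edge node and resample its cached service from a Boltzmann distribution over the resulting system cost, so that low-cost placements become overwhelmingly likely. The cost function, inverse temperature, and step count below are illustrative assumptions, not the ICE algorithm itself (the water-filling workload-scheduling step is not sketched here).

```python
import math
import random

# Minimal Gibbs-sampling sketch for service placement at edge nodes.
def gibbs_placement(cost, n_nodes, n_services, beta=5.0, steps=200, seed=0):
    """cost(placement) -> scalar; placement[i] = service cached at node i."""
    rng = random.Random(seed)
    placement = [0] * n_nodes
    for _ in range(steps):
        i = rng.randrange(n_nodes)           # revisit one node at a time
        # Boltzmann weights over this node's choices, other nodes fixed.
        weights = []
        for s in range(n_services):
            trial = placement[:]
            trial[i] = s
            weights.append(math.exp(-beta * cost(trial)))
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for s, w in enumerate(weights):
            acc += w
            if r <= acc:
                placement[i] = s
                break
    return placement
```

As beta grows, the stationary distribution concentrates on minimum-cost placements, which is the source of the near-optimality guarantee cited in the abstract.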

128 citations


Journal ArticleDOI
TL;DR: Simulation results demonstrate that the proposed JCC-UA algorithm can effectively reduce the latency of user content downloading and improve the hit rates of contents cached at the BSs as compared to several baseline schemes.
Abstract: Deploying small cell base stations (SBS) under the coverage area of a macro base station (MBS), and caching popular contents at the SBSs in advance, are effective means to provide high-speed and low-latency services in next generation mobile communication networks. In this paper, we investigate the problem of content caching (CC) and user association (UA) for edge computing. A joint CC and UA optimization problem is formulated to minimize the content download latency. We prove that the joint CC and UA optimization problem is NP-hard. Then, we propose a CC and UA algorithm (JCC-UA) to reduce the content download latency. JCC-UA includes a smart content caching policy (SCCP) and dynamic user association (DUA). SCCP utilizes the exponential smoothing method to predict content popularity and cache contents according to prediction results. DUA includes a rapid association (RA) method and a delayed association (DA) method. Simulation results demonstrate that the proposed JCC-UA algorithm can effectively reduce the latency of user content downloading and improve the hit rates of contents cached at the BSs as compared to several baseline schemes.
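The SCCP component — exponentially smoothing per-content request counts and caching the top-k predicted-popular contents — can be sketched as follows. The smoothing factor, cache size, and request data are illustrative choices; the paper's exact update rule is not reproduced here.

```python
# Sketch of exponential-smoothing popularity prediction for caching.
def smooth_and_cache(request_history, alpha=0.5, k=2):
    """request_history: list of dicts {content_id: requests in that slot}."""
    score = {}
    for slot in request_history:
        for c in set(score) | set(slot):
            # s_t = alpha * x_t + (1 - alpha) * s_{t-1}
            score[c] = alpha * slot.get(c, 0) + (1 - alpha) * score.get(c, 0.0)
    ranked = sorted(score, key=score.get, reverse=True)
    return ranked[:k], score                 # contents to cache, all scores
```

Smoothing lets the cache react to trends: a content whose requests are rising overtakes one whose requests are fading, without over-reacting to a single noisy slot.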

120 citations


Journal ArticleDOI
05 Sep 2020
TL;DR: A dynamic scheduling policy based on Deep Reinforcement Learning (DRL) with the Deep Deterministic Policy Gradient (DDPG) method is proposed to solve the problem of dynamic caching, computation offloading, and resource allocation in cache-assisted multi-user MEC systems with stochastic task arrivals.
Abstract: Mobile Edge Computing (MEC) is one of the most promising techniques for next-generation wireless communication systems. In this paper, we study the problem of dynamic caching, computation offloading, and resource allocation in cache-assisted multi-user MEC systems with stochastic task arrivals. There are multiple computationally intensive tasks in the system, and each Mobile User (MU) needs to execute a task either locally or remotely in one or more MEC servers by offloading the task data. Popular tasks can be cached in MEC servers to avoid duplicates in offloading. The cached contents can be either obtained through user offloading, fetched from a remote cloud, or fetched from another MEC server. The objective is to minimize the long-term average of a cost function, which is defined as a weighted sum of energy consumption, delay, and cache contents' fetching costs. The weighting coefficients associated with the different metrics in the objective function can be adjusted to balance the tradeoff among them. The optimum design is performed with respect to four decision parameters: whether to cache a given task, whether to offload a given uncached task, how much transmission power should be used during offloading, and how many MEC resources to allocate for executing a task. We propose to solve the problem by developing a dynamic scheduling policy based on Deep Reinforcement Learning (DRL) with the Deep Deterministic Policy Gradient (DDPG) method. A new decentralized DDPG algorithm is developed to obtain the optimum designs for multi-cell MEC systems by leveraging the cooperation among neighboring MEC servers. Simulation results demonstrate that the proposed algorithm outperforms other existing strategies, such as Deep Q-Network (DQN).

117 citations


Journal ArticleDOI
TL;DR: A cloud-mobile edge computing (MEC) collaborative task offloading scheme with service orchestration (CTOSO) with orchestrating data as services (ODaS) mechanism based on the SDN technology is proposed, which greatly reduces the network load caused by uploading resources to the cloud.
Abstract: Billions of devices are connected to the Internet of Things (IoT). These devices generate a large volume of data, which poses an enormous burden on conventional networking infrastructures. As an effective computing model, edge computing collaborates with cloud computing by moving part of the intensive computation and storage resources to edge devices, thus optimizing network latency and energy consumption. Meanwhile, software-defined networking (SDN) technology is promising in improving the quality of service (QoS) for complex IoT-driven applications. However, building an SDN-based computing platform faces great challenges, making it difficult for current computing models to meet the low-latency, high-complexity, and high-reliability requirements of emerging applications. Therefore, a cloud-mobile edge computing (MEC) collaborative task offloading scheme with service orchestration (CTOSO) is proposed in this article. First, the CTOSO scheme models the computational consumption, communication consumption, and latency of task offloading and implements differentiated offloading decisions for tasks with different resource demands and delay sensitivity. Moreover, the CTOSO scheme introduces an orchestrating data as services (ODaS) mechanism based on SDN technology: the collected metadata are orchestrated into high-quality services by MEC servers, which greatly reduces the network load caused by uploading resources to the cloud; in addition, data processing is completed at the edge layer as much as possible, which achieves load balancing and reduces the risk of data leakage. The experimental results demonstrate that compared to the random decision-based task offloading scheme and the maximum cache-based task offloading scheme, the CTOSO scheme reduces delay by approximately 73.82%–74.34% and energy consumption by 10.71%–13.73%.

101 citations


Journal ArticleDOI
TL;DR: Reinforcement-learning-based Multi-Objective Ant Colony Optimization (MOACO) algorithms are applied to achieve accurate resource allocation among end users in MEC by constructing cost mapping tables and computing the optimal allocation.

98 citations


Journal ArticleDOI
TL;DR: The work derives the exact optimal worst-case delay and DoF, for a broad range of user-to-cache association profiles where each such profile describes how many users are helped by each cache.
Abstract: The work explores the fundamental limits of coded caching in the setting where a transmitter with potentially multiple ( $N_{0}$ ) antennas serves different users that are assisted by a smaller number of caches. Under the assumption of uncoded cache placement, the work derives the exact optimal worst-case delay and DoF, for a broad range of user-to-cache association profiles where each such profile describes how many users are helped by each cache. This is achieved by presenting an information-theoretic converse based on index coding that succinctly captures the impact of the user-to-cache association, as well as by presenting a coded caching scheme that optimally adapts to the association profile by exploiting the benefits of encoding across users that share the same cache. The work reveals a powerful interplay between shared caches and multiple senders/antennas, where we can now draw the striking conclusion that, as long as each cache serves at least $N_{0}$ users, adding a single degree of cache-redundancy can yield a DoF increase equal to $N_{0}$ , while at the same time — irrespective of the profile — going from 1 to $N_{0}$ antennas reduces the delivery time by a factor of $N_{0}$ . Finally some conclusions are also drawn for the related problem of coded caching with multiple file requests.

89 citations


Proceedings ArticleDOI
James Cadden, Thomas Unger, Yara Awad, Han Dong, Orran Krieger, Jonathan Appavoo
15 Apr 2020
TL;DR: This paper presents a system-level method for achieving the rapid deployment and high-density caching of serverless functions in a FaaS environment, and is able to cache over 50,000 function instances in memory as opposed to 3,000 using standard OS techniques.
Abstract: This paper presents a system-level method for achieving the rapid deployment and high-density caching of serverless functions in a FaaS environment. For reduced start times, functions are deployed from unikernel snapshots, bypassing expensive initialization steps. To reduce the memory footprint of snapshots we apply page-level sharing across the entire software stack that is required to run a function. We demonstrate the effects of our techniques by replacing Linux on the compute node of a FaaS platform architecture. With our prototype OS, the deployment time of a function drops from 100s of milliseconds to under 10 ms. Platform throughput improves by 51x on a workload composed entirely of new functions. We are able to cache over 50,000 function instances in memory as opposed to 3,000 using standard OS techniques. In combination, these improvements give the FaaS platform a new ability to handle large-scale bursts of requests.
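The page-level sharing idea can be sketched with content-hash deduplication: snapshots that share library and runtime pages store each distinct page only once. The page size, hash choice, and snapshot bytes below are illustrative; the actual system shares pages at the OS level rather than in a Python dict.

```python
import hashlib

# Illustrative page-level dedup: identical pages across function snapshots
# are stored once and shared via their content hash.
def dedup_pages(snapshots, page_size=4096):
    store = {}                      # content hash -> single shared copy
    mappings = []                   # per-snapshot list of page hashes
    for snap in snapshots:
        pages = [snap[i:i + page_size] for i in range(0, len(snap), page_size)]
        mapping = []
        for p in pages:
            h = hashlib.sha256(p).hexdigest()
            store.setdefault(h, p)  # store each distinct page only once
            mapping.append(h)
        mappings.append(mapping)
    return store, mappings

shared = b"x" * 4096                # stands in for common runtime pages
s1 = shared + b"a" * 4096
s2 = shared + b"b" * 4096
store, maps = dedup_pages([s1, s2])  # 4 logical pages, 3 stored
```

The more of the software stack two function instances share, the higher the dedup ratio, which is what makes caching tens of thousands of instances feasible.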

87 citations


Proceedings ArticleDOI
12 Oct 2020
TL;DR: PaGraph, a system that supports general and efficient sampling-based GNN training on a single server with multiple GPUs, is proposed; it employs a caching policy with good cache efficiency and develops a fast GNN-computation-aware partition algorithm to avoid cross-partition access during data-parallel training.
Abstract: Emerging graph neural networks (GNNs) have extended the successes of deep learning techniques against datasets like images and texts to more complex graph-structured data. By leveraging GPU accelerators, existing frameworks combine both mini-batch and sampling for effective and efficient model training on large graphs. However, this setup faces a scalability issue, since loading rich vertex features from CPU to GPU through a limited-bandwidth link usually dominates the training cycle. In this paper, we propose PaGraph, a system that supports general and efficient sampling-based GNN training on a single server with multiple GPUs. PaGraph significantly reduces the data loading time by exploiting available GPU resources to keep frequently accessed graph data in a cache. It also embodies a lightweight yet effective caching policy that simultaneously takes into account graph structural information and the data access patterns of sampling-based GNN training. Furthermore, to scale out on multiple GPUs, PaGraph develops a fast GNN-computation-aware partition algorithm to avoid cross-partition access during data-parallel training and achieves better cache efficiency. Evaluations on two representative GNN models, GCN and GraphSAGE, show that PaGraph achieves up to 96.8% data loading time reductions and up to 4.8X performance speedup over the state-of-the-art baselines. Together with preprocessing optimization, PaGraph further delivers up to 16.0X end-to-end speedup.
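A structure-aware feature cache in this spirit can be sketched by keeping the features of the most frequently sampled vertices resident on the GPU, using out-degree as a proxy for sampling frequency. The graph, budget, and degree heuristic below are illustrative assumptions, not PaGraph's actual policy.

```python
# Sketch of a degree-aware GPU feature cache: with a fixed budget, keep
# features of high-degree vertices (a proxy for how often neighbor
# sampling will touch them) resident on the GPU.
def pick_cached_vertices(adj, budget):
    """adj: {vertex: list of out-neighbors}; budget: #vertices fitting on GPU."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return sorted(degree, key=degree.get, reverse=True)[:budget]

adj = {0: [1, 2, 3], 1: [0], 2: [0, 1], 3: []}
# with a budget of 2, the two highest-degree vertices are kept
```

At training time, a feature lookup first checks this resident set and only falls back to the CPU-side feature store on a miss, which is where the data-loading savings come from.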

Journal ArticleDOI
03 Apr 2020
TL;DR: A new efficient algorithm, DL8.5, based on itemset mining techniques, is introduced for finding optimal decision trees; it outperforms earlier approaches by several orders of magnitude for both numerical and discrete data, and is generic as well.
Abstract: Several recent publications have studied the use of Mixed Integer Programming (MIP) for finding an optimal decision tree, that is, the best decision tree under formal requirements on accuracy, fairness or interpretability of the predictive model. These publications used MIP to deal with the hard computational challenge of finding such trees. In this paper, we introduce a new efficient algorithm, DL8.5, for finding optimal decision trees, based on the use of itemset mining techniques. We show that this new approach outperforms earlier approaches by several orders of magnitude, for both numerical and discrete data, and is generic as well. The key idea underlying this new approach is the use of a cache of itemsets in combination with branch-and-bound search; this new type of cache also stores results for parts of the search space that have been traversed partially.
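The caching idea — memoizing the best achievable error for each branch, where a branch is identified by its itemset of feature tests — can be sketched with a depth-bounded exhaustive search on toy binary data. This is a drastic simplification of DL8.5 (no bounds propagation, toy error measure), shown only to illustrate the itemset cache.

```python
# Simplified sketch of an itemset-cached optimal-tree search: the best
# error for each branch (keyed by its set of feature tests) is computed
# once and reused, even when the same branch is reached in another order.
def best_error(X, y, max_depth):
    n_feat = len(X[0])
    cache = {}                               # itemset -> best error for that branch

    def leaf_error(idx):
        ones = sum(y[i] for i in idx)
        return min(ones, len(idx) - ones)    # misclassified under majority vote

    def search(itemset, idx, depth):
        if itemset in cache:                 # reuse: the cache hit
            return cache[itemset]
        best = leaf_error(idx)
        if depth > 0 and best > 0:
            for f in range(n_feat):
                left = frozenset(i for i in idx if X[i][f] == 0)
                right = frozenset(i for i in idx if X[i][f] == 1)
                if not left or not right:
                    continue
                err = (search(itemset | {(f, 0)}, left, depth - 1)
                       + search(itemset | {(f, 1)}, right, depth - 1))
                best = min(best, err)
        cache[itemset] = best
        return best

    return search(frozenset(), frozenset(range(len(y))), max_depth)
```

Because the branch {(0,0), (1,1)} is the same itemset whether feature 0 or feature 1 was split first, the cache collapses permutations of the same tests into one subproblem, which is the source of the speedup.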

Journal ArticleDOI
TL;DR: Novel schemes based on deep reinforcement learning are proposed to implement dynamic decision making and optimization of the content delivery problems, aiming at improving the quality of experience of the overall caching system.
Abstract: Internet of Things (IoT) technology suffers from the challenge that scarce wireless network resources cannot meet the influx of a huge number of terminal devices. Cache-enabled device-to-device (D2D) communication technology is expected to relieve network pressure, since requested contents can often be obtained from nearby users. However, designing an effective caching policy is very challenging due to the limited content storage capacity and the uncertainty of user mobility patterns. In this article, we study the joint cache content placement and delivery policy for cache-enabled D2D networks. Specifically, two potential recurrent neural network approaches [the echo state network (ESN) and the long short-term memory (LSTM) network] are employed to predict users’ mobility and content popularity, so as to determine which content to cache and where to cache it. When the local cache of a user cannot satisfy its own request, the user may consider establishing a D2D link with a neighboring user to implement the content delivery. In order to decide which user will be selected to establish the D2D link, we propose novel schemes based on deep reinforcement learning to implement dynamic decision making and optimization of the content delivery problems, aiming at improving the quality of experience of the overall caching system. The simulation results suggest that the cache hit ratio of the system can be well improved by the proposed content placement strategy, and the proposed content delivery approaches can effectively reduce the request content delivery delay and energy consumption.

Journal ArticleDOI
TL;DR: This work thoroughly studied previous works on Vehicular Named Data Networking (VNDN), and demonstrated the feasibility and necessity for employing NDN in vehicular environments with the help of content caching, and strengthened the importance of cache selection and replacement strategies in VNDN framework.
Abstract: The TCP/IP stack plays an important role in terms of data transmission, traffic control and address assignment in the Internet of Vehicles (IoV), which has seen phenomenal growth in recent years. However, with the increasing technical requirements and daily demand in IoV, the drawbacks of traditional TCP/IP protocols, e.g., weak scalability in large networks, low efficiency in dense environments and unreliable addressing in high-mobility circumstances, become non-trivial, especially in vehicular environments. Fortunately, the emerging Named Data Networking (NDN) technology provides a good choice to address the above issues in vehicular environments. Specifically, the content store module introduced in NDN, which caches the sent/received contents, can greatly improve networking performance by suppressing redundancy as well as enriching diversity. With the aforementioned motivations, we thoroughly studied previous works on Vehicular Named Data Networking (VNDN) with emphasis on content caching, and then demonstrated the feasibility and necessity of employing NDN in vehicular environments with the help of content caching. Subsequently, we further emphasized the importance of cache selection and replacement strategies in the VNDN framework, which are positioned to meet the challenges of data transmission efficiency and resource consumption by leveraging in-network vehicular caching. After that, we engaged in an in-depth survey of the existing cache selection and replacement schemes in VNDN, and compared their applicability. Next, further challenges in caching design were analyzed in light of the specific characteristics of VNDN. Finally, we highlighted potential research directions which may shed light on promising efforts to improve the performance of VNDN content caching.

Posted Content
TL;DR: This work presents CacheOut, a new microarchitectural attack that is capable of bypassing Intel's buffer overwrite countermeasures and demonstrates that CacheOut can leak information across multiple security boundaries, including those between processes, virtual machines, user and kernel space, and from SGX enclaves.
Abstract: Recent transient-execution attacks, such as RIDL, Fallout, and ZombieLoad, demonstrated that attackers can leak information while it transits through microarchitectural buffers. Named Microarchitectural Data Sampling (MDS) by Intel, these attacks are likened to "drinking from the firehose", as the attacker has little control over what data is observed and from what origin. Unable to prevent the buffers from leaking, Intel issued countermeasures via microcode updates that overwrite the buffers when the CPU changes security domains. In this work we present CacheOut, a new microarchitectural attack that is capable of bypassing Intel's buffer overwrite countermeasures. We observe that as data is being evicted from the CPU's L1 cache, it is often transferred back to the leaky CPU buffers, where it can be recovered by the attacker. CacheOut improves over previous MDS attacks by allowing the attacker to choose which data to leak from the CPU's L1 cache, as well as which part of a cache line to leak. We demonstrate that CacheOut can leak information across multiple security boundaries, including those between processes, virtual machines, user and kernel space, and from SGX enclaves.

Journal ArticleDOI
TL;DR: Numerical results show that great throughput enhancement is achieved by applying the proposed joint design in comparison with other benchmarks without trajectory design and power control, and the computational complexity of this algorithm is analyzed.
Abstract: It is well known that unmanned aerial vehicles (UAVs) can help terrestrial base stations (BSs) offload data traffic from crowded areas to improve coverage and boost throughput. However, the limited backhaul capacity cannot cope with the ever-increasing data demands, for which caching is introduced to relieve the backhaul bottleneck. In this paper, we focus on a multi-UAV assisted wireless network, and target to fully utilize the benefits of wireless caching and UAV mobility for multiuser content delivery. By taking into account the limited storage, our goal is to maximize the minimum throughput among UAV-served users by jointly optimizing cache placement, UAV trajectory, and transmission power in a finite period. The resultant problem is a mixed-integer non-convex optimization problem. To facilitate solving this problem, an alternating iterative algorithm is proposed by adopting the block alternating descent and successive convex approximation methods. Specifically, this problem is split into three subproblems, namely cache placement optimization, trajectory optimization, and power allocation optimization. Then these subproblems are solved alternately in an iterative manner. We show that the proposed algorithm can converge to the set of stationary solutions of this problem. Besides, we further analyze the computational complexity of this algorithm. Numerical results show that great throughput enhancement is achieved by applying our proposed joint design in comparison with other benchmarks without trajectory design and power control.

Journal ArticleDOI
TL;DR: In this article, the authors derived fundamental performance limits for the caching problem by using tools for the index coding problem that were either known or are newly developed in this work, and proposed a new index coding achievable scheme based on distributed source coding.
Abstract: Caching is an efficient way to reduce network traffic congestion during peak hours, by storing some content at the user’s local cache memory, even without knowledge of the user’s later demands. Maddah-Ali and Niesen proposed a two-phase (placement phase and delivery phase) coded caching strategy for broadcast channels with cache-aided users. This paper investigates the same model under the constraint that content is placed uncoded within the caches, that is, when bits of the files are simply copied within the caches. When the cache contents are uncoded and the users’ demands are revealed, the caching problem can be connected to an index coding problem. This paper focuses on deriving fundamental performance limits for the caching problem by using tools for the index coding problem that were either known or are newly developed in this work. First, a converse bound for the caching problem under the constraint of uncoded cache placement is proposed based on the “acyclic index coding converse bound.” This converse bound is proved to be achievable by Maddah-Ali and Niesen’s scheme when the number of files is not less than the number of users, and by a newly derived index coding achievable scheme otherwise. The proposed index coding achievable scheme is based on distributed source coding and strictly improves on the widely used “composite (index) coding” achievable bound and its improvements, and is of independent interest. An important consequence of the findings of this paper is that advancements on the coded caching problem posed by Maddah-Ali and Niesen are thus only possible by considering strategies with a coded placement phase. A recent work by Yu et al. has, however, shown that coded cache placement can at most halve the network load compared to the results presented in this paper.
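For reference, the Maddah-Ali–Niesen delivery load that this converse shows optimal under uncoded placement (when the number of files N is at least the number of users K) is K(1 − M/N)/(1 + KM/N), stated at the memory points where t = KM/N is an integer. A small sketch:

```python
from fractions import Fraction

# Maddah-Ali-Niesen delivery load for N files, K users, and per-user
# cache size M files, evaluated at integer t = K*M/N.
def mn_load(N, K, M):
    t = Fraction(K * M, N)              # cached fraction in "user" units
    assert t.denominator == 1, "formula stated at integer t only"
    return Fraction(K) * (1 - Fraction(M, N)) / (1 + t)
```

At M = 0 this reduces to plain unicast load K, and the 1/(1 + t) factor is the multicast gain from serving t + 1 users per coded transmission.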

Journal ArticleDOI
TL;DR: The “Zen 2” processor is designed to meet the needs of diverse markets spanning server, desktop, mobile, and workstation by providing flexibility and scalability up to 64 cores per socket with a total of 256 MB of L3 cache.
Abstract: The “Zen 2” processor is designed to meet the needs of diverse markets spanning server, desktop, mobile, and workstation. The core delivers significant performance and energy-efficiency improvements over “Zen” by microarchitectural changes including a new TAGE branch predictor, a double-size op cache, and a double-width floating-point unit. Building upon the core design, a modular chiplet approach provides flexibility and scalability up to 64 cores per socket with a total of 256 MB of L3 cache.

Journal ArticleDOI
TL;DR: In this paper, an optimization framework for coded caching that accounts for various heterogeneous aspects of practical systems is provided, and the optimization framework is used to develop a coded caching scheme capable of handling simultaneous non-uniform file length, file popularity, and user cache size.
Abstract: This paper aims to provide an optimization framework for coded caching that accounts for various heterogeneous aspects of practical systems. An optimization theoretic perspective on the seminal work on the fundamental limits of caching by Maddah-Ali and Niesen is first developed, wherein it is proved that the coded caching scheme presented in that work is the optimal scheme among a large, non-trivial family of possible caching schemes. The optimization framework is then used to develop a coded caching scheme capable of handling simultaneous non-uniform file length, non-uniform file popularity, and non-uniform user cache size. Although the resulting full optimization problem scales exponentially with the problem size, this paper shows that tractable simplifications of the problem that scale as a polynomial function of the problem size can still perform well compared to the original problem. By considering these heterogeneities both individually and in conjunction with one another, evidence of the effect of their interactions and influence on optimal cache content is obtained.

Journal ArticleDOI
TL;DR: It is shown that the caching-NOMA combination provides a new cache-hit opportunity that enhances both the cache utility and the effectiveness of NOMA, and two novel methods are proposed to address the power allocation problem.
Abstract: This work exploits the advantages of two prominent techniques in future communication networks, namely caching and non-orthogonal multiple access (NOMA). Particularly, a system with Rayleigh fading channels and cache-enabled users is analyzed. It is shown that the caching-NOMA combination provides a new opportunity of cache hit which enhances the cache utility as well as the effectiveness of NOMA. Importantly, this comes without requiring users’ collaboration, and thus avoids many complicated issues such as users’ privacy and security, selfishness, etc. In order to optimize users’ quality of service and, concurrently, ensure fairness among users, the probability that all users can decode the desired signals is maximized. In NOMA, a combination of multiple messages is sent to users, and the defined objective is approached by finding an appropriate power allocation for the message signals. To address the power allocation problem, two novel methods are proposed. The first one is a divide-and-conquer-based method for which closed-form expressions for the optimal resource allocation policy are derived, making this method simple and flexible to the system context. The second one is based on a deep reinforcement learning method that allows all users to share the full bandwidth. Finally, simulation results are provided to demonstrate the effectiveness of the proposed methods and to compare their performance.

Journal ArticleDOI
TL;DR: A Blockchain-based Cache and Delivery Market (CDM) is proposed as an incentive mechanism for the distributed caching system; the model of cache sharing and transaction execution consensus is developed, and caching placement and SCENE selection are formulated as Markov Decision Process problems.
Abstract: Device-to-Device (D2D) caching assists Mobile Edge Computing (MEC) based caching in offloading inter-domain traffic by sharing cached items with nearby users, while its performance relies heavily on caching nodes’ sharing willingness. In this paper, a Blockchain-based Cache and Delivery Market (CDM) is proposed as an incentive mechanism for the distributed caching system. Under given incentive mechanisms, both D2D and MEC caching nodes’ willingness is guaranteed by satisfying their expected reward for cache sharing. Besides, for the distributed CDM, content delivery related transactions are executed by smart contracts. To achieve consensus on transactions and prevent frauds, a consensus protocol among the smart contract execution nodes (SCENE) is necessary. To minimize the latency of reaching consensus while guaranteeing its confidence level, we propose the partial Practical Byzantine Fault Tolerance (pPBFT) protocol. Further, the model of cache sharing and transaction execution consensus is proposed, and we formulate caching placement and SCENE selection as Markov Decision Process problems. Due to the complexity and dynamics of the problems, a deep reinforcement learning approach is adopted to solve them. The simulation results show that the proposed schemes outperform conventional solutions in terms of traffic offloading, content retrieval latency, and consensus latency.
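The abstract does not spell out pPBFT's committee-selection rule. As a hedged illustration of the classical PBFT bound it builds on (safety requires the number of Byzantine nodes f to satisfy n ≥ 3f + 1), the sketch below picks the smallest committee whose chance of exceeding that tolerance stays below a target, assuming each candidate node is independently faulty with probability p; this independence model and both function names are hypothetical, not the paper's:

```python
from math import comb

def fault_tail(n, p, f):
    """Probability that more than f of n nodes are faulty, when each
    node is independently faulty with probability p (binomial tail)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(f + 1, n + 1))

def min_committee(p, eps):
    """Smallest committee size n such that the probability of exceeding
    the PBFT tolerance f = floor((n - 1) / 3) stays below eps."""
    n = 4                      # PBFT needs at least 3f + 1 = 4 nodes
    while fault_tail(n, p, (n - 1) // 3) > eps:
        n += 1
    return n
```

This captures the latency/confidence trade-off the abstract mentions: a smaller committee reaches consensus faster but tolerates fewer faulty nodes.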

Posted Content
TL;DR: CURE is proposed, the first security architecture that tackles these design challenges by providing different types of enclaves and enabling the exclusive assignment of system resources, e.g., peripherals, CPU cores, or cache resources, to single enclaves.
Abstract: Security architectures providing Trusted Execution Environments (TEEs) have been an appealing research subject for a wide range of computer systems, from low-end embedded devices to powerful cloud servers. The goal of these architectures is to protect sensitive services in isolated execution contexts, called enclaves. Unfortunately, existing TEE solutions suffer from significant design shortcomings. First, they follow a one-size-fits-all approach offering only a single enclave type, however, different services need flexible enclaves that can adjust to their demands. Second, they cannot efficiently support emerging applications (e.g., Machine Learning as a Service), which require secure channels to peripherals (e.g., accelerators), or the computational power of multiple cores. Third, their protection against cache side-channel attacks is either an afterthought or impractical, i.e., no fine-grained mapping between cache resources and individual enclaves is provided. In this work, we propose CURE, the first security architecture, which tackles these design challenges by providing different types of enclaves: (i) sub-space enclaves provide vertical isolation at all execution privilege levels, (ii) user-space enclaves provide isolated execution to unprivileged applications, and (iii) self-contained enclaves allow isolated execution environments that span multiple privilege levels. Moreover, CURE enables the exclusive assignment of system resources, e.g., peripherals, CPU cores, or cache resources to single enclaves. CURE requires minimal hardware changes while significantly improving the state of the art of hardware-assisted security architectures. We implemented CURE on a RISC-V-based SoC and thoroughly evaluated our prototype in terms of hardware and performance overhead. CURE imposes a geometric mean performance overhead of 15.33% on standard benchmarks.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: This paper formally defines the problem of offloading dependent tasks with service caching (ODT-SC), proves that no algorithm with a constant approximation ratio exists for this hard problem, and designs an efficient convex programming based algorithm (CP) to solve it.
Abstract: In Mobile Edge Computing (MEC), many tasks require specific service support for execution and in addition, have a dependent order of execution among the tasks. However, previous works often ignore the impact of having limited services cached at the edge nodes on (dependent) task offloading, thus may lead to an infeasible offloading decision or a longer completion time. To bridge the gap, this paper studies how to efficiently offload dependent tasks to edge nodes with limited (and predetermined) service caching. We formally define the problem of offloading dependent tasks with service caching (ODT-SC), and prove that there exists no algorithm with constant approximation for this hard problem. Then, we design an efficient convex programming based algorithm (CP) to solve this problem. Moreover, we study a special case with a homogeneous MEC and propose a favorite successor based algorithm (FS) to solve this special case with a competitive ratio of O(1). Extensive simulation results using Google data traces show that our proposed algorithms can significantly reduce applications’ completion time by about 27-51% compared with other alternatives.
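As a toy illustration of the feasibility constraint in ODT-SC (a dependent task can run at the edge only if its required service is cached there), the sketch below computes an application's completion time over a task DAG, ignoring transfer delays and assuming unlimited parallelism; the model and all names are illustrative, not the paper's formulation:

```python
from graphlib import TopologicalSorter

def completion_time(deps, local_t, edge_t, service, cached):
    """Finish time of a dependent-task application where a task may run
    at the edge only if its required service is cached there; otherwise
    it must run locally. deps[v] = predecessors of task v. Simplified:
    no transfer delays, unlimited parallelism."""
    finish = {}
    for v in TopologicalSorter(deps).static_order():  # predecessors first
        if service[v] in cached:
            run = min(edge_t[v], local_t[v])   # free to offload or not
        else:
            run = local_t[v]                   # offloading is infeasible
        finish[v] = run + max((finish[u] for u in deps[v]), default=0.0)
    return max(finish.values())

# Diamond-shaped DAG a -> {b, c} -> d; service 's2' is not cached, so
# task b is pinned to slow local execution and dominates the makespan.
deps = {'a': set(), 'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}}
local_t = {'a': 4, 'b': 4, 'c': 2, 'd': 4}
edge_t = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
service = {'a': 's1', 'b': 's2', 'c': 's1', 'd': 's3'}
makespan = completion_time(deps, local_t, edge_t, service, {'s1', 's3'})
```

The example shows why ignoring the service-caching constraint can produce infeasible or slow schedules: one missing service on the critical path doubles the makespan here.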

Journal ArticleDOI
TL;DR: An ant colony optimization-based algorithm is devised to achieve a near-optimal solution for the cooperative edge caching scheme, which allows vehicles to fetch one content from multiple caching servers cooperatively.
Abstract: In this article, we propose a cooperative edge caching scheme, which allows vehicles to fetch one content from multiple caching servers cooperatively. In specific, we consider two types of vehicular content requests, i.e., location-based and popular contents, with different delay requirements. Both types of contents are encoded according to fountain code and cooperatively cached at multiple servers. The proposed scheme can be optimized by finding an optimal cooperative content placement that determines the placing locations and proportions for all contents. To this end, we first analyze the upper bound proportion of content caching at a single server, which is determined by both the downloading rate and the association duration when the vehicle drives through the server's coverage. For both types of contents, the respective theoretical analyses of transmission delay and service cost (including content caching and transmission cost) are provided. We then formulate an optimization problem of cooperative content placement to minimize the overall transmission delay and service cost. As the problem is a multi-objective multi-dimensional multi-choice knapsack problem, which is proved to be NP-hard, we devise an ant colony optimization-based algorithm to solve the problem and achieve a near-optimal solution. Simulation results are provided to validate the performance of the proposed algorithm, including its convergence and optimality of caching, while guaranteeing low transmission delay and service cost.
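The paper's placement problem is a multi-objective multi-dimensional multi-choice knapsack; as a hedged stand-in, here is a toy ant-colony-style heuristic for a plain 0/1 knapsack, where each ant builds a packing biased by pheromone times value density and the best packing reinforces the trail (illustrative only, not the authors' algorithm):

```python
import random

def aco_knapsack(values, weights, capacity, n_ants=30, n_iters=60,
                 evap=0.1, seed=0):
    """Toy ant-colony heuristic for 0/1 knapsack. Each ant greedily packs
    items in a randomized order biased by pheromone * value density; the
    best packing found so far reinforces the pheromone trail."""
    rng = random.Random(seed)
    n = len(values)
    pher = [1.0] * n
    best_val, best_sol = 0, []
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Randomized construction order: pheromone * density * noise.
            order = sorted(range(n),
                           key=lambda i: -pher[i] * values[i] / weights[i]
                                         * rng.uniform(0.5, 1.5))
            load, val, sol = 0, 0, []
            for i in order:
                if load + weights[i] <= capacity:
                    load += weights[i]; val += values[i]; sol.append(i)
            if val > best_val:
                best_val, best_sol = val, sol
        for i in range(n):          # evaporate, then reinforce best packing
            pher[i] *= (1 - evap)
            if i in best_sol:
                pher[i] += evap * best_val
    return best_val, sorted(best_sol)

values, weights = [10, 5, 15, 7], [2, 3, 5, 7]
best_val, best_sol = aco_knapsack(values, weights, capacity=10)
```

The same evaporate/reinforce loop generalizes to multi-dimensional capacities by adding one weight check per dimension, which is closer to the content-placement setting in the paper.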

Journal ArticleDOI
TL;DR: An online UAV-assisted wireless caching design is proposed that jointly optimizes UAV trajectory, transmission power, and caching content scheduling, formulating the problem as an infinite-horizon ergodic Markov Decision Process (MDP) to obtain a QoE-optimal solution based on request queues in wireless caching networks.
Abstract: Recently, unmanned aerial vehicle (UAV)-assisted wireless communication technology has been proposed to exploit the favorable propagation property and flexibility of air-to-ground channels to support content-centric caching and enhance wireless network capacity. In this article, we propose an online UAV-assisted wireless caching design via jointly optimizing UAV trajectory, transmission power and caching content scheduling. Specifically, we formulate the joint optimization of online UAV trajectory and caching content delivery as an infinite-horizon ergodic Markov Decision Process (MDP) problem to obtain a QoE-optimal solution based on the concept of request queues in wireless caching networks. By exploiting the fluid approximation approach, we first derive an optimal control policy from an approximated Bellman equation. Based on this, an actor-critic based online reinforcement learning algorithm is proposed to solve the problem. Finally, simulation results are provided to show that the proposed solution can achieve significant gain over the existing baselines.
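The paper's fluid-approximated Bellman equation and actor-critic learner are beyond a short snippet, but the underlying MDP machinery can be illustrated. The sketch below runs plain value iteration on a hypothetical single-queue request model (state = number of pending content requests, actions = idle or transmit at a power cost, holding cost as a stand-in for delay/QoE); the model and all parameter values are assumptions, not the paper's formulation:

```python
def value_iteration(q_max=5, p_arr=0.6, hold=1.0, tx_cost=0.8,
                    gamma=0.9, tol=1e-9):
    """Value iteration for a toy request-queue MDP. Transmitting serves
    one request at a power cost; every queued request pays a per-step
    holding (delay) cost; a new request arrives with probability p_arr.
    Returns the cost-to-go V and the greedy policy per queue length."""
    V = [0.0] * (q_max + 1)
    while True:
        delta, newV, policy = 0.0, [], []
        for q in range(q_max + 1):
            def step(q_next, extra):
                # Expected discounted cost: holding + action cost + future.
                return (hold * q + extra
                        + gamma * (p_arr * V[min(q_next + 1, q_max)]
                                   + (1 - p_arr) * V[q_next]))
            idle = step(q, 0.0)
            tx = step(max(q - 1, 0), tx_cost)
            best = min(idle, tx)
            policy.append('tx' if tx < idle else 'idle')
            delta = max(delta, abs(best - V[q]))
            newV.append(best)
        V = newV
        if delta < tol:              # Bellman backup has converged
            return V, policy
```

With these costs the converged policy transmits whenever the queue is nonempty and idles when it is empty, which matches the intuition that serving requests early cuts future delay cost.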

Proceedings ArticleDOI
30 May 2020
TL;DR: Xuantie-910 is an industry-leading 64-bit high-performance embedded RISC-V processor from Alibaba's T-Head division that features custom extensions for arithmetic operations, bit manipulation, load and store, and TLB and cache operations, and implements the 0.7.1 stable release of the RISC-V vector extension specification for high-efficiency vector processing.
Abstract: The open source RISC-V ISA has been quickly gaining momentum. This paper presents Xuantie-910, an industry leading 64-bit high performance embedded RISC-V processor from Alibaba T-Head division. It is fully based on the RV64GCV instruction set and it features custom extensions to arithmetic operation, bit manipulation, load and store, TLB and cache operations. It also implements the 0.7.1 stable release of RISC-V vector extension specification for high efficiency vector processing. Xuantie-910 supports multi-core multi-cluster SMP with cache coherence. Each cluster contains 1 to 4 core(s) capable of booting the Linux operating system. Each single core utilizes the state-of-the-art 12-stage deep pipeline, out-of-order, multi-issue superscalar architecture, achieving a maximum clock frequency of 2.5 GHz in the typical process, voltage and temperature condition in a TSMC 12nm FinFET process technology. Each single core with the vector execution unit costs an area of 0.8 mm2 (excluding the L2 cache). The toolchain is enhanced significantly to support the vector extension and custom extensions. Through hardware and toolchain co-optimization, to date Xuantie-910 delivers the highest performance (in terms of IPC, speed, and power efficiency) for a number of industrial control flow and data computing benchmarks, when compared with its predecessors in the RISC-V family. Xuantie-910 FPGA implementation has been deployed in the data centers of Alibaba Cloud, for application-specific acceleration (e.g., blockchain transaction). The ASIC deployment at low-cost SoC applications, such as IoT endpoints and edge computing, is planned to facilitate Alibaba's end-to-end and cloud-to-edge computing infrastructure.

Journal ArticleDOI
TL;DR: To reduce the backhaul traffic load and the transmission latency from the remote cloud, this study uses Q-learning to design the cache mechanism and proposes an action selection strategy that finds the appropriate cache state through reinforcement learning.

Journal ArticleDOI
TL;DR: A popularity-based caching mechanism in content delivery fog networks is proposed and a load-balancing algorithm is proposed to increase the overall system efficiency in the cached fog network.

Journal ArticleDOI
TL;DR: The proposed cooperative caching mechanism jointly considers social relations among users, content popularity, and user mobility, so that it provides better performance for content sharing in MSNs.