
Showing papers on "Cache" published in 2017


Journal ArticleDOI
TL;DR: In this article, the problem of proactive deployment of cache-enabled unmanned aerial vehicles (UAVs) for optimizing the quality of experience (QoE) of wireless devices in a cloud radio access network is studied.
Abstract: In this paper, the problem of proactive deployment of cache-enabled unmanned aerial vehicles (UAVs) for optimizing the quality-of-experience (QoE) of wireless devices in a cloud radio access network is studied. In the considered model, the network can leverage human-centric information, such as users’ visited locations, requested contents, gender, job, and device type to predict the content request distribution, and mobility pattern of each user. Then, given these behavior predictions, the proposed approach seeks to find the user-UAV associations, the optimal UAVs’ locations, and the contents to cache at UAVs. This problem is formulated as an optimization problem whose goal is to maximize the users’ QoE while minimizing the transmit power used by the UAVs. To solve this problem, a novel algorithm based on the machine learning framework of conceptor-based echo state networks (ESNs) is proposed. Using ESNs, the network can effectively predict each user’s content request distribution and its mobility pattern when limited information on the states of users and the network is available. Based on the predictions of the users’ content request distribution and their mobility patterns, we derive the optimal locations of UAVs as well as the content to cache at UAVs. Simulation results using real pedestrian mobility patterns from BUPT and actual content transmission data from Youku show that the proposed algorithm can yield 33.3% and 59.6% gains, respectively, in terms of the average transmit power and the percentage of the users with satisfied QoE compared with a benchmark algorithm without caching and a benchmark solution without UAVs.
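As a rough illustration of the reservoir-computing idea behind the prediction step, the following minimal echo state network (a plain ESN, not the conceptor-based variant used in the paper) learns to predict the next sample of a periodic toy signal standing in for a user's mobility pattern; the reservoir size, scaling, and toy signal are arbitrary illustration choices.

```python
import numpy as np

# Minimal echo state network (ESN) sketch: predict the next sample of a
# periodic signal, standing in for a user's mobility/request pattern.
# All hyperparameters below are arbitrary illustration choices.
rng = np.random.default_rng(0)

n_res = 100                                  # reservoir size (assumption)
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))    # keep spectral radius below 1

def run_reservoir(u):
    """Drive the reservoir with input sequence u, return the state matrix."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ [u_t] + W @ x)
        states.append(x.copy())
    return np.array(states)

# Toy periodic "mobility" signal and one-step-ahead prediction targets.
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 50)
X = run_reservoir(signal[:-1])
y = signal[1:]

# Ridge-regression readout (discard a short washout period).
washout = 50
A = X[washout:]
W_out = np.linalg.solve(A.T @ A + 1e-6 * np.eye(n_res), A.T @ y[washout:])

pred = X @ W_out
print("prediction error:", np.mean((pred[washout:] - y[washout:]) ** 2))
```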

732 citations


Proceedings ArticleDOI
14 Oct 2017
TL;DR: This work presents NetCache, a new key-value store architecture that leverages the power and flexibility of new-generation programmable switches to handle queries on hot items and balance the load across storage nodes, and shows that it improves the throughput by 3-10x and reduces the latency of up to 40% of queries by 50%, for high-performance, in-memory key-value stores.
Abstract: We present NetCache, a new key-value store architecture that leverages the power and flexibility of new-generation programmable switches to handle queries on hot items and balance the load across storage nodes. NetCache provides high aggregate throughput and low latency even under highly-skewed and rapidly-changing workloads. The core of NetCache is a packet-processing pipeline that exploits the capabilities of modern programmable switch ASICs to efficiently detect, index, cache and serve hot key-value items in the switch data plane. Additionally, our solution guarantees cache coherence with minimal overhead. We implement a NetCache prototype on Barefoot Tofino switches and commodity servers and demonstrate that a single switch can process 2+ billion queries per second for 64K items with 16-byte keys and 128-byte values, while only consuming a small portion of its hardware resources. To the best of our knowledge, this is the first time that a sophisticated application-level functionality, such as in-network caching, has been shown to run at line rate on programmable switches. Furthermore, we show that NetCache improves the throughput by 3-10x and reduces the latency of up to 40% of queries by 50%, for high-performance, in-memory key-value stores.
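A loose software analogue of the detect-and-cache behaviour described above is a small cache in front of a key-value store that promotes keys once a frequency counter marks them as hot. The Python sketch below is purely illustrative (NetCache itself implements this in the switch data plane); the `HotItemCache` class, its threshold, and its capacity are made up for the example.

```python
from collections import Counter

class HotItemCache:
    """Toy software analogue of an in-network cache for hot key-value items.

    Counts query frequencies and promotes keys that exceed a threshold,
    serving them from the cache instead of the backing store. Parameters
    are illustrative; the real system does this in the switch data plane.
    """

    def __init__(self, backing_store, capacity=4, hot_threshold=3):
        self.store = backing_store          # dict standing in for storage nodes
        self.capacity = capacity
        self.hot_threshold = hot_threshold
        self.cache = {}
        self.counts = Counter()

    def get(self, key):
        self.counts[key] += 1
        if key in self.cache:               # served "in the network"
            return self.cache[key]
        value = self.store[key]             # forwarded to a storage node
        if self.counts[key] >= self.hot_threshold and len(self.cache) < self.capacity:
            self.cache[key] = value         # promote a hot item
        return value

    def put(self, key, value):
        self.store[key] = value
        self.cache.pop(key, None)           # invalidate to keep the cache coherent


store = {f"k{i}": f"v{i}" for i in range(10)}
kv = HotItemCache(store)
for _ in range(5):
    kv.get("k1")                            # k1 becomes hot and is cached
print("k1 cached:", "k1" in kv.cache)
```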

437 citations


14 Aug 2017
TL;DR: In this article, the authors demonstrate the effectiveness of cache attacks against SGX on two case studies, RSA decryption and genomic processing, analyze countermeasures, and show that none of the known defenses eliminates the attack.
Abstract: Intel SGX isolates the memory of security-critical applications from the untrusted OS. However, it has been speculated that SGX may be vulnerable to side-channel attacks through shared caches. We developed new cache attack techniques customized for SGX. Our attack differs from other SGX cache attacks in that it is easy to deploy and avoids known detection approaches. We demonstrate the effectiveness of our attack on two case studies: RSA decryption and genomic processing. While cache timing attacks against RSA and other cryptographic operations can be prevented by using appropriately hardened crypto libraries, the same cannot be easily done for other computations, such as genomic processing. Our second case study therefore shows that attacks on noncryptographic but privacy sensitive operations are a serious threat. We analyze countermeasures and show that none of the known defenses eliminates the attack.

343 citations


Journal ArticleDOI
TL;DR: In this paper, the authors considered a cluster-centric small cell network with combined design of cooperative caching and transmission policy, where small base stations (SBSs) are grouped into disjoint clusters, in which in-cluster cache space is utilized as an entity.
Abstract: Wireless content caching in small cell networks (SCNs) has recently been considered as an efficient way to reduce the data traffic and the energy consumption of the backhaul in emerging heterogeneous cellular networks. In this paper, we consider a cluster-centric SCN with combined design of cooperative caching and transmission policy. Small base stations (SBSs) are grouped into disjoint clusters, in which in-cluster cache space is utilized as an entity. We propose a combined caching scheme, where part of the cache space in each cluster is reserved for caching the most popular content in every SBS, while the remaining is used for cooperatively caching different partitions of the less popular content in different SBSs, as a means to increase local content diversity. Depending on the availability and placement of the requested content, coordinated multi-point technique with either joint transmission or parallel transmission is used to deliver content to the served user. Using Poisson point process for the SBS location distribution and a hexagonal grid model for the clusters, we provide analytical results on the successful content delivery probability of both transmission schemes for a user located at the cluster center. Our analysis shows an inherent tradeoff between transmission diversity and content diversity in our cooperation design. We also study the optimal cache space assignment for two objective functions: maximization of the cache service performance and the energy efficiency. Simulation results show that the proposed scheme achieves performance gain by leveraging cache-level and signal-level cooperation and adapting to the network environment and user quality-of-service requirements.

333 citations


Book ChapterDOI
06 Jul 2017
TL;DR: Intel SGX provides a mechanism that addresses the scenario in which tenants cannot trust the cloud provider's operating system and hardware, aiming to protect user-level software from attacks by other processes, the operating system, and even physical attackers.
Abstract: In modern computer systems, user processes are isolated from each other by the operating system and the hardware. Additionally, in a cloud scenario it is crucial that the hypervisor isolates tenants from other tenants that are co-located on the same physical machine. However, the hypervisor does not protect tenants against the cloud provider and thus the supplied operating system and hardware. Intel SGX provides a mechanism that addresses this scenario. It aims at protecting user-level software from attacks from other processes, the operating system, and even physical attackers.

327 citations


Journal ArticleDOI
TL;DR: In this paper, a novel abstract interpretation is proposed that can ascertain that a particular instruction definitely causes a hit on some paths and a miss on others, and an exact analysis based on model checking removes all remaining uncertainty.
Abstract: Static cache analysis characterizes a program's cache behavior by determining in a sound but approximate manner which memory accesses result in cache hits and which result in cache misses. Such information is valuable in optimizing compilers, worst-case execution time analysis, and side-channel attack quantification and mitigation. Cache analysis is usually performed as a combination of `must' and `may' abstract interpretations, classifying instructions as either `always hit', `always miss', or `unknown'. Instructions classified as `unknown' might result in a hit or a miss depending on program inputs or the initial cache state. It is equally possible that they do in fact always hit or always miss, but the cache analysis is too coarse to see it. Our approach to eliminate this uncertainty consists in (i) a novel abstract interpretation able to ascertain that a particular instruction may definitely cause a hit and a miss on different paths, and (ii) an exact analysis, removing all remaining uncertainty, based on model checking, using abstract-interpretation results to prune down the model for scalability. We evaluated our approach on a variety of examples; it notably improves precision upon classical abstract interpretation at reasonable cost.
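To make the must/may classification concrete, here is a minimal sketch of the classical `must' abstract interpretation for a single fully associative LRU set: the abstract state maps blocks to an upper bound on their age, an access is classified `always hit' if the block is in the must state, and the join at control-flow merges keeps only blocks known on both paths, with the larger age. This is the textbook baseline analysis that the paper refines, not the paper's new analysis.

```python
ASSOC = 4  # associativity of the (single) fully associative LRU set

def must_update(state, block):
    """Abstract LRU update for the 'must' analysis.

    `state` maps blocks to an upper bound on their age (0 = most recent).
    Returns the abstract state after accessing `block`.
    """
    new = {}
    old_age = state.get(block, ASSOC)        # ASSOC means "definitely not cached"
    for b, age in state.items():
        if b == block:
            continue
        aged = age + 1 if age < old_age else age
        if aged < ASSOC:                     # blocks aged out are dropped
            new[b] = aged
    new[block] = 0
    return new

def must_join(s1, s2):
    """Join at control-flow merges: keep blocks known on BOTH paths,
    with the larger (more pessimistic) age bound."""
    return {b: max(s1[b], s2[b]) for b in s1.keys() & s2.keys()}

def classify(state, block):
    return "always hit" if block in state else "unknown"

# Example: two paths touch different blocks before a common access to 'a'.
path1 = must_update(must_update({}, "a"), "b")
path2 = must_update(must_update({}, "a"), "c")
merged = must_join(path1, path2)
print(classify(merged, "a"))   # 'always hit': 'a' is cached on both paths
print(classify(merged, "b"))   # 'unknown': only cached on one path
```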

325 citations


Book ChapterDOI
25 Sep 2017
TL;DR: CacheZoom as discussed by the authors is able to track all memory accesses of SGX enclaves with high spatial and temporal precision, and it can recover AES keys from T-table based implementations with as few as ten measurements.
Abstract: In modern computing environments, hardware resources are commonly shared, and parallel computation is widely used. Parallel tasks can cause privacy and security problems if proper isolation is not enforced. Intel proposed SGX to create a trusted execution environment within the processor. SGX relies on the hardware, and claims runtime protection even if the OS and other software components are malicious. However, SGX disregards side-channel attacks. We introduce a powerful cache side-channel attack that provides system adversaries a high resolution channel. Our attack tool named CacheZoom is able to virtually track all memory accesses of SGX enclaves with high spatial and temporal precision. As proof of concept, we demonstrate AES key recovery attacks on commonly used implementations including those that were believed to be resistant in previous scenarios. Our results show that SGX cannot protect critical data sensitive computations, and efficient AES key recovery is possible in a practical environment. In contrast to previous works which require hundreds of measurements, this is the first cache side-channel attack on a real system that can recover AES keys with a minimal number of measurements. We can successfully recover AES keys from T-Table based implementations with as few as ten measurements.

241 citations


Posted Content
TL;DR: New cache attack techniques customized for SGX are presented; their effectiveness is demonstrated on two case studies, RSA decryption and genomic processing, showing that attacks on noncryptographic but privacy-sensitive operations are a serious threat.
Abstract: Side-channel information leakage is a known limitation of SGX. Researchers have demonstrated that secret-dependent information can be extracted from enclave execution through page-fault access patterns. Consequently, various recent research efforts are actively seeking countermeasures to SGX side-channel attacks. It is widely assumed that SGX may be vulnerable to other side channels, such as cache access pattern monitoring, as well. However, prior to our work, the practicality and the extent of such information leakage was not studied. In this paper we demonstrate that cache-based attacks are indeed a serious threat to the confidentiality of SGX-protected programs. Our goal was to design an attack that is hard to mitigate using known defenses, and therefore we mount our attack without interrupting enclave execution. This approach has major technical challenges, since the existing cache monitoring techniques experience significant noise if the victim process is not interrupted. We designed and implemented novel attack techniques to reduce this noise by leveraging the capabilities of the privileged adversary. Our attacks are able to recover confidential information from SGX enclaves, which we illustrate in two example cases: extraction of an entire RSA-2048 key during RSA decryption, and detection of specific human genome sequences during genomic indexing. We show that our attacks are more effective than previous cache attacks and harder to mitigate than previous SGX side-channel attacks.

233 citations


Proceedings Article
01 Feb 2017
TL;DR: This paper presents the Compute Cache architecture that enables in-place computation in caches, which uses emerging bit-line SRAM circuit technology to repurpose existing cache elements and transforms them into active very large vector computational units.
Abstract: This paper presents the Compute Cache architecture that enables in-place computation in caches. Compute Caches uses emerging bit-line SRAM circuit technology to repurpose existing cache elements and transforms them into active very large vector computational units. Also, it significantly reduces the overheads in moving data between different levels in the cache hierarchy. Solutions to satisfy new constraints imposed by Compute Caches, such as operand locality, are discussed. Also discussed are simple solutions to problems in integrating them into a conventional cache hierarchy while preserving properties such as coherence, consistency, and reliability. Compute Caches increase performance by 1.9× and reduce energy by 2.4× for a suite of data-centric applications, including text and database query processing, cryptographic kernels, and in-memory checkpointing. Applications with a larger fraction of Compute Cache operations could benefit even more, as our micro-benchmarks indicate (54× throughput, 9× dynamic energy savings).

225 citations


Journal ArticleDOI
TL;DR: A fundamentally different approach is needed, in which the cache contents are used as side information for coded communication over the shared link; such a coded caching scheme is proposed and proved to be close to optimal.
Abstract: We consider a network consisting of a file server connected through a shared link to a number of users, each equipped with a cache. Knowing the popularity distribution of the files, the goal is to optimally populate the caches, such as to minimize the expected load of the shared link. For a single cache, it is well known that storing the most popular files is optimal in this setting. However, we show here that this is no longer the case for multiple caches. Indeed, caching only the most popular files can be highly suboptimal. Instead, a fundamentally different approach is needed, in which the cache contents are used as side information for coded communication over the shared link. We propose such a coded caching scheme and prove that it is close to optimal.
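The structure of such a coded caching scheme can be sketched concretely using the standard uncoded-placement construction from the coded-caching literature: with $t = KM/N$, each file is split into $\binom{K}{t}$ subfiles indexed by $t$-subsets of users, user $k$ caches every subfile whose index contains $k$, and the server multicasts one XOR per $(t+1)$-subset of users. The Python snippet below checks decodability on a tiny example; it illustrates this well-known construction under assumed toy parameters, not the paper's optimality arguments.

```python
from itertools import combinations

K, N, M = 3, 3, 1            # users, files, cache size (files); t = K*M/N = 1
t = K * M // N
FILE_LEN = 12                # bytes per file, divisible by the number of subfiles

files = {n: bytes([n * 16 + i for i in range(FILE_LEN)]) for n in range(N)}
subsets = list(combinations(range(K), t))          # subfile labels (t-subsets)
part = FILE_LEN // len(subsets)

def subfile(n, S):
    """Subfile of file n labelled by the t-subset S of users."""
    i = subsets.index(tuple(sorted(S)))
    return files[n][i * part:(i + 1) * part]

# Placement: user k stores every subfile whose label contains k.
cache = {k: {(n, S): subfile(n, S) for n in range(N) for S in subsets if k in S}
         for k in range(K)}

demands = {0: 0, 1: 1, 2: 2}                        # user k requests file d_k

def xor(blocks):
    out = bytes(len(blocks[0]))
    for b in blocks:
        out = bytes(x ^ y for x, y in zip(out, b))
    return out

# Delivery: one XOR-coded multicast per (t+1)-subset of users.
transmissions = {}
for T in combinations(range(K), t + 1):
    transmissions[T] = xor([subfile(demands[k], tuple(sorted(set(T) - {k})))
                            for k in T])

# Decoding check for user 0: each transmission involving user 0 yields one
# missing subfile after XOR-ing out subfiles already in user 0's cache.
k = 0
for T, msg in transmissions.items():
    if k not in T:
        continue
    cached = [cache[k][(demands[j], tuple(sorted(set(T) - {j})))] for j in T if j != k]
    recovered = xor([msg] + cached)
    assert recovered == subfile(demands[k], tuple(sorted(set(T) - {k})))
print("user 0 decodes all of its missing subfiles")
```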

224 citations


Journal ArticleDOI
TL;DR: A particular pattern for cache placement that maximizes the overall gains of cache-aided transmit and receive interference cancellations is developed, and an upper bound on the linear one-shot sum-DoF of the network is presented, which is within a factor of 2 of the achievable sum-DoF.
Abstract: We consider a system comprising a library of $N$ files (e.g., movies) and a wireless network with $K_{T}$ transmitters, each equipped with a local cache of size $M_{T}$ files, and $K_{R}$ receivers, each equipped with a local cache of size $M_{R}$ files. Each receiver will ask for one of the $N$ files in the library, which needs to be delivered. The objective is to design the cache placement (without prior knowledge of receivers’ future requests) and the communication scheme to maximize the throughput of the delivery. In this setting, we show that the sum degrees-of-freedom (sum-DoF) of $\min\left\{\frac{K_{T} M_{T}+K_{R} M_{R}}{N}, K_{R}\right\}$ is achievable, and this is within a factor of 2 of the optimum, under uncoded prefetching and one-shot linear delivery schemes. This result shows that (i) the one-shot sum-DoF scales linearly with the aggregate cache size in the network (i.e., the cumulative memory available at all nodes), (ii) the transmitters’ caches and receivers’ caches contribute equally to the one-shot sum-DoF, and (iii) caching can offer a throughput gain that scales linearly with the size of the network. To prove the result, we propose an achievable scheme that exploits the redundancy of the content at the transmitters’ caches to cooperatively zero-force some outgoing interference, and the availability of the unintended content at the receivers’ caches to cancel (subtract) some of the incoming interference. We develop a particular pattern for cache placement that maximizes the overall gains of cache-aided transmit and receive interference cancellations. For the converse, we present an integer optimization problem which minimizes the number of communication blocks needed to deliver any set of requested files to the receivers. We then provide a lower bound on the value of this optimization problem, hence leading to an upper bound on the linear one-shot sum-DoF of the network, which is within a factor of 2 of the achievable sum-DoF.
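As a quick numeric illustration of the achievable sum-DoF expression $\min\left\{\frac{K_T M_T + K_R M_R}{N}, K_R\right\}$ quoted above, the snippet below evaluates it for a few arbitrary cache sizes, showing the linear growth with aggregate cache size and the saturation at $K_R$.

```python
def one_shot_sum_dof(K_T, M_T, K_R, M_R, N):
    """Achievable one-shot linear sum-DoF from the paper's expression:
    min{(K_T*M_T + K_R*M_R)/N, K_R}."""
    return min((K_T * M_T + K_R * M_R) / N, K_R)

# Example: N = 100 files, 4 transmitters and 4 receivers; the sum-DoF grows
# linearly with the aggregate cache size until it saturates at K_R.
for M in (10, 25, 50, 100):
    print(M, one_shot_sum_dof(K_T=4, M_T=M, K_R=4, M_R=M, N=100))
```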

Proceedings ArticleDOI
01 Jan 2017
TL;DR: It is shown that ASLR is fundamentally flawed in sandboxed environments such as JavaScript and that future defenses should not rely on randomized virtual addresses as a building block, since ASLR and caching are conflicting requirements (ASLR⊕Cache).
Abstract: Address space layout randomization (ASLR) is an important first line of defense against memory corruption attacks and a building block for many modern countermeasures. Existing attacks against ASLR rely on software vulnerabilities and/or on repeated (and detectable) memory probing. In this paper, we show that neither is a hard requirement and that ASLR is fundamentally insecure on modern cache-based architectures, making ASLR and caching conflicting requirements (ASLR⊕Cache, or simply AnC). To support this claim, we describe a new EVICT+TIME cache attack on the virtual address translation performed by the memory management unit (MMU) of modern processors. Our AnC attack relies on the property that the MMU’s page-table walks result in caching page-table pages in the shared last-level cache (LLC). As a result, an attacker can derandomize virtual addresses of a victim’s code and data by locating the cache lines that store the page-table entries used for address translation. Relying only on basic memory accesses allows AnC to be implemented in JavaScript without any specific instructions or software features. We show our JavaScript implementation can break code and heap ASLR in two major browsers running on the latest Linux operating system with 28 bits of entropy in 150 seconds. We further verify that the AnC attack is applicable to every modern architecture that we tried, including Intel, ARM and AMD. Mitigating this attack without naively disabling caches is hard, since it targets the low-level operations of the MMU. We conclude that ASLR is fundamentally flawed in sandboxed environments such as JavaScript and future defenses should not rely on randomized virtual addresses as a building block.

Journal ArticleDOI
TL;DR: In this article, the authors considered the canonical shared link caching network and provided a comprehensive characterization of the order-optimal rate for all regimes of the system parameters, as well as an explicit placement and delivery scheme achieving orderoptimal rates.
Abstract: We consider the canonical shared link caching network formed by a source node, hosting a library of $m$ information messages (files), connected via a noiseless multicast link to $n$ user nodes, each equipped with a cache of size $M$ files. Users request files independently at random according to an a-priori known demand distribution q. A coding scheme for this network consists of two phases: cache placement and delivery. The cache placement is a mapping of the library files onto the user caches that can be optimized as a function of the demand statistics, but is agnostic of the actual demand realization. After the user demands are revealed, during the delivery phase the source sends a codeword (function of the library files, cache placement, and demands) to the users, such that each user retrieves its requested file with arbitrarily high probability. The goal is to minimize the average transmission length of the delivery phase, referred to as rate (expressed in channel symbols per file). In the case of deterministic demands, the optimal min-max rate has been characterized within a constant multiplicative factor, independent of the network parameters. The case of random demands was previously addressed by applying the order-optimal min-max scheme separately within groups of files requested with similar probability. However, no complete characterization of order-optimality was previously provided for random demands under the average rate performance criterion. In this paper, we consider the random demand setting and, for the special yet relevant case of a Zipf demand distribution, we provide a comprehensive characterization of the order-optimal rate for all regimes of the system parameters, as well as an explicit placement and delivery scheme achieving order-optimal rates. We present also numerical results that confirm the superiority of our scheme with respect to previously proposed schemes for the same setting.

Journal ArticleDOI
TL;DR: In this article, a probabilistic caching placement for stochastic wireless D2D caching networks is proposed that optimizes cache-aided throughput rather than the conventional cache hit probability, and the resulting caching probabilities yield notable gains in the density of successfully served requests, particularly in dense user environments.
Abstract: Departing from the conventional cache hit optimization in cache-enabled wireless networks, we consider an alternative optimization approach for the probabilistic caching placement in stochastic wireless D2D caching networks taking into account the reliability of D2D transmissions. Using tools from stochastic geometry, we provide a closed-form approximation of cache-aided throughput, which measures the density of successfully served requests by local device caches, and we obtain the optimal caching probabilities via numerical optimization. Compared with the cache-hit-optimal case, the optimal caching probabilities obtained by cache-aided throughput optimization show notable gain in terms of the density of successfully served user requests, particularly in dense user environments.
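A hedged sketch of what probabilistic placement optimization looks like in code: caching probabilities $p_i$ are chosen under a cache budget, here maximizing a simple stand-in objective (the probability that at least one of $m$ nearby caching devices holds the requested file, under Zipf popularity) rather than the paper's closed-form cache-aided throughput approximation. All parameter values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative optimization of probabilistic caching placement: each device
# caches file i with probability p_i, subject to the cache budget sum(p_i) <= C.
# The stand-in objective is the probability that at least one of m nearby
# caching devices holds the requested file; the paper's actual objective is a
# stochastic-geometry throughput approximation, not reproduced here.
N, C, m, gamma = 20, 3, 3, 0.8              # files, cache size, helpers, Zipf exponent
pop = np.arange(1, N + 1) ** (-gamma)
pop /= pop.sum()                            # Zipf request probabilities

def neg_hit(p):
    return -np.sum(pop * (1.0 - (1.0 - p) ** m))

res = minimize(
    neg_hit,
    x0=np.full(N, C / N),                   # feasible uniform starting point
    bounds=[(0.0, 1.0)] * N,
    constraints=[{"type": "ineq", "fun": lambda p: C - p.sum()}],
    method="SLSQP",
)
print("caching probabilities:", np.round(res.x, 2))
print("hit probability:", -neg_hit(res.x))
```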

Proceedings Article
16 Aug 2017
TL;DR: Cloak, a new technique that uses hardware transactional memory to prevent adversarial observation of cache misses on sensitive code and data, provides strong protection against all known cache-based side-channel attacks with low performance overhead.
Abstract: Cache-based side-channel attacks are a serious problem in multi-tenant environments, for example, modern cloud data centers. We address this problem with Cloak, a new technique that uses hardware transactional memory to prevent adversarial observation of cache misses on sensitive code and data. We show that Cloak provides strong protection against all known cache-based side-channel attacks with low performance overhead. We demonstrate the efficacy of our approach by retrofitting vulnerable code with Cloak and experimentally confirming immunity against state-of-the-art attacks. We also show that by applying Cloak to code running inside Intel SGX enclaves we can effectively block information leakage through cache side channels from enclaves, thus addressing one of the main weaknesses of SGX.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a context-aware proactive caching algorithm, which learns context-specific content popularity online by regularly observing context information of connected users, updating the cache content and observing cache hits subsequently.
Abstract: Content caching in small base stations or wireless infostations is considered to be a suitable approach to improve the efficiency in wireless content delivery. Placing the optimal content into local caches is crucial due to storage limitations, but it requires knowledge about the content popularity distribution, which is often not available in advance. Moreover, local content popularity is subject to fluctuations, since mobile users with different interests connect to the caching entity over time. Which content a user prefers may depend on the user’s context. In this paper, we propose a novel algorithm for context-aware proactive caching. The algorithm learns context-specific content popularity online by regularly observing context information of connected users, updating the cache content and observing cache hits subsequently. We derive a sublinear regret bound, which characterizes the learning speed and proves that our algorithm converges to the optimal cache content placement strategy in terms of maximizing the number of cache hits. Furthermore, our algorithm supports service differentiation by allowing operators of caching entities to prioritize customer groups. Our numerical results confirm that our algorithm outperforms state-of-the-art algorithms in a real world data set, with an increase in the number of cache hits of at least 14%.
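The flavour of the online learning loop can be conveyed with a much-simplified sketch: keep empirical per-context request counts and place the currently top-ranked items in the cache for the observed context. This greedy sketch omits the exploration mechanism and service differentiation that give the paper's algorithm its sublinear regret bound; the class name and toy trace are invented for illustration.

```python
from collections import defaultdict
import heapq

class ContextualCacheLearner:
    """Simplified sketch of context-aware proactive caching: keep empirical
    per-context request counts and cache the currently top-C items for the
    observed context. The paper's algorithm additionally handles exploration
    and service differentiation; this sketch is purely greedy."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> item -> count
        self.cache = set()
        self.hits = 0

    def observe_context(self, context):
        # Refresh cache contents to the top-C items seen under this context.
        items = self.counts[context]
        self.cache = set(heapq.nlargest(self.cache_size, items, key=items.get))

    def request(self, context, item):
        if item in self.cache:
            self.hits += 1
        self.counts[context][item] += 1       # learn from the observed request


learner = ContextualCacheLearner(cache_size=2)
trace = [("young", "A"), ("young", "A"), ("young", "B"),
         ("senior", "C"), ("young", "A"), ("senior", "C")]
for ctx, item in trace:
    learner.observe_context(ctx)
    learner.request(ctx, item)
print("cache hits:", learner.hits)
```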

Journal ArticleDOI
TL;DR: This is the first study to reveal the cache properties of Kepler and Maxwell GPUs, and the superiority of Maxwell in shared memory performance under bank conflict.
Abstract: Memory access efficiency is a key factor in fully utilizing the computational power of graphics processing units (GPUs). However, many details of the GPU memory hierarchy are not released by GPU vendors. In this paper, we propose a novel fine-grained microbenchmarking approach and apply it to three generations of NVIDIA GPUs, namely Fermi, Kepler, and Maxwell, to expose the previously unknown characteristics of their memory hierarchies. Specifically, we investigate the structures of different GPU cache systems, such as the data cache, the texture cache and the translation look-aside buffer (TLB). We also investigate the throughput and access latency of GPU global memory and shared memory. Our microbenchmark results offer a better understanding of the mysterious GPU memory hierarchy, which will facilitate the software optimization and modelling of GPU architectures. To the best of our knowledge, this is the first study to reveal the cache properties of Kepler and Maxwell GPUs, and the superiority of Maxwell in shared memory performance under bank conflict.

Book ChapterDOI
24 Jul 2017
TL;DR: This paper presents static cache analysis, which characterizes a program’s cache behavior by determining in a sound but approximate manner which memory accesses result in cache hits and which result in cache misses.
Abstract: Static cache analysis characterizes a program’s cache behavior by determining in a sound but approximate manner which memory accesses result in cache hits and which result in cache misses. Such information is valuable in optimizing compilers, worst-case execution time analysis, and side-channel attack quantification and mitigation.

Proceedings ArticleDOI
21 Jun 2017
TL;DR: In this article, the capacity of cache-enabled private information retrieval (PIR) was characterized as a function of the storage parameter $S$, and the information-theoretically optimal download cost was shown to be $(1 - S/K)\left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)$ for $S \in [0, K]$.
Abstract: The problem of cache-enabled private information retrieval (PIR) is considered, in which a user wishes to privately retrieve one out of $K$ messages, each of size $L$ bits, from $N$ distributed databases. The user has a local cache of storage $SL$ bits, which can be used to store any function of the $K$ messages. The main contribution of this work is the exact characterization of the capacity of cache-enabled PIR as a function of the storage parameter $S$. In particular, for a given cache storage parameter $S$, the information-theoretically optimal download cost $D^{*}(S)/L$ (or the inverse of capacity) is shown to be equal to $(1 - S/K)\left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)$. Special cases of this result correspond to the settings when $S = 0$, for which the optimal download cost was shown by Sun and Jafar to be $1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}$, and the case when $S = K$, i.e., the cache size is large enough to store all messages locally, for which the optimal download cost is 0. The intermediate points $S \in (0, K)$ can be readily achieved through a simple memory-sharing based PIR scheme. The key technical contribution of this work is the converse, i.e., a lower bound on the download cost as a function of storage $S$, which shows that memory sharing is information-theoretically optimal.
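The optimal download cost can be evaluated directly; the short snippet below implements $D^*(S)/L = (1 - S/K)\left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)$ and checks the two special cases mentioned above ($S = 0$ and $S = K$). The specific numbers are arbitrary.

```python
def pir_download_cost(S, K, N):
    """Optimal normalized download cost D*(S)/L of cache-aided PIR from the
    paper: (1 - S/K) * (1 + 1/N + ... + 1/N^(K-1))."""
    return (1 - S / K) * sum(N ** -i for i in range(K))

K, N = 4, 2
print(pir_download_cost(0, K, N))   # S = 0: matches the Sun-Jafar cost
print(pir_download_cost(K, K, N))   # S = K: everything cached, cost 0
print(pir_download_cost(2, K, N))   # an intermediate point (memory sharing)
```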

Journal ArticleDOI
TL;DR: In this article, the authors investigated the fundamental limits of a high signal-to-noise-ratio metric, termed normalized delivery time (NDT), which captures the worst case coding latency for delivering any requested content to the users.
Abstract: A fog-aided wireless network architecture is studied in which edge nodes (ENs), such as base stations, are connected to a cloud processor via dedicated fronthaul links while also being endowed with caches. Cloud processing enables the centralized implementation of cooperative transmission strategies at the ENs, albeit at the cost of an increased latency due to fronthaul transfer. In contrast, the proactive caching of popular content at the ENs allows for the low-latency delivery of the cached files, but with generally limited opportunities for cooperative transmission among the ENs. The interplay between cloud processing and edge caching is addressed from an information-theoretic viewpoint by investigating the fundamental limits of a high signal-to-noise-ratio metric, termed normalized delivery time (NDT), which captures the worst case coding latency for delivering any requested content to the users. The NDT is defined under the assumptions of either serial or pipelined fronthaul-edge transmission, and is studied as a function of fronthaul and cache capacity constraints. Placement and delivery strategies across both fronthaul and wireless, or edge, segments are proposed with the aim of minimizing the NDT. Information-theoretic lower bounds on the NDT are also derived. Achievability arguments and lower bounds are leveraged to characterize the minimal NDT in a number of important special cases, including systems with no caching capabilities, as well as to prove that the proposed schemes achieve optimality within a constant multiplicative factor of 2 for all values of the problem parameters.

Proceedings ArticleDOI
25 Jun 2017
TL;DR: This work considers a system where a local cache maintains a collection of N dynamic content items that are randomly requested by local users and shows that an asymptotically optimal policy updates a cached item in proportion to the square root of the item's popularity.
Abstract: We consider a system where a local cache maintains a collection of N dynamic content items that are randomly requested by local users. A capacity-constrained link to a remote network server limits the ability of the cache to hold the latest version of each item at all times, making it necessary to design an update policy. Using an age of information metric, we show under a relaxed problem formulation that an asymptotically optimal policy updates a cached item in proportion to the square root of the item's popularity. We then show experimentally that a physically realizable policy closely approximates the asymptotic optimal policy.
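The square-root law reported here is easy to state in code: a minimal sketch that allocates cache-refresh rates in proportion to the square root of each item's popularity and normalizes them to a given link capacity. The exact constants and the relaxed formulation are in the paper; the simple normalization below is an illustrative stand-in.

```python
import numpy as np

def sqrt_update_rates(popularity, link_capacity):
    """Allocate cache-refresh rates proportional to the square root of each
    item's popularity (the asymptotically optimal shape reported in the
    paper), scaled so the total rate matches the link capacity. The scaling
    here is a plain normalization, standing in for the paper's constants."""
    weights = np.sqrt(np.asarray(popularity, dtype=float))
    return link_capacity * weights / weights.sum()

popularity = [0.5, 0.25, 0.15, 0.10]                        # request probabilities, N = 4
rates = sqrt_update_rates(popularity, link_capacity=10.0)   # updates per unit time
print(np.round(rates, 2))   # popular items refreshed more often, but only ~sqrt-proportionally
```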

Journal ArticleDOI
TL;DR: Simulation results prove that the caching placement on SBS and on mobile devices leveraging user mobility is more efficient than other existing caching strategies in terms of both cache hit ratio and energy efficiency.
Abstract: With the drastic increase of mobile devices, there are more and more mobile traffic and repeated requests for content. In 5G networks, small cell base stations (SBSs) caching and caching in wireless device-to-device network can effectively decrease the mobile traffic during peak hours. Currently, most of the related work is focused on how to cache content on SBSs and on mobile devices, and it is assumed that the user can download the entire requested content through the connected SBSs and mobile devices. However, few works have taken user mobility and the randomness of contact duration into consideration. How to improve the caching strategy by exploiting user mobility is still a challenging problem. Thus, in this paper, we first investigate the problem of how to conduct caching placement on SBS and on mobile devices leveraging user mobility, aiming to maximize the cache hit ratio. Specifically, the caching placement on SBSs and on mobile devices is formulated as an integer programming problem, and submodular optimization is adopted to solve the formulated problem. Then, we give the optimal transmission power of SBSs and mobile devices to deliver the caching content in order to reduce the energy cost. Simulation results prove that our caching strategy is more efficient than other existing caching strategies in terms of both cache hit ratio and energy efficiency.

Proceedings ArticleDOI
24 Jun 2017
TL;DR: The DMGC model is introduced, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and it is shown that it provides a way to both classify these algorithms and model their performance.
Abstract: Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called Buckwild! that uses both asynchronous execution and low-precision computation. We introduce the DMGC model, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and show that it provides a way to both classify these algorithms and model their performance. We leverage this insight to propose and analyze techniques to improve the speed of low-precision SGD. First, we propose software optimizations that can increase throughput on existing CPUs by up to 11X. Second, we propose architectural changes, including a new cache technique we call an obstinate cache, that increase throughput beyond the limits of current-generation hardware. We also implement and analyze low-precision SGD on the FPGA, which is a promising alternative to the CPU for future SGD systems.
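For readers unfamiliar with low-precision SGD, the sketch below shows its generic ingredient: quantizing the model with stochastic rounding inside an SGD loop, here on a toy linear-regression problem. It is not the Buckwild! implementation, the DMGC model, or the obstinate cache; the bit width, learning rate, and problem size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize_stochastic(x, scale=1 / 128):
    """Stochastic rounding of x onto a fixed-point grid with step `scale`.
    This is a generic low-precision trick; the paper's DMGC model covers a
    whole space of such choices (where and when to quantize, bit widths, etc.)."""
    scaled = x / scale
    low = np.floor(scaled)
    return (low + (rng.random(x.shape) < (scaled - low))) * scale

# Toy low-precision SGD for linear regression y = X @ w_true + noise.
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)
lr = 0.05
for step in range(2000):
    i = rng.integers(n)
    grad = (X[i] @ w - y[i]) * X[i]          # single-sample gradient
    w = quantize_stochastic(w - lr * grad)   # keep the model in low precision
print("parameter error:", np.linalg.norm(w - w_true))
```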

Journal ArticleDOI
TL;DR: It is revealed that with caching at both transmitter and receiver sides, the network can benefit simultaneously from traffic load reduction and transmission rate enhancement, thereby effectively reducing the content delivery latency.
Abstract: This paper studies the fundamental tradeoff between storage and latency in a general wireless interference network with caches equipped at all transmitters and receivers. The tradeoff is characterized by an information-theoretic metric, normalized delivery time (NDT), which is the worst case delivery time of the actual traffic load at a transmission rate specified by degrees of freedom of a given channel. We obtain both an achievable upper bound and a theoretical lower bound of the minimum NDT for any number of transmitters, any number of receivers, and any feasible cache size tuple. We show that the achievable NDT is exactly optimal in certain cache size regions, and is within a bounded multiplicative gap to the theoretical lower bound in other regions. In the achievability analysis, we first propose a novel cooperative transmitter/receiver coded caching strategy. It offers the freedom to adjust file splitting ratios for NDT minimization. We then propose a delivery strategy that transforms the considered interference network into a new class of cooperative X-multicast channels. It leverages local caching gain, coded multicasting gain, and transmitter cooperation gain (via interference alignment and interference neutralization) opportunistically. Finally, the achievable NDT is obtained by solving a linear programming problem. This paper reveals that with caching at both transmitter and receiver sides, the network can benefit simultaneously from traffic load reduction and transmission rate enhancement, thereby effectively reducing the content delivery latency.

Journal ArticleDOI
TL;DR: In this paper, an algorithm that combines the machine learning framework of echo state networks (ESNs) with sublinear algorithms is proposed to predict each user's content request distribution and mobility pattern while having only limited information on the network's and user's state.
Abstract: In this paper, the problem of proactive caching is studied for cloud radio access networks (CRANs). In the studied model, the baseband units (BBUs) can predict the content request distribution and mobility pattern of each user and determine which content to cache at remote radio heads and the BBUs. This problem is formulated as an optimization problem, which jointly incorporates backhaul and fronthaul loads and content caching. To solve this problem, an algorithm that combines the machine learning framework of echo state networks (ESNs) with sublinear algorithms is proposed. Using ESNs, the BBUs can predict each user’s content request distribution and mobility pattern while having only limited information on the network’s and user’s state. In order to predict each user’s periodic mobility pattern with minimal complexity, the memory capacity of the corresponding ESN is derived for a periodic input. This memory capacity is shown to capture the maximum amount of user information needed for the proposed ESN model. Then, a sublinear algorithm is proposed to determine which content to cache while using limited content request distribution samples. Simulation results using real data from Youku and the Beijing University of Posts and Telecommunications show that the proposed approach yields significant gains, in terms of sum effective capacity, that reach up to 27.8% and 30.7%, respectively, compared with two baseline algorithms: random caching with clustering and random caching without clustering.

Journal ArticleDOI
TL;DR: This paper proposes a hybrid caching design consisting of identical caches in the macro-tier and random caching in the pico-tier, and a corresponding multicasting design, and achieves better performance in the general region than any asymptotically optimal solution, under a mild condition.
Abstract: Heterogeneous wireless networks (HetNets) provide a powerful approach to meeting the dramatic mobile traffic growth, but also impose a significant challenge on backhaul. Caching and multicasting at macro and pico base stations (BSs) are two promising methods to support massive content delivery and reduce backhaul load in HetNets. In this paper, we jointly consider caching and multicasting in a large-scale cache-enabled HetNet with backhaul constraints. We propose a hybrid caching design consisting of identical caching in the macro-tier and random caching in the pico-tier, and a corresponding multicasting design. By carefully handling different types of interferers and adopting appropriate approximations, we derive tractable expressions for the successful transmission probability in the general signal-to-noise ratio (SNR) and user density region as well as the high SNR and user density region, utilizing tools from stochastic geometry. Then, we consider the successful transmission probability maximization by optimizing design parameters, which is a very challenging mixed discrete-continuous optimization problem. By exploring structural properties, we obtain a near optimal solution with superior performance and manageable complexity. This solution achieves better performance in the general region than any asymptotically optimal solution, under a mild condition. The analysis and optimization results provide valuable design insights for practical cache-enabled HetNets.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This paper provides the first comprehensive characterization of noise on cache covert channels due to cache activity and interrupts, and builds the first robust and error-free covert channel based on established techniques from wireless transmission protocols, adapted for use in microarchitectural attacks.
Abstract: Covert channels evade isolation mechanisms between multiple parties in the cloud. Especially cache covert channels allow the transmission of several hundred kilobits per second between unprivileged user programs in separate virtual machines. However, caches are small and shared and thus cache-based communication is susceptible to noise from any system activity and interrupts. The feasibility of a reliable cache covert channel under a severe noise scenario has not been demonstrated yet. Instead, previous work relies on either of the two contradicting assumptions: the assumption of direct applicability of error-correcting codes, or the assumption that noise effectively prevents covert channels. In this paper, we show that both assumptions are wrong. First, error-correcting codes cannot be applied directly, due to the noise characteristics. Second, even with extraordinarily high system activity, we demonstrate an error-free and high-throughput covert channel. We provide the first comprehensive characterization of noise on cache covert channels due to cache activity and interrupts. We build the first robust covert channel based on established techniques from wireless transmission protocols, adapted for our use in microarchitectural attacks. Our error-correcting and error-handling high-throughput covert channel can sustain transmission rates of more than 45 KBps on Amazon EC2, which is 3 orders of magnitude higher than previous covert channels demonstrated on Amazon EC2. Our robust and error-free channel even allows us to build an SSH connection between two virtual machines, where all existing covert channels fail.

Journal ArticleDOI
TL;DR: In this article, a mobility-aware caching placement strategy is proposed to maximize the data offloading ratio, which is defined as the percentage of the requested data that can be delivered via D2D links rather than through base stations.
Abstract: Caching at mobile devices can facilitate device-to-device (D2D) communications, which may significantly improve spectrum efficiency and alleviate the heavy burden on backhaul links. However, most previous works ignored user mobility, thus having limited practical applications. In this paper, we take advantage of the user mobility pattern by the inter-contact times between different users, and propose a mobility-aware caching placement strategy to maximize the data offloading ratio, which is defined as the percentage of the requested data that can be delivered via D2D links rather than through base stations. Given the NP-hard caching placement problem, we first propose an optimal dynamic programming algorithm to obtain a performance benchmark with much lower complexity than exhaustive search. We then prove that the problem falls in the category of monotone submodular maximization over a matroid constraint, and propose a time-efficient greedy algorithm, which achieves an approximation ratio of $\frac{1}{2}$. Simulation results with real-life data sets validate the effectiveness of our proposed mobility-aware caching placement strategy. We observe that users moving at either a very low or very high speed should cache the most popular files, while users moving at a medium speed should cache less popular files to avoid duplication.
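A minimal sketch of the greedy placement idea: repeatedly add the (device, file) pair with the largest marginal gain in a monotone submodular offloading objective, subject to per-device cache budgets (a matroid constraint), which is what yields the $\frac{1}{2}$ approximation guarantee. The objective used below (probability that some contacted device holds the requested file) is a simplified stand-in for the paper's inter-contact-time model, and all numbers are invented.

```python
from itertools import product

# Greedy placement sketch for mobility-aware D2D caching under per-device
# cache budgets. The contact probabilities and popularity values are toy data.
N_DEV, N_FILES, CACHE = 3, 5, 2
popularity = [0.4, 0.25, 0.15, 0.12, 0.08]
contact_prob = [[1.0, 0.6, 0.2],        # contact_prob[u][v]: user u can reach device v
                [0.6, 1.0, 0.5],
                [0.2, 0.5, 1.0]]

def offload_ratio(placement):
    """Stand-in objective: probability a random request is served by some
    reachable caching device (monotone and submodular in the placement)."""
    total = 0.0
    for u, f in product(range(N_DEV), range(N_FILES)):
        miss = 1.0
        for v in range(N_DEV):
            if f in placement[v]:
                miss *= 1.0 - contact_prob[u][v]
        total += popularity[f] * (1.0 - miss) / N_DEV
    return total

placement = [set() for _ in range(N_DEV)]
while True:
    base = offload_ratio(placement)
    best_gain, best = 0.0, None
    for v, f in product(range(N_DEV), range(N_FILES)):
        if len(placement[v]) >= CACHE or f in placement[v]:
            continue                       # respect the per-device budget
        placement[v].add(f)
        gain = offload_ratio(placement) - base
        placement[v].remove(f)
        if gain > best_gain:
            best_gain, best = gain, (v, f)
    if best is None:
        break                              # no improving addition remains
    placement[best[0]].add(best[1])

print("placement:", placement, "offload ratio:", round(offload_ratio(placement), 3))
```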

Journal ArticleDOI
TL;DR: This paper characterizes the optimal cache-aided degrees-of-freedom (DoF) within a factor of 4 by identifying near-optimal schemes that exploit a new synergy between coded caching and delayed CSIT, as well as the unexplored interplay between caching and feedback quality.
Abstract: Building on the recent coded-caching breakthrough by Maddah-Ali and Niesen, the work here considers the $K$-user cache-aided wireless multi-antenna symmetric broadcast channel with random fading and imperfect feedback, and analyzes the throughput performance as a function of feedback statistics and cache size. In this setting, this paper identifies the optimal cache-aided degrees-of-freedom (DoF) within a factor of 4, by identifying near-optimal schemes that exploit a new synergy between coded caching and delayed CSIT, as well as by exploiting the unexplored interplay between caching and feedback quality. The DoF expressions reveal an initial gain due to current CSIT, and an additional gain due to coded caching, which is exponential in the sense that any linear decrease in the required DoF performance, allows for an exponential reduction in the required cache size. In the end, this paper reveals three new aspects of caching: a synergy between memory and delayed feedback, a tradeoff between memory and current CSIT, and a powerful ability to provide cache-aided feedback savings.

Journal ArticleDOI
TL;DR: A cache timing attack against the scatter-gather implementation used in the modular exponentiation routine in OpenSSL version 1.0.2f, which can fully recover the private key after observing 16,000 decryptions.
Abstract: The scatter-gather technique is a commonly implemented approach to prevent cache-based timing attacks. In this paper we show that scatter-gather is not constant time. We implement a cache timing attack against the scatter-gather implementation used in the modular exponentiation routine in OpenSSL version 1.0.2f. Our attack exploits cache-bank conflicts on the Sandy Bridge microarchitecture. We have tested the attack on an Intel Xeon E5-2430 processor. For 4096-bit RSA our attack can fully recover the private key after observing 16,000 decryptions.