
Showing papers on "Cache" published in 2019


Proceedings ArticleDOI
19 May 2019
TL;DR: Spectre, as described in this paper, induces a victim to speculatively perform operations that leak the victim's confidential information via a side channel to the adversary, allowing an attacker to read arbitrary memory from the victim's process.
Abstract: Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try to guess the destination and attempt to execute ahead. When the memory value finally arrives, the CPU either discards or commits the speculative computation. Speculative logic is unfaithful in how it executes, can access the victim's memory and registers, and can perform operations with measurable side effects. Spectre attacks involve inducing a victim to speculatively perform operations that would not occur during correct program execution and which leak the victim's confidential information via a side channel to the adversary. This paper describes practical attacks that combine methodology from side channel attacks, fault attacks, and return-oriented programming that can read arbitrary memory from the victim's process. More broadly, the paper shows that speculative execution implementations violate the security assumptions underpinning numerous software security mechanisms, including operating system process separation, containerization, just-in-time (JIT) compilation, and countermeasures to cache timing and side-channel attacks. These attacks represent a serious threat to actual systems since vulnerable speculative execution capabilities are found in microprocessors from Intel, AMD, and ARM that are used in billions of devices. While makeshift processor-specific countermeasures are possible in some cases, sound solutions will require fixes to processor designs as well as updates to instruction set architectures (ISAs) to give hardware architects and software developers a common understanding as to what computation state CPU implementations are (and are not) permitted to leak.
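The bounds-check-bypass pattern this abstract describes can be sketched in a deliberately simplified Python model. A real attack times hardware cache accesses; here a set stands in for "which probe lines became cached", and the array contents, stride, and function names are all illustrative:

```python
STRIDE = 64  # one probe-array "cache line" per possible byte value

def speculative_victim(index, array, bound, probe_cache):
    """Bounds check whose body also runs on the mispredicted path; the
    dependent load leaves a cache footprint that survives the squash."""
    mispredicted = index >= bound          # predictor guessed "in bounds"
    if index < bound or mispredicted:      # architectural OR speculative path
        byte = array[index]                # out-of-bounds read under speculation
        probe_cache.add(byte * STRIDE)     # measurable microarchitectural side effect

memory = b"public__SECRET"                 # bytes past offset 8 are "secret"
probe_cache = set()
speculative_victim(8, memory, bound=8, probe_cache=probe_cache)
leaked = [line // STRIDE for line in sorted(probe_cache)]
print(bytes(leaked))                       # b'S': recovered through the cache footprint
```

The point of the model is only the data flow: the out-of-bounds value never reaches the architectural result, yet it is encoded in which line was touched, which a real attacker recovers by timing loads of each probe line.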

1,317 citations


Posted Content
TL;DR: This work comprises the first in-depth, scholarly, performance review of Intel's Optane DC PMM, exploring its capabilities as a main memory device, and as persistent, byte-addressable memory exposed to user-space applications.
Abstract: Scalable nonvolatile memory DIMMs will finally be commercially available with the release of the Intel Optane DC Persistent Memory Module (or just "Optane DC PMM"). This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, while also providing data storage that survives power outages. This work comprises the first in-depth, scholarly, performance review of Intel's Optane DC PMM, exploring its capabilities as a main memory device, and as persistent, byte-addressable memory exposed to user-space applications. This report details the technology's performance under a number of modes and scenarios, and across a wide variety of macro-scale benchmarks. Optane DC PMMs can be used as large memory devices with a DRAM cache to hide their lower bandwidth and higher latency. When used in this Memory (or cached) mode, Optane DC memory has little impact on applications with small memory footprints. Applications with larger memory footprints may experience some slow-down relative to DRAM, but are now able to keep much more data in memory. When used under a file system, Optane DC PMMs can result in significant performance gains, especially when the file system is optimized to use the load/store interface of the Optane DC PMM and the application uses many small, persistent writes. For instance, using the NOVA-relaxed NVMM file system, we can improve the performance of Kyoto Cabinet by almost 2x. Optane DC PMMs can also enable user-space persistence where the application explicitly controls its writes into persistent Optane DC media. In our experiments, modified applications that used user-space Optane DC persistence generally outperformed their file system counterparts. For instance, the persistent version of RocksDB performed almost 2x faster than the equivalent program utilizing an NVMM-aware file system.
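The user-space load/store persistence idiom the report benchmarks can be approximated on any POSIX system with a memory-mapped file. This DRAM-backed sketch only mimics the interface: on a real Optane DC PMM the file would sit on a DAX mount and the flush would compile down to cache-line write-backs (e.g. clwb plus a fence) rather than an msync; the file name is arbitrary:

```python
import mmap, os, tempfile

# Pre-size a file that stands in for a persistent memory region.
path = os.path.join(tempfile.mkdtemp(), "pmem.img")
with open(path, "wb") as f:
    f.truncate(4096)

# Map it and store bytes directly: the byte-granularity load/store interface.
with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)
    mm[0:5] = b"hello"      # a small, direct store, no write() syscall per update
    mm.flush()              # make the dirty range durable (msync under the hood)
    mm.close()

# The data survives the unmap, the analogue of surviving a power cycle.
with open(path, "rb") as f:
    print(f.read(5))        # b'hello'
```

The appeal measured in the paper is exactly this shape: many small persistent writes avoid per-write system-call and block-I/O overhead, paying only for the flush.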

346 citations


Journal ArticleDOI
TL;DR: A deep reinforcement learning (DRL)-based joint mode selection and resource management approach is proposed, aiming to minimize long-term system power consumption under the dynamics of edge cache states, and transfer learning is integrated with DRL to accelerate the learning process.
Abstract: Fog radio access networks (F-RANs) are seen as potential architectures to support services of Internet of Things by leveraging edge caching and edge computing. However, current works studying resource management in F-RANs mainly consider a static system with only one communication mode. Given network dynamics, resource diversity, and the coupling of resource management with mode selection, resource management in F-RANs becomes very challenging. Motivated by the recent development of artificial intelligence, a deep reinforcement learning (DRL)-based joint mode selection and resource management approach is proposed. Each user equipment (UE) can operate either in cloud RAN (C-RAN) mode or in device-to-device mode, and the resource managed includes both radio resource and computing resource. The core idea is that the network controller makes intelligent decisions on UE communication modes and processors’ on–off states with precoding for UEs in C-RAN mode optimized subsequently, aiming at minimizing long-term system power consumption under the dynamics of edge cache states. By simulations, the impacts of several parameters, such as learning rate and edge caching service capability, on system performance are demonstrated, and meanwhile the proposal is compared with other different schemes to show its effectiveness. Moreover, transfer learning is integrated with DRL to accelerate learning process.
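The mode-selection core of this approach can be illustrated with a minimal tabular Q-learning sketch, far simpler than the paper's DRL formulation: per cache state, learn the long-run power cost of each communication mode and pick the cheaper one. The two-state cache dynamics and the power numbers are invented for illustration:

```python
import random
random.seed(0)

# Invented power costs: D2D is cheap on a cache hit, C-RAN on a miss.
POWER = {("hit", "d2d"): 1.0, ("hit", "cran"): 3.0,
         ("miss", "d2d"): 5.0, ("miss", "cran"): 3.0}
MODES = ("d2d", "cran")
Q = {(s, m): 0.0 for s in ("hit", "miss") for m in MODES}
alpha, eps = 0.1, 0.2

for _ in range(5000):
    s = random.choice(("hit", "miss"))                  # edge cache state dynamics
    if random.random() < eps:                           # epsilon-greedy exploration
        m = random.choice(MODES)
    else:                                               # exploit: least learned power
        m = min(MODES, key=lambda mode: Q[(s, mode)])
    cost = POWER[(s, m)] + random.uniform(-0.2, 0.2)    # noisy power observation
    Q[(s, m)] += alpha * (cost - Q[(s, m)])             # stateless TD update

policy = {s: min(MODES, key=lambda m: Q[(s, m)]) for s in ("hit", "miss")}
print(policy)   # learned policy: D2D when the edge cache hits, C-RAN otherwise
```

The paper's controller additionally handles processor on/off states and precoding; the sketch keeps only the "cache state drives mode choice" skeleton.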

194 citations


Journal ArticleDOI
TL;DR: This paper presents a detailed and in-depth discussion of the caching process, which can be delineated into four phases: content request, exploration, delivery, and update. For each phase, it identifies different issues and reviews related work addressing them.
Abstract: With the widespread adoption of various mobile applications, the amount of traffic in wireless networks is growing at an exponential rate, which exerts a great burden on mobile core networks and backhaul links. Mobile edge caching, which enables mobile edges with cache storages, is a promising solution to alleviate this problem. In this paper, we aim to review the state-of-the-art of mobile edge caching. We first present an overview of mobile edge caching and its advantages. We then discuss the locations where mobile edge caching can be realized in the network. We also analyze different caching criteria and their respective effects on the caching performances. Moreover, we compare several caching schemes and discuss their pros and cons. We further present a detailed and in-depth discussion on the caching process, which can be delineated into four phases including content request, exploration, delivery, and update. For each phase, we identify different issues and review related works in addressing these issues. Finally, we present a number of challenges faced by current mobile edge caching architectures and techniques for further studies.
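The delivery and update phases the survey describes can be illustrated with a toy edge cache: an LRU store serving a Zipf-skewed request stream, with hit ratio as the metric. Catalog size, cache size, and the skew are arbitrary choices for the sketch:

```python
from collections import OrderedDict
import random
random.seed(1)

class LRUCache:
    def __init__(self, capacity):
        self.capacity, self.store = capacity, OrderedDict()
    def request(self, content):
        if content in self.store:                 # delivery phase: served locally
            self.store.move_to_end(content)
            return True
        if len(self.store) >= self.capacity:      # update phase: evict LRU entry
            self.store.popitem(last=False)
        self.store[content] = True                # fetch from origin, then cache
        return False

catalog = list(range(100))
weights = [1 / (rank + 1) for rank in catalog]    # Zipf(1)-like popularity
cache, hits, n = LRUCache(capacity=10), 0, 20000
for _ in range(n):
    hits += cache.request(random.choices(catalog, weights)[0])
print(f"hit ratio: {hits / n:.2f}")
```

Even this tiny cache (10% of the catalog) absorbs roughly half the requests under skewed popularity, which is the backhaul-relief argument the survey builds on.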

158 citations


Journal ArticleDOI
TL;DR: A novel MEC-based mobile VR delivery framework that is able to cache parts of the field of views (FOVs) in advance and compute certain post-processing procedures on demand at the mobile VR device is presented.
Abstract: Virtual reality (VR) over wireless is emerging as an important use case of 5G networks. Fully-immersive VR experience requires the wireless delivery of huge data at ultra-low latency, thus leading to ultra-high transmission rate requirement for wireless communications. This challenge can be largely addressed by the recent network architecture known as mobile edge computing (MEC) network, which enables caching and computing capabilities at the edge of wireless networks. This paper presents a novel MEC-based mobile VR delivery framework that is able to cache parts of the field of views (FOVs) in advance and compute certain post-processing procedures on demand at the mobile VR device. To minimize the average required transmission rate, we formulate the joint caching and computing optimization problem to determine which FOVs to cache, whether to cache them in 2D or 3D as well as which FOVs to compute at the mobile device under cache size, average power consumption as well as latency constraints. When FOVs are homogeneous, we obtain a closed-form expression for the optimal joint policy which reveals interesting communications-caching-computing tradeoffs. When FOVs are heterogeneous, we obtain a local optima of the problem by transforming it into a linearly constrained indefinite quadratic problem and then applying concave convex procedure. Numerical results demonstrate the proposed mobile VR delivery framework can significantly reduce communication bandwidth while meeting low latency requirement.
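The 2D-versus-3D caching decision in this framework can be sketched with a simple greedy heuristic (not the paper's optimal policy): caching a FOV in 3D costs more storage but no device computation, caching it in 2D halves the storage but consumes the computing budget, and everything else is streamed. All sizes, popularities, and budgets below are made up:

```python
# Each FOV: (name, request popularity). Most popular FOVs are handled first.
fovs = [("f0", 0.40), ("f1", 0.25), ("f2", 0.20), ("f3", 0.15)]
SIZE_3D, SIZE_2D = 2, 1                    # 3D FOVs take twice the cache space
cache_budget, compute_budget = 5, 1        # storage units / concurrent local renders

plan, used, renders = {}, 0, 0
for name, pop in sorted(fovs, key=lambda x: -x[1]):
    if used + SIZE_3D <= cache_budget:
        plan[name], used = "3D", used + SIZE_3D       # no transmission, no compute
    elif used + SIZE_2D <= cache_budget and renders < compute_budget:
        plan[name], used, renders = "2D", used + SIZE_2D, renders + 1  # render locally
    else:
        plan[name] = "stream"                          # deliver over the air
print(plan)
```

The tradeoff the paper optimizes shows up even here: storage, computing, and transmission substitute for one another, and the budgets decide where each FOV lands.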

153 citations


Proceedings ArticleDOI
06 Nov 2019
TL;DR: SmoTherSpectre is introduced, a speculative code-reuse attack that leverages port-contention in simultaneously multi-threaded processors (SMoTher) as a side channel to leak information from a victim process.
Abstract: Spectre, Meltdown, and related attacks have demonstrated that kernels, hypervisors, trusted execution environments, and browsers are prone to information disclosure through micro-architectural weaknesses. However, it remains unclear as to what extent other applications, in particular those that do not load attacker-provided code, may be impacted. It also remains unclear as to what extent these attacks are reliant on cache-based side channels. We introduce SMoTherSpectre, a speculative code-reuse attack that leverages port-contention in simultaneously multi-threaded processors (SMoTher) as a side channel to leak information from a victim process. SMoTher is a fine-grained side channel that detects contention based on a single victim instruction. To discover real-world gadgets, we describe a methodology and build a tool that locates SMoTher-gadgets in popular libraries. In an evaluation on glibc, we found hundreds of gadgets that can be used to leak information. Finally, we demonstrate proof-of-concept attacks against the OpenSSH server, creating oracles for determining four host key bits, and against an application performing encryption using the OpenSSL library, creating an oracle which can differentiate a bit of the plaintext through gadgets in libcrypto and glibc.

153 citations


Journal ArticleDOI
TL;DR: An enhanced user privacy scheme through caching and spatial K-anonymity (CSKA) in continuous LBSs; it adopts multi-level caching to reduce the risk of exposing users' information to untrusted LSPs and minimizes the overhead of the LBS server.

146 citations


Journal ArticleDOI
TL;DR: This article proposes a joint collaborative caching and processing framework that supports Adaptive Bitrate (ABR)-video streaming in MEC networks and proposes practically efficient solutions, including a novel heuristic ABR-aware proactive cache placement algorithm when video popularity is available.
Abstract: Mobile-Edge Computing (MEC) is a promising paradigm that provides storage and computation resources at the network edge in order to support low-latency and computation-intensive mobile applications. In this article, we propose a joint collaborative caching and processing framework that supports Adaptive Bitrate (ABR)-video streaming in MEC networks. We formulate an Integer Linear Program (ILP) that determines the placement of video variants in the caches and the scheduling of video requests to the cache servers so as to minimize the expected delay cost of video retrieval. The considered problem is challenging due to its NP-completeness and to the lack of a-priori knowledge about video request arrivals. Our approach decomposes the original problem into a cache placement problem and a video request scheduling problem while preserving the interplay between the two. We then propose practically efficient solutions, including: (i) a novel heuristic ABR-aware proactive cache placement algorithm when video popularity is available, and (ii) an online low-complexity video request scheduling algorithm that performs very closely to the optimal solution. Simulation results show that our proposed solutions achieve significant increase in terms of cache hit ratio and decrease in backhaul traffic and content access delay compared to the traditional approaches.

144 citations


Proceedings ArticleDOI
01 Apr 2019
TL;DR: This paper reverse engineers the structure of the directory in a sliced, non-inclusive cache hierarchy, and proves that the directory can be used to bootstrap conflict-based cache attacks on the last-level cache.
Abstract: Although clouds have strong virtual memory isolation guarantees, cache attacks stemming from shared caches have proved to be a large security problem. However, despite the past effectiveness of cache attacks, their viability has recently been called into question on modern systems, due to trends in cache hierarchy design moving away from inclusive cache hierarchies. In this paper, we reverse engineer the structure of the directory in a sliced, non-inclusive cache hierarchy, and prove that the directory can be used to bootstrap conflict-based cache attacks on the last-level cache. We design the first cross-core Prime+Probe attack on non-inclusive caches. This attack works with minimal assumptions: the adversary does not need to share any virtual memory with the victim, nor run on the same processor core. We also show the first high-bandwidth Evict+Reload attack on the same hardware. We demonstrate both attacks by extracting key bits during RSA operations in GnuPG on a state-of-the-art non-inclusive Intel Skylake-X server.
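The conflict-based Prime+Probe principle behind this attack is easy to model on a toy cache: the attacker fills every way of a target set, lets the victim run, then re-accesses its own lines, and misses reveal that the victim touched the set. A 2-way, 4-set LRU cache stands in for the last-level cache; addresses and geometry are illustrative:

```python
SETS, WAYS = 4, 2

class Cache:
    def __init__(self):
        self.sets = [[] for _ in range(SETS)]
    def access(self, addr):
        s = self.sets[addr % SETS]
        hit = addr in s
        if hit:
            s.remove(addr)
        elif len(s) >= WAYS:
            s.pop(0)                 # evict the least recently used way
        s.append(addr)
        return hit

llc = Cache()
target_set = 1
attacker_lines = [target_set, target_set + SETS]   # both map to set 1

for a in attacker_lines:             # PRIME: occupy every way of the target set
    llc.access(a)
llc.access(9)                        # victim access (9 % 4 == 1) conflicts
evicted = [a for a in attacker_lines if not llc.access(a)]  # PROBE
print(evicted)                       # probe misses reveal the victim touched set 1
```

Note that probing refills the set as it goes (here causing a second miss), which is why real attacks probe in a careful order; the paper's contribution is making this conflict game work when the LLC is non-inclusive, by targeting the directory instead.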

143 citations


Journal ArticleDOI
TL;DR: This paper devises location-customized caching schemes to maximize the total content hit rate, demonstrating that the algorithms can be applied to scenarios with different noise features and make adaptive caching decisions, achieving a content hit rate comparable to that of the hindsight optimal strategy.
Abstract: Mobile edge caching aims to enable content delivery within the radio access network, which effectively alleviates the backhaul burden and reduces response time. To fully exploit edge storage resources, the most popular contents should be identified and cached. Observing that user demands on certain contents vary greatly at different locations, this paper devises location-customized caching schemes to maximize the total content hit rate. Specifically, a linear model is used to estimate the future content hit rate. For the case with zero-mean noise, a ridge regression-based online algorithm with positive perturbation is proposed. Regret analysis indicates that the hit rate achieved by the proposed algorithm asymptotically approaches that of the optimal caching strategy in the long run. When the noise structure is unknown, an $H_{\infty }$ filter-based online algorithm is devised by taking a prescribed threshold as input, which guarantees prediction accuracy even under the worst-case noise process. Both online algorithms require no training phases and, hence, are robust to the time-varying user demands. The estimation errors of both algorithms are numerically analyzed. Moreover, extensive experiments using real-world datasets are conducted to validate the applicability of the proposed algorithms. It is demonstrated that those algorithms can be applied to scenarios with different noise features, and are able to make adaptive caching decisions, achieving a content hit rate that is comparable to that via the hindsight optimal strategy.
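The ridge-regression step of the zero-mean-noise case can be sketched in closed form, w = (XᵀX + λI)⁻¹Xᵀy, with the fitted model then ranking contents for caching. The features, noise level, and catalog below are synthetic stand-ins for the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])                 # hidden demand model
X = rng.normal(size=(200, 3))                       # per-observation context features
y = X @ true_w + rng.normal(scale=0.1, size=200)    # noisy observed hit counts

lam = 1.0                                           # ridge penalty
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

contents = rng.normal(size=(5, 3))                  # feature vectors of 5 contents
predicted = contents @ w                            # estimated future hit rates
cache_set = np.argsort(predicted)[::-1][:2]         # cache the two hottest contents
print(w.round(2), cache_set)
```

Because the solution is closed-form, it can be updated online without a training phase, which is the robustness-to-time-varying-demand point the abstract makes; the paper's algorithm additionally adds a positive perturbation for exploration.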

143 citations


Journal ArticleDOI
TL;DR: In the practically important case where the number of files (N) is large, the rate-memory tradeoff of the above caching system is exactly characterized for systems with no more than five users, and characterized within a factor of 2 otherwise.
Abstract: We consider a basic caching system, where a single server with a database of $N$ files (e.g., movies) is connected to a set of $K$ users through a shared bottleneck link. Each user has a local cache memory with a size of $M$ files. The system operates in two phases: a placement phase, where each cache memory is populated up to its size from the database, and a following delivery phase, where each user requests a file from the database, and the server is responsible for delivering the requested contents. The objective is to design the two phases to minimize the load (peak or average) of the bottleneck link. We characterize the rate-memory tradeoff of the above caching system within a factor of 2.00884 for both the peak rate and the average rate (under uniform file popularity), improving the state of the art, which is within a factor of 4 and 4.7, respectively. Moreover, in a practically important case where the number of files ($N$) is large, we exactly characterize the tradeoff for systems with no more than five users and characterize the tradeoff within a factor of 2 otherwise. To establish these results, we develop two new converse bounds that improve over the state of the art.
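For context, the classic Maddah-Ali and Niesen coded caching scheme (the achievable baseline this line of work tightens) attains peak load R = (K - t)/(t + 1) at cache sizes M = tN/K for integer t, with memory sharing interpolating linearly between these corner points. A small sketch of that rate-memory curve, using exact rational arithmetic:

```python
from fractions import Fraction

def mn_rate(K, N, M):
    """Peak delivery load of the Maddah-Ali--Niesen scheme at cache size M."""
    t = Fraction(K) * M / N
    lo = int(t)
    if t == lo:                           # at a corner point M = lo*N/K
        return Fraction(K - lo, lo + 1)
    hi = lo + 1                           # otherwise memory-share between corners
    r_lo = Fraction(K - lo, lo + 1)
    r_hi = Fraction(K - hi, hi + 1)
    return r_lo + (t - lo) * (r_hi - r_lo)

K, N = 5, 10
for t in range(K + 1):
    M = Fraction(t * N, K)
    print(f"M={M}: R={mn_rate(K, N, M)}")   # load falls from K at M=0 to 0 at M=N
```

The (K - t) factor is the local caching gain and the 1/(t + 1) factor the coded multicasting gain; the paper's converse bounds show how close this curve is to optimal.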

Journal ArticleDOI
TL;DR: A distributed algorithm based on the machine learning framework of liquid state machine (LSM) is proposed that enables the UAVs to autonomously choose the optimal resource allocation strategies that maximize the number of users with stable queues depending on the network states.
Abstract: In this paper, the problem of joint caching and resource allocation is investigated for a network of cache-enabled unmanned aerial vehicles (UAVs) that service wireless ground users over the LTE licensed and unlicensed bands. The considered model focuses on users that can access both licensed and unlicensed bands while receiving contents from either the cache units at the UAVs directly or via content server-UAV-user links. This problem is formulated as an optimization problem, which jointly incorporates user association, spectrum allocation, and content caching. To solve this problem, a distributed algorithm based on the machine learning framework of liquid state machine (LSM) is proposed. Using the proposed LSM algorithm, the cloud can predict the users’ content request distribution while having only limited information on the network’s and users’ states. The proposed algorithm also enables the UAVs to autonomously choose the optimal resource allocation strategies that maximize the number of users with stable queues depending on the network states. Based on the users’ association and content request distributions, the optimal contents that need to be cached at UAVs and the optimal resource allocation are derived. Simulation results using real datasets show that the proposed approach yields up to 17.8% and 57.1% gains, respectively, in terms of the number of users that have stable queues compared with two baseline algorithms: Q-learning with cache and Q-learning without cache. The results also show that the LSM significantly improves the convergence time by up to 20% compared with conventional learning algorithms such as Q-learning.

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of private information retrieval from non-colluding and replicated databases, where the user is equipped with a cache that holds an uncoded fraction from each of the stored messages in the databases.
Abstract: We consider the problem of private information retrieval (PIR) from $N$ non-colluding and replicated databases when the user is equipped with a cache that holds an uncoded fraction $r$ from each of the $K$ stored messages in the databases. We assume that the databases are unaware of the cache content. We investigate $D^{*}(r)$ the optimal download cost normalized with the message size as a function of $K$ , $N$ , and $r$ . For a fixed $K$ and $N$ , we develop an inner bound (converse bound) for the $D^{*}(r)$ curve. The inner bound is a piece-wise linear function in $r$ that consists of $K$ line segments. For the achievability, we develop explicit schemes that exploit the cached bits as side information to achieve $K-1$ non-degenerate corner points. These corner points differ in the number of cached bits that are used to generate the one-side information equation. We obtain an outer bound (achievability) for any caching ratio by memory sharing between these corner points. Thus, the outer bound is also a piece-wise linear function in $r$ that consists of $K$ line segments. The inner and the outer bounds match in general for the cases of very low-caching ratio and very high-caching ratio. As a corollary, we fully characterize the optimal download cost caching ratio tradeoff for $K=3$ . For general $K$ , $N$ , and $r$ , we show that the largest gap between the achievability and the converse bounds is 1/6. Our results show that the download cost can be reduced beyond memory sharing if the databases are unaware of the cached content.

Proceedings Article
14 Aug 2019
TL;DR: SCATTERCACHE eliminates fixed cache-set congruences and thus makes eviction-based cache attacks impractical; evaluations show that the runtime performance of software is not curtailed and the design even outperforms state-of-the-art caches for certain realistic workloads.
Abstract: Cache side-channel attacks can be leveraged as a building block in attacks leaking secrets even in the absence of software bugs. Currently, there are no practical and generic mitigations with an acceptable performance overhead and strong security guarantees. The underlying problem is that caches are shared in a predictable way across security domains. In this paper, we eliminate this problem. We present SCATTERCACHE, a novel cache design to prevent cache attacks. SCATTERCACHE eliminates fixed cache-set congruences and, thus, makes eviction-based cache attacks impractical. For this purpose, SCATTERCACHE retrofits skewed associative caches with a keyed mapping function, yielding a security-domain-dependent cache mapping. Hence, it becomes virtually impossible to find fully overlapping cache sets, rendering current eviction-based attacks infeasible. Even theoretical statistical attacks become unrealistic, as the attacker cannot confine contention to chosen cache sets. Consequently, the attacker has to resort to eviction of the entire cache, making deductions over cache sets or lines impossible and fully preventing high-frequency attacks. Our security analysis reveals that even in the strongest possible attacker model (noise-free), the construction of a reliable eviction set for PRIME+PROBE in an 8-way SCATTERCACHE with 16384 lines requires observation of at least 33.5 million victim memory accesses, as compared to fewer than 103 on commodity caches. SCATTERCACHE requires hardware and software changes, yet is minimally invasive on the software level and is fully backward compatible with legacy software while still improving the security level over state-of-the-art caches. Finally, our evaluations show that the runtime performance of software is not curtailed and our design even outperforms state-of-the-art caches for certain realistic workloads.
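The keyed, per-way mapping idea can be sketched with a keyed hash: the set index is derived from the secret key, the security domain, the line address, and the way, so two domains (and even two ways within one lookup) rarely agree on a set, and fixed congruences disappear. BLAKE2 here merely stands in for the lightweight hardware mapping function; the geometry matches the 8-way, 16384-line configuration from the analysis:

```python
import hashlib

SETS, WAYS = 16384 // 8, 8          # 16384 lines, 8 ways

def set_index(key: bytes, domain: int, addr: int, way: int) -> int:
    """Per-way set index derived from (secret key, security domain, address)."""
    h = hashlib.blake2b(addr.to_bytes(8, "big") + bytes([way]),
                        key=key + domain.to_bytes(4, "big"),
                        digest_size=4)
    return int.from_bytes(h.digest(), "big") % SETS

key = b"per-boot secret"
idx_a = [set_index(key, domain=0, addr=0x1000, way=w) for w in range(WAYS)]
idx_b = [set_index(key, domain=1, addr=0x1000, way=w) for w in range(WAYS)]
print(idx_a != idx_b)   # same address, different domain: different candidate sets
```

An attacker in one domain can no longer name "the victim's set": each of the victim's lines lives in one of WAYS key-dependent sets, which is what blows up eviction-set construction from roughly a hundred observations to tens of millions.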

Journal ArticleDOI
TL;DR: Experimental results indicate that the cache-enabled UAV scheme obtains better throughput, offering a new approach to multimedia data throughput maximization in IoT systems.
Abstract: With the development of the Internet-of-Things (IoT) industry, more and more fields are involved, such as multimedia data. Currently, users rely on videos and images with high data volume, which brings new challenges for wireless communication and transmission. Multimedia data clearly differ from traditional communication data, so new methods are required to handle their high volume. The proactive content caching and unmanned aerial vehicle (UAV) relaying techniques are deployed over the IoT network, enabling maximum throughput for the served IoT devices. Even though these two existing technologies are important for addressing throughput, there are still other challenges in efficiently improving system throughput. We mainly study the cache-enabled UAV to maximize throughput among IoT devices, jointly optimizing the placement of content caching and the UAV location. Specifically, we divide the joint optimization problem into two parts. First, the UAV deployment problem is decomposed into vertical and horizontal dimensions to determine the optimal deployment height and 2-D position; the enumeration search method is employed to obtain the 2-D position. Then, we formulate a concave problem for probabilistic caching placement. Experimental results indicate that the cache-enabled UAV scheme obtains better throughput, offering a new approach to multimedia data throughput maximization in IoT systems.

Journal ArticleDOI
TL;DR: This paper designs D2D caching strategies using multi-agent reinforcement learning and uses Q-learning to learn how to coordinate the caching decisions, and proposes a modified combinatorial upper confidence bound algorithm to reduce the action space for both IL and JAL.
Abstract: To address the increase of multimedia traffic dominated by streaming videos, user equipment (UE) can collaboratively cache and share contents to alleviate the burden of base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since the content popularity distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the knowledge of content demand history would be highly promising. Thus, we design D2D caching strategies using multi-agent reinforcement learning in this paper. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, and joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. As the action space is very vast leading to high computational complexity, a modified combinatorial upper confidence bound algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
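The bandit view of the caching decision can be illustrated with plain UCB1, a simpler relative of the modified combinatorial upper confidence bound algorithm in the paper: each arm is "cache content i this slot", and the reward is whether the next request hits it. The per-content request probabilities are invented:

```python
import math, random
random.seed(2)

true_pop = [0.6, 0.3, 0.1]          # hidden request probability per content
counts, rewards = [0, 0, 0], [0.0, 0.0, 0.0]

def pick(t):
    for i, c in enumerate(counts):
        if c == 0:
            return i                # play every arm once first
    # UCB1: empirical mean plus an exploration bonus that shrinks with counts
    return max(range(3), key=lambda i: rewards[i] / counts[i]
               + math.sqrt(2 * math.log(t) / counts[i]))

for t in range(1, 3001):
    arm = pick(t)                            # cache content `arm` for this slot
    hit = random.random() < true_pop[arm]    # reward: did a request hit the cache?
    counts[arm] += 1
    rewards[arm] += hit
print(counts)                                # the popular content dominates play
```

The paper's setting is harder (multiple cooperating UEs, combinatorial cache configurations, hence Q-learning over joint actions), but the exploration bonus that lets the learner cope with unknown popularity is the same mechanism shown here.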

Journal ArticleDOI
TL;DR: This work designs an asymmetric search tree and improves the branch and bound method to obtain a set of accurate decisions and resource allocation strategies; it then introduces auxiliary variables to reformulate the proposed model and applies the modified generalized Benders decomposition method to solve the MINLP problem in polynomial time.
Abstract: Mobile edge computing (MEC) has risen as a promising paradigm to provide high quality of experience by relocating the cloud server in close proximity to smart mobile devices (SMDs). In MEC networks, the MEC server with computation capability and storage resource can jointly execute the latency-sensitive offloading tasks and cache the contents requested by SMDs. In order to minimize the total latency consumption of the computation tasks, we jointly consider computation offloading, content caching, and resource allocation as an integrated model, which is formulated as a mixed integer nonlinear programming (MINLP) problem. We design an asymmetric search tree and improve the branch and bound method to obtain a set of accurate decisions and resource allocation strategies. Furthermore, we introduce auxiliary variables to reformulate the proposed model and apply the modified generalized Benders decomposition method to solve the MINLP problem in polynomial time. Simulation results demonstrate the superiority of the proposed schemes.

Journal ArticleDOI
TL;DR: The proposed mobile VR delivery framework is promising in improving spectral efficiency by maximizing average tolerant delay while meeting high transmission rate requirements and the communications-caching-computing tradeoff at both mobile VR devices and F-APs is revealed.
Abstract: The emerging virtual reality (VR) experience demands ultra-high-transmission-rate and ultra-low-latency deliveries, which is challenging for the current cellular networks. Since fog radio access networks (F-RANs) take full advantages of both edge fog computing and caching technologies and benefit different quality-of-service requirements, it is anticipated that high-quality VR experience could be well addressed in F-RANs. This paper presents an F-RAN-based mobile VR delivery framework, in which the core idea is to cache parts of the VR videos in advance and run a certain processing procedure at the edge of F-RANs. To optimize resource allocation at both mobile VR devices and fog access points (F-APs), a joint radio communication, caching and computing decision problem is formulated to maximize the average tolerant delay with meeting a given transmission rate constraint. This problem is formulated as a multiple choice multiple dimensional knapsack problem and solved with the Lagrangian dual decomposition approach. Furthermore, the optimal joint caching and computing decision is analyzed in a specific case with a closed-form expression of the average tolerant delay. The communications-caching-computing tradeoff at both mobile VR devices and F-APs is revealed, and the numerical results demonstrate that local caching and computing capabilities have significant impacts on the average tolerant delay. The proposed mobile VR delivery framework is promising in improving spectral efficiency by maximizing average tolerant delay while meeting high transmission rate requirements.

Proceedings ArticleDOI
19 May 2019
TL;DR: This work targets the ports to stacks of execution units to create a high-resolution timing side channel based on port contention, which is inherently stealthy since it does not depend on the memory subsystem like other cache- or TLB-based attacks.
Abstract: Simultaneous Multithreading (SMT) architectures are attractive targets for side-channel enabled attackers, with their inherently broader attack surface that exposes more per physical core microarchitecture components than cross-core attacks. In this work, we explore SMT execution engine sharing as a side-channel leakage source. We target ports to stacks of execution units to create a high-resolution timing side-channel due to port contention, inherently stealthy since it does not depend on the memory subsystem like other cache or TLB based attacks. Implementing our channel on Intel Skylake and Kaby Lake architectures featuring Hyper-Threading, we mount an end-to-end attack that recovers a P-384 private key from an OpenSSL-powered TLS server using a small number of repeated TLS handshake attempts. Furthermore, we show that traces targeting shared libraries, static builds, and SGX enclaves are essentially identical, hence our channel has wide target application.

Proceedings ArticleDOI
22 Jun 2019
TL;DR: Skewed-CEASER (CEASER-S) is proposed, which divides the cache ways into multiple partitions and maps each cache line to a different set in each partition; this significantly improves the robustness of CEASER, as the attacker must form an eviction set that can dislodge the line from multiple possible locations.
Abstract: Conflict-based cache attacks can allow an adversary to infer the access pattern of a co-running application by orchestrating evictions via cache conflicts. Such attacks can be mitigated by randomizing the location of the lines in the cache. Our recent proposal, CEASER, makes cache randomization practical by accessing the cache using an encrypted address and periodically changing the encryption key. CEASER was analyzed with the state-of-the-art algorithm for forming eviction sets, and the analysis showed that CEASER with a Remap-Rate of 1% is sufficient to tolerate years of attack. In this paper, we present two new attacks that significantly push the state of the art in forming eviction sets. Our first attack reduces the time required to form the eviction set from O(L²) to O(L), where L is the number of lines in the attack. This attack is 35x faster than the best-known attack and requires that the Remap-Rate of CEASER be increased to 35%. Our second attack exploits the replacement policy (we analyze LRU, RRIP, and Random) to form eviction sets quickly and requires that the Remap-Rate of CEASER be increased to more than 100%, incurring impractical overheads. To improve the robustness of CEASER against these attacks in a practical manner, we propose Skewed-CEASER (CEASER-S), which divides the cache ways into multiple partitions and maps the cache line to be resident in a different set in each partition. This design significantly improves the robustness of CEASER, as the attacker must form an eviction set that can dislodge the line from multiple possible locations. We show that CEASER-S can tolerate years of attacks while retaining a Remap-Rate of 1%. CEASER-S incurs negligible slowdown (within 1%) and a storage overhead of less than 100 bytes for the newly added structures.
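The flavor of fast eviction-set construction can be illustrated with the well-known group-testing reduction, here against a toy set-associative cache model rather than real hardware; `make_oracle` stands in for the timing measurement an attacker would actually perform, and none of this is the paper's exact algorithm.

```python
# Toy illustration of eviction-set reduction. The group-testing loop
# drops one whole group per step while the remainder still evicts the
# victim line, shrinking the candidate set by a constant fraction.
WAYS = 8          # associativity of the toy cache
NUM_SETS = 64     # number of cache sets

def make_oracle(victim_set):
    def evicts(candidates):
        # The candidates evict the victim iff at least WAYS of them
        # map to the victim's set (simple set-indexed cache model).
        return sum(1 for a in candidates if a % NUM_SETS == victim_set) >= WAYS
    return evicts

def reduce_eviction_set(candidates, evicts, ways=WAYS):
    # Split into ways+1 groups; by pigeonhole, at least one group can
    # be dropped while the rest still evicts the victim.
    while len(candidates) > ways:
        k = ways + 1
        groups = [set(candidates[i::k]) for i in range(k)]
        for g in groups:
            rest = [a for a in candidates if a not in g]
            if evicts(rest):
                candidates = rest
                break
        else:
            break
    return candidates
```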

Proceedings ArticleDOI
12 Oct 2019
TL;DR: CleanupSpec is a hardware-based solution that mitigates speculation-based attacks by undoing the changes to the cache sub-system caused by speculative instructions, in the event they are squashed on a mis-speculation.
Abstract: Speculation-based attacks affect hundreds of millions of computers. These attacks typically exploit caches to leak information, using speculative instructions to cause changes to the cache state. Hardware-based solutions that protect against such forms of attacks try to prevent any speculative changes to the cache sub-system by delaying them. For example, InvisiSpec, a recent work, splits the load into two operations: the first operation is speculative and obtains the value, and the second operation is non-speculative and changes the state of the cache. Unfortunately, such a "Redo" based approach typically incurs slowdown due to the requirement of extra operations for correctly speculated loads, which form the large majority of loads. In this work, we propose CleanupSpec, an "Undo"-based approach to safe speculation. CleanupSpec is a hardware-based solution that mitigates these attacks by undoing the changes to the cache sub-system caused by speculative instructions, in the event they are squashed on a mis-speculation. As a result, CleanupSpec prevents information leakage on the correct path of execution due to any mis-speculated load and is secure against speculation-based attacks exploiting caches (we demonstrate a proof-of-concept defense on Spectre Variant-1 PoC). Unlike a Redo-based approach which incurs overheads for correct-path loads, CleanupSpec incurs overheads only for the wrong-path loads that are less frequent. As a result, CleanupSpec only incurs an average slowdown of 5.1% compared to a non-secure baseline. Moreover, CleanupSpec incurs a modest storage overhead of less than 1 kilobyte per core, for tracking and undoing the speculative changes to the caches.
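A minimal sketch of the "Undo" idea, assuming a toy fully associative LRU cache (this simplifies away the replacement-state and coherence effects the real design must handle): speculative fills are logged together with the line they evicted, and rolled back in reverse order on a squash.

```python
# Toy sketch: a small fully-associative cache that logs fills made by
# speculative loads and rolls them back when speculation is squashed.
class UndoCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.lines = []          # least-recently used at the front
        self.log = []            # (inserted_line, evicted_line or None)

    def access(self, addr, speculative=False):
        if addr in self.lines:                 # hit: just update recency
            self.lines.remove(addr)
            self.lines.append(addr)
            return "hit"
        evicted = self.lines.pop(0) if len(self.lines) >= self.capacity else None
        self.lines.append(addr)
        if speculative:                        # remember how to undo the fill
            self.log.append((addr, evicted))
        return "miss"

    def commit(self):            # speculation was correct: keep changes
        self.log.clear()

    def squash(self):            # mis-speculation: undo fills in reverse order
        for addr, evicted in reversed(self.log):
            if addr in self.lines:
                self.lines.remove(addr)
            if evicted is not None and evicted not in self.lines:
                self.lines.insert(0, evicted)
        self.log.clear()
```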

Journal ArticleDOI
TL;DR: It is demonstrated how optically-enabled eight-socket boards can be combined via a 256 × 256 Hipoλaos Optical Packet Switch into a powerful 256-node disaggregated system with less than 335 ns latency, forming a highly promising solution for the latency-critical rack-scale memory disaggregation era.
Abstract: Following a decade of radical advances in the areas of integrated photonics and computing architectures, we discuss the use of optics in the current computing landscape, attempting to redefine and refine their role based on the progress in both research fields. We present the current set of critical challenges faced by the computing industry and provide a thorough review of photonic Network-on-Chip (pNoC) architectures and experimental demonstrations, concluding with the main obstacles that still impede the materialization of these concepts. We propose the employment of optics in chip-to-chip (C2C) computing architectures rather than on-chip layouts, toward reaping their benefits while avoiding technology limitations on the way to manycore set-ups. We identify multisocket boards as the most prominent application area and present recent advances in optically enabled multisocket boards, revealing successful 40 Gb/s transceiver and routing capabilities via integrated photonics. These results indicate the potential to bring energy consumption down by more than 60% compared to the current QuickPath Interconnect (QPI) protocol, while turning multisocket architectures into a single-hop low-latency setup for even more than four interconnected sockets, which currently form the electronic baseline. We go one step further and demonstrate how optically enabled eight-socket boards can be combined via a 256 × 256 Hipoλaos Optical Packet Switch into a powerful 256-node disaggregated system with less than 335 ns latency, forming a highly promising solution for the latency-critical rack-scale memory disaggregation era. Finally, we discuss the perspective for disintegrated computing via optical technologies as a means to increase the number of synergized high-performance cores overcoming die-area constraints, and introduce the concept of cache disintegration via the use of future off-die ultrafast optical cache memory chiplets.

Proceedings ArticleDOI
04 Apr 2019
TL;DR: Context-sensitive fencing as discussed by the authors leverages the ability to dynamically alter the decoding of the instruction stream, to seamlessly inject new micro-ops, including fences, only when dynamic conditions indicate they are needed.
Abstract: This paper describes context-sensitive fencing (CSF), a microcode-level defense against multiple variants of Spectre. CSF leverages the ability to dynamically alter the decoding of the instruction stream, to seamlessly inject new micro-ops, including fences, only when dynamic conditions indicate they are needed. This enables the processor to protect against the attack, but with minimal impact on the efficacy of key performance features such as speculative execution. This research also examines several alternative fence implementations, and introduces three new types of fences which allow most dynamic reorderings of loads and stores, but in a way that prevents speculative accesses from changing visible cache state. These optimizations reduce the performance overhead of the defense mechanism, compared to state-of-the-art software-based fencing mechanisms by a factor of six.
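One way to picture the decode-time injection is the hypothetical filter below; the real mechanism operates on micro-ops with hardware taint tracking, and the uop format and register names here are invented for illustration.

```python
# Hypothetical sketch of context-sensitive fence injection: walk a
# micro-op stream and insert a fence only before loads whose source
# registers are dynamically tainted (i.e., may be attacker-influenced).
def inject_fences(uops, tainted_regs):
    # uop format (assumed): (opcode, dest_reg, source_regs)
    tainted = set(tainted_regs)
    out = []
    for opcode, dest, srcs in uops:
        if opcode == "load" and any(s in tainted for s in srcs):
            out.append(("fence", None, ()))   # serialize before risky load
        out.append((opcode, dest, srcs))
        if opcode == "load":
            tainted.add(dest)                 # loaded values are untrusted
    return out
```

The point of the sketch is the contrast with blanket fencing: untainted loads pass through with no fence at all, which is where the performance recovery comes from.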

Journal ArticleDOI
TL;DR: In this paper, a distributed deep learning algorithm that brings together new neural network ideas from liquid state machine (LSM) and echo state networks (ESNs) is proposed to address the problem of content caching and transmission for a wireless virtual reality (VR) network in which cellular-connected UAVs capture videos on live games or sceneries and transmit them to small base stations (SBSs) that service the VR users.
Abstract: In this paper, the problem of content caching and transmission is studied for a wireless virtual reality (VR) network in which cellular-connected unmanned aerial vehicles (UAVs) capture videos on live games or sceneries and transmit them to small base stations (SBSs) that service the VR users. To meet the VR delay requirements, the UAVs can extract specific visible content (e.g., user field of view) from the original 360° VR data and send this visible content to the users so as to reduce the traffic load over backhaul and radio access links. The extracted visible content consists of 120° horizontal and 120° vertical images. To further alleviate the UAV-SBS backhaul traffic, the SBSs can also cache the popular contents that users request. This joint content caching and transmission problem is formulated as an optimization problem whose goal is to maximize the users’ reliability, defined as the probability that the content transmission delay of each user satisfies the instantaneous VR delay target. To address this problem, a distributed deep learning algorithm that brings together new neural network ideas from liquid state machines (LSMs) and echo state networks (ESNs) is proposed. The proposed algorithm enables each SBS to predict the users’ reliability so as to find the optimal contents to cache and content transmission format for each cellular-connected UAV. Analytical results are derived to expose the various network factors that impact content caching and content transmission format selection. Simulation results show that the proposed algorithm yields 25.4% and 14.7% gains, in terms of reliability compared to Q-learning and a random caching algorithm, respectively.

Proceedings ArticleDOI
22 Jun 2019
TL;DR: A comprehensive characterization of the top seven microservices that run on the compute-optimized data center fleet at Facebook is undertaken and a tool, μSKU, is developed that automates search over a soft-SKU design space using A/B testing in production and can obtain statistically significant gains with no additional hardware requirements.
Abstract: The variety and complexity of microservices in warehouse-scale data centers has grown precipitously over the last few years to support a growing user base and an evolving product portfolio. Despite accelerating microservice diversity, there is a strong requirement to limit diversity in underlying server hardware to maintain hardware resource fungibility, preserve procurement economies of scale, and curb qualification/test overheads. As such, there is an urgent need for strategies that enable limited server CPU architectures (a.k.a. “SKUs”) to provide performance and energy efficiency over diverse microservices. To this end, we first undertake a comprehensive characterization of the top seven microservices that run on the compute-optimized data center fleet at Facebook. Our characterization reveals profound diversity in OS and I/O interaction, cache misses, memory bandwidth utilization, instruction mix, and CPU stall behavior. Whereas customizing a CPU SKU for each microservice might be beneficial, it is prohibitive. Instead, we argue for “soft SKUs”, wherein we exploit coarse-grain (e.g., boot time) configuration knobs to tune the platform for a particular microservice. We develop a tool, μSKU, that automates search over a soft-SKU design space using A/B testing in production and demonstrate how it can obtain statistically significant gains (up to 7.2% and 4.5% performance improvement over stock and production servers, respectively) with no additional hardware requirements.
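The A/B-testing step can be sketched as a plain significance test over throughput samples. The data and the use of a two-sample z-test are assumptions for illustration; the abstract does not specify μSKU's actual statistics.

```python
import math

# Illustrative A/B evaluation (hypothetical data): decide whether a
# candidate knob setting beats the baseline with statistical
# significance, using a one-sided two-sample z-test.
def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)  # sample variance

def ab_significant(baseline, candidate, alpha=0.05):
    z = (mean(candidate) - mean(baseline)) / math.sqrt(
        var(candidate) / len(candidate) + var(baseline) / len(baseline))
    # one-sided p-value from the standard normal CDF
    p = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return p < alpha, p
```

With small sample counts a t-test would be the more careful choice; the z-test keeps the sketch within the standard library.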

Proceedings ArticleDOI
12 Oct 2019
TL;DR: This paper shows that for cache replacement, a powerful LSTM learning model can in an offline setting provide better accuracy than current hardware predictors, and designs a simple online model that matches the offline model's accuracy with orders of magnitude lower cost.
Abstract: Despite its success in many areas, deep learning is a poor fit for use in hardware predictors because these models are impractically large and slow, but this paper shows how we can use deep learning to help design a new cache replacement policy. We first show that for cache replacement, a powerful LSTM learning model can in an offline setting provide better accuracy than current hardware predictors. We then perform analysis to interpret this LSTM model, deriving a key insight that allows us to design a simple online model that matches the offline model's accuracy with orders of magnitude lower cost. The result is the Glider cache replacement policy, which we evaluate on a set of 33 memory-intensive programs from the SPEC 2006, SPEC 2017, and GAP (graph-processing) benchmark suites. In a single-core setting, Glider outperforms top finishers from the 2nd Cache Replacement Championship, reducing the miss rate over LRU by 8.9%, compared to reductions of 7.1% for Hawkeye, 6.5% for MPPPB, and 7.5% for SHiP++. On a four-core system, Glider improves IPC over LRU by 14.7%, compared with improvements of 13.6% (Hawkeye), 13.2% (MPPPB), and 11.4% (SHiP++).
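The offline oracle that policies in this lineage learn from can be illustrated by contrasting LRU with Belady's optimal replacement on a toy trace. This is not Glider itself, just the kind of oracle such predictors try to approximate online.

```python
# Toy comparison: miss counts of LRU versus Belady's offline-optimal
# replacement (which evicts the line reused farthest in the future).
def lru_misses(trace, capacity):
    cache, misses = [], 0
    for a in trace:
        if a in cache:
            cache.remove(a)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)               # evict least-recently used
        cache.append(a)                    # most-recently used at the end
    return misses

def belady_misses(trace, capacity):
    cache, misses = set(), 0
    for i, a in enumerate(trace):
        if a in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            def next_use(x):               # next reference to x, or never
                for j in range(i + 1, len(trace)):
                    if trace[j] == x:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(a)
    return misses
```

On a cyclic trace one item larger than the cache, LRU misses on every access while the oracle does far better, which is exactly the gap a learned policy can try to close.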

Journal ArticleDOI
TL;DR: A user-centric video transmission mechanism based on device-to-device communications that allows mobile users to cache and share videos between each other, in a cooperative manner, to achieve a QoE-guaranteed video streaming service in a cellular network.
Abstract: The ever-increasing demand for videos on mobile devices poses a significant challenge to existing cellular network infrastructures. To cope with the challenge, we propose a user-centric video transmission mechanism based on device-to-device communications that allows mobile users to cache and share videos with each other in a cooperative manner. The proposed solution jointly considers users’ similarity in accessing videos, users’ sharing willingness, users’ location distribution, and users’ quality of experience (QoE) requirements, in order to achieve a QoE-guaranteed video streaming service in a cellular network. Specifically, a service set consisting of several service providers and mobile users is dynamically configured to provide timely service according to the probability of successful service. Numerical results show that when the number of providers and demanded videos is 40 and 2, respectively, the rate of users experiencing improved service under the proposed solution is approximately 85%, and the data offload rate at the base station(s) is about 78%.

Journal ArticleDOI
TL;DR: A node recognition method based on assessment probability is established to prioritize high-probability nodes in the cache, after which the cache space is reconstructed to improve the transmission environment.
Abstract: In social networks, nodes should analyze the communication area during data transmission and find suitable neighbors to perform effective classified data transmission. This is similar to finding certain transmission destinations when transmitting data with mobile devices. However, node cache space in social opportunistic networks is limited, and waiting for the destination node can also cause end-to-end delay. To improve the transmission environment, this study establishes a node recognition method based on assessment probability, which prioritizes high-probability nodes in the cache and then reconstructs the cache space. To avoid accidentally deleting cached data, a node's caching task is shared through cooperation with neighbor nodes, enabling effective data transmission. Experiments comparing the proposed scheme with traditional social network algorithms show that it improves the delivery ratio by 82% and reduces delay by 74% on average.

Journal ArticleDOI
TL;DR: This letter incorporates wireless content caching into HSTRN, where two representative cache placement schemes are considered: the most-popular-content-based scheme and the uniform-content-based caching scheme.
Abstract: Hybrid satellite-terrestrial relay network (HSTRN) has been viewed as a flexible solution to be developed with the heterogeneous devices, integrated systems, and infrastructures for future wireless communications. To alleviate the spectrum shortage and meet the requirements of improved spectral efficiency, this letter incorporates wireless content caching into HSTRN, where two representative cache placement schemes are considered, namely, the most-popular-content-based scheme and the uniform-content-based caching scheme. Specifically, the analytical expressions for the outage probability of the considered HSTRN with different cache placement schemes are derived. Simulation results are provided to validate the theoretical analysis and confirm the substantial performance improvement of the proposed schemes over the traditional approach without caching capabilities.
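The two placement schemes can be contrasted with a toy Zipf-popularity model. This is illustrative only: the letter's analysis concerns outage probability over satellite-terrestrial fading channels, not this simplified hit-probability view, and the Zipf exponent below is an assumption.

```python
# Toy comparison of cache placement schemes under an assumed Zipf
# popularity profile: cache the most popular contents, or place
# contents uniformly at random.
def zipf_probs(n, s=0.8):
    w = [1.0 / (k ** s) for k in range(1, n + 1)]
    z = sum(w)
    return [x / z for x in w]     # request probability of file k

def hit_prob_most_popular(probs, cache_size):
    # cache the cache_size most requested files
    return sum(sorted(probs, reverse=True)[:cache_size])

def hit_prob_uniform(probs, cache_size):
    # every file is cached with equal probability cache_size / n,
    # so the expected hit probability is just that ratio
    return cache_size / len(probs)
```

Under any skewed popularity profile the most-popular scheme dominates uniform placement in hit probability, which is why the two make natural bookend baselines.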

Journal ArticleDOI
TL;DR: The proposed SacLe strategy is shown to be able to achieve the optimal performance obtained by the brute force (BF) algorithm, and the caching strategy has a significant impact on the network secrecy performance through affecting the caching diversity gain and signal cooperation gain at the relays.
Abstract: In this paper, we investigate the security of a cache-aided multi-relay communication network in the presence of multiple eavesdroppers, where each relay can pre-store a part of the requested files in order to assist secure data transmission from source to destination. If the relays have cached the requested file, then they can directly send it to the destination; otherwise, traditional dual-hop data transmission is used. For both cases, relay selection is performed to assist the secure data transmission. We analyze the network secrecy performance in both scenarios of non-colluding and colluding eavesdroppers, and obtain a closed-form expression for the average secrecy outage probability (SOP), as well as an asymptotic expression for the high main-to-eavesdropper ratio (MER). Through minimizing the network SOP, we further optimize the cache placement by proposing a stochastic sampling based cache learning (SacLe) strategy, which can be implemented in parallel and thus reduces the implementation latency substantially. Numerical and simulation results are finally presented to verify the proposed analysis, and show that the caching strategy has a significant impact on the network secrecy performance through affecting the caching diversity gain and signal cooperation gain at the relays. The proposed SacLe strategy is shown to be able to achieve the optimal performance obtained by the brute force (BF) algorithm.
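The sampling-versus-brute-force idea behind SacLe can be sketched on a toy objective. Everything below is illustrative: the real strategy minimizes the secrecy outage probability, whereas this sketch just scores a placement by the popularity mass it covers.

```python
import itertools
import random

# Toy analogue of stochastic-sampling cache optimization: draw random
# placements, keep the best seen, and compare against exhaustive
# brute-force enumeration.
def score(placement, popularity):
    # toy objective: popularity mass covered by the cached files
    return sum(popularity[f] for f in placement)

def brute_force(files, cache_size, popularity):
    return max(itertools.combinations(files, cache_size),
               key=lambda p: score(p, popularity))

def sampled_search(files, cache_size, popularity, samples=2000, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(samples):
        p = tuple(rng.sample(files, cache_size))   # random placement
        if best is None or score(p, popularity) > score(best, popularity):
            best = p
    return best
```

The trade-off mirrors the one in the paper: sampling evaluates a fixed budget of candidates (and parallelizes trivially), while brute force must enumerate all C(n, k) placements.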