
Showing papers on "Cache algorithms published in 2018"


Proceedings ArticleDOI
21 Mar 2018
TL;DR: In this paper, a DRL-based framework with Wolpertinger architecture for content caching at the base station is proposed to maximize the long-term cache hit rate; it requires no knowledge of the content popularity distribution.
Abstract: Content caching at the edge nodes is a promising technique to reduce the data traffic in next-generation wireless networks. Inspired by the success of Deep Reinforcement Learning (DRL) in solving complicated control problems, this work presents a DRL-based framework with Wolpertinger architecture for content caching at the base station. The proposed framework is aimed at maximizing the long-term cache hit rate, and it requires no knowledge of the content popularity distribution. To evaluate the proposed framework, we compare the performance with other caching algorithms, including Least Recently Used (LRU), Least Frequently Used (LFU), and First-In First-Out (FIFO) caching strategies. Meanwhile, since the Wolpertinger architecture can effectively limit the action space size, we also compare the performance with Deep Q-Network to identify the impact of dropping a portion of the actions. Our results show that the proposed framework can achieve improved short-term cache hit rate and improved and stable long-term cache hit rate in comparison with LRU, LFU, and FIFO schemes. Additionally, the performance is shown to be competitive in comparison to Deep Q-learning, while the proposed framework can provide significant savings in runtime.
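
The framework above is benchmarked against classical eviction policies. As a point of reference, here is a minimal sketch (not the paper's DRL agent) of how LRU, LFU, and FIFO baselines can be replayed over a request trace to measure the cache hit rate; the trace and cache size are illustrative.

```python
from collections import OrderedDict, defaultdict, deque

def hit_rate(requests, cache_size, policy):
    """Replay a request trace against a simple eviction policy and return the hit rate."""
    cache, hits = set(), 0
    lru = OrderedDict()                  # recency order for LRU
    freq = defaultdict(int)              # request counts ("perfect LFU" simplification)
    fifo = deque()                       # insertion order for FIFO
    for item in requests:
        freq[item] += 1
        if item in cache:
            hits += 1
            if policy == "LRU":
                lru.move_to_end(item)    # refresh recency on a hit
            continue
        if len(cache) >= cache_size:     # miss on a full cache: pick a victim
            if policy == "LRU":
                victim, _ = lru.popitem(last=False)
            elif policy == "LFU":
                victim = min(cache, key=lambda x: freq[x])
            else:                        # FIFO
                victim = fifo.popleft()
            cache.discard(victim)
        cache.add(item)
        if policy == "LRU":
            lru[item] = True
        if policy == "FIFO":
            fifo.append(item)
    return hits / len(requests)

# Example: compare the three baselines on a toy trace.
trace = [1, 2, 3, 1, 2, 4, 1, 5, 2, 1]
for p in ("LRU", "LFU", "FIFO"):
    print(p, hit_rate(trace, cache_size=3, policy=p))
```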

185 citations


Journal ArticleDOI
TL;DR: This paper proposes a hybrid content caching design that does not require the knowledge of content popularity and proposes practical and heuristic CU/BS caching algorithms to address a general caching scenario by inheriting the design rationale of the aforementioned performance-guaranteed algorithms.
Abstract: Most existing content caching designs require accurate estimation of content popularity, which can be challenging in the dynamic mobile network environment. Moreover, the emerging hierarchical network architecture enables us to enhance content caching performance by opportunistically exploiting both cloud-centric and edge-centric caching. In this paper, we propose a hybrid content caching design that does not require knowledge of content popularity. Specifically, our design optimizes the content caching locations, which can be original content servers, central cloud units (CUs), and base stations (BSs), where the design objective is to support average requested content data rates as high as possible subject to finite service latency. We fulfill this design by employing the Lyapunov optimization approach to tackle an NP-hard caching control problem with tight coupling between CU caching and BS caching control decisions. Toward this end, we propose algorithms for three specific caching scenarios by exploiting the submodularity property of the sum-weight objective function and the hierarchical caching structure. Moreover, we prove that the proposed algorithms can achieve finite content service delay for all arrival rates within a constant fraction of the capacity region using the Lyapunov optimization technique. Furthermore, we propose practical and heuristic CU/BS caching algorithms to address a general caching scenario by inheriting the design rationale of the aforementioned performance-guaranteed algorithms. Trace-driven simulation demonstrates that our proposed hybrid CU/BS caching algorithms outperform the general popularity-based caching algorithm and the independent caching algorithm in terms of average end-to-end service latency and backhaul/fronthaul load reduction ratios.

127 citations


Proceedings ArticleDOI
01 Dec 2018
TL;DR: A Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training, and which outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
Abstract: Content caching is a promising approach in edge computing to cope with the explosive growth of mobile data on 5G networks, where contents are typically placed on local caches for fast and repetitive data access. Due to the capacity limit of caches, it is essential to predict the popularity of files and cache those popular ones. However, the fluctuating popularity of files makes the prediction a highly challenging task. To tackle this challenge, many recent works propose learning based approaches which gather the users' data centrally for training, but they bring a significant issue: users may not trust the central server and thus hesitate to upload their private data. In order to address this issue, we propose a Federated learning based Proactive Content Caching (FPCC) scheme, which does not require gathering users' data centrally for training. The FPCC is based on a hierarchical architecture in which the server aggregates the users' updates using federated averaging, and each user performs training on its local data using hybrid filtering on stacked autoencoders. The experimental results demonstrate that, without gathering users' private data, our scheme still outperforms other learning-based caching algorithms such as m-epsilon-greedy and Thompson sampling in terms of cache efficiency.
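
The server-side aggregation step in FPCC is federated averaging. A minimal sketch of that step, assuming each client uploads a flattened weight vector together with its local sample count (names and shapes are illustrative, not the paper's exact implementation):

```python
import numpy as np

def federated_average(client_weights, client_sample_counts):
    """Server-side FedAvg step: average the clients' model parameters,
    weighting each client by the number of local training samples."""
    total = sum(client_sample_counts)
    stacked = np.stack(client_weights)                    # shape: (clients, params)
    coeffs = np.array(client_sample_counts) / total       # per-client weights
    return coeffs @ stacked                               # weighted average of parameters

# Toy example: three clients with flattened autoencoder weights of length 4.
updates = [np.array([0.1, 0.2, 0.3, 0.4]),
           np.array([0.0, 0.1, 0.2, 0.3]),
           np.array([0.4, 0.4, 0.4, 0.4])]
print(federated_average(updates, client_sample_counts=[100, 50, 50]))
```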

116 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate multi-layer caching where both base station (BS) and users are capable of storing content data in their local cache and analyze the performance of edge-caching wireless networks under two notable uncoded and coded caching strategies.
Abstract: Edge-caching has received much attention as an efficient technique to reduce delivery latency and network congestion during peak-traffic times by bringing data closer to end users. Existing works usually design caching algorithms separately from physical layer design. In this paper, we analyze edge-caching wireless networks by taking into account the caching capability when designing the signal transmission. Particularly, we investigate multi-layer caching where both base station (BS) and users are capable of storing content data in their local cache and analyze the performance of edge-caching wireless networks under two notable uncoded and coded caching strategies. First, we calculate backhaul and access throughputs of the two caching strategies for arbitrary values of cache size. The required backhaul and access throughputs are derived as a function of the BS and user cache sizes. Second, closed-form expressions for the system energy efficiency (EE) corresponding to the two caching methods are derived. Based on the derived formulas, the system EE is maximized via precoding vectors design and optimization while satisfying a predefined user request rate. Third, two optimization problems are proposed to minimize the content delivery time for the two caching strategies. Finally, numerical results are presented to verify the effectiveness of the two caching methods.

72 citations


Journal ArticleDOI
TL;DR: To tackle large scenarios with low complexity, it is proved that the optimal caching placement of one user, given other users’ caching placements, can be derived in polynomial time, and a mobility-aware multi-user algorithm is developed.
Abstract: Caching popular files at the user equipments (UEs) provides an effective way to alleviate the burden of the backhaul networks. Generally, popularity-based caching is not a system-wide optimal strategy, especially for user mobility scenarios. Motivated by this observation, we consider optimal caching in the presence of mobility. A cost-optimal caching problem (COCP) for device-to-device (D2D) networks is modeled, in which the impact of user mobility, cache size, and total number of encoded segments are all taken into account. The hardness of the problem is proved via a reduction from the satisfiability problem. Next, a lower-bounding function of the objective function is derived. By the function, an approximation of COCP (ACOCP) achieving linearization is obtained, which features two advantages. First, the ACOCP approach can use an off-the-shelf integer linear programming algorithm to obtain the global optimal solution, and it can effectively deliver solutions for small-scale and medium-scale system scenarios. Second, and more importantly, based on the ACOCP approach, one can derive a lower bound of the global optimum of COCP, thus enabling performance benchmarking of any sub-optimal algorithm. To tackle large scenarios with low complexity, we first prove that the optimal caching placement of one user, given other users’ caching placements, can be derived in polynomial time. Then, based on this proof, a mobility-aware multi-user algorithm is developed. Simulation results verify the effectiveness of the two approaches by comparing them to the lower bound of the global optimum and conventional caching algorithms.

53 citations


Journal ArticleDOI
TL;DR: A novel hybrid algorithm is proposed, adaptive least recently used, that learns changes in popularity both faster and more accurately, and outperforms all other candidate algorithms when confronted with either a dynamically changing synthetic request process or real-world traces.
Abstract: Typical analysis of content caching algorithms using the metric of steady-state hit probability under a stationary request process does not account for performance loss under a variable request arrival process. In this paper, we instead conceptualize caching algorithms as complexity-limited online distribution learning algorithms and use this vantage point to study their adaptability from two perspectives: 1) the accuracy of learning a fixed popularity distribution and 2) the speed of learning items’ popularity. In order to attain this goal, we compute the distance between the stationary distributions of several popular algorithms and that of a genie-aided algorithm that has knowledge of the true popularity ranking, which we use as a measure of learning accuracy. We then characterize the mixing time of each algorithm, i.e., the time needed to attain the stationary distribution, which we use as a measure of learning efficiency. We combine the two measures to obtain the “learning error,” representing both how quickly and how accurately an algorithm learns the optimal caching distribution, and use this to determine the trade-off between these two objectives for many popular caching algorithms. Informed by the results of our analysis, we propose a novel hybrid algorithm, adaptive least recently used, which learns changes in popularity both faster and more accurately. We show numerically that it also outperforms all other candidate algorithms when confronted with either a dynamically changing synthetic request process or real-world traces.
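
The learning-accuracy measure above is a distance between a policy's stationary distribution and that of the genie-aided algorithm. As a rough illustration only (the paper's exact metric may differ), the total-variation distance between two such distributions can be computed as follows:

```python
import numpy as np

def total_variation(p, q):
    """Total-variation distance between two probability vectors: one natural
    way to score how far a policy's stationary distribution is from the
    genie-aided (true-popularity) one."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# Toy example: genie distribution vs. a policy that under-weights the head item.
genie  = [0.5, 0.3, 0.2]
policy = [0.4, 0.3, 0.3]
print(total_variation(genie, policy))   # 0.1
```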

50 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: It is shown that minimizing the retrieval cost corresponds to solving an online knapsack problem, and new dynamic policies inspired by simulated annealing are proposed, including DynqLRU, a variant of qLRU that significantly outperforms state-of-the-art policies.
Abstract: Cache policies to minimize the content retrieval cost have been studied through competitive analysis when the miss costs are additive and the sequence of content requests is arbitrary. More recently, a cache utility maximization problem has been introduced, where contents have stationary popularities and utilities are strictly concave in the hit rates. This paper bridges the two formulations, considering linear costs and content popularities. We show that minimizing the retrieval cost corresponds to solving an online knapsack problem, and we propose new dynamic policies inspired by simulated annealing, including DynqLRU, a variant of qLRU. We prove that DynqLRU asymptotically converges to the optimum under the characteristic time approximation. In a real scenario, popularities vary over time and their estimation is very difficult. DynqLRU does not require popularity estimation, and our realistic, trace-driven evaluation shows that it significantly outperforms state-of-the-art policies, with up to 45% cost reduction.
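
For context, qLRU admits a missed content only with probability q; DynqLRU then adapts this behavior dynamically as described in the paper. A minimal sketch of the underlying qLRU mechanism (the dynamic adaptation itself is omitted, and parameters are illustrative):

```python
import random
from collections import OrderedDict

class QLRUCache:
    """Sketch of a qLRU cache: on a miss the item is admitted only with
    probability q, evicting the least recently used entry when full."""
    def __init__(self, capacity, q=0.2):
        self.capacity, self.q = capacity, q
        self.items = OrderedDict()            # keys kept in recency order

    def request(self, key):
        if key in self.items:
            self.items.move_to_end(key)       # hit: refresh recency
            return True
        if random.random() < self.q:          # miss: probabilistic admission
            if len(self.items) >= self.capacity:
                self.items.popitem(last=False)
            self.items[key] = True
        return False

cache = QLRUCache(capacity=100, q=0.1)
trace = [random.randint(1, 500) for _ in range(10000)]
print("hit ratio:", sum(cache.request(k) for k in trace) / len(trace))
```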

48 citations


Journal ArticleDOI
TL;DR: A novel cache replacement method named popularity prediction caching (PPC) for chunk-level caches is presented, which discovers the relevance among video chunks in information centric networks from the perspective of user watching behavior.
Abstract: Future networks are envisioned to embed ubiquitous in-network caching. Fine-grained cache behavior reveals an intimate relationship among contents in the same stream: sequenced contents have similar cache behavior. This letter presents a novel cache replacement method named popularity prediction caching (PPC) for chunk-level caches, built by discovering the relevance among video chunks in information centric networks from the perspective of user watching behavior. PPC predicts and caches the chunks that will be most popular in the future and evicts those with the least future popularity, at linear complexity. Simulations on a GEANT model show that PPC outperforms cache policies based on content popularity, least recently used, least frequently used, and first in first out.

47 citations


Journal ArticleDOI
TL;DR: The experiment shows that this algorithm has a better hit rate, byte hit rate, and access latency than state-of-the-art algorithms such as Least Recently Used, Least Frequently Used, and GDSF.
Abstract: Caches are used to improve the performance of the Internet by reducing data access latency and avoiding slow repeated computations. Cache replacement is one of the most important issues in a caching system; therefore, it must be coordinated with the caching system to minimize the access latency and maximize the hit rate or byte hit rate. In this paper, we present a novel cache replacement algorithm named Weighted Greedy Dual Size Frequency (WGDSF), which is an improvement on the Greedy Dual Size Frequency (GDSF) algorithm. The WGDSF algorithm mainly adds weighted frequency-based time and weighted document type to GDSF. By adding these two weighted parameters, WGDSF performs fairly well at keeping popular objects in the cache and replacing rarely used ones. Our experiment shows that this algorithm has a better hit rate, byte hit rate, and access latency than state-of-the-art algorithms such as Least Recently Used, Least Frequently Used, and GDSF.
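
For reference, GDSF assigns each object a priority of the form H = L + freq * cost / size, where L is an inflation value taken from the last evicted object; WGDSF multiplies in the two additional weights described above. Below is a sketch, with illustrative parameters w_t and w_type standing in for the paper's weighted frequency-based time and weighted document type (their exact definitions follow the paper):

```python
class GDSFCache:
    """Sketch of Greedy-Dual-Size-Frequency replacement with two extra
    multiplicative weights in the spirit of WGDSF.
    Priority: H = L + w_t * w_type * freq * cost / size."""
    def __init__(self, capacity_bytes):
        self.capacity, self.used, self.L = capacity_bytes, 0, 0.0
        self.entries = {}   # key -> dict(size, freq, cost, w_t, w_type, priority)

    def _priority(self, e):
        return self.L + e["w_t"] * e["w_type"] * e["freq"] * e["cost"] / e["size"]

    def access(self, key, size, cost=1.0, w_t=1.0, w_type=1.0):
        if key in self.entries:
            e = self.entries[key]
            e["freq"] += 1                                   # hit: bump frequency
        else:
            while self.used + size > self.capacity and self.entries:
                victim = min(self.entries, key=lambda k: self.entries[k]["priority"])
                self.L = self.entries[victim]["priority"]    # inflate the clock
                self.used -= self.entries[victim]["size"]
                del self.entries[victim]
            e = {"size": size, "freq": 1, "cost": cost, "w_t": w_t, "w_type": w_type}
            self.entries[key] = e
            self.used += size
        e["priority"] = self._priority(e)                    # refresh on every access

cache = GDSFCache(capacity_bytes=1000)
cache.access("a.html", size=200, w_type=1.5)   # hypothetical type weight for HTML
cache.access("b.jpg", size=700)
```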

27 citations


Journal ArticleDOI
TL;DR: An efficient Reconfigurable Cache Architecture (ReCA) for storage systems is presented using a comprehensive workload characterization to find an optimal cache configuration for I/O intensive applications.
Abstract: In recent years, Solid-State Drives (SSDs) have gained tremendous attention in computing and storage systems due to their significant performance improvement over Hard Disk Drives (HDDs). The cost per capacity of SSDs, however, prevents them from entirely replacing HDDs in such systems. One approach to effectively take advantage of SSDs is to use them as a caching layer that stores performance-critical data blocks in order to reduce the number of accesses to the HDD-based disk subsystem. Due to characteristics of Flash-based SSDs such as limited write endurance and long latency on write operations, employing caching algorithms at the Operating System (OS) level necessitates taking such characteristics into consideration. Previous OS-level caching techniques are optimized towards only one type of application, which affects both generality and applicability. In addition, they are not adaptive when the workload pattern changes over time. This paper presents an efficient Reconfigurable Cache Architecture (ReCA) for storage systems that uses a comprehensive workload characterization to find an optimal cache configuration for I/O intensive applications. For this purpose, we first investigate various types of I/O workloads and classify them into five major classes. Based on this characterization, an optimal cache configuration is presented for each class of workloads. Then, using the main features of each class, we continuously monitor the characteristics of an application during system runtime and reconfigure the cache organization if the application changes from one class to another class of workloads. The cache reconfiguration is done online, and workload classes can be extended to emerging I/O workloads in order to maintain efficiency with the characteristics of I/O requests. Experimental results obtained by implementing ReCA in a 4U rackmount server with SATA 6Gb/s disk interfaces running Linux 3.17.0 show that the proposed architecture improves performance and lifetime by up to 24 and 33 percent, respectively.

26 citations


Journal ArticleDOI
07 Sep 2018
TL;DR: In this article, the authors provide theoretical justification for the TTL approximation in the case where distinct contents are described by independent stationary and ergodic processes, and show that this approximation is exact as the cache size and the number of contents go to infinity.
Abstract: The modeling and analysis of an LRU cache is extremely challenging as exact results for the main performance metrics (e.g., hit rate) are either lacking or cannot be used because of their high computational complexity for large caches. As a result, various approximations have been proposed. The state-of-the-art method is the so-called TTL approximation, first proposed and shown to be asymptotically exact for IRM requests by Fagin [13]. It has been applied to various other workload models and numerically demonstrated to be accurate but without theoretical justification. In this article, we provide theoretical justification for the approximation in the case where distinct contents are described by independent stationary and ergodic processes. We show that this approximation is exact as the cache size and the number of contents go to infinity. This extends earlier results for the independent reference model. Moreover, we establish results not only for the aggregate cache hit probability but also for every individual content. Last, we obtain bounds on the rate of convergence.
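
For concreteness, under the independent reference model with Poisson request rates, the TTL approximation chooses a characteristic time T such that the expected number of cached contents, sum_i (1 - exp(-lambda_i * T)), equals the cache size, and estimates the hit probability of content i as 1 - exp(-lambda_i * T). A small sketch of that computation (the Zipf popularity and parameters are illustrative):

```python
import math

def ttl_approximation(rates, cache_size):
    """Characteristic-time (TTL) approximation for an LRU cache under IRM with
    Poisson request rates: solve sum_i (1 - exp(-rate_i * T)) = cache_size for T
    by bisection, then report per-content hit probabilities."""
    assert cache_size < len(rates), "cache must be smaller than the catalogue"

    def occupancy(T):
        return sum(1.0 - math.exp(-r * T) for r in rates)

    lo, hi = 0.0, 1.0
    while occupancy(hi) < cache_size:          # bracket the root
        hi *= 2.0
    for _ in range(100):                       # bisection on T
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if occupancy(mid) < cache_size else (lo, mid)
    T = (lo + hi) / 2.0
    return T, [1.0 - math.exp(-r * T) for r in rates]

# Zipf(0.8) popularity over 1000 contents, cache holding 100 of them (illustrative).
N, alpha = 1000, 0.8
weights = [1.0 / (i ** alpha) for i in range(1, N + 1)]
Z = sum(weights)
rates = [w / Z for w in weights]
T, hit_probs = ttl_approximation(rates, cache_size=100)
print("characteristic time:", T)
print("aggregate hit probability:", sum(r * h for r, h in zip(rates, hit_probs)))
```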

Journal ArticleDOI
TL;DR: This work proposes two TTL-based caching algorithms that provide provable performance guarantees for request traffic that is bursty and non-stationary, and evaluates both d-T TL and f-TTL using an extensive trace containing more than 500 million requests from a production CDN server.
Abstract: Content delivery networks (CDNs) cache and serve a majority of the user-requested content on the Internet. Designing caching algorithms that automatically adapt to the heterogeneity, burstiness, and non-stationary nature of real-world content requests is a major challenge and is the focus of our work. While there is much work on caching algorithms for stationary request traffic, the work on non-stationary request traffic is very limited. Consequently, most prior models are inaccurate for non-stationary production CDN traffic. We propose two TTL-based caching algorithms that provide provable performance guarantees for request traffic that is bursty and non-stationary. The first algorithm called d-TTL dynamically adapts a TTL parameter using stochastic approximation. Given a feasible target hit rate, we show that d-TTL converges to its target value for a general class of bursty traffic that allows Markov dependence over time and non-stationary arrivals. The second algorithm called f-TTL uses two caches, each with its own TTL. The first-level cache adaptively filters out non-stationary traffic, while the second-level cache stores frequently-accessed stationary traffic. Given feasible targets for both the hit rate and the expected cache size, f-TTL asymptotically achieves both targets. We evaluate both d-TTL and f-TTL using an extensive trace containing more than 500 million requests from a production CDN server. We show that both d-TTL and f-TTL converge to their hit rate targets with an error of about 1.3%. But, f-TTL requires a significantly smaller cache size than d-TTL to achieve the same hit rate, since it effectively filters out non-stationary content.
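
A minimal sketch of the d-TTL idea: a single TTL is nudged after every request by a stochastic-approximation step so that the long-run hit rate drifts toward the target. The step size, the exact update rule, and the synthetic trace below are illustrative, not the paper's algorithm.

```python
import random

class DynamicTTLCache:
    """Sketch of the d-TTL idea: one TTL shared by all objects, adjusted after
    every request so the long-run hit rate drifts toward a target."""
    def __init__(self, target_hit_rate, step=0.05, ttl=1.0):
        self.target, self.step, self.ttl = target_hit_rate, step, ttl
        self.expiry = {}                              # object -> expiration time

    def request(self, key, now):
        hit = self.expiry.get(key, -1.0) >= now
        # Stochastic approximation: hits push the TTL down, misses push it up;
        # the drift is zero exactly when the hit rate matches the target.
        self.ttl = max(0.0, self.ttl + self.step * (self.target - (1.0 if hit else 0.0)))
        self.expiry[key] = now + self.ttl             # (re)start the object's timer
        return hit

cache, hits, t, n = DynamicTTLCache(target_hit_rate=0.6), 0, 0.0, 50000
for _ in range(n):
    t += random.expovariate(1.0)                      # Poisson request arrivals
    hits += cache.request(random.randint(1, 50), now=t)
print("empirical hit rate:", hits / n, "adapted TTL:", cache.ttl)
```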

Proceedings ArticleDOI
13 Aug 2018
TL;DR: A new cache management policy is proposed and developed that utilizes DAG information to optimize both eviction and prefetching of data to improve cache management, and works best for I/O-intensive workloads.
Abstract: Optimizing memory cache usage is vital for the performance of in-memory data-parallel frameworks such as Spark. Current data-analytic frameworks utilize the popular Least Recently Used (LRU) policy, which does not take advantage of data dependency information available in the application's directed acyclic graph (DAG). Recent research in dependency-aware caching, notably MemTune and Least Reference Count (LRC), has made important improvements to close this gap. But they do not fully leverage the DAG structure, which imparts information such as the time-spatial distribution of data references across the workflow, to further improve cache hit ratio and application runtime. In this paper, we propose and develop a new cache management policy, Most Reference Distance (MRD), that utilizes DAG information to optimize both eviction and prefetching of data to improve cache management. MRD takes into account the relative stage distance of each data block reference in the application workflow, effectively evicting the furthest and least likely data in the cache to be used, while aggressively prefetching the nearest and most likely data that will be needed, and in doing so, better overlapping computation with I/O time. Our experiments with a Spark implementation, using popular benchmarking workloads, show that MRD has low overhead and improves performance by an average of 53% compared to LRU, and by up to 68% and 45% when compared to MemTune and LRC, respectively. It works best for I/O-intensive workloads.
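
A minimal sketch of the MRD decision rule, assuming the DAG has been pre-processed into a map from each data block to the sorted list of stages that will reference it; the data structures and names are illustrative stand-ins for the Spark-internal ones.

```python
import bisect

def mrd_choose(cached_blocks, candidate_blocks, refs_by_block, current_stage, prefetch_k=2):
    """Sketch of the MRD idea: evict the cached block whose next referencing
    stage is furthest in the future, and prefetch the uncached blocks that
    will be referenced soonest."""
    def next_ref(block):
        stages = refs_by_block.get(block, [])
        i = bisect.bisect_right(stages, current_stage)
        return stages[i] if i < len(stages) else float("inf")   # never used again

    victim = max(cached_blocks, key=next_ref) if cached_blocks else None
    prefetch = sorted((b for b in candidate_blocks if b not in cached_blocks),
                      key=next_ref)[:prefetch_k]
    return victim, prefetch

# Hypothetical block -> referencing-stage lists extracted from the DAG.
refs = {"rdd1": [3, 7], "rdd2": [4], "rdd3": [9], "rdd4": [5, 6]}
print(mrd_choose({"rdd1", "rdd2", "rdd3"}, {"rdd3", "rdd4"}, refs, current_stage=3))
# -> evict rdd3 (next used at stage 9), prefetch rdd4 (needed at stage 5)
```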

Journal ArticleDOI
TL;DR: A hit-count based victim-selection procedure is suggested on top of existing low-cost replacement policies to significantly improve the quality of victim selection in last-level caches without commensurate area overhead.
Abstract: Memory-intensive workloads operate on massive amounts of data that cannot be captured by the last-level caches (LLCs) of modern processors. Consequently, processors encounter frequent off-chip misses, and hence, lose significant performance potential. One of the components of a modern processor that has a prominent influence on the off-chip miss traffic is the LLC's replacement policy. Existing processors employ a variation of the least recently used (LRU) policy to determine the victim for replacement. Unfortunately, there is a large gap between what LRU offers and what Belady's MIN, the optimal replacement policy, achieves. Belady's MIN requires selecting the victim with the longest reuse distance, and hence, is infeasible because it requires knowledge of the future. In this work, we observe that there exists a strong correlation between the expected number of hits of a cache block and the reciprocal of its reuse distance. Taking advantage of this observation, we improve the efficiency of last-level caches through a low-cost-yet-effective replacement policy. We suggest a hit-count based victim-selection procedure on top of existing low-cost replacement policies to significantly improve the quality of victim selection in last-level caches without commensurate area overhead. Our proposal offers 12.2 percent performance improvement over the baseline LRU in a multi-core processor and outperforms EVA, which is the state-of-the-art replacement policy.
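
A minimal sketch of the layering described above: the base low-cost policy nominates candidate victims, and per-block hit counters pick among them (block names and the candidate set are illustrative):

```python
def select_victim(candidates, hit_count):
    """Hit-count based victim selection layered on a base policy: the base
    replacement policy nominates candidate ways, and the block with the fewest
    accumulated hits (hence the longest expected reuse distance, per the
    paper's observation) is evicted."""
    return min(candidates, key=lambda block: hit_count.get(block, 0))

# The base policy (e.g., a not-recently-used approximation) nominates three ways;
# the hit counters break the tie in favour of the coldest block.
candidates = ["blk_A", "blk_B", "blk_C"]
hit_count = {"blk_A": 5, "blk_B": 0, "blk_C": 2}
print(select_victim(candidates, hit_count))   # blk_B
```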

Journal ArticleDOI
TL;DR: A novel fingerprint caching mechanism that estimates the temporal locality of duplicates in different data streams and prioritizes the cache allocation based on the estimation is proposed and results show that the proposed mechanism provides significant improvement for both deduplication ratio and overhead reduction.
Abstract: Existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use post-processing deduplication to avoid the negative impact on I/O performance. However, neither of them works well in the cloud servers running multiple services for the following two reasons: First, the temporal locality of duplicate data writes varies among primary storage workloads, which makes it challenging to efficiently allocate the inline cache space and achieve a good deduplication ratio. Second, the post-processing deduplication does not eliminate duplicate I/O operations that write to the same logical block address as it is performed after duplicate blocks have been written. A hybrid deduplication mechanism is promising to deal with these problems. Inline fingerprint caching is essential to achieving efficient hybrid deduplication. In this paper, we present a detailed analysis of the limitations of using existing caching algorithms in primary deduplication in the cloud. We reveal that existing caching algorithms either perform poorly or incur significant memory overhead in fingerprint cache management. To address this, we propose a novel fingerprint caching mechanism that estimates the temporal locality of duplicates in different data streams and prioritizes the cache allocation based on the estimation. We integrate the caching mechanism and build a hybrid deduplication system. Our experimental results show that the proposed mechanism provides significant improvement for both deduplication ratio and overhead reduction.

Journal ArticleDOI
TL;DR: A kinetic model of LRU cache memory, based on the average eviction time (AET) of the cached data, that enables fast measurement and use of low-cost sampling and is a composable model that can characterize shared cache behavior through sampling and modeling individual programs or traces.
Abstract: The reuse distance (least recently used (LRU) stack distance) is an essential metric for performance prediction and optimization of storage cache. Over the past four decades, there have been steady improvements in the algorithmic efficiency of reuse distance measurement. This progress is accelerating in recent years, both in theory and practical implementation. In this article, we present a kinetic model of LRU cache memory, based on the average eviction time (AET) of the cached data. The AET model enables fast measurement and use of low-cost sampling. It can produce the miss ratio curve in linear time with extremely low space costs. On storage trace benchmarks, AET reduces the time and space costs compared to former techniques. Furthermore, AET is a composable model that can characterize shared cache behavior through sampling and modeling individual programs or traces.
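
As a rough sketch of the AET-style construction (assuming P(t) denotes the fraction of accesses whose reuse time exceeds t, with first-time accesses counted as infinite, the average eviction time T(c) for cache size c solving the integral of P(t) from 0 to T(c) equal to c, and miss ratio P(T(c))), a miss ratio curve can be derived from a reuse-time histogram as follows; the discretization and toy trace are illustrative:

```python
from collections import Counter

def miss_ratio_curve(reuse_times, total_accesses, max_cache_size):
    """Sketch of an AET-style miss ratio curve: accumulate P(t) over logical
    time until the integral reaches each cache size c, and report P(T(c))."""
    hist = Counter(reuse_times)            # histogram of finite reuse times
    greater = total_accesses               # accesses with reuse time > t (t = 0)
    mrc, cache_fill, c, t = {}, 0.0, 1, 0
    while c <= max_cache_size and greater > 0:
        p_t = greater / total_accesses     # P(t)
        cache_fill += p_t                  # integrate P over one time unit
        while c <= max_cache_size and cache_fill >= c:
            mrc[c] = p_t                   # miss ratio at cache size c
            c += 1
        t += 1
        greater -= hist.get(t, 0)          # reuse times equal to t leave the tail
    return mrc

# Toy trace a b a c b a: reuse times measured in accesses since the previous use.
print(miss_ratio_curve(reuse_times=[2, 3, 3], total_accesses=6, max_cache_size=3))
```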

Proceedings ArticleDOI
12 Jun 2018
TL;DR: Analysis of the performance of multiple flows of data item requests under resource pooling and separation when the cache size is large shows that it is asymptotically optimal to jointly serve multiple flows if their data item sizes and popularity distributions are similar, and their arrival rates do not differ significantly.
Abstract: Caching systems using the Least Recently Used (LRU) principle have now become ubiquitous. A fundamental question for these systems is whether the cache space should be pooled together or divided to serve multiple flows of data item requests in order to minimize the miss probabilities. In this paper, we show that there is no straight yes or no answer to this question: the answer depends on complex combinations of critical factors, including, e.g., request rates, overlapped data items across different request flows, data item popularities, and their sizes. To this end, we characterize the performance of multiple flows of data item requests under resource pooling and separation when the cache size is large. Analytically, we show that it is asymptotically optimal to jointly serve multiple flows if their data item sizes and popularity distributions are similar and their arrival rates do not differ significantly; the self-organizing property of LRU caching automatically optimizes the resource allocation among them asymptotically. Otherwise, separating these flows could be better, e.g., when data sizes vary significantly. We also quantify critical points beyond which resource pooling is better than separation for each of the flows when the overlapped data items exceed certain levels. These results provide new insights on the performance of caching systems.

Proceedings ArticleDOI
TL;DR: In this paper, the authors defined four important characteristics of a suitable eviction policy for information centric networks (ICN) and proposed a new eviction scheme that is well suited to ICN-type cache networks.
Abstract: Information centric networks (ICN) can be viewed as networks of caches. However, ICN-type cache networks have distinctive features, e.g., content popularity, content usability time, and other factors, that impose diverse requirements on cache eviction policies. In this paper we define four important characteristics of a suitable eviction policy for ICN. We analyse well-known eviction policies in view of the defined characteristics. Based on this analysis, we propose a new eviction scheme that is well suited to ICN-type cache networks.

Journal ArticleDOI
TL;DR: It is proved that, under a weak assumption on the content popularity distribution, choosing smaller chunks improves the performance of the chunk-LRU policy, and it is shown numerically that even for a small number of chunks, the gains of chunk-LRU are almost optimal.

Journal ArticleDOI
TL;DR: This paper gives an analytical method to find the miss rate of L2 cache for various configurations from the RD profile with respect to L1 cache and considers all three types of cache inclusion policies namely (i) Strictly Inclusive, (ii) Mutually Exclusive and (iii) Non-Inclusive Non-Exclusive.
Abstract: Reuse distance is an important metric for analytical estimation of cache miss rate. To find the miss rate of a particular cache, the reuse distance profile has to be measured for that particular level and configuration of the cache. A significant amount of simulation time and overhead can be saved if we can find the miss rate of a higher-level cache like L2 from the RD profile with respect to a lower-level cache (i.e., a cache that is closer to the processor) such as L1. The objective of this paper is to give an analytical method to find the miss rate of the L2 cache for various configurations from the RD profile with respect to the L1 cache. We consider all three types of cache inclusion policies, namely (i) Strictly Inclusive, (ii) Mutually Exclusive, and (iii) Non-Inclusive Non-Exclusive. We first prove some general results relating the RD profile of the L1 cache to that of the L2 cache. We use probabilistic analysis for our derivations. We validate our model against simulations, using the multi-core simulator Sniper with the PARSEC and SPLASH benchmark suites.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work presents a comprehensive study on internet-scale photo caching algorithms in the case of QQPhoto from Tencent Inc., the largest social network service company in China, and proposes to incorporate a prefetcher in the cache stack based on the observed immediacy feature that is unique to the QQ photo workload.
Abstract: Photo service providers are facing critical challenges of dealing with the huge amount of photo storage, typically in a magnitude of billions of photos, while ensuring nation-wide or world-wide satisfactory user experiences. Distributed photo caching architecture is widely deployed to meet high performance expectations, where efficient yet still mysterious caching policies play essential roles. In this work, we present a comprehensive study on internet-scale photo caching algorithms in the case of QQPhoto from Tencent Inc., the largest social network service company in China. We unveil that even advanced cache algorithms can only perform at a similar level as simple baseline algorithms, and that there still exists a large performance gap between these cache algorithms and the theoretically optimal algorithm due to the complicated access behaviors in such a large multi-tenant environment. We then expound the reasons behind that phenomenon by extensively investigating the characteristics of QQPhoto workloads. Finally, in order to realistically further improve QQPhoto cache efficiency, we propose to incorporate a prefetcher in the cache stack based on the observed immediacy feature that is unique to the QQPhoto workload. Evaluation results show that with appropriate prefetching we improve the cache hit ratio by up to 74%, while reducing the average access latency by 69% at a marginal cost of 414% backend network traffic compared to the original system that performs no prefetching.

Journal ArticleDOI
TL;DR: This paper proposes a scalable pipeline of components built on top of the Spark engine for large-scale data processing, whose goal is collecting from different sites the dataset access logs, organizing them into weekly snapshots, and training, on these snapshots, predictive models able to forecast which datasets will become popular over time.
Abstract: The Compact Muon Solenoid (CMS) experiment at the European Organization for Nuclear Research (CERN) deploys its data collection, simulation, and analysis activities on a distributed computing infrastructure involving more than 70 sites worldwide. The historical usage data recorded by this large infrastructure is a rich source of information for system tuning and capacity planning. In this paper we investigate how to leverage machine learning on this huge amount of data in order to discover patterns and correlations useful to enhance the overall efficiency of the distributed infrastructure in terms of CPU utilization and task completion time. In particular, we propose a scalable pipeline of components built on top of the Spark engine for large-scale data processing, whose goal is collecting from different sites the dataset access logs, organizing them into weekly snapshots, and training, on these snapshots, predictive models able to forecast which datasets will become popular over time. The high accuracy achieved indicates the ability of the learned model to correctly separate popular datasets from unpopular ones. Dataset popularity predictions are then exploited within a novel data caching policy called PPC (Popularity Prediction Caching). We evaluate the performance of PPC against popular caching policy baselines like LRU (Least Recently Used). The experiments conducted on large traces of real dataset accesses show that PPC outperforms LRU, reducing the number of cache misses by up to 20% in some sites.
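
One plausible way to plug such popularity predictions into an eviction decision (not necessarily the exact PPC policy) is to prefer evicting datasets that the model predicts to be unpopular, falling back to plain LRU otherwise:

```python
from collections import OrderedDict

class PopularityPredictionCache:
    """Sketch of prediction-assisted eviction: datasets the model predicts to
    be unpopular are preferred victims; ties are broken by recency."""
    def __init__(self, capacity, predict_popular):
        self.capacity = capacity
        self.predict_popular = predict_popular   # dataset -> bool, from the trained model
        self.items = OrderedDict()               # recency order

    def access(self, dataset):
        hit = dataset in self.items
        if hit:
            self.items.move_to_end(dataset)
        else:
            if len(self.items) >= self.capacity:
                # Prefer a predicted-unpopular victim; otherwise evict the LRU dataset.
                victim = next((d for d in self.items if not self.predict_popular(d)),
                              next(iter(self.items)))
                del self.items[victim]
            self.items[dataset] = True
        return hit

cache = PopularityPredictionCache(capacity=2, predict_popular=lambda d: d.startswith("hot"))
for d in ["hot_A", "cold_B", "hot_C", "hot_A"]:
    cache.access(d)
print(list(cache.items))   # cold_B was evicted first
```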

Journal ArticleDOI
TL;DR: A more effective static probabilistic timing analysis (SPTA) for multi-path programs is introduced that substantially outperforms the only prior approach to SPTA, and is efficient at capturing locality in the cache.
Abstract: Probabilistic hard real-time systems, based on hardware architectures that use a random replacement cache, provide a potential means of reducing the hardware over-provision required to accommodate pathological scenarios and the associated extremely rare, but excessively long, worst-case execution times that can occur in deterministic systems. Timing analysis for probabilistic hard real-time systems requires the provision of probabilistic worst-case execution time (pWCET) estimates. The pWCET distribution can be described as an exceedance function which gives an upper bound on the probability that the execution time of a task will exceed any given execution time budget on any particular run. This paper introduces a more effective static probabilistic timing analysis (SPTA) for multi-path programs. The analysis estimates the temporal contribution of an evict-on-miss, random replacement cache to the pWCET distribution of multi-path programs. The analysis uses a conservative join function that provides a proper over-approximation of the possible cache contents and the pWCET distribution on path convergence, irrespective of the actual path followed during execution. Simple program transformations are introduced that reduce the impact of path indeterminism while ensuring sound pWCET estimates. Evaluation shows that the proposed method is efficient at capturing locality in the cache, and substantially outperforms the only prior approach to SPTA for multi-path programs based on path merging. The evaluation results show incomparability with analysis for an equivalent deterministic system using an LRU cache. For some benchmarks the performance of LRU is better, while for others, the new analysis techniques show that random replacement has provably better performance.

Journal ArticleDOI
TL;DR: The results show that the one-side layout achieves the best performance and the lowest power consumption with the considered hw–sw optimizations, and that software-based profile-driven optimization allows the system to achieve the lowest usage of network resources.

Proceedings ArticleDOI
16 Apr 2018
TL;DR: In this article, the authors derive the asymptotic miss ratio of data item requests on a LRU cluster with consistent hashing and show that these individual cache spaces on different servers can be effectively viewed as if they could be pooled together to form a single virtual LRU cache space parametrized by an appropriate cache size.
Abstract: To efficiently scale data caching infrastructure to support emerging big data applications, many caching systems rely on consistent hashing to group a large number of servers to form a cooperative cluster. These servers are organized together according to a random hash function. They jointly provide a unified but distributed hash table to serve swift and voluminous data item requests. Different from the single least-recently-used (LRU) server that has already been extensively studied, theoretically characterizing a cluster that consists of multiple LRU servers remains yet to be explored. These servers are not simply added together; the random hashing complicates the behavior. To this end, we derive the asymptotic miss ratio of data item requests on a LRU cluster with consistent hashing. We show that these individual cache spaces on different servers can be effectively viewed as if they could be pooled together to form a single virtual LRU cache space parametrized by an appropriate cache size. This equivalence can be established rigorously under the condition that the cache sizes of the individual servers are large. For typical data caching systems this condition is common. Our theoretical framework provides a convenient abstraction that can directly apply the results from the simpler single LRU cache to the more complex LRU cluster with consistent hashing.
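
A minimal sketch of the system being modeled: keys are routed by consistent hashing over a ring of virtual nodes, and each server runs an independent LRU cache. The hash function, virtual-node count, and trace are illustrative choices.

```python
import bisect
import hashlib
from collections import OrderedDict

class LRUServer:
    """A single cache server running plain LRU; get() reports hit or miss."""
    def __init__(self, capacity):
        self.capacity, self.items = capacity, OrderedDict()

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)          # hit: refresh recency
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)       # evict least recently used
        self.items[key] = True
        return False

class ConsistentHashLRUCluster:
    """Keys are routed to the server owning the first ring point clockwise
    from the key's hash; each server is an independent LRU cache."""
    def __init__(self, n_servers, capacity_per_server, vnodes=64):
        self.servers = [LRUServer(capacity_per_server) for _ in range(n_servers)]
        self.ring = sorted((self._hash(f"{s}-{v}"), s)
                           for s in range(n_servers) for v in range(vnodes))

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get(self, key):
        i = bisect.bisect(self.ring, (self._hash(key), -1)) % len(self.ring)
        return self.servers[self.ring[i][1]].get(key)

cluster = ConsistentHashLRUCluster(n_servers=4, capacity_per_server=100)
requests = [f"item{k % 300}" for k in range(10000)]
print("cluster hit ratio:", sum(cluster.get(k) for k in requests) / len(requests))
```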

Posted Content
TL;DR: In this paper, the authors proposed an adaptation to the LRU strategy, called gLRU, where the file is sub-divided into equal-sized chunks, a chunk of the newly requested file is added to the cache, and a chunk of the least-recently-used file is removed from the cache.
Abstract: Caching plays a crucial role in networking systems to reduce the load on the network and is commonly employed by content delivery networks (CDNs) in order to improve performance. One of the commonly used mechanisms, Least Recently Used (LRU), works well for identical file sizes. However, for asymmetric file sizes, the performance deteriorates. This paper proposes an adaptation to the LRU strategy, called gLRU, where the file is sub-divided into equal-sized chunks. In this strategy, a chunk of the newly requested file is added to the cache, and a chunk of the least-recently-used file is removed from the cache. Even though approximate analysis of the hit rate has been studied for LRU, the analysis does not extend to gLRU since the metric of interest is no longer the hit rate, as the cache may hold partial files. This paper provides a novel approximation analysis for this policy where the cache may have partial file contents. The approximation approach is validated by simulations. Further, gLRU outperforms the LRU strategy for a Zipf file popularity distribution and a censored Pareto file size distribution in terms of file download times. Video streaming applications can further use the partial cache contents to reduce stall durations significantly, and the numerical results indicate significant improvements (32%) in stall duration using the gLRU strategy as compared to the LRU strategy. Furthermore, the gLRU replacement policy compares favorably to two other cache replacement policies when simulated on MSR Cambridge traces obtained from the SNIA IOTTA repository.
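
A minimal sketch of the gLRU mechanism as described above: each request admits one more chunk of the requested file and, when the cache is full, drops one chunk of the least-recently-used file (capacity, chunk counts, and the trace are illustrative):

```python
from collections import OrderedDict

class GLRUCache:
    """Sketch of gLRU: files are split into equal-sized chunks; each request
    admits one more chunk of the requested file and, when the cache is full,
    evicts one chunk of the least-recently-used file."""
    def __init__(self, capacity_chunks):
        self.capacity = capacity_chunks
        self.chunks = OrderedDict()                   # file -> cached chunk count, recency order

    def request(self, file_id, total_chunks):
        held = self.chunks.get(file_id, 0)            # chunks served from cache for this request
        if held:
            self.chunks.move_to_end(file_id)          # refresh recency
        if held < total_chunks:                       # the file's cached share can still grow
            if sum(self.chunks.values()) >= self.capacity:
                lru_file = next(iter(self.chunks))    # least recently used file
                if self.chunks[lru_file] <= 1:
                    del self.chunks[lru_file]
                else:
                    self.chunks[lru_file] -= 1        # evict a single chunk, not the whole file
            self.chunks[file_id] = self.chunks.get(file_id, 0) + 1
            self.chunks.move_to_end(file_id)
        return held / total_chunks                    # fraction of the file hit in cache

cache = GLRUCache(capacity_chunks=5)
for f in ["A", "B", "A", "C", "A", "A"]:
    cache.request(f, total_chunks=4)
print(dict(cache.chunks))                             # per-file cached chunk counts
```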

Proceedings ArticleDOI
27 Apr 2018
TL;DR: Simulation results demonstrate that the proposed COCA algorithm has a better delay performance than the existing offline algorithms.
Abstract: In this paper, we study delay-aware cooperative online content caching with limited caching space and unknown content popularity in dense small cell wireless networks. We propose a Cooperative Online Content cAching algorithm (COCA) that decides in which BS the requested content should be cached by considering three important factors: the residual cache space in each small cell base station (SBS), the number of coordinated connections each SBS establishes with other SBSs, and the number of served users in the coverage area of each SBS. In addition, due to the limited storage space in the cache, the proposed COCA algorithm evicts the least recently used (LRU) contents to free up space. We compare the delay performance of the proposed COCA algorithm with existing offline cooperative caching schemes through simulations. Simulation results demonstrate that the proposed COCA algorithm has better delay performance than the existing offline algorithms.

Journal ArticleDOI
TL;DR: An analytical performance evaluation of LRU caches is presented that takes into account data requests and invalidation events, both modeled as independent renewal processes, and it is concluded that the presence of invalidation events does not severely impact LRU performance in single caches.

Proceedings ArticleDOI
07 Aug 2018
TL;DR: This work proposes two novel caching strategies that mine user/group interests to improve caching performance at network edge and demonstrates that the proposed caching algorithms outperform the existing caching algorithms and approach the caching performance upper bound in the large cache size regime.
Abstract: Content caching at the network edge is a promising solution for serving emerging high-throughput, low-delay applications, such as virtual reality, augmented reality, and Internet-of-Things. Traditional caching algorithms need to adapt to the edge networking environment since old traffic assumptions may no longer hold. Meanwhile, user/group content interest, as a new important element, should be considered to improve caching performance. In this work, we propose two novel caching strategies that mine user/group interests to improve caching performance at the network edge. The static user-group interest patterns are handled by the Matrix Factorization method, and the temporal content request patterns are handled by the Least-Recently-Used or Nearest-Neighbor algorithms. Through empirical experiments with large-scale real IPTV user traces, we demonstrate that the proposed caching algorithms outperform the existing caching algorithms and approach the caching performance upper bound in the large cache size regime. By leveraging offline computation, we can limit the online computation cost and achieve good caching performance in real time.

Posted Content
TL;DR: This paper develops an analysis based on abstract interpretation that comes close to the efficiency of the classical approach while achieving the same exact classification of all memory accesses as the model-checking approach, and shows that LRU cache analysis problems are in general NP-complete.
Abstract: For applications in worst-case execution time analysis and in security, it is desirable to statically classify memory accesses into those that result in cache hits and those that result in cache misses. Among cache replacement policies, the least recently used (LRU) policy has been studied the most and is considered to be the most predictable. The state of the art in LRU cache analysis presents a tradeoff between precision and analysis efficiency: the classical approach to analyzing programs running on LRU caches, an abstract interpretation based on a range abstraction, is very fast but can be imprecise. An exact analysis was recently presented, but, as a last resort, it calls a model checker, which is expensive. In this paper, we develop an analysis based on abstract interpretation that comes close to the efficiency of the classical approach while achieving the same exact classification of all memory accesses as the model-checking approach. Compared with the model-checking approach, we observe speedups of several orders of magnitude. As a secondary contribution, we show that LRU cache analysis problems are in general NP-complete.