
Showing papers on "Smart Cache published in 1997"


Proceedings ArticleDOI
01 Dec 1997
TL;DR: This work proposes to trade performance for power consumption by filtering cache references through an unusually small L1 cache; experimental results across a wide range of embedded applications show that the filter cache improves memory system energy efficiency.
Abstract: Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind the filter cache and serves to reduce the performance loss. Experimental results across a wide range of embedded applications show that the filter cache results in improved memory system energy efficiency. For example, a direct mapped 256-byte filter cache achieves a 58% power reduction while reducing performance by 21%, corresponding to a 51% reduction in the energy-delay product over conventional design.
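
As a rough illustration of the filter-cache idea (not the paper's simulator), the sketch below puts a tiny direct-mapped cache in front of a larger conventional cache and charges a made-up energy cost at each level actually touched; the cache sizes, energy numbers, and address stream are illustrative assumptions only.

```python
LINE = 16  # bytes per cache line

class DirectMapped:
    """Minimal direct-mapped cache model: tags only, no data."""
    def __init__(self, size_bytes):
        self.nlines = size_bytes // LINE
        self.tags = [None] * self.nlines
    def access(self, addr):
        line = addr // LINE
        idx, tag = line % self.nlines, line // self.nlines
        hit = (self.tags[idx] == tag)
        self.tags[idx] = tag          # fill on miss (and harmlessly on hit)
        return hit

def energy(stream, use_filter, e_small=0.2, e_big=1.0, e_mem=20.0):
    """Charge a per-access energy at each cache level actually touched."""
    small = DirectMapped(256) if use_filter else None   # 256-byte filter cache
    big = DirectMapped(8 * 1024)                        # conventional 8 KB cache
    total = 0.0
    for a in stream:
        if small is not None:
            total += e_small
            if small.access(a):
                continue              # filter hit: the larger cache is not touched
        total += e_big
        if not big.access(a):
            total += e_mem            # miss everywhere: pay the memory cost
    return total

# A loop that keeps re-touching a small working set -- the case a filter cache targets.
stream = [(i % 64) * 4 for i in range(100_000)]
print("filter + 8 KB cache:", energy(stream, use_filter=True))
print("8 KB cache only    :", energy(stream, use_filter=False))
```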

544 citations



Proceedings ArticleDOI
09 Jun 1997
TL;DR: An OS-controlled, application-transparent cache-partitioning technique whose partitions can be transparently assigned to tasks for their exclusive use; a filter algorithm, a matrix-multiplication algorithm, and their interaction are analysed with regard to cache-induced worst-case penalties.
Abstract: Cache-partitioning techniques have been invented to make modern processors with an extensive cache structure useful in real-time systems where task switches disrupt cache working sets and hence make execution times unpredictable. This paper describes an OS-controlled application-transparent cache-partitioning technique. The resulting partitions can be transparently assigned to tasks for their exclusive use. The major drawbacks found in other cache-partitioning techniques, namely waste of memory and additions on the critical performance path within CPUs, are avoided using memory coloring techniques that do not require changes within the chips of modern CPUs or on the critical path for performance. A simple filter algorithm commonly used in real-time systems, a matrix-multiplication algorithm and the interaction of both are analysed with regard to cache-induced worst case penalties. Worst-case penalties are determined for different widely-used cache architectures. Some insights regarding the impact of cache architectures on worst-case execution are described.
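
The memory-coloring mechanism can be sketched as follows: the color of a physical page is the slice of cache sets it maps to, and the OS hands each task only pages whose colors belong to that task's partition. The page size, cache geometry, and toy allocator below are simplified assumptions, not this paper's implementation.

```python
PAGE = 4096                              # page size in bytes
CACHE, LINE, WAYS = 64 * 1024, 32, 1     # a physically indexed, direct-mapped cache

SETS = CACHE // (LINE * WAYS)
COLORS = (SETS * LINE) // PAGE           # how many distinct page colors the cache has

def color(page_number):
    """Cache color of a physical page: which slice of the cache sets it maps to."""
    return page_number % COLORS

def allocate(free_pages, task_colors):
    """Hand a task only physical pages whose color lies in its private partition."""
    for p in free_pages:
        if color(p) in task_colors:
            free_pages.remove(p)
            return p
    raise MemoryError("no free page of a suitable color")

free = list(range(1000, 1200))           # made-up physical page frame numbers
task_a = {0, 1, 2, 3}                    # colors reserved for task A
task_b = set(range(4, COLORS))           # the remaining colors for task B
print("colors:", COLORS)
print("page for A:", allocate(free, task_a), " page for B:", allocate(free, task_b))
```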

224 citations


Proceedings ArticleDOI
11 Jul 1997
TL;DR: In this article, the authors describe methods for generating and solving cache miss equations that give a detailed representation of the cache misses in loop-oriented scientific code, which can be used to guide code optimizations for improving cache performance.
Abstract: With the widening performance gap between processors and main memory, efficient memory accessing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techniques are often used to transform codes to improve memory performance. Effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This paper describes methods for generating and solving Cache Miss Equations that give a detailed representation of the cache misses in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. Mathematical techniques for manipulating Diophantine equations allow us to compute the number of possible solutions, where each solution corresponds to a potential cache miss. These equations provide a general framework to guide code optimizations for improving cache performance. The paper gives examples of their use to determine array padding and offset amounts that minimize cache misses, and also to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that is more precise than traditional memory behavior heuristics, and is also potentially faster than simulation.
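
The flavour of the analysis can be conveyed with a much cruder sketch than the paper's Diophantine machinery: for a direct-mapped cache, A[i] and B[i] conflict when they map to the same set, and one can count, for each candidate padding, how many iterations satisfy that condition. The loop, cache geometry, and brute-force enumeration below are assumptions of this sketch; the paper solves the corresponding equations symbolically.

```python
LINE, SETS = 32, 256          # direct-mapped cache: 256 sets of 32-byte lines (8 KB)
ELEM = 8                      # 8-byte array elements
N = 4096                      # loop trip count

def cache_set(addr):
    return (addr // LINE) % SETS

def conflicting_iterations(pad_elems):
    """Count iterations i of 'for i: s += A[i] * B[i]' where A[i] and B[i]
    fall in the same cache set -- the condition the paper's equations encode."""
    base_a = 0
    base_b = N * ELEM + pad_elems * ELEM   # B laid out right after A, plus padding
    return sum(1 for i in range(N)
               if cache_set(base_a + i * ELEM) == cache_set(base_b + i * ELEM))

best = min(range(64), key=conflicting_iterations)   # try paddings of 0..63 elements
print("no padding  :", conflicting_iterations(0), "conflicting iterations")
print("best padding:", best, "elements ->", conflicting_iterations(best), "conflicts")
```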

205 citations


Proceedings Article
Arun Iyengar1, Jim Challenger1
08 Dec 1997
TL;DR: The DynamicWeb cache is analyzed; it achieves near-optimal performance on systems which invoke server programs via CGI, and near-optimal performance for many cases and 58% of optimal performance in the worst case on a system using ICAPI.
Abstract: Dynamic Web pages can seriously reduce the performance of Web servers. One technique for improving performance is to cache dynamic Web pages. We have developed the DynamicWeb cache, which is particularly well-suited for dynamic pages. Our cache has improved performance significantly at several commercial Web sites. This paper analyzes the design and performance of the DynamicWeb cache. It also presents a model for analyzing overall system performance in the presence of caching. Our cache can satisfy several hundred requests per second. On systems which invoke server programs via CGI, the DynamicWeb cache results in near-optimal performance, where optimal performance is that which would be achieved by a hypothetical cache which consumed no CPU cycles. On a system we tested which invoked server programs via ICAPI, which has significantly less overhead than CGI, the DynamicWeb cache resulted in near-optimal performance for many cases and 58% of optimal performance in the worst case. The DynamicWeb cache achieved a hit rate of around 80% when it was deployed to support the official Internet Web site for the 1996 Atlanta Olympic games.
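
The overall-performance model reduces to a weighted per-request cost: with hit ratio h, CPU cost is roughly h times the hit cost plus (1 - h) times the miss cost, and throughput is its reciprocal. The numbers below are invented to show the shape of the model, not measurements from the paper.

```python
def throughput(hit_ratio, cost_hit_ms, cost_miss_ms):
    """Requests per second for one server CPU, from per-request CPU costs."""
    cost_ms = hit_ratio * cost_hit_ms + (1 - hit_ratio) * cost_miss_ms
    return 1000.0 / cost_ms

# Illustrative costs: serving a cached copy vs. regenerating a dynamic page.
for h in (0.0, 0.5, 0.8, 0.95):
    print(f"hit ratio {h:.2f}: {throughput(h, 2.0, 40.0):6.1f} requests/s")
```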

201 citations


Proceedings ArticleDOI
05 Jan 1997
TL;DR: In this article, the effect that caches have on the performance of sorting algorithms is investigated both experimentally and analytically; restructuring mergesort, quicksort, and heapsort to improve cache locality reduces total execution time, while radix sort's relatively poor cache performance makes it slower overall than the efficient comparison-based sorting algorithms.
Abstract: We investigate the effect that caches have on the performance of sorting algorithms both experimentally and analytically. To address the performance problems that high cache miss penalties introduce we restructure mergesort, quicksort, and heapsort in order to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.
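
One way to see the restructuring idea is a "tiled" mergesort: sort runs that fit in the cache first, then merge the sorted runs, so most comparisons touch data that is already resident. The tile size and use of library routines below are conveniences of this sketch, not the exact algorithms evaluated in the paper.

```python
import heapq
import random

def tiled_mergesort(data, tile_elems=8192):
    """Sort cache-sized tiles independently, then k-way merge the sorted runs.

    Sorting a tile touches a block small enough to stay cache-resident, so the
    scattered, miss-prone accesses are confined to the final merge pass.
    """
    runs = [sorted(data[i:i + tile_elems])
            for i in range(0, len(data), tile_elems)]
    return list(heapq.merge(*runs))

data = [random.randrange(10**9) for _ in range(100_000)]
assert tiled_mergesort(data) == sorted(data)
print("sorted", len(data), "keys using", -(-len(data) // 8192), "cache-sized runs")
```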

200 citations


Patent
28 May 1997
TL;DR: In this article, the cache memory is divided into segments to store multiple streams of data and the number of segments may be continuously adapted according to the types of access to maximize performance by maintaining a segment for each sequential stream of data.
Abstract: A magnetic disk drive with a caching system includes an intelligent interface to communicate with a host, a magnetic disk and a cache memory to buffer data transferred to and from the host. The caching system maximizes drive performance based on past access history. The caching system alters execution of commands by coalescing commands or executing internal commands in parallel. The caching system anticipates data requests by using a prefetch to store data that may be requested. The caching system divides the cache memory into segments to store multiple streams of data. The number of segments may be continuously adapted according to the types of access to maximize performance by maintaining a segment for each sequential stream of data. The caching system uses a dynamic priority list to determine segments to maintain and discard. Each segment is monitored to determine access types such as sequential, random, and repeating. The access type determines the amount of data to prefetch and to save, including a minimum and maximum prefetch. The caching system may prescan the cache memory during prefetch to alter the prefetch amount in response to a command request. The caching system may wait for a cache memory access that has not yet occurred. An initiator changes the caching parameters though a mode page.

184 citations


Proceedings ArticleDOI
11 Jul 1997
TL;DR: It is shown that for an 8 Kbyte data cache, XOR-mapping schemes approximately halve the miss ratio for two-way associative and column-associative organizations, and XOR-mapping schemes provide a very significant reduction in the miss ratio for the other cache organizations, including the direct-mapped cache.
Abstract: This paper makes the case for the use of XOR-based placement functions for cache memories. It shows that these XOR-mapping schemes can eliminate many conflict misses for direct-mapped and victim caches and practically all of them for (pseudo) two-way associative organizations. The paper evaluates the performance of XOR-mapping schemes for a number of different cache organizations: direct-mapped, set-associative, victim, hash-rehash, column-associative and skewed-associative. It also proposes novel replacement policies for some of these cache organizations. In particular, it presents a low-cost implementation of a pure LRU replacement policy which demonstrates a significant improvement over the pseudo-LRU replacement previously proposed. The paper shows that for an 8 Kbyte data cache, XOR-mapping schemes approximately halve the miss ratio for two-way associative and column-associative organizations. Skewed-associative caches, which already make use of XOR-mapping functions, can benefit from the LRU replacement and also from the use of more sophisticated mapping functions. For two-way associative, column-associative and two-way skewed-associative organizations, XOR-mapping schemes achieve a miss ratio that is not higher than 1.10 times that of a fully-associative cache. XOR-mapping schemes also provide a very significant reduction in the miss ratio for the other cache organizations, including the direct-mapped cache. Ultimately, the conclusion of this study is that XOR-based placement functions unequivocally provide highly significant performance benefits to most cache organizations.
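
The core trick is easy to state: instead of taking the set index directly from the low-order line-address bits, XOR them with some tag bits, which breaks up the regular strides that cause conflict misses under conventional indexing. The cache geometry and power-of-two-strided stream below are illustrative assumptions.

```python
LINE, SETS = 32, 128               # a small 4 KB direct-mapped cache

def classic_index(line):
    return line % SETS

def xor_index(line):
    return (line ^ (line // SETS)) % SETS     # fold some tag bits into the index

def misses(stream, index_fn):
    tags = [None] * SETS
    n = 0
    for addr in stream:
        line = addr // LINE
        idx, tag = index_fn(line), line // SETS
        # (idx, tag) still identifies the line uniquely, so hit detection is exact.
        if tags[idx] != tag:
            n += 1
            tags[idx] = tag
    return n

# A power-of-two stride maps every reference onto one set under classic indexing,
# but the XOR-based placement spreads the same stream across the whole cache.
stream = [i * 4096 for i in range(64)] * 100
print("classic placement misses:", misses(stream, classic_index))
print("XOR placement misses    :", misses(stream, xor_index))
```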

152 citations


Patent
31 Jul 1997
TL;DR: In this article, a cache-extension disk region is used to expand the size of the log structured cache by partitioning the cache memory region into write cache segments and redundancy data (parity) cache segments.
Abstract: Method and apparatus for accelerating write operations by logging write requests in a log structured cache and by expanding the log structured cache using a cache-extension disk region. The log structured cache includes a cache memory region partitioned into one or more write cache segments and one or more redundancy-data (parity) cache segments. The cache-extension disk region is a portion of a disk array separate from a main disk region. The cache-extension disk region is also partitioned into segments and is used to extend the size of the log structured cache. The main disk region is instead managed in accordance with storage management techniques (e.g., RAID storage management). The write cache segment is partitioned into multiple write cache segments so that when one is full another can be used to handle new write requests. When one of these multiple write cache segments is filled, it is moved to the cache-extension disk region, thereby freeing the write cache segment for reuse. The redundancy-data (parity) cache segment holds redundancy data for recent write requests, thereby assuring integrity of the logged write request data in the log structured cache.

150 citations


Proceedings ArticleDOI
29 Dec 1997
TL;DR: This paper presents a resource-based caching (RBC) algorithm that manages the heterogeneous requirements of multiple data types; extensive simulations show that RBC outperforms other known caching algorithms.
Abstract: The WWW employs a hierarchical data dissemination architecture in which hyper-media objects stored at a remote server are served to clients across the Internet, and cached on disks at intermediate proxy servers. One of the objectives of web caching algorithms is to maximize the data transferred from the proxy servers or cache hierarchies. Current web caching algorithms are designed only for text and image data. Recent studies predict that within the next five years more than half the objects stored at web servers will contain continuous media data. To support these trends, the next generation proxy cache algorithms will need to handle multiple data types, each with different cache resource usage, for a cache limited by both bandwidth and space. In this paper, we present a resource-based caching (RBC) algorithm that manages the heterogeneous requirements of multiple data types. The RBC algorithm (1) characterizes each object by its resource requirement and a caching gain, (2) dynamically selects the granularity of the entity to be cached that minimally uses the limited cache resource (i.e., bandwidth or space), and (3) if required, replaces the cached entities based on their cache resource usage and caching gain. We have performed extensive simulations to evaluate our caching algorithm and present simulation results that show that RBC outperforms other known caching algorithms.
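
A much-reduced sketch of the resource-based idea: each candidate carries a caching gain and a usage of the two constrained resources (space and bandwidth), and admission and replacement compare gain against the scarcer resource it consumes. The scoring rule and data below are placeholders, not the paper's exact algorithm.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    space: float        # MB of cache disk space occupied
    bandwidth: float    # MB/s of cache disk bandwidth consumed while served
    gain: float         # expected transfer saved per unit time

def score(e, space_cap, bw_cap):
    """Caching gain per unit of the resource that is scarcer for this entity."""
    space_cost = e.space / space_cap
    bw_cost = e.bandwidth / bw_cap
    return e.gain / max(space_cost, bw_cost)

def admit(cache, candidate, space_cap, bw_cap):
    """Add candidate, evicting lowest-scoring entities if resources run out."""
    cache = sorted(cache, key=lambda e: score(e, space_cap, bw_cap))
    used_space = sum(e.space for e in cache)
    used_bw = sum(e.bandwidth for e in cache)
    while cache and (used_space + candidate.space > space_cap
                     or used_bw + candidate.bandwidth > bw_cap):
        victim = cache.pop(0)
        used_space -= victim.space
        used_bw -= victim.bandwidth
    cache.append(candidate)
    return cache

cache = [Entity("text", 0.1, 0.01, 2.0), Entity("image", 1.0, 0.1, 3.0)]
video_clip = Entity("video-prefix", 50.0, 1.5, 40.0)   # cache only a prefix of the stream
print([e.name for e in admit(cache, video_clip, space_cap=60.0, bw_cap=1.6)])
```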

139 citations


Proceedings ArticleDOI
01 May 1997
TL;DR: This paper presents a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache and produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph.
Abstract: As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this paper we present a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache. Our algorithm produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph. We use cache line coloring to guide the procedure mapping, indicating which cache lines to avoid when placing a procedure in the program layout. Our algorithm reduces on average the instruction cache miss rate by 40% over the original mapping and by 17% over the mapping algorithm of Pettis and Hansen [12].
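
A compressed sketch of the coloring idea: each procedure occupies a run of cache lines determined by where it is placed in the layout, and a greedy pass slides each procedure to a start whose lines overlap least with the procedures it calls most often. The greedy order, cost function, and toy call graph below are simplifications of the published algorithm.

```python
CACHE_LINES = 64    # instruction-cache lines; procedure sizes below are in lines

def lines_of(start, size):
    """Cache lines a procedure occupies when its layout begins at line 'start'."""
    return {(start + i) % CACHE_LINES for i in range(size)}

def place(procs, calls):
    """Greedy coloring: procs = {name: size}, calls = {(caller, callee): freq}."""
    layout, cursor = {}, 0
    # Handle the procedures on the hottest call edges first.
    order = sorted(procs, key=lambda p: -sum(f for (a, b), f in calls.items()
                                             if p in (a, b)))
    for name in order:
        size = procs[name]

        def overlap_cost(start):
            mine = lines_of(start, size)
            return sum(f for (a, b), f in calls.items() if name in (a, b)
                       for other in (a, b)
                       if other != name and other in layout
                       and mine & lines_of(*layout[other]))

        # Slide the procedure forward until its lines avoid its hot partners.
        start = min(range(cursor, cursor + CACHE_LINES), key=overlap_cost)
        layout[name] = (start, size)
        cursor = max(cursor, start + size)
    return layout

procs = {"main": 20, "lex": 30, "parse": 24, "emit": 16}
calls = {("main", "parse"): 90, ("parse", "lex"): 400, ("parse", "emit"): 50}
for name, (start, size) in place(procs, calls).items():
    print(f"{name:>5}: starts at cache line {start % CACHE_LINES}, occupies {size} lines")
```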

Patent
30 Sep 1997
TL;DR: In this article, a central cache controller performs RAID management functions on behalf of the plurality of storage controllers including redundancy information (parity) generation and checking as well as RAID geometry (striping) management.
Abstract: Apparatus and methods which allow multiple storage controllers sharing access to common data storage devices in a data storage subsystem to access a centralized intelligent cache. The intelligent central cache provides substantial processing for storage management functions. In particular, the central cache of the present invention performs RAID management functions on behalf of the plurality of storage controllers including, for example, redundancy information (parity) generation and checking as well as RAID geometry (striping) management. The plurality of storage controllers (also referred to herein as RAID controllers) transmit cache requests to the central cache controller. The central cache controller performs all operations related to storing supplied data in cache memory as well as posting such cached data to the storage array as required. The storage controllers are significantly simplified because the present invention obviates the need for duplicative local cache memory on each of the plurality of storage controllers. The storage subsystem of the present invention obviates the need for inter-controller communication for purposes of synchronizing local cache contents of the storage controllers. The storage subsystem of the present invention offers improved scalability in that the storage controllers are simplified as compared to those of prior designs. Addition of storage controllers to enhance subsystem performance is less costly than prior designs. The central cache controller may include a mirrored cache controller to enhance redundancy of the central cache controller. Communication between the cache controller and its mirror are performed over a dedicated communication link.

Proceedings Article
08 Dec 1997
TL;DR: Trace-driven simulation of this mechanism on two large, independent data sets shows that PCV both provides stronger cache coherency and reduces the request traffic in comparison to the time-to-live (TTL) based techniques currently used.
Abstract: This paper presents work on piggyback cache validation (PCV), which addresses the problem of maintaining cache coherency for proxy caches. The novel aspect of our approach is to capitalize on requests sent from the proxy cache to the server to improve coherency. In the simplest case, whenever a proxy cache has a reason to communicate with a server it piggybacks a list of cached, but potentially stale, resources from that server for validation. Trace-driven simulation of this mechanism on two large, independent data sets shows that PCV both provides stronger cache coherency and reduces the request traffic in comparison to the time-to-live (TTL) based techniques currently used. Specifically, in comparison to the best TTL-based policy, the best PCV-based policy reduces the number of request messages from a proxy cache to a server by 16-17% and the average cost (considering response latency, request messages and bandwidth) by 6-8%. Moreover, the best PCV policy reduces the staleness ratio by 57-65% in comparison to the best TTL-based policy. Additionally, the PCV policies can easily be implemented within the HTTP 1.1 protocol.
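
In outline, the proxy keeps track of cached resources from each origin server whose freshness lifetime has expired and attaches that list to the next request it sends to that server anyway; the reply reports which of them are still valid. The data structures and the fetch interface below are a sketch of that idea, not the exact protocol extension studied in the paper.

```python
import time

class ProxyCache:
    """Toy proxy cache that piggybacks validation requests (sketch only)."""
    def __init__(self, ttl=3600):
        self.ttl = ttl
        self.entries = {}                     # url -> (server, body, fetched_at)

    def stale_urls(self, server):
        now = time.time()
        return [u for u, (s, _, t) in self.entries.items()
                if s == server and now - t > self.ttl]

    def request(self, server, url, fetch):
        """Fetch url from server, piggybacking a list of possibly stale URLs.

        'fetch(server, url, validate)' is an assumed interface returning
        (body, still_valid_urls): the origin answers the main request and
        reports which piggybacked URLs are unchanged.
        """
        validate = self.stale_urls(server)            # the piggybacked list
        body, still_valid = fetch(server, url, validate)
        now = time.time()
        for u in validate:
            if u in still_valid:
                s, b, _ = self.entries[u]
                self.entries[u] = (s, b, now)          # revalidated, no re-fetch
            else:
                self.entries.pop(u, None)              # confirmed stale: drop it
        self.entries[url] = (server, body, now)
        return body

def fake_fetch(server, url, validate):
    # Stand-in origin: everything piggybacked is reported as still valid.
    return f"<html>{url}</html>", set(validate)

proxy = ProxyCache(ttl=0)                     # ttl=0 so entries go stale immediately
proxy.request("www.example.com", "/a.html", fake_fetch)
proxy.request("www.example.com", "/b.html", fake_fetch)   # piggybacks /a.html
print(sorted(proxy.entries))
```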

Journal ArticleDOI
01 Sep 1997
TL;DR: This paper presents a new, delay-conscious cache replacement algorithm LNC-R-W3 which maximizes a performance metric called delay-savings-ratio and compares it with other existing cache replacement algorithms, namely LRU and LRU-MIN.
Abstract: Caching at proxy servers plays an important role in reducing the latency of the user response, the network delays and the load on Web servers. The cache performance depends critically on the design of the cache replacement algorithm. Unfortunately, most cache replacement algorithms ignore the Web's scale. In this paper we argue for the design of delay-conscious cache replacement algorithms which explicitly consider the Web's scale by preferentially caching documents which require a long time to fetch to the cache. We present a new, delay-conscious cache replacement algorithm LNC-R-W3 which maximizes a performance metric called delay-savings-ratio. Subsequently, we test the performance of LNC-R-W3 experimentally and compare it with the performance of other existing cache replacement algorithms, namely LRU and LRU-MIN.
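
The gist can be captured with a simplified profit metric: weight each cached document by its reference rate and its fetch delay, normalised by size, and keep the most profitable documents. This is only in the spirit of LNC-R-W3; the metric, the numbers, and the greedy packing below are assumptions of this sketch.

```python
def profit(doc):
    """Delay-conscious value of keeping a document cached.

    refs_per_sec * fetch_delay approximates download delay saved per second
    of caching; dividing by size charges the document for the space it uses.
    """
    return doc["refs_per_sec"] * doc["fetch_delay"] / doc["size"]

def select_cache(docs, capacity):
    """Keep the highest-profit documents that fit in 'capacity' bytes."""
    kept, used = [], 0
    for d in sorted(docs, key=profit, reverse=True):
        if used + d["size"] <= capacity:
            kept.append(d["url"])
            used += d["size"]
    return kept

docs = [
    {"url": "/far/away.html", "size": 40_000, "refs_per_sec": 0.02, "fetch_delay": 3.0},
    {"url": "/nearby.html",   "size": 40_000, "refs_per_sec": 0.02, "fetch_delay": 0.1},
    {"url": "/hot/local.css", "size": 10_000, "refs_per_sec": 1.00, "fetch_delay": 0.1},
]
# The slow-to-fetch document wins the remaining space over the nearby one.
print(select_cache(docs, capacity=60_000))
```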

01 Sep 1997
TL;DR: In this article, the authors describe the application of ICPv2 (Internet Cache Protocol version 2, RFC2186) to Web caching, which is a lightweight message format used for communication among Web caches.
Abstract: This document describes the application of ICPv2 (Internet Cache Protocol version 2, RFC2186) to Web caching. ICPv2 is a lightweight message format used for communication among Web caches. Several independent caching implementations now use ICP[3,5], making it important to codify the existing practical uses of ICP for those trying to implement, deploy, and extend its use.

Patent
25 Jun 1997
TL;DR: In this paper, a method of executing coded instructions in a dynamically configurable multiprocessor having shared execution resources is described, including steps of placing a first processor in an active state upon booting of the multiprocessor.
Abstract: A method of executing coded instructions in a dynamically configurable multiprocessor having shared execution resources including steps of placing a first processor in an active state upon booting of the multiprocessor. In response to a processor create command, a second processor is placed in an active state. When either the first or second processor encounters a cache miss that has to be serviced by off-chip cache, the processor requiring service is placed in a nap state in which instruction fetching for that processor is disabled. When either the first or second processor encounters a cache miss that has to be serviced by main memory, the processor requiring service is placed in a sleep state by flushing all instructions from the processor in the sleep state and disabling instruction fetching for the processor in the sleep state.

Patent
07 Mar 1997
TL;DR: When cache misses occur simultaneously on two or more ports of a multi-port cache, different replacement sets are selected for different ports, and the replacements are performed simultaneously through different write ports.
Abstract: When cache misses occur simultaneously on two or more ports of a multi-port cache, different replacement sets are selected for different ports. The replacements are performed simultaneously through different write ports. In some embodiments, every set has its own write ports. The tag memory of every set has its own write port. In addition, the tag memory of every set has several read ports, one read port for every port of the cache. For every cache entry, a tree data structure is provided to implement a tree replacement policy (for example, a tree LRU replacement policy). If only one cache miss occurred, the search for the replacement set is started from the root of the tree. If multiple cache misses occurred simultaneously, the search starts at a tree level that has at least as many nodes as the number of cache misses. For each cache miss, a separate node is selected at that tree level, and the search for the respective replacement set starts at the selected node.

Patent
28 Feb 1997
TL;DR: In this paper, an active cache memory for use with microprocessors is disclosed which is capable of performing transfers from external random access memory independently of the microprocessor, of encaching misaligned references, and of transferring data to the microprocessor in bursts.
Abstract: An active cache memory for use with microprocessors is disclosed. The cache is external to the microprocessor and forms a second level cache which is novel in that it is capable of performing transfers from external random access memory independently of the microprocessor, encaching misaligned references, and transferring data to the microprocessor in bursts.

Patent
29 May 1997
TL;DR: In this paper, a virtual data storage system provides a method and apparatus for adaptively throttling transfers into a cache storage to prevent an overrun in the cache storage, and a recall throttle is computed based on cache free space and a number of storage devices reserved for recalling data files from the set of storage volumes.
Abstract: A virtual data storage system provides a method and apparatus for adaptively throttling transfers into a cache storage to prevent an overrun in the cache storage. The virtual data storage system includes a storage interface appearing as a set of addressable, virtual storage devices, a cache storage for initially storing host-originated data files, storage devices for eventually storing the data files on a set of storage volumes, and a storage manager for directing the data files between the cache storage and the storage devices. An amount of available space in the cache storage, or a cache free space, is monitored against an adjustable cache space threshold. A storage throttle is computed when the cache free space drops below the cache space threshold. Additionally, a recall throttle is computed based on the cache free space and a number of storage devices reserved for recalling data files from the set of storage volumes. A maximum value of the storage throttle and the recall throttle is used to delay the storing of data files and the recalling of data files into the cache storage and to prevent overrunning the cache storage by completely depleting the cache free space.
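
The throttling rule reduces to a small calculation: once free cache space drops below the adjustable threshold, compute a delay for incoming stores and another for recalls (which also weighs the drives busy recalling), and apply the larger of the two. The formulas and constants below are illustrative assumptions, not the patented computation.

```python
def storage_throttle(free_bytes, threshold_bytes, scale=2.0):
    """Delay (seconds) applied to host writes once free cache space is low."""
    if free_bytes >= threshold_bytes:
        return 0.0
    shortfall = 1.0 - free_bytes / threshold_bytes   # grows toward 1 as space vanishes
    return scale * shortfall

def recall_throttle(free_bytes, threshold_bytes, recall_drives, scale=2.0):
    """Recalls also fill the cache, weighted by how many drives are recalling."""
    return storage_throttle(free_bytes, threshold_bytes, scale) * (1 + recall_drives) / 2

def transfer_delay(free_bytes, threshold_bytes, recall_drives):
    """The larger of the two throttles gates transfers into the cache."""
    return max(storage_throttle(free_bytes, threshold_bytes),
               recall_throttle(free_bytes, threshold_bytes, recall_drives))

MB = 2 ** 20
for free_mb in (512, 256, 64, 8):       # adjustable threshold fixed at 256 MB here
    d = transfer_delay(free_mb * MB, 256 * MB, recall_drives=3)
    print(f"{free_mb:>4} MB free -> delay {d:.2f} s")
```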

Patent
07 Mar 1997
TL;DR: In this article, a battery backup mirrored cache memory module (200) for a cache dynamic random access memory (DRAM) system that senses the Vcc level supplied through the cache controller (310) to the cache memory and switches off the battery backup apparatus (400) switches cache memory array to a backup battery Vcc source (220), and a backup refresh control generator unit (230) that is also powered by the backup battery source.
Abstract: A battery backup mirrored cache memory module (210) for a cache dynamic random access memory (DRAM (200)) system senses the Vcc level supplied through the cache controller (310) to the cache memory; if the cache controller supplied Vcc falls below a preset threshold level, the battery backup apparatus (400) switches (210) the cache memory array to a backup battery Vcc source (220) and to a backup refresh control generator unit (230) that is also powered by the backup battery Vcc source (220). The cache DRAM (200), backup battery (220), and backup refresh generator are physically contained in a single module (400) that can be disconnected from the cache controller and host while preserving cache memory contents. The backup system is installed in an operating system for recovery of the cache memory contents and/or resumption of execution of the program that was running when the Vcc power failure occurred.

Journal ArticleDOI
TL;DR: Two schemes for implementing associativity greater than two are proposed: the sequential multicolumn cache, an extension of the column-associative cache, and the parallel multicolumn cache; both can effectively reduce the average access time.
Abstract: In the race to improve cache performance, many researchers have proposed schemes that increase a cache's associativity. The associativity of a cache is the number of places in the cache where a block may reside. In a direct-mapped cache, which has an associativity of 1, there is only one location to search for a match for each reference. In a cache with associativity n-an n-way set-associative cache-there are n locations. Increasing associativity reduces the miss rate by decreasing the number of conflict, or interference, references. The column-associative cache and the predictive sequential associative cache seem to have achieved near-optimal performance for an associativity of two. Increasing associativity beyond two, therefore, is one of the most important ways to further improve cache performance. We propose two schemes for implementing associativity greater than two: the sequential multicolumn cache, which is an extension of the column-associative cache, and the parallel multicolumn cache. For an associativity of four, they achieve the low miss rate of a four-way set-associative cache. Our simulation results show that both schemes can effectively reduce the average access time.

Patent
25 Aug 1997
TL;DR: In this article, a reconfigurable cache optimized for texture mapping is proposed, which provides two banks of memory during one mode of operation and a palettized map under a second mode of operation.
Abstract: A reconfigurable cache in a signal processor provides a cache optimized for texture mapping. In particular, the reconfigurable cache provides two banks of memory during one mode of operation and a palettized map under a second mode of operation. In one implementation, the reconfigurable cache optimizes mip-mapping by assigning one texture map to one of the memory banks and a second texture map of a different resolution to the other memory bank. A special mapping pattern ("supertiling") from a graphical image to cache lines minimizes cache misses in texture mapping operations.

Patent
03 Jun 1997
TL;DR: In this article, a superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided, where the data cache speculatively stores data within a predicted way of the cache after capturing the data currently being stored in that predicted way.
Abstract: A superscalar microprocessor employing a data cache configured to perform store accesses in a single clock cycle is provided. The superscalar microprocessor speculatively stores data within a predicted way of the data cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete, utilizing a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The superscalar microprocessor may therefore be capable of high frequency operation.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: The Locally-Based Interleaved Cache (LBIC) as discussed by the authors was proposed to exploit the characteristics of the data reference stream while approaching the economy of traditional multi-bank cache design.
Abstract: Highly aggressive multi-issue processor designs of the past few years and projections for the decade require that we redesign the operation of the cache memory system. The number of instructions that must be processed (including incorrectly predicted ones) will approach 16 or more per cycle. Since memory operations account for about a third of all instructions executed, these systems will have to support multiple data references per cycle. In this paper, we explore reference stream characteristics to determine how best to meet the need for ever increasing access rates. We identify limitations of existing multi-ported cache designs and propose a new structure, the Locally-Based Interleaved Cache (LBIC), to exploit the characteristics of the data reference stream while approaching the economy of traditional multi-bank cache design. Experimental results show that the LBIC structure is capable of outperforming current multi-ported approaches.

Patent
17 Mar 1997
TL;DR: In this article, the cache controller determines whether the requested data object is to be cached or is exempt from being cached; if it is exempt, the object is loaded directly into a local memory and is not stored in the cache.
Abstract: A method for selectively caching data in a computer network. Initially, data objects which are anticipated as being accessed only once or seldom accessed are designated as being exempt from being cached. When a read request is generated, the cache controller reads the requested data object from the cache memory if it currently resides in the cache memory. However, if the requested data object cannot be found in the cache memory, it is read from a mass storage device. Thereupon, the cache controller determines whether the requested data object is to be cached or is exempt from being cached. If the data object is exempt from being cached, it is loaded directly into a local memory and is not stored in the cache. This provides improved cache utilization because only objects that are used multiple times are entered in the cache. Furthermore, processing overhead is minimized by reducing unnecessary cache insertion and purging operations. In addition, I/O operations are minimized by increasing the likelihood that hot objects are retained in the cache longer at the expense of infrequently used objects.

Patent
Gary T. Hunt1
17 Oct 1997
TL;DR: In this article, the authors propose a shared cache contents data structure including information required to time-out pages and to determine if a page is in the process of being loaded or updated by another client sharing the cache.
Abstract: Browsers for different clients in an enterprise are configured to cache pages at least in part in a common file area in a remote, shared file server. Duplication or redundancy in caching pages is thus eliminated, and a larger body of distinct pages may be cached within a given allocation of memory space. Each remote, shared cache includes a shared cache contents data structure including information required to “time-out” pages and to determine if a page is in the process of being loaded or updated by another client sharing the cache. Where multiple caches are supported by the browsers, the remote, shared cache may form part of a local/remote cache hierarchy. When accessing a page, browsers check each cache in a multiple cache configuration, updating all caches as necessary.

Proceedings ArticleDOI
01 Oct 1997
TL;DR: HAC is a hybrid between page and object caching that combines the virtues of both while avoiding their disadvantages, and is able to perform well even when locality is poor, since it can discard pages while retaining their hot objects.
Abstract: This paper presents HAC, a novel technique for managing the client cache in a distributed, persistent object storage system. HAC is a hybrid between page and object caching that combines the virtues of both while avoiding their disadvantages. It achieves the low miss penalties of a page-caching system, but is able to perform well even when locality is poor, since it can discard pages while retaining their hot objects. It realizes the potentially lower miss rates of object-caching systems, yet avoids their problems of fragmentation and high overheads. Furthermore, HAC is adaptive: when locality is good it behaves like a page-caching system, while if locality is poor it behaves like an object-caching system. It is able to adjust the amount of cache space devoted to pages dynamically so that space in the cache can be used in the way that best matches the needs of the application. The paper also presents results of experiments that indicate that HAC outperforms other object storage systems across a wide range of cache sizes and workloads; it performs substantially better on the expected workloads, which have low to moderate locality. Thus we show that our hybrid, adaptive approach is the cache management technique of choice for distributed, persistent object systems.

Proceedings ArticleDOI
23 Jun 1997
TL;DR: It is proposed that Soft Caching, where an image can be cached at one of a set of levels of resolutions, can benefit the overall performance when combined with cache management strategies that estimate, for each object, both the bandwidth to the server where the object is stored and the appropriate resolution level demanded by the user.
Abstract: The vast majority of current Internet traffic is generated by web browsing applications. Proxy caching, which allows some of the most popular web objects to be cached at intermediate nodes within the network, has been shown to provide substantial performance improvements. In this paper we argue that image-specific caching strategies are desirable and will result in improved performance over approaches treating all objects alike. We propose that Soft Caching, where an image can be cached at one of a set of levels of resolutions, can benefit the overall performance when combined with cache management strategies that estimate, for each object, both the bandwidth to the server where the object is stored and the appropriate resolution level demanded by the user. We formalize the cache management problem under these conditions and describe an experimental system to test these techniques.

Proceedings ArticleDOI
22 Sep 1997
TL;DR: This paper uses log files from four Web servers and proposes and evaluates static caching, a novel cache policy for Web servers that incurs no CPU overhead and does not suffer from memory fragmentation.
Abstract: This paper studies caching in primary Web servers. We use log files from four Web servers to analyze the performance of various proposed cache policies for Web servers: LRU-threshold, LFU, LRU-SIZE, LRU-MIN, LRU-k-threshold and the Pitkow/Recker (1994) policy. Web document access patterns change very slowly. Based on this fact, we propose and evaluate static caching, a novel cache policy for Web servers. In static caching, the set of documents kept in the cache is determined periodically by analyzing the request log file for the previous period. The cache is filled with documents to maximize cache performance provided document access patterns do not change. The set of cached documents remains constant during the period. Surprisingly, this simple policy results in high cache performance, especially for small cache sizes. Unlike other policies, static caching incurs no CPU overhead and does not suffer from memory fragmentation.
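
Static caching is simple enough to show end to end: analyse last period's log, fill the cache with the documents that yield the most hits per byte until the space budget is exhausted, and freeze that set for the next period. The log format and the hits-per-byte ranking below are assumptions of this sketch.

```python
from collections import Counter

def choose_static_cache(log, sizes, capacity):
    """Pick the document set for the next period from last period's log.

    log      -- iterable of requested URLs
    sizes    -- {url: size in bytes}
    capacity -- cache size in bytes
    Greedy knapsack by requests-per-byte, a reasonable proxy for hit value.
    """
    freq = Counter(log)
    ranked = sorted(freq, key=lambda u: freq[u] / sizes[u], reverse=True)
    chosen, used = set(), 0
    for url in ranked:
        if used + sizes[url] <= capacity:
            chosen.add(url)
            used += sizes[url]
    return chosen

log = ["/index.html"] * 500 + ["/logo.gif"] * 450 + ["/report.pdf"] * 30
sizes = {"/index.html": 8_000, "/logo.gif": 2_000, "/report.pdf": 900_000}
print(choose_static_cache(log, sizes, capacity=16_000))
```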

Patent
19 Dec 1997
TL;DR: In this paper, an improved hashing system is presented that takes advantage of the caching architecture of many of today's processors to improve performance, where collisions occur so that the buckets contain many entries, and at runtime, the entries in the buckets are reordered to increase the number of times that the primary cache of the processor is used and to reduce the use of main memory.
Abstract: An improved hashing system is provided that takes advantage of the caching architecture of many of today's processors to improve performance. Some of today's most advanced processors, like the PENTIUM processor, have a two level caching scheme utilizing a primary cache and a secondary cache, where data contained in the primary cache is accessible 50-150 times faster than data in main memory. The improved hashing system ensures that collisions occur so that the buckets contain many entries, and at runtime, the entries in the buckets are reordered to increase the number of times that the primary cache of the processor is used and to reduce the number of times that main memory is used, thereby improving the performance of the hashing system.