
Showing papers on "Cache algorithms published in 1997"


Proceedings Article
08 Dec 1997
TL;DR: GreedyDual-Size as discussed by the authors incorporates locality with cost and size concerns in a simple and nonparameterized fashion for high performance, which can potentially improve the performance of main-memory caching of Web documents.
Abstract: Web caches can not only reduce network traffic and downloading latency, but can also affect the distribution of web traffic over the network through cost-aware caching. This paper introduces GreedyDual-Size, which incorporates locality with cost and size concerns in a simple and non-parameterized fashion for high performance. Trace-driven simulations show that with the appropriate cost definition, GreedyDual-Size outperforms existing web cache replacement algorithms in many aspects, including hit ratios, latency reduction and network cost reduction. In addition, GreedyDual-Size can potentially improve the performance of main-memory caching of Web documents.
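
The replacement rule behind GreedyDual-Size is compact enough to sketch. Below is a minimal Python sketch of the published rule (an illustration, not the authors' code): each object carries a value H = L + cost/size, the object with the smallest H is evicted, and the global inflation value L is raised to that H so that untouched objects age relative to recently accessed ones.

```python
class GreedyDualSizeCache:
    """Illustrative sketch of GreedyDual-Size replacement (not the authors' implementation)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0                       # global inflation value
        self.entries = {}                  # url -> (H, size, cost)

    def access(self, url, size, cost):
        """Record an access; returns 'hit' or 'miss'."""
        if url in self.entries:
            _, s, c = self.entries[url]
            self.entries[url] = (self.L + c / s, s, c)   # on a hit, restore H = L + cost/size
            return "hit"
        # evict lowest-H objects until the new one fits
        while self.entries and self.used + size > self.capacity:
            victim = min(self.entries, key=lambda u: self.entries[u][0])
            h, s, _ = self.entries.pop(victim)
            self.L = h                     # inflate L to the evicted object's H
            self.used -= s
        if size <= self.capacity:          # objects larger than the cache are simply not stored
            self.entries[url] = (self.L + cost / size, size, cost)
            self.used += size
        return "miss"
```

Setting cost to 1 for every object biases the policy toward hit ratio, while setting it to a measured download latency or network cost yields the cost-aware behaviour the paper evaluates.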

1,048 citations


Proceedings ArticleDOI
01 Dec 1997
TL;DR: Experimental results across a wide range of embedded applications show that the filter cache results in improved memory system energy efficiency, and this work proposes to trade performance for power consumption by filtering cache references through an unusually small L1 cache.
Abstract: Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind the filter cache and serves to reduce the performance loss. Experimental results across a wide range of embedded applications show that the filter cache results in improved memory system energy efficiency. For example, a direct mapped 256-byte filter cache achieves a 58% power reduction while reducing performance by 21%, corresponding to a 51% reduction in the energy-delay product over conventional design.

544 citations


Patent
25 Sep 1997
TL;DR: In this article, a method for storing a plurality of multimedia objects in a cache memory is described, where first ones of the multimedia objects are written into the cache memory sequentially from the beginning of the cache memory in the order in which they are received.
Abstract: A method for storing a plurality of multimedia objects in a cache memory is described. First ones of the multimedia objects are written into the cache memory sequentially from the beginning of the cache memory in the order in which they are received. When a first memory amount from a most recently stored one of the first multimedia objects to the end of the cache memory is insufficient to accommodate a new multimedia object, the new multimedia object is written from the beginning of the cache memory, thereby writing over a previously stored one of the first multimedia objects. Second ones of the multimedia objects are then written into the cache memory sequentially following the new multimedia object in the order in which they are received, thereby writing over the first ones of the multimedia objects. This cycle is repeated, thereby maintaining a substantially full cache memory.
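
The storage discipline described here is essentially a circular log over the cache region. A small sketch of that behaviour follows, with an assumed byte-granularity interface (names are illustrative, not from the patent):

```python
class CircularMediaCache:
    """Sketch of the patent's sequential, wrap-around write discipline (assumed interface)."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.write_pos = 0                 # next free byte offset
        self.objects = []                  # (start, length, name), in arrival order

    def store(self, name, length):
        if length > self.size:
            raise ValueError("object larger than the cache")
        if self.write_pos + length > self.size:
            self.write_pos = 0             # not enough room before the end: wrap to the start
        start, end = self.write_pos, self.write_pos + length
        # the new object overwrites any previously stored objects that overlap its extent
        self.objects = [o for o in self.objects
                        if not (o[0] < end and o[0] + o[1] > start)]
        self.objects.append((start, length, name))
        self.write_pos = end
```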

398 citations


Proceedings ArticleDOI
03 Aug 1997
TL;DR: This work has developed algorithms that use caching and lazy creation of texture and geometry to manage scene complexity and increase locality of reference by dynamically reordering the rendering computation based on the contents of the cache.
Abstract: Simulating realistic lighting and rendering complex scenes are usually considered separate problems with incompatible solutions. Accurate lighting calculations are typically performed using ray tracing algorithms, which require that the entire scene database reside in memory to perform well. Conversely, most systems capable of rendering complex scenes use scan-conversion algorithms that access memory coherently, but are unable to incorporate sophisticated illumination. We have developed algorithms that use caching and lazy creation of texture and geometry to manage scene complexity. To improve cache performance, we increase locality of reference by dynamically reordering the rendering computation based on the contents of the cache. We have used these algorithms to compute images of scenes containing millions of primitives, while storing ten percent of the scene description in memory. Thus, a machine of a given memory capacity can render realistic scenes that are an order of magnitude more complex than was previously possible. CR Categories: I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Raytracing

263 citations



Journal ArticleDOI
01 Sep 1997
TL;DR: Two new caching algorithms that keep in the cache documents that take the longest to retrieve are explored, and are compared to the best three existing policies—LRU, LFU, and SIZE—using three measures: user response time and ability to minimize Web server loads and network bandwidth consumed.
Abstract: Do users wait less if proxy caches incorporate estimates of the current network conditions into document replacement algorithms? To answer this, we explore two new caching algorithms: (1) keep in the cache documents that take the longest to retrieve; and (2) use a hybrid of several factors, trying to keep in the cache documents from servers that take a long time to connect to, that must be loaded over the slowest Internet links, that have been referenced the most frequently, and that are small. The algorithms work by estimating the Web page download delays or proxy-to-Web server bandwidth using recent page fetches. The new algorithms are compared to the best three existing policies—LRU, LFU, and SIZE—using three measures—user response time and ability to minimize Web server loads and network bandwidth consumed—on workloads from Virginia Tech and Boston University.
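
Both policies amount to ranking cached documents by an estimated value and evicting the lowest-ranked one. A hedged sketch of the two value functions (the field names and equal weights are assumptions for illustration; the paper's hybrid combines connection time, link bandwidth, reference frequency and document size):

```python
def retrieval_time_value(doc):
    """Policy (1): rank documents by how long they took to retrieve (evict the smallest)."""
    return doc["download_seconds"]

def hybrid_value(doc, w_conn=1.0, w_bw=1.0, w_refs=1.0, w_size=1.0):
    """Policy (2), illustrative weighting: favour documents from slow-to-connect servers,
    fetched over slow links, referenced often, and small."""
    return (w_conn * doc["connect_seconds"]
            + w_bw / max(doc["bytes_per_second"], 1.0)
            + w_refs * doc["reference_count"]
            + w_size / max(doc["size_bytes"], 1))

def choose_victim(cached_docs, value_fn):
    """Evict the cached document with the lowest estimated value."""
    return min(cached_docs, key=value_fn)
```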

236 citations


Patent
Brijesh Agarwal1
28 Feb 1997
TL;DR: In this article, an Optimizer communicates with a Buffer Manager before it formulates the query plan, and the Optimizer formulates a query strategy or plan with "hints", which are ultimately passed to the Cache or Buffer Manager.
Abstract: Database system and methods are described for improving execution speed of database queries (e.g., for transaction processing and for decision support) by optimizing use of buffer caches. The system includes an Optimizer for formulating an optimal strategy for a given query. More particularly, the Optimizer communicates with a Buffer Manager before it formulates the query plan. For instance, the Optimizer may query the Buffer Manager for the purpose of determining whether the object of interest (e.g., table or index to be scanned) exists in its own buffer cache (i.e., whether it has been bound to a particular named cache). If the object exists in its own cache, the Optimizer may inquire as to how much of the cache (i.e., how much memory) the object requires, together with the optimal I/O size for the cache (e.g., 16K blocks). Based on this information, the Optimizer formulates a query strategy or plan with "hints," which are ultimately passed to the Cache or Buffer Manager. By formulating "hints" for the Buffer Manager at the level of the Optimizer, knowledge of the query is, in effect, passed down to the Buffer Manager so that it may service the query using an optimal caching strategy--one based on the dynamics of the query itself. Based on the "hints" received from the Optimizer, the Buffer Manager can fine tune input/output (i.e., cache management) for the query. Specific Optimizer strategies are described for each scan method available to the system, including heap scan, clustered index, and non-clustered index access. Additional strategies are described for multi-table access during processing of join queries.

228 citations


Proceedings ArticleDOI
09 Jun 1997
TL;DR: An OS-controlled, application-transparent cache-partitioning technique is described whose partitions can be transparently assigned to tasks for their exclusive use; a filter algorithm, a matrix-multiplication algorithm and the interaction of both are analysed with regard to cache-induced worst-case penalties.
Abstract: Cache-partitioning techniques have been invented to make modern processors with an extensive cache structure useful in real-time systems where task switches disrupt cache working sets and hence make execution times unpredictable. This paper describes an OS-controlled application-transparent cache-partitioning technique. The resulting partitions can be transparently assigned to tasks for their exclusive use. The major drawbacks found in other cache-partitioning techniques, namely waste of memory and additions on the critical performance path within CPUs, are avoided using memory coloring techniques that do not require changes within the chips of modern CPUs or on the critical path for performance. A simple filter algorithm commonly used in real-time systems, a matrix-multiplication algorithm and the interaction of both are analysed with regard to cache-induced worst-case penalties. Worst-case penalties are determined for different widely-used cache architectures. Some insights regarding the impact of cache architectures on worst-case execution are described.
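
The memory-colouring trick referred to here partitions the cache through the physical pages the OS hands out: all pages of one colour map onto the same slice of cache sets, so a task given pages of a colour reserved for it effectively owns a cache partition without any hardware change. A minimal sketch of the colour computation, with assumed cache parameters:

```python
PAGE_SIZE  = 4096          # bytes (assumed)
CACHE_SIZE = 512 * 1024    # bytes (assumed)
ASSOC      = 4             # ways (assumed)

# Pages that map onto the same cache sets share a colour; the number of colours
# is the per-way cache size divided by the page size.
NUM_COLORS = (CACHE_SIZE // ASSOC) // PAGE_SIZE

def page_color(physical_page_number):
    """Colour of a physical page; pages of different colours can never conflict in the cache."""
    return physical_page_number % NUM_COLORS

def pages_for_task(free_pages, task_colors):
    """Back a task's memory only with pages of its reserved colours: its private cache partition."""
    return [p for p in free_pages if page_color(p) in task_colors]
```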

224 citations


Journal ArticleDOI
TL;DR: A taxonomy is presented that describes the design space for transactional cache consistency maintenance algorithms and shows how proposed algorithms relate to one another and investigates the performance of six of these algorithms, and examines the tradeoffs inherent in the design choices identified in the taxonomy.
Abstract: Client-server database systems based on a data shipping model can exploit client memory resources by caching copies of data items across transaction boundaries. Caching reduces the need to obtain data from servers or other sites on the network. In order to ensure that such caching does not result in the violation of transaction semantics, a transactional cache consistency maintenance algorithm is required. Many such algorithms have been proposed in the literature and, as all provide the same functionality, performance is a primary concern in choosing among them. In this article we present a taxonomy that describes the design space for transactional cache consistency maintenance algorithms and show how proposed algorithms relate to one another. We then investigate the performance of six of these algorithms, and use these results to examine the tradeoffs inherent in the design choices identified in the taxonomy. The results show that the interactions among dimensions of the design space impact performance in many ways, and that classifications of algorithms as simply “pessimistic” or “optimistic” do not accurately characterize the similarities and differences among the many possible cache consistency algorithms.

206 citations


Proceedings ArticleDOI
11 Jul 1997
TL;DR: In this article, the authors describe methods for generating and solving cache miss equations that give a detailed representation of the cache misses in loop-oriented scientific code, which can be used to guide code optimizations for improving cache performance.
Abstract: With the widening performance gap between processors and main memory, efficient memory accessing behavior is necessary for good program performance. Both hand-tuning and compiler optimization techniques are often used to transform codes to improve memory performance. Effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This paper describes methods for generating and solving Cache Miss Equations that give a detailed representation of the cache misses in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop’s memory behavior. Mathematical techniques for manipulating Diophantine equations allow us to compute the number of possible solutions, where each solution corresponds to a potential cache miss. These equations provide a general framework to guide code optimizations for improving cache performance. The paper gives examples of their use to determine array padding and offset amounts that minimize cache misses, and also to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that is more precise than traditional memory behavior heuristics, and is also potentially faster than simulation.
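
To give the flavour of these equations, consider a direct-mapped cache of size $C_s$ with line size $L_s$ and two references $A$ and $B$ in a loop nest (notation simplified; this is a reconstruction in the spirit of the paper, not its exact formulation). A reuse of $A$ at iteration $\vec{\imath}$ is destroyed by an intervening access of $B$ at iteration $\vec{\jmath}$ exactly when the two addresses map to the same cache line:

$$\Big\lfloor \frac{\mathrm{Addr}_B(\vec{\jmath}\,)}{L_s} \Big\rfloor \;-\; \Big\lfloor \frac{\mathrm{Addr}_A(\vec{\imath}\,)}{L_s} \Big\rfloor \;=\; n\,\frac{C_s}{L_s}, \qquad n \in \mathbb{Z}\setminus\{0\}.$$

Because array addresses are affine functions of the loop indices, this condition becomes a linear Diophantine equation in $\vec{\imath}$, $\vec{\jmath}$ and $n$; each integer solution within the loop bounds corresponds to a potential conflict miss, and counting solutions (or making the equation unsolvable by padding or offsetting arrays) is what the framework automates.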

205 citations


01 Sep 1997
TL;DR: This document describes version 2 of the Internet Cache Protocol (ICPv2) as currently implemented in two World-Wide Web proxy cache packages[3,5].
Abstract: This document describes version 2 of the Internet Cache Protocol (ICPv2) as currently implemented in two World-Wide Web proxy cache packages[3,5]. ICP is a lightweight message format used for communicating among Web caches. ICP is used to exchange hints about the existence of URLs in neighbor caches. Caches exchange ICP queries and replies to gather information to use in selecting the most appropriate location from which to retrieve an object.
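
The hint exchange can be sketched without the binary message format. In the sketch below the opcode names follow the ICPv2 specification, while the `query` callable that actually encodes and sends the UDP message is an assumed stand-in:

```python
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3    # opcode values from the ICPv2 specification

def pick_source(url, neighbors, query):
    """Return the first neighbour cache whose ICP reply for `url` is a HIT, else None.

    `query(neighbor, opcode, url)` is an assumed callable that sends the ICP query
    datagram and returns the reply opcode; real implementations broadcast the query
    and take the first HIT reply rather than asking neighbours one by one.
    """
    for neighbor in neighbors:
        if query(neighbor, ICP_OP_QUERY, url) == ICP_OP_HIT:
            return neighbor            # retrieve the object from this neighbour
    return None                        # no neighbour admits to having it: fetch from the origin
```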

Proceedings Article
Arun Iyengar1, Jim Challenger1
08 Dec 1997
TL;DR: The DynamicWeb cache is analyzed; it results in near-optimal performance on systems which invoke server programs via CGI, and in near-optimal performance for many cases and 58% of optimal performance in the worst case on a system using the lower-overhead ICAPI interface.
Abstract: Dynamic Web pages can seriously reduce the performance of Web servers. One technique for improving performance is to cache dynamic Web pages. We have developed the Dynamic Web cache which is particularly well-suited for dynamic pages. Our cache has improved performance significantly at several commercial Web sites. This paper analyzes the design and performance of the DynamicWeb cache. It also presents a model for analyzing overall system performance in the presence of caching. Our cache can satisfy several hundred requests per second. On systems which invoke server programs via CGI, the DynamicWeb cache results in near-optimal performance, where optimal performance is that which would be achieved by a hypothetical cache which consumed no CPU cycles. On a system we tested which invoked server programs via ICAPI which has significantly less overhead than CGI, the DynamicWeb cache resulted in near-optimal performance for many cases and 58% of optimal performance in the worst case. The DynamicWeb cache achieved a hit rate of around 80% when it was deployed to support the official Internet Web site for the 1996 Atlanta Olympic games.

Proceedings ArticleDOI
05 Jan 1997
TL;DR: In this article, the effect of caches on the performance of sorting algorithms is investigated both experimentally and analytically, showing that despite radix sort's extremely low instruction count, its relatively poor cache performance leads to worse overall performance than the restructured, cache-conscious comparison based sorting algorithms.
Abstract: We investigate the effect that caches have on the performance of sorting algorithms both experimentally and analytically. To address the performance problems that high cache miss penalties introduce we restructure mergesort, quicksort, and heapsort in order to improve their cache locality. For all three algorithms the improvement in cache performance leads to a reduction in total execution time. We also investigate the performance of radix sort. Despite the extremely low instruction count incurred by this linear time sorting algorithm, its relatively poor cache performance results in worse overall performance than the efficient comparison based sorting algorithms. For each algorithm we provide an analysis that closely predicts the number of cache misses incurred by the algorithm.
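
The simplest restructuring in this spirit is to sort cache-sized runs independently, so each pass stays resident in the cache, and then merge the runs. A hedged sketch (block size and element size are assumed; this is an illustration, not the authors' exact tiled mergesort):

```python
import heapq

def cache_conscious_mergesort(items, cache_bytes=2 * 1024 * 1024, elem_bytes=8):
    """Sort cache-sized runs first (good locality), then merge the sorted runs."""
    run_len = max(1, cache_bytes // elem_bytes)
    runs = [sorted(items[i:i + run_len])             # each run is small enough to stay cached
            for i in range(0, len(items), run_len)]
    return list(heapq.merge(*runs))                  # a single merge pass over all runs

# cache_conscious_mergesort([5, 3, 8, 1]) -> [1, 3, 5, 8]
```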

Patent
24 Sep 1997
TL;DR: In this paper, information about the contents of the neighbor caches is exchanged between these caches so that when a request for an object is received, the object can be retrieved from the cache in which it is stored.
Abstract: On the Internet, different caches may contain copies of objects that have been copied from originating servers when they were accessed by users. Interconnected caches may have different objects stored thereon that might at some time be requested by a client terminal that is connected to a cache other than the one on which the object is stored. Rather than awaiting a request for a particular object and then querying each neighbor cache to determine whether a copy of the requested object is stored thereon, and then downloading the requested object if it is found, information about the contents of the neighbor caches is exchanged between these caches so that when a request for an object is received, the object can be retrieved from the cache in which it is stored. In the alternative, the object may be retrieved from the originating server if, for example, the object stored in a cache is stale based on the date and time it was last modified in the cache.

Proceedings ArticleDOI
01 May 1997
TL;DR: A technique for dynamic analysis of program data access behavior is presented, which is then used to proactively guide the placement of data within the cache hierarchy in a location-sensitive manner and is fully compatible with existing Instruction Set Architectures.
Abstract: Improvements in main memory speeds have not kept pace with increasing processor clock frequency and improved exploitation of instruction-level parallelism. Consequently, the gap between processor and main memory performance is expected to grow, increasing the number of execution cycles spent waiting for memory accesses to complete. One solution to this growing problem is to reduce the number of cache misses by increasing the effectiveness of the cache hierarchy. In this paper we present a technique for dynamic analysis of program data access behavior, which is then used to proactively guide the placement of data within the cache hierarchy in a location-sensitive manner. We introduce the concept of a macroblock, which allows us to feasibly characterize the memory locations accessed by a program, and a Memory Address Table, which performs the dynamic reference analysis. Our technique is fully compatible with existing Instruction Set Architectures. Results from detailed simulations of several integer programs show significant speedups.

Patent
29 Dec 1997
TL;DR: In this article, the cache directory structure is used for defining the name of each configured central cache system and for providing an index value identifying the particular set of descriptors associated therewith.
Abstract: A host system includes a multicache system configured within the host system's memory which has a plurality of local and central cache systems used for storing information being utilized by a plurality of processes running on the system. Persistent shared memory is used to store control structure information entries required for operating central cache systems for substantially long periods of time in conjunction with the local caches established for the processes. Such entries include a descriptor value for identifying a directory control structure and individual sets of descriptors for identifying a group of control structures defining those components required for operating the configured central cache systems. The cache directory structure is used for defining the name of each configured central cache system and for providing an index value identifying the particular set of descriptors associated therewith. The multicache system also includes a plurality of interfaces for configuring the basic characteristics of both local and central cache systems as a function of the type and performance requirements of application processes being run.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: This work revisits memory hierarchy design viewing memory as an inter-operation communication agent and uses data dependence prediction to identify and link dependent loads and stores so that they can communicate speculatively without incurring the overhead of address calculation, disambiguation and data cache access.
Abstract: We revisit memory hierarchy design viewing memory as an inter-operation communication agent. This perspective leads to the development of novel methods of performing inter-operation memory communication. We use data dependence prediction to identify and link dependent loads and stores so that they can communicate speculatively without incurring the overhead of address calculation, disambiguation and data cache access. We also use data dependence prediction to convert DEF-store-load-USE chains within the instruction window into DEF-USE chains prior to address calculation and disambiguation. We use true and output data dependence status prediction to introduce and manage a small storage structure called the transient value cache (TVC). The TVC captures memory values that are short-lived. It also captures recently stored values that are likely to be accessed soon. Accesses that are serviced by the TVC do not have to be serviced by other parts of the memory hierarchy, e.g., the data cache. The first two techniques are aimed at reducing the effective communication latency whereas the last technique is aimed at reducing data cache bandwidth requirements. Experimental analysis of the proposed techniques shows that: the proposed speculative communication methods correctly handle a large fraction of memory dependences; and a large number of the loads and stores do not have to ever reach the data cache when the TVC is in place.

Patent
02 Apr 1997
TL;DR: In this paper, the PICS protocol is used to pass the caching information of some or all the upper hierarchy down the hierarchy, and the caching status information can also be used to direct the object request to the closest higher level proxy which has potentially cached the object, instead of blindly requesting it from the next immediate higher layer proxy.
Abstract: A method and system of collaboratively caching information to allow improved caching decisions by a lower level or sibling node. In a caching hierarchy, the client and/or servers may factor in the caching status at the higher level in deciding whether to cache an object and which objects are to be replaced. The PICS protocol may be used to pass the caching information of some or all the upper hierarchy down the hierarchy. Furthermore, the caching status information can also be used to direct the object request to the closest higher level proxy which has potentially cached the object, instead of blindly requesting it from the next immediate higher level proxy. A selection policy used to select objects for replacement in the cache may be prioritized not only on the size and the frequency of access of the object, but also on the access time required to get the object if it is not cached. The selection policy may also include a selection weight factor wherein each object is assigned a selection weight based on its replacement cost, the object size and how frequently it is modified. Non-uniform size objects may be classified in ranges of selection weights having geometrically increasing intervals. Multiple LRU stacks may be independently maintained wherein each stack contains objects in a certain range of selection weights. In order to choose candidates for replacement, only the least recently used objects in each group need be considered.
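
The replacement machinery of this scheme is easy to sketch: each object gets a selection weight, weights are bucketed into geometrically growing ranges, and each range keeps its own LRU stack, so only one candidate per range has to be inspected at eviction time. A sketch under an assumed weight formula (the patent's weight combines replacement cost, object size and modification frequency):

```python
import math
from collections import OrderedDict

class WeightClassedLRU:
    """Sketch: one LRU stack per geometric range of selection weights."""

    def __init__(self):
        self.stacks = {}                   # class index -> OrderedDict(name -> weight), LRU first

    @staticmethod
    def selection_weight(replacement_cost, size_bytes, change_rate):
        # assumed form: costly-to-refetch, small, rarely modified objects weigh more
        return replacement_cost / (size_bytes * max(change_rate, 1e-9))

    def _class_of(self, weight):
        return int(math.floor(math.log2(max(weight, 1e-9))))   # [2^k, 2^(k+1)) share one stack

    def touch(self, name, weight):
        stack = self.stacks.setdefault(self._class_of(weight), OrderedDict())
        stack.pop(name, None)
        stack[name] = weight               # most recently used objects sit at the end

    def eviction_candidates(self):
        # only the least recently used object of each weight class needs to be considered
        return [next(iter(stack.items())) for stack in self.stacks.values() if stack]
```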

Proceedings ArticleDOI
09 Jun 1997
TL;DR: Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain just as tight as predictions for direct-mapped caches.
Abstract: The contributions of this paper are twofold. First, an automatic tool-based approach is described to bound worst-case data cache performance. The given approach works on fully optimized code, performs the analysis over the entire control flow of a program, detects and exploits both spatial and temporal locality within data references, produces results typically within a few seconds, and estimates, on average, 30% tighter WCET bounds than can be predicted without analyzing data cache behavior. Results obtained by running the system on representative programs are presented and indicate that timing analysis of data cache behavior can result in significantly tighter worst-case performance predictions. Second, a framework to bound worst-case instruction cache performance for set-associative caches is formally introduced and operationally described. Results of incorporating instruction cache predictions within pipeline simulation show that timing predictions for set-associative caches remain just as tight as predictions for direct-mapped caches. The cache simulation overhead scales linearly with increasing associativity.

Proceedings ArticleDOI
11 Jul 1997
TL;DR: It is shown that for an 8 Kbyte data cache, XOR-mapping schemes approximately halve the miss ratio for two-way associative and column-associative organizations, and XOR-mapping schemes provide a very significant reduction in the miss ratio for the other cache organizations, including the direct-mapped cache.
Abstract: This paper makes the case for the use of XOR-based placement functions for cache memories. It shows that these XOR-mapping schemes can eliminate many conflict misses for direct-mapped and victim caches and practically all of them for (pseudo) two-way associative organizations. The paper evaluates the performance of XOR-mapping schemes for a number of different cache organizations: direct-mapped, set-associative, victim, hash-rehash, column-associative and skewed-associative. It also proposes novel replacement policies for some of these cache organizations. In particular, it presents a low-cost implementation of a pure LRU replacement policy which demonstrates a significant improvement over the pseudo-LRU replacement previously proposed. The paper shows that for an 8 Kbyte data cache, XOR-mapping schemes approximately halve the miss ratio for two-way associative and column-associative organizations. Skewed-associative caches, which already make use of XOR-mapping functions, can benefit from the LRU replacement and also from the use of more sophisticated mapping functions. For two-way associative, column-associative and two-way skewed-associative organizations, XOR-mapping schemes achieve a miss ratio that is not higher than 1.10 times that of a fully-associative cache. XOR-mapping schemes also provide a very significant reduction in the miss ratio for the other cache organizations, including the direct-mapped cache. Ultimately, the conclusion of this study is that XOR-based placement functions unequivocally provide highly significant performance benefits to most cache organizations.
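
The difference between conventional and XOR-based placement fits in a few lines. A minimal sketch with assumed cache parameters (power-of-two number of sets, a single higher bit-field folded onto the index):

```python
LINE_SIZE = 32        # bytes (assumed)
NUM_SETS  = 128       # assumed; must be a power of two for the bit-field XOR below

def modulo_set(addr):
    """Conventional placement: the low-order bits of the line address pick the set."""
    return (addr // LINE_SIZE) % NUM_SETS

def xor_set(addr):
    """XOR-based placement: fold the next bit-field of the line address onto the index,
    so addresses that differ only in higher-order bits spread over different sets."""
    line = addr // LINE_SIZE
    low = line % NUM_SETS
    high = (line // NUM_SETS) % NUM_SETS
    return low ^ high
```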

Patent
31 Jul 1997
TL;DR: In this article, write operations are accelerated by logging write requests in a log structured cache whose memory region is partitioned into write cache segments and redundancy-data (parity) cache segments, and whose size is expanded using a cache-extension disk region.
Abstract: Method and apparatus for accelerating write operations by logging write requests in a log structured cache and by expanding the log structured cache using a cache-extension disk region. The log structured cache includes a cache memory region partitioned into one or more write cache segments and one or more redundancy-data (parity) cache segments. The cache-extension disk region is a portion of a disk array separate from a main disk region. The cache-extension disk region is also partitioned into segments and is used to extend the size of the log structured cache. The main disk region is instead managed in accordance with storage management techniques (e.g., RAID storage management). The write cache is partitioned into multiple write cache segments so that when one is full another can be used to handle new write requests. When one of these multiple write cache segments is filled, it is moved to the cache-extension disk region, thereby freeing the write cache segment for reuse. The redundancy-data (parity) cache segment holds redundancy data for recent write requests, thereby assuring integrity of the logged write request data in the log structured cache.

Proceedings ArticleDOI
29 Dec 1997
TL;DR: This paper presents a resource-based caching (RBC) algorithm that manages the heterogeneous requirements of multiple data types; extensive simulations show that RBC outperforms other known caching algorithms.
Abstract: The WWW employs a hierarchical data dissemination architecture in which hyper-media objects stored at a remote server are served to clients across the Internet, and cached on disks at intermediate proxy servers. One of the objectives of web caching algorithms is to maximize the data transferred from the proxy servers or cache hierarchies. Current web caching algorithms are designed only for text and image data. Recent studies predict that within the next five years more than half the objects stored at web servers will contain continuous media data. To support these trends, the next generation proxy cache algorithms will need to handle multiple data types, each with different cache resource usage, for a cache limited by both bandwidth and space. In this paper, we present a resource-based caching (RBC) algorithm that manages the heterogeneous requirements of multiple data types. The RBC algorithm (1) characterizes each object by its resource requirement and a caching gain, (2) dynamically selects the granularity of the entity to be cached that minimally uses the limited cache resource (i.e., bandwidth or space), and (3) if required, replaces the cached entities based on their cache resource usage and caching gain. We have performed extensive simulations to evaluate our caching algorithm and present simulation results that show that RBC outperforms other known caching algorithms.

Proceedings ArticleDOI
01 May 1997
TL;DR: This paper presents a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache and produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph.
Abstract: As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this paper we present a link-time procedure mapping algorithm which can significantly improve the effectiveness of the instruction cache. Our algorithm produces an improved program layout by performing a color mapping of procedures to cache lines, taking into consideration the procedure size, cache size, cache line size, and call graph. We use cache line coloring to guide the procedure mapping, indicating which cache lines to avoid when placing a procedure in the program layout. Our algorithm reduces on average the instruction cache miss rate by 40% over the original mapping and by 17% over the mapping algorithm of Pettis and Hansen [12].
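
The colouring idea can be sketched compactly: a placed procedure occupies a set of cache lines (its colours), and a frequently calling caller/callee pair should be laid out so their colour sets do not overlap. A greatly simplified greedy sketch (cache parameters and the placement heuristic are assumptions, not the paper's full algorithm):

```python
CACHE_SIZE = 16 * 1024     # bytes (assumed)
LINE_SIZE  = 32            # bytes (assumed)
NUM_LINES  = CACHE_SIZE // LINE_SIZE

def colors(start, size):
    """Set of cache lines a procedure of `size` bytes occupies when placed at offset `start`."""
    first = start // LINE_SIZE
    last = (start + size - 1) // LINE_SIZE
    return {line % NUM_LINES for line in range(first, last + 1)}

def place_after(layout_end, size, avoid):
    """Greedy placement: slide the procedure forward, line by line, until its colours
    no longer overlap `avoid` (the colours of procedures it frequently calls)."""
    start = layout_end
    for _ in range(NUM_LINES):             # at most one full cache worth of shifting
        if not (colors(start, size) & avoid):
            return start
        start += LINE_SIZE
    return layout_end                      # give up and accept the overlap
```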

Patent
10 Apr 1997
TL;DR: In this paper, a scalable distributed caching system on a network receives a request for a data object from a user and carries out a locator function that locates a directory cache for the object.
Abstract: A scalable distributed caching system on a network receives a request for a data object from a user. The caching system carries out a locator function that locates a directory cache for the object. The directory cache stores a directory list that identifies the locations of object caches that purport to store copies of the object requested by the user. The object caches on the object directory list are polled, and in response send messages to the cache that received the user request indicating if each object cache stores a copy of the requested object. The receiving cache sends a message requesting a copy of the object to the object cache that sent the message first received by the receiving cache indicating that an object cache stores the requested object. The object cache that sent the first received message then sends a copy of the object to the receiving cache, which stores a copy and then sends a copy to the user. The directory list for the object is then updated by adding the network address of the receiving cache. Outdated copies of objects stored on object caches are deleted in a distributed fashion to maintain the coherence of the cached copies. This is further reinforced by the association of time-to-live parameters with each copy and each object cache address on directory lists.

Patent
30 Sep 1997
TL;DR: In this article, a central cache controller performs RAID management functions on behalf of the plurality of storage controllers including redundancy information (parity) generation and checking as well as RAID geometry (striping) management.
Abstract: Apparatus and methods which allow multiple storage controllers sharing access to common data storage devices in a data storage subsystem to access a centralized intelligent cache. The intelligent central cache provides substantial processing for storage management functions. In particular, the central cache of the present invention performs RAID management functions on behalf of the plurality of storage controllers including, for example, redundancy information (parity) generation and checking as well as RAID geometry (striping) management. The plurality of storage controllers (also referred to herein as RAID controllers) transmit cache requests to the central cache controller. The central cache controller performs all operations related to storing supplied data in cache memory as well as posting such cached data to the storage array as required. The storage controllers are significantly simplified because the present invention obviates the need for duplicative local cache memory on each of the plurality of storage controllers. The storage subsystem of the present invention obviates the need for inter-controller communication for purposes of synchronizing local cache contents of the storage controllers. The storage subsystem of the present invention offers improved scalability in that the storage controllers are simplified as compared to those of prior designs. Addition of storage controllers to enhance subsystem performance is less costly than prior designs. The central cache controller may include a mirrored cache controller to enhance redundancy of the central cache controller. Communication between the cache controller and its mirror are performed over a dedicated communication link.

Patent
17 Jan 1997
TL;DR: In this article, the authors proposed a set-associative cache memory for allocating entries in a branch prediction table (BPT) to branch prediction information for related branch instructions.
Abstract: Allocation circuitry for allocating entries within a set-associative cache memory is disclosed. The set-associative cache memory comprises N ways, each way having M entries and corresponding entries in each of the N ways constituting a set of entries. The allocation circuitry has a first circuit which identifies related data units by identifying a probability that the related data units may be successively read from the cache memory. A second circuit within the allocation circuitry allocates the corresponding entries in each of the ways to the related data units, so that related data units are stored in a common set of entries. Accordingly, the related data units will be simultaneously outputted from the set-associative cache memory, and are thus concurrently available for processing. The invention may find application in allocating entries of a common set in a branch prediction table (BPT) to branch prediction information for related branch instructions.

Proceedings Article
08 Dec 1997
TL;DR: Trace-driven simulation of this mechanism on two large, independent data sets shows that PCV both provides stronger cache coherency and reduces the request traffic in comparison to the time-to-live (TTL) based techniques currently used.
Abstract: This paper presents work on piggyback cache validation (PCV), which addresses the problem of maintaining cache coherency for proxy caches. The novel aspect of our approach is to capitalize on requests sent from the proxy cache to the server to improve coherency. In the simplest case, whenever a proxy cache has a reason to communicate with a server it piggybacks a list of cached, but potentially stale, resources from that server for validation. Trace-driven simulation of this mechanism on two large, independent data sets shows that PCV both provides stronger cache coherency and reduces the request traffic in comparison to the time-to-live (TTL) based techniques currently used. Specifically, in comparison to the best TTL-based policy, the best PCV-based policy reduces the number of request messages from a proxy cache to a server by 16-17% and the average cost (considering response latency, request messages and bandwidth) by 6-8%. Moreover, the best PCV policy reduces the staleness ratio by 57-65% in comparison to the best TTL-based policy. Additionally, the PCV policies can easily be implemented within the HTTP 1.1 protocol.
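
The proxy-side bookkeeping for PCV is straightforward to sketch: whenever a request must go to a server anyway, attach the identifiers of resources cached from that server whose freshness is in doubt, and use the reply to refresh or drop them. The staleness threshold and request shape below are illustrative assumptions, not the HTTP 1.1 encoding the paper proposes:

```python
import time

class PiggybackValidator:
    """Proxy-side bookkeeping for piggyback cache validation (illustrative sketch)."""

    def __init__(self, doubt_after_seconds=3600, max_piggyback=50):
        self.doubt_after = doubt_after_seconds
        self.max_piggyback = max_piggyback
        self.cached = {}                          # (server, path) -> time last validated

    def stale_candidates(self, server, now=None):
        """Cached resources from `server` old enough to be worth revalidating."""
        now = time.time() if now is None else now
        paths = [path for (srv, path), t in self.cached.items()
                 if srv == server and now - t > self.doubt_after]
        return paths[: self.max_piggyback]

    def outgoing_request(self, server, path):
        """The real request plus the piggybacked validation list for that server."""
        return {"GET": path, "validate": self.stale_candidates(server)}

    def on_reply(self, server, still_valid, stale, now=None):
        now = time.time() if now is None else now
        for path in still_valid:
            self.cached[(server, path)] = now      # server vouched for these copies
        for path in stale:
            self.cached.pop((server, path), None)  # drop copies the server says changed
```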

Journal ArticleDOI
01 Sep 1997
TL;DR: This paper presents a new, delay-conscious cache replacement algorithm LNC-R-W3 which maximizes a performance metric called delay-savings-ratio and compares it with other existing cache replacement algorithms, namely LRU and LRU-MIN.
Abstract: Caching at proxy servers plays an important role in reducing the latency of the user response, the network delays and the load on Web servers. The cache performance depends critically on the design of the cache replacement algorithm. Unfortunately, most cache replacement algorithms ignore the Web's scale. In this paper we argue for the design of delay-conscious cache replacement algorithms which explicitly consider the Web's scale by preferentially caching documents which require a long time to fetch to the cache. We present a new, delay-conscious cache replacement algorithm LNC-R-W3 which maximizes a performance metric called delay-savings-ratio. Subsequently, we test the performance of LNC-R-W3 experimentally and compare it with the performance of other existing cache replacement algorithms, namely LRU and LRU-MIN.
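
The metric can be reconstructed informally (a paraphrase, not the paper's exact notation): if document $i$ costs delay $d_i$ to fetch, is requested $r_i$ times and hits in the cache $h_i$ of those times, then

$$\text{delay-savings ratio} \;=\; \frac{\sum_i d_i\, h_i}{\sum_i d_i\, r_i}.$$

The ordinary hit ratio is the special case $d_i \equiv 1$, which is why a delay-conscious policy can reduce user-perceived latency relative to LRU and LRU-MIN even when raw hit ratios are similar.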

Patent
03 Jul 1997
TL;DR: An apparatus for increased data access in a network includes a file/object server computer having a permanent storage memory, a cache verifying computer operably connected to the file or object server computer in a manner to form a network for rapidly transferring data as discussed by the authors.
Abstract: An apparatus for increased data access in a network includes a file/object server computer having a permanent storage memory, and a cache verifying computer operably connected to the file/object server computer in a manner to form a network for rapidly transferring data. The cache verifying computer has an operating system, a first memory and a processor capable of performing an operation on data stored in the permanent storage memory of the file/object server computer to produce a signature of the data characteristic of one of a file, an object and a directory. A remote client computer has an operating system, a first memory, a cache memory and a processor capable of performing an operation on data stored in the cache memory to produce a signature of the data. A communication server operably connects the remote client computer to the cache verifying computer and the file/object server computer, and comparators operably associated with the cache verifying computer and remote client computer compare the signatures of data with one another to determine whether the signature of data of the remote client is valid.

Patent
26 Mar 1997
TL;DR: In this paper, the cache coherency attribute information is used to define a limitable cache coherent area to maintain data consistency among caches, and a processor memory interface unit includes a cache-coherency control which identifies whether cache co-herency is required only within a particular cluster of processors or is required for every one of the cache memories in every cluster throughout the system.
Abstract: To provide a large scale multiprocessor system capable of executing an area limited cache coherency control implementing a high speed operation while substantially reducing the amount of processor-to-processor communications, there is provided a translation lookaside buffer which retains cache coherency attribute information defining a limitable cache coherent area to maintain data consistency among caches, and a processor memory interface unit includes a cache coherency control which identifies whether cache coherency is required only within a particular cluster of processors or is required for every one of the cache memories in every one of the clusters throughout the system, on the basis of the contents of the cache coherency attribute information. Further, in another version of large scale multiprocessor system, each cluster may be provided with an export directory which registers an identifier of data whose copy is cached in cache memories in other clusters. Thereby, latency in cache coherency procedures can be reduced greatly, since a cache coherent area can be limited in dependence on various characteristics of data. Further, it is also possible to greatly reduce inter-cluster communication quantities, since it is no longer necessary to broadcast to all processors in the system upon every occasion of a memory read/write.