Showing papers on "Smart Cache published in 1996"


ReportDOI
22 Jan 1996
TL;DR: The design and performance of a hierarchical proxy-cache designed to make Internet information systems scale better are discussed, and performance measurements indicate that hierarchy does not measurably increase access latency.
Abstract: This paper discusses the design and performance of a hierarchical proxy-cache designed to make Internet information systems scale better. The design was motivated by our earlier trace-driven simulation study of Internet traffic. We challenge the conventional wisdom that the benefits of hierarchical file caching do not merit the costs, and believe the issue merits reconsideration in the Internet environment. The cache implementation supports a highly concurrent stream of requests. We present performance measurements that show that our cache outperforms other popular Internet cache implementations by an order of magnitude under concurrent load. These measurements indicate that hierarchy does not measurably increase access latency. Our software can also be configured as a Web-server accelerator; we present data showing that our httpd-accelerator is ten times faster than Netscape's Netsite and NCSA 1.4 servers. Finally, we relate our experience fitting the cache into the increasingly complex and operational world of Internet information systems, including issues related to security, transparency to cache-unaware clients, and the role of file systems in support of ubiquitous wide-area information systems.

853 citations


Proceedings ArticleDOI
02 Dec 1996
TL;DR: It is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
Abstract: As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4 kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
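
A minimal Python sketch of the trace-cache idea, assuming a simple fill-on-fetch policy and crude FIFO-style eviction (the paper's fill and replacement mechanisms are more involved): dynamic traces are keyed by their starting address so that basic blocks that are noncontiguous in memory can be delivered by a single access.

    # A trace cache stores snapshots of the dynamic instruction stream,
    # keyed by starting PC, so noncontiguous basic blocks appear contiguous.
    class TraceCache:
        def __init__(self, max_traces=256, trace_len=16):
            self.max_traces, self.trace_len = max_traces, trace_len
            self.traces = {}                     # start PC -> list of PCs

        def fill(self, dynamic_stream, start):
            """Record the trace beginning at dynamic_stream[start]."""
            trace = dynamic_stream[start:start + self.trace_len]
            if not trace:
                return
            if len(self.traces) >= self.max_traces:
                self.traces.pop(next(iter(self.traces)))   # crude FIFO eviction
            self.traces[trace[0]] = trace

        def fetch(self, pc):
            """A single access that may return several basic blocks."""
            return self.traces.get(pc)

    tc = TraceCache()
    stream = [0x10, 0x14, 0x40, 0x44, 0x48, 0x90]   # dynamic PCs, with jumps
    tc.fill(stream, 0)
    print(tc.fetch(0x10))     # whole trace, though the PCs are noncontiguous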

637 citations


Proceedings Article
03 Sep 1996
TL;DR: A semantic model for client-side caching and replacement in a client-server database system is proposed, compared to page caching and tuple caching strategies, and validated with a detailed performance study.
Abstract: We propose a semantic model for client-side caching and replacement in a client-server database system and compare this approach to page caching and tuple caching strategies. Our caching model is based on, and derives its advantages from, three key ideas. First, the client maintains a semantic description of the data in its cache, which allows for a compact specification, as a remainder query, of the tuples needed to answer a query that are not available in the cache. Second, usage information for replacement policies is maintained in an adaptive fashion for semantic regions, which are associated with collections of tuples. This avoids the high overheads of tuple caching and, unlike page caching, is insensitive to bad clustering. Third, maintaining a semantic description of cached data enables the use of sophisticated value functions that incorporate semantic notions of locality, not just LRU or MRU, for cache replacement. We validate these ideas with a detailed performance study that includes traditional workloads as well as a workload motivated by a mobile navigation application.
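
A hedged sketch of the remainder-query idea, reduced to one-dimensional range predicates: the cache keeps semantic regions as intervals, and the remainder query is the part of a new query's range not covered by any cached region. The interval representation is an illustrative simplification of the paper's semantic descriptions.

    # Semantic caching over 1-D range predicates. A cached region is a
    # half-open interval (lo, hi); the remainder query is the uncovered part.
    def remainder(query, regions):
        """query: (lo, hi); regions: list of (lo, hi). Returns uncovered gaps."""
        lo, hi = query
        gaps = []
        for rlo, rhi in sorted(regions):
            if rlo > lo:
                gaps.append((lo, min(rlo, hi)))  # gap before this region
            lo = max(lo, rhi)                    # skip the covered part
            if lo >= hi:
                break
        if lo < hi:
            gaps.append((lo, hi))                # tail not covered by any region
        return [g for g in gaps if g[0] < g[1]]

    cached = [(0, 10), (20, 30)]                 # regions already at the client
    print(remainder((5, 25), cached))            # -> [(10, 20)]: fetch only this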

610 citations


Proceedings Article
22 Jan 1996
TL;DR: Using trace-driven simulation, it is shown that a weak cache consistency protocol (the one used in the Alex ftp cache) reduces network bandwidth consumption and server load more than either time-to-live fields or an invalidation protocol and can be tuned to return stale data less than 5% of the time.
Abstract: The bandwidth demands of the World Wide Web continue to grow at a hyper-exponential rate. Given this rocketing growth, caching of web objects as a means to reduce network bandwidth consumption is likely to be a necessity in the very near future. Unfortunately, many Web caches do not satisfactorily maintain cache consistency. This paper presents a survey of contemporary cache consistency mechanisms in use on the Internet today and examines recent research in Web cache consistency. Using trace-driven simulation, we show that a weak cache consistency protocol (the one used in the Alex ftp cache) reduces network bandwidth consumption and server load more than either time-to-live fields or an invalidation protocol and can be tuned to return stale data less than 5% of the time.
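
The Alex-style weak consistency referenced above assigns each cached object a time-to-live proportional to its age, on the premise that long-unmodified objects tend to stay unmodified. A minimal sketch; the 10% factor and one-day cap are illustrative tuning parameters, not values from the paper.

    import time

    def adaptive_ttl(last_modified, now=None, factor=0.1, max_ttl=86400.0):
        """Adaptive TTL: a fraction of the object's age, capped at max_ttl.
        Objects that have not changed for a long time are trusted longer."""
        now = time.time() if now is None else now
        age = max(0.0, now - last_modified)
        return min(factor * age, max_ttl)

    # A document last modified 10 days ago may be served from cache for up
    # to a day before revalidation; a freshly changed one is rechecked soon.
    print(adaptive_ttl(last_modified=time.time() - 10 * 86400))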

342 citations


Journal ArticleDOI
TL;DR: This article presents the design, implementation, and performance of a file system that integrates application-controlled caching, prefetching, and disk scheduling and shows that this combination of techniques greatly improves the performance of the file system.
Abstract: As the performance gap between disks and microprocessors continues to increase, effective utilization of the file cache becomes increasingly important. Application-controlled file caching and prefetching can apply application-specific knowledge to improve file cache management. However, supporting application-controlled file caching and prefetching is nontrivial because caching and prefetching need to be integrated carefully, and the kernel needs to allocate cache blocks among processes appropriately. This article presents the design, implementation, and performance of a file system that integrates application-controlled caching, prefetching, and disk scheduling. We use a two-level cache management strategy. The kernel uses the LRU-SP (Least-Recently-Used with Swapping and Placeholders) policy to allocate blocks to processes, and each process integrates application-specific caching and prefetching based on the controlled-aggressive policy, an algorithm previously shown in a theoretical sense to be nearly optimal. Each process also improves its disk access latency by submitting its prefetches in batches so that the requests can be scheduled to optimize disk access performance. Our measurements show that this combination of techniques greatly improves the performance of the file system. We measured that the running time is reduced by 3% to 49% (average 26%) for single-process workloads and by 5% to 76% (average 32%) for multiprocess workloads.

249 citations


Proceedings ArticleDOI
03 Feb 1996
TL;DR: A cache design that provides the same miss rate as a two-way set associative cache but with an access time closer to that of a direct-mapped cache, and that is easier to implement than previous designs.
Abstract: In this paper we propose a cache design that provides the same miss rate as a two-way set associative cache, but with an access time closer to a direct-mapped cache. As with other designs, a traditional direct-mapped cache is conceptually partitioned into multiple banks, and the blocks in each set are probed, or examined, sequentially. Other designs either probe the set in a fixed order or add extra delay in the access path for all accesses. We use prediction sources to guide the cache examination, reducing the amount of searching and thus the average access latency. A variety of accurate prediction sources are considered, with some being available in early pipeline stages. We feel that our design offers the same or better performance and is easier to implement than previous designs.
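
A small Python model of the idea, assuming a 2-way cache whose per-set prediction names the bank to probe first: a correct prediction completes in one probe at direct-mapped speed, and a misprediction pays for a second probe. The prediction source used here (the last way that hit) is only one of the sources the paper considers.

    import random

    class PredictiveCache:
        def __init__(self, num_sets=64):
            self.tags = [[None, None] for _ in range(num_sets)]
            self.pred = [0] * num_sets          # predicted way per set
            self.num_sets = num_sets

        def access(self, addr):
            """Returns the number of probes on a hit, 0 on a miss."""
            s, tag = addr % self.num_sets, addr // self.num_sets
            first = self.pred[s]
            if self.tags[s][first] == tag:
                return 1                         # first-probe hit (fast path)
            other = 1 - first
            if self.tags[s][other] == tag:
                self.pred[s] = other             # update the prediction source
                return 2                         # second-probe hit (slower)
            self.tags[s][other] = tag            # miss: fill non-predicted way
            self.pred[s] = other
            return 0

    c = PredictiveCache()
    probes = [c.access(random.randrange(512)) for _ in range(10000)]
    hits = [p for p in probes if p > 0]
    print("first-probe hit fraction:",
          sum(p == 1 for p in hits) / max(1, len(hits)))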

233 citations


Proceedings ArticleDOI
10 Jun 1996
TL;DR: The paper describes how to incorporate the effect of instruction cache to the Response Time schedulability Analysis (RTA), an efficient analysis for preemptive fixed priority schedulers and compares the results of such an approach to both cache partitioning and CRMA.
Abstract: Cache memories are commonly avoided in real-time systems because of their unpredictable behavior. Recently, some research has been done to obtain tighter bounds on the worst case execution time (WCET) of cached programs. These techniques usually assume a non-preemptive underlying system. However, some techniques can be applied to allow the use of caches in preemptive systems. The paper describes how to incorporate the effect of the instruction cache into Response Time schedulability Analysis (RTA). RTA is an efficient analysis for preemptive fixed-priority schedulers. We also compare through simulations the results of such an approach to both cache partitioning (increasing cache predictability by assigning private cache partitions to tasks) and CRMA (Cached RMA: the cache effect is incorporated into the utilization-based rate monotonic schedulability analysis). The results show that the cached version of RTA (CRTA) clearly outperforms CRMA; however, the partitioning scheme may be better depending on the system configuration. The obtained results bound the applicability domain of each method for a variety of hardware and workload configurations. The results can be used as design guidelines.
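
CRTA extends the usual response-time fixed-point iteration; a common textbook formulation charges a cache-refill penalty for each preemption by a higher-priority task. A sketch under that assumption (the paper's exact formulation may differ), with gamma[j] the refill delay induced by a preemption of task j:

    from math import ceil

    def crta_response_time(C, T, gamma, i, max_iter=1000):
        """Response time of task i under fixed priorities (tasks 0..i-1 are
        higher priority). Each preemption by task j adds gamma[j] of cache
        refill delay. Returns None if the response exceeds the period."""
        R = C[i]
        for _ in range(max_iter):
            R_next = C[i] + sum(ceil(R / T[j]) * (C[j] + gamma[j])
                                for j in range(i))
            if R_next == R:
                return R                         # fixed point reached
            if R_next > T[i]:
                return None                      # unschedulable (deadline = period)
            R = R_next
        return None

    C, T, gamma = [1, 2, 4], [5, 12, 30], [0.2, 0.3, 0.0]
    print(crta_response_time(C, T, gamma, i=2))  # -> 8.7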

182 citations


Proceedings ArticleDOI
26 Feb 1996
TL;DR: An energy-efficient cache invalidation method, called GCORE (Grouping with COld update-set REtention), that allows a mobile computer to operate in a disconnected mode to save the battery while still retaining most of the caching benefits after a reconnection is presented.
Abstract: Caching can reduce the bandwidth requirement in a mobile computing environment. However, due to battery power limitations, a wireless mobile computer may often be forced to operate in a doze (or even totally disconnected) mode. As a result, the mobile computer may miss some cache invalidation reports broadcast by a server, forcing it to discard the entire cache contents after waking up. In this paper, we present an energy-efficient cache invalidation method, called GCORE (Grouping with COld update-set REtention), that allows a mobile computer to operate in a disconnected mode to save the battery while still retaining most of the caching benefits after a reconnection. We present an efficient implementation of GCORE and conduct simulations to evaluate its caching effectiveness. The results show that GCORE can substantially improve mobile caching by reducing the communication bandwidth (or energy consumption) for query processing.

173 citations


Proceedings Article
03 Sep 1996
TL;DR: The design of WATCHMAN, an intelligent cache manager for sets retrieved by queries that is particularly well suited to a data warehousing environment and achieves a substantial performance improvement in a decision support setting when compared to a traditional LRU replacement algorithm.
Abstract: Data warehouses store large volumes of data which are used frequently by decision support applications. Such applications involve complex queries. Query performance in such an environment is critical because decision support applications often require interactive query response time. Because data warehouses are updated infrequently, it becomes possible to improve query performance by caching sets retrieved by queries in addition to query execution plans. In this paper we report on the design of an intelligent cache manager for sets retrieved by queries called WATCHMAN, which is particularly well suited for a data warehousing environment. Our cache manager employs two novel, complementary algorithms for cache replacement and for cache admission. WATCHMAN aims at minimizing query response time, and its cache replacement policy swaps out entire retrieved sets of queries instead of individual pages. The cache replacement and admission algorithms make use of a profit metric, which considers for each retrieved set its average rate of reference, its size, and the execution cost of the associated query. We report on a performance evaluation based on the TPC-D and Set Query benchmarks. These experiments show that WATCHMAN achieves a substantial performance improvement in a decision support environment when compared to a traditional LRU replacement algorithm.
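
A hedged sketch of profit-based replacement and admission for cached query result sets, taking profit as (reference rate x recomputation cost) / size; the paper's precise metric and bookkeeping may differ. A new set is admitted only if it is more profitable than each set it would displace.

    class ResultCache:
        def __init__(self, capacity):
            self.capacity, self.used = capacity, 0
            self.sets = {}                 # query -> (size, cost, rate)

        def profit(self, size, cost, rate):
            return rate * cost / size      # benefit per unit of cache space

        def admit(self, query, size, cost, rate):
            if size > self.capacity:
                return False
            # Consider victims from least profitable upward.
            victims = sorted(self.sets, key=lambda q: self.profit(*self.sets[q]))
            freed, chosen = 0, []
            for q in victims:
                if self.used - freed + size <= self.capacity:
                    break                  # enough space has been found
                if self.profit(*self.sets[q]) >= self.profit(size, cost, rate):
                    return False           # admission control rejects the set
                freed += self.sets[q][0]
                chosen.append(q)
            for q in chosen:
                self.used -= self.sets.pop(q)[0]
            self.sets[query] = (size, cost, rate)
            self.used += size
            return True

    c = ResultCache(capacity=100)
    print(c.admit("Q1", size=60, cost=5.0, rate=2.0))   # True
    print(c.admit("Q2", size=60, cost=1.0, rate=0.1))   # False: Q1 is worth more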

165 citations


Patent
David Brian Kirk
18 Mar 1996
Abstract: The traditional computer system is modified by providing, in addition to a processor unit, a main memory and a cache memory buffer, remapping logic for remapping the cache memory buffer, and a plurality of registers for containing remapping information. In this environment the cache memory buffer is divided into segments, and the segments are one or more cache lines allocated to a task to form a partition, so as to make available (if a size is set above zero) a shared partition and a group of private partitions. The registers include count registers which contain the number of cache segments in a specific partition, a flag register, and two registers which act as cache identification number registers. The flag register has bits acting as flags, which include a non-real-time flag that allows operation without the partition system, a private-partition-permitted flag, and a private-partition-selected flag. With this system a traditional computer system can be changed to operate without the impediments of interrupts and other prior obstacles to real-time task execution. By providing cache partition areas, causing an active task always to have a pointer to a private partition, and using a size register to specify how many segments can be used by the task, real-time systems can take advantage of a cache. Thus each task can make use of a shared partition and know how many segments it can use. The system cache provides a high-speed access path to memory data, so that during execution of a task the logic and registers provide any necessary cache partitioning to assure a preempted task that its cache contents will not be destroyed by a preempting task. This permits use of a software-controlled partitioning system which allows segments of a cache to be statically allocated on a priority/benefit basis without hardware modification to the system. The cache allocation provided by the logic takes into consideration the scheduling requirements of the system's tasks in deciding the size of each cache partition. Accordingly, the cache can make use of a dynamic programming implementation of an allocation algorithm which can determine an optimal cache allocation in polynomial time.

155 citations


Proceedings ArticleDOI
12 Aug 1996
TL;DR: This paper presents a simple but efficient novel hardware design called the non-temporal streaming (NTS) cache that supplements the conventional direct-mapped cache with a parallel fully associative buffer.
Abstract: Direct-mapped caches are often plagued by conflict misses because they lack the associativity to store more than one memory block in each set. However, some blocks that have no temporal locality actually cause program execution degradation by displacing blocks that do manifest temporal behavior. In this paper, we present a simple but efficient novel hardware design called the non-temporal streaming (NTS) cache that supplements the conventional direct-mapped cache with a parallel fully associative buffer. Every cache block loaded into the main cache is monitored for temporal behavior by a hardware detection unit. Cache blocks identified as nontemporal are allocated to the buffer on subsequent requests. Our simulations show that the NTS Cache not only provides a performance improvement over the conventional direct-mapped cache, but can also save on-chip area. For some numerical programs like FFTPDE, APPSP and APPBT from the NAS benchmark suite, an integral NTS Cache of size 9 KB (i.e., 8 KB direct-mapped cache plus 1 KB NT buffer) performs as well as a 16 KB conventional direct-mapped cache.
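
A simplified Python model of the NTS idea: a direct-mapped main cache plus a small fully associative buffer, with blocks that were evicted without ever being re-referenced flagged as non-temporal and routed to the buffer on their next fetch. The detection rule below is a software stand-in for the paper's hardware detection unit.

    from collections import OrderedDict

    class NTSCache:
        def __init__(self, sets=256, buf_blocks=32):
            self.sets = sets
            self.main = {}                       # set index -> (tag, reused)
            self.buf = OrderedDict()             # block addr -> True (LRU order)
            self.nontemporal = set()             # blocks flagged non-temporal
            self.buf_blocks = buf_blocks

        def access(self, block):
            s, tag = block % self.sets, block // self.sets
            if s in self.main and self.main[s][0] == tag:
                self.main[s] = (tag, True)       # re-hit: temporal behavior
                return "hit-main"
            if block in self.buf:
                self.buf.move_to_end(block)
                return "hit-buffer"
            if block in self.nontemporal:        # NT blocks bypass the main cache
                self.buf[block] = True
                if len(self.buf) > self.buf_blocks:
                    self.buf.popitem(last=False)
                return "miss-to-buffer"
            if s in self.main and not self.main[s][1]:
                old_tag = self.main[s][0]        # victim never re-hit: flag it NT
                self.nontemporal.add(old_tag * self.sets + s)
            self.main[s] = (tag, False)
            return "miss-to-main"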

Patent
20 Dec 1996
TL;DR: In this paper, a hybrid NUMA/COMA cache architecture with a cache-coherent protocol is proposed for a computer system having a plurality of sub-systems coupled to each other via a system interconnect.
Abstract: The present invention provides a hybrid Non-Uniform Memory Architecture (NUMA) and Cache-Only Memory Architecture (COMA) caching architecture together with a cache-coherent protocol for a computer system having a plurality of sub-systems coupled to each other via a system interconnect. In one implementation, each sub-system includes at least one processor, a page-oriented COMA cache and a line-oriented hybrid NUMA/COMA cache. Such a hybrid system provides flexibility and efficiency in caching both large and small, and/or sparse and packed data structures. Each sub-system is able to independently store data in COMA mode or in NUMA mode. When caching in COMA mode, a sub-system allocates a page of memory space and then stores the data within the allocated page in its COMA cache. Depending on the implementation, while caching in COMA mode, the sub-system may also store the same data in its hybrid cache for faster access. Conversely, when caching in NUMA mode, the sub-system stores the data, typically a line of data, in its hybrid cache.

Patent
13 Nov 1996
TL;DR: In this article, an integrated processor and level two (L2) dynamic random access memory (DRAM) are fabricated on a single chip, and the L2 DRAM cache is placed on the same chip as the processor to eliminate the time needed for two chip-to-chip crossings.
Abstract: An integrated processor and level two (L2) dynamic random access memory (DRAM) are fabricated on a single chip. As an extension of this basic structure, the invention also contemplates multiprocessor "node" chips in which multiple processors are integrated on a single chip with L2 cache. By integrating the processor and L2 DRAM cache on a single chip, high on-chip bandwidth, reduced latency and higher performance are achieved. A multiprocessor system can be realized in which a plurality of processors with integrated L2 DRAM cache are connected in a loosely coupled multiprocessor system. Alternatively, the single chip technology can be used to implement a plurality of processors integrated on a single chip with an L2 DRAM cache which may be either private or shared. This approach overcomes a number of issues which limit the performance and cost of a memory hierarchy. When the L2 DRAM cache is placed on the same chip as the processor, the time needed for two chip-to-chip crossings is eliminated. Since these crossings require off-chip drivers and receivers and must be synchronized with the system clock, the time involved is substantial. This means that with the integrated L2 DRAM cache, latency is reduced.

Patent
Douglas B. Boyle
05 Jan 1996
TL;DR: Group cache look-up tables minimize requests for data items outside the groups and greatly reduce the service load on servers holding popular data items; each client in the group has access to the group cache look-up table, and any client or group can cache any data item.
Abstract: An information system and method for reducing workload load on servers in an information system network. The system defines a group of interconnected clients which have associated cache memories. The system maintains a shared group cache look-up table for the group having entries which identify data items cached by the clients within the group and identify the clients at which the data items are cached. Each client in the group has access to the group cache look-up table, and any client or group can cache any data item. The system can include a hierarchy of groups, with each group having a group cache look-up table. The group cache look-up tables minimize requests for data items outside the groups and greatly minimize the service load on servers having popular data items.
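
A minimal sketch of the group cache look-up table: clients in a group consult a shared item-to-client map before going to the origin server, so a popular item is fetched from the server only once per group. Names and the update rule are illustrative assumptions.

    class CacheGroup:
        def __init__(self, server):
            self.server = server                # fallback data source
            self.lookup = {}                    # item id -> caching client id
            self.client_caches = {}             # client id -> {item: data}

        def fetch(self, client, item):
            owner = self.lookup.get(item)
            if owner is not None:               # another group member has it
                return self.client_caches[owner][item], "peer:" + owner
            data = self.server(item)            # group-wide miss: hit the server
            self.client_caches.setdefault(client, {})[item] = data
            self.lookup[item] = client          # advertise to the whole group
            return data, "server"

    g = CacheGroup(server=lambda item: "payload-for-" + item)
    print(g.fetch("c1", "index.html"))          # -> (..., 'server')
    print(g.fetch("c2", "index.html"))          # -> (..., 'peer:c1')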

Proceedings ArticleDOI
01 Sep 1996
TL;DR: Experiments with several application programs show that the thread scheduling method can improve program performance by reducing second-level cache misses.
Abstract: This paper describes a method to improve the cache locality of sequential programs by scheduling fine-grained threads. The algorithm relies upon hints provided at the time of thread creation to determine a thread execution order likely to reduce cache misses. This technique may be particularly valuable when compiler-directed tiling is not feasible. Experiments with several application programs, on two systems with different cache structures, show that our thread scheduling method can improve program performance by reducing second-level cache misses.
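
A toy sketch of hint-driven scheduling: each thread is created with a locality hint (here, the index of the data region it touches), and the scheduler runs threads in hint order so threads sharing data execute consecutively. The hint encoding is an assumption; the paper's algorithm is richer than a sort.

    class HintScheduler:
        def __init__(self):
            self.threads = []                  # (hint, function) pairs

        def spawn(self, func, hint):
            self.threads.append((hint, func))

        def run(self):
            # Sorting by hint groups threads that touch the same region, so
            # each region is brought into cache once instead of repeatedly.
            for _, func in sorted(self.threads, key=lambda t: t[0]):
                func()

    sched = HintScheduler()
    data = [[i] * 1000 for i in range(4)]      # four data "regions"
    for i in (2, 0, 3, 1, 0, 2):               # creation order is scattered
        sched.spawn(lambda i=i: sum(data[i]), hint=i)
    sched.run()                                # executes grouped: 0,0,1,2,2,3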

Patent
28 Mar 1996
TL;DR: The cache controller has two modes of operation: a first, standard mode in which read/write access to the cache memory is preceded by generation of the hit/miss signal by the comparator, and a second, accelerated mode in which access is initiated without waiting for the comparator to process the access request's address value.
Abstract: A multiprocessor computer system has data processors and a main memory coupled to a system controller. Each data processor has a cache memory. Each cache memory has a cache controller with two ports for receiving access requests. A first port receives access requests from the associated data processor and a second port receives access requests from the system controller. All cache memory access requests include an address value; access requests from the system controller also include a mode flag. A comparator in the cache controller processes the address value in each access request and generates a hit/miss signal indicating whether the data block corresponding to the address value is stored in the cache memory. The cache controller has two modes of operation, including a first standard mode of operation in which read/write access to the cache memory is preceded by generation of the hit/miss signal by the comparator, and a second accelerated mode of operation in which read/write access to the cache memory is initiated without waiting for the comparator to process the access request's address value. The first mode of operation is used for all access requests by the data processor and for system controller access requests when the mode flag has a first value. The second mode of operation is used for the system controller access requests when the mode flag has a second value distinct from the first value.

Journal ArticleDOI
01 May 1996
TL;DR: It is shown that even a small amount of main memory used as a document cache is enough to hold more than 60% of the documents requested, and that traditional file system cache management methods are inappropriate for managing Main Memory Web caches.
Abstract: An increasing amount of information is currently becoming available through World Wide Web servers. Document requests to popular Web servers arrive every few tens of milliseconds at peak rate. To reduce the overhead imposed by frequent document requests, we propose the notion of caching a World Wide Web server's documents in its main memory (which we call Main Memory Web Caching). We show that even a small amount of main memory (512 Kbytes) that is used as a document cache is enough to hold more than 60% of the documents requested. We also show that traditional file system cache management methods are inappropriate for managing Main Memory Web caches, and may result in poor performance. Based on trace-driven simulations of several server traces we quantify our claims, and propose a new cache management policy that dynamically adjusts itself to the clients' request pattern and cache size. We show that our policy is robust over a variety of parameters and results in better overall performance.

Patent
15 Apr 1996
TL;DR: In this paper, the authors propose a separate region conversion system that is capable of maintaining the hit rate of the cache at high level by simplifying the cache status and to improve the execution efficiency of the application program.
Abstract: The goal is to reduce processing time by simplifying the cache status, and to improve the execution efficiency of the application program, in a separate region conversion system that is capable of maintaining a high cache hit rate. When an access request is made to an object that is not stored in the object cache, the page containing the object is read from the database and stored in the page cache, and the object is read from the page and stored in the object cache. The page cache status, describing the state of each page stored in the page cache, is kept in the page status storage device; at the same time, the object cache status, describing the state of each object stored in the object cache, is kept in the object status storage device. A relationship is established between the page cache status and the object cache status, and if the two are not consistent, the status synchronizing device executes a synchronization process to make them consistent.

Patent
Yet-Ping Pai, Le T. Nguyen
15 Nov 1996
TL;DR: A cache control unit and a method of controlling a cache coupled to a cache accessing device, in which request identification information is assigned to each cache request and provided to the requesting device.
Abstract: A cache control unit and a method of controlling a cache. The cache is coupled to a cache accessing device. A first cache request is received from the device. Request identification information is assigned to the first cache request and provided to the requesting device, and processing of the first cache request may begin. A second cache request is then received from the cache accessing device; it is likewise assigned request identification information that is provided to the requesting device. The first and second cache requests are then fully serviced.

Patent
Brian Berliner
01 May 1996
TL;DR: A multi-tier cache system and a method for implementing it are described, in which a small cache in random access memory (RAM) is managed in a Least Recently Used (LRU) fashion.
Abstract: A multi-tier cache system and a method for implementing the multi-tier cache system is disclosed. The multi-tier cache system has a small cache in random access memory (RAM) that is managed in a Least Recently Used (LRU) fashion. The RAM cache is a subset of a much larger non-volatile cache on rotating magnetic media (e.g., a hard disk drive). The non-volatile cache is, in turn, a subset of a local CD-ROM or of a CD-ROM or mass storage device controlled by a server system. In a preferred embodiment of the invention, a heuristic technique is employed to establish a RAM cache of optimum size within the system memory. Also in a preferred embodiment, the RAM cache is made up of multiple identically-sized sub-blocks. A small amount of RAM is utilized to maintain a table which implements a Least Recently Used (LRU) RAM cache purging scheme. A hashing mechanism is employed to search for the "bucket" within the RAM cache in which the requested data may be located. If the requested data is in the RAM cache, the request is satisfied with that data. If the requested data is not in the RAM cache, the least recently used sub-block is purged from the cache if the cache is full, and the RAM cache is updated from the non-volatile cache whenever possible, and from the cached storage device when the non-volatile cache does not contain the requested data.
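
A sketch of the multi-tier lookup path described above: a small LRU-managed RAM cache backed by a larger non-volatile disk cache, backed in turn by the slow device (CD-ROM or server). Sizes, names, and the use of plain LRU at both tiers are illustrative simplifications; the patent also describes hashing into buckets and heuristic RAM-cache sizing.

    from collections import OrderedDict

    class MultiTierCache:
        def __init__(self, ram_blocks, disk_blocks, backing):
            self.ram = OrderedDict()           # block -> data, LRU order
            self.disk = OrderedDict()          # larger non-volatile tier
            self.ram_blocks, self.disk_blocks = ram_blocks, disk_blocks
            self.backing = backing             # e.g., a CD-ROM read function

        def read(self, block):
            if block in self.ram:              # fastest tier
                self.ram.move_to_end(block)
                return self.ram[block]
            if block in self.disk:             # refill RAM from the disk cache
                self.disk.move_to_end(block)
                data = self.disk[block]
            else:                              # go to the slow backing device
                data = self.backing(block)
                self.disk[block] = data
                if len(self.disk) > self.disk_blocks:
                    self.disk.popitem(last=False)
            self.ram[block] = data
            if len(self.ram) > self.ram_blocks:
                self.ram.popitem(last=False)   # purge least recently used
            return data

    c = MultiTierCache(4, 64, backing=lambda b: ("cdrom", b))
    c.read(7); c.read(7)                       # second read served from RAM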

Patent
29 Oct 1996
TL;DR: A caching logic comprising a selection logic and an admission control logic, in which the admission control logic decides whether an object not currently in the cache may be cached at all when it is accessed.
Abstract: A system and method for caching objects of non-uniform size. A caching logic includes a selection logic and an admission control logic. The admission control logic determines whether an object not currently in the cache may be cached at all when it is accessed. The admission control logic uses an auxiliary LRU stack which contains the identities and time stamps of the objects which have been recently accessed, so the memory required is relatively small. The auxiliary stack serves as a dynamic popularity list, and an object may be admitted to the cache if and only if it appears on the popularity list. The selection logic selects one or more of the objects in the cache which have to be purged when a new object enters the cache. The order of removal of the objects is prioritized based both on the size and the frequency of access of the object, and may be adjusted by a time-to-obsolescence (TTO) factor. To reduce the time required to compare the space-time product of each object in the cache, the objects may be classified into ranges having geometrically increasing intervals. Specifically, multiple LRU stacks are maintained independently, wherein each LRU stack contains only objects in a predetermined range of sizes. In order to choose candidates for replacement, only the least recently used objects in each group need be considered.
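
A compact sketch of the two mechanisms: an auxiliary LRU stack of recently seen object ids serves as the popularity list for admission, and cached objects are grouped into LRU stacks by geometrically increasing size ranges, so only each group's LRU object is a replacement candidate. The victim choice below (largest candidate first) stands in for the patent's space-time priority, and the TTO adjustment is omitted.

    import math
    from collections import OrderedDict

    class SizeAwareCache:
        def __init__(self, capacity, popularity_slots=1024):
            self.capacity, self.used = capacity, 0
            self.popular = OrderedDict()          # auxiliary LRU of object ids
            self.popularity_slots = popularity_slots
            self.stacks = {}                      # size class -> {id: size}

        def access(self, obj_id, size):
            seen = obj_id in self.popular
            self.popular[obj_id] = True
            self.popular.move_to_end(obj_id)
            if len(self.popular) > self.popularity_slots:
                self.popular.popitem(last=False)
            klass = int(math.log2(max(size, 1)))  # geometric size ranges
            stack = self.stacks.setdefault(klass, OrderedDict())
            if obj_id in stack:
                stack.move_to_end(obj_id)
                return "hit"
            if not seen:
                return "miss, not admitted"       # must be on the popularity list
            while self.used + size > self.capacity:
                lru = [(st, next(iter(st))) for st in self.stacks.values() if st]
                if not lru:
                    return "miss, too large"
                st, victim = max(lru, key=lambda c: c[0][c[1]])  # purge biggest
                self.used -= st.pop(victim)
            stack[obj_id] = size
            self.used += size
            return "miss, admitted"

    c = SizeAwareCache(capacity=100)
    print(c.access("a", 60))   # miss, not admitted (first sighting)
    print(c.access("a", 60))   # miss, admitted
    print(c.access("a", 60))   # hit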

Patent
15 Aug 1996
TL;DR: In this article, a hierarchical cache architecture that reduces traffic on a main memory bus while overcoming the disadvantages of prior systems is proposed. But it does not address the disadvantages associated with the use of store-through-type caches at level one.
Abstract: A hierarchical cache architecture that reduces traffic on a main memory bus while overcoming the disadvantages of prior systems. The architecture includes a plurality of level one caches of the store-through type; each level one cache is associated with a processor and may be incorporated into the processor. Subsets (or "clusters") of processors, along with their associated level one caches, are formed, and a level two cache is provided for each cluster. Each processor/level one cache pair within a cluster is coupled to the cluster's level two cache through a dedicated bus. By configuring the processors and caches in this manner, not only is the speed advantage normally associated with the use of cache memory realized, but the number of memory bus accesses is reduced without the disadvantages associated with the use of store-in caches at level one and without the disadvantages associated with the use of a shared cache bus.

Patent
28 Jun 1996
TL;DR: In this article, the authors propose an apparatus and method for synchronizing a cache mode in a cache memory system in a computer to protect cache operations, where cache mode is stored as metadata in the cache modules and is detected by the first controller to determine the cache mode.
Abstract: An apparatus and method for synchronizing a cache mode in a cache memory system in a computer to protect cache operations. The cache memory system has a first controller and a second controller and two cache modules and operates in a plurality of cache modes. The cache mode is stored as metadata in the cache modules and is detected by the first controller to determine the cache mode. Lock signals in the first controller are set in accordance with the cache mode detected to set the cache mode state in the first controller. The second controller copies the cache mode state from the first controller to synchronize both controllers in the same cache mode state. After a failure of the second controller, the first controller may lock access to both caches to recover data previously accessed by the second controller. The second controller restarts and copies the cache mode state from the first controller, so that both controllers return to the cache mode state prior to the failure of the second controller.

Book
31 Mar 1996
TL;DR: This book models a page server DBMS architecture, studies the performance of cache consistency algorithms, and works towards a flexible distributed DBMS architecture, showing clear trends in both client and server performance.
Abstract:
Foreword
Preface
1 Introduction
2 Client-Server Database Systems
3 Modeling a Page Server DBMS
4 Client Cache Consistency
5 Performance of Cache Consistency Algorithms
6 Global Memory Management
7 Local Disk Caching
8 Towards a Flexible Distributed DBMS Architecture
9 Conclusions
References
Index

Patent
28 Jun 1996
TL;DR: A cache memory system in a computer is enabled into one of a plurality of cache modes, with the cache memories partitioned into quadrants, two quadrants in each cache memory.
Abstract: A cache memory system in a computer is enabled into one of a plurality of cache modes. The cache memory system has a first controller and two cache memories; the cache memories are partitioned into quadrants, with two quadrants in each cache memory. A cache mode detector in the first controller detects a mirror cache mode set for the cache memory system. An address enabler in the first controller enables access to a first pair of quadrants, one quadrant in each cache memory, in response to detection of a mirror cache mode. A second controller follows the cache mode set by the cache mode detector and has its own address enabler. The address enabler in the second controller enables access to both quadrants in one cache memory in a non-mirror cache mode, and enables access to a second pair of quadrants, one quadrant in each cache memory, in response to detection of a mirror cache mode by the cache mode detector.

Patent
Millind Mittal
17 Dec 1996
TL;DR: The locality hint is used to identify the lowest level where management of cache allocation is desired, and cache memory is allocated at that level and any higher level(s).
Abstract: A computer system and method in which allocation of a cache memory is managed by utilizing a locality hint value included within an instruction. When a processor accesses a memory for transfer of data between the processor and the memory, that access can be allocated or not allocated in the cache memory. The locality hint included within the instruction controls whether the cache allocation is to be made. When a plurality of cache memories are present, they are arranged into a cache hierarchy and a locality value is assigned to each level of the cache hierarchy where allocation control is desired. The locality hint may be used to identify the lowest level where management of cache allocation is desired, and cache memory is allocated at that level and any higher level(s). The locality hint value is based on spatial and/or temporal locality for the data associated with the access. Data is recognized at each cache hierarchy level depending on the attributes associated with the data at a particular level. If the locality hint identifies a particular access for data as temporal or non-temporal with respect to a particular cache level, the particular access may be determined to be temporal or non-temporal with respect to the higher and lower cache levels.
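
One plausible reading of the allocation rule, sketched in Python: the hint names the lowest hierarchy level at which the access should be allocated, and the line is installed at that level and all higher (larger, farther) levels, bypassing the levels below. The level numbering and bypass behavior are assumptions for illustration.

    # A three-level hierarchy where level 0 is L1, closest to the CPU.
    def access(hierarchy, addr, hint_level):
        for level, cache in enumerate(hierarchy):
            if level >= hint_level:
                cache.add(addr)       # allocate at the hint level and above

    L1, L2, L3 = set(), set(), set()
    hierarchy = [L1, L2, L3]
    access(hierarchy, 0x1000, hint_level=0)   # temporal: allocate everywhere
    access(hierarchy, 0x2000, hint_level=2)   # non-temporal w.r.t. L1 and L2
    print(0x2000 in L1, 0x2000 in L3)         # False True: L1/L2 not polluted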

Proceedings ArticleDOI
01 May 1996
TL;DR: The authors' techniques for improving the bandwidth of a single cache port, using additional buffering in the processor and taking maximum advantage of a wider cache port, achieve 91% of the performance of a dual-ported cache.
Abstract: The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91% of the performance of a dual-ported cache.

Proceedings ArticleDOI
17 Jun 1996
TL;DR: The stream cache, proposed for the first time in this paper, has the potential to cut execution times by half with the addition of a relatively small amount of additional hardware.
Abstract: Data prefetching is a well known technique for improving cache performance. While several studies have examined prefetch strategies for scientific and commercial applications, no published work has studied the special memory requirements of multimedia applications. This paper presents data for three types of hardware prefetching schemes: stream buffers, stride prediction tables, and a hybrid combination of the two, the stream cache. Use of the stride prediction table is shown to eliminate up to 90% of the misses that would otherwise be incurred in a moderate or large sized cache with no prefetching hardware. The stream cache, proposed for the first time in this paper, has the potential to cut execution times by half with the addition of a relatively small amount of additional hardware.
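
A minimal stride prediction table, the middle scheme of the three studied: each load PC tracks its last address and last stride, and when the same nonzero stride repeats, the next address is prefetched. The single-entry-per-PC table and confirmation rule are the usual formulation, not necessarily the paper's exact design.

    class StridePredictionTable:
        def __init__(self):
            self.table = {}                # pc -> (last_addr, stride)

        def access(self, pc, addr):
            """Returns an address to prefetch, or None."""
            last, stride = self.table.get(pc, (None, 0))
            prefetch = None
            if last is not None:
                new_stride = addr - last
                if new_stride == stride and stride != 0:
                    prefetch = addr + stride   # stride confirmed: run ahead
                stride = new_stride
            self.table[pc] = (addr, stride)
            return prefetch

    spt = StridePredictionTable()
    for a in (100, 108, 116, 124):             # a load striding by 8 bytes
        print(spt.access(pc=0x40, addr=a))     # None, None, 124, 132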

Patent
Robert Yung
13 Mar 1996
TL;DR: In this article, a cache structure for a microprocessor which provides set-prediction information for a separate, second-level cache, and a method for improving cache accessing, are provided.
Abstract: A cache structure for a microprocessor which provides set-prediction information for a separate, second-level cache, and a method for improving cache accessing, are provided. In the event of a first-level cache miss, the second-level set-prediction information is used to select the set in an N-way off-chip set-associative cache. This allows a set-associative structure to be used in a second-level cache (on or off chip) without requiring a large number of traces and/or pins. Since set-prediction is used, the subsequent access time for a comparison to determine that the correct set was predicted is not in the critical timing path unless there is a mis-prediction or a miss in the second-level cache. Also, a cache memory can be partitioned into M sets, with M being chosen so that the set size is less than or equal to the page size, allowing a cache access before a TLB translation is done, further speeding the access.

Patent
Peichun Peter Liu
29 Apr 1996
TL;DR: In this article, content-addressable tag-compare arrays (CAMs) are used to select a cache line, and arbitration logic in each subarray selects a word line (cache line).
Abstract: A cache memory for a computer uses content-addressable tag-compare arrays (CAMs) to determine if a match occurs. The cache memory is partitioned into four subarrays, i.e., interleaved, providing a wide cache line (word lines) but shallow depth (bit lines). The cache can be accessed by multiple addresses, producing multiple data outputs in a given cycle. Two effective addresses and one real address are applied at one time, and if addresses match in different subarrays, or two match on the same line in a single subarray, then multiple access is permitted. The two content-addressable memories, or CAMs, are used to select a cache line, and in parallel with this, arbitration logic in each subarray selects a word line (cache line).