
Showing papers on "Smart Cache published in 1995"


18 Jul 1995
TL;DR: This work assesses the potential of proxy servers to cache documents retrieved with the HTTP protocol, and finds that a proxy server really functions as a second level cache, and its hit rate may tend to decline with time after initial loading given a more or less constant set of users.
Abstract: As the number of World-Wide Web users grows, so does the number of connections made to servers. This increases both network load and server load. Caching can reduce both loads by migrating copies of server files closer to the clients that use those files. Caching can either be done at a client or in the network (by a proxy server or gateway). We assess the potential of proxy servers to cache documents retrieved with the HTTP protocol. We monitored traffic corresponding to three types of educational workloads over a one semester period, and used this as input to a cache simulation. Our main findings are (1) that with our workloads a proxy has a 30-50% maximum possible hit rate no matter how it is designed; (2) that when the cache is full and a document is replaced, least recently used (LRU) is a poor policy, but simple variations can dramatically improve hit rate and reduce cache size; (3) that a proxy server really functions as a second level cache, and its hit rate may tend to decline with time after initial loading given a more or less constant set of users; and (4) that certain tuning configuration parameters for a cache may have little benefit.
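
The replacement-policy comparison lends itself to a small trace-driven simulation. The sketch below is a minimal illustration rather than the authors' simulator: it replays a list of (url, size) requests through a byte-limited LRU proxy cache and reports the document hit rate, giving the LRU baseline that the paper's simple variations improve on. The trace format and capacity are assumptions.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity_bytes):
    """Replay (url, size) requests through a byte-limited LRU cache.

    Returns the fraction of requests served from the cache."""
    cache = OrderedDict()   # url -> size, ordered by recency
    used = 0
    hits = 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)                  # mark as most recently used
        elif size <= capacity_bytes:                # skip documents larger than the cache
            while used + size > capacity_bytes:
                _, evicted_size = cache.popitem(last=False)   # evict least recently used
                used -= evicted_size
            cache[url] = size
            used += size
    return hits / len(trace) if trace else 0.0

# Hypothetical example: a 1 MB proxy cache over a tiny request trace.
trace = [("/a", 200_000), ("/b", 300_000), ("/a", 200_000),
         ("/c", 700_000), ("/a", 200_000)]
print(lru_hit_rate(trace, 1_000_000))   # 0.4 on this toy trace
```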

495 citations



Proceedings ArticleDOI
05 Jun 1995
TL;DR: The results suggest that distinguishing between documents produced locally and those produced remotely can provide useful leverage in designing caching policies, because of differences in the potential for sharing these two document types among multiple users.
Abstract: With the increasing demand for document transfer services such as the World Wide Web comes a need for better resource management to reduce the latency of documents in these systems. To address this need, we analyze the potential for document caching at the application level in document transfer services. We have collected traces of actual executions of Mosaic, reflecting over half a million user requests for WWW documents. Using those traces, we study the tradeoffs between caching at three levels in the system, and the potential for use of application-level information in the caching system. Our traces show that while a high hit rate in terms of URLs is achievable, a much lower hit rate is possible in terms of bytes, because most profitably-cached documents are small. We consider the performance of caching when applied at the level of individual user sessions, at the level of individual hosts, and at the level of a collection of hosts on a single LAN. We show that the performance gain achievable by caching at the session level (which is straightforward to implement) is nearly all of that achievable at the LAN level (where caching is more difficult to implement). However, when resource requirements are considered, LAN level caching becomes much more desirable, since it can achieve a given level of caching performance using a much smaller amount of cache space. Finally, we consider the use of organizational boundary information as an example of the potential for use of application-level information in caching. Our results suggest that distinguishing between documents produced locally and those produced remotely can provide useful leverage in designing caching policies, because of differences in the potential for sharing these two document types among multiple users.

177 citations


Journal ArticleDOI
01 Nov 1995
TL;DR: A method to maintain predictability of execution time within preemptive, cached real-time systems is introduced and the impact on compilation support for such a system is discussed.
Abstract: Cache memories have become an essential part of modern processors to bridge the increasing gap between fast processors and slower main memory. Until recently, cache memories were thought to impose unpredictable execution time behavior for hard real-time systems. But recent results show that the speedup of caches can be exploited without a significant sacrifice of predictability. These results were obtained under the assumption that real-time tasks be scheduled non-preemptively. This paper introduces a method to maintain predictability of execution time within preemptive, cached real-time systems and discusses the impact on compilation support for such a system. Preemptive systems with caches are made predictable via software-based cache partitioning. With this approach, the cache is divided into distinct portions associated with a real-time task, such that a task may only use its portion. The compiler has to support instruction and data partitioning for each task. Instruction partitioning involves non-linear control-flow transformations, while data partitioning involves code transformations of data references. The impact on execution time of these transformations is also discussed.
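
A minimal sketch of the software-based partitioning constraint, under assumed parameters (a direct-mapped cache with 256 sets and 32-byte lines, and a contiguous range of sets per task): each task's code and data must be laid out so that every address it touches maps only into its own sets. The set-index arithmetic below illustrates the condition a partitioning compiler would have to enforce; it is not the paper's algorithm.

```python
LINE_SIZE = 32    # bytes per cache line (assumed)
NUM_SETS  = 256   # sets in a direct-mapped cache (assumed)

def cache_set(addr):
    """Set index an address maps to in a direct-mapped cache."""
    return (addr // LINE_SIZE) % NUM_SETS

def in_partition(addr, first_set, num_sets_for_task):
    """True if addr falls inside the task's assigned range of cache sets."""
    s = cache_set(addr)
    return first_set <= s < first_set + num_sets_for_task

# Task A owns sets 0..63, task B owns sets 64..127 (assumed split).
# A partitioning compiler must place each task's code/data so that every
# address it references satisfies this predicate for its own partition.
print(in_partition(0x0000, 0, 64))   # True: maps to set 0
print(in_partition(0x2000, 0, 64))   # True: 0x2000/32 = 256 wraps back to set 0
print(in_partition(0x0800, 0, 64))   # False: maps to set 64, outside task A's range
```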

139 citations


Journal ArticleDOI
TL;DR: In this paper, the authors analyzed two days of queries to the NCSA Mosaic server to assess the geographic distribution of transaction requests, and analyzed the impact of caching query results within the geographic zone from which the request was sourced, in terms of the reduction in transactions with, and bandwidth volume from, the main server.
Abstract: We analyze two days of queries to the popular NCSA Mosaic server to assess the geographic distribution of transaction requests. The wide geographic diversity of query sources and popularity of a relatively small portion of the web server file set present a strong case for deployment of geographically distributed caching mechanisms to improve server and network efficiency. The NCSA web server consists of four servers in a cluster. We show time series of bandwidth and transaction demands for the server cluster and break these demands down into components according to geographical source of the query. We analyze the impact of caching the results of queries within the geographic zone from which the request was sourced, in terms of reduction of transactions with and bandwidth volume from the main server. We find that a cache document timeout even as low as 1024 seconds (about 17 minutes) during the two days that we analyzed would have saved between 40% and 70% of the bytes transferred from the central server. We investigate a range of timeouts for flushing documents from the cache, outlining the tradeoff between bandwidth savings and memory/cache management costs. We discuss the implications of this tradeoff in the face of possible future usage-based pricing of backbone services that may connect several cache sites. We also discuss other issues that caching inevitably poses, such as how to redirect queries initially destined for a central server to a preferred cache site. The preference of a cache site may be a function of not only geographic proximity, but also current load on nearby servers or network links. Such refinements in the web architecture will be essential to the stability of the network as the web continues to grow, and operational geographic analysis of queries to archive and library servers will be fundamental to its effective evolution.
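
The timeout analysis can be approximated with a simple accounting loop: cache each document at the requesting zone, serve repeat requests from that cache while the entry is younger than the timeout, and tally the bytes that never had to leave the central server. The sketch below is an illustration of that accounting rather than the authors' tooling; the trace format, the single-zone view, and unlimited cache space are assumptions.

```python
def bytes_saved(trace, timeout_s):
    """trace: list of (time_s, url, size_bytes) requests arriving at one zone cache.

    Returns (bytes_from_server, bytes_served_from_cache) under a simple
    per-document timeout (TTL) policy with unlimited cache space."""
    last_fetch = {}   # url -> time it was last fetched from the central server
    sizes = {}
    from_server = from_cache = 0
    for t, url, size in trace:
        fresh = url in last_fetch and (t - last_fetch[url]) <= timeout_s
        if fresh:
            from_cache += sizes[url]        # served locally, no central-server bytes
        else:
            last_fetch[url] = t             # (re)fetch from the central server
            sizes[url] = size
            from_server += size
    return from_server, from_cache

# Hypothetical trace: repeated requests for one popular document.
trace = [(0, "/popular.html", 10_000), (600, "/popular.html", 10_000),
         (1_500, "/popular.html", 10_000)]
print(bytes_saved(trace, timeout_s=1024))   # (20000, 10000)
```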

118 citations


Patent
31 Aug 1995
TL;DR: In this article, a data cache configured to perform store accesses in a single clock cycle is provided, where the data cache speculatively stores data within a predicted way of the cache after capturing the data currently being stored in that predicted way.
Abstract: A data cache configured to perform store accesses in a single clock cycle is provided. The data cache speculatively stores data within a predicted way of the cache after capturing the data currently being stored in that predicted way. During a subsequent clock cycle, the cache hit information for the store access validates the way prediction. If the way prediction is correct, then the store is complete. If the way prediction is incorrect, then the captured data is restored to the predicted way. If the store access hits in an unpredicted way, the store data is transferred into the correct storage location within the data cache concurrently with the restoration of data in the predicted storage location. Each store for which the way prediction is correct utilizes a single clock cycle of data cache bandwidth. Additionally, the way prediction structure implemented within the data cache bypasses the tag comparisons of the data cache to select data bytes for the output. Therefore, the access time of the associative data cache may be substantially similar to a direct-mapped cache access time. The present data cache is therefore suitable for high frequency superscalar microprocessors.

114 citations


Patent
13 Nov 1995
TL;DR: In this paper, misses to a cache (71) are tracked so that multiple misses within the same cache line can be merged or folded at reload time: when a reload cache line arrives, a matching store queue entry's data is merged with the cache line prior to storage in the cache (71), and other matching entries become active and are allowed to reaccess the cache (71).
Abstract: A data processor (40) keeps track of misses to a cache (71) so that multiple misses within the same cache line can be merged or folded at reload time. A load/store unit (60) includes a completed store queue (61) for presenting store requests to the cache (71) in order. If a store request misses in the cache (71), the completed store queue (61) requests the cache line from a lower-level memory system (90) and thereafter inactivates the store request. When a reload cache line is received, the completed store queue (61) compares the reload address to all entries. If at least one address matches the reload address, one entry's data is merged with the cache line prior to storage in the cache (71). Other matching entries become active and are allowed to reaccess the cache (71). A miss queue (80) coupled between the load/store unit (60) and the lower-level memory system (90) implements reload folding to improve efficiency.

111 citations


Patent
13 Oct 1995
TL;DR: In this paper, an adaptive read ahead cache is provided with a real cache and a virtual cache, where the real cache has a data buffer, an address buffer, and a status buffer.
Abstract: An adaptive read ahead cache is provided with a real cache and a virtual cache. The real cache has a data buffer, an address buffer, and a status buffer. The virtual cache contains only an address buffer and a status buffer. Upon receiving an address associated with the consumer's request, the cache stores the address in the virtual cache address buffer if the address is not found in the real cache address buffer and the virtual cache address buffer. Further, the cache fills the real cache data buffer with data responsive to the address from said memory if the address is found only in the virtual cache address buffer. The invention thus loads data into the cache only when sequential accesses are occurring and minimizes the overhead of unnecessarily filling the real cache when the host is accessing data in a random access mode.
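
The real/virtual split can be captured in a few lines: the virtual cache remembers addresses it has seen without holding any data, and a later hit there is taken as evidence of sequential access, triggering a fill of the real cache. The sketch below is a behavioral illustration only, not the patented controller; block-granular addressing and the one-block read-ahead on promotion are assumptions.

```python
class AdaptiveReadAhead:
    """Behavioral sketch: real cache holds data, virtual cache holds addresses only."""

    def __init__(self, backing_store):
        self.backing = backing_store   # dict: block address -> data (assumed interface)
        self.real = {}                 # address -> data
        self.virtual = set()           # addresses seen once, data deliberately not cached

    def read(self, addr):
        if addr in self.real:
            return self.real[addr]                  # real-cache hit
        if addr in self.virtual:
            # Address found only in the virtual cache: treat this as sequential
            # access and fill the real cache (here, with one block of read-ahead).
            self.virtual.discard(addr)
            for a in (addr, addr + 1):
                if a in self.backing:
                    self.real[a] = self.backing[a]
            return self.real[addr]
        # Not found in either cache: record the address only, load no data.
        self.virtual.add(addr)
        return self.backing[addr]

store = {i: f"block{i}" for i in range(8)}
cache = AdaptiveReadAhead(store)
cache.read(3)            # random-looking first touch: only the address is recorded
print(cache.read(3))     # virtual-cache hit: real cache is filled, data returned
print(4 in cache.real)   # True: the next block was read ahead
```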

96 citations


Proceedings ArticleDOI
01 Jun 1995
TL;DR: This paper explores two methods, one dynamic and one static, to reduce this overhead for virtual stack machines by caching top-of-stack values in (real machine) registers.
Abstract: An interpreter can spend a significant part of its execution time on accessing arguments of virtual machine instructions. This paper explores two methods to reduce this overhead for virtual stack machines by caching top-of-stack values in (real machine) registers. The dynamic method is based on having, for every possible state of the cache, one specialized version of the whole interpreter; the execution of an instruction usually changes the state of the cache and the next instruction is executed in the version corresponding to the new state. In the static method a state machine that keeps track of the cache state is added to the compiler. Common instructions exist in specialized versions for several states, but it is not necessary to have a version of every instruction for every cache state. Stack manipulation instructions are optimized away.
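
The idea can be illustrated with a tiny interpreter in which the top-of-stack value lives in a dedicated local variable (standing in for a machine register) rather than in the memory stack; a full implementation would generate one version of each instruction per cache state, as the paper describes. This is a one-register sketch of the concept under those simplifications, not the paper's system.

```python
def run(code):
    """Tiny stack-machine interpreter that keeps the top of stack in the local
    variable `tos` (a stand-in for a machine register); only the values below
    the top live in the memory stack."""
    stack = []    # everything except the cached top-of-stack value
    tos = None    # None means the stack is empty (sketch-level sentinel)
    for op, *args in code:
        if op == "push":
            if tos is not None:
                stack.append(tos)    # spill the old top into the memory stack
            tos = args[0]
        elif op == "add":
            tos = stack.pop() + tos  # one operand is already in the "register"
        elif op == "print":
            print(tos)
    return tos

run([("push", 2), ("push", 3), ("add",), ("print",)])   # prints 5
```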

94 citations


20 Nov 1995
TL;DR: The technique of static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed.
Abstract: This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest techniques published. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fully predictable caching behavior. Even for regular instruction caches without architectural modifications, tight bounds for the execution time of real-time programs can be derived from the information provided by the static cache simulator. Finally, the debugging of real-time applications can be enhanced by displaying the timing information of the debugged program at breakpoints. The timing information is determined by simulating the instruction cache behavior during program execution and can be used, for example, to detect missed deadlines and locate time-consuming code portions. Overall, the technique of static cache simulation provides a novel approach to analyze cache memories and has been shown to be very efficient for numerous applications.

93 citations


Patent
31 Mar 1995
TL;DR: In this paper, a multiprocessor computer system is provided having a multiplicity of sub-systems and a main memory coupled to a system controller; each data processor sub-system includes a master interface having master classes for sending memory transaction requests to the system controller.
Abstract: A multiprocessor computer system is provided having a multiplicity of sub-systems and a main memory coupled to a system controller. An interconnect module interconnects the main memory and sub-systems in accordance with interconnect control signals received from the system controller. At least two of the sub-systems are data processors, each having a respective cache memory that stores multiple blocks of data and a respective master cache index. Each master cache index has a set of master cache tags (Etags), including one cache tag for each data block stored by the cache memory. Each data processor includes a master interface having master classes for sending memory transaction requests to the system controller. The system controller includes memory transaction request logic for processing each memory transaction request by a data processor. The system controller maintains a duplicate cache index having a set of duplicate cache tags (Dtags) for each data processor. Each data processor has a writeback buffer for storing the data block previously stored in a victimized cache line until its respective writeback transaction is completed and an Nth+1 Dtag for storing the cache state of a cache line associated with a read transaction which is executed prior to an associated writeback transaction of a read-writeback transaction pair. Accordingly, upon a cache miss, the interconnect may execute the read and writeback transactions in parallel relying on the writeback buffer or Nth+1 Dtag to accommodate any ordering of the transactions.

Proceedings ArticleDOI
01 Dec 1995
TL;DR: This paper presents a latency-hiding compiler technique that is applicable to general-purpose C programs; it 'preloads' the data that are likely to cause a cache miss before they are used, thereby hiding the cache-miss latency.
Abstract: Previous research on hiding memory latencies has tended to focus on regular numerical programs. This paper presents a latency-hiding compiler technique that is applicable to general-purpose C programs. By assuming a lock-up free cache and instruction score-boarding, our technique 'preloads' the data that are likely to cause a cache-miss before they are used, thereby hiding the cache miss latency. We have developed simple compiler heuristics to identify load instructions that are likely to cause a cache-miss. Experimentation with a set of SPEC92 benchmarks shows that our heuristics are successful in identifying 85% of cache misses. We have also developed an algorithm that flexibly schedules the selected load instruction and instructions that use the loaded data to hide memory latency. Our simulation suggests that our technique is successful in hiding memory latency and improves the overall performance.

Proceedings ArticleDOI
04 Jan 1995
TL;DR: Experimental results suggest that both the block buffering and Gray code addressing techniques are ideal for instruction cache designs which tend to be accessed in a consecutive sequence and can achieve an order of magnitude energy reduction on caches.
Abstract: Caches usually consume a significant amount of energy in modern microprocessors (e.g. superpipelined or superscalar processors). In this paper, we examine contemporary cache design techniques and provide an analytical model for estimating cache energy consumption. We also present several novel techniques for designing energy-efficient caches, which include block buffering, cache sub-banking, and Gray code addressing. Experimental results suggest that both the block buffering and Gray code addressing techniques are ideal for instruction cache designs which tend to be accessed in a consecutive sequence. Cache sub-banking is ideal for both instruction and data caches. Overall, these techniques can achieve an order of magnitude energy reduction on caches.
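
The Gray code argument is easy to quantify: for consecutive addresses, a Gray-coded address bus toggles exactly one bit per step, whereas a binary bus toggles several, and each bus transition costs energy. The sketch below simply counts toggles for a sequential fetch pattern (a per-transition energy cost would scale the result); it is an illustration of the effect, not the paper's analytical energy model.

```python
def gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def toggles(seq):
    """Total number of bit transitions on a bus driven with the values in seq."""
    return sum(bin(a ^ b).count("1") for a, b in zip(seq, seq[1:]))

addrs = list(range(64))                      # sequential instruction fetch addresses
binary_toggles = toggles(addrs)              # 120 transitions over 63 steps
gray_toggles = toggles([gray(a) for a in addrs])   # 63: exactly one bit per step
print(binary_toggles, gray_toggles)
```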

Patent
Akio Shigeeda1
15 Mar 1995
TL;DR: In this paper, an electronic device for use in a computer system, and having a small second-level write-back cache, is disclosed, where the device may be implemented into a single integrated circuit, as a microprocessor unit, to include a microprocessor core, a memory controller circuit, and first and second level caches.
Abstract: An electronic device for use in a computer system, and having a small second-level write-back cache, is disclosed. The device may be implemented into a single integrated circuit, as a microprocessor unit, to include a microprocessor core, a memory controller circuit, and first and second level caches. In a system implementation, the device is connected to external dynamic random access memory (DRAM). The first level cache is a write-through cache, while the second level cache is a write-back cache that is much smaller than the first level cache. In operation, a write access that is a cache hit in the second level cache writes to the second level cache, rather than to DRAM, thus saving a wait state. A dirty bit is set for each modified entry in the second level cache. Upon the second level cache being full of modified data, a cache flush to DRAM is automatically performed. In addition, each entry of the second level cache is flushed to DRAM upon each of its byte locations being modified. The computer system may also include one or more additional integrated circuit devices, such as a direct memory access (DMA) circuit and a bus bridge interface circuit for bidirectional communication with the microprocessor unit. The microprocessor unit may also include handshaking control to prohibit configuration register updating when a memory access is in progress or is imminent. The disclosed microprocessor unit also includes circuitry for determining memory bank size and memory address type.

Patent
31 Aug 1995
TL;DR: In this article, a superscalar microprocessor employing a way prediction structure is provided, which predicts a way of an associative cache in which an access will hit, and causes the data bytes from the predicted way to be conveyed as the output of the cache.
Abstract: A superscalar microprocessor employing a way prediction structure is provided. The way prediction structure predicts a way of an associative cache in which an access will hit, and causes the data bytes from the predicted way to be conveyed as the output of the cache. The typical tag comparisons to the request address are bypassed for data byte selection, causing the access time of the associative cache to be substantially the access time of the direct-mapped way prediction array within the way prediction structure. Also included in the way prediction structure is a way prediction control unit configured to update the way prediction array when an incorrect way prediction is detected. The clock cycle of the superscalar microprocessor including the way prediction structure with its caches may be increased if the cache access time is limiting the clock cycle. Additionally, the associative cache may be retained in the high frequency superscalar microprocessor (which might otherwise employ a direct-mapped cache for access time reasons). Single clock cycle cache access to an associative data cache is maintained for high frequency operation.
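
A behavioral sketch of way prediction on the read path, under assumed parameters (a 2-way set-associative cache and a one-entry-per-set prediction array): data from the predicted way is returned immediately, the full tag compare follows as a check, and a misprediction both corrects the result and updates the predictor. This illustrates the control flow only, not the patented circuit or its timing.

```python
class WayPredictedCache:
    """2-way set-associative cache with a per-set way predictor (sketch)."""

    def __init__(self, num_sets):
        self.ways = [[None] * num_sets for _ in range(2)]  # each entry: (tag, data) or None
        self.predicted_way = [0] * num_sets                # way prediction array

    def read(self, set_index, tag):
        """Return (data, predicted_correctly); data is None on a cache miss."""
        pred = self.predicted_way[set_index]
        line = self.ways[pred][set_index]
        if line is not None and line[0] == tag:
            return line[1], True                 # fast path: prediction verified by the tag
        other = 1 - pred                         # slow path: check the other way
        line = self.ways[other][set_index]
        if line is not None and line[0] == tag:
            self.predicted_way[set_index] = other   # update predictor on misprediction
            return line[1], False
        return None, False                       # miss in both ways

cache = WayPredictedCache(num_sets=4)
cache.ways[1][2] = ("tagA", 0xBEEF)              # preload way 1, set 2
print(cache.read(2, "tagA"))                     # (48879, False): mispredicted, predictor updated
print(cache.read(2, "tagA"))                     # (48879, True): now predicted correctly
```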

Proceedings ArticleDOI
01 May 1995
TL;DR: This paper presents the design and evaluation of a fast address generation mechanism capable of eliminating the delays caused by effective address calculation for many loads and stores; the mechanism also responds well to software support, in many cases providing better program speedups and reducing cache bandwidth requirements.
Abstract: For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the design and evaluation of a fast address generation mechanism capable of eliminating the delays caused by effective address calculation for many loads and stores. Our approach works by predicting early in the pipeline (part of) the effective address of a memory access and using this predicted address to speculatively access the data cache. If the prediction is correct, the cache access is overlapped with non-speculative effective address calculation. Otherwise, the cache is accessed again in the following cycle, this time using the correct effective address. The impact on the cache access critical path is minimal; the prediction circuitry adds only a single OR operation before cache access can commence. In addition, verification of the predicted effective address is completely decoupled from the cache access critical path. Analyses of program reference behavior and subsequent performance analysis of this approach show that this design is a good one, servicing enough accesses early enough to result in speedups for all the programs we tested. Our approach also responds well to software support, which can significantly reduce the number of mispredicted effective addresses, in many cases providing better program speedups and reducing cache bandwidth requirements.
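
The "single OR operation" trick can be shown concretely under a simplified model: when the base register has enough trailing zero bits relative to the offset, base OR offset equals base + offset, so the OR result can index the cache a cycle early and the full add only verifies it. The snippet below illustrates that idea; it is not the paper's exact prediction circuitry.

```python
def predict_effective_address(base, offset):
    """Fast prediction of base + offset using a single OR (no carry chain)."""
    return base | offset

def access(base, offset):
    predicted = predict_effective_address(base, offset)  # used to index the cache early
    actual = base + offset                               # full add, computed in parallel
    if predicted == actual:
        return actual, "speculative cache access was correct"
    return actual, "misprediction: cache accessed again with the correct address"

# Typical case: an aligned base whose zero low bits absorb the offset.
print(access(0x1000, 0x24))   # OR == ADD, prediction succeeds
# A carry out of the overlapping bits makes OR differ from ADD.
print(access(0x10FC, 0x24))   # prediction fails, access replayed
```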

Patent
Asit Dan1, Dinkar Sitaram1
31 Jul 1995
TL;DR: In this article, a system and method for caching sequential data streams in a cache storage device is presented, where, for each information stream, a determination is made as to whether its data blocks should be discarded from the cache as they are read by a consuming process.
Abstract: A system and method for caching sequential data streams in a cache storage device. For each information stream, a determination is made as to whether its data blocks should be discarded from the cache as they are read by a consuming process. Responsive to a determination that the data blocks of a stream should be discarded from the cache as they are read by the consuming process, the data blocks associated with that stream are cached in accordance with an interval caching algorithm. Alternatively, responsive to a determination that the data blocks of a stream should not be discarded from the cache storage device as they are read by the consuming process, the data blocks of that stream are cached in accordance with a segment caching algorithm.
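
A minimal sketch of the dispatch described above, with the stream classification taken as an assumed input, plus a back-of-the-envelope view of what interval caching retains (the blocks between a reader and the reader following it, so the follower's reads become cache hits). Both pieces are illustrations of the idea, not the patented algorithms.

```python
def choose_policy(discard_after_read):
    """Dispatch described above: discardable streams use interval caching,
    streams whose blocks will be revisited use segment caching."""
    return "interval" if discard_after_read else "segment"

def interval_cache_blocks(read_positions):
    """For one sequential stream read by several consumers, interval caching keeps
    the blocks between each consumer and the one following it.

    read_positions: current block offsets of concurrent readers, ascending (assumed)."""
    return sum(lead - follow for follow, lead in zip(read_positions, read_positions[1:]))

print(choose_policy(discard_after_read=True))   # 'interval'
print(interval_cache_blocks([100, 130, 220]))   # 30 + 90 = 120 blocks to retain
```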

Proceedings ArticleDOI
02 Oct 1995
TL;DR: A combination of bypassing and register caching is proposed, taking advantage of register values that are bypassed within a processor's pipeline, and supplementing the bypassed values with values supplied by a small register cache to meet a fast target cycle time.
Abstract: VLIW, multi-context, or windowed-register architectures may require one hundred or more processor registers. It can be difficult to design a register file with so many registers that meets processor cycle time requirements. We propose to resolve this problem by taking advantage of register values that are bypassed within a processor's pipeline, and supplementing the bypassed values with values supplied by a small register cache. If the register cache is sufficiently small then it can be designed to meet a fast target cycle time. We call this combination of bypassing and register caching the register scoreboard and cache. We develop a simple performance model and show by simulations that it can be effective for windowed-register architectures.

Patent
23 Oct 1995
TL;DR: In this article, a two-level cache data structure and associated methods are implemented with a RAID controller to reduce the overhead of the RAID controller in determining which blocks are present in the lower level cache.
Abstract: Methods and associated data structures operable in a RAID subsystem to improve I/O performance. A two level cache data structure and associated methods are implemented with a RAID controller. The lower level cache comprises buffers holding recently utilized blocks of the disk devices. The upper level cache records which blocks are present in the lower level cache for each stripe in the RAID level 5 configuration. The upper level cache serves to reduce the overhead processing required of the RAID controller to determine which blocks are present in the lower level cache. Having more rapid access to this information by lowering the processing overhead enables the present invention to rapidly select between different write techniques to post data and error blocks from low level cache to the disk array. A RMW write technique is used to post data and error checking blocks to disk when insufficient information resides in the lower level cache. A faster Full Write technique (also referred to as Stripe Write) is used to post data and error checking blocks to disk when all required, related blocks are resident in the lower level cache. The Full Write technique reduces the total number of I/O operations required of the disk devices to post the update as compared to the RMW technique. The two level cache of the present invention enables a rapid selection between the RMW and Full Write techniques.
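
The write-path decision reduces to a membership test against the lower-level cache, which the per-stripe upper-level cache makes cheap. The sketch below shows that selection and the resulting disk-I/O counts for an assumed RAID-5 stripe of four data blocks plus parity; it illustrates the trade-off, not the patented implementation, and the single-block RMW cost is the standard textbook figure.

```python
def choose_write_technique(stripe_blocks, cached_blocks):
    """Pick Full (Stripe) Write when every data block of the stripe is resident
    in the lower-level cache, otherwise fall back to Read-Modify-Write (RMW).

    Returns (technique, disk_ios) under the assumed stripe layout."""
    if set(stripe_blocks) <= cached_blocks:
        # Full Write: write all data blocks plus the newly computed parity block.
        return "full_write", len(stripe_blocks) + 1
    # RMW for a single updated block: read old data + old parity,
    # then write new data + new parity.
    return "rmw", 4

stripe = ["d0", "d1", "d2", "d3"]                                  # 4 data blocks + parity
print(choose_write_technique(stripe, {"d0", "d1", "d2", "d3"}))    # ('full_write', 5)
print(choose_write_technique(stripe, {"d0", "d2"}))                # ('rmw', 4)
```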

Patent
18 Apr 1995
TL;DR: In this article, the authors propose a self-recovery mechanism for errors in the associated cache directory or the shared cache itself by invalidating all the entries in the cache directory of the accessed congruence class by resetting Valid bits to "0" and setting the Parity bit to a correct value.
Abstract: A highly available shared cache memory in a tightly coupled multiprocessor system provides an error self-recovery mechanism for errors in the associated cache directory or the shared cache itself. After an error in a congruence class of the cache is indicated by an error status register, self-recovery is accomplished by invalidating all the entries in the shared cache directory means of the accessed congruence class by resetting Valid bits to '0' and by setting the Parity bit to a correct value, wherein the request for data to the main memory is not cancelled. Multiple bit failures in the cached data are recovered by setting the Valid bit in the matching column to '0'. The processor reissues the request for data, which is loaded into the processor's private cache and the shared cache as well. Further requests to this data by other processors are served by the shared cache.

Patent
31 Oct 1995
TL;DR: In this article, the authors present a method and apparatus for controlling multiple cache memories with a single cache controller using a processor to control the operation of its on-chip level one cache memory and a level two cache memory.
Abstract: A method and apparatus for controlling multiple cache memories with a single cache controller. The present invention uses a processor to control the operation of its on-chip level one (L1) cache memory and a level two (L2) cache memory. In this manner, the processor is able to send operations to be performed to the L2 cache memory, such as writing state and/or cache line status to the L2 cache memory. A dedicated bus is coupled between dice. This dedicated bus is used to send control and other signals between the processor and the L2 cache memory.

Patent
03 Mar 1995
TL;DR: In this paper, the authors propose a distributed shared cache that operates at the level of a second-level memory cache, at the third level of the memory system, and at the purely software-managed page cache level.
Abstract: A distributed-shared cache operates at the level of a second-level memory cache, at the third level of the memory system, and at the purely software-managed page cache level. On a cache miss that is local to a processor, an attempt is made to locate the data in a cache memory block on a peer memory level, before explicitly requesting the data from more distant memory. Communication support is integrated into the memory system to piggyback communication performance improvements on improvements to the memory system. In particular, the cache lines can operate in a message mode to deliver message data to interested receivers to support networking and devices. Embodiments of the invention work across all memory levels with only modest changes in detail.

Patent
07 Jun 1995
TL;DR: In this paper, a power monitoring device places a write-back cache memory into a write-through mode upon detection of a low-battery condition or a user request, which can save significant power during typical portable computer operations.
Abstract: A computer system having a power monitoring device places a write-back cache memory into a write-through mode upon detection of a low-battery condition or a user request. Under write-through mode, the cache memory need not be flushed every time a suspend mode of the computer system is entered. Thus significant power is saved during typical portable computer operations.

Patent
23 Aug 1995
TL;DR: In this paper, prefetching of cache lines is performed in a progressive manner in a data processing system implementing L1 and L2 caches and stream filters and buffers; in one mode, data may not be prefetched.
Abstract: Within a data processing system implementing L1 and L2 caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, data may not be prefetched. In a second mode, two cache lines are prefetched wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. In the third mode cache lines may be prefetched to the L1 cache and not the L2 cache, resulting in no inclusion between the L1 and L2 caches. A directory field entry provides an indication of whether or not a particular cache line in the L1 cache is also included in the L2 cache.
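
The three-mode progression can be written down as a small state sketch: no prefetching until a stream is suspected, then a conservative two-line prefetch (one line toward L1, one into a stream buffer), then deeper prefetching once the stream is confirmed. This is a behavioral illustration of the modes named above; the promotion thresholds and prefetch depth are assumptions, not the patented control logic.

```python
class ProgressivePrefetcher:
    """Mode 0: no prefetch. Mode 1: prefetch 2 lines (one toward L1, one into a
    stream buffer). Mode 2: prefetch deeper. Thresholds/depth are assumptions."""

    def __init__(self, depth_in_mode2=4):
        self.mode = 0
        self.hits_on_stream = 0
        self.depth = depth_in_mode2

    def on_access(self, line_addr, continues_stream):
        """Return the list of (line, destination) prefetches to issue."""
        if continues_stream:
            self.hits_on_stream += 1
            if self.mode == 0:
                self.mode = 1                  # stream suspected: start prefetching
            elif self.mode == 1 and self.hits_on_stream >= 3:
                self.mode = 2                  # stream confirmed: prefetch deeper
        else:
            self.mode, self.hits_on_stream = 0, 0

        if self.mode == 0:
            return []                          # no prefetch
        if self.mode == 1:
            return [(line_addr + 1, "L1"), (line_addr + 2, "stream_buffer")]
        return [(line_addr + i, "L1") for i in range(1, self.depth + 1)]

pf = ProgressivePrefetcher()
for addr in [100, 101, 102, 103]:              # a sequential stream of cache lines
    issued = pf.on_access(addr, continues_stream=True)
    print(f"mode={pf.mode} prefetch={issued}")
```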

Patent
04 Aug 1995
TL;DR: In this article, a method and apparatus for instruction refetching in a processor is described, where a marker micro instruction is inserted into the processor pipeline when an instruction cache line is victimized.
Abstract: A method and apparatus for instruction refetch in a processor is provided. To ensure that a macro instruction is available for refetching after the processor has handled an event or determined a correct restart address after a branch misprediction, an instruction memory includes an instruction cache for caching macro instructions to be fetched, and a victim cache for caching victims from the instruction cache. To ensure the availability of a macro instruction for refetching, the instruction memory (the instruction cache and victim cache together) always stores a macro instruction that may need to be refetched until the macro instruction is committed to architectural state. A marker micro instruction is inserted into the processor pipeline when an instruction cache line is victimized. The marker specifies an entry in the victim cache occupied by the victimized cache line. When the marker instruction is committed to architectural state, the victim cache entry specified by the marker is deallocated in the victim cache to permit storage of other instruction cache victims.

Patent
06 Nov 1995
TL;DR: In this article, control circuitry coupled to a stream filter circuit selectively controls fetching and prefetching of data from system memory to the primary and secondary caches associated with a processor and to a stream buffer circuit.
Abstract: Within a data processing system implementing primary and secondary caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, data may not be prefetched. In a second mode, two cache lines are prefetched wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. Prefetching may be performed on cache misses or hits. Cache misses on successive cache lines may allocate a stream of cache lines to the stream buffers. Control circuitry, coupled to a stream filter circuit, selectively controls fetching and prefetching of data from system memory to the primary and secondary caches associated with a processor and to a stream buffer circuit.

Patent
23 Aug 1995
TL;DR: In this article, prefetching of cache lines is performed in a progressive manner in a data processing system implementing L1 and L2 caches and stream filters and buffers; in one mode, data may not be prefetched.
Abstract: Within a data processing system implementing L1 and L2 caches and stream filters and buffers, prefetching of cache lines is performed in a progressive manner. In one mode, data may not be prefetched. In a second mode, two cache lines are prefetched wherein one line is prefetched into the L1 cache and the next line is prefetched into a stream buffer. In a third mode, more than two cache lines are prefetched at a time. In the third mode cache lines may be prefetched to the L1 cache and not the L2 cache, resulting in no inclusion between the L1 and L2 caches.

Proceedings ArticleDOI
23 Apr 1995
TL;DR: The investigation is extended to analyze the energy effects of cache parameters in a multi-level cache design, based on execution of SPECint92 benchmark programs and the miss ratios of a RISC processor.
Abstract: To optimize performance and power of a processor’s cache, a multiple-divided module (MDM) cache architecture is proposed to save power at memory peripherals as well as the bit array. For an M×B-divided MDM cache, latency is equivalent to that of the smallest module and power consumption is only 1/(M×B) of the regular, non-divided cache. Based on the architecture and given transistor budgets for on-chip processor caches, this paper extends investigation to analyze energy effects from cache parameters in a multi-level cache design. The analysis is based on execution of SPECint92 benchmark programs with miss ratios of a RISC processor.

Patent
13 Mar 1995
TL;DR: In this paper, a cache controller for a system having first and second level cache memories is presented, where a stack of registers coupled to the address pipeline is used to perform multiple line replacements of the first level cache memory without interfering with current first level look-ups.
Abstract: A cache controller for a system having first and second level cache memories. The cache controller has multiple stage address and data pipelines. A look-up system allows concurrent look-up of tag addresses in the first and second level caches using the address pipeline. The multiple stages allow a miss in the first level cache to be moved to the second stage so that the latency does not slow the look-up of a next address in the first level cache. A write data pipeline allows the look-up of data being written to the first level cache for current read operations. A stack of registers coupled to the address pipeline is used to perform multiple line replacements of the first level cache memory without interfering with current first level cache look-ups. Multiple banks associated with a multiple set associative cache are stored in a single chip, reducing the number of SRAMs required. Certain status information for the second level (L2) cache is stored with the status information of the first level cache. This enhances the speed of operations by avoiding a status look-up and modification in the L2 cache during a write operation. In addition, the L2 cache tag address and status bits are stored in a portion of one bank of the L2 data RAMs, further reducing the number of SRAMs required. Finally, the present invention also provides local read-write storage for use by the processor by reserving a number of L2 cache lines.

Journal ArticleDOI
21 May 1995
TL;DR: In this paper, the performance of 3-D-based RISC-systems is investigated and a model based on measured miss rates and on an analytical access time model is used.
Abstract: In this paper, potential performance improvements of the memory hierarchy of RISC-systems for implementations employing 3-D-technology are investigated. Relating to RISC-systems, 3-D ICs will offer the opportunity for integrating much more memory on-chip (i.e. on one IC or 3-D IC with the processor). As a result, the second-level cache may be moved on-chip. The available on-chip cache may alternatively be organized in three levels. Investigations were also performed for the case of the main memory being integrated on-chip. Current restrictions of conventional RISC-system implementations, such as limited available transistor count for on-chip caching, confined data bus width between processor-chip and the off-chip second-level cache, long access times of the second-level cache, strongly limit the achievable performance of the memory hierarchy and may be either removed or at least substantially reduced by the use of 3-D ICs. To evaluate the performance improvements of implementations employing 3-D ICs, a model based on measured miss rates and on an analytical access time model is used. The average time per-instruction is employed as the performance measure. Results of extensive case studies indicate, that substantial performance improvements depending on implementation, cache sizes, cache organization, and miss rates are achievable using 3-D ICs. A comparison of four optimized implementations all with a total cache size of approximately 1 MB yielded performance improvements in the range of 23% to 31% for the implementations employing 3-D-technology over the conventionally implemented system. It is concluded that 3-D-technology will be very attractive for future high performance RISC-systems, since the system performance depends vitally on the performance of the memory hierarchy.
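
The performance measure used above can be approximated with a textbook-style model: average time per instruction is the base (CPI times cycle time) term plus, for each cache level, the references that reach it weighted by its miss rate and the penalty of going one level further out. The sketch below uses placeholder numbers, not the paper's measured miss rates or its exact access-time model.

```python
def avg_time_per_instruction(cycle_ns, base_cpi, refs_per_instr, levels):
    """levels: list of (local_miss_rate, miss_penalty_ns) from L1 outward; each
    miss rate is relative to the accesses that reach that level.
    A simplified average-time-per-instruction model (assumed form)."""
    t = base_cpi * cycle_ns
    reach = refs_per_instr            # memory references per instruction reaching L1
    for miss_rate, penalty_ns in levels:
        t += reach * miss_rate * penalty_ns   # misses at this level pay the next level's penalty
        reach *= miss_rate                    # only the misses proceed outward
    return t

# Placeholder numbers: 5 ns cycle, base CPI 1.2, 1.3 refs/instruction,
# L1 missing 5% of accesses at 20 ns penalty, L2 missing 20% at 120 ns.
print(avg_time_per_instruction(5.0, 1.2, 1.3, [(0.05, 20.0), (0.20, 120.0)]))  # 8.86 ns
```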