Showing papers on "Cache invalidation published in 1994"

PDF

Open Access

Proceedings Article•DOI•

Sleepers and workaholics: caching strategies in mobile environments

[...]

Daniel Barbará¹, Tomasz Imielinski²•Institutions (2)

Princeton University¹, Rutgers University²

24 May 1994

TL;DR: A taxonomy of different cache invalidation strategies is proposed and it is determined that for the units which are often disconnected (sleepers) the best cache invalidations strategy is based on signatures previously used for efficient file comparison, and for units which is connected most of the time (workaholics), the best Cache invalidation strategy isbased on the periodic broadcast of changed data items.

...read moreread less

Abstract: In the mobile wireless computing environment of the future a large number of users equipped with low powered palm-top machines will query databases over the wireless communication channels. Palmtop based units will often be disconnected for prolonged periods of time due to the battery power saving measures; palmtops will also frequencly relocate between different cells and connect to different data servers at different times. Caching of frequently accessed data items will be an important technique that will reduce contention on the narrow bandwidth wireless channel. However, cache invalidation strategies will be severely affected by the disconnection and mobility of the clients. The server may no longer know which clients are currently residing under its cell and which of them are currently on. We propose a taxonomy of different cache invalidation strategies and study the impact of client's disconnection times on their performance. We determine that for the units which are often disconnected (sleepers) the best cache invalidation strategy is based on signatures previously used for efficient file comparison. On the other hand, for units which are connected most of the time (workaholics), the best cache invalidation strategy is based on the periodic broadcast of changed data items.

...read moreread less

454 citations

Journal Article•DOI•

False sharing and spatial locality in multiprocessor caches

[...]

Josep Torrellas¹, H.S. Lam², John L. Hennessy²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Stanford University²

01 Jun 1994-IEEE Transactions on Computers

TL;DR: To mitigate false sharing and to enhance spatial locality, the layout of shared data in cache blocks is optimized in a programmer-transparent manner and it is shown that this approach can reduce the number of misses on shared data by about 10% on average.

...read moreread less

Abstract: The performance of the data cache in shared-memory multiprocessors has been shown to be different from that in uniprocessors. In particular, cache miss rates in multiprocessors do not show the sharp drop typical of uniprocessors when the size of the cache block increases. The resulting high cache miss rate is a cause of concern, since it can significantly limit the performance of multiprocessors. Some researchers have speculated that this effect is due to false sharing, the coherence transactions that result when different processors update different words of the same cache block in an interleaved fashion. While the analysis of six applications in the paper confirms that false sharing has a significant impact on the miss rate, the measurements also show that poor spatial locality among accesses to shared data has an even larger impact. To mitigate false sharing and to enhance spatial locality, we optimize the layout of shared data in cache blocks in a programmer-transparent manner. We show that this approach can reduce the number of misses on shared data by about 10% on average. >

...read moreread less

265 citations

Patent•

Method and apparatus for managing relational data in an object cache

[...]

Richard H. Jensen, Derek P. Henninger

29 Jul 1994

TL;DR: In this article, a technique called key swizzling is proposed to reduce the volume of queries to the structured database by using explicit relationship pointers between object instances in the object cache.

...read moreread less

Abstract: In an object-oriented application being executed in a digital computing system comprising a processor, a method and apparatus are provided for managing information retrieved from a structured database, such as a relational database, wherein the processor is used to construct a plurality of object instances, each of these object instances having its own unique object ID that provides a mapping between the object instance and at least one row in the structured database. The processor is used to construct a single cohesive data structure, called an object cache, that comprises all the object instances and that represents information retrieved from the structured database in a form suitable for use by one or more object-oriented applications. A mechanism for managing the object cache is provided that has these three properties: First, through a technique called key swizzling, it uses explicit relationship pointers between object instances in the object cache to reduce the volume of queries to the structured database. Second, it ensures that only one copy of an object instance is in the cache at any given time, even if several different queries return the same information from the database. Third, the mechanism guarantees the integrity of data in the cache by locking data appropriately in the structured database during a database transaction, flushing cache data at the end of each transaction, and transparently re-reading the data and reacquiring the appropriate locks for an object instance whose data has been flushed.

...read moreread less

242 citations

Proceedings Article•DOI•

Bounding worst-case instruction cache performance

[...]

Arnold¹, Mueller¹, Whalley¹, Harmon•Institutions (1)

Florida State University¹

07 Dec 1994

TL;DR: This paper describes an approach for bounding the worst-case instruction cache performance of large code segments by using static cache simulation to analyze a program's control flow to statically categorize the caching behavior of each instruction.

...read moreread less

Abstract: The use of caches poses a difficult tradeoff for architects of real-time systems. While caches provide significant performance advantages, they have also been viewed as inherently unpredictable, since the behavior of a cache reference depends upon the history of the previous references. The use of caches is only suitable for real-time systems if a reasonably tight bound on the performance of programs using cache memory can be predicted. This paper describes an approach for bounding the worst-case instruction cache performance of large code segments. First, a new method called static cache simulation is used to analyze a program's control flow to statically categorize the caching behavior of each instruction. A timing analyzer, which uses the categorization information, then estimates the worst-case instruction cache performance for each loop and function in the program. >

...read moreread less

233 citations

Proceedings Article•

Cache Conscious Algorithms for Relational Query Processing

[...]

Ambuj Shatdal¹, Chander Kant¹, Jeffrey F. Naughton¹•Institutions (1)

University of Wisconsin-Madison¹

12 Sep 1994

TL;DR: It is shown that there are significant benefits in redesigning traditional query processing algorithms so that they can make better use of the cache, and new algorithms run 8%-200% faster than the traditional ones.

...read moreread less

Abstract: The current main memory (DRAM) access speeds lag far behind CPU speeds Cache memory, made of static RAM, is being used in today’s architectures to bridge this gap It provides access latencies of 2-4 processor cycles, in contrast to main memory which requires 15-25 cycles Therefore, the performance of the CPU depends upon how well the cache can be utilized We show that there are significant benefits in redesigning our traditional query processing algorithms so that they can make better use of the cache The new algorithms run 8%-200% faster than the traditional ones

...read moreread less

215 citations

Proceedings Article•DOI•

Tradeoffs in two-level on-chip caching

[...]

Norman P. Jouppi, Steven J. E. Wilton¹•Institutions (1)

University of Toronto¹

01 Apr 1994

TL;DR: Two-level exclusive caching improves the performance of two-level caching organizations by increasing the effective associativity and capacity.

...read moreread less

Abstract: The performance of two-level on-chip caching is investigated for a range of technology and architecture assumptions. The area and access time of each level of cache is modeled in detail. The results indicate that for most workloads, two-level cache configurations (with a set-associative second level) perform marginally better than single-level cache configurations that require the same chip area once the first-level cache sizes are 64KB or larger. Two-level configurations become even more important in systems with no off-chip cache and in systems in which the memory cells in the first-level caches are multiported and hence larger than those in the second-level cache. Finally, a new replacement policy called two-level exclusive caching is introduced. Two-level exclusive caching improves the performance of two-level caching organizations by increasing the effective associativity and capacity.

...read moreread less

195 citations

Journal Article•DOI•

Avoiding conflict misses dynamically in large direct-mapped caches

[...]

Brian N. Bershad¹, Dennis Lee¹, Theodore H. Romer¹, J. Bradley Chen²•Institutions (2)

University of Washington¹, Carnegie Mellon University²

01 Nov 1994

TL;DR: Using trace-driven simulation of applications and the operating system, it is shown that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set associative cache of equivalent size and speed, although with lower hardware cost and complexity.

...read moreread less

Abstract: This paper describes a method for improving the performance of a large direct-mapped cache by reducing the number of conflict misses. Our solution consists of two components: an inexpensive hardware device called a Cache Miss Lookaside (CML) buffer that detects conflicts by recording and summarizing a history of cache misses, and a software policy within the operating system's virtual memory system that removes conflicts by dynamically remapping pages whenever large numbers of conflict misses are detected. Using trace-driven simulation of applications and the operating system, we show that a CML buffer enables a large direct-mapped cache to perform nearly as well as a two-way set associative cache of equivalent size and speed, although with lower hardware cost and complexity.

...read moreread less

187 citations

Patent•

Dynamic flow instruction cache memory organized around trace segments independent of virtual address line

[...]

Alexander D. Peleg¹, Uri Weiser¹•Institutions (1)

Intel¹

30 Mar 1994

TL;DR: In this paper, the authors propose an improved cache and organization particularly suitable for superscalar architectures, where the cache is organized around trace segments of running programs rather than an organization based on memory addresses.

...read moreread less

Abstract: An improved cache and organization particularly suitable for superscalar architectures. The cache is organized around trace segments of running programs rather than an organization based on memory addresses. A single access to the cache memory may cross virtual address line boundaries. Branch prediction is integrally incorporated into the cache array permitting the crossing of branch boundaries with a single access.

...read moreread less

186 citations

Proceedings Article•

Application-controlled file caching policies

[...]

Pei Cao¹, Edward W. Felten¹, Kai Li¹•Institutions (1)

Princeton University¹

06 Jun 1994

TL;DR: The main contribution of this paper is the solution to the allocation problem, which allows processes to manage their own cache blocks, while at the same time maintains the dynamic allocation of cache blocks among processes.

...read moreread less

Abstract: We consider how to improve the performance of file caching by allowing user-level control over file cache replacement decisions. We use two-level cache management: the kernel allocates physical pages to individual applications (allocation), and each application is responsible for deciding how to use its physical pages (replacement). Previous work on two-level memory management has focused on replacement, largely ignoring allocation. The main contribution of this paper is our solution to the allocation problem. Our solution allows processes to manage their own cache blocks, while at the same time maintains the dynamic allocation of cache blocks among processes. Our solution makes sure that good user-level policies can improve the file cache hit ratios of the entire system over the existing replacement approach. We evaluate our scheme by trace-based simulation, demonstrating that it leads to significant improvements in hit ratios for a variety of applications.

...read moreread less

138 citations

Software-based cache partitioning for real-time applications

[...]

Andrew Wolfe

01 Mar 1994

124 citations

Patent•

Performance enhancement system and method for a hierarchical data cache using a RAID parity scheme

[...]

Brent Cameron Beardsley¹, Joel H. Cord¹, S. Hyde Ii Joseph¹, Vernon J. Legvold¹, Carol Santich Michod¹, Gary E. Morain¹, Chan Y. Ng¹, John R. Paveza¹, Lloyd R. Shipman¹ - Show less +5 more•Institutions (1)

IBM¹

20 Jun 1994

TL;DR: In this paper, a cache storage drawer containing a plurality of DASD devices for implementing a RAID parity data protection scheme, and permanently storing data, is coupled with a cache controller.

...read moreread less

Abstract: A system and method for reducing device wait time in response to a host initiated write operation modifying a data block. The system includes a host computer channel connected to a storage controller which has cache memory and a nonvolatile storage buffer in a first embodiment. An identical system makes up the second embodiment with the exception that there is no nonvolatile storage buffer in the storage controller of the second embodiment. The controller in either embodiment is coupled to a cache storage drawer containing a plurality of DASD devices for implementing a RAID parity data protection scheme, and for permanently storing data. The drawer has nonvolatile cache memory which is used for accepting data destaged from controller cache. In a first embodiment, no commit reply is sent to the controller to indicate that data has been written to DASD. Instead a status information block is created to indicate that the data has been destaged from controller cache but is not committed. The status information is stored in directory means attached to the controller. The system uses this information to create a list of data which is in the state of Not committed. In this way data can be committed according to a cache management algorithm of least recently used (LRU), rather than requiring synchronous commit which is inefficient because it requires waiting on a commit response and ties up nonvolatile storage space allocated to back-up copies of cache data. In a second embodiment, directory means attached to the controller stores information about status blocks that may be modified or unmodified. The status information is used to eliminate wait times associated with waiting for data to be written to HDAs below.

...read moreread less

Patent•

Process of predicting and controlling the use of cache memory in a computer system

[...]

Michael S. Milillo¹, Patrick A. L. De Martine¹•Institutions (1)

Storage Technology Corporation¹

30 Dec 1994

TL;DR: In this article, the cache memory space in a computer system is controlled on a dynamic basis by adjusting the low threshold which triggers the release of more cache free space and the high threshold which ceases the free space.

...read moreread less

Abstract: The cache memory space in a computer system is controlled on a dynamic basis by adjusting the low threshold which triggers the release of more cache free space and by adjusting the high threshold which ceases the release of free space The low and high thresholds are predicted based on the number of allocations which are accomplished in response to I/O requests, and based on the number of blockages which occur when an allocation can not be accomplished The predictions may be based on weighted values of different historical time periods, and the high and low thresholds may be made equal to one another In this manner the performance degradation resulting from variations in workload caused by prior art fixed or static high and low thresholds is avoided Instead only a predicted amount of cache memory space is freed and that amount of free space is more likely to accommodate the predicted output requests without releasing so much cache space that an unacceptable number of blockages occur

...read moreread less

Patent•

System for dynamically controlling cache manager maintaining cache index and controlling sequential data access

[...]

Moshe Yanai¹, Natan Vishlitzky¹, Bruno Alterescu¹, Daniel Castel¹•Institutions (1)

EMC Corporation¹

12 Dec 1994

TL;DR: In this paper, a cache indexer maintains a current index of data elements which are stored in cache memory and a sequential data access indicator, responsive to the cache index and to a user selectable sequential access threshold, determines that a sequential access is in progress for a given process and provides an indication of the same.

...read moreread less

Abstract: A cache management system and method monitors and controls the contents of cache memory coupled to at least one host and at least one data storage device. A cache indexer maintains a current index of data elements which are stored in cache memory. A sequential data access indicator, responsive to the cache index and to a user selectable sequential data access threshold, determines that a sequential data access is in progress for a given process and provides an indication of the same. The system and method allocate a micro-cache memory to any process performing a sequential data access. In response to the indication of a sequential data access in progress and to a user selectable maximum number of data elements to be prefetched, a data retrieval requestor requests retrieval of up to the selected maximum number of data elements from a data storage device. A user selectable number of sequential data elements determines when previously used micro-cache memory locations will be overwritten. A method of dynamically monitoring and adjusting cache management parameters is also presented.

...read moreread less

Patent•

Cache memory system and method with multiple hashing functions and hash control storage

[...]

Anant Agarwal¹, Steven D. Pudar¹•Institutions (1)

Massachusetts Institute of Technology¹

23 Dec 1994

TL;DR: In this paper, a column-associative cache that reduces conflict misses, increases the hit rate and maintains a minimum hit access time is proposed, where the cache lines represent a column of sets.

...read moreread less

Abstract: A column-associative cache that reduces conflict misses, increases the hit rate and maintains a minimum hit access time. The column-associative cache indexes data from a main memory into a plurality of cache lines according to a tag and index field through hash and rehash functions. The cache lines represent a column of sets. Each cache line contains a rehash block indicating whether the set is a rehash location. To increase the performance of the column-associative cache, a content addressable memory (CAM) is used to predict future conflict misses.

...read moreread less

Patent•

Method and structure for evaluating and enhancing the performance of cache memory systems

[...]

Michael A. Salsburg

13 Jul 1994

TL;DR: In this article, an analytical model for calculating cache hit rate for combinations of data sets and LRU sizes is presented. But the model is not directly applied in software for constructing a precise model that can be used to predict cache hit rates for a cache, using statistics accumulated for each element independently.

...read moreread less

Abstract: Method and structure for collecting statistics for quantifying locality of data and thus selecting elements to be cached, and then calculating the overall cache hit rate as a function of cached elements. LRU stack distance has a straight-forward probabilistic interpretation and is part of statistics to quantify locality of data for each element considered for caching. Request rates for additional slots in the LRU are a function of file request rate and LRU size. Cache hit rate is a function of locality of data and the relative request rates for data sets. Specific locality parameters for each data set and arrival rate of requests for data-sets are used to produce an analytical model for calculating cache hit rate for combinations of data sets and LRU sizes. This invention provides algorithms that can be directly implemented in software for constructing a precise model that can be used to predict cache hit rates for a cache, using statistics accumulated for each element independently. The model can rank the elements to find the best candidates for caching. Instead of considering the cache as a whole, the average arrival rates and re-reference statistics for each element are estimated, and then used to consider various combinations of elements and cache sizes in predicting the cache hit rate. Cache hit rate is directly calculated using the to-be-cached files' arrival rates and re-reference statistics and used to rank the elements to find the set that produces the optimal cache hit rate.

...read moreread less

Proceedings Article•DOI•

A caching model of operating system kernel functionality

[...]

David R. Cheriton¹, Kenneth J. Duda¹•Institutions (1)

Stanford University¹

12 Sep 1994

TL;DR: In this article, the cache kernel is proposed to provide a hardware adaptation layer to operating system services rather than just providing a key subset of OS services, as has been the common approach in previous microkernel work.

...read moreread less

Abstract: Operating system design has had limited success in providing adequate application functionality and a poor record in avoiding excessive growth in size and complexity, especially with protected operating systems. Applications require far greater control over memory, I/O and processing resources to meet their requirements. For example, database transaction processing systems include their own "kernel" which can much better manage resources for the application than can the application-ignorant general-purpose conventional operating system mechanisms. Large-scale parallel applications have similar requirements. The same requirements arise with servers implemented outside the operating system kernel.In our research, we have been exploring the approach of making the operating system kernel a cache for active operating systems objects such as processes, address spaces and communication channels, rather than a complete manager of these objects. The resulting system is smaller than recent so-called microkernels, and also provides greater flexibility for applications, including real-time applications, database management systems and large-scale simulations. As part of this research, we have developed what we call a cache kernel, a new generation of microkernel that supports operating system configurations across these dimensions.The cache kernel can also be regarded as providing a hardware adaptation layer (HAL) to operating system services rather than trying to just provide a key subset of OS services, as has been the common approach in previous microkernel work. However, in contrast to conventional HALs, the cache kernel is fault-tolerant because it is protected from the rest of the operating system (and applications), it is replicated in large-scale configurations and it includes audit and recovery mechanisms. A cache kernel has been implemented on a scalable shared-memory and networked multi-computer [1] hardware which provides architectural support for the cache kernel approach.Fig 1 illustrates a typical target configuration. There is an instance of the cache kernel per multi-processor module (MPM), each managing the processors, second-level cache and network interface of that MPM. The cache kernel executes out of PROM and local memory of the MPM, making it hardware-independent of the rest of the system except for power. That is, the separate cache kernels and MPMs fail independently. Operating system services are provided by application kernels, server kernels and conventional operating system emulation kernels in conjunction with privileged MPM resource managers (MRM) that execute on top of the cache kernel. These kernels may be in separate protected address spaces or a shared library within a sophisticated application address space. A system bus connects the MPMs to each other and the memory modules. A high-speed network interface per MPM connects this node to file servers and other similarly configured processing nodes. This overall design can be simplified for real-time applications and similar restricted scenarios. For example, with relatively static partitioning of resources, an embedded real-time application could be structured as one or more application spaces incorporating application kernels as shared libraries executing directly on top of the cache kernel.

...read moreread less

Patent•

Stream buffers for high-performance computer memory systems

[...]

Richard E. Kessler¹, Steven M. Oberlin¹, Steven L. Scott¹, Subbarao Palacharla¹•Institutions (1)

Cray¹

01 Nov 1994

TL;DR: In this article, a filtered stream buffer coupled with a memory and a processor is proposed to prefetch data from the memory, where the filter controller determines whether a pattern of references has a predetermined relationship, and if so, prefetches stream data into the cache block storage area.

...read moreread less

Abstract: Method and apparatus for a filtered stream buffer coupled to a memory and a processor, and operating to prefetch data from the memory. The filtered stream buffer includes a cache block storage area and a filter controller. The filter controller determines whether a pattern of references has a predetermined relationship, and if so, prefetches stream data into the cache block storage area. Such stream data prefetches are particularly useful in vector processing computers, where once the processor starts to fetch a vector, the addresses of future fetches can be predicted based in the pattern of past fetches. According to various aspects of the present invention, the filtered stream buffer further includes a history table, a validity indicator which is associated with the cache block storage area and indicates which cache blocks, if any, are valid. According to yet another aspect of the present invention, the filtered stream buffer controls random access memory (RAM) chips to stream the plurality of consecutive cache blocks from the RAM into the cache block storage area. According to yet another aspect of the present invention, the stream data includes data for a plurality of strided cache blocks, wherein each of which these strided cache blocks corresponds to an address determined by adding to the first address an integer multiple of the difference between the second address and the first address. According to yet another aspect of the present invention, the processor generates three addresses of data words in the memory, and the filter controller determines whether a predetermined relationship exists among three addresses, and if so, prefetches strided stream data into said cache block storage area.

...read moreread less

Patent•

Method and apparatus for implementing a single clock cycle line replacement in a data cache unit

[...]

Haitham Akkary¹, Mandar S. Joshi¹, Rob Murray¹, Brent E. Lince¹, Paul D. Madland¹, Andrew F. Glew¹, Glenn J. Hinton¹ - Show less +3 more•Institutions (1)

Intel¹

30 Sep 1994

TL;DR: In this paper, a data cache unit is employed within a microprocessor capable of speculative and out-of-order processing of memory instructions, where each microprocessor is capable of snooping the cache lines of data cache units of each other microprocessor.

...read moreread less

Abstract: The data cache unit includes a separate fill buffer and a separate write-back buffer. The fill buffer stores one or more cache lines for transference into data cache banks of the data cache unit. The write-back buffer stores a single cache line evicted from the data cache banks prior to write-back to main memory. Circuitry is provided for transferring a cache line from the fill buffer into the data cache banks while simultaneously transferring a victim cache line from the data cache banks into the write-back buffer. Such allows the overall replace operation to be performed in only a single clock cycle. In a particular implementation, the data cache unit is employed within a microprocessor capable of speculative and out-of-order processing of memory instructions. Moreover, the microprocessor is incorporated within a multiprocessor computer system wherein each microprocessor is capable of snooping the cache lines of data cache units of each other microprocessor. The data cache unit is also a non-blocking cache.

...read moreread less

Proceedings Article•DOI•

Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio

[...]

A. Seznec

18 Apr 1994

TL;DR: A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost.

...read moreread less

Abstract: Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size. In a sectored cache, a single address tag is associated with a sector consisting on several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache lines. Usually in a cache, a cache line location is statically linked to one and only one address tag word location. In the decoupled sectored cache introduced in the paper, this monolithic association is broken; the address tag location associated with a cache line location is dynamically chosen at fetch time among several possible locations. The tag volume on a decoupled sectored cache is in the same range as the tag volume in a traditional sectored cache; but the hit ratio on a decoupled sectored cache is very close to the hit ratio on a non-sectored cache. A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost. >

...read moreread less

Patent•

Master-slave cache system for instruction and data cache memories

[...]

Earl T. Cohen, Russell W. Tilleman, Jay C. Pattin, James S. Blomgren

29 Jun 1994

TL;DR: A master-slave cache system as discussed by the authors uses a set-associative master cache and two smaller direct-mapped slave caches, a slave instruction cache for supplying instructions to an instruction pipeline of a processor, and a slave data cache for providing data operands to an execution pipeline of the processor.

...read moreread less

Abstract: A master-slave cache system has a large, set-associative master cache, and two smaller direct-mapped slave caches, a slave instruction cache for supplying instructions to an instruction pipeline of a processor, and a slave data cache for supplying data operands to an execution pipeline of the processor. The master cache and the slave caches are tightly coupled to each other. This tight coupling allows the master cache to perform most cache management operations for the slave caches, freeing the slave caches to supply a high bandwidth of instructions and operands to the processor's pipelines. The master cache contains tags that include valid bits for each slave, allowing the master cache to determine if a line is present and valid in either of the slave caches without interrupting the slave caches. The master cache performs all search operations required by external snooping, cache invalidation, cache data zeroing instructions, and store-to-instruction-stream detection. The master cache interrupts the slave caches only when the search reveals that a line is valid in a slave cache, the master cache causing the slave cache to invalidate the line. A store queue is shared between the master cache and the slave data cache. Store data is written from the store queue directly in to both the slave data cache and the master cache, eliminating the need for the slave data cache to write data through to the master cache. The master-slave cache system also eliminates the need for a second set of address tags for snooping and coherency operations. The master cache can be large and designed for a low miss rate, while the slave caches are designed for the high speed required by the processor's pipelines.

...read moreread less

Proceedings Article•DOI•

Decoupled sectored caches: conciliating low tag implementation cost

[...]

André Seznec

01 Apr 1994

TL;DR: The decoupled sectored cache introduced in this paper will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost.

...read moreread less

Abstract: Sectored caches have been used for many years in order to reconcile low tag array size and small or medium block size In a sectored cache, a single address tag is associated with a sector consisting on several cache lines, while validity, dirty and coherency tags are associated with each of the inner cache linesMaintaining a low tag array size is a major issue in many cache designs (eg L2 caches) Using a sectored cache is a design trade-off between a low size of the tag array which is possible with large line size and a low memory traffic which requires a small line sizeThis technique has been used in many cache designs including small on-chip microprocessor caches and large external second level caches Unfortunately, as on some applications, the miss ratio on a sectored cache is significantly higher than the miss ratio on a non-sectored cache (factors higher than two are commonly observed), a significant part of the potential performance may be wasted in miss penaltiesUsually in a cache, a cache line location is statically linked to one and only one address tag word location In the decoupled sectored cache we introduce in this paper, this monolithic association is broken; the address tag location associated with a cache line location is dynamically chosen at fetch time among several possible locationsThe tag volume on a decoupled sectored cache is in the same range as the tag volume in a traditional sectored cache; but the hit ratio on a decoupled sectored cache is very close to the hit ratio on a non-sectored cache A decoupled sectored cache will allow the same level of performance as a non-sectored cache, but at a significantly lower hardware cost

...read moreread less

Patent•

Exclusive and/or partially inclusive extension cache system and method to minimize swapping therein

[...]

Konrad K. Lai¹•Institutions (1)

Intel¹

23 Mar 1994

TL;DR: In this paper, a multi-level memory system is provided having a primary cache and a secondary cache in which unnecessary swapping operations are minimized, and the secondary cache responds to the request.

...read moreread less

Abstract: A multi-level memory system is provided having a primary cache and a secondary cache in which unnecessary swapping operations are minimized. If a memory access request misses in the primary cache, but hits in the secondary cache, then the secondary cache responds to the request. If, however, the request also misses in the secondary cache, but is found in main memory, then main memory responds to the request. In responding to the request, the secondary cache or main memory returns the requested data to the primary cache. If an address tag of a primary cache victim line does not match an address tag in the secondary cache or the primary cache victim line is dirty, then the victim is stored in the secondary cache. The primary cache victim line includes a first bit for indicating whether the address tag of the primary cache victim line matches an address tag of the secondary cache.

...read moreread less

Proceedings Article•DOI•

Exploring the design space for a shared-cache multiprocessor

[...]

Basem A. Nayfeh¹, Kunle Olukotun¹•Institutions (1)

Stanford University¹

01 Apr 1994

TL;DR: This paper investigates the architecture and partitioning of resources between processors and cache memory for single chip and MCM-based multiprocessors, and shows that for parallel applications, clustering via shared caches provides an effective mechanism for increasing the total number of processors in a system.

...read moreread less

Abstract: In the near future, semiconductor technology will allow the integration of multiple processors on a chip or multichip-module (MCM). In this paper we investigate the architecture and partitioning of resources between processors and cache memory for single chip and MCM-based multiprocessors. We study the performance of a cluster-based multiprocessor architecture in which processors within a cluster are tightly coupled via a shared cluster cache for various processor-cache configurations. Our results show that for parallel applications, clustering via shared caches provides an effective mechanism for increasing the total number of processors in a system, without increasing the number of invalidations. Combining these results with cost estimates for shared cluster cache implementations leads to two conclusions: 1) For a four cluster multiprocessor with single chip clusters, two processors per cluster with a smaller cache provides higher performance and better cost/performance than a single processor with a larger cache and 2) this four cluster configuration can be scaled linearly in performance by adding processors to each cluster using MCM packaging techniques.

...read moreread less

Patent•

Apparatus and method of handling race conditions in mesi-based multiprocessor system with private caches

[...]

Nitin V. Sarangdhar¹, Wen-Hann Wang¹, Matthew A. Fisch¹•Institutions (1)

Intel¹

25 Feb 1994

TL;DR: In this article, a method for handling race conditions arising when multiple processors simultaneously write to a particular cache line is presented, where a determination is made as to whether the cache lines are in an exclusive, modified, invalid, or shared state.

...read moreread less

Abstract: In a computer system having a plurality of processors with internal caches, a method for handling race conditions arising when multiple processors simultaneously write to a particular cache line. Initially, a determination is made as to whether the cache line is in an exclusive, modified, invalid, or shared state. If the cache line is in either the exclusive or modified state, the cache line is written to and then set to the modified state. If the cache line is in the invalid state, a Bus-Read-Invalidate operation is performed. However, if the cache line is in the shared state and multiple processors initiate Bus-Write-Invalidate operations, the invalidation request belonging to the first processor is allowed to complete. Thereupon, the cache line is sent to the exclusive state, data is updated, and the cache line is set to the modified state. The second processor receives a second cache line, updates this second cache line, and sets the second cache line to the modified state.

...read moreread less

Patent•

Two-level virtual/real set associative cache system and method with improved synonym detection

[...]

Ching-Farn E. Wu¹•Institutions (1)

IBM¹

22 Nov 1994

TL;DR: In this article, a two-level virtual/real cache system and a method for detecting and resolving synonyms in the two level virtual and real cache system are described. Butler et al. use a translation lookaside buffer (TLB) for translating virtual to real addresses for accessing the second level real cache.

...read moreread less

Abstract: A two-level virtual/real cache system, and a method for detecting and resolving synonyms in the two-level virtual/real cache system, are described. Lines of a first level virtual cache are tagged with a virtual address and a real pointer which points to a corresponding line in a second level real cache. Lines in the second level real cache are tagged with a real address and a virtual pointer which points to a corresponding line in the first level virtual cache, if one exists. A translation-lookaside buffer (TLB) is used for translating virtual to real addresses for accessing the second level real cache. Synonym detection is performed at the second level real cache. An inclusion bit I is set in a directory of the second level real cache to indicate that a particular line is included in the first level virtual cache. Another bit, called a buffer bit B, is set whenever a line in the first level virtual cache is placed in a first level virtual cache writeback buffer for updating main memory. When a first level cache miss occurs, the TLB generates a corresponding real address for that page and the first level virtual cache selects a line for replacement and also notifies the second level real cache which line it chooses for replacement. The real address is then used to access the second level real cache. Synonym detection and resolution are performed by the second level real cache.

...read moreread less

Patent•

Method and apparatus for combining a direct-mapped cache and a multiple-way cache in a cache memory

[...]

Wen-Hann Wang¹, Konrad K. Lai¹•Institutions (1)

Intel¹

11 Aug 1994

TL;DR: A two-way set-associative cache memory as mentioned in this paper includes both a set array and a data array in one embodiment, with each set in the set array containing information which indicates whether an address received by the cache memory matches the cache line contained in its corresponding element of the data array.

...read moreread less

Abstract: A two-way set-associative cache memory includes both a set array and a data array in one embodiment. The data array comprises multiple elements, each of which can contain a cache line. The set array comprises multiple sets, with each set in the set array corresponding to an element in the data array. Each set in the set array contains information which indicates whether an address received by the cache memory matches the cache line contained in its corresponding element of the data array. The information stored in each set includes a tag and a state. The tag contains a reference to one of the cache lines in the data array. If the tag of a particular set matches the address received by the cache memory, then the cache line associated with that particular set is the requested cache line. The state of a particular set indicates the number of cache lines mapped into that particular set.

...read moreread less

Patent•

Hierarchical cache arrangement wherein the replacement of an LRU entry in a second level cache is prevented when the cache entry is the only inclusive entry in the first level cache

[...]

Hoichi Cheong¹, Dwain A. Hicks¹, Kimming So¹•Institutions (1)

IBM¹

09 Dec 1994

TL;DR: In this paper, the authors present a balanced cache performance in a data processing system consisting of a first processor, a second processor, an intermediate cache memory, and a control circuit.

...read moreread less

Abstract: The present invention provides balanced cache performance in a data processing system The data processing system includes a first processor, a second processor, a first cache memory, a second memory and a control circuit The first processor is connected to the first cache memory, which serves as a first level cache for the first processor The second processor and the first cache memory are connected to the second cache memory, which serves as a second level cache for the first processor and as a first level cache for the second processor Replacement of a set in the second cache memory results in the set being invalidated in the first cache memory The control circuit is connected to the second level cache and prevents replacing from a second level cache congruence class all sets that are in the first cache

...read moreread less

Patent•

Secondary cache system for portable computer

[...]

Samuel Fuller

22 Feb 1994

TL;DR: In this article, a secondary cache memory system is described for use in a portable computer that increases system performance while also conserving battery life, which includes a cache controller for controlling the transfer to and from a cache memory, comprised of fast SRAM circuits.

...read moreread less

Abstract: A secondary cache memory system is disclosed for use in a portable computer that increases system performance while also conserving battery life. The secondary cache includes a cache controller for controlling the transfer to and from a cache memory, comprised of fast SRAM circuits. The cache controller includes a control and status register with at least three status bits to control power to the cache, and to insure that the data stored in the cache memory is coherent with system memory. A control and power management logic checks the contents of the control and status register, and monitors the activity level of the processor. When the processor is determined to be inactive, the control and power management logic turns off the cache by changing the state of a bit in the control and status register. Before doing so, however, the control and power management logic checks the status of a second bit in the control register to determine if some or all of the contents of the cache need to be flushed to system memory. During power up, the control and power management logic checks another status bit in the control register to determine if the contents of the cache is invalid, and if so, clears the cache.

...read moreread less

Patent•

Distributed file system permitting each user to enhance cache hit ratio in file access mode

[...]

Hirohiko Nakano¹, Seiichi Domyo¹, Takaki Kuroda¹, Naofumi Shouji¹, Atsushi Kobayashi¹ - Show less +1 more•Institutions (1)

Hitachi¹

28 Sep 1994

TL;DR: In this paper, the cache hit ratio of a client is enhanced to speed up a file access for each of users logging into the client by setting priority levels for the copies of the files stored in a cache area, based on the contents of the access frequency database and the log-in user table.

...read moreread less

Abstract: A distributed file system in which the cache hit ratio of a client is enhanced to speed up a file access for each of users logging into the client. A file server includes an access frequency database in which the names of users are listed in association with the names of files that are frequently accessed by the individual users. Each client includes a log-in user table for entering the name of a user who is logging in, and a cache priority control module. The cache priority control module sets priority levels for the copies of the files stored in a cache area, on the basis of the contents of the access frequency database and the log-in user table. The set priority levels function as criteria when any of the file copies is to be expelled from the cache area. Owing to this construction, the copies of the files of high usage frequencies are preferentially kept in the cache area of the client for each user logging into this client, whereby the cache hit ratio can be enhanced to speed up the file access.

...read moreread less

Patent•

Cache memory containing extra status bits to indicate memory regions where logging of data should occur

[...]

David R. Cheriton¹•Institutions (1)

Stanford University¹

01 Jul 1994

TL;DR: In this article, the authors present a digital computer memory cache organization for efficient data logging, log-based copy and rollback, high-performance I/O, network switching and multi-cache consistency maintenance.

...read moreread less

Abstract: The present invention provides a digital computer memory cache organization for efficient data logging, log-based copy and rollback, high-performance I/O, network switching and multi-cache consistency maintenance. The cache organization implements efficient selective cache write-back, mapping and transferring of data. Write or store operations to cache lines tagged as logged are written through to a log block builder associated with the cache. Non-logged store operations are handled local to the cache, as in a writeback cache. The log block builder combines write operations into data blocks and transfers the data blocks to a log splitter. A log splitter demultiplexes the logged data into separate streams based on address.

...read moreread less

Collapse