
Showing papers on "Cache coloring published in 1999"


Proceedings ArticleDOI
16 Nov 1999
TL;DR: Selective cache ways disable a subset of the ways in a set-associative cache during periods of modest cache activity; trading a small performance degradation for energy savings can produce a significant reduction in cache energy dissipation.
Abstract: Increasing levels of microprocessor power dissipation call for new approaches at the architectural level that save energy by better matching of on-chip resources to application requirements. Selective cache ways provides the ability to disable a subset of the ways in a set associative cache during periods of modest cache activity, while the full cache may remain operational for more cache-intensive periods. Because this approach leverages the subarray partitioning that is already present for performance reasons, only minor changes to a conventional cache are required, and therefore, full-speed cache operation can be maintained. Furthermore, the tradeoff between performance and energy is flexible, and can be dynamically tailored to meet changing application and machine environmental conditions. We show that trading off a small performance degradation for energy savings can produce a significant reduction in cache energy dissipation using this approach.

733 citations
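
A minimal C sketch of the selective-ways idea described above: a set-associative lookup that probes only the ways enabled in a software-controlled mask. The sizes, the way_mask interface, and the function names are illustrative assumptions, not the paper's hardware design.

```c
/* Software model of "selective cache ways": probe only enabled ways.
 * Geometry (64-byte lines, 4 ways, 256 sets) and names are illustrative. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256
#define NUM_WAYS 4

typedef struct { bool valid; uint32_t tag; } line_t;

static line_t  cache[NUM_SETS][NUM_WAYS];
static uint8_t way_mask = 0xF;               /* bit i set => way i enabled */

/* Disable ways during low-activity phases to save energy;
 * re-enable them for cache-intensive phases. */
void set_enabled_ways(uint8_t mask) { way_mask = mask; }

bool cache_lookup(uint32_t addr)
{
    uint32_t set = (addr >> 6) % NUM_SETS;   /* 64-byte lines assumed */
    uint32_t tag = addr >> 14;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!(way_mask & (1u << w)))
            continue;                        /* disabled way: not probed */
        if (cache[set][w].valid && cache[set][w].tag == tag)
            return true;                     /* hit */
    }
    return false;                            /* miss (or data in a disabled way) */
}
```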


Proceedings ArticleDOI
01 May 1999
TL;DR: It is demonstrated that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance.
Abstract: Hardware trends have produced an increasing disparity between processor speeds and memory access times. While a variety of techniques for tolerating or reducing memory latency have been proposed, these are rarely successful for pointer-manipulating programs. This paper explores a complementary approach that attacks the source (poor reference locality) of the problem rather than its manifestation (memory latency). It demonstrates that careful data organization and layout provides an essential mechanism to improve the cache locality of pointer-manipulating programs and consequently, their performance. It explores two placement techniques---clustering and coloring---that improve cache performance by increasing a pointer structure's spatial and temporal locality, and by reducing cache conflicts. To reduce the cost of applying these techniques, this paper discusses two strategies---cache-conscious reorganization and cache-conscious allocation---and describes two semi-automatic tools---ccmorph and ccmalloc---that use these strategies to produce cache-conscious pointer structure layouts. ccmorph is a transparent tree reorganizer that utilizes topology information to cluster and color the structure. ccmalloc is a cache-conscious heap allocator that attempts to co-locate contemporaneously accessed data elements in the same physical cache block. Our evaluations, with microbenchmarks, several small benchmarks, and a couple of large real-world applications, demonstrate that the cache-conscious structure layouts produced by ccmorph and ccmalloc offer large performance benefits---in most cases, significantly outperforming state-of-the-art prefetching.

382 citations
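
The cache-conscious allocation strategy lends itself to a small illustration. The sketch below is a hypothetical bump-pointer allocator in the spirit of ccmalloc: given a pointer to an element that will be accessed together with the new object, it tries to place the new object in the same cache block. The arena scheme and all names are assumptions, not the actual ccmalloc implementation.

```c
/* Illustrative cache-conscious allocation: co-locate a new object with an
 * element it will be accessed with, when the current cache block has room. */
#include <stdint.h>
#include <stddef.h>

#define CACHE_BLOCK 64
#define ARENA_SIZE  (1 << 20)

static _Alignas(CACHE_BLOCK) char arena[ARENA_SIZE];
static size_t arena_next = 0;

void *cc_malloc(size_t size, const void *colocate_with)
{
    uintptr_t top = (uintptr_t)arena + arena_next;

    if (colocate_with && size <= CACHE_BLOCK) {
        uintptr_t hint_block =
            (uintptr_t)colocate_with & ~(uintptr_t)(CACHE_BLOCK - 1);
        /* If the bump pointer is still inside the hint's cache block and the
         * new object fits, place the two objects in the same block. */
        if ((top & ~(uintptr_t)(CACHE_BLOCK - 1)) == hint_block &&
            top + size <= hint_block + CACHE_BLOCK) {
            arena_next += size;
            return (void *)top;
        }
    }
    /* Otherwise start the object on a fresh cache-block boundary. */
    arena_next = (arena_next + CACHE_BLOCK - 1) & ~(size_t)(CACHE_BLOCK - 1);
    if (arena_next + size > ARENA_SIZE)
        return NULL;
    void *p = arena + arena_next;
    arena_next += size;
    return p;
}
```

A tree builder could then call node->left = cc_malloc(sizeof(*node), node); so that a parent and its child land in one cache block when space allows.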


Journal ArticleDOI
TL;DR: This article describes methods for generating and solving Cache Miss Equations (CMEs) that give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific code within the SUIF compiler framework.
Abstract: With the ever-widening performance gap between processors and main memory, cache memory, which is used to bridge this gap, is becoming more and more significant. Caches work well for programs that exhibit sufficient locality. Other programs, however, have reference patterns that fail to exploit the cache, thereby suffering heavily from high memory latency. In order to get high cache efficiency and achieve good program performance, efficient memory accessing behavior is necessary. In fact, for many programs, program transformations or source-code changes can radically alter memory access patterns, significantly improving cache performance. Both hand-tuning and compiler optimization techniques are often used to transform codes to improve cache utilization. Unfortunately, cache conflicts are difficult to predict and estimate, precluding effective transformations. Hence, effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This article describes methods for generating and solving Cache Miss Equations (CMEs) that give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. While solving these equations is in general difficult, we show that it is also unnecessary, as mathematical techniques for manipulating Diophantine equations allow us to relatively easily compute and/or reduce the number of possible solutions, where each solution corresponds to a potential cache miss. The mathematical precision of CMEs allows us to find true optimal solutions for transformations such as blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert. The article also gives examples of their use to determine array padding and offset amounts that minimize cache misses, and to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that offers the generality and precision needed for detailed compiler optimizations.

300 citations
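
To make the notion of a Cache Miss Equation concrete, here is one representative conflict-equation form for a direct-mapped cache, reconstructed from the abstract's description; the notation and the exact bound on the offset term are illustrative rather than quoted from the article.

```latex
% Representative conflict (interference) equation for a direct-mapped cache.
% A reference R at iteration point \vec{\imath} can conflict with a reference Q
% at an intervening iteration point \vec{\jmath} when their addresses map to the
% same cache line, i.e. when integer solutions (n, d) exist for
\[
  \mathrm{Addr}_R(\vec{\imath}) \;-\; \mathrm{Addr}_Q(\vec{\jmath})
  \;=\; n \cdot C \;+\; d,
  \qquad n \in \mathbb{Z},\quad |d| < L,
\]
% where C is the cache size in bytes and L the line size.  Because the address
% functions are affine in the loop indices, each such condition is a linear
% Diophantine equation; every solution corresponds to a potential conflict miss,
% and padding or blocking choices can be compared by how they shrink the
% solution set.
```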


Proceedings ArticleDOI
17 Aug 1999
TL;DR: In this paper, a new approach using way prediction for achieving high performance and low energy consumption of set-associative caches is proposed, where only a single cache way is accessed, instead of accessing all the ways in a set.
Abstract: This paper proposes a new approach using way prediction for achieving high performance and low energy consumption of set-associative caches. By accessing only the single predicted cache way, instead of accessing all the ways in a set, the energy consumption can be reduced. This paper shows that the way-predicting set-associative cache improves the ED (energy-delay) product by 60-70% compared to a conventional set-associative cache.

295 citations
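
A toy C model of the way-prediction idea above: probe only the predicted (most-recently-used) way first and fall back to the remaining ways on a mispredict. Cache geometry, the MRU predictor, and the names are assumptions for illustration.

```c
/* Way prediction for a 4-way set-associative cache: a first-hit costs one
 * way's worth of access energy; a mispredict probes the remaining ways. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256
#define NUM_WAYS 4

typedef struct { bool valid; uint32_t tag; } line_t;

static line_t  cache[NUM_SETS][NUM_WAYS];
static uint8_t predicted_way[NUM_SETS];         /* MRU way per set */

/* Returns true on hit; *ways_probed approximates the energy cost. */
bool lookup(uint32_t addr, int *ways_probed)
{
    uint32_t set = (addr >> 6) % NUM_SETS;
    uint32_t tag = addr >> 14;
    int p = predicted_way[set];

    *ways_probed = 1;                           /* first probe: predicted way only */
    if (cache[set][p].valid && cache[set][p].tag == tag)
        return true;                            /* first-hit */

    for (int w = 0; w < NUM_WAYS; w++) {        /* mispredict: probe the rest */
        if (w == p) continue;
        (*ways_probed)++;
        if (cache[set][w].valid && cache[set][w].tag == tag) {
            predicted_way[set] = (uint8_t)w;    /* update MRU prediction */
            return true;
        }
    }
    return false;
}
```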


Journal ArticleDOI
TL;DR: This paper proposes Active Cache, a scheme for caching dynamic contents at Web proxies that can yield significant network bandwidth savings at the expense of moderate CPU costs, and describes its protocol, interface, and security mechanisms.
Abstract: Dynamic documents constitute an increasing percentage of contents on the Web, and caching dynamic documents becomes an increasingly important issue that affects the scalability of the Web. In this paper, we propose the Active Cache scheme to support caching of dynamic contents at Web proxies. The scheme allows servers to supply cache applets to be attached with documents, and requires proxies to invoke cache applets upon cache hits to furnish the necessary processing without contacting the server. We describe the protocol, interface and security mechanisms of the Active Cache scheme, and illustrate its use via several examples. Through prototype implementation and performance measurements, we show that Active Cache is a feasible scheme that can result in significant network bandwidth savings at the expense of moderate CPU costs.

283 citations


Proceedings ArticleDOI
01 May 1999
TL;DR: In this article, the authors describe two techniques, structure splitting and field reordering, that improve the cache behavior of data structures larger than a cache block by increasing the number of hot fields that can be placed in the cache block.
Abstract: A program's cache performance can be improved by changing the organization and layout of its data---even complex, pointer-based data structures. Previous techniques improved the cache performance of these structures by arranging distinct instances to increase reference locality. These techniques produced significant performance improvements, but worked best for small structures that could be packed into a cache block. This paper extends that work by concentrating on the internal organization of fields in a data structure. It describes two techniques---structure splitting and field reordering---that improve the cache behavior of structures larger than a cache block. For structures comparable in size to a cache block, structure splitting can increase the number of hot fields that can be placed in a cache block. In five Java programs, structure splitting reduced cache miss rates 10--27% and improved performance 6--18% beyond the benefits of previously described cache-conscious reorganization techniques. For large structures, which span many cache blocks, reordering fields to place those with high temporal affinity in the same cache block can also improve cache utilization. This paper describes bbcache, a tool that recommends C structure field reorderings. Preliminary measurements indicate that reordering fields in 5 active structures improves the performance of Microsoft SQL Server 7.0 by 2--3%.

278 citations
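
A small C illustration of structure splitting as described above: hot fields stay in a compact struct that packs into a cache block, while cold fields move behind a pointer. The particular fields and sizes are hypothetical.

```c
/* Structure splitting: separate frequently accessed ("hot") fields from
 * rarely used ("cold") ones so traversals touch fewer cache blocks. */

/* Original layout: hot and cold fields interleaved, spilling across blocks. */
struct node_orig {
    int    key;                 /* hot: touched on every traversal step   */
    char   debug_name[48];      /* cold: only used for logging            */
    struct node_orig *left;     /* hot */
    struct node_orig *right;    /* hot */
    double stats[8];            /* cold: touched only by a reporting pass */
};

/* Split layout: node_hot is ~32 bytes on a 64-bit machine, so two nodes fit
 * in one 64-byte cache block that the traversal actually loads. */
struct node_cold {
    char   debug_name[48];
    double stats[8];
};

struct node_hot {
    int    key;
    struct node_hot  *left;
    struct node_hot  *right;
    struct node_cold *cold;     /* one extra indirection for rare accesses */
};
```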


Journal ArticleDOI
TL;DR: A unified cache maintenance algorithm, LNC-R-W3-U, is described, which integrates both cache replacement and consistency algorithms and takes into account, when selecting documents for eviction, the validation rate of each document as provided by the cache consistency component of LNC-R-W3-U.
Abstract: Caching at proxy servers is one of the ways to reduce the response time perceived by World Wide Web users. Cache replacement algorithms play a central role in the response time reduction by selecting a subset of documents for caching, so that a given performance metric is maximized. At the same time, the cache must take extra steps to guarantee some form of consistency of the cached documents. Cache consistency algorithms enforce appropriate guarantees about the staleness of the cached documents. We describe a unified cache maintenance algorithm, LNC-R-W3-U, which integrates both cache replacement and consistency algorithms. The LNC-R-W3-U algorithm evicts documents from the cache based on the delay to fetch each document into the cache. Consequently, the documents that took a long time to fetch are preferentially kept in the cache. The LNC-R-W3-U algorithm also takes into account, in its eviction decisions, the validation rate of each document, as provided by the cache consistency component of LNC-R-W3-U. Consequently, documents that are infrequently updated and thus seldom require validations are preferentially retained in the cache. We describe the implementation of LNC-R-W3-U and its integration with the Apache 1.2.6 code base. Finally, we present a trace-driven experimental study of LNC-R-W3-U performance and its comparison with other previously published algorithms for cache maintenance.

211 citations
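
As a hedged illustration of the trade-off the abstract describes (keep documents that were slow to fetch and rarely need validation), here is a hypothetical per-document value metric in C. This is not the published LNC-R-W3-U formula; it only sketches how fetch delay, reference rate, validation rate, and size might be combined.

```c
/* Hypothetical eviction "value" for a cached document: higher value means
 * keep; the document with the lowest value is evicted first. */
struct doc {
    double ref_rate;        /* observed reference rate (refs/sec)        */
    double fetch_delay;     /* measured delay to fetch from origin (sec) */
    double validation_rate; /* how often the copy must be revalidated    */
    double size_bytes;
};

double cache_value(const struct doc *d)
{
    double benefit = d->ref_rate * d->fetch_delay;        /* delay saved per second */
    double upkeep  = d->validation_rate * d->fetch_delay; /* cost of staying fresh  */
    return (benefit - upkeep) / d->size_bytes;            /* normalize by space used */
}
```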


Patent
22 Mar 1999
TL;DR: In this article, a cache system is described that includes a storage that is partitioned into a plurality of storage areas, each for storing one kind of object received from remote sites and to be directed to target devices.
Abstract: A cache system is described that includes a storage that is partitioned into a plurality of storage areas, each for storing one kind of object received from remote sites and to be directed to target devices. The cache system further includes a cache manager coupled to the storage to cause objects to be stored in the corresponding storage areas of the storage. The cache manager causes cached objects in each of the storage areas to be replaced in accordance with one of a plurality of replacement policies, each being optimized for one kind of object.

197 citations


Proceedings ArticleDOI
01 Jun 1999
TL;DR: This work presents a memory exploration strategy based on three performance metrics, namely, cache size, the number of processor cycles and the energy consumption, and shows how the performance is affected by cache parameters such as cache size, line size, set associativity and tiling, and the off-chip data organization.
Abstract: In embedded system design, the designer has to choose an on-chip memory configuration that is suitable for a specific application. To aid in this design choice, we present a memory exploration strategy based on three performance metrics, namely, cache size, the number of processor cycles and the energy consumption. We show how the performance is affected by cache parameters such as cache size, line size, set associativity and tiling, and the off-chip data organization. We show the importance of including energy in the performance metrics, since an increase in the cache line size, cache size, tiling and set associativity reduces the number of cycles but does not necessarily reduce the energy consumption. These performance metrics help us find the minimum energy cache configuration if time is the hard constraint, or the minimum time cache configuration if energy is the hard constraint.

193 citations
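
The exploration strategy can be pictured as a sweep over cache configurations. The C sketch below assumes placeholder simulate_cycles() and estimate_energy() hooks (not real APIs) and keeps the minimum-energy configuration that still meets a cycle budget, i.e., the case where time is the hard constraint.

```c
/* Design-space sweep: for each cache configuration, estimate cycles and
 * energy, and keep the lowest-energy configuration within the cycle budget. */
#include <stdint.h>
#include <float.h>

typedef struct { int size_kb, line_bytes, assoc; } cfg_t;

extern uint64_t simulate_cycles(cfg_t c);   /* assumed: trace-driven simulator  */
extern double   estimate_energy(cfg_t c);   /* assumed: per-access energy model */

cfg_t explore(uint64_t cycle_budget)
{
    static const int sizes[]  = { 1, 2, 4, 8, 16, 32 };
    static const int lines[]  = { 16, 32, 64 };
    static const int assocs[] = { 1, 2, 4 };
    cfg_t  best = {0};
    double best_energy = DBL_MAX;

    for (unsigned s = 0; s < sizeof(sizes) / sizeof(sizes[0]); s++)
      for (unsigned l = 0; l < sizeof(lines) / sizeof(lines[0]); l++)
        for (unsigned a = 0; a < sizeof(assocs) / sizeof(assocs[0]); a++) {
            cfg_t c = { sizes[s], lines[l], assocs[a] };
            uint64_t cyc = simulate_cycles(c);
            double   e   = estimate_energy(c);
            /* Bigger, longer-line, more-associative caches may cut cycles
             * but raise per-access energy, so both metrics must be checked. */
            if (cyc <= cycle_budget && e < best_energy) {
                best_energy = e;
                best = c;
            }
        }
    return best;
}
```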


Proceedings ArticleDOI
17 Aug 1999
TL;DR: This paper proposes using a small instruction buffer, also called a loop cache, to save power; the loop cache has no address tag store, and its controller knows precisely, well ahead of time, whether the next instruction request will hit in the loop cache, so there is no performance degradation.
Abstract: A fair amount of work has been done in recent years on reducing power consumption in caches by using a small instruction buffer placed between the execution pipe and a larger main cache. These techniques, however, often degrade the overall system performance. In this paper, we propose using a small instruction buffer, also called a loop cache, to save power. A loop cache has no address tag store. It consists of a direct-mapped data array and a loop cache controller. The loop cache controller knows precisely whether the next instruction request will hit in the loop cache, well ahead of time. As a result, there is no performance degradation.

190 citations
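
A state-machine sketch, in C, of a tagless loop cache along the lines described above: a taken short backward branch identifies a small loop, the next pass fills the buffer, and subsequent iterations are served with a guaranteed hit. The fill/activate protocol and parameters are assumptions based on the abstract, not the paper's exact controller.

```c
/* Tagless loop-cache controller as a small state machine. */
#include <stdint.h>
#include <stdbool.h>

#define LOOP_WORDS 64                       /* loop cache capacity (instructions) */

enum lc_state { IDLE, FILL, ACTIVE };

static uint32_t      loop_buf[LOOP_WORDS];
static uint32_t      loop_start, loop_end;  /* PCs bounding the captured loop */
static enum lc_state state = IDLE;

/* Called for every instruction fetch.  Returns true when the fetch is
 * guaranteed to hit in the loop cache, so the I-Cache need not be accessed. */
bool loop_cache_fetch(uint32_t pc, bool taken_sbb, uint32_t target,
                      uint32_t instr_from_icache)
{
    switch (state) {
    case ACTIVE:
        if (pc >= loop_start && pc <= loop_end)
            return true;                    /* whole body is resident: certain hit */
        state = IDLE;                       /* control left the loop */
        return false;

    case FILL:
        if (pc < loop_start || (pc - loop_start) / 4 >= LOOP_WORDS) {
            state = IDLE;                   /* flow escaped before the loop closed */
            return false;
        }
        loop_buf[(pc - loop_start) / 4] = instr_from_icache;
        if (taken_sbb && target == loop_start) {
            loop_end = pc;                  /* body captured on this pass */
            state = ACTIVE;
        }
        return false;

    default:                                /* IDLE: watch for a small loop */
        if (taken_sbb && pc > target && (pc - target) / 4 < LOOP_WORDS) {
            loop_start = target;            /* taken short backward branch seen */
            state = FILL;
        }
        return false;
    }
}
```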


Patent
03 Mar 1999
TL;DR: In this article, the authors propose a technique for automatic, transparent, distributed, scalable and robust replication of document copies in a computer network where request messages for a particular document follow paths from the clients to a home server that form a routing graph.
Abstract: A technique for automatic, transparent, distributed, scalable and robust replication of document copies in a computer network wherein request messages for a particular document follow paths from the clients to a home server that form a routing graph. Client request messages are routed up the graph towards the home server as would normally occur in the absence of caching. However, cache servers are located along the route, and may intercept requests if they can be serviced. In order to be able to service requests in this manner without departing from standard network protocols, the cache server needs to be able to insert a packet filter into the router associated with it, and needs also to proxy for the home server from the perspective of the client. Cache servers cooperate to update cache content by communicating with neighboring caches whenever information is received about invalid cache copies.

Patent
John Scharber1
02 Jun 1999
TL;DR: In this paper, a cache protocol is selected according to the type of the content, a site (e.g., an origin server) associated with the content and/or a class of service requirement.
Abstract: Storing content of a particular type at one or more cache servers may be accomplished according to a cache protocol selected according to the type of the content, a site (e.g., an origin server) associated with the content and/or a class of service requirement. In this scheme, the cache protocol may be selected and/or varied according to load balancing requirements and/or traffic conditions within a network. For example, the cache protocol may migrate from a first protocol (e.g., CARP) that allows only one copy of the content to be stored to a second protocol (e.g., HTCP or ICP) that allows more than one copy of the content to be stored. Further, the depth to which a request query is to be searched within a cache hierarchy may be determined according to the site, the content type and/or the class of service. Where necessary, a path for retrieving the content may be determined, at least in part, according to the content type.

Journal ArticleDOI
TL;DR: This paper describes an algorithm for procedure placement, one type of code placement, that significantly differs from previous approaches in the type of information used to drive the placement algorithm, and gathers temporal-ordering information that summarizes the interleaving of procedures in a program trace.
Abstract: Instruction cache performance is important to instruction fetch efficiency and overall processor performance. The layout of an executable has a substantial effect on the cache miss rate and the instruction working set size during execution. This means that the performance of an executable can be improved by applying a code-placement algorithm that minimizes instruction cache conflicts and improves spatial locality. We describe an algorithm for procedure placement, one type of code placement, that significantly differs from previous approaches in the type of information used to drive the placement algorithm. In particular, we gather temporal-ordering information that summarizes the interleaving of procedures in a program trace. Our algorithm uses this information along with cache configuration and procedure size information to better estimate the conflict cost of a potential procedure ordering. It optimizes the procedure placement for single level and multilevel caches. In addition to reducing instruction cache conflicts, the algorithm simultaneously minimizes the instruction working set size of the program. We compare the performance of our algorithm with a particularly successful procedure-placement algorithm and show noticeable improvements in the instruction cache behavior, while maintaining the same instruction working set size.
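
A greedy C sketch of conflict-aware procedure placement in the spirit of the abstract: each procedure is assigned the starting cache offset that minimizes an estimated conflict cost against already-placed procedures, weighted by how often the two procedures interleave in a trace. The data structures and the cost model are simplified assumptions, not the paper's algorithm.

```c
/* Greedy procedure placement driven by temporal-ordering weights. */
#include <limits.h>

#define NPROC      64
#define CACHE_SETS 512                        /* I-cache lines a procedure maps onto */

extern int  proc_lines[NPROC];                /* size of each procedure in cache lines */
extern long interleave[NPROC][NPROC];         /* temporal-ordering weight from a trace */
int placement[NPROC];                         /* chosen starting line (color)          */

/* Number of cache lines on which two placed procedures collide. */
static long overlap(int a_start, int a_len, int b_start, int b_len)
{
    long hits = 0;
    for (int i = 0; i < a_len; i++)
        for (int j = 0; j < b_len; j++)
            if ((a_start + i) % CACHE_SETS == (b_start + j) % CACHE_SETS)
                hits++;
    return hits;
}

void place_procedures(void)
{
    for (int p = 0; p < NPROC; p++) {         /* procedures assumed ordered by importance */
        long best_cost = LONG_MAX;
        int  best_off  = 0;
        for (int off = 0; off < CACHE_SETS; off++) {
            long cost = 0;
            for (int q = 0; q < p; q++)       /* only already-placed procedures matter */
                cost += interleave[p][q] *
                        overlap(off, proc_lines[p], placement[q], proc_lines[q]);
            if (cost < best_cost) { best_cost = cost; best_off = off; }
        }
        placement[p] = best_off;
    }
}
```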

Patent
David C. Stewart1
13 Oct 1999
TL;DR: In this paper, a filter driver is provided to monitor writes to the disk and determine whether a cache line should be invalidated, such that a write to a sector held in the cache results in that sector being invalidated until such time as the cache is updated.
Abstract: A computer system includes a nonvolatile memory positioned between a disk controller and a disk drive storing a boot program, in a computer system. Upon an initial boot sequence, the boot program is loaded into a cache in the nonvolatile memory. Subsequent boot sequences retrieve the boot program from the cache. Cache validity is maintained by monitoring cache misses, and/or by monitoring writes to the disk such that a write to a sector held in the cache results in the cache line for that sector being invalidated until such time as the cache is updated. A filter driver is provided to monitor writes to the disk and determine if a cache line is invalidated.

Journal ArticleDOI
TL;DR: This paper examines the theoretical upper bounds on the cache hit ratio that cache bypassing can provide for integer applications, including several Windows applications with OS activity, and proposes a microarchitecture scheme where the hardware determines data placement within the cache hierarchy based on dynamic referencing behavior.
Abstract: The growing disparity between processor and memory performance has made cache misses increasingly expensive. Additionally, data and instruction caches are not always used efficiently, resulting in large numbers of cache misses. Therefore, the importance of cache performance improvements at each level of the memory hierarchy will continue to grow. In numeric programs, there are several known compiler techniques for optimizing data cache performance. However, integer (nonnumeric) programs often have irregular access patterns that are more difficult for the compiler to optimize. In the past, cache management techniques such as cache bypassing were implemented manually at the machine-language-programming level. As the available chip area grows, it makes sense to spend more resources to allow intelligent control over the cache management. In this paper, we present an approach to improving cache effectiveness, taking advantage of the growing chip area, utilizing run-time adaptive cache management techniques, optimizing both performance and cost of implementation. Specifically, we are aiming to increase data cache effectiveness for integer programs. We propose a microarchitecture scheme where the hardware determines data placement within the cache hierarchy based on dynamic referencing behavior. This scheme is fully compatible with existing instruction set architectures. This paper examines the theoretical upper bounds on the cache hit ratio that cache bypassing can provide for integer applications, including several Windows applications with OS activity. Then, detailed trace-driven simulations of the integer applications are used to show that the implementation described in this paper can achieve performance close to that of the upper bound.
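
One way to picture run-time bypass management is a per-region reuse predictor. The C sketch below uses saturating counters over coarse memory regions to decide whether a miss should bypass L1 allocation; the region size, counter width, and threshold are made-up parameters, not the microarchitecture proposed in the paper.

```c
/* Run-time cache-bypass decision based on observed reuse per memory region. */
#include <stdint.h>
#include <stdbool.h>

#define REGION_SHIFT 10                 /* 1 KB regions (illustrative)       */
#define NUM_REGIONS  4096
#define CTR_MAX      15
#define BYPASS_BELOW 2

static uint8_t reuse_ctr[NUM_REGIONS];

/* Called on every L1 data access; returns true if a miss to this address
 * should bypass the cache (be serviced without allocating an L1 line). */
bool should_bypass(uintptr_t addr, bool was_hit)
{
    unsigned r = (addr >> REGION_SHIFT) % NUM_REGIONS;

    if (was_hit) {
        if (reuse_ctr[r] < CTR_MAX) reuse_ctr[r]++;   /* region shows reuse      */
    } else {
        if (reuse_ctr[r] > 0) reuse_ctr[r]--;         /* misses erode confidence */
    }
    return !was_hit && reuse_ctr[r] < BYPASS_BELOW;
}
```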

Proceedings ArticleDOI
01 May 1999
TL;DR: This paper examines the performance of compiler and hardware approaches for reordering pages in physically addressed caches to eliminate cache misses and shows that software page placement provided a 28% speedup and hardware page placement provided a 21% speedup on average for a superscalar processor.
Abstract: As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction and data cache performance for virtually indexed caches by mapping code and data with temporal locality to different cache blocks. In this paper we examine the performance of compiler and hardware approaches for reordering pages in physically addressed caches to eliminate cache misses. The software approach provides a color mapping at compile-time for code and data pages, which can then be used by the operating system to guide its allocation of physical pages. The hardware approach works by adding a page remap field to the TLB, which is used to allow a page to be remapped to a different color in the physically indexed cache while keeping the same physical page in memory. The results show that software page placement provided a 28% speedup and hardware page placement provided a 21% speedup on average for a superscalar processor. For a 4 processor single-chip multiprocessor, the miss rate was reduced from 8.7% down to 7.2% on average.
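
A C sketch of the page-coloring mechanism that both approaches above rely on: the operating system keeps one free list per cache color and tries to hand out a physical page whose color matches the faulting virtual address (a compile-time color map or a TLB remap field would replace this default policy). Cache and page parameters and helper names are illustrative.

```c
/* Page coloring for a physically indexed cache: per-color free lists. */
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT  12                     /* 4 KB pages                 */
#define CACHE_BYTES (512 * 1024)           /* physically indexed cache   */
#define CACHE_ASSOC 1
#define NUM_COLORS  (CACHE_BYTES / CACHE_ASSOC / (1 << PAGE_SHIFT))

struct page { struct page *next; uint64_t pfn; };

static struct page *free_lists[NUM_COLORS];   /* one free list per color */

static unsigned color_of_pfn(uint64_t pfn)  { return pfn % NUM_COLORS; }
static unsigned color_of_vaddr(uint64_t va) { return (va >> PAGE_SHIFT) % NUM_COLORS; }

void free_colored_page(struct page *p)
{
    unsigned c = color_of_pfn(p->pfn);
    p->next = free_lists[c];
    free_lists[c] = p;
}

/* Allocate a physical page whose cache color matches the faulting virtual
 * address; fall back to any color if the preferred list is empty. */
struct page *alloc_colored_page(uint64_t fault_vaddr)
{
    unsigned want = color_of_vaddr(fault_vaddr);
    for (unsigned i = 0; i < NUM_COLORS; i++) {
        unsigned c = (want + i) % NUM_COLORS;
        if (free_lists[c]) {
            struct page *p = free_lists[c];
            free_lists[c] = p->next;
            return p;
        }
    }
    return NULL;                               /* out of physical memory */
}
```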

Proceedings ArticleDOI
01 May 1999
TL;DR: This work focuses on transient fault tolerance in primary cache memories and develops new architectural solutions, to maximize fault coverage when the budgeted silicon area is not sufficient for the conventional configuration of an error checking code.
Abstract: Information integrity in cache memories is a fundamental requirement for dependable computing. Conventional architectures for enhancing cache reliability using check codes make it difficult to trade between the level of data integrity and the chip area requirement. We focus on transient fault tolerance in primary cache memories and develop new architectural solutions, to maximize fault coverage when the budgeted silicon area is not sufficient for the conventional configuration of an error checking code. The underlying idea is to exploit the corollary of reference locality in the organization and management of the code. A higher protection priority is dynamically assigned to the portions of the cache that are more error-prone and have a higher probability of access. The error-prone likelihood prediction is based on the access frequency. We evaluate the effectiveness of the proposed schemes using a trace-driven simulation combined with software error injection using four different fault manifestation models. From the simulation results, we show that for most benchmarks the proposed architectures are effective and area efficient for increasing the cache integrity under all four models.

Patent
Robert Drew Major1
22 Jun 1999
TL;DR: In this article, a cache object store is organized to provide fast and efficient storage of data as cache objects organized into cache object groups, and a multi-level hierarchical storage architecture comprising a primary memory-level cache store and, optionally, a secondary disk level cache store, each of which is configured to optimize access to the cache objects groups.
Abstract: A cache object store is organized to provide fast and efficient storage of data as cache objects organized into cache object groups. The cache object store preferably embodies a multi-level hierarchical storage architecture comprising a primary memory-level cache store and, optionally, a secondary disk-level cache store, each of which is configured to optimize access to the cache object groups. These levels of the cache object store further exploit persistent and non-persistent storage characteristics of the inventive architecture.

Journal ArticleDOI
TL;DR: DAT is presented, a technique that augments loop tiling with data alignment, achieving improved efficiency (by ensuring that the cache is never under-utilized) as well as improved flexibility (by eliminating self-interference cache conflicts independent of the tile size), resulting in more stable and better cache performance.
Abstract: Loop blocking (tiling) is a well-known compiler optimization that helps improve cache performance by dividing the loop iteration space into smaller blocks (tiles); reuse of array elements within each tile is maximized by ensuring that the working set for the tile fits into the data cache. Padding is a data alignment technique that involves the insertion of dummy elements into a data structure for improving cache performance. In this work, we present DAT, a technique that augments loop tiling with data alignment, achieving improved efficiency (by ensuring that the cache is never under-utilized) as well as improved flexibility (by eliminating self-interference cache conflicts independent of the tile size). This results in a more stable and better cache performance than existing approaches, in addition to maximizing cache utilization, eliminating self-interference, and minimizing cross-interference conflicts. Further, while all previous efforts are targeted at programs characterized by the reuse of a single array, we also address the issue of minimizing conflict misses when several tiled arrays are involved. To validate our technique, we ran extensive experiments using both simulations as well as actual measurements on SUN Sparc5 and Sparc10 workstations. The results on benchmarks exhibiting varying memory access patterns demonstrate the effectiveness of our technique through consistently high hit ratios and improved performance across varying problem sizes.
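
A compact C example of tiling combined with padding, the two ingredients DAT builds on: the loops are blocked so each tile's working set stays in cache, and the array's leading dimension is padded so tile rows do not collide in the same cache sets. The pad amount here is a placeholder; DAT derives it from the cache parameters.

```c
/* Tiled matrix multiply with a padded leading dimension to reduce
 * self-interference conflicts within a tile. */
#define N    1024
#define PAD  8                       /* illustrative padding, in elements */
#define TILE 32

static double A[N][N + PAD], B[N][N + PAD], C[N][N + PAD];

void tiled_matmul(void)                /* assumes C is zero-initialized */
{
    for (int ii = 0; ii < N; ii += TILE)
      for (int kk = 0; kk < N; kk += TILE)
        for (int jj = 0; jj < N; jj += TILE)
          /* Each (TILE x TILE) block of A, B and C is reused from cache. */
          for (int i = ii; i < ii + TILE; i++)
            for (int k = kk; k < kk + TILE; k++) {
                double a = A[i][k];
                for (int j = jj; j < jj + TILE; j++)
                    C[i][j] += a * B[k][j];
            }
}
```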

Patent
19 Nov 1999
TL;DR: Curious caching improves upon cache snooping by allowing a snooping cache to insert data from snooped bus operations that is not currently in the cache, independent of any prior accesses to the associated memory location.
Abstract: Curious caching improves upon cache snooping by allowing a snooping cache to insert data from snooped bus operations that is not currently in the cache and independent of any prior accesses to the associated memory location. In addition, curious caching allows software to specify which data producing bus operations, e.g., reads and writes, result in data being inserted into the cache. This is implemented by specifying “memory regions of curiosity” and insertion and replacement policy actions for those regions. In column caching, the replacement of data can be restricted to particular regions of the cache. By also making the replacement address-dependent, column caching allows different regions of memory to be mapped to different regions of the cache. In a set-associative cache, a replacement policy specifies the particular column(s) of the set-associative cache in which a page of data can be stored. The column specification is made in page table entries in a TLB that translates between virtual and physical addresses. The TLB includes a bit vector, one bit per column, which indicates the columns of the cache that are available for replacement.

Proceedings ArticleDOI
01 Oct 1999
TL;DR: This research explores the potential of on-chip cache compression, which can reduce not only the cache miss ratio but also the miss penalty if main memory is also managed in compressed form, and suggests several techniques to reduce the decompression overhead and to manage the compressed blocks efficiently.
Abstract: This research explores the potential of an on-chip cache compression which can reduce not only the cache miss ratio but also the miss penalty, if main memory is also managed in compressed form. However, the decompression time causes a critical effect on the memory access time, and variable-sized compressed blocks tend to increase the design complexity of the compressed cache architecture. This paper suggests several techniques to reduce the decompression overhead and to manage the compressed blocks efficiently, which include selective compression, fixed space allocation for the compressed blocks, parallel decompression, the use of a decompression buffer, and so on. Moreover, a simple compressed cache architecture based on the above techniques and its management method are proposed. The results from trace-driven simulation show that this approach can provide around a 35% decrease in the on-chip cache miss ratio as well as a 53% decrease in the data traffic over the conventional memory systems. Also, a large amount of the decompression overhead can be reduced, and thus the average memory access time can also be reduced by up to 20% compared with the conventional memory systems.
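
A hedged C sketch of two of the techniques named in the abstract, selective compression and fixed space allocation: a block is stored compressed only if it fits in half a physical line, so two compressed neighbours can share one line. compress() is a placeholder for the actual compressor, and the layout is an assumption, not the proposed architecture.

```c
/* Selective compression with fixed half-line slots. */
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_BYTES 64

struct cache_line {
    bool    compressed;             /* both halves hold compressed blocks */
    uint8_t data[LINE_BYTES];
};

extern size_t compress(const uint8_t *in, size_t n, uint8_t *out);  /* assumed hook */

/* Try to pack two adjacent memory blocks into one physical line. */
bool fill_line(struct cache_line *l, const uint8_t blk0[LINE_BYTES],
               const uint8_t blk1[LINE_BYTES])
{
    uint8_t tmp0[2 * LINE_BYTES], tmp1[2 * LINE_BYTES];   /* allow worst-case expansion */
    size_t  n0 = compress(blk0, LINE_BYTES, tmp0);
    size_t  n1 = compress(blk1, LINE_BYTES, tmp1);

    if (n0 <= LINE_BYTES / 2 && n1 <= LINE_BYTES / 2) {
        memcpy(l->data, tmp0, n0);                        /* fixed half-line slot 0 */
        memcpy(l->data + LINE_BYTES / 2, tmp1, n1);       /* fixed half-line slot 1 */
        l->compressed = true;
        return true;                                      /* both neighbours resident */
    }
    memcpy(l->data, blk0, LINE_BYTES);                    /* fall back: store uncompressed */
    l->compressed = false;
    return false;
}
```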

Proceedings ArticleDOI
03 Aug 1999
TL;DR: This work describes an architecture for data intensive applications where a high-speed distributed data cache is used as a common element for all of the sources and sinks of data, and provides standard interfaces to a large, application-oriented, distributed, on-line, transient storage system.
Abstract: Modern scientific computing involves organizing, moving, visualizing, and analyzing massive amounts of data at multiple sites around the world. The technologies, the middleware services, and the architectures that are used to build useful high-speed, wide area distributed systems, constitute the field of data intensive computing. We describe an architecture for data intensive applications where we use a high-speed distributed data cache as a common element for all of the sources and sinks of data. This cache-based approach provides standard interfaces to a large, application-oriented, distributed, on-line, transient storage system. We describe our implementation of this cache, how we have made it "network aware ", and how we do dynamic load balancing based on the current network conditions. We also show large increases in application throughput by access to knowledge of the network conditions.

Patent
Rabindranath Dutta1
26 Aug 1999
TL;DR: The caching agent stores, in a local cache, only a small part of a large file requested over the Internet and starts transferring this partial file to the client while simultaneously retrieving the remaining portion of the file; some embodiments store more than the first page, creating a safety margin.
Abstract: A system, method and program stores, in a local cache, only a small part of a large file that is being requested over a network such as the Internet. In a preferred embodiment, the caching agent starts transferring this partial file to the client while it is simultaneously retrieving the remaining portion of the file across the Internet. A preferred embodiment of the invention stores a first page of the browser display in the cache. Other embodiments store more than the first page, or a part of the full file or document, thereby creating a safety margin in storing more than one page. Another preferred embodiment initially stores the full file or document, and if there is a need for cache replacement, the cache is replaced up until the first page is reached. As such, the cache space requirements are minimized for large documents being retrieved over the World Wide Web.

Patent
31 Mar 1999
TL;DR: In this paper, the authors describe a system for handling requests received from a client for information stored on a server, where cache functions are bypassed or executed based on whether an execution of cache functions in an attempt to access the information from cache is likely to slow processing of a request for the information without at least some compensating reduction in processing time for a request of the information received at a later time.
Abstract: Methods and systems for handling requests received from a client for information stored on a server. In general, when a request for information is received, cache functions are bypassed or executed based on whether an execution of cache functions in an attempt to access the information from cache is likely to slow processing of a request for the information without at least some compensating reduction in processing time for a request for the information received at a later time. Also described is receiving information that identifies the location of a resource within a domain and selecting a cache based on the information that identifies the location of the resource within the domain.

Patent
Hubertus Franke1, Douglas J. Joseph1
29 Mar 1999
TL;DR: In this paper, the authors propose fault contained memory partitioning in a cache coherent, symmetric shared memory multiprocessor system while enabling fault contained cache coherence domains as well as cache coherent inter partition memory regions.
Abstract: The present invention provides fault contained memory partitioning in a cache coherent, symmetric shared memory multiprocessor system while enabling fault contained cache coherence domains as well as cache coherent inter partition memory regions. The entire system may be executed as a single coherence domain regardless of partitioning, and the general memory access and cache coherency traffic are distinguished. All memory access is intercepted and processed by the memory controller. Before data is read from or written to memory, the address is verified and the executed operation is aborted if the address is outside the memory regions assigned to the processor in use. Inter cache requests are allowed to pass, though concurrently the accessed memory address is verified in the same manner as the memory requests. During the corresponding inter cache response, a failed validity check for the request results in the stopping of the requesting processor and the repair of the potentially corrupted memory hierarchy of the responding processor.

Patent
10 Nov 1999
TL;DR: In this paper, a memory system having a main memory coupled with a plurality of parallel virtual access channels is described, each of which provides a set of memory access resources for controlling the main memory.
Abstract: A memory system having a main memory which is coupled to a plurality of parallel virtual access channels. Each of the virtual access channels provides a set of memory access resources for controlling the main memory. These memory access resources include cache resources (including cache chaining), burst mode operation control and precharge operation control. A plurality of the virtual access channels are cacheable virtual access channels, each of which includes a channel row cache memory for storing one or more cache entries and a channel row address register for storing corresponding cache address entries. One or more non-cacheable virtual access channels are provided by a bus bypass circuit. Each virtual access channel is addressable, such that particular memory masters can be assigned to access particular virtual access channels.

Patent
Matthias A. Blumrich1
31 Mar 1999
TL;DR: In this article, a cache memory shared among a plurality of separate, disjoint entities, each having a disjoint address space, includes a cache segregator for dynamically segregating the storage space allocated to each entity such that no interference occurs among the respective entities.
Abstract: A cache memory shared among a plurality of separate, disjoint entities each having a disjoint address space, includes a cache segregator for dynamically segregating a storage space allocated to each entity of the entities such that no interference occurs with respective ones of the entities. A multiprocessor system including the cache memory, a method and a signal bearing medium for storing a program embodying the method also are provided.

Proceedings ArticleDOI
17 Aug 1999
TL;DR: This work proposes, implements, and evaluates a series of run-time techniques for dynamic analysis of the program instruction access behavior, which are then used to proactively guide the access of the L0-Cache, an additional mini cache located between the instruction cache (I-Cache) and the CPU core.
Abstract: In this paper, we propose a technique that uses an additional mini cache, the L0-Cache, located between the instruction cache (I-Cache) and the CPU core. This mechanism can provide the instruction stream to the data path and, when managed properly, it can effectively eliminate the need for high utilization of the more expensive I-Cache. In this work, we propose, implement, and evaluate a series of run-time techniques for dynamic analysis of the program instruction access behavior, which are then used to proactively guide the access of the L0-Cache. The basic idea is that only the most frequently executed portions of the code should be stored in the L0-Cache since this is where the program spends most of its time. We present experimental results to evaluate the effectiveness of our scheme in terms of performance and energy dissipation for a series of SPEC95 benchmarks. We also discuss the performance and energy tradeoffs that are involved in these dynamic schemes.

Patent
19 Feb 1999
TL;DR: In this article, a method and apparatus for accessing a cache memory of a computer graphics system, including a frame buffer memory having a graphics memory for storing pixel data for ultimate supply to a video display device, was presented.
Abstract: A method and apparatus for accessing a cache memory of a computer graphics system, the apparatus including a frame buffer memory having a graphics memory for storing pixel data for ultimate supply to a video display device, a read cache memory for storing data received from the graphics memory, and a write cache memory for storing data received externally of the frame buffer and data that is to be written into the graphics memory. Also included is a frame buffer controller for controlling access to the graphics memory and read and write cache memories. The frame buffer controller includes a cache first in, first out (FIFO) memory pipeline for temporarily storing pixel data prior to supply thereof to the cache memories.

Patent
26 Jan 1999
TL;DR: In this paper, a relatively high-speed, intermediate-volume storage device is operated as a user-configurable cache, where requests to access a mass storage device such as a disk or tape are intercepted by a device driver that compares the access request against a directory of the contents of the user configurable cache.
Abstract: An apparatus and method for accessing data in a computer system. A relatively high-speed, intermediate-volume storage device is operated as a user-configurable cache. Requests to access a mass storage device such as a disk or tape are intercepted by a device driver that compares the access request against a directory of the contents of the user-configurable cache. If the user-configurable cache contains the data sought to be accessed, the access request is carried out in the user-configurable cache instead of being forwarded to the device driver for the target mass storage device. Because the user-cache is implemented using memory having a dramatically shorter access time than most mechanical mass storage devices, the access request is fulfilled much more quickly than if the originally intended mass storage device was accessed. Data is preloaded and responsively cached in the user-configurable cache memory based on user preferences.