
Showing papers on "Cache coloring published in 2001"


Proceedings ArticleDOI
01 May 2001
TL;DR: This paper discusses policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused, and proposes adaptive policies that effectively reduce L1 cache leakage energy by 5x for the SPEC2000 with only negligible degradation in performance.
Abstract: Power dissipation is increasingly important in CPUs ranging from those intended for mobile use, all the way up to high-performance processors for high-end servers. While the bulk of the power dissipated is dynamic switching power, leakage power is also beginning to be a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This paper examines methods for reducing leakage power within the cache memories of the CPU. Because caches comprise much of a CPU chip's area and transistor counts, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and “turning off” cache lines when they hold data not likely to be reused. In particular, our approach is targeted at the generational nature of cache line usage. That is, cache lines typically have a flurry of frequent use when first brought into the cache, and then have a period of “dead time” before they are evicted. By devising effective, low-power ways of deducing dead time, our results show that in many cases we can reduce L1 cache leakage energy by 4x in SPEC2000 applications without impacting performance. Because our decay-based techniques have notions of competitive on-line algorithms at their roots, their energy usage can be theoretically bounded at within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies effectively reduce L1 cache leakage energy by 5x for the SPEC2000 with only negligible degradation in performance.
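
As a rough illustration of the generational decay idea, the sketch below turns off cache lines that sit idle for a full decay interval. It is a minimal Python simulation, not the paper's hardware design; the counter, the decay interval, and the CacheLine fields are assumptions.

```python
class CacheLine:
    """Illustrative cache line with a small idle counter (an assumption)."""
    def __init__(self, tag):
        self.tag = tag
        self.valid = True
        self.powered = True      # leakage is only paid while the line is powered
        self.idle_ticks = 0      # ticks since the last access

DECAY_TICKS = 4  # assumed decay interval, in global-counter ticks

def on_access(line):
    # A hit resets the idle counter and (re)powers the line.
    line.idle_ticks = 0
    line.powered = True
    line.valid = True

def on_decay_tick(cache_lines):
    # Invoked periodically by a coarse global counter; lines idle for the
    # whole decay interval are invalidated and "turned off" to stop leakage.
    for line in cache_lines:
        if not line.powered:
            continue
        line.idle_ticks += 1
        if line.idle_ticks >= DECAY_TICKS:
            line.valid = False    # a real design would write back dirty data first
            line.powered = False  # gate the supply to eliminate leakage
```

An adaptive policy in the spirit of the paper would additionally tune DECAY_TICKS per cache line rather than using one fixed interval.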

725 citations


Proceedings Article
11 Sep 2001
TL;DR: This paper proposes a new data organization model called PAX (Partition Attributes Across), that significantly improves cache performance by grouping together all values of each attribute within each page, and demonstrates that in-page data placement is the key to high cache performance.
Abstract: Relational database systems have traditionally optimized for I/O performance and organized records sequentially on disk pages using the N-ary Storage Model (NSM) (a.k.a., slotted pages). Recent research, however, indicates that cache utilization and performance are becoming increasingly important on modern platforms. In this paper, we first demonstrate that in-page data placement is the key to high cache performance and that NSM exhibits low cache utilization on modern platforms. Next, we propose a new data organization model called PAX (Partition Attributes Across), which significantly improves cache performance by grouping together all values of each attribute within each page. Because PAX only affects layout inside the pages, it incurs no storage penalty and does not affect I/O behavior. According to our experimental results, when compared to NSM (a) PAX exhibits superior cache and memory bandwidth utilization, saving at least 75% of NSM's stall time due to data cache accesses, (b) range selection queries and updates on memory-resident relations execute 17-25% faster, and (c) TPC-H queries involving I/O execute 11-48% faster.
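
To make the layout difference concrete, here is a small Python sketch that stores the same records in an NSM-style, record-at-a-time order and in a PAX-style, attribute-at-a-time order within one page; the page representation is a simplification that omits slot arrays and minipage headers.

```python
records = [(1, "alice", 30.0), (2, "bob", 42.5), (3, "carol", 7.25)]

# NSM (slotted-page style): whole records are laid out one after another.
nsm_page = [value for record in records for value in record]
# -> [1, 'alice', 30.0, 2, 'bob', 42.5, 3, 'carol', 7.25]

# PAX: within the same page, all values of each attribute are grouped together,
# so a scan over one attribute touches contiguous values and fewer cache lines.
num_attrs = len(records[0])
pax_page = [[record[a] for record in records] for a in range(num_attrs)]
# -> [[1, 2, 3], ['alice', 'bob', 'carol'], [30.0, 42.5, 7.25]]

# A predicate on a single attribute reads only that attribute's "minipage".
matching_ids = [pax_page[0][i] for i, v in enumerate(pax_page[2]) if v > 20.0]
```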

428 citations


Proceedings ArticleDOI
01 Dec 2001
TL;DR: Two previously-proposed techniques, way-prediction and selective direct-mapping, are applied to reducing L1 cache dynamic energy while maintaining high performance, and caches achieve the energy-delay of sequential access while maintaining the performance of parallel access.
Abstract: Set-associative caches achieve low miss rates for typical applications but result in significant energy dissipation. Set-associative caches minimize access time by probing all the data ways in parallel with the tag lookup, although the output of only the matching way is used. The energy spent accessing the other ways is wasted. Eliminating the wasted energy by performing the data lookup sequentially following the tag lookup substantially increases cache access time, and is unacceptable for high-performance L1 caches. In this paper, we apply two previously-proposed techniques, way-prediction and selective direct-mapping, to reducing L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way and not all the ways, achieving energy savings. While these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy. We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel access L1 i- and d-caches, the techniques achieve overall processor energy-delay reduction of 8%, while perfect way-prediction with no performance degradation achieves 10% reduction. The performance degradation of the techniques is less than 3%, compared to an aggressive, 1-cycle, 4-way, parallel access cache.
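
The probe sequence behind way-prediction can be sketched as follows; the last-hit-way predictor used here is an illustrative assumption (the paper also considers PC-based prediction and selective direct-mapping), and probe counts stand in for dynamic data-array energy.

```python
NUM_WAYS = 4

class WayPredictedCache:
    """Probe only the predicted way first; fall back to the other ways on a
    misprediction. Probe counts stand in for dynamic data-array energy."""
    def __init__(self, num_sets):
        self.tags = [[None] * NUM_WAYS for _ in range(num_sets)]   # tags[set][way]
        self.predicted_way = [0] * num_sets   # assumed last-hit-way predictor
        self.probes = 0

    def access(self, set_idx, tag):
        way = self.predicted_way[set_idx]
        self.probes += 1                       # first probe: predicted way only
        if self.tags[set_idx][way] == tag:
            return "first-probe hit"           # parallel-access speed, one-way energy
        # Misprediction: probe the remaining ways (extra latency and energy).
        for w in range(NUM_WAYS):
            if w == way:
                continue
            self.probes += 1
            if self.tags[set_idx][w] == tag:
                self.predicted_way[set_idx] = w
                return "second-probe hit"
        return "miss"
```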

310 citations


Patent
01 Feb 2001
TL;DR: In this article, the authors propose a method for managing information in a mobile device comprising downloading a first set of files, determining whether a local cache has enough space to store the first set, storing the first set of files into the local cache, and, if the local cache does not have enough space, selecting an out-dated record and removing a second set of files corresponding to the out-dated record from the local cache.
Abstract: An exemplary method for managing information in a mobile device comprises the steps of downloading a first set of files, determining whether a local cache has enough space to store the first set of files, storing the first set of files into the local cache if the local cache has enough space, selecting an out-dated record and removing a second set of files corresponding to the out-dated record from the local cache if the local cache does not have enough space, and repeating the determining step until the first set of files is stored into the local cache.
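
A minimal sketch of the claimed download-and-evict loop, assuming a dict-based cache keyed by record ID with time stamps used to pick the out-dated record; all names and structures are hypothetical.

```python
import time

def store_with_eviction(local_cache, record_id, new_files, capacity_bytes):
    """Store downloaded files under record_id, evicting the files of the most
    out-dated record until the cache has enough space.
    local_cache maps record_id -> {"timestamp": t, "files": [(name, size), ...]}."""
    def used_bytes():
        return sum(size for rec in local_cache.values() for _, size in rec["files"])

    needed = sum(size for _, size in new_files)

    # Repeat the space check, removing one out-dated record's files per pass.
    while local_cache and used_bytes() + needed > capacity_bytes:
        outdated_id = min(local_cache, key=lambda rid: local_cache[rid]["timestamp"])
        del local_cache[outdated_id]        # remove the corresponding second set of files

    if used_bytes() + needed <= capacity_bytes:
        local_cache[record_id] = {"timestamp": time.time(), "files": list(new_files)}
```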

235 citations


Patent
26 Apr 2001
TL;DR: In this article, the authors present information object repository selection procedures for determining which of a number of information object repositories should service a request for an information object, including a direct cache selection process, a redirect cache selection process, a remote DNS cache selection process, and a local DNS cache selection process.
Abstract: Various information object repository selection procedures for determining which of a number of information object repositories should service a request for the information object include a direct cache selection process, a redirect cache selection process, a remote DNS cache selection process, or a local DNS cache selection process. Different combinations of these procedures may also be used. For example, different combinations may be used depending on the type of content being requested. The direct cache selection process may be used for information objects that will be immediately loaded without user action, while any of the redirect cache selection process, the remote DNS cache selection process and/or the local DNS cache selection process may be used for information objects that will be loaded only after some user action.
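
The sketch below illustrates how such a combination might be dispatched on the type of content being requested; the dispatch criteria, flags, and function names are assumptions for illustration rather than the patent's claims.

```python
def choose_selection_process(request):
    """Pick a repository-selection procedure for a requested object.
    The criteria and names here are illustrative, not the patent's claims."""
    if request.get("loads_immediately"):
        # e.g. inline objects fetched without any user action
        return direct_cache_selection
    # Objects loaded only after a user action can tolerate an extra hop,
    # so a redirect or DNS-based process may be used instead.
    if request.get("use_remote_dns"):
        return remote_dns_cache_selection
    if request.get("use_local_dns"):
        return local_dns_cache_selection
    return redirect_cache_selection

def direct_cache_selection(req):      return "serve from the directly selected repository"
def redirect_cache_selection(req):    return "HTTP-redirect the client to the selected repository"
def remote_dns_cache_selection(req):  return "resolve to a repository via a remote DNS cache"
def local_dns_cache_selection(req):   return "resolve to a repository via a local DNS cache"
```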

231 citations


Patent
13 Aug 2001
TL;DR: In this article, a content analysis engine determines which of the caches a data item should be stored in, based on an analysis of data requests or data items served in response to the requests, guidelines set by a system administrator, etc.
Abstract: A multi-tier caching system and method of operating the same. The system comprises a first cache implemented in operating system or kernel space (e.g., in memory managed by or allocated to an operating system) and a second cache implemented in application or user space (e.g., in memory managed by or allocated to an application program). Data requests requiring little processing to identify responsive data may be served from the first cache, while those requiring further processing are served from the second. The first cache may therefore store frequently requested data items or items that can be served in response to requests having different forms, qualifiers or other indicia. A content analysis engine determines which of the caches a data item should be stored in, based on an analysis of data requests or data items served in response to the requests, guidelines set by a system administrator, etc.

220 citations


Proceedings ArticleDOI
19 Jan 2001
TL;DR: It is shown that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses.
Abstract: In this paper we address the severe performance gap caused by high processor clock rates and slow DRAM accesses. We show that even with an aggressive, next-generation memory system using four Direct Rambus channels and an integrated one-megabyte level-two cache, a processor still spends over half of its time stalling for L2 misses. Large cache blocks can improve performance, but only when coupled with wide memory channels. DRAM address mappings also affect performance significantly. We evaluate an aggressive prefetch unit integrated with the L2 cache and memory controllers. By issuing prefetches only when the Rambus channels are idle, prioritizing them to maximize DRAM row buffer hits, and giving them low replacement priority, we achieve a 43% speedup across 10 of the 26 SPEC2000 benchmarks, without degrading performance on the others. With eight Rambus channels, these ten benchmarks improve to within 10% of the performance of a perfect L2 cache.
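
A rough sketch of the prefetch issue policy described above: prefetches go out only when a channel is idle, row-buffer hits are preferred, and fetched lines are inserted with low replacement priority. The channel and cache interfaces here are invented for illustration.

```python
def issue_prefetches(channels, prefetch_queue, l2_cache):
    """Drain prefetches onto idle Rambus channels only, preferring addresses
    that hit the open DRAM row; fetched lines get low replacement priority."""
    for ch in channels:
        if ch.busy or not prefetch_queue:
            continue                      # never steal bandwidth from demand misses
        # Prefer a queued prefetch that hits this channel's open row buffer.
        idx = next((i for i, addr in enumerate(prefetch_queue)
                    if ch.row_of(addr) == ch.open_row), 0)
        addr = prefetch_queue.pop(idx)
        data = ch.read(addr)
        # Low replacement priority: a later demand fill evicts this line first.
        l2_cache.insert(addr, data, replacement_priority="low")
```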

213 citations


Patent
13 Jan 2001
TL;DR: In this article, a network-infrastructure cache provides proxy services to a plurality of client workstations concurrently requesting access to data stored on a server; the cache stores data for inclusion in responses and, if it lacks data needed for a response, forwards a request for the missing data to the server and receives data responsive thereto.
Abstract: A digital computer network includes a network-infrastructure cache that provides proxy services to a plurality of client workstations concurrently requesting access to data stored on a server. A network interconnecting the workstations and the server carries requests for data to the server, and responses thereto back to requesting client workstations. The network-infrastructure cache receives and responds to requests from the client workstations for access to data for which the network-infrastructure cache provides proxy services. A cache in the network-infrastructure cache stores data for inclusion in responses. If the cache lacks data needed for a response, then the network-infrastructure cache forwards a request for the missing data onto the server, and receives data responsive thereto. In one embodiment the network-infrastructure cache converts requests received from clients in a first protocol into requests in a second protocol for transmission to the server, and conversely.

213 citations


Proceedings ArticleDOI
01 May 2001
TL;DR: An exact model of the behavior of loop nests executing in a memory hierarchy is developed by using a nontraditional classification of misses that has the key property of composability, allowing the model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior.
Abstract: We develop from first principles an exact model of the behavior of loop nests executing in a memory hierarchy, by using a nontraditional classification of misses that has the key property of composability. We use Presburger formulas to express various kinds of misses as well as the state of the cache at the end of the loop nest. We use existing tools to simplify these formulas and to count cache misses. The model is powerful enough to handle imperfect loop nests and various flavors of non-linear array layouts based on bit interleaving of array indices. We also indicate how to handle modest levels of associativity, and how to perform limited symbolic analysis of cache behavior. The complexity of the formulas relates to the static structure of the loop nest rather than to its dynamic trip count, allowing our model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior. Validation against cache simulation confirms the exactness of our formulation. Our method can serve as the basis for a static performance predictor to guide program and data transformations to improve performance.

175 citations


Journal ArticleDOI
TL;DR: The proposed Speculative Versioning Cache uses distributed caches to eliminate the latency and bandwidth problems of the ARB and conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches.
Abstract: Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the Address Resolution Buffer (ARB) uses a centralized buffer to support speculative versions. Our proposal, called the Speculative Versioning Cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. Our evaluation for the Multiscalar architecture shows that hit latency is an important factor affecting performance and private cache solutions trade-off hit rate for hit latency.

167 citations


Patent
01 Mar 2001
TL;DR: In this article, a garbage collector that uses an LRU algorithm to free memory from an XML DOM tree active in an application cache is described. But it is not shown how to remove the nodes from the DOM tree.
Abstract: The present invention relates to a garbage collector that uses an LRU algorithm to free memory from an XML DOM tree active in an application cache. According to one or more embodiments of the present invention, a threshold for the amount of memory permitted to reside in an application cache is set. Then, a garbage collector removes entries from the cache until it falls below the threshold. In one or more embodiments, a node table is used. When nodes are added to the XML DOM tree in the application cache the node table is updated. When the threshold for the amount of memory permitted to reside in the application cache is exceeded, the garbage collector applies an LRU algorithm uses the node table to determine which nodes to remove from the application cache. In one embodiment, the LRU algorithm scans the node table to determine the least recently used node in the table by examining time stamp entries in the table. Then, the algorithm removes that node and repeats the process until the XML DOM tree uses less memory in the cache than the threshold.
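
A minimal sketch of the described collector, assuming a node table keyed by node ID with per-node sizes and last-access time stamps; the interfaces (for example dom_tree.remove_node) and the threshold value are illustrative assumptions.

```python
import time

node_table = {}   # node_id -> {"last_access": t, "size": size_in_bytes, "node": dom_node}
THRESHOLD_BYTES = 4 * 1024 * 1024   # assumed memory threshold for the cache

def touch(node_id):
    """Record an access so the LRU scan sees a fresh time stamp."""
    node_table[node_id]["last_access"] = time.time()

def collect_garbage(dom_tree):
    """Remove least-recently-used DOM nodes until the cache is under threshold."""
    def cache_bytes():
        return sum(entry["size"] for entry in node_table.values())

    while node_table and cache_bytes() > THRESHOLD_BYTES:
        # Scan the node table for the entry with the oldest time stamp.
        lru_id = min(node_table, key=lambda nid: node_table[nid]["last_access"])
        dom_tree.remove_node(node_table[lru_id]["node"])   # assumed DOM interface
        del node_table[lru_id]
```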

Proceedings ArticleDOI
17 Jun 2001
TL;DR: In this paper, an analytical cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta.
Abstract: An accurate, tractable, analytic cache model for time-shared systems is presented, which estimates the overall cache miss-rate of a multiprocessing system with any cache size and time quanta. The input to the model consists of the isolated miss-rate curves for each process, the time quanta for each of the executing processes, and the total cache size. The output is the overall miss-rate. Trace-driven simulations demonstrate that the estimated miss-rate is very accurate. Since the model provides a fast and accurate way to estimate the effect of context switching, it is useful for both understanding the effect of context switching on caches and optimizing cache performance for time-shared systems. A cache partitioning mechanism is also presented and is shown to improve the cache miss-rate up to 25% over the normal LRU replacement policy.

Patent
19 Dec 2001
TL;DR: In this paper, a method, a system, an apparatus, and a computer program product are presented for fragment caching, where a message is received at a computing device that contains a cache management unit, a fragment in the message body of the message is cached.
Abstract: A method, a system, an apparatus, and a computer program product are presented for fragment caching. After a message is received at a computing device that contains a cache management unit, a fragment in the message body of the message is cached. Subsequent requests for the fragment at the cache management unit result in a cache hit. The cache management unit operates equivalently in support of fragment caching operations without regard to whether the computing device acts as a client, a server, or a hub located throughout the network; in other words, the fragment caching technique is uniform throughout a network. Cache ID rules accompany a fragment from an origin server; the cache ID rules describe a method for forming a unique cache ID for the fragment such that dynamic content can be cached away from an origin server.

Patent
08 Jun 2001
TL;DR: In this article, a method and system for exclusive two-level caching in a chip-multiprocessor is presented to maximize the effective use of on-chip cache.
Abstract: To maximize the effective use of on-chip cache, a method and system for exclusive two-level caching in a chip-multiprocessor are provided. The exclusive two-level caching in accordance with the present invention involves method relaxing the inclusion requirement in a two-level cache system in order to form an exclusive cache hierarchy. Additionally, the exclusive two-level caching involves providing a first-level tag-state structure in a first-level cache of the two-level cache system. The first tag-state structure has state information. The exclusive two-level caching also involves maintaining in a second-level cache of the two-level cache system a duplicate of the first-level tag-state structure and extending the state information in the duplicate of the first tag-state structure, but not in the first-level tag-state structure itself, to include an owner indication. The exclusive two-level caching further involves providing in the second-level cache a second tag-state structure so that a simultaneous lookup at the duplicate of the first tag-state structure and the second tag-state structure is possible. Moreover, the exclusive two-level caching involves associating a single owner with a cache line at any given time of its lifetime in the chip-multiprocessor.

Patent
16 Apr 2001
TL;DR: In this article, the authors propose a system and method for caching network resources in an intermediary server topologically located between a client and a server in a network, where the intermediary server includes a cache and methods for loading content into the cache according to rules specified by a site owner.
Abstract: A system and method for caching network resources in an intermediary server topologically located between a client and a server in a network. The intermediary server preferably caches at both a back-end location and a front-end location. The intermediary server includes a cache and methods for loading content into the cache according to rules specified by a site owner. Optionally, content can be proactively loaded into the cache to include content not yet requested. In another option, requests can be held at the cache when a prior request for similar content is pending.

Patent
14 Aug 2001
TL;DR: In this paper, a method and apparatus for the selection of digital content for broadcast delivery to multiple users is described, where each user filters the received content for storage in a client-side cache based on user preferences.
Abstract: A method and apparatus are disclosed for the selection of digital content for broadcast delivery to multiple users. A broadcast edge cache server selects content for broadcast distribution to multiple users. Each user filters the received content for storage in a client-side cache based on user preferences. Each client computer includes a local cache that records material that has been accessed by the user and a broadcast cache that records material that is predicted to be of interest to the user, in accordance with the present invention. Each client computer is connected to the network environment by a relatively high bandwidth uni-directional broadcast channel, and a second bi-directional channel, such as a lower bandwidth channel. A client initially determines if requested content is available locally in a client cache or a broadcast cache before requesting the content over the network from an edge server or the content provider (such as a web site) on a lower bandwidth channel.
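
The client-side lookup order can be sketched as follows: check the local cache, then the broadcast cache, and only then request the content over the lower-bandwidth bi-directional channel. The function and cache names are assumptions.

```python
def fetch(url, local_cache, broadcast_cache, request_over_network):
    """Return content for url, preferring caches over the bi-directional channel."""
    if url in local_cache:                 # material the user accessed before
        return local_cache[url]
    if url in broadcast_cache:             # material predicted from user preferences
        content = broadcast_cache[url]
        local_cache[url] = content         # promote to the local cache once used
        return content
    # Fall back to the lower-bandwidth bi-directional channel
    # (an edge server or the content provider's web site).
    content = request_over_network(url)
    local_cache[url] = content
    return content
```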

Patent
27 Jun 2001
TL;DR: In this article, a system and method to reduce the time for system initializations is described, where data accessed during a system initialization is loaded into a non-volatile cache and is pinned to prevent eviction.
Abstract: A system and method to reduce the time for system initializations is disclosed. In accordance with the invention, data accessed during a system initialization is loaded into a non-volatile cache and is pinned to prevent eviction. By pinning data into the cache, the data required for system initialization is pre-loaded into the cache on a system reboot, thereby eliminating the need to access a disk.
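
A small sketch of pinning, assuming a simple keyed non-volatile cache in which pinned lines are never candidates for eviction; the structure and policy are illustrative, not the patent's design.

```python
class NonVolatileCache:
    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = {}    # key -> (data, pinned)

    def insert(self, key, data, pinned=False):
        if key not in self.lines and len(self.lines) >= self.capacity:
            self._evict_one()
        self.lines[key] = (data, pinned)

    def _evict_one(self):
        # Only unpinned lines are eviction candidates; pinned boot data stays.
        for key, (_, pinned) in self.lines.items():
            if not pinned:
                del self.lines[key]
                return
        raise RuntimeError("cache is full of pinned lines")

# During system initialization, disk blocks read for boot are pinned so the
# next reboot finds them pre-loaded in the cache instead of on disk.
def record_boot_access(cache, block_id, data):
    cache.insert(block_id, data, pinned=True)
```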

Patent
25 Jan 2001
TL;DR: In this paper, a system for adaptive bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described, where each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache.
Abstract: A system for adaptively bypassing one or more higher cache levels following a miss in a lower level of a cache hierarchy is described. Each cache level preferably includes a tag store containing address and state information for each cache line resident in the respective cache. When an invalidate request is received at a given cache hierarchy, each cache level is searched for the address specified by the invalidate request. When an address match is detected, the state of the respective cache line is changed to the invalid state, although the address of the cache line is left in the tag store. Thereafter, if the processor or entity associated with this cache hierarchy issues its own request for this same cache line, the cache hierarchy begins searching the tag store of each level starting with the lowest cache level. Since the address of the invalidated cache line was left in the respective tag store, a match will be detected at one of the cache levels, although the corresponding state of this cache line is invalid. This condition is specifically detected and is considered to be an “inval_miss” occurrence. In response, to an inval_miss, the cache hierarchy calls off searching any higher levels, and instead, issues a memory reference request for the desired cache line. In a further embodiment, the entity that sourced an invalidate request is stored, and a subsequent memory reference request for the same cache line is sent directly to the source entity.

Patent
08 Aug 2001
TL;DR: In this paper, the cache system determines that an object, such as an image file, is missing from the cache memory, locates sufficient components from cache memory and/or external storage, and constructs the object from the located components.
Abstract: Methods and apparatus for constructing objects within a cache system thereby allowing the cache system to respond to requested objects that are not initially available within the cache system. One embodiment of the invention caches image files, where the images are divided into components and stored in a format that allows identification and access to the components. The cache system determines that an object, such as an image file, is missing from the cache memory, locates sufficient components from the cache memory and/or external storage, and constructs the object from the located components.

Journal ArticleDOI
TL;DR: This study shows that the two proposed schemes are not only effective in salvaging the cache content but consume significantly less energy than their counterparts.
Abstract: Caching can reduce the bandwidth requirement in a wireless computing environment as well as minimize the energy consumption of wireless portable computers. To facilitate mobile clients in ascertaining the validity of their cache content, servers periodically broadcast cache invalidation reports that contain information of data that has been updated. However, as mobile clients may operate in a doze or even totally disconnected mode (to conserve energy), it is possible that some reports may be missed and the clients are forced to discard the entire cache content. In this paper, we reexamine the issue of designing cache invalidation strategies. We identify the basic issues in designing cache invalidation strategies. From the solutions to these issues, a large set of cache invalidation schemes can be constructed. We evaluate the performance of four representative algorithms-two of which are known algorithms (i.e., Dual-Report Cache Invalidation and Bit-Sequences) while the other two are their counterparts that exploit selective tuning (namely, Selective Dual-Report Cache Invalidation and Bit-Sequences with Bit Count). Our study shows that the two proposed schemes are not only effective in salvaging the cache content but consume significantly less energy than their counterparts. While the Selective Dual-Report Cache Invalidation scheme performs best in most cases, it is inferior to the Bit-Sequences with the Bit-Count scheme under high update rates.

Patent
07 Jun 2001
TL;DR: In this article, a proxy partition cache (PPC) architecture and a technique for address-partitioning a proxy cache consisting of a grouping of discrete, cooperating caches (servers) is provided.
Abstract: A proxy partition cache (PPC) architecture and a technique for address-partitioning a proxy cache consisting of a grouping of discrete, cooperating caches (servers) is provided. Client requests for objects (files) of a given size are redirected or reassigned to a single cache in the grouping, notwithstanding the cache to which the request is made by the load-balancing mechanism (such as a Layer 4 switch) based upon load-balancing considerations. The file is then returned to the switch via the switch-designated cache for vending to the requesting client. The redirection/reassignment occurs according to a function within the cache to which the request is directed so that the switch remains freed from additional tasks that can compromise speed.
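
One way to picture the redirection function is a deterministic hash of the requested object's URL onto the grouping of caches, so every request for the same object lands on the same cache regardless of which cache the switch picked; the hash choice and interfaces below are assumptions.

```python
import hashlib

cache_servers = ["cache-a", "cache-b", "cache-c"]   # the cooperating caches

def owning_cache(url):
    """Deterministically map every request for the same object to one cache,
    regardless of which cache the Layer 4 switch originally picked."""
    digest = hashlib.md5(url.encode()).digest()
    return cache_servers[digest[0] % len(cache_servers)]

def handle_request(url, this_cache, fetch_locally, fetch_from_peer):
    owner = owning_cache(url)
    if owner == this_cache:
        return fetch_locally(url)            # this cache stores and vends the object
    # Reassign the request: obtain the object from its owning cache, then
    # return it via the switch-designated cache so the switch is not involved.
    return fetch_from_peer(owner, url)
```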

Patent
31 Oct 2001
TL;DR: In this article, a cache memory system can determine that an entry is stale if the entry has not been accessed or modified for a predetermined time, and the predetermined time is made dynamically variable.
Abstract: A cache memory system can determine that an entry is stale if the entry has not been accessed or modified for a predetermined time. If an entry is stale, the entry may be preemptively evicted. The predetermined time is made dynamically variable. A computer system can adjust the time to optimize a measure of performance. In a specific example, evicted lines are temporarily stored in an eviction queue. The time is adjusted to be as short as possible without substantially increasing the number of lines that must be recalled from the eviction queue.

Patent
11 Jun 2001
TL;DR: In this article, the authors propose a cache coherence protocol for a plurality of processor nodes and input/output nodes, where each processor node includes a multiplicity of processor cores, an interface to a local memory system and a protocol engine.
Abstract: A computer system has a plurality of processor nodes and a plurality of input/output nodes. Each processor node includes a multiplicity of processor cores, an interface to a local memory system and a protocol engine implementing a predefined cache coherence protocol. Each processor core has an associated memory cache for caching memory lines of information. Each input/output node includes no processor cores, an input/output interface for interfacing to an input/output bus or input/output device, a memory cache for caching memory lines of information and an interface to a local memory subsystem. The local memory subsystem of each processor node and input/output node stores a multiplicity of memory lines of information. The protocol engine of each processor node and input/output node implements the same predefined cache coherence protocol.

Patent
27 Aug 2001
TL;DR: In this article, a cache directory is also provided to track cache lines in the write cache and the at least one read cache, which provides a low-latency copy of data that is most likely to be used.
Abstract: A caching input/output hub includes a host interface to connect with a host. At least one input/output interface is provided to connect with an input/output device. A write cache manages memory writes initiated by the input/output device. At least one read cache, separate from the write cache, provides a low-latency copy of data that is most likely to be used. The at least one read cache is in communication with the write cache. A cache directory is also provided to track cache lines in the write cache and the at least one read cache. The cache directory is in communication with the write cache and the at least one read cache.

Patent
05 Mar 2001
TL;DR: In this paper, a method of servicing a request for a document over a computer network includes independently caching portions of pages called blocks, each of which includes a reference to a data source and code that is adapted to access the source and to format the data accessed from the data source.
Abstract: A method of servicing a request for a document over a computer network includes independently caching portions of pages called blocks. Each block includes a reference to a data source and code that is adapted to access the data source and to format the data accessed from the data source. When a request for a page is received over a computer network, one or more of the plurality of blocks defined in the script of the requested document may be retrieved from a cache memory. Any block that is not found in the cache memory is dynamically generated and a copy thereof is stored in the cache memory. The requested page may then be assembled from the page blocks retrieved from the cache memory and/or the dynamically generated page blocks.
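
A hedged sketch of the assembly flow: blocks found in the cache are reused, missing blocks are generated and cached, and the page is stitched together from the results. The block and cache interfaces are illustrative.

```python
def serve_page(page_script, block_cache, generate_block):
    """Assemble a requested page from independently cached blocks."""
    rendered = []
    for block in page_script["blocks"]:        # each block references a data source
        key = block["cache_key"]
        html = block_cache.get(key)
        if html is None:
            # Cache miss: run the block's code against its data source,
            # then store a copy for later requests.
            html = generate_block(block)
            block_cache[key] = html
        rendered.append(html)
    return "".join(rendered)                   # the assembled page
```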

Proceedings ArticleDOI
08 Sep 2001
TL;DR: The r-a cache is proposed, which provides flexible associativity by placing most blocks in direct-mapped positions and reactively displacing only conflicting blocks to set-associative positions, using a novel PC-based way-prediction to achieve high accuracy.
Abstract: While set-associative caches typically incur fewer misses than direct-mapped caches, set-associative caches have slower hit times. We propose the reactive-associative cache (r-a cache), which provides flexible associativity by placing most blocks in direct-mapped positions and reactively displacing only conflicting blocks to set-associative positions. The r-a cache uses way-prediction (like the predictive associative cache, PSA) to access displaced blocks on the initial probe. Unlike PSA, however, the r-a cache employs a novel feedback mechanism to prevent unpredictable blocks from being displaced. Reactive displacement and feedback allow the r-a cache to use a novel PC-based way-prediction and achieve high accuracy; without impractical block swapping as in column associative and group associative, and without relying on timing-constrained XOR way prediction. A one-port, 4-way r-a cache achieves up to 9% speedup over a direct-mapped cache and performs within 2% of an idealized 2-way set-associative, 1-cycle cache. A 4-way r-a cache achieves up to 13% speedup over a PSA cache, with both r-a and PSA using the PC scheme. CACTI estimates that for sizes larger than 8KB, a 4-way r-a cache is within 1% of direct-mapped hit times, and 24% faster than a 2-way set-associative cache.

Patent
22 May 2001
TL;DR: In this article, a method and apparatus for web caching is described, which can be implemented in hardware, software, or firmware.
Abstract: A method and apparatus for web caching is disclosed. The method and apparatus may be implemented in hardware, software or firmware. Complementary cache management modules, a coherency module and cache module(s), are installed at gateways for data and at clients, respectively. The coherency management module monitors data access requests and/or responses and determines for each: the uniform resource locator (URL) of the requested web page, the URL of the requestor and a signature. The signature is computed using cryptographic techniques and in particular a hash function for which the input is the corresponding web page for which a signature is to be generated. The coherency management module caches these signatures and the corresponding URL and uses the signatures to determine when a page has been updated. When, on the basis of signature comparisons, it is determined that a page has been updated, the coherency management module sends a notification to all complementary cache modules. Each cache module caches web pages requested by the associated client(s) to which it is coupled. The notification from the coherency management module results in the recipient cache module(s) updating their tag tables with a stale bit for the associated web page. The cache module(s) use this information in the associated tag tables to determine which pages they need to update. The cache modules initiate this update during intervals of reduced activity in the servers, gateways, routers, or switches of which they are a part. All clients requesting data through the system of which each cache module is a part are provided by the associated cache module with cached copies of requested web pages.
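
A minimal sketch of the signature check at the coherency management module, using SHA-256 as the hash function for illustration; the table layout and notification hook are assumptions.

```python
import hashlib

signature_table = {}   # URL -> last known page signature

def check_page(url, page_bytes, notify_cache_modules):
    """Recompute the page's signature and notify cache modules if it changed."""
    signature = hashlib.sha256(page_bytes).hexdigest()
    if signature_table.get(url) != signature:
        signature_table[url] = signature
        # Each recipient cache module marks the page stale in its tag table and
        # refreshes it later, during an interval of reduced activity.
        notify_cache_modules(url)
```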

Patent
Terry L. Kendall1
28 Mar 2001
TL;DR: A small cache memory can be incorporated with a main memory, such as a flash memory, on an integrated circuit to improve average access times between a processor and the main memory as discussed by the authors, which can also allow a suspended transfer with minimal latency when the transfer is resumed.
Abstract: A small cache memory can be incorporated with a main memory, such as a flash memory, on an integrated circuit to improve average access times between a processor and the main memory. To minimize cost and complexity, the cache memory may contain only a few words of data. The cache can also allow a suspended transfer with minimal latency when the transfer is resumed. Designing the cache memory to interface with the processor over a standard memory bus permits the cache to be implemented in a system that could otherwise have no cache memory unless the processor and/or memory bus were redesigned.

Proceedings ArticleDOI
23 Apr 2001
TL;DR: The potential for addressing bandwidth limitations by increasing global cache reuse is explored, that is, reusing data across the whole program and over the entire data collection, in a two-step global strategy.
Abstract: Reusing data in cache is critical to achieving high performance on modern machines because it reduces the impact of the latency and bandwidth limitations of direct memory access. To date, most studies of software memory hierarchy management have focused on the latency problem. However, today's machines are increasingly limited by insufficient memory bandwidth; on these machines, latency-oriented techniques are inadequate because they do not seek to minimize the total memory traffic over the whole program. This paper explores the potential for addressing bandwidth limitations by increasing global cache reuse, that is, reusing data across the whole program and over the entire data collection. To this end, the paper explores a two-step global strategy. The first step fuses computations on the same data to enable the caching of repeated accesses. The second step groups data used by the same computation to bring about contiguous access to memory. While the first step reduces the frequency of memory accesses, the second step improves their efficiency. The paper demonstrates the effectiveness of this strategy and shows how to automate it in a production compiler.
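
A toy example of the first step, computation fusion, in Python; the data and computations are invented purely to show how fusing two passes into one reduces total memory traffic, and a comment notes where the second step (data grouping) would apply.

```python
data = [float(i % 97) for i in range(1_000_000)]

# Unfused: two separate passes over the data; by the time the second pass runs,
# early elements have likely been evicted from cache, roughly doubling traffic.
def two_passes(values):
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return total, total_sq

# Step 1 (computation fusion): both results come from a single pass, so each
# element is reused while it is still resident in cache.
def fused(values):
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x
    return total, total_sq

# Step 2 (data grouping) would additionally place values that one fused loop
# reads together contiguously in memory, for example by interleaving two
# arrays that are always traversed side by side.
```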

Proceedings ArticleDOI
01 May 2001
TL;DR: This paper proposes two index structures, pkT-trees and pkB-trees, which significantly reduce cache misses by storing partial-key information in the index, and shows that a small, fixed amount of key information allows most cache misses to be avoided, allowing for a simple node structure and efficient implementation.
Abstract: The performance of main-memory index structures is increasingly determined by the number of CPU cache misses incurred when traversing the index. When keys are stored indirectly, as is standard in main-memory databases, the cost of key retrieval in terms of cache misses can dominate the cost of an index traversal. Yet it is inefficient in both time and space to store even moderate sized keys directly in index nodes. In this paper, we investigate the performance of tree structures suitable for OLTP workloads in the face of expensive cache misses and non-trivial key sizes. We propose two index structures, pkT-trees and pkB-trees, which significantly reduce cache misses by storing partial-key information in the index. We show that a small, fixed amount of key information allows most cache misses to be avoided, allowing for a simple node structure and efficient implementation. Finally, we study the performance and cache behavior of partial-key trees by comparing them with other main-memory tree structures for a wide variety of key sizes and key value distributions.