Showing papers on "Cache published in 1993"

PDF

Open Access

Patent•

Coherence controls for store-multiple shared data coordinated by cache directory entries in a shared electronic storage

[...]

Kelly Carpenter¹, Gerard Maclean Dearing¹, Jeffrey M. Nick¹, Jimmy Paul Strickland¹, Michael D. Swanson¹, Wendell W. Wilkinson¹ - Show less +2 more•Institutions (1)

IBM¹

08 Nov 1993

TL;DR: In this paper, the authors propose a shared high-speed cache management logic to meet the serialization and data coherency requirements of data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.

...read moreread less

Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the shared high-speed cache is provided to meet the serialization and data coherency requirements of the data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.

...read moreread less

478 citations

Book•

The Cache Memory Book

[...]

James E. Handy

01 Jan 1993

TL;DR: What is Cache Memory?

...read moreread less

Abstract: What is Cache Memory? How are Caches Designed? Cache Memories and RISC Processors. Maintaining Coherency in Cached Systems. Cute Cache Tricks. Subject Index.

...read moreread less

447 citations

Proceedings Article•DOI•

The Wisconsin Wind Tunnel: virtual prototyping of parallel computers

[...]

Steven K. Reinhardt, Mark D. Hill, James R. Larus, Alvin R. Lebeck, James C. Lewis, Darien Wood - Show less +2 more

01 Jun 1993

TL;DR: The Wisconsin Wind Tunnel (WWT) as mentioned in this paper runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time.

...read moreread less

Abstract: We have developed a new technique for evaluating cache coherent, shared-memory computers The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host) The host directly executes all target program instructions and memory references that hit in the target cache WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol WWT correctly interleaves target machine events and calculates target program execution time WWT runs on parallel computers with greater speed and memory capacity than uniprocessors WWT's simulation time decreases as target system size increases for fixed-size problems and holds roughly constant as the target system and problem scale

...read moreread less

304 citations

Proceedings Article•DOI•

Practical prefetching via data compression

[...]

Curewitz Kenneth Marion, P. Krishnan¹, Jeffrey Scott Vitter²•Institutions (2)

Brown University¹, Duke University²

01 Jun 1993

TL;DR: This paper adapts three well-known data compressors to get three simple, deterministic, and universal prefetchers, and concludes that prediction for prefetching based on data compression techniques holds great promise.

...read moreread less

Abstract: An important issue that affects response time performance in current OODB and hypertext systems is the I/O involved in moving objects from slow memory to cache. A promising way to tackle this problem is to use prefetching, in which we predict the user's next page requests and get those pages into cache in the background. Current databases perform limited prefetching using techniques derived from older virtual memory systems. A novel idea of using data compression techniques for prefetching was recently advocated in [KrV, ViK], in which prefetchers based on the Lempel-Ziv data compressor (the UNIX compress command) were shown theoretically to be optimal in the limit. In this paper we analyze the practical aspects of using data compression techniques for prefetching. We adapt three well-known data compressors to get three simple, deterministic, and universal prefetchers. We simulate our prefetchers on sequences of page accesses derived from the OO1 and OO7 benchmarks and from CAD applications, and demonstrate significant reductions in fault-rate. We examine the important issues of cache replacement, size of the data structure used by the prefetcher, and problems arising from bursts of “fast” page requests (that leave virtually no time between adjacent requests for prefetching and book keeping). We conclude that prediction for prefetching based on data compression techniques holds great promise.

...read moreread less

260 citations

Patent•

Method for storing data in an interactive computer network

[...]

Robert Filepp¹, Michael L. Gordon¹, Alexander W. Bidwell¹, Allan M. Wolf¹, Francis C. Young¹, Duane Tiemann¹, Kenneth H. Appleman¹, Sam Meo¹ - Show less +4 more•Institutions (1)

IBM¹

26 Nov 1993

TL;DR: In this paper, a method for storing data in an interactive computer network is described, which includes steps for establishing data stores of prescribed capacities within a network for delivering an interactive service.

...read moreread less

Abstract: A method for storing data in an interactive computer network is described. In preferred form, the method features steps for establishing data stores of prescribed capacities within a network for delivering an interactive service. The stored data is used in presenting the applications that makeup the service. The method features steps for associating storage control parameters with the application data to be stored and supplying data to the respective stores in excess of their respective capacities. The method includes steps for retaining data at the stores based on the respective prescribed storage control parameters and the date usage experience at the respective stores. In preferred form, the method features steps for providing the data stores with a temporary cache for storing data during a data use session and a variable-content, permanent, file for retaining data between data use sessions. The method configures the cache from available RAM and a prescribed disk file, and the stage from a content-variable, permanent disk file. Data is retained at the cache and subsequently at the stage based on control parameters associated with the data identification, storage candidacy and version, as combined with a least-recently-used criterion. Accordingly, over multiple use sessions, the stage self-configures with data tailored to use experience. Also in the preferred form of the method described, the data is arranged as objects having a header including the storage control parameters.

...read moreread less

246 citations

Proceedings Article•DOI•

Adaptive cache coherency for detecting migratory shared data

[...]

Alan L. Cox¹, Robert J. Fowler²•Institutions (2)

Rice University¹, University of Rochester²

01 May 1993

TL;DR: A family of adaptive cache coherency protocols that dynamically identify migratory shared data in order to reduce the cost of moving them are described, which indicate that the use of the adaptive protocol can almost halve the number of inter-node messages on some applications.

...read moreread less

Abstract: Parallel programs exhibit a small number of distinct data-sharing patterns. A common data-sharing pattern, migratory access, is characterized by exclusive read and write access by one processor at a time to a shared datum. We describe a family of adaptive cache coherency protocols that dynamically identify migratory shared data in order to reduce the cost of moving them. The protocols use a standard memory model and processor-cache interface. They do not require any compile-time or run-time software support. We describe implementations for bus-based multiprocessors and for shared-memory multiprocessors that use directory-based caches. These implementations are simple and would not significantly increase hardware cost. We use trace- and execution-driven simulation to compare the performance of the adaptive protocols to standard write-invalidate protocols. These simulations indicate that, compared to conventional protocols, the use of the adaptive protocol can almost halve the number of inter-node messages on some applications. Since cache coherency traffic represents a larger part of the total communication as cache size increases, the relative benefit of using the adaptive protocol also increases.

...read moreread less

234 citations

Proceedings Article•DOI•

Cache write policies and performance

[...]

Norman P. Jouppi

01 May 1993

TL;DR: Tradeoffs on writes that miss in the cache are investigated and a mixture of these two alternatives, called write caching, which places a small fully-associative cache behind a write-through cache.

...read moreread less

Abstract: This paper investigates issues involving writes and caches. First, tradeoffs on writes that miss in the cache are investigated. In particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before hit or miss is known are considered. Depending on the combination of these polices chosen, the entire cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.

...read moreread less

234 citations

Patent•

System for seamless processing of encrypted and non-encrypted data and instructions

[...]

Jr. Robert Charles Hartman¹•Institutions (1)

IBM¹

05 Aug 1993

TL;DR: In this article, the authors propose a data processing system that seamlessly processes both encrypted and non-encrypted data and instructions, including an internal cache memory in a secure physical region that is not accessible to a user.

...read moreread less

Abstract: The data processing system herein seamlessly processes both encrypted and non-encrypted data and instructions. The system includes an internal cache memory in a secure physical region that is not accessible to a user of the system. An external memory is positioned outside of the secure physical region and stores encrypted and non-encrypted data and instructions. The system includes an instruction to access a private key contained within the secure physical region. That key is used to decrypt an encrypted master key that accompanies encrypted data and instructions. An interface circuit is positioned in the secure physical region and decrypts each encrypted master key through the use of the private key and also decrypts encrypted data and instructions associated with each decrypted master key. A plurality of segment registers in the secure physical region maintain a record of active memory segments in the external memory and associates therewith each decrypted master key. A central processor accesses segments of both non-encrypted and encrypted data and instructions from the external memory and causes the interface circuit to employ a decrypted master key to de-encrypt data and instructions from the external memory and to store the de-encrypted information in the internal memory cache. Non-encrypted data and instructions are directly stored in the internal memory cache.

...read moreread less

204 citations

Patent•

Locating resources in computer networks having cache server nodes

[...]

Ray William Boyles¹, Michael Francis Gierlach¹, Prabandham Madan Gopal¹, Robert Sultan¹, Gary Michael Vacek¹ - Show less +1 more•Institutions (1)

IBM¹

23 Mar 1993

TL;DR: Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests as mentioned in this paper, where an origin cache server receives a request from a served node, the cache server node searches its local directories first, then forwards the request to alternate cache server nodes if necessary.

...read moreread less

Abstract: A computer network in which resources are dynamically located through the use of LOCATE requests includes multiple cache server nodes, network nodes which have an additional obligation to build and maintain large caches of directory entries. Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests. Where an origin cache server node receives a request from a served node, the cache server node searches its local directories first, then forwards the request to alternate cache server nodes if necessary. If the necessary information isn't found locally or in alternate cache server nodes, the LOCATE request is then broadcast to all network nodes in the network. If the broadcast results are negative, the request is forwarded to selected gateway nodes to permit the search to continue in adjacent networks.

...read moreread less

199 citations

Journal Article•DOI•

Using processor-cache affinity information in shared-memory multiprocessor scheduling

[...]

Mark S. Squillante¹, Edward D. Lazowska²•Institutions (2)

IBM¹, University of Washington²

01 Feb 1993-IEEE Transactions on Parallel and Distributed Systems

TL;DR: The importance of having a policy that adapts its behavior to changes in system load is demonstrated and the effects of an initial burst of cache misses experienced by tasks when they return to a processor for execution are demonstrated.

...read moreread less

Abstract: In a shared-memory multiprocessor system, it may be more efficient to schedule a task on one processor than on another if relevant data already reside in a particular processor's cache. The effects of this type of processor affinity are examined. It is observed that tasks continuously alternate between executing at a processor and releasing this processor due to I/O, synchronization, quantum expiration, or preemption. Queuing network models of different abstract scheduling policies are formulated, spanning the range from ignoring affinity to fixing tasks on processors. These models are solved via mean value analysis, where possible, and by simulation otherwise. An analytic cache model is developed and used in these scheduling models to include the effects of an initial burst of cache misses experienced by tasks when they return to a processor for execution. A mean-value technique is also developed and used in the scheduling models to include the effects of increased bus traffic due to these bursts of cache misses. Only a small amount of affinity information needs to be maintained for each task. The importance of having a policy that adapts its behavior to changes in system load is demonstrated. >

...read moreread less

194 citations

Proceedings Article•DOI•

Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

[...]

Tse-Yu Yeh¹, Deborah T. Marr¹, Yale N. Patt¹•Institutions (1)

University of Michigan¹

01 Aug 1993

Journal Article•DOI•

Cache performance of the SPEC92 benchmark suite

[...]

J.D. Gee¹, Mark D. Hill, Dionisios Pnevmatikatos, Alan Jay Smith•Institutions (1)

University of California, Berkeley¹

01 Jul 1993-IEEE Micro

TL;DR: The authors consider whether SPECmarks, the figures of merit obtained from running the SPEC benchmarks under certain specified conditions, accurately indicate the performance to be expected from real, live work loads, and it is found that instruction cache miss ratios in general, and data cache miss ratio for the integer benchmarks, are quite low.

...read moreread less

Abstract: The authors consider whether SPECmarks, the figures of merit obtained from running the SPEC benchmarks under certain specified conditions, accurately indicate the performance to be expected from real, live work loads. Miss ratios for the entire set of SPEC92 benchmarks are measured. It is found that instruction cache miss ratios in general, and data cache miss ratios for the integer benchmarks, are quite low. Data cache miss ratios for the floating-point benchmarks are more in line with published measurements for real work loads. >

...read moreread less

Patent•

Cache coherency protocol for multi processor computer system

[...]

Darrel D. Donaldson, Mark N. Howard, David A. Orbits, John M. Parchem, David M. Robinson, Douglas D. Williams - Show less +2 more

24 May 1993

TL;DR: In this paper, the authors propose a cache coherency protocol for multi-processor systems which provides for read/write, read-only and transitional data states and for an indication of these states to be stored in a memory directory in main memory.

...read moreread less

Abstract: A cache coherency protocol for a multi-processor system which provides for read/write, read-only and transitional data states and for an indication of these states to be stored in a memory directory in main memory. The transitional data state occurs when a processor requests from main memory a data block in another processor's cache and the request is pending completion. All subsequent read requests for the data block during the pendency of the first request are inhibited until completion of the first request. Also provided in the memory directory for each data block is a field for identifying the processor which owns the data block in question. Data block ownership information is used to determine where requested owned data is located.

...read moreread less

Patent•

System and Method for dynamically controlling cache management

[...]

Moshe Yanai¹, Natan Vishlitzky¹, Bruno Alterescu¹, Daniel Castel¹•Institutions (1)

EMC Corporation¹

03 Jun 1993

TL;DR: In this article, a cache management system and method coupled to at least one host and one data storage device is presented, where a cache indexer maintains a current index (25) of data elements which are stored in cache memory.

...read moreread less

Abstract: A cache management system and method monitors and controls the contents of cache memory (12) coupled to at least one host (22a) and at least one data storage device (18a). A cache indexer (16) maintains a current index (25) of data elements which are stored in cache memory (12). A sequential data access indicator (30), responsive to the cache index (16) and to a user selectable sequential data access threshold, determines that a sequential data access is in progress for a given process and provides an indication of the same. The system and method allocate a micro-cache memory (12) to any process performing a sequential data access. In response to the indication of a sequential data access in progress and to a user selectable maximum number of data elements to be prefetched, a data retrieval requestor requests retrieval of up to the selected maximum number of data elements from a data storage device (18b). A user selectable number of sequential data elements determines when previously used micro-cache memory locations will be overwritten. A method of dynamically monitoring and adjusting cache management parameters is also disclosed.

...read moreread less

Patent•

System for maintaining data coherency in cache memory by periodically broadcasting invalidation reports from server to client

[...]

Daniel Barbará¹, Tomasz Imielinski¹•Institutions (1)

Princeton University¹

06 Dec 1993

TL;DR: In this article, a method and system for maintaining coherency between a server processor and a client processor that has a cache memory is presented, where the server processor periodically broadcasts invalidation reports to the client processor.

...read moreread less

Abstract: A method and system are provided for maintaining coherency between a server processor and a client processor that has a cache memory. The server may, for example, be a fixed location mobile unit support station. The client may, for example, be a palmtop computer. The server stores a plurality of data values, and the client stores a subset of the plurality of data values in the cache. The server processor periodically broadcasts invalidation reports to the client processor. Each respective invalidation report includes information identifying which, if any, of the plurality of data values have been updated within a predetermined period of time before the server processor broadcasts the respective invalidation report. The client processor determines, based on the invalidation reports, whether a selected data value in the cache memory of the client processor has been updated in the server processor since the selected data value was stored in the cache memory. The client processor invalidates the selected data value in the cache memory of the client processor, if the selected data value has been updated in the server processor.

...read moreread less

Proceedings Article•DOI•

Efficient simulation of caches under optimal replacement with applications to miss characterization

[...]

Rabin A. Sugumar, Santosh G. Abraham

01 Jun 1993

TL;DR: The OPT model is proposed that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model, and three new techniques for optimal cache simulation are presented.

...read moreread less

Abstract: Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches.

...read moreread less

Patent•

Method of operating a cache memory including determining desirability of cache ahead or cache behind based on a number of available I/O operations

[...]

Marvin Lautzenheiser

21 Jun 1993

TL;DR: In this article, a computer data storage device made up of both solid state storage and rotating magnetic disk storage maintains a fast response time approaching that of a solid state device for many workloads and improves on the response time of a normal magnetic disk for practically all workloads.

...read moreread less

Abstract: A computer data storage device made up of both solid state storage and rotating magnetic disk storage maintains a fast response time approaching that of a solid state device for many workloads and improves on the response time of a normal magnetic disk for practically all workloads. The high performance is accomplished by a special hardware configuration coupled with unique procedures and algorithms for placing and maintaining data in the most appropriate media based on actual and projected activity. The system management features a completely searchless method (no table searches) for determining the location of data within and between the two devices. Sufficient solid state memory capacity is incorporated to permit retention of useful, active data, as well as to permit prefetching of data into the solid state storage when the probabilities favor such action. Movement of updated data from the solid state storage to the magnetic disk and of prefetched data from the magnetic disk to the solid state storage is done on a timely, but unobtrusive, basis as background tasks of the described device. A direct, private channel between the solid state storage and the magnetic disk prevents the conversations between these two media from conflicting with the transmission of data between the host computer and the described device. A set of microprocessors manages and oversees the data transmission and storage. Data integrity is maintained through a power interruption via a battery assisted, automatic and intelligent shutdown procedure.

...read moreread less

Proceedings Article•DOI•

[...]

Mayan Moudgill¹, Keshav Pingali¹, Stamatis Vassiliadis¹•Institutions (1)

Cornell University¹

01 Dec 1993

TL;DR: A novel microparallel taxonomy for machines with multiple-instruction processing capabilities including VLIW, superscalar, and decoupled machines is presented and four new processor microarchitectures are postulated which provide additional features and are instances of the remaining unexplored micropARallel classifications.

...read moreread less

Abstract: Presents a novel mechanism that implements register renaming, dynamic speculation and precise interrupts. Renaming of registers is performed during the instruction fetch stage instead of the decode stage, and the mechanism is designed to operate in parallel with the tag match logic used by most cache designs. It is estimated that the critical path of the mechanism requires approximately the same number of logic levels as the tag match logic, and therefore should not impact cycle time. >

...read moreread less

Patent•

Disk cache management techniques using non-volatile storage

[...]

Kadangode K. Ramakrishnan, Prabuddha Biswas

30 Jun 1993

TL;DR: In this paper, a nonvolatile cache memory used to hold data blocks for which write requests have been made is purged of "dirty" blocks, not yet written to the disk, based on the proportion of dirty blocks in relation to an upper threshold and a lower threshold.

...read moreread less

Abstract: A method, and apparatus for its use, for reducing the number of disk accesses needed to satisfy requests for reading data from and writing data to a hard disk. A non-volatile cache memory used to hold data blocks for which write requests have been made is purged of "dirty" blocks, not yet written to the disk, based on the proportion of dirty blocks in relation to an upper threshold and a lower threshold. A purge request flag is set when the proportion of dirty blocks goes above the upper threshold, but is not cleared until the proportion of dirty blocks goes below the lower threshold. So long as the purge request flag is set, dirty blocks are purged when the disk is not busy with read requests. Immediate purging is initiated when the write cache becomes totally full of dirty blocks. Purging of dirty blocks is also effected during disk read accesses, by "piggybacking" a writing operation with the reading operation, to write dirty blocks destined for the same track or cylinder in which the requested read data blocks are located.

...read moreread less

Patent•

Methods and apparatus for implementing a pseudo-LRU cache memory replacement scheme with a locking feature

[...]

Adam Malamy¹, Rajiv N. Patel¹, Norman M. Hayes¹•Institutions (1)

Sun Microsystems¹

19 Apr 1993

TL;DR: In this paper, a cache memory replacement scheme with a locking feature is provided, where the locking bits associated with each line in the cache are supplied in the tag table and used by the application program/process executing and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in cache to replace.

...read moreread less

Abstract: In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the application program/process executing and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace. The lock bits and replacement bits for a cache line are "ORed" to create a composite bit for the cache line. If the composite bit is set the cache line is not removed from the cache. When deadlock due to all composite bits being set will result, all replacement bits are cleared. One cache line is always maintained as non-lockable. The locking bits "lock" the line of data in the cache until such time when the process resets the lock bit. By providing that the process controls the state of the lock bits, the intelligence and knowledge the process contains regarding the frequency of use of certain memory locations can be utilized to provide a more efficient cache.

...read moreread less

The Compression Cache: Using On-line Compression to Extend Physical Memory.

[...]

Fred Douglis

01 Jan 1993

TL;DR: Measurements using Sprite on a DECstation 1 5000/200 workstation with a local disk indicate that some memory-intensive applications running with a compression cache can run two to three times faster than on an unmodiied system.

...read moreread less

Abstract: This paper describes a method for trading oo computation for disk or network I/O by using less expensive on-line compression. By using some memory to store data in compressed format, it may be possible to t the working set of one or more large applications in relatively small memory. For working sets that are too large to t in memory even when compressed, compression still provides a beneet by reducing bandwidth and space requirements. Overall, the eeectiveness of this compression cache depends on application behavior and the relative costs of compression and I/O. Measurements using Sprite on a DECstation 1 5000/200 workstation with a local disk indicate that some memory-intensive applications running with a compression cache can run two to three times faster than on an unmodiied system. Better speedups would be expected in a system with a greater disparity between the speed of its processor and the bandwidth to its backing store.

...read moreread less

Proceedings Article•DOI•

Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors

[...]

Fredrik Dahlgren¹, Michel Dubois², Per Stenström¹•Institutions (2)

Lund University¹, University of Southern California²

16 Aug 1993

TL;DR: This work proposes to adapt the number of pref etched blocks according to a dynamic measure of prefetching effectiveness, and shows significant reductions of the read penalty and of the overall execution time.

...read moreread less

Abstract: To offset the effect of read miss penalties on processor utilization in shared-memory multiprocessors, several software- and hardware-based data prefetching schemes have been proposed. A major advantage of hardware tech niques is that they need no support from the programmer or compiler. Sequential prefetching is a simple hardware-controlled prefetching technique which relies on the automatic prefetch of consecutive blocks following the block that misses in the cache. In its simplest form, the number of prefetched blocks on each miss is fixed throughout the exe cution. However, since the prefetching efficiency varies during the execution of a program, we propose to adapt the number of pref etched blocks according to a dynamic measure of prefetching effectiveness. Simulations of this adaptive scheme show significant reductions of the read penalty and of the overall execution time.

...read moreread less

Proceedings Article•DOI•

To copy or not to copy: A compile-time technique for assessing when data copying should be used to eliminate cache conflicts

[...]

Olivier Temam¹, E. D. Granston¹, William Jalby•Institutions (1)

Leiden University¹

01 Dec 1993

TL;DR: Preliminary experimental results demonstrate that, because of the sensitivity of cache conflicts to small changes in problem size and base addresses, selective copying can lead to better overall performance than either no copying, complete copying, or copying based on manually applied heuristics.

...read moreread less

Abstract: To reduce conflict misses, this technique, the data layout in a cache is adjusted by copying array files into temporary arrays that exhibit better cache behavior. This approach incurs a cost proportional to the amount of data being copied. To date, there has been no discussion regarding either this tradeoff or the problem of determining what and when to copy. The authors present a compile-time technique for making this determination and present a selective copying strategy based on this methodology. Preliminary experimental results demonstrate that, because of the sensitivity of cache conflicts to small changes in problem size and base addresses, selective copying can lead to better overall performance than either no copying, complete copying, or copying based on manually applied heuristics.

...read moreread less

Proceedings Article•DOI•

A case for caching file objects inside internetworks

[...]

Peter B. Danzig¹, Richard S. Hall², Michael F. Schwartz²•Institutions (2)

University of Southern California¹, University of Colorado Boulder²

01 Oct 1993

TL;DR: Evidence is presented that several, judiciously placed file caches could reduce the volume of FTP traffic by 42%, and hence theVolume of all NSFNET backbone traffic by 21%, and if FTP client and server software automatically compressed data, this savings could increase to 27%.

...read moreread less

Abstract: This paper presents evidence that several, judiciously placed file caches could reduce the volume of FTP traffic by 42%, and hence the volume of all NSFNET backbone traffic by 21%. In addition, if FTP client and server software automatically compressed data, this savings could increase to 27%. We believe that a hierarchical architecture of whole file caches, modeled after the existing name server's caching architecture, could become a valuable part of any internet.We derived these conclusions by performing trace driven simulations of various file caching architectures, cache sizes, and replacement policies. We collected the traces of file transfer traffic employed in our simulations on a network that connects the NSFNET backbone to a large, regional network. This particular regional network is responsible for about 5 to 7% of NSFNET traffic.While this paper's analysis and discussion focus on caching for FTP file transfer, the proposed caching architecture applies to caching objects from other internetwork services.

...read moreread less

Journal Article•DOI•

A load-instruction unit for pipelined processors

[...]

Richard J. Eickemeyer¹, Stamatis Vassiliadis²•Institutions (2)

University of Rochester¹, IBM²

01 Jul 1993-Ibm Journal of Research and Development

TL;DR: It is shown, using trace-driven simulations, that the proposed mechanism, when incorporated in a design, may contribute to a significant increase in processor performance.

...read moreread less

Abstract: A special-purpose load unit is proposed as part of a processor design. The unit prefetches data from the cache by predicting the address of the data fetch in advance. This prefetch allows the cache access to take place early, in an otherwise unused cache cycle, eliminating one cycle from the load instruction. The prediction also allows the cache to prefetch data if they are not already in the cache. The cache-miss handling can be overlapped with other instruction execution. It is shown, using trace-driven simulations, that the proposed mechanism, when incorporated in a design, may contribute to a significant increase in processor performance. The paper also compares different prediction methods and describes a hardware implementation for the load unit.

...read moreread less

Proceedings Article•DOI•

The architecture of a fault-tolerant cached RAID controller

[...]

Jai Menon¹, Jim Cortney•Institutions (1)

IBM¹

01 May 1993

TL;DR: This work examines three alternatives for handling Fast Writes and describes a hierarchy of destage algorithms with increasing robustness to failures, which are compared against those that would be used by a disk controller employing mirroring.

...read moreread less

Abstract: RAID-5 arrays need 4 disk accesses to update a data block—2 to read old data and parity, and 2 to write new data and parity. Schemes previously proposed to improve the update performance of such arrays are the Log-Structured File System [10] and the Floating Parity Approach [6]. Here, we consider a third approach, called Fast Write, which eliminates disk time from the host response time to a write, by using a Non-Volatile Cache in the disk array controller. We examine three alternatives for handling Fast Writes and describe a hierarchy of destage algorithms with increasing robustness to failures. These destage algorithms are compared against those that would be used by a disk controller employing mirroring. We show that array controllers require considerably more (2 to 3 times more) bus bandwidth and memory bandwidth than do disk controllers that employ mirroring. So, array controllers that use parity are likely to be more expensive than controllers that do mirroring, though mirroring is more expensive when both controllers and disks are considered.

...read moreread less

Journal Article•DOI•

Replication algorithms in a remote caching architecture

[...]

Avraham Leff¹, Joel L. Wolf¹, Philip S. Yu¹•Institutions (1)

IBM¹

01 Nov 1993-IEEE Transactions on Parallel and Distributed Systems

TL;DR: Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal.

...read moreread less

Abstract: Studies the cache performance in a remote caching architecture. The authors develop a set of distributed object replication policies that are designed to implement different optimization goals. Each site is responsible for local cache decisions, and modifies cache contents in response to decisions made by other sites. The authors use the optimal and greedy policies as upper and lower bounds, respectively, for performance in this environment. Critical system parameters are identified, and their effect on system performance studied. Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal. >

...read moreread less

Disconnected operation for AFS

[...]

L. B. Huston¹, Peter Honeyman¹•Institutions (1)

University of Michigan¹

02 Aug 1993

TL;DR: The system brings the benefits of contemporary distributed computing environments to mobile laptops, offering a fresh look at the potential for nomadic computing.

...read moreread less

Abstract: AFS plays a prominent role in our plans for a mobile workstation. The AFS client manages a cache of the most recently used files and directories. But even when the cache is hot, access to cached data frequently involves some communication with one or more file servers to maintain consistency guarantees. Without network access, cached data is soon rendered unavailable. We have modified the AFS cache manager to offer optimistic consistency guarantees when it can not communicate with a fileserver. When the client reestablishes a connection with the file server, it tries to propagate all file modifications to the server. If conflicts are detected, the replay agent notifies the user that manual resolution is needed. Our system brings the benefits of contemporary distributed computing environments to mobile laptops, offering a fresh look at the potential for nomadic computing.

...read moreread less

Journal Article•DOI•

Network subsystem design

[...]

Peter Druschel¹, Mark B. Abbott¹, Michael A. Pagels¹, Larry L. Peterson¹•Institutions (1)

University of Arizona¹

01 Jul 1993-IEEE Network

TL;DR: It is argued that the bandwidth of the CPU/memory data path on workstations will remain within the same order of magnitude as the network bandwidth delivered to the workstation, and it is essential that the number of times network data traverses theCPU/ memory data path be minimized.

...read moreread less

Abstract: It is argued that the bandwidth of the CPU/memory data path on workstations will remain within the same order of magnitude as the network bandwidth delivered to the workstation. This makes it essential that the number of times network data traverses the CPU/memory data path be minimized. Evidence which suggests that the cache cannot be expected to significantly reduce the number of data movements over this path is reviewed. Hardware and software techniques for avoiding the CPU/memory bottleneck are discussed. It is concluded that naively applying these techniques is not sufficient for achieving good application-to-application throughput; they must also be carefully integrated. Various techniques that can be integrated to provide a high bandwidth data path between I/O devices and application programs are outlined. >

...read moreread less

Patent•

Automatic cache flush with readable and writable cache tag memory

[...]

David Lin

26 Oct 1993

TL;DR: In this article, a chipset is provided which permits reading and writing to cache tag memory for testing purposes and for writing non-cacheable tags into tag RAM entries to effectively invalidate the corresponding cache data entries.

...read moreread less

Abstract: A chipset is provided which permits reading and writing to cache tag memory for testing purposes and for writing non-cacheable tags into tag RAM entries to effectively invalidate the corresponding cache data entries.

...read moreread less

Collapse