
Showing papers on "Cache published in 1990"


Proceedings ArticleDOI
01 May 1990
TL;DR: In this article, hardware techniques to improve the performance of caches are presented: a small fully-associative cache is placed between a cache and its refill path, and prefetched data is placed in a stream buffer rather than in the cache.
Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches. Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams. Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.

1,481 citations
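
The victim-caching idea above is small enough to simulate directly. The sketch below models a direct-mapped cache backed by a tiny fully-associative victim cache; the sizes, the LRU policy, and all names are illustrative assumptions rather than details from the paper.

```python
from collections import OrderedDict

class VictimCache:
    """Direct-mapped L1 backed by a small fully-associative victim cache.

    On an L1 miss that hits in the victim cache, the two lines are
    swapped (the one-cycle penalty case); on a full miss, the evicted
    L1 line (the "victim") goes into the victim cache.
    """

    def __init__(self, l1_lines=64, victim_entries=4):
        self.l1 = [None] * l1_lines      # one tag per direct-mapped set
        self.victims = OrderedDict()     # fully associative, LRU order
        self.victim_entries = victim_entries

    def access(self, addr):
        index, tag = addr % len(self.l1), addr // len(self.l1)
        if self.l1[index] == tag:
            return "l1_hit"
        if (tag, index) in self.victims:         # swap with the L1 line
            del self.victims[(tag, index)]
            if self.l1[index] is not None:
                self._insert_victim(self.l1[index], index)
            self.l1[index] = tag
            return "victim_hit"                  # one-cycle miss penalty
        if self.l1[index] is not None:           # full miss: save the victim
            self._insert_victim(self.l1[index], index)
        self.l1[index] = tag
        return "miss"

    def _insert_victim(self, tag, index):
        self.victims[(tag, index)] = True
        if len(self.victims) > self.victim_entries:
            self.victims.popitem(last=False)     # evict the LRU victim

# Two addresses that conflict in the direct-mapped cache would ping-pong
# forever without the victim cache; with it, only the first two accesses miss:
c = VictimCache()
print([c.access(a) for a in (0, 64, 0, 64, 0)])
# ['miss', 'miss', 'victim_hit', 'victim_hit', 'victim_hit']
```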


Patent
30 Mar 1990
TL;DR: In this paper, the authors proposed selective multiple sector erase, in which any combination of Flash sectors may be erased together, and sectors within the selected combination may also be de-selected during the erase operation.
Abstract: A system of Flash EEprom memory chips with controlling circuits serves as non-volatile memory such as that provided by magnetic disk drives. Improvements include selective multiple sector erase, in which any combination of Flash sectors may be erased together. Sectors within the selected combination may also be de-selected during the erase operation. Another improvement is the ability to remap and replace defective cells with substitute cells. The remapping is performed automatically as soon as a defective cell is detected. When the number of defects in a Flash sector becomes large, the whole sector is remapped. Yet another improvement is the use of a write cache to reduce the number of writes to the Flash EEprom memory, thereby minimizing the stress to the device from undergoing too many write/erase cycles.

1,279 citations
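
The write-cache improvement lends itself to a similar miniature. The sketch below coalesces repeated writes to the same Flash sector in RAM and issues a physical write only on eviction or an explicit flush; the capacity and the FIFO policy are assumptions for illustration, not the patent's design.

```python
class FlashWriteCache:
    """Illustrative write-back cache in front of Flash sectors: repeated
    writes to one sector are coalesced in RAM and flushed as a single
    program/erase cycle, reducing wear on the device."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.dirty = {}          # sector -> pending data (insertion-ordered)
        self.flash_writes = 0    # program/erase cycles actually issued

    def write(self, sector, data):
        if sector not in self.dirty and len(self.dirty) >= self.capacity:
            self._flush(next(iter(self.dirty)))  # evict the oldest entry
        self.dirty[sector] = data                # coalesce pending writes

    def _flush(self, sector):
        del self.dirty[sector]
        self.flash_writes += 1                   # one real Flash write

    def flush_all(self):
        for sector in list(self.dirty):
            self._flush(sector)

cache = FlashWriteCache()
for i in range(100):
    cache.write(i % 4, f"rev{i}")   # 100 logical writes to 4 sectors
cache.flush_all()
print(cache.flash_writes)           # 4 physical writes instead of 100
```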


Proceedings ArticleDOI
24 Jun 1990
TL;DR: A package for manipulating Boolean functions based on the reduced, ordered, binary decision diagram (ROBDD) representation is described, built around an efficient implementation of the if-then-else (ITE) operator.
Abstract: Efficient manipulation of Boolean functions is an important component of many computer-aided design tasks. This paper describes a package for manipulating Boolean functions based on the reduced, ordered, binary decision diagram (ROBDD) representation. The package is based on an efficient implementation of the if-then-else (ITE) operator. A hash table is used to maintain a strong canonical form in the ROBDD, and memory use is improved by merging the hash table and the ROBDD into a hybrid data structure. A memory function for the recursive ITE algorithm is implemented using a hash-based cache to decrease memory use. Memory function efficiency is improved by using rules that detect when equivalent functions are computed. The usefulness of the package is enhanced by an automatic and low-cost scheme for recycling memory. Experimental results are given to demonstrate why various implementation trade-offs were made. These results indicate that the package described here is significantly faster and more memory-efficient than other ROBDD implementations described in the literature.

1,252 citations
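
The heart of such a package is the memoized ITE recursion. Below is a minimal ROBDD-style ITE with a unique table enforcing the strong canonical form and a hash-based memo cache standing in for the memory function; the node representation and variable ordering are simplifying assumptions, not the paper's implementation.

```python
ZERO, ONE = "0", "1"
unique = {}   # (var, low, high) -> node: the unique table (canonical form)
memo = {}     # (f, g, h) -> ITE(f, g, h): the hash-based memory function

def mk(var, low, high):
    if low == high:                      # redundant test: reduce away
        return low
    return unique.setdefault((var, low, high), (var, low, high))

def top_var(*nodes):
    return min(n[0] for n in nodes if n not in (ZERO, ONE))

def restrict(f, var, val):
    if f in (ZERO, ONE) or f[0] != var:
        return f
    return f[2] if val else f[1]

def ite(f, g, h):
    if f == ONE: return g                # terminal cases
    if f == ZERO: return h
    if g == h: return g
    key = (f, g, h)
    if key in memo:                      # memo hit: no recursion needed
        return memo[key]
    v = top_var(f, g, h)
    low = ite(restrict(f, v, 0), restrict(g, v, 0), restrict(h, v, 0))
    high = ite(restrict(f, v, 1), restrict(g, v, 1), restrict(h, v, 1))
    memo[key] = node = mk(v, low, high)
    return node

x1, x2 = mk(1, ZERO, ONE), mk(2, ZERO, ONE)
print(ite(x1, x2, ZERO))   # AND(x1, x2) -> (1, '0', (2, '0', '1'))
print(ite(x1, ONE, x2))    # OR(x1, x2)  -> (1, (2, '0', '1'), '1')
```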


Proceedings ArticleDOI
01 May 1990
TL;DR: A new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models is introduced and is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization.
Abstract: Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the fast processors. Unless carefully controlled, such architectural optimizations can cause memory accesses to be executed in an order different from what the programmer expects. The set of allowable memory access orderings forms the memory consistency model or event ordering model for an architecture. This paper introduces a new model of memory consistency, called release consistency, that allows for more buffering and pipelining than previously proposed models. A framework for classifying shared accesses and reasoning about event ordering is developed. The release consistency model is shown to be equivalent to the sequential consistency model for parallel programs with sufficient synchronization. Possible performance gains from the less strict constraints of the release consistency model are explored. Finally, practical implementation issues are discussed, concentrating on issues relevant to scalable architectures.

1,169 citations
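
A toy model makes the permitted buffering concrete. In the sketch below, ordinary writes sit in a write buffer and need not reach shared memory until a release drains the buffer; the acquire/release API is an informal illustration of the ordering rules, not the paper's formal framework (and the spin "lock" is not atomic).

```python
class RCProcessor:
    """Toy illustration of buffering under release consistency:
    ordinary writes may linger in a write buffer, but a release
    must drain them before it becomes visible to other processors."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = []              # pending ordinary writes

    def write(self, addr, value):
        self.write_buffer.append((addr, value))   # may be delayed/reordered

    def acquire(self, lock):
        while self.memory.get(lock, 0):           # spin until free (toy only)
            pass
        self.memory[lock] = 1

    def release(self, lock):
        for addr, value in self.write_buffer:     # drain before releasing
            self.memory[addr] = value
        self.write_buffer.clear()
        self.memory[lock] = 0                     # now safe to pass the lock

memory = {}
p = RCProcessor(memory)
p.acquire("L")
p.write("x", 42)              # buffered: not yet globally visible
assert "x" not in memory
p.release("L")                # the drain makes x visible before the unlock
assert memory["x"] == 42
```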


Proceedings ArticleDOI
01 Jun 1990
TL;DR: The Tera architecture was designed with several major goals in mind; it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors.
Abstract: The Tera architecture was designed with several major goals in mind. First, it needed to be suitable for very high speed implementations, i.e., admit a short clock period and be scalable to many processors. This goal will be achieved; a maximum configuration of the first implementation of the architecture will have 256 processors, 512 memory units, 256 I/O cache units, 256 I/O processors, and 4096 interconnection network nodes and a clock period less than 3 nanoseconds. The abstract architecture is scalable essentially without limit (although a particular implementation is not, of course). The only requirement is that the number of instruction streams increase more rapidly than the number of physical processors. Although this means that speedup is sublinear in the number of instruction streams, it can still increase linearly with the number of physical processors. The price/performance ratio of the system is unmatched, and puts Tera's high performance within economic reach. Second, it was important that the architecture be applicable to a wide spectrum of problems. Programs that do not vectorize well, perhaps because of a preponderance of scalar operations or too-frequent conditional branches, will execute efficiently as long as there is sufficient parallelism to keep the processors busy. Virtually any parallelism available in the total computational workload can be turned into speed, from operation-level parallelism within program basic blocks to multiuser time- and space-sharing. The architecture…

797 citations


Journal ArticleDOI
TL;DR: A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented; the model also contains a 3-gram component of the traditional type.
Abstract: Speech-recognition systems must often decide between competing ways of breaking up the acoustic input into strings of words. Since the possible strings may be acoustically similar, a language model is required; given a word string, the model returns its linguistic probability. Several Markov language models are discussed. A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented. The model also contains a 3-gram component of the traditional type. The combined model and a pure 3-gram model were tested on samples drawn from the Lancaster-Oslo/Bergen (LOB) corpus of English text. The relative performance of the two models is examined, and suggestions for future improvements are made.

555 citations
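
The cache component amounts to interpolating a short-term word-frequency estimate with the static model. The sketch below is a minimal version of that idea; the mixture weight, cache size, and the constant stand-in for the 3-gram probability are illustrative assumptions.

```python
from collections import deque

class CacheLanguageModel:
    """Interpolates a static 3-gram model with a cache component
    estimated from the last K words, so recently used words get
    boosted probability."""

    def __init__(self, trigram_prob, cache_size=200, lam=0.1):
        self.trigram_prob = trigram_prob    # callable: P(w | w1, w2)
        self.recent = deque(maxlen=cache_size)
        self.lam = lam                      # cache mixture weight

    def observe(self, w):
        self.recent.append(w)

    def prob(self, w, w1, w2):
        cache_p = self.recent.count(w) / len(self.recent) if self.recent else 0.0
        return self.lam * cache_p + (1 - self.lam) * self.trigram_prob(w, w1, w2)

# With a flat stand-in 3-gram model, a word seen in the recent history
# ("cache") scores far higher than an unseen one ("zebra"):
lm = CacheLanguageModel(lambda w, w1, w2: 0.001)
for w in "the cache model boosts recently used words".split():
    lm.observe(w)
print(lm.prob("cache", "the", "a"), lm.prob("zebra", "the", "a"))
```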


Journal ArticleDOI
01 May 1990
TL;DR: The architecture of a rapid-context-switching processor called APRIL with support for fine-grain threads and synchronization is described and the scalability of a multiprocessor based on APRIL is explored using a performance model.
Abstract: Processors in large-scale multiprocessors must be able to tolerate large communication latencies and synchronization delays. This paper describes the architecture of a rapid-context-switching processor called APRIL with support for fine-grain threads and synchronization. APRIL achieves high single-thread performance and supports virtual dynamic threads. A commercial RISC-based implementation of APRIL and a run-time software system that can switch contexts in about 10 cycles are described. Measurements taken for several parallel applications on an APRIL simulator show that the overhead for supporting parallel tasks based on futures is reduced by a factor of two over a corresponding implementation on the Encore Multimax. The scalability of a multiprocessor based on APRIL is explored using a performance model. We show that the SPARC-based implementation of APRIL can achieve close to 80% processor utilization with as few as three resident threads per processor in a large-scale cache-based machine with an average base network latency of 55 cycles.

434 citations


Journal ArticleDOI
TL;DR: Using a user's local storage capabilities to cache data at the user's site would improve the response time of user queries, albeit at the cost of incurring the overhead required in maintaining multiple copies.
Abstract: Currently, a variety of information retrieval systems are available to potential users.… While in many cases these systems are accessed from personal computers, typically no advantage is taken of the computing resources of those machines (such as local processing and storage). In this paper we explore the possibility of using the user's local storage capabilities to cache data at the user's site. This would improve the response time of user queries albeit at the cost of incurring the overhead required in maintaining multiple copies. In order to reduce this overhead it may be appropriate to allow copies to diverge in a controlled fashion.… Thus, we introduce the notion of quasi-copies, which embodies the ideas sketched above. We also define the types of deviations that seem useful, and discuss the available implementation strategies.—From the Authors' Abstract

345 citations
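
A quasi-copy can be sketched as a cache entry allowed to lag the central copy within an explicit bound. The toy below uses a time bound; the paper's notion also admits version and value bounds, and the API here is an assumption for illustration.

```python
import time

class QuasiCopyCache:
    """Client-side cache whose entries are quasi-copies: they may
    diverge from the central copy, but only within a stated
    staleness bound (here, a maximum age in seconds)."""

    def __init__(self, server_read, max_age=60.0):
        self.server_read = server_read   # callable fetching the master copy
        self.max_age = max_age           # allowed divergence window
        self.entries = {}                # key -> (value, fetched_at)

    def read(self, key):
        entry = self.entries.get(key)
        if entry and time.time() - entry[1] <= self.max_age:
            return entry[0]              # possibly stale, but within bounds
        value = self.server_read(key)    # refresh: the copy re-coheres
        self.entries[key] = (value, time.time())
        return value

db = {"widgets": 100}
cache = QuasiCopyCache(db.__getitem__, max_age=60.0)
print(cache.read("widgets"))   # 100, fetched from the server
db["widgets"] = 103            # the master copy moves on
print(cache.read("widgets"))   # still 100: divergence tolerated for 60 s
```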


Journal ArticleDOI
TL;DR: In this article, the usefulness of shared data caches in large-scale multiprocessors, the relative merits of different coherence schemes, and system-level methods for improving directory efficiency are addressed.
Abstract: The usefulness of shared-data caches in large-scale multiprocessors, the relative merits of different coherence schemes, and system-level methods for improving directory efficiency are addressed. The research presented is part of an effort to build a high-performance, large-scale multiprocessor. The various classes of cache directory schemes are described, and a method of measuring cache coherence is presented. The various directory schemes are analyzed, and ways of improving the performance of directories are considered. It is found that the best solutions to the cache-coherence problem result from a synergy between a multiprocessor's software and hardware components.

291 citations


Proceedings ArticleDOI
01 May 1990
TL;DR: It is demonstrated that a simple rule system can be constructed that supports a more powerful view system than those available in current commercial systems, and that a rule system is a fundamental concept in a next-generation DBMS, subsuming both views and procedures as special cases.
Abstract: This paper demonstrates that a simple rule system can be constructed that supports a more powerful view system than available in current commercial systems. Not only can views be specified by using rules but also special semantics for resolving ambiguous view updates are simply additional rules. Moreover, procedural data types as proposed in POSTGRES are also efficiently simulated by the same rules system. Lastly, caching of the action part of certain rules is a possible performance enhancement and can be applied to materialize views as well as to cache procedural data items. Hence, we conclude that a rule system is a fundamental concept in a next generation DBMS, and it subsumes both views and procedures as special cases.

168 citations
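
Caching the action part of a rule is exactly what materializes a view. The toy below treats a view as a rule whose action is a query, caches the query's result, and invalidates the cache on any base-table update; the coarse invalidation policy and the API are assumptions for illustration.

```python
class RuleSystem:
    """Toy rule system: a view is a rule whose action is a query over
    base tables; caching that action's result materializes the view."""

    def __init__(self):
        self.tables = {}
        self.rules = {}    # view name -> query function over tables
        self.cache = {}    # view name -> materialized result

    def define_view(self, name, query):
        self.rules[name] = query

    def update(self, table, rows):
        self.tables[table] = rows
        self.cache.clear()               # cached actions may now be stale

    def query_view(self, name):
        if name not in self.cache:       # cache the rule's action part
            self.cache[name] = self.rules[name](self.tables)
        return self.cache[name]

rs = RuleSystem()
rs.update("emp", [("alice", 120), ("bob", 80)])
rs.define_view("well_paid", lambda t: [e for e in t["emp"] if e[1] > 100])
print(rs.query_view("well_paid"))   # computed once and materialized
print(rs.query_view("well_paid"))   # served from the cache
```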


Journal ArticleDOI
01 May 1990
TL;DR: This work has examined the sharing and synchronization behavior of a variety of shared memory parallel programs and found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist.
Abstract: An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform static cache coherence mechanisms. We have examined the sharing and synchronization behavior of a variety of shared memory parallel programs. We have found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist. In addition, we have performed a simulation study that provides two examples of how an adaptive caching mechanism can take advantage of semantic information.

Book
03 Jan 1990
TL;DR: This book develops a performance-directed approach to cache design, covering the cache design problem and its solution, multi-level cache hierarchies, and the modelling of write strategy effects.
Abstract: 1. Introduction 2. Background Material 3. The Cache Design Problem and Its Solution 4. Performance-Directed Cache Design 5. Multi-Level Cache Hierarchies 6. Summary, Implications and Conclusions Appendix A. Validation of the Empirical Results Appendix B. Modelling Write Strategy Effects

Proceedings ArticleDOI
01 May 1990
TL;DR: An analytical model of multithreaded processor behavior based on a small set of architectural and program parameters is developed; prescriptive use of the model under various scenarios indicates that multithreading is effective, but the number of useful threads per processor is fairly small.
Abstract: Multithreading has been proposed as an architectural strategy for tolerating latency on multiprocessors and, through limited empirical studies, has been shown to offer promise. This paper develops an analytical model of multithreaded processor behavior based on a small set of architectural and program parameters. The model gives rise to a large Markov chain, which is solved to obtain a formula for processor efficiency in terms of the number of threads per processor, the remote reference rate, the latency, and the switch cost. The efficiency exhibits three operating regimes: linear (efficiency is proportional to the number of threads), transition, and saturation (efficiency depends only on the remote reference rate and switch cost). Formulas for the regime boundaries are derived. The model is embellished to reflect cache degradation due to multithreading, using an analytical model of cache behavior, demonstrating that returns diminish as the number of threads becomes large. Predictions from the embellished model correlate well with published empirical measurements. Prescriptive use of the model under various scenarios indicates that multithreading is effective, but the number of useful threads per processor is fairly small.
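
The three regimes can be made concrete with the usual deterministic simplification of such models. The closed form below, with run length R between remote references, latency L, and switch cost C, is an illustrative assumption rather than the paper's Markov-chain solution: efficiency grows roughly linearly with the number of threads and saturates at R/(R+C).

```python
def efficiency(n, R, L, C):
    """Deterministic sketch of multithreaded processor efficiency:
    n threads, R cycles of work between remote references,
    L cycles of remote latency, C cycles of switch cost."""
    linear = n * R / (R + L + C)    # latency only partly hidden
    saturation = R / (R + C)        # latency fully hidden by other threads
    return min(linear, saturation)

R, L, C = 20, 100, 2
for n in (1, 2, 4, 6, 8):
    print(n, round(efficiency(n, R, L, C), 3))
# Efficiency climbs with n and flattens near R/(R+C) = 0.909; past the
# saturation point extra threads buy nothing, and in the embellished
# model they actively hurt by degrading the cache.
```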

Patent
27 Mar 1990
TL;DR: In this article, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.
Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.

Patent
24 Oct 1990
TL;DR: In this paper, a method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system is presented.
Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

Patent
17 Jul 1990
TL;DR: In this article, disk drive access control apparatus for connection between a host computer and a plurality of disk drives to provide an asynchronously operating storage system is presented. The disk drive controller channels are connected to respective ones of the disk drives, and each of the channels includes a cache/buffer memory and a microprocessor unit.
Abstract: This invention provides disk drive access control apparatus for connection between a host computer and a plurality of disk drives to provide an asynchronously operating storage system. It also provides increases in performance over earlier versions thereof. There are a plurality of disk drive controller channels connected to respective ones of the disk drives and controlling transfers of data to and from the disk drives; each of the disk drive controller channels includes a cache/buffer memory and a microprocessor unit. An interface and driver unit interfaces with the host computer, and there is a central cache memory. Cache memory control logic controls transfers of data from the cache/buffer memory of the plurality of disk drive controller channels to the cache memory, from the cache memory to the cache/buffer memory of the plurality of disk drive controller channels, and from the cache memory to the host computer through the interface and driver unit. A central processing unit manages the use of the cache memory by requesting data transfers only of data not presently in the cache memory and by sending high-level commands to the disk drive controller channels. A first (data) bus interconnects the plurality of disk drive cache/buffer memories, the interface and driver unit, and the cache memory for the transfer of information therebetween, and a second (information and commands) bus interconnects the same elements with the central processing unit for the transfer of control and information therebetween.

Journal ArticleDOI
01 May 1990
TL;DR: Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached.
Abstract: This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that does not rely on a specific interconnection network and can be readily distributed across interleaved main memory. The schemes considered here overcome the large amount of memory required for tags in the original scheme in two different ways. In the first scheme each main memory block is sectored into sub-blocks for which the large tag overhead is shared. In the second scheme a limited number of large tags are stored in an associative cache and shared among a much larger number of main memory blocks. Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached. The large block sizes required by the sectored scheme, however, induce enough false sharing that its performance is markedly worse than using a tag cache.
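
The memory savings of the two schemes can be seen with back-of-the-envelope arithmetic. All parameters below (64 processors, 16-byte blocks, 16 MB of memory, the sectoring factor, and the tag-cache size) are illustrative assumptions, not the paper's configuration.

```python
procs = 64
block_bytes = 16
memory_bytes = 16 * 2**20
blocks = memory_bytes // block_bytes
full_map_bits = blocks * procs        # one presence bit per processor per block

# Scheme 1: sector the blocks so several sub-blocks share one tag.
sub_blocks_per_sector = 4
sectored_bits = full_map_bits // sub_blocks_per_sector

# Scheme 2: keep a limited number of tags in an associative tag cache
# shared among a much larger number of memory blocks.
tag_cache_entries = blocks // 16
tag_cache_bits = tag_cache_entries * (procs + 20)   # presence bits + address tag

for name, bits in (("full map", full_map_bits),
                   ("sectored", sectored_bits),
                   ("tag cache", tag_cache_bits)):
    print(f"{name}: {bits / 8 / 2**20:.2f} MB of directory state")
```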

Patent
01 Jun 1990
TL;DR: In this paper, the authors use statistical information obtained by running the computer code with test data to determine a new ordering for the code blocks, which places code blocks that are often executed after one another close to one another in the computer's memory.
Abstract: The method uses statistical information obtained by running the computer code with test data to determine a new ordering for the code blocks. The new order places code blocks that are often executed after one another close to one another in the computer's memory. The method first generates chains of basic blocks, and then merges the chains. Finally, basic blocks that were not executed by the test data that was used to generate the statistical information are moved to a distant location to allow the blocks that were used to be more closely grouped together.
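
A greedy chain-building pass captures the flavor of the method, as sketched below: profiled branch edges are processed from hottest to coldest, chains are merged so that hot successors become fall-throughs, and blocks never executed by the test data are appended at the end. The details are assumptions for illustration, not the patented procedure.

```python
def layout_blocks(blocks, edge_counts):
    """Profile-guided code layout: build chains of basic blocks by
    merging along the hottest branch edges, then place unexecuted
    blocks at a distant location (the end of the ordering)."""
    chains = {b: [b] for b in blocks}              # block -> its chain
    for (src, dst), count in sorted(edge_counts.items(),
                                    key=lambda kv: -kv[1]):
        if count == 0:
            continue
        a, b = chains[src], chains[dst]
        if a is b or a[-1] != src or b[0] != dst:
            continue                               # chain ends already taken
        a.extend(b)                                # dst now falls through
        for blk in b:
            chains[blk] = a
        b.clear()

    hot = {blk for edge, c in edge_counts.items() if c for blk in edge}
    seen, order = set(), []
    for b in blocks:                               # emit hot chains first
        if b in hot and id(chains[b]) not in seen:
            seen.add(id(chains[b]))
            order.extend(chains[b])
    order.extend(b for b in blocks if b not in hot)  # cold blocks go last
    return order

edges = {("A", "B"): 90, ("A", "C"): 10, ("B", "D"): 90, ("C", "D"): 0}
print(layout_blocks(["A", "B", "C", "D", "E"], edges))
# ['A', 'B', 'D', 'C', 'E'] -- the hot path A->B->D is contiguous and
# the never-executed block E is moved out of the way.
```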

Proceedings Article
13 Aug 1990
TL;DR: A new cache consistency algorithm for client caches is proposed; it is a simple extension to two-phase locking, consists of three additional lock modes that must be supported by the server lock manager, and can significantly improve server performance over basic two-phase locking.
Abstract: This paper addresses the problem of cache consistency in a client-server database environment. We assume the server provides shared database access for multiple client workstations and that client workstations may cache a portion of the database. Our primary goal is to investigate techniques to maintain the consistency of the client cache and to improve server throughput. We propose a new cache consistency algorithm for client caches. The algorithm is a simple extension to two-phase locking and consists of three additional lock modes that must be supported by the server lock manager. For comparison, we devised a second cache consistency algorithm based on notify locks. A simulation model was developed to analyze the performance of the server under the two cache consistency algorithms and under noncaching two-phase locking. The results show that both consistency algorithms can significantly improve server performance over basic two-phase locking. The notify locks algorithm, at times, ou…

Journal ArticleDOI
TL;DR: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined and a user-transparent checkpointing and recovery scheme using private caches is presented, which prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols.
Abstract: The problem of recovering from processor transient faults in shared memory multiprocessor systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.

Patent
13 Aug 1990
TL;DR: In this article, a cache lock, a pending lock, and an out-of-date lock are added to a two-lock concurrency control system to maintain the consistency of cached data in a client-server database system.
Abstract: A method of maintaining the consistency of cached data in a client-server database system. Three new locks--a cache lock, a pending lock and an out-of-date lock--are added to a two-lock concurrency control system. A new long-running envelope transaction holds a cache lock on each object cached by a given client. A working transaction of the client works only with the cached object until commit time. If a second client's working transaction acquires an "X" lock on the object, the cache lock is changed to a pending lock; if the transaction thereafter commits, the pending lock is changed to an out-of-date lock. If the first client's working transaction thereafter attempts to commit, it waits for a pending lock to change; it aborts if it encounters an out-of-date lock; and otherwise it commits.
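
The three lock modes define a small per-object state machine, traced in the sketch below; the names and the API are illustrative assumptions, not the patent's interface.

```python
CACHE, PENDING, OUT_OF_DATE = "cache", "pending", "out-of-date"

class CachedObject:
    """State machine for one cached object under the three new locks."""

    def __init__(self):
        self.mode = CACHE            # held by the long-running envelope txn

    def writer_acquires_x_lock(self):
        if self.mode == CACHE:
            self.mode = PENDING      # another client is updating the object

    def writer_commits(self):
        if self.mode == PENDING:
            self.mode = OUT_OF_DATE  # the cached copy is now known stale

    def reader_commit_outcome(self):
        if self.mode == CACHE:
            return "commit"          # copy was never touched
        if self.mode == PENDING:
            return "wait"            # wait for the pending lock to change
        return "abort"               # out-of-date: the reader must abort

obj = CachedObject()
obj.writer_acquires_x_lock()
print(obj.reader_commit_outcome())   # wait
obj.writer_commits()
print(obj.reader_commit_outcome())   # abort
```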

Journal ArticleDOI
01 May 1990
TL;DR: If the cache is multi-level and references to the TLB slice are “shielded” by hits in a virtually indexed primary cache, the slice can get by with very few entries, once again lowering its cost and increasing its speed.
Abstract: The MIPS R6000 microprocessor relies on a new type of translation lookaside buffer — called a TLB slice — which is less than one-tenth the size of a conventional TLB and as fast as one multiplexer delay, yet has a high enough hit rate to be practical. The fast translation makes it possible to use a physical cache without adding a translation stage to the processor's pipeline. The small size makes it possible to include address translation on-chip, even in a technology with a limited number of devices. The key idea behind the TLB slice is to have both a virtual tag and a physical tag on a physically-indexed cache. Because of the virtual tag, the TLB slice needs to hold only enough physical page number bits — typically 4 to 8 — to complete the physical cache index, in contrast with a conventional TLB, which needs to hold both a virtual page number and a physical page number. The virtual page number is unnecessary because the TLB slice needs to provide only a hint for the translated physical address rather than a guarantee. The full physical page number is unnecessary because the cache hit logic is based on the virtual tag. Furthermore, if the cache is multi-level and references to the TLB slice are "shielded" by hits in a virtually indexed primary cache, the slice can get by with very few entries, once again lowering its cost and increasing its speed. With this mechanism, the simplicity of a physical cache can be combined with the speed of a virtual cache.
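
The bit arithmetic behind the slice is easy to illustrate: when the physical cache index spans more bits than the page offset, only the few low bits of the physical page number are needed to finish the index, and the slice stores just those bits as a hint that the virtual-tag check (not shown) must confirm. The field widths and the tiny table geometry below are assumptions for illustration.

```python
PAGE_BITS = 12                        # 4 KB pages
INDEX_BITS = 16                       # toy 64 KB physically indexed cache
SLICE_BITS = INDEX_BITS - PAGE_BITS   # only 4 physical bits are needed

slice_table = {}                      # vpn slot -> low PPN bits (a hint)

def record_translation(vpn, ppn):
    slice_table[vpn % 64] = ppn & ((1 << SLICE_BITS) - 1)

def cache_index(vaddr):
    """Physical cache index from a virtual address, without a full TLB:
    page offset bits plus the slice's few-bit physical hint."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    hint = slice_table.get((vaddr >> PAGE_BITS) % 64, 0)
    return (hint << PAGE_BITS) | offset

record_translation(vpn=0x1234, ppn=0xABCD7)   # low 4 PPN bits are 0x7
print(hex(cache_index((0x1234 << PAGE_BITS) | 0x1F0)))   # 0x71f0
# A wrong hint merely indexes the wrong line; the virtual tag comparison
# then misses, so correctness never depends on the slice being right.
```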

Patent
14 Mar 1990
TL;DR: In this paper, a conditional broadcast or notification facility of a global lock manager is utilized to both serialize access to pages stored in local caches of counterpart processors in a distributed system and to ensure consistency among pages common to the caches.
Abstract: A conditional broadcast or notification facility of a global lock manager is utilized both to serialize access to pages stored in local caches of counterpart processors in a distributed system and to ensure consistency among pages common to the caches. Exclusive-use locks are obtained in advance of all write operations. When a page that is cached in a processor other than that of the requester is to be updated, a delay is imposed on the grant of the exclusive lock: all shared-use lock holders on the same page are notified, local copies are invalidated, the exclusive lock is granted, and the page is updated and written through the cache, after which the lock is demoted to shared use.

Patent
30 Jan 1990
TL;DR: In this article, a method and apparatus for storing an instruction word in a compacted form on a storage media is presented; the instruction word has a plurality of instruction fields, and associated with each instruction word is a mask word having a length in bits at least equal to the number of instruction fields in the instruction word.
Abstract: A method and apparatus for storing an instruction word in a compacted form on a storage media, the instruction word having a plurality of instruction fields, features associating with each instruction word, a mask word having a length in bits at least equal to the number of instruction fields in the instruction word. Each instruction field is associated with a bit of the mask word and accordingly, using the mask word, only non-zero instruction fields need to be stored in memory. The instruction compaction method is advantageously used in a high speed cache miss engine for refilling portions of instruction cache after a cache miss occurs.
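
The mask-word scheme is simply sparse storage of a mostly-zero word: one mask bit per field, with only the non-zero fields stored. The round-trip sketch below assumes a field count for illustration; it is not the patented encoding itself.

```python
NUM_FIELDS = 8   # assumed number of instruction fields per word

def compact(fields):
    """Return (mask, packed): mask bit i set means field i is non-zero
    and appears, in order, in the packed list."""
    mask, packed = 0, []
    for i, f in enumerate(fields):
        if f != 0:
            mask |= 1 << i
            packed.append(f)
    return mask, packed

def expand(mask, packed):
    """Rebuild the full instruction word, e.g. in a cache miss engine
    refilling the instruction cache."""
    it = iter(packed)
    return [next(it) if mask & (1 << i) else 0 for i in range(NUM_FIELDS)]

word = [0, 0, 42, 0, 0, 7, 0, 0]      # mostly-zero wide instruction word
mask, packed = compact(word)
print(bin(mask), packed)              # 0b100100 [42, 7]
assert expand(mask, packed) == word   # lossless round trip
```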

Patent
10 May 1990
TL;DR: In this article, a set associative cache using decoded data element select lines which can be selectively configured to provide different data sets arrangements is presented. But the cache is limited to the maximum possible number of sets.
Abstract: A set associative cache using decoded data element select lines which can be selectively configured to provide different data sets arrangements. The cache includes a tag array, a number of tag comparators corresponding to the maximum possible number of sets, a data element select logic circuit, and a data array. The tag and data arrays each provide, in response to an input address, a number of output tag and data elements, respectively. The number of output tag and data elements depends upon the maximum set size desired for the cache. An input main memory address is used to address both the tag and data arrays. The tag comparators compare a tag field portion of the input main memory address to each element output from the tag array. The select logic then uses the outputs of the tag comparators and one or more of the input main memory address bits to generate decoded data array enable signals. The decoded enable signals are then coupled to enable the desired one of the enabled data elements.

Proceedings ArticleDOI
26 Jun 1990
TL;DR: Three cache-aided error-recovery algorithms for use in shared-memory multiprocessor systems are presented, along with the results of a tradeoff analysis.
Abstract: Three cache-aided error-recovery algorithms for use in shared-memory multiprocessor systems are presented. They rely on hardware and specially designed cache memory for all their soft error management operations and can be easily incorporated into existing cache-coherence protocols. An example illustrating their use in a multiprocessor system employing Dragon as its cache-coherence protocol is given, and the results of a tradeoff analysis are presented.

Journal ArticleDOI
01 May 1990
TL;DR: This paper explores the interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance, and finds that the most effective fetch strategy improved performance by between 1.7% and 4.5%.
Abstract: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance optimal block size is almost always four or eight words [10]. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.

Patent
21 Nov 1990
TL;DR: A cache memory system (24) comprising a fast SRAM primary cache (26) and a DRAM secondary cache (28) is presented; on a secondary-cache hit, data is transferred through the primary cache (26) via transfer circuits (96), (98), (100) and (102) to the data bus (32) and simultaneously stored in an appropriate one of the arrays (88)-(94).
Abstract: A central processing unit (10) has a cache memory system (24) associated therewith for interfacing with a main memory system (23). The cache memory system (24) includes a primary cache (26) comprised of SRAMs and a secondary cache (28) comprised of DRAM. The primary cache (26) has a faster access than the secondary cache (28). When it is determined that the requested data is stored in the primary cache (26), it is transferred immediately to the central processing unit (10). When it is determined that the data resides only in the secondary cache (28), the data is accessed therefrom and routed to the central processing unit (10) and simultaneously stored in the primary cache (26). If a hit occurs in the primary cache (26), it is accessed and output to a local data bus (32). If only the secondary cache (28) indicates a hit, data is accessed from the appropriate one of the arrays (80)-(86) and transferred through the primary cache (26) via transfer circuits (96), (98), (100) and (102) to the data bus (32). Simultaneously therewith, the data is stored in an appropriate one of the arrays (88)-(94). When a hit does not occur in either the secondary cache (28) or the primary cache (26), data is retrieved from the main system memory (23) through a buffer/multiplexer circuit on one side of the secondary cache (28) and passed through both the secondary cache (28) and the primary cache (26) and stored therein in a single operation due to the line-for-line transfer provided by the transfer circuits (96)-(102).

Patent
14 Jun 1990
TL;DR: In this article, a dynamic random access memory with a fast serial access mode for use in a simple cache system includes a plurality of memory cell blocks prepared by division of a memory cell array.
Abstract: A dynamic random access memory with a fast serial access mode for use in a simple cache system includes a plurality of memory cell blocks prepared by division of a memory cell array, a plurality of data latches each provided for each column in the memory cell blocks, and a block selector. When a cache miss signal is produced by the cache system, data on the columns in the cell block selected by the block selector are transferred into the data latches provided for those columns. When a cache hit signal is produced by the cache system, the data latches are isolated from the memory cell array. On a cache hit, access is made to at least one of the data latches based on an externally applied column address; on a cache miss, access is made to at least one of the columns in the selected block based on the column address.
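
The data latches behave as a one-row cache, as the toy below shows: hits are served from the latches without touching the array, and a miss first reloads the latches from the selected row. The geometry and API are assumptions for illustration.

```python
class CachedDRAM:
    """Toy DRAM whose per-column data latches act as a one-row cache."""

    def __init__(self, rows=16, cols=8):
        self.array = [[0] * cols for _ in range(rows)]
        self.latches = [0] * cols     # per-column data latches
        self.latched_row = None

    def read(self, row, col):
        if row == self.latched_row:              # cache hit: latches only,
            return self.latches[col], "hit"      # array stays isolated
        self.latches = list(self.array[row])     # miss: reload the latches
        self.latched_row = row
        return self.latches[col], "miss"

d = CachedDRAM()
d.array[3][5] = 99
print(d.read(3, 5))   # (99, 'miss') -- row 3 loaded into the latches
print(d.read(3, 2))   # (0, 'hit')   -- fast serial access from the latches
```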

Patent
29 Jan 1990
TL;DR: In this article, the authors propose a prioritization scheme based on the operational proximity of the request to the instruction currently being executed, which temporarily suspends the higher priority request while the desired data is being retrieved from main memory 14, but continues to operate on a lower priority request so that the overall operation will be enhanced if the lower priority request hits in the cache 28.
Abstract: In a pipelined computer system, memory access functions are simultaneously generated from a plurality of different locations. These multiple requests are passed through a multiplexer 50 according to a prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. In this manner, the complex task of converting virtual-to-physical addresses is accomplished for all memory access requests by a single translation buffer 30. The physical addresses resulting from the translation buffer 30 are passed to a cache 28 of the main memory 14 through a second multiplexer 40 according to a second prioritization scheme based upon the operational proximity of the request to the instruction currently being executed. The first and second prioritization schemes differ in that the memory is capable of handling other requests while a higher priority "miss" is pending. Thus, the prioritization scheme temporarily suspends the higher priority request while the desired data is being retrieved from main memory 14, but continues to operate on a lower priority request so that the overall operation will be enhanced if the lower priority request hits in the cache 28.