
Showing papers on "Cache coloring" published in 1987


Journal ArticleDOI
TL;DR: In this article, the authors examined the cache miss ratio as a function of line size, and found that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat.
Abstract: The line (block) size of a cache memory is one of the parameters that most strongly affects cache performance. In this paper, we study the factors that relate to the selection of a cache line size. Our primary focus is on the cache miss ratio, but we also consider influences such as logic complexity, address tags, line crossers, I/O overruns, etc. The behavior of the cache miss ratio as a function of line size is examined carefully through the use of trace driven simulation, using 27 traces from five different machine architectures. The change in cache miss ratio as the line size varies is found to be relatively stable across workloads, and tables of this function are presented for instruction caches, data caches, and unified caches. An empirical mathematical fit is obtained. This function is used to extend previously published design target miss ratios to cover line sizes from 4 to 128 bytes and cache sizes from 32 bytes to 32K bytes; design target miss ratios are to be used to guide new machine designs. Mean delays per memory reference and memory (bus) traffic rates are computed as a function of line and cache size, and memory access time parameters. We find that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat. Longer line sizes are suitable for mainframes because of the higher bandwidth to main memory.
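
As a rough illustration of the delay/traffic trade-off this abstract quantifies, the sketch below computes mean delay per reference and bus traffic from a miss ratio and a line size. The miss-ratio values and timing parameters are invented placeholders, not numbers from the paper.

    /* Line-size trade-off sketch: delay grows with the miss penalty, traffic with
     * the line size.  All values below are illustrative assumptions. */
    #include <stdio.h>

    int main(void) {
        int    line_size[]  = {16, 32, 64, 128};               /* bytes */
        double miss_ratio[] = {0.040, 0.028, 0.021, 0.017};    /* assumed */

        double latency_cycles  = 6.0;   /* fixed cost to start a memory access */
        double cycles_per_byte = 0.25;  /* bus transfer rate                   */

        for (int i = 0; i < 4; i++) {
            double miss_penalty = latency_cycles + cycles_per_byte * line_size[i];
            double mean_delay   = miss_ratio[i] * miss_penalty;   /* cycles/ref */
            double traffic      = miss_ratio[i] * line_size[i];   /* bytes/ref  */
            printf("L=%3d B  delay=%.3f cycles/ref  traffic=%.2f B/ref\n",
                   line_size[i], mean_delay, traffic);
        }
        return 0;
    }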

180 citations


BookDOI
01 Jan 1987
TL;DR: This work develops an analytical cache model and microcode-based tracing (ATUM) techniques to characterize large-cache, multiprogramming, and multiprocessor cache performance accurately and efficiently.
Abstract: 1 Introduction.- 1.1 Overview of Cache Design.- 1.1.1 Cache Parameters.- 1.1.2 Cache Performance Evaluation Methodology.- 1.2 Review of Past Work.- 1.3 Then, Why This Research?.- 1.3.1 Accurately Characterizing Large Cache Performance.- 1.3.2 Obtaining Trace Data for Cache Analysis.- 1.3.3 Developing Efficient and Accurate Cache Analysis Methods.- 1.4 Contributions.- 1.5 Organization.- 2 Obtaining Accurate Trace Data.- 2.1 Current Tracing Techniques.- 2.2 Tracing Using Microcode.- 2.3 An Experimental Implementation.- 2.3.1 Storage of Trace Data.- 2.3.2 Recording Memory References.- 2.3.3 Tracing Control.- 2.4 Trace Description.- 2.5 Applications in Performance Evaluation.- 2.6 Extensions and Summary.- 3 Cache Analyses Techniques - An Analytical Cache Model.- 3.1 Motivation and Overview.- 3.1.1 The Case for the Analytical Cache Model.- 3.1.2 Overview of the Model.- 3.2 A Basic Cache Model.- 3.2.1 Start-Up Effects.- 3.2.2 Non-Stationary Effects.- 3.2.3 Intrinsic Interference.- 3.3 A Comprehensive Cache Model.- 3.3.1 Set Size.- 3.3.2 Modeling Spatial Locality and the Effect of Block Size.- 3.3.3 Multiprogramming.- 3.4 Model Validation and Applications.- 3.5 Summary.- 4 Transient Cache Analysis - Trace Sampling and Trace Stitching.- 4.1 Introduction.- 4.2 Transient Behavior Analysis and Trace Sampling.- 4.2.1 Definitions.- 4.2.2 Analysis of Start-up Effects in Single Process Traces.- 4.2.3 Start-up Effects in Multiprocess Traces.- 4.3 Obtaining Longer Samples Using Trace Stitching.- 4.4 Trace Compaction - Cache Filtering with Blocking.- 4.4.1 Cache Filter.- 4.4.2 Block Filter.- 4.4.3 Implementation of the Cache and Block Filters.- 4.4.4 Miss Rate Estimation.- 4.4.5 Compaction Results.- 5 Cache Performance Analysis for System References.- 5.1 Motivation.- 5.2 Analysis of the Miss Rate Components due to System References.- 5.3 Analysis of System Miss Rate.- 5.4 Associativity.- 5.5 Block Size.- 5.6 Evaluation of Split Caches.- 6 Impact of Multiprogramming on Cache Performance.- 6.1 Relative Performance of Multiprogramming Cache Techniques.- 6.2 More on Warm Start versus Cold Start.- 6.3 Impact of Shared System Code on Multitasking Cache Performance.- 6.4 Process Switch Statistics and Their Effects on Cache Modeling.- 6.5 Associativity.- 6.6 Block Size.- 6.7 Improving the Multiprogramming Performance of Caches.- 6.7.1 Hashing.- 6.7.2 A Hash-Rehash Cache.- 6.7.3 Split Caches.- 7 Multiprocessor Cache Analysis.- 7.1 Tracing Multiprocessors.- 7.2 Characteristics of Traces.- 7.3 Analysis.- 7.3.1 General Methodology.- 7.3.2 Multiprocess Interference in Large Virtual and Physical Caches.- 7.3.3 Analysis of Interference Between Multiple Processors.- 7.3.4 Blocks Containing Semaphores.- 8 Conclusions and Suggestions for Future Work.- 8.1 Concluding Remarks.- 8.2 Suggestions for Future Work.- Appendices.- B.1 On the Stability of the Collision Rate.- B.2 Estimating Variations in the Collision Rate.- C Inter-Run Intervals and Spatial Locality.- D Summary of Benchmark Characteristics.- E Features of ATUM-2.- E.1 Distributing Trace Control to All Processors.- E.2 Provision of Atomic Accesses to Trace Memory.- E.3 Instruction Stream Compaction Using a Cache Simulated in Microcode.- E.4 Microcode Patch Space Conservation.
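
The outline above centers on an analytical model that decomposes the miss rate into components such as start-up (cold) misses, intrinsic interference, and multiprogramming (extrinsic) effects. The C sketch below sums such components under purely illustrative assumptions; it is not the book's model or parameters.

    #include <stdio.h>

    struct model_params {
        double unique_blocks;   /* distinct cache blocks touched by the trace     */
        double trace_refs;      /* total memory references in the trace           */
        double cache_blocks;    /* number of block frames in the cache            */
        double collision_rate;  /* assumed intrinsic (mapping-conflict) miss rate */
        double switch_rate;     /* context switches per reference                 */
        double flush_fraction;  /* fraction of the cache displaced per switch     */
    };

    /* Total miss rate = cold-start + intrinsic interference + multiprogramming
       (extrinsic) components.  Purely illustrative formulas. */
    double estimated_miss_rate(const struct model_params *p) {
        double startup   = p->unique_blocks / p->trace_refs;
        double intrinsic = p->collision_rate;
        double extrinsic = p->switch_rate * p->flush_fraction * p->cache_blocks;
        return startup + intrinsic + extrinsic;
    }

    int main(void) {
        struct model_params p = {2000.0, 1e6, 1024.0, 0.01, 1e-4, 0.3};
        printf("estimated miss rate: %.4f\n", estimated_miss_rate(&p));
        return 0;
    }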

118 citations


Journal ArticleDOI
TL;DR: The authors provide an overview of MIPS-X, focusing on the techniques used to reduce the complexity of the processor and implement the on-chip instruction cache.
Abstract: MIPS-X is a 32-b RISC microprocessor implemented in a conservative 2-μm, two-level-metal, n-well CMOS technology. High performance is achieved by using a nonoverlapping two-phase 20-MHz clock and executing one instruction every cycle. To reduce its memory bandwidth requirements, MIPS-X includes a 2-kbyte on-chip instruction cache. The authors provide an overview of MIPS-X, focusing on the techniques used to reduce the complexity of the processor and implement the on-chip instruction cache.

98 citations


Journal ArticleDOI
Douglas B. Terry1
TL;DR: A new approach to managing caches of hints suggests maintaining a minimum level of cache accuracy, rather than maximizing the cache hit ratio, in order to guarantee performance improvements.
Abstract: Caching reduces the average cost of retrieving data by amortizing the lookup cost over several references to the data. Problems with maintaining strong cache consistency in a distributed system can be avoided by treating cached information as hints. A new approach to managing caches of hints suggests maintaining a minimum level of cache accuracy, rather than maximizing the cache hit ratio, in order to guarantee performance improvements. The desired accuracy is based on the ratio of lookup costs to the costs of detecting and recovering from invalid cache entries. Cache entries are aged so that they get purged when their estimated accuracy falls below the desired level. The age thresholds are dictated solely by clients' accuracy requirements instead of being suggested by data storage servers or system administrators.
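
The policy described here can be sketched directly: a hint is usable while its estimated accuracy stays above a minimum derived from the ratio of lookup cost to recovery cost. The exponential-decay accuracy estimate and all parameter values below are assumptions for illustration.

    #include <math.h>
    #include <stdio.h>

    double min_accuracy(double lookup_cost, double recovery_cost) {
        /* A hint pays off while A*0 + (1-A)*recovery_cost < lookup_cost,
           i.e. accuracy A must stay above 1 - lookup_cost/recovery_cost. */
        return 1.0 - lookup_cost / recovery_cost;
    }

    double estimated_accuracy(double age_seconds, double mean_lifetime) {
        /* assume hints become invalid at a constant rate */
        return exp(-age_seconds / mean_lifetime);
    }

    int hint_is_usable(double age_seconds, double mean_lifetime,
                       double lookup_cost, double recovery_cost) {
        return estimated_accuracy(age_seconds, mean_lifetime)
               >= min_accuracy(lookup_cost, recovery_cost);
    }

    int main(void) {
        double lookup = 10.0, recovery = 50.0, lifetime = 600.0;
        for (double age = 0; age <= 300; age += 60)
            printf("age %3.0fs -> usable: %d\n", age,
                   hint_is_usable(age, lifetime, lookup, recovery));
        return 0;
    }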

83 citations


Journal ArticleDOI
01 Oct 1987
TL;DR: In this article, a multiprocessor cache memory system is described that supplies data to the processor based on virtual addresses, but maintains consistency in the main memory, both across caches and across virtual address spaces.
Abstract: A multiprocessor cache memory system is described that supplies data to the processor based on virtual addresses, but maintains consistency in the main memory, both across caches and across virtual address spaces. Pages in the same or different address spaces may be mapped to share a single physical page. The same hardware is used for maintaining consistency both among caches and among virtual addresses. Three different notions of a cache "block" are defined: (1) the unit for transferring data to/from main storage, (2) the unit over which tag information is maintained, and (3) the unit over which consistency is maintained. The relation among these block sizes is explored, and it is shown that they can be optimized independently. It is shown that the use of large address blocks results in low overhead for the virtual address cache.
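
A data-structure sketch helps make the three block notions concrete: here the tag is kept over a large "address block" while presence and consistency state are tracked per smaller sub-block. Sizes and layout are assumptions, and for simplicity the transfer and consistency units coincide, even though the paper shows the three sizes can be optimized independently.

    #include <stdint.h>
    #include <stdio.h>

    #define ADDRESS_BLOCK_BYTES  1024   /* unit over which the tag is kept  */
    #define TRANSFER_BLOCK_BYTES 64     /* unit moved to/from main storage  */
    #define SUBBLOCKS (ADDRESS_BLOCK_BYTES / TRANSFER_BLOCK_BYTES)

    enum coherence_state { INVALID, VALID, DIRTY };   /* kept per consistency unit */

    struct cache_entry {
        uint32_t virtual_tag;          /* one tag covers the whole address block    */
        uint8_t  present[SUBBLOCKS];   /* which transfer blocks are actually cached */
        uint8_t  state[SUBBLOCKS];     /* consistency tracked at sub-block grain    */
        uint8_t  data[ADDRESS_BLOCK_BYTES];
    };

    /* A hit needs a tag match on the large block AND presence of the specific
       transfer block that holds the requested byte. */
    int lookup_hit(const struct cache_entry *e, uint32_t vtag, uint32_t offset) {
        uint32_t sub = offset / TRANSFER_BLOCK_BYTES;
        return e->virtual_tag == vtag && e->present[sub] && e->state[sub] != INVALID;
    }

    int main(void) {
        struct cache_entry e = {0};
        e.virtual_tag = 0x12345;
        e.present[2] = 1;
        e.state[2] = VALID;
        printf("hit: %d\n", lookup_hit(&e, 0x12345, 2 * TRANSFER_BLOCK_BYTES + 8));
        printf("miss in uncached sub-block: %d\n", lookup_hit(&e, 0x12345, 8));
        return 0;
    }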

83 citations


Patent
22 Apr 1987
TL;DR: In this article, the memory control subsystem controls and arbitrates the access to a memory 10 which is shared by a plurality of users comprising at least a processor 2 with its cache and input/output devices 4 having direct access to the memory through a direct memory access bus 12.
Abstract: The memory control subsystem controls and arbitrates the access to a memory 10 which is shared by a plurality of users comprising at least a processor 2 with its cache and input/output devices 4 having direct access to the memory through a direct memory access bus 12. It comprises a processor controller 20, a DMA controller 22 and a memory controller 24. A processor request is buffered into the processor controller 20 and is serviced right away if the memory controller is available, possibly with a simultaneous transfer between the devices 4 and buffers in the DMA controller 22. If the memory controller 24 is busy because a DMA request is being serviced, the DMA controller comprises means to cause the DMA transfer to be interrupted, the processor request to be serviced, and the DMA transfer to be resumed afterwards. Write requests made by the processor are buffered into the processor controller 20 and an acknowledgement signal is sent to the processor, which can resume execution without waiting for the memory update to complete. A read request which does not hit the cache is sent to the processor controller, which causes the cache to be updated. In case of multiple processor requests contending with a long DMA transfer, the latter is sliced into several parts, each part mapping one cache line. In case of a DMA write, the cache lines which correspond to memory positions whose content is modified by the write operation are invalidated in such a way that the processor cannot read a partially written line into the cache.

76 citations


Patent
02 Dec 1987
TL;DR: In this paper, a broadband branch history table is organized by cache line, which determines from the history of branches the next cache line to be referenced and uses that information for prefetching lines into the cache.
Abstract: Apparatus for fetching instructions in a computing system. A broadband branch history table is organized by cache line. The broadband branch history table determines from the history of branches the next cache line to be referenced and uses that information for prefetching lines into the cache.
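
A minimal sketch of a per-cache-line branch history table like the one described: each entry remembers which line followed this line last time, and that prediction drives the prefetch. Table size and the direct-mapped indexing are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define LINE_BYTES   64
    #define BHT_ENTRIES  1024

    struct bht_entry {
        uint32_t line_addr;   /* which cache line this entry describes */
        uint32_t next_line;   /* line observed to execute after it     */
        uint8_t  valid;
    };

    static struct bht_entry bht[BHT_ENTRIES];

    static uint32_t line_of(uint32_t addr)  { return addr / LINE_BYTES; }
    static uint32_t index_of(uint32_t line) { return line % BHT_ENTRIES; }

    /* Record that control flow went from the line of `from` to the line of `to`. */
    void bht_update(uint32_t from, uint32_t to) {
        struct bht_entry *e = &bht[index_of(line_of(from))];
        e->line_addr = line_of(from);
        e->next_line = line_of(to);
        e->valid = 1;
    }

    /* Predict the next line to prefetch; fall back to sequential if unknown. */
    uint32_t bht_predict_next(uint32_t pc) {
        struct bht_entry *e = &bht[index_of(line_of(pc))];
        if (e->valid && e->line_addr == line_of(pc))
            return e->next_line;
        return line_of(pc) + 1;
    }

    int main(void) {
        bht_update(0x1000, 0x8000);                   /* taken branch crosses lines */
        printf("prefetch line 0x%x\n", bht_predict_next(0x1010));
        printf("prefetch line 0x%x\n", bht_predict_next(0x2000));
        return 0;
    }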

71 citations


Patent
27 Mar 1987
TL;DR: In this paper, the cache coherence system detects when the contents of storage locations in the cache memories of one or more of the data processors have been modified in conjunction with the activity of those data processors, and is responsive to such detections to generate and store in its cache invalidate table (CIT) memory a multiple-element linked list.
Abstract: A cache coherence system for a multiprocessor system including a plurality of data processors coupled to a common main memory. Each of the data processors includes an associated cache memory having storage locations therein corresponding to storage locations in the main memory. The cache coherence system for a data processor includes a cache invalidate table (CIT) memory having internal storage locations corresponding to locations in the cache memory of the data processor. The cache coherence system detects when the contents of storage locations in the cache memories of one or more of the data processors have been modified in conjunction with the activity of those data processors, and is responsive to such detections to generate and store in its CIT memory a multiple-element linked list defining the locations in the cache memories of the data processors having modified contents. Each element of the list defines one of those cache storage locations and also identifies the location in the CIT memory of the next element in the list.
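
The CIT structure can be sketched as a table with one slot per cache location, threaded into a linked list of locations known to hold modified copies; each element names a cache location and the CIT index of the next element. Sizes and the list encoding below are illustrative assumptions.

    #include <stdio.h>

    #define CACHE_LINES 256
    #define LIST_END    (-1)

    struct cit_entry {
        int modified;     /* this cache location holds a stale copy    */
        int next;         /* CIT index of the next element in the list */
    };

    static struct cit_entry cit[CACHE_LINES];
    static int list_head = LIST_END;

    /* Another processor modified the block cached at `line`: link it in. */
    void cit_mark_modified(int line) {
        if (cit[line].modified) return;          /* already on the list */
        cit[line].modified = 1;
        cit[line].next = list_head;
        list_head = line;
    }

    /* Walk the list and invalidate every recorded location (e.g. before the
       local processor is allowed to read potentially stale data). */
    void cit_flush_invalidations(void (*invalidate)(int line)) {
        for (int line = list_head; line != LIST_END; line = cit[line].next) {
            invalidate(line);
            cit[line].modified = 0;
        }
        list_head = LIST_END;
    }

    static void print_invalidate(int line) { printf("invalidate cache line %d\n", line); }

    int main(void) {
        cit_mark_modified(12);
        cit_mark_modified(200);
        cit_mark_modified(12);                   /* duplicate, ignored */
        cit_flush_invalidations(print_invalidate);
        return 0;
    }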

69 citations


Proceedings ArticleDOI
J. H. Chang1, H. Chao1, K. So1
01 Jun 1987
TL;DR: An innovative cache accessing scheme based on high MRU (most recently used) hit ratio is proposed for the design of a one-cycle cache in a CMOS implementation of System/370 and it is shown that with this scheme the cache access time is reduced, and the performance is within 4% of a true one- cycle cache.
Abstract: An innovative cache accessing scheme based on high MRU (most recently used) hit ratio [1] is proposed for the design of a one-cycle cache in a CMOS implementation of System/370. It is shown that with this scheme the cache access time is reduced by 30-35% and the performance is within 4% of a true one-cycle cache. This cache scheme is proposed to be used in a VLSI System/370, which is organized to achieve high performance by taking advantage of the performance and integration level of an advanced CMOS technology with half-micron channel length [2]. Decisions on the system partition are based on technology limitations, performance considerations and future extendability. Design decisions on various aspects of the cache organization are based on trace simulations for both UP (uniprocessor) and MP (multiprocessor) configurations.
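
The MRU scheme can be sketched as follows: the most-recently-used way of a set is read first and, only when that prediction misses, the remaining ways are checked in an extra cycle. The geometry and cycle accounting below are assumptions, not the System/370 design.

    #include <stdint.h>
    #include <stdio.h>

    #define SETS 64
    #define WAYS 4
    #define LINE_BYTES 32

    struct cache_set {
        uint32_t tag[WAYS];
        int      valid[WAYS];
        int      mru;                         /* way predicted to hit first */
    };

    static struct cache_set cache[SETS];

    static uint32_t set_of(uint32_t addr) { return (addr / LINE_BYTES) % SETS; }
    static uint32_t tag_of(uint32_t addr) { return addr / (LINE_BYTES * SETS); }

    /* Returns the number of cycles the access took; *hit reports the outcome. */
    int mru_access(uint32_t addr, int *hit) {
        struct cache_set *s = &cache[set_of(addr)];
        uint32_t tag = tag_of(addr);

        if (s->valid[s->mru] && s->tag[s->mru] == tag) {   /* MRU way read first */
            *hit = 1;
            return 1;                                      /* behaves like a one-cycle cache */
        }
        for (int w = 0; w < WAYS; w++) {                   /* fall back to the other ways */
            if (s->valid[w] && s->tag[w] == tag) {
                s->mru = w;                                /* remember for the next access */
                *hit = 1;
                return 2;
            }
        }
        *hit = 0;
        return 2;
    }

    int main(void) {
        uint32_t addr = 0x4000;
        struct cache_set *s = &cache[set_of(addr)];
        s->tag[2] = tag_of(addr); s->valid[2] = 1; s->mru = 0;

        int hit;
        int first  = mru_access(addr, &hit);   /* MRU prediction wrong: slow path */
        int second = mru_access(addr, &hit);   /* MRU now correct: fast path */
        printf("first access: %d cycles, second access: %d cycle(s)\n", first, second);
        return 0;
    }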

68 citations


Patent
15 Sep 1987
TL;DR: In this paper, a mechanism for determining when the contents of a block in a cache memory have been rendered stale by DMA activity external to a processor and for marking the block stale in response to a positive determination is proposed.
Abstract: A mechanism for determining when the contents of a block in a cache memory have been rendered stale by DMA activity external to a processor and for marking the block stale in response to a positive determination. The commanding unit in the DMA transfer, prior to transmitting an address, asserts a cache control signal which conditions the processor to receive the address and determine whether there is a correspondence to the contents of the cache. If there is a correspondence, the processor marks the contents of that cache location for which there is a correspondence stale.

63 citations


Patent
30 Oct 1987
TL;DR: In this article, the main memory is accessed during its row address strobe (RAS) precharge time while simultaneously accessing the cache memory, reducing the time necessary for the processor unit (PU) to read the next instruction when not stored in cache memory.
Abstract: A data processing system includes a high speed buffer, or cache, memory for temporarily storing recently executed instructions and a slower main memory in which is stored the system's operating program. Rather than sequentially accessing the cache memory to determine if the next instruction is stored therein and then accessing the main memory if the cache memory does not have the next instruction, system operating speed is increased by simultaneously accessing the cache and main memories. By accessing the main memory during its row address strobe (RAS) precharge time while simultaneously accessing the cache memory, the time necessary for the system's processor unit (PU) to read the next instruction from the main memory when not stored in the cache memory is substantially reduced.

Dissertation
01 Jul 1987
TL;DR: This dissertation explores possible solutions to the cache coherence problem and identifies cache coherence protocols--solutions implemented entirely in hardware--as an attractive alternative.
Abstract: Shared-memory multiprocessors offer increased computational power and the programmability of the shared-memory model. However, sharing memory between processors leads to contention which delays memory accesses. Adding a cache memory for each processor reduces the average access time, but it creates the possibility of inconsistency among cached copies. The cache coherence problem is keeping all cached copies of the same memory location identical. This dissertation explores possible solutions to the cache coherence problem and identifies cache coherence protocols--solutions implemented entirely in hardware--as an attractive alternative. Protocols for shared-bus systems are shown to be an interesting special case. Previously proposed shared-bus protocols are described using uniform terminology, and they are shown to divide into two categories: invalidation and distributed write. In invalidation protocols all other cached copies must be invalidated before any copy can be changed; in distributed write protocols all copies must be updated each time a shared block is modified. In each category, a new protocol is presented with better performance than previous schemes, based on simulation results. The simulation model and parameters are described in detail. Previous protocols for general interconnection networks are shown to contain flaws and to be costly to implement. A new class of protocols is presented that offers reduced implementation cost and expandability, while retaining a high level of performance, as illustrated by simulation results using a crossbar switch. All new protocols have been proven correct; one of the proofs is included. Previous definitions of cache coherence are shown to be inadequate and a new definition is presented. Coherence is compared and contrasted with other levels of consistency, which are also identified. The consistency of shared-bus protocols is shown to be naturally stronger than that of non-bus protocols. The first protocol of its kind is presented for a large hierarchical multiprocessor, using a bus-based protocol within each cluster and a general protocol in the network connecting the clusters to the shared main memory.
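
The two shared-bus protocol families contrasted in the abstract differ in what a write broadcasts, as the sketch below shows: an invalidation protocol removes the other copies, while a distributed-write protocol updates them in place. The state encoding and single-block bus model are illustrative assumptions, not any specific protocol from the dissertation.

    #include <stdio.h>

    enum line_state { INVALID, SHARED, MODIFIED };

    #define NCACHES 4

    static enum line_state state[NCACHES];   /* one copy of a single block per cache */
    static int             value[NCACHES];

    void write_invalidation(int writer, int v) {
        for (int c = 0; c < NCACHES; c++)            /* bus broadcast: invalidate */
            if (c != writer) state[c] = INVALID;
        state[writer] = MODIFIED;
        value[writer] = v;
    }

    void write_distributed(int writer, int v) {
        for (int c = 0; c < NCACHES; c++)            /* bus broadcast: update */
            if (state[c] != INVALID) value[c] = v;
        state[writer] = SHARED;                      /* all copies stay consistent */
        value[writer] = v;
    }

    int main(void) {
        for (int c = 0; c < NCACHES; c++) { state[c] = SHARED; value[c] = 1; }
        write_invalidation(0, 2);
        printf("after invalidation write: cache1 state=%d\n", state[1]);
        for (int c = 0; c < NCACHES; c++) { state[c] = SHARED; value[c] = 1; }
        write_distributed(0, 3);
        printf("after distributed write:  cache1 value=%d\n", value[1]);
        return 0;
    }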

Proceedings ArticleDOI
01 Jun 1987
TL;DR: In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipelined multi-stage network, and it is shown that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.
Abstract: In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipelined multi-stage network. The majority of the multiprocessor cache studies in the literature exclusively focus on the issue of cache coherence enforcement. However, there are other characteristics unique to such multiprocessors which create an environment for cache performance that is very different from that of many uniprocessors. Multiprocessor conditions are identified and modeled, including: 1) the cost of a cache coherence enforcement scheme, 2) the effect of a high degree of overlap between cache miss services, 3) the cost of a pin-limited data path between shared memory and caches, 4) the effect of a high degree of data prefetching, 5) the program behavior of a scientific workload as represented by 23 numerical subroutines, and 6) the parallel execution of programs. This model is used to show that the cache miss ratio is not a suitable performance measure in the multiprocessors of interest and to show that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.

Patent
02 Oct 1987
TL;DR: In this article, a solid-state cache memory subsystem, configured for use in conjunction with disk drives to prestage data before it is called for by a host computer, features a controller with means for establishing and maintaining precise correspondence between storage locations in the solid-state array and on the disk memory.
Abstract: A solid-state cache memory subsystem, configured for use in conjunction with disk drives to prestage data in advance of its being called for by a host computer, features a controller with means for establishing and maintaining precise correspondence between storage locations in the solid-state array and on the disk memory. This correspondence is used to establish a reoriented position on a disk in the event of error detection and to determine when a predetermined quantity of data has been read from the disk into the cache during a staging operation.

Patent
28 May 1987
TL;DR: In this article, a cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units and a first in first out (FIFO) device which connects to a system bus of a tightly coupled data processing system.
Abstract: A cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units and a first-in first-out (FIFO) device which connects to a system bus of a tightly coupled data processing system. The cache subsystem includes a number of programmable control circuits which are connected to receive signals representative of the type of operations performable by the cache subsystem. These signals are logically combined to generate an output signal indicating whether or not the contents of the directory memory should be flushed when any one of a number of types of address or system faults has been detected, in order to maintain cache coherency.

Patent
22 Dec 1987
TL;DR: In this paper, it is shown that when it becomes necessary for a processor to update its cache with a block of data from main memory, such a block is simultaneously loaded into each appropriate cache.
Abstract: A computer system having a plurality of processors with each processor having associated therewith a cache memory is disclosed. When it becomes necessary for a processor to update its cache with a block of data from main memory, such a block of data is simultaneously loaded into each appropriate cache. Thus, each processor subsequently requiring such updated block of data may retrieve the block from its own cache, and not be required to access main memory.

Patent
Steven C. Steps1
16 Jun 1987
TL;DR: In this paper, a cache memory architecture is presented which is two blocks wide and made up of a map RAM, two cache data RAMs (each one word wide), and a selection system.
Abstract: Provided is a cache memory architecture which is two blocks wide and is made up of a map RAM, two cache data RAMs (each one word wide), and a selection system for selecting data from either one or both cache data RAMs, depending on whether the access is between cache and CPU, or between cache and main memory. The data stored in the two cache data RAMs has a particular address configuration. It consists of having data with even addresses of even pages and odd addresses of odd pages stored in one cache data RAM, with odd addresses and even addresses interleaved therein; and odd addresses of even pages and even addresses of odd pages stored in the other cache data RAM, with the odd addresses and even addresses interleaved but inverted relative to the other cache data RAM.
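
The address mapping in this patent reduces to a parity rule: a word goes to one data RAM or the other according to the parity of its word address XORed with the parity of its page, so a CPU access uses one RAM while a two-word cache/main-memory transfer can use both at once. The page size below is an assumption for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_WORDS 1024   /* assumed words per page */

    /* Returns 0 for the first cache data RAM, 1 for the second. */
    int data_ram_select(uint32_t word_addr) {
        uint32_t addr_parity = word_addr & 1;
        uint32_t page_parity = (word_addr / PAGE_WORDS) & 1;
        return (int)(addr_parity ^ page_parity);
    }

    int main(void) {
        /* even address, even page -> RAM 0; odd address, even page -> RAM 1 */
        printf("word 0x0000 -> RAM %d\n", data_ram_select(0x0000));
        printf("word 0x0001 -> RAM %d\n", data_ram_select(0x0001));
        /* on an odd page the assignment is inverted */
        printf("word 0x0400 -> RAM %d\n", data_ram_select(0x0400));
        printf("word 0x0401 -> RAM %d\n", data_ram_select(0x0401));
        return 0;
    }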

Patent
31 Jul 1987
TL;DR: In this paper, a hashing indexer for a branch cache is proposed for use in a pipelined digital processor that employs macro-instructions utilizing interpretation by micro-instructions.
Abstract: A hashing indexer for a branch cache for use in a pipelined digital processor that employs macro-instructions utilizing interpretation by micro-instructions. Each of the macro-instructions has an associated address and each of the micro-instructions has an associated address. The hashing indexer includes a look-ahead-fetch system including a branch cache memory coupled to the prefetch section. An indexed table of branch target addresses, each of which corresponds to the address of a previously fetched instruction, is stored in the branch cache memory. A predetermined number of bits representing the address of the macro-instruction being fetched is hashed with a predetermined number of bits representing the address of the micro-instruction being invoked. The indexer is used to apply the hashing result as an address to the branch cache memory in order to read out a unique predicted branch target address that is predictive of a branch for the hashed macro-instruction bits and micro-instruction bits. The hashing indexer disperses branch cache entries throughout the branch cache memory. Therefore, by hashing macro-instruction bits with micro-instruction bits and by dispersing the branch cache entries throughout the branch cache memory, the prediction rate of the system is increased.
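
A minimal sketch of the hashing indexer: macro-instruction address bits are combined (here by XOR) with micro-instruction address bits, and the result indexes the branch cache to read a predicted target. The table size, bit selection, and XOR hash are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define BRANCH_CACHE_ENTRIES 512

    struct branch_entry {
        uint32_t predicted_target;
        int      valid;
    };

    static struct branch_entry branch_cache[BRANCH_CACHE_ENTRIES];

    static uint32_t hash_index(uint32_t macro_pc, uint32_t micro_pc) {
        /* mix macro- and micro-instruction address bits so entries for different
           micro-steps of the same macro-instruction land in different slots */
        return (macro_pc ^ (micro_pc << 3)) % BRANCH_CACHE_ENTRIES;
    }

    void record_branch(uint32_t macro_pc, uint32_t micro_pc, uint32_t target) {
        struct branch_entry *e = &branch_cache[hash_index(macro_pc, micro_pc)];
        e->predicted_target = target;
        e->valid = 1;
    }

    int predict_branch(uint32_t macro_pc, uint32_t micro_pc, uint32_t *target) {
        struct branch_entry *e = &branch_cache[hash_index(macro_pc, micro_pc)];
        if (!e->valid) return 0;
        *target = e->predicted_target;
        return 1;
    }

    int main(void) {
        record_branch(0x1234, 0x07, 0x2000);
        uint32_t t;
        if (predict_branch(0x1234, 0x07, &t)) printf("predicted target 0x%x\n", t);
        printf("other micro-step predicted: %d\n", predict_branch(0x1234, 0x08, &t));
        return 0;
    }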

Patent
29 Jan 1987
TL;DR: In this paper, a cache memory unit in which, in response to the application of a write command, the write operation is performed in two system clock cycles is described, where the data signal group is stored in a temporary storage unit while a determination is made if the address associated with the data signals group is present in the cache memory units.
Abstract: A cache memory unit in which, in response to the application of a write command, the write operation is performed in two system clock cycles. During the first clock cycle, the data signal group is stored in a temporary storage unit while a determination is made if the address associated with the data signal group is present in the cache memory unit. When the address is present, the data signal group is stored in the cache memory unit during the next application of a write command to the cache memory unit. If a read command is applied to the cache memory unit involving the data signal group stored in the temporary storage unit, then this data signal group is transferred to the central processing unit in response to the read command. Instead of performing the storage into the cache memory unit as a result of the next write command, the storage can occur during any free cycle.
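
The two-cycle write can be sketched with a one-entry temporary store: a write is captured there while the tag check runs and is committed to the array on the next write (or any free cycle), and a read that matches the pending address is served from the temporary store. Cache geometry and commit details are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define LINES 128

    static uint32_t cache_addr[LINES];
    static uint32_t cache_data[LINES];
    static int      cache_valid[LINES];

    static struct { uint32_t addr, data; int pending; } wbuf;  /* temporary store */

    static void commit_pending(void) {
        if (!wbuf.pending) return;
        int i = wbuf.addr % LINES;
        if (cache_valid[i] && cache_addr[i] == wbuf.addr)      /* hit was confirmed */
            cache_data[i] = wbuf.data;
        wbuf.pending = 0;
    }

    void cache_write(uint32_t addr, uint32_t data) {
        commit_pending();            /* the *previous* write reaches the array now  */
        wbuf.addr = addr;            /* this write only reaches the array later     */
        wbuf.data = data;
        wbuf.pending = 1;
    }

    int cache_read(uint32_t addr, uint32_t *data) {
        if (wbuf.pending && wbuf.addr == addr) {               /* forward from buffer */
            *data = wbuf.data;
            return 1;
        }
        int i = addr % LINES;
        if (cache_valid[i] && cache_addr[i] == addr) { *data = cache_data[i]; return 1; }
        return 0;
    }

    int main(void) {
        cache_addr[5] = 5; cache_valid[5] = 1; cache_data[5] = 111;
        cache_write(5, 222);                 /* sits in the temporary store */
        uint32_t v;
        cache_read(5, &v);  printf("read after write: %u\n", v);   /* forwarded: 222 */
        cache_write(5, 333);                 /* commits the previous write first */
        printf("array now holds: %u\n", cache_data[5]);            /* 222 */
        return 0;
    }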

Patent
19 Jun 1987
TL;DR: In this article, the index is utilized to access the cache to generate an output which includes a block corresponding to the index from each set of the cache; each block includes an address tag and data.
Abstract: A method of retrieving data from a multi-set cache memory in a computer system. An address, which includes an index, is presented by the processor to the cache memory. The index is utilized to access the cache to generate an output which includes a block corresponding to the index from each set of the cache. Each block includes an address tag and data. A portion of the address tag for all but one of the blocks is compared with a corresponding portion of the address. If the comparison results in a match, then the data from the block associated with the match is provided to the processor. If the comparison does not result in a match, then the data from the remaining block is provided to the processor. A full address tag comparison is done in parallel with the "lookaside tag" comparison to confirm a "hit."
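
For a two-way cache the "lookaside tag" selection can be sketched as follows: a short partial tag of one way steers which block's data is forwarded (a match selects that way, otherwise the other way), while the full tag compare runs in parallel to confirm the hit. Field widths and geometry are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define SETS 64
    #define PARTIAL_BITS 4

    struct way { uint32_t tag; uint32_t data; int valid; };
    static struct way set0[SETS], set1[SETS];

    static uint32_t partial(uint32_t tag) { return tag & ((1u << PARTIAL_BITS) - 1); }

    /* Returns 1 on a confirmed hit and writes the selected data. */
    int lookup(uint32_t addr, uint32_t *data) {
        uint32_t index = addr % SETS, tag = addr / SETS;
        struct way *a = &set0[index], *b = &set1[index];

        /* fast select: look aside only at way A's partial tag */
        struct way *selected = (a->valid && partial(a->tag) == partial(tag)) ? a : b;
        *data = selected->data;                        /* speculatively forwarded */

        /* full comparison in parallel confirms (or cancels) the hit */
        return selected->valid && selected->tag == tag;
    }

    int main(void) {
        uint32_t addr = 5 + 7 * SETS;                  /* index 5, tag 7 */
        set1[5] = (struct way){7, 999, 1};             /* resident in way B */
        set0[5] = (struct way){3, 111, 1};             /* way A partial tag differs */
        uint32_t d;
        printf("hit=%d data=%u\n", lookup(addr, &d), d);
        return 0;
    }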

Journal ArticleDOI
TL;DR: The role of cache memories and the factors that decide the success of a particular design are examined, and the operation of a cache memory is described and the specification of cache parameters is considered.
Abstract: The role of cache memories and the factors that decide the success of a particular design are examined. The operation of a cache memory is described. The specification of cache parameters is considered. Also discussed are the size of a cache, cache hierarchies, fetching and replacing, cache organization, updating the main memory, the use of two caches rather than one, virtual-address caches, and cache consistency.

Patent
21 Sep 1987
TL;DR: In this paper, a data processing system having a bus master, a cache, and a memory which is capable of transferring operands in bursts in response to a burst request signal provided by the bus master is described.
Abstract: A data processing system having a bus master, a cache, and a memory which is capable of transferring operands in bursts in response to a burst request signal provided by the bus master. The bus master will provide the burst request signal to the memory in order to fill a line in the cache only if there are no valid entries in that cache line. If a requested operand spans two cache lines, the bus master will defer the burst request signal until the end of the transfer of that operand, so that only the second cache line will be burst filled.

Proceedings Article
01 Jan 1987
TL;DR: This work proposes a new architecture for shared-memory multiprocessors, the crosspoint cache architecture, which consists of a crossbar interconnection network with a cache memory at each crosspoint switch; it also considers a two-level cache architecture in which caches on the processor chips are used in addition to the caches in the crosspoints.
Abstract: We propose a new architecture for shared memory multiprocessors, the crosspoint cache architecture. This architecture consists of a crossbar interconnection network with a cache memory at each crosspoint switch. It assures cache coherence in hardware while avoiding the performance bottlenecks associated with previous hardware cache coherence solutions. We show this architecture is feasible for a 64-processor system. We also consider a two-level cache architecture in which caches on the processor chips are used in addition to the caches in the crosspoints. This two-level cache organization achieves the goals of fast memory access and low bus traffic in a cost-effective way.


Proceedings ArticleDOI
01 Jan 1987
TL;DR: In this paper, a reduced instruction set computer (RISC) with 172K transistors in 1.5μm technology is described. The chip contains caches for the prefetch buffer, decoded instructions, and the stack.
Abstract: A Reduced Instruction Set Computer containing 172K transistors in 1.5μm technology will be described. The chip contains caches for prefetch buffer, decoded instructions and stack. Two internal machines with three pipelined stages are used.

Patent
13 Nov 1987
TL;DR: In this article, the cache is probed for consecutive addresses at once and the result is stored, so that hits on subsequent sequential accesses can be determined without referencing the cache; on a mishit the external memory is accessed directly, shortening the average access time.
Abstract: When access is sequential, such as during the prefetching of an instruction or the restoration of registers from the stack region, the cache is searched simultaneously for the consecutive addresses and the result is stored. When those consecutive addresses are subsequently accessed, the hit is determined from the stored result without referencing the cache memory. In the case of a mishit, the external memory is accessed immediately, avoiding the overhead of the cache memory reference. The access time is therefore shortened on average.
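
A sketch of the look-ahead idea: when the directory is probed for address A it is also probed for A+1 and the outcome is remembered, so the next sequential access can decide hit or miss without another directory reference. The structures below are illustrative assumptions.

    #include <stdint.h>
    #include <stdio.h>

    #define LINES 256
    static uint32_t tag_array[LINES];
    static int      valid[LINES];

    static int directory_probe(uint32_t addr) {            /* the "expensive" reference */
        return valid[addr % LINES] && tag_array[addr % LINES] == addr / LINES;
    }

    static struct { uint32_t addr; int hit; int valid; } lookahead;

    int cache_hit(uint32_t addr) {
        if (lookahead.valid && lookahead.addr == addr) {    /* sequential: reuse result */
            lookahead.valid = 0;
            return lookahead.hit;
        }
        int hit = directory_probe(addr);                    /* normal reference ...     */
        lookahead.addr  = addr + 1;                         /* ... also probes the next */
        lookahead.hit   = directory_probe(addr + 1);        /*     address              */
        lookahead.valid = 1;
        return hit;
    }

    int main(void) {
        valid[10] = valid[11] = 1;                          /* two consecutive lines resident */
        printf("hit(10)=%d\n", cache_hit(10));
        printf("hit(11)=%d (answered from the stored look-ahead result)\n", cache_hit(11));
        return 0;
    }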



Patent
22 Dec 1987
TL;DR: In this paper, the cache memory is divided into two levels (2,4); write operations issued by the central-processing unit of each processor are performed in both cache-memory levels by an immediate-write (write-through) procedure, so that the second level (4) keeps an updated copy of the information contained in the first level (2), and the corresponding information in the central memory (9) is updated from the second cache-memory level (4) by a delayed-write (write-back) procedure.
Abstract: The system consists of multiprocessors P1 to Pn connected to a central memory (9) by means of a single bus (8). Each processor comprises a central-processing unit (1) and a cache memory (2,4). The method consists in dividing the cache memory into two levels (2,4), in causing the write operations issued by the central-processing unit of each processor to be performed in the two cache-memory levels by an immediate-write procedure, in order to keep, in the second level (4), an updated copy of the information contained in the first level (2), and in updating the corresponding information in the central memory (9) from the second cache-memory level (4) by a delayed-write procedure. Application: information-processing systems.
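
The write policy described can be sketched as write-through from the CPU into both cache levels, with the second level carrying a dirty bit and writing a block back to central memory only when it is displaced. Sizes and the direct-mapped organization are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define L1_LINES 64
    #define L2_LINES 512
    #define MEM_WORDS 4096

    static uint32_t memory[MEM_WORDS];

    struct l1_line { uint32_t addr, data; int valid; };
    struct l2_line { uint32_t addr, data; int valid, dirty; };
    static struct l1_line l1[L1_LINES];
    static struct l2_line l2[L2_LINES];

    void cpu_write(uint32_t addr, uint32_t data) {
        struct l1_line *a = &l1[addr % L1_LINES];
        a->addr = addr; a->data = data; a->valid = 1;   /* immediate write into L1 */

        struct l2_line *b = &l2[addr % L2_LINES];
        if (b->valid && b->dirty && b->addr != addr)
            memory[b->addr] = b->data;                  /* delayed write-back on eviction */
        b->addr = addr; b->data = data;
        b->valid = 1; b->dirty = 1;                     /* central memory not updated yet */
    }

    int main(void) {
        cpu_write(100, 7);
        printf("after write: L2=%u, memory=%u\n", l2[100 % L2_LINES].data, memory[100]);
        cpu_write(100 + L2_LINES, 9);                   /* conflicting block evicts it */
        printf("after eviction: memory[100]=%u\n", memory[100]);
        return 0;
    }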

Patent
James Gerald Brenza1
03 Apr 1987
TL;DR: The disclosure provides a data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private to a single CPU, in order to be in close proximity to each other and to the CPU.
Abstract: The disclosure provides a data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains and is addressed by the CPU's requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during an L1 LRU castout, but is not in the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.
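
A sketch of the common-directory lookup with a line presence field (LPF): one directory, indexed by the logical address, covers both private caches, and the LPF bit for the addressed sub-line says whether the access is an L1 hit or only an L2 hit; only on a directory miss would the TLB be consulted. Field sizes and layout are assumptions, not the patent's exact format.

    #include <stdint.h>
    #include <stdio.h>

    #define DIR_ENTRIES   1024
    #define L2_LINE_BYTES 256
    #define L1_LINE_BYTES 64
    #define SUBLINES      (L2_LINE_BYTES / L1_LINE_BYTES)

    struct dir_entry {
        uint32_t logical_tag;        /* derived from the requesting logical address */
        int      valid;              /* L2 holds the line                           */
        uint8_t  lpf;                /* bit i set => sub-line i also present in L1  */
    };

    static struct dir_entry directory[DIR_ENTRIES];

    /* Returns 2 for an L1 hit, 1 for an L2-only hit, 0 for a miss (only then
       would the TLB and main storage be consulted). */
    int common_lookup(uint32_t logical_addr) {
        uint32_t index = (logical_addr / L2_LINE_BYTES) % DIR_ENTRIES;
        uint32_t tag   = logical_addr / (L2_LINE_BYTES * DIR_ENTRIES);
        uint32_t sub   = (logical_addr % L2_LINE_BYTES) / L1_LINE_BYTES;

        struct dir_entry *e = &directory[index];
        if (!e->valid || e->logical_tag != tag) return 0;
        return (e->lpf >> sub) & 1 ? 2 : 1;
    }

    int main(void) {
        uint32_t addr = 0x12340;
        struct dir_entry *e = &directory[(addr / L2_LINE_BYTES) % DIR_ENTRIES];
        e->logical_tag = addr / (L2_LINE_BYTES * DIR_ENTRIES);
        e->valid = 1;
        e->lpf = 0x1;                                        /* only sub-line 0 is in L1 */
        printf("0x12340 -> %d\n", common_lookup(0x12340));   /* sub-line 1 -> L2 hit */
        printf("0x12300 -> %d\n", common_lookup(0x12300));   /* sub-line 0 -> L1 hit */
        return 0;
    }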