
Showing papers on "Cache pollution published in 1987"


Journal ArticleDOI
TL;DR: In this article, the authors examined the cache miss ratio as a function of line size, and found that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat.
Abstract: The line (block) size of a cache memory is one of the parameters that most strongly affects cache performance. In this paper, we study the factors that relate to the selection of a cache line size. Our primary focus is on the cache miss ratio, but we also consider influences such as logic complexity, address tags, line crossers, I/O overruns, etc. The behavior of the cache miss ratio as a function of line size is examined carefully through the use of trace driven simulation, using 27 traces from five different machine architectures. The change in cache miss ratio as the line size varies is found to be relatively stable across workloads, and tables of this function are presented for instruction caches, data caches, and unified caches. An empirical mathematical fit is obtained. This function is used to extend previously published design target miss ratios to cover line sizes from 4 to 128 bytes and cache sizes from 32 bytes to 32K bytes; design target miss ratios are to be used to guide new machine designs. Mean delays per memory reference and memory (bus) traffic rates are computed as a function of line and cache size, and memory access time parameters. We find that for high performance microprocessor designs, line sizes in the range 16-64 bytes seem best; shorter line sizes yield high delays due to memory latency, although they reduce memory traffic somewhat. Longer line sizes are suitable for mainframes because of the higher bandwidth to main memory.
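
To make the delay/traffic trade-off concrete, the short Python sketch below computes mean delay per reference and bus traffic per reference from a miss-ratio table; the miss ratios and the memory timing parameters (LATENCY_CYCLES, CYCLES_PER_WORD) are illustrative assumptions, not the paper's measured design target miss ratios.

```python
# Illustrative model of mean delay per memory reference and bus traffic as a
# function of cache line size. The miss ratios and timing parameters below
# are assumed for demonstration only.

LATENCY_CYCLES = 6        # assumed cycles before the first word of a line arrives
CYCLES_PER_WORD = 1       # assumed cycles to transfer each 4-byte word

# Assumed miss ratios for one fixed cache size as the line size grows:
# longer lines exploit spatial locality, so the miss ratio falls slowly.
miss_ratio = {8: 0.080, 16: 0.055, 32: 0.040, 64: 0.032, 128: 0.028}

def mean_delay(line_bytes):
    """Average stall cycles per reference: miss ratio times miss penalty."""
    words = line_bytes // 4
    penalty = LATENCY_CYCLES + words * CYCLES_PER_WORD
    return miss_ratio[line_bytes] * penalty

def bus_traffic(line_bytes):
    """Average bytes moved over the memory bus per reference."""
    return miss_ratio[line_bytes] * line_bytes

for size in sorted(miss_ratio):
    print(f"line {size:3d} B: {mean_delay(size):.2f} cycles/ref, "
          f"{bus_traffic(size):.2f} bytes/ref")
```

With these assumed numbers the delay is lowest for mid-range lines while traffic keeps growing with line size, which is the shape of the trade-off the paper describes.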

180 citations


BookDOI
01 Jan 1987
TL;DR: This work develops microcode-based address tracing, an analytical cache model, and efficient trace-driven analysis techniques such as trace sampling, trace stitching, and trace compaction by cache filtering, in order to accurately characterize large-cache, multiprogramming, and multiprocessor cache performance.
Abstract: 1 Introduction.- 1.1 Overview of Cache Design.- 1.1.1 Cache Parameters.- 1.1.2 Cache Performance Evaluation Methodology.- 1.2 Review of Past Work.- 1.3 Then, Why This Research?.- 1.3.1 Accurately Characterizing Large Cache Performance.- 1.3.2 Obtaining Trace Data for Cache Analysis.- 1.3.3 Developing Efficient and Accurate Cache Analysis Methods.- 1.4 Contributions.- 1.5 Organization.- 2 Obtaining Accurate Trace Data.- 2.1 Current Tracing Techniques.- 2.2 Tracing Using Microcode.- 2.3 An Experimental Implementation.- 2.3.1 Storage of Trace Data.- 2.3.2 Recording Memory References.- 2.3.3 Tracing Control.- 2.4 Trace Description.- 2.5 Applications in Performance Evaluation.- 2.6 Extensions and Summary.- 3 Cache Analyses Techniques - An Analytical Cache Model.- 3.1 Motivation and Overview.- 3.1.1 The Case for the Analytical Cache Model.- 3.1.2 Overview of the Model.- 3.2 A Basic Cache Model.- 3.2.1 Start-Up Effects.- 3.2.2 Non-Stationary Effects.- 3.2.3 Intrinsic Interference.- 3.3 A Comprehensive Cache Model.- 3.3.1 Set Size.- 3.3.2 Modeling Spatial Locality and the Effect of Block Size.- 3.3.3 Multiprogramming.- 3.4 Model Validation and Applications.- 3.5 Summary.- 4 Transient Cache Analysis - Trace Sampling and Trace Stitching.- 4.1 Introduction.- 4.2 Transient Behavior Analysis and Trace Sampling.- 4.2.1 Definitions.- 4.2.2 Analysis of Start-up Effects in Single Process Traces.- 4.2.3 Start-up Effects in Multiprocess Traces.- 4.3 Obtaining Longer Samples Using Trace Stitching.- 4.4 Trace Compaction - Cache Filtering with Blocking.- 4.4.1 Cache Filter.- 4.4.2 Block Filter.- 4.4.3 Implementation of the Cache and Block Filters.- 4.4.4 Miss Rate Estimation.- 4.4.5 Compaction Results.- 5 Cache Performance Analysis for System References.- 5.1 Motivation.- 5.2 Analysis of the Miss Rate Components due to System References.- 5.3 Analysis of System Miss Rate.- 5.4 Associativity.- 5.5 Block Size.- 5.6 Evaluation of Split Caches.- 6 Impact of Multiprogramming on Cache Performance.- 6.1 Relative Performance of Multiprogramming Cache Techniques.- 6.2 More on Warm Start versus Cold Start.- 6.3 Impact of Shared System Code on Multitasking Cache Performance.- 6.4 Process Switch Statistics and Their Effects on Cache Modeling.- 6.5 Associativity.- 6.6 Block Size.- 6.7 Improving the Multiprogramming Performance of Caches.- 6.7.1 Hashing.- 6.7.2 A Hash-Rehash Cache.- 6.7.3 Split Caches.- 7 Multiprocessor Cache Analysis.- 7.1 Tracing Multiprocessors.- 7.2 Characteristics of Traces.- 7.3 Analysis.- 7.3.1 General Methodology.- 7.3.2 Multiprocess Interference in Large Virtual and Physical Caches.- 7.3.3 Analysis of Interference Between Multiple Processors.- 7.3.4 Blocks Containing Semaphores.- 8 Conclusions and Suggestions for Future Work.- 8.1 Concluding Remarks.- 8.2 Suggestions for Future Work.- Appendices.- B.1 On the Stability of the Collision Rate.- B.2 Estimating Variations in the Collision Rate.- C Inter-Run Intervals and Spatial Locality.- D Summary of Benchmark Characteristics.- E Features of ATUM-2.- E.1 Distributing Trace Control to All Processors.- E.2 Provision of Atomic Accesses to Trace Memory.- E.3 Instruction Stream Compaction Using a Cache Simulated in Microcode.- E.4 Microcode Patch Space Conservation.
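
One of the techniques listed in this outline, trace compaction by cache filtering (Section 4.4), can be sketched roughly as follows: run the address trace through a small filter cache and keep only the references that miss in it. The direct-mapped organization and the FILTER_LINES/BLOCK_BYTES parameters below are assumptions for illustration, not the book's exact implementation.

```python
# Rough sketch of trace compaction by cache filtering: pass the address
# trace through a small direct-mapped filter cache and keep only the
# references that miss in it. Larger caches simulated afterwards see
# almost the same misses, so the compacted trace is much shorter.

FILTER_LINES = 256        # assumed number of lines in the filter cache
BLOCK_BYTES = 16          # assumed block size

def cache_filter(trace):
    """Return the sub-trace of addresses that miss in the filter cache."""
    tags = [None] * FILTER_LINES
    kept = []
    for addr in trace:
        block = addr // BLOCK_BYTES
        index = block % FILTER_LINES
        tag = block // FILTER_LINES
        if tags[index] != tag:        # miss in the filter cache
            tags[index] = tag
            kept.append(addr)
    return kept

if __name__ == "__main__":
    # Toy trace: repeated 4-byte references sweeping an 8 KB region.
    trace = [0x1000 + (i % 2048) * 4 for i in range(20000)]
    compact = cache_filter(trace)
    print(f"original {len(trace)} refs -> compacted {len(compact)} refs")
```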

118 citations


Journal ArticleDOI
TL;DR: The authors provide an overview of MIPS-X, focusing on the techniques used to reduce the complexity of the processor and implement the on-chip instruction cache.
Abstract: MIPS-X is a 32-b RISC microprocessor implemented in a conservative 2-/spl mu/m, two-level-metal, n-well CMOS technology. High performance is achieved by using a nonoverlapping two-phase 20-MHz clock and executing one instruction every cycle. To reduce its memory bandwidth requirements, MIPS-X includes a 2-kbyte on-chip instruction cache. The authors provide an overview of MIPS-X, focusing on the techniques used to reduce the complexity of the processor and implement the on-chip instruction cache.

98 citations


Journal ArticleDOI
Douglas B. Terry
TL;DR: A new approach to managing caches of hints suggests maintaining a minimum level of cache accuracy, rather than maximizing the cache hit ratio, in order to guarantee performance improvements.
Abstract: Caching reduces the average cost of retrieving data by amortizing the lookup cost over several references to the data. Problems with maintaining strong cache consistency in a distributed system can be avoided by treating cached information as hints. A new approach to managing caches of hints suggests maintaining a minimum level of cache accuracy, rather than maximizing the cache hit ratio, in order to guarantee performance improvements. The desired accuracy is based on the ratio of lookup costs to the costs of detecting and recovering from invalid cache entries. Cache entries are aged so that they get purged when their estimated accuracy falls below the desired level. The age thresholds are dictated solely by clients' accuracy requirements instead of being suggested by data storage servers or system administrators.
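
A minimal sketch of the idea follows, assuming an exponential decay of hint accuracy with entry age and illustrative LOOKUP_COST/RECOVERY_COST values; neither the decay model nor the constants come from the paper.

```python
# Hedged sketch of the hint-cache accuracy idea: keep an entry only while its
# estimated accuracy stays above a minimum derived from the cost of using a
# bad hint. The exponential decay model and the specific cost values below
# are illustrative assumptions.

import math
import time

LOOKUP_COST = 1.0        # assumed cost of a full lookup (e.g., asking the server)
RECOVERY_COST = 5.0      # assumed cost of detecting and recovering from a bad hint

# Using a hint is a win, in expectation, when (1 - A) * RECOVERY_COST < LOOKUP_COST,
# which gives a minimum acceptable accuracy A_MIN:
A_MIN = 1.0 - LOOKUP_COST / RECOVERY_COST

MEAN_LIFETIME = 600.0    # assumed mean time (seconds) before a hint goes stale

def estimated_accuracy(age_seconds):
    """Assume accuracy decays exponentially with entry age."""
    return math.exp(-age_seconds / MEAN_LIFETIME)

# Purge entries older than the age at which accuracy drops to A_MIN.
AGE_THRESHOLD = -MEAN_LIFETIME * math.log(A_MIN)

class HintCache:
    def __init__(self):
        self._entries = {}   # key -> (value, insertion time)

    def put(self, key, value):
        self._entries[key] = (value, time.time())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.time() - stamp > AGE_THRESHOLD:
            del self._entries[key]      # estimated accuracy fell below A_MIN
            return None
        return value
```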

83 citations


Patent
22 Apr 1987
TL;DR: In this article, the memory control subsystem controls and arbitrates the access to a memory 10 which is shared by a plurality of users comprising at least a processor 2 with its cache and input/output devices 4 having direct access to the memory through a direct memory access bus 12.
Abstract: The memory control subsystem controls and arbitrates the access to a memory 10 which is shared by a plurality of users comprising at least a processor 2 with its cache and input/output devices 4 having direct access to the memory through a direct memory access bus 12. It comprises a processor controller 20, a DMA controller 22 and a memory controller 24. A processor request is buffered into the processor controller 20 and is serviced right away if the memory controller is available, possibly with a simultaneous transfer between the devices 4 and buffers in the DMA controller 22. If the memory controller 24 is busy, because a DMA request is being serviced, the DMA controller comprises means to cause the DMA transfer to be interrupted, the processor request to be serviced and the DMA transfer to be resumed afterwards. Write requests made by the processor are buffered into processor controller 20 and an acknowledgement signal is sent to the processor, which can resume execution without waiting for the memory update to complete. A read request which does not hit the cache is sent to the processor controller which causes the cache to be updated. In case of multiple processor requests contending with a long DMA transfer, the latter is sliced into several parts, each part mapping one cache line. In case of a DMA write, the cache lines which correspond to memory positions whose content is modified by the write operation are invalidated in such a way that the processor cannot read a partially written line into the cache.
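
The invalidation and transfer-slicing steps described above can be sketched as follows; the line size, the direct-mapped cache organization, and the function names are assumptions made for illustration.

```python
# Rough sketch of two steps described above: (1) when a DMA write modifies a
# range of memory, every cache line overlapping that range is invalidated so
# the processor cannot read a partially written line; (2) a long DMA transfer
# is sliced into parts, each part mapping one cache line, so pending
# processor requests can be serviced between parts.

LINE_BYTES = 32           # assumed cache line size
NUM_LINES = 256           # assumed number of cache lines (direct mapped)

class SimpleCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.valid = [False] * NUM_LINES

    def invalidate_range(self, start, length):
        """Invalidate every cache line overlapping [start, start + length)."""
        first_line = start // LINE_BYTES
        last_line = (start + length - 1) // LINE_BYTES
        for line in range(first_line, last_line + 1):
            index = line % NUM_LINES
            tag = line // NUM_LINES
            if self.valid[index] and self.tags[index] == tag:
                self.valid[index] = False

def dma_write_slices(start, length):
    """Split a long DMA transfer into parts, each mapping one cache line."""
    addr, end = start, start + length
    while addr < end:
        line_end = (addr // LINE_BYTES + 1) * LINE_BYTES
        part_len = min(line_end, end) - addr
        yield addr, part_len
        addr += part_len
```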

76 citations


Patent
02 Dec 1987
TL;DR: In this paper, a broadband branch history table is organized by cache line, which determines from the history of branches the next cache line to be referenced and uses that information for prefetching lines into the cache.
Abstract: Apparatus for fetching instructions in a computing system. A broadband branch history table is organized by cache line. The broadband branch history table determines from the history of branches the next cache line to be referenced and uses that information for prefetching lines into the cache.
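
A rough sketch of a line-granular branch history table driving prefetch follows; the table size, line size, and eviction policy are assumptions, since the patent only specifies that the table is organized by cache line and predicts the next line to be referenced.

```python
# Minimal sketch of a line-granular branch history table: it maps the address
# of the cache line currently being executed to the line that followed it last
# time, and that prediction drives a prefetch into the cache.

LINE_BYTES = 64           # assumed cache line size
TABLE_ENTRIES = 1024      # assumed table capacity

class LineBranchHistoryTable:
    def __init__(self):
        self.next_line = {}          # current line address -> predicted next line

    def record(self, from_line, to_line):
        """Update history when control actually moves between cache lines."""
        if len(self.next_line) >= TABLE_ENTRIES and from_line not in self.next_line:
            self.next_line.pop(next(iter(self.next_line)))   # crude eviction
        self.next_line[from_line] = to_line

    def predict(self, current_line):
        """Return the line to prefetch, or None if no history exists."""
        return self.next_line.get(current_line)

def line_of(address):
    return address // LINE_BYTES * LINE_BYTES

# Usage: on each taken branch, call record(line_of(branch_pc), line_of(target));
# while executing a line, prefetch predict(line_of(pc)) into the cache.
```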

71 citations


Patent
27 Mar 1987
TL;DR: In this paper, the cache coherence system detects when the contents of storage locations in the cache memories of one or more of the data processors have been modified in conjunction with the activity of those data processors, and is responsive to such detections to generate and store in its cache invalidate table (CIT) memory a multiple-element linked list.
Abstract: A cache coherence system for a multiprocessor system including a plurality of data processors coupled to a common main memory. Each of the data processors includes an associated cache memory having storage locations therein corresponding to storage locations in the main memory. The cache coherence system for a data processor includes a cache invalidate table (CIT) memory having internal storage locations corresponding to locations in the cache memory of the data processor. The cache coherence system detects when the contents of storage locations in the cache memories of one or more of the data processors have been modified in conjunction with the activity of those data processors and is responsive to such detections to generate and store in its CIT memory a multiple-element linked list defining the locations in the cache memories of the data processors having modified contents. Each element of the list defines one of those cache storage locations and also identifies the location in the CIT memory of the next element in the list.
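
The CIT linked list can be sketched as follows, assuming an array-based CIT with one element per cache location; the method names and sizes are illustrative rather than the patent's.

```python
# Illustrative sketch of the cache-invalidate-table (CIT) linked list: each
# CIT element corresponds to one cache location, and modified locations are
# chained together so they can be walked (and invalidated) quickly.

NUM_CACHE_LOCATIONS = 1024    # assumed cache size in locations

class CacheInvalidateTable:
    def __init__(self):
        # Each element holds the CIT index of the next modified location.
        self.next_elem = [None] * NUM_CACHE_LOCATIONS
        self.on_list = [False] * NUM_CACHE_LOCATIONS
        self.head = None

    def mark_modified(self, location):
        """Link a cache location into the list of modified locations."""
        if not self.on_list[location]:
            self.next_elem[location] = self.head
            self.head = location
            self.on_list[location] = True

    def drain(self):
        """Walk the list, yielding every modified location, then clear it."""
        loc = self.head
        while loc is not None:
            yield loc
            nxt = self.next_elem[loc]
            self.on_list[loc] = False
            self.next_elem[loc] = None
            loc = nxt
        self.head = None
```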

69 citations


Patent
Thomas Henry Holman, Jr.
27 Jul 1987
TL;DR: A write-shared cache circuit for multiprocessor systems maintains data consistency throughout the system and eliminates non-essential bus accesses by utilizing additional bus lines between caches of the system as mentioned in this paper.
Abstract: A "write-shared" cache circuit for multiprocessor systems maintains data consistency throughout the system and eliminates non-essential bus accesses by utilizing additional bus lines between caches of the system and by utilizing additional logic in order to enhance the intercache communication. Data is only written through to the system bus when the data is labeled "shared". A write-miss is read only once on the system bus in an "invalidate" cycle, and then it is written only to the requesting cache.

65 citations


01 Jan 1987
TL;DR: These techniques are significant extensions to the stack analysis technique (Mattson et al., 1970) which computes the read miss ratio for all cache sizes in a single trace-driven simulation, and are used to study caching in a network file system.
Abstract: This dissertation describes innovative techniques for efficiently analyzing a wide variety of cache designs, and uses these techniques to study caching in a network file system. The techniques are significant extensions to the stack analysis technique (Mattson et al., 1970) which computes the read miss ratio for all cache sizes in a single trace-driven simulation. Stack analysis is extended to allow the one-pass analysis of: (1) writes in a write-back cache, including periodic write-back and deletions, important factors in file system cache performance. (2) sub-block or sector caches, including load-forward prefetching. (3) multi-processor caches in a shared-memory system, for an entire class of consistency protocols, including all of the well-known protocols. (4) client caches in a network file system, using a new class of consistency protocols. The techniques are completely general and apply to all levels of the memory hierarchy, from processor caches to disk and file system caches. The dissertation also discusses the use of hash tables and binary trees within the simulator to further improve performance for some types of traces. Using these techniques, the performance of all cache sizes can be computed in little more than twice the time required to simulate a single cache size, and often in just 10% more time. In addition to presenting techniques, this dissertation also demonstrates their use by studying client caching in a network file system. It first reports the extent of file sharing in a UNIX environment, showing that a few shared files account for two-thirds of all accesses, and nearly half of these are to files which are both read and written. It then studies different cache consistency protocols, write policies, and fetch policies, reporting the miss ratio and file server utilization for each. Four cache consistency protocols are considered: a polling protocol that uses the server for all consistency controls; a protocol designed for single-user files; one designed for read-only files; and one using write-broadcast to maintain consistency. It finds that the choice of consistency protocol has a substantial effect on performance; both the read-only and write-broadcast protocols showed half the misses and server load of the polling protocol. The choice of write or fetch policy made a much smaller difference.
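
The underlying stack analysis idea (Mattson et al., 1970) can be sketched for a fully associative LRU cache: one pass over the trace yields the miss ratio for every cache size. The naive O(n) stack scan below is only an illustration; the dissertation's extensions (write-back, sector caches, multiprocessor and client caches) and its hash-table and tree speedups are not shown.

```python
# Minimal sketch of Mattson-style stack analysis: record the LRU stack
# distance of every reference, then the miss ratio for a cache of S blocks
# is the fraction of references whose distance is >= S (or a cold miss).

from collections import defaultdict

def stack_distances(trace):
    """Yield the LRU stack distance of each reference (None on first touch)."""
    stack = []                            # most recently used block at the front
    for block in trace:
        if block in stack:
            depth = stack.index(block)    # 0 means re-reference of the MRU block
            stack.remove(block)
            yield depth
        else:
            yield None                    # cold miss
        stack.insert(0, block)

def miss_ratios(trace, sizes):
    """Miss ratio for each cache size (in blocks), from a single pass."""
    hist = defaultdict(int)
    cold = total = 0
    for dist in stack_distances(trace):
        total += 1
        if dist is None:
            cold += 1
        else:
            hist[dist] += 1
    ratios = {}
    for size in sizes:
        hits = sum(count for dist, count in hist.items() if dist < size)
        ratios[size] = (total - hits) / total
    return ratios

if __name__ == "__main__":
    trace = [0, 1, 2, 0, 1, 3, 0, 1, 2, 4] * 100
    print(miss_ratios(trace, sizes=[1, 2, 4, 8]))
```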

64 citations


Patent
15 Sep 1987
TL;DR: In this paper, a mechanism for determining when the contents of a block in a cache memory have been rendered stale by DMA activity external to a processor and for marking the block stale in response to a positive determination is proposed.
Abstract: A mechanism for determining when the contents of a block in a cache memory have been rendered stale by DMA activity external to a processor and for marking the block stale in response to a positive determination. The commanding unit in the DMA transfer, prior to transmitting an address, asserts a cache control signal which conditions the processor to receive the address and determine whether there is a correspondence to the contents of the cache. If there is a correspondence, the processor marks the contents of that cache location for which there is a correspondence stale.

63 citations


Patent
30 Oct 1987
TL;DR: In this article, the main memory is accessed during its row address strobe (RAS) precharge time while simultaneously accessing the cache memory, reducing the time necessary for the processor unit (PU) to read the next instruction when not stored in cache memory.
Abstract: A data processing system includes a high speed buffer, or cache, memory for temporarily storing recently executed instructions and a slower main memory in which is stored the system's operating program. Rather than sequentially accessing the cache memory to determine if the next instruction is stored therein and then accessing the main memory if the cache memory does not have the next instruction, system operating speed is increased by simultaneously accessing the cache and main memories. By accessing the main memory during its row address strobe (RAS) precharge time while simultaneously accessing the cache memory, the time necessary for the system's processor unit (PU) to read the next instruction from the main memory when not stored in the cache memory is substantially reduced.
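
The saving can be illustrated with back-of-the-envelope timing arithmetic; all cycle counts below are assumed values, not figures from the patent.

```python
# Assumed cycle counts illustrating why overlapping the DRAM RAS precharge
# with the cache probe shortens the effective miss time.

CACHE_LOOKUP = 2        # assumed cycles to probe the cache
RAS_PRECHARGE = 3       # assumed DRAM row-address-strobe precharge time
DRAM_ACCESS = 8         # assumed cycles from RAS assertion to data

# Sequential scheme: probe the cache first, then start the full DRAM cycle.
miss_time_sequential = CACHE_LOOKUP + RAS_PRECHARGE + DRAM_ACCESS

# Overlapped scheme: start the DRAM precharge while the cache is probed,
# so on a miss only the remaining DRAM access time is exposed.
miss_time_overlapped = max(CACHE_LOOKUP, RAS_PRECHARGE) + DRAM_ACCESS

print(f"sequential miss: {miss_time_sequential} cycles")
print(f"overlapped miss: {miss_time_overlapped} cycles")
```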

Dissertation
01 Jul 1987
TL;DR: This dissertation explores possible solutions to the cache coherence problem and identifies cache coherence protocols--solutions implemented entirely in hardware--as an attractive alternative.
Abstract: Shared-memory multiprocessors offer increased computational power and the programmability of the shared-memory model. However, sharing memory between processors leads to contention which delays memory accesses. Adding a cache memory for each processor reduces the average access time, but it creates the possibility of inconsistency among cached copies. The cache coherence problem is keeping all cached copies of the same memory location identical. This dissertation explores possible solutions to the cache coherence problem and identifies cache coherence protocols--solutions implemented entirely in hardware--as an attractive alternative. Protocols for shared-bus systems are shown to be an interesting special case. Previously proposed shared-bus protocols are described using uniform terminology, and they are shown to divide into two categories: invalidation and distributed write. In invalidation protocols all other cached copies must be invalidated before any copy can be changed; in distributed write protocols all copies must be updated each time a shared block is modified. In each category, a new protocol is presented with better performance than previous schemes, based on simulation results. The simulation model and parameters are described in detail. Previous protocols for general interconnection networks are shown to contain flaws and to be costly to implement. A new class of protocols is presented that offers reduced implementation cost and expandability, while retaining a high level of performance, as illustrated by simulation results using a crossbar switch. All new protocols have been proven correct; one of the proofs is included. Previous definitions of cache coherence are shown to be inadequate and a new definition is presented. Coherence is compared and contrasted with other levels of consistency, which are also identified. The consistency of shared-bus protocols is shown to be naturally stronger than that of non-bus protocols. The first protocol of its kind is presented for a large hierarchical multiprocessor, using a bus-based protocol within each cluster and a general protocol in the network connecting the clusters to the shared main memory.

Proceedings ArticleDOI
01 Jun 1987
TL;DR: In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipe-lined multi-stage network, and it is shown that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.
Abstract: In this paper, cache design is explored for large high-performance multiprocessors with hundreds or thousands of processors and memory modules interconnected by a pipe-lined multi-stage network. The majority of the multiprocessor cache studies in the literature exclusively focus on the issue of cache coherence enforcement. However, there are other characteristics unique to such multiprocessors which create an environment for cache performance that is very different from that of many uniprocessors. Multiprocessor conditions are identified and modeled, including: 1) the cost of a cache coherence enforcement scheme, 2) the effect of a high degree of overlap between cache miss services, 3) the cost of a pin-limited data path between shared memory and caches, 4) the effect of a high degree of data prefetching, 5) the program behavior of a scientific workload as represented by 23 numerical subroutines, and 6) the parallel execution of programs. This model is used to show that the cache miss ratio is not a suitable performance measure in the multiprocessors of interest and to show that the optimal cache block size in such multiprocessors is much smaller than in many uniprocessors.

Patent
02 Oct 1987
TL;DR: In this article, a solid-state cache memory subsystem configured to be used in conjunction with disk drives for prestaging of data in advance of its being called for by a host computer features a controller featuring means for establishing and maintaining precise correspondence between storage locations in the solid state array and on the disk memory.
Abstract: A solid-state cache memory subsystem configured to be used in conjunction with disk drives for prestaging of data in advance of its being called for by a host computer features a controller with means for establishing and maintaining precise correspondence between storage locations in the solid-state array and on the disk memory, used to establish a reoriented position on a disk in the event of error detection and to determine when a predetermined quantity of data has been read from the disk into the cache in a staging operation.

Patent
28 May 1987
TL;DR: In this article, a cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units and a first in first out (FIFO) device which connects to a system bus of a tightly coupled data processing system.
Abstract: A cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units and a first in first out (FIFO) device which connects to a system bus of a tightly coupled data processing system. The cache subsystem includes a number of programmable control circuits which are connected to receive signals representative of the type of operations performable by the cache subsystem. These signals are logically combined for generating an output signal indicating whether or not the contents of the directory memory should be flushed when any one of a number of types of address or system faults has been detected in order to maintain cache coherency.

Patent
22 Dec 1987
TL;DR: In this paper, it is shown that when it becomes necessary for a processor to update its cache with a block of data from main memory, such a block is simultaneously loaded into each appropriate cache.
Abstract: A computer system having a plurality of processors with each processor having associated therewith a cache memory is disclosed. When it becomes necessary for a processor to update its cache with a block of data from main memory, such a block of data is simultaneously loaded into each appropriate cache. Thus, each processor subsequently requiring such updated block of data may retrieve the block from its own cache, and not be required to access main memory.

Patent
Steven C. Steps
16 Jun 1987
TL;DR: In this paper, a cache memory architecture which is two blocks wide and made up of a map RAM, two cache data RAMs (each one word wide), and a selection system was presented.
Abstract: Provided is a cache memory architecture which is two blocks wide and is made up of a map RAM, two cache data RAMs (each one word wide), and a selection system for selecting data from either one or both cache data RAMs, depending on whether the access is between cache and CPU, or between cache and main memory. The data stored in the two cache data RAMs has a particular address configuration. It consists of having data with even addresses of even pages and odd addresses of odd pages stored in one cache data RAM, with odd addresses and even addresses interleaved therein; and odd addresses of even pages and even addresses of odd pages stored in the other cache data RAM, with the odd addresses and even addresses interleaved but inverted relative to the other cache data RAM.
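
The stated address configuration reduces to a simple parity rule for choosing between the two cache data RAMs; the sketch below is an illustrative reading of that rule, with the bit positions assumed.

```python
# Sketch of the even/odd interleaving rule described above: a word goes to one
# of the two cache data RAMs according to the parity of its page number and of
# its word address within the page.

def ram_select(page_number, word_address):
    """Return 0 for the RAM holding even-address/even-page and
    odd-address/odd-page words, 1 for the other RAM."""
    return (page_number & 1) ^ (word_address & 1)

# A CPU access reads a single word from the selected RAM; a cache/main-memory
# transfer reads the same index from both RAMs to move two words at once.
for page in (0, 1):
    for addr in (0, 1, 2, 3):
        print(f"page {page}, word {addr} -> data RAM {ram_select(page, addr)}")
```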

Patent
31 Jul 1987
TL;DR: In this paper, a Hashing Indexer for a Branch Cache is proposed for use in a pipelined digital processor that employs macro-instructions utilizing interpretation by micro-insstructions.
Abstract: A Hashing Indexer For a Branch Cache for use in a pipelined digital processor that employs macro-instructions utilizing interpretation by micro-instructions. Each of the macro-instructions has an associated address and each of the micro-instructions has an associated address. The hashing indexer includes a look-ahead-fetch system including a branch cache memory coupled to the prefetch section. An indexed table of branch target addresses, each of which corresponds to the address of a previously fetched instruction, is stored in the branch cache memory. A predetermined number of bits representing the address of the macro-instruction being fetched is hashed with a predetermined number of bits representing the address of the micro-instruction being invoked. The indexer is used to apply the hashing result as an address to the branch memory in order to read out a unique predicted branch target address that is predictive of a branch for the hashed macro-instruction bits and micro-instruction bits. The hashing indexer disperses branch cache entries throughout the branch cache memory. Therefore, by hashing macro-instruction bits with micro-instruction bits and by dispersing the branch cache entries throughout the branch cache memory, the prediction rate of the system is increased.
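
A rough sketch of the hashed index follows; the XOR folding, the shift amount, and the table size are assumptions, since the patent specifies only that macro-instruction address bits are hashed with micro-instruction address bits to index the branch cache.

```python
# Illustrative hashed index for a branch cache: low-order bits of the
# macro-instruction address are combined with bits of the micro-instruction
# address, and the result indexes the table of predicted branch targets.

BRANCH_CACHE_ENTRIES = 1024            # assumed size, a power of two
INDEX_MASK = BRANCH_CACHE_ENTRIES - 1

def branch_cache_index(macro_addr, micro_addr):
    """Hash macro- and micro-instruction address bits into a table index."""
    return (macro_addr ^ (micro_addr << 2)) & INDEX_MASK

branch_cache = {}    # index -> predicted branch target address

def predict(macro_addr, micro_addr):
    return branch_cache.get(branch_cache_index(macro_addr, micro_addr))

def update(macro_addr, micro_addr, target):
    branch_cache[branch_cache_index(macro_addr, micro_addr)] = target
```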

Patent
29 Jan 1987
TL;DR: In this paper, a cache memory unit in which, in response to the application of a write command, the write operation is performed in two system clock cycles is described, where the data signal group is stored in a temporary storage unit while a determination is made if the address associated with the data signals group is present in the cache memory units.
Abstract: A cache memory unit in which, in response to the application of a write command, the write operation is performed in two system clock cycles. During the first clock cycle, the data signal group is stored in a temporary storage unit while a determination is made if the address associated with the data signal group is present in the cache memory unit. When the address is present, the data signal group is stored in the cache memory unit during the next application of a write command to the cache memory unit. If a read command is applied to the cache memory unit involving the data signal group stored in the temporary storage unit, then this data signal group is transferred to the central processing unit in response to the read command. Instead of performing the storage into the cache memory unit as a result of the next write command, the storage can occur during any free cycle.
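
The two-cycle write and the read-forwarding path can be sketched as follows; the single-entry temporary store and the dictionary-backed cache array are assumptions made to keep the illustration short.

```python
# Sketch of the two-cycle write described above: the data is parked in a
# one-entry temporary store during the first cycle (while the address check
# runs) and committed to the cache array on the next write command or any
# free cycle; a read that hits the temporary store is forwarded directly.

class TwoCycleWriteCache:
    def __init__(self):
        self.array = {}            # address -> data held in the cache array
        self.pending = None        # (address, data) awaiting commit

    def write(self, address, data):
        self._commit_pending()     # the previous write completes this cycle
        # Cycle 1 of the new write: stash the data while the address is checked.
        self.pending = (address, data)

    def read(self, address):
        if self.pending and self.pending[0] == address:
            return self.pending[1]           # forward from temporary storage
        return self.array.get(address)       # miss handling omitted

    def free_cycle(self):
        """The commit may also happen during any free cache cycle."""
        self._commit_pending()

    def _commit_pending(self):
        if self.pending:
            addr, data = self.pending
            self.array[addr] = data
            self.pending = None
```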

Journal ArticleDOI
01 Oct 1987
TL;DR: While the miss ratio is affected by object program size, it appears that this can be corrected by simply increasing the size of the cache; measurements of bus traffic, however, show that even with large caches, machines with simple instruction sets can expect substantially more main memory reads than machines with dense object programs.
Abstract: One potential disadvantage of a machine with a reduced instruction set is that object programs may be substantially larger than those for a machine with a richer, more complex instruction set. The main reason is that a small instruction set will require more instructions to implement the same function. In addition, the tendency of RISC machines to use fixed-length instructions with a few instruction formats also increases object program size. It has been conjectured that the resulting larger programs could adversely affect memory performance and bus traffic. In this paper we report the results of a set of experiments to isolate and determine the effect of instruction set complexity on cache memory performance and bus traffic. Three high-level language compilers were constructed for machines with instruction sets of varying degrees of complexity. Using a set of benchmark programs, we evaluated the effect instruction set complexity had on program size. Five of the programs were used to perform a set of trace-driven simulations to study each machine's cache and bus performance. While we found that the miss ratio is affected by object program size, it appears that this can be corrected by simply increasing the size of the cache. Our measurements of bus traffic, however, show that even with large caches, machines with simple instruction sets can expect substantially more main memory reads than machines with dense object programs.

Journal ArticleDOI
TL;DR: The role of cache memories and the factors that decide the success of a particular design are examined, and the operation of a cache memory is described and the specification of cache parameters is considered.
Abstract: The role of cache memories and the factors that decide the success of a particular design are examined. The operation of a cache memory is described. The specification of cache parameters is considered. Also discussed are the size of a cache, cache hierarchies, fetching and replacing, cache organization, updating the main memory, the use of two caches rather than one, virtual-address caches, and cache consistency.

Patent
21 Sep 1987
TL;DR: In this paper, a data processing system having a bus master, a cache, and a memory which is capable of transferring operands in bursts in response to a burst request signal provided by the bus master is described.
Abstract: A data processing system having a bus master, a cache, and a memory which is capable of transferring operands in bursts in response to a burst request signal provided by the bus master. The bus master will provide the burst request signal to the memory in order to fill a line in the cache only if there are no valid entries in that cache line. If a requested operand spans two cache lines, the bus master will defer the burst request signal until the end of the transfer of that operand, so that only the second cache line will be burst filled.

Proceedings Article
01 Jan 1987
TL;DR: This work proposes a new architecture for shared memory multiprocessors, the crosspoint cache architecture, which consists of a crossbar interconnection network with a cache memory at each crosspoint switch and considers a two-level cache architecture in which caches on the processor chips are used in addition to the caches in the crosspoints.
Abstract: We propose a new architecture for shared memory multiprocessors, the crosspoint cache architecture. This architecture consists of a crossbar interconnection network with a cache memory at each crosspoint switch. It assures cache coherence in hardware while avoiding the performance bottlenecks associated with previous hardware cache coherence solutions. We show this architecture is feasible for a 64 processor system. We also consider a two-level cache architecture in which caches on the processor chips are used in addition to the caches in the crosspoints. This two-level cache organization achieves the goals of fast memory access and low bus traffic in a cost-effective way.


Proceedings ArticleDOI
01 Jan 1987
TL;DR: In this paper, a reduced instruction set (RISC) computer with 172k transistors in 1.5μm technology is described. The chip contains caches for prefetch buffer, decoded instructions and stack.
Abstract: A Reduced Instruction Set Computer containing 172K transistors in 1.5μm technology will be described. The chip contains caches for prefetch buffer, decoded instructions and stack. Two internal machines with three pipelined stages are used.

Patent
13 Nov 1987
TL;DR: In this article, when accesses are sequential the retrieval is performed for consecutive addresses at once and the result is stored, so that a hit can be determined without referencing the cache memory; on a miss the external memory is accessed directly, shortening the average access time.
Abstract: When access proceeds sequentially, such as when prefetching an instruction or restoring a register from the stack region, the retrieval is performed for the consecutive addresses at once and the result is stored. When the consecutive addresses are subsequently accessed, a hit is determined from the stored result without referencing the cache memory. In the case of a miss, the external memory is accessed directly, shortening the overhead time required for the cache memory reference. The average access time is therefore reduced.



Patent
05 Oct 1987
TL;DR: In this paper, a data processing system has a bus meter, a memory capable of transferring operands requested by the bus master, and a cache for temporarily storing a selected number of the most recently transferred operands.
Abstract: A data processing system has a bus meter, a memory capable of transferring operands requested by the bus master, and a cache for temporarily storing a selected number of the most recently transferred operands. If the memory provides an operand or a portion thereof which is insufficient in size or alignment to fill a complete entry in a line in the cache, the bus master automatically transfers additional operands adjacent in the memory to the requested operand sufficient to fill that entry.

Patent
22 Dec 1987
TL;DR: In this paper, the cache memory is divided into two levels (2, 4); the write operations issued by the central processing unit of each processor are performed in both cache-memory levels by an immediate-write procedure, so that the second level (4) keeps an updated copy of the information contained in the first level (2), and the corresponding information in the central memory is updated from the second cache-memory level (4) by a delayed-write procedure.
Abstract: The system consists of multiprocessors P1 to Pn connected to a central memory (9) by means of a single bus (8). Each processor comprises a central processing unit (1) and a cache memory (2, 4). The method consists of dividing the cache memory into two levels (2, 4), performing the write operations issued by the central processing unit of each processor in both cache-memory levels through an immediate-write procedure so as to keep, in the second level (4), an updated copy of the information contained in the first level (2), and updating the corresponding information in the central memory (9) from the second cache-memory level (4) through a delayed-write procedure. Application: information-processing systems.
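
A simplified sketch of the two-level write policy (immediate write-through from the first to the second level, delayed write-back from the second level to the central memory) follows; the dictionary-backed structures and method names are assumptions, not the patent's implementation.

```python
# Simplified sketch of the two-level write policy described above: processor
# writes update both cache levels (write-through), so the second level always
# holds an up-to-date copy of the first; the central memory is updated lazily
# from the second level (write-back).

class TwoLevelCache:
    def __init__(self, memory):
        self.level1 = {}           # address -> data
        self.level2 = {}           # address -> (data, dirty flag)
        self.memory = memory       # shared central memory, address -> data

    def write(self, address, data):
        # Immediate (write-through) update of both cache levels.
        self.level1[address] = data
        self.level2[address] = (data, True)        # dirty until written back

    def read(self, address):
        if address in self.level1:
            return self.level1[address]
        if address in self.level2:
            data, _ = self.level2[address]
        else:
            data = self.memory[address]
            self.level2[address] = (data, False)
        self.level1[address] = data
        return data

    def write_back(self, address):
        """Delayed update of the central memory from the second level."""
        if address in self.level2:
            data, dirty = self.level2[address]
            if dirty:
                self.memory[address] = data
                self.level2[address] = (data, False)

# Usage example under the same assumptions:
mem = {0x100: "old"}
cache = TwoLevelCache(mem)
cache.write(0x100, "new")
assert cache.read(0x100) == "new" and mem[0x100] == "old"
cache.write_back(0x100)
assert mem[0x100] == "new"
```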