
Showing papers on "Cache coloring" published in 1986


Patent
04 Dec 1986
TL;DR: In this article, a non-write-through cache memory associated with each of the system's processing elements stores computations generated by that processing element; at a context switch, the stored information is sequentially written to two separate main memory units.
Abstract: Apparatus for maintaining duplicate copies of information stored in fault-tolerant computer main memories is disclosed. A non-write-through cache memory associated with each of the system's processing elements stores computations generated by that processing element. At a context switch, the stored information is sequentially written to two separate main memory units. A separate status area in main memory is updated by the processing element both before and after each writing operation so that a fault occurring during data processing or during any storage operation leaves the system with sufficient information to be able to reconstruct the data without loss of integrity. To efficiently transfer information between the cache memory and the system main memories without consuming a large amount of processing time at context switches, a block status memory associated with the cache memory contains an entry for each data block in the cache memory. The entry indicates whether the corresponding data block has been modified during data processing or written with computational data from the processing element. The storage operations are carried out by high-speed hardware which stores only the modified data blocks. Additional special-purpose hardware simultaneously invalidates all cache memory entries so that a new task can be loaded and started.
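
The flow the abstract describes, flushing only modified blocks to two memory units with a status area bracketing each pass, can be sketched roughly in C. The structure names, sizes, and the direct array-to-array copies below are illustrative assumptions, not the patent's hardware.

```c
#include <string.h>
#include <stdint.h>

#define CACHE_BLOCKS 256
#define BLOCK_BYTES  64

enum { CLEAN = 0, MODIFIED = 1 };

/* Hypothetical stand-ins for the hardware described in the abstract. */
static uint8_t cache[CACHE_BLOCKS][BLOCK_BYTES];
static uint8_t block_status[CACHE_BLOCKS];            /* modified/written flags    */
static uint8_t main_mem_a[CACHE_BLOCKS][BLOCK_BYTES]; /* first main memory unit    */
static uint8_t main_mem_b[CACHE_BLOCKS][BLOCK_BYTES]; /* duplicate memory unit     */
static int     status_area;                           /* progress marker in memory */

/* At a context switch, write only the modified blocks, first to one
 * memory unit and then to the other, bracketing each pass with a
 * status-area update so a fault leaves enough state to recover. */
void flush_on_context_switch(void)
{
    status_area = 1;                                  /* "writing copy A" */
    for (int b = 0; b < CACHE_BLOCKS; b++)
        if (block_status[b] == MODIFIED)
            memcpy(main_mem_a[b], cache[b], BLOCK_BYTES);

    status_area = 2;                                  /* "copy A complete, writing copy B" */
    for (int b = 0; b < CACHE_BLOCKS; b++)
        if (block_status[b] == MODIFIED)
            memcpy(main_mem_b[b], cache[b], BLOCK_BYTES);

    status_area = 0;                                  /* "both copies consistent" */
    memset(block_status, CLEAN, sizeof block_status); /* clear for the new task   */
}
```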

139 citations


Patent
06 Jun 1986
TL;DR: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions; the caches store two status flags, a valid bit and a dirty bit, with each block of information.
Abstract: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions. The caches in the system have two status flags, a valid bit and a dirty bit, with each block of information stored. The operating system executes selected cache control instructions to ensure memory integrity whenever there is a possibility that integrity could be compromised.
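
A minimal sketch of how a block's valid and dirty bits might interact with an explicit cache control operation, assuming a hypothetical block layout and a caller-supplied write-back routine; the patent's actual instruction set is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;   /* block holds usable data            */
    bool     dirty;   /* block modified since it was loaded */
    uint32_t tag;
    uint8_t  data[32];
} cache_block_t;

/* Explicit control operation: write a dirty block back, then drop it.
 * The operating system would issue operations of this kind whenever
 * memory integrity could otherwise be compromised. */
void flush_and_invalidate(cache_block_t *blk,
                          void (*write_back)(const cache_block_t *))
{
    if (blk->valid && blk->dirty)
        write_back(blk);      /* push modified data to main memory */
    blk->valid = false;       /* block may no longer be used       */
    blk->dirty = false;
}
```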

134 citations


Patent
James Gerald Brenza
01 May 1986
TL;DR: In this paper, a common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches; the directory is addressed by the CPU's requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in.
Abstract: A data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains and is addressed by the CPU requesting logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during a L1 LRU castout, but is not the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.
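
A rough sketch of a common-directory entry with a line presence field (LPF), under assumed field widths and a simplified hit test; the real directory format in the patent is more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of the common directory shared by the private L1 and L2
 * caches.  The logical-address tag may hold either a real/absolute or
 * a virtual address, depending on the CPU's addressing mode. */
typedef struct {
    bool     valid;
    uint32_t logical_tag;   /* derived from the address that missed earlier  */
    uint8_t  lpf;           /* line presence field: which L1 slots hold
                               pieces of this (longer) L2 line               */
} dir_entry_t;

/* A request "hits" if its logical tag matches; the LPF then says
 * whether the data can be served from L1 or must come from L2. */
static inline bool directory_hit(const dir_entry_t *e, uint32_t logical_tag)
{
    return e->valid && e->logical_tag == logical_tag;
}

static inline bool present_in_l1(const dir_entry_t *e, unsigned l1_slot)
{
    return (e->lpf >> l1_slot) & 1u;
}
```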

134 citations


Patent
17 Jun 1986
TL;DR: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel is discussed in this paper.
Abstract: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel. Current accesses to the cache are handled by current-access-completion circuitry which determines whether the current access is capable of immediate completion and either completes the access immediately if so capable or transfers the access to pending-access-completion circuitry if not so capable. The latter circuitry works on completion of pending accesses; it determines and stores for each pending access status information prescribing the steps required to complete the access and redetermines that status information as conditions change. In working on completion of current and pending accesses, the addresses of the accesses are compared to those of memory accesses in progress on the system.

120 citations


Proceedings ArticleDOI
01 May 1986
TL;DR: An analytical model for a cache-reload transient is developed and it is shown that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache.
Abstract: This paper develops an analytical model for a cache-reload transient. When an interrupt program or system program runs periodically in a cache-based computer, a short cache-reload transient occurs each time the interrupt program is invoked. That transient depends on the size of the cache, the fraction of the cache used by the interrupt program, and the fraction of the cache used by background programs that run between interrupts. We call the portion of a cache used by a program its footprint in the cache, and we show that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache. We believe that the model may be useful as well for predicting paging behavior in virtual-memory systems with round-robin scheduling.
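
As a loose illustration of the normal-tail characterization (not the paper's actual derivation), one can compute a tail area from placeholder functions of the two footprints:

```c
#include <math.h>
#include <stdio.h>

/* Upper-tail area of a standard normal distribution. */
static double normal_tail(double z)
{
    return 0.5 * erfc(z / sqrt(2.0));
}

/* Illustrative only: treat the fraction of the interrupt program's
 * footprint that must be reloaded as a normal-tail area whose mean
 * depends on how much of the cache the two programs' footprints
 * jointly demand.  The expressions for mu and sigma are placeholders,
 * not the formulas derived in the paper. */
double reload_fraction(double cache_lines,
                       double footprint_interrupt,
                       double footprint_background)
{
    double mu    = (footprint_interrupt + footprint_background) / cache_lines;
    double sigma = 0.25;                     /* arbitrary spread for the sketch */
    return normal_tail((1.0 - mu) / sigma);  /* tail area past full occupancy   */
}

int main(void)
{
    /* Example: 4K-line cache, interrupt uses 512 lines, background 3K. */
    printf("estimated reload fraction: %.3f\n",
           reload_fraction(4096, 512, 3072));
    return 0;
}
```

With the example inputs this prints an estimated reload fraction of about 0.31; the paper derives the actual mean from the programs' footprints rather than the placeholders used here.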

112 citations


Patent
12 Nov 1986
TL;DR: In this paper, the authors propose a system for maintaining data consistency among distributed processors, each having its associated cache memory, where a processor addresses data in its cache by specifying the virtual address.
Abstract: A system for maintaining data consistency among distributed processors, each having its associated cache memory. A processor addresses data in its cache by specifying the virtual address. The cache will search its cells for the data associatively. Each cell has a virtual address, a real address, flags and a plurality of associated data words. If there is no hit on the virtual address supplied by the processor, a map processor supplies the equivalent real address which the cache uses to access the data from another cache if one has it, or else from real memory. When a processor writes into a data word in the cache, the cache will update all other caches that share the data before allowing the write to the local cache.
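
A C-style sketch of the lookup and write paths described above. The cell layout and the map_virtual_to_real, fetch_from_other_cache, fetch_from_real_memory, and update_sharers hooks are hypothetical stand-ins for the hardware, and the replacement policy is elided.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    bool     valid;
    uint32_t vaddr;        /* virtual address tag   */
    uint32_t raddr;        /* real address tag      */
    uint32_t flags;
    uint32_t words[4];     /* associated data words */
} cell_t;

/* Hypothetical helpers standing in for the map processor, the other
 * caches, and real memory. */
extern uint32_t map_virtual_to_real(uint32_t vaddr);
extern bool     fetch_from_other_cache(uint32_t raddr, cell_t *out);
extern void     fetch_from_real_memory(uint32_t raddr, cell_t *out);
extern void     update_sharers(uint32_t raddr, unsigned word, uint32_t value);

cell_t *lookup(cell_t *cells, size_t n, uint32_t vaddr)
{
    /* Associative search on the virtual address supplied by the processor. */
    for (size_t i = 0; i < n; i++)
        if (cells[i].valid && cells[i].vaddr == vaddr)
            return &cells[i];

    /* Miss: ask the map processor for the real address, then try the
     * other caches before falling back to real memory. */
    uint32_t raddr = map_virtual_to_real(vaddr);
    cell_t *victim = &cells[0];                 /* replacement policy elided */
    if (!fetch_from_other_cache(raddr, victim))
        fetch_from_real_memory(raddr, victim);
    victim->valid = true;
    victim->vaddr = vaddr;
    victim->raddr = raddr;
    return victim;
}

void cache_write(cell_t *cell, unsigned word, uint32_t value)
{
    /* Update every cache that shares the data before writing locally. */
    update_sharers(cell->raddr, word, value);
    cell->words[word] = value;
}
```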

106 citations


Journal ArticleDOI
01 May 1986
TL;DR: This paper shows how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost, and describes simple solutions to the consistency problems associated with virtually addressed caches.
Abstract: VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory. Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required. In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance providing data contention is not excessive.

99 citations


Patent
25 Jul 1986
TL;DR: In this article, a method and apparatus for marking data that is temporarily cacheable, to facilitate the efficient management of said data, is presented. A bit in the segment and/or page descriptor of the data, called the marked data bit (MDB), is generated by the compiler, included in a request for data from memory by the processor in the form of a memory address, and stored in the cache directory at a location related to the particular line of data involved.
Abstract: A method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data. A bit in the segment and/or page descriptor of the data called the marked data bit (MDB) is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address, and will be stored in the cache directory at a location related to the particular line of data involved. The bit is passed to the cache together with the associated real address after address translation (in the case of a real cache). When the cache controls load the address of the data into the directory, they also store the marked data bit (MDB) in the directory with the address. When the cacheability of the temporarily cacheable data changes from cacheable to non-cacheable, a single instruction is issued to cause the cache to invalidate all marked data. When an "invalidate marked data" instruction is received, the cache controls sweep through the entire cache directory and invalidate, in a single pass, any cache line which has the "marked data bit" set. An extension of the invention involves using a multi-bit field rather than a single bit to provide more versatile control of the temporary cacheability of data.
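
A minimal sketch of the single-pass "invalidate marked data" sweep, assuming a hypothetical directory-line layout:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct {
    bool     valid;
    bool     marked;     /* marked data bit (MDB) copied from the
                            segment/page descriptor at load time */
    uint32_t tag;
} dir_line_t;

/* "Invalidate marked data": one pass over the whole directory that
 * drops every line whose MDB is set, leaving ordinary lines alone. */
void invalidate_marked_data(dir_line_t *dir, size_t lines)
{
    for (size_t i = 0; i < lines; i++)
        if (dir[i].valid && dir[i].marked)
            dir[i].valid = false;
}
```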

89 citations


Patent
27 Mar 1986
TL;DR: In this paper, the authors propose to use a cache memory and a main memory with a transformation unit between the main memory and the cache memory, so that at least a portion of an information unit retrieved from the main memory may be transformed during retrieval of the information (fetch) and prior to storage in the cache memory.
Abstract: A computer having a cache memory and a main memory is provided with a transformation unit between the main memory and the cache memory so that at least a portion of an information unit retrieved from the main memory may be transformed during retrieval of the information (fetch) from a main memory and prior to storage in the cache memory (cache). In a specific embodiment, an instruction may be predecoded prior to storage in the cache memory. In another embodiment involving a branch instruction, the address of the target of the branch is calculated prior to storing in the instruction cache. The invention has advantages where a particular instruction is repetitively executed since a needed decode operation which has been partially performed previously need not be repeated with each execution of an instruction. Consequently, the latency time of each machine cycle may be reduced, and the overall efficiency of the computing system can be improved. If the architecture defines delayed branch instructions, such branch instructions may be executed in effectively zero machine cycles. This requires a wider bus and an additional register in the processor to allow the fetching of two instructions from the cache memory in the same cycle.

86 citations


Patent
03 Oct 1986
TL;DR: In this paper, a cache memory system with multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system is presented, where the cache memory stores four words per addressable line of cache storage.
Abstract: In a cache memory system, multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system provide accelerated access of data contained within the cache memory within the multiple-word boundaries, and provides for effective prefetch of sequentially ascending locations of stored data from the cache memory. In an illustrated embodiment, the cache memory stores four words per addressable line of cache storage, and accordingly quad-word boundary registers determine boundary limits on quad-words, quad-word line registers store, in parallel, a selected line from the cache memory, and a quad-word boundary detector system determines when to prefetch the next set of quad-words from the cache memory for storage in the quad-word line registers.
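
A small sketch of the boundary-detection and prefetch idea, assuming word addressing, a hypothetical read_cache_line hook, and quad-word (four-word) lines:

```c
#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_LINE 4          /* quad-word lines */

typedef struct {
    uint32_t line_base;           /* address of first word held in the line registers */
    uint32_t words[WORDS_PER_LINE];
    bool     valid;
} line_regs_t;

/* Hypothetical hook: read one full line from the cache data array. */
extern void read_cache_line(uint32_t line_base, uint32_t out[WORDS_PER_LINE]);

/* Serve a word from the quad-word line registers when the request
 * falls inside the current boundary; when the last word of the line
 * is consumed, prefetch the next sequentially ascending line. */
uint32_t fetch_word(line_regs_t *lr, uint32_t word_addr)
{
    uint32_t base   = word_addr & ~(uint32_t)(WORDS_PER_LINE - 1);
    uint32_t offset = word_addr &  (uint32_t)(WORDS_PER_LINE - 1);

    if (!lr->valid || base != lr->line_base) {        /* boundary crossed */
        read_cache_line(base, lr->words);
        lr->line_base = base;
        lr->valid = true;
    }
    uint32_t value = lr->words[offset];

    if (offset == WORDS_PER_LINE - 1) {               /* last word: prefetch next line */
        lr->line_base = base + WORDS_PER_LINE;
        read_cache_line(lr->line_base, lr->words);
    }
    return value;
}
```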

82 citations


Patent
21 Feb 1986
TL;DR: In this article, a cache and memory management system architecture and associated protocol is described, which is comprised of a set associative memory cache subsystem, a set associative translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic.
Abstract: A cache and memory management system architecture and associated protocol is disclosed. The cache and memory management system is comprised of a set associative memory cache subsystem, a set associative translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic. The cache and memory management system includes a system interface for coupling to a system bus to which a main memory is coupled, and is also comprised of a processor/cache bus interface for coupling to an external CPU. As disclosed, the cache memory management system can function either as an instruction cache with instruction prefetch capability and on-chip program counter capabilities, or as a data cache memory management system which has an address register for receiving addresses from the CPU, to initiate a transfer of defined numbers of words of data commencing at the transmitted address. Another novel feature disclosed is the quadword boundary registers, quadword line registers, and quadword boundary detector subsystem, which accelerates access of data within quadword boundaries, and provides for effective prefetch of sequentially ascending locations of stored instructions or data from the cache memory subsystem.

Patent
16 Oct 1986
TL;DR: In this article, a pipelined digital computer processor system is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions.
Abstract: A pipelined digital computer processor system (10, FIG. 1) is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions. The IPU (2) has associated with it a high speed instruction cache (6), and the ALPU (4) has associated with it a high speed operand cache (8). Each cache comprises a data store (84, 94, FIG. 3) for storing frequently accessed data, and a tag store (82, 92, FIG. 3) for indicating which main memory locations are contained in the respective cache. The IPU and ALPU processing units (2, 4) may access their associated caches independently under most conditions. When the ALPU performs a write operation to main memory, it also updates the corresponding data in the operand cache and, if contained therein, in the instruction cache permitting the use of self-modifying code. The IPU does not write to either cache. Provision is made for clearing the caches on certain conditions when their contents become invalid.

Patent
06 Feb 1986
TL;DR: In this article, an address translation unit is included on the same chip as, and logically between, the address generating unit and the tag comparator logic, and interleaved access to more than one cache may be accomplished on the external address, data and tag busses.
Abstract: A cache-based computer architecture has the address generating unit and the tag comparator packaged together and separately from the cache RAMs. If the architecture supports virtual memory, an address translation unit may be included on the same chip as, and logically between, the address generating unit and the tag comparator logic. Further, interleaved access to more than one cache may be accomplished on the external address, data and tag busses.

Patent
James W. Keeley
27 Jun 1986
TL;DR: In this paper, a cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units; each processing unit is allocated one-half of the total available cache memory space by separate accounting replacement apparatus included within the buffer memory stage.
Abstract: A cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units. For completely independent operation, each processing unit is allocated one-half of the total available cache memory space by separate accounting replacement apparatus included within the buffer memory stage. A multiple allocation memory (MAM) is also included in the buffer memory stage. During each directory allocation cycle performed for a processing unit, the allocated space of the other processing unit is checked for the presence of a multiple allocation. The address of the multiple allocated location associated with the processing unit having the lower priority is stored in the MAM allowing for earliest data replacement thereby maintaining data coherency between both independently operated processing units.

Proceedings Article
01 Jan 1986
TL;DR: The goal is for the cache behavior to dominate performance but the large storage facility to dominate costs, thus giving the illusion of a large, fast, inexpensive storage system.
Abstract: …tively slow and inexpensive source of information and a much faster consumer of that information. The cache capacity is relatively small and expensive, but quickly accessible. The goal is for the cache behavior to dominate performance but the large storage facility to dominate costs, thus giving the illusion of a large, fast, inexpensive storage system. Successful operation depends both on a substantial level of locality being exhibited by the consumer, and careful strategies being chosen for cache operation disciplines for replacement of contents, update syn…

Patent
03 Oct 1986
TL;DR: In this article, a microprocessor architecture is described having separate very high speed instruction and data interface circuitry for coupling, via respective separate very high speed instruction and data interface buses, to respective external instruction cache and data cache circuitry.
Abstract: A microprocessor architecture is disclosed having separate very high speed instruction and data interface circuitry for coupling via respective separate very high speed instruction and data interface buses to respective external instruction cache and data cache circuitry. The microprocessor is comprised of an instruction interface, a data interface, and an execution unit. The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor at very high speed. The data interface controls communications with the external data cache and communicates data bidirectionally at very high speed between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective one of the instructions received via the instruction interface from the instruction cache. In one embodiment, the external instruction cache is comprised of a program counter and addressable memory for outputting stored instructions responsive to its program counter and to an instruction cache advance signal output from the instruction interface. Circuitry in the instruction interface selectively outputs an initial instruction address for storage in the instruction cache program counter responsive to a context switch or branch, such that the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache except upon the occurrence of another context switch or branch.

Patent
06 Aug 1986
TL;DR: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other part is designated for operand data, but not exclusively.
Abstract: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array, with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses, with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated-upon data to the main memory unit until subsequent transactions require such return.

Patent
20 Oct 1986
TL;DR: In this article, only the cache memory is permitted to communicate with main memory; because main memory is not accessible other than from the cache, there is no main memory access delay caused by requests from other system modules such as the I/O controller.
Abstract: A computer system in which only the cache memory is permitted to communicate with main memory, and in which the address being used in the cache is also sent at the same time to the main memory. Thus, as soon as it is discovered that the desired main memory address is not presently in the cache, the main memory RAMs can be read to the cache without being delayed by the main memory address set-up time. In addition, since the main memory is not accessible other than from the cache memory, there is also no main memory access delay caused by requests from other system modules such as the I/O controller. Likewise, since the contents of the cache memory are written into a temporary register before being sent to the main memory, a main memory read can be performed before doing a writeback of the cache to the main memory, so that data can be returned to the cache in approximately the same amount of time required for a normal main memory access. The result is a significant reduction in the overhead time normally associated with cache memories.
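
A rough sketch of the miss path described above, assuming a single direct-mapped line, hypothetical main_memory_read/main_memory_write hooks, and a temporary writeback register; the timing and parallelism of the real hardware are of course not modeled in C.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint8_t  data[LINE_BYTES];
} line_t;

typedef struct {
    uint32_t addr;
    uint8_t  data[LINE_BYTES];
    bool     pending;
} writeback_reg_t;                 /* temporary register for the displaced line */

/* Hypothetical hooks standing in for the main memory RAMs. */
extern void main_memory_read(uint32_t addr, uint8_t out[LINE_BYTES]);
extern void main_memory_write(uint32_t addr, const uint8_t in[LINE_BYTES]);

/* On a miss, the main-memory read can start immediately (its address
 * was sent in parallel with the cache lookup); the displaced dirty
 * line is parked in the temporary register and written back afterwards. */
void cache_read_line(line_t *slot, uint32_t addr, writeback_reg_t *wb)
{
    if (slot->valid && slot->tag == addr / LINE_BYTES)
        return;                                     /* hit: nothing to do */

    if (slot->valid && slot->dirty) {               /* park the old line  */
        wb->addr = slot->tag * LINE_BYTES;
        memcpy(wb->data, slot->data, LINE_BYTES);
        wb->pending = true;
    }
    main_memory_read(addr & ~(uint32_t)(LINE_BYTES - 1), slot->data);
    slot->tag   = addr / LINE_BYTES;
    slot->valid = true;
    slot->dirty = false;

    if (wb->pending) {                              /* deferred writeback */
        main_memory_write(wb->addr, wb->data);
        wb->pending = false;
    }
}
```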

Patent
31 Jul 1986
TL;DR: In this paper, each entry in the cache memory has two valid bits, one associated with the user execution space and the other with the supervisor or operating system execution space, and each collection of valid bits can be cleared in unison independently of the other.
Abstract: Each entry in a cache memory located between a processor and an MMU has two valid bits. One valid bit is associated with the user execution space and the other with the supervisor or operating system execution space. Each collection of valid bits can be cleared in unison independently of the other. This allows supervisor entries in the cache to survive context changes without being purged along with the user entries.
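
A minimal sketch of the dual valid bits and their independent clearing, with an assumed entry layout:

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 128

typedef struct {
    bool     valid_user;        /* valid in the user execution space       */
    bool     valid_supervisor;  /* valid in the supervisor execution space */
    uint32_t tag;
} entry_t;

static entry_t cache_entries[ENTRIES];

/* On a context change, clear only the user valid bits in unison;
 * supervisor entries survive and need not be re-fetched. */
void clear_user_entries(void)
{
    for (int i = 0; i < ENTRIES; i++)
        cache_entries[i].valid_user = false;
}

void clear_supervisor_entries(void)
{
    for (int i = 0; i < ENTRIES; i++)
        cache_entries[i].valid_supervisor = false;
}
```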


Patent
01 Dec 1986
TL;DR: In this article, the performance of instruction processors using an instruction cache memory in combination with a sequential transfer main memory has been investigated, where preselected instructions are placed in cache memory to minimize the delay associated with fetching the same sequence from main memory following a subsequent branch to the same instruction string.
Abstract: Methods and apparatus are set forth for optimizing the performance of instruction processors using an instruction cache memory in combination with a sequential transfer main memory. According to the invention, the memory system stores preselected instructions in cache memory. The instructions are those that immediately follow a branch operation. The purpose of storing these instructions is to minimize, and if possible, eliminate the delay associated with fetching the same sequence from main memory following a subsequent branch to the same instruction string. The number of instructions that need to be cached (placed in cache memory) is a function of the access time for the first and subsequent fetches from sequential main memory, the speed of the cache memory, and instruction execution time. The invention is particularly well suited for use in computer systems having RISC architectures with fixed instruction lengths.
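
As a back-of-the-envelope illustration only (the latency figures and the ceiling formula below are assumptions, not the patent's expression), the number of post-branch instructions to cache might be estimated as the number executed while the first slow sequential fetch is outstanding:

```c
#include <stdio.h>

/* Roughly: cached instructions must cover the time until the first
 * (slow) sequential fetch from main memory arrives, after which the
 * faster sequential transfers can keep up. */
static int instructions_to_cache(int first_fetch_cycles,
                                 int cache_access_cycles,
                                 int execute_cycles)
{
    int per_instruction = cache_access_cycles + execute_cycles;
    return (first_fetch_cycles + per_instruction - 1) / per_instruction; /* ceiling */
}

int main(void)
{
    /* Example: 8-cycle first access, 1-cycle cache access, 1-cycle execute. */
    printf("cache %d instructions after each branch target\n",
           instructions_to_cache(8, 1, 1));
    return 0;
}
```

With the example figures, four instructions per branch target would be cached.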

Patent
14 Jul 1986
TL;DR: In this paper, a disable circuit prevents the cache from providing an item when a signal external to the data processor is asserted, so that a user, with the external signal, can cause the data processor to make all of its requests for items of operating information to the memory, where these requests can be detected.
Abstract: A data processor is adapted for operation with a memory containing a plurality of items of operating information for the data processor. In addition a cache stores a selected number of all of the items of the operating information. When the cache provides an item of operating information, the memory is not requested to provide the item so that a user of the data processor cannot detect the request for the item. A disable circuit is provided to prevent the cache from providing the item when a signal external to the data processor is provided. Consequently, a user, with the external signal, can cause the data processor to make all of its requests for items of operating information to the memory where these requests can be detected.

Patent
02 Jan 1986
TL;DR: In this article, a lock warning mechanism signals the paged memory management unit whenever all but a predetermined number of the translators in the translator cache are locked, so the PMMU can warn the processor that the cache is in danger of becoming full of locked translators; the PMMU is also inhibited from locking the last translator in the cache.
Abstract: In a data processing system, a paged memory management unit (PMMU) translates logical addresses provided by a processor to physical addresses in a memory using translators constructed from page descriptors comprising, in part, translation tables stored in the memory. The PMMU maintains a set of recently used translators in a translator cache. In response to a particular lock value contained in a lock field of the page descriptor for a particular page, the PMMU sets a lock indicator in the translator cache associated with the corresponding translator, to preclude replacement of this translator in the translator cache. A lock warning mechanism provides a lock warning signal whenever all but a predetermined number of the translators in the cache are locked. In response, the PMMU can warn the processor that the translator cache is in danger of becoming full of locked translators. Preferably, the PMMU is also inhibited from locking the last translator in the cache.
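
A simplified sketch of the locking and lock-warning logic, with a hypothetical translator layout and threshold parameter; the PMMU's actual bookkeeping is not shown.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool valid;
    bool locked;     /* set from the lock field of the page descriptor */
    /* ... translator contents elided ... */
} translator_t;

/* Raise the lock warning when all but `threshold` translators are
 * locked, and refuse to lock the last unlocked translator. */
bool try_lock_translator(translator_t *tc, size_t n, size_t index,
                         size_t threshold, bool *lock_warning)
{
    size_t locked = 0;
    for (size_t i = 0; i < n; i++)
        if (tc[i].locked)
            locked++;

    if (locked + 1 >= n)               /* would lock the last translator */
        return false;

    tc[index].locked = true;
    *lock_warning = (locked + 1 >= n - threshold);
    return true;
}
```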

Patent
Tetsu Igarashi
15 Dec 1986
TL;DR: In this paper, a cache memory control apparatus is presented, which includes data register blocks which are individually controlled for each byte, cache memory blocks, and a decoder for generating control signals which control the access to those blocks.
Abstract: A cache memory control apparatus according to the present invention includes data register blocks which are individually controlled for each byte, cache memory blocks, and a decoder for generating control signals which control the access to those blocks. In this cache memory control apparatus, when a cache hit is made in a write mode for byte data, the control signal is supplied to the data register blocks and cache memory blocks to individually control the respective blocks, thereby allowing word data corresponding to the write byte data to be synthesized. Thus, the word data can be output to an external device by one operation.
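
A small sketch of the byte-into-word merge that the per-byte control enables, assuming a 32-bit word and little-endian byte lanes:

```c
#include <stdint.h>

/* On a byte-write hit, the per-byte control signals let the new byte
 * be merged with the bytes already held in the data register blocks,
 * so a full word can be handed to the external device in one operation. */
uint32_t merge_byte_into_word(uint32_t cached_word, uint8_t byte, unsigned byte_lane)
{
    uint32_t shift = byte_lane * 8;                /* byte_lane: 0..3 */
    uint32_t mask  = (uint32_t)0xFF << shift;
    return (cached_word & ~mask) | ((uint32_t)byte << shift);
}
```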

Patent
24 Dec 1986
TL;DR: In this article, a memory control apparatus for a data processor using a virtual memory technique includes two cache memories one for storing a portion of the instructions located in the main memory (MMU), the other for storing operand data located in main memory.
Abstract: A memory control apparatus for a data processor using a virtual memory technique includes two cache memories, one for storing a portion of the instructions located in the main memory (MMU), the other for storing a portion of the operand data located in main memory. A separate translation lookaside buffer (TLB) is connected to each cache memory; the TLB connected to the instruction cache translates logical addresses to real addresses for the instructions stored in the MMU, while the TLB connected to the operand-data cache translates logical addresses to real addresses for the operand data stored in the MMU.

Journal ArticleDOI
01 May 1986
TL;DR: A heuristic algorithm for register spilling within basic blocks is introduced; trace optimization techniques can extend the use of the algorithm to global allocation, and it is shown that the use of registers can be more effective in reducing bus traffic than cache memory of the same size.
Abstract: Single-chip computers are becoming increasingly limited by the access constraints to off-chip memory. To achieve high performance, the structure of on-chip memory must be appropriate, and it must be allocated effectively to minimize off-chip communication. We report experiments that demonstrate that on-chip memory can be effective for local variable accesses. For best use of the limited on-chip area, we suggest organizing memory as registers and argue that an effective register spilling scheme is required. We introduce a heuristic algorithm for register spilling within basic blocks and demonstrate that trace optimization techniques can extend the use of the algorithm to global allocation. Through trace simulation, we show that the use of registers can be more effective in reducing the bus traffic than cache memory of the same size.

Journal ArticleDOI
TL;DR: The intent is to credibly quantify the performance implications of cache parameter selection, in a manner which emphasizes implementation tradeoffs, using address reference traces obtained from typical multitasking UNIX workloads to research cache memory performance.
Abstract: In a previous issue, we described a study of translation buffer performance undertaken in conjunction with the design of a memory management unit for a new 32-bit microprocessor [Alex85b]. This work produced generalized results via trace-driven simulations. The address reference traces were obtained from typical multitasking UNIX workloads and have now been used to research cache memory performance. Caches are small, fast memories placed between a processor and the main storage of a computer system in order to reduce the amount of time spent waiting on memory accesses. Blocks of recently referenced information are stored in the cache because future references are likely to be nearby (termed "property of locality") [Denn72]. When memory references are satisfied by the cache, the overhead of accessing main storage is eliminated. This frees the system bus for DMA or multiple processor activity and provides a significant improvement in the cost/performance ratio of the memory hierarchy. Processors may therefore operate at cache speed while maintaining the economic advantages of a slower main storage. There have been many published reports concerning cache memory performance. Some of these studies have been based on measurements, others have relied on analytic modeling, but most of the work has utilized trace-driven simulations. Address traces are typically captured by interpretively executing a program and recording each of its memory references. These traces are then used to drive the simulation model of a particular cache design. Although this approach has produced valuable insights, the absolute accuracy is questionable because the results are usually based on short traces of user programs which exclude operating system code, interrupts, task switches, instruction prefetching effects and input/output related activities [Smit85a]. The traces used during this project are unique in that they represent the UNIX environment and do not suffer from the problems described above. They were obtained via a hardware monitor from a system actually executing commonly found workloads. Once the domain of mainframe engineers, cache memory has recently become the object of much popular attention. Driven by the practical need for cost effective utilization of high performance microprocessors and the desire to harness the power of multiple microprocessor configurations, controversial questions are being asked. And after discussions with several system architects, we felt that a cache study based on our address traces would be worthwhile. We have re-examined the basic design parameters, obtained some new results and provided a commentary on "state-of-the-art" issues. Our intent is to credibly quantify the performance implications of parameter selection in a manner which emphasizes implementation tradeoffs. Topics addressed by this paper include: (1) the effects of varying the block size and degree of associativity for a wide range of cache sizes; (2) coherency techniques and the cold start impact of invalidation; (3) memory update mechanisms and the efficiency achieved by several bus oriented protocols; (4) replacement algorithms; (5) sub-block and sector mapping schemes; (6) instruction caches; and (7) split caches.

Patent
Masatoshi Kofuji
03 Feb 1986
TL;DR: In this article, a cache memory circuit is responsive to a read request to fetch a data block in a block transfer from a main memory to cache memory, and a sequence of data units into which the data block is divided is successively assigned to a plurality of cache write registers.
Abstract: A cache memory circuit is responsive to a read request to fetch a data block in a block transfer from a main memory to a cache memory. A sequence of data units into which the data block is divided is successively assigned to a plurality of cache write registers. The assigned data units are simultaneously moved to one of sub-blocks of the cache memory during each of write-in durations with an idle interval left between two adjacent ones of the write-in durations. Each state of the sub-blocks is monitored in a controller. During the idle interval, a following read request can be processed with reference to the states of the sub-blocks even when it requests the data block being transferred. In addition, a read address for the following read request may be preserved in a saving address register to process another read request.

01 Jun 1986
TL;DR: This study provides new results about the relationship between processor architecture and memory traffic for instruction fetches over a general range of cache sizes, including the observation that relative instruction traffic differences between architectures are about the same with very large caches as with no cache, and that intermediate-sized caches tend to accentuate such relative differences.
Abstract: Previously, analysis of processor architecture has involved the measurement of hardware or interpreters. The use of benchmarks written in high-level languages has added the requirement for the compiler targeted to each architecture studied. Herein, a methodology based on the use of compiler tools has been developed which allows simulation of different processors without the necessity of creating interpreters and compilers for each architecture simulated. The resource commitment per architecture studied is greatly reduced and the study of a spectrum of processor architectures is facilitated. Tools for the use of this methodology were developed from existing compiler and simulation tools. The new tools were validated and the methodology was then applied to study the effects of processor architecture on instruction cache performance. Over 50 architectures from three architectural families (Stack, Register Set and Direct Correspondence) were simulated. Earlier studies have compared and contrasted the effects of various features of processor architecture. Instruction cache performance has also been studied in some depth. This study provides new results about the relationship between processor architecture and memory traffic for instruction fetches for a general range of cache sizes. Among the results is the general observation that relative instruction traffic differences between architectures are about the same with very large caches as with no cache and that intermediate-sized caches tend to accentuate such relative differences.

Patent
26 Nov 1986
TL;DR: In this article, a processor that is optimized for digital signal processing is provided with separate, external program and data memories (102 and 112 respectively) so that the next instruction to be executed can be fetched from the program memory (102) during the same machine cycle in which a data word is retrieved from the data memory (112).
Abstract: A processor (100) that is optimized for digital signal processing is provided with separate, external program and data memories (102 and 112 respectively) so that the next instruction to be executed can be fetched from the program memory (102) during the same machine cycle in which a data word is retrieved from the data memory (112). A portion of the program memory (102) is also used to store data so that two data words can be retrieved during the same machine cycle. A cache memory (230) located on the processor chip stores a small number of previously-executed instructions so that the next instruction can be fetched from the cache memory (230) during machine cycles in which data words are fetched from both the program and data memories (102, 112). If all instructions for a program loop can be stored in the cache memory (230), each machine cycle in the loop can be used to fetch two data words from the external memories (102, 112), thus effectively increasing processor speed to the equivalent of a three-external-memory system.