Patent
Integrated processor/memory device with victim data cache
TL;DR: In this paper, an integrated processor/memory device consisting of a main memory, a CPU, a victim cache, and a primary cache is presented, where each of the primary cache banks stores one or more cache lines of words and each cache line has a corresponding memory location in the corresponding main memory bank.
Abstract:
An integrated processor/memory device comprising a main memory, a CPU, a victim cache, and a primary cache. The main memory comprises main memory banks. The victim cache stores victim cache sub-lines of words, each of which has a corresponding memory location in the main memory. When the CPU issues an address in the address space of the main memory, the victim cache determines whether a victim cache hit or miss has occurred. When a victim cache miss occurs, the victim cache replaces a selected one of its victim cache sub-lines with a new victim cache sub-line. The primary cache comprises primary cache banks. Each of the primary cache banks stores one or more cache lines of words, and each cache line has a corresponding memory location in the corresponding main memory bank. When the CPU issues an address in the portion of the address space of the corresponding main memory bank, the corresponding primary cache bank determines whether a cache hit or a cache miss has occurred. When a cache miss occurs, the primary cache bank replaces a victim cache line of the cache lines in the primary cache bank with a new cache line from the memory location in the corresponding main memory bank specified by the issued address, and routes a sub-line of the victim cache line to the victim cache as the new victim cache sub-line.
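The hit/miss and replacement flow described in the abstract can be pictured with a toy model. The class below is a hypothetical sketch only: the direct-mapped primary cache, FIFO victim replacement, and swap-back-on-victim-hit policy are illustrative assumptions, not the patent's claimed organization.

```python
from collections import OrderedDict

class VictimCachedMemory:
    """Direct-mapped primary cache backed by a small fully-associative victim cache."""

    def __init__(self, primary_sets=4, victim_entries=2):
        self.primary = {}                  # set index -> cached block address
        self.primary_sets = primary_sets
        self.victim = OrderedDict()        # block address -> None (FIFO order)
        self.victim_entries = victim_entries

    def access(self, block):
        """Classify an access as a primary hit, a victim hit, or a miss."""
        index = block % self.primary_sets
        if self.primary.get(index) == block:
            return "primary hit"
        if block in self.victim:
            # Victim hit: swap the line back into the primary cache.
            del self.victim[block]
            self._evict_to_victim(index)
            self.primary[index] = block
            return "victim hit"
        # Miss in both: fetch from main memory; the displaced line
        # becomes the new victim cache entry.
        self._evict_to_victim(index)
        self.primary[index] = block
        return "miss"

    def _evict_to_victim(self, index):
        old = self.primary.get(index)
        if old is not None:
            if len(self.victim) >= self.victim_entries:
                self.victim.popitem(last=False)  # drop the oldest victim line
            self.victim[old] = None
```

For example, two blocks that map to the same primary set (e.g. blocks 0 and 4 with four sets) ping-pong through the victim cache instead of going back to main memory on every conflict.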
Citations
Patent
Processing architecture having a compare capability
TL;DR: In this paper, a register file, comparison logic, decode logic, and a store path for a compare instruction are disclosed. Decoding the register file, however, is not a straightforward task, as it is computationally expensive.
Patent
System and method for maintaining memory coherency in a computer system having multiple system buses
TL;DR: In this article, a cache-coherent, multiple-bus, multiprocessing system and method interconnects multiple system buses (1, 2) and an I/O bus (3) to a shared main memory (132) while minimizing the impact to latency and total bandwidth within the system.
Patent
Horizontally-shared cache victims in multiple core processors
TL;DR: Cache priority rules can be based on cache coherency data, load balancing schemes, and architectural characteristics of the processor as discussed by the authors, and the processor evaluates cache priority rules to determine whether victim lines are discarded, written back to system memory, or stored in other processor core units' caches.
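The rule evaluation described here can be sketched as a toy decision function. The specific rules and their ordering below are invented for illustration; the patent's actual priority rules depend on coherency state, load-balancing schemes, and processor architecture.

```python
def place_victim(dirty: bool, shared_elsewhere: bool, peer_lightly_loaded: bool) -> str:
    """Decide what happens to a line evicted from one core's cache.

    Rule order is illustrative: coherency state is consulted first,
    then load balancing across peer core units.
    """
    if shared_elsewhere:
        # Another core already holds a (clean) copy, so nothing is lost.
        return "discard"
    if peer_lightly_loaded:
        # Horizontal sharing: park the victim in a peer core's cache.
        return "store in peer cache"
    if dirty:
        # Modified data must survive eviction.
        return "write back"
    return "discard"
```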
Patent
VLIW computer processing architecture with on-chip dynamic RAM
TL;DR: In this paper, a novel processor chip (10) having a processing core (12), at least one bank of memory (14), an I/O link (26), and a memory controller (20) is presented. The memory controller (20) is configured to receive memory requests from the processing core (12) and the distributed shared memory controller (22), and to determine whether the memory requests are directed to memory (14) on the chip (10) or to external memory through the external memory interface (24).
Patent
TLB tag parity checking without CAM read
Mark A. Luttrell, Paul J. Jordan, et al.
TL;DR: In this article, an apparatus and method for expediting parity-checked TLB access operations are described in connection with a multithreaded multiprocessor chip. The design eliminates the need to read a CAM entry from the TLB during access by storing the tag parity value in a RAM portion of the TLB, using the CAM key input to generate a tag parity check value for a matched entry, and comparing the generated tag parity check value to the stored tag parity value to determine whether there is a parity match or an error.
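The parity check described above can be sketched in a few lines; the bit widths and the even-parity encoding are illustrative assumptions, not taken from the patent.

```python
def tag_parity(tag: int) -> int:
    """Even-parity bit over the tag: 1 if an odd number of bits are set."""
    return bin(tag).count("1") & 1

# On TLB fill: store the tag's parity bit in the RAM portion, next to
# the translation data.
stored_parity = tag_parity(0b1011_0010)

# On lookup: the CAM key (the tag being matched) is already present at the
# CAM input, so its parity can be recomputed and compared against the stored
# bit without reading the matched CAM entry back out.
lookup_key = 0b1011_0010
parity_ok = tag_parity(lookup_key) == stored_parity
```

The design point is that the comparison needs only the lookup key and one stored bit, removing the CAM read from the critical path.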
References
Journal ArticleDOI
Hitting the memory wall: implications of the obvious
William A. Wulf, Sally A. McKee, et al.
TL;DR: It is argued that processor speeds are improving far faster than memory access times, so system performance will increasingly be dominated by memory latency; unless this growing processor-memory gap is addressed, programs will hit a "memory wall" beyond which faster processors yield little benefit.
Proceedings ArticleDOI
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers
TL;DR: In this article, hardware techniques for improving cache performance are presented: a small fully-associative cache placed between a direct-mapped cache and its refill path, and prefetch buffers that hold prefetched data outside the cache until it is actually referenced.
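A prefetch (stream) buffer of the kind referenced here can be sketched as a small FIFO; the depth and the purely sequential prefetch policy below are illustrative assumptions.

```python
from collections import deque

class StreamBuffer:
    """FIFO of sequentially prefetched block addresses, kept outside the cache."""

    def __init__(self, depth=4):
        self.depth = depth
        self.buffer = deque()

    def on_miss(self, block):
        """Cache miss: (re)fill the buffer with the blocks that follow."""
        self.buffer = deque(range(block + 1, block + 1 + self.depth))

    def lookup(self, block):
        """Return True if the head of the buffer satisfies this access."""
        if self.buffer and self.buffer[0] == block:
            self.buffer.popleft()
            last = self.buffer[-1] if self.buffer else block
            self.buffer.append(last + 1)   # keep prefetching sequentially
            return True
        return False
```

Only the head entry is checked, which is what keeps the hardware cheap: a sequential reference stream hits repeatedly, while a non-sequential access simply misses the buffer and refills it.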
Proceedings ArticleDOI
Missing the Memory Wall: The Case for Processor/Memory Integration
TL;DR: It is shown that processor/memory integration can be used to build competitive, scalable, and cost-effective MP systems; results from execution-driven uni- and multiprocessor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.
Proceedings ArticleDOI
EXECUBE-A New Architecture for Scaleable MPPs
TL;DR: The overall architecture of the EXECUBE chip, the new computational model it represents, some comparisons against the current state of the art, how it might be used for real applications, and some extrapolations into future developments are discussed.
Patent
Data processing system and method with small fully-associative cache and prefetch buffers
Norman P. Jouppi, Alan Eustace, et al.
TL;DR: In this article, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.