Patent

Integrated processor/memory device with victim data cache

TL;DR

This patent presents an integrated processor/memory device comprising a main memory, a CPU, a victim cache, and a primary cache, in which each primary cache bank stores one or more cache lines of words and each cache line has a corresponding memory location in its main memory bank.
Abstract
An integrated processor/memory device comprising a main memory, a CPU, a victim cache, and a primary cache. The main memory comprises main memory banks. The victim cache stores victim cache sub-lines of words, each of which has a corresponding memory location in the main memory. When the CPU issues an address in the address space of the main memory, the victim cache determines whether a victim cache hit or miss has occurred. When a victim cache miss occurs, the victim cache replaces a selected one of its victim cache sub-lines with a new victim cache sub-line. The primary cache comprises primary cache banks. Each primary cache bank stores one or more cache lines of words, and each cache line has a corresponding memory location in the corresponding main memory bank. When the CPU issues an address in the portion of the address space of the corresponding main memory bank, the corresponding primary cache bank determines whether a cache hit or a cache miss has occurred. When a cache miss occurs, the primary cache bank replaces a victim cache line of the cache lines in the bank with a new cache line from the memory location specified by the issued address and routes a sub-line of the victim cache line to the victim cache as the new victim cache sub-line.
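To make the flow concrete, here is a minimal software sketch of the lookup and replacement path the abstract describes: a direct-mapped primary cache backed by a tiny fully-associative victim cache with FIFO replacement. It illustrates the general victim-caching technique only; the sizes, the replacement policy, and every identifier are assumptions, not the patented hardware.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Sketch only: 8-line direct-mapped primary cache, 2-entry
 * fully-associative victim cache; addresses are line addresses. */
#define PRIMARY_LINES 8
#define VICTIM_LINES  2

typedef struct { bool valid; uint32_t tag;  } PLine;  /* primary: tag per set */
typedef struct { bool valid; uint32_t addr; } VLine;  /* victim: full address */

static PLine primary[PRIMARY_LINES];
static VLine victim[VICTIM_LINES];
static int victim_next = 0;                 /* FIFO replacement pointer */

/* Demote a line evicted from the primary cache into the victim cache. */
static void victim_insert(uint32_t addr) {
    victim[victim_next] = (VLine){ true, addr };
    victim_next = (victim_next + 1) % VICTIM_LINES;
}

static const char *access_line(uint32_t addr) {
    uint32_t index = addr % PRIMARY_LINES;
    uint32_t tag   = addr / PRIMARY_LINES;

    if (primary[index].valid && primary[index].tag == tag)
        return "primary cache hit";

    for (int i = 0; i < VICTIM_LINES; i++) {
        if (victim[i].valid && victim[i].addr == addr) {
            /* Victim hit: swap the matched line with the displaced primary line. */
            if (primary[index].valid)
                victim[i].addr = primary[index].tag * PRIMARY_LINES + index;
            else
                victim[i].valid = false;
            primary[index] = (PLine){ true, tag };
            return "victim cache hit";
        }
    }

    /* Miss everywhere: fetch from main memory and demote the displaced line. */
    if (primary[index].valid)
        victim_insert(primary[index].tag * PRIMARY_LINES + index);
    primary[index] = (PLine){ true, tag };
    return "miss, fetched from main memory";
}

int main(void) {
    uint32_t trace[] = { 0, 8, 0, 8, 16, 0 };   /* all map to primary set 0 */
    for (unsigned i = 0; i < sizeof trace / sizeof *trace; i++)
        printf("line %2u -> %s\n", (unsigned)trace[i], access_line(trace[i]));
    return 0;
}

Conflicting lines that map to the same primary set (0, 8, and 16 above) ping-pong through the victim cache instead of going all the way to main memory, which is the latency the patent is attacking.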

Citations
Patent

Processing architecture having a compare capability

TL;DR: In this patent, a register file, comparison logic, decode logic, and a store path supporting a compare instruction are disclosed. Decoding the register file, however, is not straightforward, as it is computationally expensive.
Patent

System and method for maintaining memory coherency in a computer system having multiple system buses

TL;DR: In this patent, a cache-coherent, multiple-bus multiprocessing system and method interconnect multiple system buses (1, 2) and an I/O bus (3) to a shared main memory (132) while minimizing the impact on latency and total bandwidth within the system.
Patent

Horizontally-shared cache victims in multiple core processors

TL;DR: The processor evaluates cache priority rules to determine whether victim lines are discarded, written back to system memory, or stored in the caches of other processor core units; the rules can be based on cache coherency data, load-balancing schemes, and architectural characteristics of the processor.
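The TL;DR above describes a three-way disposition decision for each evicted line. A minimal sketch of the shape of that decision, with hypothetical inputs and thresholds (the patent's actual rules are not reproduced here):

#include <stdio.h>

/* Sketch only: pick a disposition for a victim line from coherency
 * state and a load-balancing hint; names and thresholds are invented. */
typedef enum { DISCARD, WRITE_BACK, SHARE_WITH_PEER } Disposition;

typedef struct {
    int dirty;            /* modified since fill (coherency data)         */
    int shared;           /* already present in another core's cache      */
    int peer_cache_load;  /* 0-100: occupancy of the candidate peer cache */
} VictimInfo;

static Disposition decide(VictimInfo v) {
    if (v.dirty)
        return WRITE_BACK;        /* modified data must reach system memory */
    if (!v.shared && v.peer_cache_load < 50)
        return SHARE_WITH_PEER;   /* a lightly loaded peer keeps the line   */
    return DISCARD;               /* clean and already replicated: drop it  */
}

int main(void) {
    VictimInfo v = { .dirty = 0, .shared = 0, .peer_cache_load = 30 };
    static const char *names[] = { "discard", "write back", "share with peer" };
    printf("disposition: %s\n", names[decide(v)]);
    return 0;
}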
Patent

VLIW computer processing architecture with on-chip dynamic RAM

TL;DR: In this patent, a novel processor chip (10) is presented, having a processing core (12), at least one bank of memory (14), an I/O link (26), and a memory controller (20). The memory controller (20) is configured to receive memory requests from the processing core (12) and a distributed shared memory controller (22) and to determine whether each request is directed to the on-chip memory (14) or to external memory through an external memory interface (24).
Patent

TLB tag parity checking without CAM read

TL;DR: In this patent, an apparatus and method for expediting parity-checked TLB access operations are described in connection with a multithreaded multiprocessor chip. The need to read a CAM entry from the TLB during access is eliminated by storing the tag parity value in a RAM portion of the TLB, using the CAM key input to generate a tag parity check value for a matched entry, and comparing the generated check value to the stored tag parity value to determine whether there is a parity match or an error.
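A minimal sketch of that parity trick, assuming even parity and the GCC/Clang __builtin_parity intrinsic; everything here is illustrative, not the patented circuit:

#include <stdint.h>
#include <stdio.h>

/* Sketch only: the parity of a TLB tag is stored in the RAM portion at
 * fill time; on a CAM match, parity is recomputed from the lookup key
 * (which equals the matched tag) instead of reading the CAM entry back. */
static int tag_parity(uint32_t tag) {
    return __builtin_parity(tag);   /* GCC/Clang: 1 if an odd number of set bits */
}

int main(void) {
    uint32_t tag = 0x0003ABCDu;              /* hypothetical virtual-page tag   */
    int stored_parity = tag_parity(tag);     /* written to TLB RAM at fill time */

    uint32_t cam_key = tag;                  /* the key that produced the hit   */
    int check = tag_parity(cam_key);         /* no CAM read needed              */

    printf(check == stored_parity ? "parity ok\n" : "parity error\n");
    return 0;
}

A single-bit error in a CAM entry makes it match a key whose parity disagrees with the stored parity bit, so the error is caught without ever reading the CAM array.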
References
Journal ArticleDOI

Hitting the memory wall: implications of the obvious

TL;DR: This paper argues that, because microprocessor speeds improve much faster than DRAM access times, average memory access time will eventually dominate execution time no matter how high the cache hit rate, so system performance hits a "memory wall" unless the processor-memory organization changes fundamentally.
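The projection rests on the standard average-access-time model; the numbers in this worked instance are illustrative, not taken from the paper:

    t_{\mathrm{avg}} = p \cdot t_c + (1 - p) \cdot t_m

where p is the cache hit rate, t_c the cache access time, and t_m the main-memory access time. With p = 0.99, t_c = 1 cycle, and t_m = 100 cycles, t_avg = 0.99 + 1.00 = 1.99 cycles. If processor cycles keep shrinking until t_m is effectively 1000 cycles, the same 99% hit rate gives t_avg = 0.99 + 10.00 = 10.99 cycles: the miss term grows without bound, which is the wall.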
Proceedings ArticleDOI

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

TL;DR: In this paper, hardware techniques for improving direct-mapped cache performance are presented: a small fully-associative cache is placed between the cache and its refill path to hold victim lines, and prefetched data is placed in dedicated buffers rather than directly in the cache.
Proceedings ArticleDOI

Missing the Memory Wall: The Case for Processor/Memory Integration

TL;DR: It is shown that processor/memory integration can be used to build competitive, scalable, and cost-effective MP systems; results from execution-driven uni- and multiprocessor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.
Proceedings ArticleDOI

EXECUBE - A New Architecture for Scaleable MPPs

TL;DR: The overall architecture of the EXECUBE chip, the new computational model it represents, some comparisons against the current state of the art, how it might be used for real applications, and some extrapolations into future developments are discussed.
Patent

Data processing system and method with small fully-associative cache and prefetch buffers

TL;DR: In this patent, the authors propose an extension of the basic stream buffer, called the multi-way stream buffer (62), which is useful for prefetching along multiple intertwined data reference streams.
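As a closing illustration, a minimal sketch of the multi-way idea, assuming two buffers with one-line lookahead and round-robin reallocation; identifiers are illustrative and this is not the patented design:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Sketch only: each stream buffer tracks the next line it expects; a hit
 * advances the stream, a miss in every buffer reallocates one. Real stream
 * buffers hold several prefetched lines; this models only the head. */
#define WAYS 2

typedef struct { bool valid; uint32_t next; } Stream;
static Stream streams[WAYS];
static int reuse = 0;                 /* round-robin reallocation pointer */

static bool stream_access(uint32_t line) {
    for (int w = 0; w < WAYS; w++) {
        if (streams[w].valid && streams[w].next == line) {
            streams[w].next = line + 1;   /* consume head, prefetch the next line */
            return true;
        }
    }
    streams[reuse] = (Stream){ true, line + 1 };  /* fresh miss: start a new stream */
    reuse = (reuse + 1) % WAYS;
    return false;
}

int main(void) {
    /* Two interleaved sequential streams: 100.. and 200.. */
    uint32_t trace[] = { 100, 200, 101, 201, 102, 202 };
    for (unsigned i = 0; i < sizeof trace / sizeof *trace; i++)
        printf("line %u -> %s\n", (unsigned)trace[i],
               stream_access(trace[i]) ? "stream-buffer hit" : "miss");
    return 0;
}

With a single buffer, the two interleaved streams would evict each other on every access; with two ways, each stream keeps its own prefetch sequence alive, which is the point of the extension.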