Topic

Cache pollution

About: Cache pollution is a research topic. Over its lifetime, 11353 publications have appeared within this topic, receiving 262139 citations.


Papers
Proceedings Article
28 Oct 1996
TL;DR: It is concluded that processor bandwidth can be a first-order bottleneck to achieving good performance when studying commercial benchmarks, and operating system code and data structures contribute disproportionately to the memory access load.
Abstract: We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running benchmark and commercial applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture’s PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude from our studies that processor bandwidth can be a first-order bottleneck to achieving good performance. This is particularly apparent when studying commercial benchmarks. Operating system code and data structures contribute disproportionately to the memory access load. We also found that operating system software lock contention was a factor preventing the database benchmark from scaling up on the small multiprocessor, and that the cache coherence protocol employed by the machine introduced more cache interference than necessary.

81 citations

Patent
29 Dec 2000
TL;DR: A system is described for servicing a full cache line in response to a partial cache line request; it includes a storage to store at least one cache line, a hit/miss detector, and a data mover.
Abstract: A system is described for servicing a full cache line in response to a partial cache line request. The system includes a storage to store at least one cache line, a hit/miss detector, and a data mover. The hit/miss detector receives a partial cache line read request from a requesting agent and dispatches a fetch request to a memory device to fetch a full cache line data that contains data requested in the partial cache line read request from the requesting agent. The data mover loads the storage with the full cache line data returned from the memory device and forwards a portion of the full cache line data requested by the requesting agent. If data specified in a subsequent partial cache line request from the requesting agent is contained within the full cache line data specified in the previously dispatched fetch request, the hit/miss detector will send a command to the data mover to forward another portion of the full cache line data stored in the storage to the requesting agent. In one embodiment, the system also includes a write combining logic to combine two or more consecutive write requests that meet defined conditions into a single write request.

81 citations
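
The partial-request servicing described in the patent abstract above can be illustrated with a small software model. This is a minimal sketch, not the patented design: the class name `LineBuffer`, the 64-byte `LINE_SIZE`, and the single-line storage are assumptions made for illustration. A miss fetches the whole line once, and later partial requests that fall inside that line are answered from the storage.

```python
LINE_SIZE = 64  # bytes per full cache line (assumed value)


class LineBuffer:
    """Hypothetical model: hit/miss detection plus data forwarding over a one-line storage."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.line_addr = None  # base address of the line currently held
        self.line = b""
        self.fetches = 0       # full-line fetches dispatched to memory

    def read(self, addr: int, size: int) -> bytes:
        base = addr - (addr % LINE_SIZE)
        if addr + size > base + LINE_SIZE:
            raise ValueError("request crosses a cache line boundary")
        if base != self.line_addr:
            # Miss: fetch the full line that contains the requested bytes.
            self.line = self.memory[base:base + LINE_SIZE]
            self.line_addr = base
            self.fetches += 1
        # Hit or freshly filled: forward only the requested portion.
        offset = addr - base
        return self.line[offset:offset + size]


memory = bytes(range(256))
buf = LineBuffer(memory)
first = buf.read(8, 8)    # miss, fetches line 0
second = buf.read(16, 8)  # hit, served from the stored line
third = buf.read(64, 4)   # miss, fetches line 1
```

In this model three partial reads cost two memory fetches, because the second read is satisfied from the line loaded by the first.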

Patent
Kevin M. Conley, Yoram Cedar
14 Dec 2005
TL;DR: A method of programming a non-volatile memory array using an on-chip write cache is described, in which more than one data packet may be stored in cache and then programmed to a single page of the array.
Abstract: A method of programming a non-volatile memory array using an on-chip write cache is disclosed. Individual data packets received by the memory system are stored in cache memory. More than one data packet may be stored in this way and then programmed to a single page of the non-volatile array. This results in more efficient use of storage space in the non-volatile array.

81 citations
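
The packing idea in the abstract above, several cached packets programmed to one page, can be sketched as follows. This is an illustrative model under stated assumptions, not the patented method: `WriteCache`, the 16-byte `PAGE_SIZE`, and the flush-when-full policy are hypothetical choices.

```python
PAGE_SIZE = 16  # bytes per page, kept small for illustration (assumed value)


class WriteCache:
    """Hypothetical write cache that packs packets before programming a page."""

    def __init__(self):
        self.pending = []  # packets held in the cache
        self.pages = []    # pages programmed to the non-volatile array

    def write(self, packet: bytes) -> None:
        if len(packet) > PAGE_SIZE:
            raise ValueError("packet larger than a page")
        used = sum(len(p) for p in self.pending)
        if used + len(packet) > PAGE_SIZE:
            self.flush()  # the next packet would not fit, so program the page
        self.pending.append(packet)

    def flush(self) -> None:
        if self.pending:
            self.pages.append(b"".join(self.pending))
            self.pending = []


cache = WriteCache()
for packet in (b"aaaa", b"bbbbbb", b"cccc", b"dddd"):
    cache.write(packet)
cache.flush()
```

Here four packets occupy two pages instead of four, which is the storage efficiency the abstract refers to.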

Patent
31 Jan 2002
TL;DR: A system and method are presented for dynamically inserting a data cache prefetch instruction into a program executable. The method monitors the execution of the program, samples cache miss events, identifies the time-consuming execution paths, and optimizes the program at runtime by inserting a prefetch instruction into new optimized code to hide cache miss latency.
Abstract: A system and method for dynamically inserting a data cache prefetch instruction into a program executable to optimize the program being executed. The method, and system thereof, monitors the execution of the program, samples on the cache miss events, identifies the time-consuming execution paths, and optimizes the program during runtime by inserting a prefetch instruction into a new optimized code to hide cache miss latency.

80 citations
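
The sampling step in the abstract above can be sketched in a few lines. This is a simplified stand-in, not the patented method: the function name `pick_prefetch_sites`, the `min_share` threshold, and the use of instruction addresses as sample keys are all assumptions. It only selects candidate sites from sampled cache miss events; actually rewriting code to insert prefetches is out of scope here.

```python
from collections import Counter


def pick_prefetch_sites(miss_samples, min_share=0.2):
    """Return instruction addresses that account for at least
    `min_share` of the sampled cache miss events (hypothetical policy)."""
    counts = Counter(miss_samples)
    total = sum(counts.values())
    return sorted(pc for pc, n in counts.items() if n / total >= min_share)


# Sampled miss events, keyed by the address of the missing load (made-up data).
samples = [0x40, 0x40, 0x40, 0x58, 0x40, 0x7C, 0x58, 0x40, 0x40, 0x58]
sites = pick_prefetch_sites(samples)
```

With these made-up samples, the loads at 0x40 and 0x58 dominate the misses and would be the candidates for prefetch insertion.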

Patent
Yuanlong Wang, Zong Yu, Xiaofan Wei, Earl T. Cohen, Brian R. Baird, Daniel Fu
10 Aug 2001
TL;DR: A symmetric multiprocessor system is described whose Transaction Bus is implemented using segmented buses, distributed muxes, and point-to-point wiring, and supports transaction processing at a rate of one transaction per clock cycle.
Abstract: A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A Transaction Controller, Transaction Bus, and Transaction Status Bus are used for serialization, centralized cache control, and highly pipelined address transfers. The shared Transaction Controller serializes transaction requests from Initiator devices that can include CPU/Cache modules and Peripheral Bus modules. The Transaction Bus of an illustrative embodiment is implemented using segmented buses, distributed muxes, point-to-point wiring, and supports transaction processing at a rate of one transaction per clock cycle. The Transaction Controller monitors the Transaction Bus, maintains a set of duplicate cache-tags for all CPU/Cache modules, maps addresses to Target devices, performs centralized cache control for all CPU/Cache modules, filters unnecessary Cache transactions, and routes necessary transactions to Target devices over the Transaction Status Bus. The Transaction Status Bus includes both bus-based and point-to-point control of the target devices. A modified rotating priority scheme is used to provide Starvation-free support for Locked buses and memory resources via backoff operations. Speculative memory operations are supported to further enhance performance.

80 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Compiler: 26.3K papers, 578.5K citations, 89% related
Scalability: 50.9K papers, 931.6K citations, 87% related
Server: 79.5K papers, 1.4M citations, 86% related
Static routing: 25.7K papers, 576.7K citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 42
2022: 110
2021: 12
2020: 20
2019: 15
2018: 30