Selective GPU caches to eliminate CPU-GPU HW cache coherence
Citations
234 citations
Cites methods from "Selective GPU caches to eliminate C..."
...Since GPU cache miss rates are usually high, we choose an estimate of 50% for MissLD, close to the GPU cache miss rates reported by prior works on a wide range of workloads [1, 45]....
[...]
146 citations
Cites background from "Selective GPU caches to eliminate C..."
...Approaches without extensive hardware changes are likely to receive more widespread adoption in the industry [10]....
[...]
59 citations
Cites background from "Selective GPU caches to eliminate C..."
...protocols [1, 63], PCIe attached discrete GPUs (where integrated coherence is not possible) are likely to continue dominating the market, thanks to broad compatibility between CPU and GPU vendors....
[...]
44 citations
Cites background from "Selective GPU caches to eliminate C..."
...However, this requires additional support for handling CPU-GPU coherence [38]....
[...]
...The performance of software based paging systems between system memory and GPU memory is an active area of research with performance improving rapidly [38], [50]....
[...]
37 citations
Cites background from "Selective GPU caches to eliminate C..."
...Software-managed page migration strategies that intelligently perform explicit page copies between CPU and GPU memories are effective for many dense, hierarchical CPU-GPU sharing patterns [35], [3], [36]....
[...]
References
7,390 citations
Additional excerpts
...Bloom Filters: Bloom Filters [6] and Cuckoo Filters [18, 49] have been used by several architects [63, 69, 70] in the past....
[...]
2,697 citations
"Selective GPU caches to eliminate C..." refers background in this paper
...Table 1 shows the L1 and L2 cache hit rates across a variety of workloads from the Rodinia and United States Department of Energy application suites [9,67]....
[...]
835 citations
"Selective GPU caches to eliminate C..." refers background in this paper
...The GPU client cache also need not be specific to just GPU clients; other accelerators such as FPGAs or spatial architectures [50, 56] that will be integrated alongside a traditional CPU architecture will also likely benefit from such a client cache....
[...]