Topic

CAS latency

About: CAS latency, the delay between a memory controller issuing a column address for a read and the first data becoming available, is a research topic. Over the lifetime, 1927 publications have been published within this topic, receiving 51939 citations. The topic is also known as: Column Address Strobe (CAS) latency.
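
Because CAS latency is specified in clock cycles, comparing modules requires converting it to absolute time. Below is a minimal sketch of that conversion, assuming DDR's two data transfers per clock; the module figures are illustrative examples, not drawn from the papers on this page.

    #include <stdio.h>

    /* CAS latency is quoted in clock cycles, so the absolute delay depends
     * on the clock. DDR transfers data on both clock edges, so the I/O
     * clock in MHz is half the data rate in MT/s. */
    static double cas_latency_ns(int cl_cycles, int data_rate_mts) {
        double clock_mhz = data_rate_mts / 2.0; /* two transfers per cycle */
        return cl_cycles * 1000.0 / clock_mhz;  /* cycles / MHz -> ns */
    }

    int main(void) {
        /* Illustrative modules: a higher CL on a faster clock can still
         * mean comparable or lower absolute latency. */
        printf("DDR4-3200 CL16: %.2f ns\n", cas_latency_ns(16, 3200)); /* 10.00 */
        printf("DDR3-1600 CL9:  %.2f ns\n", cas_latency_ns(9, 1600));  /* 11.25 */
        return 0;
    }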


Papers
Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper analyzes a PCM-based hybrid main memory system using an architecture-level model of PCM, and proposes simple organizational and management solutions for the hybrid memory that reduce write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.
Abstract: The memory subsystem accounts for a significant share of a computer system's cost and power budget. Current DRAM-based main memory systems are starting to hit the power and cost limits. An alternative memory technology that uses resistance contrast in phase-change materials is being actively investigated in the circuits community. Phase Change Memory (PCM) devices offer more density relative to DRAM, and can help increase main memory capacity of future systems while remaining within the cost and power constraints. In this paper, we analyze a PCM-based hybrid main memory system using an architecture-level model of PCM. We explore the trade-offs for a main memory system consisting of PCM storage coupled with a small DRAM buffer. Such an architecture has the latency benefits of DRAM and the capacity benefits of PCM. Our evaluations for a baseline system of 16 cores with 8GB DRAM show that, on average, PCM can reduce page faults by 5X and provide a speedup of 3X. As PCM is projected to have limited write endurance, we also propose simple organizational and management solutions for the hybrid memory that reduce the write traffic to PCM, boosting its lifetime from 3 years to 9.7 years.

1,451 citations
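
The write-filtering idea behind the paper above can be illustrated with a toy model: a small DRAM buffer absorbs repeated stores to hot lines, and PCM is written only when a dirty line is evicted. The sketch below uses assumed parameters (a direct-mapped buffer and a synthetic skewed write stream) and is a simplification, not the paper's actual organization or wear-leveling schemes.

    #include <stdio.h>
    #include <stdlib.h>

    #define BUFFER_LINES 256           /* direct-mapped DRAM buffer (assumed size) */

    typedef struct { long tag; int valid, dirty; } Line;

    static Line buffer[BUFFER_LINES];
    static long pcm_writes_buffered = 0; /* writes reaching PCM with the buffer */
    static long pcm_writes_direct   = 0; /* writes PCM would see without it */

    static void store(long line_addr) {
        pcm_writes_direct++;             /* unbuffered PCM takes every store */
        Line *l = &buffer[line_addr % BUFFER_LINES];
        if (!l->valid || l->tag != line_addr) {
            if (l->valid && l->dirty)
                pcm_writes_buffered++;   /* write back the evicted dirty line */
            l->tag = line_addr; l->valid = 1; l->dirty = 0;
        }
        l->dirty = 1;                    /* the store itself is absorbed in DRAM */
    }

    int main(void) {
        srand(1);
        /* Skewed stream: 90% of stores hit a small hot set of lines. */
        for (long i = 0; i < 1000000; i++)
            store(rand() % 100 < 90 ? rand() % 64 : rand() % 100000);
        printf("PCM writes, unbuffered:    %ld\n", pcm_writes_direct);
        printf("PCM writes, DRAM-buffered: %ld\n", pcm_writes_buffered);
        return 0;
    }

With a write stream like this, most stores are absorbed in DRAM and PCM sees only evictions, which is the mechanism that extends PCM lifetime in the paper's evaluation.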

Journal ArticleDOI
14 Jun 2014
TL;DR: This paper exposes the vulnerability of commodity DRAM chips to disturbance errors, showing that repeatedly reading from the same address in DRAM, which activates the same row over and over, can corrupt data stored in nearby rows.
Abstract: Memory isolation is a key property of a reliable and secure computing system: an access to one memory address should not have unintended side effects on data stored in other addresses. However, as DRAM process technology scales down to smaller dimensions, it becomes more difficult to prevent DRAM cells from electrically interacting with each other. In this paper, we expose the vulnerability of commodity DRAM chips to disturbance errors. By reading from the same address in DRAM, we show that it is possible to corrupt data in nearby addresses. More specifically, activating the same row in DRAM corrupts data in nearby rows. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We induce errors in most DRAM modules (110 out of 129) from three major DRAM manufacturers. From this we conclude that many deployed systems are likely to be at risk. We identify the root cause of disturbance errors as the repeated toggling of a DRAM row's wordline, which stresses inter-cell coupling effects that accelerate charge leakage from nearby rows. We provide an extensive characterization study of disturbance errors and their behavior using an FPGA-based testing platform. Among our key findings, we show that (i) it takes as few as 139K accesses to induce an error and (ii) up to one in every 1.7K cells is susceptible to errors. After examining various potential ways of addressing the problem, we propose a low-overhead solution to prevent the errors.

999 citations
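
The access pattern behind the disturbance errors described above is simple to express. The following is a minimal x86 sketch of the kind of "hammering" loop the paper describes: two addresses mapping to different rows of the same bank are read in alternation, and cache-line flushes force each read to reach DRAM. Selecting same-bank, different-row addresses is platform-specific and omitted here; this illustrates the mechanism, not the authors' test harness.

    #include <emmintrin.h> /* _mm_clflush */

    /* Alternating reads of two rows in one bank force the bank's row
     * buffer to close and reopen a row on every access, repeatedly
     * toggling both wordlines: the stress the paper identifies as the
     * root cause of disturbance errors. */
    static void hammer(volatile char *row_a, volatile char *row_b, long iters) {
        for (long i = 0; i < iters; i++) {
            (void)*row_a;                     /* activate row A */
            (void)*row_b;                     /* activate row B, closing row A */
            _mm_clflush((const void *)row_a); /* evict so the next read hits DRAM */
            _mm_clflush((const void *)row_b);
        }
    }

Per the paper's findings, on vulnerable modules as few as 139K such activations within a refresh interval can flip bits in rows adjacent to the hammered ones.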

Proceedings ArticleDOI
20 Feb 2008
TL;DR: This work discusses the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies, achieving increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations.
Abstract: GPUs have recently attracted the attention of many application developers as commodity data-parallel coprocessors. The newest generations of GPU architecture provide easier programmability and increased generality while maintaining the tremendous memory bandwidth and computational power of traditional GPUs. This opportunity should redirect efforts in GPGPU research from ad hoc porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. In this work we discuss the GeForce 8800 GTX processor's organization, features, and generalized optimization strategies. Key to performance on this platform is using massive multithreading to utilize the large number of cores and hide global memory latency. To achieve this, developers face the challenge of striking the right balance between each thread's resource usage and the number of simultaneously active threads. The resources to manage include the number of registers and the amount of on-chip memory used per thread, the number of threads per multiprocessor, and global memory bandwidth. We also obtain increased performance by reordering accesses to off-chip memory to combine requests to the same or contiguous memory locations and by applying classical optimizations to reduce the number of executed operations. We apply these strategies across a variety of applications and domains and achieve speedups of 10.5X to 457X in kernel codes and 1.16X to 431X in total application performance.

993 citations
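
The memory-access reordering described above exploits coalescing: when the threads of a warp touch consecutive words, the hardware can merge them into a single wide transaction. The sketch below is a rough model of that rule, assuming a 32-thread warp and 128-byte transaction segments (simplified from the G80-era coalescing constraints), not the actual hardware algorithm.

    #include <stdio.h>

    #define WARP_SIZE    32
    #define SEGMENT_SIZE 128   /* bytes per memory transaction (assumed) */

    /* Count how many aligned segments a warp's 4-byte loads touch when
     * thread t reads base + t * stride_words * 4. Addresses ascend with t,
     * so counting segment changes counts distinct segments. Fewer
     * segments means fewer, better-coalesced transactions. */
    static int segments_touched(long base, int stride_words) {
        long last_seg = -1;
        int count = 0;
        for (int t = 0; t < WARP_SIZE; t++) {
            long seg = (base + (long)t * stride_words * 4) / SEGMENT_SIZE;
            if (seg != last_seg) { count++; last_seg = seg; }
        }
        return count;
    }

    int main(void) {
        printf("unit stride:  %d transaction(s)\n", segments_touched(0, 1));  /* 1 */
        printf("stride of 32: %d transaction(s)\n", segments_touched(0, 32)); /* 32 */
        return 0;
    }

Unit stride lets one 128-byte transaction serve the whole warp, while a 128-byte stride turns the same 32 loads into 32 transactions, which is why reordering accesses toward contiguous locations pays off so heavily on this hardware.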

Journal ArticleDOI
TL;DR: The MIPS R10000 is a dynamic, superscalar microprocessor that implements the 64-bit MIPS IV instruction set architecture; it fetches and decodes four instructions per cycle and dynamically issues them to five fully-pipelined, low-latency execution units.
Abstract: The MIPS R10000 is a dynamic, superscalar microprocessor that implements the 64-bit MIPS IV instruction set architecture. It fetches and decodes four instructions per cycle and dynamically issues them to five fully-pipelined, low-latency execution units. Instructions can be fetched and executed speculatively beyond branches. Instructions graduate in order upon completion. Although execution is out of order, the processor still provides sequential memory consistency and precise exception handling. The R10000 is designed for high performance, even in large, real-world applications with poor memory locality. With speculative execution, it calculates memory addresses and initiates cache refills early. Its hierarchical, nonblocking memory system helps hide memory latency with two levels of set-associative, write-back caches.

809 citations

Journal ArticleDOI
TL;DR: The state of microprocessors and DRAMs today is reviewed, some of the opportunities and challenges for IRAMs are explored, and performance and energy efficiency of three IRAM designs are estimated.
Abstract: Two trends call into question the current practice of fabricating microprocessors and DRAMs as different chips on different fabrication lines. The gap between processor and DRAM speed is growing at 50% per year; and the size and organization of memory on a single DRAM chip is becoming awkward to use, yet size is growing at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy efficiency. It also allows more flexible selection of memory size and organization, and promises savings in board area. This article reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and finally estimates performance and energy efficiency of three IRAM designs.

671 citations


Network Information
Related Topics (5)

Cache: 59.1K papers, 976.6K citations, 84% related
CMOS: 81.3K papers, 1.1M citations, 83% related
Logic gate: 35.7K papers, 488.3K citations, 83% related
Compiler: 26.3K papers, 578.5K citations, 83% related
Scalability: 50.9K papers, 931.6K citations, 82% related
Performance Metrics

No. of papers in the topic in previous years:

Year  Papers
2023  20
2022  44
2021  16
2020  22
2019  29
2018  31