A survey of synchronous RAM architectures

01 Jan 1999, Vol. 71
TL;DR: By providing a better understanding of the limits of current RAM designs, this report supports the choice of a particular RAM for a given application.
Abstract: The functionality of volatile random access memories (RAMs) in personal computers, embedded systems, networking devices, and many other products is based on an access scheme designed over thirty years ago. Since then, a variety of realizations has evolved. Because VLSI designs for memory chips have always been optimized for area rather than for access speed, RAM chips have increasingly become the performance bottleneck of complex computing systems. This survey gives an overview of current memory chip architectures. The basic functionality of memories is explained, and the advantages and drawbacks of each RAM type are discussed. By providing a better understanding of the limits of current RAM designs, this report supports the choice of a particular RAM for a given application.
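For context, the "access scheme designed over thirty years ago" is the multiplexed row/column (RAS/CAS) addressing used by DRAMs: the address is presented in two halves, a row part that activates a word line and a column part that selects data from the open row. A minimal sketch of that two-phase access is shown below; the timing values are illustrative placeholders, not figures from the survey.

```python
# Minimal model of the classic multiplexed row/column DRAM access scheme.
# Cycle counts are hypothetical placeholders, not taken from the survey.

T_RAS_TO_CAS = 3  # cycles from row activation (RAS) to column strobe (CAS)
T_CAS = 2         # cycles from CAS to data out

class DramBank:
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None  # row currently latched in the sense amplifiers

    def read(self, row, col):
        """Return (data, latency_in_cycles) for one read access."""
        latency = T_CAS
        if self.open_row != row:
            self.open_row = row       # row miss: activate the row first
            latency += T_RAS_TO_CAS
        return self.cells[row][col], latency

bank = DramBank(rows=4096, cols=1024)
_, miss = bank.read(7, 0)   # row miss: RAS + CAS
_, hit = bank.read(7, 1)    # row hit: CAS only (page-mode-style access)
print(miss, hit)            # 5 2
```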
Citations
Journal ArticleDOI
TL;DR: As I review performance trends, I am struck by a consistent theme across many technologies: bandwidth improves much more quickly than latency.
Abstract: As I review performance trends, I am struck by a consistent theme across many technologies: bandwidth improves much more quickly than latency. Here, I list a half-dozen performance milestones to document this observation, many reasons why it happens, a few ways to cope with it, a rule of thumb to quantify it, plus an example of how to design systems differently based on this observation.
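The rule of thumb Patterson offers in the full article is that bandwidth improves roughly with the square of the improvement in latency. The sketch below illustrates how quickly the two curves diverge; the 10%-per-year latency gain is a made-up rate for illustration only.

```python
# "Bandwidth improves roughly with the square of the latency improvement."
# The per-year rate below is an assumed value, not a figure from the article.

latency_gain_per_year = 1.10                          # assumed 10%/year
bandwidth_gain_per_year = latency_gain_per_year ** 2  # rule of thumb

latency_x = bandwidth_x = 1.0
for year in range(21):
    if year % 5 == 0:
        print(f"year {year:2d}: latency x{latency_x:5.2f}, "
              f"bandwidth x{bandwidth_x:6.2f}")
    latency_x *= latency_gain_per_year
    bandwidth_x *= bandwidth_gain_per_year
```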

267 citations

Proceedings ArticleDOI
09 Dec 2006
TL;DR: Virtually-pipelined memory is introduced, an architectural technique that efficiently supports high-bandwidth, uniform-latency memory accesses and high-confidence throughput even under adversarial conditions, and that outperforms the state of the art in specialized packet-buffering architectures.
Abstract: We introduce virtually-pipelined memory, an architectural technique that efficiently supports high-bandwidth, uniform latency memory accesses, and high-confidence throughput even under adversarial conditions. We apply this technique to the network processing domain where memory hierarchy design is an increasingly challenging problem as network bandwidth increases. Virtual pipelining provides a simple-to-analyze programming model of a deep pipeline (deterministic latencies) with a completely different physical implementation (a memory system with banks and probabilistic mapping). This allows designers to effectively decouple the analysis of their algorithms and data structures from the analysis of the memory buses and banks. Unlike specialized hardware customized for a specific data-plane algorithm, our system makes no assumption about the memory access patterns. In the domain of network processors this will be of growing importance as the size of the routing tables, the complexity of the packet classification rules, and the amount of packet buffering required all continue to grow at a staggering rate. We present a mathematical argument for our system's ability to provably provide bandwidth with high confidence and demonstrate its functionality and area overhead through a synthesizable design. We further show that, even though our scheme is general purpose to support new applications such as packet reassembly, it outperforms the state of the art in specialized packet buffering architectures.
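A minimal sketch of the core idea follows, under simplifying assumptions of mine rather than the paper's actual design: addresses are mapped to banks by a keyed pseudo-random hash, and the controller answers every request after one fixed presented latency, counting any request whose bank could not meet that bound.

```python
# Sketch of virtual pipelining: randomized address-to-bank mapping behind a
# fixed, deterministic presented latency. All parameters are illustrative.
import random

NUM_BANKS = 16
BANK_BUSY_CYCLES = 4     # a bank is occupied this long per access
PRESENTED_LATENCY = 12   # every request is answered after exactly this delay

random.seed(0)
salt = random.getrandbits(32)

def bank_of(addr: int) -> int:
    # Keyed multiplicative hash (Knuth-style): the high bits of the 32-bit
    # product stay well mixed even for strided, adversarial address patterns.
    return (((addr ^ salt) * 2654435761) & 0xFFFFFFFF) >> 28

bank_free_at = [0] * NUM_BANKS
late = 0
for cycle, addr in enumerate(range(0, 4096, 64)):  # strided access pattern
    b = bank_of(addr)
    start = max(cycle, bank_free_at[b])             # wait if the bank is busy
    bank_free_at[b] = start + BANK_BUSY_CYCLES
    if start + BANK_BUSY_CYCLES > cycle + PRESENTED_LATENCY:
        late += 1  # would break the deterministic-latency illusion
print("requests exceeding the presented latency:", late)
```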

21 citations

Book
16 Jul 2001
TL;DR: A service scheme is defined that addresses the requirements at the interface between the networks of a service provider and a customer, and various combinations of network processing tasks are explored for this scheme by exhaustive simulation, focusing on the preservation of service-quality parameters.
Abstract: The increasing use of computer networks for all kinds of information exchange between autonomous computing resources is associated with a number of side effects. In the Internet, where computers all over the globe are interconnected, the traffic volume grows faster than the infrastructure improves, leading to congestion of networking routes. In the application domain of embedded systems, networks can be used to couple complex sensor systems with a computing core. The provision of raw bandwidth may not be sufficient in such systems to allow control with real-time constraints. The underlying requirement in both cases is a network service with a defined quality, for instance, in terms of traffic loss ratio and worst-case communication delay. The provision of suitable communication services, however, requires a noticeable overhead in terms of computing load. Therefore, application-specific hardware accelerators – so-called network processors – have been introduced to speed up or even enable the maintenance of certain network services. The following issues have not yet been dealt with:
  • Although there are network processors for high-speed networks, no processor is available that considers the requirements of the interface between the networks of a service provider and a customer.
  • While each individual task of a network processor is well understood, it is unclear how different tasks, which potentially show interfering properties, should cooperate to preserve the service quality.
The above issues are addressed in this thesis, and the major contributions in the research area of algorithms and architectures for network processors are:
  • A service scheme is defined which takes care of the requirements at the interface between the networks of a service provider and a customer.
  • Various combinations of network processing tasks are explored for this service scheme by exhaustive simulation. The exploration focuses on the preservation of service-quality parameters.
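The thesis's concrete service scheme is not spelled out in the abstract. As a generic illustration of enforcing a defined service quality (a rate and burst bound) at a provider/customer interface, the sketch below shows a standard token-bucket policer; the rate and burst parameters are hypothetical.

```python
# Standard token-bucket policer, shown only as a generic illustration of a
# service-quality contract (rate r, burst b); this is not the thesis's scheme.

class TokenBucket:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = 0.0

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Refill tokens for the elapsed time, then admit if enough remain."""
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # non-conforming: drop or mark, per the contract

tb = TokenBucket(rate_bytes_per_s=125_000, burst_bytes=3_000)  # 1 Mbit/s
print(tb.conforms(0.000, 1500))  # True: covered by the burst allowance
print(tb.conforms(0.001, 1500))  # True
print(tb.conforms(0.002, 1500))  # False: bucket drains faster than it refills
```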

17 citations


Cites background from "A survey of synchronous RAM architectures"

  • ...read or write. This section is concluded by an overview of available RAM types. A more thorough discussion can be found in [66]. Only the most recent RAMs with a synchronous interface are considered in this section...


Journal ArticleDOI
TL;DR: Virtually pipelined memory is introduced, an architectural technique that efficiently supports high-bandwidth, uniform-latency memory accesses and high-confidence throughput even under adversarial conditions, and that outperforms the state of the art in specialized packet-buffering architectures.
Abstract: As network bandwidth increases, designing an effective memory system for network processors becomes a significant challenge. The size of the routing tables, the complexity of the packet classification rules, and the amount of packet buffering required all continue to grow at a staggering rate. Simply relying on large, fast SRAMs alone is not likely to be scalable or cost-effective. Instead, trends point to the use of low-cost commodity DRAM devices as a means to deliver the worst-case memory performance that network data-plane algorithms demand. While DRAMs can deliver a great deal of throughput, the problem is that memory banking significantly complicates the worst-case analysis, and specialized algorithms are needed to ensure that specific types of access patterns are conflict-free. We introduce virtually pipelined memory, an architectural technique that efficiently supports high bandwidth, uniform latency memory accesses, and high-confidence throughput even under adversarial conditions. Virtual pipelining provides a simple-to-analyze programming model of a deep pipeline (deterministic latencies) with a completely different physical implementation (a memory system with banks and probabilistic mapping). This allows designers to effectively decouple the analysis of their algorithms and data structures from the analysis of the memory buses and banks. Unlike specialized hardware customized for a specific data-plane algorithm, our system makes no assumption about the memory access patterns. We present a mathematical argument for our system's ability to provably provide bandwidth with high confidence and demonstrate its functionality and area overhead through a synthesizable design. We further show that, even though our scheme is general purpose to support new applications such as packet reassembly, it outperforms the state of the art in specialized packet buffering architectures.
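The paper's exact probabilistic argument is not reproduced in the abstract; a back-of-the-envelope version of why probabilistic mapping helps is the birthday-style approximation below (my sketch, not the paper's derivation). With B banks and k concurrent accesses mapped uniformly at random:

```latex
% Birthday-style approximation for randomized bank mapping (illustrative):
P(\text{no conflict}) = \prod_{i=0}^{k-1} \frac{B-i}{B}
                      \approx e^{-k(k-1)/(2B)},
\qquad
P(\text{conflict}) \approx 1 - e^{-k(k-1)/(2B)}
```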

12 citations

Journal ArticleDOI
TL;DR: It is shown that evaluating systems often involves complex choices among a variety of factors that influence the value of a supercomputer to an organization, and that the high-end computing community should view cost/performance comparisons of different architectures with skepticism.
Abstract: Comparisons of high-performance computers based on their peak floating point performance are common but seldom useful when comparing performance on real workloads. Factors that influence sustained performance extend beyond a system’s floating-point units, and real applications exercise machines in complex and diverse ways. Even when it is possible to compare systems based on their performance, other considerations affect which machine is best for a given organization. These include the cost, the facilities requirements (power, floorspace, etc.), the programming model, the existing code base, and so on. This paper describes some of the important measures for evaluating high-performance computers. We present data for many of these metrics based on our experience at Lawrence Livermore National Laboratory (LLNL), and we compare them with published information on the Earth Simulator. We argue that evaluating systems involves far more than comparing benchmarks and acquisition costs. We show that evaluating systems often involves complex choices among a variety of factors that influence the value of a supercomputer to an organization, and that the high-end computing community should view cost/performance comparisons of different architectures with skepticism. Published in 2005 by John Wiley & Sons, Ltd.
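As an illustration of the multi-factor evaluation the authors argue for, the toy score below weighs several of the criteria they name; the weights and per-system scores are invented and do not represent LLNL's methodology or any real machine.

```python
# Toy weighted score over evaluation criteria named in the abstract.
# Weights and scores are invented, purely to show that "best" depends on
# much more than peak FLOPS and acquisition cost.

weights = {"sustained_perf": 0.40, "cost": 0.25, "power": 0.15,
           "floorspace": 0.05, "code_base_fit": 0.15}

systems = {  # normalized scores in [0, 1], higher is better (hypothetical)
    "cluster_A": {"sustained_perf": 0.6, "cost": 0.9, "power": 0.7,
                  "floorspace": 0.8, "code_base_fit": 0.9},
    "vector_B":  {"sustained_perf": 0.9, "cost": 0.3, "power": 0.4,
                  "floorspace": 0.5, "code_base_fit": 0.4},
}

for name, scores in systems.items():
    total = sum(weights[c] * scores[c] for c in weights)
    print(f"{name}: weighted score {total:.3f}")
```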

4 citations


Cites methods from "A survey of synchronous RAM architectures"

  • ...Fast-Page Mode (FPM) DRAM, Extended Data Out (EDO) DRAM, Synchronous DRAM (SDRAM), Double Data Rate (DDR) SDRAM and Enhanced SDRAM (ESDRAM) [52] represent a straightforward evolutionary technology path of advances to how the DRAM is accessed and of caching within the DRAM device from the basic design of an array of memory cells [53,54]....


References
Journal ArticleDOI
TL;DR: Microprocessor and DRAM speeds both improve exponentially, but the exponent for microprocessors is substantially larger, so the processor-memory speed gap itself grows exponentially and will soon become a much bigger problem.
Abstract: We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed; each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs. The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one. How big and how soon? The answers to these questions are what the authors had failed to appreciate.
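The claim that the difference between diverging exponentials itself grows exponentially is easy to see numerically. The growth rates below are the figures commonly quoted for that era (roughly 60%/year for processors, 7%/year for DRAM); treat them as assumptions, not data from the paper.

```python
# Diverging exponentials: the processor/DRAM gap grows exponentially too.
# The per-year rates are commonly quoted era figures, assumed here.

cpu_gain_per_year = 1.60    # ~60%/year processor speed improvement
dram_gain_per_year = 1.07   # ~7%/year DRAM speed improvement

cpu = dram = 1.0
for year in range(21):
    if year % 5 == 0:
        print(f"year {year:2d}: cpu x{cpu:9.1f}, dram x{dram:5.2f}, "
              f"gap x{cpu / dram:8.1f}")
    cpu *= cpu_gain_per_year
    dram *= dram_gain_per_year
```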

1,837 citations

Proceedings ArticleDOI
01 May 1996
TL;DR: It is predicted that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips, and pin bandwidth limitations will make more complex on-chip caches cost-effective.
Abstract: This paper makes the case that pin bandwidth will be a critical consideration for future microprocessors. We show that many of the techniques used to tolerate growing memory latencies do so at the expense of increased bandwidth requirements. Using a decomposition of execution time, we show that for modern processors that employ aggressive memory latency tolerance techniques, wasted cycles due to insufficient bandwidth generally exceed those due to raw memory latencies. Given the importance of maximizing memory bandwidth, we calculate effective pin bandwidth, then estimate optimal effective pin bandwidth. We measure these quantities by determining the amount by which both caches and minimal-traffic caches filter accesses to the lower levels of the memory hierarchy. We see that there is a gap that can exceed two orders of magnitude between the total memory traffic generated by caches and the minimal-traffic caches---implying that the potential exists to increase effective pin bandwidth substantially. We decompose this traffic gap into four factors, and show they contribute quite differently to traffic reduction for different benchmarks. We conclude that, in the short term, pin bandwidth limitations will make more complex on-chip caches cost-effective. For example, flexible caches may allow individual applications to choose from a range of caching policies. In the long term, we predict that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips.
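A simplified reading of the paper's central quantity: the better the on-chip cache filters references, the more useful work each unit of raw pin bandwidth supports. The formula and the numbers below are my illustration of that idea, not the paper's exact definitions or measurements.

```python
# Simplified sketch of "effective pin bandwidth": raw pin bandwidth scaled by
# how strongly the cache filters processor references. Numbers are invented;
# the ~100x traffic gap mirrors the "two orders of magnitude" in the abstract.

raw_pin_bandwidth_gb_s = 1.0    # physical off-chip bandwidth
processor_traffic_gb_s = 8.0    # traffic the core would generate uncached

for label, miss_traffic_gb_s in [("real cache", 2.0),
                                 ("minimal-traffic cache", 0.02)]:
    filter_ratio = processor_traffic_gb_s / miss_traffic_gb_s
    effective = raw_pin_bandwidth_gb_s * filter_ratio
    print(f"{label}: filters traffic x{filter_ratio:.0f}, "
          f"effective pin bandwidth ~{effective:.0f} GB/s")
```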

376 citations

Proceedings ArticleDOI
16 Apr 1998
TL;DR: This work describes an implementation of Active Pages on RADram (Reconfigurable Architecture DRAM), a memory system based upon the integration of DRAM and reconfigurable logic, and explores the sensitivity of the results to implementations in other memory technologies.
Abstract: Microprocessors and memory systems suffer from a growing gap in performance. We introduce Active Pages, a computation model which addresses this gap by shifting data-intensive computations to the memory system. An Active Page consists of a page of data and a set of associated functions which can operate upon that data. We describe an implementation of Active Pages on RADram (Reconfigurable Architecture DRAM), a memory system based upon the integration of DRAM and reconfigurable logic. Results from the SimpleScalar simulator [BA97] demonstrate up to 1000X speedups on several applications using the RADram system versus conventional memory systems. We also explore the sensitivity of our results to implementations in other memory technologies.
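The Active Pages model pairs a page of data with functions that execute in the memory system rather than on the processor. The tiny sketch below mimics that pairing in software; the interface is invented for illustration and is not RADram's actual API.

```python
# Software mock-up of the Active Pages model: a page of data plus functions
# bound to it. In RADram the bound functions would run in reconfigurable
# logic next to the DRAM array; here they simply run on the page's data.

class ActivePage:
    def __init__(self, data):
        self.data = list(data)
        self.functions = {}

    def bind(self, name, fn):
        self.functions[name] = fn       # associate a function with this page

    def invoke(self, name, *args):
        return self.functions[name](self.data, *args)

page = ActivePage(range(1024))
page.bind("count_greater", lambda data, t: sum(1 for x in data if x > t))
print(page.invoke("count_greater", 1000))  # 23; the processor only dispatches
```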

311 citations


"A survey of synchronous RAM archite..." refers methods in this paper

  • ...In addition, the coupling of reconfigurable logic and DRAMs is investigated in reconfigurable architecture DRAMs (RADRAM [57]) and was considered in the transit project [6]....


Journal ArticleDOI
R. Crisp
TL;DR: Providing three times the memory bandwidth of the 66-MHz SDRAM subsystem, Direct RDRAM modules fit seamlessly into the existing mechanical space and airflow environment of the industry-standard PC chassis.
Abstract: Providing three times the memory bandwidth of the 66-MHz SDRAM subsystem, Direct RDRAM modules fit seamlessly into the existing mechanical space and airflow environment of the industry-standard PC chassis.
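The claimed factor of three is easy to check with standard bus parameters. The widths and transfer rates below are commonly published figures for these interfaces, stated here as assumptions rather than taken from this article.

```python
# Back-of-the-envelope check of "three times the 66-MHz SDRAM subsystem".
# Bus widths and rates are commonly published figures, assumed here.

sdram = 66e6 * 8    # 66 MHz x 64-bit (8-byte) DIMM = 528 MB/s
rdram = 800e6 * 2   # 16-bit Direct RDRAM channel at 800 MT/s = 1600 MB/s
print(f"SDRAM {sdram/1e6:.0f} MB/s, Direct RDRAM {rdram/1e6:.0f} MB/s, "
      f"ratio {rdram/sdram:.1f}x")   # ratio ~3.0x
```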

163 citations


"A survey of synchronous RAM archite..." refers methods in this paper

  • ...This parameter can be determined from the tCAC and tCWD times given in a RDRAM data sheet and from tRWD in the SLDRAM case. tRAW: (not drawn in Fig....


  • ...This parameter is called tCWD in a RDRAM data sheet and tPW in a SLDRAM data sheet. tREF: The maximal refresh interval of the whole memory chip is specified by this value....


  • ...The length is fixed for RDRAMs (eight items, 16 bit each) and may be set to four or eight items (16 bit) individually for each data packet in the SLDRAM case. tWAR: This is the minimal write-after-read operation delay....


  • ...RDRAM [5], [60, 18, 19, 20, 73] is a memory specification developed by Rambus Inc....


  • ...This parameter is called tRC in the RDRAM data sheet and tRC1 in the SLDRAM specification. tAAI: (not drawn in Fig....

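The excerpts above enumerate data-sheet timing parameters (tCWD, tWAR, tRC, tREF, and others). As a minimal sketch of how such parameters constrain a memory controller, the snippet below enforces two of them when scheduling commands; the cycle counts are placeholders, not values from any RDRAM or SLDRAM data sheet.

```python
# Minimal constraint check for two of the timing parameters named above:
# tWAR (write-after-read delay) and tRC (row-cycle time). Cycle counts are
# placeholders, not values from an RDRAM or SLDRAM data sheet.

T_WAR = 3    # min cycles between a read and a following write
T_RC = 10    # min cycles between successive activations of the same bank

last_read = -10**9
last_activate = {}   # bank -> cycle of its last row activation

def earliest_issue(cmd: str, bank: int, now: int) -> int:
    """Earliest cycle >= now at which cmd may legally issue."""
    t = now
    if cmd == "write":
        t = max(t, last_read + T_WAR)
    elif cmd == "activate":
        t = max(t, last_activate.get(bank, -10**9) + T_RC)
    return t

last_read = 100
print(earliest_issue("write", bank=0, now=101))  # 103: must wait out tWAR
```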

Proceedings ArticleDOI
06 Feb 1997
TL;DR: IRAM is attractive because the gigabit DRAM chip has enough transistors for both a powerful processor and a memory big enough to contain whole programs and data sets, and it needs more metal layers to accelerate the long lines of 600 mm² chips.
Abstract: It is time to reconsider unifying logic and memory. Since most of the transistors on this merged chip will be devoted to memory, it is called 'intelligent RAM'. IRAM is attractive because the gigabit DRAM chip has enough transistors for both a powerful processor and a memory big enough to contain whole programs and data sets. It contains 1024 memory blocks each 1kb wide. It needs more metal layers to accelerate the long lines of 600 mm² chips. It may require faster transistors for the high-speed interface of synchronous DRAM. Potential advantages of IRAM include lower memory latency, higher memory bandwidth, lower system power, adjustable memory width and size, and less board space. Challenges for IRAM include high chip yield given processors have not been repairable via redundancy, high memory retention rates given processors usually need higher power than DRAMs, and a fast processor given logic is slower in a DRAM process.

146 citations


"A survey of synchronous RAM archite..." refers methods in this paper

  • ..., intelligent RAM (IRAM [58]), parallel processing RAM (PPRAM [54]), and computing RAM (CRAM [11]) are concepts to integrate RAM with logic circuits....

