
Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case
Donghyuk Lee Yoongu Kim Gennady Pekhimenko
Samira Khan Vivek Seshadri Kevin Chang Onur Mutlu
Carnegie Mellon University
{donghyu1, yoongukim, gpekhime, samirakhan, visesh, kevincha, onur}@cmu.edu
Abstract
In current systems, memory accesses to a DRAM chip must
obey a set of minimum latency restrictions specified in the
DRAM standard. Such timing parameters exist to guarantee re-
liable operation. When deciding the timing parameters, DRAM
manufacturers incorporate a very large margin as a provision
against two worst-case scenarios. First, due to process varia-
tion, some outlier chips are much slower than others and can-
not be operated as fast. Second, chips become slower at higher
temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85°C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55°C) are capable of providing a much
smaller access latency, but are nevertheless forced to operate
at the largest latency of the worst-case.
Our goal in this paper is to exploit the extra margin that
is built into the DRAM timing parameters to improve perfor-
mance. Using an FPGA-based testing platform, we first char-
acterize the extra margin for 115 DRAM modules from three
major manufacturers. Our results demonstrate that it is possi-
ble to reduce four of the most critical timing parameters by
a minimum/maximum of 17.3%/54.8% at 55°C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that
adaptively reduces the timing parameters for DRAM modules
based on the current operating condition. AL-DRAM does not
require any changes to the DRAM chip or its interface.
We evaluate AL-DRAM on a real system that allows us to re-
configure the timing parameters at runtime. We show that AL-
DRAM improves the performance of memory-intensive work-
loads by an average of 14% without introducing any errors.
We discuss and show why AL-DRAM does not compromise re-
liability. We conclude that dynamically optimizing the DRAM
timing parameters can reliably improve system performance.
1. Introduction
A DRAM chip is made of capacitor-based cells that represent
data in the form of electrical charge. To store data in a cell,
charge is injected, whereas to retrieve data from a cell, charge
is extracted. Such movement of charge is not only the center-
piece of DRAM operation, but also the bottleneck of DRAM
latency [36, 66]. This is due to two fundamental reasons. First,
when injecting charge into a cell, a wire called the bitline
through which the charge is delivered impedes the flow of
charge [36, 66]. Owing to the large resistance and the large ca-
pacitance of the bitline, the cell experiences a large RC-delay,
which increases the time it takes for the cell to become fully
charged. Second, when extracting charge from a cell, the cell
is incapable of mobilizing a strong flow of charge out of itself
and into the bitline [36, 66]. Limited by the finite amount of
charge stored in its small capacitor, the cell has an inherently
weak charge-drive, which is further weakened as the cell loses
more of its charge to the bitline. As a result, the cell cannot
charge the bitline quickly (or even fully).
When a DRAM chip is accessed, it requires a certain
amount of time before enough charge can move into the cell (or
the bitline) for the data to be reliably stored (or retrieved). To
guarantee this behavior, DRAM manufacturers impose a set of
minimum latency restrictions on DRAM accesses, called tim-
ing parameters [25]. Ideally, timing parameters should pro-
vide just enough time for a DRAM chip to operate correctly.
In practice, however, DRAM manufacturers pessimistically in-
corporate a very large margin into their timing parameters to
ensure correct operation under worst-case conditions. This is
because of two major concerns. First, due to process varia-
tion, some outlier cells suffer from a larger RC-delay than other
cells, and require more time to be charged. For example, an
outlier cell could have a very narrow connection (i.e., contact)
to the bitline, which constricts the flow of charge and increases
the RC-delay [37]. Second, due to temperature dependence,
all cells suffer from a weaker charge-drive at high tempera-
tures, and require more time to charge the bitline. DRAM cells
are intrinsically leaky, and lose some of their charge even when
they are not being accessed. At high temperatures, this leakage
is accelerated exponentially [29, 41, 48, 57, 74], leaving a cell
with less charge to drive the bitline when the cell is accessed, increasing the time it takes for the bitline to be charged.
Consequently, timing parameters prescribed by the DRAM
manufacturers are dictated by the worst-case cells (the slowest cells) operating under the worst-case conditions (the highest temperature of 85°C [25]). Such pessimism on the part of
the DRAM manufacturers is motivated by their desire to (i) in-
crease chip yield and (ii) reduce chip testing time. The man-
ufacturers, in turn, are driven by the extremely cost-sensitive
nature of the DRAM market, which encourages them to adopt
pessimistic timing parameters rather than to (i) discard chips
with the slowest cells or (ii) test chips at lower temperatures.
Ultimately, the burden of pessimism is passed on to the end-
users, who are forced to endure much greater latencies than
what is actually needed for reliable operation under common-
case conditions.
In this paper, we first characterize 115 DRAM modules
from three manufacturers to expose the excessive margin that

is built into their timing parameters. Using an FPGA-based
testing platform [29, 31, 41], we then demonstrate that DRAM
timing parameters can be shortened to reduce DRAM latency
without sacrificing any observed degree of DRAM reliability.
We are able to reduce latency by taking advantage of the two
large gaps between the worst-case and the "common-case."
First, most DRAM chips are not exposed to the worst-case temperature of 85°C: according to previous studies [11, 12, 43] and our own measurements (Section 4.2), the ambient temperature around a DRAM chip is typically less than 55°C. Second, most
DRAM chips do not contain the worst-case cell with the largest
latency: the slowest cell for a typical chip is still faster than that
of the worst-case chip (Section 7).
Based on our characterization, we propose Adaptive-
Latency DRAM (AL-DRAM), a mechanism that dynamically
optimizes the timing parameters for different modules at dif-
ferent temperatures. AL-DRAM exploits the additional charge
slack present in the common-case compared to the worst-case,
thereby preserving the level of reliability (at least as high as the
worst-case) provided by DRAM manufacturers. We evaluate
AL-DRAM on a real system [5, 6] that allows us to dynami-
cally reconfigure the timing parameters at runtime. We show
that AL-DRAM improves the performance of a wide variety of
memory-intensive workloads by 14.0% (on average) without
introducing errors. Therefore, we conclude that AL-DRAM
improves system performance while maintaining memory cor-
rectness and without requiring changes to DRAM chips or the
DRAM interface.
This paper makes the following contributions:
• We provide a detailed analysis of why we can reduce
DRAM timing parameters without sacrificing reliability.
We show that the latency of a DRAM access depends on
how quickly charge moves into or out of a cell. Compared
to the worst-case cell operating at the worst-case temperature (85°C), a typical cell at a typical temperature allows
much faster movement of charge, leading to shorter access
latency. This enables the opportunity to reduce timing pa-
rameters without introducing errors.
• Using an FPGA-based testing platform, we profile 115
DRAM modules (from three manufacturers) and expose the
large margin built into their timing parameters. In particu-
lar, we identify four timing parameters that are the most
critical during a DRAM access: tRCD, tRAS, tWR, and
tRP. At 55°C, we demonstrate that the parameters can be
reduced by an average of 17.3%, 37.7%, 54.8%, and 35.2%
while still maintaining correctness. For some chips, the re-
ductions are as high as 27.3%, 42.8%, 66.7%, and 45.4%.
• We propose a practical mechanism, Adaptive-Latency
DRAM (AL-DRAM), to take advantage of the above obser-
vation. The key idea is to dynamically adjust the DRAM
timing parameters for each module based on its latency
characteristics and temperature so that the timing param-
eters are dynamically optimized for the current operating
condition. We show that the hardware cost of AL-DRAM
is very modest, with no changes to DRAM.
• We evaluate AL-DRAM on a real system [5, 6] running real
workloads by dynamically reconfiguring the timing param-
eters. For a wide variety of memory-intensive workloads,
AL-DRAM improves system performance by an average of
14.0% and a maximum of 20.5% without incurring errors.
2. DRAM Background
To understand the dominant sources of DRAM latency, we first
provide the necessary background on DRAM organization and
operation.
2.1. DRAM Organization
Figure 1a shows the internal organization of a DRAM subar-
ray [8, 34, 36, 62], which consists of a 2-D array of DRAM
cells connected to a single row of sense amplifiers (a row of
sense amplifiers is also referred to as a row buffer). The sense
amplifier is a component that essentially acts as a latch: it detects the data stored in the DRAM cell and latches on to the
corresponding data.
Figure 1b zooms in on the connection between a single
DRAM cell and its corresponding sense amplifier. Each cell
consists of (i) a capacitor that stores a bit of data in the form of
electrical charge, and (ii) an access transistor that determines
whether the cell is connected to the sense amplifier. The sense
amplifier consists of two cross-coupled inverters. The wire that
connects the cell to the sense amplifier is called the bitline,
whereas the wire that controls the access transistor is called the
wordline. Figure 1c depicts a simplified view of a cell as well
as its bitline and sense amplifier, in which electrical charge is
represented in gray. Switch ① represents the access transistor controlled by the wordline, and switch ② represents the on/off state of the sense amplifier.
Figure 1: DRAM Organization — (a) Subarray: a 2-D array of cells (rows and columns) whose wordlines connect to a shared row of sense-amplifiers; (b) Cell: capacitor and access transistor, connected through the bitline (with a reference bitline) to the sense-amplifier; (c) Simplified View: cell capacitor, bitline parasitic capacitor, and sense-amplifier.
2.2. DRAM Operation: Commands & Timing Constraints
As shown in Figure 2, a cell transitions through five different
states during each access. In the first state ①, which is called the precharged state, the cell is "fully" charged, while the bitline is only halfway charged (i.e., the bitline voltage is maintained at ½VDD). In practice, the cell is usually not completely charged because of a phenomenon called leakage, wherein the cell capacitor loses charge over time.
In order to access data from a cell, the DRAM controller is-
sues a command called ACTIVATE. Upon receiving this com-
mand, DRAM increases the wordline voltage, thereby connect-
ing the cell to the bitline. Since the cell is at a higher voltage
than the bitline, the cell then drives its charge into the bitline
until their voltages are equalized at ½VDD + δ. This is depicted in state ②, which is called charge-sharing.
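The charge-sharing step follows directly from charge conservation between the cell capacitor and the bitline's parasitic capacitor. The sketch below works this through numerically; the capacitance and voltage values are illustrative assumptions, not figures from the paper.

```python
# Charge-sharing as a capacitive divider: a full cell at VDD shares its
# charge with a bitline precharged to VDD/2. Capacitance values below are
# illustrative assumptions, not taken from the paper.
VDD = 1.5            # supply voltage (V), typical for DDR3
C_CELL = 25e-15      # cell capacitance (F), assumed
C_BITLINE = 100e-15  # bitline parasitic capacitance (F), assumed

def bitline_voltage_after_sharing(v_cell, c_cell=C_CELL, c_bl=C_BITLINE):
    """Voltage after charge conservation: total charge / total capacitance."""
    q_total = v_cell * c_cell + (VDD / 2) * c_bl
    return q_total / (c_cell + c_bl)

# A fully charged cell lifts the bitline above VDD/2 by delta:
v_full = bitline_voltage_after_sharing(VDD)
delta_full = v_full - VDD / 2

# A leaky cell that has lost 40% of its usable charge produces a smaller
# delta, giving the sense amplifier a weaker signal to latch onto:
v_leaky = bitline_voltage_after_sharing(VDD - 0.4 * (VDD / 2))
delta_leaky = v_leaky - VDD / 2

print(f"delta (full cell):  {delta_full * 1000:.0f} mV")
print(f"delta (leaky cell): {delta_leaky * 1000:.0f} mV")
```

With these assumed values, the full cell yields a 150 mV swing and the leaky cell only 90 mV, which is why the amount of charge left in the cell at access time matters for latency.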
Figure 2: DRAM Operations, Commands and Parameters — timeline of the ACTIVATE (ACT), READ/WRITE (RD/WR), and PRECHARGE (PRE) commands with timing parameters tRCD (13.75ns), tRAS (35ns), and tRP (13.75ns), as the cell moves through the Precharged, Charge-Sharing, Sense-Amplification, Restored, and Precharged states.

Subsequently, the sense amplifier is enabled, which then senses and amplifies the difference between the bitline voltage and ½VDD. During this process, referred to as sensing and amplification, the sense amplifier drives the bitline voltage to VDD.
Since the cell is still connected to the bitline, this process also
injects charge into the cell. Midway through the sense amplification process (state ③), when the bitline reaches an intermediate voltage level (e.g., ¾VDD), data can be read out or written into the bitline. This is done by issuing a READ or WRITE command to the corresponding cell. The time taken to reach this state (③) after issuing the ACTIVATE is expressed as a timing parameter called tRCD.
After completing the sense amplification, the bitline voltage reaches VDD and the cell charge is fully restored (state ④). The time taken to reach this state after issuing the ACTIVATE is expressed as a timing parameter called tRAS. If there is a write operation, some additional time is required for the bitline and the cell to reach this state, which is expressed as a timing parameter called tWR.
Before we can access data from a different cell connected to
the same bitline, the sense amplifier must be taken back to the
precharged state. This is done by issuing a PRECHARGE com-
mand. Upon receiving this command, DRAM first decreases
the wordline voltage, thereby disconnecting the cell from the
bitline. Next, DRAM disables the sense amplifier and drives the bitline back to a voltage of ½VDD (state ⑤). The time taken for the precharge operation is expressed as a timing parameter called tRP.
At state ⑤, note that the cell is completely filled with charge. Subsequently, however, the cell slowly loses some of its charge until the next access (cycling back to state ①). The length of
time for which the cell can reliably hold its charge is called the
cell’s retention time. If the cell is not accessed for a long time,
it may lose enough charge to invert its stored data, resulting in
an error. To avoid data corruption, DRAM refreshes the charge
in all of its cells at a regular interval, called the refresh interval.
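The command sequence and timing constraints above can be sketched as a toy bank model that refuses to issue a command before the governing parameter has elapsed. The tRCD/tRAS/tRP values are the ones quoted in Figure 2; tWR and all class/method names are assumptions for illustration, not a model of any real controller.

```python
# Minimal sketch of a DRAM bank obeying the timing parameters from
# Section 2.2. tRCD/tRAS/tRP are the values quoted in Figure 2 (in ns);
# tWR is an assumed DDR3-like value.
TIMING = {"tRCD": 13.75, "tRAS": 35.0, "tRP": 13.75, "tWR": 15.0}

class Bank:
    def __init__(self):
        self.state = "precharged"
        self.t_activate = None   # time of the last ACTIVATE
        self.t_precharge = None  # time of the last PRECHARGE
        self.t_write = None      # time of the last WRITE

    def activate(self, now):
        assert self.state == "precharged", "bank must be precharged"
        if self.t_precharge is not None:
            assert now - self.t_precharge >= TIMING["tRP"], "tRP violated"
        self.state, self.t_activate, self.t_write = "activated", now, None

    def read(self, now):
        # READ/WRITE are legal only tRCD after ACTIVATE (state 3, Figure 2)
        assert self.state == "activated", "bank must be activated"
        assert now - self.t_activate >= TIMING["tRCD"], "tRCD violated"

    def write(self, now):
        self.read(now)           # the same tRCD constraint applies
        self.t_write = now

    def precharge(self, now):
        # PRECHARGE is legal only once the cell is restored: tRAS after
        # ACTIVATE, and tWR after the last WRITE (if any).
        assert self.state == "activated", "bank must be activated"
        assert now - self.t_activate >= TIMING["tRAS"], "tRAS violated"
        if self.t_write is not None:
            assert now - self.t_write >= TIMING["tWR"], "tWR violated"
        self.state, self.t_precharge = "precharged", now

bank = Bank()
bank.activate(0.0)
bank.read(13.75)      # earliest legal READ
bank.precharge(35.0)  # earliest legal PRECHARGE
bank.activate(48.75)  # tRAS + tRP: the 48.75 ns cycle noted in Section 3
```

AL-DRAM's proposal amounts to shrinking the entries of a table like `TIMING` when the module and temperature permit, without touching the state machine itself.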
3. Charge & Latency Interdependence
As we explained, the operation of a DRAM cell is governed by
two important concepts: (i) the quantity of charge and (ii) the
latency it takes to move charge. These two concepts are closely
related to each other: one cannot be adjusted without affecting the other. To establish a more quantitative relationship between charge and latency, Figure 3 presents the voltage of a cell and its bitline as they cycle through the precharged state, charge-sharing state, sense-amplification state, restored state, and back to the precharged state (Section 2).¹ This curve is typical in DRAM operation, as also shown in prior works [14, 27, 36, 66]. The timeline starts with an ACTIVATE
at 0 ns and ends with the completion of PRECHARGE at
48.75 ns. From the figure, we identify three specific periods
in time when the voltage changes slowly: (i) start of sense-amplification (part ①), (ii) end of sense-amplification (part ②), and (iii) end of precharging (part ③). Since charge is correlated with voltage, these three periods are when the charge also moves slowly. In the following, we provide three observations explaining why these three periods can be shortened for typical cells at typical temperatures, offering the best opportunity for shortening the timing parameters.
Figure 3: Phases of DRAM Voltage Levels — simulated cell voltage (VCell), bitline voltage (VBL), and reference bitline voltage (VBLref) over 0–50 ns, spanning the charge-sharing, sense-amplification (to 95% charged), and precharge (to 95% precharged) phases, annotated with tRCD, tRAS, and tRP.
Observation 1. At the start of the sense-amplification phase,
the higher the bitline voltage, the quicker the sense-amplifier is
jump-started. Just as the amplification phase starts, the sense-
amplifier detects the bitline voltage that was increased in the
previous charge-sharing phase (by the cell donating its charge
to the bitline). The sense-amplifier then begins to inject more
charge into the bitline to increase the voltage even further, triggering a positive-feedback loop where the bitline voltage
increases more quickly as the bitline voltage becomes higher.
This is shown in Figure 3 where the bitline voltage ramps up
faster and faster during the initial part of the amplification
phase. Importantly, if the bitline has a higher voltage to begin with (at the start of sense-amplification), then the positive-feedback is able to set in more quickly. Such a high initial voltage is comfortably achieved by typical cells at typical temperatures because they donate a large amount of charge to the bitline during the charge-sharing phase (as they have a large amount of charge). As a result, they are able to reach states ③ and ④ (Figure 2) more quickly, creating the opportunity to shorten tRCD and tRAS.

¹Using 55nm DRAM parameters [55, 69], we simulate the voltage and current of the DRAM cells, sense-amplifiers, and bitline equalizers (for precharging the bitline). To be technology-independent, we model the DRAM circuitry using NMOS and PMOS transistors that obey the well-known MOSFET equation for current-voltage (SPICE) [56]. We do not model secondary effects.
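Observation 1's positive-feedback loop can be approximated as regenerative exponential growth of the bitline differential (dV/dt ∝ V), under which sensing time scales with the logarithm of the initial signal. A minimal sketch, with an assumed time constant and assumed voltages:

```python
import math

# Regenerative sensing approximated as exponential growth of the bitline
# differential: dV/dt = V / tau, so V(t) = V0 * exp(t / tau), and the time
# to amplify an initial signal V0 up to a target is t = tau * ln(target/V0).
# tau and the voltage values are illustrative assumptions.
TAU = 2.0  # ns, assumed regeneration time constant

def sensing_time(v0, v_target=0.75):
    """Time (ns) for the sense amplifier to grow v0 up to v_target volts."""
    return TAU * math.log(v_target / v0)

# A typical cell donating a large charge (150 mV initial differential) is
# sensed faster than a weak, leaky cell (50 mV):
t_typical = sensing_time(0.150)
t_weak = sensing_time(0.050)
print(f"typical: {t_typical:.2f} ns, weak: {t_weak:.2f} ns")
```

In this model the weak cell pays an extra TAU·ln(3) of latency, which mirrors the paper's point: a higher starting voltage lets the feedback "set in" sooner.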
Observation 2. At the end of the sense-amplification phase,
nearly half the time (42%) is spent on injecting the last 5% of
the charge into the cell. Thanks to the positive-feedback, the
middle part of the amplification phase (the part between ① and ② in Figure 3) is able to increase the bitline voltage quickly. However, during the later part of amplification (part ② in Figure 3), the RC-delay becomes much more dominant, which prevents the bitline voltage from increasing as quickly. In fact, it takes a significant amount of extra delay for the bitline voltage to reach VDD (Figure 3) that is required to fully charge the cell. However, for typical cells at typical temperatures, such an extra delay may not be needed: the cells could already be injected with enough charge for them to comfortably share with the bitline when they are next accessed. This allows us to shorten the later part of the amplification phase, creating the opportunity to shorten tRAS and tWR.
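The disproportionate cost of the final few percent follows from first-order RC charging, where the time to reach a fraction f of VDD is τ·ln(1/(1−f)). The sketch below illustrates the effect; the exact split depends on what voltage counts as "fully charged" (99.9% is an assumption here), so it does not reproduce the paper's 42% figure exactly.

```python
import math

# First-order RC charging: V(t) = VDD * (1 - exp(-t / tau)). The time to
# reach a fraction f of VDD is t(f) = tau * ln(1 / (1 - f)). tau is
# normalized to 1; the 99.9% "fully charged" threshold is an assumption.
def t_reach(f, tau=1.0):
    return tau * math.log(1.0 / (1.0 - f))

t_95 = t_reach(0.95)      # ~3.0 tau to reach 95% of VDD
t_full = t_reach(0.999)   # ~6.9 tau to reach 99.9% ("fully" charged)
tail_fraction = (t_full - t_95) / t_full

print(f"time to 95%:   {t_95:.2f} tau")
print(f"time to 99.9%: {t_full:.2f} tau")
print(f"share of total time spent on the last 5%: {tail_fraction:.0%}")
```

Under this simple model, over half the charging time is spent on the last 5% of the charge — the same qualitative behavior the paper measures for the tail of sense amplification and precharging.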
Observation 3. At the end of the precharging phase, nearly
half the time (45%) is spent on extracting the last 5% of the
charge from the bitline. Similar to the amplification phase, the
later part of the precharging phase is also dominated by the
RC-delay, which causes the bitline voltage to decrease slowly
to ½VDD (part ③ in Figure 3). If we decide to incur less than the full delay required for the bitline voltage to reach exactly ½VDD, it could lead to two different outcomes depending on which cell we access next. First, if we access the same cell again, then the higher voltage left on the bitline works in our favor. This is because the cell, which is filled with charge, would have increased the bitline voltage anyway during the charge-sharing phase. Second, if we access a different cell connected to the same bitline, then the higher voltage left on the bitline may work as a handicap. Specifically, this happens only when the cell is devoid of any charge (e.g., storing a data of '0'). For such a cell, its charge-sharing phase operates in the opposite direction, where the cell steals some charge away from the bitline to decrease the bitline voltage. Subsequently, the voltage is "amplified" to 0 instead of VDD. Nevertheless, typical cells at typical temperatures are capable of comfortably overcoming the handicap: thanks to their large capacitance, the cells are able to steal a large amount of charge from the bitline. As a result, this creates the opportunity to shorten tRP.
4. Charge Gap: Common-Case vs. Worst-Case
Based on the three observations, we understand that timing pa-
rameters can be shortened if the cells have enough charge. Im-
portantly, we showed that such a criterion is easily satisfied for
typical cells at typical temperatures. In this section, we ex-
plain what it means for a cell to be “typical” and why it has
more charge at “typical” temperatures. Specifically, we exam-
ine two physical phenomena that critically impact a DRAM
cell’s ability to receive and retain charge: (i) process variation
and (ii) temperature dependence.
4.1. Process Variation: Cells Are Not Created Equal
Process variation is a well-known phenomenon that introduces
deviations between a chip’s intended design and its actual im-
plementation [13, 37, 60]. DRAM cells are affected by pro-
cess variation in two major aspects: (i) cell capacitance and
(ii) cell resistance. Although every cell is designed to have a
large capacitance (to hold more charge) and a small resistance
(to facilitate the flow of charge), some deviant cells may not be
manufactured in such a manner [15, 26, 29, 30, 38, 41, 42]. In
Figure 4a, we illustrate the impact of process variation using
two different cells: one is a typical cell conforming to design
(left column) and the other is the worst-case cell deviating the
most from design (right column).
As we see from Figure 4a, the worst-case cell contains less
charge than the typical cell in state Í (Restored state, as was
shown in Figure 2). This is because of two reasons. First, due
to its large resistance, the worst-case cell cannot allow charge
to flow inside quickly. Second, due to its small capacitance,
the worst-case cell cannot store much charge even when it is
full. To accommodate such a worst-case cell, existing timing
parameters are set to a large value. However, worst-case cells
are relatively rare. When we analyzed 115 modules, the over-
whelming majority of them had significantly more charge than
what is necessary for correct operation (Section 7 will provide
more details).
4.2. Temperature Dependence: Hot Cells Are Leakier
Temperature dependence is a well-known phenomenon in
which cells leak charge at almost double the rate for every 10°C increase in temperature [29, 41, 48, 57, 74]. In Figure 4a, we illustrate the impact of temperature dependence using two cells at two different temperatures: (i) the typical temperature (55°C, bottom row), and (ii) the worst-case temperature (85°C, top row) supported by DRAM standards.
As we see from the figure, both typical and worst-case
cells leak charge at a faster rate at the worst-case tempera-
ture. Therefore, not only does the worst-case cell have less
charge to begin with, but it is left with even less charge be-
cause it leaks charge at a faster rate (top-right in Figure 4a).
To accommodate the combined effect of process variation and
temperature dependence, existing timing parameters are set to
a very large value. However, most systems do not operate at
85°C [11, 12, 43].² We measured the DRAM ambient temperature in a server cluster running a memory-intensive benchmark, and found that the temperature never exceeds 34°C and never changes by more than 0.1°C per second. We show
²Figure 22 in [12] and Figure 3 in [43] show that the maximum temperature of DRAM chips at the highest CPU utilization is 60–65°C. While some prior works claim a maximum DRAM temperature over 80°C [76], each DIMM in their system dissipates 15W of power. This is very aggressive nowadays: modern DIMMs typically dissipate around 2–6W (see Figure 8 of [17]; the 2-rank configuration is the same as the DIMM configuration of [76]). We believe that continued voltage scaling and increased energy efficiency of DRAM have helped reduce DIMM power consumption. While old DDR1/DDR2 chips use 1.8–3.0V power supplies, newer DDR3/DDR4 chips use only 1.2–1.5V. In addition, newer DRAMs adopt more power-saving techniques (e.g., temperature-compensated self refresh and power-down modes [21, 46]) that were previously used only by Low-Power DRAM (LPDDR). Furthermore, many previous works [9, 39, 40, 43, 76] propose hardware/software mechanisms to maintain a low DRAM temperature and energy.

Figure 4: Effect of Reduced Latency: Typical vs. Worst (Darker Background means Less Reliable) — (a) Existing DRAM: a typical cell (Rlow, small leakage) and the worst cell (Rhigh, large leakage and unfilled charge) shown at both the typical and the worst-case temperature; (b) Our Proposal (Adaptive-Latency DRAM): the three non-worst-case cells are intentionally left partially unfilled (by design), exploiting their charge slack.
this in Figure 5, which plots the temperature for a 24-hour pe-
riod (left) and also zooms in on a 2-hour period (right). In ad-
dition, we repeated the measurement on a desktop system that
is not as well cooled as the server cluster. As Figure 6 shows,
even when the CPU was utilized at 100% and the DRAM band-
width was utilized at 40%, the DRAM ambient temperature
never exceeded 50°C. Other works [11, 12, 43] report similar results, as explained in detail in Footnote 2. From this, we conclude that the majority of DRAM modules are likely to operate at temperatures that are much lower than 85°C, which slows down the charge leakage by an order of magnitude or more compared to the worst-case temperature.
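The doubling-per-10°C rule from Section 4.2 implies that a cell at a typical temperature leaks at a small fraction of the worst-case rate, which can be worked out directly. A minimal sketch of that arithmetic:

```python
# Leakage roughly doubles for every 10 degC increase (Section 4.2), so the
# leakage rate relative to the 85 degC worst case is 2 ** ((T - 85) / 10).
def relative_leakage(temp_c, worst_case_c=85.0):
    return 2.0 ** ((temp_c - worst_case_c) / 10.0)

for t in (85, 55, 34):
    print(f"{t} degC: {relative_leakage(t):.3f}x the worst-case leakage rate")

# At 55 degC a cell leaks at 1/8 the worst-case rate, and at the 34 degC
# measured in the server cluster at under 1/32 of it -- so cells retain far
# more charge at the moment they are accessed. This is the slack AL-DRAM
# exploits.
```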
Figure 5: DRAM Temperature in a Server Cluster — temperature remains between 28°C and 34°C over a 24-hour period (left) and a zoomed-in 2-hour window (right).
Figure 6: DRAM Temperature in a Desktop System — DIMM temperature stays between 35°C and 50°C over 6 hours, plotted alongside CPU utilization and memory channel utilization (0–100%).
4.3. Reliable Operation with Shortened Timing
As explained in Section 3, the amount of charge in state ① (i.e., the precharged state in Figure 2) plays a critical role in whether the correct data is retrieved from a cell. That is why the worst-case condition for correctness is the top-right of Figure 4a, which shows the least amount of charge stored in the worst-case cell at the worst-case temperature in state ①. However, DRAM manufacturers provide reliability guarantees even
for such worst-case conditions. In other words, the amount of
charge at the worst-case condition is still greater than what is
required for correctness.
If we were to shorten the timing parameters, we would also
be reducing the charge stored in the cells. It is important to
note, however, that we are proposing to exploit only the addi-
tional slack (in terms of charge) compared to the worst-case.
This allows us to provide as strong of a reliability guarantee as
the worst-case.
In Figure 4b, we illustrate the impact of shortening the tim-
ing parameters in three of the four different cases: two differ-
ent cells at two different temperatures. The lightened portions
inside the cells represent the amount of charge that we are giv-
ing up by using the shortened timing parameters. Note that
we are not giving up any charge for the worst-case cell at the
worst-case temperature. Although the other three cells are not
fully charged in state ④, when they eventually reach state ①,
they are left with a similar amount of charge as the worst-case
(top-right). This is because these cells are capable of either
holding more charge (typical cell, left column) or holding their
charge longer (typical temperature, bottom row). Therefore,
optimizing the timing parameters (based on the amount of ex-
isting slack) provides the opportunity to reduce overall DRAM
latency while still maintaining the reliability guarantees pro-
vided by the DRAM manufacturers.
In Section 7, we present the results from our characteriza-
tion study where we quantify the slack in 115 DRAM modules.
Before we do so, we first propose our mechanism for identify-
ing and enforcing the shortened timing parameters.
5. Adaptive-Latency DRAM
Our mechanism, Adaptive-Latency DRAM (AL-DRAM), al-
lows the memory controller to exploit the latency variation
across DRAM modules (DIMMs) at different operating tem-
peratures by using customized (aggressive) timing parame-
ters for each DIMM/temperature combination. Our mecha-
nism consists of two steps: (i) identification of the best tim-
ing parameters for each DIMM/temperature, and (ii) enforce-
ment, wherein the memory controller dynamically extracts
each DIMM’s operating temperature and uses the best timing
parameters for each DIMM/temperature combination.
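The enforcement step can be sketched as a lookup from (DIMM, temperature bin) to a profiled timing set, falling back to the standard worst-case values when no profile applies. The table contents and identifiers below are hypothetical illustrations, not the paper's measured numbers.

```python
# Sketch of AL-DRAM's enforcement step: the controller picks timing
# parameters per (DIMM, temperature bin), falling back to the standard
# worst-case values. Table contents are illustrative assumptions.
STANDARD = {"tRCD": 13.75, "tRAS": 35.0, "tWR": 15.0, "tRP": 13.75}  # ns

# Hypothetical per-DIMM profile, as would be provided by the manufacturer
# after testing (Section 5.1). Keys are (dimm_id, max_temp_bin_celsius).
PROFILE = {
    ("dimm0", 55): {"tRCD": 11.25, "tRAS": 22.5, "tWR": 7.5, "tRP": 9.0},
    ("dimm0", 70): {"tRCD": 12.5, "tRAS": 28.0, "tWR": 11.0, "tRP": 11.5},
}

def timing_for(dimm_id, temp_c):
    """Most aggressive profiled timings that are safe at this temperature."""
    candidates = [t for (d, bin_c), t in PROFILE.items()
                  if d == dimm_id and temp_c <= bin_c]
    if not candidates:
        return STANDARD  # unknown DIMM, or hotter than any bin: worst-case
    return min(candidates, key=lambda t: t["tRAS"])

print(timing_for("dimm0", 45))  # cool enough for the most aggressive set
print(timing_for("dimm0", 60))  # warmer: falls back to the 70 degC set
print(timing_for("dimm0", 80))  # hotter than any bin: standard timings
```

Falling back to `STANDARD` whenever the profile does not cover the current condition is what lets such a scheme preserve the manufacturer's worst-case reliability guarantee.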
5.1. Identifying the Best Timing Parameters
Identifying the best timing parameters for each DIMM at dif-
ferent temperatures is the more challenging of the two steps.
We propose that DRAM manufacturers identify the best timing
parameters at different temperatures for each DRAM chip dur-
ing the testing phase and provide that information along with
the DIMM in the form of a simple table. Since our proposal
only involves changing four timing parameters (tRCD, tRAS,
