
Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case
Donghyuk Lee Yoongu Kim Gennady Pekhimenko
Samira Khan Vivek Seshadri Kevin Chang Onur Mutlu
Carnegie Mellon University
{donghyu1, yoongukim, gpekhime, samirakhan, visesh, kevincha, onur}@cmu.edu
Abstract
In current systems, memory accesses to a DRAM chip must
obey a set of minimum latency restrictions specified in the
DRAM standard. Such timing parameters exist to guarantee re-
liable operation. When deciding the timing parameters, DRAM
manufacturers incorporate a very large margin as a provision
against two worst-case scenarios. First, due to process varia-
tion, some outlier chips are much slower than others and can-
not be operated as fast. Second, chips become slower at higher
temperatures, and all chips need to operate reliably at the highest supported (i.e., worst-case) DRAM temperature (85°C). In this paper, we show that typical DRAM chips operating at typical temperatures (e.g., 55°C) are capable of providing a much
smaller access latency, but are nevertheless forced to operate
at the largest latency of the worst-case.
Our goal in this paper is to exploit the extra margin that
is built into the DRAM timing parameters to improve perfor-
mance. Using an FPGA-based testing platform, we first char-
acterize the extra margin for 115 DRAM modules from three
major manufacturers. Our results demonstrate that it is possi-
ble to reduce four of the most critical timing parameters by
a minimum/maximum of 17.3%/54.8% at 55°C without sacrificing correctness. Based on this characterization, we propose Adaptive-Latency DRAM (AL-DRAM), a mechanism that
adaptively reduces the timing parameters for DRAM modules
based on the current operating condition. AL-DRAM does not
require any changes to the DRAM chip or its interface.
We evaluate AL-DRAM on a real system that allows us to re-
configure the timing parameters at runtime. We show that AL-
DRAM improves the performance of memory-intensive work-
loads by an average of 14% without introducing any errors.
We discuss and show why AL-DRAM does not compromise re-
liability. We conclude that dynamically optimizing the DRAM
timing parameters can reliably improve system performance.
1. Introduction
A DRAM chip is made of capacitor-based cells that represent
data in the form of electrical charge. To store data in a cell,
charge is injected, whereas to retrieve data from a cell, charge
is extracted. Such movement of charge is not only the center-
piece of DRAM operation, but also the bottleneck of DRAM
latency [36, 66]. This is due to two fundamental reasons. First,
when injecting charge into a cell, a wire called the bitline
through which the charge is delivered impedes the flow of
charge [36, 66]. Owing to the large resistance and the large ca-
pacitance of the bitline, the cell experiences a large RC-delay,
which increases the time it takes for the cell to become fully
charged. Second, when extracting charge from a cell, the cell
is incapable of mobilizing a strong flow of charge out of itself
and into the bitline [36, 66]. Limited by the finite amount of
charge stored in its small capacitor, the cell has an inherently
weak charge-drive, which is further weakened as the cell loses
more of its charge to the bitline. As a result, the cell cannot
charge the bitline quickly (or even fully).
When a DRAM chip is accessed, it requires a certain
amount of time before enough charge can move into the cell (or
the bitline) for the data to be reliably stored (or retrieved). To
guarantee this behavior, DRAM manufacturers impose a set of
minimum latency restrictions on DRAM accesses, called tim-
ing parameters [25]. Ideally, timing parameters should pro-
vide just enough time for a DRAM chip to operate correctly.
In practice, however, DRAM manufacturers pessimistically in-
corporate a very large margin into their timing parameters to
ensure correct operation under worst-case conditions. This is
because of two major concerns. First, due to process varia-
tion, some outlier cells suffer from a larger RC-delay than other
cells, and require more time to be charged. For example, an
outlier cell could have a very narrow connection (i.e., contact)
to the bitline, which constricts the flow of charge and increases
the RC-delay [37]. Second, due to temperature dependence,
all cells suffer from a weaker charge-drive at high tempera-
tures, and require more time to charge the bitline. DRAM cells
are intrinsically leaky, and lose some of their charge even when
they are not being accessed. At high temperatures, this leakage
is accelerated exponentially [29, 41, 48, 57, 74], leaving a cell
with less charge to drive the bitline when the cell is accessed, increasing the time it takes for the bitline to be charged.
Consequently, timing parameters prescribed by the DRAM
manufacturers are dictated by the worst-case cells (the slowest cells) operating under the worst-case conditions (the highest temperature of 85°C [25]). Such pessimism on the part of
the DRAM manufacturers is motivated by their desire to (i) in-
crease chip yield and (ii) reduce chip testing time. The man-
ufacturers, in turn, are driven by the extremely cost-sensitive
nature of the DRAM market, which encourages them to adopt
pessimistic timing parameters rather than to (i) discard chips
with the slowest cells or (ii) test chips at lower temperatures.
Ultimately, the burden of pessimism is passed on to the end-
users, who are forced to endure much greater latencies than
what is actually needed for reliable operation under common-
case conditions.
In this paper, we first characterize 115 DRAM modules
from three manufacturers to expose the excessive margin that

is built into their timing parameters. Using an FPGA-based
testing platform [29, 31, 41], we then demonstrate that DRAM
timing parameters can be shortened to reduce DRAM latency
without sacrificing any observed degree of DRAM reliability.
We are able to reduce latency by taking advantage of the two
large gaps between the worst-case and the "common-case."
First, most DRAM chips are not exposed to the worst-case temperature of 85°C: according to previous studies [11, 12, 43] and our own measurements (Section 4.2), the ambient temperature around a DRAM chip is typically less than 55°C. Second, most
DRAM chips do not contain the worst-case cell with the largest
latency: the slowest cell for a typical chip is still faster than that
of the worst-case chip (Section 7).
Based on our characterization, we propose Adaptive-
Latency DRAM (AL-DRAM), a mechanism that dynamically
optimizes the timing parameters for different modules at dif-
ferent temperatures. AL-DRAM exploits the additional charge
slack present in the common-case compared to the worst-case,
thereby preserving the level of reliability (at least as high as the
worst-case) provided by DRAM manufacturers. We evaluate
AL-DRAM on a real system [5, 6] that allows us to dynami-
cally reconfigure the timing parameters at runtime. We show
that AL-DRAM improves the performance of a wide variety of
memory-intensive workloads by 14.0% (on average) without
introducing errors. Therefore, we conclude that AL-DRAM
improves system performance while maintaining memory cor-
rectness and without requiring changes to DRAM chips or the
DRAM interface.
This paper makes the following contributions:
• We provide a detailed analysis of why we can reduce
DRAM timing parameters without sacrificing reliability.
We show that the latency of a DRAM access depends on
how quickly charge moves into or out of a cell. Compared
to the worst-case cell operating at the worst-case temperature (85°C), a typical cell at a typical temperature allows
much faster movement of charge, leading to shorter access
latency. This enables the opportunity to reduce timing pa-
rameters without introducing errors.
• Using an FPGA-based testing platform, we profile 115
DRAM modules (from three manufacturers) and expose the
large margin built into their timing parameters. In particu-
lar, we identify four timing parameters that are the most
critical during a DRAM access: tRCD, tRAS, tWR, and
tRP. At 55°C, we demonstrate that the parameters can be
reduced by an average of 17.3%, 37.7%, 54.8%, and 35.2%
while still maintaining correctness. For some chips, the re-
ductions are as high as 27.3%, 42.8%, 66.7%, and 45.4%.
• We propose a practical mechanism, Adaptive-Latency
DRAM (AL-DRAM), to take advantage of the above obser-
vation. The key idea is to dynamically adjust the DRAM
timing parameters for each module based on its latency
characteristics and temperature so that the timing param-
eters are dynamically optimized for the current operating
condition. We show that the hardware cost of AL-DRAM
is very modest, with no changes to DRAM.
• We evaluate AL-DRAM on a real system [5, 6] running real
workloads by dynamically reconfiguring the timing param-
eters. For a wide variety of memory-intensive workloads,
AL-DRAM improves system performance by an average of
14.0% and a maximum of 20.5% without incurring errors.
2. DRAM Background
To understand the dominant sources of DRAM latency, we first
provide the necessary background on DRAM organization and
operation.
2.1. DRAM Organization
Figure 1a shows the internal organization of a DRAM subar-
ray [8, 34, 36, 62], which consists of a 2-D array of DRAM
cells connected to a single row of sense amplifiers (a row of
sense amplifiers is also referred to as a row buffer). The sense
amplifier is a component that essentially acts as a latch: it detects the data stored in the DRAM cell and latches on to the
corresponding data.
Figure 1b zooms in on the connection between a single
DRAM cell and its corresponding sense amplifier. Each cell
consists of (i) a capacitor that stores a bit of data in the form of
electrical charge, and (ii) an access transistor that determines
whether the cell is connected to the sense amplifier. The sense
amplifier consists of two cross-coupled inverters. The wire that
connects the cell to the sense amplifier is called the bitline,
whereas the wire that controls the access transistor is called the
wordline. Figure 1c depicts a simplified view of a cell as well
as its bitline and sense amplifier, in which electrical charge is
represented in gray. Switch ① represents the access transistor controlled by the wordline, and switch ② represents the on/off state of the sense amplifier.
Figure 1: DRAM Organization — (a) Subarray: a 2-D array of cells (rows and columns) whose wordlines connect to a shared row of sense-amplifiers; (b) Cell: capacitor and access transistor, connected through the bitline (with a reference bitline) to the sense-amplifier; (c) Simplified View: cell capacitor, bitline parasitic capacitor, and sense-amplifier.
2.2. DRAM Operation: Commands & Timing Constraints
As shown in Figure 2, a cell transitions through five different
states during each access. In the first state ①, which is called the precharged state, the cell is "fully" charged, while the bitline is only halfway charged (i.e., the bitline voltage is maintained at ½VDD). In practice, the cell is usually not completely charged because of a phenomenon called leakage, wherein the cell capacitor loses charge over time.
In order to access data from a cell, the DRAM controller is-
sues a command called ACTIVATE. Upon receiving this com-
mand, DRAM increases the wordline voltage, thereby connect-
ing the cell to the bitline. Since the cell is at a higher voltage
than the bitline, the cell then drives its charge into the bitline
until their voltages are equalized at ½VDD + δ. This is depicted in state ②, which is called charge-sharing.
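The charge-sharing step follows directly from charge conservation between the cell capacitor and the bitline's parasitic capacitor. The sketch below works this through numerically; the capacitance and voltage values are illustrative assumptions, not figures from the paper.

```python
# Charge-sharing as a capacitive divider: a full cell at VDD shares its
# charge with a bitline precharged to VDD/2. Capacitance values below are
# illustrative assumptions, not taken from the paper.
VDD = 1.5            # supply voltage (V), typical for DDR3
C_CELL = 25e-15      # cell capacitance (F), assumed
C_BITLINE = 100e-15  # bitline parasitic capacitance (F), assumed

def bitline_voltage_after_sharing(v_cell, c_cell=C_CELL, c_bl=C_BITLINE):
    """Voltage after charge conservation: total charge / total capacitance."""
    q_total = v_cell * c_cell + (VDD / 2) * c_bl
    return q_total / (c_cell + c_bl)

# A fully charged cell lifts the bitline above VDD/2 by delta:
v_full = bitline_voltage_after_sharing(VDD)
delta_full = v_full - VDD / 2

# A leaky cell that has lost 40% of its usable charge produces a smaller
# delta, giving the sense amplifier a weaker signal to latch onto:
v_leaky = bitline_voltage_after_sharing(VDD - 0.4 * (VDD / 2))
delta_leaky = v_leaky - VDD / 2

print(f"delta (full cell):  {delta_full * 1000:.0f} mV")
print(f"delta (leaky cell): {delta_leaky * 1000:.0f} mV")
```

With these assumed values, the full cell yields a 150 mV swing and the leaky cell only 90 mV, which is why the amount of charge left in the cell at access time matters for latency.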
Figure 2: DRAM Operations, Commands and Parameters — timeline of the ACTIVATE (ACT), READ/WRITE (RD/WR), and PRECHARGE (PRE) commands with timing parameters tRCD (13.75ns), tRAS (35ns), and tRP (13.75ns), as the cell moves through the Precharged, Charge-Sharing, Sense-Amplification, Restored, and Precharged states.

Subsequently, the sense amplifier is enabled, which then senses and amplifies the difference between the bitline voltage and ½VDD. During this process, referred to as sensing and amplification, the sense amplifier drives the bitline voltage to VDD.
Since the cell is still connected to the bitline, this process also
injects charge into the cell. Midway through the sense amplification process (state ③), when the bitline reaches an intermediate voltage level (e.g., ¾VDD), data can be read out or written into the bitline. This is done by issuing a READ or WRITE command to the corresponding cell. The time taken to reach this state (③) after issuing the ACTIVATE is expressed as a timing parameter called tRCD.
After completing the sense amplification, the bitline voltage reaches VDD and the cell charge is fully restored (state ④). The time taken to reach this state after issuing the ACTIVATE is expressed as a timing parameter called tRAS. If there is a write operation, some additional time is required for the bitline and the cell to reach this state, which is expressed as a timing parameter called tWR.
Before we can access data from a different cell connected to
the same bitline, the sense amplifier must be taken back to the
precharged state. This is done by issuing a PRECHARGE com-
mand. Upon receiving this command, DRAM first decreases
the wordline voltage, thereby disconnecting the cell from the
bitline. Next, DRAM disables the sense amplifier and drives the bitline back to a voltage of ½VDD (state ⑤). The time taken for the precharge operation is expressed as a timing parameter called tRP.
At state ⑤, note that the cell is completely filled with charge. Subsequently, however, the cell slowly loses some of its charge until the next access (cycling back to state ①). The length of
time for which the cell can reliably hold its charge is called the
cell’s retention time. If the cell is not accessed for a long time,
it may lose enough charge to invert its stored data, resulting in
an error. To avoid data corruption, DRAM refreshes the charge
in all of its cells at a regular interval, called the refresh interval.
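The command sequence and timing constraints above can be sketched as a toy bank model that refuses to issue a command before the governing parameter has elapsed. The tRCD/tRAS/tRP values are the ones quoted in Figure 2; tWR and all class/method names are assumptions for illustration, not a model of any real controller.

```python
# Minimal sketch of a DRAM bank obeying the timing parameters from
# Section 2.2. tRCD/tRAS/tRP are the values quoted in Figure 2 (in ns);
# tWR is an assumed DDR3-like value.
TIMING = {"tRCD": 13.75, "tRAS": 35.0, "tRP": 13.75, "tWR": 15.0}

class Bank:
    def __init__(self):
        self.state = "precharged"
        self.t_activate = None   # time of the last ACTIVATE
        self.t_precharge = None  # time of the last PRECHARGE
        self.t_write = None      # time of the last WRITE

    def activate(self, now):
        assert self.state == "precharged", "bank must be precharged"
        if self.t_precharge is not None:
            assert now - self.t_precharge >= TIMING["tRP"], "tRP violated"
        self.state, self.t_activate, self.t_write = "activated", now, None

    def read(self, now):
        # READ/WRITE are legal only tRCD after ACTIVATE (state 3, Figure 2)
        assert self.state == "activated", "bank must be activated"
        assert now - self.t_activate >= TIMING["tRCD"], "tRCD violated"

    def write(self, now):
        self.read(now)           # the same tRCD constraint applies
        self.t_write = now

    def precharge(self, now):
        # PRECHARGE is legal only once the cell is restored: tRAS after
        # ACTIVATE, and tWR after the last WRITE (if any).
        assert self.state == "activated", "bank must be activated"
        assert now - self.t_activate >= TIMING["tRAS"], "tRAS violated"
        if self.t_write is not None:
            assert now - self.t_write >= TIMING["tWR"], "tWR violated"
        self.state, self.t_precharge = "precharged", now

bank = Bank()
bank.activate(0.0)
bank.read(13.75)      # earliest legal READ
bank.precharge(35.0)  # earliest legal PRECHARGE
bank.activate(48.75)  # tRAS + tRP: the 48.75 ns cycle noted in Section 3
```

AL-DRAM's proposal amounts to shrinking the entries of a table like `TIMING` when the module and temperature permit, without touching the state machine itself.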
3. Charge & Latency Interdependence
As we explained, the operation of a DRAM cell is governed by
two important concepts: (i) the quantity of charge and (ii) the
latency it takes to move charge. These two concepts are closely
related to each other: one cannot be adjusted without affecting the other. To establish a more quantitative relationship between charge and latency, Figure 3 presents the voltage of a cell and its bitline as they cycle through the precharged state, charge-sharing state, sense-amplification state, restored state, and back to the precharged state (Section 2).¹ This curve is typical in DRAM operation, as also shown in prior works [14, 27, 36, 66]. The timeline starts with an ACTIVATE
at 0 ns and ends with the completion of PRECHARGE at
48.75 ns. From the figure, we identify three specific periods
in time when the voltage changes slowly: (i) start of sense-amplification (part ①), (ii) end of sense-amplification (part ②), and (iii) end of precharging (part ③). Since charge is correlated with voltage, these three periods are when the charge also moves slowly. In the following, we provide three observations explaining why these three periods can be shortened for typical cells at typical temperatures, offering the best opportunity for shortening the timing parameters.
Figure 3: Phases of DRAM Voltage Levels — simulated cell voltage (VCell), bitline voltage (VBL), and reference bitline voltage (VBLref) over 0–50 ns, spanning the charge-sharing, sense-amplification (to 95% charged), and precharge (to 95% precharged) phases, annotated with tRCD, tRAS, and tRP.
Observation 1. At the start of the sense-amplification phase,
the higher the bitline voltage, the quicker the sense-amplifier is
jump-started. Just as the amplification phase starts, the sense-
amplifier detects the bitline voltage that was increased in the
previous charge-sharing phase (by the cell donating its charge
to the bitline). The sense-amplifier then begins to inject more
charge into the bitline to increase the voltage even further, triggering a positive-feedback loop where the bitline voltage
increases more quickly as the bitline voltage becomes higher.
This is shown in Figure 3 where the bitline voltage ramps up
faster and faster during the initial part of the amplification
phase. Importantly, if the bitline has a higher voltage to begin with (at the start of sense-amplification), then the positive-feedback is able to set in more quickly. Such a high initial voltage is comfortably achieved by typical cells at typical temperatures because they donate a large amount of charge to the bitline during the charge-sharing phase (as they have a large amount of charge). As a result, they are able to reach states ③ and ④ (Figure 2) more quickly, creating the opportunity to shorten tRCD and tRAS.

¹Using 55nm DRAM parameters [55, 69], we simulate the voltage and current of the DRAM cells, sense-amplifiers, and bitline equalizers (for precharging the bitline). To be technology-independent, we model the DRAM circuitry using NMOS and PMOS transistors that obey the well-known MOSFET equation for current-voltage (SPICE) [56]. We do not model secondary effects.
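Observation 1's positive-feedback loop can be approximated as regenerative exponential growth of the bitline differential (dV/dt ∝ V), under which sensing time scales with the logarithm of the initial signal. A minimal sketch, with an assumed time constant and assumed voltages:

```python
import math

# Regenerative sensing approximated as exponential growth of the bitline
# differential: dV/dt = V / tau, so V(t) = V0 * exp(t / tau), and the time
# to amplify an initial signal V0 up to a target is t = tau * ln(target/V0).
# tau and the voltage values are illustrative assumptions.
TAU = 2.0  # ns, assumed regeneration time constant

def sensing_time(v0, v_target=0.75):
    """Time (ns) for the sense amplifier to grow v0 up to v_target volts."""
    return TAU * math.log(v_target / v0)

# A typical cell donating a large charge (150 mV initial differential) is
# sensed faster than a weak, leaky cell (50 mV):
t_typical = sensing_time(0.150)
t_weak = sensing_time(0.050)
print(f"typical: {t_typical:.2f} ns, weak: {t_weak:.2f} ns")
```

In this model the weak cell pays an extra TAU·ln(3) of latency, which mirrors the paper's point: a higher starting voltage lets the feedback "set in" sooner.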
Observation 2. At the end of the sense-amplification phase,
nearly half the time (42%) is spent on injecting the last 5% of
the charge into the cell. Thanks to the positive-feedback, the
middle part of the amplification phase (the part between ① and ② in Figure 3) is able to increase the bitline voltage quickly. However, during the later part of amplification (part ② in Figure 3), the RC-delay becomes much more dominant, which prevents the bitline voltage from increasing as quickly. In fact, it takes a significant amount of extra delay for the bitline voltage to reach VDD (Figure 3) that is required to fully charge the cell. However, for typical cells at typical temperatures, such an extra delay may not be needed: the cells could already be injected with enough charge for them to comfortably share with the bitline when they are next accessed. This allows us to shorten the later part of the amplification phase, creating the opportunity to shorten tRAS and tWR.
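The disproportionate cost of the final few percent follows from first-order RC charging, where the time to reach a fraction f of VDD is τ·ln(1/(1−f)). The sketch below illustrates the effect; the exact split depends on what voltage counts as "fully charged" (99.9% is an assumption here), so it does not reproduce the paper's 42% figure exactly.

```python
import math

# First-order RC charging: V(t) = VDD * (1 - exp(-t / tau)). The time to
# reach a fraction f of VDD is t(f) = tau * ln(1 / (1 - f)). tau is
# normalized to 1; the 99.9% "fully charged" threshold is an assumption.
def t_reach(f, tau=1.0):
    return tau * math.log(1.0 / (1.0 - f))

t_95 = t_reach(0.95)      # ~3.0 tau to reach 95% of VDD
t_full = t_reach(0.999)   # ~6.9 tau to reach 99.9% ("fully" charged)
tail_fraction = (t_full - t_95) / t_full

print(f"time to 95%:   {t_95:.2f} tau")
print(f"time to 99.9%: {t_full:.2f} tau")
print(f"share of total time spent on the last 5%: {tail_fraction:.0%}")
```

Under this simple model, over half the charging time is spent on the last 5% of the charge — the same qualitative behavior the paper measures for the tail of sense amplification and precharging.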
Observation 3. At the end of the precharging phase, nearly
half the time (45%) is spent on extracting the last 5% of the
charge from the bitline. Similar to the amplification phase, the
later part of the precharging phase is also dominated by the
RC-delay, which causes the bitline voltage to decrease slowly
to ½VDD (part ③ in Figure 3). If we decide to incur less than the full delay required for the bitline voltage to reach exactly ½VDD, it could lead to two different outcomes depending on which cell we access next. First, if we access the same cell again, then the higher voltage left on the bitline works in our favor. This is because the cell, which is filled with charge, would have increased the bitline voltage anyway during the charge-sharing phase. Second, if we access a different cell connected to the same bitline, then the higher voltage left on the bitline may work as a handicap. Specifically, this happens only when the cell is devoid of any charge (e.g., storing a data of '0'). For such a cell, its charge-sharing phase operates in the opposite direction, where the cell steals some charge away from the bitline to decrease the bitline voltage. Subsequently, the voltage is "amplified" to 0 instead of VDD. Nevertheless, typical cells at typical temperatures are capable of comfortably overcoming the handicap: thanks to their large capacitance, the cells are able to steal a large amount of charge from the bitline. As a result, this creates the opportunity to shorten tRP.
4. Charge Gap: Common-Case vs. Worst-Case
Based on the three observations, we understand that timing pa-
rameters can be shortened if the cells have enough charge. Im-
portantly, we showed that such a criterion is easily satisfied for
typical cells at typical temperatures. In this section, we ex-
plain what it means for a cell to be “typical” and why it has
more charge at “typical” temperatures. Specifically, we exam-
ine two physical phenomena that critically impact a DRAM
cell’s ability to receive and retain charge: (i) process variation
and (ii) temperature dependence.
4.1. Process Variation: Cells Are Not Created Equal
Process variation is a well-known phenomenon that introduces
deviations between a chip’s intended design and its actual im-
plementation [13, 37, 60]. DRAM cells are affected by pro-
cess variation in two major aspects: (i) cell capacitance and
(ii) cell resistance. Although every cell is designed to have a
large capacitance (to hold more charge) and a small resistance
(to facilitate the flow of charge), some deviant cells may not be
manufactured in such a manner [15, 26, 29, 30, 38, 41, 42]. In
Figure 4a, we illustrate the impact of process variation using
two different cells: one is a typical cell conforming to design
(left column) and the other is the worst-case cell deviating the
most from design (right column).
As we see from Figure 4a, the worst-case cell contains less
charge than the typical cell in state Í (Restored state, as was
shown in Figure 2). This is because of two reasons. First, due
to its large resistance, the worst-case cell cannot allow charge
to flow inside quickly. Second, due to its small capacitance,
the worst-case cell cannot store much charge even when it is
full. To accommodate such a worst-case cell, existing timing
parameters are set to a large value. However, worst-case cells
are relatively rare. When we analyzed 115 modules, the over-
whelming majority of them had significantly more charge than
what is necessary for correct operation (Section 7 will provide
more details).
4.2. Temperature Dependence: Hot Cells Are Leakier
Temperature dependence is a well-known phenomenon in
which cells leak charge at almost double the rate for every 10°C increase in temperature [29, 41, 48, 57, 74]. In Figure 4a, we illustrate the impact of temperature dependence using two cells at two different temperatures: (i) the typical temperature (55°C, bottom row), and (ii) the worst-case temperature (85°C, top row) supported by DRAM standards.
As we see from the figure, both typical and worst-case
cells leak charge at a faster rate at the worst-case tempera-
ture. Therefore, not only does the worst-case cell have less
charge to begin with, but it is left with even less charge be-
cause it leaks charge at a faster rate (top-right in Figure 4a).
To accommodate the combined effect of process variation and
temperature dependence, existing timing parameters are set to
a very large value. However, most systems do not operate at
85°C [11, 12, 43].² We measured the DRAM ambient temperature in a server cluster running a memory-intensive benchmark, and found that the temperature never exceeds 34°C and never changes by more than 0.1°C per second. We show
²Figure 22 in [12] and Figure 3 in [43] show that the maximum temperature of DRAM chips at the highest CPU utilization is 60–65°C. While some prior works claim a maximum DRAM temperature over 80°C [76], each DIMM in their system dissipates 15W of power. This is very aggressive nowadays: modern DIMMs typically dissipate around 2–6W (see Figure 8 of [17]; the 2-rank configuration is the same as the DIMM configuration of [76]). We believe that continued voltage scaling and increased energy efficiency of DRAM have helped reduce DIMM power consumption. While old DDR1/DDR2 chips use 1.8–3.0V power supplies, newer DDR3/DDR4 chips use only 1.2–1.5V. In addition, newer DRAMs adopt more power-saving techniques (e.g., temperature-compensated self refresh and power-down modes [21, 46]) that were previously used only by Low-Power DRAM (LPDDR). Furthermore, many previous works [9, 39, 40, 43, 76] propose hardware/software mechanisms to maintain a low DRAM temperature and energy.

Figure 4: Effect of Reduced Latency: Typical vs. Worst (Darker Background means Less Reliable) — (a) Existing DRAM: a typical cell (Rlow, small leakage) and the worst cell (Rhigh, large leakage and unfilled charge) shown at both the typical and the worst-case temperature; (b) Our Proposal (Adaptive-Latency DRAM): the three non-worst-case cells are intentionally left partially unfilled (by design), exploiting their charge slack.
this in Figure 5, which plots the temperature for a 24-hour pe-
riod (left) and also zooms in on a 2-hour period (right). In ad-
dition, we repeated the measurement on a desktop system that
is not as well cooled as the server cluster. As Figure 6 shows,
even when the CPU was utilized at 100% and the DRAM band-
width was utilized at 40%, the DRAM ambient temperature
never exceeded 50°C. Other works [11, 12, 43] report similar results, as explained in detail in Footnote 2. From this, we conclude that the majority of DRAM modules are likely to operate at temperatures that are much lower than 85°C, which slows down the charge leakage by an order of magnitude or more compared to the worst-case temperature.
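The doubling-per-10°C rule from Section 4.2 implies that a cell at a typical temperature leaks at a small fraction of the worst-case rate, which can be worked out directly. A minimal sketch of that arithmetic:

```python
# Leakage roughly doubles for every 10 degC increase (Section 4.2), so the
# leakage rate relative to the 85 degC worst case is 2 ** ((T - 85) / 10).
def relative_leakage(temp_c, worst_case_c=85.0):
    return 2.0 ** ((temp_c - worst_case_c) / 10.0)

for t in (85, 55, 34):
    print(f"{t} degC: {relative_leakage(t):.3f}x the worst-case leakage rate")

# At 55 degC a cell leaks at 1/8 the worst-case rate, and at the 34 degC
# measured in the server cluster at under 1/32 of it -- so cells retain far
# more charge at the moment they are accessed. This is the slack AL-DRAM
# exploits.
```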
Figure 5: DRAM Temperature in a Server Cluster — temperature remains between 28°C and 34°C over a 24-hour period (left) and a zoomed-in 2-hour window (right).
Figure 6: DRAM Temperature in a Desktop System — DIMM temperature stays between 35°C and 50°C over 6 hours, plotted alongside CPU utilization and memory channel utilization (0–100%).
4.3. Reliable Operation with Shortened Timing
As explained in Section 3, the amount of charge in state ① (i.e., the precharged state in Figure 2) plays a critical role in whether the correct data is retrieved from a cell. That is why the worst-case condition for correctness is the top-right of Figure 4a, which shows the least amount of charge stored in the worst-case cell at the worst-case temperature in state ①. However, DRAM manufacturers provide reliability guarantees even
for such worst-case conditions. In other words, the amount of
charge at the worst-case condition is still greater than what is
required for correctness.
If we were to shorten the timing parameters, we would also
be reducing the charge stored in the cells. It is important to
note, however, that we are proposing to exploit only the addi-
tional slack (in terms of charge) compared to the worst-case.
This allows us to provide as strong of a reliability guarantee as
the worst-case.
In Figure 4b, we illustrate the impact of shortening the tim-
ing parameters in three of the four different cases: two differ-
ent cells at two different temperatures. The lightened portions
inside the cells represent the amount of charge that we are giv-
ing up by using the shortened timing parameters. Note that
we are not giving up any charge for the worst-case cell at the
worst-case temperature. Although the other three cells are not
fully charged in state ④, when they eventually reach state ①,
they are left with a similar amount of charge as the worst-case
(top-right). This is because these cells are capable of either
holding more charge (typical cell, left column) or holding their
charge longer (typical temperature, bottom row). Therefore,
optimizing the timing parameters (based on the amount of ex-
isting slack) provides the opportunity to reduce overall DRAM
latency while still maintaining the reliability guarantees pro-
vided by the DRAM manufacturers.
In Section 7, we present the results from our characteriza-
tion study where we quantify the slack in 115 DRAM modules.
Before we do so, we first propose our mechanism for identify-
ing and enforcing the shortened timing parameters.
5. Adaptive-Latency DRAM
Our mechanism, Adaptive-Latency DRAM (AL-DRAM), al-
lows the memory controller to exploit the latency variation
across DRAM modules (DIMMs) at different operating tem-
peratures by using customized (aggressive) timing parame-
ters for each DIMM/temperature combination. Our mecha-
nism consists of two steps: (i) identification of the best tim-
ing parameters for each DIMM/temperature, and (ii) enforce-
ment, wherein the memory controller dynamically extracts
each DIMM’s operating temperature and uses the best timing
parameters for each DIMM/temperature combination.
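The enforcement step can be sketched as a lookup from (DIMM, temperature bin) to a profiled timing set, falling back to the standard worst-case values when no profile applies. The table contents and identifiers below are hypothetical illustrations, not the paper's measured numbers.

```python
# Sketch of AL-DRAM's enforcement step: the controller picks timing
# parameters per (DIMM, temperature bin), falling back to the standard
# worst-case values. Table contents are illustrative assumptions.
STANDARD = {"tRCD": 13.75, "tRAS": 35.0, "tWR": 15.0, "tRP": 13.75}  # ns

# Hypothetical per-DIMM profile, as would be provided by the manufacturer
# after testing (Section 5.1). Keys are (dimm_id, max_temp_bin_celsius).
PROFILE = {
    ("dimm0", 55): {"tRCD": 11.25, "tRAS": 22.5, "tWR": 7.5, "tRP": 9.0},
    ("dimm0", 70): {"tRCD": 12.5, "tRAS": 28.0, "tWR": 11.0, "tRP": 11.5},
}

def timing_for(dimm_id, temp_c):
    """Most aggressive profiled timings that are safe at this temperature."""
    candidates = [t for (d, bin_c), t in PROFILE.items()
                  if d == dimm_id and temp_c <= bin_c]
    if not candidates:
        return STANDARD  # unknown DIMM, or hotter than any bin: worst-case
    return min(candidates, key=lambda t: t["tRAS"])

print(timing_for("dimm0", 45))  # cool enough for the most aggressive set
print(timing_for("dimm0", 60))  # warmer: falls back to the 70 degC set
print(timing_for("dimm0", 80))  # hotter than any bin: standard timings
```

Falling back to `STANDARD` whenever the profile does not cover the current condition is what lets such a scheme preserve the manufacturer's worst-case reliability guarantee.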
5.1. Identifying the Best Timing Parameters
Identifying the best timing parameters for each DIMM at dif-
ferent temperatures is the more challenging of the two steps.
We propose that DRAM manufacturers identify the best timing
parameters at different temperatures for each DRAM chip dur-
ing the testing phase and provide that information along with
the DIMM in the form of a simple table. Since our proposal
only involves changing four timing parameters (tRCD, tRAS,
