
Integration of STT-MRAM model into CACTI simulator

TL;DR: A system-level tool based on CACTI simulator is presented to assist memory system designers to generate high-performance and low-power cache memories comparing performance, energy consumption, and area with traditional SRAM.
Abstract: In the last decade, academia and private companies have actively explored emerging memory technologies. STT-MRAM in particular is experiencing rapid development, but it faces several challenges in terms of performance and reliability. Several cell-level techniques have been proposed to mitigate these issues, but few tools and methodologies currently exist to support designers in evaluating the impact of specific micro-level design choices on the STT-MRAM macro design. In this paper we present a system-level tool based on the CACTI simulator to assist memory system designers. We use our tool to generate high-performance and low-power cache memories, comparing performance, energy consumption, and area with traditional SRAM.

Summary

Introduction

  • Smullen et al. present a methodology and tool-chain for evaluating and comparing MTJs design [15].
  • CACTI is a widely used high-level cache and memory modeling tool [9] [10].
  • In order to prove the correctness of their tool, the authors generate STT-MRAM based cache memories with different sizes comparing the resulting performances with SRAM technology.
  • An overview about STT-MRAM technology in terms of operation principles and electrical model is given.

A. Basic Principles

  • STT-MRAM technology is built upon the magnetic tunneling junction (MTJ) device, which persistently stores logic data.
  • Commonly, an MTJ device is composed of two ferromagnetic layers (FLs) interleaved with one oxide barrier layer.
  • FLs are characterized by their magnetic orientation: one has a fixed magnetic orientation (fixed layer) and the other has a freely rotating magnetic orientation (free layer).
  • By applying a sufficiently dense current pulse through the MTJ device, the free layer magnetic direction can be dynamically switched.

B. Electrical Model

  • When the FLs exhibit the same magnetic orientation, the MTJ has a low electrical resistance, whereas MTJ experiences high electrical resistance in presence of antiparallel configuration.
  • According to the relative magnetic orientations of the two layers, the electrical resistance of the MTJ is different.
  • The most popular is the 1T-1MTJ whose structure is composed of one NMOS transistor and one MTJ device connected in series.

C. Writing Operation

  • Many device-related parameters (e.g., MTJ area, material properties) determine the write current amplitude required to change the magnetic direction of the free layer.
  • Moreover, the required current depends on the current pulse width.
  • Based on the trade-off between write current amplitude and write pulse width, three distinct switching modes were identified [12]: thermal activation (TH), precessional switching (PR), and dynamic reversal (DY) (Fig. 3).
  • Figure 3 shows that, in the precessional switching zone, small differences in write pulse width cause wide variations in current density.
  • In the thermal activation zone, by contrast, the required switching current decreases only slowly even when the current pulse width is increased dramatically.

D. Reading Operation

  • This current is, then, compared against a reference value (IREF) to discriminate the stored logic state.
  • It is worth noticing here that both reading currents used to discriminate the logic state have the same order of magnitude.
  • For this reason, a Sense Amplifier is commonly used to compare IR and IREF to determine the actual logic state of the cell.
  • Different circuit schemes can be implemented to generate the reference current.
  • One of the reference cells is in the parallel (low resistance) state while the other is in the antiparallel (high resistance) state.

E. Data Retention

  • One of the most important parameters characterizing storage-class memory devices is the amount of time for which information is reliably stored in a cell.
  • The data retention time of an STT-MRAM bit-cell depends on the thermal stability of the MTJ.
  • It is usually evaluated by Equation (5), $R_t = \tau_0 e^{\Delta}$. The dependence of the retention time on Δ is exponential: the higher the thermal stability, the longer the retention time.
  • Nevertheless, designing the MTJ for higher thermal stability results in higher write energy.

F. CACTI

  • CACTI is a widely used open-source high-level cache and memory modeling tool [13] [14] supported by HP Labs.
  • CACTI models both traditional and non-uniform banked caches and memories using SRAM and DRAM, for which it can compute delay, power, and area.
  • For a user-specified set of input parameters (e.g., energy/delay, memory size), the tool performs an exhaustive design space exploration across different array sizes and on-chip interconnections to identify an optimal configuration, if one exists, that meets the input constraints.
  • The authors' research work aims at extending CACTI to support in-plane STT-MRAM technology.
  • By modeling the bit-line, read circuitry, delay, area, and energy consumption, additional parameters are combined with the existing analytical models and seamlessly integrated into CACTI.

A. Array Modeling

  • By integrating analytical models along with parameters extracted from ITRS roadmaps [17], CACTI supports modeling of array of targeted cache or memory devices.
  • Each bank is composed of one or more subbanks, which consist of identical mats.
  • A mat has four subarrays that share pre-decoding logic, and each subarray contains a set of wordlines and bitlines to access the basic memory cells.
  • To support STT-MRAM technology, the authors mainly focus on mat and subarray.

C. Read Latency Model

  • In order to estimate read latency the authors model both the bitline and the sense amplifier (SA).
  • Nevertheless, CACTI currently only has models for voltage-based SAs.
  • The circuit involves two reference cells and three PMOS transistors that implement the current-to-voltage converter.
  • Interested readers can refer to [16], for further details.

D. Write Latency Model

  • The difference between read and write latency is significant in STT-MRAM memories.
  • Moreover, the required write voltage is between 1 and 2 volts, whereas a smaller bias voltage (0.1V ~ 0.3V) is needed for reading.
  • There is a strong dependence between the write voltage and the expected write latency.
  • Moreover, since CACTI does not provide a mechanism to input a distribution of logic values to be written, the authors only consider switching from parallel to anti-parallel magnetization of the free layer, which is the worst case in terms of latency.
  • However, this contribution alone is not sufficient to estimate the overall latency, since each MTJ is connected to an access transistor to mitigate write disturbs and reduce energy consumption.

E. Area Estimation Model

  • The area of STT-MRAM cell strongly depends on the design of the access transistor.
  • Determining the proper size of the access transistor is one of the most critical aspects of the cell design.
  • The analytical model integrated in CACTI for cell area estimation is given in the Equation (6).
  • The equivalent resistance of the access transistor and the cell area are inversely related: a high resistance corresponds to a small cell area and high storage density, whereas a low resistance considerably increases the memory area.
  • Interconnections also have a considerable impact on the resulting memory size.

F. Energy Estimation Model

  • For the sake of completeness, the authors consider the write and read energy models separately.
  • A lower read voltage reduces the probability of read disturbs, while a higher value favors read latency.
  • The computation of write energy can be divided into two main contributions.
  • In the write-energy equation, Vwrite is the write voltage, RMTJ is the equivalent MTJ resistance, Racc is the equivalent NMOS resistance, and τwrite is the MTJ switching time.
  • In the previous section, the authors described modeling and integration of in-plane STT-MRAM technology into CACTI tool.

A. High-Performance Cache Memories

  • For this study the authors generate high-performance, eight-way set-associative cache memories with no error correction mechanism, ranging in size from 32 kB to 512 kB.
  • Transistors are modeled using high-performance cells (itrs-hp) for the data array, the tag array, and the peripheral circuits.
  • Figure 4 (h) compares the read latency of the three different MTJ configurations with respect to SRAM.
  • This is due to its small cell area given by the high resistance of the access transistor.

B. Low-Power Cache Memories

  • Figure 4 (c) shows the read latency for low-power cache memories.
  • The observed trend is quite similar to the one previously described for Figure 4 (h).
  • The reason is that CACTI performs several optimizations, according to user constraints, that can change the internal partitioning of the array.
  • The density improvements that STT-MRAM arrays can attain over SRAM arrays make in-plane STT-MRAM a valid technology for designing low-power cache memories, compared to SRAM, when read-intensive applications are targeted (see Figure 4 (b)).


10 August 2022
Politecnico di Torino, Repository Istituzionale
Integration of STT-MRAM model into CACTI simulator / Indaco, M.; Di Carlo, Stefano; Vatajelu, E. I.; Prinetto, Paolo Ernesto; Arcaro, S.; Pala, D. Electronic. (2014), pp. 67-72. Paper presented at the 9th IEEE International Design and Test Symposium (IDT), held in Algiers, DZ, 16-18 Dec. 2014 [10.1109/IDT.2014.7038589].
Publisher: IEEE
DOI: 10.1109/IDT.2014.7038589
Terms of use: openAccess. This article is made available under the terms and conditions specified in the corresponding bibliographic description in the repository.
Availability: this version is available at 11583/2587977 since 2016-10-07T16:48:52Z

Integration of STT-MRAM model into CACTI
simulator
S. Arcaro, S. Di Carlo, M. Indaco, D. Pala, P. Prinetto, Elena I. Vatajelu
Politecnico di Torino
Dip. di Automatica e Informatica
Turin, Italy
{firstname.lastname}@polito.it
Abstract: In the last decade, academia and private companies have actively explored emerging memory technologies. STT-MRAM in particular is experiencing rapid development, but it faces several challenges in terms of performance and reliability. Several cell-level techniques have been proposed to mitigate these issues, but few tools and methodologies currently exist to support designers in evaluating the impact of specific micro-level design choices on the STT-MRAM macro design. In this paper we present a system-level tool based on the CACTI simulator to assist memory system designers. We use our tool to generate high-performance and low-power cache memories, comparing performance, energy consumption, and area with traditional SRAM.
Keywords: STT-MRAM, CACTI, Emerging Memories
I. INTRODUCTION
The focus of emerging memories is placed on non-volatile technologies that should meet the high demands of tomorrow's applications. These include non-volatility, high performance and high density similar to SRAMs and DRAMs respectively, good endurance, small device sizes, good integration, a low power profile, resistance to radiation effects, and the ability to scale below 20nm.
One of the most promising candidates for embedded memory is the spin-transfer torque magnetic random access memory (STT-MRAM) [1], which offers faster read and write access times (nanoseconds) and better CMOS integration than other proposed technologies such as Phase-Change RAM (PCRAM) [2], Resistive RAM (RRAM) [3], and Ferroelectric RAM (FeRAM) [4]. The key building block of the STT-MRAM cell is the magnetic tunneling junction (MTJ), which is integrated with CMOS circuitry using 3-D technology [5]. The smallest STT-MRAM cell design is a 1T1MTJ (one transistor, one magnetic tunneling junction) device. Logical data is stored by applying a spin-polarized current through the MTJ element to switch the memory state.
However, with scaling, the STT-MRAM cell faces a set of challenges that strongly influence performance and reliability, severely affecting the yield of the memory array. These issues are mainly related to a) process variations of MOS and MTJ devices, involving variations in geometry, threshold voltage, and magnetic materials [5], [6]; b) the high write cost due to the high switching current required to flip the MTJ state [7]; and c) thermal fluctuations in MTJ switching [8].
To tackle these issues, efficient cell-level design paradigms, from a circuit and/or architecture perspective, have been proposed to improve cell robustness and integration density. However, the results achieved for STT-MRAM cell design may not translate directly into meeting high-level design requirements.
It is of utmost importance to quantify and assess the performance degradation in terms of write/read latency, power consumption, and area that can potentially affect the behavior of the whole memory array when specific requirements-driven cell-level designs are targeted.
For this reason, more comprehensive tools and methodologies are necessary to provide flexibility for design experiments. In this context, Smullen et al. present a methodology and tool-chain for evaluating and comparing MTJ designs [15]. In [11] the authors propose a fixed analytical STT-MRAM model in CACTI to analyze the power reduction in modern microprocessors when SRAM is replaced with STT-MRAM. CACTI is a widely used high-level cache and memory modeling tool [9] [10].
In this paper we present a system-level tool based on the CACTI simulator to estimate the area, energy consumption, and write/read latency of STT-MRAM based cache memories. The tool supports a parameterizable interface through which a wide set of physical parameters of STT-MRAM technology can be specified. The implemented extensions also enable our tool to be integrated with system-level emulation tools such as QEMU. To demonstrate the correctness of our tool, we generate STT-MRAM based cache memories of different sizes and compare the resulting performance with SRAM technology. The proposed tool can thus support the design of cache or main memories by evaluating the impact of specific micro-level design choices on the STT-MRAM macro design. The tool is freely available for download from the website of our research group: http://www.testgroup.polito.it/.
The paper is organized as follows: Section II describes the operation principles of STT-MRAM technology and briefly introduces the CACTI tool. Section III discusses the modeling and parameterization of STT-MRAM technology that we implemented in CACTI, while Section IV compares three MTJ configurations for each use case. Section V concludes the paper.

II. BACKGROUND
In this section, an overview about STT-MRAM technology
in terms of operation principles and electrical model is given.
Finally, the main features of CACTI tool are described.
A. Basic Principles
STT-MRAM technology is built upon the magnetic tunneling junction (MTJ) device, which persistently stores logic data. Commonly, an MTJ device is composed of two ferromagnetic layers (FLs) interleaved with one oxide barrier layer. FLs are characterized by their magnetic orientation: one has a fixed magnetic orientation (fixed layer) and the other has a freely rotating magnetic orientation (free layer). By applying a sufficiently dense current pulse through the MTJ device, the magnetic direction of the free layer can be dynamically switched.
B. Electrical Model
When the FLs exhibit the same magnetic orientation, the MTJ has a low electrical resistance, whereas it exhibits a high electrical resistance in the antiparallel configuration. Typically, the low electrical resistance ($R_{MTJ} = R_L$) is associated with logic state ‘0’ and the high electrical resistance ($R_{MTJ} = R_H$) is associated with logic state ‘1’, as depicted in Fig. 1.
Figure 1: MTJ configurations
According to the relative magnetic orientations of the two
layers, the electrical resistance of the MTJ is different. The
tunneling magnetoresistance (TMR) is defined as the relative
resistance change between the two magnetized states. TMR is a
figure of merit of MTJ design and it is often analyzed by
resorting to Equation (1):
$$\mathrm{TMR} = \frac{R_H - R_L}{R_L} \qquad (1)$$
A higher TMR value is commonly preferred since it allows a more robust read operation. Values above 100% are typically preferred.
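As a quick numerical illustration of the TMR definition, the following minimal Python sketch computes the ratio from the two resistance values. The function name `tmr` and the sample resistances are assumptions made for illustration only; they are not taken from the tool.

```python
def tmr(r_low: float, r_high: float) -> float:
    """Tunneling magnetoresistance (R_H - R_L) / R_L, returned as a fraction."""
    if r_low <= 0:
        raise ValueError("r_low must be positive")
    return (r_high - r_low) / r_low


# Illustrative resistances in arbitrary but consistent units.
print(f"TMR = {tmr(1.5, 3.0):.0%}")  # prints "TMR = 100%"
```

A cell whose anti-parallel resistance is twice its parallel resistance sits exactly at the 100% threshold mentioned above.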
Despite the wide set of STT-MRAM cell designs, the most popular is the 1T-1MTJ, whose structure is composed of one NMOS transistor and one MTJ device connected in series. Owing to the wide set of technological information available in the literature, we target the in-plane 1T-1MTJ cell in this paper, whose equivalent electric circuit is provided in Fig. 2. The Bit Line (BL), Source Line (SL), and Word Line (WL) are used to access the cell.
The MTJ is modeled as a variable electrical resistance whose value depends on the voltage applied across the device. Typically, the free layer is connected to BL. In this topology, to force the MTJ into the $R_L$ state, a positive voltage difference is applied between BL and SL and the anti-parallel to parallel write current is required. Conversely, to set the MTJ to the $R_H$ state, a negative voltage difference is applied between BL and SL and the parallel to anti-parallel write current is required.
Figure 2: STT-MRAM electrical model
C. Writing Operation
Many device-related parameters (e.g., MTJ area, material properties) determine the write current amplitude required to change the magnetic direction of the free layer. Moreover, the required current depends on the current pulse width. Generally, if a longer current pulse is applied, a lower current density is required to switch the MTJ state. Based on the trade-off between write current amplitude and write pulse width, three distinct switching modes were identified [12]: thermal activation (TH), precessional switching (PR), and dynamic reversal (DY) (Fig. 3). The corresponding equations are as follows:
$$J_{C,TH}(\tau) = J_{C0}\left(1 - \frac{1}{\Delta}\ln\frac{\tau}{\tau_0}\right) \quad (\tau > 20\,\mathrm{ns}) \qquad (2)$$
$$J_{C,PR}(\tau) = J_{C0} + \frac{C}{\tau^{\gamma}} \quad (\tau < 3\,\mathrm{ns}) \qquad (3)$$
$$J_{C,DY}(\tau) = \frac{J_{C,TH}(\tau) + J_{C,PR}(\tau)\,e^{-A(\tau - \tau_C)}}{1 + e^{-A(\tau - \tau_C)}} \quad (3\,\mathrm{ns} < \tau < 20\,\mathrm{ns}) \qquad (4)$$
where $J_{C0}$ is the critical switching current density (i.e., the current density at zero temperature), $\tau_0$ is the inverse of the attempt frequency (typically equal to 1ns), and $C$, $\gamma$, $A$, and $\tau_C$ are fitting constants. The thermal stability Δ is a key factor of the MTJ. It depends on the thickness or area of the free layer and on the magnetic properties of the MTJ materials.
Figure 3: Dependence of switching current density on write pulse
width

Figure 3 shows that, when operating in the precessional switching zone, small differences in write pulse width cause wide variations in current density. In the thermal activation zone, on the other hand, the required switching current decreases only slowly even when the current pulse width is increased dramatically.
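The three switching regimes can be sketched in a few lines of Python. This is a minimal illustration that assumes the commonly used form of the three-regime model for Equations (2)-(4); the function names, the use of nanoseconds as the time unit, and all numeric fitting constants below are hypothetical placeholders, not values from the tool.

```python
import math


def jc_thermal(tau, jc0, delta, tau0=1.0):
    """Thermal activation regime, assumed form of Eq. (2); times in ns."""
    return jc0 * (1.0 - math.log(tau / tau0) / delta)


def jc_precessional(tau, jc0, c, gamma):
    """Precessional switching regime, assumed form of Eq. (3); tau in ns."""
    return jc0 + c / tau**gamma


def jc_dynamic(tau, jc0, delta, c, gamma, a, tau_c, tau0=1.0):
    """Dynamic reversal regime, assumed form of Eq. (4): a smooth blend."""
    w = math.exp(-a * (tau - tau_c))
    th = jc_thermal(tau, jc0, delta, tau0)
    pr = jc_precessional(tau, jc0, c, gamma)
    return (th + pr * w) / (1.0 + w)


def switching_current_density(tau, jc0, delta, c, gamma, a, tau_c, tau0=1.0):
    """Pick the regime from the pulse width tau (ns), using the 3 ns / 20 ns bounds."""
    if tau > 20.0:
        return jc_thermal(tau, jc0, delta, tau0)
    if tau < 3.0:
        return jc_precessional(tau, jc0, c, gamma)
    return jc_dynamic(tau, jc0, delta, c, gamma, a, tau_c, tau0)


# Hypothetical fitting constants, chosen only to make the sketch runnable.
PARAMS = dict(jc0=2.0, delta=40.29, c=1.0, gamma=1.0, a=0.5, tau_c=10.0)

for tau_ns in (1.0, 10.0, 100.0):
    j = switching_current_density(tau_ns, **PARAMS)
    print(f"tau = {tau_ns:6.1f} ns -> J_C = {j:.3f} (arbitrary units)")
```

With these placeholder constants the required current density falls steeply for short pulses and only slowly for long pulses, which is the qualitative behaviour described for Figure 3.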
D. Reading Operation
When a read operation is performed a small bias voltage is
applied on the control lines, resulting in a current (IR). This
current is, then, compared against a reference value (IREF) to
discriminate the stored logic state. When IR is higher than the
IREF it means that the cell stores a logic value ‘0’, whereas if IR
is lower than IREF the cell stores a logic value ‘1’.
It is worth noticing here that both reading currents used to
discriminate the logic state have the same order of magnitude.
For this reason, a Sense Amplifier is commonly used to compare
IR and IREF to determine the actual logic state of the cell.
Different circuit schemes can be implemented to generate the reference current. In [13] a pinned MTJ device is designed to have an electrical resistance equal to the average value of $R_L$ and $R_H$. Another approach to generating the reference current uses two MTJ cells. One of the reference cells is in the parallel (low resistance) state while the other is in the anti-parallel (high resistance) state. In this case, the resulting reference resistance is computed as the average of the low and high resistance values [14].
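The read decision described above can be illustrated with a small Python sketch. It assumes an ideal comparison between the cell current and a reference current derived from the average of the low and high resistances; the function names, the optional series access resistance, and the numeric values are hypothetical.

```python
def read_current(v_read, r_mtj, r_access=0.0):
    """Cell current for a given bias voltage (ohmic approximation)."""
    return v_read / (r_mtj + r_access)


def reference_current(v_read, r_low, r_high, r_access=0.0):
    """Reference current from a reference resistance equal to the average of R_L and R_H."""
    r_ref = (r_low + r_high) / 2.0
    return v_read / (r_ref + r_access)


def sense(i_read, i_ref):
    """Return logic 0 when the cell current exceeds the reference, else logic 1."""
    return 0 if i_read > i_ref else 1


# Hypothetical values: 0.2 V read bias, R_L = 1.5 kOhm, R_H = 3 kOhm.
V_READ, R_L, R_H = 0.2, 1500.0, 3000.0
i_ref = reference_current(V_READ, R_L, R_H)
for label, r in (("parallel", R_L), ("anti-parallel", R_H)):
    bit = sense(read_current(V_READ, r), i_ref)
    print(f"{label:>13}: stored bit = {bit}")
```

Because the two cell currents differ by less than a factor of two here, the sketch also shows why a sense amplifier is needed to separate them reliably.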
E. Data Retention
One of the most important parameters characterizing storage-class memory devices is the amount of time for which information is reliably stored in a cell. The data retention time of an STT-MRAM bit-cell depends on the thermal stability of the MTJ. It is usually evaluated by Equation (5):
$$R_t = \tau_0 e^{\Delta} \qquad (5)$$
The dependence of the retention time on Δ is exponential: the higher the thermal stability, the longer the retention time. Nevertheless, designing the MTJ for higher thermal stability results in higher write energy.
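To make the exponential dependence concrete, the sketch below evaluates the retention time for a few thermal stability values. It assumes the retention relation has the form τ0·e^Δ with τ0 = 1 ns, as stated for the attempt period earlier; the function name and the sample Δ values are illustrative choices.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time(delta, tau0=1e-9):
    """Retention time in seconds, assuming the form tau0 * exp(delta)."""
    return tau0 * math.exp(delta)


for delta in (30.0, 40.29, 50.0):
    years = retention_time(delta) / SECONDS_PER_YEAR
    print(f"delta = {delta:5.2f} -> retention of about {years:.3g} years")
```

Raising Δ by about 10 lengthens the retention time by a factor of roughly 22,000, which is why small changes in thermal stability have such a large effect.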
F. CACTI
CACTI is a widely used open-source high-level cache and memory modeling tool [13] [14] supported by HP Labs. CACTI has analytical models for all the basic building blocks of a memory: decoder, sense amplifier, crossbar, on-chip wires, DRAM/SRAM cell, and latch. CACTI models both traditional and non-uniform banked caches and memories using SRAM and DRAM, for which it can compute delay, power, and area. For a user-specified set of input parameters (e.g., energy/delay, memory size), the tool performs an exhaustive design space exploration across different array sizes and on-chip interconnections to identify an optimal configuration, if one exists, that meets the input constraints.
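The idea of an exhaustive, constraint-driven exploration can be sketched as follows. This is only a toy illustration: `toy_metrics` is a made-up cost model, not CACTI's analytical equations, and the partitioning knobs (`rows_split`, `cols_split`) are hypothetical stand-ins for the array organization parameters the tool actually explores.

```python
from itertools import product

SPLITS = (1, 2, 4, 8, 16, 32)


def toy_metrics(rows_split, cols_split, capacity_bits):
    """Hypothetical cost model: smaller subarrays are faster but add routing overhead."""
    cells_per_subarray = capacity_bits / (rows_split * cols_split)
    delay = cells_per_subarray**0.5 + 4.0 * (rows_split + cols_split)
    energy = cells_per_subarray**0.5 + 2.0 * rows_split * cols_split
    return {"delay": delay, "energy": energy}


def explore(capacity_bits, max_delay, splits=SPLITS):
    """Try every partitioning; return the lowest-energy one meeting the delay bound."""
    best = None
    for rows_split, cols_split in product(splits, repeat=2):
        metrics = toy_metrics(rows_split, cols_split, capacity_bits)
        if metrics["delay"] > max_delay:
            continue  # violates the user constraint
        if best is None or metrics["energy"] < best[1]["energy"]:
            best = ((rows_split, cols_split), metrics)
    return best  # None when no configuration satisfies the constraint


result = explore(capacity_bits=32 * 1024 * 8, max_delay=200.0)
print(result if result is not None else "no feasible configuration")
```

Returning `None` mirrors the "if one exists" caveat: the search may find that no organization satisfies the requested constraints.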
III. MODELING
Our research work aims at extending CACTI to support in-plane STT-MRAM technology. By modeling the bit-line, read circuitry, delay, area, and energy consumption, additional parameters are combined with the existing analytical models and seamlessly integrated into CACTI. The first release supports the simulation of set-associative cache memories.
A. Array Modeling
By integrating analytical models along with parameters extracted from ITRS roadmaps [17], CACTI supports modeling of the array of the targeted cache or memory device. Memory is divided into an array of banks. Each bank is composed of one or more subbanks, which consist of identical mats. A mat has four subarrays that share pre-decoding logic, and each subarray contains a set of wordlines and bitlines to access the basic memory cells. To support STT-MRAM technology, we mainly focus on the mat and the subarray.
B. MTJ Model
The 1T-1MTJ cell is modeled by considering an NMOS access transistor connected in series with an MTJ device. The MTJ is then modeled as a resistance whose value depends on the relative magnetization of the free layer. We provide a fully parameterized MTJ model to make it possible to explore a wide set of designs. Table 1 shows the model input parameters.
Table 1: MTJ parameters integrated into CACTI
MTJ Parameter    Description
SttType          Type of MTJ (this version supports only in-plane)
Jc0              Critical current at zero temperature
Δ                Thermal stability
MTJArea          Area of MTJ
Rp               MTJ resistance in parallel magnetization
Rap              MTJ resistance in anti-parallel magnetization
Vbitline         Write voltage
Raccess          Equivalent resistance of the access transistor
The Delta parameter is used to compute the resulting
retention time by resorting to Eq. (5). The aforementioned MTJ
parameters are integrated in CACTI to model STT-MRAM cell
and to figure out read and write latency as described further on.
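The parameter set of Table 1 maps naturally onto a small record type. The sketch below is a hypothetical Python mirror of those inputs: the class name, the snake_case field names, the validation rules, and the sample values are assumptions for illustration and do not reflect how the tool stores its parameters. Units are left unspecified, as in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTJParams:
    stt_type: str     # SttType: only "in-plane" is handled in this sketch
    jc0: float        # Jc0: critical current at zero temperature
    delta: float      # thermal stability
    mtj_area: float   # MTJArea
    r_p: float        # Rp: resistance in parallel magnetization
    r_ap: float       # Rap: resistance in anti-parallel magnetization
    v_bitline: float  # Vbitline: write voltage
    r_access: float   # Raccess: equivalent access transistor resistance

    def __post_init__(self):
        if self.stt_type != "in-plane":
            raise ValueError("only in-plane MTJs are handled in this sketch")
        if not 0 < self.r_p < self.r_ap:
            raise ValueError("expected 0 < r_p < r_ap")

    @property
    def tmr(self):
        """TMR implied by the two resistance values, as a fraction."""
        return (self.r_ap - self.r_p) / self.r_p


# Illustrative values only.
cfg = MTJParams("in-plane", jc0=2.0, delta=40.0, mtj_area=2e-10,
                r_p=1.5, r_ap=3.0, v_bitline=1.8, r_access=1.5)
print(cfg.tmr)  # 1.0
```

Grouping the inputs this way keeps derived quantities such as TMR next to the raw parameters they depend on.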
C. Read Latency Model
A read operation involves several phases. A specified voltage is applied to a bitline and the resulting current passing through the MTJ is compared to a reference value. To estimate read latency we model both the bitline and the sense amplifier (SA). In STT-MRAM memories, the sensing operation is performed by means of a current-based SA. However, CACTI currently only has models for voltage-based SAs. Therefore, we adapt the current-based sensing operation of the MTJ to the existing voltage-based SA. The circuit involves two reference cells and three PMOS transistors that implement the current-to-voltage converter. Interested readers can refer to [16] for further details. This circuit is modeled using SPICE at 45nm and requires about 50ps for stabilization. It is included in CACTI as an additional delay on top of the existing SA. The additional area and energy due to the MTJ reference cells are also accounted for.

D. Write Latency Model
The difference between read and write latency is significant in STT-MRAM memories. A write operation is typically slower. Moreover, the required write voltage is between 1 and 2 volts, whereas a smaller bias voltage (0.1V ~ 0.3V) is needed for reading.
There is a strong dependence between the write voltage and the expected write latency. This relationship is modeled by Eq. (2), Eq. (3), and Eq. (4), which provide an accurate estimate of the MTJ write time. The voltage used to estimate latency in the analytical model is assumed to be constant during the write operation and identical for both free layer orientations. Moreover, since CACTI does not provide a mechanism to input a distribution of logic values to be written, we only consider switching from parallel to anti-parallel magnetization of the free layer, which is the worst case in terms of latency.
However, this contribution alone is not sufficient to estimate the overall latency, since each MTJ is connected to an access transistor (see Figure 2) to mitigate write disturbs and reduce energy consumption. Therefore, without losing accuracy, the overall write latency of an STT-MRAM data array is computed as the read latency plus the MTJ write time.
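The latency composition described above can be sketched as follows. The first function inverts the precessional relation, assuming Eq. (3) has the form J = Jc0 + C/τ^γ, to obtain a write time from a current density; the second adds the read latency. Function names, the nanosecond unit, and the numeric values are hypothetical, and the inversion is only meaningful in the short-pulse regime of Eq. (3).

```python
def precessional_write_time(j, jc0, c, gamma):
    """Solve J = jc0 + c / tau**gamma for tau (ns); requires j > jc0."""
    if j <= jc0:
        raise ValueError("precessional switching requires j > jc0")
    return (c / (j - jc0)) ** (1.0 / gamma)


def total_write_latency(read_latency, mtj_write_time):
    """Overall write latency: array read latency plus the MTJ write time."""
    return read_latency + mtj_write_time


# Hypothetical numbers in ns and arbitrary current-density units.
tau_write = precessional_write_time(j=3.0, jc0=2.0, c=1.0, gamma=1.0)
print(total_write_latency(read_latency=0.5, mtj_write_time=tau_write))  # 1.5
```

Keeping the two contributions separate makes it easy to see which one dominates for a given configuration.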
E. Area Estimation Model
The area of an STT-MRAM cell strongly depends on the design of the access transistor. Let us consider a cell composed of an access transistor and an MTJ stacked in a 3D structure. The resulting area is dominated by the element that requires the larger planar surface, which is generally the access transistor. Determining the proper size of the access transistor is one of the most critical aspects of the cell design. Owing to technological constraints, a small size improves read latency whereas a large size enhances write performance. The analytical model integrated in CACTI for cell area estimation is given in Equation (6).
$$A_{cell} = 3\left(\frac{W}{L} + 1\right)F^2 \qquad (6)$$
where F is the minimum feature size and W and L are the width and length of the access transistor, respectively. The equivalent resistance of the access transistor influences the length. Resistance and area are inversely related: a high resistance corresponds to a small cell area and high storage density, whereas a low resistance considerably increases the memory area.
The total area of the memory does not depend only on the size of the cells. Interconnections also have a considerable impact on the resulting memory size. For this reason, according to user requirements, CACTI attempts to optimize on-chip memory interconnections to meet latency or energy constraints.
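A minimal sketch of the cell area estimate, assuming Equation (6) has the form A = 3(W/L + 1)F². The function names and the example transistor sizing are illustrative assumptions; only the 45nm feature size echoes a value mentioned in the text.

```python
def cell_area_in_f2(w_over_l):
    """Cell area in units of F^2, assuming A = 3 * (W/L + 1) * F^2."""
    if w_over_l <= 0:
        raise ValueError("W/L must be positive")
    return 3.0 * (w_over_l + 1.0)


def cell_area(w_over_l, feature_size_m):
    """Cell area in square meters for a given minimum feature size."""
    return cell_area_in_f2(w_over_l) * feature_size_m**2


# Hypothetical sizing: W/L = 2 at a 45 nm feature size.
print(cell_area_in_f2(2.0))      # 9.0 (in F^2)
print(cell_area(2.0, 45e-9))     # area in m^2
```

A wider access transistor (larger W/L) lowers its resistance but grows the cell linearly in this model, which is the trade-off the section describes.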
F. Energy Estimation Model
For the sake of completeness, we consider the write and read energy models separately. Read energy per operation is evaluated using Equation (7):
$$E_{read} = C_{tot} V_{read}^2 \qquad (7)$$
where $C_{tot}$ is the total capacitance, which includes the bitline, all wire contributions, and the access transistor, and $V_{read}$ is the read voltage. A lower read voltage reduces the probability of read disturbs, while a higher value favors read latency.
The computation of write energy can be divided into two main contributions (see Equation (8)). The former is the energy consumed by the current flowing through the MTJ device, while the latter is computed in the same way as in the read energy model of Eq. (7):
$$E_{write} = \frac{V_{write}^2}{R_{MTJ} + R_{acc}}\,\tau_{write} + C_{tot} V_{write}^2 \qquad (8)$$
where $V_{write}$ is the write voltage, $R_{MTJ}$ is the equivalent MTJ resistance, $R_{acc}$ is the equivalent NMOS resistance, and $\tau_{write}$ is the MTJ switching time. It is worth noting that the computation of write energy accounts for the worst case: the MTJ switches from the parallel to the anti-parallel state.
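The two energy expressions can be evaluated with a short Python sketch. It assumes the read energy is C_tot·V_read² and the write energy is the resistive term V_write²/(R_MTJ + R_acc)·τ_write plus the capacitive term C_tot·V_write²; function names and all numeric values are hypothetical.

```python
def read_energy(c_tot, v_read):
    """Read energy per operation (J): capacitive term only."""
    return c_tot * v_read**2


def write_energy(v_write, r_mtj, r_acc, tau_write, c_tot):
    """Write energy per operation (J): resistive dissipation plus capacitive term."""
    resistive = v_write**2 / (r_mtj + r_acc) * tau_write
    capacitive = c_tot * v_write**2
    return resistive + capacitive


# Hypothetical values: 1 fF total capacitance, 0.2 V read, 1.8 V write,
# 3 kOhm MTJ, 1.5 kOhm access transistor, 10 ns switching time.
print(f"E_read  = {read_energy(1e-15, 0.2):.3e} J")
print(f"E_write = {write_energy(1.8, 3000.0, 1500.0, 10e-9, 1e-15):.3e} J")
```

With these placeholder numbers the resistive term dominates the write energy by several orders of magnitude, so the switching time and the series resistance are the main levers.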
IV. EXPERIMENTAL RESULTS
In the previous section, we described the modeling and integration of in-plane STT-MRAM technology into the CACTI tool. To demonstrate the correctness of our tool, we generate high-performance and low-power cache memories for three different MTJ configurations and compare them with SRAM technology. The MTJ input parameters considered are listed in Table 2. The MTJ configurations differ in their parallel and anti-parallel resistances, the write voltage, and the equivalent resistance of the access transistor.
Table 2: MTJ configurations
Parameter    A           B           C
SttType      In-Plane    In-Plane    In-Plane
Jc0          2           2           2
Δ            40.29       40.29       40.29
MTJArea      2·10^-10    2·10^-10    2·10^-10
Rp           1.5         1.5         1.2
Rap          3           3           1.8
Vbitline     1.8         1.3         1.8
Raccess      1.5         0.3         0.3
A. High-Performance Cache Memories
For this study we generate high-performance, eight-way set-associative cache memories with no error correction mechanism, ranging in size from 32 kB to 512 kB. Each cache has a 64 b IN/OUT data interface with a single read-write port. Transistors are modeled using high-performance cells (itrs-hp) for the data array, the tag array, and the peripheral circuits. The use of itrs-hp maximizes performance at the expense of power consumption.
Figure 4 (h) compares the read latency of the three different MTJ configurations with respect to SRAM. The fastest read latency is achieved by SRAM. Among the MTJ configurations, configuration A shows the best timing.

Citations
Journal ArticleDOI
TL;DR: Proposed policies to reduce the leakage power consumption of NoC buffers by the use of non-volatile spin transfer torque random access memory (STT-RAM)-based buffers and improve lifetime by 3.2 times and 1093 times, respectively are presented.
Abstract: With the advancement in CMOS technology and multiple processors on the chip, communication across these cores is managed by a network-on-chip (NoC). Power and performance of these NoC interconnects have become a significant factor.The authors aim to reduce the leakage power consumption of NoC buffers by the use of non-volatile spin transfer torque random access memory (STT-RAM)-based buffers. STT-RAM technology has the advantages of high density and low leakage but suffers from low endurance. This low endurance has an impact on the lifetime of the router on the whole due to unwanted write-variations governed by virtual channel (VC) allocation policies. Here various VC allocation policies that help the uniform distribution of the writes across the buffers are proposed. Iso-capacity and iso-area-based alternatives to replace SRAM buffers with STT-RAM buffers are also presented. Pure STT-RAM buffers, however, impact the network latency. To mitigate this, a hybrid variant of the proposed policies which uses alternative VCs made of SRAM technology in the case of heavy network traffic is proposed. Experimental evaluation of full system simulation shows that proposed policies reduce the write variation by 99% and improve lifetime by 3.2 times and 1093 times, respectively. Also a 55.5% gain in the energy delay product is obtained.

2 citations

Proceedings ArticleDOI
10 Aug 2020
TL;DR: A write reduction technique, which is based on dirty flits present in write-back data packets, which results in a significant decrease in total and dynamic network power consumption and shows remarkable improvement in the lifetime.
Abstract: In a multi-core system, communication across cores is managed by an on-chip interconnect called Network-on-Chip (NoC). The utilization of NoC results in limitations such as high communication delay and high network power consumption. The buffers of the NoC router consume a considerable amount of leakage power. This paper attempts to reduce leakage power consumption by using Non-Volatile Memory technology-based buffers. NVM technology has the advantage of higher density and low leakage but suffers from costly write operation, and weaker write endurance. These characteristics impact on the total network power consumption, network latency, and lifetime of the router as a whole.In this paper, we propose a write reduction technique, which is based on dirty flits present in write-back data packets. The method also suggests a dirty flit based Virtual Channel (VC) allocation technique that distributes writes in NVM technology-based VCs to improve the lifetime of NVM buffers.The experimental evaluation on the full system simulator shows that the proposed policy obtains a 53% reduction in write-back flits, which results in 27% lesser total network flit on average. All these results in a significant decrease in total and dynamic network power consumption. The policy also shows remarkable improvement in the lifetime.

2 citations


Cites methods from "Integration of STT-MRAM model into ..."

  • ...We use CACTI-STT [4] to get SRAM and STT-RAM latency, read-write energy, and leakage power....

Book ChapterDOI
21 Nov 2015
TL;DR: An integrated solution that is deployed in the cloud for monitoring all the network components, allowing administrators to verify connectivity of the equipment, its performance, and network security.
Abstract: In the near future the number of devices connected to the Internet will greatly increase, so that further development of applications meant to verify their operations will be required. Monitoring represents an important factor in improving the quality of the services provided in cloud computing, given the fact that it allows scaling resource utilization in an adaptive manner. This paper aims to provide a solution for the monitoring of network devices and services, allowing administrators to verify connectivity of the equipment, its performance, and network security. The main contribution of the paper consists in proposing an integrated solution that is deployed in the cloud for monitoring all the network components. Finally, the paper discusses the main findings and advantages for a reference implementation of the monitoring system using a simulated network.

1 citation


Cites background from "Integration of STT-MRAM model into ..."

  • ...Cacti [3] is an open source monitoring application (free for users) based on a web server, being in fact a frontend for the standard monitoring technology RRDtool (Round Robin Database tool) [4]....

Proceedings ArticleDOI
07 Apr 2021
TL;DR: In this article, the authors propose using a cache with partitions of different retention times, and further put forth a block placement and reallocation policy to use these different partitions effectively, guaranteeing a reduction in block expiry/writebacks.
Abstract: Spin-Transfer Torque RAM (STT-RAM) exhibits advantages like high density, non-volatility, and low leakage power consumption, making it a plausible successor to SRAM in caches. However, STT-RAM’s large write energy and latency constrain its potential for commercial usage in caches. Relaxing STT-RAM’s retention time is one of the emerging and viable solutions to alleviate this roadblock, as this reduces the write time and energy. Reduction of retention time, however, leads to premature expiry of blocks requiring frequent refreshes or writebacks. These approaches cause unnecessary stalls and increase miss-rate. This paper proposes using a cache with partitions of different retention times. It further puts forth a block placement and reallocation policy to use these different partitions effectively. A block is said to be placed in an optimal partition if the block is either accessed or evicted before it expires. In particular, infrequently written blocks are allocated to higher retention time partitions, guaranteeing a reduction in block expiry/writebacks. During the execution, at regular intervals, blocks are migrated to appropriate retention time partitions depending on the application characteristics. Experimental evaluation shows significant improvement in performance and miss-rate compared to baseline allocation policies.

1 citation

Journal ArticleDOI
TL;DR: This article proposes keeping the routers always powered ON to maintain constant connectivity and investigates approaches that use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers, which yield significant energy savings while maintaining connectivity.
Abstract: In the era of dark silicon, several components on the chip [i.e., cores, memory, and network on chip (NoC)] need to be powered-off or run in low-power mode. This is mainly due to the increased leakage power consumption at smaller technology nodes. Other than the power consumed by cores and caches, power and performance of the interconnects is a significant factor as the communication network consumes a considerable share of the power budget. In particular, the buffers used at every port of the NoC router consume considerable dynamic as well as static power. To support dark silicon and save energy, a popular approach is to power off the routers and wake them up when needed. However, this affects the packet latency, and we need to observe the traffic through the nodes to decide turning the routers ON–OFF. In this article, we propose to keep the routers always powered ON to maintain constant connectivity and investigate various approaches. One proposal is to frequency scale the routers connected to powered OFF nodes, and the other proposals are to use a combination of SRAM and nonvolatile spin-transfer torque random access memory-based VCs in the routers. By managing which VCs to be active at a given time, we achieve energy savings. The proposals are evaluated by varying the percentage of dark nodes on the chip. The experimental results show that all proposals yield significant energy savings while maintaining connectivity.

1 citation


Cites methods from "Integration of STT-MRAM model into ..."

  • ...We use Cacti-STT [28] and NVSim [29] to get SRAM and STT-RAM latency, read–write energy, and leakage power....

References
Journal ArticleDOI
TL;DR: NVSim is developed, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash and is expected to help boost architecture-level NVM-related studies.
Abstract: Various new nonvolatile memory (NVM) technologies have emerged recently. Among all the investigated new NVM candidate technologies, spin-torque-transfer memory (STT-RAM, or MRAM), phase-change random-access memory (PCRAM), and resistive random-access memory (ReRAM) are regarded as the most promising candidates. As the ultimate goal of this NVM research is to deploy them into multiple levels in the memory hierarchy, it is necessary to explore the wide NVM design space and find the proper implementation at different memory hierarchy levels from highly latency-optimized caches to highly density-optimized secondary storage. While abundant tools are available as SRAM/DRAM design assistants, similar tools for NVM designs are currently missing. Thus, in this paper, we develop NVSim, a circuit-level model for NVM performance, energy, and area estimation, which supports various NVM technologies, including STT-RAM, PCRAM, ReRAM, and legacy NAND Flash. NVSim is successfully validated against industrial NVM prototypes, and it is expected to help boost architecture-level NVM-related studies.

1,100 citations


"Integration of STT-MRAM model into ..." refers methods in this paper

  • ...The analytical model integrated in CACTI for cell area estimation is given in the Equation (6) [20]....

Journal ArticleDOI
TL;DR: This work discusses the critical aspects that may affect the scaling of PCRAM, including materials properties, power consumption during programming and read operations, thermal cross-talk between memory cells, and failure mechanisms, and discusses experiments that directly address the scaling properties of the phase-change materials themselves.
Abstract: Nonvolatile RAM using resistance contrast in phase-change materials [or phase-change RAM (PCRAM)] is a promising technology for future storage-class memory. However, such a technology can succeed only if it can scale smaller in size, given the increasingly tiny memory cells that are projected for future technology nodes (i.e., generations). We first discuss the critical aspects that may affect the scaling of PCRAM, including materials properties, power consumption during programming and read operations, thermal cross-talk between memory cells, and failure mechanisms. We then discuss experiments that directly address the scaling properties of the phase-change materials themselves, including studies of phase transitions in both nanoparticles and ultrathin films as a function of particle size and film thickness. This work in materials directly motivated the successful creation of a series of prototype PCRAM devices, which have been fabricated and tested at phase-change material cross-sections with extremely small dimensions as low as 3 nm × 20 nm. These device measurements provide a clear demonstration of the excellent scaling potential offered by this technology, and they are also consistent with the scaling behavior predicted by extensive device simulations. Finally, we discuss issues of device integration and cell design, manufacturability, and reliability.

1,018 citations


"Integration of STT-MRAM model into ..." refers methods in this paper

  • ...One of the most promising candidates as embedded memory is the spin-transfer torque magnetic random access memory (STT-RAM) [1] offering faster read and write access time (nanoseconds) and better CMOS integration compared to other proposed technologies such as Phase-Change RAM (PCRAM) [2], Resistive RAM (RRAM) [3] and Ferromagnetic RAM (FeRAM) [4]....

Proceedings ArticleDOI
05 Dec 2005
TL;DR: In this article, a spin torque transfer magnetization switching (STS) based nonvolatile memory called spin-RAM was presented for the first time, which is based on magnetization reversal through an interaction of a spin momentum-torque-transferred current and a magnetic moment of memory layers in magnetic tunnel junctions (MTJ).
Abstract: A novel nonvolatile memory utilizing spin torque transfer magnetization switching (STS), abbreviated spin-RAM hereafter, is presented for the first time. The spin-RAM is programmed by magnetization reversal through an interaction of a spin momentum-torque-transferred current and a magnetic moment of memory layers in magnetic tunnel junctions (MTJs), and therefore an external magnetic field is unnecessary, unlike in a conventional MRAM. This new programming mode has been accomplished owing to our tailored MTJ, which has an oval shape of 100 × 150 nm. The memory cell is based on a 1-transistor and 1-MTJ (1T1J) structure. The 4 kbit spin-RAM was fabricated on a 4-level metal, 0.18 μm CMOS process. In this work, writing speed as high as 2 ns and a write current as low as 200 μA were successfully demonstrated. It has been proved that spin-RAM possesses outstanding characteristics such as high speed, low power and high scalability for the next generation universal memory.

961 citations


"Integration of STT-MRAM model into ..." refers methods in this paper

  • ...One of the most promising candidates as embedded memory is the spin-transfer torque magnetic random access memory (STT-RAM) [1] offering faster read and write access time (nanoseconds) and better CMOS integration compared to other proposed technologies such as Phase-Change RAM (PCRAM) [2], Resistive RAM (RRAM) [3] and Ferromagnetic RAM (FeRAM) [4]....

01 Jan 2009
TL;DR: This report details the analytical model assumed for the newly added modules along with their validation analysis of CACTI 6.0, a significantly enhanced version of the tool that primarily focuses on interconnect design for large caches.
Abstract: Future processors will likely have large on-chip caches with a possibility of dedicating an entire die for on-chip storage in a 3D stacked design. With the ever growing disparity between transistor and wire delay, the properties of such large caches will primarily depend on the characteristics of the interconnection networks that connect various sub-modules of a cache. CACTI 6.0 is a significantly enhanced version of the tool that primarily focuses on interconnect design for large caches. In addition to strengthening the existing analytical model of the tool for dominant cache components, CACTI 6.0 includes two major extensions over earlier versions: first, the ability to model Non-Uniform Cache Access (NUCA), and second, the ability to model different types of wires, such as RC based wires with different power, delay, and area characteristics and differential low-swing buses. This report details the analytical model assumed for the newly added modules along with their validation analysis. (CACTI 6.0: A Tool to Model Large Caches, by Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi; School of Computing, University of Utah, and Hewlett-Packard Laboratories; HP Laboratories technical report HPL-2009-85, external posting date April 21, 2009; published in International Symposium on Microarchitecture, Chicago, Dec 2007.)

845 citations


"Integration of STT-MRAM model into ..." refers methods in this paper

  • ...CACTI is a widely used high-level cache and memory modeling tool [9] [10]....

Journal ArticleDOI
TL;DR: In this paper, experimental and numerical results of current-driven magnetization switching in magnetic tunnel junctions were presented, and three distinct switching modes, thermal activation, dynamic reversal, and precessional process, were identified within the experimental parameter space.
Abstract: We present experimental and numerical results of current-driven magnetization switching in magnetic tunnel junctions. The experiments show that, for MgO-based magnetic tunnelling junctions, the tunnelling magnetoresistance ratio is as large as 155% and the intrinsic switching current density is as low as 1.1 × 10^6 A cm^-2. The thermal effect and current pulse width on spin-transfer magnetization switching are explored based on the analytical and numerical calculations. Three distinct switching modes, thermal activation, dynamic reversal, and precessional process, are identified within the experimental parameter space. The switching current distribution, write error, and read disturb are discussed based on device design considerations. The challenges and requirements for the successful application of spin-transfer torque as the write scheme in random access memory are addressed.

458 citations


"Integration of STT-MRAM model into ..." refers background in this paper

  • ...Based on the tradeplitude and write pulse width, were identified [12]: thermal switching (PR), and dynamic ations are prompted as follows:...
