
Integration of STT-MRAM model into CACTI simulator


10 August 2022
POLITECNICO DI TORINO
Institutional Repository
Integration of STT-MRAM model into CACTI simulator / Indaco, M.; Di Carlo, Stefano; Vatajelu, E. I.; Prinetto, Paolo
Ernesto; Arcaro, S.; Pala, D. - ELECTRONIC. - (2014), pp. 67-72. (Paper presented at the 9th IEEE
International Design and Test Symposium (IDT), held in Algiers, DZ, 16-18 Dec. 2014)
[10.1109/IDT.2014.7038589].
Availability: This version is available at: 11583/2587977 since: 2016-10-07T16:48:52Z
Publisher: IEEE
Published DOI: 10.1109/IDT.2014.7038589
Terms of use: openAccess
This article is made available under terms and conditions as specified in the corresponding bibliographic description in
the repository

Integration of STT-MRAM model into CACTI
simulator
S. Arcaro, S. Di Carlo, M. Indaco, D. Pala, P. Prinetto, Elena I. Vatajelu
Politecnico di Torino
Dip. di Automatica e Informatica
Turin, Italy
{firstname.lastname}@polito.it
Abstract: In the last decade, academia and private companies
have actively explored emerging memory technologies. STT-MRAM
in particular is developing rapidly, but it faces
several challenges in terms of performance and reliability.
Several cell-level techniques have been proposed to mitigate
these issues, but few tools and methodologies currently exist to
help designers evaluate the impact that specific micro-level
design choices have on the STT-MRAM macro
design. In this paper we present a system-level tool based on the
CACTI simulator to assist memory system designers. We use our
tool to generate high-performance and low-power cache memories,
comparing their performance, energy consumption, and area with
traditional SRAM.
Keywords: STT-MRAM, CACTI, Emerging Memories
I. INTRODUCTION
Research on emerging memories focuses on non-volatile
technologies that should meet the high demands of tomorrow's
applications. These include non-volatility, performance and
density comparable to SRAM and DRAM respectively, good
endurance, small device sizes, good integration, a low
power profile, resistance to radiation effects, and the ability to
scale below 20nm.
One of the most promising candidates for embedded memory
is the spin-transfer torque magnetic random access memory
(STT-MRAM) [1], which offers faster read and write access times
(nanoseconds) and better CMOS integration than other
proposed technologies such as Phase-Change RAM (PCRAM)
[2], Resistive RAM (RRAM) [3] and Ferroelectric RAM
(FeRAM) [4]. The key building block of an STT-MRAM cell is the
magnetic tunneling junction (MTJ), which is integrated with
CMOS circuitry using 3-D technology [5]. The smallest STT-MRAM
cell design is a 1T1MTJ (one transistor, one magnetic
tunneling junction) device. Logical data is stored by applying a
spin-polarized current through the MTJ element to switch the
memory state.
However, with scaling, the STT-MRAM cell faces a set of
challenges that strongly influence performance and reliability,
severely affecting the yield of the memory array. These issues are
mainly related to a) process variations of the MOS and MTJ devices,
involving variation of geometry size, threshold voltage, and
magnetic materials [5], [6], b) the high write cost due to the high
switching current required to flip the MTJ state [7], and c) the
thermal fluctuations in the MTJ switching [8].
To tackle these issues, efficient cell-level design paradigms have
been proposed, from a circuit and/or architecture perspective, to
improve cell robustness and integration density.
However, the results achieved for STT-MRAM cell design may
not be directly applicable to high-level design requirements.
It is of utmost importance to quantify and assess the
performance degradation in terms of write/read latency, power
consumption, and area that can affect the behavior of
the whole memory array when specific requirements-driven
cell-level designs are targeted.
For this reason, more comprehensive tools and
methodologies are needed to provide flexibility for design
experiments. In this context, Smullen et al. present a
methodology and tool-chain for evaluating and comparing MTJ
designs [15]. In [11] the authors propose a fixed analytical
STT-MRAM model in CACTI to analyze the power reduction in
modern microprocessors when SRAM is replaced with STT-MRAM.
CACTI is a widely used high-level cache and memory
modeling tool [9] [10].
In this paper we present a system-level tool based on the CACTI
simulator to estimate the area, energy consumption and write/read
latency of STT-MRAM based cache memories. The tool
provides a parameterizable interface through which a wide set of
physical parameters of the STT-MRAM technology can be specified. The
implemented extensions also enable our tool to be integrated with
system-level emulation tools such as QEMU. To
demonstrate the correctness of our tool, we generate STT-MRAM
based cache memories of different sizes and compare the
resulting performance with SRAM technology. The proposed
tool can thus support the design of cache or main memories by
evaluating the impact that specific micro-level design choices
have on the STT-MRAM macro design. The tool
can be freely downloaded from the
website of our research group: http://www.testgroup.polito.it/.
The paper is organized as follows: Section II describes the
operating principles of STT-MRAM technology and briefly introduces
the CACTI tool. Section III discusses the modeling and
parameterization of STT-MRAM technology that we implemented in
CACTI, while Section IV compares three MTJ
configurations for each use case. Section V concludes
the paper.

II. BACKGROUND
This section gives an overview of STT-MRAM technology
in terms of operating principles and electrical model,
and then describes the main features of the CACTI tool.
A. Basic Principles
STT-MRAM technology is built upon the magnetic
tunneling junction (MTJ) device, which persistently stores
logic data. Commonly, an MTJ device is composed of two
ferromagnetic layers (FLs) separated by one oxide barrier
layer. FLs are characterized by their magnetic orientation: one
has a fixed magnetic orientation (fixed layer) and the other has a
freely rotating magnetic orientation (free layer). By applying a
sufficiently dense current pulse through the MTJ device, the
magnetic direction of the free layer can be dynamically switched.
B. Electrical Model
When the FLs exhibit the same magnetic orientation, the
MTJ has a low electrical resistance, whereas it exhibits a
high electrical resistance in the antiparallel
configuration. Typically, the low electrical resistance
($R_{MTJ} = R_L$) is associated with logic state ‘0’ and the high
electrical resistance ($R_{MTJ} = R_H$) is associated with logic
state ‘1’, as depicted in Fig. 1.
Figure 1: MTJ configurations
According to the relative magnetic orientations of the two
layers, the electrical resistance of the MTJ is different. The
tunneling magnetoresistance (TMR) is defined as the relative
resistance change between the two magnetized states. TMR is a
figure of merit of MTJ design and it is often analyzed by
resorting to Equation (1):
$$\mathrm{TMR} = \frac{R_H - R_L}{R_L} \qquad (1)$$
A higher TMR value is commonly preferred since it
enables a more robust read operation. Values
above 100% are typically preferred.
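As a quick numeric illustration of Eq. (1), the following minimal Python sketch computes the TMR ratio from a pair of resistances. The function name `tmr` and the sample resistance values are assumptions made for illustration only; they are not part of the tool described in this paper.

```python
def tmr(r_low: float, r_high: float) -> float:
    """Tunneling magnetoresistance ratio, Eq. (1): (R_H - R_L) / R_L."""
    if r_low <= 0:
        raise ValueError("r_low must be positive")
    return (r_high - r_low) / r_low


# Hypothetical resistances; any unit works as long as both values share it.
ratio = tmr(r_low=1.5, r_high=3.0)
print(f"TMR = {ratio:.0%}")
```

A ratio of 1.0 corresponds to the 100% threshold mentioned above.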
Despite the wide set of STT-MRAM cell designs, the most
popular is the 1T-1MTJ, whose structure is composed of one
NMOS transistor and one MTJ device connected in series. Owing
to the wide set of technological information available in the
literature, in this paper we target the in-plane 1T-1MTJ cell, whose
equivalent electric circuit is provided in Fig. 2. The Bit Line (BL),
Source Line (SL), and Word Line (WL) are used to access the
cell.
The MTJ is modeled as a variable electrical resistance whose
value depends on the voltage applied across the device. Typically,
the free layer is connected to BL. In this topology, to force the
MTJ into the $R_L$ state, a positive voltage difference is applied
between BL and SL and the anti-parallel to parallel write current is
required. Conversely, to set the MTJ in the $R_H$ state, a
negative voltage difference is applied between BL and SL and
the parallel to anti-parallel write current is required.
Figure 2: STT-MRAM electrical model
C. Writing Operation
Many device-related parameters (e.g., MTJ area, material
properties) determine the write current amplitude required
to change the magnetic direction of the free layer. Moreover, the
required amplitude varies with the current pulse width. Generally, if a
longer current pulse is applied, a lower current density is
required to switch the MTJ state. Based on the trade-off between
write current amplitude and write pulse width, three distinct
switching modes have been identified [12]: thermal activation (TH),
precessional switching (PR), and dynamic reversal (DY) (Fig.
3). The corresponding equations are as follows:
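$$J_{c,TH}(\tau) = J_{c0}\left\{1 - \frac{1}{\Delta}\ln\left(\frac{\tau}{\tau_0}\right)\right\} \qquad (\tau > 20\,\mathrm{ns}) \qquad (2)$$
$$J_{c,PR}(\tau) = J_{c0} + \frac{C}{\tau^{\gamma}} \qquad (\tau < 3\,\mathrm{ns}) \qquad (3)$$
$$J_{c,DY}(\tau) = \frac{J_{c,TH}(\tau) + J_{c,PR}(\tau)\,e^{-A(\tau-\tau_c)}}{1 + e^{-A(\tau-\tau_c)}} \qquad (3\,\mathrm{ns} < \tau < 20\,\mathrm{ns}) \qquad (4)$$
where $J_{c0}$ is the critical switching current density (i.e., the
current density in presence of zero temperature) and $\tau_0$ is the
inverse of the attempt frequency (typically equal to 1ns). $C$,
$\gamma$, $A$, and $\tau_c$ are
fitting constants. The thermal stability Δ is a key factor of the
MTJ. It depends on thickness or area of free layer and on
magnetic properties of MTJ materials.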
Figure 3: Dependence of switching current density on write pulse
width

Looking at Figure 3, it is evident that in the
precessional switching region small differences in write pulse
width cause wide variations in current density. On the other
hand, in the thermal activation region the required switching
current decreases only slowly even when the current pulse
width is increased dramatically.
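The three switching regimes of Eqs. (2)-(4) can be sketched as a single piecewise function. The code below is a minimal illustration, not the implementation integrated in CACTI: the function names are hypothetical, pulse widths are expressed in nanoseconds, and the constants in `FIT` are placeholders that have not been fitted to any device. Since the equations leave the boundary points τ = 3 ns and τ = 20 ns unassigned, the sketch arbitrarily assigns them to the dynamic reversal regime.

```python
import math


def jc_thermal(tau_ns, jc0, delta, tau0_ns=1.0):
    """Eq. (2): thermal activation regime (tau > 20 ns)."""
    return jc0 * (1.0 - math.log(tau_ns / tau0_ns) / delta)


def jc_precessional(tau_ns, jc0, c, gamma):
    """Eq. (3): precessional switching regime (tau < 3 ns)."""
    return jc0 + c / tau_ns ** gamma


def jc_dynamic(tau_ns, jc0, delta, c, gamma, a, tau_c_ns, tau0_ns=1.0):
    """Eq. (4): dynamic reversal, a weighted blend of Eqs. (2) and (3)."""
    w = math.exp(-a * (tau_ns - tau_c_ns))
    th = jc_thermal(tau_ns, jc0, delta, tau0_ns)
    pr = jc_precessional(tau_ns, jc0, c, gamma)
    return (th + pr * w) / (1.0 + w)


def switching_current_density(tau_ns, jc0, delta, c, gamma, a, tau_c_ns,
                              tau0_ns=1.0):
    """Select the regime from the pulse width and evaluate its equation."""
    if tau_ns <= 0:
        raise ValueError("pulse width must be positive")
    if tau_ns > 20.0:
        return jc_thermal(tau_ns, jc0, delta, tau0_ns)
    if tau_ns < 3.0:
        return jc_precessional(tau_ns, jc0, c, gamma)
    # Boundary values (3 ns and 20 ns) are handled here by assumption.
    return jc_dynamic(tau_ns, jc0, delta, c, gamma, a, tau_c_ns, tau0_ns)


# Placeholder constants for illustration only (not fitted values).
FIT = dict(jc0=2.0, delta=40.29, c=5.0, gamma=1.0, a=1.0, tau_c_ns=10.0)

for tau in (1.0, 2.0, 10.0, 50.0, 100.0):
    print(f"tau = {tau:6.1f} ns -> Jc = {switching_current_density(tau, **FIT):.3f}")
```

With these placeholder constants the output reproduces the qualitative trend of Figure 3: a steep dependence on pulse width below 3 ns and a nearly flat one above 20 ns.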
D. Reading Operation
When a read operation is performed, a small bias voltage is
applied on the control lines, resulting in a current (IR). This
current is then compared against a reference value (IREF) to
discriminate the stored logic state. When IR is higher than
IREF, the cell stores a logic value ‘0’, whereas if IR
is lower than IREF the cell stores a logic value ‘1’.
It is worth noting that the two read currents used to
discriminate the logic state have the same order of magnitude.
For this reason, a Sense Amplifier is commonly used to compare
IR and IREF and determine the actual logic state of the cell.
Different circuit schemes can be implemented to generate
the reference current. In [13] a pinned MTJ device is designed
to have an electrical resistance equal to the average value of
$R_L$ and $R_H$. Another approach to generating the reference
current adopts two MTJ cells. One of the reference cells is in
the parallel (low resistance) state while the other is in the
anti-parallel (high resistance) state. In this case, the resulting
reference resistance is computed as the average of the
low and high resistance values [14].
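The read decision described in this subsection can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the currents are computed as V/R across the MTJ alone (the access transistor and the sense amplifier are ignored), the function names are hypothetical, and the resistance and voltage values are placeholders.

```python
def reference_resistance(r_low: float, r_high: float) -> float:
    """Reference resistance taken as the average of the two MTJ states."""
    return 0.5 * (r_low + r_high)


def sense(r_cell: float, r_ref: float, v_read: float) -> int:
    """Return the stored logic value by comparing IR with IREF."""
    i_r = v_read / r_cell    # read current through the selected cell
    i_ref = v_read / r_ref   # current through the reference path
    return 0 if i_r > i_ref else 1


# Hypothetical values: resistances in ohms, read voltage in volts.
R_LOW, R_HIGH, V_READ = 1500.0, 3000.0, 0.1
r_ref = reference_resistance(R_LOW, R_HIGH)
print(sense(R_LOW, r_ref, V_READ), sense(R_HIGH, r_ref, V_READ))
```

Because both currents scale with the same read voltage, the decision in this idealized sketch depends only on whether the cell resistance lies below or above the reference resistance.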
E. Data Retention
One of the most important parameters characterizing storage
class memory devices is the amount of time the information is
reliably stored in a cell. The data retention time of an STT-MRAM
bit-cell depends on the thermal stability of the MTJ. It is
usually evaluated by Equation (5):
$$R_t = \tau_0\, e^{\Delta} \qquad (5)$$
The dependence of the retention time on Δ is exponential:
the higher the thermal stability, the longer the retention time.
Nevertheless, designing the MTJ for higher thermal stability
results in higher write energy.
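To give a feeling for the exponential dependence in Eq. (5), the sketch below evaluates the retention time for a given thermal stability. The function name is hypothetical, τ0 is set to the typical 1 ns quoted earlier, and Δ = 40.29 is used purely as an illustrative input.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def retention_time_s(delta: float, tau0_s: float = 1e-9) -> float:
    """Eq. (5): retention time = tau0 * exp(delta), in seconds."""
    return tau0_s * math.exp(delta)


years = retention_time_s(40.29) / SECONDS_PER_YEAR
print(f"delta = 40.29 -> about {years:.1f} years")

# Each additional unit of delta multiplies the retention time by e.
print(retention_time_s(41.29) / retention_time_s(40.29))
```

With τ0 = 1 ns, a thermal stability of 40.29 corresponds to a retention time of roughly ten years.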
F. CACTI
CACTI is a widely used open-source high-level cache and
memory modeling tool [13] [14] supported by HP Labs. CACTI
has analytical models for all the basic building blocks of a
memory: decoder, sense-amplifier, crossbar, on-chip wires,
DRAM/SRAM cell and latch. CACTI models both traditional
and non-uniform banked caches and memories using SRAM
and DRAM, for which it computes delay, power, and area. For
a user-specified set of input parameters (e.g., energy/delay,
memory size), the tool performs an exhaustive design space
exploration across different array sizes and on-chip
interconnections to identify an optimal configuration, if one
exists, that meets the input constraints.
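The idea of an exhaustive, constraint-driven exploration can be illustrated with a small sketch. It does not reproduce CACTI's internal search or cost models: the `Candidate` record, its fields, the choice of area as the objective, and the sample numbers are all hypothetical, and each candidate is assumed to have been evaluated already.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One already-evaluated array organization (hypothetical fields)."""
    name: str
    delay_ns: float
    energy_nj: float
    area_mm2: float


def pick_best(candidates: Iterable[Candidate],
              max_delay_ns: float,
              max_energy_nj: float) -> Optional[Candidate]:
    """Keep the candidates meeting both constraints, then minimize area.

    Returns None when no candidate satisfies the constraints.
    """
    feasible = [c for c in candidates
                if c.delay_ns <= max_delay_ns and c.energy_nj <= max_energy_nj]
    return min(feasible, key=lambda c: c.area_mm2, default=None)


# Made-up design points for illustration only.
points = [
    Candidate("org-1", delay_ns=1.2, energy_nj=0.30, area_mm2=0.50),
    Candidate("org-2", delay_ns=0.9, energy_nj=0.45, area_mm2=0.65),
    Candidate("org-3", delay_ns=1.0, energy_nj=0.35, area_mm2=0.55),
]
print(pick_best(points, max_delay_ns=1.0, max_energy_nj=0.40))
print(pick_best(points, max_delay_ns=0.5, max_energy_nj=0.40))
```

The second call prints `None`, mirroring the case in which no configuration meets the input constraints.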
III. MODELING
Our work extends CACTI to support in-plane
STT-MRAM technology. By modeling the bit-line, read
circuitry, delay, area and energy consumption, additional
parameters are combined with the existing analytical models and
seamlessly integrated into CACTI. The first release supports the
simulation of set-associative cache memories.
A. Array Modeling
By integrating analytical models with parameters
extracted from the ITRS roadmaps [17], CACTI supports modeling
the array of the targeted cache or memory device. Memory is
divided into an array of banks. Each bank is composed of one or
more subbanks, which consist of identical mats. A mat
has 4 subarrays which share pre-decoding logic, and each
subarray contains a set of wordlines and bitlines to access the
basic memory cells. To support STT-MRAM technology, we
mainly focus on the mat and subarray.
B. MTJ Model
The 1T-1MTJ cell is modeled as an NMOS
access transistor connected in series with an MTJ device. The MTJ is
in turn modeled as a resistance whose value depends on the
relative magnetization of the free layer. We provide a fully
parameterized MTJ model that makes it possible to explore a
wide set of designs. Table 1 shows the model input parameters.
Table 1: MTJ parameters integrated into CACTI
| MTJ Parameter | Description |
| --- | --- |
| SttType | Type of MTJ. This version supports only in-plane |
| Jc0 | Critical current at zero temperature |
| Δ | Thermal Stability |
| MTJArea | Area of MTJ |
| Rp | MTJ resistance in parallel magnetization |
| Rap | MTJ resistance in anti-parallel magnetization |
| Vbitline | Write voltage |
| Raccess | Equivalent resistance of the access transistor |
The Δ parameter is used to compute the resulting
retention time through Eq. (5). The aforementioned MTJ
parameters are integrated in CACTI to model the STT-MRAM cell
and to compute read and write latency, as described in the
following subsections.
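One possible way to group the parameters of Table 1 in code is a small validated record. The sketch below is only an illustration: the class name, field names, validation rules, and sample values are assumptions, it does not reflect the actual input syntax of the tool, and units are deliberately left unspecified because the table does not state them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTJParams:
    """Container mirroring the parameter names of Table 1 (hypothetical)."""
    stt_type: str      # SttType: only "in-plane" is supported
    jc0: float         # Jc0: critical current at zero temperature
    delta: float       # thermal stability
    mtj_area: float    # MTJArea
    rp: float          # Rp: resistance in parallel magnetization
    rap: float         # Rap: resistance in anti-parallel magnetization
    v_bitline: float   # Vbitline: write voltage
    r_access: float    # Raccess: access transistor equivalent resistance

    def __post_init__(self):
        if self.stt_type != "in-plane":
            raise ValueError("only in-plane MTJs are supported")
        if self.rap <= self.rp:
            raise ValueError("Rap must be larger than Rp")
        numeric = (self.jc0, self.delta, self.mtj_area, self.rp,
                   self.v_bitline, self.r_access)
        if min(numeric) <= 0:
            raise ValueError("numeric parameters must be positive")


# Placeholder values for illustration only.
params = MTJParams(stt_type="in-plane", jc0=2.0, delta=40.29,
                   mtj_area=2e-10, rp=1.5, rap=3.0,
                   v_bitline=1.8, r_access=1.5)
print(params)
```

Validating at construction time keeps inconsistent inputs, such as an anti-parallel resistance below the parallel one, from propagating into the latency, area, and energy estimates.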
C. Read Latency Model
A read operation involves several phases. A specified
voltage is applied to a bitline and the resulting current passing
through the MTJ is compared to a reference value. To
estimate read latency we model both the bitline and the sense
amplifier (SA). In STT-MRAM memories, the sensing operation
is performed by means of a current-based SA. However,
CACTI currently only has models for voltage-based SAs.
Therefore, we adapt the current-based sensing operation of the
MTJ to the existing voltage-based SA. The circuit
involves two reference cells and three PMOS transistors that
implement the current-to-voltage converter. Interested readers
can refer to [16] for further details. This circuit is modeled using
SPICE at 45nm and requires about 50ps to stabilize. It is
included in CACTI as an additional delay on top of the existing SA.
The additional area and energy due to the MTJ reference cells are
also accounted for.

D. Write Latency Model
The difference between read and write latency is significant
in STT-MRAM memories. A write
operation is typically slower. Moreover, the required write
voltage is between 1 and 2 volts, whereas a smaller bias voltage
(0.1V ~ 0.3V) is needed for reading.
There is a strong dependence between the write voltage
and the expected write latency. This relationship is modeled
by Eq. (2), Eq. (3), and Eq. (4), which provide an accurate MTJ
write time estimation. The voltage used to estimate latency in
the analytical model is assumed to be constant during the write
operation and identical for both free layer orientations.
Moreover, since CACTI does not provide a mechanism to input
a distribution of logic values to be written, we only
consider switching from parallel to anti-parallel
magnetization of the free layer, which is the worst case in terms of
latency.
However, this contribution alone is not sufficient to estimate the
overall latency, as each MTJ is connected to an access transistor
(see Figure 2) to mitigate write disturbs and to reduce energy
consumption. Therefore, without loss of accuracy, the
overall write latency of an STT-MRAM data
array is computed as the read latency plus the MTJ write time.
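The write latency composition can be sketched as follows. Eqs. (2) and (3) are inverted algebraically to obtain the pulse width for a given current density; Eq. (4) has no simple closed-form inverse and is left out. The function names, the use of a current density as input (the mapping from write voltage to current density is not shown here), and all numeric values are assumptions for illustration, not the code integrated in CACTI.

```python
import math


def write_time_thermal_ns(j, jc0, delta, tau0_ns=1.0):
    """Invert Eq. (2): tau = tau0 * exp(delta * (1 - J / Jc0))."""
    return tau0_ns * math.exp(delta * (1.0 - j / jc0))


def write_time_precessional_ns(j, jc0, c, gamma):
    """Invert Eq. (3): tau = (C / (J - Jc0)) ** (1 / gamma)."""
    if j <= jc0:
        raise ValueError("Eq. (3) only has a solution for J > Jc0")
    return (c / (j - jc0)) ** (1.0 / gamma)


def array_write_latency_ns(read_latency_ns, mtj_write_time_ns):
    """Overall write latency: read latency plus MTJ write time."""
    return read_latency_ns + mtj_write_time_ns


# Placeholder constants and a made-up read latency, for illustration only.
tau_write = write_time_precessional_ns(j=4.5, jc0=2.0, c=5.0, gamma=1.0)
print(array_write_latency_ns(read_latency_ns=1.0, mtj_write_time_ns=tau_write))
```

The result of each inverse should be checked against the validity range of its regime (above 20 ns for the thermal case, below 3 ns for the precessional case) before it is used.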
E. Area Estimation Model
The area of an STT-MRAM cell strongly depends on the design
of the access transistor. Consider a cell composed
of an access transistor and an MTJ stacked in a 3D structure. The
resulting area is dominated by the element that requires
the larger planar surface, which is generally the access transistor.
Determining the proper size of the access transistor is one of the
most critical aspects of the cell design. Due to technological
constraints, a small size improves read latency whereas a
large size enhances write performance. The analytical model
integrated in CACTI for cell area estimation is given in
Equation (6):
$$A_{cell} = 3\left(\frac{W}{L} + 1\right)F^2 \qquad (6)$$
where F is the minimum feature size and W and L are the width
and length of the access transistor, respectively. The equivalent
resistance of the access transistor influences the length. There is
an inverse proportionality between them: a high resistance
corresponds to a small cell area and high storage density, whereas a
low resistance considerably increases the memory area.
The total area of the memory does not depend only on
the cell size. Interconnections
also have a considerable impact on the resulting memory size. For this
reason, according to user requirements, CACTI attempts to
optimize on-chip memory interconnections to meet latency or
energy constraints.
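Eq. (6) is simple enough to evaluate directly. The sketch below uses hypothetical function names and a made-up W/L ratio; the 45 nm feature size is borrowed from the SPICE node mentioned earlier purely as an example input.

```python
def cell_area_f2(w_over_l: float) -> float:
    """Eq. (6) expressed in units of F^2: 3 * (W/L + 1)."""
    if w_over_l <= 0:
        raise ValueError("W/L must be positive")
    return 3.0 * (w_over_l + 1.0)


def cell_area_m2(w_over_l: float, feature_size_m: float) -> float:
    """Eq. (6) in square meters for a given minimum feature size F."""
    return cell_area_f2(w_over_l) * feature_size_m ** 2


# Hypothetical sizing: W/L = 4 at F = 45 nm.
print(cell_area_f2(4.0), "F^2")
print(cell_area_m2(4.0, 45e-9), "m^2")
```

Doubling W/L from 4 to 8 raises the cell area from 15 F² to 27 F², which shows how quickly a wider access transistor erodes storage density.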
F. Energy Estimation Model
For the sake of completeness, we consider the write and read energy
models individually. Read energy per operation is evaluated
with Equation (7):
$$E_{read} = C_{tot} V_{read}^2 \qquad (7)$$
where $C_{tot}$ depends on the total capacitance of the bitline, on
all wire contributions and on the access transistor. $V_{read}$ is the
read voltage. A lower read voltage reduces the probability of
read disturbs, while a higher value favors read latency.
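A minimal numeric sketch of the read energy model in Eq. (7) follows. The function name and the capacitance value are assumptions for illustration; only the 0.1 V to 0.3 V read voltage range comes from the text.

```python
def read_energy_j(c_tot_f: float, v_read: float) -> float:
    """Eq. (7): read energy per operation, C_tot * V_read^2, in joules."""
    return c_tot_f * v_read ** 2


# Hypothetical total capacitance of 100 fF, read voltages from the text.
for v in (0.1, 0.2, 0.3):
    print(f"V_read = {v:.1f} V -> {read_energy_j(100e-15, v):.2e} J")
```

Because the energy grows with the square of the voltage, moving from 0.1 V to 0.3 V multiplies the read energy by nine in this model.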
The write energy can be divided into two main
contributions (see Equation (8)). The former is the
energy consumed by the current flowing through the MTJ
device, while the latter is computed analogously to the read energy
model in Eq. (7):
$$E_{write} = \frac{V_{write}^2}{R_{MTJ} + R_{acc}}\,\tau_{write} + C_{tot} V_{write}^2 \qquad (8)$$
where $V_{write}$ is the write voltage, $R_{MTJ}$ is the equivalent MTJ
resistance, $R_{acc}$ is the equivalent NMOS resistance and
$\tau_{write}$ is the MTJ switching time. It is worth noting that the
write energy is computed for the
worst case: the MTJ switches from parallel to anti-parallel state.
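The two contributions to the write energy described above can be sketched as follows. The function name and all numeric values are placeholders, and the choice of which MTJ resistance to use as the equivalent value during switching is left to the caller, since the text does not specify it.

```python
def write_energy_j(v_write: float, r_mtj: float, r_acc: float,
                   tau_write_s: float, c_tot_f: float) -> float:
    """Write energy: conduction term through MTJ and access transistor,
    plus the capacitive term C_tot * V_write^2, in joules."""
    conduction = v_write ** 2 / (r_mtj + r_acc) * tau_write_s
    capacitive = c_tot_f * v_write ** 2
    return conduction + capacitive


# Hypothetical operating point: 1.8 V, 3 kOhm MTJ, 1.5 kOhm access
# transistor, 10 ns switching time, 100 fF total capacitance.
energy = write_energy_j(v_write=1.8, r_mtj=3000.0, r_acc=1500.0,
                        tau_write_s=10e-9, c_tot_f=100e-15)
print(f"{energy:.3e} J")
```

At this made-up operating point the conduction term dominates the capacitive term by more than an order of magnitude, which is consistent with the high write cost discussed in the introduction.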
IV. EXPERIMENTAL RESULTS
In the previous section, we described the modeling and
integration of in-plane STT-MRAM technology into the CACTI
tool. To demonstrate the correctness of our tool, we generate
high-performance and low-power cache memories for three
different MTJ configurations and compare them with SRAM technology.
The MTJ input parameters considered are listed in Table 2. The MTJ
configurations differ in parallel and anti-parallel
resistance, write voltage, and equivalent resistance of the
access transistor.
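Table 2: MTJ configurations
| MTJ Parameter | A | B | C |
| --- | --- | --- | --- |
| SttType | In-Plane | In-Plane | In-Plane |
| Jc0 | 2 | 2 | 2 |
| Δ | 40.29 | 40.29 | 40.29 |
| MTJArea | 2·10^-10 | 2·10^-10 | 2·10^-10 |
| Rp | 1.5 | 1.5 | 1.2 |
| Rap | 3 | 3 | 1.8 |
| Vbitline | 1.8 | 1.3 | 1.8 |
| Raccess | 1.5 | 0.3 | 0.3 |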
A. High-Performance Cache Memories
For this study we generate high-performance, eight-way
set-associative cache memories with no error correction mechanism,
ranging in size from 32 kB to 512 kB. Each cache has a 64 b
IN/OUT data interface with a single read-write port. Transistors
are modeled with high-performance cells (itrs-hp) for
both the data and tag arrays and the peripheral circuits. The use of
itrs-hp maximizes performance at the expense of power
consumption.
Figure 4 (h) compares the read latency of the three
MTJ configurations with that of SRAM. The lowest read
latency is achieved by SRAM. Among the MTJ
configurations, configuration A shows the best timing.
