A Parallel-friendly Majority Gate to Accelerate In-memory Computation

John Reuben
Chair of Computer Science 3 - Hardware Architecture
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
91058 Erlangen, Germany
johnreuben.prabahar@fau.de
Stefan Pechmann
Chair of Communications Electronics
Universität Bayreuth
95447 Bayreuth, Germany
stefan.pechmann@uni-bayreuth.de
Abstract—Efforts to combat the ‘von Neumann bottleneck’
have been strengthened by Resistive RAMs (RRAMs), which
enable computation in the memory array. Majority logic can
accelerate computation when compared to NAND/NOR/IMPLY
logic due to its expressive power. In this work, we propose a
method to compute majority while reading from a transistor-accessed
RRAM array. The proposed gate was verified by simulations
using a physics-based model (for the RRAM) and an industry-standard
model (for the CMOS sense amplifier), and was found to tolerate
reasonable variations in the RRAMs’ resistive states. Together
with the NOT gate, which is also implemented in-memory, the proposed
gate forms a functionally complete Boolean logic, capable
of implementing any digital logic. Computing is simplified to a
sequence of READ and WRITE operations and does not require
any major modifications to the peripheral circuitry of the array.
The parallel-friendly nature of the proposed gate is exploited to
implement an eight-bit parallel-prefix adder in the memory array.
The proposed in-memory adder could achieve a latency reduction
of 70% and 50% when compared to IMPLY and NAND/NOR
logic-based adders, respectively.
Index Terms—Resistive RAM (RRAM), majority logic, major-
ity gate, memristor, 1 Transistor-1 Resistor(1T–1R), von Neu-
mann bottleneck, in-memory computing, compute-in-memory,
processing-in-memory, parallel-prefix adder
I. INTRODUCTION
The movement of data between processing and memory
units in present day computing systems is their main
performance and energy-efficiency bottleneck, often referred
to as the ‘von Neumann bottleneck’ or ‘memory wall’. The
emergence of non-volatile memory technologies like Resistive
RAM (RRAM) has created opportunities to overcome the
memory wall by enabling computing at the residence of data.
RRAMs are two terminal devices (usually a Metal-Insulator-
Metal structure) capable of storing data as resistance. The
change of resistance is due to the formation or rupture of a
conductive filament, depending on the direction of the current
flow through the structure. The word ‘memristor’ is also used
by researchers to denote such a device, because it is essentially
a resistor with memory. By connecting such RRAM devices in
a certain manner, applying certain voltage patterns,
or modifying the sensing circuitry, basic Boolean gates
(NOR, NAND, XOR, IMPLY logic) have been demonstrated
in RRAM arrays [1]–[6]. The motivation for such efforts is
to perform Boolean operations on data stored in the memory
array, without moving them out to a separate processing
circuit, thus mitigating the von Neumann bottleneck. Reviews
of such in-memory computing approaches are presented in
[7], [8]. To construct a memory array using such devices, two
configurations are common: 1Transistor–1Resistor (1T–1R)
and 1Selector–1Resistor (1S–1R). The 1T–1R configuration
uses a transistor as an access device for each cell, isolating
the accessed cell from its neighbours in the array. The 1S–1R
configuration uses a two-terminal device called a ‘selector’
which is fabricated in series with the memristive device.
The 1S–1R is area-efficient, but suffers from current leakage
(sneak–path problem) due to the inability to access a particular
cell without interfering with its neighbours [9].
Majority logic, a type of Boolean logic, is defined to be
true if more than half of the n inputs are true, where n is
odd. Hence, a majority gate is a democratic gate and can be
expressed in terms of Boolean AND/OR as
$\mathrm{MAJ}(a, b, c) = a \cdot b + b \cdot c + a \cdot c$,
where a, b, c are Boolean variables. Although
majority logic has been known since 1960, there has been a
revival in using it for computation in many emerging nanotechnologies
(spin waves, magnetic Quantum-Dot cellular
automata, nano magnetic logic, Single Electron Tunneling).
Recent research [10]–[12] has confirmed that majority logic is
to be preferred not only because a particular nanotechnology
can realize it, but also because of its ability to implement
arithmetic-intensive circuits with fewer gates. It must be emphasized
that majority logic did not become the dominant
logic for computing because, in CMOS technology, it was more
efficient to implement a NAND/NOR gate than a majority gate.
However, with many emerging nanotechnologies, this is no longer
the case; therefore, majority logic needs to be re-evaluated
for its computing efficiency. In [13]–[15], majority
logic is implemented in RRAM by applying two inputs of
the majority gate as voltages across the terminals of the RRAM;
the initial state of the RRAM (which is the third input) then
switches to evaluate majority. Such an approach complicates the peripheral
circuitry and is also not parallel-friendly, because two of the
three inputs of a majority gate need to be applied as voltages
at wordline/bitline (see Fig. 1(a)).
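The definition of majority given above can be checked exhaustively. The following is a minimal sketch (the function names are illustrative and do not come from the paper) that compares the "more than half of the inputs are true" definition with the AND/OR expression, and also shows that fixing one input of a three-input majority gate to a constant yields AND or OR, which is one way to see its expressive power.

```python
from itertools import product


def maj(*bits):
    """Return 1 if more than half of an odd number of Boolean inputs are 1."""
    if len(bits) % 2 == 0:
        raise ValueError("majority is defined here for an odd number of inputs")
    return int(sum(bits) > len(bits) // 2)


def maj3_sop(a, b, c):
    """Three-input majority written as a.b + b.c + a.c."""
    return (a & b) | (b & c) | (a & c)


def and2(a, b):
    """AND obtained by tying the third majority input to constant 0."""
    return maj(a, b, 0)


def or2(a, b):
    """OR obtained by tying the third majority input to constant 1."""
    return maj(a, b, 1)


# Exhaustive check over all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    assert maj(a, b, c) == maj3_sop(a, b, c)
```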
In this paper, we propose a majority gate whose structure
is conducive to parallel processing in the memory array.
By activating three rows of the array simultaneously, the
This is author’s version of the accepted paper. For the published paper, see the 31st IEEE International Conference on
Application-specific Systems, Architectures and Processors (ASAP) proceedings in https://ieeexplore.ieee.org/
See Conference presentation (20 min video) at https://asap2020.cs.manchester.ac.uk/paper.php?id=72
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any
current or future media, including reprinting/republishing this material for advertising or promotional purposes,
creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component
of this work in other works.

Fig. 1: (a) In-memory majority gate of previous works [13]–[15].
(b) Proposed parallel-friendly gate. (c) When multiple gates have
to be executed in parallel, the majority gates of previous works
[13]–[15] have to be mapped diagonally because two gates cannot
be executed in the same row/column. This manner of computation
complicates both the peripheral circuitry and the memory controller
(inputs of the gates influence row/column decoding). In the proposed
method, multiple gates can be mapped to the same set of rows,
thereby simplifying the peripheral circuitry and the memory controller
(inputs of the gates are resistances of memory cells, and row/column
decoders retain their functionality as in a conventional memory).
resistances of the RRAM cells in a column are in parallel
during the READ operation. A Sense Amplifier (SA) which
can accurately sense the effective resistance implements an
‘in-memory’ majority gate. This manner of computing majority
enables parallelism and is energy-efficient (both reading and
writing are energy-efficient in 1T–1R when compared to 1S–1R
arrays due to the absence of sneak paths). To demonstrate
the potential of this method to accelerate computation, we
consider a parallel-prefix adder and formulate the steps to
perform eight-bit addition in a 1T–1R array. The remainder
of the paper is organized as follows. Section II-A presents the
principle of reading majority from a 1T–1R array. Since the
read operation is the crucial aspect of the proposed majority
gate, we present the detailed sensing methodology in Section
II-B. Further, we study tolerance to variations in resistive
states by performing Monte Carlo simulations. In Section
III we present the framework to compute in the memory
array, using the proposed majority gate. Section IV-A briefly
presents parallel-prefix technique and the structure of an eight-
bit parallel-prefix adder in terms of majority gates. The adder
is then mapped to a 1T–1R array using the proposed in-
memory computing technique, in Section IV-B. We compare
the proposed eight-bit adder with the state-of-the-art, followed
by conclusions in Section V.
II. MAJORITY GATE IN 1T–1R ARRAY
A. Majority gate: Operating principle
Consider an array of RRAM cells arranged in a 1T–1R
configuration, as depicted in Fig. 2. Each cell can be individually
read from or written to by activating the corresponding
wordline (WL) and applying an appropriate voltage across the
cell (BL and SL). To read from a cell, the corresponding
WL is activated, a small current is injected into the cell and
the voltage across the cell is sensed in a voltage-mode SA, i.e.
Fig. 2: When three rows are activated simultaneously ($WL_1$ to $WL_3$)
in a 1T–1R array, the resistances of the three RRAM devices are
in parallel, $R_{\mathrm{eff}} = R_A \parallel R_B \parallel R_C$. An ‘in-memory’
majority gate can be implemented by accurately sensing the effective
resistance $R_{\mathrm{eff}}$.
the BL voltage is sensed while the SL is grounded. Now, if
three rows are activated simultaneously during the read operation
(Rows 1 to 3 in Fig. 2), the resistances in column 1 are in
parallel (neglecting the parasitic resistance of BL and SL).
During read, the access transistor will be in the linear region, and
hence the transistor’s resistance will be
$r_{DS} = \frac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right)(V_{GS} - V_t)} = 544\,\Omega$ [16].
The effective resistance between BL and SL will therefore be
$R_{\mathrm{eff}} = (R_A + r_{DS}) \parallel (R_B + r_{DS}) \parallel (R_C + r_{DS}) \approx (R_A \parallel R_B \parallel R_C)$,
if the drain-to-source resistance of the transistor ($r_{DS}$) is small
compared to LRS. Table I lists the truth table of the 3-input majority
gate ($M_3(A, B, C)$) and the effective resistance for all the
eight possibilities. To verify the proposed gate on a real RRAM
device, we choose the 1T–1R cell from IHP¹. The 1T–1R
structure consists of an NMOS transistor manufactured in IHP’s
130 nm CMOS technology, whose drain is connected in series
to the RRAM. The RRAM is a TiN/Hf$_{1-x}$Al$_x$O$_y$/Ti/TiN
stack integrated between Metal2 and Metal3 in the BEOL of
the CMOS process. IHP’s 1T–1R cells were modeled using
the Stanford-PKU RRAM model following the methodology
presented in [16]. The cells have a mean LRS and HRS
of 10 kΩ and 133.3 kΩ, respectively. Therefore, $R_{\mathrm{eff}}$
is at least 8.7 kΩ when two or more cells are in HRS (the rows with
$M_3 = 1$ in Table I) and at most 4.8 kΩ when two or more cells are
in LRS. Consequently, a majority gate can be implemented
during a READ operation by precisely sensing $R_{\mathrm{eff}}$. As can
be deciphered from Table I, the crucial aspect of the proposed
gate is to be able to differentiate between $R^{001}_{\mathrm{eff}}$ (two LRS and
one HRS) and $R^{110}_{\mathrm{eff}}$ (two HRS and one LRS). Let us denote
the difference between these two resistances as the sensing window:
Sensing window for majority = 8.7 kΩ − 4.8 kΩ = 3.9 kΩ
for IHP’s cell (resistance window HRS/LRS = 13.3).
¹ IHP: Innovations for High Performance Microelectronics (Leibniz-Institut für innovative Mikroelektronik), Germany
TABLE I: Precisely sensing $R_{\mathrm{eff}}$ results in majority. Logic
‘0’ is LRS (10 kΩ) and logic ‘1’ is HRS (133.3 kΩ).
A  B  C  | M3(A,B,C) | R_eff (expression)       | R_eff (value)
0  0  0  | 0         | LRS/3                    | 3.3 kΩ
0  0  1  | 0         | HRS·LRS/(LRS + 2·HRS)    | 4.8 kΩ
0  1  0  | 0         | HRS·LRS/(LRS + 2·HRS)    | 4.8 kΩ
0  1  1  | 1         | HRS·LRS/(HRS + 2·LRS)    | 8.7 kΩ
1  0  0  | 0         | HRS·LRS/(LRS + 2·HRS)    | 4.8 kΩ
1  0  1  | 1         | HRS·LRS/(HRS + 2·LRS)    | 8.7 kΩ
1  1  0  | 1         | HRS·LRS/(HRS + 2·LRS)    | 8.7 kΩ
1  1  1  | 1         | HRS/3                    | 44.4 kΩ
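The entries of Table I follow from the parallel-resistance formula. As a minimal sketch, the code below recomputes $R_{\mathrm{eff}}$ for all eight input combinations from the mean LRS and HRS values quoted in the text, once neglecting $r_{DS}$ and once including the quoted 544 Ω per cell. The decision threshold (geometric mean of the 001 and 110 cases) and all function names are assumptions made for illustration only; the paper senses $R_{\mathrm{eff}}$ in the time domain, not with an explicit resistance threshold.

```python
from itertools import product

LRS = 10e3     # mean low-resistance state (logic '0'), in ohms
HRS = 133.3e3  # mean high-resistance state (logic '1'), in ohms
R_DS = 544.0   # access-transistor resistance quoted in the text, in ohms


def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def r_eff(bits, r_ds=0.0):
    """Effective BL-to-SL resistance when three cells in a column are read together."""
    return parallel(*((HRS if b else LRS) + r_ds for b in bits))


# Hypothetical threshold placed between the two closest cases (001 and 110).
THRESHOLD = (r_eff((0, 0, 1)) * r_eff((1, 1, 0))) ** 0.5


def sensed_majority(bits, r_ds=0.0):
    """Idealized sensing: output 1 if R_eff lies above the threshold."""
    return int(r_eff(bits, r_ds) > THRESHOLD)


for bits in product((0, 1), repeat=3):
    ideal_kohm = r_eff(bits) / 1e3
    with_rds_kohm = r_eff(bits, R_DS) / 1e3
    print(bits, sensed_majority(bits), f"{ideal_kohm:.1f} kOhm", f"{with_rds_kohm:.1f} kOhm")
```

Including $r_{DS}$ shifts every value upward slightly but leaves the ordering of the eight cases unchanged, which is why the approximation in the text is reasonable for this cell.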
B. Sensing methodology
As stated, the methodology to reliably translate $R_{\mathrm{eff}}$ into
a CMOS-compatible voltage is the crucial aspect of the
proposed majority gate. $R^{001}_{\mathrm{eff}}$ is 4.8 kΩ and $R^{110}_{\mathrm{eff}}$ is 8.7 kΩ,
and differentiating such a resistance window (about 3.9 kΩ) needs
a robust SA. It must be noted that this requirement is made more demanding by
the variability exhibited by the RRAM devices. To meet this
requirement, a time-based SA recently proposed in [17] was
chosen. Different from conventional sensing schemes (voltage-mode
and current-mode), the time-based sensing scheme converts
the BL voltage (to be sensed) into a time delay and discriminates
in the time domain. This sensing scheme was originally
proposed to read data from STT-MRAM [17], which has a
resistance window of a few kΩ. Therefore, it is well suited to the
proposed majority gate. Furthermore, this time-based sensing
achieves two to three orders of magnitude improvement in
sensing bit error rate (BER) compared to conventional schemes, in addition
to being reference-less [17].
The time-based sensing circuit is essentially a voltage-to-time
converter followed by a time-domain comparator (D flip-flop).
Voltage-to-time conversion is achieved by the current-starved
inverter (transistors $M_{1-5}$) followed by transistor $M_6$
and an inverter (Fig. 3). During READ, a current $I_{READ}$ is
injected into the 1T–1R cell (the corresponding three WLs are
activated and SL is grounded). Depending on the effective
resistance $R_{\mathrm{eff}}$, the BL reaches an appropriate voltage. In
the conceptual waveforms of Fig. 3, it is assumed that BL
gets charged to 300 mV if $R_{\mathrm{eff}}$ is a high resistance (8.7 kΩ)
and 200 mV if $R_{\mathrm{eff}}$ is a low resistance (4.8 kΩ), for the
purpose of illustration. Such a $V_{BL}$ (few hundred mV) limits
the current flow through the inverter (transistors $M_{1-3}$), hence
the name current-starved inverter. When EN goes high, the
current-starved inverter introduces a delay that depends on $V_{BL}$,
i.e. a higher $V_{BL}$ incurs less delay. A $V_{BL}$ of 300 mV incurs
less delay and the low-to-high transition of EN reaches the input
of the flip-flop ($I_{FF}$) faster, i.e. at $T_{HRS}$. For a lower $V_{BL}$
of 200 mV, the delay is greater and the low-to-high transition
Fig. 3: A small current $I_{READ}$ injected into the cell converts the
resistance to a voltage which is fed to the time-based SA. A current-starved
inverter transforms this voltage into a corresponding delay,
which is sensed as a CMOS-compatible voltage by the D-FF [17].
In the conceptual waveforms, $I_{READ}$ = 35 µA, $V_{BL}$ is 300 mV for HRS
and 200 mV for LRS, and $D_{out}$ is 1 for HRS and 0 for LRS.
occurs at $T_{LRS}$. $t_{delay}$ is a chain of inverters programmed
to introduce a delay between $T_{HRS}$ and $T_{LRS}$. $EN_{delay}$, the
EN signal delayed by $t_{delay}$, acts as the edge trigger for the
D-FF. When $EN_{delay}$ goes high at $T_{DM}$ (Decision Moment),
it latches the signal at $I_{FF}$ and hence $D_{out}$ is high for
high resistance ($R^{110}_{\mathrm{eff}}$ = 8.7 kΩ) and low for low resistance
($R^{001}_{\mathrm{eff}}$ = 4.8 kΩ). It must be noted that for $R^{111}_{\mathrm{eff}}$ = 44.4 kΩ,
$V_{BL}$ will be much larger than 300 mV and will result in a
transition much before $T_{HRS}$. Similarly, for $R^{000}_{\mathrm{eff}}$ = 3.3 kΩ,
$V_{BL}$ will be less than 200 mV and will result in a transition
much later than $T_{LRS}$. Once designed to differentiate between
$R^{110}_{\mathrm{eff}}$ and $R^{001}_{\mathrm{eff}}$, the time-based SA will output $M_3(A, B, C)$
correctly for all the eight cases. Furthermore, the same SA can
be used to read a single bit by using a smaller $I_{READ}$ (and
activating a single WL during normal read operation). Hence
the proposed gate does not necessitate any modification to the
read-out circuit of the regular memory array.
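The decision made by the time-based SA can be illustrated with a toy behavioural model. The sketch below is not the circuit of [17]: the inverse relation between delay and $V_{BL}$, the constant `K_DELAY` and the placement of the decision moment midway between the two transition times are all assumptions chosen only to reproduce the qualitative behaviour described above (a higher $V_{BL}$ gives an earlier transition at $I_{FF}$). The 200 mV and 300 mV values are the conceptual ones used in the text.

```python
# Hypothetical constant: gives 1 ns of delay at V_BL = 300 mV (illustration only).
K_DELAY = 1.0e-9 * 0.300


def transition_time(v_bl):
    """Toy model of when I_FF rises after EN: earlier for a higher V_BL."""
    return K_DELAY / v_bl


T_HRS = transition_time(0.300)  # two HRS + one LRS (conceptual 300 mV)
T_LRS = transition_time(0.200)  # two LRS + one HRS (conceptual 200 mV)
T_DM = 0.5 * (T_HRS + T_LRS)    # assumed decision moment between the two


def d_out(v_bl, t_dm=T_DM):
    """D-FF latches I_FF at the decision moment: 1 if I_FF has already risen."""
    return int(transition_time(v_bl) < t_dm)


# A V_BL above 300 mV (three HRS) switches even earlier than T_HRS,
# and a V_BL below 200 mV (three LRS) switches even later than T_LRS.
for v_bl in (0.450, 0.300, 0.200, 0.150):
    print(f"V_BL = {v_bl * 1e3:.0f} mV -> D_out = {d_out(v_bl)}")
```

Because the delay is monotonic in $V_{BL}$ in this model, a decision moment that separates the two middle cases also handles the two extreme cases, which mirrors the argument in the text.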
The time-based sensing circuit of Fig. 3 was designed in
IHP’s 130 nm CMOS process, and simulated to verify the
functioning of the majority gate. $I_{READ}$ of 35 µA was injected
into the 1T–1R cell to sense the BL voltage. For $R^{001}_{\mathrm{eff}}$ and
$R^{110}_{\mathrm{eff}}$, $V_{BL}$ was 282 mV and 410 mV, respectively. Since
the current-starved transistors $M_{1-3}$ are the crucial factor in
deciding the delay, they were made large ($W/L = 1.5\,\mu\mathrm{m} / 0.39\,\mu\mathrm{m}$) to
make the circuit less sensitive to CMOS process variations.
$t_{delay}$ was set to 3 ns using a chain of inverters with MOS
capacitive loads between them. RRAM cells exhibit variability
in their programmed resistive states cycle-to-cycle and
device-to-device [18]. Therefore the majority gate was evaluated by
taking RRAM variations into account. Since majority is computed
while reading (the memory cell is not switched), the RRAM
was replaced with a resistor and variability was incorporated as
a Gaussian distribution in that resistor. The impact of process
variations was analysed using the statistical model files for
the CMOS transistors provided by the foundry. 2000 Monte

Fig. 4: Sample output of the time-based SA. At 13.5 ns, $EN_{delay}$
goes high, deciding the output. Only 100 MC simulations are plotted
(shaded light), with a single typical case highlighted dark.
Carlo simulations were performed where the resistance of the
RRAM was Gaussian distributed with a standard deviation σ
= 10% of the mean RRAM resistance, i.e. $\sigma_{LRS}$ = 1 kΩ and
$\sigma_{HRS}$ = 13.33 kΩ. With the combined effects of RRAM variability and
process variability (in the transistors of the SA), the Bit Error Rate
(BER) was found to be 5.4%. Sample waveforms are plotted
in Fig. 4. Further failure analysis of the majority gate (incorrect
sensing of $R^{001}_{\mathrm{eff}}$ and $R^{110}_{\mathrm{eff}}$) revealed that failures occurred only when
the RRAM resistance was more than 2σ from the mean LRS/HRS (it
must be noted that about 95% of resistances fall within 2σ of the
mean in a Gaussian distribution).
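The RRAM part of this variability study can be sketched in a few lines. The code below is a simplified stand-in, not the simulation reported above: it samples Gaussian LRS/HRS values with σ = 10% of the mean, forms $R^{001}_{\mathrm{eff}}$ and $R^{110}_{\mathrm{eff}}$, and counts how often an ideal resistance threshold would misclassify them. The threshold value of 6.5 kΩ, the seed and the function names are assumptions. Since CMOS process variation and the actual sense amplifier are not modelled, the resulting error rate is not expected to match the 5.4% BER quoted in the text.

```python
import random

LRS_MEAN = 10e3        # ohms
HRS_MEAN = 133.3e3     # ohms
SIGMA_FRACTION = 0.10  # sigma = 10% of the mean resistance


def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def sample(mean, rng):
    """Draw one Gaussian-distributed resistance, clamped to stay positive."""
    return max(rng.gauss(mean, SIGMA_FRACTION * mean), 1.0)


def estimate_error_rate(trials=2000, threshold=6.5e3, seed=0):
    """Fraction of 001/110 reads misclassified by an ideal R_eff threshold."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        r001 = parallel(sample(LRS_MEAN, rng), sample(LRS_MEAN, rng), sample(HRS_MEAN, rng))
        r110 = parallel(sample(HRS_MEAN, rng), sample(HRS_MEAN, rng), sample(LRS_MEAN, rng))
        errors += int(r001 > threshold)  # majority should be 0
        errors += int(r110 < threshold)  # majority should be 1
    return errors / (2 * trials)


print(f"estimated error rate (RRAM variability only): {estimate_error_rate():.4f}")
```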
III. FRAMEWORK TO COMPUTE IN 1T1R ARRAY
A. Functional completeness and memory controller
As shown in Fig. 5-(a), the NOT operation can be implemented
in a 1T–1R array by simply latching $\overline{Q}$ from the output of the
time-based SA during READ (the D flip-flop of Fig. 3 outputs
Q and $\overline{Q}$). This is accomplished by using a control signal
INV which is low during READ and majority operations (Q
is latched) and goes high only during the NOT operation ($\overline{Q}$ is
latched). Majority together with NOT is functionally complete,
i.e. any Boolean logic can be expressed in terms of majority
and NOT gates, given access to a constant 0 or 1 input [19]. In [19],
the authors present the Majority-Inverter Graph (MIG), a new logic
manipulation structure consisting of three-input majority nodes and
regular/inverted edges. Fig. 5-(b) is the MIG of a 1-bit full adder obtained by
MIGhty (MIG synthesis tool), and any Boolean logic can be
synthesised in terms of majority and NOT gates in a similar
manner. Since both majority and NOT gates are implemented
Fig. 5: (a) NOT operation implemented with a 2:1 Mux at the
output of the time-based SA; all logic operations are essentially
READ operations. (b) 1-bit full adder expressed as a Majority-Inverter
Graph using the MIGhty synthesis tool [19], where $M_3$ represents the
3-input majority operation. (c) With majority/NOT gates computed as
READ, multiple levels of logic can be executed by writing the data
back to the memory, simplifying computing to READ and WRITE
operations.
as READ, multiple levels of gates can be cascaded by writing
the read data back to the array. In essence, ‘computing’ is
simplified to a sequence of READ and WRITE operations,
orchestrated by the memory controller, as depicted in Fig.5-
(c).
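To make the cascading of majority and NOT gates concrete, the sketch below evaluates a 1-bit full adder using only three majority operations and inversions. It uses a commonly cited majority-inverter formulation of the full adder; that it matches the MIG of Fig. 5-(b) node for node is an assumption, and the function names are illustrative.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority gate."""
    return int(a + b + c >= 2)


def inv(a):
    """NOT gate."""
    return 1 - a


def full_adder(a, b, c_in):
    """1-bit full adder built from three majority gates and inverters."""
    c_out = maj(a, b, c_in)              # first level
    inner = maj(a, b, inv(c_in))         # first level (can run in parallel with c_out)
    s = maj(inv(c_out), c_in, inner)     # second level
    return s, c_out


# Exhaustive check against integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c_in)
    assert s + 2 * c_out == a + b + c_in
```

In the in-memory setting described above, each majority or NOT evaluation would correspond to a READ, and the intermediate values `c_out` and `inner` would be written back to the array before the second level is read.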
The memory controller of a regular memory (be it DRAM-based
or NVM-based) is responsible for orchestrating the
READ and WRITE operations by issuing the control signals to
the peripheral circuitry of the array. For in-memory computing, the memory
controller must additionally be able to
execute majority and NOT operations. Since both majority and
NOT operations are READ operations in this logic family, the
controller does not require any major alterations. To execute a
majority operation, an additional control signal called MAJ
is needed, which is set to logic ‘1’ during the majority operation²,
and the address of the first row (out of the three rows on which
majority is to be performed) is placed on the row decoder.
It must be noted that the majority operation is executed on three
contiguous bits of data in a column, and the triple-row decoder
of Section III-B will select not only the row corresponding
to the address placed on the row decoder, but also the next
two rows if MAJ is ‘1’. The column address is placed on
the column decoder to select the particular column in which
majority is executed, and the SA is activated to get the output.
The NOT operation is the same as the READ operation, the
only exception being that the controller issues the control signal
INV, which goes high to invert the read data at the output of
the SA (Fig. 5-(a)). The control signals activated during logic
operations are summarized in Table II.
² This signal acts as an additional input to the row decoder (Fig. 6).
TABLE II: Control signals for memory and logic operations
Operation  | WL                    | BL           | SL        | EN (SA) | INV | MAJ
READ       | single row activated  | to read ckt. | grounded  | 1       | 0   | 0
NOT        | single row activated  | to read ckt. | grounded  | 1       | 1   | 0
Majority   | three rows activated  | to read ckt. | grounded  | 1       | 0   | 1
WRITE ‘0’  | single row activated  | V_SET        | grounded  | 0       | 0   | 0
WRITE ‘1’  | single row activated  | grounded     | V_RESET   | 0       | 0   | 0
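The operations of Table II can be mirrored by a small behavioural model of the array. The class below is a sketch under simplifying assumptions: cells store ideal bits (0 = LRS, 1 = HRS), sensing is error-free, and the class and method names are hypothetical. The `inv` and `maj` arguments play the role of the INV and MAJ control signals.

```python
class Array1T1R:
    """Behavioural model of a 1T-1R array: one ideal bit per cell (0 = LRS, 1 = HRS)."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        """WRITE '0' (SET) or WRITE '1' (RESET) to a single cell."""
        self.cells[row][col] = bit & 1

    def read(self, row, col, inv=0, maj=0):
        """READ (inv=0, maj=0), NOT (inv=1) or majority (maj=1) on one column.

        With maj=1 the rows row, row+1 and row+2 are activated together.
        """
        rows = range(row, row + 3) if maj else (row,)
        bits = [self.cells[r][col] for r in rows]
        q = int(sum(bits) > len(bits) // 2)
        return q ^ inv


# Usage: four majority gates mapped to the same three rows, one per column.
arr = Array1T1R(rows=8, cols=4)
operands = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (1, 0, 0)]
for col, bits in enumerate(operands):
    for row, bit in enumerate(bits):
        arr.write(row, col, bit)

carries = [arr.read(0, col, maj=1) for col in range(4)]  # majority as READ
for col, bit in enumerate(carries):
    arr.write(3, col, bit)                                # write back for the next level
inverted = [arr.read(3, col, inv=1) for col in range(4)]  # NOT as READ
print(carries, inverted)
```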
B. Triple-row decoder design
Fig. 6: Triple-row decoding is achieved by interleaving multiple
single-row decoders. When control signal MAJ is logic
‘0’ (READ/WRITE/NOT), $WL_i$ corresponding to row address
$A_3A_2A_1A_0$ is selected. When MAJ is logic ‘1’ (majority),
$WL_i$, $WL_{i+1}$, $WL_{i+2}$ are selected.
A conventional decoder for a 1T–1R array can select one
row at a time, while the proposed majority gate needs three
rows to be selected simultaneously. Moreover, the row decoder
must be versatile enough to switch between single-row activation and
triple-row activation seamlessly. This is because, as stated
in the previous section, one must be able to read/write a
single bit of the array (READ/WRITE/NOT) as well as read
three bits in a column (majority). To this end, we propose a
robust row decoder which is designed by interleaving multiple
single-row decoders. As depicted in Fig. 6, a 4:16 triple-row
decoder can be designed by interleaving four 2:4 dynamic
NAND decoders³. Since single-row decoding must co-exist
with triple-row decoding, an address translator circuit is used
to switch between the two modes using MAJ as a control
signal. For example, to select a single row $WL_5$, the address
is $A_3A_2A_1A_0$ = ‘0101’ and MAJ = ‘0’. For these inputs,
the address translator outputs $EN_3EN_2EN_1EN_0$ = ‘0010’
and $D_7D_6D_5D_4D_3D_2D_1D_0$ = ‘XXXX01XX’ (the green decoder
in Fig. 6 is enabled and its second row is selected, thereby
activating $WL_5$). But, for the same row address $A_3A_2A_1A_0$
= ‘0101’ and MAJ = ‘1’, the address translator outputs
$EN_3EN_2EN_1EN_0$ = ‘1110’ and $D_7D_6D_5D_4D_3D_2D_1D_0$ =
‘010101XX’ (the blue, red and green decoders are enabled and the
second row of each of them is selected, thereby activating
$WL_5$, $WL_6$ and $WL_7$). The address translator takes MAJ
and $A_3A_2A_1A_0$ as inputs and generates $D_7D_6D_5D_4D_3D_2D_1D_0$ and
$EN_3EN_2EN_1EN_0$ to achieve this desired functionality for
all the 16 cases. With the address translator logic (88 transistors),
the triple-row decoder requires 200 transistors, while
a regular 4:16 dynamic decoder (only single-row activation)
requires 136 transistors, a 47% increase in the row-decoder
area. The address translator does not add any significant
latency to the decoding process. The decoder was designed
in the 130 nm IHP process; its functionality was verified and the
decoding latency was found to be 496 ps.
³ A dynamic decoder uses a precharge signal φ; when φ is low, all WL
are driven to ‘0’. When φ goes high, $WL_i$ corresponding to $D_1D_0$ goes
high, provided EN is ‘1’.
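The behaviour of the address translator can be sketched functionally. The model below assumes the interleaving implied by the example above, namely that decoder k (enabled by $EN_k$, with row-select inputs $D_{2k+1}D_{2k}$) drives wordlines $WL_{4j+k}$. How the circuit treats start addresses 14 and 15 with MAJ = ‘1’, where fewer than three rows remain, is not stated in the text; the sketch simply ignores rows beyond $WL_{15}$, which is an assumption. Function names are illustrative, and the gate-level implementation (88 transistors) is not modelled.

```python
NUM_DECODERS = 4  # four interleaved 2:4 decoders
NUM_ROWS = 16


def address_translator(addr, maj):
    """Return (en, d): en[k] enables decoder k, d[k] is its 2-bit row select or None."""
    targets = [addr, addr + 1, addr + 2] if maj else [addr]
    en = [0] * NUM_DECODERS
    d = [None] * NUM_DECODERS          # None stands for don't-care ('XX')
    for wl in targets:
        if wl >= NUM_ROWS:             # assumption: rows beyond the array are ignored
            continue
        k, j = wl % NUM_DECODERS, wl // NUM_DECODERS
        en[k] = 1
        d[k] = j
    return en, d


def active_wordlines(en, d):
    """Wordlines driven high by the enabled 2:4 decoders."""
    return sorted(NUM_DECODERS * d[k] + k for k in range(NUM_DECODERS) if en[k])


def format_signals(en, d):
    """Render EN3..EN0 and D7..D0 as strings, using 'XX' for don't-care pairs."""
    order = (3, 2, 1, 0)
    en_str = "".join(str(en[k]) for k in order)
    d_str = "".join("XX" if d[k] is None else format(d[k], "02b") for k in order)
    return en_str, d_str


for maj in (0, 1):
    en, d = address_translator(0b0101, maj)
    print(maj, format_signals(en, d), active_wordlines(en, d))
```

For the address ‘0101’ this reproduces the two cases worked through in the text, and it also covers triples that wrap across decoders, such as $WL_3$, $WL_4$, $WL_5$.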
C. Area of time-based Sense Amplifier
Fig. 7: Layout of time-based SA.
In this work, the primary motivation for pioneering a
parallel-friendly gate was to exploit it to accelerate addition, by
executing gates in parallel. It must be emphasized that the main
drawback of RRAM-based in-memory adders is their latency:
numerous cycles of Boolean operations (NAND, NOR,
IMPLY) are needed to perform addition, when compared to
CMOS. To evaluate the number of gates that can be executed
in parallel, we evaluated the area of the time-based SA. The
time-based SA of [17] could sense the BL voltage without an
op-amp, and this was an important reason for adopting it for
our majority gate (conventional SAs use an operational amplifier,
which consumes a large silicon area). The layout of the
time-based SA of Fig. 3 is drawn in Fig. 7 and occupies an area of
20 µm × 3 µm = 60 µm². It must be noted that this area estimate does
not include the area of the delay element, since it is shared by
all the SAs in the array ($t_{delay}$ in Fig. 3 is implemented as a series
of inverters with MOS capacitive loads between them). From
[20], the layout of a single 1T–1R cell occupies 450 nm ×
450 nm = 0.2 µm² in 130 nm (12.4 F²). If the SA is stacked
along its height of 3 µm, eight columns can share an SA. This
means that the number of majority gates that can be executed
in parallel in an array is the number of columns divided by a
factor of 8, i.e. 32 gates can be executed simultaneously in a
256×256 array, 8 gates in a 64×64 array, etc.
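The parallelism estimate above reduces to a one-line calculation. The sketch below takes the sharing factor of eight columns per SA directly from the text rather than deriving it from the layout dimensions; the function name is illustrative.

```python
COLUMNS_PER_SA = 8  # from the text: eight columns share one sense amplifier


def parallel_majority_gates(columns, columns_per_sa=COLUMNS_PER_SA):
    """Number of majority gates that can be evaluated in one READ cycle."""
    return columns // columns_per_sa


for size in (64, 256):
    print(f"{size}x{size} array: {parallel_majority_gates(size)} gates in parallel")
```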
