
A Parallel-friendly Majority Gate to Accelerate In-memory Computation

TL;DR: A method to compute majority while reading from a transistor-accessed RRAM array, which could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.
Abstract: Efforts to combat the ‘von Neumann bottleneck’ have been strengthened by Resistive RAMs (RRAMs), which enable computation in the memory array. Majority logic can accelerate computation when compared to NAND/NOR/IMPLY logic due to its expressive power. In this work, we propose a method to compute majority while reading from a transistor-accessed RRAM array. The proposed gate was verified by simulations using a physics-based model (for RRAM) and industry standard model (for CMOS sense amplifier) and found to tolerate reasonable variations in the RRAMs’ resistive states. Together with NOT gate, which is also implemented in-memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. The parallel-friendly nature of the proposed gate is exploited to implement an eight-bit parallel-prefix adder in the memory array. The proposed in-memory adder could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.

Summary (3 min read)

Introduction

  • RRAMs are two-terminal devices (usually a Metal-Insulator-Metal structure) capable of storing data as resistance.
  • Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with fewer gates.
  • In Section III the authors present the framework to compute in the memory array, using the proposed majority gate.

A. Majority gate: Operating principle

  • Consider an array of RRAM cells arranged in a 1T-1R configuration, as depicted in Fig. 2. Each cell can be individually read/written into by activating the corresponding wordline (WL) and applying appropriate voltage across the cell (BL and SL).
  • Table I lists the truth table of 3-input majority gate (M3(A,B,C)) and the effective resistance for all the eight possibilities.
  • The 1T–1R structure consists of an NMOS transistor manufactured in IHP’s 130 nm CMOS technology, whose drain is connected in series to the RRAM.
  • IHP’s 1T–1R cells were modeled using the Stanford-PKU RRAM model following the methodology presented in [16].

B. Sensing methodology

  • As stated, the methodology to reliably translate Reff into a CMOS-compatible voltage is the crucial aspect of the proposed majority gate.
  • The time-based sensing circuit is essentially a voltage-to-time converter followed by a time-domain comparator (D-flip flop).
  • ENdelay, the EN signal delayed by tdelay, acts as the edge trigger for the D-FF.
  • Therefore the majority gate was evaluated by taking RRAM variations into account.

A. Functional completeness and memory controller

  • This is accomplished by using a control signal INV which is low during READ and majority operation (Q is latched) and goes high only during NOT operation ($\overline{Q}$ is latched).
  • Majority together with NOT is functionally complete, i.e., any Boolean logic can be expressed in terms of majority and NOT gates [19].
  • The memory controller of a regular memory (be it DRAM-based or NVM-based) is responsible for orchestrating the READ and WRITE operation by issuing the control signals to the peripheral circuitry of the array.
  • It must be noted that majority operation is executed on three contiguous bits of data in a column and the triple row decoder of section III-B will not only select the row corresponding to the address placed on the row decoder, but also the next two rows if MAJ is ‘1’.
  • The NOT operation is the same as the READ operation, except that the controller issues the control signal INV, which goes high to invert the read data at the output of the SA (Fig. 5-(a)).

B. Triple-row decoder design

  • A conventional decoder for a 1T–1R array can select one row at a time, while the proposed majority gate needs three rows to be selected simultaneously.
  • To this end, the authors propose a robust row decoder which is designed by interleaving multiple single-row decoders.
  • When φ goes high, WLi corresponding to D1D0 goes high, provided EN is ‘1’.
  • The address translator does not add any significant latency to the decoding process.

C. Area of time-based Sense Amplifier

  • It must be emphasized that the main drawback of RRAM based in-memory adders is their latency – numerous cycles of Boolean operations (NAND, NOR, IMPLY) are needed to perform addition, when compared to CMOS.
  • The time-based SA of [17] can sense the BL voltage without an op-amp, and this was an important reason for adopting it for their majority gate (conventional SAs use operational amplifiers, which consume a large silicon area).
  • It must be noted that this area estimate does not include the area of the delay element since it is shared by all the SA in the array.
  • (tdelay in Fig.3 is implemented as series of inverters with MOS capacitive load between them).

D. Energy for in-memory operations

  • To assess the energy required for computation, the authors first calculate the energy required for each logic operation.
  • The authors calculate the energy for a single majority operation as $E_{MAJ} = V_{DD} \int_{0}^{t_{READ}} I_{READ}\, dt + V_{DD} \int_{0}^{t_{READ}} I_{SA}\, dt$ (1), where $I_{READ}$ is the current injected into the 1T–1R cell (see Fig. 3), $I_{SA}$ is the current consumed by the time-based SA and $t_{READ}$ is the READ cycle duration.
  • The energy for a single majority operation, EMAJ = 1.98 pJ.
  • The energy for the NOT operation is the same as the energy to read a single bit, and it was calculated to be ENOT = 1.24 pJ.
  • ENOT is smaller than EMAJ because IREAD is smaller (22 µA) for NOT and READ, where a single bit is read.

A. Parallel-prefix adder using majority logic

  • Parallel-prefix (PP) adders are a family of adders originally proposed to overcome the latency incurred by the rippling of carry in CMOS-based adders.
  • The regular structure of the memory array and the proposed parallel-friendly majority gate can be combined to implement PP adders in the memory array.
  • The ‘carry-generate block’ can generate the carry ‘ahead’ and is known to reduce the latency to O(log n), for n-bit adders.
  • Kogge-Stone, Ladner-Fischer, Brent-Kung and the like are examples of PP adders.
  • For an eight-bit adder, the logical depth is six levels of majority gates and one level of NOT gates, and at most eight gates are needed simultaneously in each level.
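The parallel-prefix idea summarized above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's mapping: it uses a Kogge-Stone-style prefix network for brevity (the paper maps a Ladner-Fischer structure, so the number of gates per level differs), and the names `maj`, `inv` and `prefix_add` are hypothetical. Each bit position carries a pair (G, P), where G is the group carry-out for carry-in 0 and P is the group carry-out for carry-in 1, so that two groups can be merged with majority gates alone.

```python
def maj(a, b, c):
    """Three-input majority: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """NOT gate on a single bit."""
    return a ^ 1


def prefix_add(x, y, n=8):
    """Add two n-bit integers using only maj() and inv().

    Returns (sum modulo 2**n, carry_out). Kogge-Stone-style prefix
    network; an assumption made for brevity, not the paper's mapping.
    """
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]

    # Level 1: G = carry-out if carry-in is 0 (AND),
    #          P = carry-out if carry-in is 1 (OR).
    G = [maj(a[i], b[i], 0) for i in range(n)]
    P = [maj(a[i], b[i], 1) for i in range(n)]

    # Prefix levels: merge the group ending at bit i with the group
    # ending at bit i - d. Since G <= P always holds, maj(G, P, z)
    # selects G when z = 0 and P when z = 1.
    d = 1
    while d < n:
        new_G, new_P = G[:], P[:]
        for i in range(d, n):
            new_G[i] = maj(G[i], P[i], G[i - d])
            new_P[i] = maj(G[i], P[i], P[i - d])
        G, P = new_G, new_P
        d *= 2

    # carry[i] is the carry into bit i (carry-in of the adder is 0).
    carry = [0] + G

    # Sum bits: one NOT level followed by two majority levels.
    total = 0
    for i in range(n):
        inner = maj(a[i], b[i], inv(carry[i]))
        bit = maj(inv(carry[i + 1]), inner, carry[i])
        total |= bit << i
    return total, carry[n]
```

For n = 8 this sketch uses one majority level for (G, P), three prefix levels, one NOT level and two majority levels for the sum, which is consistent with the depth of six majority levels and one NOT level quoted above.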

B. Mapping of the eight-bit LF adder to 1T–1R array

  • The authors map the eight-bit Ladner-Fischer adder structure of Fig. 8 to a 1T–1R array, using the proposed logic family, and elaborate the sequence of operations.
  • To minimize latency, the authors map the adder in a way such that all the majority gates in a logic level (see Fig. 8) are executed simultaneously in a READ operation (see Fig. 9).
  • In a 1T–1R array, HRS→ LRS transition (SET process when the conductive filament is created) is accomplished by applying two pulses simultaneously to the WL and BL, while SL is grounded.
  • In Table III, the authors have not compared the energy for computation, since it is either not reported [2] or reported for another RRAM technology [22].
  • Each step has one or more OR/AND operations [23].

V. CONCLUSION

  • A memristive logic family formulates a functionally complete Boolean logic with a memristive device (RRAM/PCM/STT-MRAM) as the primary switching device.
  • The proposed method of implementing a majority and NOT gate in a 1T–1R array forms a new memristive logic family.
  • The majority gate can be implemented in a 1T–1R array without necessitating any major change in the peripheral circuit (except the row decoder which needs to be modified to activate three rows simultaneously).
  • Majority logic can be combined with parallel-prefix techniques to design fast adders, and the proposed gate can be used to implement them in memory arrays, with minimum latency.


A Parallel-friendly Majority Gate to Accelerate
In-memory Computation
John Reuben
Chair of Computer Science 3 - Hardware Architecture
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
91058 Erlangen, Germany
johnreuben.prabahar@fau.de
Stefan Pechmann
Chair of Communications Electronics
Universität Bayreuth
95447 Bayreuth, Germany
stefan.pechmann@uni-bayreuth.de
Abstract—Efforts to combat the ‘von Neumann bottleneck’
have been strengthened by Resistive RAMs (RRAMs), which
enable computation in the memory array. Majority logic can
accelerate computation when compared to NAND/NOR/IMPLY
logic due to its expressive power. In this work, we propose a
method to compute majority while reading from a transistor-
accessed RRAM array. The proposed gate was verified by sim-
ulations using a physics-based model (for RRAM) and industry
standard model (for CMOS sense amplifier) and, found to tolerate
reasonable variations in the RRAMs’ resistive states. Together
with NOT gate, which is also implemented in-memory, the pro-
posed gate forms a functionally complete Boolean logic, capable
of implementing any digital logic. Computing is simplified to a
sequence of READ and WRITE operations and does not require
any major modifications to the peripheral circuitry of the array.
The parallel-friendly nature of the proposed gate is exploited to
implement an eight-bit parallel-prefix adder in memory array.
The proposed in-memory adder could achieve a latency reduction
of 70% and 50% when compared to IMPLY and NAND/NOR
logic-based adders, respectively.
Index Terms—Resistive RAM (RRAM), majority logic, majority gate,
memristor, 1 Transistor–1 Resistor (1T–1R), von Neumann bottleneck,
in-memory computing, compute-in-memory,
processing-in-memory, parallel-prefix adder
I. INTRODUCTION
The movement of data between processing and memory
units in present day computing systems is their main
performance and energy-efficiency bottleneck, often referred
to as the ‘von Neumann bottleneck’ or ‘memory wall’. The
emergence of non-volatile memory technologies like Resistive
RAM (RRAM) has created opportunities to overcome the
memory wall by enabling computing at the residence of data.
RRAMs are two terminal devices (usually a Metal-Insulator-
Metal structure) capable of storing data as resistance. The
change of resistance is due to the formation or rupture of a
conductive filament, depending on the direction of the current
flow through the structure. The word ‘memristor’ is also used
by researchers to denote such a device, because it is essentially
a resistor with memory. Connecting such RRAM devices in
a certain manner, or by applying certain voltage patterns,
or by modifying the sensing circuitry, basic Boolean gates
(NOR, NAND, XOR, IMPLY logic) have been demonstrated
in RRAM arrays [1]–[6]. The motivation for such efforts is
to perform Boolean operations on data stored in the memory
array, without moving them out to a separate processing
circuit, thus mitigating the von Neumann bottleneck. Reviews
of such in-memory computing approaches are presented in
[7], [8]. To construct a memory array using such devices, two
configurations are common: 1Transistor–1Resistor (1T–1R)
and 1Selector–1Resistor (1S–1R). The 1T–1R configuration
uses a transistor as an access device for each cell, isolating
the accessed cell from its neighbours in the array. The 1S–1R
configuration uses a two-terminal device called a ‘selector’
which is fabricated in series with the memristive device.
The 1S–1R is area-efficient, but suffers from current leakage
(sneak–path problem) due to the inability to access a particular
cell without interfering with its neighbours [9].
Majority logic, a type of Boolean logic, is defined to be
true if more than half of the n inputs are true, where n is
odd. Hence, a majority gate is a democratic gate and can be
expressed in terms of Boolean AND/OR as MAJ(a, b, c) =
a·b + b·c + a·c, where a, b, c are Boolean variables. Although
majority logic has been known since 1960, there has been a
revival in using it for computation in many emerging nanotechnologies
(spin waves, magnetic Quantum-Dot cellular
automata, nano magnetic logic, Single Electron Tunneling).
Recent research [10]–[12] has confirmed that majority logic is
to be preferred not only because a particular nanotechnology
can realize it, but also because of its ability to implement
arithmetic-intensive circuits with fewer gates. It must be emphasized
that majority logic did not become the dominant
logic for computing because, in CMOS technology, it was more
efficient to implement a NAND/NOR gate than a majority gate.
However, with many emerging nanotechnologies, this is no longer
the case; majority logic therefore needs to be re-evaluated
for its computing efficiency. In [13]–[15], majority
logic is implemented in RRAM by applying the two inputs of
the majority gate as voltages across its terminals, and the initial
state of the RRAM (which is also the third input) switches to
evaluate majority. Such an approach complicates the peripheral
circuitry and is also not parallel-friendly, because two of the
three inputs of a majority gate need to be applied as voltages
at wordline/bitline (see Fig.1(a)).
In this paper, we propose a majority gate whose structure
is conducive to parallel processing in the memory array.
By activating three rows of the array simultaneously, the
This is author’s version of the accepted paper. For the published paper, see the 31st IEEE International Conference on
Application-specific Systems, Architectures and Processors (ASAP) proceedings in https://ieeexplore.ieee.org/
See Conference presentation (20 min video) at https://asap2020.cs.manchester.ac.uk/paper.php?id=72
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any
current or future media, including reprinting/republishing this material for advertising or promotional purposes,
creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component
of this work in other works.

Fig. 1: (a) In-memory majority gate of previous works [13]–[15]
(b) Proposed parallel-friendly gate (c) When multiple gates have
to be executed in parallel, the majority gates of previous works
[13]–[15] have to be mapped diagonally because two gates cannot
be executed in the same row/column. This manner of computation
complicates both the peripheral circuitry and memory controller
(inputs of the gates influence row/column decoding). In the proposed
method, multiple gates can be mapped to the same set of rows,
thereby simplifying the peripheral and the memory controller (inputs
of the gates are resistance of memory cells and row/column decoders
retain their functionality as in a conventional memory).
resistances of the RRAM cells in a column are in parallel
during the READ operation. A Sense Amplifier (SA) which
can accurately sense the effective resistance implements an
‘in-memory’ majority gate. This manner of computing majority
enables parallelism and is energy-efficient (both reading and
writing is energy-efficient in 1T–1R when compared to 1S–
1R arrays due to the absence of sneak paths). To demonstrate
the potential of this method to accelerate computation, we
consider a parallel-prefix adder and formulate the steps to
perform eight-bit addition in a 1T–1R array. The remainder
of the paper is organized as follows. Section II-A presents the
principle of reading majority from a 1T–1R array. Since the
read operation is the crucial aspect of the proposed majority
gate, we present the detailed sensing methodology in Section
II-B. Further, we study tolerance to variations in resistive
states by performing Monte Carlo simulations. In Section
III we present the framework to compute in the memory
array, using the proposed majority gate. Section IV-A briefly
presents parallel-prefix technique and the structure of an eight-
bit parallel-prefix adder in terms of majority gates. The adder
is then mapped to a 1T–1R array using the proposed in-
memory computing technique, in Section IV-B. We compare
the proposed eight-bit adder with the state-of-the-art, followed
by conclusions in Section V.
II. MAJORITY GATE IN 1T–1R ARRAY
A. Majority gate: Operating principle
Consider an array of RRAM cells arranged in a 1T–1R
configuration, as depicted in Fig. 2. Each cell can be individually
read/written into by activating the corresponding
wordline (WL) and applying appropriate voltage across the
cell (BL and SL). To read from a cell, the corresponding
WL is activated, a small current is injected into the cell and
the voltage across the cell is sensed in a voltage-mode SA i.e.
Fig. 2: When three rows are activated ($WL_{1-3}$) simultaneously
in a 1T–1R array, the resistances of the three RRAM devices are
in parallel ($R_{eff} = R_A \| R_B \| R_C$). An ‘in-memory’ majority gate can be implemented by
accurately sensing the effective resistance $R_{eff}$.
the BL voltage is sensed while the SL is grounded. Now, if
three rows are activated simultaneously during read operation
(Rows 1 to 3 in Fig. 2), the resistances in column 1 are in
parallel (neglecting the parasitic resistance of BL and SL).
During read, the access transistor will be in the linear region, and
hence the transistor’s resistance will be
$r_{DS} = \frac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right) (V_{GS} - V_t)} = 544\ \Omega$ [16].
The effective resistance between BL and SL will therefore be
$R_{eff} = (R_A + r_{DS}) \| (R_B + r_{DS}) \| (R_C + r_{DS}) \approx (R_A \| R_B \| R_C)$,
if the drain-to-source resistance of the transistor ($r_{DS}$) is small
compared to LRS. Table I lists the truth table of the 3-input majority
gate ($M_3(A, B, C)$) and the effective resistance for all the
eight possibilities. To verify the proposed gate on a real RRAM
device, we choose the 1T–1R cell from IHP¹. The 1T–1R
structure consists of an NMOS transistor manufactured in IHP’s
130 nm CMOS technology, whose drain is connected in series
to the RRAM. The RRAM is a TiN/Hf$_{1-x}$Al$_x$O$_y$/Ti/TiN
stack integrated between Metal2 and Metal3 in the BEOL of
the CMOS process. IHP’s 1T–1R cells were modeled using
the Stanford-PKU RRAM model following the methodology
presented in [16]. The cells have a mean LRS and HRS
of 10 kΩ and 133.3 kΩ, respectively. Therefore, $R_{eff}$
is 8.7 kΩ when exactly two cells are in HRS (the 8.7 kΩ rows
of Table I) and 4.8 kΩ when exactly two cells are
in LRS. Consequently, a majority gate can be implemented
during a READ operation by precisely sensing $R_{eff}$. As can
be seen from Table I, the crucial aspect of the proposed
gate is to be able to differentiate between $R^{001}_{eff}$ (two LRS and
one HRS) and $R^{110}_{eff}$ (two HRS and one LRS). Let us denote
the resistance difference to be resolved as the sensing window:
Sensing window for majority = 8.7 kΩ − 4.8 kΩ = 3.9 kΩ
for IHP’s cell (resistance window HRS/LRS = 13.3).

¹ IHP: Innovations for High Performance Microelectronics, Leibniz-Institut für
innovative Mikroelektronik, Germany
TABLE I: Precisely sensing $R_{eff}$ results in majority: Logic
‘0’ is LRS (10 kΩ) and logic ‘1’ is HRS (133.3 kΩ)

A  B  C   M3(A,B,C)   Reff (expression)        Reff (value)
0  0  0   0           LRS/3                    3.3 kΩ
0  0  1   0           HRS·LRS/(LRS+2·HRS)      4.8 kΩ
0  1  0   0           HRS·LRS/(LRS+2·HRS)      4.8 kΩ
0  1  1   1           HRS·LRS/(HRS+2·LRS)      8.7 kΩ
1  0  0   0           HRS·LRS/(LRS+2·HRS)      4.8 kΩ
1  0  1   1           HRS·LRS/(HRS+2·LRS)      8.7 kΩ
1  1  0   1           HRS·LRS/(HRS+2·LRS)      8.7 kΩ
1  1  1   1           HRS/3                    44.4 kΩ
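The entries of Table I can be reproduced with a short calculation. The sketch below treats the three cells as ideal parallel resistors at the mean LRS/HRS values quoted in the text and neglects the access transistor's $r_{DS}$ and the line parasitics. The function names and the midpoint decision threshold are assumptions made purely for illustration; the actual gate discriminates in the time domain, as described in the next subsection.

```python
from itertools import product

LRS = 10e3      # mean low-resistance state in ohms (logic '0')
HRS = 133.3e3   # mean high-resistance state in ohms (logic '1')


def majority(a, b, c):
    """Boolean three-input majority."""
    return (a & b) | (b & c) | (a & c)


def r_eff(bits, lrs=LRS, hrs=HRS):
    """Effective resistance of three cells read in parallel.

    bits is a tuple (A, B, C); a 0 bit is stored as LRS, a 1 bit as HRS.
    The transistor resistance r_DS is neglected (assumption).
    """
    return 1.0 / sum(1.0 / (hrs if bit else lrs) for bit in bits)


# Illustrative decision threshold: midpoint between the two critical
# cases R_eff^001 and R_eff^110 (an assumption, not the paper's circuit).
THRESHOLD = 0.5 * (r_eff((0, 0, 1)) + r_eff((1, 1, 0)))


def sensed_output(bits):
    """Ideal sensing: output 1 when R_eff lies above the threshold."""
    return 1 if r_eff(bits) > THRESHOLD else 0


if __name__ == "__main__":
    for bits in product((0, 1), repeat=3):
        print(bits, majority(*bits), f"{r_eff(bits) / 1e3:.1f} kOhm")
```

With ideal sensing, a single threshold between the 4.8 kΩ and 8.7 kΩ cases is enough to separate all eight input combinations correctly.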
B. Sensing methodology
As stated, the methodology to reliably translate $R_{eff}$ into
a CMOS-compatible voltage is the crucial aspect of the
proposed majority gate. $R^{001}_{eff}$ is 4.8 kΩ and $R^{110}_{eff}$ is 8.7 kΩ,
and differentiating such a resistance window (3.9 kΩ) needs
a robust SA. It must be noted that this requirement is made more
stringent by the variability exhibited by the RRAM devices. To meet this
requirement, a time-based SA recently proposed in [17] was
chosen. Different from conventional sensing schemes (voltage-mode
and current-mode), the time-based sensing scheme converts
the BL voltage (to be sensed) into a time delay and discriminates
in the time domain. This sensing scheme was originally
proposed to read data from STT-MRAM [17], which has a
resistance window of a few kΩ. Therefore, it is well suited to the
proposed majority gate. Furthermore, this time-based sensing
achieves two to three orders of magnitude improvement in
sensing reliability (bit error rate, BER) compared to conventional schemes, in addition
to being reference-less [17].
The time-based sensing circuit is essentially a voltage-to-time
converter followed by a time-domain comparator (D-flip
flop). Voltage-to-time conversion is achieved by the
current-starved inverter (transistors $M_{1-5}$) followed by transistor $M_6$
and an inverter (Fig. 3). During READ, a current $I_{READ}$ is
injected into the 1T–1R cell (the corresponding three WLs are
activated and SL is grounded). Depending on the effective
resistance $R_{eff}$, the BL reaches an appropriate voltage. In
the conceptual waveforms of Fig. 3, it is assumed that BL
gets charged to 300 mV if $R_{eff}$ is a high resistance (8.7 kΩ)
and 200 mV if $R_{eff}$ is a low resistance (4.8 kΩ), for the
purpose of illustration. Such a $V_{BL}$ (few hundred mV) limits
the current flow through the inverter (transistors $M_{1-3}$), hence
the name current-starved inverter. When EN goes high, the
current-starved inverter introduces a delay that decreases with increasing $V_{BL}$,
i.e. a higher $V_{BL}$ incurs less delay. A $V_{BL}$ of 300 mV incurs
less delay and the low-to-high transition of EN reaches the input
of the flip-flop ($I_{FF}$) faster, i.e. at $T_{HRS}$. For a lower $V_{BL}$
of 200 mV, the delay is greater and the low-to-high transition
[Figure 3 labels: 1T–1R array (WL, BL, SL) read with $I_{READ}$ = 35 µA; time-based sense amplifier with current-starved transistors $M_{1-3}$, transistors $M_4$, $M_5$, $M_6$, delay element $t_{delay}$ and D flip-flop (D, Q, $\overline{Q}$); conceptual waveforms of EN, $V_{BL}$ (200 or 300 mV), $I_{FF}$, $EN_{delay}$ and $D_{out}$ with time points $T_{HRS}$, $T_{DM}$, $T_{LRS}$; $D_{out}$ = 1 if HRS ($V_{BL}$ = 300 mV), $D_{out}$ = 0 if LRS ($V_{BL}$ = 200 mV).]
Fig. 3: A small current $I_{READ}$ injected into the cell converts the
resistance to a voltage which is fed to the time-based SA. A current-starved
inverter transforms this voltage into a corresponding delay
which is sensed as a CMOS-compatible voltage by the D-FF [17].
occurs at $T_{LRS}$. $t_{delay}$ is a chain of inverters programmed
to introduce a delay between $T_{HRS}$ and $T_{LRS}$. $EN_{delay}$, the
EN signal delayed by $t_{delay}$, acts as the edge trigger for the
D-FF. When $EN_{delay}$ goes high at $T_{DM}$ (Decision Moment),
it latches the signal at $I_{FF}$ and hence $D_{out}$ is high for
high resistance ($R^{110}_{eff}$ = 8.7 kΩ) and low for low resistance
($R^{001}_{eff}$ = 4.8 kΩ). It must be noted that for $R^{111}_{eff}$ = 44.4 kΩ,
$V_{BL}$ will be much larger than 300 mV and will result in a
transition much before $T_{HRS}$. Similarly, for $R^{000}_{eff}$ = 3.3 kΩ,
$V_{BL}$ will be less than 200 mV and will result in a transition
much later than $T_{LRS}$. Once designed to differentiate between
$R^{110}_{eff}$ and $R^{001}_{eff}$, the time-based SA will output $M_3(A, B, C)$
correctly for all the eight cases. Furthermore, the same SA can
be used to read a single bit by using a smaller $I_{READ}$ (and
activating a single WL during normal read operation). Hence
the proposed gate does not necessitate any modification to the
read-out circuit of the regular memory array.
The time-based sensing circuit of Fig. 3 was designed in
IHP’s 130 nm CMOS process, and simulated to verify the
functioning of the majority gate. $I_{READ}$ of 35 µA was injected
into the 1T–1R cell to sense the BL voltage. For $R^{001}_{eff}$ and
$R^{110}_{eff}$, $V_{BL}$ was 282 mV and 410 mV, respectively. Since
the current-starved transistors $M_{1-3}$ are the crucial factor in
deciding the delay, they were made large ($W/L$ = 1.5 µm / 0.39 µm) to
make the circuit less sensitive to CMOS process variations.
$t_{delay}$ was set to 3 ns using a chain of inverters with MOS
capacitive loads between them. RRAM cells exhibit variability
in their programmed resistive states cycle-to-cycle and
device-to-device [18]. Therefore the majority gate was evaluated by
taking RRAM variations into account. Since majority is computed
while reading (the memory cell is not switched), the RRAM
was replaced with a resistor and variability was incorporated as
a Gaussian distribution in that resistor. The impact of process
variations was analysed using the statistical model files for
the CMOS transistors provided by the foundry. 2000 Monte

Fig. 4: Sample output of the time-based SA. At 13.5 ns, $EN_{delay}$
goes high, deciding the output. Only 100 MC simulations are plotted
(shaded light) with a single typical case highlighted dark.
Carlo simulations were performed where the resistance of the
RRAM was Gaussian distributed with a standard deviation σ
= 10% of mean RRAM resistance, i.e. $\sigma_{LRS}$ = 1 kΩ and $\sigma_{HRS}$
= 13.33 kΩ. With combined effects of RRAM variability and
process variability (in transistors of SA), the Bit Error Rate
(BER) was found to be 5.4%. Sample waveforms are plotted
in Fig. 4. Further failure analysis of the majority gate (incorrect
sensing of $R^{001}_{eff}$ and $R^{110}_{eff}$) revealed that it occurred only when
RRAM variability was more than 2σ from mean LRS/HRS (it
must be noted that 95% of resistances fall within 2σ of the
mean in a Gaussian distribution).
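The resistance side of this variability study can be mimicked with a small Monte Carlo sketch. It is a simplified illustration only: each cell resistance is drawn from a Gaussian with σ = 10% of its mean, as in the text, but the sense amplifier is replaced by an ideal resistance threshold placed midway between the nominal $R^{001}_{eff}$ and $R^{110}_{eff}$, and CMOS process variation is ignored. It therefore does not reproduce the BER reported above, which comes from circuit-level simulation. All function names and the threshold choice are assumptions.

```python
import random

LRS_MEAN = 10e3        # ohms
HRS_MEAN = 133.3e3     # ohms
SIGMA_FRACTION = 0.10  # sigma = 10% of the mean resistance


def parallel(resistances):
    """Effective resistance of resistors connected in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def nominal_r_eff(bits):
    """R_eff at the mean LRS/HRS values (bit 0 -> LRS, bit 1 -> HRS)."""
    return parallel([HRS_MEAN if b else LRS_MEAN for b in bits])


# Ideal decision threshold (assumption): midpoint of the critical cases.
THRESHOLD = 0.5 * (nominal_r_eff((0, 0, 1)) + nominal_r_eff((1, 1, 0)))


def sample_cell(bit, rng):
    """Draw one cell resistance from a Gaussian around its mean."""
    mean = HRS_MEAN if bit else LRS_MEAN
    return rng.gauss(mean, SIGMA_FRACTION * mean)


def error_rate(bits, expected, runs=2000, seed=0):
    """Fraction of runs in which ideal threshold sensing is wrong."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(runs):
        r = parallel([sample_cell(b, rng) for b in bits])
        sensed = 1 if r > THRESHOLD else 0
        errors += sensed != expected
    return errors / runs


if __name__ == "__main__":
    print("001:", error_rate((0, 0, 1), expected=0))
    print("110:", error_rate((1, 1, 0), expected=1))
```

In this idealized model only the two critical cases can fail at all, which mirrors the observation in the text that sensing errors are confined to $R^{001}_{eff}$ and $R^{110}_{eff}$.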
III. FRAMEWORK TO COMPUTE IN 1T–1R ARRAY
A. Functional completeness and memory controller
As shown in Fig. 5-(a), NOT operation can be implemented
in a 1T–1R array by simply latching $\overline{Q}$ from the output of the
time-based SA during READ (the D-flip flop of Fig. 3 outputs
Q and $\overline{Q}$). This is accomplished by using a control signal
INV which is low during READ and majority operation (Q
is latched) and goes high only during NOT operation ($\overline{Q}$ is
latched). Majority together with NOT is functionally complete,
i.e. any Boolean logic can be expressed in terms of majority
and NOT gates [19]. In [19], the authors present the
Majority-Inverter Graph (MIG), a new logic manipulation structure
consisting of three-input majority nodes and regular/inverted
edges. Fig. 5-(b) is the MIG of a 1-bit full adder obtained by
MIGhty (MIG synthesis tool), and any Boolean logic can be
synthesised in terms of majority and NOT gates in a similar
manner. Since both majority and NOT gates are implemented
[Figure 5 labels: (a) READ, NOT gate and majority gate, each read through the SA (EN, Q, $\overline{Q}$) with a 2:1 mux controlled by INV; (b) full adder with inputs A, B, $C_{in}$ built from three $M_3$ nodes by MIGhty, where $S = A \oplus B \oplus C_{in}$ and $C_{out} = AB + BC_{in} + AC_{in}$; (c) memory controller exchanging control signals and data (READ, WRITE) with the RRAM memory array and its peripheral circuits.]
Fig. 5: (a) NOT operation implemented with a 2:1 Mux at the
output of the time-based SA; all logic operations are essentially
READ operations (b) 1-bit full adder expressed as
Majority-Inverter-Graph using MIGhty synthesis tool [19], where $M_3$ represents
3-input majority operation (c) With majority/NOT gate computed as
READ, multiple levels of logic can be executed by writing the data
back to the memory, simplifying computing to READ and WRITE
operations.
as READ, multiple levels of gates can be cascaded by writing
the read data back to the array. In essence, ‘computing’ is
simplified to a sequence of READ and WRITE operations,
orchestrated by the memory controller, as depicted in Fig.5-
(c).
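As a small illustration of functional completeness, a 1-bit full adder can be written with three majority gates and two inversions. The sketch below uses a well-known majority/NOT identity for the sum bit; it is not claimed to be the exact graph produced by MIGhty in Fig. 5-(b), and the function names are hypothetical. In the proposed scheme, each `maj` or `inv` call would correspond to one READ-type operation, with intermediate results written back to the array.

```python
from itertools import product


def maj(a, b, c):
    """Three-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """NOT gate (read with INV = 1 in the proposed scheme)."""
    return a ^ 1


def full_adder(a, b, c_in):
    """1-bit full adder using only majority and NOT.

    C_out = M(A, B, C_in)
    S     = M(NOT C_out, M(A, B, NOT C_in), C_in)
    """
    c_out = maj(a, b, c_in)
    inner = maj(a, b, inv(c_in))
    s = maj(inv(c_out), inner, c_in)
    return s, c_out


if __name__ == "__main__":
    for a, b, c_in in product((0, 1), repeat=3):
        print(a, b, c_in, full_adder(a, b, c_in))
```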
The memory controller of a regular memory (be it DRAM-based
or NVM-based) is responsible for orchestrating the
READ and WRITE operation by issuing the control signals to
the peripheral circuitry of the array. In addition, the memory
controller must be augmented with additional capability to
execute majority and NOT operation. Since both majority and
NOT operations are READ operations in this logic family, the
controller does not require any major alterations. To execute a
majority operation, an additional control signal called MAJ
is needed, which is set to logic ‘1’ during majority operation²,
and the address of the first row (out of three rows in which
majority is to be performed) is placed on the row decoder.
It must be noted that majority operation is executed on three
contiguous bits of data in a column and the triple row decoder
of section III-B will not only select the row corresponding
to the address placed on the row decoder, but also the next
two rows if MAJ is ‘1’. The column address is placed on
the column decoder to select the particular column in which
majority is executed and the SA is activated to get the output.
The NOT operation is the same as the READ operation, the
only exception being that the controller issues the control signal
INV, which goes high to invert the read data at the output of
the SA (Fig. 5-(a)). The control signals activated during logic
operations are summarized in Table II.

² This signal acts as an additional input to the row decoder, Fig. 6.
TABLE II: Control signals for memory and logic operations

Operation    WL                     BL             SL          EN(SA)   INV   MAJ
READ         single row activated   to read ckt.   grounded    1        0     0
NOT          single row activated   to read ckt.   grounded    1        1     0
Majority     three rows activated   to read ckt.   grounded    1        0     1
WRITE ‘0’    single row activated   V_SET          grounded    0        0     0
WRITE ‘1’    single row activated   grounded       V_RESET     0        0     0
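Table II can be read as a small lookup that a memory controller model might consult before each operation. The following sketch encodes the table directly; the dictionary layout, key names and helper functions are illustrative assumptions and do not describe an actual controller implementation.

```python
# Control signals per operation, transcribed from Table II.
# Key names and string values are illustrative labels (assumptions).
CONTROL_SIGNALS = {
    "READ":     {"WL": "single row", "BL": "read circuit", "SL": "grounded",
                 "EN": 1, "INV": 0, "MAJ": 0},
    "NOT":      {"WL": "single row", "BL": "read circuit", "SL": "grounded",
                 "EN": 1, "INV": 1, "MAJ": 0},
    "MAJORITY": {"WL": "three rows", "BL": "read circuit", "SL": "grounded",
                 "EN": 1, "INV": 0, "MAJ": 1},
    "WRITE0":   {"WL": "single row", "BL": "V_SET", "SL": "grounded",
                 "EN": 0, "INV": 0, "MAJ": 0},
    "WRITE1":   {"WL": "single row", "BL": "grounded", "SL": "V_RESET",
                 "EN": 0, "INV": 0, "MAJ": 0},
}


def control_word(operation):
    """Return the (EN, INV, MAJ) bits issued for an operation."""
    signals = CONTROL_SIGNALS[operation]
    return signals["EN"], signals["INV"], signals["MAJ"]


def is_read_type(operation):
    """READ, NOT and majority all enable the sense amplifier."""
    return CONTROL_SIGNALS[operation]["EN"] == 1


if __name__ == "__main__":
    for op in CONTROL_SIGNALS:
        print(op, control_word(op))
```

The encoding makes the point of the section explicit: the three logic-capable operations differ from a plain READ only in the INV and MAJ bits.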
B. Triple-row decoder design
[Figure 6 labels: four interleaved 2:4 dynamic decoders with precharge signal φ, enables $EN_0$ to $EN_3$ and select inputs $D_1D_0$, $D_3D_2$, $D_5D_4$, $D_7D_6$, driving wordlines $WL_0$ to $WL_{15}$; address translator logic with inputs $A_3A_2A_1A_0$ and MAJ.]
Fig. 6: Triple-row decoding is achieved by interleaving multiple
single-row decoders. When control signal MAJ is logic
‘0’ (READ/WRITE/NOT), $WL_i$ corresponding to row address
$A_3A_2A_1A_0$ is selected. When MAJ is logic ‘1’ (majority),
$WL_i$, $WL_{i+1}$, $WL_{i+2}$ are selected.
A conventional decoder for a 1T–1R array can select one row at a time, while the proposed majority gate needs three rows to be selected simultaneously. Moreover, the row decoder must switch seamlessly between single-row activation and triple-row activation. This is because, as stated in the previous section, one must be able to read/write a single bit of the array (READ/WRITE/NOT) as well as read three bits in a column (majority). To this end, we propose a robust row decoder which is designed by interleaving multiple single-row decoders. As depicted in Fig. 6, a 4:16 triple-row decoder can be designed by interleaving four 2:4 dynamic NAND decoders³. Since single-row decoding must co-exist with triple-row decoding, an address translator circuit is used to switch between the two modes using MAJ as a control signal.

³ A dynamic decoder uses a precharge signal φ: when φ is low, all WL are driven to ‘0’. When φ goes high, the $WL_i$ corresponding to $D_1D_0$ goes high, provided EN is ‘1’.

For example, to select a single row $WL_5$, the address is $A_3A_2A_1A_0$ = ‘0101’ and MAJ = ‘0’. For these inputs, the address translator outputs $EN_3EN_2EN_1EN_0$ = ‘0010’ and $D_7D_6D_5D_4D_3D_2D_1D_0$ = ‘XXXX01XX’ (the green decoder in Fig. 6 is enabled and its second row is selected, thereby activating $WL_5$). For the same row address $A_3A_2A_1A_0$ = ‘0101’ with MAJ = ‘1’, however, the address translator outputs $EN_3EN_2EN_1EN_0$ = ‘1110’ and $D_7D_6D_5D_4D_3D_2D_1D_0$ = ‘010101XX’ (the blue, red and green decoders are enabled and the second row of each is selected, thereby activating $WL_5$, $WL_6$ and $WL_7$). The address translator takes MAJ and $A_3A_2A_1A_0$ as inputs and generates $D_7D_6D_5D_4D_3D_2D_1D_0$ and $EN_3EN_2EN_1EN_0$ to achieve this functionality for all 16 cases.

With the address translator logic (88 transistors), the triple-row decoder requires 200 transistors, while a regular 4:16 dynamic decoder (single-row activation only) requires 136 transistors, a 47% increase in the row-decoder area. The address translator does not add any significant latency to the decoding process. The decoder was designed in the 130 nm IHP process; its functionality was verified and the decoding latency was found to be 496 ps.
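The address translation described above can be sketched behaviourally in Python. This is a minimal sketch and not the authors' circuit: the mapping "decoder k drives $WL_k$, $WL_{k+4}$, $WL_{k+8}$, $WL_{k+12}$" is inferred from the two worked examples, the function names are hypothetical, and wrapping past $WL_{15}$ for the last two row addresses is an assumption, since the text does not say how those cases are handled.

```python
from typing import List, Optional, Tuple

NUM_DECODERS = 4   # four interleaved 2:4 dynamic decoders
NUM_ROWS = 16      # word lines WL0..WL15


def translate(address: int, maj: int) -> Tuple[List[int], List[Optional[int]]]:
    """Behavioural model of the address translator.

    Returns (en, row_select): en[k] is the enable of decoder k and
    row_select[k] is the row it selects (None means don't care).
    """
    if not 0 <= address < NUM_ROWS:
        raise ValueError("address must be in 0..15")
    en = [0] * NUM_DECODERS
    row_select: List[Optional[int]] = [None] * NUM_DECODERS
    for offset in range(3 if maj else 1):
        wl = (address + offset) % NUM_ROWS   # ASSUMPTION: wrap past WL15
        k = wl % NUM_DECODERS                # inferred interleaving: decoder index
        en[k] = 1
        row_select[k] = wl // NUM_DECODERS   # row inside that 2:4 decoder
    return en, row_select


def as_bit_strings(en: List[int], row_select: List[Optional[int]]) -> Tuple[str, str]:
    """Format the outputs as EN3..EN0 and D7..D0, with X for don't care."""
    order = range(NUM_DECODERS - 1, -1, -1)
    en_bits = "".join(str(en[k]) for k in order)
    d_bits = "".join(
        "XX" if row_select[k] is None else format(row_select[k], "02b")
        for k in order
    )
    return en_bits, d_bits


def active_word_lines(en: List[int], row_select: List[Optional[int]]) -> List[int]:
    """Word lines driven high once the precharge signal goes high."""
    return sorted(
        NUM_DECODERS * row_select[k] + k for k in range(NUM_DECODERS) if en[k]
    )


if __name__ == "__main__":
    for maj in (0, 1):
        en, d = translate(0b0101, maj)
        print(maj, as_bit_strings(en, d), active_word_lines(en, d))
```

For address ‘0101’ the sketch reproduces both examples in the text: ‘0010’ / ‘XXXX01XX’ for MAJ = ‘0’ and ‘1110’ / ‘010101XX’ for MAJ = ‘1’. Giving each decoder its own row-select pair is what allows a triple that straddles two decoder rows (for example $WL_6$, $WL_7$, $WL_8$) to be selected.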
C. Area of time-based Sense Amplifier
Fig. 7: Layout of time-based SA.
In this work, the primary motivation for pioneering a parallel-friendly gate was to accelerate addition by executing gates in parallel. It must be emphasized that the main drawback of RRAM-based in-memory adders is their latency: numerous cycles of Boolean operations (NAND, NOR, IMPLY) are needed to perform addition, when compared to CMOS. To evaluate the number of gates that can be executed in parallel, we evaluated the area of the time-based SA. The time-based SA of [17] could sense the BL voltage without an op-amp, and this was an important reason for adopting it for our majority gate (conventional SAs use an operational amplifier, which consumes a large silicon area). The layout of the time-based SA of Fig. 3 is drawn in Fig. 7 and occupies an area of 20 µm × 3 µm = 60 µm². Note that this area estimate does not include the area of the delay element, since it is shared by all the SAs in the array ($t_{delay}$ in Fig. 3 is implemented as a series of inverters with MOS capacitive loads between them). From [20], the layout of a single 1T–1R cell occupies 450 nm × 450 nm = 0.2 µm² in 130 nm (12.4 F²). If the SA is stacked along its height of 3 µm, eight columns can share an SA. This means that the number of majority gates that can be executed in parallel in an array is the number of columns divided by 8, i.e., 32 gates can be executed simultaneously in a 256×256 array, 8 gates in a 64×64 array, and so on.
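The parallelism estimate above is simple arithmetic and can be checked with a short Python sketch. The sharing factor of eight columns per SA and the 60 µm² SA area are taken from the text; the function names are hypothetical, and rounding down for column counts that are not multiples of eight is an assumption.

```python
COLUMNS_PER_SA = 8    # eight columns share one sense amplifier (from the text)
SA_AREA_UM2 = 60.0    # 20 um x 3 um layout, excluding the shared delay element


def parallel_majority_gates(num_columns: int, columns_per_sa: int = COLUMNS_PER_SA) -> int:
    """Number of majority gates that can be evaluated simultaneously."""
    if num_columns <= 0 or columns_per_sa <= 0:
        raise ValueError("column counts must be positive")
    # ASSUMPTION: leftover columns that cannot fill a group get no SA.
    return num_columns // columns_per_sa


def total_sa_area_um2(num_columns: int) -> float:
    """Silicon area of all sense amplifiers for one array, in square micrometres."""
    return parallel_majority_gates(num_columns) * SA_AREA_UM2


if __name__ == "__main__":
    for columns in (64, 256):
        print(columns, parallel_majority_gates(columns), total_sa_area_um2(columns))
```

Under these numbers, a 256-column array has 32 SAs and a 64-column array has 8, matching the figures quoted above.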

Citations
Journal ArticleDOI
TL;DR: In this review, memristive logic families which can implement MAJORITY gate and NOT are to be favored for in-memory computing, and one-bit full adders implemented in memory array using different logic primitives are compared and the efficiency of majority-based implementation is underscored.
Abstract: As we approach the end of Moore’s law, many alternative devices are being explored to satisfy the performance requirements of modern integrated circuits. At the same time, the movement of data between processing and memory units in contemporary computing systems (‘von Neumann bottleneck’ or ‘memory wall’) necessitates a paradigm shift in the way data is processed. Emerging resistance switching memories (memristors) show promising signs to overcome the ‘memory wall’ by enabling computation in the memory array. Majority logic is a type of Boolean logic which has been found to be an efficient logic primitive due to its expressive power. In this review, the efficiency of majority logic is analyzed from the perspective of in-memory computing. Recently reported methods to implement majority gate in Resistive RAM array are reviewed and compared. Conventional CMOS implementation accommodated heterogeneity of logic gates (NAND, NOR, XOR) while in-memory implementation usually accommodates homogeneity of gates (only IMPLY or only NAND or only MAJORITY). In view of this, memristive logic families which can implement MAJORITY gate and NOT (to make it functionally complete) are to be favored for in-memory computing. One-bit full adders implemented in memory array using different logic primitives are compared and the efficiency of majority-based implementation is underscored. To investigate if the efficiency of majority-based implementation extends to n-bit adders, eight-bit adders implemented in memory array using different logic primitives are compared. Parallel-prefix adders implemented in majority logic can reduce latency of in-memory adders by 50–70% when compared to IMPLY, NAND, NOR and other similar logic primitives.

22 citations


Cites background or methods from "A Parallel-friendly Majority Gate t..."

  • ...An eight-bit parallel-prefix adder in majority logic could achieve a latency of 19 steps [46]....

    [...]

  • ...In contrast, in the R–V implementation [45,46], the row/column decoders retain their functionality as in a conventional memory, with a minor modification (the row decoder must be enhanced to select three rows during majority operation, which can be achieved by interleaving decoders [46])....

    [...]

  • ...(a) In-memory majority gate proposed in [45,46]: When three rows are activated (WL1−3) simultaneously in a 1T-1R array, the three resistances RA, RB, RC will be in parallel (Inputs of the majority gate A, B, C are represented as resistances RA, RB, RC)....

    [...]

  • ...Using majority logic, an 8-bit PP adder is implemented in memory in [46]....

    [...]

  • ...Furthermore, the R–V implementation [45,46] is conducive for parallel-processing since multiple gates can be mapped to the same set of rows, as illustrated in Figure 4....

    [...]

Journal ArticleDOI
TL;DR: A method to implement a majority gate in a transistor-accessed ReRAM array during the READ operation, which forms a functionally complete Boolean logic, capable of implementing any digital logic.
Abstract: To overcome the “von Neumann bottleneck,” methods to compute in memory are being researched in many emerging memory technologies, including resistive RAMs (ReRAMs). Majority logic is efficient for synthesizing arithmetic circuits when compared to NAND/NOR/IMPLY logic. In this work, we propose a method to implement a majority gate in a transistor-accessed ReRAM array during the READ operation. Together with NOT gate, which is also implemented in memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. While many methods have been proposed recently to implement the Boolean logic in memory, the latency of in-memory adders implemented as a sequence of such Boolean operations is exorbitant ($O(n)$). Parallel-prefix (PP) adders use prefix computation to accelerate addition in conventional CMOS-based adders. By exploiting the parallel-friendly nature of the proposed majority gate and the regular structure of the memory array, it is demonstrated how PP adders can be implemented in memory in $O(\log(n))$ latency. The proposed in-memory addition technique incurs a latency of $4\cdot\log(n)+6$ for $n$-bit addition and is energy-efficient due to the absence of sneak currents in 1Transistor–1Resistor configuration.

22 citations

Journal ArticleDOI
TL;DR: The measurement results prove the functionality of the read circuit and the programming system and demonstrate that the read system can distinguish up to eight different states with an overall resistance ratio of 7.9.
Abstract: In this work, we present an integrated read and programming circuit for Resistive Random Access Memory (RRAM) cells. Since there are a lot of different RRAM technologies in research and the process variations of this new memory technology often spread over a wide range of electrical properties, the proposed circuit focuses on versatility in order to be adaptable to different cell properties. The circuit is suitable for both read and programming operations based on voltage pulses of flexible length and height. The implemented read method is based on evaluating the voltage drop over a measurement resistor and can distinguish up to eight different states, which are coded in binary, thereby realizing a digitization of the analog memory value. The circuit was fabricated in the 130 nm CMOS process line of IHP. The simulations were done using a physics-based, multi-level RRAM model. The measurement results prove the functionality of the read circuit and the programming system and demonstrate that the read system can distinguish up to eight different states with an overall resistance ratio of 7.9.

9 citations


Cites background from "A Parallel-friendly Majority Gate t..."

  • ...Those last two applications can also be combined to realize in-memory computing, one prominent way to overcome the von Neumann bottleneck, one of the major challenges for further improvements of modern computing systems [5]....

    [...]

  • ...in-memory computing Non-Volatile Logic [5]...

    [...]

Proceedings ArticleDOI
01 Sep 2020
TL;DR: This paper presents the new concept and simulation results characterising the functionality for the new memristive ternary MLC for a ReRAM technology from Innovations for High Performance Microelectronics (IHP).
Abstract: In this paper we present a new procedure for a direct state transfer in ReRAM based multi-level cell (MLC) memristors for future ternary data processing, i.e. the direct transitioning of one ternary MLC state to another state. According to the rules of a ternary stored-transfer-adder cell, the contents of two memristors storing three different resistance values are read out and processed by a sense amplifier to produce a new ternary state for two output memristors. In contrast to our older work, the analogue-to-digital conversion of ternary MLC based memristors with subsequent digital processing, which requires a comparatively high energy budget, is avoided. The solution is based on an adapted version of an existing sense amplifier circuit, developed by us, that realises in-memory processing for majority logic. We present the new concept and simulation results characterising the functionality for the new memristive ternary MLC for a ReRAM technology from Innovations for High Performance Microelectronics (IHP).

4 citations


Cites background or methods from "A Parallel-friendly Majority Gate t..."

  • ...A robust decoder designed by interleaving multiple decoders can achieve this functionality, as presented in [7], [8]....

    [...]

  • ...Our solution we found is based on a procedure that was used in implementing majority gate logic [6] by modifying the processing in sense amplifiers of crossbar memristor based structures [7], [8]....

    [...]

  • ...This in-memory majority gate, published in our earlier works [7], [8] exploits the 1T–1R structure of the memristive array to compute majority....

    [...]

Journal ArticleDOI
TL;DR: In this article, the Wallace tree multiplier is used to implement the addition operation in each phase of the Wallace Tree and a high degree of gate-level parallelism is employed at the array level by executing multiple majority gates in the columns of the array.
Abstract: In-memory computing using emerging technologies such as resistive random-access memory (ReRAM) addresses the ‘von Neumann bottleneck’ and strengthens the present research impetus to overcome the memory wall. While many methods have been recently proposed to implement Boolean logic in memory, the latency of arithmetic circuits (adders and consequently multipliers) implemented as a sequence of such Boolean operations increases greatly with bit-width. Existing in-memory multipliers require $O(n^{2})$ cycles which is inefficient both in terms of latency and energy. In this work, we tackle this exorbitant latency by adopting Wallace Tree multiplier architecture and optimizing the addition operation in each phase of the Wallace Tree. Majority logic primitive was used for addition since it is better than NAND/NOR/IMPLY primitives. Furthermore, high degree of gate-level parallelism is employed at the array level by executing multiple majority gates in the columns of the array. In this manner, an in-memory multiplier of $O(n.log(n))$ latency is achieved which outperforms all reported in-memory multipliers. Furthermore, the proposed multiplier can be implemented in a regular transistor-accessed memory array without any major modifications to its peripheral circuitry and is also energy-efficient.

3 citations

References
Journal ArticleDOI
TL;DR: This letter systematically analyzed the four-variable logic method and map it into the operation of two anti-serial complementary memristors in the crossbar array architecture to develop a parallel 1-bit full adder.
Abstract: In-memory computing based on memristive logic is considered as a prospective non von Neumann computing paradigm. In this letter, we systematically analyze the four-variable logic method and map it into the operation of two anti-serial complementary memristors in the crossbar array architecture. Arbitrary Boolean logic can be implemented within three cycles with the experimental evidence of reconfigurable NAND, NOR, and XOR logic using Pt/HfO2/TiN devices. Taking advantage of the functional flexibility, a parallel 1-bit full adder that can be realized in 8 cycles within a $4 \times 3$ array has been designed and verified in simulation.

42 citations

Journal ArticleDOI
TL;DR: Multi-input memristive switch logic is proposed, which enables the function X OR (Y NOR Z) to be performed in a single step with three memristive switches, improving the overall system efficiency of a memristive switch-based computing architecture.
Abstract: Memristive switches are able to act as both storage and computing elements, which make them an excellent candidate for beyond-CMOS computing. In this paper, multi-input memristive switch logic is proposed, which enables the function X OR (Y NOR Z) to be performed in a single-step with three memristive switches. This ORNOR logic gate increases the capabilities of memristive switches, improving the overall system efficiency of a memristive switch-based computing architecture. Additionally, a computing system architecture and clocking scheme are proposed to further utilize memristive switching for computation. The system architecture is based on a design where multiple computational function blocks are interconnected and controlled by a master clock that synchronizes system data processing and transfer. The clocking steps to perform a full adder with the ORNOR gate are presented along with simulation results using a physics-based model. The full adder function block is integrated into the system architecture to realize a 64-bit full adder, which is also demonstrated through simulation.

38 citations


"A Parallel-friendly Majority Gate t..." refers background in this paper

  • ...1 (a)) or READ [22] OR/AND 1S-1R 37 steps 64 cells Each step has one or more OR/AND operation [23] ORNOR 1S-1R 31 steps 54 cells Each step has one or more ORNOR/IMPLY [24] Majority+NOT 1T-1R 19 steps 5×65 Each step is majority/NOT or WRITE (this work) XOR∗∗ 1T-1R 16 steps three 1×8 Each step is XOR/READ [2]...

    [...]

Journal ArticleDOI
TL;DR: In this article, a physics-based compact model was chosen due to its flexibility, and the proposed algorithm was used to exactly fit the model to different RRAMs, which differed greatly in their material composition and switching behavior.
Abstract: Modeling of resistive RAMs (RRAMs) is a herculean task due to its non-linearity. While the exigent need for a model has motivated research groups to formulate realistic models, the diversity in RRAMs’ characteristics has created a gap between model developers and model users. This paper bridges the gap by proposing an algorithm by which the parameters of a model are tuned to specific RRAMs. To this end, a physics-based compact model was chosen due to its flexibility, and the proposed algorithm was used to exactly fit the model to different RRAMs, which differed greatly in their material composition and switching behavior. Furthermore, the model was extended to simulate multiple low resistance states (LRS), which is a vital focus of research to increase memory density in RRAMs. The ability of the model to simulate the switching from a high resistance state to multiple LRS was verified by measurements on 1T-1R cells.

37 citations


"A Parallel-friendly Majority Gate t..." refers methods in this paper

  • ...IHP’s 1T–1R cells were modeled using the Stanford-PKU RRAM model following the methodology presented in [16]....

    [...]

  • ...$r_{DS} = \frac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right) (V_{GS} - V_t)} = 544\ \Omega$ [16]....

    [...]

Journal ArticleDOI
TL;DR: Two new majority gate-based recursive techniques are proposed that result in the calculation of the output carry of an $n$-bit adder with a majority gate delay of only $\log_2(n) + 1$, which leads to a reduction of 40 percent in delay and 30 percent in circuit complexity for multi-bit addition in comparison to the best existing designs found in the technical literature.
Abstract: The design of high-performance adders has experienced a renewed interest in the last few years; among high performance schemes, parallel prefix adders constitute an important class. They require a logarithmic number of stages and are typically realized using AND-OR logic; moreover with the emergence of new device technologies based on majority logic, new and improved adder designs are possible. However, the best existing majority gate-based prefix adder incurs a delay of $2\log_2(n) - 1$ (due to the $n$th carry); this is only marginally better than a design using only AND-OR gates (the latter design has a $2\log_2(n) + 1$ gate delay). This paper initially shows that this delay is caused by the output carry equation in majority gate-based adders that is still largely defined in terms of AND-OR gates. In this paper, two new majority gate-based recursive techniques are proposed. The first technique relies on a novel formulation of the majority gate-based equations in the used group generate and group propagate hardware; this results in a new definition for the output carry, thus reducing the delay. The second contribution of this manuscript utilizes recursive properties of majority gates (through a novel operator) to reduce the circuit complexity of prefix adder designs. Overall, the proposed techniques result in the calculation of the output carry of an $n$-bit adder with only a majority gate delay of $\log_2(n) + 1$. This leads to a reduction of 40 percent in delay and 30 percent in circuit complexity (in terms of the number of majority gates) for multi-bit addition in comparison to the best existing designs found in the technical literature.

25 citations


"A Parallel-friendly Majority Gate t..." refers background in this paper

  • ...The carry-generate and sum-generate blocks for an eight-bit adder in majority logic are derived from [11], [12] (Fig....

    [...]

  • ...8: Eight-bit PP adder (Ladner-Fischer) expressed as 7 levels of majority and NOT gates [11], [12]....

    [...]

  • ...Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with less gates....

    [...]

  • ...Since majority gate is the basic building block for many emerging nanotechnologies, prior works [11], [12] have formulated such PP adders in terms of majority gates....

    [...]

Book ChapterDOI
01 Jan 2020
TL;DR: This chapter presents mMPU memristive memory processing unit, which relies on a Memristor-Aided loGIC (MAGIC), a technique to compute logical functions using memristors within the memory array, and therefore directly tackles the von Neumann bottleneck.
Abstract: Data transfer between processing and memory units in modern computing systems is their main performance and energy-efficiency bottleneck, commonly known as the von Neumann bottleneck. Prior research attempts to alleviate the problem by moving the computing units closer to the memory that has had limited success since data transfer is still required. In this chapter, we present mMPU memristive memory processing unit, which relies on a memristive memory to perform computation using the memory cells, and therefore directly tackles the von Neumann bottleneck. In mMPU, the operation is controlled by a modified controller and peripheral circuit without changing the structure of the memory cells and arrays. As the basic logic element, we present Memristor-Aided loGIC (MAGIC), a technique to compute logical functions using memristors within the memory array. We further show how to extend basic MAGIC primitives to execute any arbitrary Boolean function and demonstrate the microarchitecture of the memory. This process is required to enable data computing using MAGIC. Finally, we show how to build the computing system using mMPU, which performs computation using MAGIC to enable a real processing-in-memory machine.

23 citations


"A Parallel-friendly Majority Gate t..." refers background in this paper

  • ...The 1S–1R is area-efficient, but suffers from current leakage (sneak–path problem) due to the inability to access a particular cell without interfering with its neighbours [9]....

    [...]

Frequently Asked Questions (1)
Q1. What are the contributions mentioned in the paper "A parallel-friendly majority gate to accelerate in-memory computation" ?

In this work, the authors propose a method to compute majority while reading from a transistor-accessed RRAM array.
