What are the contributions mentioned in the paper "A parallel-friendly majority gate to accelerate in-memory computation" ?


In this work, the authors propose a method to compute majority while reading from a transistor-accessed RRAM array.

(Open Access) A Parallel-friendly Majority Gate to Accelerate In-memory Computation (2020) | John Reuben

A Parallel-friendly Majority Gate to Accelerate In-memory Computation

John Reuben
Chair of Computer Science 3 - Hardware Architecture
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
91058 Erlangen, Germany
johnreuben.prabahar@fau.de

Stefan Pechmann
Chair of Communications Electronics
Universität Bayreuth
95447 Bayreuth, Germany
stefan.pechmann@uni-bayreuth.de

Abstract—Efforts to combat the ‘von Neumann bottleneck’ have been strengthened by Resistive RAMs (RRAMs), which enable computation in the memory array. Majority logic can accelerate computation when compared to NAND/NOR/IMPLY logic due to its expressive power. In this work, we propose a method to compute majority while reading from a transistor-accessed RRAM array. The proposed gate was verified by simulations using a physics-based model (for RRAM) and an industry-standard model (for the CMOS sense amplifier), and was found to tolerate reasonable variations in the RRAMs’ resistive states. Together with the NOT gate, which is also implemented in-memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. The parallel-friendly nature of the proposed gate is exploited to implement an eight-bit parallel-prefix adder in the memory array. The proposed in-memory adder could achieve a latency reduction of 70% and 50% when compared to IMPLY and NAND/NOR logic-based adders, respectively.

Index Terms—Resistive RAM (RRAM), majority logic, majority gate, memristor, 1 Transistor-1 Resistor (1T–1R), von Neumann bottleneck, in-memory computing, compute-in-memory, processing-in-memory, parallel-prefix adder

I. INTRODUCTION

The movement of data between processing and memory

units in present day computing systems is their main

performance and energy-efﬁciency bottleneck, often referred

to as the ‘von Neumann bottleneck’ or ‘memory wall’. The

emergence of non-volatile memory technologies like Resistive

RAM (RRAM) has created opportunities to overcome the

memory wall by enabling computing at the residence of data.

RRAMs are two terminal devices (usually a Metal-Insulator-

Metal structure) capable of storing data as resistance. The

change of resistance is due to the formation or rupture of a

conductive ﬁlament, depending on the direction of the current

ﬂow through the structure. The word ‘memristor’ is also used

by researchers to denote such a device, because it is essentially

a resistor with memory. Connecting such RRAM devices in

a certain manner, or by applying certain voltage patterns,

or by modifying the sensing circuitry, basic Boolean gates

(NOR, NAND, XOR, IMPLY logic) have been demonstrated

in RRAM arrays [1]–[6]. The motivation for such efforts is

to perform Boolean operations on data stored in the memory

array, without moving them out to a separate processing

circuit, thus mitigating the von Neumann bottleneck. Reviews

of such in-memory computing approaches are presented in

[7], [8]. To construct a memory array using such devices, two

conﬁgurations are common: 1Transistor–1Resistor (1T–1R)

and 1Selector–1Resistor (1S–1R). The 1T–1R conﬁguration

uses a transistor as an access device for each cell, isolating

the accessed cell from its neighbours in the array. The 1S–1R

conﬁguration uses a two-terminal device called a ‘selector’

which is fabricated in series with the memristive device.

The 1S–1R is area-efﬁcient, but suffers from current leakage

(sneak–path problem) due to the inability to access a particular

cell without interfering with its neighbours [9].

Majority logic, a type of Boolean logic, is defined to be true if more than half of the n inputs are true, where n is odd. Hence, a majority gate is a democratic gate and can be expressed in terms of Boolean AND/OR as MAJ(a, b, c) = a·b + b·c + a·c, where a, b, c are Boolean variables. Although majority logic has been known since 1960, there has been a revival in using it for computation in many emerging nanotechnologies (spin waves, magnetic Quantum-Dot cellular automata, nano magnetic logic, Single Electron Tunneling). Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with fewer gates. It must be emphasized that majority logic did not become the dominant logic for computation because, in CMOS technology, it was more efficient to implement a NAND/NOR gate than a majority gate. However, with many emerging nanotechnologies this is no longer the case; therefore, majority logic needs to be re-evaluated for its computing efficiency. In [13]–[15], majority logic is implemented in RRAM by applying two inputs of the majority gate as voltages across the terminals of the RRAM, while its initial state (which is also the third input) switches to evaluate majority. Such an approach complicates the peripheral circuitry and is also not parallel-friendly, because two of the three inputs of a majority gate need to be applied as voltages at wordline/bitline (see Fig.1(a)).
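As a quick check of the definition given at the start of this paragraph, the following minimal Python sketch compares the AND/OR expression for three inputs against the "more than half of the inputs are true" rule. The function names `maj3` and `majority` are illustrative and do not come from the paper.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """MAJ(a, b, c) = a.b + b.c + a.c, using bitwise AND/OR on 0/1 values."""
    return (a & b) | (b & c) | (a & c)


def majority(*bits: int) -> int:
    """n-input majority for odd n: 1 if more than half of the inputs are 1."""
    if len(bits) % 2 == 0:
        raise ValueError("majority is defined here for an odd number of inputs")
    return int(sum(bits) > len(bits) // 2)


# The AND/OR form and the counting form agree on all eight input patterns.
for a, b, c in product((0, 1), repeat=3):
    assert maj3(a, b, c) == majority(a, b, c)
    print(a, b, c, "->", maj3(a, b, c))
```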

In this paper, we propose a majority gate whose structure is conducive to parallel processing in the memory array.

By activating three rows of the array simultaneously, the

This is the author’s version of the accepted paper. For the published paper, see the 31st IEEE International Conference on

Application-specific Systems, Architectures and Processors (ASAP) proceedings in https://ieeexplore.ieee.org/

See Conference presentation (20 min video) at https://asap2020.cs.manchester.ac.uk/paper.php?id=72


Fig. 1: (a) In-memory majority gate of previous works [13]–[15]. (b) Proposed parallel-friendly gate. (c) When multiple gates have to be executed in parallel, the majority gates of previous works [13]–[15] have to be mapped diagonally because two gates cannot be executed in the same row/column. This manner of computation complicates both the peripheral circuitry and memory controller (inputs of the gates influence row/column decoding). In the proposed method, multiple gates can be mapped to the same set of rows, thereby simplifying the peripheral circuitry and the memory controller (inputs of the gates are resistances of memory cells, and row/column decoders retain their functionality as in a conventional memory).

resistances of the RRAM cells in a column are in parallel during the READ operation. A Sense Amplifier (SA) which can accurately sense the effective resistance implements an ‘in-memory’ majority gate. This manner of computing majority enables parallelism and is energy-efficient (both reading and writing are energy-efficient in 1T–1R when compared to 1S–1R arrays due to the absence of sneak paths). To demonstrate

the potential of this method to accelerate computation, we

consider a parallel-preﬁx adder and formulate the steps to

perform eight-bit addition in a 1T–1R array. The remainder

of the paper is organized as follows. Section II-A presents the

principle of reading majority from a 1T–1R array. Since the

read operation is the crucial aspect of the proposed majority

gate, we present the detailed sensing methodology in Section

II-B. Further, we study tolerance to variations in resistive

states by performing Monte Carlo simulations. In Section

III we present the framework to compute in the memory

array, using the proposed majority gate. Section IV-A brieﬂy

presents parallel-preﬁx technique and the structure of an eight-

bit parallel-preﬁx adder in terms of majority gates. The adder

is then mapped to a 1T–1R array using the proposed in-

memory computing technique, in Section IV-B. We compare

the proposed eight-bit adder with the state-of-the-art, followed

by conclusions in Section V.

II. MAJORITY GATE IN 1T–1R ARRAY

A. Majority gate: Operating principle

Consider an array of RRAM cells arranged in a 1T-1R configuration, as depicted in Fig. 2. Each cell can be individually read/written into by activating the corresponding wordline (WL) and applying appropriate voltage across the cell (BL and SL). To read from a cell, the corresponding WL is activated, a small current is injected into the cell and the voltage across the cell is sensed in a voltage-mode SA i.e.

Fig. 2: When three rows are activated ($WL_{1-3}$) simultaneously in a 1T-1R array, the resistances of the three RRAM devices are in parallel. An ‘in-memory’ majority gate can be implemented by accurately sensing the effective resistance $R_{\mathrm{eff}}$.

the BL voltage is sensed while the SL is grounded. Now, if

three rows are activated simultaneously during read operation

(Rows 1 to 3 in Fig. 2), the resistances in column 1 are in

parallel (neglecting the parasitic resistance of BL and SL).

During read, the access transistor will be in the linear region, and hence the transistor’s resistance will be $r_{DS} = \frac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right)(V_{GS} - V_{TH})}$ = 544 Ω [16]. The effective resistance between BL and SL will therefore be $R_{\mathrm{eff}} = (R_1 + r_{DS}) \parallel (R_2 + r_{DS}) \parallel (R_3 + r_{DS}) \approx (R_1 \parallel R_2 \parallel R_3)$ if the drain-to-source resistance of the transistor ($r_{DS}$) is small compared to LRS. Table I lists the truth table of the 3-input majority gate (M(A, B, C)) and the effective resistance for all the eight possibilities. To verify the proposed gate on a real RRAM device, we choose the 1T-1R cell from IHP. The 1T–1R

structure consists of an NMOS transistor manufactured in IHP’s 130 nm CMOS technology, whose drain is connected in series to the RRAM. The RRAM is a TiN/Hf$_{1-x}$/Ti/TiN stack integrated between Metal2 and Metal3 in the BEOL of

the CMOS process. IHP’s 1T–1R cells were modeled using

the Stanford-PKU RRAM model following the methodology

presented in [16]. The cells have a mean LRS and HRS

of 10 KΩ and 133.3 KΩ, respectively. Therefore, $R_{\mathrm{eff}}$ is ≥ 8.7 KΩ when two or more cells are in HRS (shaded grey in Table I) and ≤ 4.8 KΩ when two or more cells are in LRS. Consequently, a majority gate can be implemented during a READ operation by precisely sensing $R_{\mathrm{eff}}$. As can be deciphered from Table I, the crucial aspect of the proposed gate is to be able to differentiate between $R^{001}_{\mathrm{eff}}$ (two LRS and one HRS) and $R^{110}_{\mathrm{eff}}$ (two HRS and one LRS). Let’s denote the difference between the resistances to be differentiated as the sensing window:

Sensing window for majority = 8.7 KΩ – 4.8 KΩ = 3.9 KΩ for IHP’s cell (resistance window = 13.3).

Footnote: IHP is Innovations for High Performance Microelectronics – Leibniz-Institut für innovative Mikroelektronik, Germany.
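The two critical resistances and the sensing window quoted above follow from the parallel-resistance formula once the transistor resistance is neglected. The worked derivation below uses only the mean LRS and HRS values stated in the text; reading the "resistance window" of 13.3 as the HRS/LRS ratio is an interpretation, not something the source states explicitly.

```latex
\begin{align*}
R^{001}_{\mathrm{eff}} &= \mathrm{LRS} \parallel \mathrm{LRS} \parallel \mathrm{HRS}
  = \frac{\mathrm{HRS}\cdot\mathrm{LRS}}{\mathrm{LRS} + 2\,\mathrm{HRS}}
  = \frac{133.3 \times 10}{10 + 266.6}\ \mathrm{K\Omega} \approx 4.8\ \mathrm{K\Omega} \\
R^{110}_{\mathrm{eff}} &= \mathrm{HRS} \parallel \mathrm{HRS} \parallel \mathrm{LRS}
  = \frac{\mathrm{HRS}\cdot\mathrm{LRS}}{\mathrm{HRS} + 2\,\mathrm{LRS}}
  = \frac{133.3 \times 10}{133.3 + 20}\ \mathrm{K\Omega} \approx 8.7\ \mathrm{K\Omega} \\
R^{110}_{\mathrm{eff}} - R^{001}_{\mathrm{eff}} &\approx 8.7\ \mathrm{K\Omega} - 4.8\ \mathrm{K\Omega} = 3.9\ \mathrm{K\Omega},
  \qquad \frac{\mathrm{HRS}}{\mathrm{LRS}} = \frac{133.3}{10} \approx 13.3
\end{align*}
```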

TABLE I: Precisely sensing $R_{\mathrm{eff}}$ results in majority: Logic ‘0’ is LRS (10 KΩ) and logic ‘1’ is HRS (133.3 KΩ)

A  B  C   M(A, B, C)   R_eff (expression)        R_eff (value)
0  0  0   0            LRS/3                     3.3 KΩ
0  0  1   0            HRS·LRS/(LRS+2·HRS)       4.8 KΩ
0  1  0   0            HRS·LRS/(LRS+2·HRS)       4.8 KΩ
0  1  1   1            HRS·LRS/(HRS+2·LRS)       8.7 KΩ
1  0  0   0            HRS·LRS/(LRS+2·HRS)       4.8 KΩ
1  0  1   1            HRS·LRS/(HRS+2·LRS)       8.7 KΩ
1  1  0   1            HRS·LRS/(HRS+2·LRS)       8.7 KΩ
1  1  1   1            HRS/3                     44.4 KΩ
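The entries of Table I can be reproduced numerically. The sketch below is a minimal illustration, assuming ideal cells at the mean LRS/HRS values and an ideal comparator whose threshold (`THRESHOLD_OHM`, a hypothetical choice) sits at the geometric mean of the two critical resistances. It is not a model of the authors' sense amplifier.

```python
from itertools import product

LRS_OHM = 10_000.0    # logic '0', mean value from the text
HRS_OHM = 133_300.0   # logic '1', mean value from the text


def r_eff(bits, r_ds=0.0):
    """Parallel resistance of the three activated cells.

    r_ds optionally adds the access-transistor resistance in series with each cell.
    """
    conductance = sum(1.0 / ((HRS_OHM if b else LRS_OHM) + r_ds) for b in bits)
    return 1.0 / conductance


# Hypothetical ideal threshold between R_eff(001) and R_eff(110).
THRESHOLD_OHM = (r_eff((0, 0, 1)) * r_eff((1, 1, 0))) ** 0.5


def sensed_majority(bits):
    """Ideal comparator: a high effective resistance reads as logic '1'."""
    return int(r_eff(bits) > THRESHOLD_OHM)


for bits in product((0, 1), repeat=3):
    expected = int(sum(bits) >= 2)
    assert sensed_majority(bits) == expected
    print(bits, expected, f"{r_eff(bits) / 1e3:.1f} KOhm")
```

Passing `r_ds=544.0` shows how much the access transistor shifts each value compared with the approximation used in the table.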

B. Sensing methodology

As stated, the methodology to reliably translate $R_{\mathrm{eff}}$ into a CMOS-compatible voltage is the crucial aspect of the proposed majority gate. $R^{001}_{\mathrm{eff}}$ is 4.8 KΩ and $R^{110}_{\mathrm{eff}}$ is 8.7 KΩ, and differentiating such a resistance window (≈ 3.9 KΩ) needs a robust SA. It must be noted that this will be exacerbated by

the variability exhibited by the RRAM devices. To meet this

requirement, a time-based SA recently proposed in [17] was

chosen. Different from conventional sensing schemes (voltage-

mode and current-mode), the time-based sensing scheme con-

verts the BL voltage (to be sensed) into a time delay and dis-

criminates in time-domain. This sensing scheme was originally

proposed to read data from STT-MRAM [17], which have a

resistance window of a few KΩ. Therefore, it is ideal for the

proposed majority gate. Furthermore, this time-based sensing

achieves two to three orders of magnitude improvement in

sensing (BER) compared to conventional schemes, in addition

to being reference-less [17].

The time-based sensing circuit is essentially a voltage-to-time converter followed by a time-domain comparator (D-flip flop). Voltage-to-time conversion is achieved by the current-starved inverter (transistors $M_{1-5}$) followed by transistor M and an inverter (Fig. 3). During READ, a current $I_{READ}$ is injected into the 1T-1R cell (the corresponding three WLs are activated and SL is grounded). Depending on the effective resistance $R_{\mathrm{eff}}$, the BL reaches an appropriate voltage. In the conceptual waveforms of Fig. 3, it is assumed that BL gets charged to 300 mV if $R_{\mathrm{eff}}$ is a high resistance (8.7 KΩ) and 200 mV if $R_{\mathrm{eff}}$ is a low resistance (4.8 KΩ), for the purpose of illustration. Such a $V_{BL}$ (few hundred mV) limits the current flow through the inverter (transistors $M_{1-3}$), hence the name current-starved inverter. When EN goes high, the current-starved inverter introduces a delay that depends on $V_{BL}$, i.e. a higher $V_{BL}$ incurs less delay. A $V_{BL}$ of 300 mV incurs less delay and the low-to-high transition of EN reaches the input of the flip-flop ($I_{FF}$) faster, i.e. at $T_{HRS}$. For a lower $V_{BL}$ of 200 mV, the delay is greater and the low-to-high transition

Fig. 3: A small current $I_{READ}$ (35 µA) injected into the cell converts the resistance to a voltage (200 or 300 mV in the conceptual waveforms) which is fed to the time-based SA. A current-starved inverter transforms this voltage into a proportional delay which is sensed as a CMOS-compatible voltage by the D-FF [17]; the output is 1 for HRS and 0 for LRS.

occurs at $T_{LRS}$. $t_{delay}$ is a chain of inverters programmed to introduce a delay between $T_{HRS}$ and $T_{LRS}$. $EN_{delay}$, the EN signal delayed by $t_{delay}$, acts as the edge trigger for the D-FF. When $EN_{delay}$ goes high at $T$ (Decision Moment), it latches the signal at $I_{FF}$ and hence $D_{out}$ is high for high resistance ($R^{110}_{\mathrm{eff}}$ = 8.7 KΩ) and low for low resistance ($R^{001}_{\mathrm{eff}}$ = 4.8 KΩ). It must be noted that for $R^{111}_{\mathrm{eff}}$ = 44.4 KΩ, $V_{BL}$ will be much larger than 300 mV and will result in a transition much before $T_{HRS}$. Similarly, for $R^{000}_{\mathrm{eff}}$ = 3.3 KΩ, $V_{BL}$ will be less than 200 mV and will result in a transition much later than $T_{LRS}$. Once designed to differentiate between $R^{110}_{\mathrm{eff}}$ and $R^{001}_{\mathrm{eff}}$, the time-based SA will output M(A, B, C) correctly for all the eight cases. Furthermore, the same SA can be used to read a single bit by using a smaller $I_{READ}$ (and activating a single WL during normal read operation). Hence

the proposed gate does not necessitate any modiﬁcation to the

read-out circuit of the regular memory array.
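The time-domain decision described above can be mimicked with a small behavioural sketch. Everything beyond the 35 µA read current and the mean LRS/HRS values is an assumption made for illustration: the bit-line voltage is taken as plain Ohm's law, the inverter delay is modelled as `K_DELAY / V_BL` (a hypothetical monotonic law, not the circuit's real characteristic), and the decision moment is placed midway between the two critical delays. With these idealizations the bit-line voltages come out lower than the 282 mV and 410 mV reported from circuit simulation, so the numbers are not comparable to the paper's.

```python
from itertools import product

I_READ_A = 35e-6                          # read current quoted in the text
LRS_OHM, HRS_OHM = 10_000.0, 133_300.0    # mean resistive states from the text
K_DELAY = 1.0e-9                          # hypothetical constant in volt-seconds


def r_eff(bits):
    """Parallel resistance of the three activated cells (r_DS neglected)."""
    return 1.0 / sum(1.0 / (HRS_OHM if b else LRS_OHM) for b in bits)


def v_bl(bits):
    """Idealized bit-line voltage: Ohm's law only, no parasitics."""
    return I_READ_A * r_eff(bits)


def inverter_delay(v):
    """Hypothetical delay law: a higher V_BL gives a shorter delay."""
    return K_DELAY / v


T_HRS = inverter_delay(v_bl((1, 1, 0)))   # earlier transition
T_LRS = inverter_delay(v_bl((0, 0, 1)))   # later transition
T_DECISION = 0.5 * (T_HRS + T_LRS)        # assumed position of the EN_delay edge


def d_out(bits):
    """D-FF latches 1 only if the EN edge reached I_FF before the decision moment."""
    return int(inverter_delay(v_bl(bits)) < T_DECISION)


for bits in product((0, 1), repeat=3):
    print(bits, f"V_BL = {v_bl(bits) * 1e3:.0f} mV", "D_out =", d_out(bits))
```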

The time-based sensing circuit of Fig. 3 was designed in

IHP’s 130 nm CMOS process, and simulated to verify the

functioning of the majority gate. $I_{READ}$ of 35 µA was injected into the 1T-1R cell to sense the BL voltage. For $R^{001}_{\mathrm{eff}}$ and $R^{110}_{\mathrm{eff}}$, $V_{BL}$ was 282 mV and 410 mV, respectively. Since the current-starved transistors $M_{1-3}$ are the crucial factor in deciding the delay, they were made large (1.5 µm / 0.39 µm) to make the circuit less sensitive to CMOS process variations. $t_{delay}$ was set to 3 ns using a chain of inverters with MOS

capacitive loads between them. RRAM cells exhibit variability

in their programmed resistive states cycle-to-cycle and device-

to-device [18]. Therefore the majority gate was evaluated by

taking RRAM variations into account. Since majority is com-

puted while reading (memory cell is not switched), the RRAM

was replaced with a resistor and variability was incorporated as

a Gaussian distribution in that resistor. The impact of process

variations was analysed using the statistical model ﬁles for

the CMOS transistors provided by the foundry. 2000 Monte Carlo simulations were performed where the resistance of the RRAM was Gaussian distributed with a standard deviation σ = 10% of the mean RRAM resistance, i.e. $\sigma_{LRS}$ = 1 KΩ and $\sigma_{HRS}$ = 13.33 KΩ. With the combined effects of RRAM variability and process variability (in the transistors of the SA), the Bit Error Rate (BER) was found to be 5.4%. Sample waveforms are plotted in Fig. 4. Further failure analysis of the majority gate (incorrect sensing of $R^{001}_{\mathrm{eff}}$ and $R^{110}_{\mathrm{eff}}$) revealed that it occurred only when RRAM variability was more than 2σ from the mean LRS/HRS (it must be noted that 95% of resistances fall within 2σ from the mean, in a Gaussian distribution).

Fig. 4: Sample output of the time-based SA. At 13.5 ns, $EN_{delay}$ goes high, deciding the output. Only 100 MC simulations are plotted (shaded light) with a single typical case highlighted dark.
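The RRAM side of the variability study can be approximated with a short Monte Carlo sketch. It assumes an ideal comparator with a fixed, hypothetical threshold (`THRESHOLD_OHM`) and models no CMOS process variation at all, so it is not expected to reproduce the 5.4% BER reported above. It only illustrates how a 10% Gaussian spread in the cell resistances moves the effective resistance of the two critical input patterns.

```python
import random

LRS_MEAN, HRS_MEAN = 10_000.0, 133_300.0   # mean resistive states from the text
SIGMA_FRACTION = 0.10                      # sigma = 10% of the mean, as in the text
THRESHOLD_OHM = 6_473.0                    # hypothetical, about sqrt(4.8k * 8.7k)


def sample_cell(bit, rng):
    """Draw one cell resistance from a Gaussian around its programmed state."""
    mean = HRS_MEAN if bit else LRS_MEAN
    # Clamp so that a far-tail sample can never become zero or negative.
    return max(rng.gauss(mean, SIGMA_FRACTION * mean), 1.0)


def sensed(bits, rng):
    """Ideal comparator applied to the sampled effective resistance."""
    r_eff = 1.0 / sum(1.0 / sample_cell(b, rng) for b in bits)
    return int(r_eff > THRESHOLD_OHM)


def error_rate(bits, trials, rng):
    """Fraction of trials in which the sensed value differs from the majority."""
    expected = int(sum(bits) >= 2)
    errors = sum(sensed(bits, rng) != expected for _ in range(trials))
    return errors / trials


rng = random.Random(0)   # fixed seed so that runs are repeatable
for bits in [(0, 0, 1), (1, 1, 0)]:
    print(bits, error_rate(bits, 2000, rng))
```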

III. FRAMEWORK TO COMPUTE IN 1T–1R ARRAY

A. Functional completeness and memory controller

As shown in Fig. 5-(a), the NOT operation can be implemented in a 1T–1R array by simply latching $\overline{Q}$ from the output of the time-based SA during READ (the D-flip flop of Fig. 3 outputs Q and $\overline{Q}$). This is accomplished by using a control signal INV which is low during READ and majority operations (Q is latched) and goes high only during the NOT operation ($\overline{Q}$ is latched). Majority together with NOT is functionally complete

i.e any Boolean logic can be expressed in terms of majority

and NOT gates [19]. In [19], the authors present Majority-

Inverter Graph (MIG), a new logic manipulation structure

consisting of three-input majority nodes and regular/inverted

edges. Fig.5-(b) is the MIG of a 1-bit full adder obtained by

MIGhty (MIG synthesis tool) and, any Boolean logic can be

synthesised in terms of majority and NOT gates in a similar

manner. Since both majority and NOT gates are implemented

Fig. 5: (a) NOT operation implemented with a 2:1 Mux at the output of the time-based SA; all logic operations are essentially READ operations. (b) 1-bit full adder (S = A⊕B⊕C, carry-out = AB + BC + AC) expressed as a Majority-Inverter-Graph using the MIGhty synthesis tool [19], where M represents the 3-input majority operation. (c) With majority/NOT gates computed as READ, multiple levels of logic can be executed by writing the data back to the memory, simplifying computing to READ and WRITE operations.

as READ, multiple levels of gates can be cascaded by writing

the read data back to the array. In essence, ‘computing’ is

simpliﬁed to a sequence of READ and WRITE operations,

orchestrated by the memory controller, as depicted in Fig.5-

(c).
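To illustrate functional completeness, the sketch below builds a 1-bit full adder from three majority operations and two inversions and checks it against XOR and carry arithmetic. This is a well-known majority/NOT construction; it is not claimed to be identical to the graph MIGhty produces in Fig. 5-(b). The helper names `maj`, `inv` and `full_adder` are illustrative.

```python
from itertools import product


def maj(a, b, c):
    """3-input majority on 0/1 values."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """NOT on a 0/1 value."""
    return 1 - a


def full_adder(a, b, c_in):
    """Full adder using only majority and NOT."""
    c_out = maj(a, b, c_in)
    s = maj(inv(c_out), maj(a, b, inv(c_in)), c_in)
    return s, c_out


# Exhaustive check against the usual sum and carry definitions.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert s == a ^ b ^ c
    assert c_out == (a + b + c) // 2
    print(a, b, c, "-> S =", s, "C_out =", c_out)
```

In the framework described above, each `maj` or `inv` call would correspond to one READ, with intermediate results written back to the array before the next level is evaluated.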

The memory controller of a regular memory (be it DRAM-

based or NVM-based) is responsible for orchestrating the

READ and WRITE operation by issuing the control signals to

the peripheral circuitry of the array. In addition, the memory

controller must be augmented with additional capability to

execute majority and NOT operation. Since both majority and

NOT operations are READ operations in this logic family, the

controller does not require any major alterations. To execute a

majority operation, an additional control signal called MAJ

is needed, which is set to logic ‘1’ during majority operation

and, the address of the ﬁrst row (out of three rows in which

majority is to be performed) is placed on the row decoder.

It must be noted that majority operation is executed on three

contiguous bits of data in a column and the triple row decoder

of section III-B will not only select the row corresponding

to the address placed on the row decoder, but also the next

two rows if MAJ is ‘1’. The column address is placed on

the column decoder to select the particular column in which

majority is executed and the SA is activated to get the output.

The NOT operation is the same as the READ operation, with the only exception being that the controller issues the control signal INV, which goes high to invert the read data at the output of the SA (Fig. 5-(a)). The control signals activated during logic operations are summarized in Table II.

Footnote: the MAJ signal acts as an additional input to the row decoder (Fig. 6).

TABLE II: Control signals for memory and logic operations

Operation   WL                     BL             SL          EN(SA)  INV  MAJ
READ        single row activated   to read ckt.   grounded    1       0    0
NOT         single row activated   to read ckt.   grounded    1       1    0
Majority    three rows activated   to read ckt.   grounded    1       0    1
WRITE ‘0’   single row activated   V_SET          grounded    0       0    0
WRITE ‘1’   single row activated   grounded       V_RESET     0       0    0
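The control scheme of Table II can be exercised with a small behavioural model in which every logic operation is a READ with the INV and MAJ flags, and results are cascaded through WRITE. The class `Array1T1R` and its method names are hypothetical; voltages, timing and the sense amplifier are not modelled, and each cell simply stores 0 (LRS) or 1 (HRS).

```python
class Array1T1R:
    """Behavioural sketch of a 1T-1R array: each cell stores 0 (LRS) or 1 (HRS)."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write(self, row, col, bit):
        """WRITE: a single row is activated and the cell is programmed."""
        self.cells[row][col] = bit

    def read(self, row, col, inv=0, maj=0):
        """READ with the INV and MAJ control signals of Table II."""
        if maj:
            # Rows row, row+1 and row+2 are activated together.
            a, b, c = (self.cells[row + k][col] for k in range(3))
            q = (a & b) | (b & c) | (a & c)
        else:
            q = self.cells[row][col]
        # INV selects the inverted flip-flop output.
        return 1 - q if inv else q


# Two logic levels: majority of rows 0-2, written back to row 3, then NOT.
array = Array1T1R(rows=4, cols=1)
for row, bit in enumerate((1, 0, 1)):
    array.write(row, 0, bit)

m = array.read(0, 0, maj=1)     # Majority: EN=1, INV=0, MAJ=1
array.write(3, 0, m)            # cascade by writing the result back
n = array.read(3, 0, inv=1)     # NOT: EN=1, INV=1, MAJ=0
print("MAJ =", m, "NOT(MAJ) =", n)
```

The model also accepts `inv=1` together with `maj=1`, a combination that Table II does not list; it is left in only because the flags are independent in this sketch.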

B. Triple-row decoder design

Fig. 6: Triple-row decoding is achieved by interleaving multiple single-row decoders (2:4 dynamic decoders). When control signal MAJ is logic ‘0’ (READ/WRITE/NOT), $WL_i$ corresponding to the row address is selected. When MAJ is logic ‘1’ (majority), $WL_i$, $WL_{i+1}$, $WL_{i+2}$ are selected.

A conventional decoder for a 1T–1R array can select one

row at a time, while the proposed majority gate needs three

rows to be selected simultaneously. Moreover, the row-decoder

must be versatile to switch between single-row activation and

triple-row activation seamlessly. This is because, as stated

in the previous section, one must be able to read/write a

single bit of the array (READ/WRITE/NOT) as well as read

three bits in a column (majority). To this end, we propose a

robust row decoder which is designed by interleaving multiple

single-row decoders. As depicted in Fig. 6, a 4:16 triple-row decoder can be designed by interleaving four 2:4 dynamic NAND decoders. Since single-row decoding must co-exist with triple-row decoding, an address translator circuit is used to switch between the two modes using MAJ as a control signal. For example, to select a single row $WL_5$, the address is A = ‘0101’ and MAJ = ‘0’. For these inputs, the address translator outputs EN = ‘0010’ and D = ‘XXXX01XX’ (the green decoder in Fig. 6 is enabled and its second row is selected, thereby activating $WL_5$). But, for the same row address A = ‘0101’ and MAJ = ‘1’, the address translator outputs EN = ‘1110’ and D = ‘010101XX’ (the blue, red and green decoders are enabled and the second row of each of them is selected, thereby activating $WL_5$, $WL_6$ and $WL_7$). The address translator takes MAJ and A as inputs and generates D and EN to achieve this desired functionality for all the 16 cases. With the address translator logic (88 transistors), the triple-row decoder requires 200 transistors, while a regular 4:16 dynamic decoder (only single row activation) requires 136 transistors, a 47% increase in the row-decoder area. The address translator does not add any significant latency to the decoding process. The decoder was designed in the 130 nm IHP process; its functionality was verified and the decoding latency was found to be 496 ps.

Footnote: a dynamic decoder uses a precharge signal φ; when φ is low, all WL are driven to ‘0’. When φ goes high, the WL corresponding to D goes high, provided EN is ‘1’.
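The behaviour of the address translator in the worked example can be sketched in a few lines. The interleaving rule used here (row r is driven by decoder r mod 4, output line r div 4), zero-based row numbering, and the printing order of EN and D (decoder 3 first) are assumptions chosen because they reproduce both examples in the text. The paper's actual gate-level design is not given in this section, and the function name `address_translator` is illustrative.

```python
def address_translator(addr, maj):
    """Return (EN, D) strings for a 4:16 triple-row decoder made of four 2:4 decoders.

    Assumed mapping: row r belongs to decoder r % 4 and to its output line r // 4.
    EN is printed as EN3..EN0; D is printed as D3 D2 D1 D0, two bits per decoder,
    with 'XX' marking a decoder whose input is a don't-care.
    """
    if not 0 <= addr <= 15:
        raise ValueError("a 4-bit row address is expected")
    rows = [addr, addr + 1, addr + 2] if maj else [addr]
    if rows[-1] > 15:
        raise ValueError("triple-row access needs two rows after the addressed one")

    en = ["0"] * 4
    d = ["XX"] * 4
    for r in rows:
        decoder, line = r % 4, r // 4
        en[decoder] = "1"
        d[decoder] = format(line, "02b")

    # Index 0 is decoder 0; reverse so that decoder 3 is printed first.
    return "".join(reversed(en)), "".join(reversed(d))


print(address_translator(0b0101, maj=0))   # single-row access to row 5
print(address_translator(0b0101, maj=1))   # rows 5, 6 and 7 together
```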

C. Area of time-based Sense Ampliﬁer

Fig. 7: Layout of time-based SA.

In this work, the primary motivation for pioneering a

parallel-friendly gate was to exploit it to accelerate addition, by

executing gates in parallel. It must be emphasized that the main

drawback of RRAM based in-memory adders is their latency

– numerous cycles of Boolean operations (NAND, NOR,

IMPLY) are needed to perform addition, when compared to

CMOS. To evaluate the number of gates that can be executed

in parallel, we evaluated the area of the time-based SA. The

time-based SA of [17] could sense the BL voltage without an op-amp, and this was an important reason for adopting it for our majority gate (conventional SAs use operational amplifiers, which consume a large silicon area). The layout of the time-based SA of Fig. 3 is drawn in Fig. 7 and occupies an area of 20 × 3 = 60 µm². It must be noted that this area estimate does not include the area of the delay element since it is shared by all the SAs in the array ($t_{delay}$ in Fig. 3 is implemented as a series of inverters with MOS capacitive loads between them). From [20], the layout of a single 1T–1R cell occupies 450 nm × 450 nm = 0.2 µm² in 130 nm (12.4 F²). If the SA is stacked along its height of 3 µm, eight columns can share an SA. This

means that the number of majority gates that can be executed

in parallel in an array is the number of columns divided by a

factor of 8 i.e. 32 gates can be executed simultaneously in a

256×256 array, 8 gates in a 64×64 array etc.
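The parallelism estimate above is a simple division, sketched below together with the two area figures quoted in the text. The function name `parallel_majority_gates` is illustrative, and the sharing factor of eight columns per SA is taken directly from the text rather than derived.

```python
SA_AREA_UM2 = 20 * 3           # time-based SA layout area from the text
CELL_AREA_UM2 = 0.45 * 0.45    # one 1T-1R cell, 450 nm x 450 nm
COLUMNS_PER_SA = 8             # sharing factor stated in the text


def parallel_majority_gates(columns, columns_per_sa=COLUMNS_PER_SA):
    """Number of majority gates that can be sensed at once in one array."""
    return columns // columns_per_sa


for size in (64, 256):
    print(f"{size}x{size} array:", parallel_majority_gates(size), "gates in parallel")
print(f"SA area = {SA_AREA_UM2} um^2, cell area = {CELL_AREA_UM2:.2f} um^2")
```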
