A Parallel-friendly Majority Gate to Accelerate In-memory Computation
Summary (3 min read)
Introduction
- RRAMs are two-terminal devices (usually a Metal-Insulator-Metal structure) capable of storing data as resistance.
- Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with fewer gates.
- In Section III the authors present the framework to compute in the memory array, using the proposed majority gate.
A. Majority gate: Operating principle
- Consider an array of RRAM cells arranged in a 1T-1R configuration, as depicted in Fig. 2. Each cell can be individually read or written by activating the corresponding wordline (WL) and applying appropriate voltages across the cell (BL and SL).
- Table I lists the truth table of the 3-input majority gate (M3(A,B,C)) and the effective resistance for all eight input combinations.
- The 1T–1R structure consists of an NMOS transistor manufactured in IHP’s 130 nm CMOS technology, whose drain is connected in series to the RRAM.
- IHP’s 1T–1R cells were modeled using the Stanford-PKU RRAM model following the methodology presented in [16].
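The link between the majority truth table and the effective resistance can be sketched numerically. This is a minimal illustration, not the authors' model: the resistance values `R_LRS` and `R_HRS`, the convention that logic 1 is stored as the low-resistance state, and the helper names are all assumptions, and the series resistance of the access transistor is ignored.

```python
from itertools import product

# Hypothetical resistance values, chosen only for illustration.
R_LRS = 10e3   # assumed low-resistance state (taken here as logic 1)
R_HRS = 100e3  # assumed high-resistance state (taken here as logic 0)


def m3(a, b, c):
    """3-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def r_eff(bits):
    """Effective resistance of the selected cells connected in parallel."""
    return 1.0 / sum(1.0 / (R_LRS if bit else R_HRS) for bit in bits)


def truth_table():
    """All eight input combinations with their majority value and Reff."""
    return [(bits, m3(*bits), r_eff(bits)) for bits in product((0, 1), repeat=3)]


def sensing_threshold():
    """Geometric mean of the two Reff values that bound the decision."""
    rows = truth_table()
    highest_one = max(r for _, m, r in rows if m == 1)
    lowest_zero = min(r for _, m, r in rows if m == 0)
    return (highest_one * lowest_zero) ** 0.5


if __name__ == "__main__":
    threshold = sensing_threshold()
    for bits, m, r in truth_table():
        print(bits, m, round(r), r < threshold)
```

Under these assumed values, every combination with majority 1 has a lower Reff than every combination with majority 0, so a single decision threshold separates the two groups.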
B. Sensing methodology
- As stated, the methodology to reliably translate Reff into a CMOS-compatible voltage is the crucial aspect of the proposed majority gate.
- The time-based sensing circuit is essentially a voltage-to-time converter followed by a time-domain comparator (a D flip-flop).
- ENdelay, the EN signal delayed by tdelay, acts as the edge trigger for the D-FF.
- Therefore the majority gate was evaluated by taking RRAM variations into account.
A. Functional completeness and memory controller
- This is accomplished by using a control signal INV, which is low during READ and majority operations (Q is latched) and goes high only during the NOT operation (the complement of Q is latched).
- Majority together with NOT is functionally complete, i.e., any Boolean logic can be expressed in terms of majority and NOT gates [19].
- The memory controller of a regular memory (be it DRAM-based or NVM-based) is responsible for orchestrating the READ and WRITE operations by issuing the control signals to the peripheral circuitry of the array.
- It must be noted that majority operation is executed on three contiguous bits of data in a column and the triple row decoder of section III-B will not only select the row corresponding to the address placed on the row decoder, but also the next two rows if MAJ is ‘1’.
- The NOT operation is the same as the READ operation, except that the controller issues the control signal INV, which goes high to invert the read data at the output of the SA (Fig. 5-(a)). A footnote adds that the MAJ signal acts as an additional input to the row decoder (Fig. 6).
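The functional-completeness claim can be illustrated by building AND and OR from a majority gate with one input tied to a constant, and XOR from those plus NOT. The function names below are illustrative only and do not come from the paper.

```python
def m3(a, b, c):
    """3-input majority of single bits."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """NOT gate on a single bit."""
    return a ^ 1


def and2(a, b):
    """AND is a majority gate with the third input tied to 0."""
    return m3(a, b, 0)


def or2(a, b):
    """OR is a majority gate with the third input tied to 1."""
    return m3(a, b, 1)


def xor2(a, b):
    """XOR expressed as (a OR b) AND NOT (a AND b)."""
    return and2(or2(a, b), inv(and2(a, b)))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, and2(a, b), or2(a, b), xor2(a, b))
```

Since AND, OR and NOT suffice to express any Boolean function, so do majority and NOT, provided constant 0 and 1 inputs are available.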
B. Triple-row decoder design
- A conventional decoder for a 1T–1R array can select one row at a time, while the proposed majority gate needs three rows to be selected simultaneously.
- To this end, the authors propose a robust row decoder which is designed by interleaving multiple single-row decoders.
- When φ goes high, WLi corresponding to D1D0 goes high, provided the EN signal is ‘1’.
- The address translator does not add any significant latency to the decoding process.
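The intended behaviour of the triple-row decoder can be summarised with a small behavioural model. This sketch only captures which wordlines end up active; it does not model the interleaved single-row decoders or the address translator. The function name, its parameters, and the choice to reject an address whose two following rows fall outside the array are assumptions.

```python
def active_wordlines(address, maj, n_rows, en=1):
    """Return a 0/1 list of wordline states for the given row address.

    maj = 0: only the addressed row is selected (READ, NOT, WRITE).
    maj = 1: the addressed row and the next two rows are selected.
    """
    if not 0 <= address < n_rows:
        raise ValueError("address out of range")
    span = 3 if maj else 1
    # Assumed policy: a majority access must fit entirely inside the array.
    if address + span > n_rows:
        raise ValueError("majority access needs three contiguous rows")
    if not en:
        return [0] * n_rows
    return [1 if address <= row < address + span else 0 for row in range(n_rows)]


if __name__ == "__main__":
    print(active_wordlines(1, maj=0, n_rows=8))
    print(active_wordlines(1, maj=1, n_rows=8))
```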
C. Area of time-based Sense Amplifier
- It must be emphasized that the main drawback of RRAM based in-memory adders is their latency – numerous cycles of Boolean operations (NAND, NOR, IMPLY) are needed to perform addition, when compared to CMOS.
- The time-based SA of [17] can sense the BL voltage without an op-amp, which was an important reason for adopting it for the proposed majority gate (conventional SAs use operational amplifiers, which consume a large silicon area).
- It must be noted that this area estimate does not include the area of the delay element, since it is shared by all the SAs in the array.
- The delay tdelay in Fig. 3 is implemented as a series of inverters with a MOS capacitive load between them.
D. Energy for in-memory operations
- To assess the energy required for computation, the authors first calculate the energy required for each logic operation.
- The authors calculate the energy for a single majority operation as $E_{MAJ} = V_{DD} \int_0^{t_{READ}} I_{READ}\,dt + V_{DD} \int_0^{t_{READ}} I_{SA}\,dt$ (1), where $I_{READ}$ is the current injected into the 1T–1R cell (see Fig. 3), $I_{SA}$ is the current consumed by the time-based SA and $t_{READ}$ is the READ cycle duration.
- The energy for a single majority operation was calculated to be EMAJ = 1.98 pJ.
- The energy for the NOT operation is the same as the energy to read a single bit, and it was calculated to be ENOT = 1.24 pJ.
- ENOT is smaller than EMAJ because IREAD is smaller (22 µA) for NOT and READ, where only a single bit is read.
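As a short worked step, Eq. (1) can be rewritten in terms of time-averaged currents, and the two reported energies can be compared directly. The barred symbols are notation introduced here for the averages over the READ cycle; the numbers are the ones quoted above.

```latex
E_{MAJ} = V_{DD}\,\bigl(\bar{I}_{READ} + \bar{I}_{SA}\bigr)\,t_{READ},
\qquad
\bar{I} = \frac{1}{t_{READ}} \int_0^{t_{READ}} I \, dt

E_{MAJ} - E_{NOT} = 1.98\,\mathrm{pJ} - 1.24\,\mathrm{pJ} = 0.74\,\mathrm{pJ},
\qquad
\frac{E_{MAJ}}{E_{NOT}} = \frac{1.98}{1.24} \approx 1.6
```

A majority operation therefore costs roughly 1.6 times the energy of a single-bit READ or NOT while sensing three cells at once.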
A. Parallel-prefix adder using majority logic
- Parallel-prefix (PP) adders are a family of adders originally proposed to overcome the latency incurred by the rippling of carry in CMOS-based adders.
- The regular structure of the memory array and the proposed parallel-friendly majority gate can be combined to implement PP adders in the memory array.
- The ‘carry-generate block’ can generate the carry ‘ahead’ and is known to reduce the latency to O(log n), for n-bit adders.
- Kogge-Stone, Ladner-Fischer, Brent-Kung and the like are examples of PP adders.
- For an eight-bit adder, the logical depth is six levels of majority gates and one level of NOT gates, and at most eight gates are needed simultaneously in each level.
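To make the idea of a parallel-prefix adder in majority logic concrete, the sketch below builds an eight-bit adder from 3-input majority and NOT gates only. It is an illustration under stated assumptions, not the paper's netlist: it uses a Kogge-Stone-style prefix network because that is the simplest to write down, whereas the paper maps a Ladner-Fischer structure, so gate counts per level differ. The name `prefix_add` and the representation of each group by its carry-out for carry-in 0 (`g`) and for carry-in 1 (`p`) are choices made here.

```python
def m3(a, b, c):
    """3-input majority of single bits."""
    return (a & b) | (b & c) | (a & c)


def inv(a):
    """NOT gate on a single bit."""
    return a ^ 1


def prefix_add(x, y, n=8):
    """Add two n-bit integers using only m3 and inv; returns (sum, carry_out)."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]

    # First level: carry-out of each bit assuming carry-in 0 (g) or 1 (p).
    g = [m3(a[i], b[i], 0) for i in range(n)]
    p = [m3(a[i], b[i], 1) for i in range(n)]

    # Prefix levels: merge each group with the group ending d bits lower.
    # Because g <= p for every group, m3(g, p, c) equals g OR (p AND c).
    d = 1
    while d < n:
        g_next, p_next = g[:], p[:]
        for i in range(d, n):
            g_next[i] = m3(g[i], p[i], g[i - d])
            p_next[i] = m3(g[i], p[i], p[i - d])
        g, p = g_next, p_next
        d *= 2

    carry_out = g                      # external carry-in is 0
    carry_in = [0] + carry_out[:-1]

    # Sum bits: s = M(NOT cout, cin, M(a, b, NOT cin)).
    s = [
        m3(inv(carry_out[i]), carry_in[i], m3(a[i], b[i], inv(carry_in[i])))
        for i in range(n)
    ]
    total = sum(bit << i for i, bit in enumerate(s))
    return total, carry_out[n - 1]


if __name__ == "__main__":
    print(prefix_add(100, 55))
    print(prefix_add(200, 100))
```

The prefix loop runs log2(n) times, which is where the O(log n) latency of this family of adders comes from; all gates inside one iteration are independent and could be evaluated simultaneously.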
B. Mapping of the eight-bit LF adder to 1T–1R array
- The authors map the eight-bit Ladner-Fischer adder structure of Fig. 8 to a 1T–1R array, using the proposed logic family, and elaborate the sequence of operations.
- To minimize latency, the authors map the adder in a way such that all the majority gates in a logic level (see Fig. 8) are executed simultaneously in a READ operation (see Fig. 9).
- In a 1T–1R array, the HRS → LRS transition (the SET process, in which the conductive filament is created) is accomplished by applying two pulses simultaneously to the WL and BL, while SL is grounded.
- In Table III, the authors do not compare the energy for computation, since it is either not reported [2] or reported for a different RRAM technology [22].
- In the OR/AND-based adder of [23] (Table III), each step comprises one or more OR/AND operations.
V. CONCLUSION
- A memristive logic family formulates a functionally complete Boolean logic with a memristive device (RRAM/PCM/STT-MRAM) as the primary switching device.
- The proposed method of implementing a majority and NOT gate in a 1T–1R array forms a new memristive logic family.
- The majority gate can be implemented in a 1T–1R array without necessitating any major change in the peripheral circuit (except the row decoder which needs to be modified to activate three rows simultaneously).
- Majority logic can be combined with parallel-prefix techniques to design fast adders, and the proposed gate can be used to implement them in memory arrays, with minimum latency.
Citations
22 citations
Cites background or methods from "A Parallel-friendly Majority Gate t..."
...An eight-bit parallel-prefix adder in majority logic could achieve a latency of 19 steps [46]....
[...]
...In contrast, in the R–V implementation [45,46], the row/column decoders retain their functionality as in a conventional memory, with a minor modification (the row decoder must be enhanced to select three rows during majority operation, which can be achieved by interleaving decoders [46])....
[...]
...(a) In-memory majority gate proposed in [45,46]: When three rows are activated (WL1−3) simultaneously in a 1T-1R array, the three resistances RA, RB, RC will be in parallel (Inputs of the majority gate A, B, C are represented as resistances RA, RB, RC)....
[...]
...Using majority logic, an 8-bit PP adder is implemented in memory in [46]....
[...]
...Furthermore, the R–V implementation [45,46] is conducive to parallel processing since multiple gates can be mapped to the same set of rows, as illustrated in Figure 4....
[...]
22 citations
9 citations
Cites background from "A Parallel-friendly Majority Gate t..."
...Those last two applications can also be combined to realize in-memory computing, one prominent way to overcome the von Neumann bottleneck, one of the major challenges for further improvements of modern computing systems [5]....
[...]
...in-memory computing Non-Volatile Logic [5]...
[...]
4 citations
Cites background or methods from "A Parallel-friendly Majority Gate t..."
...A robust decoder designed by interleaving multiple decoders can achieve this functionality, as presented in [7], [8]....
[...]
...Our solution we found is based on a procedure that was used in implementing majority gate logic [6] by modifying the processing in sense amplifiers of crossbar memristor based structures [7], [8]....
[...]
...This in-memory majority gate, published in our earlier works [7], [8] exploits the 1T–1R structure of the memristive array to compute majority....
[...]
3 citations
References
42 citations
38 citations
"A Parallel-friendly Majority Gate t..." refers background in this paper
...1 (a)) or READ [22]
OR/AND | 1S-1R | 37 steps | 64 cells | Each step has one or more OR/AND operation | [23]
ORNOR | 1S-1R | 31 steps | 54 cells | Each step has one or more ORNOR/IMPLY | [24]
Majority+NOT | 1T-1R | 19 steps | 5×65 | Each step is majority/NOT or WRITE | (this work)
XOR∗∗ | 1T-1R | 16 steps | three 1×8 | Each step is XOR/READ | [2]...
[...]
37 citations
"A Parallel-friendly Majority Gate t..." refers methods in this paper
...IHP’s 1T–1R cells were modeled using the Stanford-PKU RRAM model following the methodology presented in [16]....
[...]
...$r_{DS} = \dfrac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right)(V_{GS} - V_t)} = 544\ \Omega$ [16]....
[...]
25 citations
"A Parallel-friendly Majority Gate t..." refers background in this paper
...The carry-generate and sum-generate blocks for an eight-bit adder in majority logic are derived from [11], [12] (Fig....
[...]
...8: Eight-bit PP adder (Ladner-Fischer) expressed as 7 levels of majority and NOT gates [11], [12]....
[...]
...Recent research [10]–[12] has confirmed that majority logic is to be preferred not only because a particular nanotechnology can realize it, but also because of its ability to implement arithmetic-intensive circuits with less gates....
[...]
...Since majority gate is the basic building block for many emerging nanotechnologies, prior works [11], [12] have formulated such PP adders in terms of majority gates....
[...]
23 citations
"A Parallel-friendly Majority Gate t..." refers background in this paper
...The 1S–1R is area-efficient, but suffers from current leakage (sneak–path problem) due to the inability to access a particular cell without interfering with its neighbours [9]....
[...]