Received December 4, 2019, accepted December 22, 2019, date of publication January 3, 2020, date of current version January 14, 2020. Digital Object Identifier 10.1109/ACCESS.2019.2963727

# A Novel Low Power and Reduced Transistor Count Magnetic Arithmetic Logic Unit Using Hybrid STT-MTJ/CMOS Circuit

PRASHANTH BARLA, VINOD KUMAR JOSHI, AND SOMASHEKARA BHAT

Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, India

Corresponding author: Vinod Kumar Joshi (vinodkumar.joshi@manipal.edu)

**ABSTRACT** One of the major concerns for CMOS technology is the increase in power dissipation as the technology node scales down into the deep submicron region. The magnetic tunnel junction (MTJ) based on the spin transfer torque (STT) switching mechanism is recognized as one of the most promising spintronic devices for the post-CMOS era because of its non-volatility, high speed, high endurance, CMOS compatibility and, above all, low power dissipation, which can address the problems posed by existing CMOS technology. We propose a novel logic-in-memory (LIM) architecture for a magnetic arithmetic logic unit (P-MALU) based on hybrid STT-MTJ/CMOS circuits. Simulation results show that, for the arithmetic unit, the P-MALU reduces total power dissipation by 28.44% and transistor count by 29.16% compared with the double pass transistor logic based clocked CMOS ALU (DPTL-C<sup>2</sup>MOS-ALU), and by 58.87% and 45.16%, respectively, compared with the modified magnetic arithmetic logic unit (M-MALU). For the logic unit, average power dissipation is reduced by 37.61% and 52.55%, and transistor count by 47.22% and 42.42%, compared with the DPTL-C<sup>2</sup>MOS-ALU and M-MALU designs, respectively.
Monte-Carlo (MC) simulation is then performed, incorporating process and mismatch variations for CMOS and variations in the extracted MTJ parameters, to study the power dissipation of the DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU designs. All the simulation results show that the P-MALU outperforms the other two ALU designs in terms of power dissipation, delay and device count. Further, the P-MALU circuit is extended to 4-bit arithmetic operations, and electrical simulations verify the functionality of the design for multi-bit operation, demonstrating the feasibility of the proposed design in VLSI circuits.

**INDEX TERMS** Logic-in-memory, magnetic RAM, magnetic tunnel junction, non-volatile, tunnel magnetoresistance, spintronics, spin transfer torque.

The associate editor coordinating the review of this manuscript and approving it for publication was Liang-Bi Chen.

### I. INTRODUCTION

In this modern era of information technology, high-performance portable electronic devices are ubiquitous. The performance of these devices depends strongly on their speed of operation and on their power consumption. The design and development of low-power circuits that still operate optimally is therefore a focus of current research, and all these devices require highly efficient circuit elements and well-optimized circuit designs. Because power dissipation increases in circuits below the 90 nm CMOS node, much research is focused on developing devices that are compatible with existing CMOS technology [1]. These devices must also keep pace with the scaling trends predicted by Moore's law [2]. The surge in power dissipation is mainly due to the increase in static power dissipation that accompanies the reduction in transistor size.
The magnetic tunnel junction (MTJ), a spintronic device based on the spin transfer torque (STT) switching mechanism, has recently attracted much attention because of its low power dissipation, non-volatility, ease of integration with CMOS technology, fast read capability, high density and practically unlimited endurance [1]. Advances in fabrication technology, such as the 3D [3], [4] back-end process, have enabled MTJ devices to be grown on top of the silicon (CMOS) layer without compromising the functionality of the circuit [5], which reduces the overall area occupied by the circuit. The resistance of an MTJ can easily be tuned to the kilo-ohm range by adjusting its barrier thickness, making it fully compatible with present CMOS technology [6], [7]. Using MTJs also makes the overall circuit non-volatile. MTJs were initially dedicated to non-volatile memories such as magnetic random access memory (MRAM) [5], [8], but recent investigations have revealed their use in other circuit designs. For example, various hybrid circuits have been developed, such as non-volatile magnetic adders [9]–[12], a non-volatile magnetic decoder [13], non-volatile logic gates [14], magnetic flip-flops [15]–[19], magnetic look-up tables [20]–[23] for reprogrammable logic circuits and, recently, a polymorphic gate module [24]. Various circuits and applications have also been developed using other spintronic devices, such as spin valves [7], [25], [26], ferroelectric tunnel junctions (FTJs) [27]–[29], domain wall (DW) based magnetic nanowires [30]–[38], all spin logic (ASL) devices [39]–[41], spin transistors [7], [25], [26] and 3-D magnetic ratchets [25], [26], [42]. All these spintronic devices exploit the spin degree of freedom of the electron in addition to its charge. Guo et al. [43] proposed a novel architecture for a magnetic arithmetic logic unit (MALU) using a hybrid STT-MTJ/CMOS structure.
A 130 nm CMOS process and the STT-MTJ electrical model [44] were used for their hybrid circuit simulation. As our initial step towards a magnetic ALU based on the logic-in-memory (LIM) structure, we developed a 1-bit MALU that can be obtained by modifying the original 1-bit MALU design reported by Guo et al. [43]; we refer to this structure as the modified magnetic arithmetic logic unit (M-MALU) throughout the manuscript. The hybrid circuits reported in the literature have been found to consume almost zero standby power compared with their CMOS counterparts (thereby reducing overall power dissipation) and show the potential to keep in line with existing scaling trends [1], [10]. Encouraged by the results obtained in [9], we propose a novel magnetic arithmetic logic unit (P-MALU) based on the LIM architecture using hybrid STT-MTJ/CMOS circuits. The MALU [43] and the M-MALU store only the operation codes (opcodes) that select a specific operation (arithmetic or logic); in these designs, the MTJs act as multiplexer select lines whose job is to select a specific input. Unlike the MALU and M-MALU, the MTJs in the P-MALU not only store an input but also actively take part in performing the logic operation. Simulations are carried out and the results are compared among the double pass transistor logic based clocked CMOS ALU (DPTL-C<sup>2</sup>MOS-ALU) [45], [46], the M-MALU and the novel P-MALU. Monte-Carlo (MC) simulation is then performed to study the power dissipation of all three ALU designs, incorporating process and mismatch variations for CMOS and variations in the extracted MTJ parameters. Furthermore, the 1-bit arithmetic unit of the P-MALU is extended to perform 4-bit addition.
The Cadence tool (IC6.1.6-64b.500.4) with the 45 nm CMOS generic process design kit (GPDK) and a compact, physics-based perpendicular magnetic anisotropy MTJ electrical model [47] built on the STT switching mechanism are used for all the hybrid MTJ/CMOS circuit simulations. In all the hybrid circuits, the state of the MTJ is written with the 4T writing circuit adapted from [10]. During simulation, all transistors are set to the default size, i.e., $L=45~\mathrm{nm}$ and $W=120~\mathrm{nm}$. The paper is organized as follows: Section II presents the basics of the MTJ, its construction, a brief summary of the various switching mechanisms, and the idea of the LIM structure and its benefits over the traditional Von-Neumann architecture. Section III describes the design and working of the M-MALU, the P-MALU and the 4-bit arithmetic unit of the P-MALU. Section IV covers the selection of the STT-MTJ model, a comparative study and a detailed discussion of the results obtained for the various ALUs reported in this manuscript. Finally, the conclusion of the work is presented in Section V. The structure of the 1-bit DPTL-C<sup>2</sup>MOS-ALU adder circuit is given in the Appendix for convenience.

### II. BACKGROUND

This section briefly describes MTJ construction, its switching mechanisms and the concept of hybrid circuits developed using the logic-in-memory (LIM) architecture. Giant magnetoresistance (GMR) was first observed in 1988 [48]; since then, the rise of spintronic devices has prompted researchers to investigate them more deeply. Various spintronic devices have been explored to alleviate the problems imposed by scaling trends. Spintronic devices use the intrinsic spin of the electron in addition to its charge to extend the capabilities of electronic devices. Based on their characteristics and potential applications, the MTJ is found to be the most suitable among spintronic devices [49].

#### A. MAGNETIC TUNNEL JUNCTION CONSTRUCTION

The MTJ is a multilayer nano-stack structure comprising a nonmagnetic (NM) layer (or tunnel barrier) sandwiched between two ferromagnetic (FM) layers (Figure 1). Here we choose the p-MTJ (perpendicular magnetic tunnel junction) over the i-MTJ (in-plane magnetic tunnel junction), because the i-MTJ poses several problems, such as short retention time of stored data, erroneous state switching during reads and fabrication difficulty below 22 nm. These problems are overcome by the p-MTJ, making it the preferable choice [50]–[52]. p-MTJs are reported to have lower power dissipation, higher thermal stability and lower current density, and to be more easily scalable than i-MTJs [53], [54]. In an MTJ, the magnetic orientation of one FM layer is fixed; this layer is called the fixed (or pinned, reference) layer. The magnetic orientation of the other FM layer, called the free layer, may point in the same direction as that of the reference layer or in the opposite direction, and it can be altered by applying an external magnetic field or a suitable current in a particular direction.

FIGURE 1. (a) Low resistance state of the p-MTJ, where the magnetic orientations of the free layer and fixed layer are in the same direction. (b) High resistance state of the p-MTJ, where the magnetic orientations of the free layer and fixed layer are in opposite directions.

When the magnetic orientations of the free layer and fixed layer are in the same direction, the device offers a low resistance to current flow, and that state is denoted by $R_P$ (parallel state). If the magnetic orientation of the free layer is opposite to that of the fixed layer, the device offers a higher resistance to the flow of read current and is therefore in the high resistance state $R_{AP}$ (antiparallel state).
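As a rough numerical illustration of the two resistance states, the parallel resistance can be estimated from the resistance-area product and junction area listed in Table 4 as $R_P \approx RA/A$, and the antiparallel resistance then follows from the TMR definition of Eq. (1) as $R_{AP} = R_P(1 + TMR)$. The Python sketch below is only a first-order estimate under those assumptions: it ignores the bias dependence of TMR and everything else captured by the physics-based model [47], and the function names are hypothetical.

```python
import math
from typing import Optional

# Nominal values from Table 4, assumed to apply directly to this estimate.
RA_OHM_UM2 = 5.0      # resistance-area product, ohm * um^2
DIAMETER_UM = 0.032   # 32 nm junction: area = 32 nm x 32 nm x pi/4
TMR0 = 2.0            # zero-bias TMR of 200 %


def junction_area_um2(diameter_um: float = DIAMETER_UM) -> float:
    """Area of a circular junction with the given diameter, in um^2."""
    return math.pi / 4 * diameter_um ** 2


def r_parallel(ra: float = RA_OHM_UM2, area_um2: Optional[float] = None) -> float:
    """First-order estimate R_P = RA / A, in ohms."""
    if area_um2 is None:
        area_um2 = junction_area_um2()
    return ra / area_um2


def r_antiparallel(r_p: float, tmr: float = TMR0) -> float:
    """Eq. (1) rearranged: R_AP = R_P * (1 + TMR)."""
    return r_p * (1 + tmr)


def tmr_ratio(r_ap: float, r_p: float) -> float:
    """Eq. (1): TMR = (R_AP - R_P) / R_P."""
    return (r_ap - r_p) / r_p


if __name__ == "__main__":
    r_p = r_parallel()
    r_ap = r_antiparallel(r_p)
    print(f"R_P  ~ {r_p / 1e3:.2f} kOhm")
    print(f"R_AP ~ {r_ap / 1e3:.2f} kOhm (zero-bias TMR)")
    print(f"TMR  = {tmr_ratio(r_ap, r_p):.0%}")
```

The estimate of about 6.2 kΩ for $R_P$ is in line with the mean value reported in Section IV-A, whereas the zero-bias $R_{AP}$ of about 18.7 kΩ is higher than the reported 16.8 kΩ, which is to be expected because this sketch uses TMR(0) and ignores the bias dependence included in [47].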
The relative resistance variation between the two states is described by a quantity known as the tunnel magnetoresistance (TMR), calculated using Eq. (1):

$$TMR = \frac{R_{AP} - R_P}{R_P} \tag{1}$$

The TMR effect is a quantum mechanical effect in which electrons tunnel across the nonmagnetic barrier. TMR depends on various factors, such as the fabrication technique, the choice and thickness of materials, and the quality of the interfaces [25], [55]. Achieving a high TMR value is crucial for reliably reading the value stored in the MTJ, because a high TMR produces a large voltage swing that a CMOS sensing circuit can read with high accuracy. Both $Al_2O_3$ and MgO have been used as the NM layer in MTJs, and this choice directly affects the TMR ratio [52]. At room temperature, the largest TMR ratios reported in the literature for $Al_2O_3$ and MgO barriers with CoFeB FM layers are 70.4% [56] and 604% [57], respectively. These results have raised expectations for the commercialization of hybrid MTJ/CMOS circuits.

#### B. MTJ SWITCHING MECHANISM

**FIGURE 2.** STT switching mechanism: $I_{write}$ flows from the free layer to the reference layer to change the MTJ resistance from $R_{AP}$ to $R_P$, whereas $I_{write}$ flows from the reference layer to the free layer to change the MTJ resistance from $R_P$ to $R_{AP}$.

Changing the MTJ resistance from $R_{AP}$ to $R_P$, or vice versa, is achieved by switching the magnetic orientation of the free layer; this process is also known as writing (storing) data in the MTJ. Various mechanisms have been adopted for MTJ writing, such as Field Induced Magnetic Switching (FIMS) [58], Thermally Assisted Switching (TAS) [59], [60], Spin Hall Effect (SHE) switching [61], [62], Voltage Induced Switching (VIS) [63], [64] and Spin Transfer Torque (STT) switching [65], [66]. FIMS requires a high switching current and hence dissipates more power. The scalability of its switching circuit is also a major problem below 90 nm, as a result of which FIMS requires
a large area on silicon [67]. FIMS also suffers from a selectivity problem [68]. Although TAS addresses the selectivity problem of FIMS by using an additional heating circuit, it also ends up requiring a large area on silicon. This limits the scalability of MTJ writing circuits based on the FIMS and TAS mechanisms. SHE, VIS and STT based switching mechanisms are being actively researched, sufficient experimentally verified literature is available, and these mechanisms address all the issues mentioned above effectively. However, few circuits have been fabricated using the SHE and VIS methods, whereas companies such as Everspin have successfully commercialized MRAMs using STT [69]. From a commercial point of view, we have therefore used the STT switching mechanism in our circuit design. The STT effect was theoretically predicted by Slonczewski in 1996 [70] and by Berger [71], and it was later observed experimentally in a deep submicron, low resistance CoFeB/Al<sub>2</sub>O<sub>3</sub>/CoFeB MTJ structure in 2004 [72]. In this method, a spin-polarized current is employed to switch the magnetic orientation of the free layer. An MTJ cell can be switched from the antiparallel (AP) state to the parallel (P) state, or vice versa, by a suitable write current $(I_{write})$ flowing from the free layer to the fixed layer, or vice versa, respectively (Figure 2). The perpendicular write current that separates the switching and non-switching regions is called the critical or threshold current ($I_{C0}$). Electrons flowing from the fixed layer to the free layer are spin polarized, i.e., their spin is aligned with the magnetization direction of the fixed layer. When these electrons reach the free layer, they transfer their spin angular momentum to its magnetization by exerting a large spin transfer torque. The magnetic orientation of the free layer therefore aligns with that of the fixed layer, i.e., switching from $R_{AP}$ to $R_P$ takes place.
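The switching rule described above can be summarized in a small behavioral sketch. It assumes a single, symmetric critical current $I_{C0}$ for both transitions and instantaneous, deterministic switching; real STT switching is stochastic, depends on pulse width and generally has different thresholds for the two directions, none of which is modeled here. The sign convention (positive current flows from the free layer to the reference layer, as in Figure 2), the names and the current values are assumptions for illustration.

```python
from enum import Enum


class MTJState(Enum):
    P = "parallel"        # free layer aligned with fixed layer: low resistance R_P
    AP = "antiparallel"   # free layer opposite to fixed layer: high resistance R_AP


def stt_write(state: MTJState, i_write: float, i_c0: float) -> MTJState:
    """Return the free-layer state after a write pulse (idealized).

    Assumed sign convention: i_write > 0 is conventional current from the
    free layer to the reference layer, i.e. electrons flow from the fixed
    layer into the free layer and align it (AP -> P). i_write < 0 drives
    P -> AP. Currents with |i_write| <= i_c0 leave the state unchanged.
    """
    if i_write > i_c0:
        return MTJState.P
    if i_write < -i_c0:
        return MTJState.AP
    return state


if __name__ == "__main__":
    # Hypothetical currents, in amperes.
    s = MTJState.AP
    s = stt_write(s, i_write=50e-6, i_c0=30e-6)    # above threshold: AP -> P
    print(s)
    s = stt_write(s, i_write=-10e-6, i_c0=30e-6)   # below threshold: unchanged
    print(s)
```

Positive currents in this sketch correspond to the case just described, in which electrons flow from the fixed layer into the free layer.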
When electrons flow in the reverse direction, the magnetic orientation of the free layer is switched opposite to that of the fixed layer by the reflected electrons, so switching from $R_P$ to $R_{AP}$ takes place. $I_{C0}$ is an important factor characterizing the performance of an STT-MTJ device: a lower $I_{C0}$ corresponds to higher speed and lower power consumption during the write operation. The double-barrier MTJ (DMTJ) is a variant of the MTJ that reduces $I_{C0}$ [73]; hence, it is faster and consumes less power than a single-barrier MTJ [74].

**FIGURE 3.** (a) Von-Neumann architecture showing logic and memory units. Logic and memory are kept separate, with a total area of X + Y, and interconnects facilitate the communication between them. (b) LIM structure with stack integration, with a total area of either X or Y.

FIGURE 4. Block diagram of the LIM structure [1], consisting of a sense amplifier, a MOS logic structure and MTJs. A writing circuit is used to change the state of the MTJs.

#### C. HYBRID MTJ/CMOS LOGIC-IN-MEMORY

In the traditional Von-Neumann architecture, the logic and memory blocks are located separately, and communication between them is facilitated by wires and interconnects, as shown in Figure 3(a). This strategy hampers the overall performance of the chip, because the delay introduced between the two blocks grows significantly as the wire length increases. According to the ITRS [75], global interconnects, because of their large length, have a much greater influence on delay than local interconnects, and global wires also require large driver circuits that dissipate a significant amount of power. In addition, as the channel length of the MOSFET decreases, its standby power dissipation increases, which increases the overall power dissipation of the circuit. Furthermore, it has been reported that standby power increases with the level of integration [76].
LIM is an emerging trend [1], [77] that can offer a solution to the increased leakage currents due to scaling and to the large interconnect delay observed in the traditional architecture. In the LIM structure, non-volatile devices are distributed over the logic circuit plane so that logic and memory are placed close to each other (Figure 3(b)). This tight integration reduces the overall area occupied and shortens the interconnect distance, which significantly reduces the delay and power consumption compared with the Von-Neumann architecture [1]. Figure 4 shows the general block diagram of the LIM structure [1]. It consists of three parts:

1) Sense amplifier/read circuitry: a current comparator arrangement. Once the pull-down network (PDN) completes its operation on the inputs, the sense amplifier provides the output in its true and complementary forms.
2) Pull-down network: a combination of the MOS logic structure and MTJs. Here, logical operations are performed by both the MOS logic and the MTJs.
3) Writing block: used to write the input data into the MTJs.

In both the M-MALU and P-MALU designs, a current comparator pre-charge sense amplifier [78] is used, whereas the 4T writing circuit for writing the state of the MTJs is adapted from [10].

FIGURE 5. (a) Modified structure of the sum sub-circuit for the M-MALU, which produces SUM and $\overline{\text{SUM}}$ in arithmetic addition and the XOR, AND and OR functions along with their complements in the logical mode of operation. (b) Modified structure of the carry sub-circuit for the M-MALU, which produces CARRY and $\overline{\text{CARRY}}$ in arithmetic addition.

### III. M-MALU AND P-MALU DESIGN

This section describes the design and working of the M-MALU and P-MALU circuits developed using the concept of the hybrid MTJ/CMOS LIM architecture. Simulated waveforms are also presented for the respective designs to confirm their functionality. Further, the extension of the arithmetic unit of the 1-bit P-MALU to a 4-bit P-MALU is presented along with its simulated waveforms.

#### A. MODIFIED MAGNETIC ARITHMETIC LOGIC UNIT (M-MALU)

The arithmetic logic unit (ALU), the heart of any central processing unit (CPU), performs both arithmetic (addition and subtraction) and logical operations (such as XOR, AND and OR). Figure 5(a) and (b) show the sum and carry sub-circuits of the M-MALU, respectively. A, B and $C_{in}$ are inputs to both the sum and carry sub-circuits. OUT1, $\overline{\text{OUT1}}$ and OUT2, $\overline{\text{OUT2}}$ are the outputs of the sum and carry sub-circuits, respectively.

TABLE 1. M-MALU opcodes for arithmetic addition and different logical operations adapted from [43].

| Mc | M0 | M1 | M2 | OUT1 | $\overline{\text{OUT1}}$ | OUT2 | $\overline{\text{OUT2}}$ | Operation |
|----|----|----|----|------|-------------------------|-------|-------------------------|------------|
| 0 | 0 | 0 | 0 | SUM | $\overline{\text{SUM}}$ | CARRY | $\overline{\text{CARRY}}$ | Arithmetic |
| X | 0 | 0 | 0 | XOR | XNOR | X | X | Logical |
| X | 0 | 1 | 1 | AND | NAND | X | X | Logical |
| X | 1 | 0 | 0 | OR | NOR | X | X | Logical |

\* 0 and 1 represent the P and AP states of the MTJ, respectively. X represents a don't-care condition.

TABLE 2. P-MALU opcode combinations for arithmetic addition and different logical operations.

| $C_{in}$ | C1 | C2 | Out1 | Out2 | Operation |
|----------|----|----|------|-------|------------|
| X | X | 0 | SUM | CARRY | Arithmetic |
| 0 | 0 | 1 | XOR | AND | Logical |
| 0 | 1 | 1 | XOR | NAND | Logical |
| 1 | 0 | 1 | XNOR | OR | Logical |
| 1 | 1 | 1 | XNOR | NOR | Logical |

\* $C_{in}$, C1 and C2 are the opcodes; Out1 and Out2 are the outputs. X represents a don't-care condition.

FIGURE 6. Basic structure of the P-MALU.

Only the sum sub-circuit is used to perform logical operations, whereas both the sum and carry sub-circuits are used during arithmetic addition. The selection of an arithmetic or logic operation is decided by different combinations of opcodes, which are stored in complementary pairs of MTJs.
The complementary pairs of MTJs in the sum sub-circuit are M0/$\overline{\text{M0}}$, M1/$\overline{\text{M1}}$ and M2/$\overline{\text{M2}}$, whereas the carry sub-circuit uses the complementary pair Mc/$\overline{\text{Mc}}$. The opcode combinations for the various arithmetic and logic operations are shown in Table 1.

#### B. DESIGN OF NOVEL P-MALU

Figure 6 shows the basic block diagram of the 1-bit P-MALU. Based on the opcodes, it performs addition (arithmetic) and XOR, XNOR, AND, NAND, OR and NOR (logical) operations. Table 2 shows the operations performed by the P-MALU for the different opcode combinations and the corresponding outputs Out1 and Out2. Here, the carry input $C_{in}$ is used not only as an input to the P-MALU but also as one of the opcodes. The opcodes act as select lines for multiplexers that select a specific output, as shown in Table 2. During arithmetic addition, Out1 gives the SUM result, whereas Out2 gives the CARRY result.

FIGURE 7. Full adder circuit of the P-MALU for arithmetic operation: (a) sum sub-circuit and (b) carry sub-circuit of the full adder. Inputs A and B are applied to the MOS transistors, whereas $C_{in}$ is stored in the MTJs.

##### 1) ARITHMETIC UNIT OF P-MALU

Figure 7 shows the circuit of the arithmetic unit, which performs the addition operation when the opcode "$C_{in}$C1C2" is set to "XX0". The full adder circuit is divided into a sum sub-circuit (Figure 7(a)) and a carry sub-circuit (Figure 7(b)).
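The opcode selection of Table 2, including the "XX0" setting used here for addition, can be summarized as a truth-level Python sketch built on the full-adder equations given for the logic unit (Eqs. (2) and (4)). It is a behavioral model only: it assumes, from the pattern in Table 2, that C2 selects between the arithmetic and logical modes and that C1 selects between the true and complementary carry-side outputs. The function names are hypothetical, and no transistor-level behavior is represented.

```python
from itertools import product


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """SUM (Eq. (2)) and CARRY (Eq. (4)) of a 1-bit full adder."""
    s = a ^ b ^ cin
    carry = (a & b) | (a & cin) | (b & cin)
    return s, carry


def p_malu(a: int, b: int, cin: int, c1: int, c2: int) -> tuple[int, int]:
    """Behavioral sketch of Table 2; returns (Out1, Out2)."""
    s, carry = full_adder(a, b, cin)
    if c2 == 0:
        # Arithmetic mode: C_in acts as the carry input; Out1 = SUM, Out2 = CARRY.
        return s, carry
    # Logical mode: C_in = 0 yields XOR/AND, C_in = 1 yields XNOR/OR.
    # Assumed: C1 = 1 selects the complementary carry-side output (NAND or NOR).
    out2 = carry if c1 == 0 else 1 - carry
    return s, out2


if __name__ == "__main__":
    print("ABC_in = 011 ->", p_malu(0, 1, 1, 0, 0))  # (SUM, CARRY)
    for a, b in product((0, 1), repeat=2):
        xor_out, and_out = p_malu(a, b, 0, 0, 1)
        xnor_out, nor_out = p_malu(a, b, 1, 1, 1)
        print(a, b, xor_out, and_out, xnor_out, nor_out)
```

In hardware, the arithmetic branch of this sketch corresponds to the full adder circuit of Figure 7.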
The full adder circuit has three inputs, A, B and $C_{in}$, and two outputs, SUM and CARRY, together with their complements. $C_{in}$ is written and stored in the non-volatile MTJs (MTJ0 and MTJ1), which makes the P-MALU non-volatile. The circuit consists of three major sections: the sense amplifier, the pull-down network (PDN) and the MTJ pair (MTJ0 and MTJ1). The sense amplifier is a clocked current comparator [78] that senses the difference between the currents flowing in the two arms (left and right) of the circuit and produces the true (SUM, CARRY) and complementary ($\overline{\text{SUM}}$, $\overline{\text{CARRY}}$) outputs. In the PDN, the NMOS transistors act as switches that either provide or block a path for current flow. The two MTJs of a pair are always set in complementary states (AP/P or P/AP), providing a low or high resistance path for current flow, and they play an important role in deciding the output of the circuit. The NMOS resistances (ON and OFF) are comparable to the MTJ resistances: the ON resistance of the NMOS ($R_{ON}$) is lower than the parallel-state resistance of the MTJ ($R_{P\_MTJ}$), i.e., $R_{ON} < R_{P\_MTJ}$, and the OFF resistance of the NMOS ($R_{OFF}$) is higher than the antiparallel-state resistance of the MTJ ($R_{AP\_MTJ}$), i.e., $R_{OFF} > R_{AP\_MTJ}$. Table 3 shows the truth table of the full adder and the corresponding comparison of the left and right arm resistances for the sum and carry sub-circuits.

TABLE 3. Truth table for the full adder along with the corresponding resistances of the sum and carry sub-circuits.

| A | B | $C_{in}$ | SUM | Left arm resistance ($R_{SL}$) | Right arm resistance ($R_{SR}$) | Resistance comparison | CARRY | Left arm resistance ($R_{CL}$) | Right arm resistance ($R_{CR}$) | Resistance comparison |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{SL} > R_{SR}$ | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{CL} > R_{CR}$ |
| 0 | 0 | 1 | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{SL} < R_{SR}$ | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{CL} > R_{CR}$ |
| 0 | 1 | 0 | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{SL} < R_{SR}$ | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{CL} > R_{CR}$ |
| 0 | 1 | 1 | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{SL} > R_{SR}$ | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{CL} < R_{CR}$ |
| 1 | 0 | 0 | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{SL} < R_{SR}$ | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{CL} > R_{CR}$ |
| 1 | 0 | 1 | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{SL} > R_{SR}$ | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{CL} < R_{CR}$ |
| 1 | 1 | 0 | 0 | $2R_{ON} + R_{AP}$ | $2R_{ON} + R_P$ | $R_{SL} > R_{SR}$ | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{CL} < R_{CR}$ |
| 1 | 1 | 1 | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{SL} < R_{SR}$ | 1 | $2R_{ON} + R_P$ | $2R_{ON} + R_{AP}$ | $R_{CL} < R_{CR}$ |

The adder circuit operates in two phases, a pre-charge phase and an evaluation phase. During the pre-charge phase (when the clock is low), the inputs are applied to the MOSFETs as well as to the MTJ pairs, and both the output and its complement are at logic 1. During the evaluation phase (when the clock is high), the inputs are evaluated, i.e.,
the sense amplifier detects the difference between the currents flowing in the left and right arms of the sum or carry sub-circuit (which in turn depends on the difference between the left and right arm resistances) and produces the true and complementary outputs. Consider, for example, the input condition "$ABC_{in}$" = "011" in the evaluation phase (Figure 7(a)). In the sum sub-circuit, transistors M2 and M4 (shown in green) in the right arm are ON, whereas in the left arm transistors M5 and M7 (shown in red) are ON. There are therefore two paths (P1 and P2) from the SUM and $\overline{\text{SUM}}$ nodes to ground. Path P1 has two ON transistors with a combined resistance of $2R_{ON}$; similarly, path P2 also has two ON transistors with a combined resistance of $2R_{ON}$. The output is therefore decided by the resistances of the MTJ pair. When MTJ0 and MTJ1 are configured as P and AP, they represent $C_{in}$ = 1 (Figure 7(a)).

FIGURE 8. SUM and CARRY output waveforms of the arithmetic unit of the P-MALU. The particular input pattern "011" and the corresponding outputs are highlighted.

In this case, the total resistance of the left arm (path P1) is $2R_{ON} + R_{AP}$, and that of the right arm (path P2) is $2R_{ON} + R_P$. The current in path P2 (green dotted line) therefore finds a lower resistance path to ground than that in path P1 (red dotted line). The resulting difference between the currents in the two arms is detected by the sense amplifier, which sets the SUM output to logic 0 and $\overline{\text{SUM}}$ to logic 1. Similarly, in the evaluation phase of the carry sub-circuit for the "011" input combination, transistors M5 and M7 (shown in red) and M1 and M3 (shown in green) are ON (Figure 7(b)), and MTJ0 and MTJ1 are configured as P and AP (representing $C_{in}$ = 1, Figure 7(b)). The total resistance of the right arm, i.e., path 2 (red dotted line), is $2R_{ON} + R_{AP}$, whereas that of the left arm, i.e., path 1 (green dotted line), is $2R_{ON} + R_P$.
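To see how the resistance comparison of Table 3 turns into an output, the sense amplifier can be idealized as a comparator of the two arm currents $V/R$. The sketch below makes that simplification and uses hypothetical values for the supply voltage and $R_{ON}$ (chosen so that $R_{ON} < R_P$), while $R_P$ and $R_{AP}$ are the mean values reported in Section IV-A. A real pre-charge sense amplifier responds to discharge dynamics rather than to steady ohmic currents, so only the sign and relative size of the current difference are meaningful here; the function names are also hypothetical.

```python
R_P = 6.2e3     # mean parallel resistance from Section IV-A (ohm)
R_AP = 16.8e3   # mean antiparallel resistance from Section IV-A (ohm)
R_ON = 1.0e3    # hypothetical NMOS on-resistance, chosen so that R_ON < R_P
V_DD = 1.0      # hypothetical supply voltage (V)


def arm_resistances(out_bit: int, r_on: float = R_ON,
                    r_p: float = R_P, r_ap: float = R_AP) -> tuple[float, float]:
    """(left, right) arm resistances following the pattern of Table 3."""
    if out_bit:
        return 2 * r_on + r_p, 2 * r_on + r_ap
    return 2 * r_on + r_ap, 2 * r_on + r_p


def sense(r_left: float, r_right: float, vdd: float = V_DD) -> tuple[int, int, float]:
    """Idealized current comparator returning (out, out_bar, |I_left - I_right|).

    As in Table 3, out = 1 when the left arm has the lower resistance.
    """
    i_left, i_right = vdd / r_left, vdd / r_right
    out = 1 if i_left > i_right else 0
    return out, 1 - out, abs(i_left - i_right)


if __name__ == "__main__":
    # Input "011": arm resistances taken from the walkthrough above.
    s, s_bar, di_s = sense(2 * R_ON + R_AP, 2 * R_ON + R_P)   # sum sub-circuit
    c, c_bar, di_c = sense(2 * R_ON + R_P, 2 * R_ON + R_AP)   # carry sub-circuit
    print(f"SUM={s} SUM_bar={s_bar} CARRY={c} CARRY_bar={c_bar}")
    print(f"arm current difference ~ {di_s * 1e6:.1f} uA")
    # A larger R_ON dilutes the MTJ contribution and shrinks the difference.
    for r_on in (0.5e3, 1.0e3, 5.0e3):
        print(r_on, sense(*arm_resistances(1, r_on=r_on))[2])
```

In the carry sub-circuit for this input, the sketch likewise finds the lower-resistance $2R_{ON} + R_P$ path in the left arm.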
Hence, the sense amplifier produces logic 1 at CARRY and logic 0 at $\overline{\text{CARRY}}$. Figure 8 confirms the functionality of the arithmetic module: for instance, when the input pattern "$ABC_{in}$" is "011", the SUM and CARRY bits are 0 and 1, respectively.

##### 2) LOGIC UNIT OF P-MALU

FIGURE 9. Hybrid STT-MTJ/CMOS circuit of the logic unit: (a) XOR/XNOR operation performed by the SUM sub-circuit; (b) AND/NAND and OR/NOR operations performed by the CARRY sub-circuit.

Figure 9 shows the circuit of the logic unit. Depending on the opcode combination (Table 2), it performs the logical operations XOR, XNOR, AND, NAND, OR and NOR. The full adder circuit performs the logical operations on two input operands, A and B, while $C_{in}$ serves as one of the opcodes (Table 2) that selects a specific logic operation. From the SUM equation of the full adder (Eq. (2)), when $C_{in}$ is "0", $A \oplus B$ is obtained at the SUM output; conversely, when $C_{in}$ is "1", $\overline{A \oplus B}$ is obtained. Similarly, from the CARRY equation (Eq. (4)), when $C_{in}$ is "0", $A \cdot B$ is obtained at the CARRY output, while for $C_{in}$ = "1", $A + B$ is obtained. Hence, by specifying the value of $C_{in}$, two-input XOR, XNOR, AND and OR operations can be performed. Since the carry sub-circuit (Figure 9(b)) produces true and complementary outputs simultaneously, NAND is produced at $\overline{\text{CARRY}}$ whenever AND is produced at CARRY; similarly, when OR is produced at CARRY, its complement, NOR, is obtained at $\overline{\text{CARRY}}$. Hence, all the listed logic operations can be performed by the logic unit of the P-MALU.

$$SUM = A \oplus B \oplus C_{in} = ABC_{in} + A\overline{B}\,\overline{C_{in}} + \overline{A}B\overline{C_{in}} + \overline{A}\,\overline{B}C_{in} \tag{2}$$

$$\overline{SUM} = AB\overline{C_{in}} + \overline{A}\,\overline{B}\,\overline{C_{in}} + \overline{A}BC_{in} + A\overline{B}C_{in} \tag{3}$$

$$CARRY = AB + AC_{in} + BC_{in}
\tag{4}$$

$$\overline{CARRY} = \overline{A}\,\overline{B} + \overline{A}\,\overline{C_{in}} + \overline{B}\,\overline{C_{in}} \tag{5}$$

FIGURE 10. Simulated waveforms of the logic unit of the P-MALU obtained by setting the opcode. The logic unit performs XOR, AND and NAND operations.

FIGURE 11. Simulated waveforms of the logic unit of the P-MALU obtained by setting the opcode. The logic unit performs XNOR, OR and NOR operations.

Figure 10 shows the results of the logic unit when $C_{in}$ is set to 0 in the logical mode (Table 2): Out1 produces XOR, whereas Out2 produces AND/NAND. Figure 11 shows the results when $C_{in}$ is set to 1: Out1 produces XNOR, whereas Out2 produces OR/NOR.

#### C. 4-BIT EXTENSION OF P-MALU

FIGURE 12. Block diagram of the 4-bit hybrid STT-MTJ/CMOS adder.

The 4-bit arithmetic unit of the P-MALU, shown in Figure 12, is obtained by cascading four arithmetic units of the 1-bit P-MALU (Figure 7) in ripple carry form, and it performs 4-bit addition. The carry output of each stage is passed to the next stage through a register (Reg0, Reg1, Reg2). The circuit (Figure 12) operates in two phases, a pre-charge phase and an evaluation phase. During the pre-charge phase of clock period (1), when CLK is 0, the outputs of all the sense amplifiers become high. During the evaluation phase of clock period (1), when CLK is 1, Block0 produces both the SUM0 and CARRY0 outputs (Figure 13). At the same time, the generated CARRY0 is stored in Reg0, so that this bit is available as the carry-in ($C_{in1}$) of Block1 in the pre-charge phase of clock period (2). The SUM0 bit generated in period (1) forms the LSB of the final result. This process continues for four clock cycles to produce all the SUM bits of the final result, because the SUM3 bit is available only at the end of the 4th clock period. Hence, SUM0 to SUM3 are obtained at the end of every 4th clock cycle. As observed in Figure 13, when A and B are 2 and 7, respectively, SUM[3:0] is 9, obtained in the 4th clock cycle.

FIGURE 13. Simulated waveform of the 4-bit adder showing the output (SUM = 9) for the specific case of inputs A = 2 and B = 7.

TABLE 4. MTJ parameters used for the simulation [47].

| Parameter | Description | Value |
|-------------------|--------------------------------|-------------------------------------------|
| Area | MTJ dimensions | $32~\mathrm{nm}\times 32~\mathrm{nm}\times\pi/4$ |
| TMR(0) | TMR ratio at zero bias | 200% |
| $t_{sl}$ | Thickness of the free layer | 1.3 nm |
| RA | Resistance-area product | $5~\Omega\cdot\mu\mathrm{m}^2$ |
| $t_{ox}$ | Thickness of the oxide barrier | 0.85 nm |
| $\sigma_{TMR}$ | Standard deviation of TMR | 3% of TMR |
| $\sigma_{t_{sl}}$ | Standard deviation of $t_{sl}$ | 3% of $t_{sl}$ |
| $\sigma_{t_{ox}}$ | Standard deviation of $t_{ox}$ | 3% of $t_{ox}$ |

### IV. RESULTS AND DISCUSSION

In this section, the simulation of the selected MTJ model [47] and of the various ALU designs is presented. The simulation results of the arithmetic and logic units are then compared separately in terms of power, device count, delay and process and mismatch (PM) variations. All the designs operate at a 100 MHz clock frequency. The non-volatile nature of the P-MALU is also justified in detail. To design the arithmetic unit of the DPTL-C<sup>2</sup>MOS-ALU, we used the conventional CMOS 1-bit full adder from the standard cell library of the STMicroelectronics design kit [45], [46], which is shown in the Appendix. This full adder serves as the arithmetic unit that performs 1-bit addition. The simulated results of each arithmetic unit are listed in Table 5.

#### A. MTJ MODEL SIMULATION

We selected the STT-MTJ model version "PM\_Beta\_5", recently developed in Verilog-A [47]. A key feature of this model is its ease of integration with CMOS logic in the Cadence tool. Another important feature of the model is its flexibility, i.e.,
it can easily be upgraded by incorporating experimental or fabrication results reported in the future. A similar earlier version of the model, "PM\_Beta\_4.5", has also been verified and reported in the literature [79]. The parameters of the MTJ model [47] used in our simulations are listed in Table 4; all other parameters retain the default values given in [47].

TABLE 5. Comparison of 1-bit P-MALU with 1-bit DPTL-C<sup>2</sup>MOS-ALU and 1-bit M-MALU for arithmetic addition operation @100 MHz.

| Arithmetic units | DPTL-C<sup>2</sup>MOS-ALU | M-MALU | P-MALU |
|-------------------|-----------------------------|--------------------|-------------------|
| Dynamic power (nW) | 255.2 | 381.8 | 177.9 |
| Static power (nW) | $7.5^{a}$ | $0^{b}\ (74.5^{c})$ | $0^{b}\ (9.8^{c})$ |
| Total power (nW) | $262.3^{d}$ | $456.4^{d}$ | $187.7^{d}$ |
| Device count | 48 MOS | 62 MOS + 8 MTJ | 34 MOS + 4 MTJ |
| SUM delay (ps) | 56.6 | 100.9 | 48.34 |
| CARRY delay (ps) | 56.7 | 90.6 | 52.3 |

Note: The write circuit is not included in the values of Table 5.

$^{a}$ DPTL-C<sup>2</sup>MOS-ALU is a volatile circuit, so its power supply cannot be switched off at any time.

$^{b}$ M-MALU and P-MALU are non-volatile circuits, so the power supply is switched off in standby mode.

$^{a,c}$ Static power dissipation under steady-state input conditions.

$^{d}$ Total power dissipation in the active mode is the sum of the dynamic and static power.

The $R_P$ and $R_{AP}$ resistances of the MTJ depend on parameters such as material properties, device dimensions and TMR. At the design stage, this effect is taken into account by applying variations to the extracted parameters: TMR, the oxide barrier thickness ($t_{ox}$) and the free-layer thickness ($t_{sl}$). In our design, these parameters vary with a standard deviation of 3% (Table 4) and follow a Gaussian distribution [80], [81]. Such parameter variations change the MTJ resistance values and, in turn, affect the overall performance of the hybrid circuits. Figure 14 shows the resistance distributions obtained from an MC simulation of the MTJ model with 200 runs, i.e., the variations in $R_P$ and $R_{AP}$ caused by 3% variations in TMR, $t_{ox}$ and $t_{sl}$. $R_P$ and $R_{AP}$ have mean values of 6.2 k$\Omega$ and 16.8 k$\Omega$ and standard deviations of 417.6 $\Omega$ and 1.1 k$\Omega$, respectively. In the M-MALU and P-MALU designs, MTJ resistance variations affect the power dissipation, which is one of the key performance indicators of an LIM circuit; these variations are discussed in detail in Section IV-E.

FIGURE 14. Resistance distribution of (a) $R_P$ and (b) $R_{AP}$ for the MTJ model [47] used in our design. An MC simulation of 200 runs is performed for each resistance state.

#### B. POWER DISSIPATION

Table 5 compares the power dissipation, device count and delay of the arithmetic units of DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU. The dynamic power consumption of the P-MALU adder is 30.28% lower than that of the DPTL-C<sup>2</sup>MOS-ALU adder and 53.4% lower than that of the M-MALU adder, while its total power consumption is 28.44% and 58.87% lower, respectively. The dynamic and total power dissipation are plotted as a bar chart in Figure 15.

FIGURE 15. Comparison of dynamic and total power dissipation among the DPTL-C<sup>2</sup>MOS-ALU, P-MALU and M-MALU.

Regarding standby power: in the P-MALU circuit, inputs A and B are applied to the NMOS transistors of the logic tree, and input $C_{in}$ is stored in an MTJ pair. The use of MTJs makes the P-MALU circuit non-volatile. In standby mode, the power supply of the P-MALU circuit can therefore be cut off completely, so the circuit dissipates almost zero power in this mode. The MTJ pair retains the stored bit (AP-P represents bit "0" and P-AP represents bit "1"), which was written while the circuit was in active mode. When the power supply is restored, the data stored in the MTJ pair is immediately available for the logical operation, without any need to write or restore it again.

Figure 16 demonstrates the non-volatile nature of the P-MALU for one particular case: when the power supply is completely cut off, the data stored in the MTJ pair is retained and can be retrieved. At time T1, bit "1" is written into the MTJ pair, which represents the input $C_{in}$ of the full adder (FA), while inputs A and B are applied to the MOS transistors. In active mode, the P-MALU generates the SUM and CARRY outputs according to the input combination $ABC_{in}$ (Figure 7). In the evaluation phase between T2 and T3, the input combination is $ABC_{in}$ = "101", and the SUM and CARRY outputs are "0" and "1", respectively. At T3, the power supply of the P-MALU is turned off completely and the circuit enters standby mode; from T3 to T4 the P-MALU circuit consumes no power. When the power is switched back on at T4, bit "1" is still retained in the MTJ pair owing to the non-volatility of the MTJs. Because the MTJ pair not only stores the logic value but also participates in the logic operation, the stored bit "1" ($C_{in}$) is available for the addition in the very next evaluation phase, between T5 and T6.
The SUM and CARRY outputs are therefore "0" and "1", respectively, the same as before the circuit entered standby mode. No new value was written into the MTJ pair; the previously written value was simply retained. Since the P-MALU requires no restore or write operation for the MTJ data, the MTJs add no extra delay to the outputs; the delay of the SUM and CARRY outputs depends on the sensing quality of the sense amplifier. At time T7, a new bit value "0" is written into the MTJ pair. Hence, the power supply of the P-MALU circuit can be cut off completely without losing the bit stored in the MTJ pair, which eliminates the static power dissipation of the P-MALU in standby mode (Table 5). The non-volatility of the M-MALU can be explained in the same way: it also consumes zero static power in standby mode, and its MTJs retain the stored values when the power is switched off (Table 5). In contrast, the power supply of volatile CMOS-based circuits cannot be cut off completely without losing their state, so they continue to dissipate static power.

Table 6 compares the average power dissipation and device count of DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU for all the logical operations, and Figure 17 shows the power dissipated by each ALU for the various logical operations as a bar chart. On average, P-MALU dissipates 37.61% and 52.55% less power than DPTL-C<sup>2</sup>MOS-ALU and M-MALU, respectively.

FIGURE 16. Simulated waveform of the FA showing its non-volatile nature. When the supply is powered OFF at T3 and powered ON again at T4, the bit stored in the MTJs is retained; no bit-restoration procedure is needed.

TABLE 6. Comparison of power dissipation and device count among DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU design for various logical operations @ 100 MHz.
| Logic units | Operation performed | Power dissipation (nW) | Device count |
|-----------------------------|---------------------|-----------------------|----------------|
| DPTL-C<sup>2</sup>MOS-ALU | XOR/XNOR | 212.2 | 36 MOS |
| DPTL-C<sup>2</sup>MOS-ALU | OR/NOR | 181.8 | 36 MOS |
| DPTL-C<sup>2</sup>MOS-ALU | AND/NAND | 212.2 | 36 MOS |
| M-MALU | XOR/XNOR | 265.5 | 35 MOS + 6 MTJ |
| M-MALU | OR/NOR | 265.8 | 32 MOS + 2 MTJ |
| M-MALU | AND/NAND | 265.8 | 32 MOS + 2 MTJ |
| P-MALU | XOR/XNOR | 133.6 | 19 MOS + 2 MTJ |
| P-MALU | OR/NOR | 123 | 19 MOS + 2 MTJ |
| P-MALU | AND/NAND | 121.6 | 19 MOS + 2 MTJ |

Note: The write circuit is not included in the values of Table 6.

FIGURE 17. Comparison of average power dissipation of various logical units.

#### C. DEVICE COUNT

Device count is the total number of MOSFETs and MTJs used in each design, including the transistors of the sense amplifier and those used to invert the primary inputs and the clock signal. Table 5 lists the device counts of the arithmetic units. The P-MALU adder has the lowest device count of the three designs: it requires 29.16% fewer MOSFETs than DPTL-C<sup>2</sup>MOS-ALU, and 45.16% fewer MOSFETs and 50% fewer MTJs than M-MALU. For the logic units, we compare the numbers of MOSFETs and MTJs averaged over the three logical operations in Table 6. P-MALU uses 47.22% fewer MOSFETs than DPTL-C<sup>2</sup>MOS-ALU, and 42.42% fewer MOSFETs and 40% fewer MTJs than M-MALU. Hence P-MALU uses noticeably fewer devices than the other two designs for both arithmetic and logical operations. Furthermore, because MTJs can be integrated in 3D above the CMOS layer, they not only reduce the silicon area further but also shorten the distance between the memory and logic units.
Hence, the P-MALU design is expected to require a smaller overall die area than a purely CMOS implementation.

#### D. DELAY

Table 5 lists the worst-case delays of the SUM and CARRY outputs. P-MALU has the smallest SUM and CARRY delays of the three designs: 48.34 ps and 52.3 ps, compared with 56.6 ps and 56.7 ps for DPTL-C<sup>2</sup>MOS-ALU and 100.9 ps and 90.6 ps for M-MALU. The delay of P-MALU is the lowest because $C_{in}$ is stored in the MTJ pairs and only inputs A and B are applied through MOSFETs (Figure 7). During the evaluation phase of P-MALU, one of the pre-charged outputs (SUM/$\overline{SUM}$ in the SUM sub-circuit and CARRY/$\overline{CARRY}$ in the CARRY sub-circuit) discharges to ground through three MOSFETs and one MTJ, so four devices lie on the discharge path in each sub-circuit. There is no switching of the MOSFETs during this phase. In DPTL-C<sup>2</sup>MOS-ALU and M-MALU, by contrast, $C_{in}$ is applied through MOSFETs, which contributes to a larger delay. During the evaluation phase of DPTL-C<sup>2</sup>MOS-ALU, the SUM and CARRY outputs are each obtained through the switching action of 14 MOSFETs, which increases the delay, whereas in M-MALU five devices (four MOSFETs and one MTJ) are required to produce the output of each of the SUM and CARRY sub-circuits. Hence P-MALU has a smaller delay than both DPTL-C<sup>2</sup>MOS-ALU and M-MALU.

TABLE 7. Delay dependency upon width of the PD transistor.

| Width of the pull-down (PD) transistor | Worst-case SUM delay | Worst-case CARRY delay |
|-------------------------------------|------------------|---------|
| W = 120 nm | 48.3 ps | 52.3 ps |
| W = 480 nm | 37.8 ps | 45 ps |
| W = 960 nm | 36.2 ps | 43.6 ps |
| W = 1.2 µm | 36.6 ps | 43.3 ps |

FIGURE 18. Dependency of delay on the width of the PD transistor. As the size of the PD transistor increases, the SUM and CARRY delays reduce.
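As a cross-check of the percentages quoted in Sections IV-B and IV-C, the relative reductions can be recomputed directly from the entries of Tables 5 and 6. The Python sketch below is an illustration under stated assumptions, not part of the design flow: it assumes each figure is 100 × (reference - proposed)/reference, truncated rather than rounded to two decimals, and that the logical-unit figures use plain averages over the three operations. Both conventions are inferred from the reported numbers, not stated in the text. The dictionary keys (`"DPTL"` for DPTL-C<sup>2</sup>MOS-ALU, etc.) are hypothetical names chosen for this sketch, and exact `Fraction` arithmetic keeps floating-point error out of the truncation step.

```python
from fractions import Fraction
from math import floor

# Table 5: 1-bit arithmetic unit @ 100 MHz (power in nW, device counts).
# "DPTL" is shorthand for DPTL-C2MOS-ALU in this sketch.
ARITH = {
    "DPTL": {"dynamic": "255.2", "total": "262.3", "mos": 48, "mtj": 0},
    "M-MALU": {"dynamic": "381.8", "total": "456.4", "mos": 62, "mtj": 8},
    "P-MALU": {"dynamic": "177.9", "total": "187.7", "mos": 34, "mtj": 4},
}

# Table 6: logical unit, one entry per operation (XOR/XNOR, OR/NOR, AND/NAND).
LOGIC = {
    "DPTL": {"power": ["212.2", "181.8", "212.2"], "mos": [36, 36, 36]},
    "M-MALU": {"power": ["265.5", "265.8", "265.8"], "mos": [35, 32, 32], "mtj": [6, 2, 2]},
    "P-MALU": {"power": ["133.6", "123", "121.6"], "mos": [19, 19, 19], "mtj": [2, 2, 2]},
}


def mean(values):
    """Exact arithmetic mean of ints or decimal strings."""
    exact = [Fraction(v) for v in values]
    return sum(exact, Fraction(0)) / len(exact)


def reduction_pct(reference, proposed):
    """100 * (reference - proposed) / reference, truncated to two decimals."""
    ratio = (Fraction(reference) - Fraction(proposed)) / Fraction(reference)
    return floor(ratio * 10000) / 100


arith = {}
for ref in ("DPTL", "M-MALU"):
    for key in ("dynamic", "total", "mos"):
        arith[(key, ref)] = reduction_pct(ARITH[ref][key], ARITH["P-MALU"][key])
# DPTL-C2MOS-ALU uses no MTJs, so the MTJ saving is only defined against M-MALU.
arith[("mtj", "M-MALU")] = reduction_pct(ARITH["M-MALU"]["mtj"], ARITH["P-MALU"]["mtj"])

logic = {}
for ref in ("DPTL", "M-MALU"):
    for key in ("power", "mos"):
        logic[(key, ref)] = reduction_pct(mean(LOGIC[ref][key]), mean(LOGIC["P-MALU"][key]))
logic[("mtj", "M-MALU")] = reduction_pct(mean(LOGIC["M-MALU"]["mtj"]), mean(LOGIC["P-MALU"]["mtj"]))

for label, table in (("Arithmetic unit (Table 5)", arith), ("Logical unit (Table 6)", logic)):
    print(label)
    for (quantity, ref), pct in table.items():
        print(f"  P-MALU vs {ref:7s} {quantity:8s} {pct:6.2f} % lower")
```

Truncation, rather than rounding, is what reproduces the quoted 30.28% dynamic-power saving (rounding would give 30.29%). For the logic-unit MTJ count, the exact value is 40%, since M-MALU averages 10/3 MTJs per operation against 2 for P-MALU.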
In the P-MALU design, the output delay depends on the size of the pull-down transistor (PD in Figure 7): the sooner the discharge current finds its path to ground, the faster the output responds. To analyze this dependency, we varied the width of PD and obtained the worst-case SUM and CARRY delays of the P-MALU listed in Table 7 and plotted in Figure 18. As the width of PD increases, the worst-case delay generally decreases for both outputs. The largest reduction occurs between W = 120 nm and W = 480 nm (from 48.3 ps to 37.8 ps for SUM and from 52.3 ps to 45 ps for CARRY), because the wider channel provides a faster discharge path. Beyond W = 960 nm the improvement saturates, and the SUM delay even rises slightly, from 36.2 ps to 36.6 ps, at W = 1.2 µm. However, a larger transistor also adds area overhead to the entire circuit.

#### E. PROCESS AND MISMATCH VARIATIONS

Process and mismatch (PM) variations during nanoscale fabrication affect the performance of LIM structures. To study this effect at the design stage, we performed MC simulations of 200 runs on the three circuits, i.e., DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU. In addition to the CMOS process and mismatch variations, the M-MALU and P-MALU circuits include 3% variations of the MTJ parameters TMR, $t_{ox}$ and $t_{sl}$, following a Gaussian distribution. We studied the influence of these PM variations on the power dissipation of the three designs.

TABLE 8. Power dissipation comparison for DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU during MC simulation of 200 runs.

| Circuit type | Min (nW) | Max (nW) | Mean (nW) | Standard deviation (nW) |
|-----------------------------|-------------|-------------|--------------|---------------------------|
| DPTL-C<sup>2</sup>MOS-ALU | 247.3 | 278.7 | 262.1 | 4.949 |
| M-MALU | 388.1 | 528.7 | 456.4 | 25.11 |
| P-MALU | 166 | 208.2 | 187.8 | 7.132 |

FIGURE 19. Distribution of total power dissipation for P-MALU obtained during MC simulation of 200 runs.

Table 8 compares the total power dissipation of DPTL-C<sup>2</sup>MOS-ALU, M-MALU and P-MALU over the 200 MC runs. The minimum, maximum and mean total power dissipation of P-MALU are all lower than those of the other two designs, whereas M-MALU consumes the most power of the three. Moreover, the maximum power dissipation of P-MALU (208.2 nW) is lower than the minimum power dissipation of DPTL-C<sup>2</sup>MOS-ALU (247.3 nW). Among the three designs, P-MALU is therefore the most efficient in terms of total power dissipation. Figure 19 shows, as an example, the total power distribution of the P-MALU design over the 200 MC runs.

TABLE 9. MC simulation for total power dissipation of P-MALU with different size of the PD transistor.

| PD transistor size | Min (nW) | Max (nW) | Mean (nW) | Standard deviation (nW) |
|--------------------|-------------|-------------|--------------|---------------------------|
| W = 120 nm | 166 | 208.2 | 187.8 | 7.132 |
| W = 480 nm | 201.4 | 246.4 | 222 | 8.17 |
| W = 960 nm | 243.8 | 297.6 | 268.7 | 9.834 |
| W = 1.2 µm | 264.6 | 322.5 | 291.5 | 10.63 |

FIGURE 20. Double pass transistor logic based clocked CMOS full adder circuit [45], [46].

Table 9 shows the total power dissipation of P-MALU for different PD transistor sizes, again over 200 MC runs. As the PD transistor size increases, the minimum, maximum and mean total power dissipation of the P-MALU circuit also increase. On the other hand, Figure 18 shows that the delay of the P-MALU circuit decreases as the PD transistor size increases.
Hence there is a trade-off between speed and power when sizing the PD transistor of the P-MALU: a larger PD transistor dissipates more power in exchange for a quicker output response. For example, increasing W from 120 nm to 480 nm raises the mean total power from 187.8 nW to 222 nW (Table 9) while reducing the worst-case SUM delay from 48.3 ps to 37.8 ps (Table 7).

### V. CONCLUSION

Hybrid STT-MTJ/CMOS designs based on LIM structures are being actively researched in both academia and industry. In this paper, a novel P-MALU design is proposed and shown to outperform the DPTL-C<sup>2</sup>MOS-ALU and M-MALU designs in terms of power dissipation, device count and delay. Because MTJs are non-volatile, they not only store logic values but also make the stored values available for logic operations as soon as the power is switched on. Hybrid circuits built with MTJs therefore consume zero static power in standby mode and do not need "backup" and "restore" operations, which is a significant advantage over volatile CMOS designs. The results discussed in this paper suggest that the proposed design not only consumes less power but also requires fewer devices, and hence potentially less silicon area. The P-MALU design is therefore a promising candidate for low-power VLSI circuits in the post-CMOS era.

### APPENDIX

#### DOUBLE PASS TRANSISTOR LOGIC BASED CLOCKED CMOS FULL ADDER CIRCUIT

See Figure 20.

### REFERENCES

- [1] W. Zhao and G. Prenat, *Spintronics-Based Computing*. Berlin, Germany: Springer, 2015.
- [2] G. E. Moore, "Cramming more components onto integrated circuits, Reprinted from Electronics, volume 38, number 8, April 19, 1965, pp. 114 ff," *IEEE Solid-State Circuits Soc. Newslett.*, vol. 11, no. 3, pp. 33–35, Sep. 2006.
- [3] S. J. Souri, K. Banerjee, A. Mehrotra, and K. C. Saraswat, "Multiple Si layer ICs: Motivation, performance analysis, and design implications," in *Proc. 37th Conf. Design Autom. (DAC)*, 2000, pp. 213–220.
- [4] Y. Deng and W.
Maly, "2.5D system integration: A design driven system implementation schema," in *Proc. ASP-DAC*, Asia South Pacific Design Autom. Conf., Oct. 2004, pp. 450–455.
- [5] S. Tehrani, J. Slaughter, E. Chen, M. Durlam, J. Shi, and M. Deherren, "Progress and outlook for MRAM technology," *IEEE Trans. Magn.*, vol. 35, no. 5, pp. 2814–2819, 1999.
- [6] S. Ikeda, K. Miura, H. Yamamoto, K. Mizunuma, H. D. Gan, M. Endo, S. Kanai, J. Hayakawa, F. Matsukura, and H. Ohno, "A perpendicular-anisotropy CoFeB–MgO magnetic tunnel junction," *Nature Mater.*, vol. 9, no. 9, pp. 721–724, 2010.
- [7] X. Lin, W. Yang, K. L. Wang, and W. Zhao, "Two-dimensional spintronics for low-power electronics," *Nature Electron.*, vol. 2, no. 7, pp. 274–283, Jul. 2019.
- [8] R. C. Sousa and I. L. Prejbeanu, "Non-volatile magnetic random access memories (MRAM)," *Comp. Rendus Phys.*, vol. 6, no. 9, pp. 1013–1021, Nov. 2005.
- [9] S. Matsunaga, J. Hayakawa, S. Ikeda, K. Miura, T. Endoh, H. Ohno, and T. Hanyu, "MTJ-based nonvolatile logic-in-memory circuit, future prospects and issues," in *Proc. Design, Autom. Test Eur. Conf. Exhib.*, Apr. 2009, pp. 433–435.
- [10] E. Deng, Y. Zhang, J.-O. Klein, D. Ravelosona, C. Chappert, and W. Zhao, "Low power magnetic full-adder based on spin transfer torque MRAM," *IEEE Trans. Magn.*, vol. 49, no. 9, pp. 4982–4987, Sep. 2013.
- [11] A. Roohi, R. Zand, D. Fan, and R. F. DeMara, "Voltage-based concatenatable full adder using spin Hall effect switching," *IEEE Trans. Comput.-Aided Design Integr. Circuits Syst.*, vol. 36, no. 12, pp. 2134–2138, Dec. 2017.
- [12] A. Zarei and F. Safaei, "Power and area-efficient design of VCMA–MRAM based full-adder using approximate computing for IoT applications," *Microelectron. J.*, vol. 82, pp. 62–70, Dec. 2018.
- [13] E. Deng, G. Prenat, L. Anghel, and W. Zhao, "Non-volatile magnetic decoder based on MTJs," *Electron. Lett.*, vol. 52, no. 21, pp. 1774–1776, Oct. 2016.
- [14] W. Zhao, M. Moreau, E. Deng, Y. Zhang, J.-M.
Portal, J.-O. Klein, M. Bocquet, H. Aziza, D. Deleruyelle, C. Müller, D. Querlioz, N. B. Romdhane, D. Ravelosona, and C. Chappert, "Synchronous non-volatile logic gate design based on resistive switching memories," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 2, pp. 443–454, Feb. 2014.
- [15] W. Zhao, E. Belhaire, C. Chappert, F. Jacquet, and P. Mazoyer, "New non-volatile logic based on spin-MTJ," *Phys. Status Solidi A*, vol. 205, no. 6, pp. 1373–1377, Jun. 2008.
- [16] D. Chabi, W. Zhao, E. Deng, Y. Zhang, N. B. Romdhane, J.-O. Klein, and C. Chappert, "Ultra low power magnetic flip-flop based on check-pointing/power gating and self-enable mechanisms," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 6, pp. 1755–1765, Jun. 2014.
- [17] A. Roohi and R. F. DeMara, "NV-clustering: Normally-off computing using non-volatile datapaths," *IEEE Trans. Comput.*, vol. 67, no. 7, pp. 949–959, Jul. 2018.
- [18] A. Jaiswal, R. Andrawis, and K. Roy, "Area-efficient nonvolatile flip-flop based on spin Hall effect," *IEEE Magn. Lett.*, vol. 9, pp. 1–4, 2018.
- [19] W. Kang, Y. Ran, W. Lv, Y. Zhang, and W. Zhao, "High-speed, low-power, magnetic non-volatile flip-flop with voltage-controlled, magnetic anisotropy assistance," *IEEE Magn. Lett.*, vol. 7, pp. 1–5, 2016.
- [20] Y. Guillemenet, L. Torres, G. Sassatelli, and N. Bruchon, "On the use of magnetic RAMs in field-programmable gate arrays," *Int. J. Reconfigurable Comput.*, vol. 2008, pp. 1–9, 2008.
- [21] W. Zhao, E. Belhaire, C. Chappert, and P. Mazoyer, "Spin transfer torque (STT)-MRAM-based runtime reconfiguration FPGA circuit," *ACM Trans. Embed. Comput. Syst.*, vol. 9, no. 2, pp. 1–16, Oct. 2009.
- [22] G. Prenat, B. Dieny, W. Guo, M. El Baraji, V. Javerliac, and J.-P. Nozieres, "Beyond MRAM, CMOS/MTJ integration for logic components," *IEEE Trans. Magn.*, vol. 45, no. 10, pp. 3400–3405, Oct. 2009.
- [23] R. Alhalabi, G. Di Pendina, I.-L. Prejbeanu, and E.
Nowak, "High speed and high-area efficiency non-volatile look-up table design based on magnetic tunnel junction," in *Proc. 17th Non-Volatile Memory Technol. Symp. (NVMTS)*, Aug. 2017, pp. 1–4.
- [24] A. Roohi and R. F. DeMara, "PARC: A novel design methodology for power analysis resilient circuits using spintronics," *IEEE Trans. Nanotechnol.*, vol. 18, pp. 885–889, 2019.
- [25] V. K. Joshi, "Spintronics: A contemporary review of emerging electronics devices," *Eng. Sci. Technol., Int. J.*, vol. 19, no. 3, pp. 1503–1513, Sep. 2016.
- [26] S. Umesh and S. Mittal, "A survey of spintronic architectures for processing-in-memory and neural networks," *J. Syst. Archit.*, vol. 97, pp. 349–372, Aug. 2019.
- [27] L. Chen, T.-Y. Wang, Y.-W. Dai, M.-Y. Cha, H. Zhu, Q.-Q. Sun, S.-J. Ding, P. Zhou, L. Chua, and D. W. Zhang, "Ultra-low power Hf<sub>0.5</sub>Zr<sub>0.5</sub>O<sub>2</sub> based ferroelectric tunnel junction synapses for hardware neural network applications," *Nanoscale*, vol. 10, no. 33, pp. 15826–15833, Jul. 2018.
- [28] Z. Wang, W. Zhao, W. Kang, Y. Zhang, J.-O. Klein, D. Ravelosona, and C. Chappert, "Compact modelling of ferroelectric tunnel memristor and its use for neuromorphic simulation," *Appl. Phys. Lett.*, vol. 104, no. 5, Feb. 2014, Art. no. 053505.
- [29] Z. Wang, W. Zhao, W. Kang, A. Bouchenak-Khelladi, Y. Zhang, Y. Zhang, J.-O. Klein, D. Ravelosona, and C. Chappert, "Corrigendum: A physics-based compact model of ferroelectric tunnel junction for memory and logic design (2014 J. Phys. D: Appl. Phys. 47 045001)," *J. Phys. D, Appl. Phys.*, vol. 49, no. 9, 2016, Art. no. 099501.
- [30] S. Deb and A. Chattopadhyay, "Spintronic device-structure for low-energy XOR logic using domain wall motion," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2019, pp. 1–5.
- [31] F. Parveen, Z. He, S. Angizi, and D. Fan, "Hybrid polymorphic logic gate with 5-terminal magnetic domain wall motion device," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Jul.
2017, pp. 152–157.
- [32] K. Huang and R. Zhao, "Magnetic domain-wall racetrack memory-based nonvolatile logic for low-power computing and fast run-time-reconfiguration," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 9, pp. 2861–2872, Sep. 2016.
- [33] A. Roohi, R. Zand, and R. F. DeMara, "A tunable majority gate-based full adder using current-induced domain wall nanomagnets," *IEEE Trans. Magn.*, vol. 52, no. 8, pp. 1–7, Aug. 2016.
- [34] S. Angizi, Z. He, R. F. DeMara, and D. Fan, "Composite spintronic accuracy-configurable adder for low power digital signal processing," in *Proc. 18th Int. Symp. Qual. Electron. Design (ISQED)*, Mar. 2017, pp. 391–396.
- [35] H.-P. Trinh, D. Ravelosona, C. Chappert, J.-O. Klein, Y. Zhang, and W. Zhao, "Domain wall motion based magnetic adder," *Electron. Lett.*, vol. 48, no. 17, pp. 1049–1051, Aug. 2012.
- [36] T. Luo, W. Zhang, B. He, and D. Maskell, "A racetrack memory based in-memory booth multiplier for cryptography application," in *Proc. 21st Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Jan. 2016, pp. 286–291.
- [37] D. Kline, H. Xu, R. Melhem, and A. K. Jones, "Racetrack queues for extremely low-energy FIFOs," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 26, no. 8, pp. 1531–1544, Aug. 2018.
- [38] S. Angizi, Z. He, F. Parveen, and D. Fan, "RIMPA: A new reconfigurable dual-mode in-memory processing architecture with spin Hall effect-driven domain wall motion device," in *Proc. IEEE Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Jul. 2017, pp. 45–50.
- [39] Q. An, L. Su, J.-O. Klein, S. Le Beux, I. O'Connor, and W. Zhao, "Full-adder circuit design based on all-spin logic device," in *Proc. IEEE/ACM Int. Symp. Nanosc. Archit. (NANOARCH)*, Jul. 2015, pp. 163–168.
- [40] M. Patra and S. K. Maiti, "All-spin logic operations: Memory device and reconfigurable computing," *EPL*, vol. 121, no. 3, p. 38004, Feb. 2018.
- [41] Q. An, S. Le Beux, I. O'Connor, J. O. Klein, and W.
Zhao, "Arithmetic logic unit based on all-spin logic devices," in *Proc. 15th IEEE Int. New Circuits Syst. Conf. (NEWCAS)*, Jun. 2017, pp. 317–320.
- [42] Y. Zhang, "Compact modeling and hybrid circuit design for spintronic devices based on current-induced switching," Ph.D. dissertation, Univ. Paris Sud-Paris, Orsay, France, 2014.
- [43] W. Guo, G. Prenat, and B. Dieny, "A novel architecture of non-volatile magnetic arithmetic logic unit using magnetic tunnel junctions," *J. Phys. D, Appl. Phys.*, vol. 47, no. 16, Apr. 2014, Art. no. 165001.
- [44] W. Guo, G. Prenat, V. Javerliac, M. E. Baraji, N. De Mestier, C. Baraduc, and B. Diény, "SPICE modelling of magnetic tunnel junctions written by spin-transfer torque," *J. Phys. D, Appl. Phys.*, vol. 43, no. 21, Jun. 2010, Art. no. 215001.
- [45] Manual of Design Kit for CMOS 65 nm, STMicroelectron., Geneva, Switzerland, 2009.
- [46] Y. Gang, W. Zhao, J.-O. Klein, C. Chappert, and P. Mazoyer, "A high-reliability, low-power magnetic full adder," *IEEE Trans. Magn.*, vol. 47, no. 11, pp. 4611–4616, Nov. 2011.
- [47] Y. Wang, Y. Zhang, W. Zhao, J.-O. Klein, T. Devolder, D. Ravelosona, and C. Chappert. (2015). Compact Model for Perpendicular Magnetic Anisotropy Magnetic Tunnel Junction. [Online]. Available: https://www.researchgate.net/publication/309355960
- [48] M. N. Baibich, J. M. Broto, A. Fert, F. N. Van Dau, F. Petroff, P. Etienne, G. Creuzet, A. Friederich, and J. Chazelas, "Giant magnetoresistance of (001)Fe/(001)Cr magnetic superlattices," *Phys. Rev. Lett.*, vol. 61, no. 21, pp. 2472–2475, Nov. 1988.
- [49] Y.-J. Song, G. Jeong, I.-G. Baek, and J. Choi, "What lies ahead for resistance-based memory technologies?" *Computer*, vol. 46, no. 8, pp. 30–36, Aug. 2013.
- [50] Y. Zhang, W. Zhao, Y. Lakys, J.-O. Klein, J.-V. Kim, D. Ravelosona, and C. Chappert, "Compact modeling of perpendicular-anisotropy CoFeB/MgO magnetic tunnel junctions," *IEEE Trans. Electron Devices*, vol. 59, no. 3, pp. 819–826, Mar. 2012.
- [51] J.
D. Harms, F. Ebrahimi, X. Yao, and J.-P. Wang, "SPICE macromodel of spin-torque-transfer-operated magnetic tunnel junctions," *IEEE Trans. Electron Devices*, vol. 57, no. 6, pp. 1425–1430, Jun. 2010.
- [52] S. Bhatti, R. Sbiaa, A. Hirohata, H. Ohno, S. Fukami, and S. N. Piramanayagam, "Spintronics based random access memory: A review," *Mater. Today*, vol. 20, no. 9, pp. 530–548, 2017.
- [53] H. Lim, S. Lee, and H. Shin, "A survey on the modeling of magnetic tunnel junctions for circuit simulation," *Act. Passive Electron. Compon.*, vol. 2016, pp. 1–12, 2016.
- [54] I. Ahmed, Z. Zhao, M. G. Mankalale, S. S. Sapatnekar, J.-P. Wang, and C. H. Kim, "A comparative study between spin-transfer-torque and spin-Hall-effect switching mechanisms in PMTJ using SPICE," *IEEE J. Explor. Solid-State Comput. Devices Circuits*, vol. 3, pp. 74–82, Dec. 2017.
- [55] S. Verma, A. A. Kulkarni, and B. K. Kaushik, "Spintronics-based devices to circuits: Perspectives and challenges," *IEEE Nanotechnol. Mag.*, vol. 10, no. 4, pp. 13–28, Dec. 2016.
- [56] D. Wang, C. Nordman, J. Daughton, Z. Qian, and J. Fink, "70% TMR at room temperature for SDT sandwich junctions with CoFeB as free and reference layers," *IEEE Trans. Magn.*, vol. 40, no. 4, pp. 2269–2271, Jul. 2004.
- [57] S. Ikeda, J. Hayakawa, Y. Ashizawa, Y. M. Lee, K. Miura, H. Hasegawa, M. Tsunoda, F. Matsukura, and H. Ohno, "Tunnel magnetoresistance of 604% at 300 K by suppression of Ta diffusion in CoFeB/MgO/CoFeB pseudo-spin-valves annealed at high temperature," *Appl. Phys. Lett.*, vol. 93, no. 8, Aug. 2008, Art. no. 082508.
- [58] S. A. Wolf, D. Awschalom, R. Buhrman, J. Daughton, S. Von Molnar, M. Roukes, A. Y. Chtchelkanova, and D. Treger, "Spintronics: A spin-based electronics vision for the future," *Science*, vol. 294, no. 5546, pp. 1488–1495, Nov. 2001.
- [59] J. Wang and P. Freitas, "Low-current blocking temperature writing of double-barrier MRAM cells," *IEEE Trans.
Magn.*, vol. 40, no. 4, pp. 2622–2624, Jul. 2004.
- [60] I. Prejbeanu, M. Kerekes, R. C. Sousa, H. Sibuet, O. Redon, B. Dieny, and J. Nozières, "Thermally assisted MRAM," *J. Phys., Condens. Matter*, vol. 19, no. 16, 2007, Art. no. 165218.
- [61] L. Liu, C.-F. Pai, Y. Li, H. W. Tseng, D. C. Ralph, and R. A. Buhrman, "Spin-torque switching with the giant spin Hall effect of tantalum," *Science*, vol. 336, no. 6081, pp. 555–558, May 2012.
- [62] J. Hirsch, "Spin Hall effect," *Phys. Rev. Lett.*, vol. 83, no. 9, p. 1834, 1999.
- [63] Y. Shiota, T. Nozaki, F. Bonell, S. Murakami, T. Shinjo, and Y. Suzuki, "Induction of coherent magnetization switching in a few atomic layers of FeCo using voltage pulses," *Nature Mater.*, vol. 11, no. 1, pp. 39–43, Jan. 2012.
- [64] W.-G. Wang, M. Li, S. Hageman, and C. L. Chien, "Electric-field-assisted switching in magnetic tunnel junctions," *Nature Mater.*, vol. 11, no. 1, pp. 64–68, Jan. 2012.
- [65] D. Ralph and M. Stiles, "Spin transfer torques," *J. Magn. Magn. Mater.*, vol. 320, no. 7, pp. 1190–1216, Apr. 2008.
- [66] Z. Diao, Z. Li, S. Wang, Y. Ding, A. Panchula, E. Chen, L.-C. Wang, and Y. Huai, "Spin-transfer torque switching in magnetic tunnel junctions and spin-transfer torque random access memory," *J. Phys., Condens. Matter*, vol. 19, no. 16, Apr. 2007, Art. no. 165209.
- [67] R. Sbiaa, H. Meng, and S. N. Piramanayagam, "Materials with perpendicular magnetic anisotropy for magnetic random access memory," *Phys. Status Solidi RRL*, vol. 5, no. 12, pp. 413–419, Dec. 2011.
- [68] T. M. Maffitt, J. K. DeBrosse, J. Gabric, E. T. Gow, M. C. Lamorey, J. S. Parenteau, D. R. Willmott, M. A. Wood, and W. J. Gallagher, "Design considerations for MRAM," *IBM J. Res. Develop.*, vol. 50, no. 1, pp. 25–39, Jan. 2006.
- [69] Everspin. (2019). Spin-Transfer Torque MRAM Products. [Online]. Available: https://www.everspin.com/spin-transfer-torque-mram-products
- [70] J. Slonczewski, "Current-driven excitation of magnetic multilayers," J. Magn.
Magn. Mater., vol. 159, nos. 1–2, pp. L1–L7, Jun. 1996.
- [71] L. Berger, "Emission of spin waves by a magnetic multilayer traversed by a current," *Phys. Rev. B, Condens. Matter*, vol. 54, no. 13, pp. 9353–9358, Oct. 1996.
- [72] Y. Huai, F. Albert, P. Nguyen, M. Pakala, and T. Valet, "Observation of spin-transfer switching in deep submicron-sized and low-resistance magnetic tunnel junctions," *Appl. Phys. Lett.*, vol. 84, no. 16, pp. 3118–3120, Apr. 2004.
- [73] D. C. Worledge, "Theory of spin torque switching current for the double magnetic tunnel junction," *IEEE Magn. Lett.*, vol. 8, pp. 1–5, 2017.
- [74] G. Wang, Y. Zhang, J. Wang, Z. Zhang, K. Zhang, Z. Zheng, J.-O. Klein, D. Ravelosona, Y. Zhang, and W. Zhao, "Compact modeling of perpendicular-magnetic-anisotropy double-barrier magnetic tunnel junction with enhanced thermal stability recording structure," *IEEE Trans. Electron Devices*, vol. 66, no. 5, pp. 2431–2436, May 2019.
- [75] (2019). International Roadmap for Devices and Systems. [Online]. Available: https://irds.ieee.org/editions/2017
- [76] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. Irwin, M. Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power," *Computer*, vol. 36, no. 12, pp. 68–75, Dec. 2003.
- [77] T. Endoh, H. Koike, S. Ikeda, T. Hanyu, and H. Ohno, "An overview of nonvolatile emerging memories—Spintronics for working memories," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 6, no. 2, pp. 109–119, Jun. 2016.
- [78] W. Zhao, C. Chappert, V. Javerliac, and J.-P. Noziere, "High speed, high stability and low power sensing amplifier for MTJ/CMOS hybrid logic circuits," *IEEE Trans. Magn.*, vol. 45, no. 10, pp. 3784–3787, Oct. 2009.
- [79] Y. Wang, H. Cai, L. A. D. B. Naviner, Y. Zhang, X. Zhao, E. Deng, J.-O. Klein, and W. Zhao, "Compact model of dielectric breakdown in spin-transfer torque magnetic tunnel junction," *IEEE Trans. Electron Devices*, vol. 63, no. 4, pp. 1762–1767, Apr. 2016.
- [80] B. Engel, J. Akerman, B. Butcher, R. Dave, M. Deherrera, M. Durlam, G. Grynkewich, J. Janesky, S. Pietambaram, N. Rizzo, J. Slaughter, K. Smith, J. Sun, and S. Tehrani, "A 4-Mb toggle MRAM based on a novel bit and switching method," *IEEE Trans. Magn.*, vol. 41, no. 1, pp. 132–136, Jan. 2005.
- [81] J. Slaughter, N. Rizzo, F. Mancoff, R. Whig, K. Smith, S. Aggarwal, and S. Tehrani, "Toggle and spin-torque MRAM: Status and outlook," *IEEE Trans. Magn.*, vol. 41, p. 132, 2010.

**PRASHANTH BARLA** received the B.E. degree in electronics and communication engineering and the M.Tech. degree in microelectronics and control systems from Visvesvaraya Technological University, Belgaum, Karnataka, India. He is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. His research interests include VLSI design and spintronics. He is working under the guidance of Prof. Vinod Kumar Joshi and Prof. Somashekara Bhat on hybrid MTJ/CMOS logic-in-memory structures.

**VINOD KUMAR JOSHI** received the Ph.D. degree from Kumaun University, Nainital, India, and the M.Tech. degree from VIT University, Vellore, India. He is currently an Associate Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal, India. His main research interests include spintronics-based VLSI and logic-in-memory-based hybrid non-volatile logic circuits for low-power applications. He is a Life Member of the Indian Society of Systems for Science and Engineering (LMISSE-00361), VSSC-ISRO, Trivandrum, India.

**SOMASHEKARA BHAT** received the Ph.D. degree in the field of MEMS from IIT Madras, India. He is currently a Professor with the Department of Electronics and Communication Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India. His areas of interest are MEMS and electronics for biomedical applications.