This is a post-peer-review, pre-copyedit version of an article published in Journal of Electronic Testing. The final authenticated version is available online at: http://dx.doi.org/10.1007/s10836-018-5737-6. Access to this work was provided by the University of Maryland, Baltimore County (UMBC) ScholarWorks@UMBC digital repository on the Maryland Shared Open Access (MD-SOAR) platform.


# Low-Power Resonant Clocking Using Soft Error Robust Energy Recovery Flip-Flops

**Riadul Islam** 


**Abstract** An energy recovery or resonant clocking scheme is very attractive for saving clock power in nanoscale ASICs and systems-on-chips, which have increased functionality and larger die sizes. Technology scaling following Moore's law lowers node capacitance and supply voltage, making nanoscale integrated circuits more vulnerable to radiation-induced single event upsets (SEUs), or soft errors. In this work, we propose soft error robust flip-flops (FFs) capable of working with a sinusoidal resonant clock to save overall chip power. The proposed conditional-pass Quatro (CPQ) FF and true single phase clock energy recovery (TSPCER) FF are based on a unique soft error robust latch, which we refer to as a Quatro latch. The proposed $C^2$-DICE FF is based on a dual interlocked cell (DICE) latch. In addition to the storage cell, each FF consists of a unique input stage and a two-transistor, two-input output buffer. In each FF, the input transfer unit, driven by the sinusoidal clock, passes the data to the Quatro or DICE latch. The latches store the data values at two storage nodes and two redundant nodes, the latter enabling recovery from a particle-induced transient with or without multiple-node charge sharing. Post-layout simulations in 65nm CMOS technology show that the proposed FFs exhibit as much as 82% lower power-delay product compared to recently reported soft error robust FFs. We implemented 1024 of the proposed FFs distributed in an H-tree clock network driven by a resonant clock generator that generates a 1–5 GHz sinusoidal clock signal. The simulation results show a power reduction of 93% on the clock tree and a total power saving of up to 74% compared to the same implementation using the conventional square-wave clocking scheme and FFs.

Keywords Clock generator  $\cdot$  Cosmic radiation  $\cdot$  Energy recovery  $\cdot$  Flip-flop  $\cdot$  Single event upset  $\cdot$  Sinusoidal clock

Riadul Islam

E-mail: riaduli@umich.edu

Electrical and Computer Engineering, University of Michigan Dearborn, MI, 48128 USA Tel.: +1-313-583-6590



Fig. 1: (a) The traditional clocking scheme uses buffers in the clock tree to distribute a square-wave clock to the final flip-flops, and (b) the resonant energy recovery clocking scheme attaches an RLC resonance network to a bufferless clock network to distribute a sinusoidal signal to the energy recovery flip-flops.

## 1 Introduction

We have observed an increasing trend in the use of electronic devices; at the same time, due to the wide application of mobile devices, power consumption and timing issues have become critical concerns in the modern semiconductor industry. In a synchronous digital system, the clock is an indispensable part that provides the timing reference controlling the data flow. However, the clock distribution network (CDN) in synchronous ASICs and systems-on-chips (SOCs) consumes a significant fraction of the total system power. In addition, the skew and jitter of the clock network increase the timing margins, which degrades performance and prevents the chip from operating at a lower voltage to save power [5]. The energy recovery clocking scheme is very attractive as a way to address this concern. Unlike the traditional square-wave clocking scheme (see Figure 1(a)) [8], the resonant clocking scheme uses an on-chip inductor and an input decoupling capacitor, together with the capacitance of the clock distribution network, to generate a sinusoidal clock (see Figure 1(b)) [27, 9, 26, 3]. The energy recovery scheme achieves low energy dissipation by allowing current to flow only through devices with a low voltage drop across them, enabling significant power savings.
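To make the resonance condition concrete, the sketch below sizes the on-chip inductor from the ideal LC relation $f = 1/(2\pi\sqrt{LC})$ and, for reference, evaluates the $CV^2f$ power that a square-wave clock spends fully charging and discharging the same capacitance every cycle. The 10 pF clock-network capacitance is a hypothetical value chosen for illustration, and the sketch ignores the decoupling capacitor, wire resistance, and inductor losses that a real resonant clock generator must account for.

```python
import math

def resonant_inductance(c_clk, f_target):
    """Inductance (H) that resonates with capacitance c_clk (F) at f_target (Hz),
    from the ideal LC relation f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / ((2.0 * math.pi * f_target) ** 2 * c_clk)

def resonant_frequency(l_ind, c_clk):
    """Ideal (lossless) LC resonant frequency (Hz)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_ind * c_clk))

def square_wave_clock_power(c_clk, vdd, f):
    """Power (W) to fully charge and discharge c_clk once per cycle: C * V^2 * f."""
    return c_clk * vdd ** 2 * f

C_CLK = 10e-12  # hypothetical total clock-network capacitance (10 pF)
VDD = 1.0       # supply voltage used in the paper's simulations (1 V)

for f in (1e9, 5e9):
    l_ind = resonant_inductance(C_CLK, f)
    p_sq = square_wave_clock_power(C_CLK, VDD, f)
    print(f"{f / 1e9:.0f} GHz: L = {l_ind * 1e9:.3f} nH, square-wave CV^2f = {p_sq * 1e3:.0f} mW")
```

With these assumed values the required inductance is about 2.5 nH at 1 GHz and about 0.1 nH at 5 GHz; because $L \propto 1/f^2$, a fivefold frequency increase reduces the inductance 25-fold.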

In addition to the CDN power and timing constraints, very large-scale integrated circuits in the submicron regime are highly susceptible to particle-induced single event transients (SETs). These SETs are caused primarily by alpha particles and cosmic neutrons, which originate from packaging materials and cosmic rays, respectively. Cosmic radiation, which at ground level mainly comprises neutrons, has a higher flux density at aircraft altitudes. However, even the ground-level neutron flux (~13 neutrons/$cm^2$/h in New York City) is sufficient to interact with the silicon atoms in the substrate and generate unwanted charge [17]. When this charge is collected by reverse-biased p-n junctions, it produces a voltage transient, or SET, at the associated node [1, 25, 23]. When an SET changes the stored value of a memory element or flip-flop (FF), it is referred to as a single event upset (SEU) [6]. Although an SEU can cause system malfunction through data corruption, it does not permanently damage the device, because the original or new data can be restored by a refresh or rewrite; hence, it is referred to as a soft error. However, with technology scaling, the soft error rate (SER) of logic circuits is increasing significantly relative to the overall SER of sequential circuits [18]. In fact, the per-bit logic SER has become comparable to that of embedded memory at the 65nm node [1]. Therefore, limiting both the SER and the clock network power is crucial to ensure the reliability and power efficiency of ASICs and SOCs.

The importance of designing low-power and high-performance timing elements has led to many low-power latch and FF designs in the literature [8, 21, 30, 19, 15, 32, 7, 22, 29, 28, 11, 20]. A number of papers have proposed new styles of latches and FFs that improve power, speed, and energy or that offer special timing and soft error robustness characteristics. Soft error robust flip-flops consume extra power due to their additional redundant circuitry, which also increases the flip-flops' input clock pin capacitance. As a result, conventional clocking combined with soft error robust flip-flops increases the overall chip power. In this paper, we present three high-speed soft error tolerant FFs that operate with an energy recovery clock in order to improve robustness and reduce the CDN power of an SOC or a microprocessor. By working with the energy recovery clock, the FFs enable recovery of the energy stored in the clock input gate capacitances and eliminate the clock tree buffers often used for a square-wave clock. Hence, we refer to our flip-flops as energy recovery flip-flops. The FFs are based on an eight-transistor (8T) SEU-tolerant Quatro latch [16] and an 8T dual interlocked cell (DICE) [4]. The key contributions of this work are:

- The first work to apply resonant clocking to SEU-hardened FFs.
- A comprehensive comparison of the proposed FFs with existing soft error robust and energy recovery FFs.
- A proposed FF that removes the output glitches of the traditional true single phase clock (TSPC) register.
- A detailed analysis of SET sensitivity under dual-node charge sharing, which is considered a major threat to the reliability of integrated circuits.
- A multi-frequency power comparison of the proposed SEU-hardened FF system with traditional clocking schemes.

The remainder of the paper is organized as follows. Section 2 presents an overview of existing energy recovery and soft error robust FFs. Section 3 introduces the proposed soft error robust energy recovery FFs. Section 4 presents a performance analysis of the proposed FFs. Section 5 presents the soft error tolerance of the proposed FFs, Section 6 presents the energy recovery clocking scheme, and Section 7 draws conclusions.

## 2 Background

### 2.1 Overview of Existing Energy Recovery Flip-Flops

A number of energy recovery FFs have been proposed in the literature [8, 21]. The sense amplifier energy recovery (SAER) FF is based on a traditional sense amplifier [21]. The SAER FF consumes extra short-circuit power due to the overlap between its evaluation and precharge phases. In addition, its internal nodes charge and discharge every clock cycle regardless of data switching activity. The static differential energy recovery (SDER) FF is an implicit pulsed FF that has a very low clock-to-Q ($t_{C-Q}$) delay and eliminates the internal charging-discharging issue of the SAER FF, but it consumes extra power at high data switching activity [21]; moreover, the SDER input data buffer consumes more power than the SAER FF. The single-ended conditional capturing energy recovery (SCCER) FF exhibits a very low $t_{C-Q}$ delay [21]. The power efficiency of the SCCER FF degrades significantly at high data rates, primarily because of the four large stacked NMOS transistors at its input stage; however, SCCER FFs are more power-efficient at low data switching activity. Another prior design, the dual-edge-triggered pulsed energy recovery FF [8], requires more transistors than a single-edge-triggered FF. Similar to the SAER FF, it consumes more power due to internal node charging-discharging, particularly at low data activity. Moreover, all of these energy recovery FFs [8, 21] are highly vulnerable to particle-induced single event transients.

### 2.2 Overview of Existing SEU-Hardened Flip-Flops

SEU mitigation in FFs involves either hardware redundancy or circuit-level hardening-by-design (HBD) methodologies. Redundancy can be spatial or temporal. Triple modular redundancy (TMR), in which a module is replicated three times and the output is taken from a majority vote, is widely used to mitigate SETs/SEUs. Temporal redundancy, on the other hand, samples the data at several instants separated by more than the pulse width of the SET. The samples are stored in different FFs, and a majority voter restores the original data. This technique is very attractive, as it can detect and correct an SEU resulting from an SET on the data line as well as in the storage cell inside the FF. However, although both redundancy techniques improve the SER, they incur large area, power, and performance penalties. In contrast, HBD techniques employ SEU-immune storage cells instead of replicating the hardware [4, 15, 19, 7, 22, 32, 10, 13, 24, 11, 31]. HBD techniques are more attractive than redundancy techniques because of their significantly lower area, power, and delay penalties, but they cannot correct SEUs caused by data-line SETs.
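The voting logic behind both redundancy styles can be sketched in a few lines. This is a software illustration only, not a hardware design: the bitwise 2-out-of-3 majority stands in for the TMR voter, and the discrete-time data line, glitch position, and sampling offset `delta` are hypothetical values chosen so that the sampling interval exceeds the SET width.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote of three equal-width words."""
    return (a & b) | (b & c) | (a & c)

def temporal_sample(signal, t, delta):
    """Sample a discrete-time 0/1 signal at t, t + delta, t + 2*delta and vote.
    If delta exceeds the SET pulse width (in samples), a single transient
    can corrupt at most one of the three samples."""
    return majority(signal[t], signal[t + delta], signal[t + 2 * delta])

# Spatial TMR: the third replica suffers an upset in bit 2.
print(bin(majority(0b1010, 0b1010, 0b1110)))  # 0b1010

# Temporal redundancy: data line held at 1 with a 2-sample SET glitch at t = 4, 5.
data = [1] * 12
data[4] = data[5] = 0
print(temporal_sample(data, t=4, delta=3))  # 1
```

Note that both schemes need three storage elements (or three samples) plus a voter per bit, which is the area and power overhead that motivates the HBD approach.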

Traditional SEU-immune FFs are based on the static DICE cell, which has four storage nodes. Unlike a pair of cross-coupled inverters, a DICE stores copies of the data and its complement at two additional nodes. Each of the four nodes of a DICE cell is connected to one NMOS and one PMOS transistor, and when the logic value of any node is corrupted by an SET, the transistors connected to that node help restore the correct value. Traditionally, DICE-based FFs use a single DICE [19, 16] or two DICEs in a master-slave configuration [32]. An interesting DICE-based FF uses an explicit pulse generator to generate a voltage pulse (less than 50% duty cycle) that writes data into the storage cell [19]; we refer to this FF as pulsed-DICE (PDICE) [19]. The true single phase clock (TSPC) architecture is very attractive for designing a power-efficient and area-efficient clock tree. The TSPC-DICE FF [15] offers single-phase clocking and also limits negative bias temperature instability (NBTI) effects. However, the traditional TSPC architecture produces glitches at the output, generally at low data activity. Unlike TSPC-DICE FFs [15], PDICE FFs do not have any transistor sizing constraints inside the DICE cell, although both of these FFs consume higher power, particularly at low data rates. Most existing DICE-based FFs cannot prevent an SET at an internal node from propagating to the output (Q) or the next-level logic. A C-element-based SEU-hardened dual data rate FF [7], a TSPC-DICE FF, and a delay-filtered DICE FF [22] can efficiently serve this purpose; however, they suffer from significant $t_{C-Q}$ delay and area overhead. Another previously developed pulsed-gated DICE (PGDICE) FF uses extra transistors to improve tolerance to double-node upsets; however, it consumes significant extra area and power [28]. Similar to a master-slave D FF, a Quatro-based SEU-hardened FF (MSQuatro) has low delay; however, it suffers from large area overhead [20].
Therefore, SEU-hardened FFs with optimal power, area, and high performance are of tremendous interest in order to meet the overall power budget and reliability needs of microprocessors and SOCs.
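The self-restoring behavior of a DICE cell can be illustrated with a logic-level sketch. It assumes the commonly described DICE connection, in which each node's pull-up PMOS is gated by one neighboring node and its pull-down NMOS by the other neighbor, and it treats a node whose two transistors are both on (contention) or both off (floating) as keeping its value. The node labels are hypothetical and need not match Figure 3a or Figure 11, and the model ignores transistor sizing and timing, so it is an abstraction rather than a circuit simulation. It shows a single-node upset being restored and, anticipating the charge-sharing discussion in Section 5, a simultaneous upset of two same-potential nodes flipping the stored value.

```python
# Logic-level DICE sketch (hypothetical abstraction). Assumed connection:
# node X[i] has a pull-up PMOS gated by X[i-1] and a pull-down NMOS gated by
# X[i+1] (indices mod 4). Contention or floating leaves the node unchanged.

def next_value(state, i):
    """Value that node i is driven to, given the current node values."""
    pmos_on = state[(i - 1) % 4] == 0  # PMOS conducts when its gate is low
    nmos_on = state[(i + 1) % 4] == 1  # NMOS conducts when its gate is high
    if pmos_on and not nmos_on:
        return 1
    if nmos_on and not pmos_on:
        return 0
    return state[i]

def settle(state, forced=None, max_steps=10):
    """Apply synchronous updates until stable; nodes in `forced` stay pinned."""
    forced = forced or {}
    state = [forced.get(i, v) for i, v in enumerate(state)]
    for _ in range(max_steps):
        new = [forced.get(i, next_value(state, i)) for i in range(4)]
        if new == state:
            break
        state = new
    return state

stored = [1, 0, 1, 0]  # X0..X3 (hypothetical labeling)

# Single-node upset: hold X0 at 0 while the strike lasts, then release it.
print(settle(settle(stored, forced={0: 0})))        # [1, 0, 1, 0]: recovered

# Two same-potential nodes (X0 and X2) upset together, e.g. by charge sharing.
print(settle(settle(stored, forced={0: 0, 2: 0})))  # [0, 1, 0, 1]: flipped
```

The same mechanism explains why writing a DICE (or Quatro) cell requires driving two nodes at once: in this model, forcing any single node never changes the state once the node is released.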

## 3 Proposed Soft Error Robust FFs

### 3.1 Proposed Conditional-Pass Quatro Flip-Flop

Figure 2a shows the schematic of the proposed conditional-pass Quatro (CPQ) FF [12]. The CPQ FF has an input transfer unit, an SEU-immune Quatro latch, and an output stage. The input-stage delay element opens a small transparency window during the overlap of the Clk and Clkb signals, which allows the data and its complement to pass and be written to the storage cell. Similar to a TSPC-DICE FF [15], an equalizer transistor M7 helps the input stage write into the storage cell at the $0 \rightarrow 1$ transition of the input Clk signal. If the FF is storing a "1", the voltages at the Quatro cell internal nodes A, B, C, and D are "0", "1", "0", and "1". For a stored value of "0", the node voltages are "1", "0", "1", and "0". The output stage consists of a two-transistor, two-input inverter that prevents an SET from propagating to the output [14].

(a) The proposed energy recovery CPQ FF uses the overlap of the Clk and Clkb signals to create a transparency window at the rising edge of the Clk signal to write data into the Quatro cell.

(b) Simulation waveforms confirm a successful write at the Quatro cell on the rising edge of a sinusoidal Clk signal.

Fig. 2: The proposed CPQ FF and simulation results.

Figure 2b shows the simulation waveforms of the proposed CPQ FF. The input-stage delay element (three inverters) uses the Clk and Clkb signals to generate a narrow time window at the rising edge of the Clk signal that passes logic "1" or "0" data to the output. Irrespective of its value, the data signal must be stable before the falling edge of the Clkb signal. As a result, the FF triggers at the positive (rising) edge of the sinusoidal Clk signal and exhibits a negative setup time.

To write "1" into the Quatro cell (B, D) nodes, the current drive capability of M5 must be large enough to overpower M11 and M15. To write "0" into the Quatro cell (B, D) nodes, the current drive capability of (M6, M3, M4) must be large enough to overpower M10 and M14. M1 and M5 must be sized carefully to reduce charge sharing and ensure proper functionality of the FF. We identified a particular problem in the CPQ FF: the internal node "a" charges and discharges every clock cycle when the input data is low for several cycles. We tackle this issue by minimizing the width of M1, which increases the resistance between node "a" and $V_{DD}$. An alternative is to connect the input of M1 to ground instead of the Clk signal; in that case, however, the FF power consumption increases at high data switching activity due to short-circuit current. On the other hand, using a small-width (or long-channel) MOSFET for M1 increases the FF $t_{C-Q}$ delay and tightens the FF setup and hold time constraints. To write to a Quatro latch, the input transfer unit needs to access at least two nodes of the storage cell; data cannot be written into the latch by driving only one node to logic "0" or "1". The small equalizing NMOS (M7) helps the input transfer stage access two nodes simultaneously. Because a single driven node cannot change the latch state, the SEU immunity of the Quatro latch rests on the assumption that an SET affects only one node in a given strike. To reduce charge sharing, the two nodes storing the same potential ("A" and "C" or "B" and "D") are placed as far apart as possible in the layout. However, due to aggressive transistor scaling in nanoscale technologies, an SET can affect multiple nodes through charge sharing among adjacent nodes. Accordingly, in Section 5, we analyze the robustness of the proposed FFs to dual-node SETs.

The output stage of the CPQ FF is driven by the storage node A and the redundant node C, which hold the same logic value. As a result, a single SET strike at one of these nodes cannot propagate to the output (Q). If an inverter were used as the output stage, an SET striking its input node could easily propagate to the output; a C-element output stage prevents this efficiently [15]. However, a four-transistor C-element at the output increases power and area and degrades the performance of the FF. The proposed output stage masks the SET without any performance or power penalty. For example, if an SET changes the logic value at node C from $0 \rightarrow 1$, it only turns off MOSFET M16, and node C soon recovers owing to the soft error robust characteristics of the Quatro latch. The only problem arises when node C changes $1 \rightarrow 0$ (or A changes $0 \rightarrow 1$); in that case, M16 and M17 are both on for the duration of the SET. Our simulations suggest that, with general CMOS sizing and sufficient injected charge to mimic an SET, the Q node rises only to 175 mV (or drops only to 800 mV), which is not enough to turn on (off) an NMOS (PMOS) driving the next stage.
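The masking behavior of the two-transistor output stage can be summarized as a small truth table. The gate assignment below is inferred from the description above (a $0 \rightarrow 1$ change at C turns off M16, so M16 is taken as a PMOS gated by C, and M17 as an NMOS gated by A); this is an assumption rather than a netlist from the paper. The logic-level model can only flag the contention case; the partial swing of Q in that case (175 mV or 800 mV in the authors' simulations) requires circuit simulation.

```python
def output_stage(a, c, q_prev):
    """Logic-level view of the two-input output inverter.
    Assumption (inferred from the text): PMOS M16 is gated by node C and
    NMOS M17 by node A; in normal operation A and C hold the same value."""
    pull_up = (c == 0)    # PMOS conducts when its gate is low
    pull_down = (a == 1)  # NMOS conducts when its gate is high
    if pull_up and not pull_down:
        return 1
    if pull_down and not pull_up:
        return 0
    if not pull_up and not pull_down:
        return q_prev      # Q floats and keeps its previous value
    return "contention"    # both on: partial swing, limited by transistor sizing

print(output_stage(0, 0, q_prev=0))  # normal operation, A = C = 0: Q = 1
print(output_stage(0, 1, q_prev=1))  # SET flips C 0 -> 1: Q holds 1
print(output_stage(1, 0, q_prev=0))  # SET flips C 1 -> 0 while A = 1: contention
```

In every single-node upset case the output either holds its value or enters a brief contention, which is why no full-swing error reaches the next stage.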

### 3.2 Proposed $C^2$-DICE Energy Recovery Flip-Flop

The proposed CPQ FF requires three more transistors than an MSD FF. In addition, its internal node "a" charges and discharges every clock cycle when the input data is low for several cycles. As a result, the CPQ FF consumes high power at low data activity, as discussed with the results in Section 4. To reduce the transistor count and address the latter issue, we propose the $C^2$-DICE FF, shown in Figure 3a [12]. The FF has a clocked CMOS ($C^2$MOS) input transfer unit and uses a DICE latch as the storage cell. When Clk = "0", the first stage of the register unit is active. This stage acts as a C-element-type (clocked) inverter, passing the inverted data to node X, while the second stage is in hold mode. When Clk = "1", the first stage is "OFF" (M2 and M3 are "OFF") and the second stage of the register unit is "ON". The inverted data stored at node X propagates to the output node through the second stage, which acts as a C-element. The overall circuit operation resembles a master-slave D FF. The simulation waveforms of the proposed SEU-immune $C^2$-DICE FF confirm a successful write to the DICE cell on the rising edge of a sinusoidal Clk signal, as shown in Figure 3b.

Similar to a master-slave D FF, the $C^2$-DICE FF has a positive setup time. Unlike the CPQ FF, it triggers exactly at the positive (rising) edge of the sinusoidal Clk signal. The writing strategy of the proposed $C^2$-DICE FF is similar to that of the CPQ architecture presented in Section 3.1: a pass-gate NMOS transistor M9 works in conjunction with the input register stage to access two internal nodes of the DICE latch. To reduce the $t_{C-Q}$ delay of the FF, M5, M6, and M9 must be sized large enough to quickly overpower NMOS M15, and M7, M8, and M9 must be sized large enough to quickly overpower PMOS M14. In addition, we improve the $t_{C-Q}$ delay of the $C^2$-DICE FF by making M13 and M17 slightly larger than M11 and M15.

(a) The proposed energy recovery $C^2$-DICE FF uses the 0-0 and 1-1 overlaps of the Clk and Clkb signals to sample data and write it to the DICE latch, respectively.

(b) Simulation waveforms confirm a successful write at the DICE cell on the rising edge of a sinusoidal Clk signal.

Fig. 3: The proposed $C^2$-DICE FF and simulation results.

### 3.3 Proposed TSPC Energy Recovery Flip-Flop

Figure 4a shows the proposed true single phase clock energy recovery (TSPCER) soft error robust FF [12]. The FF has an improved TSPC register transfer unit, a single-node SEU-hardened Quatro cell, and a two-input output-stage inverter. When Clk = "0", the register stage is in the precharge phase: node "X" is precharged (or discharged) to the complement of the data, and node "Y" is precharged to "1". At the rising edge of Clk, the transfer unit enters the evaluation phase. If data = "1" at the beginning of the evaluation phase, node "X" = "0" and node "Y" = "1"; at the rising edge of the Clk signal, node "B" is pulled down to low. At the same time, M11 helps the transfer unit write "0" at node "D", as shown in Figure 4b. Consequently, nodes "A" and "C" store "1", and the output inverter, driven by nodes (B, D) = "0", produces a high output. On the other hand, if data = "0" at the beginning of the evaluation phase, node "X" = "1" and node "Y" = "1"; at the rising edge of the Clk signal, node "Y" is pulled down to low and node "B" is pulled up to "1". At the same time, M11 helps the transfer unit write "1" at node "D", as shown in Figure 4b. Consequently, nodes "A" and "C" store "0", and the output inverter, driven by nodes (B, D) = "1", produces a low output.

The problem with the conventional TSPC register is that, when data = "0" at a low data rate, glitches may appear at the register output node every clock cycle. To tackle this issue, we introduce an extra NMOS (M9) that isolates the register output stage and completely eliminates glitches from the output (Q), as shown in Figure 4b. To reliably write "1" into the Quatro cell, M7 and M11 must be sized large enough to overpower M19. To reliably write "0" into the Quatro cell, M9, M10, and M11 must be sized large enough to overpower M18.

## 4 Power-Performance and Area Analysis of the Proposed Flip-Flops

### 4.1 Simulation Setup

We have designed and laid out the proposed CPQ, $C^2$-DICE, and TSPCER FFs along with a traditional master-slave D (MSD) FF, a master-slave DICE (MS-DICE) FF without preset and clear [32], a PDICE FF [19], a TSPC-DICE FF [15], a PGDICE FF [28], an MSQuatro FF [20], and an SCCER FF [21] in a commercial 65nm CMOS technology. The performance of the FFs is extracted from post-layout simulations over a wide frequency band from 1 GHz to 5 GHz at a supply voltage of 1 V.

### 4.2 Flip-Flops Area

Table 1 shows the layout areas of these FFs. The proposed FFs require less area than the DICE-based FFs. Moreover, the proposed $C^2$-DICE FF consumes less area than the other proposed FFs; in particular, it consumes 42% less area than the PGDICE FF. Hence, the $C^2$-DICE FF is more suitable than the other proposed FFs for designs with a stringent area budget.

### 4.3 Flip-Flops Performance

(a) The proposed TSPC energy recovery FF adds one transistor, M9, to the traditional TSPC register to prevent unnecessary charging and discharging of node "B" and completely eliminate glitches from the output.

(b) Simulation waveforms confirm a successful write at the Quatro cell on the rising edge of a sinusoidal Clk signal.

Fig. 4: The proposed TSPC energy recovery FF and simulation results.

Fig. 5: The robustness of the proposed (a) CPQ, (b) $C^2$-DICE, and (c) TSPCER FFs is demonstrated through Monte-Carlo simulations considering process variations and mismatch.

The $t_{C-Q}$ delays of the FFs are measured under relaxed timing conditions, i.e., the data settles well before the clock edge arrives. Figure 5(a), Figure 5(b), and Figure 5(c) show the distributions of the $t_{C-Q}$ delays of the CPQ, $C^2$-DICE, and TSPCER FFs, respectively, over 2000 Monte-Carlo simulations considering process variation and mismatch at 27°C, with four minimum-sized inverters as the load at 5 GHz. We measure the setup time ($t_{su}$) and hold time ($t_h$) of the FFs using the conventional methodology. We define $t_{su}$ as the point where $t_{C-Q}$ is $1.2\times$ the nominal $t_{C-Q}$: in the simulation, we move the clock rising edge closer to the data transition edge until the C-Q delay reaches $1.2\,t_{C-Q}$. Similarly, we extract $t_h$ by moving the data edge closer to the clock edge from the opposite direction.
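The setup-time criterion described above amounts to a one-dimensional root search: find the data-to-clock skew at which $t_{C-Q}$ reaches $1.2\times$ its nominal value. The sketch below performs that search by bisection. The delay curve `tcq_model` and its parameters are purely hypothetical stand-ins for post-layout simulation results; in practice each evaluation would be a SPICE run. Sweeping the skew is equivalent to moving the clock edge toward the data edge.

```python
def tcq_model(skew_ps, tcq0=40.0, k=20.0, s0=-10.0):
    """Hypothetical clock-to-Q delay (ps) versus data-to-clock skew (ps).
    Positive skew means data settles before the clock edge; the delay grows
    without bound as the skew approaches s0. Illustrative values only."""
    return tcq0 + k / (skew_ps - s0)

def setup_time(tcq, nominal_tcq, lo, hi, factor=1.2, tol=1e-3):
    """Bisect for the skew at which tcq(skew) equals factor * nominal_tcq.
    Assumes tcq decreases with skew on [lo, hi] and the target is bracketed."""
    target = factor * nominal_tcq
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tcq(mid) > target:
            lo = mid  # delay still too large: data must arrive earlier
        else:
            hi = mid
    return 0.5 * (lo + hi)

t_su = setup_time(tcq_model, nominal_tcq=40.0, lo=-9.99, hi=100.0)
print(f"t_su = {t_su:.2f} ps")
```

With this toy curve the search converges to about -7.5 ps, i.e., a negative setup time, which is the kind of result reported below for the CPQ and TSPCER FFs. The hold time can be found the same way by sweeping the data edge from the other side.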

| Type of FF     | # of transistors | Layout area ($\mu m^2$) | D-Q delay (ps) |
|----------------|------------------|-------------------------|----------------|
| MSD            | 22               | 12.75                   | 48.7           |
| MSDICE [32]    | 36               | 23.09                   | 85.7           |
| PDICE [19]     | 32               | 18.83                   | 79.0           |
| TSPC-DICE [15] | 22               | 18.62                   | 81.9           |
| SCCER [21]     | 17               | 19.32                   | 50.6           |
| PGDICE [28]    | 42               |                         | 47.0           |
| MSQuatro [20]  | 46               | 31.88                   | 81.1           |
| CPQ            | 25               | 18.51                   | 34.1           |
| $C^2$-DICE     | 21               |                         | 88.8           |
| TSPCER         | 21               | 16.49                   | 38.6           |

Table 1: The proposed TSPCER consumes 23% less area and is 55% faster compared to the MSDICE FF.

The $t_{su}$ values of the CPQ, $C^2$-DICE, and TSPCER FFs are -1 ps, 25 ps, and -3.3 ps, respectively, and their $t_h$ values are 3 ps, 34 ps, and 38 ps, respectively. We compute the data-to-output delay ($t_{D-Q}$) as the sum of the nominal $t_{C-Q}$ and $t_{su}$. The maximum $t_{D-Q}$ delay of each FF (see Table 1) is obtained by averaging the 0-to-1 and 1-to-0 data transitions. The proposed CPQ FF has a lower $t_{D-Q}$ than the other proposed FFs; hence, it is more suitable than the other proposed FFs for designs with a stringent timing budget.
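Because $t_{D-Q} = t_{C-Q} + t_{su}$, the nominal clock-to-Q delays implied by the reported numbers can be recovered by subtraction. The sketch below uses only the Table 1 $t_{D-Q}$ values and the setup times quoted above; since the Table 1 delays are averages over both data transitions, the implied $t_{C-Q}$ values are averages as well.

```python
# Values reported in the text (ps): D-Q delay from Table 1, setup time from Section 4.3.
t_dq = {"CPQ": 34.1, "C2-DICE": 88.8, "TSPCER": 38.6}
t_su = {"CPQ": -1.0, "C2-DICE": 25.0, "TSPCER": -3.3}

# t_D-Q = t_C-Q + t_su  =>  t_C-Q = t_D-Q - t_su
implied_tcq = {ff: round(t_dq[ff] - t_su[ff], 1) for ff in t_dq}
print(implied_tcq)  # {'CPQ': 35.1, 'C2-DICE': 63.8, 'TSPCER': 41.9}
```

The negative setup times of the CPQ and TSPCER FFs make their $t_{D-Q}$ slightly smaller than their $t_{C-Q}$, whereas the positive setup time of the $C^2$-DICE FF adds to its delay.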

For a fixed amplitude, the slew rate of a sinusoidal clock signal scales with its frequency. As a result, the $t_{C-Q}$ delays of the sinusoidal-clock FFs differ at different frequencies. The variations of the $t_{C-Q}$ delays of the proposed FFs, along with a high-performance SCCER FF, are shown in Figure 6: the $t_{C-Q}$ delays of the proposed FFs decrease as the sinusoidal clock frequency increases.
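The frequency dependence of the clock edge can be quantified for an idealized clock. The sketch assumes a full-swing sinusoid $v(t) = \frac{V_{DD}}{2}\left(1 + \sin 2\pi f t\right)$ between 0 and $V_{DD}$; the actual amplitude and DC level of the resonant clock may differ. Its peak slope is $\pi f V_{DD}$, and the 10%-to-90% rise spans a phase of $2\arcsin(0.8)$.

```python
import math

def max_slew(f, vdd=1.0):
    """Peak slope (V/s) of v(t) = vdd/2 * (1 + sin(2*pi*f*t)), i.e. pi * f * vdd."""
    return math.pi * f * vdd

def rise_time_10_90(f):
    """Time (s) for the sinusoid to rise from 10% to 90% of vdd:
    sin(theta) goes from -0.8 to +0.8, a phase span of 2*asin(0.8)."""
    return 2.0 * math.asin(0.8) / (2.0 * math.pi * f)

for f in (1e9, 2e9, 5e9):
    print(f"{f / 1e9:.0f} GHz: peak slew = {max_slew(f) / 1e9:.2f} V/ns, "
          f"10-90% rise = {rise_time_10_90(f) * 1e12:.1f} ps")
```

Under this assumption the 10%-to-90% edge shrinks from roughly 295 ps at 1 GHz to roughly 59 ps at 5 GHz, consistent with the faster clock edge, and hence shorter $t_{C-Q}$, at higher frequencies.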

### 4.4 Flip-Flops Power Consumption

To measure the total power consumption of the FFs, we determine the input clock loading power ($P_{Clk}$), input data loading power ($P_{Data}$), and FF internal power ($P_{Int}$) at different levels of data activity. Figure 7 shows the test bench for this experiment. The total power ($P_T$) of each FF (listed in Table 2) is then calculated by adding the individual components, $P_T = P_{Clk} + P_{Data} + P_{Int}$. In this experiment, we consider a clock frequency of 5 GHz. Table 2 shows that the proposed FFs consume less power than the competing DICE-based FFs and the low-power SCCER FF. Moreover, at low data rates, the proposed $C^2$-DICE FF consumes power comparable to that of the traditional MSD FF, while the proposed TSPCER FF consumes less power than the MSD FF. At high data switching activity, the proposed CPQ and $C^2$-DICE FFs consume similar power; however, the power of the $C^2$-DICE FF falls much faster with decreasing data activity than that of the CPQ FF, as shown in Figure 8. At 12.5% data switching activity, the proposed $C^2$-DICE FF consumes 56% less power than the recently reported MSQuatro FF. At low data switching activity (12.5%-25%), the proposed TSPCER FF consumes less power than all of the competing FFs. At 100% data switching activity, the TSPCER FF consumes 31% and 24% less power than the TSPC-DICE and SCCER FFs, respectively. The proposed TSPCER FF consumes 71% to 88% less power than the recently reported PGDICE FF [28] at 100% to 12.5% data activity. At all data activities, the TSPCER FF consumes less power than the other proposed FFs, making it the most suitable of the three for low-power operation. Because of their large area and power overheads, we exclude the PGDICE and MSQuatro FFs from the rest of the analysis.



Fig. 6: The  $t_{C-Q}$  delays of the proposed SEU-immune energy-recovery FFs decrease with the increase of input sinusoidal clock signal frequencies.



Fig. 7: We considered FF internal power, input Clk driver power, and input Data driver power to compute the total power consumption of an FF.

Table 2: Total power consumption of the FFs at 5 GHz for different data activities (all power in $\mu W$). The proposed TSPCER FF consumes 71% to 88% less power than the PGDICE FF at 100% to 12.5% data activity.

| Data activity | MSD   | MS-DICE [32] | PDICE [19] | TSPC-DICE [15] | SCCER [21] | PGDICE [28] | MSQuatro [20] | CPQ  | $C^{2}$-DICE | TSPCER |
|------------------|-------|------------------|------------|--------------------|------------|------------------|--------------------|------|-------------------|--------|
| 100%             | 50.15 | 153.0            | 86.1       | 94.0               | 116.8      | 222.5            | 147.7              | 84.2 | 77.9              | 65.0   |
| 50%              | 34.30 | 84.3             | 67.6       | 66.8               | 71.4       | 189.8            | 99.2               | 56.3 | 47.0              | 38.1   |
| 25%              | 26.35 | 51.3             | 57.8       | 53.1               | 50.5       | 167.3            | 71.9               | 42.0 | 31.0              | 24.5   |
| 12.5%            | 22.40 | 38.6             | 45.2       | 41.1               | 37.0       | 142.1            | 52.1               | 36.5 | 22.9              | 17.7   |
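The percentage comparisons quoted in Section 4.4 follow directly from Table 2. The sketch below reproduces several of them from the tabulated totals, using the relative reduction $100\,(1 - P_{proposed}/P_{reference})$; only values from Table 2 are used.

```python
# Total power (uW) from Table 2, keyed by data activity.
power = {
    "100%":  {"MSQuatro": 147.7, "PGDICE": 222.5, "TSPC-DICE": 94.0, "C2-DICE": 77.9, "TSPCER": 65.0},
    "12.5%": {"MSQuatro": 52.1,  "PGDICE": 142.1, "TSPC-DICE": 41.1, "C2-DICE": 22.9, "TSPCER": 17.7},
}

def pct_lower(proposed, reference):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (1.0 - proposed / reference)

print(round(pct_lower(power["100%"]["TSPCER"], power["100%"]["PGDICE"])))       # 71
print(round(pct_lower(power["12.5%"]["TSPCER"], power["12.5%"]["PGDICE"])))     # 88
print(round(pct_lower(power["12.5%"]["C2-DICE"], power["12.5%"]["MSQuatro"])))  # 56
print(round(pct_lower(power["100%"]["TSPCER"], power["100%"]["TSPC-DICE"])))    # 31
```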

### 4.5 Flip-Flops Power-Delay Product

Figure 9 shows the product of power and $t_{D-Q}$ delay (PDP) of the FFs at 25% data activity. The PDP of the proposed CPQ FF is much lower than that of the existing energy recovery and DICE-based FFs, and is even comparable to that of the MSD FF at 25% data activity. At 100% data activity, the proposed CPQ FF exhibits 63% lower PDP than the TSPC-DICE FF and 34% lower PDP than the SCCER FF. At 25% data activity, the energy recovery $C^2$-DICE FF exhibits 47% lower PDP than the TSPC-DICE FF and 44% lower PDP than the PDICE FF. The proposed TSPCER FF exhibits lower PDP than all of the competing FFs at data switching activities of 50% and below. At a low data rate of 12.5%, the TSPCER FF exhibits 82% lower PDP than the competing TSPC-DICE FF and 57% lower PDP than the SCCER FF.

## 5 Soft Error Tolerance of the Proposed Flip-Flops

We use HSPICE simulation to verify the SEU robustness of the proposed FFs. For this experiment, we inject an exponential current pulse at a test node to mimic a radiation-induced SET, with the peak current varied from 35 $\mu A$ to 95 $\mu A$ depending on the node capacitance. The current pulse has rise and fall time constants of 5 ps and 50 ps, respectively. According to our analysis, all nodes (A, B, C, and D of Figure 2a and Figure 4a, or X0, X1, X3, and X4 of Figure 3a) are capable of recovering from 1-to-0 or 0-to-1 SETs. Figure 10(a) illustrates such recoveries of the TSPCER FF when particles strike node "C" and node "D" at two different time instants. In addition, the proposed two-input inverter output stage filters single-node SETs, keeping the output (Q) unaltered. Owing to its structure, the Quatro latch is able to recover from $0 \rightarrow 1$ SETs at nodes "A" and "D". However, a sufficiently large 0-to-1 SET at node "B" or "C" has the potential to corrupt the Quatro latch. Likewise, a 1-to-0 SET at node "A" or "D" has the potential to alter the data. However, the critical charge ($Q_{cri}$), i.e., the amount of charge required to corrupt the data in such an instance, is very large, implying that a very strong particle strike is required to cause the upset [16]. For example, the Quatro cell-based 10T SRAM cell exhibits 98% lower SER than the conventional 6T cell in accelerated neutron radiation tests on a 32-kb SRAM [16]. Similarly, we investigate the soft error immunity of the TSPCER FF for single-node SETs at two different time instants.
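The peak current alone does not fix the injected charge; the pulse shape matters. As an illustration, the sketch below assumes the widely used double-exponential SET model $I(t) = I_0\left(e^{-t/\tau_f} - e^{-t/\tau_r}\right)$ with the 5 ps and 50 ps time constants quoted above. This waveform is an assumption: the HSPICE exponential source used in the experiments has a different piecewise form, so the charge values are indicative only. The total charge is $I_0(\tau_f - \tau_r)$, and $I_0$ is scaled so that the pulse peaks at the specified current.

```python
import math

TAU_R = 5e-12   # rise time constant (s), from the text
TAU_F = 50e-12  # fall time constant (s), from the text

def peak_factor(tau_r=TAU_R, tau_f=TAU_F):
    """I_peak / I0 for the assumed pulse I(t) = I0 * (exp(-t/tau_f) - exp(-t/tau_r))."""
    t_peak = tau_r * tau_f / (tau_f - tau_r) * math.log(tau_f / tau_r)
    return math.exp(-t_peak / tau_f) - math.exp(-t_peak / tau_r)

def set_current(t, i_peak, tau_r=TAU_R, tau_f=TAU_F):
    """Injected current (A) at time t (s) for a pulse whose peak is i_peak (A)."""
    if t < 0:
        return 0.0
    i0 = i_peak / peak_factor(tau_r, tau_f)
    return i0 * (math.exp(-t / tau_f) - math.exp(-t / tau_r))

def injected_charge(i_peak, tau_r=TAU_R, tau_f=TAU_F):
    """Total charge (C): the integral of I(t) over t >= 0 equals I0 * (tau_f - tau_r)."""
    return i_peak / peak_factor(tau_r, tau_f) * (tau_f - tau_r)

for i_peak in (35e-6, 95e-6):  # peak-current range quoted in the text
    q_fc = injected_charge(i_peak) * 1e15
    print(f"I_peak = {i_peak * 1e6:.0f} uA -> Q = {q_fc:.2f} fC")
```

Under this assumed waveform, the 35 $\mu A$ to 95 $\mu A$ range corresponds to roughly 2.3 fC to 6.1 fC of injected charge, which is the quantity compared against $Q_{cri}$.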



Fig. 8: The proposed SEU-hardened energy recovery FFs save proportionally more power as data activity decreases.



Fig. 9: At 25% data rate, the SEU-immune $C^2$-DICE FF exhibits 47% lower PDP than the TSPC-DICE FF and 44% lower PDP than the PDICE FF.

Having confirmed the robustness of the proposed FFs against single-node SETs, we tested their robustness against multiple-node soft errors by simultaneously injecting exponential currents at the two storage nodes of the FFs. Figure 10(b) shows that the proposed FF is robust against dual-node SETs. In general, at ground level, the radius of charge sharing due to a particle strike in silicon is a few microns [2]. As a consequence, two neighboring nodes of an SEU-hardened (DICE or Quatro) latch can conceivably share the induced charge and corrupt the stored data. However, the critical nodes of a DICE cell can be identified as a function of its driving transistors and the circuit architecture [29]; the underlying observation is that DICE data can be corrupted by charge sharing between nodes at the same potential. We computed the $Q_{cri}$ of different nodes of the DICE and Quatro cells by simultaneously injecting exponential currents at multiple nodes to emulate charge sharing. Figure 11 shows the single-node SEU-hardened DICE and Quatro cells.
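The text does not state how the $Q_{cri}$ search was carried out. One common approach is bisection on the injected charge, sketched below under that assumption. The predicate `flips(q)` is hypothetical: in practice it would wrap one transient simulation with charge `q` injected simultaneously at the chosen node pair and report whether the stored data was corrupted.

```python
from typing import Callable


def find_qcrit(flips: Callable[[float], bool], q_lo: float, q_hi: float,
               tol: float = 0.01) -> float:
    """Bisect for the smallest injected charge (fC) that flips the latch.

    `flips` is a hypothetical stand-in for a transient simulation; it must
    return False at q_lo (latch recovers) and True at q_hi (latch upsets).
    """
    if flips(q_lo) or not flips(q_hi):
        raise ValueError("bracket must satisfy flips(q_lo) is False and flips(q_hi) is True")
    while q_hi - q_lo > tol:
        q_mid = 0.5 * (q_lo + q_hi)
        if flips(q_mid):
            q_hi = q_mid  # upset observed: threshold is at or below q_mid
        else:
            q_lo = q_mid  # latch recovered: threshold is above q_mid
    return q_hi       # smallest tested charge known to cause an upset


# Toy predicate with an assumed 3.15 fC threshold, standing in for a real simulation.
print(find_qcrit(lambda q: q >= 3.15, q_lo=0.0, q_hi=10.0))
```

Each bisection step halves the bracket, so resolving a 10 fC range to 0.01 fC takes about ten simulations per node pair and stored-data case.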

We list the $Q_{cri}$ for a variety of node combinations in Table 3. For charge sharing between two storage nodes at the same potential (e.g., "A" and "C" or "B" and "D" in a Quatro latch, and $X_0$ and $X_2$ or $X_1$ and $X_3$ in a DICE), the Quatro latch and the DICE latch have similar sensitivity to an SET. For two nodes storing opposite logic values (e.g., "C" and "D" or "B" and "C" in a Quatro latch, and $X_2$ and $X_3$ or $X_1$ and $X_2$ in a DICE) with stored data (0, 1), the Quatro cell has a higher critical charge than the DICE, which indicates a lower SER. However, for the opposite-potential pair ("B", "C") in the Quatro cell or ($X_1$, $X_2$) in the DICE with stored data (1, 0), the Quatro cell has a finite $Q_{cri}$ of 2.64 fC, whereas the DICE does not flip. This also implies that a Quatro cell can be written differentially, a characteristic that is absent in the DICE. As a result, for charge sharing between these two opposite nodes, the DICE cell is more attractive and robust than the Quatro cell. Technology scaling reduces both the node capacitance and the supply voltage, resulting in a lower $Q_{cri}$. However, in addition to a lower leakage current, the Quatro cell exhibits a higher read static noise margin than a DICE cell when used as a static random access memory (SRAM) storage cell at supply voltages below 0.45 V, making the Quatro cell more attractive in future technology nodes [16].

Fig. 10: The proposed TSPCER FF exhibits excellent soft error resiliency both (a) when a single node is affected by a strike and (b) when multiple nodes are affected by a single strike.

The proposed FFs use an equalizing transistor to write data at the two storage nodes of the Quatro/DICE cell. When Clk is high in Figure 2a(a), this transistor can reduce the $Q_{cri}$ of nodes "B" and "D"; according to our analysis, the $Q_{cri}$ drops from 5.71 fC to 5.42 fC.



Fig. 11: Traditional SEU-hardened flip-flops are based on (a) a DICE cell or (b) a Quatro cell.

| Stored data | SET-injected nodes, DICE [4] | SET-injected nodes, Quatro [16] | $Q_{cri}$ (fC), DICE [4] | $Q_{cri}$ (fC), Quatro [16] |
|-------------|------------------------------|---------------------------------|--------------------------|-----------------------------|
| (0, 0)      | $(X_0, X_2)$                 | (A, C)                          | (5.73, 5.73)             | (5.71, 5.71)                |
| (1, 1)      | $(X_0, X_2)$                 | (A, C)                          | (3.14, 3.14)             | (3.15, 3.15)                |
| (0, 0)      | $(X_1, X_3)$                 | (B, D)                          | (5.73, 5.73)             | (5.71, 5.71)                |
| (1, 1)      | $(X_1, X_3)$                 | (B, D)                          | (3.14, 3.14)             | (3.15, 3.15)                |
| (1, 0)      | $(X_2, X_3)$                 | (C, D)                          | No flip                  | No flip                     |
| (0, 1)      | $(X_2, X_3)$                 | (C, D)                          | (2.22, 2.22)             | (3.96, 3.96)                |
| (1, 0)      | $(X_1, X_2)$                 | (B, C)                          | No flip                  | (2.64, 2.64)                |
| (0, 1)      | $(X_1, X_2)$                 | (B, C)                          | (2.22, 2.22)             | (2.64, 2.64)                |

Table 3: The Quatro and DICE latches have similar critical charges for dual-node SETs.
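The trade-off read from Table 3 can be made explicit by tabulating two simple metrics per node pair: the lowest $Q_{cri}$ over the stored-data cases, and the number of stored-data cases that flip at all. The sketch below only re-reads the tabulated values; the two metrics are our illustrative choice, not figures reported in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (DICE node pair, Quatro node pair, stored data, DICE Q_cri, Quatro Q_cri) in fC,
# transcribed from Table 3; both injected nodes share the same value, and
# None means the injection did not flip the stored data.
Row = Tuple[str, str, str, Optional[float], Optional[float]]
TABLE3: List[Row] = [
    ("X0,X2", "A,C", "(0, 0)", 5.73, 5.71),
    ("X0,X2", "A,C", "(1, 1)", 3.14, 3.15),
    ("X1,X3", "B,D", "(0, 0)", 5.73, 5.71),
    ("X1,X3", "B,D", "(1, 1)", 3.14, 3.15),
    ("X2,X3", "C,D", "(1, 0)", None, None),
    ("X2,X3", "C,D", "(0, 1)", 2.22, 3.96),
    ("X1,X2", "B,C", "(1, 0)", None, 2.64),
    ("X1,X2", "B,C", "(0, 1)", 2.22, 2.64),
]


@dataclass
class PairSummary:
    dice_min: Optional[float] = None    # lowest DICE Q_cri over stored-data cases (fC)
    quatro_min: Optional[float] = None  # lowest Quatro Q_cri over stored-data cases (fC)
    dice_flips: int = 0                 # stored-data cases in which the DICE flips
    quatro_flips: int = 0               # stored-data cases in which the Quatro flips


def _lower(prev: Optional[float], q: float) -> float:
    return q if prev is None else min(prev, q)


def summarize(rows: List[Row]) -> Dict[Tuple[str, str], PairSummary]:
    """Group Table 3 rows by node pair and collect worst-case Q_cri and flip counts."""
    out: Dict[Tuple[str, str], PairSummary] = {}
    for dice_pair, quatro_pair, _data, q_dice, q_quatro in rows:
        s = out.setdefault((dice_pair, quatro_pair), PairSummary())
        if q_dice is not None:
            s.dice_flips += 1
            s.dice_min = _lower(s.dice_min, q_dice)
        if q_quatro is not None:
            s.quatro_flips += 1
            s.quatro_min = _lower(s.quatro_min, q_quatro)
    return out


for (dice_pair, quatro_pair), s in summarize(TABLE3).items():
    print(f"DICE ({dice_pair}) vs Quatro ({quatro_pair}): "
          f"min Q_cri {s.dice_min} vs {s.quatro_min} fC, "
          f"flips in {s.dice_flips} vs {s.quatro_flips} cases")
```

For the (B, C) / ($X_1$, $X_2$) pair, the Quatro cell has the higher worst-case $Q_{cri}$ but flips for both stored values, while the DICE flips for only one, which is the trade-off discussed above.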

## 6 Energy Recovery Clocking Scheme

To validate the power efficiency of the proposed energy recovery FFs, we implemented a six-level H-tree CDN in a $1mm \times 1mm$ chip area. We evenly distributed the FFs, grouped into registers of 16 FFs, and clocked them with a single-phase sinusoidal clock from the clock root. We constructed the CDN in the metal-5 layer, which has the lowest parasitic capacitance to the substrate in this technology. For these experiments, we used a common data input for all the energy recovery FFs. The number of wires, wire width, and wire length at each level of the H-tree CDN are listed in Table 4.
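The tapered sizing in Table 4 can be checked with a little arithmetic. The sketch below totals metal length and area per level using the Table 4 dimensions; it is an illustration, not part of the original design flow. With these dimensions, each pair of adjacent levels contributes the same metal area, which halves every two levels, and 64 leaf registers (Clk1 to Clk64 in Figure 12) of 16 FFs account for the 1024 FFs.

```python
from typing import List, Tuple

# Wire data from Table 4: (name, number of wires, width in um, length in um).
HTREE_LEVELS: List[Tuple[str, int, float, float]] = [
    ("L1", 1, 12.8, 560.0),
    ("L2", 2, 6.4, 560.0),
    ("L3", 4, 3.2, 280.0),
    ("L4", 8, 1.6, 280.0),
    ("L5", 16, 0.8, 140.0),
    ("L6", 32, 0.4, 140.0),
]

FFS_PER_REGISTER = 16
NUM_REGISTERS = 64  # leaf clocks Clk1 to Clk64


def wire_totals(levels: List[Tuple[str, int, float, float]]) -> Tuple[float, float]:
    """Return (total wire length in um, total metal area in um^2)."""
    total_len = sum(n * length for _, n, _, length in levels)
    total_area = sum(n * width * length for _, n, width, length in levels)
    return total_len, total_area


for name, n, width, length in HTREE_LEVELS:
    print(f"{name}: {n:2d} x {width:4.1f} um x {length:5.0f} um -> "
          f"{n * width * length:7.1f} um^2")

length_um, area_um2 = wire_totals(HTREE_LEVELS)
print(f"total length {length_um:.0f} um, total area {area_um2:.0f} um^2")
print(f"flip-flops served: {NUM_REGISTERS * FFS_PER_REGISTER}")
```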

We modeled the wire resistance and capacitance using the technology parameters. We used the traditional 3-segment $\pi$-type resistance-capacitance (RC) model for each wire of the H-tree and joined these models together to build a distributed RC model of the CDN [33]. Figure 1(b) shows the resonant energy recovery clocking scheme that drives the root of the clock tree (node "Clk"); each leaf node of the H-tree is connected to a 16-bit register. The energy recovery clock generator is a single-tone resonant clock generator [5, 21], whose oscillation frequency is given by:

$$f_c = \frac{1}{2\pi} \sqrt{\frac{1}{L \cdot \frac{C \times C_d}{C + C_d}}} \tag{1}$$

where $C$ is the total CDN capacitance, including the input gate capacitances of all FFs; $C_d$ is the input decoupling capacitance [8]; and $L$ is the lumped on-chip inductor.

| Wire name | # of wires | Width $(\mu m)$ | Length $(\mu m)$ |
|-----------|------------|-----------------|------------------|
| $L_1$     | 1          | 12.8            | 560              |
| $L_2$     | 2          | 6.4             | 560              |
| $L_3$     | 4          | 3.2             | 280              |
| $L_4$     | 8          | 1.6             | 280              |
| $L_5$     | 16         | 0.8             | 140              |
| $L_6$     | 32         | 0.4             | 140              |

Table 4: We used 6-level tapered wire sizing to reduce overall CDN power [26].

To sustain the oscillation, the clock generator needs to compensate for the resistive loss in the CDN. It does so by pulling the Clk node up to $V_{DD}$ through equal-sized PMOS transistors (M9-M16) when the Clk signal reaches its maximum, and pulling it down to ground through equal-sized NMOS transistors (M1-M8) when it reaches its minimum, as shown in Figure 1(b). In each case, the eight parallel transistors enable the system to oscillate at high frequency. The resonant clock driver is controlled by $180^{\circ}$ out-of-phase gate-control signals, generated from a single clock (Clk\_In) by a simple NAND-NOR driver, to eliminate short-circuit power. If the two reference signals overlap due to variation, the driver may draw short-circuit current; however, this short-circuit power is small compared to the total dynamic power of the clock network. The magnitude of the generated resonant Clk signal can be controlled by changing the pulse width of the reference signals (ref1 and ref2 of Figure 1(b)) and the sizing of the driver transistors (M1-M16). The simulation waveforms of the generated sinusoidal signal (Clk) and the final output clock signals for the registers (Clk1-to-Clk64) are shown in Figure 12. To achieve frequencies ranging from 1 GHz to 5 GHz, we varied the inductor value using Equation 1.
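Solving Equation 1 for $L$ gives $L = 1 / \left((2\pi f_c)^2 \cdot \frac{C\,C_d}{C + C_d}\right)$, so for fixed capacitances the required inductance scales as $1/f_c^2$: moving from 1 GHz to 5 GHz needs a 25 times smaller inductor. The sketch below illustrates this; the capacitance values are hypothetical placeholders, not the extracted CDN capacitance.

```python
import math


def series_cap(c: float, c_d: float) -> float:
    """Effective capacitance of C in series with C_d (F), as in Equation 1."""
    return c * c_d / (c + c_d)


def resonant_freq(ind: float, c: float, c_d: float) -> float:
    """Oscillation frequency (Hz) from Equation 1."""
    return 1.0 / (2.0 * math.pi * math.sqrt(ind * series_cap(c, c_d)))


def inductor_for(f_c: float, c: float, c_d: float) -> float:
    """Inductance (H) that places the resonance at f_c (Equation 1 solved for L)."""
    return 1.0 / ((2.0 * math.pi * f_c) ** 2 * series_cap(c, c_d))


# Hypothetical capacitances: 20 pF CDN load and a 10x larger decoupling capacitor.
C_CDN, C_DECAP = 20e-12, 200e-12
for f in (1e9, 2.5e9, 5e9):
    ind = inductor_for(f, C_CDN, C_DECAP)
    print(f"{f / 1e9:.1f} GHz -> L = {ind * 1e12:.1f} pH")
```

Note that when $C_d \gg C$, the series combination approaches $C$, so the decoupling capacitor mainly blocks DC while barely shifting the resonance.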

The resonant energy recovery clock can be employed either at the global level with traditional local square-wave clock buffers [5] or at both the global and local levels without clock tree buffers [21]. To improve power efficiency, we designed our FFs for the latter scheme. For a fair comparison with square-wave clocking, we distributed three types of square-wave FFs in the same clock network. Figure 1(a) shows the buffered square-wave clocking scheme. Since our proposed FFs are SEU-immune, we chose two existing DICE-based FFs (PDICE and TSPC-DICE), together with the conventional MSD FF, as references. We used a synthesized buffered square-wave clock tree with a worst-case slew of 10% of the clock period, built with 84 inverters.

We performed HSPICE simulations of the whole system, including clock-tree parasitics, buffers, and FFs, for clock frequencies from 1 GHz to 5 GHz at different data switching activities. The results are shown in Figure 13, which plots the total system power against data switching activity and clock frequency for the systems with different FFs. According to this analysis, the proposed FF systems consume less power than the competing SEU-hardened (PDICE and TSPC-DICE) FF systems at all data switching activities and frequencies. At 5 GHz and 2.5 GHz, the TSPC-DICE system has the highest power consumption at high data switching activity, whereas the PDICE system has the highest power consumption at low data switching activity. At 1 GHz, the TSPC-DICE system has the highest power consumption at high data switching activity, and at very low data switching activity its power consumption is comparable to that of the PDICE system. Across all frequencies and data switching activities, the proposed TSPCER FF system exhibits the lowest power consumption of all the FF systems.

Fig. 12: Simulation waveforms confirm the required phase and amplitude of the generated sinusoidal signal (Clk) and the final output clock signals for the registers (Clk1-to-Clk64).
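The clock-tree parasitics in these simulations come from the 3-segment $\pi$-type RC model of each H-tree wire described earlier. The sketch below shows one wire modeled that way and its Elmore delay to the far end. It is our illustration, not the authors' netlist: the technology-specific wire resistance and capacitance are not given here, so the values are hypothetical. A useful sanity check is that, for a uniform wire, the $\pi$ ladder's Elmore delay equals $RC/2 + R\,C_L$ regardless of the number of segments.

```python
from typing import List


def pi_ladder_caps(c_wire: float, n_seg: int = 3) -> List[float]:
    """Node capacitances (F) of an n-segment pi model of one wire.

    Each segment carries c_wire/(2*n_seg) at each end; the halves of adjacent
    segments merge at internal nodes. Index 0 is the driven end, index n_seg
    the far end.
    """
    c_half = c_wire / (2 * n_seg)
    return [c_half] + [2 * c_half] * (n_seg - 1) + [c_half]


def elmore_delay(r_wire: float, c_wire: float, c_load: float, n_seg: int = 3) -> float:
    """Elmore delay (s) from an ideal source at the near end to the far end."""
    r_seg = r_wire / n_seg  # series resistance of each pi segment
    caps = pi_ladder_caps(c_wire, n_seg)
    caps[-1] += c_load      # load (e.g., a register's clock input) at the far end
    # On a simple ladder, node k shares k * r_seg of resistance with the
    # source-to-far-end path, so it contributes k * r_seg * C_k.
    return sum(k * r_seg * c for k, c in enumerate(caps))


# Hypothetical wire: 50 ohm total, 100 fF total, driving a 20 fF load.
print(f"{elmore_delay(50.0, 100e-15, 20e-15) * 1e12:.2f} ps")
```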

Table 5 shows the total power consumption breakdown of the systems with different FFs at different data rates and at 5 GHz, 2.5 GHz, and 1 GHz clock frequencies. The total power is split into two components: CDN power, including clock generator power, and FF power. Compared to the TSPC-DICE system at 5 GHz and 50% data switching activity, the proposed $C^2$-DICE system consumes 47% less power and the proposed TSPCER system consumes 56% less power. In addition, at 25% and 12.5% data switching activity, the proposed $C^2$-DICE system consumes, respectively, 57% and 60% less power than the PDICE system at 5 GHz (see Figure 14). At 2.5 GHz and 50% data switching activity, the $C^2$-DICE system and the TSPCER system show 42% and 59% power savings, respectively, compared to the TSPC-DICE system. At the same frequency, the $C^2$-DICE system consumes 53% and 61% less power than the PDICE system at 25% and 12.5% data switching activity, respectively. At 1 GHz and 12.5% data switching activity, the proposed $C^2$-DICE system consumes as much as 42% less power than the TSPC-DICE system, and the TSPCER system consumes 67% less power. In addition, the proposed resonant CDN saves up to 93% of the CDN power compared to the traditional buffered square-wave CDN.
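The savings quoted above are relative reductions in total power, $1 - P/P_{ref}$, taken from Table 5. The sketch below transcribes the total-power columns of Table 5 and recomputes the quoted figures. Because the table entries are rounded, a recomputed percentage can differ from a quoted one by about one percentage point.

```python
from typing import Dict, Tuple

# Total power (mW) from Table 5: {frequency in GHz: {FF type: (50%, 25%, 12.5% activity)}}.
TOTAL_POWER_MW: Dict[float, Dict[str, Tuple[float, float, float]]] = {
    5.0: {"MSD": (77.2, 69.1, 65.0), "PDICE": (111.3, 101.3, 88.4),
          "TSPC-DICE": (113.8, 99.8, 87.5), "CPQ": (69.6, 55.0, 49.4),
          "C2-DICE": (60.3, 43.9, 35.6), "TSPCER": (50.0, 36.1, 29.1)},
    2.5: {"MSD": (40.5, 36.7, 34.6), "PDICE": (57.6, 52.5, 49.9),
          "TSPC-DICE": (59.3, 52.2, 47.2), "CPQ": (41.9, 31.8, 27.3),
          "C2-DICE": (34.0, 24.7, 19.7), "TSPCER": (24.6, 16.8, 12.9)},
    1.0: {"MSD": (16.3, 14.8, 14.0), "PDICE": (23.3, 21.3, 20.3),
          "TSPC-DICE": (24.2, 22.6, 20.3), "CPQ": (21.2, 16.3, 13.8),
          "C2-DICE": (18.4, 13.6, 11.8), "TSPCER": (12.8, 8.6, 6.5)},
}
ACTIVITY_INDEX = {"50%": 0, "25%": 1, "12.5%": 2}


def saving_percent(freq_ghz: float, ff: str, ref: str, activity: str) -> float:
    """Total-power saving of `ff` relative to `ref`, in percent."""
    idx = ACTIVITY_INDEX[activity]
    p = TOTAL_POWER_MW[freq_ghz][ff][idx]
    p_ref = TOTAL_POWER_MW[freq_ghz][ref][idx]
    return 100.0 * (1.0 - p / p_ref)


for freq, ff, ref, act in [(5.0, "C2-DICE", "TSPC-DICE", "50%"),
                           (5.0, "TSPCER", "TSPC-DICE", "50%"),
                           (5.0, "C2-DICE", "PDICE", "25%"),
                           (5.0, "TSPCER", "PDICE", "12.5%")]:
    print(f"{freq} GHz, {act}: {ff} vs {ref} -> "
          f"{saving_percent(freq, ff, ref, act):.1f}% lower total power")
```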

## 7 Conclusion

We have introduced three ultra-low-power, soft-error-robust flip-flops based on Quatro and DICE latches that can operate with an energy recovery sinusoidal clock; the FFs can also operate with a traditional square-wave clock. We have also introduced a two-input inverter output stage that prevents single-node SETs from propagating to the output. The proposed CPQ FF exhibits up to 63% lower PDP than the existing soft-error-robust PDICE FF and 34% lower PDP than the existing energy recovery SCCER FF. At 25% data rate, the proposed $C^2$-DICE FF exhibits 47% lower PDP than the TSPC-DICE FF and 44% lower PDP than the PDICE FF. The proposed TSPCER FF shows strong results in terms of power, delay, and area: it exhibits as much as 82% lower PDP than the TSPC-DICE FF and 57% lower PDP than the energy recovery SCCER FF. In addition, at a 5 GHz clock frequency and when integrated with an energy recovery CDN, the proposed TSPCER system consumes 67% and 55% less power than the PDICE- and MSD FF-based systems, respectively. The results demonstrate the feasibility and effectiveness of the high-frequency resonant clocking scheme.

Fig. 13: Among all frequencies and data switching activities, the proposed TSPCER FF system exhibits the lowest power consumption of all the FF systems.

Fig. 14: At a high clock frequency of 5 GHz, the TSPCER system shows 67% power saving compared to the PDICE system.

Table 5: The proposed TSPCER systems save up to 67%, 74%, and 68% power compared to the PDICE systems at 5 GHz, 2.5 GHz, and 1 GHz clock frequency, respectively.

| Freq. (GHz) | FF type     | CDN power (mW) | FF power, 50% activity (mW) | Total power, 50% activity (mW) | FF power, 25% activity (mW) | Total power, 25% activity (mW) | FF power, 12.5% activity (mW) | Total power, 12.5% activity (mW) |
|-------------|-------------|----------------|-----------------------------|--------------------------------|-----------------------------|--------------------------------|-------------------------------|----------------------------------|
| 5           | MSD         | 42.1           | 35.10                       | 77.2                           | 27.0                        | 69.1                           | 22.9                          | 65.0                             |
|             | PDICE       | 42.1           | 69.2                        | 111.3                          | 59.2                        | 101.3                          | 46.3                          | 88.4                             |
|             | TSPC-DICE   | 45.4           | 68.4                        | 113.8                          | 54.4                        | 99.8                           | 42.1                          | 87.5                             |
|             | CPQ         | 11.95          | 57.7                        | 69.6                           | 43.0                        | 55.0                           | 37.4                          | 49.4                             |
|             | $C^2$-DICE  | 12.2           | 48.1                        | 60.3                           | 31.7                        | 43.9                           | 23.4                          | 35.6                             |
|             | TSPCER      | 11.0           | 39.0                        | 50.0                           | 25.1                        | 36.1                           | 18.1                          | 29.1                             |
| 2.5         | MSD         | 22.75          | 17.7                        | 40.5                           | 13.9                        | 36.7                           | 11.8                          | 34.6                             |
|             | PDICE       | 22.75          | 34.8                        | 57.6                           | 29.7                        | 52.5                           | 27.1                          | 49.9                             |
|             | TSPC-DICE   | 24.6           | 34.7                        | 59.3                           | 27.6                        | 52.2                           | 22.6                          | 47.2                             |
|             | CPQ         | 3.1            | 38.8                        | 41.9                           | 28.7                        | 31.8                           | 24.2                          | 27.3                             |
|             | $C^2$-DICE  | 3.0            | 31.0                        | 34.0                           | 21.7                        | 24.7                           | 16.7                          | 19.7                             |
|             | TSPCER      | 2.80           | 21.8                        | 24.6                           | 14.0                        | 16.8                           | 10.1                          | 12.9                             |
| 1           | MSD         | 9.2            | 7.1                         | 16.3                           | 5.6                         | 14.8                           | 4.8                           | 14.0                             |
|             | PDICE       | 9.2            | 14.1                        | 23.3                           | 12.1                        | 21.3                           | 11.1                          | 20.3                             |
|             | TSPC-DICE   | 9.9            | 14.3                        | 24.2                           | 12.7                        | 22.6                           | 10.4                          | 20.3                             |
|             | CPQ         | 1.0            | 20.2                        | 21.2                           | 15.3                        | 16.3                           | 12.8                          | 13.8                             |
|             | $C^2$-DICE  | 0.7            | 17.7                        | 18.4                           | 12.9                        | 13.6                           | 11.1                          | 11.8                             |
|             | TSPCER      | 0.7            | 12.1                        | 12.8                           | 7.9                         | 8.6                            | 5.8                           | 6.5                              |

# References

1. Baumann R (2005) Soft errors in advanced computer systems. IEEE Design & Test of Computers 22(3):258–266
2. Baumann RC (2005) Radiation-induced soft errors in advanced semiconductor technologies. IEEE Transactions on Device and Materials Reliability 5(3):305–316
3. Bezzam I, Mathiazhagan C, Raja T, Krishnan S (2015) An energy-recovering reconfigurable series resonant clocking scheme for wide frequency operation. IEEE Transactions on Circuits and Systems 62(7):1766–1775
4. Calin T, Nicolaidis M, Velazco R (1996) Upset hardened memory design for submicron CMOS technology. IEEE Transactions on Nuclear Science 43(6):2874–2878
5. Chan SC, Restle PJ, Bucelot TJ, Liberty JS, Weitzel S, Keaty JM, Flachs B, Volant R, Kapusta P, Zimmerman JS (2009) A resonant global clock distribution for the cell broadband engine processor. IEEE Journal of Solid-State Circuits 44(1):64–72
6. Chen RM, Diggins ZJ, Mahatme NN, Wang L, Zhang EX, Chen YP, Zhang H, Liu YN, Narasimham B, Witulski AF, Bhuva BL, Fleetwood DM (2017) Effects of temperature and supply voltage on SEU- and SET-induced errors in bulk 40-nm sequential circuits. IEEE Transactions on Nuclear Science 64(8):2122–2128
7. Devarapalli SV, Zarkesh-Ha P, Suddarth SC (2010) SEU-hardened dual data rate flip-flop using C-elements. In: Proc. International Symposium on Defect and Fault Tolerance in VLSI Systems, pp 167–171
8. Esmaeili SE, Al-Khalili AJ, Cowan GER (2010) Dual-edge triggered sense amplifier flip-flop for resonant clock distribution networks. IET Computers & Digital Techniques 4(6):499–514
9. Fuketa H, Nomura M, Takamiya M, Sakurai T (2014) Intermittent resonant clocking enabling power reduction at any clock frequency for near/sub-threshold logic circuits. IEEE Journal of Solid-State Circuits 49(2):536–544
10. Glorieux M, Clerc S, Gasiot G, Autran JL, Roche P (2013) New D-flip-flop design in 65 nm CMOS for improved SEU and low power overhead at system level. IEEE Transactions on Nuclear Science 60(6):4381–4386
11. Hifumi M, Maruoka H, Umehara S, Yamada K, Furuta J, Kobayashi K (2017) Influence of layout structures to soft errors caused by higher-energy particles on 28/65 nm FDSOI flip-flops. In: Proc. International Reliability Physics Symposium, pp SE-5.1–SE-5.4
12. Islam R (2011) High-speed energy-efficient soft error tolerant flip-flops. MASc Thesis, Concordia University, Montreal, Canada
13. Islam R (2012) A highly reliable SEU hardened latch and high performance SEU hardened flip-flop. In: Proc. International Symposium on Quality Electronic Design, pp 347–352
14. Islam R, Esmaeili SE, Islam T (2011) A high performance clock precharge SEU hardened flip-flop, pp 574–577
15. Jahinuzzaman SM, Islam R (2010) TSPC-DICE: A single phase clock high performance SEU hardened flip-flop. In: Proc. International Midwest Symposium on Circuits and Systems, pp 73–76
16. Jahinuzzaman SM, Rennie DJ, Sachdev M (2009) A soft error tolerant 10T SRAM bit-cell with differential read capability. IEEE Transactions on Nuclear Science 56(6):3768–3773
17. JEDEC (2006) JESD89A: Measurement and reporting of alpha particle and terrestrial cosmic ray-induced. http://www.jedec.org
18. Jiang H, Zhang H, Kauppila JS, Massengill LW, Bhuva BL (2018) An empirical model for predicting SE cross section for combinational logic circuits in advanced technologies. IEEE Transactions on Nuclear Science 65(1):304–310
19. Krueger D, Francom E, Langsdorf J (2008) Circuit design for voltage scaling and SER immunity on a quad-core Itanium processor. In: Proc. International Solid-State Circuits Conference, pp 94–95
20. Li YQ, Wang HB, Liu R, Chen L, Nofal I, Shi ST, He AL, Guo G, Baeg SH, Wen SJ, Wong R, Chen M, Wu Q (2017) A Quatro-based 65-nm flip-flop circuit for soft-error resilience. IEEE Transactions on Nuclear Science 64(6):1554–1561
21. Mahmoodi H, Tirumalashetty V, Cooke M, Roy K (2009) Ultra low-power clocking scheme using energy recovery and clock gating. IEEE Transactions on Very Large Scale Integration Systems 17(1):33–44
22. Naseer R, Draper J (2006) DF-DICE: a scalable solution for soft error tolerant circuit design. In: Proc. International Symposium on Circuits and Systems, 4 pp.
23. Nsengiyumva P, Ball DR, Kauppila JS, Tam N, McCurdy M, Holman WT, Alles ML, Bhuva BL, Massengill LW (2016) A comparison of the SEU response of planar and FinFET D flip-flops at advanced technology nodes. IEEE Transactions on Nuclear Science 63(1):266–272
24. Omana M, Rossi D, Metra C (2007) Latch susceptibility to transient faults and new hardening approach. IEEE Transactions on Computers 56(9):1255–1268
25. Rennie D, Li D, Sachdev M, Bhuva BL, Jagannathan S, Wen S, Wong R (2012) Performance, metastability, and soft-error robustness trade-offs for flip-flops in 40 nm CMOS. IEEE Transactions on Circuits and Systems 59(8):1626–1634
26. Rosenfeld J, Friedman EG (2007) Design methodology for global resonant H-tree clock distribution networks. IEEE Transactions on Very Large Scale Integration Systems 15(2):135–148
27. Sathe VS, Arekapudi S, Ishii A, Ouyang C, Papaefthymiou MC, Naffziger S (2013) Resonant-clock design for a power-efficient, high-volume x86-64 microprocessor. IEEE Journal of Solid-State Circuits 48(1):140–149
28. Shah JS, Sachdev M (2016) Radiation hardened pulsed-latches in 65-nm CMOS. In: Proc. Canadian Conference on Electrical and Computer Engineering, pp 1–4
29. Sheshadri VB, Bhuva BL, Reed RA, Weller RA, Mendenhall MH, Schrimpf RD, Warren KM, Sierawski BD, Wen SJ, Wong R (2010) Effects of multinode charge collection in flip-flop designs at advanced technology nodes. In: Proc. IEEE International Reliability Physics Symposium, pp 1026–1030
30. Tirumalashetty V, Mahmoodi H (2007) Clock gating and negative edge triggering for energy recovery clock. In: Proc. International Symposium on Circuits and Systems, pp 1141–1144
31. Wang HB, Kauppila JS, Lilja K, Bounasser M, Chen L, Newton M, Li YQ, Liu R, Bhuva BL, Wen SJ, Wong R, Fung R, Baeg S, Massengill LW (2017) Evaluation of SEU performance of 28-nm FDSOI flip-flop designs. IEEE Transactions on Nuclear Science 64(1):367–373
32. Wang W, Gong H (2004) Edge triggered pulse latch design with delayed latching edge for radiation hardened application. IEEE Transactions on Nuclear Science 51(6):3626–3630
33. Weste NHE, Harris DM (2004) CMOS VLSI design: a circuits and systems perspective, 3rd edn. Pearson Addison-Wesley