# Low-Power Design of Sequential Circuits Using a Quasi-Synchronous Derived Clock<sup>\*</sup>

Xunwei Wu, Jian Wei

Institute of Circuits and Systems Ningbo University Ningbo, Zhejiang 315211, CHINA Tel: +86-574-760-5785 Fax: +86-574-760-4591 email: {xunweiwu, weijian}@mail.hz.zj.cn

Abstract – This paper presents a novel circuit design technique to reduce the power dissipation in sequential circuits by generating a quasi-synchronous derived clock from the master clock and using it to isolate the flip flops in the circuit from the unwanted triggering action of the master clock. An example design of a decimal counter demonstrates the large power saving and improved performance of the resulting circuit.

### I. INTRODUCTION

In the past, the major concerns of the VLSI designer were area, performance, and cost; power consumption considerations were mostly of secondary concern. In recent years, however, this trend has begun to change and, increasingly, power consumption is being given comparable weight to area and speed in VLSI design [1]. One reason is that the continuing increase in the chip scale integration and the operating frequency has made power consumption a major design issue in VLSI circuits. The excessive power dissipation in integrated circuits not only discourages their use in a portable device, but also causes overheating, which degrades performance and reduces the circuit lifetime. All of these factors drive designers to devote significant resources to reduce the circuit power dissipation. Indeed, the Semiconductor Industry Association has identified low-power design as a critical technological direction [2].

In CMOS circuits, the dominant term of power dissipation is that which is required to charge or discharge the capacitors in the circuit. The power dissipation of a node in the circuit is expressed by the following equation:

$$P = 0.5 \cdot C_L \cdot V_{DD}^2 \cdot f_{CLK} \cdot E_{SW}$$

where  $C_L$  is the physical capacitance at the node,  $V_{DD}$  is the supply voltage,  $f_{CLK}$  is the clock frequency,  $E_{SW}$  (referred to as the average switching activity) is the average number of output transitions per clock cycle  $1/f_{CLK}$ .
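As a rough numerical illustration of this equation, the following sketch evaluates the power of a single node. The capacitance, supply voltage, clock frequency, and the data-node activity are illustrative assumptions and are not taken from the paper; only the formula itself and the clock activity of two transitions per cycle come from the text.

```python
def dynamic_power(c_load, v_dd, f_clk, e_sw):
    """Average dynamic power of one node: P = 0.5 * C_L * V_DD^2 * f_CLK * E_SW."""
    return 0.5 * c_load * v_dd ** 2 * f_clk * e_sw


# Illustrative values only (assumptions, not measurements from the paper).
C_L = 50e-15    # node capacitance: 50 fF
V_DD = 3.3      # supply voltage in volts
F_CLK = 100e6   # clock frequency: 100 MHz

# A clock net makes two transitions per cycle; the data-node activity is assumed.
p_clock = dynamic_power(C_L, V_DD, F_CLK, e_sw=2.0)
p_data = dynamic_power(C_L, V_DD, F_CLK, e_sw=0.2)

print(f"clock node: {p_clock * 1e6:.3f} uW")
print(f"data node:  {p_data * 1e6:.3f} uW")
```

With equal capacitance, the power ratio between two nodes is simply the ratio of their switching activities, which is why the always-switching clock net dominates.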

The sequential circuit elements in a CMOS circuit are considered major contributors to the power dissipation since one input of each sequential element is the clock, which is the only signal that switches all the time. In addition, the clock signal tends to be highly loaded. To distribute the clock and control the clock skew, one must construct a clock network (often a clock tree) with clock buffers. All of this adds to the total node capacitance of the clock net, which also happens to have the largest activity (two transitions per cycle) in a synchronous circuit (ignoring possible hazard activity on some signal lines). Recent studies indicate that the clock signals in digital computers consume a large percentage (15% - 45%) of the system power. Thus, reducing the clock power dissipation can greatly reduce the power dissipation in digital VLSI circuits.

Massoud Pedram, Qing Wu

Department of Electrical Engineering-Systems University of Southern California Los Angeles, CA 90089, USA Tel: +1-213-740-4458 Fax: +1-213-740-7290 email: {massoud, qingwu}@zugros.usc.edu

Most efforts for clock power reduction have focused on issues such as voltage swing reduction, buffer insertion and clock routing [3]. In many cases switching of the clock causes a lot of unnecessary gate activity. For that reason, reducing or suppressing the unwanted switching of the clock becomes an important way to reduce the power dissipation of sequential circuits. This goal can be achieved by two means.

The first method is to eliminate the wasted power dissipation caused by the clock's switching in the non-triggering direction. Commonly available flip-flops are Single-Edge Triggered (SET). For example, a flip-flop that is sensitive only to the clock's falling edge wastes the power dissipated on the clock's rising edge. For this reason, Double-Edge Triggered (DET) flip-flops have been developed, which switch at both the falling and the rising edges of the clock. Consequently, the clock frequency can be reduced by half while keeping the same data rate, resulting in 50% power savings in the flip-flops [4,5].

The second method is to block the clock to the flip-flops during their holding states so as to reduce the power dissipation. In this case the clock received by the flip-flop should not be the chip master clock. This means that other clocks must be derived from the master clock which, based on certain conditions, can be slowed down or stopped completely with respect to the master clock. Obviously, this scheme results in power savings due to the following factors:

i) Load on the master clock as well as the number of required buffers in the clock tree is decreased. Therefore, the power dissipation of the clock tree is reduced.

ii) The flip-flop receiving the derived clock is not triggered in idle cycles, hence, the corresponding dynamic power dissipation is saved.

iii) The excitation function of the flip-flop triggered by a derived clock may be simplified since it has a don't-care condition in the cycle when the flip-flop is not triggered by the derived clock.

<sup>\*</sup> This work was supported in part by NNSF in China under contract #69773034 and NSF in USA under contract # MIP-9628999.

Based on the above discussion, this paper describes how to generate a secondary clock that is derived from the clock tree and meets all design requirements, such as being glitch-free and having no additional skew. Next, we show how to use a quasi-synchronous derived clock for designing sequential circuits that have lower power dissipation and simpler combinational logic. Circuit simulation is used to check the quality of the derived clock and its ability to reduce the power dissipation of sequential circuits.

In conventional design, we are interested in the next state of a flip-flop, hence, D flip-flops are the natural choice. In low power design, we are more concerned with whether or not the next state changes, hence, T flip-flops become the preferred choice.<sup>1</sup> The clock power dissipation occurs during both T=0 and T=1. It is however desirable to eliminate the clock power dissipation during T=0. This can be accomplished by using the excitation function for input T to control the clock. This idea is very effective in reducing the clock power and, as we will show during the design of a decimal counter, the use of T flip-flops (instead of D flip-flops) leads to simpler combinational logic, and hence further power reduction.

### II. GATING THE CLOCK BY USING A NOT-TRIGGER SIGNAL

If there are flip-flops whose outputs remain unchanged when a sequential circuit goes from one state to the next, we can produce a *not-trigger* signal  $\overline{T}$  from the original state to cut off the path from the master clock to these flip-flops. As a result, these flip-flops are not subjected to the clock signal and their power dissipation is accordingly reduced.

Without loss of generality, assume that the flip-flops in the sequential circuit are sensitive to the clock's falling edge. Fig.1(a) shows  $\overline{T}$  directly controlling an OR gate to cut off the master clock *clk*, so that the derived clock is  $clk' = \overline{T} + clk$ . Fig.1(d) shows the timing relationship of  $\overline{T}$ , *clk* and  $clk'$ . This scheme, however, fails for the following reasons:

i) Suppose that  $\overline{T}$  is produced in cycle  $S_1$  and disappears in cycle  $S_3$ , that is, the output of the flip-flop should remain unchanged during the  $S_1 \rightarrow S_2$  and  $S_2 \rightarrow S_3$  transitions. The  $clk'$  waveform in Fig.1(d), however, shows that the falling transition of  $\overline{T}$  in cycle  $S_3$  causes an unwanted falling transition of  $clk'$ , which should have been avoided since  $\overline{T} = 1$  when  $S_2 \rightarrow S_3$ . Besides,  $clk'$  has a long delay of  $t_f + t_d + t_g$  with respect to the master clock, where  $t_f$  is the delay time of the flip-flop,  $t_d$  is the delay time of the combinational circuit that generates the signal  $\overline{T}$ , and  $t_g$  is the delay time of the OR gate in Fig.1(a).

ii) If the combinational circuit for generating  $\overline{T}$  has race hazards, these hazards may propagate to the clock input of the flip-flop through the OR gate in Fig.1(a).
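The first failure mode can be reproduced with a small discrete-time model. The sketch below is only an idealized illustration: the time base, the lag `DELAY` standing in for  $t_f + t_d$ , and the five-state sequence are assumptions, and the gate delay  $t_g$  and hazards are not modeled.

```python
PERIOD, HALF = 10, 5                 # time steps per cycle / half cycle (assumed)
DELAY = 2                            # assumed t_f + t_d, in time steps
NOT_T_PER_STATE = [0, 1, 1, 0, 0]    # T-bar decided by states S0..S4 (1 = hold)


def clk(t):
    """Master clock: low in the first half of a cycle, high in the second,
    so its falling edges (the state transitions) are at t = 10, 20, 30, ..."""
    return 1 if t % PERIOD >= HALF else 0


def not_t(t):
    """T-bar follows the current state DELAY steps after each falling clock edge."""
    return NOT_T_PER_STATE[max(t - DELAY, 0) // PERIOD]


def falling_edges(wave):
    """Time steps at which a sampled waveform goes from 1 to 0."""
    return [t for t in range(1, len(wave)) if wave[t - 1] == 1 and wave[t] == 0]


t_end = PERIOD * len(NOT_T_PER_STATE)
clk_or = [not_t(t) | clk(t) for t in range(t_end)]   # clk' = T-bar + clk

# Master falling edges that should reach the flip-flop:
# those that leave a state in which T-bar = 0.
wanted = [PERIOD * (s + 1)
          for s, hold in enumerate(NOT_T_PER_STATE[:-1]) if not hold]
seen = falling_edges(clk_or)
spurious = [t for t in seen if t not in wanted]

print("wanted falling edges:", wanted)
print("falling edges on clk':", seen)
print("unwanted edges:", spurious)
```

In this model the only unwanted falling edge appears `DELAY` steps after the master edge that enters  $S_3$ , which is the late transition described in item i).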

In [6], the authors propose a scheme whereby a latch is used to filter out hazards and synchronize  $\overline{T}$  with the master clock, as shown in Fig.1(b). Because the latch is in the storage state during clk = 0, an incidental glitch of  $\overline{T}$  is filtered out and the latched  $\overline{T}$  is able to cut off the master clock during the state transitions  $S_1 \rightarrow S_2$  and  $S_2 \rightarrow S_3$ . The derived clock clk'' is obtained as shown in Fig.1(d). However, this scheme has the following shortcomings:

i) The added latch increases the circuit complexity.

ii) Since *clk* is connected to the newly added latch, the extra power dissipation of *clk* offsets some of the power saving due to the non-triggering of the flip-flop.

iii) There is still the  $t_g$  delay between the derived clock  $clk''$  and the master clock clk, which results in clock skew. As a result, the sequential circuit is not safely synchronized.

In fact, we can cut off the transmission of the clock by using the trigger signal  $T$  to control an AND gate, as shown in Fig.1(c). The derived clock  $clk'''$  is shown in Fig.1(d). This scheme has the obvious advantage of omitting the appended latch, but it does not solve the clock skew problem. A simple idea is to appropriately delay *clk* and then use the delayed signal as the master clock. In this way the delayed master clock will be quasi-synchronous with respect to  $clk'''$ , which means that the clock skew between  $clk'''$  and *clk* can be made very small by appropriate sizing of the transistors in the NOR gates. This leads to another idea, in which the presence of both *clk* and  $\overline{clk}$  in the clock tree is exploited. In particular, if we rewrite  $clk''' = T \cdot clk$  as  $clk''' = \overline{\overline{T} + \overline{clk}}$ , where  $\overline{clk}$  is taken from the previous stage of the clock tree, then we can design a quasi-synchronous derived clock based on the NOR gate, as shown in Fig.2.

So long as we control the delay of the inverter to make it nearly the same as that of the NOR gate in Fig.2, the skew between clk and clk''' will be very small, and the derived clock clk''' will thus be quasi-synchronous with respect to clk.
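The NOR-based gating can be checked in the same idealized way. The sketch below first verifies the identity  $T \cdot clk = \overline{\overline{T} + \overline{clk}}$  by truth table and then confirms that, in a discrete-time model, every edge of the derived clock coincides with an edge of the master clock. The time base, the lag `DELAY` of  $T$  behind the clock edge, and the state sequence are assumptions; gate delays (and therefore real skew) are not modeled.

```python
PERIOD, HALF, DELAY = 10, 5, 2     # assumed time base and lag of T, in time steps
T_PER_STATE = [1, 0, 0, 1, 1]      # trigger signal T decided by states S0..S4


def clk(t):
    """Master clock with falling edges at t = 10, 20, 30, ..."""
    return 1 if t % PERIOD >= HALF else 0


def trig(t):
    """T settles DELAY steps after each falling clock edge, while clk is low."""
    return T_PER_STATE[max(t - DELAY, 0) // PERIOD]


def nor(a, b):
    return 0 if (a or b) else 1


def edges(wave):
    """Time steps at which a sampled waveform changes value."""
    return [t for t in range(1, len(wave)) if wave[t - 1] != wave[t]]


# De Morgan: T AND clk equals NOR(T-bar, clk-bar) for every input combination.
identity_holds = all(nor(1 - t, 1 - c) == (t & c) for t in (0, 1) for c in (0, 1))

t_end = PERIOD * len(T_PER_STATE)
master = [clk(t) for t in range(t_end)]
derived = [nor(1 - trig(t), 1 - clk(t)) for t in range(t_end)]

master_edges = edges(master)
derived_edges = edges(derived)
print("identity holds:", identity_holds)
print("derived clock edges:", derived_edges)
```

Because  $T$  changes only while the clock is low, the gate output never moves except together with the master clock, so no late or spurious edge appears in this model.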

In Fig.2, we also present the circuit to produce the quasi-synchronous derived clock based on a NAND gate which is controlled by T. This design is suitable for the flip-flops which trigger on the rising clock edge.

<sup>1</sup> Notice that the switching of the output of a T flip-flop is controlled by the excitation input T.



Fig.1 Not-trigger signal controls the master clock

(a) Cutoff the master clock through an OR gate,

(b) Cutoff the master clock through a latch,

(c) Cutoff the master clock through an AND gate,

(d) The logic waveforms

### III. DESIGN OF SEQUENTIAL CIRCUITS USING QUASI-SYNCHRONOUS DERIVED CLOCKS

The *D* flip-flop is widely used in the design of CMOS sequential circuits. However, when the not-trigger signal  $\overline{T}$  is used to gate the master clock's triggering action on the flip-flops, the *T* flip-flop function is a better choice. The next-state equation of a *T* flip-flop is  $Q_+ = T \oplus Q$ ; therefore we set  $T = Q \oplus Q_+$ , which indicates that the flip-flop switches state when T = 1 and holds state when T = 0. This equation matches the method of using the not-trigger signal  $\overline{T}$  to gate the master clock described in the previous section. So we only need to use the *T* function from the design of a T flip-flop to gate the master clock.

Taking a decimal counter as an example, the next state of the counter is shown in Table I. If D flip-flops are used, as is typically the case, we obtain the Karnaugh maps for the excitation functions  $D_3$ ,  $D_2$ ,  $D_1$  and  $D_0$  from the next states in Table I, as shown in Fig.3(a). In these maps, an empty box represents a don't-care condition. The optimized excitation functions are:

$$\begin{split} D_3 &= Q_2 Q_1 Q_0 + Q_3 \overline{Q_0} ,\\ D_2 &= Q_2 \oplus (Q_1 Q_0) ,\\ D_1 &= \overline{Q_3}\, \overline{Q_1} Q_0 + Q_1 \overline{Q_0} ,\\ D_0 &= \overline{Q_0} . \end{split}$$



Fig.2 Quasi-synchronous clocks derived from the master clock.

The corresponding circuit realization is shown in Fig.3(b) which is a traditional synchronous design for a decimal counter.
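The D excitation functions can be cross-checked against Table I with a few lines of code. This is only a sanity check, not part of the original design flow; the helper names are hypothetical, and the second product term of  $D_3$  is taken as  $Q_3 \overline{Q_0}$ , which is the form that Table I requires for the transitions out of states 8 and 9.

```python
def next_state(n):
    """Decimal counter: 0 -> 1 -> ... -> 9 -> 0, as in Table I."""
    return (n + 1) % 10


def bits(n):
    """Return (Q3, Q2, Q1, Q0) of a 4-bit value."""
    return (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1


def d_excitation(q3, q2, q1, q0):
    """Optimized D inputs; each D must equal the next value of its flip-flop."""
    d3 = (q2 & q1 & q0) | (q3 & (1 - q0))
    d2 = q2 ^ (q1 & q0)
    d1 = ((1 - q3) & (1 - q1) & q0) | (q1 & (1 - q0))
    d0 = 1 - q0
    return d3, d2, d1, d0


# States 10..15 are don't-care conditions and are not checked.
d_mismatches = [n for n in range(10)
                if d_excitation(*bits(n)) != bits(next_state(n))]
print("states where D differs from Table I:", d_mismatches)
```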

Now we use *T* flip-flops in the design instead. Since  $T = Q \oplus Q_{+}$ , from the next-state Karnaugh maps in Fig.3(a), the Karnaugh maps of the excitation functions  $T_3$ ,  $T_2$ ,  $T_1$  and  $T_0$  are obtained as shown in Fig.4(a). The optimized excitation functions are:

$$\begin{split} T_3 &= Q_3 Q_0 + Q_2 Q_1 Q_0 = (Q_3 + Q_2 Q_1) Q_0 ,\\ T_2 &= Q_1 Q_0 ,\\ T_1 &= \overline{Q_3} Q_0 ,\\ T_0 &= 1 . \end{split}$$

At first glance, the above excitation functions are simpler than those of the D flip-flops. However, if we construct a T flip-flop from a D flip-flop, an extra XOR gate has to be attached to produce  $D = T \oplus Q$ . For this reason, T flip-flops are usually not adopted in conventional sequential designs.
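The T excitation functions can be derived mechanically from Table I as  $T = Q \oplus Q_+$  and compared with the optimized expressions. The sketch below does this for the ten valid states and also confirms that  $D = T \oplus Q$  reproduces the next state; the helper names are hypothetical.

```python
def bits(n):
    """Return (Q3, Q2, Q1, Q0) of a 4-bit value."""
    return (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1


def t_required(n):
    """T = Q xor Q+ for each flip-flop, read off the decimal counting sequence."""
    return tuple(q ^ q_next for q, q_next in zip(bits(n), bits((n + 1) % 10)))


def t_excitation(q3, q2, q1, q0):
    """Optimized T functions (T3, T2, T1, T0)."""
    t3 = (q3 | (q2 & q1)) & q0
    t2 = q1 & q0
    t1 = (1 - q3) & q0
    t0 = 1
    return t3, t2, t1, t0


t_mismatches = [n for n in range(10) if t_excitation(*bits(n)) != t_required(n)]

# A T flip-flop built from a D flip-flop needs D = T xor Q.
d_from_t_ok = all(
    tuple(t ^ q for t, q in zip(t_excitation(*bits(n)), bits(n))) == bits((n + 1) % 10)
    for n in range(10)
)

print("states where T differs from Table I:", t_mismatches)
print("D = T xor Q reproduces the next state:", d_from_t_ok)
```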

TABLE I STATE TABLE OF A DECIMAL COUNTER

| $Q_3$ | $Q_2$ | $Q_1$ | $Q_0$ | $Q_3^+$ | $Q_2^+$ | $Q_1^+$ | $Q_0^+$ |
|-------|-------|-------|----------------------------|---------|---------|---------|---------------------------------------------------|
| 0     | 0     | 0     | 0                          | 0       | 0       | 0       | 1                                                 |
| 0     | 0     | 0     | 1                          | 0       | 0       | 1       | 0                                                 |
| 0     | 0     | 1     | 0                          | 0       | 0       | 1       | 1                                                 |
| 0     | 0     | 1     | 1                          | 0       | 1       | 0       | 0                                                 |
| 0     | 1     | 0     | 0                          | 0       | 1       | 0       | 1                                                 |
| 0     | 1     | 0     | 1                          | 0       | 1       | 1       | 0                                                 |
| 0     | 1     | 1     | 0                          | 0       | 1       | 1       | 1                                                 |
| 0     | 1     | 1     | 1                          | 1       | 0       | 0       | 0                                                 |
| 1     | 0     | 0     | 0                          | 1       | 0       | 0       | 1                                                 |
| 1     | 0     | 0     | 1                          | 0       | 0       | 0       | 0                                                 |



Fig. 3 Synchronous design of a decimal counter.

(a) Karnaugh maps of  $D_3$ ,  $D_2$ ,  $D_1$  and  $D_0$ ,

(b) circuit realization

Assuming that the D flip-flops are sensitive to the falling edge of the clock signal, we can adopt the method of producing the derived clock based on the NOR gate in Fig.2. The clock signals for the four flip-flops are:

$$\begin{split} clk_3 &= T_3 \cdot clk = \overline{\overline{T_3} + \overline{clk}} = \overline{\overline{Q_3 + Q_2Q_1} + \overline{Q_0} + \overline{clk}} \;, \\ clk_2 &= T_2 \cdot clk = \overline{\overline{T_2} + \overline{clk}} = \overline{\overline{Q_1} + \overline{Q_0} + \overline{clk}} \;, \\ clk_1 &= T_1 \cdot clk = \overline{\overline{T_1} + \overline{clk}} = \overline{Q_3 + \overline{Q_0} + \overline{clk}} \;, \\ clk_0 &= T_0 \cdot clk = \overline{\overline{T_0} + \overline{clk}} = \overline{\overline{clk}} \;. \end{split}$$
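The equivalence between the AND forms  $T_i \cdot clk$  and the NOR forms of the four clock signals can be confirmed exhaustively. The sketch below is a plain Boolean check over all input combinations; it says nothing about timing, and the function names are hypothetical.

```python
from itertools import product


def inv(x):
    return 1 - x


def nor(*inputs):
    return 0 if any(inputs) else 1


def nor_form(q3, q2, q1, q0, clk):
    """Gated clocks written with NOR gates fed by the inverted clock."""
    nclk = inv(clk)
    clk3 = nor(nor(q3, q2 & q1), inv(q0), nclk)   # inner term is NOT(Q3 + Q2*Q1)
    clk2 = nor(inv(q1), inv(q0), nclk)
    clk1 = nor(q3, inv(q0), nclk)
    clk0 = inv(nclk)                              # T0 = 1: a plain inverter
    return clk3, clk2, clk1, clk0


def and_form(q3, q2, q1, q0, clk):
    """Gated clocks written as T_i AND clk."""
    t3 = (q3 | (q2 & q1)) & q0
    t2 = q1 & q0
    t1 = inv(q3) & q0
    t0 = 1
    return tuple(t & clk for t in (t3, t2, t1, t0))


clock_mismatches = [v for v in product((0, 1), repeat=5)
                    if nor_form(*v) != and_form(*v)]
print("input combinations where the two forms differ:", clock_mismatches)
```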

From the clock functions, we construct the circuit realization shown in Fig.4(b). Notice that in this circuit, to derive  $clk_3$ , the *n* part of the CMOS configuration realizing  $\overline{Q_3 + Q_2Q_1}$  is composed of two series nMOS transistors connected in parallel with a third nMOS transistor, and the *p* part is composed of three pMOS transistors in the dual configuration [7]. The complexity of the circuit realizing  $\overline{Q_3 + Q_2Q_1}$  is therefore equal to that of a 3-input NOR gate. Consequently, the circuit in Fig.4(b) is clearly simpler than its counterpart in Fig.3(b).
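The complex gate described above can be modeled at switch level to confirm that the two transistor networks are complementary. In this sketch an nMOS is assumed to conduct when its gate input is 1 and a pMOS when its gate input is 0; the function names are hypothetical and no electrical behavior is modeled.

```python
from itertools import product


def pull_down(q3, q2, q1):
    """nMOS network: (Q2 in series with Q1) in parallel with Q3."""
    return bool(q3 or (q2 and q1))


def pull_up(q3, q2, q1):
    """Dual pMOS network: Q3 in series with (Q2 in parallel with Q1)."""
    return bool((not q3) and ((not q2) or (not q1)))


truth_table = []
for q3, q2, q1 in product((0, 1), repeat=3):
    down, up = pull_down(q3, q2, q1), pull_up(q3, q2, q1)
    # Exactly one network conducts: no floating output and no short circuit.
    assert down != up
    truth_table.append(((q3, q2, q1), 1 if up else 0))

TRANSISTORS_COMPLEX_GATE = 3 + 3   # three nMOS plus three pMOS
TRANSISTORS_NOR3 = 3 + 3           # a static CMOS 3-input NOR uses the same count

for inputs, out in truth_table:
    print(inputs, "->", out)
```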

The simple construction of the combinational circuit results in low power dissipation. However, the low power dissipation is mostly achieved by gating the clock. Except for flip-flop  $Q_0$ , which is triggered in every cycle, the three flip-flops  $Q_3$ ,  $Q_2$  and  $Q_1$  have no dynamic power dissipation when their clocks are not triggered.
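A behavioral sketch of the gated-clock counter is given below. It assumes that each flip-flop is wired to toggle (for a D flip-flop, $D = \overline{Q}$) whenever its own gated clock delivers a falling edge, which is an assumption about Fig.4(b) rather than something stated explicitly in the text; timing, skew, and power are not modeled.

```python
def t_signals(state):
    """Trigger signals (T3, T2, T1, T0) computed from the current state."""
    q3, q2, q1, q0 = state
    return (q3 | (q2 & q1)) & q0, q1 & q0, (1 - q3) & q0, 1


def step(state):
    """One master clock cycle: a flip-flop toggles only if its gated clock pulses."""
    return tuple(q ^ t for q, t in zip(state, t_signals(state)))


def value(state):
    q3, q2, q1, q0 = state
    return 8 * q3 + 4 * q2 + 2 * q1 + q0


state = (0, 0, 0, 0)
sequence = []
for _ in range(12):
    sequence.append(value(state))
    state = step(state)

print(sequence)
```

Starting from state 0, the model steps through the decimal counting sequence and wraps from 9 back to 0, even though three of the four flip-flops receive no clock edge in most cycles.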

**Fig. 4** Quasi-synchronous design of a decimal counter. (a) Karnaugh maps of  $T_3$ ,  $T_2$ ,  $T_1$  and  $T_0$ ,

(b) The circuit realization

We simulated the new design in Fig.4(b) by PSPICE with 0.5 µm CMOS technology and the following MOS parameters:

*nMOS*: level=3 phi=0.700000 tox=9.6000e-09 xj=0.200000u tpg=1 vto=0.6566 delta=6.9100e-01 ld=4.7290e-08 kp=1.9647e-04 uo=546.2 theta=2.6840e-01

*pMOS*: level=3 tox=9.6000e-09 phi=0.700000 tpg=-1 xj=0.200000u vto=-0.9213 delta=2.8750e-01 ld=3.5070e-08 kp=4.8740e-05 uo=135.5 theta=1.8070e-01 rsh=1.1000e-01 nsub=8.5120e+16 gamma=0.4673 nfs=6.5000e+11 vmax=2.5420e+05 eta=2.4500e-02 kappa=7.9580e+00 cgdo=2.3922e-10 cgso=2.3922e-10 cgbo=3.7579e-10 cj=9.35e-04 mj=0.468 cjsw=2.89e-10 mjsw=0.505 pb=0.99.

The transient analysis shown in Fig.5 confirms that the new design has the expected logic operation. The average clk-to-Q delays of the four flip-flops are listed in Table II, which shows that the flip-flops in Fig.4(b) switch nearly synchronously. For comparison, the delays of the synchronous design in Fig.3(b) are also given in the table. As we can see, the average delays of the new design are no larger than those of the synchronous design.

 TABLE II AVERAGE DELAY OF FLIP-FLOPS (ns)

|                    | $Q_0$ | $Q_1$ | $Q_2$ | $Q_{3}$ |
|--------------------|-------|-------|-------|---------|
| Design in Fig.4(b) | 0.252 | 0.259 | 0.261 | 0.276   |
| Design in Fig.3(b) | 0.463 | 0.453 | 0.403 | 0.402   |



Fig. 5 Transient analysis.





We also measured the power dissipation of the synchronous design in Fig.3(b) and the quasi-synchronous design in Fig.4(b). The energy dissipation diagrams in Fig.6 show that the latter design reduces the power dissipation by 51%. This is expected since the  $clk_1$ ,  $clk_2$  and  $clk_3$  waveforms in Fig.5 show that in a decimal counting cycle flip-flops  $Q_1$ ,  $Q_2$  and  $Q_3$  are triggered only 4, 2 and 2 times, respectively (once for each change of the corresponding output in Table I). Furthermore, these flip-flops have no dynamic power dissipation when not triggered, and the simpler combinational circuit in Fig.4(b) also dissipates less power.
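The number of clock pulses that each flip-flop receives in one counting cycle follows directly from Table I, since a gated flip-flop is triggered exactly when its output changes. The sketch below counts these events; it is only a rough proxy for clock activity at the flip-flop inputs and is not expected to reproduce the simulated power figure, because buffers, gating logic, and node capacitances are ignored.

```python
def bits(n):
    """Return (Q3, Q2, Q1, Q0) of a 4-bit value."""
    return (n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1


CYCLE = 10   # states 0..9 of the decimal counter

triggers = {"Q3": 0, "Q2": 0, "Q1": 0, "Q0": 0}
for n in range(CYCLE):
    now, nxt = bits(n), bits((n + 1) % CYCLE)
    for name, q, q_next in zip(triggers, now, nxt):
        triggers[name] += q ^ q_next      # one gated clock pulse per output change

gated_pulses = sum(triggers.values())
ungated_pulses = 4 * CYCLE                # every flip-flop clocked in every cycle
fraction = gated_pulses / ungated_pulses

print(triggers)
print(f"clock pulses reaching flip-flops: {gated_pulses} of {ungated_pulses}")
```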

### IV. CONCLUSION

We presented a procedure for gating the clock by means of the "not-trigger" signal. The derived clock is quasi-synchronous with respect to the master clock and can be used to isolate the triggered flip-flops from the master clock in their idle cycles. The achieved power saving is significant as shown by the example design of a decimal counter. The circuit simulation proved the quality of the new derived clock and its ability to reduce power dissipation. In this paper we only provided a few examples to illustrate the basic ideas. Our purpose is to introduce the design method and indicate that the engineering issues related to the use of gated clocks could be resolved for practical applications, opening the path for adoption of the clock-gating technique in the design of low power sequential circuits.

### REFERENCES

- [1]. J. Rabaey and M. Pedram, *Low Power Design Methodologies*, Kluwer Academic Publishers, Norwell, 1996.
- [2]. The National Technology Roadmap for Semiconductors, Technology needs, Semiconductor Industry Association, pp.17-18, the 1997 Edition.
- [3]. G. Friedman, "Clock distribution design in VLSI circuits: an overview," *Proc. IEEE ISCAS*, San Jose, pp. 1475-1478, 1994.
- [4]. R. Hossain, L. D. Wronski and A. Albicki, "Low power design using double edge triggered flip-flops," *IEEE Trans. VLSI Systems*, vol.2, no.2, pp. 261-265, June 1994.
- [5]. M. Pedram, Q. Wu and X. Wu, "A new design of double edge triggered flip-flops", *Proc. ASP-DAC*, Yokohama, pp. 417-421, Feb. 1998.
- [6]. L. Benini and G. De Micheli, "Transformation and synthesis of FSMs for low power gated clock implementation", *Proc. Int. Symp. Workshop on Low Power Design*, pp.21-26, Apr. 1995.
- [7]. N. H. E. Weste, K. Eshraghian, *Principles of CMOS VLSI Design: A Systems Perspective*, 2<sup>nd</sup> Edition, Addison-Wesley Publishing Co., New York, 1993.