## Hybrid interconnect network for on-chip low-power clock distribution

## Q. Ding<sup>✉</sup> and T. Mak

The clock is regarded as the heartbeat of modern synchronous digital integrated circuits. However, as CMOS technology shrinks, delivering a high-quality global clock signal with low propagation delay becomes critical, and conventional metallic interconnect is approaching a bottleneck: a clock distribution network (CDN) might consume up to 50% of the overall power. To address these problems, this Letter proposes a novel combination of wireless and conventional metallic interconnect to improve the performance of on-chip clock distribution. By incorporating integrated wireless clock transceivers and an efficient modulation technique, overall performance is increased significantly, with a total delay reduction of 66.8% compared with a reference tapered H-tree model, from 400 to 130 ps. In addition, clock uncertainties become predictable from the placement of the transceivers: a clock skew of <33 ps at a 2.5 GHz input with highly unbalanced loads is found within the proposed CDN, which indicates promising potential for future high-performance on-chip clock distribution.

*Introduction:* As CMOS technology keeps shrinking, transistor gate delay improves significantly with shorter channel length. However, interconnect dimensions scale together with transistor size, which leads to decreasing bandwidth and more interference. As interconnect data acquired from the predictive technology model [1] show, for global interconnect using a typical top metal layer, wiring resistance per unit length has almost doubled from 180 to 65 nm CMOS. Consequently, delivering high-speed clock signals over the entire die area becomes increasingly costly and difficult. To alleviate the wiring impact, other interconnect techniques are emerging, such as RF-based interconnect and optical interconnect. Among these, wireless interconnect is considered one of the most promising solutions to the above problems. The inherent fan-out of electromagnetic (EM) wave propagation significantly enhances broadcast performance and hence could yield less delay at a higher data transmission rate. Taking advantage of wireless interconnect, and under the assumption of a propagation speed near the speed of light, this work proposes a hybrid clock distribution network (CDN) architecture utilising wireless interconnect with efficient on-off keying (OOK) in a 65 GHz frequency band. It achieves delay and power reductions of 66.8 and 36.8%, respectively, which shows its superiority in terms of both low delay and low power, and hence its potential for future global clock distribution.

*Proposed CDN architecture:* Among conventional solutions, the H-tree is one of the most widely adopted CDNs for global clock distribution. However, it is almost impractical to allocate fully balanced loads, which produces different times of arrival on different clock routes. Several studies have proposed interconnect delay prediction models based on the first moment assumption [2, 3]. Given a metallic interconnect, the predicted 50% point delay of this conductor could be given by

$$D(k) = \lim_{n \to +\infty} \left[ C_{\text{load}} \sum_{i=1}^{n} R_{rki} + \frac{n(n+1)R_{rki}C_{rki}}{2} \right] = \ln(2) \cdot R_{rk}L(k) \left( C_{\text{load}} + \frac{C_{rk}L(k)}{2} \right), \tag{1}$$

where n is the number of distributed sections of a wire; $L(k)$ is the total interconnect length of the kth branch; $C_{\text{load}}$ is the load capacitance seen at any branching point; $R_{rki}$ and $C_{rki}$ are the *i*th distributed sub-circuit resistance and capacitance at the kth branch, respectively, and $R_{rk}$, $C_{rk}$ are the total resistance and capacitance for the kth branch. Also, the total interconnect delay could be given by

$$D_{\rm sum} = \sum_{i=1}^{2^{k+1}-1} D(i) + d_{\rm buf}, \tag{2}$$

where $d_{\text{buf}}$ is the circuit delay of the clock buffers. It is clear that, with an increasing RC product and expanding interconnect length, the total clock propagation delay would eventually increase pseudo-exponentially, and hence raise the risk of timing violations. In contrast, the proposed wireless global CDN shown in Fig. 1b could jump over congested wires, thus reducing the overall signal propagation delay.
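The limit in (1) can be checked numerically. The following is a minimal sketch, assuming that $R_{rk}$ and $C_{rk}$ act as per-unit-length values (so that $R_{rk}L(k)$ is the branch resistance) and that the ln(2) factor is applied to both the n-section sum and the closed form. The function names and the wire parameters are hypothetical and chosen only for illustration; they are not taken from the Letter.

```python
import math


def ladder_delay(r_total, c_total, c_load, n):
    """50% delay estimate of an n-section RC ladder driving c_load."""
    r_i = r_total / n  # resistance of each distributed section
    c_i = c_total / n  # capacitance of each distributed section
    # Bracketed term of (1): C_load * sum(R_i) + n(n+1) * R_i * C_i / 2
    first_moment = c_load * n * r_i + n * (n + 1) * r_i * c_i / 2
    return math.log(2) * first_moment


def closed_form_delay(r_per_mm, c_per_mm, length_mm, c_load):
    """Right-hand side of (1): ln(2) * R*L * (C_load + C*L/2)."""
    r_branch = r_per_mm * length_mm
    c_branch = c_per_mm * length_mm
    return math.log(2) * r_branch * (c_load + c_branch / 2)


# Hypothetical wire (illustrative values only): 50 ohm/mm, 200 fF/mm,
# 5 mm long, driving a 100 fF load.
R_PER_MM, C_PER_MM, LENGTH_MM, C_LOAD = 50.0, 200e-15, 5.0, 100e-15

limit = closed_form_delay(R_PER_MM, C_PER_MM, LENGTH_MM, C_LOAD)
for n in (1, 10, 100, 1000):
    d = ladder_delay(R_PER_MM * LENGTH_MM, C_PER_MM * LENGTH_MM, C_LOAD, n)
    print(f"n = {n:5d}: {d * 1e12:8.2f} ps (closed form {limit * 1e12:.2f} ps)")
```

As n grows, the ladder estimate decreases towards the closed form, because the wire term scales as (n + 1)/(2n) and tends to 1/2. The wire term is also quadratic in branch length, which is why long global routes dominate the sum in (2).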



**Fig. 1** Comparison between global CDN architectures

a Tapered H-tree

b Proposed hybrid wireless approach

Under the assumption of a propagation speed near the speed of light, the total delay of the proposed wireless global CDN is given by

$$d_{\text{wireless}}(e) = t_{pd}(1, e) + t_{\text{Tx}} + t_{\text{Rx}}, \tag{3}$$

which is primarily determined by the communication distance between Tx and Rx, from the root node to any of the leaf nodes e. As the circuit delay could be regarded as a small fraction of the wiring delay, this design would exhibit a significant reduction in time of flight (ToF). For the wireless CDN, clock skew could be defined as the difference between the maximum and minimum delay; clock uncertainties such as skew could therefore be minimised by placing nodes at identical distances from the transmitter.
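Equation (3) and the skew definition above can be illustrated with a short sketch. It assumes free-space propagation at the speed of light, as the Letter does, and uses a hypothetical receiver layout and hypothetical Tx/Rx circuit delays; none of these numbers are taken from the Letter.

```python
import math

C_LIGHT = 299_792_458.0  # m/s, propagation speed assumed in the text


def wireless_delay(tx, rx, t_tx, t_rx, speed=C_LIGHT):
    """Equation (3): time of flight from tx to rx plus Tx and Rx circuit delays."""
    distance = math.dist(tx, rx)  # Euclidean distance in metres
    return distance / speed + t_tx + t_rx


def clock_skew(delays):
    """Skew as the difference between the maximum and minimum delay."""
    return max(delays) - min(delays)


# Hypothetical layout (metres): Tx at the die centre, four receivers placed
# at the same distance from it.
TX = (0.0, 0.0)
RECEIVERS = [(2.5e-3, 2.5e-3), (-2.5e-3, 2.5e-3),
             (-2.5e-3, -2.5e-3), (2.5e-3, -2.5e-3)]
T_TX, T_RX = 20e-12, 60e-12  # hypothetical circuit delays in seconds

delays = [wireless_delay(TX, rx, T_TX, T_RX) for rx in RECEIVERS]
print("delays (ps):", [round(d * 1e12, 2) for d in delays])
print("skew (ps):", round(clock_skew(delays) * 1e12, 3))
```

With equidistant receivers, the time-of-flight contribution to skew vanishes, and any remaining skew would come from differences in circuit delay and loading, which this sketch does not model.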

*System design:* The proposed CDN adopts OOK modulation for its power efficiency, as the entire system could ideally operate only in the active state. Our previous work [4] adopted a CMOS switch-based clock transmitter (SWmod) with a compact structure of transmission gates. However, at 65 GHz the carrier wave would leak through the transistor channel in the sub-threshold region, thus producing noise. To address this, we propose a new clock transmitter (LCmod) which could mitigate the leakage interference at the cost of a slight reduction in output power. As shown in Fig. 2a, the proposed clock Tx adopts a cross-coupling structure with connected signal and leakage routes, which cancel each other out owing to their opposite phases. Assuming the body effect is neglected, the step response of SWmod as a function of time t could then be derived as

$$V_{\text{previous}} = V_{\text{dd}} - V_{th} - \left(\frac{\mu_n C_{\text{ox}} W t}{2C_{\text{hold}} L} + \frac{1}{V_{\text{dd}} - V_{th}}\right)^{-1}, \tag{4}$$

where $C_{ox}$, W and L are the effective transistor oxide capacitance, width and length, respectively. As t gets larger, the output voltage eventually approaches $V_{dd} - V_{th}$. The proposed LCmod, in contrast, benefits from the gain of a common-gate pair. Assuming M1, M2, M3 and M4 are in saturation, the output signal could then be defined by

$$V_{\rm now} = V_{\rm dd} - \frac{1}{2} \mu_n C_{\rm ox} \frac{W}{L} (V_b - V_{\rm in} - V_{\rm th})^2 R_D, \tag{5}$$

where $R_D$, $V_b$ and $V_{in}$ are the load resistance, input clock signal and generated carrier wave, respectively. The proposed LCmod could therefore produce a gain over the original modulator given by

$$\text{Gain} = \lim_{t \to \infty} \left\{ \frac{\delta \left[ V_{dd} - \frac{1}{2} \mu_n C_{ox} \frac{W}{L} (V_b - V_{in} - V_{th})^2 R_D \right]}{\delta (V_{dd} - V_{th})} \right\} = \mu_n C_{ox} \frac{W}{L} V_{th} R_D, \tag{6}$$

which enhances the Tx output power level and enables a higher clock frequency to be transmitted. The proposed clock Rx adopts a rectifier-buffer combination, shown in Fig. 2b, which could effectively rectify the input RF signal given a proper biasing voltage. The RC load acts as a low-pass filter that reduces the interference of the high-frequency carrier and hence tracks only the envelope of the modulated clock, with a cut-off frequency around $1/RC_{load}$. Furthermore, a two-stage cascode amplifier with Class-AB biasing enlarges the rectified signal swing, and finally the recovered envelope goes through a CMOS inverter acting as a voltage buffer to provide an adequate slew rate.
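The settling behaviour described by (4) can be explored numerically. The sketch below evaluates the SWmod step response for a set of hypothetical device parameters (supply, threshold, process transconductance, aspect ratio and hold capacitance are illustrative assumptions, not values from the Letter) and shows the output rising from zero towards $V_{dd} - V_{th}$.

```python
def swmod_step_response(t, vdd, vth, mu_cox, w_over_l, c_hold):
    """Equation (4): SWmod output voltage at time t, body effect neglected."""
    overdrive = vdd - vth
    settling_term = mu_cox * w_over_l * t / (2.0 * c_hold) + 1.0 / overdrive
    return overdrive - 1.0 / settling_term


# Hypothetical device parameters (illustrative only)
VDD, VTH = 1.2, 0.4   # supply and threshold voltage in volts
MU_COX = 200e-6       # mu_n * C_ox in A/V^2
W_OVER_L = 10.0       # transistor aspect ratio W/L
C_HOLD = 10e-15       # hold capacitance in farads

for t_ps in (0, 1, 10, 100, 1000):
    v = swmod_step_response(t_ps * 1e-12, VDD, VTH, MU_COX, W_OVER_L, C_HOLD)
    print(f"t = {t_ps:5d} ps  V = {v:.3f} V")
```

The output never exceeds $V_{dd} - V_{th}$, which is the threshold-drop limitation of the switch-based modulator that the common-gate gain in (5) and (6) is intended to overcome.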



**Fig. 2** Schematic of proposed clock transceiver

a Proposed leakage compensation modulator with boosted on/off isolation

b Rectifier-based demodulator with pseudo-differential input
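The role of the RC load in the demodulator can be illustrated with a first-order low-pass model. This is a minimal sketch, assuming an ideal single-pole response and hypothetical R and C values chosen so that the cut-off falls between the 2.5 GHz clock and the 65 GHz carrier. The text quotes the cut-off as $1/RC_{load}$, which is an angular frequency; the sketch converts it to hertz.

```python
import math


def cutoff_hz(r_ohm, c_farad):
    """Cut-off frequency in Hz of a first-order RC low-pass, 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def lowpass_gain(f_hz, f_c_hz):
    """Magnitude response of an ideal first-order low-pass filter at f_hz."""
    return 1.0 / math.sqrt(1.0 + (f_hz / f_c_hz) ** 2)


# Hypothetical load values (illustrative only)
R_LOAD = 1e3      # ohms
C_LOAD = 20e-15   # farads
F_CLOCK = 2.5e9   # clock (envelope) fundamental in Hz, from the text
F_CARRIER = 65e9  # carrier frequency in Hz, from the text

f_c = cutoff_hz(R_LOAD, C_LOAD)
for name, f in (("clock", F_CLOCK), ("carrier", F_CARRIER)):
    gain_db = 20.0 * math.log10(lowpass_gain(f, f_c))
    print(f"{name:8s} {f / 1e9:5.1f} GHz: {gain_db:6.2f} dB")
```

With these assumed values the clock fundamental passes almost unattenuated while the carrier is strongly suppressed. A real design would also have to account for the clock harmonics needed to preserve edge rate, which this single-pole model ignores.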

The typical on-chip antenna exhibits a large area overhead. A conventional half-wave dipole needs to occupy a physical length of 2.3 mm for both antenna arms, and the total area consumption is on the order of $10^6 \mu m^2$ at the frequency of interest. Previously, we adopted a meandering monopole antenna (MMA) with a compact layout for wireless on-chip clock distribution [4]. In this work, we assume that the MMA could be mirrored with an extra arm, forming a meandering dipole antenna (MDA) with a balanced structure for higher radiation efficiency. Hence the area overhead would still be kept within $10^4 \mu m^2$ while the total power efficiency would improve significantly. Furthermore, this MDA could be directly connected to the I/O terminal/matching network without an RF balun, thus further saving power and area.

*Results and analysis:* Experiments based on Cadence Spectre and CST Microwave Studio have been performed to verify the proposed design with both four-receiver and 16-receiver structures in terms of clock delay and uncertainties. For a test model of around 10 mm using a 50 $\Omega$ lumped port excitation [5], and under our assumption, the proposed MDA would exhibit a much higher radiation efficiency of up to 65% at resonance, compared with only 10% for the MMA, while still maintaining a relatively small footprint (0.02 mm<sup>2</sup>).

Tx power output in both the active and idle states is shown in Fig. 3. The on-off isolation of the proposed leakage compensation Tx is boosted to 45.3 dB, which is 18.2 dB higher than that of the switch-based Tx, and hence enhances the output power level significantly. The proposed Rx front-end contains a single-stage LNA with a pseudo-differential cascode structure to provide enough gain for the attenuated RF signal. A well-matched 50 $\Omega$ port is connected directly to the receiver antenna feed line, with 3.5 dB forward gain.
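The isolation figures quoted above can be put on a linear scale with a few lines of arithmetic. This sketch only uses the two decibel values stated in the text; the switch-based isolation is inferred by subtraction, and the helper name is hypothetical.

```python
def db_to_power_ratio(db):
    """Convert a power ratio expressed in decibels to a linear ratio."""
    return 10.0 ** (db / 10.0)


LCMOD_ISOLATION_DB = 45.3  # on-off isolation of the proposed Tx, from the text
IMPROVEMENT_DB = 18.2      # improvement over the switch-based Tx, from the text

# Implied isolation of the switch-based Tx, about 27.1 dB
swmod_isolation_db = LCMOD_ISOLATION_DB - IMPROVEMENT_DB

print(f"SWmod isolation: {swmod_isolation_db:.1f} dB")
print(f"LCmod on/off power ratio: {db_to_power_ratio(LCMOD_ISOLATION_DB):.0f}")
print(f"Improvement as a linear factor: {db_to_power_ratio(IMPROVEMENT_DB):.1f}")
```

An 18.2 dB improvement corresponds to the leaked 'off' power being lower by a factor of roughly 66 relative to the 'on' power.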



**Fig. 3** Output power of clock Tx using two different modulator structures: proposed LCmod and SWmod during both 'on' and 'off' state

a On-off isolation w/o cancelling gate CG (M5, M6)

b On-off isolation with cancelling gate CG (M5, M6)

The measured delay and skew, and the measured jitter, can be found in Figs. 4a and *b*, respectively, as the time-domain periodic ramp response and the eye pattern.

Different loads are represented by load capacitances ranging from a light load of around 10 fF to a heavy load of around 500 fF. For the proposed wireless CDN with unbalanced loads, clock skew could be minimised by placing heavy loads near the clock Tx and light loads slightly farther away from it. The proposed CDN exhibits an average interconnect delay, including wire delays and EM signal ToF, of around 113.5 ps and a maximum clock skew at the 50% rising point of around 33 ps. This corresponds to reductions of 96.5, 94.9 and 66.8% compared with the Elmore delay model, uniform H-tree and tapered H-tree in a 65 nm process, respectively, as shown in Fig. 5*a*.
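To put the reported timing figures in context, they can be expressed as fractions of the clock period. This sketch uses only the 2.5 GHz clock frequency, the 113.5 ps average delay and the 33 ps skew stated in the text, together with the 6.6 ps deterministic jitter reported with Fig. 4; the helper names are hypothetical.

```python
def clock_period_ps(freq_hz):
    """Clock period in picoseconds for a frequency in hertz."""
    return 1e12 / freq_hz


def fraction_of_period(t_ps, freq_hz):
    """Express a time interval in ps as a fraction of the clock period."""
    return t_ps / clock_period_ps(freq_hz)


F_CLOCK = 2.5e9       # Hz, clock input used in the reported results
AVG_DELAY_PS = 113.5  # average interconnect delay from the text
SKEW_PS = 33.0        # maximum clock skew from the text
JITTER_PS = 6.6       # deterministic jitter reported with Fig. 4

print(f"period: {clock_period_ps(F_CLOCK):.0f} ps")
for name, t in (("delay", AVG_DELAY_PS), ("skew", SKEW_PS), ("jitter", JITTER_PS)):
    print(f"{name:6s}: {100.0 * fraction_of_period(t, F_CLOCK):.2f}% of period")
```

At 2.5 GHz the period is 400 ps, so the reported skew is a little over 8% of a cycle and the jitter is under 2%.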



**Fig. 4** *Measured results at output of different clock receivers*

a Recovered 16 clock signals with average delay of 113 ps and skew of 33 ps with 2.5 GHz clock input

b Measured eye-pattern with deterministic jitter of 6.6 ps with 2.5 GHz clock input



Fig. 5 Performance and power comparison between different global CDN models under 65 nm process

a 50% point delay with 2.5 GHz clock input

b Simulated power consumption with 2.5 GHz clock input

Finally, Fig. 5b depicts the simulated power consumption. A crossover point appears at a system side length near 5.6 mm, and a power reduction of 36.8% is obtained at around 10 mm side length. This indicates that for a small application or die area, conventional interconnect still benefits from its simplicity and efficiency. However, for modern systems with larger dimensions, the proposed wireless CDN exhibits robust and competitive performance for high-speed clock distribution, which is of paramount importance for future many-core system design.

*Conclusion:* Wireless interconnect provides an effective way of jumping over congested interconnect with heavy loads, which helps to reduce total signal propagation delay and wiring power loss. Taking advantage of these benefits, this Letter proposes a novel hybrid CDN architecture. A reference input clock ranging from 2.5 to 5 GHz is transmitted and recovered via a global short-range wireless channel at a low power cost of 41 mW. The significant reduction in delay and clock uncertainty highlights the promising potential of distributing a high-quality clock signal across future many-core systems.

© The Institution of Engineering and Technology 2019

Submitted: *29 October 2018*

doi: 10.1049/el.2018.6570

One or more of the Figures in this Letter are available in colour online.

Q. Ding and T. Mak (*School of Electronics and Computer Science, University of Southampton, Southampton, UK*)

✉ E-mail: qd1g15@ecs.soton.ac.uk

## References

- 1 Zhao, W., and Cao, Y.: 'New generation of predictive technology model for sub-45 nm early design exploration', *Trans. Electron Devices*, 2006, **53**, (11), pp. 2816–2823
- 2 Chen, G., and Friedman, E.G.: 'An RLC interconnect model based on Fourier analysis', *Trans. Comput.-Aided Des. Integrated Circuits Syst.*, 2005, **24**, (2), pp. 170–183
- 3 Eudes, T., Ravelo, B., and Louis, A.: 'Experimental validations of a simple PCB interconnect model for high-rate signal integrity', *Trans. Electromagn. Compat.*, 2012, **54**, (2), pp. 397–404
- 4 Ding, Q., Fletcher, B.J., and Mak, T.: 'Globally wireless locally wired (GloWiLoW): a clock distribution network for many-core systems'. IEEE Int. Symp. on Circuits and Systems (ISCAS) Conf., Florence, Italy, May 2018
- 5 Hirano, T., Okada, K., Hirokawa, J., *et al.*: 'Electromagnetic simulation modeling of silicon substrate for 60 GHz on-chip differential-feed dipole antenna', *Appl. Phys. Lett.*, 2013, **103**, (12), p. 122101