# A Single-chip Ultra-Wideband Receiver using Silicon Integrated Antennas for Interchip Wireless Interconnection

Nobuo Sasaki, Masashi Fukuda, Masakazu Nitta, Kentaro Kimoto and Takamaro Kikkawa

Research Center for Nanodevices and Systems, Hiroshima University, 1-4-2 Kagamiyama, Higashi Hiroshima, 739-8527, JAPAN

Email: kikkawa@sxsys.hiroshima-u.ac.jp

## Introduction

Wireless interconnections between ULSI chips can be applied to three-dimensional stacked-chip packaging on a printed circuit board (PCB). Clock and data are transmitted between LSI chips without bonding wires or solder bumps. Inductive coupling [1] and electromagnetic wave transmission [2-4] have been developed for inter-chip wireless interconnection. The transmission range of inductive coupling is a few hundred µm, so it is suitable for clock and data communication between adjacent chips. Electromagnetic wave transmission reaches longer distances, so it can be used for reconfigurable inter-chip interconnections. The proposed three-dimensional custom stacked system (3DCSS) [5] has both local wireless interconnections (LWI) using inductive coupling and global wireless interconnections (GWI) using electromagnetic waves. The target distance of GWI ranges from millimeters to centimeters.

The wireless communication technology using integrated antennas has been applied to global clock distribution [2]. More advanced technology will be required for multi-channel data communication among stacked ULSI modules. Pulse-based ultra-wideband (UWB) communication is one of the candidates [6-9], and it is adopted as the communication method for GWI. Here a Gaussian monocycle pulse (GMP) is used as the transmitted wave. Multi-channel access will be realized by time division multiplexing (TDM). The GMP is generated by simple logic operations [10,11], so simple circuitry is possible for the UWB transmitter.
On the receiver side, synchronous pulse detection by correlating the received GMP with a template is important, but maintaining synchronization is difficult because of the short pulse width and the periodic jitter of the signal [12,13]. In this paper, a single-chip CMOS UWB receiver for communication over distances from millimeters to centimeters is developed, and HSPICE simulation results are shown.

## UWB signals

The first derivative of the Gaussian function is called the Gaussian monocycle pulse (GMP). A one-shot GMP is expressed as

$$y^{(1)}(t) = -\sqrt{e} \cdot \frac{2\pi t}{\tau} \exp\left[-\frac{1}{2} \left(\frac{2\pi t}{\tau}\right)^2\right]$$ (1)

Here the parameter $\tau$, which has the dimension of time, is called the pulse width. The frequency spectrum of the one-shot GMP also has the form of the first derivative of a Gaussian in the frequency domain:

$$\left| Y^{(1)}(f) \right| = \sqrt{e} \frac{f}{f_c} \exp \left\{ -\frac{1}{2} \left( \frac{f}{f_c} \right)^2 \right\}$$ (2)

The center frequency $f_c$ is the frequency at which the spectrum takes its extreme value. The pulse width is defined as the inverse of the center frequency, i.e., $\tau = 1/f_c$. The normalization constants of Eqs. (1) and (2) are chosen so that both functions have unit magnitude at their extreme values.

## UWB receiver and simulations

Figure 1 shows a photograph of the UWB receiver. The receiver circuit was designed using the TSMC 0.18 µm mixed-signal design rules. As shown in Fig. 2, the developed UWB receiver is composed of (a) a delayed clock generator, (b) a template generator, (c) a demodulator and (d) an RZ/NRZ converter.

(a) *Delayed clock generator*: It is composed of three blocks: (1) a phase-locked loop (PLL) between the reference clock and a differential 8-stage voltage controlled oscillator (VCO), (2) a delay selector and (3) a 4-bit counter whose outputs control the delay selector. Figure 3 (a) shows a block diagram of the VCO, which is a differential 8-stage ring oscillator.
Here a differential inverter is used as the delay cell of the ring oscillator, as shown in Fig. 3 (b). The outputs of the VCO are shown in Fig. 4. Each panel corresponds to the output of a delay cell, whose delay is 1/16 of the VCO cycle. The obtained cycle is 976 ps and the unit delay time is 61 ps at $V_{dd} = V_{cn} = 1.8$ V and $V_{cp} = 0$ V. The 8 differential outputs of the VCO are fed to the inputs of the delay selector.

Figure 5 (a) is a block diagram of the six-stage frequency divider. The block diagram of a 1/2 divider and the schematic diagram of the latch are shown in Figs. 5 (b) and (c), respectively. In Fig. 5 (a), the 1/4-divided clock is sent to the phase/frequency detector (PFD) and compared with the reference clock. The remaining dividers form a 4-bit counter, which controls the delay selector. Figure 5 (a) also shows that latches lock the outputs of the 4-bit counter when synchronization is established and the value of 'Lock' becomes 'H'. The outputs of the 4-bit counter are shown in Figs. 6 (a)-(d). Figure 6 (a) is the least significant bit (LSB), whose frequency is 1/8 of the VCO frequency. Analogously, Fig. 6 (d) shows the most significant bit (MSB), whose frequency is 1/64 of the VCO frequency. As these figures show, the outputs of the 4-bit counter are fixed once synchronization is established.

Figures 7 (a)-(c) show schematic diagrams of the PFD, the charge pump (CP) and the low pass filter (LPF). These are components of the PLL, which synchronizes the internal VCO with the reference clock. The PFD outputs the phase and frequency difference between the reference clock and the 1/4-divided internal clock, and the CP generates currents in proportion to the outputs of the PFD. The generated charge is fed to the LPF, which outputs the control voltage $V_{\rm cn}$ of the VCO. Figure 8 compares the reference clock and the internal clock at (a) 0-20 ns, (b) 300-320 ns and (c) 700-720 ns.
These figures show that frequency and phase synchronization between the reference clock and the internal clock is established.

As shown in Fig. 9 (a), the delay selector is composed of a differential 8-to-1 MUX, a differential 2-to-1 MUX and a 1/4 frequency divider. Here one of the 8 differential inputs from the VCO is sequentially selected to generate the delayed clock. Figure 9 (b) is the schematic diagram of the differential 2-to-1 MUX. Figure 10 compares the non-delayed and delayed clocks. Before synchronization (Fig. 10 (a)), the delay selector is unlocked and continues to shift the delay. After synchronization (Fig. 10 (b)), the delay selector is locked and the phase difference between the non-delayed and delayed clocks is fixed.

- (b) *Template generator* [10,11]: Differential, 50% duty, delayed clocks are supplied by the delayed clock generator, and the template generator produces a differential GMP template from them. As shown in Fig. 11, it is composed of GMP generators, each consisting of a triangular pulse generator (TPG) and a CR differentiator, and a single input / differential output (SIDO) amplifier. There are two GMP generators. The template from the 1<sup>st</sup> GMP generator is used to detect the timing pulse, and that from the 2<sup>nd</sup> GMP generator is used to detect the data pulse. The received signal consists of alternating timing pulses (always '1') and on-off-keying (OOK) modulated data pulses. The time difference between the timing pulse and the data pulse is 50% of the clock period. Figure 12 shows the output of the SIDO amplifier. The pulse width of the template is 600 ps, and the peak-to-peak voltage $V_{pp}$ is 100 mV. Figure 12 (a) shows that only the 1<sup>st</sup> GMP generator operates while synchronization is not established. After synchronization is established, the 2<sup>nd</sup> GMP generator turns on (Fig. 12 (b)) and data acquisition starts.
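
As a numerical illustration of the template waveform, the following Python sketch evaluates Eqs. (1) and (2) for a 600 ps pulse width. It assumes that the template pulse width quoted above corresponds to $\tau$, and that the 100 mV peak-to-peak template is simply Eq. (1) scaled by $V_{pp}/2$. The function names `gmp`, `gmp_spectrum` and `template_mv` are illustrative and do not come from the receiver design.

```python
import math

TAU = 600e-12    # template pulse width from the text, assumed equal to tau in Eq. (1)
VPP_MV = 100.0   # template peak-to-peak voltage from the text, in mV


def gmp(t, tau=TAU):
    """Normalized Gaussian monocycle pulse, Eq. (1)."""
    x = 2.0 * math.pi * t / tau
    return -math.sqrt(math.e) * x * math.exp(-0.5 * x * x)


def gmp_spectrum(f, fc=1.0 / TAU):
    """Normalized magnitude spectrum of the pulse, Eq. (2)."""
    r = f / fc
    return math.sqrt(math.e) * r * math.exp(-0.5 * r * r)


def template_mv(t, tau=TAU, vpp_mv=VPP_MV):
    """Hypothetical template: Eq. (1) scaled so its peak-to-peak value is vpp_mv."""
    return 0.5 * vpp_mv * gmp(t, tau)


fc = 1.0 / TAU                   # center frequency, from tau = 1/fc
t_peak = TAU / (2.0 * math.pi)   # Eq. (1) is extremal where 2*pi*t/tau = +/-1

print(f"fc = {fc / 1e9:.3f} GHz")
print(f"y(-t_peak) = {gmp(-t_peak):.6f}, y(+t_peak) = {gmp(t_peak):.6f}")
print(f"|Y(fc)| = {gmp_spectrum(fc):.6f}")

# Sample the scaled template on a 1 ps grid over +/- one pulse width.
samples = [template_mv(i * 1e-12) for i in range(-600, 601)]
print(f"template Vpp ~ {max(samples) - min(samples):.2f} mV")
```

Under these assumptions the center frequency is about 1.67 GHz, and the positive and negative peaks of the pulse are separated by $\tau/\pi$, roughly 191 ps.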
- (c) *Demodulator* [12,13]: Figure 13 shows schematic diagrams of (a) the differential amplifier, (b) the mixer, (c) the differential input / single output (DISO) amplifier and (d) the inverter buffer. The received signal is amplified by two differential amplifiers and fed to the mixer, which multiplies the amplified signal by the template. After further amplification by two differential amplifiers and differential to single-ended conversion by the DISO amplifier, the inverter buffer performs analog-to-digital conversion. The threshold of the inverter buffer is about 0.9 V, and the dc offset at the buffer input can be adjusted externally.
- (d) *RZ/NRZ converter*: The output of the buffer is a return-to-zero (RZ) signal with a low duty cycle. After synchronization is established, it consists of alternating timing bits, which are always '1', and RZ data bits. The RZ/NRZ (non-return-to-zero) converter separates the data bits from the timing bits and performs RZ/NRZ conversion. The obtained NRZ timing signal is used as a flag indicating that synchronization is established, i.e., it locks the outputs of the 4-bit counter. The timing signal is also used as a trigger for data acquisition. Figure 14 shows a block diagram of the RZ/NRZ converter, and the conversion algorithm is shown in Fig. 15. Figure 15 (a) shows the low-duty RZ signal, which is applied as the set signal of an S-R flip-flop (SRFF). The delayed clock (Fig. 15 (b)) is applied as the reset signal of the SRFF. If the rising edge of the clock comes after the falling edge of the RZ signal, the SRFF output has its rising edge set by the RZ signal and its falling edge set by the clock, as shown in Fig. 15 (c). After some delay through inverters, a rising-edge-triggered D flip-flop (DFF) samples this output. The same clock that resets the SRFF triggers the DFF, so the falling edge of the delayed SRFF output is sure to arrive later than the trigger. Thus sampling and holding are performed correctly, as shown in Fig. 15 (d).
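
The conversion steps above can be sketched as a small discrete-time behavioural model in Python. This is only an illustration under stated assumptions, not the authors' circuit: the SRFF is modelled with a level-sensitive, reset-dominant reset driven by the clock, the inverter chain is a fixed delay of a few ticks, and the tick counts (`period`, `pulse_start`, `pulse_width`, `delay`) and the function name `rz_to_nrz` are hypothetical. The model covers a single converter and does not include the second converter driven by the inverted clock.

```python
def rz_to_nrz(bits, period=16, pulse_start=3, pulse_width=2, delay=2):
    """Behavioural sketch of RZ-to-NRZ conversion with an SRFF and a DFF.

    Returns (nrz, samples): the held DFF output per tick, and the value
    latched at each rising clock edge.
    """
    half = period // 2
    # Condition from the text: the RZ pulse must fall before the clock rises.
    if pulse_start + pulse_width > half:
        raise ValueError("RZ pulse must end before the rising clock edge")

    n = len(bits) * period
    # Clock is low in the first half of each period and high in the second.
    clk = [1 if (t % period) >= half else 0 for t in range(n)]
    # Low-duty RZ input: a short pulse only in periods whose bit is 1.
    rz = [
        1 if bits[t // period]
        and pulse_start <= (t % period) < pulse_start + pulse_width
        else 0
        for t in range(n)
    ]

    q_sr = 0        # SRFF output
    q_d = 0         # DFF output (the NRZ signal)
    sr_hist = []    # history of the SRFF output, used for the inverter delay
    nrz, samples = [], []
    for t in range(n):
        # SRFF: reset while the clock is high (assumed), otherwise set by RZ.
        if clk[t]:
            q_sr = 0
        elif rz[t]:
            q_sr = 1
        sr_hist.append(q_sr)

        # Inverter chain modelled as a pure delay of `delay` ticks.
        delayed = sr_hist[t - delay] if t >= delay else 0

        # DFF: sample the delayed SRFF output on the rising clock edge.
        if clk[t] == 1 and (t == 0 or clk[t - 1] == 0):
            q_d = delayed
            samples.append(q_d)
        nrz.append(q_d)
    return nrz, samples


nrz, samples = rz_to_nrz([1, 0, 1, 1, 0])
print(samples)
```

In this model, setting `delay=0` makes the DFF sample the SRFF output after it has already been reset, so every bit reads as 0. This illustrates why the inverter delay is placed in front of the DFF.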
In the above-mentioned method, the condition that the rising edge of the clock comes after the falling edge of the RZ signal is important. This condition is automatically satisfied when two RZ/NRZ converters are prepared and both the clock and the inverted clock are used, as shown in Fig. 14. The two RZ/NRZ converters also separate the timing bits from the data bits.

Synchronization and OOK data recovery: Figures 16 (a)-(e) show (a) the input signal, (b) the output of the first amplifiers, (c) the output of the mixer, (d) the output of the DISO amplifier and (e) the output of the buffer, respectively. These figures show that synchronization is established at around 750 ns, after which data acquisition starts. Figures 17 (a)-(d) show (a) the input data, (b) the output of the buffer, (c) the NRZ-converted timing pulse and (d) the NRZ-converted data, respectively. Here, synchronization between the template and the transmitted GMP is already established. The input data consists of alternating timing pulses and data pulses. The buffer outputs an RZ signal in which the timing pulses and data pulses are not separated. After the two RZ/NRZ converters, the timing pulses and data pulses are separated and converted to NRZ, as shown in Figs. 17 (c) and (d). Figure 17 (d) shows that the receiver successfully recovers the data from the alternating timing and data pulses. In this simulation, the obtained data rate is 250 Mbps. The performance of the developed UWB receiver is summarized in Table 1.

## Conclusion

A single-chip UWB receiver was developed, and HSPICE simulation results were presented. The receiver successfully recovered the data from alternating timing and data pulses. The obtained data rate was 250 Mbps.

## Acknowledgements

This work is supported by the Ministry of Education, Culture, Sports, Science and Technology under the 21st Century COE program and the Grant-in-Aid for Scientific Research.

## References

- [1] N. Miura, D. Mizoguchi, M. Inoue, H.
Tsuji, T. Sakurai, and T. Kuroda, ISSCC Dig. Tech. Papers (2005) 264.
- [2] B. A. Floyd, C. M. Hung, and K. K. O, IEEE J. Solid-State Circuits 37 (2002) 543.
- [3] K. Kimoto and T. Kikkawa, Jpn. J. Appl. Phys. 45 (2006) 4968.
- [4] K. Kimoto, N. Sasaki, P. K. Saha, M. Nitta, T. Kikkawa and M. Sasaki, Jpn. J. Appl. Phys. 45 (2006) 3272.
- [5] A. Iwata, M. Sasaki, T. Kikkawa, S. Kameda, H. Ando, K. Kimoto, D. Arizono, and H. Sunami, ISSCC Dig. Tech. Papers (2005) 262.
- [6] M. Z. Win and R. A. Scholtz, IEEE Trans. Commun. 48 (2000) 679.
- [7] Y. Zheng, Y. Zhang and Y. Tong, IEEE Trans. Microwave Theory and Techniques 54 (2006) 1912.
- [8] J. Lee, Y.-J. Park, M. Kim, C. Yoon, J. Kim and K.-H. Kim, IEEE Trans. Microwave Theory and Techniques 54 (2006) 1667.
- [9] I. D. O'Donnell and R. W. Brodersen, 2006 Symp. VLSI Circuits Dig. Tech. Papers (2006) 248.
- [10] P. K. Saha, N. Sasaki and T. Kikkawa, Jpn. J. Appl. Phys. 45 (2006) 3279.
- [11] P. K. Saha, N. Sasaki and T. Kikkawa, 2006 Symp. VLSI Circuits Dig. Tech. Papers (2006) 252.
- [12] N. Sasaki, P. K. Saha and T. Kikkawa, Proc. Int. Workshop on UWB Technologies (2005) 46.
- [13] N. Sasaki, M. Fukuda, M. Nitta, K. Kimoto and T. Kikkawa, Extended Abstracts of the 2006 Int. Conf. on SSDM (2006) 70.

Fig. 7. (a) Schematic diagram of phase frequency detector (PFD). (b) Schematic diagram of charge pump (CP). (c) Schematic diagram of low pass filter (LPF).

Fig. 8. Comparisons between the reference clock and the internal clock at (a) 0 - 20 ns, (b) 300 - 320 ns, (c) 700 - 720 ns.

1/32 (d) 1/64 of the cycle of VCO, respectively.

Fig. 13. (a) Differential amplifier. (b) Mixer. (c) Differential input / single output (DISO) amplifier. (d) Inverter buffer.

Fig. 15. Algorithm of RZ / NRZ conversion.

Fig. 16. (a) Input signal. (b) Output of 1st amplifiers. (c) Output of MIXER. (d) Output of DISO amplifiers. (e) Output of buffer.

Fig. 17. (a) Input data. (b) Output of buffer.
(c) NRZ-converted timing pulse. (d) NRZ-converted data.

Table 1. Calculated performance of UWB receiver.

| Parameter | Value |
|------------|---------------------|
| Technology | 0.18 µm CMOS |
| Modulation | On-off keying |
| Area | 0.54 mm<sup>2</sup> |
| Power | 40.0 mW (250 MHz) |
| Data rate | 250 Mbps |