Showing papers by "Chen-Yi Lee" published in 2006


Proceedings ArticleDOI
18 Sep 2006
TL;DR: A scalable pipeline and prediction circuit is employed to improve integration efficiency and transmission bandwidth, and the decoder performs real-time MPEG-2 and H.264/AVC QCIF video decoding.
Abstract: An MPEG-2 and H.264/AVC decoder occupies 3.9 × 3.9 mm² in 0.18 µm 1P6M CMOS. To improve integration efficiency and transmission bandwidth, a scalable pipeline and prediction circuit is employed. The decoder performs real-time MPEG-2 and H.264/AVC QCIF video decoding at 15 frames/s, dissipating 108 µW and 125 µW, respectively, at 1 V with a clock frequency of 1.15 MHz.

80 citations
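
The power, frame-rate, and clock figures in the abstract above imply a per-frame energy and cycle budget. The following is a minimal arithmetic sketch of that budget. The QCIF frame size (176 × 144) and the 16 × 16 macroblock size are standard values rather than figures from the abstract, and the function and constant names are illustrative only.

```python
# Figures quoted in the abstract above.
CLOCK_HZ = 1.15e6
FRAME_RATE = 15.0
POWER_W = {"MPEG-2": 108e-6, "H.264/AVC": 125e-6}

# Assumption (standard values, not stated in the abstract):
# QCIF is 176 x 144 luma samples and a macroblock covers 16 x 16 samples.
MACROBLOCKS_PER_FRAME = (176 // 16) * (144 // 16)  # 11 * 9 = 99


def per_frame_budget(power_w, clock_hz=CLOCK_HZ, frame_rate=FRAME_RATE):
    """Return (energy per frame in J, cycles per frame, cycles per macroblock)."""
    energy_j = power_w / frame_rate
    cycles = clock_hz / frame_rate
    return energy_j, cycles, cycles / MACROBLOCKS_PER_FRAME


for standard, power in POWER_W.items():
    energy, cycles, cycles_per_mb = per_frame_budget(power)
    print(f"{standard}: {energy * 1e6:.2f} uJ/frame, "
          f"{cycles:.0f} cycles/frame, {cycles_per_mb:.0f} cycles/macroblock")
```

Under these assumptions the decoder has roughly 77 thousand clock cycles per frame, or under 800 cycles per macroblock, which illustrates why the abstract stresses pipeline and prediction efficiency.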


Journal ArticleDOI
TL;DR: This work presents a clock generator with cascaded dynamic frequency counting (DFC) loops for wide multiplication range applications that achieves a multiplication range from 4 to 13 888 with output peak-to-peak jitter less than 2.8% of clock period.
Abstract: This work presents a clock generator with cascaded dynamic frequency counting (DFC) loops for wide multiplication range applications. The DFC loop, which uses variable time period to estimate and tune the frequency of the digitally controlled oscillator (DCO), enhances the resolution of frequency detection. The conventional phase-frequency detector (PFD) and programmable divider are replaced with a digital arithmetic comparator and a DCO timing counter. The value in the DCO timing counter is separated into quotient and remainder vectors. A threshold region is set in the remainder vector to reduce the influence of jitter variation in frequency detection. The loop stability can be retained by cascading two DFC loops when the multiplication factor (N) is large. The proposed clock generator achieves a multiplication range from 4 to 13 888 with output peak-to-peak jitter less than 2.8% of clock period. A test chip for the proposed clock generator is fabricated in 0.18-µm CMOS process with core area of 0.16 mm². Power consumption is 15 mW @ 378 MHz with 1.8-V supply voltage.

57 citations
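
The quotient/remainder comparison described in the abstract above can be illustrated with a toy model. This is a sketch under stated assumptions, not the circuit from the paper: the counting window is measured in reference cycles, the window doubles each time the comparator reports a match, the DCO is ideal and jitter-free, and every name here (`dfc_decision`, `lock`, `ratio_per_code`) is hypothetical. The cascading of two loops for large N is not modeled.

```python
def dfc_decision(dco_count, window, n_target, threshold):
    """Compare a DCO cycle count, taken over `window` reference cycles,
    against the target multiplication factor.

    Returns 0 (hold), +1 (speed the DCO up) or -1 (slow it down).
    """
    # Split the counter value into quotient and remainder parts.
    quotient, remainder = divmod(dco_count, window)
    # Threshold region just above the target: treat as a match.
    if quotient == n_target and remainder <= threshold:
        return 0
    # Threshold region just below the target: also treat as a match.
    if quotient == n_target - 1 and remainder >= window - threshold:
        return 0
    return -1 if quotient >= n_target else +1


def lock(n_target, code=0, ratio_per_code=0.25, min_window=2,
         max_window=16, threshold=1, max_steps=10_000):
    """Toy loop: step the DCO control code until the count matches at the
    longest window. `ratio_per_code` (DCO cycles per reference cycle per
    code step) is an assumed, perfectly linear DCO characteristic."""
    window = min_window
    for _ in range(max_steps):
        dco_count = int(code * ratio_per_code * window)  # ideal, jitter-free count
        decision = dfc_decision(dco_count, window, n_target, threshold)
        if decision == 0:
            if window == max_window:
                return code
            window *= 2  # a longer window gives finer frequency resolution
        else:
            code += decision
    raise RuntimeError("loop did not lock")


print(lock(100))  # control code at which the toy loop settles for N = 100
```

Lengthening the window only after a coarse match keeps early tuning steps short while still ending with fine frequency resolution, which is one plausible reading of the "variable time period" in the abstract.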


Proceedings ArticleDOI
26 Apr 2006
TL;DR: A very high-resolution all-digital phase-locked loop (ADPLL) is proposed; it is designed with the cell library and described by hardware description language (HDL), making it very suitable for system-on-chip (SoC) and system-level applications.
Abstract: In this paper, we propose a very high-resolution all-digital phase-locked loop (ADPLL), which is designed with the cell library and described by Hardware Description Language (HDL). The proposed ADPLL uses a novel digitally controlled oscillator (DCO) to achieve 1.06 ps resolution, and the proposed DCO can extend the controllable range easily. The dead zone of the proposed phase/frequency detector (PFD) is 5 ps. The proposed ADPLL can be easily ported to different processes as a soft intellectual property (IP) block, making it very suitable for System-On-Chip (SoC) and system-level applications.

37 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: A novel block scaling method and a new ping-pong cache-memory architecture are proposed to reduce the power consumption and hardware cost, and by proper scheduling of the two data streams, the proposed design achieves better hardware utilization.
Abstract: This paper presents a low-power design of a two-stream MIMO FFT/IFFT processor for WiMAX applications. A novel block scaling method and a new ping-pong cache-memory architecture are proposed to reduce the power consumption and hardware cost. With these schemes, half the memory accesses and 64-Kbit memory can be saved. Furthermore, by proper scheduling of the two data streams, the proposed design achieves better hardware utilization and can process two 2048-point FFTs/IFFTs consecutively within 2052 cycles. A test chip of the proposed FFT/IFFT processor has been designed using the UMC 0.13 µm 1P8M process with a core area of 1332 × 1590 µm². The SQNR performance of the 2048-point FFT/IFFT is over 48 dB for QPSK and 16/64-QAM modulations. Power dissipation of two 2048-point FFT computations is about 17.26 mW at 22.86 MHz, which meets the maximum throughput rate of WiMAX applications.

32 citations
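
The abstract above does not describe how its block scaling method works. As background only, the sketch below shows generic block floating-point scaling, where a whole block of fixed-point samples shares one power-of-two scale factor so that a short word length can be used between FFT stages. The function name, the 12-bit word length, and the sample values are assumptions for illustration and are not taken from the paper.

```python
def block_scale(block, word_bits=12):
    """Scale a block of complex integer samples by one shared power of two
    so that every component fits a signed word of `word_bits` bits.

    `block` is a list of (real, imag) integer pairs.
    Returns (scaled_block, shift); the original values are approximately
    scaled_value * 2**shift.
    """
    limit = (1 << (word_bits - 1)) - 1  # largest positive value in the word
    peak = max(max(abs(re), abs(im)) for re, im in block)
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    # Python's >> on negative integers is an arithmetic (flooring) shift.
    scaled = [(re >> shift, im >> shift) for re, im in block]
    return scaled, shift


samples = [(3000, -150), (-4095, 20), (12, 7)]
scaled, shift = block_scale(samples, word_bits=12)
print(scaled, shift)  # [(1500, -75), (-2048, 10), (6, 3)] 1
```

Storing one shift value per block instead of widening every word is the usual way such schemes trade a small loss of precision for less memory.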


Journal ArticleDOI
TL;DR: A low-complexity synchronizer combining data-partition-based correlation algorithms and dynamic-threshold design is proposed for orthogonal frequency division multiplexing based UWB system and provides a methodology to reduce design complexity with an acceptable performance loss.
Abstract: In current ultra-wideband (UWB) baseband synchronizer approaches, a parallel architecture is used to meet the throughput requirement of over 500 Msample/s. Achieving low power and small area therefore becomes the challenge of UWB baseband design. In this paper, a low-complexity synchronizer combining data-partition-based correlation algorithms and dynamic-threshold design is proposed for orthogonal frequency division multiplexing based UWB systems. It provides a methodology to reduce design complexity with an acceptable performance loss. Based on the data-partition algorithms, both a single auto-correlator and a moving-average-free matched filter are developed with 528 Msample/s throughput for the 480 Mb/s UWB design. Simulation results show the synchronization loss can be limited to 0.8-dB signal-to-noise ratio for 8% system packet-error rate.

28 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: This paper presents a multi-tone CDMA (MT-CDMA) based system specification for wireless body area network (WBAN) applications and achieves PER=1% and allows interferer distance less than 1.01 m.
Abstract: This paper presents a multi-tone CDMA (MT-CDMA) based system specification for wireless body area network (WBAN) applications. According to the factors of bandwidth, carrier frequency, and electric field intensity defined in wireless medical telemetry service, the function blocks and data format are designed through the considerations of hardware cost, power consumption, and system performance. The design constraints for baseband processor, data conversion, and RF circuits are defined. This work achieves PER=1% and allows interferer distance less than 1.01 m. In energy-spectrum efficiency, it provides an energy-spectrum product 3.125 times lower than that of the state-of-the-art systems for healthcare applications.

25 citations


Journal ArticleDOI
TL;DR: This research investigates an optimized memory scheme and rescheduled data flow to reduce power consumption and chip area and shows the power dissipation for turbo and Viterbi decodings is 83 mW and 25.1 mW respectively.
Abstract: This paper presents a channel decoder that completes both turbo and Viterbi decodings, which are pervasive in many wireless communication systems, especially those that require very low signal-to-noise ratios. The trellis decoding algorithm merges them with less redundancy. However, the implementation is still challenging due to the power consumption in wearable devices. This research investigates an optimized memory scheme and rescheduled data flow to reduce power consumption and chip area. The memory access is reduced by buffering the input symbols, and the area is reduced by reducing the embedded interleaver memory. A test chip is fabricated in a 1.8 V 0.18-µm standard CMOS technology and verified to provide 4.25-Mb/s turbo decoding and 5.26-Mb/s Viterbi decoding. The measured power dissipation is 83 mW while decoding a 3.1 Mb/s turbo encoded data stream with six iterations for each block. The power consumption in Viterbi decoding is 25.1 mW at the 1-Mb/s data rate.

23 citations


Proceedings ArticleDOI
26 Apr 2006
TL;DR: In this paper, an all-digital Delay-Locked Loop (DLL) for DDR SDRAM controller applications is presented, which can generate the required fixed timing delay (tSD) for the DDR SDRAM controller to capture the output data (DQ) correctly.
Abstract: This paper presents an all-digital Delay-Locked Loop (DLL) for DDR SDRAM controller applications. The presented all-digital, cell-based, DLL-based five-phase clock generator can generate the required fixed timing delay (tSD) for the DDR SDRAM controller to capture the output data (DQ) correctly. The proposed DLL-based multi-phase clock generator architecture can lock to the harmonic of input clock period and still get a correct multi-phase clock output. Hence the design challenges to build a high resolution delay line with minimum intrinsic delay can be reduced. Simulation results and chip measurement results show that the proposed DLL can generate the desired tSD delay with error < 7.6%. The power consumption of the proposed DLL is 4.1 mW (at DDR-200) and 9.0 mW (at DDR-400).

16 citations


Proceedings ArticleDOI
18 Sep 2006
TL;DR: A DVB-T/H baseband receiver with multi-stage power control, 2D linear channel equalizer, synchronizer, 2/4/8k-point FFT, and Viterbi/RS decoder is implemented in 0.18 µm CMOS.
Abstract: A DVB-T/H baseband receiver with multi-stage power control, 2D linear channel equalizer, synchronizer, 2/4/8k-point FFT, and Viterbi/RS decoder is implemented in 0.18 µm CMOS. At the highest data rate of 31.67 Mb/s, it overcomes 70 Hz Doppler frequency and consumes 250 mW with a die size of 6.9 × 5.8 mm².

12 citations


Proceedings ArticleDOI
01 Dec 2006
TL;DR: The proposed fast-lock-in all-digital phase-locked loop (ADPLL), which is designed with the cell library and described by hardware description language (HDL), achieves 0.93 ps resolution and can extend the controllable range easily.
Abstract: In this paper, we propose a fast-lock-in all-digital phase-locked loop (ADPLL), which is designed with the cell library and described by Hardware Description Language (HDL). The proposed ADPLL uses a novel 2-level flash time-to-digital converter (TDC) to lock in within 2 reference clock cycles. The novel digitally controlled oscillator (DCO) achieves 0.93 ps resolution and can extend the controllable range easily. In addition to high resolution, the power consumption of the proposed DCO can be lowered to 110 µW (at 200 MHz). The proposed ADPLL can be easily ported to different processes as a soft intellectual property (IP), making it very suitable for System-On-Chip (SoC) applications as well as system-level power management.

11 citations


Proceedings ArticleDOI
15 Jun 2006
TL;DR: An MB-OFDM UWB baseband transceiver with I/Q-mismatch (IQM) calibration and dynamic sampling (DS) is presented, which calibrates IQM of 2 dB gain error and 20-degree phase error, relaxing IQM tolerance to 10 times that of existing designs.
Abstract: An MB-OFDM UWB baseband transceiver with I/Q-mismatch (IQM) calibration and dynamic sampling (DS) is presented. It calibrates IQM of 2 dB gain error and 20-degree phase error, relaxing IQM tolerance to 10 times that of existing designs. The DS reduces the ADC sampling rate to between 1/9 and 1/2 of that of existing designs, resulting in at least 43% ADC power saving. Measured power consumption is 31.2 mW at a 480 Mb/s data rate.

Proceedings ArticleDOI
24 Jul 2006
TL;DR: A design of an MPEG-2 and H.264/AVC video decoder is demonstrated in 0.18 µm CMOS; key design issues, including improving area and power efficiency, are discussed, and power dissipation is greatly lowered through architectural exploration.
Abstract: A design of an MPEG-2 and H.264/AVC video decoder is demonstrated in 0.18 µm CMOS [1]. The key design issues involved in this advanced IC are discussed, including improving area and power efficiency. Power dissipation is greatly lowered through the architectural exploration. Measurement results show that MPEG-2 and H.264/AVC real-time decoding of QCIF at 15 fps is achieved at 1.15 MHz with power dissipation of 108 µW and 125 µW, respectively, at 1 V supply voltage.

Journal ArticleDOI
TL;DR: Design challenges for low-power and dual-video standard requirements, especially in mobile applications, are highlighted and several low-power techniques targeted at achieving lower memory requirements and processing cycles are described and discussed.
Abstract: The objective of this article is to highlight design challenges for low-power and dual-video standard requirements, especially in mobile applications. Due to the advent of the newly announced H.264, a generic problem of standard incompatibility has appeared between H.264 and prevalent MPEG-x video standards, which must be resolved on both algorithmic and architectural levels. Furthermore, several low-power techniques targeted at achieving lower memory requirements and processing cycles are also described and discussed.

Proceedings Article
01 Jan 2006
TL;DR: In this paper, a design of an MPEG-2 and H.264/AVC video decoder is demonstrated in 0.18 µm CMOS, and the key design issues involved in this advanced IC are discussed, including improving area and power efficiency.
Abstract: A design of an MPEG-2 and H.264/AVC video decoder is demonstrated in 0.18 µm CMOS (1). The key design issues involved in this advanced IC are discussed, including improving area and power efficiency. Power dissipation is greatly lowered through the architectural exploration. Measurement results show that MPEG-2 and H.264/AVC real-time decoding of QCIF at 15 fps is achieved at 1.15 MHz with power dissipation of 108 µW and 125 µW, respectively, at 1 V supply voltage.

16 Apr 2006
TL;DR: This paper presents a multi-input multi-output (MIMO) multi-layered perceptron neural network with backpropagation algorithm (MLP/BP) that can recover severely distorted NRZ data as well as suppress intersymbol interference (ISI) and co-channel interference (CCI).
Abstract: This paper presents a multi-input multi-output (MIMO) multi-layered perceptron neural network with backpropagation algorithm (MLP/BP). The proposal is a waveform equalizer for distorted nonreturn-to-zero (NRZ) data recovery in band-limited channels with co-channel interference (CCI). The simulation results show that the proposed design can recover severely distorted NRZ data as well as suppress intersymbol interference (ISI) and co-channel interference. As a result, better performance than LMS DFEs is achieved in band-limited channels where the data rate is ten times the channel bandwidth.

Proceedings ArticleDOI
26 Apr 2006
TL;DR: This paper exploits a three-level memory hierarchy to break the data dependency and reduce the number of external memory accesses, and applies a line-pixel-lookahead (LPL) scheme to make a compromise between power consumption and internal memory cost.
Abstract: Memory storage is a crucial power factor in H.264/AVC video decoding systems. In this paper, we exploit a three-level memory hierarchy to break the data dependency and reduce the number of external memory accesses. Further, we apply a line-pixel-lookahead (LPL) scheme to make a compromise between power consumption and internal memory cost. Experimental results show that about 50% memory power reduction can be achieved as compared to comparable decoders without exploiting memory hierarchy (To Wei Chen et al., 2005 and Hu et al., 2004).

Proceedings ArticleDOI
04 Dec 2006
TL;DR: A new VLC-based algorithm for video decoding systems is introduced to reduce bit streams needed to decode symbol data and can be merged with all Huffman table-based VLC systems, such as MPEG-2 and H.264, for multi-mode video decoding applications.
Abstract: A new VLC-based algorithm for video decoding systems is introduced. This self-grouping algorithm is developed to reduce bit streams needed to decode symbol data. In addition, the proposal can be merged with all Huffman table-based VLC systems, such as MPEG-2 and H.264, for multi-mode video decoding applications. Simulation results show that the Huffman-based table size can be reduced by about 73% compared to the state-of-the-art design (Shieh et al., 2001).

Proceedings ArticleDOI
26 Apr 2006
TL;DR: In this paper, a general design concept of COFDM systems is reviewed and several key issues in SoC realization are highlighted, and a system-level design flow by taking into account both performance indices and hardware complexity is introduced.
Abstract: Coded Orthogonal Frequency Division Multiplexing (COFDM) technology has been widely accepted in many communication systems due to both bandwidth efficiency and robustness to channel distortion. This opens a great deal of opportunities for the SoC community to deal with design complexity by exploiting benefits of giga-scale integration. In this paper, we'll first review the general design concept of COFDM systems and then highlight several key issues in SoC realization. Then a system-level design flow that takes into account both performance indices and hardware complexity will be introduced. Several core modules related to COFDM systems will also be addressed to see how better solutions can be achieved, especially for wireless applications. Finally, two case studies on WLAN and OFDM-UWB will be discussed to demonstrate our proposals as well as to provide some directions for further research.

Proceedings ArticleDOI
26 Apr 2006
TL;DR: In this paper, the authors propose an equalizer for COFDM broadcasting systems that is optimized for hardware cost and power consumption without performance loss; compared with an existing design in a 0.18 µm process, the area of the proposed design is reduced to 9.5%.
Abstract: In a COFDM receiver, the full-time-operation equalizer is the first stage of data recovery. Since the performance of the equalizer will influence the overall system performance, the hardware cost and power consumption of the equalizer may not be the first issue in existing designs. In this paper, we propose an equalizer for COFDM broadcasting systems. This equalizer is optimized for hardware cost and power consumption without performance loss. Compared with an existing design in a 0.18 µm process, the area of the proposed design is reduced to 9.5% and the power consumption to 30.1% of those of a division-based equalizer.

Proceedings ArticleDOI
11 Sep 2006
TL;DR: A design of MPEG-2 and H.264/AVC video decoder is demonstrated in a 0.18 µm CMOS (Tsu-Ming Liu, 2006) and power dissipation is greatly lowered through the architectural exploration.
Abstract: A design of MPEG-2 and H.264/AVC video decoder is demonstrated in a 0.18 µm CMOS (Tsu-Ming Liu, 2006). The key design issues involved in this advanced IC are discussed, including improving area and power efficiency. Power dissipation is greatly lowered through the architectural exploration. Measurement results show that MPEG-2 and H.264/AVC real-time decoding of QCIF@15fps are achieved at 1.15 MHz with power dissipation of 108 µW and 125 µW respectively at 1 V supply voltage.


Patent
21 May 2006
TL;DR: In this article, a transmission method combining trellis coded modulation (TCM) and low-density parity-check (LDPC) codes, and a structure thereof, is proposed to improve transmission quality.
Abstract: This invention discloses a transmission method combining trellis coded modulation (TCM) and low-density parity-check (LDPC) codes, and a structure thereof. It employs LDPC codes, which have better error correction capability, together with the TCM technique to improve transmission quality. It defines TCM for different transmission speeds. In addition, TCM uses a lower state count to achieve better performance than the conventional spread spectrum technique. As a result, it reduces the hardware complexity of high-speed transmission applications.

Journal ArticleDOI
TL;DR: A context adaptive bit-plane coding (CABIC) with a stochastic bit reshuffling (SBR) scheme to deliver higher coding efficiency and better subjective quality for fine granular scalable (FGS) video coding.
Abstract: In this paper, we propose a context adaptive bit-plane coding (CABIC) with a stochastic bit reshuffling (SBR) scheme to deliver higher coding efficiency and better subjective quality for fine granular scalable (FGS) video coding. Traditional bit-plane coding in FGS algorithm suffers from poor coding efficiency and subjective quality. To improve coding efficiency, our CABIC constructs context models based on both the energy distribution in a block and the spatial correlations in the adjacent blocks. Moreover, it exploits the context across bit-planes to save side information. To improve subjective quality, our SBR reorders the coefficient bits by their estimated rate-distortion performance. Particularly, we model transform coefficients with Laplacian distributions and incorporate them into the context probability models for content-aware parameter estimation. Moreover, our SBR is implemented with a dynamic priority management that uses a low-complexity dynamic memory organization. Experimental results show that our CABIC improves the PSNR by 0.5 to 1.0 dB at medium and high bit rates. While maintaining similar or even higher coding efficiency, our SBR improves the subjective quality.

Proceedings ArticleDOI
01 Dec 2006
TL;DR: A new CAVLC decoding architecture with a soft-input design concept is proposed to localize the erroneous position at macroblock (MB) levels, and more than 1 dB of PSNR gain can be achieved at a bit error rate of 2.7 × 10⁻³.
Abstract: In this paper, a new CAVLC decoding architecture with a soft-input design concept is proposed. The soft-decision information is introduced to localize the erroneous position at macroblock (MB) levels. Specifically, the squared differences between the received soft streams and the decoded codewords are compared and the minimum is selected. The corrupted MBs can be detected early and thereby concealed from neighboring pixels. Therefore, more than 1 dB of PSNR gain can be achieved at a bit error rate of 2.7 × 10⁻³.
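
The minimum squared difference selection mentioned in the abstract above can be sketched as follows. This is an illustrative model only: the mapping of bit 1 to +1.0 and bit 0 to -1.0, the equal-length candidate codewords, and the function names are assumptions, and the candidate table is made up rather than taken from the CAVLC tables.

```python
def squared_difference(soft_bits, codeword):
    """Sum of squared differences between received soft values and the
    ideal levels of a candidate codeword (assumed mapping: '1' -> +1.0,
    '0' -> -1.0)."""
    return sum((soft - (1.0 if bit == "1" else -1.0)) ** 2
               for soft, bit in zip(soft_bits, codeword))


def pick_codeword(soft_bits, candidates):
    """Return the candidate with the minimum squared difference."""
    return min(candidates, key=lambda word: squared_difference(soft_bits, word))


# Hypothetical received soft values and a made-up table of valid codewords.
received = [0.9, -0.2, 0.1, -1.1]
valid_codewords = ["1000", "0110", "1011"]

best = pick_codeword(received, valid_codewords)
print(best, round(squared_difference(received, best), 2))  # 1000 1.87
```

In this example a hard decision on the received values would give `1010`, which is not in the table, while the soft comparison settles on the closest valid codeword. A large minimum difference could then serve as an indicator that the surrounding macroblock is corrupted, although the abstract does not state the exact detection rule.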