
Tung-Chien Chen

Bio: Tung-Chien Chen is an academic researcher from National Taiwan University. The author has contributed to research in topics including encoders and motion estimation. The author has an h-index of 22 and has co-authored 62 publications receiving 2,323 citations.

Papers published on a yearly basis (chart; years shown: 2003–2014, 2018)

Papers

Journal Article

Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder

Yu-Wen Huang¹, Bing-Yu Hsieh¹, Tung-Chien Chen¹, Liang-Gee Chen¹

National Taiwan University¹

01 Mar 2005-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: This paper proposed two solutions for platform-based design of H.264/AVC intra frame coder with comprehensive analysis of instructions and exploration of parallelism, and proposed a system architecture with four-parallel intra prediction and mode decision to enhance the processing capability.

Abstract: Intra prediction with rate-distortion constrained mode decision is the most important technology in the H.264/AVC intra frame coder, which is competitive with the latest image coding standard, JPEG2000, in terms of both coding performance and computational complexity. The predictor generation engine for intra prediction and the transform engine for mode decision are critical because these operations require a large amount of memory access and occupy 80% of the computation time of the entire intra compression process. A low-cost general-purpose processor cannot process these operations in real time. In this paper, we propose two solutions for platform-based design of an H.264/AVC intra frame coder. One solution is a software implementation targeted at low-end applications. Context-based decimation of unlikely candidates, subsampling of matching operations, bit-width truncation to reduce computation, and an interleaved full-search/partial-search strategy to stop error propagation and maintain image quality are proposed and combined as our fast algorithm. Experimental results show that our method reduces the computation used for intra prediction and mode decision by 60% while keeping the peak signal-to-noise ratio degradation below 0.3 dB. The other solution is a hardware accelerator targeted at high-end applications. After a comprehensive analysis of instructions and exploration of parallelism, we propose a system architecture with four-parallel intra prediction and mode decision to enhance the processing capability. Hadamard-based mode decision is modified into a discrete cosine transform-based version to reduce memory access by 40%. Two-stage macroblock pipelining is also proposed to double the processing speed and hardware utilization. Other features of our design are a reconfigurable predictor generator supporting all 13 intra prediction modes, a parallel multitransform and inverse transform engine, and a CAVLC bitstream engine.
A prototype chip is fabricated with TSMC 0.25-µm CMOS 1P5M technology. Simulation results show that our implementation can process 16 megapixels (4096×4096) within 1 s, that is, 720×480 4:2:0 30-Hz video in real time, at an operating frequency of 54 MHz. The transistor count is 429K, and the core size is only 1.855×1.885 mm².
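In Hadamard-based mode decision, each candidate intra prediction is scored by the sum of absolute Hadamard-transformed differences (SATD) between the original block and the prediction. The minimal Python/NumPy sketch below computes that cost for one 4×4 block and picks the cheapest candidate. It is illustrative only: it is not the authors' hardware or their DCT-based variant, it omits the rate term (λ·R) that a full rate-distortion cost would add, and the names `satd4x4` and `best_mode` are hypothetical.

```python
import numpy as np

# 4x4 Hadamard matrix: entries are +/-1 and the rows are mutually orthogonal
H4 = np.array([
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
], dtype=np.int64)


def satd4x4(original, prediction):
    """Sum of absolute Hadamard-transformed differences for one 4x4 block.

    No normalization is applied; encoders often scale this value, which does
    not change which mode has the lowest cost.
    """
    diff = np.asarray(original, dtype=np.int64) - np.asarray(prediction, dtype=np.int64)
    coeffs = H4 @ diff @ H4.T  # 2-D Hadamard transform of the residual
    return int(np.abs(coeffs).sum())


def best_mode(original, predictions):
    """Return (mode, costs) for the candidate with the lowest SATD.

    `predictions` maps a mode label (hypothetical) to its 4x4 predicted block.
    """
    costs = {mode: satd4x4(original, pred) for mode, pred in predictions.items()}
    return min(costs, key=costs.get), costs
```

Because the Hadamard matrix contains only ±1 entries, the transform needs only additions and subtractions, which makes SATD a cheap distortion estimate in hardware.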

331 citations

Journal Article

Analysis and architecture design of an HDTV720p 30 frames/s H.264/AVC encoder

Tung-Chien Chen¹, Shao-Yi Chien¹, Yu-Wen Huang¹, Chen-Han Tsai¹, Ching-Yeh Chen¹, To-Wei Chen¹, Liang-Gee Chen¹

National Taiwan University¹

01 Sep 2006-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The four-stage macroblock pipelined system architecture is proposed with an efficient scheduling and memory hierarchy, and the prototype chip of the efficient H.264/AVC video encoder for HDTV applications is implemented.

Abstract: H.264/AVC significantly outperforms previous video coding standards with many new coding tools. However, the better performance comes at the price of extremely high computational complexity and memory access requirements, which makes it difficult to design a hardwired encoder for real-time applications. In addition, because the essential algorithms in H.264/AVC are complex, sequential, and highly data dependent, the use of both pipelining and parallel processing is constrained. Hardware utilization and throughput are further reduced by the block-, MB-, and frame-level reconstruction loops. In this paper, we describe our techniques to design an H.264/AVC video encoder for HDTV applications. At the system design level, considering the characteristics of the key components and the reconstruction loops, we first propose a four-stage macroblock pipelined system architecture with an efficient schedule and memory hierarchy. At the module design level, the design considerations of the significant modules are addressed, followed by their hardware architectures, including low-bandwidth integer motion estimation, parallel fractional motion estimation, a reconfigurable intra predictor generator, a dual-buffer block-pipelined entropy coder, and a deblocking filter. With these techniques, the prototype chip of the efficient H.264/AVC encoder is implemented with 922.8K logic gates and 34.72 KB of SRAM at a 108-MHz operating frequency.
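A quick way to see what a macroblock pipeline at 108 MHz must achieve for 720p at 30 frames/s is to compute the cycle budget per macroblock. The sketch below is our own back-of-the-envelope arithmetic, not a figure from the paper; it assumes 16×16 luma macroblocks and ignores blanking and pipeline fill/drain overhead.

```python
MB_SIZE = 16  # H.264/AVC macroblock width/height in luma samples


def macroblocks_per_second(width, height, fps):
    """Macroblocks that must be encoded per second (partial MBs rounded up)."""
    mbs_wide = (width + MB_SIZE - 1) // MB_SIZE
    mbs_high = (height + MB_SIZE - 1) // MB_SIZE
    return mbs_wide * mbs_high * fps


def cycles_per_macroblock(clock_hz, width, height, fps):
    """Clock cycles available per macroblock at a given clock rate."""
    return clock_hz / macroblocks_per_second(width, height, fps)


if __name__ == "__main__":
    # 720p at 30 frames/s on a 108-MHz clock
    print(macroblocks_per_second(1280, 720, 30))               # 108000
    print(cycles_per_macroblock(108_000_000, 1280, 720, 30))   # 1000.0
```

In a macroblock-pipelined design, every stage must therefore finish its work on one macroblock within roughly 1,000 cycles. The same helper gives about 1,333 cycles per macroblock for 720×480 video at 30 Hz on a 54-MHz clock.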

295 citations

Journal Article

Analysis and architecture design of variable block-size motion estimation for H.264/AVC

Ching-Yeh Chen¹, Shao-Yi Chien¹, Yu-Wen Huang¹, Tung-Chien Chen¹, Tu-Chih Wang¹, Liang-Gee Chen¹

National Taiwan University¹

27 Mar 2006-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: Two hardware architectures are proposed that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead compared to previous approaches and an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation is proposed.

Abstract: Variable block-size motion estimation (VBSME) has become an important video coding technique, but it increases the difficulty of hardware design. In this paper, we use inter-/intra-level classification and various data flows to analyze the impact of supporting VBSME in different hardware architectures. Furthermore, we propose two hardware architectures that can support traditional fixed block-size motion estimation as well as VBSME with less chip area overhead than previous approaches. By broadcasting reference pixel rows and propagating partial sums of absolute differences (SADs), the first design has fewer reference pixel registers and a shorter critical path. The second design uses a two-dimensional distortion array and one adder tree with a reference buffer that maximizes data reuse between successive search candidates. The first design is suitable for low resolutions or small search ranges, and the second design has the advantage of supporting a high degree of parallelism and VBSME. Finally, we propose an eight-parallel SAD tree with a shared reference buffer for H.264/AVC integer motion estimation (IME). Its processing capability is eight times that of a single SAD tree, but the reference buffer size is only doubled. Moreover, the most critical issue of H.264 IME, the huge memory bandwidth, is overcome: we save 99.9% of off-chip memory bandwidth and 99.22% of on-chip memory bandwidth. We demonstrate a 720p, 30-fps solution at 108 MHz with a 330.2K gate count and 208 Kbits of on-chip memory.
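The partial-SAD reuse behind VBSME hardware can be illustrated in software: for each search candidate, compute the sixteen 4×4 SADs once, then add them to obtain the SADs of every larger H.264 partition (41 in total across the seven block sizes). The NumPy sketch below assumes a 16×16 current macroblock and a same-sized reference candidate; it shows the data reuse only and does not model the paper's broadcasting or adder-tree architectures. Block sizes are written width×height, and the function names are hypothetical.

```python
import numpy as np


def sad_4x4_grid(cur, ref):
    """SAD of each 4x4 sub-block of a 16x16 macroblock, returned as a 4x4 array."""
    diff = np.abs(np.asarray(cur, dtype=np.int64) - np.asarray(ref, dtype=np.int64))
    # Axes become (block_row, row_in_block, block_col, col_in_block);
    # summing axes 1 and 3 adds up the 16 pixels inside each 4x4 block.
    return diff.reshape(4, 4, 4, 4).sum(axis=(1, 3))


def vbsme_sads(cur, ref):
    """Reuse the 16 4x4 SADs to derive all 41 partition SADs (width x height)."""
    s4x4 = sad_4x4_grid(cur, ref)            # 16 blocks
    s8x4 = s4x4[:, 0::2] + s4x4[:, 1::2]     # merge horizontal pairs: 8 blocks
    s4x8 = s4x4[0::2, :] + s4x4[1::2, :]     # merge vertical pairs: 8 blocks
    s8x8 = s8x4[0::2, :] + s8x4[1::2, :]     # 4 blocks
    s16x8 = s8x8[:, 0] + s8x8[:, 1]          # top and bottom halves: 2 blocks
    s8x16 = s8x8[0, :] + s8x8[1, :]          # left and right halves: 2 blocks
    s16x16 = s8x8.sum()                      # whole macroblock: 1 block
    return {"4x4": s4x4, "8x4": s8x4, "4x8": s4x8, "8x8": s8x8,
            "16x8": s16x8, "8x16": s8x16, "16x16": s16x16}
```

In hardware the same additions are performed by adder trees, so supporting all seven block sizes costs extra adders rather than extra pixel-level SAD computations.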

269 citations

Proceedings Article

A 1.3TOPS H.264/AVC single-chip encoder for HDTV applications

Yu-Wen Huang¹, Tung-Chien Chen¹, Chen-Han Tsai¹, Ching-Yeh Chen¹, To-Wei Chen¹, Chi-Shi Chen, Chun-Fu Shen, Shyh-Yih Ma, Tu-Chih Wang, Bing-Yu Hsieh, Hung-Chi Fang, Liang-Gee Chen

National Taiwan University¹

29 Aug 2005

TL;DR: An H.264/AVC encoder is implemented on a 31.72 mm² die with 0.18-µm CMOS technology, and the encoded video quality is competitive with reference software requiring 3.6 TOPS on a general-purpose processor-based platform.

Abstract: An H.264/AVC encoder is implemented on a 31.72 mm² die with 0.18-µm CMOS technology. A four-stage macroblock pipelined architecture encodes 720p 30-frames/s HDTV video in real time at 108 MHz. The encoded video quality is competitive with reference software requiring 3.6 TOPS on a general-purpose processor-based platform.

142 citations

Proceedings Article

Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC

Tung-Chien Chen¹, Yu-Wen Huang¹, Liang-Gee Chen¹

National Taiwan University¹

17 May 2004

TL;DR: A new VLSI architecture for fractional motion estimation in the H.264/AVC video compression standard is contributed; it is reusable and can support different specifications, multiple standards, fast algorithms, and various cost considerations.

Abstract: We present a new VLSI architecture for fractional motion estimation in the H.264/AVC video compression standard. Seven inter-related loops extracted from the complex procedure are analyzed, and two decomposition techniques are proposed to parallelize the algorithm for hardware with a regular schedule and full utilization. The proposed architecture is also reusable: it can support different specifications, multiple standards, fast algorithms, and various cost considerations. H.264/AVC Baseline Profile Level 3 with complete Lagrangian mode decision can be realized with 290K gates at an operating frequency of 100 MHz. It is a useful intellectual property (IP) design for platform-based multimedia systems.
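Fractional motion estimation first needs half-sample reference values. H.264/AVC derives luma half-sample positions with the 6-tap filter (1, -5, 20, 20, -5, 1), followed by adding 16, a right shift by 5, and clipping to the sample range. The sketch below applies that filter along one row of 8-bit samples; border padding, vertical and diagonal half-sample positions, and quarter-sample averaging are omitted, and the function names are hypothetical. It shows the arithmetic an FME datapath must compute, not the paper's architecture.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC luma half-sample filter taps


def clip1(value, bit_depth=8):
    """Clip to the valid sample range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def half_sample_row(row, bit_depth=8):
    """Half-sample values between row[i] and row[i + 1] wherever the filter fits.

    The filter needs two samples to the left and three to the right, so border
    positions are skipped here (a real encoder pads the reference picture instead).
    """
    out = []
    for i in range(2, len(row) - 3):
        acc = sum(tap * row[i - 2 + k] for k, tap in enumerate(TAPS))
        out.append(clip1((acc + 16) >> 5, bit_depth))  # round, divide by 32, clip
    return out
```

On a linear ramp the filter returns the exact midpoint, while sharp edges can overshoot, which is why the clipping step is required.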

128 citations

Cited by

Proceedings Article

Understanding sources of inefficiency in general-purpose chips

Rehan Hameed¹, Wajahat Qadeer¹, Megan Wachs¹, Omid Azizi¹, Alex Solomatnikov, Benjamin C. Lee¹, Stephen Richardson¹, Christos Kozyrakis¹, Mark Horowitz¹

Stanford University¹

19 Jun 2010

TL;DR: The sources of performance and energy overheads in general-purpose processing systems are explored by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system, and methods to eliminate these overheads are explored by transforming the CPU into a specialized system for H.264 encoding.

Abstract: Due to their high volume, general-purpose processors, and now chip multiprocessors (CMPs), are much more cost effective than ASICs, but lag significantly in terms of performance and energy efficiency. This paper explores the sources of these performance and energy overheads in general-purpose processing systems by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system. It then explores methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding. We evaluate the gains from customizations useful to broad classes of algorithms, such as SIMD units, as well as those specific to particular computation, such as customized storage and functional units. The ASIC is 500x more energy efficient than our original four-processor CMP. Broadly applicable optimizations improve performance by 10x and energy by 7x. However, the very low energy costs of actual core ops (100s fJ in 90nm) mean that over 90% of the energy used in these solutions is still "overhead". Achieving ASIC-like performance and efficiency requires algorithm-specific optimizations. For each sub-algorithm of H.264, we create a large, specialized functional unit that is capable of executing 100s of operations per instruction. This improves performance and energy by an additional 25x and the final customized CMP matches an ASIC solution's performance within 3x of its energy and within comparable area.

460 citations

Journal Article

On the computational complexity of the empirical mode decomposition algorithm

Yung-Hung Wang¹, Chien-Hung Yeh¹, Hsu Wen Vincent Young¹, Kun Hu², Men Tzung Lo¹

National Central University¹, Brigham and Women's Hospital²

15 Apr 2014-Physica A: Statistical Mechanics and its Applications

TL;DR: This study proves that the time complexity of the EMD/EEMD is actually equivalent to that of the Fourier Transform.

Abstract: It has been claimed that the empirical mode decomposition (EMD) and its improved version, the ensemble EMD (EEMD), are computationally intensive. In this study, we prove that the time complexity of EMD/EEMD, which has never been analyzed before, is actually equivalent to that of the Fourier transform. Numerical examples are presented to verify that EMD/EEMD is, in fact, a computationally efficient method.

324 citations

Proceedings Article

Fast mode decision algorithm for intra prediction in HEVC

Liang Zhao¹, Li Zhang², Siwei Ma², Debin Zhao¹

Harbin Institute of Technology¹, Peking University²

29 Dec 2011

TL;DR: Experimental results show that the fast intra mode decision scheme provides 20% and 28% time savings on average in the intra high-efficiency and low-complexity cases, respectively, with negligible loss of coding efficiency.

...read moreread less

Abstract: As the next-generation video coding standard, High Efficiency Video Coding (HEVC) is intended to provide significantly better coding efficiency than all existing video coding standards. To improve the coding efficiency of intra frame coding, up to 34 intra prediction modes are defined in HEVC. The best of these predefined intra prediction modes is selected by rate-distortion optimization (RDO) for each block. Testing all directions in the RDO process is very time-consuming. To alleviate the encoder computation load, this paper proposes a new method to reduce the number of candidates in the RDO process. In addition, the direction information of neighboring blocks is fully exploited to speed up intra mode decision. Experimental results show that the proposed scheme provides 20% and 28% time savings on average in the intra high-efficiency and low-complexity cases, respectively, compared to the default encoding scheme in HM 1.0, with almost the same coding efficiency. This algorithm has been proposed to the HEVC standard and partially adopted into the HEVC test model.
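The candidate-reduction idea can be sketched as follows: rank all modes by a cheap rough cost, keep only the best few for full RDO, and add the modes already used by neighboring blocks. In this Python sketch the rough cost values, the value of `n_keep`, and the integer mode labels are assumptions for illustration; the abstract does not state the paper's exact selection rule, and the function name is hypothetical.

```python
def reduced_rdo_candidates(rough_costs, neighbor_modes, n_keep):
    """Choose which intra modes go through the full (expensive) RDO check.

    rough_costs: dict mapping a mode label to a low-complexity cost
                 (for example SATD plus a mode-bit penalty; assumed here).
    neighbor_modes: modes of already-coded neighboring blocks.
    n_keep: number of lowest-cost modes to keep (an assumed tuning parameter).
    """
    # Sort by rough cost; ties are broken by mode label for determinism
    ranked = sorted(rough_costs, key=lambda m: (rough_costs[m], m))
    candidates = ranked[:n_keep]
    # Neighboring directions are likely to be good, so always test them too
    for mode in neighbor_modes:
        if mode in rough_costs and mode not in candidates:
            candidates.append(mode)
    return candidates


if __name__ == "__main__":
    costs = {0: 50, 1: 40, 10: 30, 26: 35, 18: 90}
    print(reduced_rdo_candidates(costs, neighbor_modes=[18, 1], n_keep=2))  # [10, 26, 18, 1]
```

The time saving comes from running full RDO on a handful of modes instead of all of them; the neighbor modes act as a safety net against discarding the true best direction.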

311 citations

Journal Article

Fast CU Splitting and Pruning for Suboptimal CU Partitioning in HEVC Intra Coding

Seung-Hyun Cho¹, Munchurl Kim¹

KAIST¹

01 Sep 2013-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: A fast CU splitting and pruning method is presented for HEVC intra coding, which allows for significant reduction in computational complexity with small degradations in rate-distortion (RD) performance.

Abstract: High Efficiency Video Coding (HEVC), a new video coding standard currently being established, adopts a quadtree-based coding unit (CU) block partitioning structure that can flexibly adapt to various texture characteristics of images. However, the need to find the best CU partitions causes a dramatic increase in computational complexity compared to previous video coding standards. In this paper, a fast CU splitting and pruning method is presented for HEVC intra coding, which allows for a significant reduction in computational complexity with small degradations in rate-distortion (RD) performance. The proposed method is performed in two complementary steps: 1) early CU split decision and 2) early CU pruning decision. For CU blocks, the early CU splitting and pruning tests are performed at each CU depth level according to a Bayes decision rule based on low-complexity RD costs and full RD costs, respectively. The statistical parameters for the early CU split and pruning tests are periodically updated on the fly for each CU depth level to cope with varying signal characteristics. Experimental results show that the proposed method reduces the encoding time of the current HM by about 50% with only a 0.6% increase in BD rate.
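A Bayes decision rule for early CU splitting compares, for an observed cost x, the prior-weighted likelihoods of the "split" and "not split" classes and picks the larger. The sketch below assumes Gaussian class-conditional densities of a low-complexity RD cost, with priors and Gaussian parameters updated online (one classifier instance per CU depth level); these modeling choices and all class names are our assumptions, not details taken from the paper. When too few samples have been seen, it returns `None` so the encoder can fall back to the full RD check.

```python
import math


class RunningGaussian:
    """Online mean/variance estimate using Welford's algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x):
        var = self.m2 / self.n if self.n > 1 else 1.0
        var = max(var, 1e-9)  # guard against zero variance
        return math.exp(-(x - self.mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


class EarlySplitClassifier:
    """Bayes decision between 'split' and 'not split' for one CU depth level (assumed model)."""

    def __init__(self, min_samples=2):
        self.stats = {True: RunningGaussian(), False: RunningGaussian()}
        self.min_samples = min_samples

    def decide_split(self, cost):
        n_split, n_keep = self.stats[True].n, self.stats[False].n
        if n_split < self.min_samples or n_keep < self.min_samples:
            return None  # not enough statistics yet: use the full RD check
        prior_split = n_split / (n_split + n_keep)
        score_split = prior_split * self.stats[True].pdf(cost)
        score_keep = (1 - prior_split) * self.stats[False].pdf(cost)
        return score_split > score_keep

    def record(self, cost, was_split):
        """Update statistics with the outcome of a fully evaluated CU."""
        self.stats[was_split].update(cost)
```

Recording outcomes from CUs that went through the full RD check keeps the class statistics current, which mirrors the abstract's point that the parameters are updated on the fly to follow changing signal characteristics.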

306 citations