Showing papers by "Huizhu Jia" published in 2013


Journal Article
Chuang Zhu, Huizhu Jia, Shanghang Zhang, Xiaofeng Huang, Xiaodong Xie, Wen Gao
TL;DR: A full-featured RDO-based mode decision (MD) algorithm is presented, which makes more modes enter the RDO process; the throughput of the RDO-based MD pipeline is analyzed and modeled; and an adaptive block-level pipelining architecture is proposed that can achieve the highest throughput to alleviate the RDO burden.
Abstract: Rate distortion optimization (RDO) is the best known mode decision method, but its high implementation complexity limits its applications, and almost no real-time hardware encoder is truly full-featured RDO based. In this paper, first, a full-featured RDO-based mode decision (MD) algorithm is presented, which makes more modes enter the RDO process. Second, the throughput of the RDO-based MD pipeline is thoroughly analyzed and modeled. Third, a highly efficient adaptive block-level pipelining architecture of RDO-based MD for an AVS video encoder is proposed, which can achieve the highest throughput to alleviate the RDO burden. Our design is described in high-level Verilog/VHDL hardware description language and implemented in SMIC 0.18-μm CMOS technology with 232 K logic gates and 85 Kb SRAMs. The implementation results validate our architectural design, and the proposed architecture can support real-time processing of 1080P@30 fps. The coding efficiency of our adopted method far outperforms (0.57 dB PSNR gain on average) the traditional low-complexity MD (LCMD) methods, and the throughput of our designed pipeline is increased by 11.3%, 19% and 17% for I, P and B frames, respectively, compared with the existing RDO-based architecture.

5 citations


Proceedings Article
Xiaofeng Huang, Chuang Zhu, Lei Zhang, Kaijin Wei, Huizhu Jia, Don Xie, Wen Gao
15 Jul 2013
TL;DR: Experimental results show that the data access overhead cycles of the proposed interface design are reduced significantly and the bandwidth utilization is improved by up to 10% compared to the tile-linear address mapping scheme.
Abstract: This paper presents a highly efficient external memory interface architecture to improve memory bandwidth utilization for an AVS HD video encoder. Both burst and bank-interleaved SDRAM accesses are adopted in the memory interface design. Our proposed architecture is composed of an address mapping layer and an arbitration layer. In the address mapping layer, the clients in the encoder are divided into four groups according to their data request pattern and quantity, and each group is assigned to a different bank of the SDRAM. Within each group, efficient address mapping schemes are proposed to minimize intra-client overhead. In the arbitration layer, a straightforward group-based interleaved arbitration scheme is proposed to minimize inter-client overhead. Experimental results show that the data access overhead cycles of our proposed interface design are reduced significantly and the bandwidth utilization is improved by up to 10% compared to the tile-linear address mapping scheme.

2 citations


Proceedings Article
Yuan Li, Shanghang Zhang, Huizhu Jia, Xiaodong Xie, Wen Gao
19 May 2013
TL;DR: A novel binary arithmetic coder (BAC) architecture with throughput of 2~4 bins per cycle sufficient for real-time encoding is introduced and a hybrid context memory scheme is presented to meet the throughput requirement on the BAC.
Abstract: In this paper, we propose a high-throughput, low-latency arithmetic encoder (AE) design suitable for high definition (HD) real-time applications that employ advanced video coding standards such as H.264/AVC or AVS and use a macroblock (MB) level pipeline. First, in order to derive the performance requirement on the AE, the buffer model in connection with which it is designed is thoroughly analyzed. Then, using joint algorithm-architecture optimization and multi-bin processing techniques, we introduce a novel binary arithmetic coder (BAC) architecture with a throughput of 2~4 bins per cycle, sufficient for real-time encoding. Furthermore, a hybrid context memory scheme is presented to meet the throughput requirement on the BAC. Simulation results show that our design can support 1080p at 60 fps for AVS HDTV real-time coding with a bin rate up to 107K per MB line. Synthesized with the TSMC 0.13 μm technology, the AE can run at 200 MHz and costs 47.3K gates. Operating at 130 MHz, the design is also verified in an AVS HD encoder on a Xilinx Virtex-6 FPGA prototype board for 1080p at 30 fps.

1 citation


Book Chapter
Tongbing Cui, Chuang Zhu, Yangang Cai, Meng Li, Huizhu Jia, Don Xie, Wen Gao
13 Dec 2013
TL;DR: This paper analyzes the time consumption of the bottlenecks of the RDO-based MD, and proposes an efficient zigzag scanning and entropy coding architecture, which is realized in high-level Verilog/VHDL hardware description language and implemented in AVS encoder.
Abstract: The rate distortion optimization (RDO) technique is the best known mode decision method in recent video coding standards, such as H.264 and AVS. However, its unbearable computational burden limits its application. In the proposed block-level pipeline architecture of RDO-based MD, we find that zigzag scanning and entropy coding are the bottlenecks. In this paper, we first analyze the time consumption of the bottlenecks and then propose an efficient zigzag scanning and entropy coding architecture. Finally, our enhanced architecture is implemented in an AVS encoder. The experimental results show that throughput can be increased by 20% compared with 4-way parallel scanning and entropy coding. With the proposed architecture, real-time RDO-based MD processing of [email protected] can be supported. Our design is realized in high-level Verilog/VHDL hardware description language and implemented in SMIC 0.18 μm CMOS technology with 50K logic gates and 6 KB SRAMs at an operating frequency of 237 MHz.

1 citation


Book Chapter
Xianghu Ji, Jie Liu, Chuang Zhu, Huizhu Jia, Xiaodong Xie, Wen Gao
13 Dec 2013
TL;DR: The experimental results show that the proposed Binary Adaptive Luminance Mapping (BALM) algorithm achieves higher rate-distortion (RD) performance compared with the previously proposed bit reduction approach and PSNR degradation is relatively small when NTB.
Abstract: Integer Motion Estimation (IME) for block-based video coding introduces significant challenges in power consumption and silicon area usage with the adoption of more complex coding tools and higher resolutions. To address these problems, this paper proposes a Binary Adaptive Luminance Mapping (BALM) algorithm that exploits the local correlation in images, and gives a Very-Large-Scale Integration (VLSI) architecture for its implementation. We test the algorithm's performance with different Numbers of Truncated Bits (NTB). The experimental results show that our proposed BALM achieves higher rate-distortion (RD) performance compared with the previously proposed bit reduction approach, and that PSNR degradation is relatively small when NTB.5 using our scheme. The NTB4 BALM can achieve 37.3% silicon area saving and power consumption reduction with a PSNR loss of just 0.1 dB in our proposed IME architecture.