Proceedings Article

2-Dimensional systolic architecture for H.264/AVC variable block size motion estimation

01 Nov 2014, pp. 41-44
TL;DR: A new design for Full-Search (FS) Variable Block Size (VBS) Motion Estimation (ME) is presented in which the Sum of Absolute Differences (SAD) values are computed by re-using outputs; the design features a high operating frequency and reduced hardware complexity.
Abstract: Video coding is used for many multimedia purposes, such as video conferencing, digital storage media, Internet streaming and television broadcasting. This paper presents a new design for the implementation of Full-Search (FS) Variable Block Size (VBS) Motion Estimation (ME), which is a key component of video compression standards such as MPEG-1, MPEG-2, MPEG-4 Visual, H.261, H.263 and H.264. The FS algorithm is widely used to implement ME in video compression algorithms. The design is fully parametric in terms of block size, which is variable, and the Sum of Absolute Differences (SAD) values are computed by re-using the outputs. The design features high efficiency in terms of operating frequency and a reduction in hardware complexity. The architectures are designed in the Verilog Hardware Description Language (HDL) and their functionality is verified using the ModelSim simulator. Two designs, namely 1-D and 2-D systolic architectures, are analyzed in terms of frequency, gate count and total power. The designs are synthesized using the CADENCE RTL compiler with the TSMC 90 nm standard cell library. The operating frequency is 323.20 MHz for the 1-D design and 166.67 MHz for the 2-D design; the gate count is around 5k gates for the 1-D design and around 21k gates for the 2-D design. These designs can handle up to 41 motion vectors.
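The full-search SAD matching described in the abstract can be illustrated in software. The following is a minimal Python sketch, not the paper's Verilog design: the function names `sad` and `full_search`, the frame layout (lists of pixel rows) and the symmetric `search` range are assumptions made for illustration only.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search):
    """Evaluate every displacement in [-search, +search] in both directions
    (skipping those that leave the reference frame) and return
    (best_sad, (dx, dy)) for the smallest SAD found."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= by + dy and by + dy + n <= h and
                      0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Because `n` is a parameter, the same routine covers any square block size; a hardware design would evaluate many of these differences in parallel rather than in nested loops.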
References
Proceedings Article
25 May 2003
TL;DR: A new hardware architecture for variable block size motion estimation with full search at integer-pixel accuracy is proposed and can achieve real-time applications under the operating frequency of 64.11 MHz for 720×480 frame at 30 Hz.
Abstract: Variable block size motion estimation is adopted in the new video coding standard, MPEG-4 AVC/JVT/ITU-T H.264, due to its superior performance compared to the advanced prediction mode in MPEG-4 and H.263+. In this paper, we modified the reference software in a hardware-friendly way. Our main idea is to convert the sequential processing of each 8×8 sub-partition of a macro-block into parallel processing without sacrifice of video quality. Based on our algorithm, we proposed a new hardware architecture for variable block size motion estimation with full search at integer-pixel accuracy. The features of our design are 2-D processing element array with 1-D data broadcasting and 1-D partial result reuse, parallel adder tree, memory interleaving scheme, and high utilization. Simulation shows that our chip can achieve real-time applications under the operating frequency of 64.11 MHz for 720×480 frame at 30 Hz with search range of [-24, +23] in horizontal direction and [-16, +15] in vertical direction, which requires the computation power of more than 50 GOPS.
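The scale of the computation quoted above can be checked with a rough count. The Python sketch below counts pixel-level absolute differences per second for the stated frame size, frame rate and search range; the helper name `abs_diffs_per_second` is hypothetical, and how the abstract counts an "operation" is not stated, so the factor of three used afterwards is an assumption.

```python
def abs_diffs_per_second(width, height, fps, h_range, v_range):
    """Number of pixel absolute differences per second for a full search.
    Every pixel of every frame is compared once per candidate displacement."""
    h_lo, h_hi = h_range
    v_lo, v_hi = v_range
    candidates = (h_hi - h_lo + 1) * (v_hi - v_lo + 1)
    return width * height * fps * candidates


diffs = abs_diffs_per_second(720, 480, 30, (-24, 23), (-16, 15))
# Assumption: one subtraction, one absolute value and one accumulation
# per pixel comparison, i.e. three primitive operations.
ops = 3 * diffs
```

Under these assumptions `diffs` is about 16 billion per second and `ops` about 48 billion per second, which is the same order of magnitude as the figure given in the abstract; the comparisons and the additions that combine partial SADs are not included in this count.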

135 citations

Proceedings Article
18 Jan 2005
TL;DR: The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE and allow real-time processing of 4CIF (704×576) video with 15 fps at 100 MHz for a search range of [-32, +31].
Abstract: We describe a fast VLSI architecture for full-search motion estimation for the blocks with 7 different sizes in MPEG-4 AVC/H.264. The proposed variable block size motion estimation (VBSME) architecture consists of a 16×16 PE array, an adder tree and comparators to find all 41 motion vectors and their minimum SADs for the blocks of 16×16, 16×8, 8×16, 8×8, 8×4, 4×8 and 4×4. It employs a 2D datapath and its control of the search area data is simple and regular. The proposed VBSME can achieve 100% PE utilization by employing a preload register and a search data buffer inside each PE and allow real-time processing of 4CIF (704×576) video with 15 fps at 100 MHz for a search range of [-32, +31].
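The real-time claim above can be sanity-checked with a cycle budget. The Python sketch below is only an illustration: it assumes that the 16×16 PE array evaluates one candidate position per clock cycle at full utilization, which the abstract does not state explicitly, and the helper names are hypothetical.

```python
def cycles_per_macroblock(width, height, fps, clock_hz):
    """Clock cycles available per 16x16 macroblock at the given frame rate."""
    macroblocks_per_frame = (width // 16) * (height // 16)
    return clock_hz / (macroblocks_per_frame * fps)


def candidate_positions(lo, hi):
    """Number of displacements in a square search range [lo, hi] x [lo, hi]."""
    side = hi - lo + 1
    return side * side


budget = cycles_per_macroblock(704, 576, 15, 100_000_000)
needed = candidate_positions(-32, 31)
```

With these inputs there are 1584 macroblocks per frame, so `budget` is a little over 4208 cycles per macroblock, while `needed` is 4096 candidate positions. Under the one-candidate-per-cycle assumption the search fits in the budget with roughly a hundred cycles to spare for preload and control.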

96 citations

Journal Article
TL;DR: A novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME) that has lower latency and higher throughput than other existing VBSME architectures for the hardware implementation of H.264 encoders.
Abstract: This paper proposes a novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME). The architecture is able to perform a full motion search on integral multiples of 4×4 block sizes. To use the architecture, each 16×16 macroblock of the source frames should be partitioned into sixteen 4×4 non-overlapping subblocks, called primitive subblocks. The architecture contains sixteen modules and one VBSME processor. Each module, realized by cascading 1-D systolic arrays, is responsible for the block-matching operations of a different primitive subblock. The realization has the advantages of high throughput, high flexibility and 100% processing element (PE) utilization. The motion estimation of all the primitive subblocks is performed in parallel. Because these primitive subblocks can be used to form the 41 subblocks of different sizes specified by H.264, the VBSME processor is employed to concurrently compute the sums of absolute differences (SADs) of all the 41 subblocks from the SADs of the primitive subblocks. This new architecture has lower latency and higher throughput than other existing VBSME architectures for the hardware implementation of H.264 encoders.
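The way the 41 subblock SADs follow from the sixteen primitive 4×4 SADs can be sketched in a few lines. The Python function below is an illustrative software model for a single candidate displacement, not the paper's VBSME processor; the name `merge_sads` and the 4×4 grid layout of the input are assumptions.

```python
def merge_sads(s):
    """`s[r][c]` is the SAD of the primitive 4x4 subblock in row r, column c
    of a 16x16 macroblock. Returns a dict mapping (width, height) to the list
    of SADs for every partition of that size (41 values in total)."""
    out = {}
    out[(4, 4)] = [s[r][c] for r in range(4) for c in range(4)]
    # Two horizontally adjacent 4x4 blocks form an 8x4 block.
    out[(8, 4)] = [s[r][c] + s[r][c + 1] for r in range(4) for c in (0, 2)]
    # Two vertically adjacent 4x4 blocks form a 4x8 block.
    out[(4, 8)] = [s[r][c] + s[r + 1][c] for r in (0, 2) for c in range(4)]
    # Four 4x4 blocks form an 8x8 block.
    out[(8, 8)] = [s[r][c] + s[r][c + 1] + s[r + 1][c] + s[r + 1][c + 1]
                   for r in (0, 2) for c in (0, 2)]
    q = out[(8, 8)]  # order: top-left, top-right, bottom-left, bottom-right
    out[(16, 8)] = [q[0] + q[1], q[2] + q[3]]   # top half, bottom half
    out[(8, 16)] = [q[0] + q[2], q[1] + q[3]]   # left half, right half
    out[(16, 16)] = [sum(q)]
    return out
```

The partition counts are 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41. A motion estimator would apply this merge at every candidate position and keep, for each of the 41 subblocks, the displacement with the smallest SAD.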

75 citations

Journal Article
TL;DR: A low-power full-search block matching (FSBM) motion-estimation design for the ITU-T recommendation H.263+ standard is proposed, which can deal with 8×8 and 16×16 block size with different searching ranges.
Abstract: In this paper, a low-power full-search block matching (FSBM) motion-estimation design for the ITU-T recommendation H.263+ standard is proposed. New motion-estimation modes in H.263+ can be fully supported by our architecture. Unlike most previously presented motion-estimation chips, this design can deal with 8×8 and 16×16 block size with different searching ranges. Basically, the proposed architecture is composed of an integer pixel unit with 64 processing elements, and a half-pixel unit with interpolation, a control unit, and data registers. In order to minimize power consumption, gated-clock and dual-supply voltages are used. This design has been realized by TSMC 0.6 μm SPTM CMOS technology. The power consumption is 423.8 mW at 60 MHz and the throughput is 36 fps in CIF format.

52 citations

Journal Article
TL;DR: A flexible and powerful VLSI architecture for the implementation of a wide spectrum of full search and reduced complexity search block matching algorithms is presented and a full-custom (CMOS) implementation is described.
Abstract: A flexible and powerful VLSI architecture for the implementation of a wide spectrum of full search and reduced complexity search block matching algorithms is presented. Optimized efficiency for variable algorithm parameters is obtained by using a quadratic systolic array architecture with global accumulation, combined with a flexible meander-like data flow. Flexibility is further increased by cascadability and/or the possibility of parallel operation. Hardware overhead for particular algorithmic requirements, such as variable pixel resolution, subsampling with offset, and subpixel accuracy, is discussed in detail. A full-custom (CMOS) implementation for the architecture is described.

49 citations