Proceedings ArticleDOI

High speed architecture for Variable Block Size Motion Estimation in H.264

25 Mar 2013-pp 131-134
TL;DR: This paper develops a new architecture for variable block size motion estimation using the full search algorithm, in which the Sum of Absolute Differences (SAD) is computed by reusing the outputs of smaller sub-block calculations.
Abstract: With the arrival of the latest video standards, viz. MPEG-4 Part 10 and H.264/H.26L, the use of Advanced Video Coding (AVC), especially Variable Block Size (VBS) Motion Estimation (ME), is rising. This paper develops a new architecture for variable block size motion estimation using the full search algorithm. The block size is variable, and the Sum of Absolute Differences (SAD) for each block size is computed by reusing the outputs of smaller sub-block calculations. Every processing element incorporates a shuffling mechanism. The HDL design is functionally verified in the ModelSim simulator and implemented in TSMC 90nm CMOS technology. The motion estimation block operates at 323.20 MHz and can process up to 41 Motion Vectors (MV).
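As a rough software illustration of the full search with SAD matching that the abstract refers to, the following Python sketch tests every displacement inside a search window and keeps the one with the lowest SAD. It is not the paper's hardware design; the function names `sad` and `full_search`, the list-of-lists frame layout, and the `search_range` parameter are assumptions made only for this example.

```python
def sad(cur, ref, bx, by, dx, dy, w, h):
    """Sum of absolute differences between the w x h block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(h):
        for x in range(w):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, w, h, search_range):
    """Exhaustively test every displacement in the square window
    [-search_range, +search_range] that keeps the candidate block inside
    `ref`. Returns (best_sad, (dx, dy)), or None if no candidate fits."""
    rows, cols = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + w > cols or y0 + h > rows:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, w, h)
            # Strict comparison keeps the first candidate on ties.
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

The cost of this exhaustive loop grows with the square of the search range, which is one reason full search is commonly moved into dedicated hardware.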
Citations
Proceedings ArticleDOI
01 Nov 2014
TL;DR: A new design for the implementation of Full-Search (FS) Variable Block Size (VBS) Motion Estimation (ME) is presented, in which the Sum of Absolute Differences (SAD) is computed by re-using the outputs; the design features high efficiency in terms of operating frequency and reduced hardware complexity.
Abstract: Video coding is used for many multimedia purposes, such as video conferencing, digital storage media, Internet streaming and television broadcasting. This paper presents a new design for the implementation of Full-Search (FS) Variable Block Size (VBS) Motion Estimation (ME), which is a key component of video compression standards such as MPEG-1, MPEG-2, MPEG-4 Visual, H.261, H.263 and H.264. The FS algorithm is widely used to implement ME in video compression algorithms. The design is fully parametric in terms of block size, which is variable, and the Sum of Absolute Differences (SAD) is computed by re-using the outputs. The design features high efficiency in terms of operating frequency and reduced hardware complexity. The architectures are designed in Verilog Hardware Description Language (HDL) and their functionality is verified using the ModelSim simulator. Two different designs, namely 1-D and 2-D systolic architectures, are analyzed in terms of frequency, gate count and total power. The design is synthesized using the CADENCE RTL compiler with the TSMC 90nm standard cell library. The operating frequency is 323.20 MHz for the 1-D design and 166.67 MHz for the 2-D design, the gate count is around 5k for 1-D and around 21k for 2-D, and both designs can process up to 41 Motion Vectors.
References
Journal ArticleDOI
TL;DR: A novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME) that has lower latency and higher throughput than other existing VBSME architectures for the hardware implementation of H.264 encoders.
Abstract: This paper proposes a novel flexible VLSI architecture for the implementation of variable block size motion estimation (VBSME). The architecture is able to perform a full motion search on integral multiples of 4×4 block sizes. To use the architecture, each 16×16 macroblock of the source frames should be partitioned into sixteen 4×4 non-overlapping subblocks, called primitive subblocks. The architecture contains sixteen modules and one VBSME processor. Each module, realized by cascading 1-D systolic arrays, is responsible for the block-matching operations of a different primitive subblock. The realization has the advantages of high throughput, high flexibility and 100% processing element (PE) utilization. The motion estimation of all the primitive subblocks is performed in parallel. Because these primitive subblocks can be used to form the 41 subblocks of different sizes specified by H.264, the VBSME processor is employed to concurrently compute the sums of absolute differences (SADs) of all 41 subblocks from the SADs of the primitive subblocks. This new architecture has lower latency and higher throughput than other existing VBSME architectures for the hardware implementation of H.264 encoders.
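The step of deriving all 41 subblock SADs from the sixteen primitive 4×4 SADs can be sketched in Python as below. This is a software illustration under stated assumptions, not the VBSME processor described in the abstract: the function name `merge_sads`, the 4×4 input grid in raster order, and the width×height labels are choices made only for this example.

```python
def merge_sads(s4):
    """Combine the SADs of sixteen 4x4 primitive subblocks into the SADs of
    all 41 subblocks of one 16x16 macroblock, for one candidate displacement.

    s4 is a 4x4 grid: s4[r][c] is the SAD of the primitive subblock in
    row r, column c of the macroblock. Labels are width x height (assumed).
    """
    sad_4x4 = [s4[r][c] for r in range(4) for c in range(4)]
    # 8 wide, 4 high: two horizontally adjacent primitive subblocks.
    sad_8x4 = [s4[r][c] + s4[r][c + 1] for r in range(4) for c in (0, 2)]
    # 4 wide, 8 high: two vertically adjacent primitive subblocks.
    sad_4x8 = [s4[r][c] + s4[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8 quadrants in raster order: top-left, top-right, bottom-left, bottom-right.
    sad_8x8 = [
        s4[r][c] + s4[r][c + 1] + s4[r + 1][c] + s4[r + 1][c + 1]
        for r in (0, 2)
        for c in (0, 2)
    ]
    # Larger blocks reuse the 8x8 sums instead of touching pixels again.
    sad_16x8 = [sad_8x8[0] + sad_8x8[1], sad_8x8[2] + sad_8x8[3]]  # top, bottom
    sad_8x16 = [sad_8x8[0] + sad_8x8[2], sad_8x8[1] + sad_8x8[3]]  # left, right
    sad_16x16 = [sum(sad_8x8)]
    return {
        "4x4": sad_4x4, "8x4": sad_8x4, "4x8": sad_4x8, "8x8": sad_8x8,
        "16x8": sad_16x8, "8x16": sad_8x16, "16x16": sad_16x16,
    }
```

The counts are 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41, which matches the 41 subblocks mentioned above; only additions are needed once the primitive SADs are available.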

75 citations


"High speed architecture for Variabl..." refers background in this paper

  • ...The comparisons between 1-D and 2-D [2], [3], [4] architectures are given in the Table-I....


Proceedings ArticleDOI
24 Jun 2003
TL;DR: This work proposes a new 1-D VLSI architecture for full search variable block size motion estimation (FSVBSME), which can process up to 41 motion vector subblocks (within a macroblock) in a comparable number of clock cycles.
Abstract: With the advent of new video standards such as MPEG-4 part-10 and H.264/H.26L, demands for advanced video coding (AVC), particularly in area of variable block searching motion estimation (VBSME), are increasing. This has led to research into suitable flexible hardware architectures to perform the various types of VBSME. We propose a new 1-D VLSI architecture for full search variable block size motion estimation (FSVBSME). The variable block size, sum of absolute differences (SAD) computation is performed by reusing the results of smaller subblock computations. These are permuted and combined by incorporating a shuffling mechanism within each processing element (PE). Whereas a conventional 1-D architecture can process only one motion vector, this architecture can process up to 41 motion vector (MV) subblocks (within a macroblock) in a comparable number of clock cycles.

42 citations


"High speed architecture for Variabl..." refers background in this paper

  • ...The design has an increase in speed as compared to [1] with the highest working frequency of 323....


  • ...Reference [1] shows similar for block b1, it contains pixels p4 - p7, p20 - p23, p36 - p39 and p52 - p55, and so on (refer Fig....


  • ...Reference [1] Pixel values in the blocks (namely, p0 and p1)...


  • ...It needs similar amount of clock cycles as compared with any other 1-D architecture [1], [4]....


Journal ArticleDOI
TL;DR: A memory-efficient and highly parallel VLSI architecture for full search VBSME (FSVBSME), with a novel data reuse scheme that can save 98% of on-chip memory access with only 25% of local memory overhead.
Abstract: Variable block size motion estimation (VBSME) is one of several contributors to H.264/AVC's excellent coding efficiency. However, its high computational complexity and huge memory traffic make design difficult. In this paper, we propose a memory-efficient and highly parallel VLSI architecture for full search VBSME (FSVBSME). Our architecture consists of sixteen 2-D arrays, each consisting of 16 × 16 processing elements (PEs). Four arrays form a group to match four reference blocks against one current block in parallel. Four groups perform block matching for four current blocks in a pipelined fashion. Taking advantage of the overlap among multiple reference blocks of a current block and between search windows of adjacent current blocks, we propose a novel data reuse scheme to reduce memory access. Compared with the popular Level C data reuse scheme, our approach can save 98% of on-chip memory access with only 25% of local memory overhead. Synthesized into a TSMC 180-nm CMOS cell library, our design is capable of processing 1920 × 1088 30 fps video when running at 130 MHz. The architecture is scalable for wider search range, multiple reference frames and pixel truncation as well as down sampling. We suggest a criterion called design efficiency for comparing different works. It shows that the proposed design is 72% more efficient than the best design to date.

29 citations

Journal ArticleDOI
TL;DR: The proposed fast algorithm can reduce motion searching time by about 90%, whereas PSNR decreases by only about 0.02 dB on average, and a VLSI architecture is designed with a parallel structure and pipeline timing schedule to achieve a high throughput rate for the HDTV system.
Abstract: This study presents a fast algorithm and its very large scale integration (VLSI) design to implement variable block size motion estimation. The fast algorithm is proposed with a hardware-oriented concept for regular VLSI design. Simulations show that the proposed algorithm can reduce motion searching time by about 90%, whereas PSNR decreases by only about 0.02 dB on average. Based on the fast algorithm, a VLSI architecture is designed with a parallel structure and pipeline timing schedule to achieve a high throughput rate for the HDTV system. The chip can compute 41 vectors for various block sizes in 24-240 cycles using only 96 processing elements. Compared with contemporary VLSI architectures, this chip offers higher processing speed, wider searching range and lower circuit complexity.

10 citations