Proceedings Article

An Efficient SIMD Architecture with Parallel Memory for 2D Cosine Transforms of Video Coding

TLDR
The paper presents an efficient SIMD architecture with parallel memory for 2D cosine transforms of multiple video standards, together with application-specific instructions that accelerate transform kernels such as butterfly and rotate operations with scaling, rounding, and clipping.
Abstract
This paper proposes an efficient SIMD architecture with parallel memory for 2D cosine transforms of multiple video standards. A novel parallel memory scheme provides conflict-free parallel access in both the horizontal and vertical directions, in either successive or even/odd mode, and eliminates data permutation and matrix transposition. Furthermore, application-specific instructions are presented to accelerate the transform kernels, such as butterfly and rotate operations with scaling, rounding, and clipping. Simulation results show that the proposed architecture achieves a significant performance improvement at a low hardware cost: 3.2 K equivalent gates for the parallel memory subsystem (not including SRAMs) and 19.8 K for the arithmetic units at 250 MHz in a 0.18 μm process.
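
The abstract does not spell out the memory mapping or the instruction semantics, so the following is only a minimal sketch of the two ideas under stated assumptions. It uses a classic skewed bank mapping (bank = (row + col) mod N), which is a textbook scheme and not necessarily the one proposed in the paper, and a generic butterfly with a rounding right shift and saturation. All names (`bank_of`, `row_banks`, `col_banks`, `butterfly`, `clip`), the bank count of 8, and the 16-bit width are hypothetical choices made for illustration.

```python
N = 8  # assumption: number of memory banks equals the block width


def bank_of(row, col, n=N):
    """Skewed mapping (assumed, not the paper's): (row, col) -> bank index."""
    return (row + col) % n


def row_banks(row, cols=range(N)):
    """Banks touched when reading the given columns of one row in parallel."""
    return [bank_of(row, c) for c in cols]


def col_banks(col, rows=range(N)):
    """Banks touched when reading the given rows of one column in parallel."""
    return [bank_of(r, col) for r in rows]


def conflict_free(banks):
    """A parallel access is conflict-free if no bank is addressed twice."""
    return len(set(banks)) == len(banks)


def clip(x, bits=16):
    """Saturate x to a signed integer of the given width (assumed 16 bits)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))


def butterfly(a, b, shift=0, bits=16):
    """Return (a + b, a - b), each scaled by a rounding right shift and clipped."""
    rnd = (1 << (shift - 1)) if shift > 0 else 0  # round-half-up offset
    s = clip((a + b + rnd) >> shift, bits)
    d = clip((a - b + rnd) >> shift, bits)
    return s, d
```

With this mapping, `row_banks(2)` yields `[2, 3, 4, 5, 6, 7, 0, 1]`: every element of a row sits in a different bank, and the same holds for any column, so a row or a column can be fetched in one parallel access without transposing the block. An even/odd access such as `row_banks(3, range(0, N, 2))` is also conflict-free under this assumption of 8 banks.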

