An Area and Power Efficient 1-D $4\times 4$ Integer DCT Architecture for HEVC

doi:10.1109/INDICON.2017.8487661

Proceedings Article

Akshay Bhaskar¹, K. Jagannadha Naidu¹•Institutions (1)

01 Dec 2017

TL;DR: This paper presents a one-dimensional (1-D) $4\times 4$ integer DCT architecture for the High Efficiency Video Coding (HEVC) standard; it serves as a basic building block for constructing architectures for other transform lengths such as $8\times 8$, $16\times 16$, and $32\times 32$.

Abstract: In this paper, we present a one-dimensional (1-D) $4\times 4$ integer DCT architecture to be used in the High Efficiency Video Coding (HEVC) standard. This architecture serves as a basic building block to construct architectures for different transform lengths like $8\times 8$, $16\times 16$, and $32\times 32$. Also, 2-D integer DCT architectures can be constructed using the proposed architecture and its scaled versions. The architecture detailed in this paper occupies an area of 1572 square microns and consumes 0.65 mW of power at a maximum operating frequency of 200 MHz. Compared to other such architectures, the proposed design achieves a 58.6% savings in area and a 53.9% savings in power. Compared to the reference algorithm, the proposed design saves 66.2% area and 78.5% power. Moreover, the proposed architecture offers higher throughput at a lower operating frequency when compared to other existing architectures. Therefore, with a processing rate of 8 pixels/cycle and a throughput of 1.6 Gsps, the proposed architecture is capable of processing 8K UHD ($7680\times 4320$) video at 30 frames per second, which is an application of HEVC.
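
The throughput figure in the abstract can be checked with simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes 4:2:0 chroma subsampling (1.5 samples per pixel), which the abstract does not state.

```python
def architecture_throughput(samples_per_cycle: int, clock_hz: float) -> float:
    """Samples per second delivered by a datapath producing a fixed number of samples each cycle."""
    return samples_per_cycle * clock_hz


def required_sample_rate(width: int, height: int, fps: int,
                         samples_per_pixel: float = 1.5) -> float:
    """Samples per second needed for a video format.

    samples_per_pixel = 1.5 is an ASSUMPTION (4:2:0 chroma subsampling:
    one luma sample plus two quarter-resolution chroma samples per pixel).
    """
    return width * height * fps * samples_per_pixel


# Figures taken from the abstract: 8 pixels/cycle at 200 MHz.
available = architecture_throughput(8, 200e6)
# 8K UHD at 30 frames per second.
required = required_sample_rate(7680, 4320, 30)

print(f"available: {available / 1e9:.3f} Gsps")
print(f"required:  {required / 1e9:.3f} Gsps")
print("sufficient" if available >= required else "insufficient")
```

Under this assumption the required rate stays below the stated 1.6 Gsps, which is consistent with the 8K UHD at 30 frames per second claim.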

References

Journal Article

Efficient Integer DCT Architectures for HEVC

Pramod Kumar Meher¹, Sang Yoon Park¹, Basant Kumar Mohanty², Khoon Seong Lim¹, Chuohao Yeo¹•Institutions (2)

Institute for Infocomm Research Singapore¹, Jaypee University of Engineering and Technology²

01 Jan 2014-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: It is found that the proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy per sample (EPS) compared to the direct implementation of the reference algorithm, on average, for integer DCT of lengths 4, 8, 16, and 32.

Abstract: In this paper, we present area- and power-efficient architectures for the implementation of integer discrete cosine transform (DCT) of different lengths to be used in High Efficiency Video Coding (HEVC). We show that an efficient constant matrix-multiplication scheme can be used to derive parallel architectures for 1-D integer DCT of different lengths. We also show that the proposed structure could be reusable for DCT of lengths 4, 8, 16, and 32 with a throughput of 32 DCT coefficients per cycle irrespective of the transform size. Moreover, the proposed architecture could be pruned to reduce the complexity of implementation substantially with only a marginal effect on the coding performance. We propose power-efficient structures for folded and full-parallel implementations of 2-D DCT. From the synthesis result, it is found that the proposed architecture involves nearly 14% less area-delay product (ADP) and 19% less energy per sample (EPS) compared to the direct implementation of the reference algorithm, on average, for integer DCT of lengths 4, 8, 16, and 32. Also, an additional 19% saving in ADP and 20% saving in EPS can be achieved by the proposed pruning algorithm with nearly the same throughput rate. The proposed architecture is found to support ultrahigh definition 7680 × 4320 at 60 frames/s video, which is one of the applications of HEVC.
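
To make the idea of constant matrix multiplication concrete, the sketch below computes a 4-point forward transform in two ways: as a direct product with the HEVC 4-point core transform matrix, and with the common even-odd (butterfly) decomposition. It is a minimal illustration, not the architecture proposed in the paper; the function names are hypothetical, and the scaling and rounding stages that follow the transform in a real encoder are omitted.

```python
# HEVC 4-point core transform matrix (integer approximation of the DCT).
HEVC_DCT4 = [
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
]


def dct4_direct(x):
    """Direct constant matrix-vector product: 16 multiplications."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


def dct4_butterfly(x):
    """Even-odd decomposition: add/subtract first, then 6 multiplications."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part (symmetric sums)
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part (antisymmetric differences)
    return [
        64 * (e0 + e1),
        83 * o0 + 36 * o1,
        64 * (e0 - e1),
        36 * o0 - 83 * o1,
    ]


sample = [1, 2, 3, 4]
print(dct4_direct(sample))
print(dct4_butterfly(sample))
```

Both functions return the same coefficients for any integer input; the butterfly form simply exploits the symmetry of the matrix rows to share additions before the constant multiplications.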

184 citations

Proceedings Article

A Unified 4/8/16/32-Point Integer IDCT Architecture for Multiple Video Coding Standards

Sha Shen¹, Weiwei Shen¹, Yibo Fan¹, Xiaoyang Zeng¹•Institutions (1)

Fudan University¹

09 Jul 2012

TL;DR: This work proposes a fast computational algorithm of large size integer IDCT, which can support the following video standards: MPEG-2/4, H.264, AVS, VC-1 and HEVC.

Abstract: 4- and 8-point IDCTs are widely used in traditional video coding standards. However, larger-size (16/32-point) IDCT has been proposed in next-generation video standards such as HEVC. To fulfill this requirement, this work proposes a fast computational algorithm for large-size integer IDCT. A unified VLSI architecture for 4/8/16/32-point integer IDCT is also proposed accordingly. It can support the following video standards: MPEG-2/4, H.264, AVS, VC-1 and HEVC. Multiplierless MCM (Multiple Constant Multiplication) is used for 4/8-point IDCT. Regular multipliers and a sharing technique are used for 16/32-point IDCT. The transpose memory uses SRAM instead of the traditional register array in order to further reduce the hardware overhead. It can support real-time decoding of a 4Kx2K (4096x2048) 30 fps video sequence at a 191 MHz working frequency, with a 93K gate count and 18944-bit SRAM. We suggest a normalized criterion called design efficiency to compare with previous works. It shows that this design is 31% more efficient than previous work.

75 citations

Journal Article

2-D Large Inverse Transform (16×16, 32×32) for HEVC (High Efficiency Video Coding)

Jong-Sik Park, Woo-Jin Nam, Seung-Mok Han, Seongsoo Lee¹•Institutions (1)

Soongsil University¹

30 Jun 2012-Journal of Semiconductor Technology and Science

TL;DR: A new large inverse transform architecture based on hardware reuse for HEVC (High Efficiency Video Coding) is proposed, which is optimized by exploiting fully recursive and regular butterfly structure to achieve low area.

Abstract: This paper proposes a 16×16 and 32×32 inverse transform architecture for HEVC (High Efficiency Video Coding). The HEVC large transforms of 16×16 and 32×32 suffer from huge computational complexity. To resolve this problem, we propose a new large inverse transform architecture based on hardware reuse. The processing element is optimized by exploiting a fully recursive and regular butterfly structure. To achieve low area, the processing element is implemented with shifters and adders, without multipliers. Implementation of the proposed 2-D inverse transform architecture in 0.18 µm technology shows about 300 MHz frequency and 287 Kgates area, which can process 4K (3840×2160) @ 30 fps images.
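
The shifter-and-adder idea can be illustrated with two constants that appear in the HEVC transform matrices, 83 and 36. The sketch below shows one possible binary decomposition of each constant; it is an assumption made for illustration and is not taken from the paper, whose processing element may use a different decomposition or share terms between constants. The function names are hypothetical.

```python
def mul83_shift_add(x: int) -> int:
    """83 = 64 + 16 + 2 + 1, so 83*x needs three shifts and three additions."""
    return (x << 6) + (x << 4) + (x << 1) + x


def mul36_shift_add(x: int) -> int:
    """36 = 32 + 4, so 36*x needs two shifts and one addition."""
    return (x << 5) + (x << 2)


# Compare against ordinary multiplication over a range of signed inputs.
for value in range(-512, 513):
    assert mul83_shift_add(value) == 83 * value
    assert mul36_shift_add(value) == 36 * value

print(mul83_shift_add(7), mul36_shift_add(7))
```

In hardware, each left shift by a constant amount is just wiring, so the cost of such a multiplier is dominated by the adders.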

64 citations

Journal Article

Scalable Approximate DCT Architectures for Efficient HEVC-Compliant Video Coding

Maher Jridi¹, Pramod Kumar Meher²•Institutions (2)

Institut supérieur d'électronique et du numérique¹, Nanyang Technological University²

01 Aug 2017-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The proposed approximation has nearly the same arithmetic complexity and hardware requirement as those of recently proposed related methods, but involves significantly less error energy and offers better peak signal-to-noise ratio than the others when DCTs of length more than 8 are used.

Abstract: An approximate kernel for the discrete cosine transform (DCT) of length 4 is derived from the 4-point DCT defined by the High Efficiency Video Coding (HEVC) standard and used for the computation of DCT and inverse DCT (IDCT) of power-of-two lengths. There are two reasons for considering the DCT of length 4 as the basic module. First, it allows computation of DCTs of lengths 4, 8, 16, and 32 prescribed by the HEVC. Second, the DCTs generated by the 4-point DCT not only involve lower complexity, but also offer better compression performance. Fully parallel and area-constrained architectures for the proposed approximate DCT are proposed to have flexible tradeoff between the area and time complexities. In addition, a reconfigurable architecture is proposed where an 8-point DCT can be used in place of a pair of 4-point DCTs. Using the same reconfiguration scheme, a 32-point DCT could be configured for parallel computation of two 16-point DCTs or four 8-point DCTs or eight 4-point DCTs. The proposed reconfigurable design can support real-time coding for high-definition video sequences in the 8k ultrahigh-definition television format ($7680\times 4320$ at 30 frames/s). A unified forward and inverse transform architecture is also proposed where the hardware complexity is reduced by sharing hardware between the DCT and IDCT computations. The proposed approximation has nearly the same arithmetic complexity and hardware requirement as those of recently proposed related methods, but involves significantly less error energy and offers better peak signal-to-noise ratio than the others when DCTs of length more than 8 are used. A detailed comparison of the complexity, energy efficiency, and compression performance of different DCT approximation schemes for video coding is also presented. It is shown that the proposed approximation provides a better compressed-image quality than other approximate DCTs. The proposed method can perform HEVC-compliant video coding with marginal degradation of video quality and a slight increase in the bit rate, with a fraction of computational complexity of the latter.

54 citations

"An Area and Power Efficient 1-D $4\..." refers background or methods in this paper

...They used the algorithm Meher et al. proposed to develop the 4x4 DCT module and used this recursively to build higher length DCTs with sizes 8x8, 16x16 and 32x32....
[...]
...Algorithm Jridi & Meher [9] Proposed Power (mW)...
[...]
...9% when compared with the architecture in [9]....
[...]
...The proposed architecture has the least area when compared with the reference algorithm and the architecture proposed by Meher et al. [2] as shown in Fig....
[...]
...3 that the proposed architecture uses considerably less power compared to the reference algorithm and the one proposed by Jridi and Meher [9]....
[...]

Proceedings Article

Fully pipelined DCT/IDCT/Hadamard unified transform architecture for HEVC Codec

Jia Zhu¹, Zhenyu Liu¹, Dongsheng Wang¹•Institutions (1)

Tsinghua University¹

19 May 2013

TL;DR: A unified architecture for IDCT and DCT is devised through algorithm optimization; one proposed engine provides the throughput for 8K-UHDTV real-time decoding, and it also fully supports the real-time encoding of HDTV1080p@20fps with a 311 MHz clock speed.

Abstract: A great number of two-dimensional (2D) discrete cosine transforms and Hadamard transforms are executed in HEVC. With the aim of a real-time UHDTV codec, a fully pipelined variable-block-size 2D transform engine with efficient hardware utilization is proposed to handle the DCT/IDCT and Hadamard transforms. The efficiency comes from two aspects. First, the hardware for small-size transforms is fully reused by other larger-size transform processing. Second, we devise the unified architecture for IDCT and DCT through the algorithm optimization. The maximum clock speed of our design is 311 MHz under 90 nm technology. Experiments demonstrate that, at 47 MHz clock frequency, one proposed engine provides the throughput for 8K-UHDTV real-time decoding, and it also fully supports the real-time encoding of HDTV1080p@20fps with a 311 MHz clock speed.

45 citations

"An Area and Power Efficient 1-D $4\..." refers background or result in this paper

...When compared with the 5-stage reference pipeline design, [5] reports a 610% increase in throughput for the 32x32 transform....
[...]
...It is further shown that at 47 MHz the proposed engine provides the required throughput for 8K UHD video decoding and supports real-time encoding of 1080p at 20 frames per second (fps) with a 311 MHz clock speed [5]....
[...]
