TL;DR: This work devises a novel ADST-like transform whose kernel is consistent with that of the DCT, thereby enabling a butterfly structured computation flow while largely retaining the compression-efficiency advantages of the hybrid transform coding scheme.
Abstract: The hybrid transform coding scheme, which alternates between the asymmetric discrete sine transform (ADST) and the discrete cosine transform (DCT) depending on the boundary prediction conditions, is an efficient tool for video and image compression. It optimally exploits the statistical characteristics of the prediction residual, thereby achieving significant coding performance gains over the conventional DCT-based approach. A practical concern lies in the intrinsic conflict between the transform kernels of the ADST and the DCT, which prevents a butterfly structured implementation for parallel computing. Hence the hybrid transform coding scheme has to rely on matrix multiplication, which presents a speed-up barrier due to under-utilization of the hardware, especially for larger block sizes. In this work, we devise a novel ADST-like transform whose kernel is consistent with that of the DCT, thereby enabling a butterfly structured computation flow, while largely retaining the compression-efficiency advantages of the hybrid transform coding scheme. A prototype implementation of the proposed butterfly structured hybrid transform coding scheme is available in the VP9 codec repository.
Transform coding is a central component in video and image compression.
In fact, methods along this line are typically limited to smaller transform dimensions.
It is noteworthy that larger transform block sizes provide higher transform coding gains for stationary signals and have been shown experimentally to improve compression efficiency in various video codecs.
The authors hence use this btf-ADST to replace the original ADST in the hybrid transform coding scheme.
II. SPATIAL PREDICTION AND TRANSFORM CODING
The authors revisit the mathematical theory from which the original ADST was derived, namely a 1-D first-order Gauss-Markov model given a partial prediction boundary [1], and this leads to the btf-ADST proposed in this work.
(Footnote 1: In practice, all computations are performed in integer format for speed reasons.)
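For reference, the model can be written out as follows. This is a sketch of the setup of [1] as we understand it, not a quotation of the paper: we assume the boundary pixel x_0 predicts x_k as ρ^k x_0, and the labels P and Q are ours and need not coincide with the matrix called P1 in the text.

$$
\begin{aligned}
x_k &= \rho\,x_{k-1} + e_k, \qquad e_k \sim \mathcal{N}(0,\sigma_e^2), \quad k = 1,\dots,N,\\
\hat{x}_k &= \rho^k x_0, \qquad y_k = x_k - \hat{x}_k = \sum_{m=1}^{k} \rho^{\,k-m} e_m,\\
P\,y &= e, \qquad P = \begin{bmatrix} 1 & & & \\ -\rho & 1 & & \\ & \ddots & \ddots & \\ & & -\rho & 1 \end{bmatrix}, \qquad R_{yy} = \sigma_e^2\,(P^\top P)^{-1},\\
Q &= P^\top P = \begin{bmatrix} 1+\rho^2 & -\rho & & \\ -\rho & \ddots & \ddots & \\ & \ddots & 1+\rho^2 & -\rho \\ & & -\rho & 1 \end{bmatrix}.
\end{aligned}
$$

Every diagonal entry of Q equals 1 + ρ² except the bottom-right one, which is 1; this break in the Toeplitz pattern is presumably the irregularity discussed in the text.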
This irregularity complicates an analytic derivation of the eigenvalues and eigenvectors of P1.
The approximation clearly holds as ρ → 1, which is indeed the regime commonly assumed to describe the spatial correlation of video/image signals.
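The approximation can be checked numerically. The sketch below works with the tridiagonal matrix Q = PᵀP (1 + ρ² on the diagonal except a 1 in the bottom-right corner, -ρ on the off-diagonals), whose eigenvectors give the KLT of the residual. It assumes, as in the ADST derivation of [1] as we understand it, that the bottom-right 1 is replaced by 1 + ρ² - ρ, and that the original ADST has the DST-VII-type kernel T_S(j, i) = 2/√(2N+1) · sin((2j-1)iπ/(2N+1)); neither formula is reproduced in the excerpt above, and the helper names are hypothetical. The check shows that T_S diagonalizes the approximated matrix for any ρ, while the off-diagonal leakage for the exact matrix is proportional to ρ - ρ² and so vanishes as ρ → 1.

```python
import math

def adst_matrix(n):
    """Original ADST, assumed DST-VII form (0-based j = basis, i = sample):
    T_S[j][i] = 2/sqrt(2n+1) * sin((2j+1)(i+1)*pi/(2n+1))."""
    s = 2.0 / math.sqrt(2 * n + 1)
    return [[s * math.sin((2 * j + 1) * (i + 1) * math.pi / (2 * n + 1))
             for i in range(n)] for j in range(n)]

def q_matrix(n, rho, approximate=True):
    """Hypothetical helper: tridiagonal Q = P^T P with 1 + rho^2 on the diagonal,
    -rho off it, and 1 in the bottom-right corner (or 1 + rho^2 - rho if approximated)."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = 1.0 + rho * rho
        if i + 1 < n:
            q[i][i + 1] = q[i + 1][i] = -rho
    q[n - 1][n - 1] = (1.0 + rho * rho - rho) if approximate else 1.0
    return q

def max_offdiag_after_transform(t, q):
    """Largest |off-diagonal| entry of t @ q @ t^T; zero if the rows of t are eigenvectors of q."""
    n = len(t)
    tq = [[sum(t[j][k] * q[k][i] for k in range(n)) for i in range(n)] for j in range(n)]
    m = [[sum(tq[j][k] * t[l][k] for k in range(n)) for l in range(n)] for j in range(n)]
    return max(abs(m[j][l]) for j in range(n) for l in range(n) if j != l)

n = 8
t_s = adst_matrix(n)
for rho in (0.6, 0.95):
    approx = max_offdiag_after_transform(t_s, q_matrix(n, rho, approximate=True))
    exact = max_offdiag_after_transform(t_s, q_matrix(n, rho, approximate=False))
    print(f"rho={rho}: approximated {approx:.2e}, exact {exact:.2e}")
```

The two matrices differ only in one entry, by ρ - ρ², so the leakage for the exact matrix is that factor times products of the last column of T_S.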
III. BUTTERFLY STRUCTURED VARIANT OF ADST
A key observation about the ADST derived above is that the rows of T_S (i.e., the basis functions of the transform) take smaller values at the beginning (closer to the known boundary) and larger values towards the other end.
This effectively exploits the fact that pixels closer to the known boundary are better predicted and hence have statistically smaller variance than those at the far end.
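To make the variance claim concrete, assume a unit-variance first-order Gauss-Markov source whose samples are predicted from the boundary pixel x_0 as ρ^k · x_0 (our assumption about the prediction rule). The residual at distance k from the boundary then has variance 1 - ρ^(2k), which grows with k. The sketch computes this closed form and cross-checks it with a Monte Carlo simulation; the function names are hypothetical.

```python
import math
import random

def residual_variances(n, rho):
    """Closed form Var(y_k) = 1 - rho^(2k), k = 1..n, for a unit-variance
    first-order Gauss-Markov source predicted from x_0 as rho^k * x_0."""
    return [1.0 - rho ** (2 * k) for k in range(1, n + 1)]

def simulated_variances(n, rho, trials=20000, seed=1):
    """Monte Carlo estimate of the same residual variances."""
    rng = random.Random(seed)
    sigma_e = math.sqrt(1.0 - rho * rho)  # innovation std that keeps Var(x_k) = 1
    sums = [0.0] * n
    for _ in range(trials):
        x0 = rng.gauss(0.0, 1.0)          # known boundary pixel
        x = x0
        for k in range(1, n + 1):
            x = rho * x + rng.gauss(0.0, sigma_e)
            y = x - rho ** k * x0          # prediction residual at distance k
            sums[k - 1] += y * y           # residual mean is zero, so E[y^2] = Var(y)
    return [s / trials for s in sums]

print([round(v, 3) for v in residual_variances(8, 0.95)])
print([round(v, 3) for v in simulated_variances(8, 0.95)])
```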
This observation motivates their search for a unitary sinusoidal transform that approaches the compression performance of the ADST while avoiding the intricacy of a butterfly design for the ADST, and hence makes the hybrid transform coding scheme amenable to parallel computing.
It also possesses asymmetric basis functions, but the denominator of its kernel argument, 4N, is consistent with that of the DCT, thereby allowing a butterfly structured implementation.
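The excerpt does not reproduce the btf-ADST formula. The sketch below assumes a DST-IV-type kernel T_BS(j, i) = √(2/N) · sin((2j-1)(2i-1)π/(4N)), which has the 4N denominator mentioned above, and compares it with the DCT-II and with the DST-VII form of the original ADST. It checks that all three matrices are orthonormal, that both ADST variants have an increasing first basis function, and that every unnormalized DCT and btf-ADST entry lies on the grid cos(mπ/(4N)), so both can draw their multipliers from one constant table, whereas the original ADST entries sin(kπ/(2N+1)) do not. Sharing constants is only one prerequisite of a joint butterfly design; the factorization itself is not shown, and all names are hypothetical.

```python
import math

# Unnormalized kernels, 0-based basis index j and sample index i.
def dct_kernel(n, j, i):
    return math.cos((2 * i + 1) * j * math.pi / (2 * n))            # DCT-II

def adst_kernel(n, j, i):
    return math.sin((2 * j + 1) * (i + 1) * math.pi / (2 * n + 1))  # original ADST (DST-VII form)

def btf_kernel(n, j, i):
    return math.sin((2 * j + 1) * (2 * i + 1) * math.pi / (4 * n))  # assumed btf-ADST (DST-IV form)

KERNELS = {"dct": dct_kernel, "adst": adst_kernel, "btf": btf_kernel}

def scale(name, n, j):
    """Row normalization that makes each matrix orthonormal."""
    if name == "dct":
        return math.sqrt((1.0 if j == 0 else 2.0) / n)
    if name == "adst":
        return 2.0 / math.sqrt(2 * n + 1)
    return math.sqrt(2.0 / n)

def matrix(name, n):
    k = KERNELS[name]
    return [[scale(name, n, j) * k(n, j, i) for i in range(n)] for j in range(n)]

def is_orthonormal(m, tol=1e-12):
    n = len(m)
    return all(abs(sum(m[a][k] * m[b][k] for k in range(n)) - (1.0 if a == b else 0.0)) < tol
               for a in range(n) for b in range(n))

def fraction_on_grid(name, n, tol=1e-9):
    """Share of unnormalized kernel entries equal to cos(m*pi/(4n)) for some integer m."""
    grid = [math.cos(m * math.pi / (4 * n)) for m in range(8 * n)]  # one full period in m
    vals = [KERNELS[name](n, j, i) for j in range(n) for i in range(n)]
    hits = sum(any(abs(v - g) < tol for g in grid) for v in vals)
    return hits / len(vals)

n = 8
for name in KERNELS:
    first = matrix(name, n)[0]
    increasing = all(a < b for a, b in zip(first, first[1:]))
    print(name, is_orthonormal(matrix(name, n)), increasing, fraction_on_grid(name, n))
```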
In practice, all these computations are performed in integer format, which inevitably incurs rounding errors that accumulate through every stage.
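To illustrate how rounding accumulates, the following sketch applies a cascade of identical butterfly rotations in fixed-point arithmetic (multipliers scaled by 2^14 and rounded, products rounded back by a shift) and compares the outputs with floating point. The precision CONST_BITS = 14 and all function names are illustrative assumptions, not the VP9 implementation.

```python
import math

CONST_BITS = 14  # hypothetical fixed-point precision of the multipliers

def to_fixed(c):
    """Quantize a real multiplier to an integer scaled by 2^CONST_BITS."""
    return int(round(c * (1 << CONST_BITS)))

def round_shift(v):
    """Divide by 2^CONST_BITS with rounding (ties toward +infinity)."""
    return (v + (1 << (CONST_BITS - 1))) >> CONST_BITS

def butterfly_int(a, b, theta):
    """One integer butterfly (plane rotation) using quantized cos/sin."""
    c, s = to_fixed(math.cos(theta)), to_fixed(math.sin(theta))
    return round_shift(a * c + b * s), round_shift(b * c - a * s)

def butterfly_float(a, b, theta):
    """Reference rotation in floating point."""
    c, s = math.cos(theta), math.sin(theta)
    return a * c + b * s, b * c - a * s

def cascade_errors(a, b, theta, stages):
    """Max |integer - float| output difference after each stage of the cascade."""
    ai, bi, af, bf = a, b, float(a), float(b)
    errors = []
    for _ in range(stages):
        ai, bi = butterfly_int(ai, bi, theta)
        af, bf = butterfly_float(af, bf, theta)
        errors.append(max(abs(ai - af), abs(bi - bf)))
    return errors

print(cascade_errors(1000, -700, math.pi / 8, stages=8))
```

Each stage adds at most about half a unit of rounding error, plus a small term from quantizing cos θ and sin θ, so the worst-case deviation grows roughly linearly with the number of stages.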
IV. QUANTITATIVE ANALYSIS
The authors quantitatively evaluate the performance of the btf-ADST, original ADST, and DCT, against the KLT (of y in Sec. II) in terms of coding gains [7] under the assumed signal model, at different correlation coefficient values.
This bit-allocation problem is addressed by the water-filling algorithm of [7].
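A textbook reverse water-filling sketch for independent Gaussian coefficients under mean squared error is given below; we assume this is the flavor of water filling meant, but the exact formulation in [7] may differ. A water level θ is found by bisection so that the rates sum to the bit budget; coefficients whose variance falls below θ receive no bits and contribute their full variance as distortion. The function name is hypothetical.

```python
import math

def reverse_water_fill(variances, total_bits, iterations=200):
    """Split total_bits among independent Gaussian components (MSE criterion):
    component i gets rate max(0, 0.5*log2(var_i/theta)) and distortion
    min(theta, var_i), with the water level theta set by bisection."""
    def total_rate(theta):
        return sum(max(0.0, 0.5 * math.log2(v / theta)) for v in variances)

    lo, hi = 0.0, max(variances)  # total_rate(hi) == 0; total_rate grows without bound as theta -> 0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if total_rate(mid) > total_bits:
            lo = mid  # level too low: too many bits spent
        else:
            hi = mid
    theta = hi
    rates = [max(0.0, 0.5 * math.log2(v / theta)) for v in variances]
    distortions = [min(theta, v) for v in variances]
    return theta, rates, distortions

theta, rates, dists = reverse_water_fill([4.0, 1.0, 0.25, 0.01], total_bits=2.0)
print(theta, rates, dists)
```

In this example the level settles at θ = 0.5, giving 1.5 and 0.5 bits to the two largest components and none to the other two.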
The coding gain G_A thus compares the average distortion incurred with and without the transformation A. Note that for any given A (including the btf-ADST, ADST, DCT, and KLT of y), computing R_zz, and hence σ_zi², does not require any approximation of P1.
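The comparison can be sketched in a few lines under the high-resolution approximation, where the coding gain of an orthonormal transform A reduces to the ratio of the arithmetic to the geometric mean of the coefficient variances σ_zi², the diagonal of R_zz = A R_yy Aᵀ. This is an assumption: the water-filling-based measure of [7] at finite rates may give different numbers. As in the text, R_yy is computed exactly; assuming a unit-variance source predicted from x_0 as ρ^k x_0, Cov(y_i, y_j) = ρ^|i-j| (1 - ρ^(2·min(i, j))). The KLT needs no eigendecomposition here, because the product of its coefficient variances equals det R_yy = (1 - ρ²)^N. The btf-ADST kernel is an assumed DST-IV form, and all function names are hypothetical.

```python
import math

def dct(n):
    return [[math.sqrt((1.0 if j == 0 else 2.0) / n) * math.cos((2 * i + 1) * j * math.pi / (2 * n))
             for i in range(n)] for j in range(n)]

def adst(n):  # original ADST, DST-VII form
    s = 2.0 / math.sqrt(2 * n + 1)
    return [[s * math.sin((2 * j + 1) * (i + 1) * math.pi / (2 * n + 1)) for i in range(n)] for j in range(n)]

def btf_adst(n):  # assumed DST-IV-type kernel with denominator 4n
    s = math.sqrt(2.0 / n)
    return [[s * math.sin((2 * j + 1) * (2 * i + 1) * math.pi / (4 * n)) for i in range(n)] for j in range(n)]

def residual_cov(n, rho):
    """Exact R_yy: Cov(y_i, y_j) = rho^|i-j| * (1 - rho^(2*min(i, j))), 1-based i, j."""
    return [[rho ** abs(i - j) * (1.0 - rho ** (2 * min(i, j)))
             for j in range(1, n + 1)] for i in range(1, n + 1)]

def coefficient_variances(a, r):
    """Diagonal of R_zz = A R_yy A^T."""
    n = len(a)
    return [sum(a[k][i] * r[i][j] * a[k][j] for i in range(n) for j in range(n)) for k in range(n)]

def coding_gain_db(variances):
    """High-resolution coding gain: arithmetic mean over geometric mean, in dB."""
    am = sum(variances) / len(variances)
    gm = math.exp(sum(math.log(v) for v in variances) / len(variances))
    return 10.0 * math.log10(am / gm)

def gains(n, rho):
    r = residual_cov(n, rho)
    out = {name: coding_gain_db(coefficient_variances(t(n), r))
           for name, t in (("DCT", dct), ("ADST", adst), ("btf-ADST", btf_adst))}
    # KLT: same arithmetic mean (trace is preserved), geometric mean det(R_yy)^(1/n) = 1 - rho^2.
    trace_mean = sum(r[i][i] for i in range(n)) / n
    out["KLT"] = 10.0 * math.log10(trace_mean / (1.0 - rho * rho))
    return out

for rho in (0.6, 0.9, 0.95):
    print(rho, {k: round(v, 3) for k, v in gains(8, rho).items()})
```

By Hadamard's inequality the KLT value is an upper bound for every orthonormal A, which gives a quick sanity check on the printed numbers.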
The original ADST closely approximates the KLT across the range of correlation coefficient values ρ.
The maximum gap between ADST and KLT, or the maximum loss of optimality, is less than 0.05 dB.
V. EXPERIMENTAL RESULTS
The proposed btf-ADST was employed to replace the original ADST in the hybrid transform coding scheme.
This btf-ADST/DCT hybrid transform coding scheme was implemented in the VP9 codec [8].
Fig. 2 shows the rate-distortion performance comparison for the sequence harbour at CIF resolution.
Similar results were observed over a wide variety of sequences and resolutions.
The authors compare the runtime of the btf-ADST/DCT and the original ADST/DCT hybrid transform schemes, in terms of the average CPU cycles, as shown in Fig.
VI. CONCLUSIONS
This work devised a novel variant of the ADST whose kernel approximates the original ADST basis function by basis function and is consistent with the DCT kernel, thereby enabling a butterfly structured implementation.
The proposed scheme allows efficient hardware utilization for significant codec speed-up, while largely retaining the advantageous compression performance of hybrid transform coding scheme.
TL;DR: A technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility is provided.
Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded video quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.
Cites methods from "A butterfly structured design of th..."
...Note that since the original ADST derived in [33] cannot be decomposed for the butterfly structure, a variant of it, as introduced in [36] and also as shown in Figure 27, is adopted by AV1 for transform block sizes of 8× 8 and above....
TL;DR: In this article, the authors proposed a method to derive computationally efficient approximations to the discrete cosine transform (DCT) by minimizing the angle between the rows of the exact DCT matrix and the rows of the approximated transformation matrix.
Abstract: The principal component analysis (PCA) is widely used for data decorrelation and dimensionality reduction. However, the use of PCA may be impractical in real-time applications, or in situations where energy and computing constraints are severe. In this context, the discrete cosine transform (DCT) becomes a low-cost alternative to data decorrelation. This paper presents a method to derive computationally efficient approximations to the DCT. The proposed method aims at the minimization of the angle between the rows of the exact DCT matrix and the rows of the approximated transformation matrix. The resulting transformation matrices are orthogonal and have extremely low arithmetic complexity. Considering popular performance measures, one of the proposed transformation matrices outperforms the best competitors in both matrix error and coding capabilities. Practical applications in image and video coding demonstrate the relevance of the proposed transformation. In fact, we show that the proposed approximate DCT can outperform the exact DCT for image encoding under certain compression ratios. The proposed transform and its direct competitors are also physically realized as digital prototype circuits using FPGA technology.
TL;DR: A set of new experimental coding tools have already been added to baseline VP9 to achieve modest coding gains over a large enough test set, and this paper provides a technical overview of these coding tools.
Abstract: Google started an open-source project, entitled the WebM Project, in 2010 to develop royalty-free video codecs for the web. The present generation codec developed in the WebM project, called VP9, was finalized in mid-2013 and is currently being served extensively by YouTube, resulting in billions of views per day. Even though adoption of VP9 outside Google is still in its infancy, the WebM project has already embarked on an ambitious project to develop a next edition codec, VP10, that achieves at least a generational bitrate reduction over the current generation codec VP9. Although the project is still in early stages, a set of new experimental coding tools have already been added to baseline VP9 to achieve modest coding gains over a large enough test set. This paper provides a technical overview of these coding tools.
TL;DR: It is shown that Haar units (Givens rotations with angle $\pi /4$) can be used to reduce GFT computation cost when the graph is bipartite or satisfies certain symmetry properties based on node pairing.
Abstract: The graph Fourier transform (GFT) is an important tool for graph signal processing, with applications ranging from graph-based image processing to spectral clustering. However, unlike the discrete Fourier transform, the GFT typically does not have a fast algorithm. In this work, we develop new approaches to accelerate the GFT computation. In particular, we show that Haar units (Givens rotations with angle $\pi /4$ ) can be used to reduce GFT computation cost when the graph is bipartite or satisfies certain symmetry properties based on node pairing. We also propose a graph decomposition method based on graph topological symmetry, which allows us to identify and exploit butterfly structures in stages. This method is particularly useful for graphs that are nearly regular or have some specific structures, e.g., line graphs, cycle graphs, grid graphs, and human skeletal graphs. Though butterfly stages based on graph topological symmetry cannot be used for general graphs, they are useful in applications, including video compression and human action analysis, where symmetric graphs, such as symmetric line graphs and human skeletal graphs, are used. Our proposed fast GFT implementations are shown to reduce computation costs significantly, in terms of both number of operations and empirical runtimes.
Cites background or methods from "A butterfly structured design of th..."
...2 as Haar unit, as opposed to general Givens rotations, which are often referred to as “butterflies” [21], [25], [33]....
[...]
...An n dimensional Givens rotation [30], commonly referred to as a butterfly [20], [21], [25], is a linear transformation that applies a rotation of angle θ to two coordinates, denoted as p and q....
[...]
...This means that those sub-GFTs can also be implemented using fast DCT and ADST algorithms [23]–[25]....
...We also note that, for any steerable DFT with a length n that is a multiple of 4, the GFTs of G++c and G−+c are Type-2 DCT and Type-4 DST, respectively....
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Abstract: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.
"A butterfly structured design of th..." refers background in this paper
...On the hardware design side, the transform module typically contributes a large portion of codec computational complexity, and hence a butterfly structured implementation that allows parallel computing via single instruction multiple data (SIMD) operations [3] is highly desirable....
TL;DR: A Fast Discrete Cosine Transform algorithm has been developed which provides a factor of six improvement in computational complexity when compared to conventional Discrete Cosine Transform algorithms using the Fast Fourier Transform.
Abstract: A Fast Discrete Cosine Transform algorithm has been developed which provides a factor of six improvement in computational complexity when compared to conventional Discrete Cosine Transform algorithms using the Fast Fourier Transform. The algorithm is derived in the form of matrices and illustrated by a signal-flow graph, which may be readily translated to hardware or software implementations.
TL;DR: The 4×4 transforms in H.264 can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems and minimizing computational complexity, especially for low-end processors.
Abstract: This paper presents an overview of the transform and quantization designs in H.264. Unlike the popular 8×8 discrete cosine transform used in previous standards, the 4×4 transforms in H.264 can be computed exactly in integer arithmetic, thus avoiding inverse transform mismatch problems. The new transforms can also be computed without multiplications, just additions and shifts, in 16-bit arithmetic, thus minimizing computational complexity, especially for low-end processors. By using short tables, the new quantization formulas use multiplications but avoid divisions.
"A butterfly structured design of th..." refers methods in this paper
...A recent development on fast transforms using integer arithmetic was proposed in [5], which approximates the DCT element-wise using a matrix whose entries are all small integers....
Q1. What contributions have the authors mentioned in the paper "A butterfly structured design of the hybrid transform coding scheme" ?
In this work, the authors devise a novel ADST-like transform whose kernel is consistent with that of the DCT, thereby enabling a butterfly structured computation flow, while largely retaining the compression-efficiency advantages of the hybrid transform coding scheme.