Journal ArticleDOI

Correlation Noise Modeling for Efficient Pixel and Transform Domain Wyner–Ziv Video Coding

TL;DR: The higher the estimation granularity, the better the rate-distortion performance, since the decoding process adapts more closely to the statistical characteristics of the video; accordingly, the pixel and coefficient levels perform best for the PDWZ and TDWZ solutions, respectively.
Abstract: In recent years, practical Wyner-Ziv (WZ) video coding solutions have been proposed with promising results. Most of the solutions available in the literature model the correlation noise (CN) between the original frame and its estimate created at the decoder, the so-called side information (SI), by a given distribution whose relevant parameters are estimated using an offline process, assuming that the SI is available at the encoder or that the originals are available at the decoder. The major goal of this paper is to propose a more realistic WZ video coding approach by performing online estimation of the CN model parameters at the decoder, for pixel- and transform-domain WZ video codecs. In this context, several new techniques are proposed based on metrics that exploit the temporal correlation between frames with different levels of granularity. For pixel-domain WZ (PDWZ) video coding, three levels of granularity are proposed: frame, block, and pixel levels. For transform-domain WZ (TDWZ) video coding, DCT bands and coefficients are the two granularity levels proposed. The higher the estimation granularity, the better the rate-distortion performance, since the decoding process adapts more closely to the statistical characteristics of the video; accordingly, the pixel and coefficient levels perform best for the PDWZ and TDWZ solutions, respectively.
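A minimal decoder-side sketch of the pixel-domain idea follows, assuming the CN between the original and the SI is modeled as a zero-mean Laplacian (variance $= 2/\alpha^2$) whose parameter is derived from the residual between the two motion-compensated references used to build the SI; the exact formulas, fallback rules, and all names here are illustrative, not the paper's.

```python
import numpy as np

def laplacian_alpha(residual):
    """Laplacian parameter alpha from a residual sample set.

    For a zero-mean Laplacian, variance = 2 / alpha^2, so
    alpha = sqrt(2 / variance). A small floor avoids division
    by zero in perfectly matched regions.
    """
    var = max(np.var(residual), 1e-6)
    return np.sqrt(2.0 / var)

def estimate_cn_parameters(ref_backward, ref_forward, granularity="pixel", block=8):
    """Online CN parameter estimation at the decoder (PDWZ flavor).

    ref_backward / ref_forward: the two motion-compensated reference
    frames used to build the side information. Their residual serves
    as a proxy for the (unavailable) original-vs-SI noise.
    """
    residual = (ref_backward.astype(np.float64) - ref_forward.astype(np.float64)) / 2.0
    h, w = residual.shape

    if granularity == "frame":
        # One alpha for the whole frame.
        return np.full((h, w), laplacian_alpha(residual))

    if granularity == "block":
        # One alpha per block: adapts to locally varying motion.
        alphas = np.empty((h, w))
        for i in range(0, h, block):
            for j in range(0, w, block):
                alphas[i:i + block, j:j + block] = \
                    laplacian_alpha(residual[i:i + block, j:j + block])
        return alphas

    # Pixel level: a per-pixel alpha from the squared residual, the
    # finest granularity and, per the paper, the best performing.
    var = np.maximum(residual ** 2, 1e-6)
    return np.sqrt(2.0 / var)
```

The trade-off the abstract describes is visible in the code: the frame level yields one robust but coarse parameter, while the block and pixel levels adapt to locally varying motion at the cost of noisier estimates.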
Citations
Journal ArticleDOI
TL;DR: A novel side information refinement (SIR) algorithm is proposed for a transform domain WZ video codec based on a learning approach where the side information is successively improved as the decoding proceeds, showing significant and consistent performance improvements regarding state-of-the-art WZ and standard video codecs.
Abstract: Wyner-Ziv (WZ) video coding is a particular case of distributed video coding, a recent video coding paradigm based on the Slepian-Wolf and WZ theorems. Contrary to available prediction-based standard video codecs, WZ video coding exploits the source statistics at the decoder, allowing the development of simpler encoders. So far, WZ video coding has not reached the compression efficiency of conventional video coding solutions, mainly due to the poor quality of the side information, which, in the most popular WZ video codecs, is an estimate of the original frame created at the decoder. In this context, this paper proposes a novel side information refinement (SIR) algorithm for a transform-domain WZ video codec based on a learning approach where the side information is successively improved as the decoding proceeds. The results show significant and consistent performance improvements over state-of-the-art WZ and standard video codecs, especially under critical conditions such as high-motion content and long group of pictures sizes.

110 citations


Cites background from "Correlation Noise Modeling for Effi..."

  • ...The Laplacian parameters are estimated online at the coefficient level (see [21] for more details)....

Journal ArticleDOI
TL;DR: The main purpose and novelty of this paper is its solid and comprehensive performance evaluation, which provides a strong, and much needed, performance reference for researchers in the WZ video coding field, as well as a solid way to steer future WZ video coding research.
Abstract: Wyner-Ziv (WZ) video coding, a particular case of distributed video coding (DVC), is a new video coding paradigm based on two major Information Theory results: the Slepian-Wolf and Wyner-Ziv theorems. In recent years, some practical WZ video coding solutions have been proposed with promising results. One of the most popular WZ video coding architectures in the literature uses turbo-code-based Slepian-Wolf coding and a feedback channel to perform rate control at the decoder. This WZ video coding architecture was first proposed by researchers at Stanford University and has since been adopted and improved by many research groups around the world. However, while many papers have been published with changes and improvements to this architecture, the precise and detailed evaluation of its performance, targeting its deep understanding for future advances, has not been made. Available performance results are mostly partial, obtained under unclear and incompatible conditions, using vaguely defined and sometimes architecturally unrealistic codec solutions. This paper targets the provision of a detailed, clear, and complete performance evaluation of an advanced transform-domain WZ video codec derived from the Stanford turbo coding and feedback channel based architecture. Although the WZ video codec proposed for this evaluation is among the best available, the main purpose and novelty of this paper is the solid and comprehensive performance evaluation made, which will provide a strong, and very much needed, performance reference for researchers in this WZ video coding field, as well as a solid way to steer future WZ video coding research.

87 citations


Cites methods from "Correlation Noise Modeling for Effi..."

  • ...In this step, full search motion estimation with modified matching criteria is performed [3]; the criteria include a regularization term that favors motion vectors closer to the origin....

  • ...The frame interpolation module generates the side information $Y_{WZ}$, an estimate of the $X_{WZ}$ frame, based on two references, one temporally in the past ($X_B$) and another in the future ($X_F$), as follows: 1....

  • ...The residual statistics between correspondent coefficients in $X_{WZ}^{DCT}$ and $Y_{WZ}^{DCT}$ are assumed to be modeled by a Laplacian distribution; the Laplacian parameter is estimated online at the decoder, for each DCT coefficient, based on the residual between the two motion compensated reference frames used… (a code sketch of this per-coefficient estimation follows this citing block)

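The per-coefficient estimation quoted above can be sketched as a transform-domain counterpart of the pixel-domain example earlier. This is a rough illustration only: the 4x4 DCT size, the fallback rule, and all names are assumptions, not the codec's actual implementation, and frame dimensions are assumed to be multiples of the block size.

```python
import numpy as np
from scipy.fft import dct

def dct2(block):
    """Orthonormal 2-D type-II DCT of a square block."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def coefficient_alphas(ref_backward, ref_forward, n=4):
    """Per-coefficient Laplacian alpha for a TDWZ decoder.

    The residual between the two motion-compensated references is
    transformed blockwise; each DCT coefficient of the residual acts
    as a local estimate of the noise energy at that position, with a
    band-level variance as fallback (band vs. coefficient granularity).
    """
    residual = (ref_backward.astype(np.float64)
                - ref_forward.astype(np.float64)) / 2.0
    h, w = residual.shape
    coeffs = np.array([dct2(residual[i:i + n, j:j + n])
                       for i in range(0, h, n) for j in range(0, w, n)])
    # Band level: variance of each (u, v) position across all blocks.
    band_var = np.maximum(coeffs.var(axis=0), 1e-6)   # shape (n, n)
    band_alpha = np.sqrt(2.0 / band_var)
    alphas = np.empty((h, w))
    k = 0
    for i in range(0, h, n):
        for j in range(0, w, n):
            t = coeffs[k]; k += 1
            local_var = np.maximum(t ** 2, 1e-6)      # per-coefficient energy
            local_alpha = np.sqrt(2.0 / local_var)
            # Use the local estimate where it signals more noise than the
            # band average, otherwise keep the band value (a simple rule).
            alphas[i:i + n, j:j + n] = np.where(t ** 2 > band_var,
                                                local_alpha, band_alpha)
    return alphas
```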

Journal ArticleDOI
TL;DR: The status and potential benefits of distributed video coding in terms of coding efficiency, complexity, error resilience, and scalability are reviewed.
Abstract: This paper surveys recent trends and perspectives in distributed video coding. More specifically, the status and potential benefits of distributed video coding in terms of coding efficiency, complexity, error resilience, and scalability are reviewed. Multiview video and applications beyond coding are also considered. In addition, recent contributions in these areas, more thoroughly explored in the papers of the present special issue, are also described.

80 citations


Cites methods from "Correlation Noise Modeling for Effi..."

  • ...In [31], a method is proposed for online estimation at the decoder of the correlation model....

Journal ArticleDOI
TL;DR: This paper proposes a novel framework called DCast for distributed video coding and transmission over wireless networks, which is different from existing distributed schemes in three aspects, and proposes a power distortion optimization algorithm to replace the traditional rate distortion optimization.
Abstract: This paper proposes a novel framework called DCast for distributed video coding and transmission over wireless networks, which differs from existing distributed schemes in three aspects. First, coset-quantized DCT coefficients and motion data are delivered directly to the channel coding layer without syndrome or entropy coding. Second, transmission power is allocated directly to coset data and motion data according to their distributions and magnitudes, without forward error correction. Third, these data are transformed by Hadamard and then directly mapped to a dense constellation (64K-QAM) for transmission without Gray coding. One of the most important properties of this framework is that the coding and transmission rate is fixed and distortion is minimized by allocating the transmission power. Thus, we further propose a power-distortion optimization algorithm to replace the traditional rate-distortion optimization. This framework avoids the annoying cliff effect caused by the mismatch between transmission rate and channel condition. In multicast, each user can get approximately the best quality matching its channel condition. Our experimental results show that the proposed DCast outperforms the typical solution using H.264 over 802.11 by up to 8 dB in video PSNR in video broadcast. Even in video unicast, the proposed DCast is still comparable to the typical solution.

66 citations
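DCast's actual power allocation is specific to its coset and motion data, but the underlying power-distortion trade-off can be illustrated with the classic result for linear analog transmission: minimizing total MSE under a power budget gives per-band gains proportional to the inverse fourth root of the band variance. The sketch below is that generic result, not DCast's algorithm.

```python
import numpy as np

def power_allocation(variances, total_power):
    """Per-band amplitude gains minimizing total MSE under a power budget.

    Model: band i (variance lambda_i) is sent as g_i * x_i over an AWGN
    channel; the receiver divides by g_i, so its distortion ~ sigma^2 / g_i^2.
    Minimizing sum_i sigma^2 / g_i^2 subject to sum_i g_i^2 * lambda_i = P
    (Lagrange multipliers) yields g_i proportional to lambda_i ** (-1/4).
    """
    lam = np.asarray(variances, dtype=np.float64)
    g = lam ** -0.25
    # Rescale so the power constraint sum(g^2 * lambda) == total_power holds.
    g *= np.sqrt(total_power / np.sum(g ** 2 * lam))
    return g
```

Because the mapping is linear rather than digitally coded, reconstruction quality degrades gracefully with channel SNR instead of exhibiting a cliff, which is the effect the abstract describes.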

Journal ArticleDOI
TL;DR: A novel algorithm for online estimation of the SID CC parameters based on already decoded information is proposed and included in a novel DVC architecture that employs a competitive hash-based motion estimation technique to generate high-quality SI at the decoder.
Abstract: In the context of low-cost video encoding, distributed video coding (DVC) has recently emerged as a potential candidate for uplink-oriented applications. This paper builds on a concept of correlation channel (CC) modeling, which expresses the correlation noise as being statistically dependent on the side information (SI). Compared with classical side-information-independent (SII) noise modeling adopted in current DVC solutions, it is theoretically proven that side-information-dependent (SID) modeling improves the Wyner-Ziv coding performance. Anchored in this finding, this paper proposes a novel algorithm for online estimation of the SID CC parameters based on already decoded information. The proposed algorithm enables bit-plane-by-bit-plane successive refinement of the channel estimation leading to progressively improved accuracy. Additionally, the proposed algorithm is included in a novel DVC architecture that employs a competitive hash-based motion estimation technique to generate high-quality SI at the decoder. Experimental results corroborate our theoretical gains and validate the accuracy of the channel estimation algorithm. The performance assessment of the proposed architecture shows remarkable and consistent coding gains over a germane group of state-of-the-art distributed and standard video codecs, even under strenuous conditions, i.e., large groups of pictures and highly irregular motion content.

43 citations
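As a toy illustration of side-information-dependent (SID) modeling, the sketch below estimates one Laplacian scale per range of SI values from already decoded samples, instead of a single global scale as in SII modeling. The binning scheme, thresholds, and names are mine, not the paper's bit-plane refinement algorithm.

```python
import numpy as np

def sid_laplacian_scales(side_info, decoded, n_bins=16):
    """Toy side-information-dependent (SID) noise model.

    SII modeling fits one Laplacian to the noise N = X - Y; SID modeling
    instead estimates a scale b(y) = E[|N|] for each range of side
    information values, here using already decoded samples.
    """
    y = np.asarray(side_info, dtype=np.float64).ravel()
    n = np.asarray(decoded, dtype=np.float64).ravel() - y  # realized noise
    edges = np.linspace(y.min(), y.max() + 1e-9, n_bins + 1)
    global_scale = np.mean(np.abs(n))       # the SII (global) estimate
    scales = np.full(n_bins, global_scale)  # fall back where data is sparse
    idx = np.digitize(y, edges) - 1
    for b in range(n_bins):
        mask = idx == b
        if mask.sum() > 8:  # enough samples for a stable estimate
            scales[b] = np.mean(np.abs(n[mask]))  # ML Laplacian scale
    return edges, scales
```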

References
Journal ArticleDOI
David Slepian1, Jack K. Wolf
TL;DR: The minimum number of bits per character $R_X$ and $R_Y$ needed to encode these sequences so that they can be faithfully reproduced under a variety of assumptions regarding the encoders and decoders is determined.
Abstract: Correlated information sequences $\cdots, X_{-1}, X_0, X_1, \cdots$ and $\cdots, Y_{-1}, Y_0, Y_1, \cdots$ are generated by repeated independent drawings of a pair of discrete random variables $X, Y$ from a given bivariate distribution $P_{XY}(x,y)$. We determine the minimum number of bits per character $R_X$ and $R_Y$ needed to encode these sequences so that they can be faithfully reproduced under a variety of assumptions regarding the encoders and decoders. The results, some of which are not at all obvious, are presented as an admissible rate region $\mathcal{R}$ in the $R_X$-$R_Y$ plane. They generalize a similar and well-known result for a single information sequence, namely $R_X \geq H(X)$ for faithful reproduction.

4,165 citations
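For reference, the admissible rate region $\mathcal{R}$ mentioned in the abstract is the well-known Slepian-Wolf region for separate encoding and joint decoding of correlated sources:

$R_X \geq H(X \mid Y)$, $\quad R_Y \geq H(Y \mid X)$, $\quad R_X + R_Y \geq H(X, Y)$,

which reduces to the single-source bound $R_X \geq H(X)$ when there is no second sequence.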

Journal ArticleDOI
TL;DR: The quantity $R^*(d)$ is determined, defined as the infimum of rates $R$ such that communication is possible in the above setting at an average distortion level not exceeding $d + \varepsilon$.
Abstract: Let $\{(X_k, Y_k)\}_{k=1}^{\infty}$ be a sequence of independent drawings of a pair of dependent random variables $X, Y$. Let us say that $X$ takes values in the finite set $\mathcal{X}$. It is desired to encode the sequence $\{X_k\}$ in blocks of length $n$ into a binary stream of rate $R$, which can in turn be decoded as a sequence $\{\hat{X}_k\}$, where $\hat{X}_k \in \hat{\mathcal{X}}$, the reproduction alphabet. The average distortion level is $(1/n) \sum_{k=1}^{n} E[D(X_k, \hat{X}_k)]$, where $D(x, \hat{x}) \geq 0$, $x \in \mathcal{X}$, $\hat{x} \in \hat{\mathcal{X}}$, is a preassigned distortion measure. The special assumption made here is that the decoder has access to the side information $\{Y_k\}$. In this paper we determine the quantity $R^*(d)$, defined as the infimum of rates $R$ such that (with $\varepsilon > 0$ arbitrarily small and with suitably large $n$) communication is possible in the above setting at an average distortion level (as defined above) not exceeding $d + \varepsilon$. The main result is that $R^*(d) = \inf [I(X;Z) - I(Y;Z)]$, where the infimum is with respect to all auxiliary random variables $Z$ (which take values in a finite set $\mathcal{Z}$) that satisfy: i) $Y, Z$ conditionally independent given $X$; ii) there exists a function $f: \mathcal{Y} \times \mathcal{Z} \rightarrow \hat{\mathcal{X}}$, such that $E[D(X, f(Y,Z))] \leq d$. Let $R_{X|Y}(d)$ be the rate-distortion function which results when the encoder as well as the decoder has access to the side information $\{Y_k\}$. In nearly all cases it is shown that when $d > 0$ then $R^*(d) > R_{X|Y}(d)$, so that knowledge of the side information at the encoder permits transmission of the $\{X_k\}$ at a given distortion level using a smaller transmission rate. This is in contrast to the situation treated by Slepian and Wolf [5] where, for arbitrarily accurate reproduction of $\{X_k\}$, i.e., $d = \varepsilon$ for any $\varepsilon > 0$, knowledge of the side information at the encoder does not allow a reduction of the transmission rate.

3,288 citations


"Correlation Noise Modeling for Effi..." refers background or methods in this paper

  • ...The extension of Slepian–Wolf coding for lossy compression is well known as Wyner–Ziv (WZ) coding [3], which deals with the lossy source coding of $X$ when some side information $Y$ is available only at the decoder....

  • ...In [3], Wyner and Ziv show that there is no increase in the transmission rate if the statistical dependency between $X$ and $Y$ is only explored at the decoder compared with the case where it is explored both at the decoder and the encoder, notably, if $X$ and $Y$ are jointly Gaussian and a mean-square error distortion measure is considered....

Journal ArticleDOI
27 Jun 2005
TL;DR: The recent development of practical distributed video coding schemes is reviewed, finding that the rate-distortion performance is superior to conventional intraframe coding, but there is still a gap relative to conventional motion-compensated interframe coding.
Abstract: Distributed coding is a new paradigm for video compression, based on Slepian and Wolf's and Wyner and Ziv's information-theoretic results from the 1970s. This paper reviews the recent development of practical distributed video coding schemes. Wyner-Ziv coding, i.e., lossy compression with receiver side information, enables low-complexity video encoding where the bulk of the computation is shifted to the decoder. Since the interframe dependence of the video sequence is exploited only at the decoder, an intraframe encoder can be combined with an interframe decoder. The rate-distortion performance is superior to conventional intraframe coding, but there is still a gap relative to conventional motion-compensated interframe coding. Wyner-Ziv coding is naturally robust against transmission errors and can be used for joint source-channel coding. A Wyner-Ziv MPEG encoder that protects the video waveform rather than the compressed bit stream achieves graceful degradation under deteriorating channel conditions without a layered signal representation.

1,342 citations


"Correlation Noise Modeling for Effi..." refers background or methods in this paper

  • ...For example, in [4], [7], and [8], the CNM parameters for the transform-domain WZ (TDWZ) video coding scheme are obtained through a training stage using several video sequences (one parameter per transform band is computed)....

  • ...A similar approach to the one mentioned in [4] is...

  • ...In [4], the CNM parameter for a PDWZ video coding scheme is said to be estimated at the decoder based on previously decoded frames statistics; however, it is not specified in [4] how this task is precisely performed....

  • ...1 matches with the PDWZ video codec architecture in [4]....

  • ...Knowing the CNM and its parameters at the encoder allows to perform, for example, rate control at the encoder; however, in a WZ video coding architecture with decoder rate control, as the architectures in [4] and the one used in this paper, this is not a relevant feature....

01 Jan 1973

1,280 citations


"Correlation Noise Modeling for Effi..." refers methods in this paper

  • ...From the Information Theory, the Slepian–Wolf Theorem [2] states that it is possible to compress two statistically dependent discrete random sequences $X$ and $Y$ that are independently and identically distributed (i.i.d.) in a distributed way (separate encoding and joint decoding) using a rate similar to that used in a system where the sequences are encoded and decoded together, i.e., like in traditional video coding schemes....

Journal ArticleDOI
TL;DR: This work offers a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the theoretical explanations necessary, but also leads to insights about various other observations from the literature.
Abstract: Over the past two decades, there have been various studies on the distributions of the DCT coefficients for images. However, they have concentrated only on fitting the empirical data from some standard pictures with a variety of well-known statistical distributions, and then comparing their goodness of fit. The Laplacian distribution is the dominant choice balancing simplicity of the model and fidelity to the empirical data. Yet, to the best of our knowledge, there has been no mathematical justification as to what gives rise to this distribution. We offer a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the theoretical explanations necessary, but also leads to insights about various other observations from the literature. This model also allows us to investigate how certain changes in the image statistics could affect the DCT coefficient distributions.

743 citations
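The Laplacian fit the abstract discusses is easy to reproduce empirically. A minimal sketch follows (the 8x8 block size and the choice of AC band are arbitrary illustrative choices): for a zero-mean Laplacian density $f(x) = \frac{1}{2b} e^{-|x|/b}$, the maximum-likelihood estimate of the scale $b$ is simply the mean absolute value of the samples.

```python
import numpy as np
from scipy.fft import dct

def laplacian_fit_ac_coefficients(image, n=8, band=(0, 1)):
    """Fit a zero-mean Laplacian to one AC band of blockwise DCT coefficients.

    Collects the coefficient at position `band` from every n x n block's
    orthonormal 2-D DCT and returns the ML Laplacian scale estimate,
    b_hat = mean(|samples|), along with the samples themselves.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    samples = []
    for i in range(0, h - h % n, n):
        for j in range(0, w - w % n, n):
            blk = dct(dct(img[i:i + n, j:j + n], axis=0, norm='ortho'),
                      axis=1, norm='ortho')
            samples.append(blk[band])
    samples = np.array(samples)
    b = np.mean(np.abs(samples))  # ML estimate of the Laplacian scale
    return b, samples

# Usage idea: compare a histogram of `samples` against
# f(x) = np.exp(-np.abs(x) / b) / (2 * b) to visualize the fit.
```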