Proceedings ArticleDOI

Residual image coding for stereo image compression

10 Dec 2002-Vol. 2
TL;DR: This paper proposes a new method for the coding of residual images that takes into account the properties of residual images, and demonstrates that it is possible to achieve good results with a computationally simple method.
Abstract: The main focus of research in stereo image coding has been disparity estimation (DE), a technique used to reduce coding rate by taking advantage of the redundancy in a stereo image pair. Significantly less effort has been put into the coding of the residual image. In this paper we propose a new method for the coding of residual images that takes into account the properties of residual images. Particular attention is paid to the effects of occlusion and the correlation properties of residual images that result from block-based disparity estimation. The embedded, progressive nature of our coder allows one to stop decoding at any time. We demonstrate that it is possible to achieve good results with a computationally simple method.

Summary (2 min read)

1. INTRODUCTION

  • Human depth perception is in part based on the difference in the images the left and right eyes send to the brain.
  • By presenting the appropriate image of a stereo pair to the left and right eyes, the viewer perceives scenes in 3 dimensions instead of as a 2-dimensional image.
  • Disparity compensation is similar to the well-known motion compensation for video compression. [1][2][3][4] employ disparity compensation in the spatial domain, while [5] uses the wavelet domain.
  • In this paper the authors propose a progressive coding technique for the compression of stereo images.
  • The emphasis of their work is on the coding of the residual image.

2. STEREO IMAGE CODING

  • Stereoscopic image pairs represent a view of the same scene from two slightly different positions.
  • The disparity of each object in the image depends on the distance between the object and the camera lenses.
  • This estimation process works well for blocks that are present in both images.
  • Occlusion can happen for two main reasons: finite viewing area and depth discontinuity.
  • The bitplane coding is performed on both the residual and reference image at the same time guaranteeing that the most significant information for both images is sent before the less significant information.

3. RESIDUAL IMAGE CODING

  • The goal of their research was to make stereo image coding more efficient by improving the coding of the residual image.
  • The DE the authors chose is rather simple, but even with such a simple disparity estimator their proposed coding technique has very good performance.

3.1. Image Coding Method

  • Embedded coding yields good performance coupled with simplicity of coding due to not having to perform any bit allocation procedure.
  • For each bitplane, the quadtree structure is used to identify the significant coefficients; coefficients whose most significant bit is found on that bitplane.
  • For each bitplane first the sorting and refinement pass are executed for the reference image and then for the residual image.
  • The highest magnitude coefficient is usually smaller for the residual image than for the reference image.
  • MGE was chosen over SPIHT because it can encode the above scenarios more efficiently.

3.2. Occlusion

  • As noted in Section 2 there are two kinds of occlusion that may occur in DE.
  • A finite viewing area can be overcome in certain cases.
  • For blocks near the image edge, where a one-directional search could run out of image pixels, allowing the search to continue in the other direction may find blocks similar to the one to be estimated.
  • The residual of those blocks that are occluded because of depth discontinuity displays different characteristics from the other parts of the image.
  • As noted in [9], the occluded blocks are more correlated.

3.3. Image Transform

  • Moellenhoff’s analysis [9] shows that residual images have significantly different characteristics from natural images.
  • This suggests that transforms that work well for natural images may not be as useful for residual images.
  • For one block the algorithm can find a relatively good match while its neighbor could be harder to predict from the reference image.
  • This transform consists of a Haar transform of T levels for occluded blocks and a DCT for the others, with the DCT coefficients regrouped into the wavelet subbands to line up with the Haar-transformed coefficients.

4. EXPERIMENTAL RESULTS

  • The reference image was transformed using the 9/7 filters.
  • Occlusion detection consisted of looking for blocks where the estimation error was above a given threshold.
  • Figure 3 compares independent wavelet coding, JPEG-style coding, overlapped block disparity compensation (OBDC) [4], and mixed transform coding.
  • Recall that the decoder uses the compressed reference image to recreate the estimate for the other image.
  • In this case the left image is chosen as the reference image.



RESIDUAL IMAGE CODING FOR STEREO IMAGE COMPRESSION
Tamás Frajka and Kenneth Zeger
University of California, San Diego
Department of Electrical and Computer Engineering
La Jolla, CA 92093-0407
{frajka,zeger}@code.ucsd.edu
ABSTRACT
The main focus of research in stereo image coding has been dis-
parity estimation (DE), a technique used to reduce coding rate by
taking advantage of the redundancy in a stereo image pair. Sig-
nificantly less effort has been put into the coding of the residual
image. In this paper we propose a new method for the coding of
residual images that takes into account the properties of residual
images. Particular attention is paid to the effects of occlusion and
the correlation properties of residual images that result from block-
based disparity estimation. The embedded, progressive nature of
our coder allows one to stop decoding at any time. We demonstrate
that it is possible to achieve good results with a computationally
simple method.
1. INTRODUCTION
Human depth perception is in part based on the difference in the
images the left and right eyes send to the brain. By presenting
the appropriate image of a stereo pair to the left and right eyes,
the viewer perceives scenes in 3 dimensions instead of as a 2-
dimensional image. Such binocular visual information is useful in
many fields, such as tele-presence style video conferencing, tele-
medicine, remote sensing, and computer vision.
These applications require the storage or transmission of the
stereo pair. Since the images seen with the left and right eye differ
only in small areas, techniques that try to exploit the dependency
can yield better results than independent coding of the image pair.
Most successful techniques rely on disparity compensation to
achieve good performance. Disparity compensation is similar to
the well known motion compensation for video compression. [1][2]
[3][4] employ disparity compensation in the spatial domain, while
[5] uses the wavelet domain. Disparity compensation can be a
computationally complex process. In [6] a wavelet transform based
method is used that does not rely on disparity compensation.
Many of the above works use discrete cosine transform (DCT)
based coding of the images which uses a rate allocation method
to divide the available bandwidth between the two images. Em-
bedded coding techniques based on the wavelet transform [7] pro-
vide improved performance for still images when compared with
DCT-based methods. A progressive stereo image coding scheme
is proposed in [8] that achieves good performance without having
to use rate allocation.
With disparity compensation, one image is used as a reference
image, and the other is predicted using the reference image. The
Supported in part by the National Science Foundation and the UCSD
Center for Wireless Communications.
Fig. 1. Original left images of the (a) Room and (b) Aqua stereo pairs.
gain over independent coding comes from compressing the resid-
ual image that is obtained as the difference of the original and pre-
dicted image. Little attention has been paid to the coding of the
residual image. Moellenhoff et al. [9] looked at the properties of
disparity compensated residual images and proposed some DCT
and wavelet techniques for their improved encoding.
In this paper we propose a progressive coding technique for
the compression of stereo images. The emphasis of our work is on
the coding of the residual image. These images exhibit properties
different from natural images. Our coding techniques make use
of these differences. We propose to use transforms that take into
account those differences as well as the block-based nature of dis-
parity estimation. In our coder we treat occluded blocks differently
from blocks that are well estimated with the DE process. Multi-
Grid Embedding (MGE) by Lan and Tewfik [10] is used as the
image coder for its flexibility. All of these yield significant
improvements over other methods in the coding of stereo images.
The outline of the paper is as follows. Section 2 gives an
overview of stereo image coding. Our contribution is in Section
3 with experimental results provided in Section 4.
2. STEREO IMAGE CODING
Stereoscopic image pairs represent a view of the same scene from
two slightly different positions. When the images are presented to
the respective eye the human observer perceives the depth in the
scene as in 3 dimensions. One can obtain stereo pairs by taking
pictures with two cameras that are placed in parallel 2-3 inches
apart. The left image of the stereo pairs used in this work can be
seen in Figure 1.
Because of the different perspective, the same point in the object will be mapped to different coordinates in the left and right images. Let (x_l, y_l) and (x_r, y_r) denote the coordinates of an object point in the left and right images, respectively. The disparity is the difference between these coordinates, d = (x_l - x_r, y_l - y_r). If the cameras are placed in parallel, y_l = y_r, and the disparity is limited to the horizontal direction. One image of the pair serves as a reference image, I_ref, and the other is disparity estimated with respect to the reference image to form the prediction I_pred. A block diagram of the encoder is shown in Figure 2.

Fig. 2. Stereo image encoder. [Block diagram: disparity estimation against I_ref yields the prediction I_pred; the residual I_res and the reference image each pass through an image transform and an image coder, while the disparity vectors are coded with DPCM + arithmetic coding (AC).]
The disparity of each object in the image depends on the dis-
tance between the object and the camera lenses. (See [11] for more
details). The disparity estimation process tries to determine the
displacement for each image pixel. Since this process would be
quite complex if done for each pixel individually, it is carried out
for N×N blocks instead. Block sizes of N = 8 or N = 16 provide a
good trade-off between accuracy of the estimation and the entropy
necessary to encode the disparity vector, d, for each block.
The search for the matching block is carried out in a lim-
ited search window. Given the reference image the optimal match
could be any N×N block of this image. This exhaustive search is
computationally complex. From the parallel camera axis assump-
tion one can restrict the search to horizontal displacements only.
From the camera set-up it is clear that the disparity for objects in
the left image with respect to the right image is positive and vice
versa. This observation helps further limit the scope of the search.
This estimation process works well for blocks that are present
in both images. However, occlusion may result if certain image
information is only present in one of the images. Occlusion can
happen for two main reasons: finite viewing area and depth dis-
continuity. Finite viewing area occurs on the left side of the left
image and the right side of the right image where the respective
eye can see objects that the other eye cannot. Depth discontinuity
is due to overlapping objects in the image; certain portions can be
covered from one eye on which the other eye has direct sight.
The disparity vectors are usually losslessly transmitted using
differential pulse code modulation (DPCM) followed by arithmetic
coding [12].
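A minimal sketch of the DPCM step applied to the block disparity values is given below; the raster-scan prediction order is an assumption, and the arithmetic coding stage of [12] is only indicated, not implemented.

```python
# Sketch of DPCM applied to per-block disparity values before entropy coding.
# The raster-scan prediction order is assumed; the arithmetic coder of [12]
# would operate on the resulting differences and is not implemented here.
import numpy as np

def dpcm_disparities(disparities):
    """Differences between consecutive disparities in raster-scan order."""
    flat = disparities.flatten().astype(np.int32)
    diffs = np.empty_like(flat)
    diffs[0] = flat[0]                 # first value sent as-is
    diffs[1:] = flat[1:] - flat[:-1]   # residuals to be arithmetic coded
    return diffs

def inverse_dpcm(diffs):
    """Recover the disparity sequence at the decoder."""
    return np.cumsum(diffs)
```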
Given the disparity estimate of the image, the residual
"!43
is
formed by subtracting the estimate from the original. This residual
and the reference image are then encoded. Many proposed tech-
niques use DCT-based block coding methods for the encoding of
both images. They also require a bit allocation mechanism to de-
termine the coding rate of each image (this bit allocation is carried
out in addition to the bit allocation between the DCT-transformed
blocks of each image). For each target bit rate, a separate opti-
mization is used to determine the appropriate bit allocation.
Embedded image coders can be terminated at any bitrate and
still yield the best reconstruction to that rate without a priori opti-
mization. Zerotree-style techniques such as Set Partitioning in Hi-
erarchical Trees (SPIHT [7]) by Said and Pearlman offer excellent
compression performance for still images. This zerotree technique
is extended to stereo images [8] by Boulgouris et al. The bitplane
coding is performed on both the residual and reference image at
the same time guaranteeing that the most significant information
for both images is sent before the less significant information.
The decoding of stereo images is rather simple. Both the resid-
ual and reference image are reconstructed. Using the DE informa-
tion and the reconstructed reference image the decoder can recover
the other image of the stereo pair.
3. RESIDUAL IMAGE CODING
The goal of our research was to make stereo image coding more ef-
ficient by improving the coding of the residual image. The DE we
chose is rather simple, but even with such a simple disparity esti-
mator our proposed coding technique has very good performance.
3.1. Image Coding Method
Embedded coding yields good performance coupled with simplic-
ity of coding due to not having to perform any bit allocation proce-
dure. MGE [10] uses a quadtree structure instead of the zerotrees
of SPIHT. It employs the same bitplane coding, starting from the
most significant bits of the transform domain image down to the
least significant. For each bitplane, the quadtree structure is used
to identify the significant coefficients; coefficients whose most sig-
nificant bit is found on that bitplane. The sorting pass identifies the
coefficients that become significant on the current bitplane, while
the refinement pass refines those coefficients that have previously
become significant.
The way we use MGE for stereo image compression is similar
to [8]. For each bitplane first the sorting and refinement pass are
executed for the reference image and then for the residual image.
The highest magnitude coefficient is usually smaller for the resid-
ual image than for the reference image. The residual image con-
tains mostly high frequency information. MGE was chosen over
SPIHT because it can encode the above scenarios more efficiently.
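The interleaving of the two images' bitplane passes can be sketched schematically as follows. The quadtree significance coding of MGE [10] is abstracted behind placeholder sorting/refinement functions, so this illustrates the pass ordering only, not an implementation of MGE itself.

```python
# Schematic of the bitplane interleaving described above: for each bitplane,
# the sorting and refinement passes run on the reference image first, then on
# the residual. The quadtree significance coding of MGE [10] is abstracted
# behind the placeholder functions sorting_pass and refinement_pass.
def encode_stereo_bitplanes(ref_coeffs, res_coeffs, bitstream,
                            sorting_pass, refinement_pass):
    max_bits = max(int(abs(ref_coeffs).max()).bit_length(),
                   int(abs(res_coeffs).max()).bit_length())
    for bit in range(max_bits - 1, -1, -1):       # most significant plane first
        threshold = 1 << bit
        for coeffs in (ref_coeffs, res_coeffs):   # reference image passes first
            sorting_pass(coeffs, threshold, bitstream)
            refinement_pass(coeffs, threshold, bitstream)
        # The stream is embedded: decoding may stop after any bitplane.
```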
3.2. Occlusion
As noted in Section 2 there are two kinds of occlusion that may oc-
cur in DE. Occlusion due to the finite viewing area can be overcome in certain cases. For blocks near the image edge, where a one-directional search could run out of image pixels, allowing the search to continue in the other direction may find blocks similar to the one to be estimated.
The residual of those blocks that are occluded because of depth
discontinuity displays different characteristics from the other parts
of the image. As noted in [9], the occluded blocks are more corre-
lated. We propose to detect such blocks, and code them differently
from the rest of the residual image blocks for improved efficiency.
3.3. Image Transform
Moellenhoff's analysis [9] shows that residual images have significantly different characteristics from natural images. They mainly
contain edges and other high frequency information. The correla-
tion between neighboring pixels is smaller as well. This suggests
that transforms that work well for natural images may not be as
useful for residual images.

Fig. 3. Comparison of independent coding, JPEG-style coding, OBDC, and mixed transform coding for the left image residual (with the reference image JPEG-coded at a fixed quality factor) for (a) the Room and (b) the Aqua images. [PSNR (dB) versus rate (bpp) curves for independent coding, DCT, mixed coding, and OBDC.]
Column:   1     2     3     4     5     6     7     8
RO       0.93  0.94  0.96  0.96  0.97  0.95  0.94  0.94
RR       0.27  0.38  0.41  0.45  0.44  0.31  0.33  0.03
AO       0.90  0.89  0.89  0.89  0.88  0.88  0.87  0.89
AR       0.24  0.25  0.22  0.23  0.25  0.23  0.26  0.12

Table 1. 1-pixel correlation for columns of 8×8 blocks, for RO = Room original, RR = Room residual, AO = Aqua original, AR = Aqua residual; horizontal direction, right images.
In wavelet transform coding one of the most widely used filters
is the 9/7 filter by Daubechies [13]. It is preferred for its regularity and smoothing properties. With the image pixels less correlated in residual images, we propose the use of Haar filters. These 2-tap filters take the average and difference of two neighboring pixels.
DE uses
N×N size blocks to find the best estimates for the
image. There is no reason to expect neighboring blocks to exhibit
similar residual properties. For one block the algorithm can find a
relatively good match while its neighbor could be harder to predict
from the reference image.
Moellenhoff's results only indicate that the pixels of the residual image are less correlated than those of the original image. These results do not reveal much about the local correlation of pixels, namely across the N×N block boundaries. We propose to look at the 1-pixel correlation on a more local scale in both the horizontal and vertical directions. Instead of gathering these statistics for the whole image, we only look at the correlation between all pixels in the i-th column (or row) and their immediate neighbors in the (i+1)-th column (or row) of all N×N blocks, for the case of horizontal (or vertical) correlation, respectively. Note that the correlation between the N-th and (N+1)-th columns/rows gives the correlation just across the boundary between two neighboring blocks. The 1-pixel correlations in the horizontal direction are shown in Table 1 for the Room and Aqua images. (Vertical correlations show similar trends.) For the original images the 1-pixel correlation is about the same for each position in a block, while for residual images it drops off significantly at the block boundary (column 8). This further supports our assumption that different blocks exhibit different properties in the residual image.
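A sketch of how the per-column 1-pixel horizontal correlations of Table 1 can be computed is shown below. The exact preprocessing used for the table is not specified, so this is an illustrative reading of the description above.

```python
# Sketch of the 1-pixel horizontal correlation per column position inside
# N x N blocks (as in Table 1): for column position i, correlate every pixel in
# column i of every block with its right neighbor (which lies in the next block
# when i = N). The exact preprocessing used by the authors is not specified.
import numpy as np

def column_correlations(image, N=8):
    H, W = image.shape
    img = image.astype(np.float64)
    corrs = []
    for i in range(N):                       # column position 1..N within a block
        xs, ys = [], []
        for bx in range(0, W - N, N):        # skip last block column (no right neighbor)
            col = bx + i
            xs.append(img[:, col])
            ys.append(img[:, col + 1])       # immediate right neighbor
        x, y = np.concatenate(xs), np.concatenate(ys)
        corrs.append(np.corrcoef(x, y)[0, 1])
    return corrs                             # corrs[N-1] spans the block boundary
```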
Based on this observation we focus on block-based transforms
that could better capture the differences between the blocks than
a global transform, such as the wavelet transform, that sweeps
across the block boundaries. DCT in practice is performed on
()(
blocks. Its performance is diminished by the JPEG encod-
ing method. However, if the DCT coefficients are regrouped into
a wavelet decomposition style subband structure as proposed in
[14], and are encoded using an embedded coder, the performance
approaches that of wavelet-based methods (this method is referred to as Embedded Zerotree DCT (EZDCT)).
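The regrouping idea can be sketched as follows: the (u, v) coefficient of every N×N block is collected into a single frequency plane, so that same-frequency coefficients form contiguous subbands. The uniform layout shown here is a simplification of the dyadic arrangement used in [14].

```python
# Sketch of regrouping N x N block-DCT coefficients so that coefficients of the
# same frequency from all blocks are collected together, approximating a
# wavelet-style subband structure. The true EZDCT layout in [14] is dyadic;
# this uniform N*N-subband arrangement is a simplified illustration.
import numpy as np
from scipy.fft import dctn

def block_dct_regrouped(image, N=8):
    H, W = image.shape
    bh, bw = H // N, W // N                    # number of blocks in each direction
    out = np.zeros((H, W))
    for by in range(bh):
        for bx in range(bw):
            block = image[by * N:(by + 1) * N, bx * N:(bx + 1) * N].astype(np.float64)
            coeffs = dctn(block, norm='ortho') # 2-D DCT-II of the block
            for u in range(N):
                for v in range(N):
                    # frequency (u, v) from every block lands in one subband plane
                    out[u * bh + by, v * bw + bx] = coeffs[u, v]
    return out
```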
None of the proposed image transforms so far take into ac-
count the effect of occlusion. For an occluded block the best match
can still be a very distorted one. In those cases not using the es-
timate for the given block at all could be the best strategy. For
each block the estimator should decide if the best match is good
enough. If not, the given block is left intact. This process cre-
ates a mixed residual image, with some parts having mostly edges
and high frequency information, and other parts blocks from the
original image. For residual blocks that contain significant high
frequency information a uniform band partitioning (such as with
DCT) works better than octave-band signal decomposition (see
[15]), while octave-band decomposition is desirable in blocks of
the original image.
Note that the Haar transform only uses two neighboring pixels
to compute the low and high frequency coefficients, then moves on
to the next pair. If the block size N is even, then starting at the left edge of the block, the Haar transform can be performed without having to include pixels from outside the block for the computation of Haar-wavelet coefficients for all pixels in the block. Furthermore, this can be repeated up to log2(N) levels without affecting coefficients from outside the N×N block. We propose to use a mixed image transform. This transform consists of a Haar transform of T levels for occluded blocks and a DCT for the others, with the DCT coefficients regrouped into the wavelet subbands to line up with the Haar-transformed coefficients.
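A simplified sketch of the mixed transform is given below, assuming the occlusion map is already available. A single Haar level is shown instead of T levels, and the coefficients are written back in place, so the subband alignment described above is only hinted at.

```python
# Sketch of the mixed block transform: occluded blocks get an in-block Haar
# transform, the others a block DCT; both are written into a common coefficient
# array. One Haar level (instead of T) and the in-place placement are
# simplifications made for illustration.
import numpy as np
from scipy.fft import dctn

def haar_1level(block):
    """One level of the 2-D Haar transform, confined to the block."""
    a = block.astype(np.float64)
    # rows: average / difference of neighboring pixel pairs
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    a = np.hstack([lo, hi])
    # columns: same average / difference step
    lo = (a[0::2, :] + a[1::2, :]) / 2.0
    hi = (a[0::2, :] - a[1::2, :]) / 2.0
    return np.vstack([lo, hi])

def mixed_transform(residual, occluded, N=8):
    """occluded[i, j] is True if block (i, j) was left intact (occluded)."""
    out = np.zeros_like(residual, dtype=np.float64)
    for by in range(0, residual.shape[0], N):
        for bx in range(0, residual.shape[1], N):
            block = residual[by:by + N, bx:bx + N]
            if occluded[by // N, bx // N]:
                out[by:by + N, bx:bx + N] = haar_1level(block)
            else:
                out[by:by + N, bx:bx + N] = dctn(block.astype(np.float64),
                                                 norm='ortho')
    return out
```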
4. EXPERIMENTAL RESULTS
In our simulations we used the
=70<)S=70
Room and
=,,<)
T
0
Aqua stereo image pairs shown in Figure 1. The reference image
was transformed using the
:UV6
filters. For DE a simple scheme

was used with a 64 pixel horizontal search window. Occlusion de-
tection consisted of looking for blocks where the estimation error
was above a given threshold.
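The occlusion detection rule can be sketched as follows; the mean-absolute-error measure and the threshold value are assumptions made for illustration.

```python
# Sketch of the occlusion detection used in the experiments: a block is flagged
# as occluded when its disparity-estimation error exceeds a threshold. The
# mean-absolute-error measure and the threshold value are assumptions.
import numpy as np

def detect_occluded_blocks(residual, N=8, threshold=20.0):
    H, W = residual.shape
    occluded = np.zeros((H // N, W // N), dtype=bool)
    for by in range(0, H, N):
        for bx in range(0, W, N):
            err = np.abs(residual[by:by + N, bx:bx + N]).mean()
            occluded[by // N, bx // N] = err > threshold
    return occluded     # one extra flag bit per block is sent to the decoder
```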
For stereo images, the Peak Signal-to-Noise Ratio (PSNR) is
computed using the average of the mean squared error of the re-
constructed left and right images,
WYXZ@[
\/1I]KM^`_
Naa4b
cLdfehgjiEkCdfehglnmEo
Np
First we show the comparison of different methods for the cod-
ing of the disparity estimated left image. The reference image is
the uncoded right image. The bitrate figures include the coding of
the disparity vector field. In the case of the mixed transform, for
each block an extra bit is encoded to signal if that block is con-
sidered as occluded. (Clearly, in the case of independent coding
there is no need to encode any disparity information.) The PSNR
is computed using the MSE for the left image alone. The JPEG-
style coder in our comparison uses quantization tables from the
MPEG predicted frame coder.
Figure 3 compares independent wavelet coding, JPEG-style
coding, overlapped block disparity compensation (OBDC) [4], and
mixed transform coding. Mixed transform coding significantly
outperforms both independent and JPEG-style coding. It also performs as
well or better than OBDC coding which uses a computationally
more complex disparity estimator.
Next we compare our proposed methods and the results from
[8]. Good residual image performance alone does not guarantee
overall good performance when the entire stereo image is con-
cerned in an embedded coding scenario. Recall that the decoder
uses the compressed reference image to recreate the estimate for
the other image. If the coding of the residual image takes away
bits from the coding of the reference image the overall result may
not be as good as the coding of the residual image would suggest.
Figure 4 demonstrates this comparison. In this case the left
image is chosen as the reference image. In the comparison, “Boul-
gouris2” refers to new results (received from the authors) obtained
by an improved version of the original Embedded Stereo Coder.
It uses a more sophisticated disparity estimator and better wavelet
filters. Our proposed method outperforms this improved algorithm
as well.
Fig. 4. Comparison of the proposed method with the Embedded Stereo Coding scheme from Boulgouris and its improved version for the full stereo pair. [PSNR (dB) versus rate (bpp) curves for Boulgouris, Boulgouris2, and mixed transform.]
5. ACKNOWLEDGEMENT
We would like to thank Nikolaos Boulgouris for providing data for
the performance comparison.
6. REFERENCES
[1] W.-H. Kim and S.-W. Ra, “Fast disparity estimation using geometric properties and selective sample decimation for stereoscopic image coding,” IEEE Trans. on Cons. Elec., vol. 45, no. 1, pp. 203–209, Feb. 1999.
[2] C.-W. Lin, E.-Y. Fei, and Y.-C. Chen, “Hierarchical disparity estimation using spatial correlation,” IEEE Trans. on Cons. Elec., vol. 44, no. 3, pp. 630–637, Aug. 1998.
[3] D. Tzovaras and M. G. Strintzis, “Disparity estimation using rate-distortion theory for stereo image sequence coding,” in Int. Conf. on DSP, July 1997, vol. 1, pp. 413–416.
[4] W. Woo and A. Ortega, “Overlapped block disparity compensation with adaptive windows for stereo image coding,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 10, no. 2, pp. 861–867, Mar. 2000.
[5] Q. Jiang, J. J. Lee, and M. H. Hayes, III, “A wavelet based stereo image coding algorithm,” in ICASSP, Mar. 1999, vol. 6, pp. 3157–3160.
[6] W. D. Reynolds and R. V. Kenyon, “The wavelet transform and the suppression theory of binocular vision for stereo image compression,” in ICIP, Sept. 1996, vol. 2, pp. 557–560.
[7] A. Said and W. A. Pearlman, “A new, fast, and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. on Circ. and Sys. for Video Tech., vol. 6, no. 3, pp. 243–250, June 1996.
[8] N. V. Boulgouris and M. G. Strintzis, “Embedded coding of stereo images,” in ICIP, Sept. 2000, vol. 3, pp. 640–643.
[9] M. S. Moellenhoff and M. W. Maier, “Characteristics of disparity-compensated stereo image pair residuals,” Sig. Proc.: Image Comm., vol. 14, no. 1–2, pp. 55–69, Nov. 1998.
[10] T. Lan and A. H. Tewfik, “Multigrid embedding (MGE) image coding,” in ICIP, Oct. 1999, vol. 3, pp. 369–373.
[11] H. Yamaguchi, Y. Tatehira, K. Akiyama, and Y. Kobayashi, “Stereoscopic images disparity for predictive coding,” in ICASSP, May 1989, vol. 3, pp. 1976–1979.
[12] I. H. Witten, R. M. Neal, and J. G. Cleary, “Arithmetic coding for data compression,” Communications of the ACM, vol. 30, no. 6, pp. 520–540, June 1987.
[13] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies, “Image coding using wavelet transform,” IEEE Trans. on Image Proc., vol. 1, no. 2, pp. 205–220, Apr. 1992.
[14] Z. Xiong, O. G. Guleryuz, and M. T. Orchard, “A DCT-based embedded image coder,” IEEE Signal Processing Letters, vol. 3, no. 11, pp. 289–290, Nov. 1996.
[15] T. D. Tran and T. Q. Nguyen, “A progressive transmission image coder using linear phase uniform filterbanks as block transforms,” IEEE Trans. on Image Proc., vol. 8, no. 11, pp. 1493–1507, Nov. 1999.
Citations
Proceedings ArticleDOI
TL;DR: The design and implementation of a new stereoscopic image quality metric is described, and it is suggested that it is a better predictor of human image quality preference than PSNR and could be used to predict a threshold compression level for stereoscopic image pairs.
Abstract: We are interested in metrics for automatically predicting the compression settings for stereoscopic images so that we can minimize file size, but still maintain an acceptable level of image quality. Initially we investigate how Peak Signal to Noise Ratio (PSNR) measures the quality of varyingly coded stereoscopic image pairs. Our results suggest that symmetric, as opposed to asymmetric stereo image compression, will produce significantly better results. However, PSNR measures of image quality are widely criticized for correlating poorly with perceived visual quality. We therefore consider computational models of the Human Visual System (HVS) and describe the design and implementation of a new stereoscopic image quality metric. This, point matches regions of high spatial frequency between the left and right views of the stereo pair and accounts for HVS sensitivity to contrast and luminance changes at regions of high spatial frequency, using Michelson's Formula and Peli's Band Limited Contrast Algorithm. To establish a baseline for comparing our new metric with PSNR we ran a trial measuring stereoscopic image encoding quality with human subjects, using the Double Stimulus Continuous Quality Scale (DSCQS) from the ITU-R BT.500-11 recommendation. The results suggest that our new metric is a better predictor of human image quality preference than PSNR and could be used to predict a threshold compression level for stereoscopic image pairs.

167 citations

01 Jan 1994
TL;DR: This work shows how arithmetic coding works and describes an efficient implementation that uses table lookup as a fast alternative to arithmetic operations, with a provably negligible effect on the amount of compression achieved.
Abstract: Arithmetic coding provides an effective mechanism for removing redundancy in the encoding of data. We show how artihmetic coding works and describe an efficient implementation that uses table lookup as a fast alternative to arithmetic operations. The reduced-precision arithmetic has a provably negligible effect on the amount of compression achieved. We can speed up the implementation further by use of parallel processing

71 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This approach leverages state-of-the-art single-image compression autoencoders and enhances the compression with novel parametric skip functions to feed fully differentiable, disparity-warped features at all levels to the encoder/decoder of the second image.
Abstract: In this paper we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations. Our approach leverages state-of-the-art single-image compression autoencoders and enhances the compression with novel parametric skip functions to feed fully differentiable, disparity-warped features at all levels to the encoder/decoder of the second image. Moreover, we model the probabilistic dependence between the image codes using a conditional entropy model. Our experiments show an impressive 30 - 50% reduction in the second image bitrate at low bitrates compared to deep single-image compression, and a 10 - 20% reduction at higher bitrates.

29 citations


Cites background or result from "Residual image coding for stereo im..."

  • ...disparity prediction to separate transforms for residual images [40, 14, 48, 3, 34, 42]....


  • ...The underperformance of our baseline is consistent with the findings of [49] and [14], the latter of whom noted that residual images exhibit different correlation properties than the full images and may need to be modeled differently....


  • ...There has been an abundance of work on traditional multi-view and stereo compression [12, 14] as well as deep-learning based image and video compression [5, 33, 36, 49]....


Proceedings ArticleDOI
09 Jul 2009
TL;DR: This paper continues research on storage and bandwidth reduction for stereo images using reversible watermarking: by embedding into one frame of the stereo pair the information needed to recover the other frame, the transmission/storage requirements are halved.
Abstract: This paper continues our researches on storage and bandwidth reduction for stereo images by using reversible watermarking. By embedding into one frame of the stereo pair the information needed to recover the other frame, the transmission/storage requirements are halved. Furthermore, the content of the image remains available and one out of the two images is exactly recovered. The quality of the other frame depends on two features: the embedding bit-rate of the watermarking and the size of the information needed to be embedded. This paper focuses on the second feature. Instead of a simple residual between the two frames, a disparity compensation scheme is used. The advantage is twofold. First, the quality of the recovered frame is improved. Second, at detection, the disparity frame is immediately available for 3D computation. Experimental results on standard test images are provided.

24 citations


Cites background from "Residual image coding for stereo im..."

  • ...Many researches have been devoted to efficient compression of stereo images [3-10]....


Journal ArticleDOI
TL;DR: This article investigates techniques for optimizing sparsity criteria by focusing on the use of an ℓ1 criterion instead of an ℓ2 one, and proposes to jointly optimize the prediction filters by using an algorithm that alternates between the optimization of the filters and the computation of the weights.
Abstract: Lifting schemes (LS) were found to be efficient tools for image coding purposes. Since LS-based decompositions depend on the choice of the prediction/update operators, many research efforts have been devoted to the design of adaptive structures. The most commonly used approaches optimize the prediction filters by minimizing the variance of the detail coefficients. In this article, we investigate techniques for optimizing sparsity criteria by focusing on the use of an l1 criterion instead of an l2 one. Since the output of a prediction filter may be used as an input for the other prediction filters, we then propose to optimize such a filter by minimizing a weighted l1 criterion related to the global rate-distortion performance. More specifically, it will be shown that the optimization of the diagonal prediction filter depends on the optimization of the other prediction filters and vice-versa. Related to this fact, we propose to jointly optimize the prediction filters by using an algorithm that alternates between the optimization of the filters and the computation of the weights. Experimental results show the benefits which can be drawn from the proposed optimization of the lifting operators.

22 citations


Cites background from "Residual image coding for stereo im..."

  • ...Finally, the disparity field, the reference image and the residual one are encoded [58,60]....


References
Journal ArticleDOI
TL;DR: The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods.
Abstract: Embedded zerotree wavelet (EZW) coding, introduced by Shapiro (see IEEE Trans. Signal Processing, vol.41, no.12, p.3445, 1993), is a very effective and computationally simple technique for image compression. We offer an alternative explanation of the principles of its operation, so that the reasons for its excellent performance can be better understood. These principles are partial ordering by magnitude with a set partitioning sorting algorithm, ordered bit plane transmission, and exploitation of self-similarity across different scales of an image wavelet transform. Moreover, we present a new and different implementation based on set partitioning in hierarchical trees (SPIHT), which provides even better performance than our previously reported extension of EZW that surpassed the performance of the original EZW. The image coding results, calculated from actual file sizes and images reconstructed by the decoding algorithm, are either comparable to or surpass previous results obtained through much more sophisticated and computationally complex methods. In addition, the new coding and decoding procedures are extremely fast, and they can be made even faster, with only small loss in performance, by omitting entropy coding of the bit stream by the arithmetic code.

5,890 citations

Journal ArticleDOI
J. M. Shapiro
TL;DR: The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code.
Abstract: The embedded zerotree wavelet algorithm (EZW) is a simple, yet remarkably effective, image compression algorithm, having the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code. The embedded code represents a sequence of binary decisions that distinguish an image from the "null" image. Using an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby allowing a target rate or target distortion metric to be met exactly. Also, given a bit stream, the decoder can cease decoding at any point in the bit stream and still produce exactly the same image that would have been encoded at the bit rate corresponding to the truncated bit stream. In addition to producing a fully embedded bit stream, the EZW consistently produces compression results that are competitive with virtually all known compression algorithms on standard test images. Yet this performance is achieved with a technique that requires absolutely no training, no pre-stored tables or codebooks, and requires no prior knowledge of the image source. The EZW algorithm is based on four key concepts: (1) a discrete wavelet transform or hierarchical subband decomposition, (2) prediction of the absence of significant information across scales by exploiting the self-similarity inherent in images, (3) entropy-coded successive-approximation quantization, and (4) universal lossless data compression which is achieved via adaptive arithmetic coding.

5,559 citations

Journal ArticleDOI
TL;DR: A scheme for image compression that takes into account psychovisual features both in the space and frequency domains is proposed and it is shown that the wavelet transform is particularly well adapted to progressive transmission.
Abstract: A scheme for image compression that takes into account psychovisual features both in the space and frequency domains is proposed. This method involves two steps. First, a wavelet transform is used in order to obtain a set of biorthogonal subclasses of images: the original image is decomposed at different scales using a pyramidal algorithm architecture. The decomposition is along the vertical and horizontal directions and maintains constant the number of pixels required to describe the image. Second, according to Shannon's rate distortion theory, the wavelet coefficients are vector quantized using a multiresolution codebook. To encode the wavelet coefficients, a noise shaping bit allocation procedure which assumes that details at high resolution are less visible to the human eye is proposed. In order to allow the receiver to recognize a picture as quickly as possible at minimum cost, a progressive transmission scheme is presented. It is shown that the wavelet transform is particularly well adapted to progressive transmission.

3,925 citations

Journal ArticleDOI
TL;DR: The state of the art in data compression is arithmetic coding, not the better-known Huffman method, which gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.
Abstract: The state of the art in data compression is arithmetic coding, not the better-known Huffman method. Arithmetic coding gives greater compression, is faster for adaptive models, and clearly separates the model from the channel encoding.

3,188 citations

Journal ArticleDOI
TL;DR: It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes anddecodes theleft picture sequence given the decoded right picture sequences.
Abstract: Two fundamentally different techniques for compressing stereopairs are discussed. The first technique, called disparity-compensated transform-domain predictive coding, attempts to minimize the mean-square error between the original stereopair and the compressed stereopair. The second technique, called mixed-resolution coding, is a psychophysically justified technique that exploits known facts about human stereovision to code stereopairs in a subjectively acceptable manner. A method for assessing the quality of compressed stereopairs is also presented. It involves measuring the ability of an observer to perceive depth in coded stereopairs. It was found that observers generally perceived objects to be further away in compressed stereopairs than they did in originals. It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.

243 citations

Frequently Asked Questions (10)
Q1. What are the contributions in "Residual image coding for stereo image compression" ?

In this paper the authors propose a new method for the coding of residual images that takes into account the properties of residual images. The authors demonstrate that it is possible to achieve good results with a computationally simple method. 

Zerotree-style techniques such as Set Partitioning in Hierarchical Trees (SPIHT [7]) by Said and Pearlman offer excellent compression performance for still images. 

This transform consists of a Haar transform of T levels for occluded blocks and a DCT for the others, with the DCT coefficients regrouped into the wavelet subbands to line up with the Haar-transformed coefficients.

Note that the Haar transform only uses two neighboring pixels to compute the low and high frequency coefficients, then moves on to the next pair. 

If the DCT coefficients are regrouped into a wavelet decomposition style subband structure as proposed in [14], and are encoded using an embedded coder, the performance approaches that of wavelet-based methods (this method is referred to as Embedded Zerotree DCT (EZDCT)).

Embedded image coders can be terminated at any bitrate and still yield the best reconstruction to that rate without a priori optimization.

This process creates a mixed residual image, with some parts having mostly edges and high frequency information, and other parts blocks from the original image. 

Based on this observation the authors focus on block-based transforms that could better capture the differences between the blocks than a global transform, such as the wavelet transform, that sweeps across the block boundaries.

If the block size N is even, then starting at the left edge of the block, the Haar transform can be performed without having to include pixels from outside the block for the computation of Haar-wavelet coefficients for all pixels in the block.

Mixed transform coding significantly outperforms both independent and JPEG-style coding.