Multiresolution based hierarchical disparity estimation for stereo image pair compression

01 Jan 1994
TL;DR: A multiresolution based approach is proposed for compressing 'still' stereo image pairs and the typical computational gains and compression ratios possible with this approach are provided.
Abstract: Stereo vision is the process of viewing two different perspective projections of the same real world scene and perceiving the depth that was present in the original scene. These projections offer a compact 2-dimensional means of representing a 3-dimensional scene, as seen by one observer. Different display schemes have been developed to ensure that each eye sees the image that is intended for it. Each image in the image pair is referred to as the left or right image depending on the eye it is intended for. The binocular cues contain unambiguous information in contrast to monocular cues like shading or coloring. Hence binocular stereo may be quite useful, for instance, in video based training of personnel. On the entertainment side, it can make mundane TV material lively. Though the concept has been around for more than half a century, only recently have technically effective ways of making stereoscopic displays and the usually required eyewear emerged. Despite this progress, stereo TV can be made a cost effective add-on option only if the increased bandwidth requirement is relaxed somehow. Since the two images are projections of the same scene from two nearby points of view, they are bound to have a lot of redundancy between them. By properly exploiting this redundancy, the two image streams might be compressed and transmitted through a single monocular channel's bandwidth. The first step towards stereoscopic image sequence compression is 'still' stereo image pair compression that exploits the high correlation between the left and right images, in addition to exploiting the spatial correlation within each image. The temporal correlation between the frames can be taken advantage of, along the lines of the MPEG (Moving Picture Experts Group) standards, to achieve further compression. The final step would be to explore the correlation between left and right frames with a time offset between them.
In this paper a multiresolution based approach is proposed for compressing 'still' stereo image pairs. In Section II the task at hand is contrasted with the stereo disparity estimation problem in the machine vision community; a block based scheme along the lines of a motion estimation scheme is suggested as a possible approach. In Section III, the suitability of hierarchical techniques for disparity estimation is outlined. Section IV provides an overview of wavelet decomposition. Section V details the multiresolution approach taken. In Section VI, the typical computational gains and compression ratios possible with this …
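The block based scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `block_disparity`, the sum-of-absolute-differences cost, the block size, and the search range are all assumptions made for illustration, and the pair is assumed to be rectified so that the search runs along a single row with non-negative disparities.

```python
def block_disparity(left, right, block=4, max_disp=8):
    """Estimate one horizontal disparity per block of `right`, predicted from `left`.

    `left` and `right` are equal-sized 2-D lists of grey levels.
    Assumption: a rectified pair, so a block at column c in the right image
    is searched for at columns c, c+1, ..., c+max_disp of the same rows in
    the left image. The cost is the sum of absolute differences (SAD).
    """
    rows, cols = len(right), len(right[0])
    field = []
    for r0 in range(0, rows - block + 1, block):
        row_of_disparities = []
        for c0 in range(0, cols - block + 1, block):
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp + 1):
                if c0 + d + block > cols:
                    break  # candidate block would leave the image
                cost = sum(
                    abs(right[r][c] - left[r][c + d])
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)
                )
                if cost < best_cost:  # ties keep the smaller disparity
                    best_d, best_cost = d, cost
            row_of_disparities.append(best_d)
        field.append(row_of_disparities)
    return field
```

Only the disparity field and a prediction residual would then need to be coded for the second image, which is where the compression gain of such a scheme comes from.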

Citations
Journal ArticleDOI
TL;DR: A new stereo image coding algorithm that is based on disparity compensation and subspace projection is described, and empirical results suggest that the SPT approach outperforms current stereo coding techniques.
Abstract: Due to advances in display technology, three-dimensional (3-D) imaging systems are becoming increasingly popular. One way of stimulating 3-D perception is to use stereo pairs, a pair of images of the same scene acquired from different perspectives. Since there is an inherent redundancy between the images of a stereo pair, data compression algorithms should be employed to represent stereo pairs efficiently. This paper focuses on the stereo image coding problem. We begin with a description of the problem and a survey of current stereo coding techniques. A new stereo image coding algorithm that is based on disparity compensation and subspace projection is described. This algorithm, the subspace projection technique (SPT), is a transform domain approach with a space-varying transformation matrix and may be interpreted as a spatial-transform domain representation of the stereo data. The advantage of the proposed approach is that it can locally adapt to the changes in the cross-correlation characteristics of the stereo pairs. Several design issues and implementations of the algorithm are discussed. Finally, we present empirical results suggesting that the SPT approach outperforms current stereo coding techniques.

68 citations


Cites background from "Multiresolution based hierarchical ..."

  • ...Since the disparity information should be transmitted, in applications where the compressed stereo pairs will be viewed by a human, computing a sparse disparity field is the key to compression [21]....

Journal ArticleDOI
TL;DR: A new fully scalable image coder is presented, and the lossless and lossy performance of reversible integer wavelet transforms in this coder is investigated; its lossless compression performance is comparable to JPEG-LS.
Abstract: Reversible integer wavelet transforms allow both lossless and lossy decoding using a single bitstream. We present a new fully scalable image coder and investigate the lossless and lossy performance of these transforms in the proposed coder. The lossless compression performance of the presented method is comparable to JPEG-LS. The lossy performance is quite competitive with other efficient lossy compression methods.

65 citations

Journal ArticleDOI
TL;DR: A new stereo image compression scheme is presented that is based on the wavelet transform of both images and on disparity estimation between the stereo pair subbands; it demonstrates very good performance in terms of PSNR and visual quality, at low complexity.

49 citations


Cites background from "Multiresolution based hierarchical ..."

  • ...Keywords: Stereo image compression; Wavelet transform; Morphology; Disparity...

Proceedings ArticleDOI
15 Apr 1994
TL;DR: In this paper, the authors exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission. They find extremely high correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-induced parallax from a single viewpoint are nearly identical, and they coin the term "WorldLine correlation" for this condition.
Abstract: We exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission. In particular, in image stream transmission, we can find extremely high correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-induced parallax from a single viewpoint are nearly identical; we coin the term "WorldLine correlation" for this condition. We test these ideas in two implementations, (1) straightforward computing of blockwise cross-correlations, and (2) multiresolution hierarchical matching using a wavelet-based compression method. We find that good 3D-stereoscopic imagery can be had for only a few percent more storage space or transmission bandwidth than is required for the corresponding flat imagery.
1. INTRODUCTION
The successful development of compression schemes for motion video that exploit the high correlation between temporally adjacent frames, e.g., MPEG, suggests that we might analogously exploit the high correlation between spatially or angularly adjacent still frames, i.e., left-right 3D-stereoscopic image pairs. If left-right pairs are selected from 3D-stereoscopic motion streams at different times, such that perspective-induced disparity left-right and motion-induced disparity earlier-later produce about the same visual effect, then extremely high correlation will exist between the members of these pairs. This effect, for which we coin the term "WorldLine correlation", can be exploited to achieve extremely high compression factors for stereo video streams. Our experiments demonstrate that a reasonable synthesis of one image of a left-right stereo image pair can be estimated from the other uncompressed or conventionally compressed image augmented by a small set of numbers that describe the local cross-correlations in terms of a disparity map. When the set is as small (in bits) as 1 to 2% of the conventionally compressed image the stereoscopically viewed pair consisting of one original and one synthesized image produces convincing stereo imagery. Occlusions, for which this approach of course fails, can be handled efficiently by encoding and transmitting error maps (residuals) of regions where a local statistical operator indicates that an occlusion is probable. Two cross-correlation mapping schemes independently developed by two of us (P.G. and S.S.) have been coded and tested,

39 citations

Proceedings ArticleDOI
13 Nov 1994
TL;DR: The psychophysical property of the human visual system, that only one high resolution image in a stereo image pair is sufficient for satisfactory depth perception, has been used to further reduce the bit rates in this paper.
Abstract: Stereoscopic sequence compression typically involves the exploitation of the spatial redundancy between the left and right streams to achieve higher compressions than are possible with the independent compression of the two streams. In this paper the psychophysical property of the human visual system, that only one high resolution image in a stereo image pair is sufficient for satisfactory depth perception, has been used to further reduce the bit rates. Thus, one of the streams is independently coded along the lines of the MPEG standards, while the other stream is estimated at a lower resolution from this stream. A multiresolution framework has been adopted to facilitate such an estimation of motion and disparity vectors at different resolutions. Experimental results on typical sequences indicate that the additional stream can be compressed to about one-fifth of a highly compressed independently coded stream, without any significant loss in depth perception or perceived image quality.

35 citations

References
Journal ArticleDOI
TL;DR: It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
Abstract: Two fundamentally different techniques for compressing stereopairs are discussed. The first technique, called disparity-compensated transform-domain predictive coding, attempts to minimize the mean-square error between the original stereopair and the compressed stereopair. The second technique, called mixed-resolution coding, is a psychophysically justified technique that exploits known facts about human stereovision to code stereopairs in a subjectively acceptable manner. A method for assessing the quality of compressed stereopairs is also presented. It involves measuring the ability of an observer to perceive depth in coded stereopairs. It was found that observers generally perceived objects to be further away in compressed stereopairs than they did in originals. It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.

243 citations

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A two-step scheme for image compression that takes into account psychovisual features in the space and frequency domains is proposed, together with a progressive transmission scheme, to which the wavelet transform is particularly well adapted.
Abstract: A two-step scheme for image compression that takes into account psychovisual features in space and frequency domains is proposed. A wavelet transform is first used in order to obtain a set of orthonormal subclasses of images; the original image is decomposed at different scales using a pyramidal algorithm architecture. The decomposition is along the vertical and horizontal directions and maintains the number of pixels required to describe the image at a constant. Second, according to Shannon's rate-distortion theory, the wavelet coefficients are vector quantized using a multiresolution codebook. To encode the wavelet coefficients, a noise-shaping bit-allocation procedure which assumes that details at high resolution are less visible to the human eye is proposed. In order to allow the receiver to recognize a picture as quickly as possible at minimum cost, a progressive transmission scheme is presented. The wavelet transform is particularly well adapted to progressive transmission.

160 citations


"Multiresolution based hierarchical ..." refers background in this paper

  • ...Thus this decomposition preserves the total number of pixels after decomposition, unlike the Laplacian pyramid structure [3] which results in a one-third increase in the number of pixels....
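The pixel-count claim in the excerpt above can be checked with a short calculation. The sketch below is illustrative only: both function names are hypothetical, and it assumes dyadic (factor-of-two) subsampling and image dimensions divisible by 2 to the power of the number of levels.

```python
def subband_pixel_count(rows, cols, levels):
    """Samples in a critically sampled wavelet decomposition.

    Each level yields three detail subimages at half the width and height;
    the coarsest approximation subimage is added once at the end.
    """
    total = 0
    for _ in range(levels):
        rows, cols = rows // 2, cols // 2
        total += 3 * rows * cols
    return total + rows * cols


def laplacian_pyramid_pixel_count(rows, cols, levels):
    """Samples in a Laplacian pyramid.

    Each level stores a difference image at the current resolution;
    the coarsest low-pass image is added once at the end.
    """
    total = 0
    for _ in range(levels):
        total += rows * cols
        rows, cols = rows // 2, cols // 2
    return total + rows * cols
```

For a 512 x 512 image and three levels, the wavelet count stays at 512 x 512, while the Laplacian pyramid count is 1 + 1/4 + 1/16 + 1/64 times that, which approaches the 4/3 limit (the one-third increase) as the number of levels grows.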

Proceedings ArticleDOI
01 Jun 1990
TL;DR: In this paper, the use of orthonormal bases of compactly supported wavelets to represent a discrete signal in 2 dimensions yields a localized representation of coefficient energy, and subsequent coding of the multiresolution representation is achieved through techniques such as scalar/vector quantization, hierarchical quantization and entropy coding to achieve compression.
Abstract: Multilevel unitary wavelet transform methods for image compression are described. The sub-band decomposition preserves geometric image structure within each sub-band or level. This yields a multilevel image representation. The use of orthonormal bases of compactly supported wavelets to represent a discrete signal in 2 dimensions yields a localized representation of coefficient energy. Subsequent coding of the multiresolution representation is achieved through techniques such as scalar/vector quantization, hierarchical quantization, entropy coding, and non-linear prediction to achieve compression. Performance advantages over the Discrete Cosine Transform are discussed. These include reduction of errors and artifacts typical of Fourier-based spectral methods, such as frequency-domain quantization noise and the Gibbs phenomenon. The wavelet method also eliminates distortion arising from data blocking. The paper includes a quick review of past/present compression techniques, with special attention paid to the Haar transform, the simplest wavelet transform, and conventional Fourier-based subband coding. Computational results are presented.

91 citations

Proceedings ArticleDOI
14 Apr 1991
TL;DR: The purpose of this work is to propose a new scheme for vector quantization of wavelet coefficients based on lattice vector quantization, and the application of the D_4, E_8 and Barnes-Wall Λ_16 lattices is investigated.
Abstract: An image coding scheme has been introduced by the authors (see IEEE ICASSP, p.2297, 1990). This scheme involves two steps. A biorthogonal wavelet transform is applied to the original image, and wavelet coefficients are then vector quantized using the LBG (Linde, Buzo and Gray, 1980) method. The purpose of this work is to propose a new scheme for vector quantization of wavelet coefficients. The proposed method is based on lattice vector quantization. The application of the D_4, E_8 and Barnes-Wall Λ_16 lattices is investigated. These lattices are used to encode wavelet coefficients whose PDFs are close to Laplacian. A variable-length coding method is applied and the trade-off between distortion and optimal rate is investigated. Experimental results on the Lena image using the Λ_16 lattice lead to a peak signal-to-noise ratio (PSNR) of 31.14 dB at 0.08 bpp. This result outperforms, to the authors' knowledge, all other methods. Edges, which are of most interest for image analysis, are particularly sharp without any smoothing artefacts.

66 citations


"Multiresolution based hierarchical ..." refers background in this paper

  • ...Each block in level-(j+1) corresponds to 4 blocks in level-j....

  • ...At each level there are four subimages that are one-fourth the size of the image at the level below....
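The block correspondence quoted in the excerpts above can be sketched as an index mapping. This is an illustration under stated assumptions, not the paper's code: the names `child_blocks` and `propagate_disparity` are hypothetical, the block size in pixels is assumed to be the same at every level, and the doubling of the disparity when moving to the finer level is an assumption based on the factor-of-two change in resolution.

```python
def child_blocks(row, col):
    """Indices of the 4 level-j blocks covered by block (row, col) of level j+1."""
    return [
        (2 * row, 2 * col),
        (2 * row, 2 * col + 1),
        (2 * row + 1, 2 * col),
        (2 * row + 1, 2 * col + 1),
    ]


def propagate_disparity(coarse_field):
    """Build initial level-j disparities from a level-(j+1) disparity field.

    Each coarse block hands its disparity, doubled to account for the
    doubled resolution, to its 4 child blocks. A hierarchical estimator
    would then refine each child within a small search window.
    """
    rows, cols = len(coarse_field), len(coarse_field[0])
    fine_field = [[0] * (2 * cols) for _ in range(2 * rows)]
    for r in range(rows):
        for c in range(cols):
            for fr, fc in child_blocks(r, c):
                fine_field[fr][fc] = 2 * coarse_field[r][c]
    return fine_field
```

Starting each finer level from the propagated estimate is what allows the search window per level to stay small, which is the usual source of the computational gain in hierarchical matching.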

Journal ArticleDOI
TL;DR: This paper presents two-dimensional motion estimation methods which take advantage of the intrinsic redundancies inside 3DTV stereoscopic image sequences, subject to the crucial assumption that an initial calibration of the stereoscopic sensors provides us with geometric change of coordinates for two matched features.
Abstract: This paper presents two-dimensional motion estimation methods which take advantage of the intrinsic redundancies inside 3DTV stereoscopic image sequences. Most of the previous studies extract either disparity vector fields, if they are concerned with stereovision, or apparent motion vector fields to be applied to motion compensation coding schemes. For 3DTV image sequence analysis and transmission, we can jointly estimate these two feature fields. Locally, initial image data are grouped within two views (the left and right ones) at two successive time samples, and spatio-temporal coherence has to be used to enhance motion vector field estimation. Three different levels of ‘coherence’ have been tested, subject to the crucial assumption that an initial calibration of the stereoscopic sensors provides us with geometric change of coordinates for two matched features.

61 citations