Journal ArticleDOI

Data compression of stereopairs

01 Apr 1992-IEEE Transactions on Communications (IEEE)-Vol. 40, Iss: 4, pp 684-696
TL;DR: It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
Abstract: Two fundamentally different techniques for compressing stereopairs are discussed. The first technique, called disparity-compensated transform-domain predictive coding, attempts to minimize the mean-square error between the original stereopair and the compressed stereopair. The second technique, called mixed-resolution coding, is a psychophysically justified technique that exploits known facts about human stereovision to code stereopairs in a subjectively acceptable manner. A method for assessing the quality of compressed stereopairs is also presented. It involves measuring the ability of an observer to perceive depth in coded stereopairs. It was found that observers generally perceived objects to be further away in compressed stereopairs than they did in originals. It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
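
The mixed-resolution idea fits in a few lines. Below is a minimal sketch assuming grayscale numpy arrays; the factor-of-4 subsampling, block-average filter, and nearest-neighbour interpolation are illustrative placeholders, not the paper's exact choices:

    import numpy as np

    def downsample(img, f):
        # Average f x f blocks (a crude low-pass-and-decimate placeholder).
        h, w = img.shape[0] // f * f, img.shape[1] // f * f
        return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

    def upsample(img, f):
        # Nearest-neighbour expansion back to full size.
        return img.repeat(f, axis=0).repeat(f, axis=1)

    def mixed_resolution_codec(left, right, f=4):
        # Left view kept at full resolution; right view is coded at
        # reduced resolution and interpolated back for display. Human
        # stereovision tolerates the blur confined to one eye's image.
        right_low = downsample(right.astype(float), f)   # transmit fewer samples
        right_rec = upsample(right_low, f)               # decoder-side interpolation
        return left, right_rec

The payoff is that the subsampled view costs roughly 1/f^2 of the full-resolution rate while, per the psychophysical argument above, fused depth perception largely survives.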
Citations
Journal ArticleDOI
31 Jan 2011
TL;DR: An overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC is provided and a summary of the coding performance achieved by MVC for both stereo- and multiview video is provided.
Abstract: Significant improvements in video compression capability have been demonstrated with the introduction of the H.264/MPEG-4 advanced video coding (AVC) standard. Since developing this standard, the Joint Video Team of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) has also standardized an extension of that technology that is referred to as multiview video coding (MVC). MVC provides a compact representation for multiple views of a video scene, such as multiple synchronized video cameras. Stereo-paired video for 3-D viewing is an important special case of MVC. The standard enables inter-view prediction to improve compression capability, as well as supporting ordinary temporal and spatial prediction. It also supports backward compatibility with existing legacy systems by structuring the MVC bitstream to include a compatible “base view.” Each other view is encoded at the same picture resolution as the base view. In recognition of its high-quality encoding capability and support for backward compatibility, the stereo high profile of the MVC extension was selected by the Blu-Ray Disc Association as the coding format for 3-D video with high-definition resolution. This paper provides an overview of the algorithmic design used for extending H.264/MPEG-4 AVC towards MVC. The basic approach of MVC for enabling inter-view prediction and view scalability in the context of H.264/MPEG-4 AVC is reviewed. Related supplemental enhancement information (SEI) metadata is also described. Various “frame compatible” approaches for support of stereo-view video as an alternative to MVC are also discussed. A summary of the coding performance achieved by MVC for both stereo- and multiview video is also provided. Future directions and challenges related to 3-D video are also briefly discussed.
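
As an illustration of the "frame compatible" alternative mentioned above, here is a minimal side-by-side packing sketch (numpy arrays and an even frame width assumed; real systems apply anti-alias filtering before decimation, which this placeholder omits):

    import numpy as np

    def side_by_side_pack(left, right):
        # Frame-compatible packing: each view gives up half its horizontal
        # resolution so the stereo pair fits one conventional video frame.
        half_l = left[:, ::2]     # keep every other column (no pre-filter)
        half_r = right[:, ::2]
        return np.concatenate([half_l, half_r], axis=1)

    def side_by_side_unpack(frame):
        w = frame.shape[1] // 2
        half_l, half_r = frame[:, :w], frame[:, w:]
        # Restore nominal width by column duplication (display-side upsampling).
        return half_l.repeat(2, axis=1), half_r.repeat(2, axis=1)

Because the packed frame is an ordinary 2-D frame, it passes through legacy encoders and infrastructure unchanged, which is exactly the backward-compatibility trade-off the abstract contrasts with MVC's inter-view prediction.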

683 citations

Book ChapterDOI
01 Jan 1993
TL;DR: For a wide class of distortion measures and discrete sources of information there exists a function R(d) (depending on the particular distortion measure and source) which measures the equivalent rate R of the source (in bits per letter produced) when d is the allowed distortion level.
Abstract: Consider a discrete source producing a sequence of message letters from a finite alphabet. A single-letter distortion measure is given by a non-negative matrix (d_{ij}). The entry d_{ij} measures the "cost" or "distortion" if letter i is reproduced at the receiver as letter j. The average distortion of a communications system (source-coder-noisy channel-decoder) is taken to be d = \sum_{i,j} P_{ij} d_{ij}, where P_{ij} is the probability of i being reproduced as j. It is shown that there is a function R(d) that measures the "equivalent rate" of the source for a given level of distortion. For coding purposes where a level d of distortion can be tolerated, the source acts like one with information rate R(d). Methods are given for calculating R(d), and various properties discussed. Finally, generalizations to ergodic sources, to continuous sources, and to distortion measures involving blocks of letters are developed. In this paper a study is made of the problem of coding a discrete source of information, given a fidelity criterion or a measure of the distortion of the final recovered message at the receiving point relative to the actual transmitted message. In a particular case there might be a certain tolerable level of distortion as determined by this measure. It is desired to so encode the information that the maximum possible signaling rate is obtained without exceeding the tolerable distortion level. This work is an expansion and detailed elaboration of ideas presented earlier [1], with particular reference to the discrete case. We shall show that for a wide class of distortion measures and discrete sources of information there exists a function R(d) (depending on the particular distortion measure and source) which measures, in a sense, the equivalent rate R of the source (in bits per letter produced) when d is the allowed distortion level. Methods will be given for evaluating R(d) explicitly in certain simple cases and for evaluating R(d) by a limiting process in more complex cases. The basic results are roughly that it is impossible to signal at a rate faster than C/R(d) (source letters per second) over a memoryless channel of capacity C (bits per second) with a distortion measure less than or equal to d. On the other hand, by sufficiently long block codes it is possible to approach as closely as desired the rate C/R(d) with distortion level d. Finally, some particular examples, using error probability per letter of message and other simple distortion measures, are worked out in detail.
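
As a concrete instance (the standard textbook example, not one worked in this abstract): for a memoryless binary source with \Pr\{X=1\} = p \le 1/2 and Hamming distortion (d_{ij} = 0 if i = j, 1 otherwise), the function takes the closed form

    \[
      R(d) =
      \begin{cases}
        h(p) - h(d), & 0 \le d \le p,\\
        0,           & d > p,
      \end{cases}
      \qquad
      h(u) = -u \log_2 u - (1-u) \log_2 (1-u),
    \]

so at d = 0 the rate is the full entropy h(p), and it falls to zero once the tolerable distortion reaches p (always reproducing 0 already achieves average distortion p).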

658 citations

Journal ArticleDOI
TL;DR: Experimental results confirm the hypothesis and show that the proposed framework significantly outperforms conventional 2D QA metrics when predicting the quality of stereoscopically viewed images that may have been asymmetrically distorted.
Abstract: We develop a framework for assessing the quality of stereoscopic images that have been afflicted by possibly asymmetric distortions. An intermediate image is generated which when viewed stereoscopically is designed to have a perceived quality close to that of the cyclopean image. We hypothesize that performing stereoscopic QA on the intermediate image yields higher correlations with human subjective judgments. The experimental results confirm the hypothesis and show that the proposed framework significantly outperforms conventional 2D QA metrics when predicting the quality of stereoscopically viewed images that may have been asymmetrically distorted.
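
A hedged sketch of the cyclopean-image idea described above, assuming grayscale numpy views, a precomputed per-pixel disparity map, and precomputed local-energy maps; the linear gain-control weighting is our illustrative stand-in, not necessarily the authors' exact model:

    import numpy as np

    def cyclopean(left, right, disparity, energy_l, energy_r):
        # Hypothetical blend: each pixel of the intermediate ("cyclopean")
        # image combines the left pixel with its disparity-shifted right
        # correspondent, weighted by local stimulus energy to mimic which
        # eye dominates under binocular rivalry.
        h, w = left.shape
        xs = np.arange(w)
        ci = np.empty((h, w), dtype=float)
        for y in range(h):
            xr = np.clip(xs + disparity[y], 0, w - 1).astype(int)  # right-view matches
            wl, wr = energy_l[y, xs], energy_r[y, xr]
            ci[y] = (wl * left[y, xs] + wr * right[y, xr]) / (wl + wr + 1e-9)
        return ci

A conventional 2D quality metric is then run on the cyclopean image, which is how the framework reduces asymmetric stereo QA to a well-studied 2D problem.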

348 citations


Cites methods from "Data compression of stereopairs"

  • ...The framework can, therefore, ostensibly be used to evaluate the quality of stereo content that has been compressed using a mixed resolution coding technique [56,57]....


Journal ArticleDOI
TL;DR: The perceptual requirements for 3-D TV that can be extracted from the literature are summarized and issues that require further investigation are addressed in order for 3D TV to be a success.
Abstract: A high-quality three-dimensional (3-D) broadcast service (3-D TV) is becoming increasingly feasible based on various recent technological developments combined with an enhanced understanding of 3-D perception and human factors issues surrounding 3-D TV. In this paper, 3-D technology and perceptually relevant issues, in particular 3-D image quality and visual comfort, in relation to 3-D TV systems are reviewed. The focus is on near-term displays for broadcast-style single- and multiple-viewer systems. We discuss how an image quality model for conventional two-dimensional images needs to be modified to be suitable for image quality research for 3-D TV. In this respect, studies are reviewed that have focused on the relationship between subjective attributes of 3-D image quality and physical system parameters that induce them (e.g., parameter choices in image acquisition, compression, and display). In particular, artifacts that may arise in 3-D TV systems are addressed, such as keystone distortion, depth-plane curvature, puppet theater effect, cross talk, cardboard effect, shear distortion, picket-fence effect, and image flipping. In conclusion, we summarize the perceptual requirements for 3-D TV that can be extracted from the literature and address issues that require further investigation in order for 3-D TV to be a success.

333 citations


Cites background from "Data compression of stereopairs"

  • ...Furthermore, a brief discussion of the ITU recommendation 500–10 psychophysical scaling paradigms is given, and we discuss the need for measurements in a home environment....


Proceedings ArticleDOI
TL;DR: In this paper, the importance of various causes and aspects of visual discomfort was clarified and the following factors were identified: (1) excessive demand of accommodation-convergence linkage, e.g., by fast motion in depth viewed at short distances, (2) 3D artefacts resulting from insufficient depth information in the incoming data signal yielding spatial and temporal inconsistencies, and (3) unnatural amounts of blur.
Abstract: Visual discomfort has been the subject of considerable research in relation to stereoscopic and autostereoscopic displays, but remains an ambiguous concept used to denote a variety of subjective symptoms potentially related to different underlying processes. In this paper we clarify the importance of various causes and aspects of visual discomfort. Classical causative factors such as excessive binocular parallax and accommodation-convergence conflict appear to be of minor importance when disparity values do not surpass a limit of one degree of visual angle, which still provides a sufficient range to allow for satisfactory depth perception in consumer applications, such as stereoscopic television. Visual discomfort, however, may still occur within this limit and we believe the following factors to be the most pertinent in contributing to this: (1) excessive demand of accommodation-convergence linkage, e.g., by fast motion in depth, viewed at short distances, (2) 3D artefacts resulting from insufficient depth information in the incoming data signal yielding spatial and temporal inconsistencies, and (3) unnatural amounts of blur. In order to adequately characterize and understand visual discomfort, multiple types of measurements, both objective and subjective, are needed.

293 citations

References
Book
01 Jan 1968
TL;DR: This chapter discusses Coding for Discrete Sources, Techniques for Coding and Decoding, and Source Coding with a Fidelity Criterion.
Abstract: Communication Systems and Information Theory. A Measure of Information. Coding for Discrete Sources. Discrete Memoryless Channels and Capacity. The Noisy-Channel Coding Theorem. Techniques for Coding and Decoding. Memoryless Channels with Discrete Time. Waveform Channels. Source Coding with a Fidelity Criterion. Index.

6,684 citations


"Data compression of stereopairs" refers background in this paper

  • ...By Shannon's first theorem [29], we know that to communicate an infinite sequence of stationary and ergodic right pictures without distortion with a probability of error arbitrarily close to zero, the minimum average number of bits per picture required is given by the entropy rate... (both quantities are restated in standard notation after this list)


  • ...This is the realm of rate-distortion theory, the Shannon theory of source coding subject to a fidelity criterion [29], [31], [32]....

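Restated in standard notation (ours, not quoted from the paper): for a stationary ergodic picture source \{X_k\} and a single-letter distortion measure d(\cdot,\cdot),

    \[
      H_\infty = \lim_{n \to \infty} \frac{1}{n} H(X_1, \ldots, X_n),
      \qquad
      R(d) = \min_{p(\hat{x} \mid x)\,:\; E[d(X, \hat{X})] \le d} I(X; \hat{X}),
    \]

so lossless coding needs at least H_\infty bits per picture, and coding to fidelity d needs at least R(d).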

Journal ArticleDOI
TL;DR: The quantity R^\ast(d) is determined, defined as the infimum of rates R such that communication is possible in the above setting at an average distortion level not exceeding d + \varepsilon.
Abstract: Let \{(X_{k}, Y_{k})\}_{k=1}^{\infty} be a sequence of independent drawings of a pair of dependent random variables X, Y. Let us say that X takes values in the finite set \cal X. It is desired to encode the sequence \{X_k\} in blocks of length n into a binary stream of rate R, which can in turn be decoded as a sequence \{\hat{X}_k\}, where \hat{X}_k \in \hat{\cal X}, the reproduction alphabet. The average distortion level is (1/n) \sum_{k=1}^{n} E[D(X_k, \hat{X}_k)], where D(x, \hat{x}) \geq 0, x \in {\cal X}, \hat{x} \in \hat{\cal X}, is a preassigned distortion measure. The special assumption made here is that the decoder has access to the side information \{Y_k\}. In this paper we determine the quantity R^\ast(d), defined as the infimum of rates R such that (with \varepsilon > 0 arbitrarily small and with suitably large n) communication is possible in the above setting at an average distortion level (as defined above) not exceeding d + \varepsilon. The main result is that R^\ast(d) = \inf [I(X;Z) - I(Y;Z)], where the infimum is with respect to all auxiliary random variables Z (which take values in a finite set \cal Z) that satisfy: i) Y, Z conditionally independent given X; ii) there exists a function f: {\cal Y} \times {\cal Z} \rightarrow \hat{\cal X} such that E[D(X, f(Y,Z))] \leq d. Let R_{X|Y}(d) be the rate-distortion function which results when the encoder as well as the decoder has access to the side information \{Y_k\}. In nearly all cases it is shown that when d > 0 then R^\ast(d) > R_{X|Y}(d), so that knowledge of the side information at the encoder permits transmission of the \{X_k\} at a given distortion level using a smaller transmission rate. This is in contrast to the situation treated by Slepian and Wolf [5] where, for arbitrarily accurate reproduction of \{X_k\}, i.e., d = \varepsilon for any \varepsilon > 0, knowledge of the side information at the encoder does not allow a reduction of the transmission rate.
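
For the binary symmetric case (X uniform binary, side information Y = X \oplus N with N \sim Bernoulli(p), Hamming distortion; stated here following standard textbook treatments of this example rather than quoted from the abstract):

    \[
      R_{X|Y}(d) = h(p) - h(d), \qquad 0 \le d \le p,
    \]

while R^\ast(d) is the lower convex envelope of g(d) = h(p \ast d) - h(d) and the point (p, 0), where p \ast d = p(1-d) + d(1-p) and h(\cdot) is the binary entropy function; one finds R^\ast(d) > R_{X|Y}(d) for 0 < d < p, illustrating the rate loss from withholding the side information from the encoder.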

3,288 citations

Book
01 Jan 1971
TL;DR: Foundations of Cyclopean Perception, as mentioned in this paper, is a classic work on cyclopean perception that has influenced a generation of vision researchers, cognitive scientists, and neuroscientists and has inspired artists, designers, and computer graphics pioneers.
Abstract: This classic work on cyclopean perception has influenced a generation of vision researchers, cognitive scientists, and neuroscientists and has inspired artists, designers, and computer graphics pioneers. In Foundations of Cyclopean Perception (first published in 1971 and unavailable for years), Bela Julesz traced the visual information flow in the brain, analyzing how the brain combines separate images received from the two eyes to produce depth perception. Julesz developed novel tools to do this: random-dot stereograms and cinematograms, generated by early digital computers at Bell Labs. These images, when viewed with the special glasses that came with the book, revealed complex, three-dimensional surfaces; this mode of visual stimulus became a paradigm for research in vision and perception. This reprint edition includes all 48 color random-dot designs from the original, as well as the special 3-D glasses required to view them.Foundations of Cyclopean Perception has had a profound impact on the vision studies community. It was chosen as one of the one hundred most influential works in cognitive science in a poll conducted by the University of Minnesota's Center for Cognitive Sciences. Many copies are "permanently borrowed" from college libraries; used copies are sought after online. Now, with this facsimile of the 1971 edition, the book is available again to cognitive scientists, neuroscientists, vision researchers, artists, and designers.
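
A random-dot stereogram of the kind the book popularized is easy to generate. A minimal numpy sketch (image size, square placement, and disparity below are arbitrary choices):

    import numpy as np

    def random_dot_stereogram(h=256, w=256, disparity=8, seed=None):
        # Monocularly each image is pure noise; fused binocularly, the
        # central square floats in front of the background because it is
        # shifted horizontally between the two eyes' images.
        rng = np.random.default_rng(seed)
        left = rng.integers(0, 2, (h, w), dtype=np.uint8)
        right = left.copy()
        top, bot, lft, rgt = h // 4, 3 * h // 4, w // 4, 3 * w // 4
        # Shift the central square left by `disparity` in the right image.
        right[top:bot, lft - disparity:rgt - disparity] = left[top:bot, lft:rgt]
        # Fill the band uncovered by the shift with fresh random dots.
        right[top:bot, rgt - disparity:rgt] = rng.integers(
            0, 2, (bot - top, disparity), dtype=np.uint8)
        return left, right

Because neither image alone contains any contour of the square, depth here is computed purely from binocular correspondence, which is what made these stimuli such a powerful research paradigm.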

2,449 citations


"Data compression of stereopairs" refers background in this paper

  • ...Julesz [27] found that "binocular fusion depends on the identity of the low or high frequency spectrum in the two images"....


Journal ArticleDOI
TL;DR: Motion compensation is applied to the analysis and design of a hybrid coding scheme, and the results show a factor-of-two gain at low bit rates.
Abstract: A new technique for estimating interframe displacement of small blocks with minimum mean square error is presented. An efficient algorithm for searching the direction of displacement has been described. The results of applying the technique to two sets of images are presented which show 8-10 dB improvement in interframe variance reduction due to motion compensation. The motion compensation is applied for analysis and design of a hybrid coding scheme and the results show a factor of two gain at low bit rates.
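
The block-displacement estimate can be illustrated with an exhaustive minimum-MSE search (the paper itself describes a more efficient directional search; the block size and search range below are arbitrary). Applied across a stereopair instead of across time, the same machinery yields the disparity compensation used in "Data compression of stereopairs":

    import numpy as np

    def block_match(prev, curr, bx, by, bsize=16, srange=7):
        # Find the displacement of one bsize x bsize block of `curr` that
        # minimizes mean-square error against the reference frame `prev`.
        block = curr[by:by + bsize, bx:bx + bsize].astype(float)
        best, best_mse = (0, 0), np.inf
        for dy in range(-srange, srange + 1):
            for dx in range(-srange, srange + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                    continue  # candidate block falls outside the frame
                cand = prev[y:y + bsize, x:x + bsize].astype(float)
                mse = np.mean((block - cand) ** 2)
                if mse < best_mse:
                    best, best_mse = (dx, dy), mse
        return best, best_mse   # motion vector and residual energy

The encoder transmits the vector plus the (much smaller) prediction residual, which is where the reported 8-10 dB variance reduction pays off.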

1,883 citations

Journal ArticleDOI
15 Oct 1976-Science
TL;DR: It is shown that this algorithm successfully extracts information from random-dot stereograms, and its implications for the psychophysics and neurophysiology of the visual system are briefly discussed.
Abstract: The extraction of stereo-disparity information from two images depends upon establishing a correspondence between them. In this article we analyze the nature of the correspondence computation and derive a cooperative algorithm that implements it. We show that this algorithm successfully extracts information from random-dot stereograms, and its implications for the psychophysics and neurophysiology of the visual system are briefly discussed.
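
In the spirit of the cooperative computation, here is a compact numpy sketch over (row, column, disparity) nodes; the 3x3 excitatory neighbourhood, the constants, the wrap-around borders, and the simplification of inhibiting only along one eye's lines of sight are our assumptions for illustration, not the authors' exact network:

    import numpy as np

    def cooperative_stereo(left, right, dmax=8, iters=10, E=2.0, I=3.0, theta=3.5):
        # Binary state c[y, x, d] = 1 means "pixel (y, x) has disparity d".
        h, w = left.shape
        c0 = np.zeros((h, w, dmax + 1))
        for d in range(dmax + 1):
            # Initial evidence: left pixel x matches right pixel x + d.
            c0[:, :w - d, d] = (left[:, :w - d] == right[:, d:])
        c = c0.copy()
        for _ in range(iters):
            # Excitation: support from a 3x3 neighbourhood at the same
            # disparity (smooth surfaces favour neighbouring agreement).
            s = sum(np.roll(np.roll(c, dy, axis=0), dx, axis=1)
                    for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            # Inhibition: competing disparities at the same pixel
            # (uniqueness: each point has one depth).
            inhib = c.sum(axis=2, keepdims=True) - c
            c = ((E * s - I * inhib + c0) >= theta).astype(float)
        return c.argmax(axis=2)   # disparity map

Run on a random-dot stereogram pair (e.g., from the generator sketched earlier under this entry's references), the spurious matches at wrong disparities die out over the iterations while the shifted square's disparity layer reinforces itself, which is the cooperative behaviour the article analyzes.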

1,392 citations