Proceedings ArticleDOI

Compression of stereo image pairs and streams

15 Apr 1994, Vol. 2177, pp. 258-268
TL;DR: In this paper, the authors exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission, and they find extremely high correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-induced parallax from a single viewpoint are nearly identical; they coin the term "WorldLine correlation" for this condition.
Abstract: We exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission. In particular, in image stream transmission, we can find extremely high correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-induced parallax from a single viewpoint are nearly identical; we coin the term "WorldLine correlation" for this condition. We test these ideas in two implementations, (1) straightforward computing of blockwise cross-correlations, and (2) multiresolution hierarchical matching using a wavelet-based compression method. We find that good 3D-stereoscopic imagery can be had for only a few percent more storage space or transmission bandwidth than is required for the corresponding flat imagery.



Compression of stereo image pairs and streams
M. W. Siegel [1], Priyan Gunatilake [2], Sriram Sethuraman [2], A. G. Jordan [1,2]
[1] Robotics Institute, School of Computer Science
[2] Department of Electrical and Computer Engineering
Carnegie Mellon University
5000 Forbes Ave., Pittsburgh, PA 15213
ABSTRACT
We exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image
frame storage and image stream transmission. In particular, in image stream transmission, we can find extremely high
correlations between left-right frames offset in time such that perspective-induced disparity between viewpoints and motion-
induced parallax from a single viewpoint are nearly identical; we coin the term "WorldLine correlation" for this condition.
We test these ideas in two implementations, (1) straightforward computing of blockwise cross-correlations, and (2)
multiresolution hierarchical matching using a wavelet-based compression method. We find that good 3D-stereoscopic
imagery can be had for only a few percent more storage space or transmission bandwidth than is required for the
corresponding flat imagery.
1. INTRODUCTION
The successful development of compression schemes for motion video that exploit the high correlation between temporally
adjacent frames, e.g., MPEG, suggests that we might analogously exploit the high correlation between spatially or angularly
adjacent still frames, i.e., left-right 3D-stereoscopic image pairs. If left-right pairs are selected from 3D-stereoscopic motion
streams at different times, such that perspective-induced disparity left-right and motion-induced disparity earlier-later
produce about the same visual effect, then extremely high correlation will exist between the members of these pairs. This
effect, for which we coin the term "WorldLine correlation", can be exploited to achieve extremely high compression factors
for stereo video streams.
Our experiments demonstrate that a reasonable synthesis of one image of a left-right stereo image pair can be estimated from
the other uncompressed or conventionally compressed image augmented by a small set of numbers that describe the local
cross-correlations in terms of a disparity map. When the set is as small (in bits) as 1 to 2% of the conventionally compressed
image the stereoscopically viewed pair consisting of one original and one synthesized image produces convincing stereo
imagery. Occlusions, for which this approach of course fails, can be handled efficiently by encoding and transmitting error
maps (residuals) of regions where a local statistical operator indicates that an occlusion is probable.
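
The local statistical operator is not spelled out here; purely as an illustration of the idea, the following Python sketch flags blocks whose synthesis residual is large, taking a large residual as the sign of a probable occlusion. The function name, block size, and threshold are invented for the example.

    import numpy as np

    def occlusion_mask(original, synthesized, block=8, thresh=12.0):
        """Flag blocks where the synthesized view differs strongly from the
        original, i.e. where disparity-based prediction probably failed
        (occlusion); only flagged blocks would need an encoded error map.
        `block` and `thresh` are illustrative tuning constants, not values
        from the paper."""
        H, W = original.shape
        by, bx = H // block, W // block
        mask = np.zeros((by, bx), dtype=bool)
        for i in range(by):
            for j in range(bx):
                y, x = i * block, j * block
                err = np.abs(original[y:y + block, x:x + block].astype(np.float32)
                             - synthesized[y:y + block, x:x + block].astype(np.float32))
                mask[i, j] = err.mean() > thresh
        return mask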
Two cross-correlation mapping schemes independently developed by two of us (P.G. and S.S.) have been coded and tested,
extensively on still image pairs and more recently on some motion video streams. Both methods yield comparable
compression factors and visual fidelity; which can be coded more efficiently, and whether either can be coded efficiently
enough to make it practical for real time use, is under study.

The method developed by P.G. is based on straightforward computing of blockwise cross-correlations; heuristics that direct
the search substantially improve efficiency at the price of occasionally finding a local maximum rather than the global
maximum.
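
As a deliberately naive illustration of this style of search (a sketch, not the implementation described here), the following estimates a blockwise disparity map by exhaustive matching; the sum of absolute differences stands in for the cross-correlation measure, and the block size and search ranges are assumed values.

    import numpy as np

    def block_disparity_map(left, right, block=8, max_dx=32, max_dy=4):
        """Estimate a blockwise disparity map for a stereo pair: for each
        block of `left`, search a window of `right` for the offset (dy, dx)
        minimizing the sum of absolute differences (SAD)."""
        H, W = left.shape
        by, bx = H // block, W // block
        disp = np.zeros((by, bx, 2), dtype=np.int16)
        for i in range(by):
            for j in range(bx):
                y, x = i * block, j * block
                ref = left[y:y + block, x:x + block].astype(np.int32)
                best, best_cost = (0, 0), np.inf
                for dy in range(-max_dy, max_dy + 1):
                    for dx in range(-max_dx, max_dx + 1):
                        yy, xx = y + dy, x + dx
                        if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                            continue
                        cand = right[yy:yy + block, xx:xx + block].astype(np.int32)
                        cost = np.abs(ref - cand).sum()
                        if cost < best_cost:
                            best_cost, best = cost, (dy, dx)
                disp[i, j] = best
        return disp

Search-directing heuristics of the kind mentioned above would, for example, seed the loop with a neighboring block's disparity and stop at the first sufficiently good match; that pruning is what occasionally leaves the search at a local rather than the global maximum.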
The method developed by S.S. is based on multiresolution hierarchical matching using wavelets; efficiency is achieved by
doing the search for the best match down a tree of progressively higher resolution images, starting from a low resolution
highly subsampled image.
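
The sketch below illustrates the coarse-to-fine idea, again not the implementation itself: a plain 2x2-averaging pyramid stands in for the wavelet decomposition, and each level inherits the doubled estimate from the level below and refines it within a small radius.

    import numpy as np

    def downsample(img):
        """2x2 box filter and decimation, a stand-in for the low-pass
        (approximation) band of a wavelet decomposition."""
        a = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2].astype(np.float32)
        return (a[0::2, 0::2] + a[1::2, 0::2] + a[0::2, 1::2] + a[1::2, 1::2]) / 4.0

    def refine(left, right, y, x, block, guess, radius):
        """Search a (2*radius+1)^2 window around `guess` for the best SAD match."""
        H, W = left.shape
        ref = left[y:y + block, x:x + block]
        best, best_cost = guess, np.inf
        gy, gx = guess
        for dy in range(gy - radius, gy + radius + 1):
            for dx in range(gx - radius, gx + radius + 1):
                yy, xx = y + dy, x + dx
                if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                    continue
                cost = np.abs(ref - right[yy:yy + block, xx:xx + block]).sum()
                if cost < best_cost:
                    best_cost, best = cost, (dy, dx)
        return best

    def hierarchical_disparity(left, right, levels=3, block=8, radius=2):
        """Coarse-to-fine disparity estimation down an image pyramid."""
        Ls, Rs = [left.astype(np.float32)], [right.astype(np.float32)]
        for _ in range(levels - 1):
            Ls.append(downsample(Ls[-1]))
            Rs.append(downsample(Rs[-1]))
        Ls, Rs = Ls[::-1], Rs[::-1]  # coarsest level first
        disp = None
        for L, R in zip(Ls, Rs):
            by, bx = L.shape[0] // block, L.shape[1] // block
            new = np.zeros((by, bx, 2), dtype=np.int32)
            for i in range(by):
                for j in range(bx):
                    if disp is None:
                        guess, r = (0, 0), radius * 4   # wider search at the coarsest level
                    else:                               # inherit and double the coarser estimate
                        gi = min(i // 2, disp.shape[0] - 1)
                        gj = min(j // 2, disp.shape[1] - 1)
                        guess, r = tuple(2 * disp[gi, gj]), radius
                    new[i, j] = refine(L, R, i * block, j * block, block, guess, r)
            disp = new
        return disp

Because each level only refines within a small radius, the work grows roughly linearly with image area rather than with the full search-window area of the exhaustive version.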
In the following sections we discuss the need and opportunity for compression of 3D-stereoscopic imagery, discuss the
correlations that can be exploited to achieve compression, describe and refine the approach, summarize the content and
performance of the two implementations we have prototyped to date, and outline several topics we have targeted for ongoing
research.
This paper is intended as a high level introduction to our thoughts about and our progress toward compression for 3D-
stereoscopy. The specific references that we cite in the text and the general references that we also include in the
bibliography point to background literature, as well as to three recent papers [1, 2, 3] in which we document the low level
details of our recent work.
2. NEED AND OPPORTUNITY
The scenario we imagine is that binocular 3D-stereoscopy is grafted onto "flat" (monoscopic) display infrastructures; we
regard the alternative scenario, that 3D-stereoscopy is built into the foundations of the infrastructure, as being somewhat
farfetched in light of the cost and effectiveness of the current generation of 3D display devices and systems.
Displays become rapidly more expensive as their spatial resolution and temporal frame rate increase. Thus in any
application the display is usually chosen to meet but not to exceed substantially the application’s requirements. In flat
applications each eye sees, at no cost to the other eye, the full spatial and temporal bandwidth that the display delivers. When
a 3D-stereoscopic application is grafted onto a flat infrastructure the display’s capabilities must be divided between the two
eyes. The price may be extracted in either essentially the spatial domain, e.g., by assigning the odd lines to the left eye and
the even lines to the right eye, or in essentially the temporal domain, e.g., by assigning alternate frames to the left and right
eye. The distinction is in part semantic, since the "spatial" method of this example is often implemented in practice via
sequential fields in an interlaced display system. The fundamental issue is that when 3D-stereoscopy is implemented on a
single display each eye gets in some sense only half the display. A user contemplating using 3D-stereoscopy must thus
acquire a display (and the underlying system to support it) with twice the pixel-per-second capability of the minimal display
needed for the flat application; the alternatives require choosing between a flickering image or a reduced spatial resolution
image.
As indicated, lower level capacities of the system’s components must also be doubled. In particular, all the information
captured by two cameras (each equivalent to the original camera) must be stored or transmitted or both. Doubling these
capacities may be more difficult than doubling the capability of the display, inasmuch as (except at the very high end) the
capability of the display can be increased by simply paying more. The most difficult system component to increase is
probably the bandwidth of the transmission system, which is often subject to powerful regulatory as well as technical

constraints. Nevertheless, the bandwidth must apparently be doubled to transmit 3D-stereoscopic image streams at the same
spatial resolution and temporal update frequency as either flat image stream.
In fact, because the two views comprising a 3D-stereoscopic image pair are nearly identical, i.e., the information content of
both together is only a little more than the information content of one alone, it is possible to find representations of image
pairs and streams that take up little more storage space and transmission bandwidth than the space or bandwidth that is
required by either alone. The rest of this paper is devoted to an overview of how this can be done, some details of our early
implementations, and a discussion of possibilities for the future.
2.1. Background
We remind the reader that image compression methods fall into two broad categories, "lossless" and "lossy". Lossless
compression exploits the existence of redundant or repeated information, storing the image in less space by symbolically
rather than explicitly repeating information, and by related methods such as assigning the shortest codes to the most probable
occurrences. Lossy compression exploits characteristics of the human visual system by discarding image content that is
known to have little or no impact on human perception of the image.
Our approach to compression of 3D-stereoscopic imagery has two components, related to there being two perspective views
in a 3D-stereoscopic pair. One component may be either lossless or slightly lossy, as in conventional compression of flat
imagery; the other component is by itself a very lossy (or "deep") method of compression. The intimate connection between
the two views makes it possible to synthesize a perceptually acceptable image from a compression so deep that, by itself, it
would be incomprehensible.
The left and right views that comprise a 3D-stereoscopic image pair or motion stream pair are obviously very similar. There
are various ways of saying this: they are often described as "highly redundant", in that most of the information contained in
either is repeated in the other, or as "highly correlated" in that either is for the most part easily predicted from the other by
application of some external information about the relationship (the relative perspective) between them. We can thus
synthesize a reasonable approximation to either view given the other view and a little additional information that describes
the relationship between the two views. A useful form for the additional information is a disparity map: a two dimensional
vector field that encodes how to displace blocks of pixels in one view to approximate the other view.
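
On the receiving side the corresponding synthesis step is simple. A minimal sketch, under the same blockwise assumptions as the matching sketches above (the map is assumed to be estimated with the missing view as reference, so each entry says where to fetch pixels from the transmitted view):

    import numpy as np

    def synthesize_view(source, disp, block=8):
        """Synthesize the missing view of a stereo pair by copying displaced
        blocks of the transmitted view. `disp[i, j] = (dy, dx)` gives, for
        block (i, j) of the synthesized view, the offset of its best match
        in `source`."""
        H, W = source.shape
        out = np.zeros_like(source)
        for i in range(disp.shape[0]):
            for j in range(disp.shape[1]):
                y, x = i * block, j * block
                dy, dx = disp[i, j]
                yy = int(np.clip(y + dy, 0, H - block))
                xx = int(np.clip(x + dx, 0, W - block))
                out[y:y + block, x:x + block] = source[yy:yy + block, xx:xx + block]
        return out

With a map at 1/64 the image resolution the reconstruction is necessarily blocky and imperfect, which is exactly why the psychophysical tolerance discussed next matters.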
Fortunately a "reasonable approximation" is enough: perfection is not required. This is the case because of two
psychophysical effects, one well known, the other less so.
It is well known that one good eye and one bad eye together are better than the good eye alone, i.e., the information they
provide in a sense adds rather than averages. The resulting perception is sharper than the perception provided by the better
eye alone. Thus presenting one eye with the original view intended for it, and presenting the other eye with a synthetic view
(which might be imperfect in sharpness and perhaps even missing some small features), the perception of both together is
better than the perception of the original view alone.
A related perceptual effect that we have observed informally has been documented in several controlled experiments: a
binocular 3D-stereoscopic image pair with one sharp member and one blurred member successfully stimulates appropriate
depth perception.

Thus we expect that if one member of a 3D-stereoscopic image pair is losslessly or nearly losslessly compressed and the
other is (by some appropriate method) deeply compressed, the pair of decompressed (higher resolution) and synthesized
(lower resolution) views will together be perceived comfortably and accurately.
In the following section we describe several approaches to compression, ultimately focusing on the method we are now
developing along two complementary implementation paths.
2.2. Correlations
We identify four kinds of correlations or redundancies that can be exploited to compress 3D-stereoscopic imagery. The first
two make no specific reference to 3D-stereoscopy; they are conventional image compression methods that might
(inefficiently!) be applied to two 3D-stereoscopic views independently. The third kind applies to still image pairs, or to
temporally corresponding members of a motion stream pair. The fourth kind, which is really a combination of the second
and third kinds, applies to motion stream pairs.
Spatial correlation: Within a single frame, large areas with little variation in intensity and color permit efficient
encoding based on internal predictability, i.e., the fact that any given pixel is most likely to be identical or nearly
identical to its neighbors. This is the basis for most conventional still image compression methods.
Temporal correlation: Between frames in a motion sequence, large areas in rigid-body motion permit efficient
coding based on frame-to-frame predictability. The approach is fundamentally to transmit an occasional frame,
and interpolation coefficients that permit the receiver to synthesize reasonable approximations to the
intermediate frames. MPEG is an example.
Perspective correlation: Between frames in a binocular 3D-stereoscopic image pair, large areas differing only by
small horizontal offsets permit efficient coding based on disparity predictability. If one imagines the two
perspective views as being gathered not simultaneously but rather sequentially by moving the camera from one
viewpoint to the second, then perspective correlation and temporal correlation are to first order equivalent.
WorldLine correlation: We borrow the term "worldline" from the Theory of Special Relativity, where the
worldline is a central concept that refers to the path of an object in 4-dimensional space-time. Observers moving
relative to each other, i.e., observers having different perspectives on space-time, perceive a worldline segment
as having different spatial and temporal components, but they all agree on the length of the segment.
Analogously in 3D-stereoscopic image streams, when vertical and axial velocities are small and horizontal
motion suitably compensates perspective, time-offset frames in the left and right image streams can be nearly
identical. WorldLine correlation is the combination of temporal correlation and perspective correlation; the most
interesting manifestation of WorldLine correlation is the potential near-identity of appropriately time-offset
frames in the left and right image streams respectively.* The concept is useful for situations in which the camera
is fixed and parts of the scene are in motion, the scene is fixed and the camera is in motion, and both the camera
and parts of the scene are in motion.
WorldLine correlation is depicted pictorially in Figure 1.
*Thinking in a suitable generalized Fourier domain, simultaneous pairs from different perspectives and pairs from one perspective at different times are
characterized by nearly identical amplitude spectra but substantially (although systematically) different phase spectra.

[Figure 1: Pictorial depiction of WorldLine correlation. The "left now" and "right now" frames are mutually predictable; the "left now" and "right later" frames are almost identical.]
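
The text does not prescribe how a WorldLine-matched time offset would be found; one naive possibility, sketched here with an invented mean-absolute-difference criterion, is to scan a few candidate offsets and keep the one that makes the time-shifted right stream most similar to the left stream.

    import numpy as np

    def worldline_offset(left_frames, right_frames, max_offset=5):
        """Pick the time offset k that minimizes the mean absolute difference
        between left[t] and right[t + k], averaged over the streams; k = 0
        corresponds to plain perspective correlation."""
        T = min(len(left_frames), len(right_frames))
        best_k, best_err = 0, np.inf
        for k in range(-max_offset, max_offset + 1):
            errs = [np.abs(left_frames[t].astype(np.float32)
                           - right_frames[t + k].astype(np.float32)).mean()
                    for t in range(T) if 0 <= t + k < T]
            if errs and np.mean(errs) < best_err:
                best_err, best_k = np.mean(errs), k
        return best_k, best_err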
3. APPROACH
3.1. Basic Approach
Our basic approach to compression of 3D-stereoscopic imagery is based on the observation that disparity, the relative offset
between corresponding points in an image pair, varies only slowly over most of the image field. Given the validity of this
assumption, either member of an image pair can be synthesized (or "predicted") given the other member and a low-
resolution map of the relative disparity between the two members of the pair. It is the possibility that the disparity map can
be low resolution, combined with the fact that the disparities vary slowly and can be represented by small numbers (few bits)
that permits deep compression.
As a numerical example, suppose that over most of the image field the disparity does not change significantly over eight
pixels. Then a disparity map can be represented by a field with 1/64 the number of entries as the image itself. Each disparity
is a vector with two components, horizontal and vertical, so the net compression has an upper bound of 1/32, about 3%. In
fact further significant advantages can be obtained by recognizing that the disparity components can be encoded with fewer
bits than the original intensities, e.g., perhaps three bits for the vertical disparities (four pixels up or down) and perhaps five bits for the horizontal disparities (sixteen pixels left or right).
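
The arithmetic can be made concrete. In the sketch below the 640x480 8-bit frame is an invented example size, and the five-bit horizontal width follows the estimate in the sentence above.

    # Worked version of the numerical example (illustrative 640x480,
    # 8-bit monochrome frame; bit widths as estimated in the text).
    pixels = 640 * 480
    image_bits = pixels * 8                  # 2,457,600 bits
    entries = pixels // (8 * 8)              # one map entry per 8x8 block
    upper = entries * 2 * 8 / image_bits     # both components at full 8 bits
    packed = entries * (3 + 5) / image_bits  # 3-bit vertical + 5-bit horizontal
    print(f"upper bound: {upper:.1%}")       # 1/32, about 3.1%
    print(f"packed map:  {packed:.1%}")      # 1/64, about 1.6%

The packed figure lands in the 1 to 2% range quoted for the disparity-map overhead earlier in the paper.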

Citations
01 Jan 2003
TL;DR: This thesis contributes to the realization of a flexible framework for video-mediated communication over the Internet by presenting scalable and adaptive algorithms for multicast flow control, layered video coding, and robust transport of video.
Abstract: The tremendous success of the Internet in providing a global communication infrastructure for a wide variety of applications has inspired the invention of packet video systems for synchronous interpersonal communication. The potential benefits of video-mediated communication are numerous, ranging from improved social interactions between individuals to more efficient distributed collaborative work. However, since the Internet was originally designed as a data communication network, primarily supporting asynchronous applications like file transfer and electronic mail, the realization of Internet-based packet video systems presents considerable technological challenges. Specifically, the best-effort service model of the Internet, which does not guarantee timely delivery of packets, implies that video applications must be resilient to packet loss and adaptive to variations in bandwidth and delay. Two fundamental issues are how to make the systems scalable to large numbers of widely distributed users, and how to support video-mediated communication in highly heterogeneous environments. Since the Internet is built upon network connections of widely different capacities and since the computers connected to the network have vastly different characteristics, video applications must be adaptive to diverse and dynamic conditions. Furthermore, video-mediated communication systems must take various application-specific requirements and usability concerns into consideration. This thesis contributes to the realization of a flexible framework for video-mediated communication over the Internet by presenting scalable and adaptive algorithms for multicast flow control, layered video coding, and robust transport of video. Enrichments of video-mediated communication, in the shape of stereoscopic video transmission mechanisms and mobility support, are proposed along with design and implementation guidelines. Furthermore, the scope of Internet video is broadened through the introduction of a novel video gateway technology interconnecting multicast videoconferences with the World Wide Web. In addition to the contributions on core technology, the thesis also deals with applications of video-mediated communication. Specifically, the use of video for distributed collaborative teamwork is explored through experiments with prototype implementations.

40 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This approach leverages state-of-the-art single-image compression autoencoders and enhances the compression with novel parametric skip functions to feed fully differentiable, disparity-warped features at all levels to the encoder/decoder of the second image.
Abstract: In this paper we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations. Our approach leverages state-of-the-art single-image compression autoencoders and enhances the compression with novel parametric skip functions to feed fully differentiable, disparity-warped features at all levels to the encoder/decoder of the second image. Moreover, we model the probabilistic dependence between the image codes using a conditional entropy model. Our experiments show an impressive 30 - 50% reduction in the second image bitrate at low bitrates compared to deep single-image compression, and a 10 - 20% reduction at higher bitrates.

29 citations


Cites background from "Compression of stereo image pairs a..."

  • ...disparity prediction to separate transforms for residual images [40, 14, 48, 3, 34, 42]....


Proceedings ArticleDOI
23 Jul 2001
TL;DR: The requirements for realising a stereoscopic visual communication system based on Internet technology are discussed and in particular a transport protocol extension is proposed.
Abstract: One of the most remarkable features of the human visual system is the ability to perceive three-dimensional depth. This phenomenon is primarily related to the fact that the binocular disparity causes two slightly different images to be projected on the retinas. The images are fused by the human brain into one three-dimensional view. Various stereoscopic display systems have been devised to present computer generated or otherwise properly produced images separately to the eyes, resulting in the sensation of stereopsis. A stereoscopic visual communication system can be conceived by arranging two identical video cameras with an appropriate interocular separation, encoding the video signals and transporting the resultant data over a network to one or more receivers where it is decoded and properly displayed. The requirements for realising such a system based on Internet technology are discussed and in particular a transport protocol extension is proposed. The design and implementation of a prototype system is discussed and some experiences from using it are reported.

23 citations

Journal ArticleDOI
TL;DR: Controlled psychophysical experiments show that subjects perceived 3-D color images even when they were presented with only one color image in a stereoscopic pair, with no depth perception degradation and only limited color degradation.
Abstract: Utilizing remote color stereoscopic scenes typically requires the acquisition, transmission, and processing of two color images. However, the amount of information transmitted and processed is large, compared to either monocular images or monochrome stereo images. Existing approaches to this challenge focus on compression and optimization. This paper introduces an innovative complementary approach to the presentation of a color stereoscopic scene, specialized for human perception. It relies on the hypothesis that a stereo pair consisting of one monochromatic image and one color image (a MIX stereo pair) will be perceived by a human observer as a 3-D color scene. Taking advantage of color redundancy, this presentation of a monochromatic-color pair allows for a drastic reduction in the required bandwidth, even before any compression method is employed. Herein we describe controlled psychophysical experiments on up to 15 subjects. These experiments tested both color and depth perception using various combinations of color and monochromatic images. The results show that subjects perceived 3-D color images even when they were presented with only one color image in a stereoscopic pair, with no depth perception degradation and only limited color degradation. This confirms the hypothesis and validates the new approach.

12 citations

Dissertation
01 Jan 2004
TL;DR: The existence of inexpensive digital CMOS cameras is used to explore a multi-image capture paradigm and the gathering of real world real-time data of active and static scenes.
Abstract: The number of three-dimensional displays available is escalating and yet the capturing devices for multiple view content are focused on either single camera precision rigs that are limited to stationary objects or the use of synthetically created animations. In this work we will use the existence of inexpensive digital CMOS cameras to explore a multi-image capture paradigm and the gathering of real world real-time data of active and static scenes. The capturing system can be developed and employed for a wide range of applications such as portrait-based images for multi-view facial recognition systems, hypostereo surgical training systems, and stereo surveillance by unmanned aerial vehicles. The system will be adaptable to capturing the correct stereo views based on the environmental scene and the desired three-dimensional display. Several issues explored by the system will include image calibration, geometric correction, the possibility of object tracking, and transfer of the array technology into other image capturing systems. These features provide the user more freedom to interact with their specific 3-D content while allowing the computer to take on the difficult role of stereoscopic cinematographer. Thesis Supervisor: V. Michael Bove Jr., Principal Research Scientist, Media Laboratory. Thesis title: Scalable Multi-view Stereo Camera Array for Real World Real-Time Image Capture and Three-Dimensional Displays.

12 citations

References
Journal ArticleDOI
Olivier Rioul, Martin Vetterli
TL;DR: A simple, nonrigorous, synthetic view of wavelet theory is presented for both review and tutorial purposes, which includes nonstationary signal analysis, scale versus frequency, wavelet analysis and synthesis, scalograms, wavelet frames and orthonormal bases, the discrete-time case, and applications of wavelets in signal processing.
Abstract: A simple, nonrigorous, synthetic view of wavelet theory is presented for both review and tutorial purposes. The discussion includes nonstationary signal analysis, scale versus frequency, wavelet analysis and synthesis, scalograms, wavelet frames and orthonormal bases, the discrete-time case, and applications of wavelets in signal processing. The main definitions and properties of wavelet transforms are covered, and connections among the various fields where results have been developed are shown.

2,945 citations

Journal ArticleDOI
TL;DR: It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.
Abstract: Two fundamentally different techniques for compressing stereopairs are discussed. The first technique, called disparity-compensated transform-domain predictive coding, attempts to minimize the mean-square error between the original stereopair and the compressed stereopair. The second technique, called mixed-resolution coding, is a psychophysically justified technique that exploits known facts about human stereovision to code stereopairs in a subjectively acceptable manner. A method for assessing the quality of compressed stereopairs is also presented. It involves measuring the ability of an observer to perceive depth in coded stereopairs. It was found that observers generally perceived objects to be further away in compressed stereopairs than they did in originals. It is proved that the rate distortion limit for coding stereopairs cannot in general be achieved by a coder that first codes and decodes the right picture sequence independently of the left picture sequence, and then codes and decodes the left picture sequence given the decoded right picture sequence.

243 citations

Journal ArticleDOI
TL;DR: A multiresolution representation for video signals is introduced, and interpolation in an FIR (finite impulse response) scheme solves uncovered area problems, considerably improving the temporal prediction.
Abstract: A multiresolution representation for video signals is introduced. A three-dimensional spatiotemporal pyramid algorithm for high-quality compression of advanced television sequences is presented. The scheme utilizes a finite memory structure and is robust to channel errors, provides compatible subchannels, and can handle different scan formats, making it well suited for the broadcast environment. Additional features such as fast random access and reverse playback make it suitable for digital storage as well. Model-based processing is used both over space and time, where motion-based interpolation is used. Interpolation in an FIR (finite impulse response) scheme solves uncovered area problems, considerably improving the temporal prediction. The complexity is comparable to that of previous recursive schemes. Computer simulations indicate that high compression factors (about an order of magnitude) are easily achieved with no apparent loss of quality. The scheme also has a number of commonalities with the emerging MPEG standard.

204 citations

Proceedings ArticleDOI
14 Nov 1988
TL;DR: It was found that very deep compression of one of the images of a stereo pair does not interfere with the perception of depth in the stereo image.
Abstract: An approach to stereo image compression based on disparity compensation is proposed and evaluated. The scheme is motivated by the suppression theory in human vision. A methodology for evaluating compressed stereo images is proposed. It is based on time measurements of depth perception tasks performed by human subjects. Subjects taking part in the experiment were exposed to displays of stereo images, some of which had been compressed, and were asked to judge the relative depth within each display as fast as possible. Decision times were measured and used as the major dependent variable. It was found that very deep compression of one of the images of a stereo pair does not interfere with the perception of depth in the stereo image.

63 citations

Journal ArticleDOI
TL;DR: This paper presents two-dimensional motion estimation methods which take advantage of the intrinsic redundancies inside 3DTV stereoscopic image sequences, subject to the crucial assumption that an initial calibration of the stereoscopic sensors provides us with geometric change of coordinates for two matched features.
Abstract: This paper presents two-dimensional motion estimation methods which take advantage of the intrinsic redundancies inside 3DTV stereoscopic image sequences. Most of the previous studies extract, either disparity vector fields if they are involved in stereovision, or apparent motion vector fields to be applied to motion compensation coding schemes. For 3DTV image sequence analysis and transmission, we can jointly estimate these two feature fields. Locally, initial image data are grouped within two views (the left and right ones) at two successive time samples and spatio-temporal coherence has to be used to enhance motion vector field estimation. Three different levels of ‘coherence’ have been experimented subject to the crucial assumption that an initial calibration of the stereoscopic sensors provides us with geometric change of coordinates for two matched features.

61 citations

Frequently Asked Questions (16)
Q1. What have the authors contributed in "Compression of stereo image pairs and streams" ?

In this paper, the authors exploit the correlations between 3D-stereoscopic left-right image pairs to achieve high compression factors for image frame storage and image stream transmission. 

Future research will address in the short term fine-tuning the architectures and algorithms and understanding their fundamental mathematical and psychophysical efficiencies, and in the long term issues such as multiple camera schemes and object based compression methods. 

Their basic approach to compression of 3D-stereoscopic imagery is based on the observation that disparity, the relative offset between corresponding points in an image pair, varies only slowly over most of the image field. 

When the set is as small (in bits) as 1 to 2% of the conventionally compressed image the stereoscopically viewed pair consisting of one original and one synthesized image produces convincing stereo imagery. 

Topics that the authors need to address in the context of compression of 3D-stereoscopic imagery include optimizing implementation of the WorldLine approach. 

The successful development of compression schemes for motion video that exploit the high correlation between temporally adjacent frames, e.g., MPEG, suggests that the authors might analogously exploit the high correlation between spatially or angularly adjacent still frames, i.e., left-right 3D-stereoscopic image pairs. 

Using three cameras: compute predictors for left and right views given the middle view, transmit the middle view and the predictors, synthesize 3D-stereoscopic views at the receiver. 

The fundamental issue is that when 3D-stereoscopy is implemented on a single display each eye gets in some sense only half the display. 

Their experiments demonstrate that a reasonable synthesis of one image of a left-right stereo image pair can be estimated from the other uncompressed or conventionally compressed image augmented by a small set of numbers that describe the local cross-correlations in terms of a disparity map. 

In fact, because the two views comprising a 3D-stereoscopic image pair are nearly identical, i.e., the information content of both together is only a little more than the information content of one alone, it is possible to find representations of image pairs and streams that take up little more storage space and transmission bandwidth than the space or bandwidth that is required by either alone. 

The bandwidth must apparently be doubled to transmit 3D-stereoscopic image streams at the same spatial resolution and temporal update frequency as either flat image stream. 

One component may be either lossless or slightly lossy, as in conventional compression of flat imagery; the other component is by itself a very lossy (or "deep") method of compression. 

The human visual perception system has an effective way to deal with occlusions: the authors have a detailed understanding of the image semantics, from which the authors effortlessly and unconsciously draw inferences that fill in the missing information. 

This is the obvious candidate for initial experiments because it is easy to code and because the authors have a strong intuitive understanding of its parameters. 

The price may be extracted in either essentially the spatial domain, e.g., by assigning the odd lines to the left eye and the even lines to the right eye, or in essentially the temporal domain, e.g., by assigning alternate frames to the left and right eye. 

Each disparity is a vector with two components, horizontal and vertical, so the net compression has an upper bound of 1/32, about 3%.