
Showing papers on "View synthesis published in 2005"


Journal ArticleDOI
TL;DR: This work presents an approach for modeling and rendering a dynamic, real-world event from an arbitrary viewpoint, and at any time, using images captured from multiple video cameras, to compute a novel image from any viewpoint in the 4D space of position and time.
Abstract: We present an approach for modeling and rendering a dynamic, real-world event from an arbitrary viewpoint, and at any time, using images captured from multiple video cameras. The event is modeled as a nonrigidly varying dynamic scene, captured by many images from different viewpoints, at discrete times. First, the spatio-temporal geometric properties (shape and instantaneous motion) are computed. The view synthesis problem is then solved using a reverse mapping algorithm, ray-casting across space and time, to compute a novel image from any viewpoint in the 4D space of position and time. Results are shown on real-world events captured in the CMU 3D Room, by creating synthetic renderings of the event from novel, arbitrary positions in space and time. Multiple such recreated renderings can be put together to create retimed fly-by movies of the event, with the resulting visual experience richer than that of a regular video clip, or switching between images from multiple cameras.

141 citations
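
The reverse-mapping idea above can be sketched compactly: for each novel-view pixel, intersect the viewing ray with the recovered geometry, then project that 3D point into the temporally nearest reference frames and blend. The sketch below assumes the surface point X is already known (the paper recovers it via spatio-temporal shape and motion estimation); all function names are illustrative, not from the paper.

import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def sample_color(image, uv):
    """Nearest-neighbour colour lookup; a real renderer would interpolate."""
    h, w = image.shape[:2]
    u, v = int(round(uv[0])), int(round(uv[1]))
    if 0 <= v < h and 0 <= u < w:
        return image[v, u].astype(float)
    return None

def spacetime_color(X, t, frames):
    """Blend reference colours for surface point X at virtual time t.

    `frames` is a list of (time, camera_matrix, image) triples; the two
    temporally nearest references are mixed, weighted by time distance.
    """
    frames = sorted(frames, key=lambda f: abs(f[0] - t))
    samples, weights = [], []
    for time, P, image in frames[:2]:
        c = sample_color(image, project(P, X))
        if c is not None:
            samples.append(c)
            weights.append(1.0 / (1e-6 + abs(time - t)))
    if not samples:
        return None
    w = np.array(weights) / sum(weights)
    return sum(wi * ci for wi, ci in zip(w, samples))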


Journal ArticleDOI
TL;DR: Experimental evaluation demonstrates that this approach overcomes limitations of previous stereo- and silhouette-based approaches to rendering novel views of moving people and achieves a visual quality comparable to the captured video images.
Abstract: This paper addresses the synthesis of novel views of people from multiple view video. We consider the target area of the multiple camera 3D Virtual Studio for broadcast production with the requirement for free-viewpoint video synthesis for a virtual camera with the same quality as captured video. A framework is introduced for view-dependent optimisation of reconstructed surface shape to align multiple captured images with sub-pixel accuracy for rendering novel views. View-dependent shape optimisation combines multiple view stereo and silhouette constraints to robustly estimate correspondence between images in the presence of visual ambiguities such as uniform surface regions, self-occlusion, and camera calibration error. Free-viewpoint rendering of video sequences of people achieves a visual quality comparable to the captured video images. Experimental evaluation demonstrates that this approach overcomes limitations of previous stereo- and silhouette-based approaches to rendering novel views of moving people.

46 citations
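
View-dependent rendering of this kind typically weights each reference camera by its angular proximity to the virtual viewpoint. A minimal sketch of such a weighting, assuming unit viewing directions are available (the scheme below is a generic stand-in, not the paper's exact formulation):

import numpy as np

def view_dependent_weights(virtual_dir, camera_dirs, k=3):
    """Weight reference cameras by angular closeness to the virtual view.

    `virtual_dir` and each row of `camera_dirs` are unit viewing
    directions toward the scene; only the k best cameras contribute.
    """
    cos = camera_dirs @ virtual_dir          # cosine of angular distance
    idx = np.argsort(-cos)[:k]               # k most aligned cameras
    w = np.clip(cos[idx], 0.0, None)
    return idx, w / w.sum()

# Example: three cameras, virtual view closest to camera 0.
dirs = np.array([[0.0, 0.0, 1.0],
                 [0.6, 0.0, 0.8],
                 [0.0, 0.6, 0.8]])
print(view_dependent_weights(np.array([0.1, 0.0, 0.995]), dirs, k=2))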


Patent
28 Jun 2005
TL;DR: An image-based rendering system and method for rendering a novel image from several reference images, including a pre-processing module for pre-processing at least two of the reference images and providing pre-processed data.
Abstract: An image-based rendering system and method for rendering a novel image from several reference images. The system includes a pre-processing module for pre-processing at least two of the reference images and providing pre-processed data; a view synthesis module connected to the pre-processing module for synthesizing an intermediate image from the at least two of the reference images and the pre-processed data; and, an artifact rejection module connected to the view synthesis module for correcting the intermediate image to produce the novel image.

42 citations
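
Structurally, the claimed system is a three-stage pipeline. The sketch below mirrors that module layout with placeholder stage bodies; it is not the patented method itself.

import numpy as np

class RenderPipeline:
    """Pre-processing -> view synthesis -> artifact rejection, following
    the patent's module layout; every stage body is a placeholder."""

    def pre_process(self, refs):
        # e.g. rectification, colour correction, correspondence data
        return {"n_refs": len(refs)}

    def synthesize(self, refs, data):
        # naive stand-in: average the two reference images
        return 0.5 * refs[0] + 0.5 * refs[1]

    def reject_artifacts(self, intermediate):
        # e.g. detect and repair inconsistent pixels; identity here
        return intermediate

    def render(self, refs):
        data = self.pre_process(refs)
        return self.reject_artifacts(self.synthesize(refs, data))

novel = RenderPipeline().render([np.zeros((4, 4)), np.ones((4, 4))])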


01 Jan 2005
TL;DR: This paper shows how a hierarchical decomposition of the texture patch database both allows multiple-scale analysis and speeds up the imposition of the priors, yielding results comparable to the previous method with significant gains in speed.
Abstract: Novel view synthesis using image-based priors has recently been shown to provide high quality renderings of complex 3D scenes. However, current methods are extremely slow, requiring of the order of hours to render a single frame. In this paper we show how a coarse-to-fine method can be used to reduce this time significantly. In contrast to traditional multiple-view stereo methods, devising a coarse-to-fine strategy for this problem is complicated by the fact that image-based priors are strongly tied to the scale at which rendering is performed. We show how a hierarchical decomposition of the texture patch database both allows multiple-scale analysis and speeds up the imposition of the priors. Examples are shown on a number of challenging sequences, and illustrate that the new method yields comparable results to the previous method, with significant gains in speed.

19 citations
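
The hierarchical patch-database idea can be illustrated with a two-step search: prune candidates with coarse versions of each patch, then rank the survivors at full resolution. This sketch uses plain 2x box downsampling and sum-of-squared-differences as stand-ins for the paper's decomposition and prior:

import numpy as np

def build_pyramid_db(patches, levels=3):
    """Store each texture patch at several scales, coarsest first.

    Plain 2x box downsampling stands in for the paper's decomposition;
    patch sides must be divisible by 2**(levels - 1) (e.g. 8x8).
    """
    db = []
    for p in patches:
        pyr = [np.asarray(p, dtype=float)]
        for _ in range(levels - 1):
            q = pyr[-1]
            pyr.append(0.25 * (q[0::2, 0::2] + q[1::2, 0::2]
                               + q[0::2, 1::2] + q[1::2, 1::2]))
        db.append(pyr[::-1])
    return db

def coarse_to_fine_match(query_pyr, db, keep=10):
    """Prune candidates at the coarse scale, then rank survivors finely."""
    coarse = [np.sum((entry[0] - query_pyr[0]) ** 2) for entry in db]
    survivors = np.argsort(coarse)[:keep]
    fine = [np.sum((db[i][-1] - query_pyr[-1]) ** 2) for i in survivors]
    return survivors[int(np.argmin(fine))]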


Journal ArticleDOI
01 Aug 2005
TL;DR: Two methods to extrapolate novel views of complex scenes with occlusions and large depth discontinuities from images of a moving uncalibrated multi-camera rig are described.
Abstract: Image-based rendering is a method to synthesise novel views from a set of given real images. Two methods to extrapolate novel views of complex scenes with occlusions and large depth discontinuities from images of a moving uncalibrated multi-camera rig are described. The real camera viewpoints are calibrated from the image data and dense depth maps are estimated for each real view. Novel views are synthesised from this representation with view-dependent image-based rendering techniques at interactive rates. Since the 3-D scene geometry is available in this approach, it is well suited for mixed reality applications where synthetic 3-D objects are seamlessly embedded in the novel view.

17 citations
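
The core operation behind such depth-based view synthesis is 3D warping: back-project each pixel with its depth, transform into the virtual camera, and re-project. A minimal sketch (intrinsics K_src/K_dst and the relative pose R, t are assumed given; the paper additionally estimates calibration and depth from the images):

import numpy as np

def warp_to_novel_view(depth, K_src, K_dst, R, t):
    """Forward-warp pixel coordinates from a depth map into a novel view.

    Returns the (u, v) position of every source pixel in the target
    image. R, t map source-camera coordinates to target-camera ones.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K_src) @ pix               # back-project
    X = rays * depth.reshape(-1)                    # 3D points, source frame
    Y = R @ X + t[:, None]                          # target camera frame
    p = K_dst @ Y
    return (p[:2] / p[2]).T.reshape(h, w, 2)        # target pixel coords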


Proceedings ArticleDOI
12 Jan 2005
TL;DR: A novel 3D face shape-modeling algorithm, Multilevel Quadratic Variation Minimization (MQVM), that makes sole use of two orthogonal real views of a face, i.e., the frontal and profile views, and can generate C²-smooth 3D face surfaces.
Abstract: One of the key remaining problems in face recognition is that of handling the variability in appearance due to changes in pose. One strategy is to synthesize virtual face views from real views. In this paper, a novel 3D face shape-modeling algorithm, Multilevel Quadratic Variation Minimization (MQVM), is proposed. Our method makes sole use of two orthogonal real views of a face, i.e., the frontal and profile views. By applying quadratic variation minimization iteratively in a coarse-to-fine hierarchy of control lattices, the MQVM algorithm can generate C²-smooth 3D face surfaces. Then realistic virtual face views can be synthesized by rotating the 3D models. The algorithm works properly on sparse constraint points and large images. It is much more efficient than single-level quadratic variation minimization. The modeling results suggest the validity of the MQVM algorithm for 3D face modeling and 2D face view synthesis under different poses.

12 citations
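
As a rough illustration of the fitting step, the sketch below relaxes a height field to be smooth while pinning sparse constraint points. Note it uses a first-order (membrane) smoothness update for brevity, whereas the paper minimises second-order quadratic variation over a coarse-to-fine hierarchy of control lattices, which is what makes it efficient on large images:

import numpy as np

def fit_smooth_surface(size, constraints, iters=500):
    """Relax a height field towards smoothness while pinning sparse
    (row, col, height) constraint points.

    This is a simplified stand-in, not the MQVM algorithm itself.
    """
    z = np.zeros((size, size))
    for _ in range(iters):
        # Average of the 4-neighbourhood is one Jacobi smoothing step.
        z[1:-1, 1:-1] = 0.25 * (z[:-2, 1:-1] + z[2:, 1:-1]
                                + z[1:-1, :-2] + z[1:-1, 2:])
        for r, c, h in constraints:
            z[r, c] = h    # re-impose the feature-point constraints
    return z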


Proceedings ArticleDOI
14 Nov 2005
TL;DR: A new multiple-image view synthesis algorithm that requires only camera parameters and disparity maps, and is scalable as virtual views can be created given 1 to N of the available video inputs, providing a means to gracefully handle scenarios where camera inputs decrease or increase over time.
Abstract: Interactive audio-visual (AV) applications such as free viewpoint video (FVV) aim to enable unrestricted spatio-temporal navigation within multiple camera environments. Current virtual viewpoint view synthesis solutions for FVV are either purely image-based, implying large information redundancy, or involve reconstructing complex 3D models of the scene. In this paper we present a new multiple image view synthesis algorithm that only requires camera parameters and disparity maps. The multi-view synthesis (MVS) approach can be used in any multi-camera environment and is scalable, as virtual views can be created given 1 to N of the available video inputs, providing a means to gracefully handle scenarios where camera inputs decrease or increase over time. The algorithm identifies and selects only the best quality surface areas from the available reference images, thereby reducing perceptual errors in virtual view reconstruction. Experimental results are presented and verified using both objective (PSNR) and subjective comparisons.

12 citations
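
The per-pixel "best surface" selection can be sketched as an argmax over quality scores of the candidate reference contributions, once each reference has been warped into the virtual view. The scoring itself (the paper's surface-quality criterion) is abstracted into a given array here:

import numpy as np

def select_best_surfaces(candidates, scores):
    """Per-pixel pick of the best reference contribution.

    `candidates` is (N, H, W, 3): one warped image per reference view;
    `scores` is (N, H, W): a per-pixel quality score (a simple stand-in
    for the paper's criterion would be sampling density or view angle).
    """
    best = np.argmax(scores, axis=0)                  # (H, W)
    h, w = best.shape
    return candidates[best, np.arange(h)[:, None], np.arange(w)[None, :]]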


Proceedings ArticleDOI
TL;DR: An algorithm for efficient image synthesis to generate realistic virtual views of a dynamic scene from a new camera viewpoint on video-conferencing applications using a combined approach of CPU and GPU processing.
Abstract: This paper presents an algorithm for efficient image synthesis. The main goal is to generate realistic virtual views of a dynamic scene from a new camera viewpoint. The algorithm works online on two or more incoming video streams from calibrated cameras. A reasonably large distance between the cameras is allowed. The main focus is on video-conferencing applications. The background is assumed to be static, as is often the case in such environments. By performing a foreground segmentation, the foreground and the background can be handled separately. For the background a slower, more accurate algorithm can be used. Reaching a high throughput is most crucial for the foreground, as this is the dynamic part of the scene. We use a combined approach of CPU and GPU processing. Performing depth calculations on the GPU is very efficient, thanks to the capabilities of the latest graphics boards; however, the result tends to be rather noisy, so we apply a regularisation algorithm on the CPU to improve it. The final interpolation is again provided by rendering on the graphics board. The big advantage of using both CPU and GPU is that they can run completely in parallel. This is realised with a multi-threaded implementation, so that different algorithms can be applied to two frames simultaneously and the total throughput is increased.

11 citations
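
The CPU/GPU overlap the authors describe is a classic two-stage pipeline: while one unit regularises frame n, the other already computes depth for frame n+1. A minimal sketch of that overlap with threads and queues (the stage bodies are placeholders for the GPU depth computation and the CPU regularisation):

import queue
import threading

def gpu_stage(frames_in, depths_out):
    for frame in iter(frames_in.get, None):     # None signals shutdown
        depths_out.put(("noisy_depth", frame))  # fast but noisy (GPU role)
    depths_out.put(None)

def cpu_stage(depths_in, results):
    for noisy in iter(depths_in.get, None):
        results.put(("regularised", noisy))     # slower clean-up (CPU role)

frames, depths, out = queue.Queue(), queue.Queue(), queue.Queue()
workers = [threading.Thread(target=gpu_stage, args=(frames, depths)),
           threading.Thread(target=cpu_stage, args=(depths, out))]
for w in workers:
    w.start()
for f in ["frame0", "frame1", "frame2"]:
    frames.put(f)
frames.put(None)
for w in workers:
    w.join()
while not out.empty():
    print(out.get())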


Proceedings Article
01 Jan 2005
TL;DR: In this article, a generic framework for novel view synthesis from two uncalibrated reference views is presented that allows a virtual camera to be moved along a path derived from the epipolar geometry of the reference views.
Abstract: This paper presents a method to continuously adjust the parallax in 3D-TV visualization. It is based on a generic framework for novel view synthesis from two uncalibrated reference views that allows a virtual camera to be moved along a path derived from the epipolar geometry of the reference views. The scene is described by its relative affine structure, from which new views are extrapolated and interpolated. The main contribution of this paper is an automatic method for specifying virtual camera locations in an uncalibrated setting. Experiments with synthetic and real images illustrate the approach.

9 citations


Proceedings ArticleDOI
01 Jan 2005
TL;DR: An algorithm for the layered segmentation of video data in multiple views based on computing the parameters of a layered representation of the scene in which each layer is modelled by its motion, appearance and occupancy is presented.
Abstract: We present an algorithm for the layered segmentation of video data in multiple views. The approach is based on computing the parameters of a layered representation of the scene in which each layer is modelled by its motion, appearance and occupancy, where occupancy describes, probabilistically, the layer's spatial extent and not simply its segmentation in a particular view. The problem is formulated as the MAP estimation of all layer parameters conditioned on those at the previous time step; i.e., a sequential estimation problem that is equivalent to tracking multiple objects in a given number of views. Expectation-Maximisation is used to establish layer posterior probabilities for both occupancy and visibility, which are represented distinctly. Evidence from areas in each view which are described poorly under the model is used to propose new layers automatically. Since these potential new layers often occur at the fringes of images, the algorithm is able to segment and track these in a single view until such time as a suitable candidate match is discovered in the other views. The algorithm is shown to be very effective at segmenting and tracking non-rigid objects and can cope with extreme occlusion. We demonstrate an application of this representation to dynamic novel view synthesis.

8 citations


10 May 2005
TL;DR: In this paper, a mesh-based shape reconstruction framework is introduced to initialise and optimise the shape of a dynamic scene for view-dependent rendering, making use of silhouette and stereo data as complementary shape cues.
Abstract: This paper addresses the synthesis of virtual views of people from multiple view image sequences. We consider the target area of the multiple camera “3D Virtual Studio” with the ultimate goal of capturing video-realistic dynamic human appearance. A mesh-based reconstruction framework is introduced to initialise and optimise the shape of a dynamic scene for view-dependent rendering, making use of silhouette and stereo data as complementary shape cues. The technique addresses two key problems: (1) robust shape reconstruction; and (2) accurate image correspondence for view-dependent rendering in the presence of camera calibration error. We present results against ground truth data in synthetic test cases and for captured sequences of people in a studio. The framework demonstrates a higher resolution in rendering compared to shape from silhouette and multiple view stereo.

Proceedings ArticleDOI
17 Nov 2005
TL;DR: An efficient algorithm that is capable of rendering high-quality novel views from the captured images and a view-dependent adaptive capturing scheme that moves the cameras in order to show even better rendering results.
Abstract: This paper presents a self-reconfigurable camera array system that captures and renders 3D virtual scenes interactively. It is composed of an array of 48 cameras mounted on mobile platforms. We propose an efficient algorithm that is capable of rendering high-quality novel views from the captured images. The algorithm reconstructs a view-dependent multiresolution 2D mesh model of the scene geometry on the fly and uses it for rendering. The algorithm combines region of interest (ROI) identification, JPEG image decompression, lens distortion correction, scene geometry reconstruction and novel view synthesis seamlessly on a single Intel Xeon 2.4 GHz processor, which is capable of generating novel views at 4-10 frames per second (fps). In addition, we present a view-dependent adaptive capturing scheme that moves the cameras in order to show even better rendering results. Such camera reconfiguration naturally leads to a nonuniform arrangement of the cameras on the camera plane, which is both view-dependent and scene-dependent.

Proceedings ArticleDOI
14 Nov 2005
TL;DR: An automatic method for specifying the virtual viewpoint in an uncalibrated setting, based on the interpolation and extrapolation of the epipolar geometry linking the reference views is presented.
Abstract: In this paper we present a method for novel view synthesis from two uncalibrated reference views. Snapshots of a scene are created as if they were taken from a different "virtual" viewpoint. The relative affine structure is used to describe the geometry of the scene and then to extrapolate and interpolate novel views. The contribution of this paper is an automatic method for specifying the virtual viewpoint in an uncalibrated setting, based on the interpolation and extrapolation of the epipolar geometry linking the reference views. Experimental results using synthetic and real images are shown.

Proceedings ArticleDOI
13 Jun 2005
TL;DR: This paper presents a novel approach for view synthesis and image interpolation that is built up hierarchically over different structural levels instead of using a classic image pyramid.
Abstract: This paper presents a novel approach for view synthesis and image interpolation. The algorithm is built up in a hierarchical way, over different structural levels instead of a classic image pyramid. First, coarse matching is done on a 'shape basis' only. A background-foreground segmentation yields a fairly accurate contour for every incoming video stream. Inter-relating these contours is a 1D problem and as such very fast. This step is then used to compute small position-dependent bounding-boxes in 3D space which enclose the underlying object. The next step is a more expensive window-based matching within the volume of these bounding-boxes, limited to a number of regions around 'promising' feature points. Global regularisation is obtained by a graph cut; speed results from limiting the number of feature points. In a third step the interpolation is 'pre-rendered' and simultaneously evaluated on a per-pixel basis, by computing a Birchfield dissimilarity measure on the GPU. Per-pixel parallelised operations keep the computational cost low. Finally, badly interpolated parts are 'patched'. This per-pixel correction yields the final interpolated view at the finest level. Here we also deal explicitly with opacity at the borders of the foreground object.
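
The Birchfield-Tomasi measure mentioned here is a standard sampling-insensitive pixel dissimilarity, which is what makes per-pixel evaluation of the pre-rendered interpolation robust to half-pixel misalignments. A minimal scanline version (this is the published measure in its textbook form, independent of this paper's GPU implementation):

import numpy as np

def bt_dissimilarity(left, right, xl, xr):
    """Birchfield-Tomasi sampling-insensitive dissimilarity of two pixels.

    `left` and `right` are 1D intensity scanlines; each pixel is compared
    against the linearly interpolated half-sample neighbourhood of the
    other, which makes the measure robust to image sampling effects.
    """
    def one_sided(a, xa, b, xb):
        lo = 0.5 * (b[xb] + b[max(xb - 1, 0)])
        hi = 0.5 * (b[xb] + b[min(xb + 1, len(b) - 1)])
        bmin, bmax = min(lo, hi, b[xb]), max(lo, hi, b[xb])
        return max(0.0, a[xa] - bmax, bmin - a[xa])

    return min(one_sided(left, xl, right, xr),
               one_sided(right, xr, left, xl))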

Proceedings ArticleDOI
06 Jul 2005
TL;DR: A view synthesis algorithm that provides a scalable and flexible approach to virtual viewpoint synthesis in multiple camera environments and identifies and selects only the best quality surface areas from the set of available reference images, thereby reducing perceptual errors in virtual view reconstruction.
Abstract: One of the main aims of emerging audio-visual (AV) applications is to provide interactive navigation within a captured event or scene. This paper presents a view synthesis algorithm that provides a scalable and flexible approach to virtual viewpoint synthesis in multiple camera environments. The multi-view synthesis (MVS) process consists of four different phases that are described in detail: surface identification, surface selection, surface boundary blending and surface reconstruction. MVS view synthesis identifies and selects only the best quality surface areas from the set of available reference images, thereby reducing perceptual errors in virtual view reconstruction. The approach is camera setup independent and scalable as virtual views can be created given 1 to N of the available video inputs. Thus, MVS provides interactive AV applications with a means to handle scenarios where camera inputs increase or decrease over time.

Journal ArticleDOI
TL;DR: A method for synthesis of views corresponding to translational motion of the camera, which can handle occlusions and changes in visibility in the synthesized views, and characterises the viewpoints for which views can be synthesized.

Proceedings ArticleDOI
14 Nov 2005
TL;DR: This method does not need a pre-computed dense depth map and therefore overcomes the most common problems associated with conventional dense correspondence algorithms, yet still produces very photo-realistic novel images.
Abstract: This paper provides a new transfer-based novel view synthesis method. The method does not need a pre-computed dense depth map and therefore overcomes the most common problems associated with conventional dense correspondence algorithms, yet still produces very photo-realistic novel images. Its power comes from the introduction and use of a novel inverse tensor transfer technique, which offers a simple mechanism to exploit both photometric and geometric constraints across multiple input images. Our method works equally well for calibrated and uncalibrated images. Experiments on real sequences show promising results.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: A novel method based on pre-processing prior to intermediate view interpolation to synthesise an intermediate view as would be viewed by a virtual camera located along the baseline joining the two stereo video cameras is presented.
Abstract: This paper presents a novel method based on pre-processing prior to intermediate view interpolation to synthesise an intermediate view as would be seen by a virtual camera located along the baseline joining the two stereo video cameras. The objective is to achieve virtual eye contact for immersive videoconferencing. An energy minimisation method based on graph cut is used to obtain a disparity map for interpolation of the virtual camera view. A simple linear interpolation method is applied to synthesise the virtual view. Perfect eye contact is achieved for head-and-shoulders stereoscopic video sequences. We evaluate our method using a variety of standard stereoscopic video test sequences.
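
Given the graph-cut disparity map, the linear interpolation step amounts to shifting each pixel by a fraction of its disparity. A minimal sketch (forward warping with nearest-pixel rounding; occlusion handling and hole filling, which a real system needs, are omitted):

import numpy as np

def interpolate_view(left, disparity, alpha=0.5):
    """Linear interpolation of a virtual view on the camera baseline.

    Each left-image pixel at column x is shifted by alpha * d(x); alpha=0
    reproduces the left view, alpha=1 the right. Holes (occlusions) stay
    zero and would need inpainting in a real system.
    """
    h, w = left.shape[:2]
    virtual = np.zeros_like(left)
    for y in range(h):
        for x in range(w):
            xv = int(round(x - alpha * disparity[y, x]))
            if 0 <= xv < w:
                virtual[y, xv] = left[y, x]
    return virtual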

Proceedings ArticleDOI
18 Apr 2005
TL;DR: A region-based dynamic programming approach with improved matching cost, occlusion cost and vertical smoothness constraint, and a fast view interpolation method to achieve real-time performance.
Abstract: In this paper, we propose a new visual communication system where eye contact is made possible by using a virtual image. The virtual image is obtained by view synthesis with stereo matching from two real camera views. We developed a region-based dynamic programming (DP) approach with improved matching cost, occlusion cost and vertical smoothness constraint. We also propose a fast view interpolation method. To achieve real-time performance, we developed a hardware system. Furthermore, to avoid the reordering problem in the foreground region, a view change approach with disorder detection is adopted. Experimental results demonstrate the validity of our improved DP matching algorithm and eye-contact visual communication system.
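
The paper's region-based DP builds on classic scanline stereo dynamic programming with an occlusion cost, sketched below in its textbook form (per-pixel absolute difference as matching cost; the paper's improved costs and vertical smoothness constraint are not included):

import numpy as np

def dp_scanline(left, right, occ=5.0):
    """Classic scanline DP stereo with a fixed occlusion cost.

    `left` and `right` are 1D intensity arrays for one rectified
    scanline; returns a disparity per left pixel (0 where occluded).
    """
    n, m = len(left), len(right)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = occ * np.arange(m + 1)
    C[:, 0] = occ * np.arange(n + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = C[i - 1, j - 1] + abs(float(left[i - 1]) - float(right[j - 1]))
            C[i, j] = min(match, C[i - 1, j] + occ, C[i, j - 1] + occ)
    # Backtrack to recover the matched pairs.
    disp = np.zeros(n, dtype=int)
    i, j = n, m
    while i > 0 and j > 0:
        cand = [C[i - 1, j - 1] + abs(float(left[i - 1]) - float(right[j - 1])),
                C[i - 1, j] + occ, C[i, j - 1] + occ]
        step = int(np.argmin(cand))
        if step == 0:
            disp[i - 1] = i - j      # sign depends on rectification convention
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return disp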

Proceedings ArticleDOI
18 Mar 2005
TL;DR: An active view synthesis approach from silhouettes is introduced, using the turning function distance as the silhouette similarity measurement, which can be used to generate the desired pose-normalized images for recognition applications.
Abstract: We introduce an active view synthesis approach from silhouettes. With the virtual camera moving on a properly selected circular trajectory around an object of interest, we obtain a collection of virtual views of the object, equivalent to the case where the object is on a rotating turntable and captured by a static camera whose optical axis is parallel to the turntable. We show how to derive the virtual camera's extrinsic parameters at each position on the trajectory. Using the turning function distance as the silhouette similarity measure, this approach can be used to generate the desired pose-normalized images for recognition applications.
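
The turning function of a polygon records cumulative turning angle as a function of normalised arc length; two silhouettes are compared by a distance between these functions. A minimal sketch (an L2 distance without the minimisation over rotation and starting point that the full metric includes):

import numpy as np

def turning_function(polygon, samples=128):
    """Cumulative turning angle versus arc length for a closed polygon
    (rows are vertices, in order)."""
    edges = np.roll(polygon, -1, axis=0) - polygon
    angles = np.arctan2(edges[:, 1], edges[:, 0])
    # Unwrap the edge directions into a cumulative turning angle.
    turn = np.cumsum(np.append(angles[0],
                               (np.diff(angles) + np.pi) % (2 * np.pi) - np.pi))
    length = np.cumsum(np.hypot(edges[:, 0], edges[:, 1]))
    s = np.linspace(0, length[-1], samples, endpoint=False)
    return turn[np.searchsorted(length, s, side='right')]

def turning_distance(poly_a, poly_b):
    """L2 distance between turning functions; a full implementation
    would also minimise over rotation and starting point."""
    fa, fb = turning_function(poly_a), turning_function(poly_b)
    return np.sqrt(np.mean((fa - fb) ** 2))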

Proceedings ArticleDOI
07 Jun 2005
TL;DR: A disparity analysis and view synthesis algorithm is described and the effect of compression artifacts on the objective and subjective quality of the synthesised views is investigated using the H.264/AVC video coding standard.
Abstract: In rapidly deployable video surveillance networks, camera positions may be heavily constrained and coverage limited and often sparse. To enhance visualization, automatic depth estimation algorithms can be utilized to enable the synthesis of arbitrary views of the scene, offering a pseudo 'look-around' capability. In a wireless local area network, bandwidth limitations and channel transmission errors may introduce artifacts into the video data to be analysed. This paper describes a disparity analysis and view synthesis algorithm and using the H.264/AVC video coding standard, investigates the effect of compression artifacts on the objective and subjective quality of the synthesised views.

Journal ArticleDOI
TL;DR: This work proposes two solutions in order to reduce annoying flicker at object boundaries of synthesized intermediate sequences using quadtree-based disparity estimation, including adaptive temporal smoothing, using the dissimilarity between the present frame and the previous one to reduce the error of disparity estimation.
Abstract: In stereoscopic or multiview 3-D display systems, the synthesis of intermediate sequences is essential to ensure look-around capability and continuous motion parallax, thereby enhancing comfortable 3-D perception. Quadtree-based disparity estimation is one of the most popular methods for the synthesis of intermediate sequences, due to the simplicity of its algorithm and hardware implementation. We propose two solutions to reduce annoying flicker at object boundaries of intermediate sequences synthesized using quadtree-based disparity estimation. The first is a new splitting scheme that provides more consistent quadtree splitting during disparity estimation. The second is adaptive temporal smoothing, which uses the dissimilarity between the present frame and the previous one to reduce the error of disparity estimation. These two proposals are tested on several stereoscopic sequences, and the results show that they markedly reduce flicker.
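
The adaptive temporal smoothing proposal can be sketched as a per-pixel blend between the current and previous disparity, gated by how much the image itself changed. The weighting function below is an illustrative choice, not the paper's exact one:

import numpy as np

def smooth_disparity(d_curr, d_prev, frame_curr, frame_prev, sigma=10.0):
    """Adaptive temporal smoothing of a disparity map.

    Blend towards the previous disparity where the image changed little
    (static regions, where flicker is most visible) and keep the new
    estimate where it changed a lot (genuine motion).
    """
    change = np.abs(frame_curr.astype(float) - frame_prev.astype(float))
    w = np.exp(-change / sigma)        # w ~ 1 on static regions
    return w * d_prev + (1.0 - w) * d_curr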

Proceedings ArticleDOI
08 Sep 2005
TL;DR: A new algorithm for the coding of depth maps that uses piecewise linear functions to approximate the depth information is proposed and it provides a high compression factor with bit-rates as low as 0.05 bit/pixel.
Abstract: An efficient way to transmit multi-view images is to send the texture image together with a corresponding depth-map. The depth-map specifies the distance between each pixel and the camera. With this information, arbitrary views can be generated at the decoder. This technique requires a compression technique for the depth-maps. Ordinary image compression algorithms like JPEG provide low quality, since the ringing artifacts along edges generate clouds of pixels in 3D space. For this reason, we propose a new algorithm for the coding of depth maps that uses piecewise linear functions to approximate the depth information. This algorithm shows no degradation along discontinuities and it provides a high compression factor with bit-rates as low as 0.05 bit/pixel.
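
The piecewise-linear idea can be illustrated per scanline: grow each segment while a straight line through its endpoints stays within tolerance, and emit a breakpoint at each depth discontinuity. A greedy sketch (the paper's coder optimises segment placement and rate more carefully):

import numpy as np

def piecewise_linear_row(depth_row, tol=2.0):
    """Greedy piecewise-linear approximation of one depth-map scanline.

    Extends each segment while the line through its endpoints stays
    within `tol` of the data; returns breakpoint (x, depth) pairs.
    """
    xs = [0]
    start = 0
    for end in range(2, len(depth_row)):
        seg = depth_row[start:end + 1]
        line = np.linspace(seg[0], seg[-1], len(seg))
        if np.max(np.abs(seg - line)) > tol:
            xs.append(end - 1)         # close the segment before the jump
            start = end - 1
    xs.append(len(depth_row) - 1)
    return [(x, float(depth_row[x])) for x in xs]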

Book ChapterDOI
21 Oct 2005
TL;DR: This paper presents a new interactive teleconferencing system which adds a 'virtual' camera to the scene that can move freely between multiple real cameras, producing a clearer and more engaging view for the remote audience without the need for a human editor.
Abstract: This paper presents a new interactive teleconferencing system. It adds a 'virtual' camera to the scene which can move freely between multiple real cameras. The viewpoint can be selected automatically using basic cinematographic rules, based on the position and the actions of the instructor. This produces a clearer and more engaging view for the remote audience, without the need for a human editor. For the creation of the novel views generated by such a 'virtual' camera, segmentation and depth calculations are required. The system is semi-automatic, in that the user is asked to indicate a few corresponding points or edges for generating an initial rough background model. In addition to the static background and moving foreground, multiple independently moving objects are catered for. The initial foreground contour is tracked over time using a new active contour. If a second object appears, the contour prediction makes it possible to recognize this situation and take appropriate measures. The 3D models are continuously validated based on a Birchfield dissimilarity measure. The foreground model is updated every frame; the background is refined if necessary. The current implementation reaches approximately 4 fps on a single desktop.

Proceedings ArticleDOI
05 Dec 2005
TL;DR: This paper addresses the problem of IBR on images taken by a mobile robot by proposing a novel IBR method, where the location uncertainty is reduced using a visual landmark, which is commonly used in mobile robotics.
Abstract: Image-based rendering (IBR) is one of the standard approaches to view synthesis. IBR is very important for human-robot interface (HRI) systems such as teleoperation, since the synthesized views help a human user understand the robot's operating environment. In this paper, we address the problem of IBR on images taken by a mobile robot. The main difficulty of our problem is that the viewpoint locations of the input (real) images are not precisely known, due to estimation errors inherent in the positioning systems. To solve this problem, we propose a novel IBR method, where the location uncertainty is reduced using a visual landmark, as is commonly done in mobile robotics. In addition, novel priors on real images are introduced to regularize the IBR problem. As a result, IBR can be performed successfully under the location uncertainty.

Proceedings ArticleDOI
14 Mar 2005
TL;DR: The proposed system tracks the user’s pointer in real-time and solves point correspondences across all the cameras that form spatio-temporal “traces” that serve as a medium for sketching in a true 3-D space.
Abstract: This paper proposes a real-time 3D user interface using multiple possibly uncalibrated cameras. It tracks the user’s pointer in real-time and solves point correspondences across all the cameras. These correspondences form spatio-temporal “traces” that serve as a medium for sketching in a true 3-D space. Alternatively, they may be interpreted as gestures or control information to elicit some particular action(s). Through view synthesis techniques, the system enables the user to change and seemingly manipulate the viewpoint of the virtual scene even in the absence of camera calibration. It also serves as a flexible, intuitive, and portable mixed-reality display system. The proposed system has numerous implications in interaction and design, especially as a general interface for creating and manipulating various forms of 3-D media.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: This paper proposes a novel IBR method, where the location uncertainty is reduced using a visual landmark, as is commonly done in mobile robotics, so that IBR can be performed successfully under the location uncertainty.
Abstract: In this paper, we address the problem of image-based rendering (IBR) on images taken by a mobile robot, called a robot image database. IBR is a standard approach to view synthesis that is very important for human-robot interface systems such as teleoperation, since synthesized views help a human user understand the robot's operating environment. The main difficulty of our problem is that the viewpoint locations of the input (real) images are not precisely known, due to estimation errors inherent in the positioning systems. To solve this problem, we propose a novel IBR method, where the location uncertainty is reduced using a visual landmark, as is commonly done in mobile robotics. Also, novel priors on real images are introduced to regularize the IBR problem. As a result, IBR can be performed successfully under the location uncertainty.