
Showing papers on "View synthesis" published in 2003


Journal ArticleDOI
TL;DR: The heart of the method is to use programmable Pixel Shader technology to square intensity differences between reference image pixels, and then to choose final colors that correspond to the minimum difference, i.e. the most consistent color.
Abstract: We present a novel use of commodity graphics hardware that effectively combines a plane-sweeping algorithm with view synthesis for real-time, online 3D scene acquisition and view synthesis. Using real-time imagery from a few calibrated cameras, our method can generate new images from nearby viewpoints, estimate a dense depth map from the current viewpoint, or create a textured triangular mesh. We can do each of these without any prior geometric information or requiring any user interaction, in real time and online. The heart of our method is to use programmable Pixel Shader technology to square intensity differences between reference image pixels, and then to choose final colors (or depths) that correspond to the minimum difference, i.e. the most consistent color. In this paper we describe the method, place it in the context of related work in computer graphics and computer vision, and present some results. ACM CCS: I.3.3 Computer Graphics—Bitmap and framebuffer operations; I.4.8 Image Processing and Computer Vision—Depth cues, Stereo

92 citations
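
The score-and-select loop described in this abstract can be sketched on the CPU. The following is a minimal NumPy illustration, not the paper's shader code: it assumes a rectified two-camera setup where each depth plane reduces to a horizontal shift (a disparity hypothesis), and all function and variable names are mine.

```python
import numpy as np

def plane_sweep_min_diff(left, right, max_disp):
    """Minimal CPU sketch of the score-and-select loop for a rectified
    pair of color images (h, w, 3): each 'plane' is a disparity
    hypothesis, the score is the squared intensity difference, and the
    color/depth with the minimum difference wins per pixel."""
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    h, w = left.shape[:2]
    best_cost = np.full((h, w), np.inf)
    best_color = np.zeros((h, w, 3))
    best_disp = np.zeros((h, w), dtype=np.int32)
    for d in range(max_disp + 1):
        # Warp = horizontal shift here (border wrap-around ignored for brevity)
        shifted = np.roll(right, d, axis=1)
        cost = ((left - shifted) ** 2).sum(axis=-1)  # squared difference
        better = cost < best_cost
        best_cost[better] = cost[better]
        best_color[better] = 0.5 * (left + shifted)[better]  # most consistent color
        best_disp[better] = d
    return best_color, best_disp  # novel-view color and depth estimate
```

In the paper, the same per-plane loop runs in pixel shaders, with a projective warp in place of the simple shift used here.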


Proceedings ArticleDOI
09 Jun 2003
TL;DR: This paper presents multi-state statistical decision models with Kalman-filtering-based tracking for head pose detection and face orientation estimation, within a system that allows simultaneous capture of the driver's head pose, driving view, and surroundings of the vehicle.
Abstract: Our research is focused on the development of novel machine-vision-based telematic systems, which provide non-intrusive probing of the state of the driver and driving conditions. In this paper we present a system which allows simultaneous capture of the driver's head pose, driving view, and surroundings of the vehicle. The integrated machine vision system utilizes a video stream with a full 360 degree panoramic field of view. The processing modules include perspective transformation, feature extraction, head detection, head pose estimation, driving view synthesis, and motion segmentation. The paper presents multi-state statistical decision models with Kalman-filtering-based tracking for head pose detection and face orientation estimation. The basic feasibility and robustness of the approach are demonstrated with a series of systematic experimental studies.

89 citations
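
The Kalman-filtering-based tracking mentioned above can be illustrated with a generic constant-velocity filter over a single pose angle. This is a minimal sketch, not the paper's multi-state model: the state layout, frame rate, and noise magnitudes are all assumptions.

```python
import numpy as np

# Minimal constant-velocity Kalman filter for one head-pose angle (e.g. yaw).
# State x = [angle, angular_velocity]; all noise magnitudes are illustrative.
dt = 1.0 / 30.0                        # assumed frame rate
F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
H = np.array([[1.0, 0.0]])             # we only measure the angle
Q = 1e-3 * np.eye(2)                   # process noise (assumption)
R = np.array([[1e-2]])                 # measurement noise (assumption)

def kalman_step(x, P, z):
    """One predict/update cycle given a detected angle measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Per-frame head-pose detections would be fed in as `z`, with the filtered state smoothing over noisy or missed detections.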


01 Jan 2003
TL;DR: In this article, a novel-view synthesis algorithm for teleconferencing is proposed, based on an improved dynamic-programming stereo algorithm for efficient novel-view generation.
Abstract: A new algorithm is proposed for novel-view synthesis, with particular application to teleconferencing. Given the video streams acquired by two cameras placed on either side of a computer monitor, the proposed algorithm synthesises images from a virtual camera in an arbitrary position (typically located within the monitor area) to facilitate eye contact. The new technique is based on an improved dynamic-programming stereo algorithm for efficient novel-view generation. The two main contributions of this paper are: i) a new four-layer matching graph for dense-stereo dynamic programming that supports accurate occlusion labeling; ii) a compact geometric derivation for novel-view synthesis by direct projection of the minimum-cost surface. Furthermore, the paper presents an algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker), and a cost aggregation algorithm that acts directly in three-dimensional matching cost space. The proposed algorithm has been designed to work with input images with a large disparity range, a common situation in one-to-one video-conferencing. The enhanced occlusion-handling capabilities of the new DP algorithm are evaluated against those of the most powerful state-of-the-art dynamic-programming and graph-cut techniques. A number of examples demonstrate the robustness of the algorithm to artefacts in stereo video streams. This includes demonstrations of cyclopean view synthesis in extended conversational sequences, synthesis from a freely translating virtual camera and, finally, basic 3D scene editing.

45 citations
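
The dense-stereo dynamic programming underlying the synthesis step can be illustrated with the classic single-scanline formulation; the paper's four-layer matching graph is richer than this. A minimal sketch with an assumed constant occlusion cost:

```python
import numpy as np

def dp_scanline_stereo(left_row, right_row, occ_cost=10.0):
    """Classic scanline dynamic programming: align two epipolar rows,
    allowing match / left-occlusion / right-occlusion moves.
    Returns a disparity per left pixel (-1 where occluded)."""
    n, m = len(left_row), len(right_row)
    C = np.full((n + 1, m + 1), np.inf)
    C[0, :] = occ_cost * np.arange(m + 1)
    C[:, 0] = occ_cost * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=np.int8)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = C[i-1, j-1] + (float(left_row[i-1]) - float(right_row[j-1])) ** 2
            occ_l = C[i-1, j] + occ_cost   # left pixel occluded
            occ_r = C[i, j-1] + occ_cost   # right pixel occluded
            C[i, j] = min(match, occ_l, occ_r)
            back[i, j] = int(np.argmin([match, occ_l, occ_r]))
    # Backtrack the minimum-cost path to recover disparities
    disp = -np.ones(n, dtype=np.int32)
    i, j = n, m
    while i > 0 and j > 0:
        if back[i, j] == 0:
            disp[i-1] = (i-1) - (j-1)
            i, j = i - 1, j - 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return disp
```

The paper's "direct projection of the minimum-cost surface" then renders the virtual view from this path rather than from an explicit depth map.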


01 Jan 2003
TL;DR: An all-in-focus view is generated from the set of differently focused views, based on a new focus measurement algorithm specialized for light field rendering and plenoptic sampling theory.
Abstract: Light field rendering (LFR) is a fundamental method for generating new views from a set of pre-acquired images. We use densely-aligned cameras for the process of acquiring the set of images. In most practical cases, the density of the aligned cameras is not high enough to synthesize appropriate views. This “under-sampling” condition causes focus-like effects in the synthesized views. This paper proposes a new method for solving this problem. First, a set of differently focused views is synthesized from the undersampled set of pre-acquired images. Then, an all-in-focus view is generated from the set of differently focused views. This is based on a new focus measurement algorithm specialized for light field rendering and plenoptic sampling theory. Experimental results show the effectiveness of our approach.

40 citations
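
The fusion step, picking the best-focused slice per pixel from the set of differently focused views, can be sketched as follows. The Laplacian-energy focus measure here is a generic stand-in for the paper's LFR-specific measure, and the window size is an arbitrary assumption.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def all_in_focus(stack, window=9):
    """Fuse a focal stack of shape (k, h, w) into one all-in-focus image
    by picking, per pixel, the slice with the highest local Laplacian
    energy (a generic focus measure)."""
    stack = np.asarray(stack, dtype=np.float64)
    sharpness = np.stack([uniform_filter(laplace(s) ** 2, size=window)
                          for s in stack])
    best = np.argmax(sharpness, axis=0)   # index of sharpest slice per pixel
    return np.take_along_axis(stack, best[None], axis=0)[0]
```

Here `stack` would hold the differently focused views synthesized from the undersampled light field, one per assumed focal plane.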


01 Jan 2003
TL;DR: The paper presents multi-state statistical decision models with Kalman-filtering-based tracking for head pose detection and face orientation estimation, supporting simultaneous capture of the driver's head pose and driving view.
Abstract: Driver distraction is an important issue in developing a new generation of telematic systems. Our research is focused on the development of novel machine vision systems which can provide a better understanding of the state of the driver and driving conditions. In this paper we discuss in detail the development of a system which allows simultaneous capture of the driver's head pose and driving view. The system utilizes a full 360 degree panoramic field of view from a single video stream. The integrated machine vision system includes modules for perspective transformation, feature extraction, head detection, head pose estimation, and driving view synthesis. The paper presents multi-state statistical decision models with Kalman-filtering-based tracking for head pose detection and face orientation estimation. The basic feasibility and robustness of the approach are demonstrated through a series of systematic experimental studies.

27 citations


Proceedings ArticleDOI
17 Sep 2003
TL;DR: An automatic method for specifying the virtual viewpoint based on the replication of the epipolar geometry linking two reference views is introduced and a method for generating synthetic views of a soccer ground starting from a single uncalibrated image is presented.
Abstract: This work deals with the view synthesis problem, i.e., how to generate snapshots of a scene taken from a "virtual" viewpoint different from all the viewpoints of the real views. Starting from uncalibrated reference images, the geometry of the scene is recovered by means of the relative affine structure. This information is used to extrapolate novel views using planar warping plus parallax correction. The contributions of this paper are twofold. First, we introduce an automatic method for specifying the virtual viewpoint based on the replication of the epipolar geometry linking two reference views. Second, we present a method for generating synthetic views of a soccer ground starting from a single uncalibrated image. Experimental results using real images are shown.

23 citations
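
The "planar warping plus parallax correction" transfer can be written compactly with the standard relative-affine-structure formulation: a pixel is warped by a reference-plane homography, then shifted along the epipole in proportion to its off-plane structure. A hedged sketch, where `H`, `e`, and `gamma` are assumed to come from the uncalibrated reconstruction stage described above:

```python
import numpy as np

def plane_plus_parallax_transfer(x, H, e, gamma):
    """Transfer pixel x = (u, v) from a reference view into the virtual
    view: warp by the plane homography H, then correct by parallax gamma
    along the epipole e, all in homogeneous coordinates. H (3x3), e (3,)
    and the relative affine structure gamma are assumed inputs."""
    xh = np.array([x[0], x[1], 1.0])
    xp = H @ xh + gamma * e        # planar warp + parallax correction
    return xp[:2] / xp[2]          # back to inhomogeneous pixel coordinates
```

Points on the reference plane have gamma = 0 and are carried by the homography alone; off-plane points pick up the parallax term.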


Proceedings ArticleDOI
01 Jan 2003
TL;DR: A persistent representation of occupancy is maintained in spite of occlusion without enforcing a particular parametric shape model, and a MAP solution for estimating layer parameters which are consistent across views is formulated.
Abstract: We propose a multiple view layered representation for tracking and segmentation of multiple objects in a scene. Existing layered approaches are dominated by the single view case and generally exploit only motion cues. We extend this to integrate static, dynamic and structural cues over a pair of views. The goal is to update coherent correspondence information sequentially, producing a multi-object tracker as a natural byproduct. We formulate a MAP solution for estimating layer parameters which are consistent across views, with the EM algorithm used to determine both the hidden segmentation labelling and motion parameters. A persistent representation of occupancy is maintained in spite of occlusion without enforcing a particular parametric shape model. An immediate application is dynamic novel view synthesis, for which our layered approach offers a direct and convenient representation.

23 citations
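
The EM alternation used above, soft assignment of pixels to layers followed by parameter re-estimation, can be illustrated with a toy two-component Gaussian mixture over per-pixel residuals. This keeps only the mixture machinery; the paper's layers additionally carry motion and structure parameters, and every name below is mine.

```python
import numpy as np

def em_two_layers(residuals, iters=50):
    """Toy EM in the spirit of the layer estimation described above:
    model per-pixel residuals as a 2-component Gaussian mixture (one
    component per layer) and alternate soft assignment (E-step) with
    parameter re-estimation (M-step)."""
    r = np.asarray(residuals, dtype=np.float64).ravel()
    mu = np.array([r.min(), r.max()])          # crude initialisation
    var = np.full(2, r.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: posterior responsibility of each layer for each pixel
        lik = pi / np.sqrt(2 * np.pi * var) * \
            np.exp(-(r[:, None] - mu) ** 2 / (2 * var))
        resp = lik / (lik.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: re-estimate mixing weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / len(r)
        mu = (resp * r[:, None]).sum(axis=0) / nk
        var = (resp * (r[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return resp.reshape(np.shape(residuals) + (2,))
```

The returned responsibilities play the role of the hidden segmentation labelling; in the paper the M-step also refits per-layer motion consistently across both views.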


Book ChapterDOI
01 Jan 2003
TL;DR: This work observes that the automatic recovery of camera motion and scene structure from video sequences has been a staple of computer vision research for over a decade and now represents one of the success stories of computer vision.
Abstract: The goal of automatic recovery of camera motion and scene structure from video sequences has been a staple of computer vision research for over a decade. As an area of endeavour, it has seen both steady and explosive progress over time, and now represents one of the success stories of computer vision. This task, automatic camera tracking or “matchmoving”, is the sine qua non of modern special effects, allowing the seamless insertion of computer generated objects onto live-action backgrounds (figure 2.1 shows an example). It has moved from a research problem for a small number of uncalibrated images to commercial software which can automatically track cameras through thousands of frames [1]. In addition, camera tracking is an important preprocess for many computer vision algorithms such as multiple-view shape reconstruction, novel view synthesis and autonomous vehicle navigation.

23 citations


Journal ArticleDOI
TL;DR: This work presents an efficient image-based rendering algorithm that generates views of a scene's photo hull, taking advantage of epipolar geometry to efficiently reconstruct the geometry and visibility of a scene.
Abstract: We present an efficient image-based rendering algorithm that generates views of a scene's photo hull. The photo hull is the largest 3D shape that is photo-consistent with photographs taken of the scene from multiple viewpoints. Our algorithm, image-based photo hulls (IBPH), like the image-based visual hulls (IBVH) algorithm from Matusik et al. on which it is based, takes advantage of epipolar geometry to efficiently reconstruct the geometry and visibility of a scene. Our IBPH algorithm differs from IBVH in that it utilizes the color information of the images to identify scene geometry. These additional color constraints result in more accurately reconstructed geometry, which often projects to better synthesized virtual views of the scene. We demonstrate our algorithm running in a real-time 3D telepresence application using video data acquired from multiple viewpoints.

19 citations
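
The color constraint that distinguishes IBPH from IBVH is a photo-consistency test: a candidate surface point survives only if the colors it projects to in the views that see it agree. A crude sketch, with a per-channel standard-deviation test standing in for the paper's criterion and an arbitrary threshold:

```python
import numpy as np

def photo_consistent(samples, threshold=30.0):
    """Crude photo-consistency test over the RGB colors a candidate
    surface point projects to in each visible input view (n_views, 3).
    Agreement keeps the point on the candidate photo hull; disagreement
    means it should be carved away."""
    samples = np.asarray(samples, dtype=np.float64)
    return bool(np.all(samples.std(axis=0) < threshold))
```

Points failing the test are carved, shrinking the visual hull toward the photo hull.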


Proceedings ArticleDOI
06 Jul 2003
TL;DR: A method is presented that combines image-based visual hulls with human body part segmentation to overcome the inability of the visual hull method to reconstruct concave regions, improving the reconstruction of human postures and texture mapping.
Abstract: In this paper, we present a method that combines the image-based visual hull with human body part segmentation to overcome the inability of the visual hull method to reconstruct concave regions. The virtual silhouette image corresponding to the given viewing direction is first produced with the image-based visual hull. A human body part localization technique is used to segment the input images and the rendered virtual silhouette image into convex body parts. The body parts in the virtual view are generated separately from the corresponding body parts in the input views and then assembled together. The previously rendered silhouette image is used to locate the corresponding body parts in the input views and to avoid unconnected or squeezed regions in the assembled final view. Experiments show that this method can improve the reconstruction of concave regions for human postures and texture mapping.

15 citations
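
The baseline this paper improves upon is the silhouette-only visual hull, which keeps a 3D point only if it projects inside every silhouette and which, by construction, cannot recover concavities. A minimal voxel-carving sketch (the data layout and names are assumptions; the paper itself works image-based rather than volumetrically):

```python
import numpy as np

def voxel_visual_hull(silhouettes, projections, grid_points):
    """Baseline visual hull by silhouette carving: keep a 3D point only
    if it projects inside every binary silhouette. `projections` are
    3x4 camera matrices; `grid_points` is an (n, 3) array of candidate
    points."""
    keep = np.ones(len(grid_points), dtype=bool)
    pts_h = np.hstack([grid_points, np.ones((len(grid_points), 1))])
    for sil, P in zip(silhouettes, projections):
        uvw = pts_h @ P.T
        u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
        v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
        inside = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
        ok = np.zeros(len(grid_points), dtype=bool)
        ok[inside] = sil[v[inside], u[inside]] > 0
        keep &= ok            # carve points outside any silhouette
    return grid_points[keep]
```

Because carving only intersects silhouette cones, concave regions (e.g. between an arm and the torso) survive in the hull; the paper's per-body-part processing is what recovers them.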


Proceedings ArticleDOI
24 Nov 2003
TL;DR: This paper details a new modular approach to virtual view creation that is designed to work in conjunction with a proposed scalable teleconferencing system built from building blocks defined as stereo camera analysis blocks (SCABs).
Abstract: This paper details a new modular approach to virtual view creation that is designed to work in conjunction with a proposed scalable teleconferencing configuration. This scalable system is configured using building blocks defined as stereo camera analysis blocks (SCABs). SCABs consist of axis-parallel, narrow-baseline stereo cameras. These provide design flexibility and improve the image analysis process. Virtual view creation is modular in that we can add or remove SCABs based on our particular requirements without having to modify the view synthesis algorithm. A new approach to virtual view creation from multiple images is defined. This contains two separate processes: surface segmentation, which identifies surfaces, and surface selection and merging, which selects and integrates the best view of a required surface into the virtual view. The surfaces are identified via a sampling density function, while the surface selection procedure is based on a weighting scheme.

Journal ArticleDOI
TL;DR: This paper aims to show that needle-maps recovered by shape-from-shading can be used to generate novel object views under changing light source and viewer directions, and investigates the use of shape-from-shading for coarse view synthesis.

Proceedings ArticleDOI
18 Nov 2003
TL;DR: Experimental results show that the proposed real-time method of estimating depth data corresponding to each element image on an IP image is very useful for improving the quality of the free-viewpoint image synthesis.
Abstract: In the field of 3-D imaging technology, Integral Photography (IP) is one of the promising approaches, and a combination of an HDTV camera and an optical fiber array has been investigated to display 3-D live video sequences. The authors have applied this system to a computer graphics method for synthesizing arbitrary views from IP images: a method of interactively displaying free-viewpoint images without a physical lens array. This paper proposes a real-time method of estimating the depth data corresponding to each element image on an IP image. Experimental results show that the proposed method is very useful for improving the quality of free-viewpoint image synthesis.

01 Jan 2003
TL;DR: This dissertation introduces a fully automatic, physically-based framework for view synthesis that is called View-dependent Pixel Coloring (VDPC), which uses a hybrid approach that estimates the most likely color for every picture element of an image from the desired view, while simultaneously estimating a view-dependent 3D model of the scene.
Abstract: The basic goal of traditional computer graphics is to generate 2D images of a synthetic scene represented by a 3D analytical model. When it comes to real scenes, however, one usually does not have a 3D model. If, however, one has access to 2D images of the scene gathered from a few cameras, one can use view synthesis techniques to generate 2D images from various viewing angles between and around the cameras. In this dissertation I introduce a fully automatic, physically-based framework for view synthesis that I call View-dependent Pixel Coloring (VDPC). VDPC uses a hybrid approach that estimates the most likely color for every picture element of an image from the desired view, while simultaneously estimating a view-dependent 3D model of the scene. By taking into account a variety of factors, including object occlusions, surface geometry and materials, and lighting, VDPC has produced superior results under some very challenging conditions, in particular in the presence of textureless regions and specular highlights, conditions that cause conventional approaches to fail. In addition, VDPC can be implemented on commodity graphics hardware under certain simplifying assumptions. The basic idea is to use texture-mapping functions to warp the input images to the desired viewpoint, and to use programmable pixel rendering functions to decide the most consistent color for each pixel in the output image. By exploiting the fast speed and tremendous amount of parallelism inherent in today's graphics boards, one can achieve real-time, online view synthesis of a dynamic scene.

Book ChapterDOI
04 Jun 2003
TL;DR: This paper presents a method for obtaining accurate 3D models by merging carving and stereo-matching algorithms, and shows the resulting improvements in the accuracy of the model.
Abstract: In this paper, we present a method for obtaining accurate 3D models by merging carving and stereo-matching algorithms. Multiple views of an object are taken from known camera poses. The object images, once segmented, are used to carve a rough 3D model of the object. View synthesis results are compared with real object views in order to validate the recovered model. When errors are detected, commonly due to occlusions and/or concavities, a fine stereo-matching algorithm is applied. The obtained depth map then updates the inconsistent areas of the object model. Tests show the resulting improvements in the accuracy of the model.
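
The validation step, comparing a synthesized view of the carved model against the corresponding real photograph, can be sketched as a simple per-pixel error mask; the flagged regions would then be handed to the fine stereo-matching stage. The tolerance is an arbitrary assumption:

```python
import numpy as np

def model_error_mask(synth_view, real_view, tol=20.0):
    """Flag pixels where the synthesized view of the carved model
    disagrees with the real photograph beyond a tolerance; these are
    the inconsistent areas to refine with fine stereo matching."""
    err = np.abs(synth_view.astype(np.float64) - real_view.astype(np.float64))
    return err.max(axis=-1) > tol   # boolean mask of inconsistent pixels
```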

01 Jan 2003
TL;DR: A mesh based reconstruction framework is introduced to initialise and optimise the shape of a dynamic scene for view-dependent rendering, making use of silhouette and stereo data as complementary shape cues.
Abstract: This paper addresses the synthesis of virtual views of people from multiple view image sequences. We consider the target area of the multiple camera “3D Virtual Studio” with the ultimate goal of capturing video-realistic dynamic human appearance. A mesh based reconstruction framework is introduced to initialise and optimise the shape of a dynamic scene for view-dependent rendering, making use of silhouette and stereo data as complementary shape cues. The technique addresses two key problems: (1) robust shape reconstruction; and (2) accurate image correspondence for view dependent rendering in the presence of camera calibration error. We present results against ground truth data in synthetic test cases and for captured sequences of people in a studio. The framework demonstrates a higher resolution in rendering compared to shape from silhouette and multiple view stereo.

01 Jan 2003
TL;DR: An automatic method for specifying virtual camera locations in an uncalibrated setting, allowing a virtual camera to be moved along a curve derived from the epipolar geometry of the reference views.
Abstract: This paper presents a generic framework for novel view synthesis from two uncalibrated reference views that allows a virtual camera to be moved along a curve derived from the epipolar geometry of the reference views. The scene is described by its relative affine structure, from which novel views are extrapolated and interpolated. The main contribution of this paper is an automatic method for specifying virtual camera locations in an uncalibrated setting. Experiments with synthetic and real images illustrate the approach.

01 Jan 2003
TL;DR: An advanced framework for invisible sequential view reconstruction, which includes discrete view database creation, single-view input and model matching, view space projection, and view morphing, is presented; it can generate realistic face appearances and invisible views with limited prior information while bypassing the building of 3D models.
Abstract: The reconstruction of invisible views of a human head is a statistical technique for predicting unknown information. This paper investigates sequential novel view synthesis and 3D model computation from a single facial image. An advanced framework for invisible sequential view reconstruction is presented, which includes discrete view database creation, single-view input and model matching, view space projection, and view morphing. The techniques can generate realistic face appearances and invisible views with limited prior information, bypassing the building of 3D models. Practical experience in creating the AIAR face database under limited shooting angles and conditions is also presented, to help implement the view reconstruction techniques.

Book Chapter
01 Jan 2003
TL;DR: An algorithm for rendering a novel view of an object from two uncalibrated stereo images using a sparse set of features and utilizing singular value decomposition (SVD) for correspondence matching is presented.
Abstract: View synthesis has been an active area of research in the computer vision and computer graphics community due to its fast and less complicated rendering of novel views compared to the conventional 3-D reconstruction-projection procedure. In this paper, we present an algorithm for rendering a novel view of an object from two uncalibrated stereo images using a sparse set of features and utilizing singular value decomposition (SVD) for correspondence matching. This algorithm has the advantage that no knowledge of the intrinsic and extrinsic camera parameters is required. The weak calibration represented by the epipolar geometry is sufficient for the generation of novel views. We also present results of the algorithm on real and synthetic images.
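
One standard way to realize "SVD for correspondence matching" is the Scott and Longuet-Higgins pairing scheme: build a Gaussian proximity matrix between the two feature sets, flatten its singular values to one, and accept pairs that dominate both their row and their column. A sketch under the assumption that this is the variant intended (`sigma` is illustrative):

```python
import numpy as np

def svd_feature_matching(feats_a, feats_b, sigma=10.0):
    """SVD-based pairing in the Scott/Longuet-Higgins style between two
    feature point sets of shape (n_a, 2) and (n_b, 2)."""
    d2 = ((feats_a[:, None, :] - feats_b[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian proximity matrix
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    P = U @ Vt                                    # singular values set to 1
    matches = []
    for i in range(P.shape[0]):
        j = int(np.argmax(P[i]))
        if i == int(np.argmax(P[:, j])):          # mutual row/column maximum
            matches.append((i, j))
    return matches
```

The accepted sparse matches would then feed the epipolar geometry estimation that supports the weak calibration described above.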

Book ChapterDOI
01 Jan 2003
TL;DR: From a finite set of pictures of a real object, taken from different points of view, this work is able to generate a projection of the object from any point of view, to be inserted at the physically coherent location.
Abstract: Our research group, interested in outdoor scenes, has developed a methodology to register a CAD model of a city with images taken by video cameras installed in a car while driving city streets. We can thus merge captured video data and rendered elements of the model into one image, forcing the real and virtual cameras' points of view, orientations, and parameters to coincide. Consequently, we generate real-video sequences and can insert, in real time, virtual objects congruently placed in relation to real objects of the scene. With the aim of increasing the realism of the inserted objects, we propose an alternate method that uses view synthesis techniques instead of rendering techniques. From a finite set of pictures of a real object, taken from different points of view, we are able to generate a projection of the object from any point of view, to be inserted at the physically coherent location.

Book ChapterDOI
TL;DR: The level of realism of the virtual view is dependent on the camera set-up and the quality of the image analysis and synthesis processes, and a unique scalable and modular system solution is introduced.
Abstract: Image-based rendering systems are designed to render a virtual view of a scene based on a set of images and correspondences between these images. This approach is attractive as it does not require explicit scene reconstruction. In this paper we identify that the level of realism of the virtual view is dependent on the camera set-up and the quality of the image analysis and synthesis processes. We explain how wide-baseline convergent camera set-ups and virtual-view-independent approaches to surface selection have led to the development of very system-specific solutions. We then introduce a unique scalable and modular system solution. This scalable system is configured using building blocks defined as stereo camera analysis blocks (SCABs). These provide design flexibility and improve the image analysis process. Virtual view creation is modular in that we can add or remove SCABs based on our particular requirements without having to modify the view synthesis algorithm.

Proceedings ArticleDOI
Jong-Il Park, Sang Hyo Han, Um Gi Mun, Chung Hyun Ahn, Soo In Lee
16 Jun 2003
TL;DR: This work attempts to control the baseline-stretch of a stereoscopic camera by synthesizing virtual views at the desired location between the two cameras, obtaining a dense disparity map using hierarchical stereo matching with an edge-adaptive shifted window.
Abstract: In stereoscopic television, there is a trade-off between visual comfort and 3D impact with respect to the baseline-stretch of the 3D camera. It has been reported that an optimal condition can be reached when the baseline-stretch is set at about the distance between human pupils. However, such a distance cannot be achieved when the lens and CCD module are large. In order to overcome this limitation, we attempt to control the baseline-stretch of a stereoscopic camera by synthesizing virtual views at the desired location in the interval between the two cameras. The proposed technique is based on stereo matching and view synthesis techniques. We first obtain a dense disparity map using hierarchical stereo matching with an edge-adaptive shifted window. We then synthesize virtual views using the disparity map. Simulation results with various stereoscopic images demonstrate the effectiveness of the proposed technique.
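
Given the dense disparity map, the in-between virtual view can be sketched by forward-warping one image by a fraction of the disparity: alpha = 0 reproduces the left camera and alpha = 1 (under this sign convention, an assumption) the right. Hole filling from the other view, which a real system needs, is omitted:

```python
import numpy as np

def synthesize_intermediate(left, disparity, alpha):
    """Forward-warp the left image by alpha * disparity to place a
    virtual camera between the two real ones. Disocclusions are left
    as zeros, and pixels mapping to the same target simply overwrite
    (no occlusion ordering) in this sketch."""
    h, w = disparity.shape
    out = np.zeros_like(left)
    xs = np.arange(w)
    for y in range(h):
        xt = np.round(xs - alpha * disparity[y]).astype(int)
        ok = (xt >= 0) & (xt < w)
        out[y, xt[ok]] = left[y, xs[ok]]
    return out
```

Sweeping `alpha` emulates a continuously adjustable baseline-stretch from a single fixed camera pair, which is the effect the paper is after.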

Book ChapterDOI
29 Jun 2003
TL;DR: This paper analyses the geometry of the three-view rectification method and gives a way of dodging singularities in the position of the virtual camera, making it possible to obtain a synthetic view from a previously forbidden point and to automate the process towards fast software or hardware implementations.
Abstract: View synthesis requires the ability to estimate the image projected from a scene to a point of view where no real camera has been placed. Many methods have been developed, and three-view rectification is one of the most widely used; nevertheless, it has a restriction that arises when the plane containing the foci of the three cameras involved in the process is parallel to the viewing direction of any of the cameras. This paper deals with the geometry of the method and analytically gives a way of dodging the singularities in the position of the virtual camera. This allows us to obtain a synthetic view from a previously forbidden point and to automate the process towards fast software or hardware implementations.