
Showing papers on "View synthesis published in 2007"


Journal ArticleDOI
TL;DR: This paper takes the first step towards constructing the surface layout, a labeling of the image into geometric classes, by learning appearance-based models of these classes, which coarsely describe the 3D scene orientation of each image region.
Abstract: Humans have an amazing ability to instantly grasp the overall 3D structure of a scene--ground orientation, relative positions of major landmarks, etc.--even from a single image. This ability is completely missing in most popular recognition algorithms, which pretend that the world is flat and/or view it through a patch-sized peephole. Yet it seems very likely that having a grasp of this "surface layout" of a scene should be of great assistance for many tasks, including recognition, navigation, and novel view synthesis. In this paper, we take the first step towards constructing the surface layout, a labeling of the image into geometric classes. Our main insight is to learn appearance-based models of these geometric classes, which coarsely describe the 3D scene orientation of each image region. Our multiple segmentation framework provides robust spatial support, allowing a wide variety of cues (e.g., color, texture, and perspective) to contribute to the confidence in each geometric label. In experiments on a large set of outdoor images, we evaluate the impact of the individual cues and design choices in our algorithm. We further demonstrate the applicability of our method to indoor images, describe potential applications, and discuss extensions to a more complete notion of surface layout.
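The multiple-segmentation idea lends itself to a compact sketch: per-segment label confidences from several candidate segmentations are accumulated into per-pixel confidences, and each pixel takes the most confident geometric class. A hypothetical Python illustration (function names and data layout are ours, not the authors'):

    import numpy as np

    def fuse_geometric_labels(segmentations, segment_confidences, n_labels):
        """Fuse per-segment label confidences from several segmentations
        into a per-pixel geometric-class map (illustrative sketch only).

        segmentations: list of (H, W) int arrays, one segmentation map each
        segment_confidences: list of dicts {segment_id: (n_labels,) array}
        """
        h, w = segmentations[0].shape
        pixel_conf = np.zeros((h, w, n_labels))
        for seg_map, seg_conf in zip(segmentations, segment_confidences):
            for seg_id, conf in seg_conf.items():
                pixel_conf[seg_map == seg_id] += conf  # accumulate evidence
        pixel_conf /= len(segmentations)               # average over segmentations
        return pixel_conf.argmax(axis=2)               # most likely geometric class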

735 citations


Patent
09 Jan 2007
TL;DR: In this paper, a method for synthesizing a particular view of a multiview video is presented, in which each video is acquired by a corresponding camera arranged at a particular pose, and in which the view of each camera overlaps with the view of at least one other camera.
Abstract: A method processes multiview videos of a scene, in which each video is acquired by a corresponding camera arranged at a particular pose, and in which the view of each camera overlaps with the view of at least one other camera. Side information for synthesizing a particular view of the multiview video is obtained in either an encoder or decoder. A synthesized multiview video is synthesized from the multiview videos and the side information. A reference picture list is maintained for each current frame of each of the multiview videos; the reference picture list indexes temporal reference pictures and spatial reference pictures of the acquired multiview videos and the synthesized reference pictures of the synthesized multiview video. Each current frame of the multiview videos is predicted according to reference pictures indexed by the associated reference picture list with a skip mode and a direct mode, whereby the side information is inferred from the synthesized reference picture.
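A minimal sketch of the reference-list bookkeeping the patent describes might look as follows; the class layout and field names are assumptions for illustration, not the patented format:

    from dataclasses import dataclass, field

    @dataclass
    class ReferencePictureList:
        """Illustrative reference list mixing temporal, spatial (inter-view)
        and synthesized reference pictures for one current frame."""
        temporal: list = field(default_factory=list)     # past/future frames, same view
        spatial: list = field(default_factory=list)      # frames from neighboring views
        synthesized: list = field(default_factory=list)  # virtual views + side information

        def all_references(self):
            # a single index space spanning all three reference types
            return self.temporal + self.spatial + self.synthesized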

208 citations


Journal ArticleDOI
TL;DR: An object-based IBR system is described to illustrate the techniques involved and their potential application in view synthesis and processing; stereo matching, an important technique for depth estimation and view synthesis, is briefly explained and some of the top-ranked methods are highlighted.
Abstract: One of the most important applications in multiview imaging (MVI) is the development of advanced immersive viewing or visualization systems using, for instance, 3DTV. With the introduction of multiview TVs, it is expected that a new age of 3DTV systems will arrive in the near future. Image-based rendering (IBR) refers to a collection of techniques and representations that allow 3-D scenes and objects to be visualized in a realistic way without full 3-D model reconstruction. IBR uses images as the primary substrate. The potential for photorealistic visualization has tremendous appeal, and it has been receiving increasing attention over the years. Applications such as video games, virtual travel, and E-commerce stand to benefit from this technology. This article serves as a tutorial introduction and brief review of this important technology. First, the classification, principles, and key research issues of IBR are discussed. Then, an object-based IBR system is explained to illustrate the techniques involved and its potential applications in view synthesis and processing. Stereo matching, which is an important technique for depth estimation and view synthesis, is briefly explained and some of the top-ranked methods are highlighted. Finally, the challenging problem of interactive IBR is explained. Possible solutions and some state-of-the-art systems are also reviewed.

150 citations


01 Jan 2007
TL;DR: A new algorithm is proposed for efficient stereo and novel view synthesis: given the video streams acquired by two synchronized cameras, it synthesises images from a virtual camera in an arbitrary position near the physical cameras, based on an improved dynamic-programming stereo algorithm for efficient novel view generation.
Abstract: A new algorithm is proposed for efficient stereo and novel view synthesis. Given the video streams acquired by two synchronized cameras, the proposed algorithm synthesises images from a virtual camera in an arbitrary position near the physical cameras. The new technique is based on an improved, dynamic-programming, stereo algorithm for efficient novel view generation. The two main contributions of this paper are: i) a new four-state matching graph for dense stereo dynamic programming that supports accurate occlusion labelling; ii) a compact geometric derivation for novel view synthesis by direct projection of the minimum-cost surface. Furthermore, the paper presents an algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker), and a cost aggregation algorithm that acts directly in the three-dimensional matching cost space. The proposed algorithm has been designed to work with input images with a large disparity range, a common practical situation. The enhanced occlusion-handling capabilities of the new dynamic programming algorithm are evaluated against those of the most powerful state-of-the-art dynamic programming and graph-cut techniques. Four-state DP is also evaluated against the disparity-based Middlebury error metrics, and its performance is found to be amongst the best of the efficient algorithms. A number of examples demonstrate the robustness of four-state DP to artefacts in stereo video streams. This includes demonstrations of cyclopean view synthesis in extended conversational sequences, synthesis from a freely translating virtual camera and, finally, basic 3D scene editing.
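The paper's four-state matching graph with explicit occlusion states is richer than what fits here, but the underlying scan-line dynamic programming can be sketched with a generic single-state formulation (a toy illustration, not the authors' algorithm; expects float image rows):

    import numpy as np

    def scanline_dp_stereo(left_row, right_row, max_disp, smooth=4.0):
        """Minimal single-scanline DP stereo: per-pixel matching cost plus
        a penalty on disparity jumps between neighbouring pixels."""
        n = len(left_row)
        cost = np.full((n, max_disp), np.inf)
        back = np.zeros((n, max_disp), dtype=int)
        for d in range(max_disp):
            if d < n:
                cost[d:, d] = np.abs(left_row[d:] - right_row[:n - d])
        acc = cost.copy()
        for x in range(1, n):
            for d in range(max_disp):
                # transition cost penalizes disparity discontinuities
                prev = acc[x - 1] + smooth * np.abs(np.arange(max_disp) - d)
                back[x, d] = np.argmin(prev)
                acc[x, d] += prev[back[x, d]]
        disp = np.zeros(n, dtype=int)
        disp[-1] = np.argmin(acc[-1])
        for x in range(n - 2, -1, -1):   # backtrack the optimal path
            disp[x] = back[x + 1, disp[x + 1]]
        return disp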

118 citations


Journal ArticleDOI
TL;DR: In this article, a new four-state matching graph for dense stereo dynamic programming is proposed to support accurate occlusion labelling, together with a compact geometric derivation for novel view synthesis by direct projection of the minimum-cost surface.
Abstract: A new algorithm is proposed for efficient stereo and novel view synthesis. Given the video streams acquired by two synchronized cameras, the proposed algorithm synthesises images from a virtual camera in an arbitrary position near the physical cameras. The new technique is based on an improved, dynamic-programming, stereo algorithm for efficient novel view generation. The two main contributions of this paper are: (i) a new four-state matching graph for dense stereo dynamic programming that supports accurate occlusion labelling; (ii) a compact geometric derivation for novel view synthesis by direct projection of the minimum-cost surface. Furthermore, the paper presents an algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker), and a cost aggregation algorithm that acts directly in the three-dimensional matching cost space. The proposed algorithm has been designed to work with input images with a large disparity range, a common practical situation. The enhanced occlusion-handling capabilities of the new dynamic programming algorithm are evaluated against those of the most powerful state-of-the-art dynamic programming and graph-cut techniques. Four-state DP is also evaluated against the disparity-based Middlebury error metrics, and its performance is found to be amongst the best of the efficient algorithms. A number of examples demonstrate the robustness of four-state DP to artefacts in stereo video streams. This includes demonstrations of cyclopean view synthesis in extended conversational sequences, synthesis from a freely translating virtual camera and, finally, basic 3D scene editing.

106 citations


Journal ArticleDOI
TL;DR: This paper presents a novel method for synthesizing a novel view from two sets of differently focused images taken by an aperture camera array for a scene consisting of two approximately constant depths.
Abstract: This paper presents a novel method for synthesizing a novel view from two sets of differently focused images taken by an aperture camera array for a scene consisting of two approximately constant depths. The proposed method consists of two steps. The first step is a view interpolation to reconstruct an all-in-focus dense light field of the scene. The second step is to synthesize a novel view by a light-field rendering technique from the reconstructed dense light field. The view interpolation in the first step can be achieved simply by linear filters that are designed to shift different object regions separately, without region segmentation. The proposed method can effectively create a dense array of pin-hole cameras (i.e., all-in-focus images), so that the novel view can be synthesized with better quality.
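As a rough illustration of the two-depth setting, assuming the two layer disparities are known, an intermediate view at position alpha can be approximated by shifting and blending; the real method instead designs linear filters that handle the two object regions jointly, without segmentation:

    import numpy as np

    def interpolate_two_depth_view(img_a, img_b, d_near, d_far, alpha):
        """Toy intermediate-view interpolation for a scene with two roughly
        constant depths. d_near/d_far are horizontal disparities in pixels;
        alpha in [0, 1] positions the virtual view between the two inputs.
        np.roll wraps around at the borders; a real implementation would pad."""
        def shift(img, dx):
            return np.roll(img, int(round(dx)), axis=1)
        # views rendered as if the whole scene sat at each of the two depths
        near = 0.5 * (shift(img_a, alpha * d_near) + shift(img_b, -(1 - alpha) * d_near))
        far = 0.5 * (shift(img_a, alpha * d_far) + shift(img_b, -(1 - alpha) * d_far))
        return 0.5 * (near + far)  # crude fusion of the two depth hypotheses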

92 citations


Journal ArticleDOI
TL;DR: This paper considers the problem of reconstructing visually realistic 3D models of dynamic semitransparent scenes, such as fire, from a very small set of simultaneous views, and reduces reconstruction to a convex combination of sheet-like density fields, each of which is derived from the density sheet of two input views.
Abstract: This paper considers the problem of reconstructing visually realistic 3D models of dynamic semitransparent scenes, such as fire, from a very small set of simultaneous views (even two). We show that this problem is equivalent to a severely underconstrained computerized tomography problem, for which traditional methods break down. Our approach is based on the observation that every pair of photographs of a semitransparent scene defines a unique density field, called a density sheet, that 1) concentrates all its density on one connected, semitransparent surface, 2) reproduces the two photos exactly, and 3) is the most spatially compact density field that does so. From this observation, we reduce reconstruction to the convex combination of sheet-like density fields, each of which is derived from the density sheet of two input views. We have applied this method specifically to the problem of reconstructing 3D models of fire. Experimental results suggest that this method enables high-quality view synthesis without overfitting artifacts.
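In symbols (notation ours, following the abstract), the reconstruction step is a convex combination of sheet-like density fields:

    D(x) = Σ_i w_i D_i(x),   with w_i ≥ 0 and Σ_i w_i = 1

where each D_i is derived from the density sheet defined by one pair of input views, and the weights are chosen so that the combined field reproduces the photographs.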

68 citations


Journal ArticleDOI
TL;DR: A novel method for virtual view synthesis that allows viewers to virtually fly through real soccer scenes, which are captured by multiple cameras in a stadium, by view interpolation of real camera images near the chosen viewpoints.
Abstract: This paper presents a novel method for virtual view synthesis that allows viewers to virtually fly through real soccer scenes, which are captured by multiple cameras in a stadium. The proposed method generates images of arbitrary viewpoints by view interpolation of real camera images near the chosen viewpoints. In this method, cameras do not need to be strongly calibrated since projective geometry between cameras is employed for the interpolation. For avoiding the complex and unreliable process of 3-D recovery, object scenes are segmented into several regions according to the geometric property of the scene. Dense correspondence between real views, which is necessary for intermediate view generation, is automatically obtained by applying projective geometry to each region. By superimposing intermediate images for all regions, virtual views for the entire soccer scene are generated. The efforts for camera calibration are reduced and correspondence matching requires no manual operation; hence, the proposed method can be easily applied to dynamic events in a large space. An application for fly-through observations of soccer match replays is introduced along with the algorithm of view synthesis and experimental results. This is a new approach for providing arbitrary views of an entire dynamic event.
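A much-simplified sketch of per-region interpolation with projective geometry, using OpenCV; the linear blending of homographies below is a crude stand-in for the paper's projective interpolation, and all names are ours:

    import cv2
    import numpy as np

    def interpolate_region(img_a, img_b, H_ab, alpha):
        """H_ab maps view A to view B for one scene region (e.g. the ground).
        alpha = 0 reproduces view A, alpha = 1 reproduces view B."""
        eye = np.eye(3)
        H_a = (1 - alpha) * eye + alpha * H_ab                  # warp A toward new view
        H_b = (1 - alpha) * np.linalg.inv(H_ab) + alpha * eye   # warp B toward new view
        h, w = img_a.shape[:2]
        warp_a = cv2.warpPerspective(img_a, H_a, (w, h))
        warp_b = cv2.warpPerspective(img_b, H_b, (w, h))
        # cross-fade the two warped real views
        return cv2.addWeighted(warp_a, 1 - alpha, warp_b, alpha, 0)

Superimposing such interpolated images for all regions yields the full virtual view, as the abstract describes.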

64 citations


Proceedings ArticleDOI
01 Jan 2007
TL;DR: It is shown that application of modern multiview stereo techniques to the new-view synthesis (NVS) problem introduces a number of non-trivial complexities, and a combination of the two approaches is presented which yields good results on difficult image sequences.
Abstract: We show that application of modern multiview stereo techniques to the new-view synthesis (NVS) problem introduces a number of non-trivial complexities. By simultaneously solving for the colour and depth of the new-view pixels we can eliminate the visual artefacts that conventional NVS-via-stereo suffers from. The global occlusion reasoning which has led to considerable improvements in recent stereo algorithms can easily be included in the new algorithm, using a recently improved graph-cut-based optimizer for general multi-label conditional random fields (CRFs). However, the CRF priors that are important to success in stereo cannot be easily applied if the reconstruction is to be computed in the reference frame of the novel view. We address this problem by extending recent work on the fast optimization of texture priors in NVS to model the image edge structure, yielding a combination of the two approaches that gives good results on difficult image sequences.
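The optimization target is the standard multi-label CRF energy (notation ours):

    E(x) = Σ_i φ_i(x_i) + Σ_{(i,j) ∈ N} ψ_ij(x_i, x_j)

where x_i is the joint colour/depth label of new-view pixel i, the unary term φ_i scores photo-consistency with the input views, and the pairwise term ψ_ij over neighbouring pixels N carries the edge-aware texture prior; the graph-cut-based optimizer approximately minimizes E.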

39 citations


Proceedings ArticleDOI
07 May 2007
TL;DR: This work presents several improvements to the reference block-based depth estimation approach and demonstrates that the proposed method of depth estimation is not only efficient for view synthesis prediction, but also produces depth maps that require far fewer bits to code.
Abstract: The compression of multiview video in an end-to-end 3D system is required to reduce the amount of visual information. Since multiple cameras usually have a common field of view, high compression ratios can be achieved if both the temporal and inter-view redundancy are exploited. View synthesis prediction is a new coding tool for multiview video that essentially generates virtual views of a scene using images from neighboring cameras and estimated depth values. In this work, we consider depth estimation for view synthesis in multiview video encoding. We focus on generating smooth and accurate depth maps, which can be efficiently coded. We present several improvements to the reference block-based depth estimation approach and demonstrate that the proposed method of depth estimation is not only efficient for view synthesis prediction, but also produces depth maps that require far fewer bits to code.
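A baseline block-based depth search of the kind the paper improves on can be sketched as follows; this is a generic illustration with a simple smoothness bias, and the parameter names and regularizer are our assumptions:

    import numpy as np

    def block_depth_search(target, reference, block=16, max_disp=32, lam=0.1):
        """Per-block disparity/depth search between two grayscale float views.
        A small term lam * |d - d_prev| nudges neighbouring blocks toward
        similar depths, giving smoother, more codable depth maps."""
        h, w = target.shape
        depth = np.zeros((h // block, w // block), dtype=int)
        for by in range(h // block):
            d_prev = 0
            for bx in range(w // block):
                y, x = by * block, bx * block
                patch = target[y:y + block, x:x + block]
                best, best_cost = 0, np.inf
                for d in range(min(max_disp, x + 1)):  # stay inside the image
                    cand = reference[y:y + block, x - d:x - d + block]
                    cost = np.abs(patch - cand).mean() + lam * abs(d - d_prev)
                    if cost < best_cost:
                        best, best_cost = d, cost
                depth[by, bx] = d_prev = best
        return depth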

33 citations


Proceedings ArticleDOI
12 Nov 2007
TL;DR: A new approach for realistic stereo view synthesis (RSVS) of existing 2D video material is presented, which is based on structure-from-motion techniques and uses image-based rendering to reconstruct the desired stereo views for each video frame.
Abstract: In the past years, 3D display technology has become a booming branch of research with fast technical progress. Hence, the 3D conversion of already existing 2D video material is becoming more and more popular. In this paper, a new approach for realistic stereo view synthesis (RSVS) of existing 2D video material is presented. The intention of our work is not a real-time conversion of existing video material with a reduction in stereo perception, but rather a more realistic off-line conversion with high accuracy. Our approach is based on structure-from-motion techniques and uses image-based rendering to reconstruct the desired stereo views for each video frame. The algorithm is tested on several TV broadcast videos, as well as on sequences captured with a single handheld camera. Finally, some simulation results show the remarkable performance of this approach.

Proceedings ArticleDOI
21 Aug 2007
TL;DR: A new approach for generation of super-resolution stereoscopic and multi-view video from monocular video, an extension of the realistic stereo view synthesis (RSVS) approach which is based on structure from motion techniques and image-based rendering to generate the desired stereoscopic views for each point in time.
Abstract: This paper presents a new approach for generation of super-resolution stereoscopic and multi-view video from monocular video. Such multi-view video is used for instance with multi-user 3D displays or auto-stereoscopic displays with head-tracking to create a depth impression of the observed scenery. Our approach is an extension of the realistic stereo view synthesis (RSVS) approach, which is based on structure-from-motion techniques and image-based rendering to generate the desired stereoscopic views for each point in time. The extension relies on an additional super-resolution mode which utilizes a number of frames of the original video sequence to generate a virtual stereo frame with higher resolution. The algorithm is tested on several TV broadcast videos, as well as on sequences captured with a single handheld camera and sequences from the well-known BBC documentary "Planet Earth". Finally, some simulation results show that RSVS is quite suitable for super-resolution 2D-3D conversion.

Book ChapterDOI
18 Nov 2007
TL;DR: This work addresses the problem of super-resolved generation of novel views of a 3D scene from reference images obtained by cameras in general positions, using a reconstruction-based approach in an MRF-MAP formalism solved with graph-cut optimization.
Abstract: We address the problem of super-resolved generation of novel views of a 3D scene from reference images obtained by cameras in general positions, a problem which has not been tackled before in the context of super resolution and which is also of importance to the field of image-based rendering. We formulate the problem as one of estimating the color at each pixel in the high-resolution novel view without explicit and accurate depth recovery. We employ a reconstruction-based approach using an MRF-MAP formalism and solve it using graph-cut optimization. We also give an effective method to handle occlusion. We present compelling results on real images.

Journal ArticleDOI
TL;DR: This paper proposes an automatic method for specifying the virtual camera position and orientation in an uncalibrated setting, based on the interpolation and extrapolation of the motion among the reference views.
Abstract: This paper deals with the view synthesis problem and proposes an automatic method for specifying the virtual camera position and orientation in an uncalibrated setting, based on the interpolation and extrapolation of the motion among the reference views. Novel images can be rendered from virtual cameras moving on parametric trajectories. Synthetic and real experiments illustrate the approach.
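The interpolation/extrapolation idea can be illustrated with a toy parametric trajectory over camera positions; the paper works with motion between uncalibrated views rather than explicit 3D positions, so this only shows the underlying idea:

    import numpy as np

    def camera_on_trajectory(poses, t):
        """Piecewise-linear interpolation (t in [0, 1]) and linear
        extrapolation (t outside that range) of camera centres given
        at the reference views. poses: (n, 3) array, n >= 2."""
        poses = np.asarray(poses, dtype=float)
        n = len(poses) - 1
        s = t * n
        i = int(np.clip(np.floor(s), 0, n - 1))  # segment index
        u = s - i                                # u < 0 or > 1: extrapolation
        return (1 - u) * poses[i] + u * poses[i + 1]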

Proceedings ArticleDOI
Sehoon Yea, Anthony Vetro
12 Nov 2007
TL;DR: A rate-distortion optimized framework that incorporates view synthesis for improved prediction in multiview video coding and employs variable block-size depth/motion search, optimal mode decision including view synthesis prediction, and CABAC encoding of depth and correction vectors is proposed.
Abstract: We propose a rate-distortion optimized framework that incorporates view synthesis for improved prediction in multiview video coding. In the proposed scheme, block-based depth and correction vectors are encoded and used at the decoder to generate the view synthesis prediction data. The proposed method employs variable block-size depth/motion search, optimal mode decision including view synthesis prediction, and CABAC encoding of depth and correction vectors. A sub-pixel reference matching technique is also introduced to improve prediction accuracy of the view synthesis prediction. Novel variants of the skip and direct modes are presented, which infer the depth and correction vector information from neighboring blocks in a synthesized reference picture to reduce the bits needed for the view synthesis prediction mode. Experimental results demonstrate improved coding efficiency with the proposed techniques.
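Mode decision in such a framework follows the classic Lagrangian rate-distortion rule: choose the mode minimizing J = D + λR over all candidates, including view synthesis prediction and its skip/direct variants. A minimal sketch (the candidate names below are invented):

    def choose_mode(candidates, lam):
        """Pick the mode minimizing J = D + lambda * R.
        candidates: dict mapping mode name -> (distortion, rate_bits)."""
        return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])

    # e.g. choose_mode({'inter': (120.0, 96), 'view_synth': (110.0, 140),
    #                   'vsp_skip': (125.0, 8)}, lam=10.0)  ->  'vsp_skip'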

Proceedings ArticleDOI
01 Oct 2007
TL;DR: This paper presents an efficient image-based rendering system capable of performing online stereo matching and view synthesis at high speed, completely on the graphics processing unit (GPU).
Abstract: This paper presents an efficient image-based rendering system capable of performing online stereo matching and view synthesis at high speed, completely on the graphics processing unit (GPU). Given two rectified stereo images, our algorithm first extracts the disparity map with a stream-centric dense depth estimation approach. For high-quality view synthesis, multi-label masks are then automatically generated to postprocess occlusions and ambiguously estimated regions adaptively. To allow even faster interactive view generation, an alternative forward warping method is also integrated. The experiments show that photorealistic intermediate views of high image quality are yielded by our algorithm. The optimized implementation also provides the state-of-the-art stereo analysis and view synthesis speed, achieving over 47 fps with 450x375 stereo images and 60 disparity levels on an Nvidia GeForce 7900 graphics card.
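The alternative forward-warping path can be sketched in a few lines; this is a CPU illustration of the general technique, not the paper's GPU implementation:

    import numpy as np

    def forward_warp(src, disparity, alpha):
        """Splat each source pixel alpha * disparity pixels along the
        baseline to form an intermediate view. Disocclusions are left
        as zeros and would need postprocessing, as in the paper's
        multi-label masks. src: (H, W) or (H, W, 3); disparity: (H, W)."""
        h, w = src.shape[:2]
        out = np.zeros_like(src)
        xs = np.arange(w)
        for y in range(h):
            xt = np.round(xs + alpha * disparity[y]).astype(int)
            valid = (xt >= 0) & (xt < w)
            out[y, xt[valid]] = src[y, xs[valid]]
        return out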

Proceedings ArticleDOI
12 Nov 2007
TL;DR: This work proposes a method for delivering error-resilient video from wireless camera networks in a distributed fashion over lossy channels based on distributed source coding that exploits inter-view correlation among cameras with overlapping views.
Abstract: We propose a method for delivering error-resilient video from wireless camera networks in a distributed fashion over lossy channels. Our scheme is based on distributed source coding that exploits inter-view correlation among cameras with overlapping views. The main focus in this work is on robustness, which is urgently needed in a wireless setting. The proposed approach has low encoding complexity, is robust while satisfying tight latency constraints, and requires no inter-camera communication. Our system is built on and is a multi-camera extension of PRISM [1], an earlier proposed single-camera distributed video compression system. Decoder motion search, a key attribute of single-camera PRISM, is extended to the multi-view setting by using estimated scene depth information when it is available. In particular, dense stereo correspondence and view synthesis are utilized to generate side information. When combined with decoder motion search, our proposed method can be made insensitive to small errors in camera calibration, disparity estimation and view synthesis. In experiments over a simulated wireless channel, the proposed approach achieves up to 2.1 dB gain in PSNR over a system using H.263+ with forward error correction.

01 Jan 2007
TL;DR: 3DTV and FTV are some of the most important applications of MVI and are new types of media that expand the user experience beyond what is offered by traditional media.
Abstract: Multi-view imaging (MVI) has attracted increasing attention recently, thanks to the rapidly dropping cost of digital cameras. This opens a wide variety of interesting new research topics and applications, such as virtual view synthesis, high-performance imaging, image/video segmentation, object tracking/recognition, environmental surveillance, remote education, industrial inspection, 3DTV, and Free Viewpoint TV (FTV) [9], [10]. While some of these tasks can be handled with conventional single-view images/video, the availability of multiple views of the scene significantly broadens the field of applications, while enhancing performance and user experience. 3DTV and FTV are some of the most important applications of MVI and are new types of media that expand the user experience beyond what is offered by traditional media. They have been developed by the convergence of new technologies from computer graphics, computer vision, multimedia, and related fields. 3DTV, also referred to as stereo TV, offers a 3D depth impression of the observed scene, while FTV allows for an interactive selection of viewpoint and direction within a certain operating range. 3DTV and FTV are not mutually exclusive. On the contrary, they can be very well combined within a single system, since they are both based on a suitable 3D scene representation. In other words, given a 3D representation of a scene, if a stereo pair of images corresponding to the human eyes can be rendered, the functionality of 3DTV is provided. If a virtual view (i.e., not an actual camera view) corresponding to an arbitrary viewpoint and viewing direction can be rendered, the functionality of FTV is provided. As seen in the movie The Matrix, successive switching of multiple real images captured at different angles can give the sensation of a flying viewpoint. In a similar way, Eye Vision [11] realized a flying virtual camera for a scene in a Super Bowl game. It used 33 cameras arranged around the stadium and controlled the camera directions mechanically to track the target scene. In these systems, however, no new virtual images are generated, and the movement of the viewpoint is limited to the predefined original camera positions.

Journal ArticleDOI
TL;DR: An algorithm for efficient depth calculation and view synthesis that refines its initial result by global optimisation, applying a min-cut/max-flow algorithm on a graph, implemented on the CPU.

Journal ArticleDOI
TL;DR: A mixed reality presentation system for a soccer match is constructed that can overlay soccer scenes captured with multiple cameras in a soccer stadium onto a desktop field model with HMD to demonstrate the utility of the proposed method of free viewpoint video synthesis.
Abstract: This paper presents a new framework for observation of sporting events using HMD by applying the method of free viewpoint video synthesis for dynamic events to mixed reality. According to the viewpoint position of an observer, a virtual view image is generated by view interpolation among multiple sporting videos captured in a stadium, and then overlaid onto the real world via HMD. This makes the observer feel as if the match is played in front of his/her eyes. The proposed method performs virtual view synthesis and geometric registration between the real world and the virtual view images using projective geometry between cameras, which can be estimated from correspondence of natural feature points. It does not require calibration of multiple cameras imaging the sporting match and HMD camera capturing the real world. In this paper, we have constructed a mixed reality presentation system for a soccer match in order to demonstrate the utility of the proposed method. This system can overlay soccer scenes captured with multiple cameras in a soccer stadium onto a desktop field model with HMD. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 3, 90(2): 40–49, 2007; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjc.20311

Proceedings ArticleDOI
27 May 2007
TL;DR: It is demonstrated by experimental results that the proposed algorithm offers better virtual view quality at a much lower complexity than existing methods.
Abstract: A new scheme for disparity vector (DV) estimation and virtual view synthesis to generate 3D video display from a pair of stereo video inputs is investigated in this work. Two performance metrics are considered for the algorithmic evaluation; i.e. quality and complexity. To enhance the overall performance, a two-stage algorithm for accurate and fast DV estimation and occlusion handling is first presented. Then, a preprocessing algorithm and a synthesis method are described. The proposed preprocessing algorithm can remove false matched regions for DV refinement effectively. The new synthesis method can reduce blurring and ghostly effects greatly. It is demonstrated by experimental results that the proposed algorithm offers better virtual view quality at a much lower complexity than existing methods.

Proceedings ArticleDOI
05 Aug 2007
TL;DR: A system to visualize urban structure that is a function of the time selected, thereby allowing virtual navigation in space and time, and a 4D view synthesis technique for rendering large-scale 3D structures evolving in time, given a sparse sample of historical images.
Abstract: In this sketch, we present a 4D view synthesis technique for rendering large-scale 3D structures evolving in time, given a sparse sample of historical images. We built a system to visualize urban structure that is a function of the time selected, thereby allowing virtual navigation in space and time. While there is a rich literature on image-based rendering of static 3D environments, e.g., the Facade system [Debevec et al. 1996] and Photo Tourism [Snavely et al. 2006], little has been done to address the temporal aspect (e.g., occlusion due to temporal change). We construct time-dependent geometry to handle the sparse sampling. To render, we use time- and view-dependent texture mapping and reason about visibility both in time and space. Figure 1 shows the result of view synthesis based on time-dependent geometry.

Proceedings ArticleDOI
10 Sep 2007
TL;DR: A complete pipeline that, starting with uncalibrated images, produces a virtual sequence with viewpoint control that is based on the relative affine structure is described.
Abstract: This paper deals with the process of view synthesis based on the relative affine structure. It describes a complete pipeline that, starting with uncalibrated images, produces a virtual sequence with viewpoint control. Experiments illustrate the approach.
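The reprojection at the heart of this approach is the standard relative-affine-structure relation (written from the general literature, not copied from this paper):

    p' ≅ H_π p + γ e'

where p and p' are corresponding image points, H_π is the homography induced by a reference plane between the two views, e' is the epipole in the second view, and γ is the relative affine structure of the point. A virtual view is rendered by substituting the H_π and e' of a virtual camera while keeping γ fixed per point.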

Journal ArticleDOI
TL;DR: The essence of the method is to perform necessary depth estimation up to the level required by the minimal joint image-geometry sampling rate using off-the-shelf graphics hardware, so that real-time anti-aliased light field rendering is achieved even if the image samples are insufficient.
Abstract: It is known that the pure light field approach for view synthesis relies on a large number of image samples to produce anti-aliased renderings. Otherwise, the insufficiency of image sampling needs to be compensated for by geometry sampling. Currently, geometry estimation is done either offline or using dedicated hardware. Our solution to this dilemma is based on three key ideas: a formal analysis of the equivalency between light field rendering and plane-based warping, multi-focus imaging in a multi-camera system by plane sweeping, and the fusion of the multi-focus images using multi-view stereo. The essence of our method is to perform the necessary depth estimation up to the level required by the minimal joint image-geometry sampling rate using off-the-shelf graphics hardware. As a result, real-time anti-aliased light field rendering is achieved even if the image samples are insufficient.

Proceedings ArticleDOI
02 Jul 2007
TL;DR: Experimental results show that the newly developed algorithm can improve image quality of synthesized virtual views with a PSNR gain of up to 0.65 dB.
Abstract: A framework for virtual view synthesis based on multiple images is presented in this paper. Compared to conventional view synthesis based on stereoscopic image pairs, a postprocessing algorithm for disparity refinement is added to exploit information contained in multiple images captured with a multi-view camera configuration. The principle for disparity refinement is examined, leading to the development of a novel algorithm. Experimental results show that the newly developed algorithm can improve image quality of synthesized virtual views with a PSNR gain of up to 0.65 dB.

01 Jan 2007
TL;DR: This thesis implements a statistical model combining distance in feature space (DIFS) and distance from feature space (DFFS) for a pair of poses, and models the relationship between the poses using a Bayesian network, which more accurately predicts small and localized features.
Abstract: Face view synthesis involves using one view of a face to artificially render another view. It is an interesting problem in computer vision and computer graphics, and can be applied in the entertainment industry for animated movies and video games. The fact that the input is only a single image makes the problem very difficult. Previous approaches learn a linear model on pairs of poses from 2D training data and then predict the unknown pose in the test example. Such 2D approaches are much more practical than approaches requiring 3D data and more computationally efficient. However, they perform inadequately when dealing with large angles between poses. In this thesis, we seek to improve performance through better choices in probabilistic modeling. As a first step, we have implemented a statistical model combining distance in feature space (DIFS) and distance from feature space (DFFS) for such pairs of poses. Such a representation leads to better performance. As a second step, we model the relationship between the poses using a Bayesian network. This representation takes advantage of the sparse statistical structure of faces. In particular, we have observed that a given pixel is often statistically correlated with only a small number of other pixel variables. The Bayesian network provides a concise representation for this behavior, reducing the susceptibility to over-fitting. Compared with the linear method, the Bayesian network more accurately predicts small and localized features.

Journal Article
TL;DR: In order to synthesize high-quality intermediate views, a new stereo matching algorithm based on adaptive weights is proposed, and a disparity smoothness constraint term is introduced in the matching cost function.
Abstract: In order to synthesize high-quality intermediate views, a new stereo matching algorithm based on adaptive weights is proposed, and a disparity smoothness constraint term is introduced in the matching cost function. After obtaining a reliable and dense disparity map, an intermediate view is synthesized by searching for the pixels in the left and right images that correspond to each intermediate-view pixel. Experimental results show that the proposed algorithm provides good disparity maps and obtains intermediate views of high quality.
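The adaptive-weight idea can be sketched for a single pixel and disparity hypothesis, in the spirit of adaptive support weights; the paper's added disparity-smoothness term is omitted here, and all parameter values are illustrative:

    import numpy as np

    def adaptive_weight_cost(left, right, p, d, radius=3, gamma_c=7.0, gamma_s=9.0):
        """Adaptive support-weight matching cost: neighbours close in colour
        and space to the window centre get larger weights. Grayscale float
        images; p = (row, col); d = disparity hypothesis. Assumes the
        window lies fully inside both images (x - d - radius >= 0)."""
        y, x = p
        win_l = left[y - radius:y + radius + 1, x - radius:x + radius + 1]
        win_r = right[y - radius:y + radius + 1, x - d - radius:x - d + radius + 1]
        dy, dx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        w_space = np.exp(-np.sqrt(dy ** 2 + dx ** 2) / gamma_s)   # spatial proximity
        w_color = np.exp(-np.abs(win_l - left[y, x]) / gamma_c)   # colour similarity
        w = w_space * w_color
        return (w * np.abs(win_l - win_r)).sum() / w.sum()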

01 Jan 2007
TL;DR: In this work, a novel motion segmentation algorithm for video sequences in general motion is developed, based on differential properties in the spatio-temporal domain, and a differential occlusion detector is presented which detects corner-like features that are indicative of motion boundaries.
Abstract: In this work I investigate spatio-temporal information in a video sequence. The advantage of considering a video sequence as a 3D spatio-temporal function with temporal continuity (rather than merely a discrete collection of 2D images) is demonstrated by two computer vision techniques which I have developed.

View Synthesis: Each frame of the video sequence is an intersection of the spatio-temporal video volume with a spatial plane. When a video sequence conforms to certain geometrical constraints, intersecting the video volume with other planes or surfaces can be used to easily produce new views of the scene. This powerful view synthesis technique is based solely on captured data and does not require scene reconstruction, as the constraint on the input camera motion makes it invariant to the scene structure in some respects. The technique is demonstrated with real sequences, giving visually appealing results. It gives rise to a novel projection model, the Crossed-Slits projection, that can be seen as a generalization of the perspective projection and several other models. A Crossed-Slits camera is defined by two lines which all rays must intersect. Here I study this new projection model and its epipolar geometry, which are shown to be quadratic equivalents of the perspective model. Crossed-Slits images are not perspective, and thus they appear distorted. These distortions are studied, and two frameworks are developed for handling them: first, assuming that a coarse approximation of the scene structure is known (which is used to create a real-time omnidirectional virtual environment); second, without any knowledge about the scene, based only on the set of rays. In both cases distortion is reduced by approximating the perspective projection. The work on view synthesis and the Crossed-Slits projection, presented in Chapters 3 and 4, is based on work published in [1-6].

Motion Segmentation: Analysis of an unconstrained video sequence in general motion reveals a highly regular spatio-temporal structure, where moving objects appear as continuous structures in the temporal domain, broken by occlusion. Based on this observation, I developed a novel motion segmentation algorithm for video sequences in general motion, which is based on differential properties in the spatio-temporal domain. I present a differential occlusion detector, which detects corner-like features that are indicative of motion boundaries. Segmentation is achieved by integrating the response of this detector in scale space. The algorithm is shown to give good results on real sequences taken in general motion. Experiments with synthetic data show robustness to high levels of noise and illumination changes; the experiments also include cases where no intensity edge exists at the location of the motion boundary, or where no parametric motion model can describe the data. Next I describe two algorithms to determine depth ordering from two- and three-frame sequences, based on observations about the scale-space characteristics of the motion boundary. An interesting property of this method is its ability to compute depth ordering from only two frames, even when no edge can be detected in a single frame. Finally, experiments show that people, like my algorithm, can compute depth ordering from only two frames, even when the boundary between the layers is not visible in a single frame. The work on motion segmentation and depth ordering, presented in Chapter 5, is based on [7, 8].
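The volume-slicing view synthesis described above can be illustrated in a few lines: for a sideways-translating camera, sampling a column that advances with time re-slices the spatio-temporal volume into a new, X-slit/pushbroom-like view with no scene reconstruction (a toy sketch, not the thesis' implementation):

    import numpy as np

    def slit_view(video):
        """Re-slice a video volume into a new view by pasting together a
        column strip that advances linearly with time. video: (T, H, W)
        or (T, H, W, 3); returns an image with time as the horizontal axis."""
        t_dim, h, w = video.shape[:3]
        cols = np.linspace(0, w - 1, t_dim).astype(int)  # column advances with t
        return np.stack([video[t, :, cols[t]] for t in range(t_dim)], axis=1)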