Proceedings Article

A novel variety-based 3DTV content generation scheme for casually captured sparse photo collections

TL;DR: A novel parameterized variety-based 3D exploration model is presented to comprehend the sparse unstructured collection of photographs, and automatically plan virtual 3D tours of the world's landmarks through interesting viewpoints without explicit 3D reconstruction.
Abstract: This paper presents a novel parameterized variety-based 3D exploration model that makes sense of a sparse, unstructured collection of photographs and automatically plans virtual 3D tours of the world's landmarks through interesting viewpoints without explicit 3D reconstruction. The proposed system analyzes a collection of unstructured but related images of the same location or environment to create a parameterized scene graph: a data structure that conveys spatial relations and enables smooth virtual navigation between photos. A novel statistical-heuristic criterion, which exploits the scene's spatial layout and appearance, is developed to automatically identify the best available portals between photographs. Once well connected, the graph is parameterized and consistently rendered by choosing visually compelling 3D transition paths that maintain a pleasing sense of parallax. The system's ability is demonstrated on several casually captured personal photo collections of heritage sites and on imagery gathered from Flickr.
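The scene graph described in the abstract can be pictured with a minimal sketch. It assumes that photos are nodes, portals are weighted bidirectional edges, and a tour is the lowest-cost path between two photos. The abstract does not specify how portals are scored or how tours are planned, so the `SceneGraph` class, the `cost` values, and the use of Dijkstra's algorithm are illustrative assumptions rather than the authors' method.

```python
import heapq
from collections import defaultdict


class SceneGraph:
    """Toy scene graph: nodes are photo ids, edges are portals with a cost.

    The cost stands in for a portal-quality score (hypothetical; lower is better).
    """

    def __init__(self):
        self.adj = defaultdict(dict)

    def add_portal(self, a, b, cost):
        # Portals are treated as bidirectional here (an assumption).
        self.adj[a][b] = cost
        self.adj[b][a] = cost

    def plan_tour(self, start, goal):
        """Return the lowest-cost photo sequence from start to goal, or None."""
        best = {start: 0.0}
        prev = {}
        heap = [(0.0, start)]
        while heap:
            cost, node = heapq.heappop(heap)
            if node == goal:
                # Walk the predecessor links back to the start.
                path = [node]
                while node in prev:
                    node = prev[node]
                    path.append(node)
                return path[::-1]
            if cost > best.get(node, float("inf")):
                continue  # stale heap entry
            for nxt, weight in self.adj[node].items():
                new_cost = cost + weight
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    prev[nxt] = node
                    heapq.heappush(heap, (new_cost, nxt))
        return None
```

In this sketch a tour prefers two good portals over one poor one, which is one plausible reading of "choosing visually compelling 3D transition paths".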
Citations
01 Jan 2004
TL;DR: Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance and describes numerous important application areas such as image based rendering and digital libraries.
Abstract: From the Publisher: The accessible presentation of this book gives both a general view of the entire computer vision enterprise and also offers sufficient detail to be able to build useful applications. Users learn techniques that have proven to be useful by first-hand experience and a wide range of mathematical methods. A CD-ROM with every copy of the text contains source code for programming practice, color images, and illustrative movies. Comprehensive and up-to-date, this book includes essential topics that either reflect practical significance or are of theoretical importance. Topics are discussed in substantial and increasing depth. Application surveys describe numerous important application areas such as image based rendering and digital libraries. Many important algorithms are broken down and illustrated in pseudocode. Appropriate for use by engineers as a comprehensive reference to the computer vision enterprise.

3,627 citations

References
Proceedings Article
13 Jun 2010
TL;DR: An approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections by decomposing the collection into overlapping sets of photos that can be processed in parallel and merging the resulting reconstructions.
Abstract: This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.

817 citations


"A novel variety-based 3DTV content ..." refers to background in this paper

  • ...The structure from motion (SfM) or multi-view stereo methods [3, 4] require significant computational effort to recover 3D geometry of landmarks if a sufficiently dense set of photos are available....

Journal Article
01 Jul 2005
TL;DR: This paper presents a fully automatic method for creating a 3D model from a single photograph; the model is made up of several texture-mapped planar billboards and has the complexity of a typical children's pop-up book illustration.
Abstract: This paper presents a fully automatic method for creating a 3D model from a single photograph. The model is made up of several texture-mapped planar billboards and has the complexity of a typical children's pop-up book illustration. Our main insight is that instead of attempting to recover precise geometry, we statistically model geometric classes defined by their orientations in the scene. Our algorithm labels regions of the input image into coarse categories: "ground", "sky", and "vertical". These labels are then used to "cut and fold" the image into a pop-up model using a set of simple assumptions. Because of the inherent ambiguity of the problem and the statistical nature of the approach, the algorithm is not expected to work on every image. However, it performs surprisingly well for a wide range of scenes taken from a typical person's photo album.

730 citations


"A novel variety-based 3DTV content ..." refers to methods in this paper

  • ...A novel strategy is adopted by recovering the coarse knowledge of spatial layout, appearance, and structure of a scene using learning-based classification methodology [14]....

Book Chapter
15 Apr 1996
TL;DR: This work presents a geometric relationship between the image motion of pairs of points over multiple frames based on the parallax displacements of points with respect to an arbitrary planar surface, and shows applications to solving three important problems in 3D scene analysis.
Abstract: We present a geometric relationship between the image motion of pairs of points over multiple frames. This relationship is based on the parallax displacements of points with respect to an arbitrary planar surface, and does not involve epipolar geometry. A constraint is derived over two frames for any pair of points, relating their projective structure (with respect to the plane) based only on their image coordinates and their parallax displacements. Similarly, a 3D-rigidity constraint between pairs of points over multiple frames is derived. We show applications of these parallax-based constraints to solving three important problems in 3D scene analysis: (i) the recovery of 3D scene structure, (ii) the detection of moving objects in the presence of camera induced motion, and (iii) the synthesis of new camera views based on a given set of views. Moreover, we show that this approach can handle difficult situations for 3D scene analysis, e.g., where there is only a small set of parallax vectors, and in the presence of independently moving objects.

112 citations


"A novel variety-based 3DTV content ..." refers to background in this paper

  • ...A parallax displacement of reference points with respect to each intermediate image plane is finally performed to generate novel stereo pairs [10]....

01 Jan 1997
TL;DR: This thesis addresses the problem of synthesizing images of real scenes under three-dimensional transformations in viewpoint and appearance and proves that all views on the line segment between the two camera centers are uniquely determined from two uncalibrated views of a scene.
Abstract: This thesis addresses the problem of synthesizing images of real scenes under three-dimensional transformations in viewpoint and appearance. Solving this problem enables interactive viewing of remote scenes on a computer, in which a user can move a virtual camera through the environment and virtually paint or sculpt objects in the scene. It is demonstrated that a variety of three-dimensional scene transformations can be rendered on a video display device by applying simple transformations to a set of basis images of the scene. The virtue of these transformations is that they operate directly on images and recover only the scene information that is required in order to accomplish the desired effect. Consequently, they are applicable in situations where accurate three-dimensional models are difficult or impossible to obtain. A central topic is the problem of view synthesis, i.e., rendering images of a real scene from different camera viewpoints by processing a set of basis images. Towards this end, two algorithms are described that warp and resample pixels in a set of basis images to produce new images that are physically-valid, i.e., they correspond to what a real camera would see from the specified viewpoints. Techniques for synthesizing other types of transformations, e.g., non-rigid shape and color transformations, are also discussed. The techniques are found to perform well on a wide variety of real and synthetic images. A basic question is uniqueness, i.e., for which views is the appearance of the scene uniquely determined from the information present in the basis views. An important contribution is a uniqueness result for the no-occlusion case, which proves that all views on the line segment between the two camera centers are uniquely determined from two uncalibrated views of a scene. Importantly, neither dense pixel correspondence nor camera information is needed. From this result, a view morphing algorithm is derived that produces high quality viewpoint and shape transformations from two uncalibrated images.

62 citations

Journal Article
01 Jul 2012
TL;DR: A system that analyzes collections of unstructured but related video data to create a Videoscape, a data structure that enables interactive exploration of video collections by visually navigating -- spatially and/or temporally -- between different clips, is proposed.
Abstract: The abundance of mobile devices and digital cameras with video capture makes it easy to obtain large collections of video clips that contain the same location, environment, or event. However, such an unstructured collection is difficult to comprehend and explore. We propose a system that analyzes collections of unstructured but related video data to create a Videoscape: a data structure that enables interactive exploration of video collections by visually navigating -- spatially and/or temporally -- between different clips. We automatically identify transition opportunities, or portals. From these portals, we construct the Videoscape, a graph whose edges are video clips and whose nodes are portals between clips. Now structured, the videos can be interactively explored by walking the graph or by geographic map. Given this system, we gauge preference for different video transition styles in a user study, and generate heuristics that automatically choose an appropriate transition style. We evaluate our system using three further user studies, which allows us to conclude that Videoscapes provides significant benefits over related methods. Our system leads to previously unseen ways of interactive spatio-temporal exploration of casually captured videos, and we demonstrate this on several video collections.
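The graph structure named in this abstract, with portals as nodes and video clips as edges, can be illustrated by a small sketch. The `Clip` fields, the `Videoscape` class below, and the index-based `walk` method are hypothetical names chosen for illustration; the abstract does not describe the actual representation or the exploration interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A video segment running between two portals (hypothetical fields)."""
    video_id: str
    start_portal: str
    end_portal: str


class Videoscape:
    """Toy graph whose nodes are portals and whose directed edges are clips."""

    def __init__(self):
        self.outgoing = defaultdict(list)

    def add_clip(self, clip):
        # A clip is an edge from its start portal to its end portal.
        self.outgoing[clip.start_portal].append(clip)

    def walk(self, portal, choices):
        """Follow one outgoing clip per step and return the video ids played.

        Each entry in `choices` selects among the clips leaving the current
        portal; the walk stops early at a portal with no outgoing clips.
        """
        played = []
        for choice in choices:
            options = self.outgoing.get(portal, [])
            if not options:
                break
            clip = options[choice % len(options)]
            played.append(clip.video_id)
            portal = clip.end_portal
        return played
```

Putting clips on the edges means that arriving at a portal is exactly the moment a viewer can switch videos, which matches the abstract's description of portals as transition opportunities.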

45 citations