
Showing papers by "Jana Kosecka published in 2009"


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work proposes an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors to obtain high quality dense and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing complex noise of ToF sensors.
Abstract: Multi-view stereo methods frequently fail to properly reconstruct 3D scene geometry if visible texture is sparse or the scene exhibits difficult self-occlusions. Time-of-Flight (ToF) depth sensors can provide 3D information regardless of texture, but with only limited resolution and accuracy. To find an optimal reconstruction, we propose an integrated multi-view sensor fusion approach that combines information from multiple color cameras and multiple ToF depth sensors. First, multi-view ToF sensor measurements are combined to obtain a coarse but complete model. Then, the initial model is refined by means of a probabilistic multi-view fusion framework, optimizing over an energy function that aggregates ToF depth sensor information with multi-view stereo and silhouette constraints. We obtain high-quality, dense, and detailed 3D models of scenes challenging for stereo alone, while simultaneously reducing the complex noise of ToF sensors.
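The core idea of combining a ToF data term with a stereo cost in one energy can be sketched as a per-pixel hypothesis selection. This is our own simplification (function names, weights, and the discrete hypothesis set are assumptions; the paper's actual framework is probabilistic and also includes silhouette constraints):

```python
import numpy as np

def fuse_depth(tof_depth, stereo_cost, hypotheses, w_tof=1.0, w_stereo=1.0):
    """Pick, per pixel, the depth hypothesis minimizing a combined energy.

    tof_depth:   (H, W) coarse ToF depth measurement
    stereo_cost: (K, H, W) multi-view photoconsistency cost per hypothesis
    hypotheses:  (K,) candidate depth values
    """
    # Data term: squared deviation from the ToF measurement, per hypothesis.
    tof_term = (hypotheses[:, None, None] - tof_depth[None]) ** 2
    energy = w_tof * tof_term + w_stereo * stereo_cost
    best = np.argmin(energy, axis=0)      # (H, W) index of best hypothesis
    return hypotheses[best]

# Toy example: 2x2 image, three depth hypotheses, uninformative stereo cost,
# so the ToF term alone decides and the fused map snaps to the ToF values.
tof = np.array([[1.0, 2.0], [2.0, 3.0]])
hyp = np.array([1.0, 2.0, 3.0])
cost = np.zeros((3, 2, 2))
fused = fuse_depth(tof, cost, hyp)
```

In the paper the two terms are balanced so that stereo refines detail where texture is available while the ToF term keeps the solution complete where it is not.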

174 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work demonstrates how to robustly estimate camera poses without a need for bundle adjustment and proposes a multi-view stereo method which operates directly on panoramas, while enforcing the piecewise planarity constraints in the sweeping stage.
Abstract: City environments often lack textured areas, contain repetitive structures and strong lighting changes, and are therefore very difficult for standard 3D modeling pipelines. We present a novel unified framework for creating 3D city models which overcomes these difficulties by exploiting image segmentation cues as well as the presence of dominant scene orientations and piecewise planar structures. Given panoramic street view sequences, we first demonstrate how to robustly estimate camera poses without the need for bundle adjustment, and propose a multi-view stereo method which operates directly on panoramas while enforcing the piecewise planarity constraints in the sweeping stage. Finally, we propose a new depth fusion method which exploits the constraints of urban environments and combines the advantages of volumetric and viewpoint-based fusion methods. Our technique avoids expensive voxelization of space, operates directly on 3D reconstructed points through an effective kd-tree representation, and obtains the final surface by tessellation of back-projections of those points into the reference image.
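The point-based fusion step can be given a minimal flavour as clustering of nearby reconstructed 3D points into representatives. This sketch uses brute-force neighbour queries for clarity; the paper replaces them with a kd-tree so the fusion scales to city-sized point sets, and it additionally tessellates back-projections into a reference image (all names here are our own):

```python
import numpy as np

def merge_points(points, radius):
    """Greedily merge 3D points closer than `radius` into cluster centroids."""
    used = np.zeros(len(points), dtype=bool)
    merged = []
    for i in range(len(points)):
        if used[i]:
            continue
        # Gather all not-yet-merged points within `radius` of the seed point.
        near = np.linalg.norm(points - points[i], axis=1) <= radius
        near &= ~used
        used |= near
        merged.append(points[near].mean(axis=0))
    return np.array(merged)

# Two nearly coincident points and one distant point -> two fused points.
pts = np.array([[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [5.0, 5.0, 5.0]])
fused = merge_points(pts, radius=0.1)
```

Avoiding voxelization means memory grows with the number of reconstructed points rather than with the volume of the scene, which matters at street-sequence scale.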

159 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: Presents an extensive experimental validation of a global gist descriptor computed for portions of panoramic images, together with a simple similarity measure between two panoramas that is robust to changes in vehicle orientation when traversing the same areas in different directions.
Abstract: In this paper we investigate large-scale view-based localization in urban areas using panoramic images. The presented approach utilizes a global gist descriptor computed for portions of panoramic images and a simple similarity measure between two panoramas, which is robust to changes in vehicle orientation while traversing the same areas in different directions. The global gist feature [14] has been demonstrated previously to be a very effective conventional image descriptor, capturing the basic structure of different types of scenes in a very compact way. We present an extensive experimental validation of our panoramic gist approach on a large-scale Street View data set of panoramic images for place recognition or topological localization.
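An orientation-robust similarity of the kind described can be sketched by comparing per-portion gist vectors under circular shifts and a reversed ordering. This is our own simplification (in reality the per-portion gist of a reversed traversal is also mirrored, which this sketch ignores):

```python
import numpy as np

def panorama_similarity(a, b):
    """Distance between two panoramas described by per-portion gist vectors.

    a, b: (N, D) arrays, one D-dim gist descriptor per angular portion.
    Minimizing over circular shifts (and the reversed portion order) makes
    the measure invariant to vehicle heading, including traversing the
    same street in the opposite direction.
    """
    n = a.shape[0]
    best = np.inf
    for c in (b, b[::-1]):                 # same and opposite direction
        for s in range(n):
            d = np.linalg.norm(a - np.roll(c, s, axis=0))
            best = min(best, d)
    return best

# A rotated copy of the same panorama should match (near-)perfectly.
rng = np.random.default_rng(0)
pano = rng.random((4, 8))
rotated = np.roll(pano, 2, axis=0)
```

The minimum over shifts is what decouples place identity from the heading of the vehicle at capture time.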

93 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: The main novelty of this generative approach is the introduction of an explicit model of spatial co-occurrence of visual words associated with super-pixels and utilization of appearance, geometry and contextual cues in a probabilistic framework.
Abstract: We present a novel approach for image semantic segmentation of street scenes into coherent regions, while simultaneously categorizing each region as one of the predefined categories representing commonly encountered object and background classes. We formulate the segmentation on small blob-based superpixels and exploit a visual vocabulary tree as an intermediate image representation. The main novelty of this generative approach is the introduction of an explicit model of spatial co-occurrence of visual words associated with superpixels and the utilization of appearance, geometry, and contextual cues in a probabilistic framework. We demonstrate how individual cues contribute towards global segmentation accuracy and how their combination yields performance superior to the best known method on a challenging benchmark dataset which exhibits a diversity of street scenes with varying viewpoints and a large number of categories, captured in daylight and dusk.
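The raw statistic behind a spatial co-occurrence model of this kind can be sketched as counting which visual words appear on adjacent superpixels (a hypothetical helper; the paper's generative model would normalize such counts into probabilities and combine them with appearance and geometry cues):

```python
import numpy as np

def cooccurrence_counts(words, adjacency, vocab_size):
    """Count co-occurrences of visual words on neighbouring superpixels.

    words:     (S,) visual-word id assigned to each superpixel
    adjacency: iterable of (i, j) index pairs of adjacent superpixels
    Returns a symmetric (vocab_size, vocab_size) count matrix.
    """
    C = np.zeros((vocab_size, vocab_size), dtype=int)
    for i, j in adjacency:
        C[words[i], words[j]] += 1
        C[words[j], words[i]] += 1
    return C

# Three superpixels in a chain, labelled with words 0, 1, 1.
words = np.array([0, 1, 1])
adjacency = [(0, 1), (1, 2)]
C = cooccurrence_counts(words, adjacency, vocab_size=2)
```

Such counts capture context like "road words border sidewalk words", which per-superpixel appearance alone cannot express.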

79 citations


Book ChapterDOI
11 May 2009
TL;DR: A system is developed that uses high-level reasoning to influence the selection of landmarks along a navigation path and lower-level reasoning to select appropriate images of those landmarks, producing a more natural navigation plan and more understandable images in a fully automatic way.
Abstract: Computer vision techniques can enhance landmark-based navigation by better utilizing online photo collections. We use spatial reasoning to compute camera poses, which are then registered to the world using GPS information extracted from the image tags. Computed camera pose is used to augment the images with navigational arrows that fit the environment. We develop a system to use high-level reasoning to influence the selection of landmarks along a navigation path, and lower-level reasoning to select appropriate images of those landmarks. We also utilize an image matching pipeline based on robust local descriptors to give users of the system the ability to capture an image and receive navigational instructions overlaid on their current context. These enhancements to our previous navigation system produce a more natural navigation plan and more understandable images in a fully automatic way.
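A typical building block of a robust local-descriptor matching pipeline like the one mentioned is nearest-neighbour matching with a ratio test. This sketch is our own illustration, not the paper's implementation (the specific descriptors and thresholds used there are not stated here):

```python
import numpy as np

def match_descriptors(query, db, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test.

    query, db: (N, D) arrays of local descriptors. A match is kept only if
    the best neighbour is clearly closer than the second best, rejecting
    ambiguous matches on repetitive structures.
    """
    matches = []
    for qi, q in enumerate(query):
        d = np.linalg.norm(db - q, axis=1)   # distances to all db descriptors
        i1, i2 = np.argsort(d)[:2]
        if d[i1] < ratio * d[i2]:
            matches.append((qi, int(i1)))
    return matches

# One query descriptor with an unambiguous nearest neighbour in the database.
db = np.array([[0.0, 0.0], [10.0, 10.0]])
query = np.array([[0.1, 0.0]])
matches = match_descriptors(query, db)   # -> [(0, 0)]
```

Enough such matches against a registered landmark image are what allow navigational instructions to be overlaid on the user's own photograph.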

77 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: An optimal view-selection algorithm for selecting a small set of views for texture mapping that best describe the structure, while minimizing warping and stitching artifacts, and producing a consistent visual representation is proposed.
Abstract: We present a method for automatically constructing compact, photo-realistic architectural 3D models. This method uses simple 3D building outlines obtained from existing GIS databases to bootstrap reconstruction and works with both structured and unstructured image datasets. We propose an optimal view-selection algorithm for selecting a small set of views for texture mapping that best describe the structure, while minimizing warping and stitching artifacts, and producing a consistent visual representation. The proposed method is fully automatic and can process large structured datasets in close to real-time, making it suitable for large scale urban modeling and 3D map construction.
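Selecting a small set of views that covers the structure is, at its core, a set-cover-style problem, which can be sketched with a greedy cost-effectiveness heuristic. All names and the cost model here are our own assumptions; the paper's optimal view-selection algorithm is not reproduced:

```python
import numpy as np

def select_views(coverage, costs):
    """Greedy cost-effective view selection (a set-cover heuristic).

    coverage: (V, F) boolean; coverage[v, f] = view v sees facade face f.
    costs:    (V,) per-view penalty, e.g. expected warping/stitching artifacts.
    Returns a small list of view indices covering every coverable face.
    """
    covered = np.zeros(coverage.shape[1], dtype=bool)
    chosen = []
    while True:
        gain = (coverage & ~covered).sum(axis=1)  # new faces each view adds
        if gain.max() == 0:
            break                                 # nothing left to cover
        ratio = np.full(len(costs), np.inf)
        ratio[gain > 0] = costs[gain > 0] / gain[gain > 0]
        v = int(np.argmin(ratio))                 # cheapest cost per new face
        chosen.append(v)
        covered |= coverage[v]
    return chosen

# Three candidate views over four facade faces; two views suffice.
coverage = np.array([[1, 1, 0, 0],
                     [0, 0, 1, 1],
                     [1, 0, 0, 0]], dtype=bool)
costs = np.array([1.0, 1.0, 1.0])
views = select_views(coverage, costs)   # -> [0, 1]
```

Folding artifact penalties into the per-view cost is one way to trade coverage against warping and stitching quality, in the spirit of the objective described.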

22 citations