Proceedings ArticleDOI

Automated conversion of 2D to 3D image using manifold learning

TL;DR: An automatic 2D-to-3D image conversion technique is proposed using manifold learning and sequential labeling, which generates reliable and accurate 3D depth maps that are very close to ground-truth depths.
Abstract: An automatic technique for 2D-to-3D image conversion is proposed using manifold learning and sequential labeling, which generates reliable and accurate 3D depth maps that are very close to ground-truth depths. In this paper, locally linear embedding (LLE), a nonlinear, neighborhood-preserving embedding algorithm, is used for depth estimation from a 2D image. A fixed-point supervised learning algorithm is then applied to construct consistent and smooth 3D output. The high-dimensional data points (pixels) of the input frames are represented as linear combinations of their nearest neighbors, and a lower-dimensional point is reconstructed while preserving the local geometric properties of the frames. Neighbors are assigned to each input point in the image data set, and the weight vectors that best linearly reconstruct the input point from its neighbors are computed. To obtain the depth value of an input point in a new image, the reconstruction weights of its closest neighbors in the training samples are multiplied by their corresponding ground-truth depth values. The fixed-point learning algorithm takes the depths from the manifold and other image features as input vectors and generates more consistent and accurate depth images for better 3D conversion.
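As a rough illustration of the neighbor-weighting step described above, the following Python sketch solves the standard LLE constrained least-squares problem for a query point and then combines the ground-truth depths of its nearest training samples. The function names, feature representation, and k-nearest-neighbor search are illustrative assumptions, not the authors' exact pipeline.

    import numpy as np

    def lle_weights(x, neighbors, reg=1e-3):
        # Weights that best linearly reconstruct x from its neighbors,
        # under the standard LLE sum-to-one constraint.
        Z = neighbors - x                            # center neighbors on x
        C = Z @ Z.T                                  # k x k local Gram matrix
        C = C + reg * np.trace(C) * np.eye(len(C))   # regularize when k > dim
        w = np.linalg.solve(C, np.ones(len(C)))
        return w / w.sum()

    def estimate_depth(x, train_feats, train_depths, k=5):
        # Depth of a query point = reconstruction weights of its k nearest
        # training samples times their ground-truth depths.
        dist = np.linalg.norm(train_feats - x, axis=1)
        idx = np.argsort(dist)[:k]
        w = lle_weights(x, train_feats[idx])
        return float(w @ train_depths[idx])
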
Citations
Proceedings ArticleDOI
18 Dec 2016
TL;DR: The predicted depth maps are reliable, accurate, and very close to ground-truth depths, as validated using the objective measures RMSE, PSNR, and SSIM and the subjective measure MOS score.
Abstract: In this paper, the problem of depth estimation from a single monocular image is considered. Depth cues such as motion and stereo correspondences are not present in a single image, which makes the task more challenging. We propose a machine learning based approach for extracting depth information from a single image. Deep learning is used for extracting features; then, initial depths are generated using manifold learning with a neighborhood-preserving embedding algorithm. Fixed-point supervised learning is then applied for sequential labeling to obtain more consistent and accurate depth maps. The features used are the initial depths obtained from manifold learning and various image-based features, including texture, color, and edges, which provide useful information about depth. A fixed-point contraction mapping function is generated, with which a depth map is predicted for a new structured input image. A transfer learning approach is also used to improve learning on a new task by transferring knowledge from a related task that has already been learned. The predicted depth maps are reliable, accurate, and very close to ground-truth depths, as validated using the objective measures RMSE, PSNR, and SSIM and the subjective measure MOS score.
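A minimal sketch of the fixed-point sequential-labeling loop described above, assuming a regressor with a scikit-learn style predict() that was trained on per-pixel features concatenated with context depths; the feature layout and the convergence test are assumptions, not the paper's exact design.

    import numpy as np

    def fixed_point_refine(feats, init_depth, model, n_iters=10, tol=1e-4):
        # Feed the current depth estimate back in as an extra feature and
        # re-apply the model until the output stops changing; if the learned
        # function is a contraction, this iteration converges.
        depth = init_depth.copy()
        for _ in range(n_iters):
            X = np.column_stack([feats, depth.ravel()])  # append current labels
            new_depth = model.predict(X).reshape(depth.shape)
            if np.abs(new_depth - depth).mean() < tol:
                return new_depth
            depth = new_depth
        return depth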

2 citations

Journal ArticleDOI
TL;DR: This paper presents a robust depth image based rendering scheme for stereoscopic view synthesis that yields visually satisfactory results for high-quality 2D-to-3D conversion.
Abstract: Depth image based rendering (DIBR), which renders virtual views from a color image and the corresponding depth map, is one of the key procedures in the 2D-to-3D conversion process. However, some troubling problems, such as depth-edge misalignment, disocclusions, and cracks at resampling, still exist in current DIBR systems. To solve these problems, we present a robust depth image based rendering scheme for stereoscopic view synthesis. The core of the proposed scheme is a pair of depth map filters that share a common domain transform based filtering framework. As a first step, a filter in this framework realizes texture-depth boundary alignment and directional disocclusion-reduction smoothing simultaneously. Then, after depth map 3D warping, another adaptive filter is applied to the warped depth maps, with delivered scene gradient structures, to further diminish the remaining cracks and noise. Finally, with the optimized depth map of the virtual view, backward texture warping is adopted to retrieve the final virtual texture view. The proposed scheme yields visually satisfactory results for high-quality 2D-to-3D conversion. Experimental results demonstrate the excellent performance of the proposed approach.
Keywords: depth image-based rendering, domain transform, adaptive smoothing, stereoscopic image generation
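For intuition, here is a one-dimensional sketch of a domain transform recursive filter in the spirit of Gastal and Oliveira's framework, which the two depth-map filters above build on. Smoothing is strong where the guide image is flat and suppressed across its edges; the parameter values and single-row scope are illustrative.

    import numpy as np

    def domain_transform_filter_1d(signal, guide, sigma_s=20.0, sigma_r=0.1):
        # Distances in the warped domain grow with guide-image gradients,
        # so the recursive feedback weakens at edges.
        dt = 1.0 + (sigma_s / sigma_r) * np.abs(np.diff(guide))
        a = np.exp(-np.sqrt(2.0) / sigma_s * dt)   # per-gap feedback coefficient
        out = np.asarray(signal, dtype=float).copy()
        for i in range(1, len(out)):               # left-to-right pass
            out[i] = (1 - a[i - 1]) * out[i] + a[i - 1] * out[i - 1]
        for i in range(len(out) - 2, -1, -1):      # right-to-left pass
            out[i] = (1 - a[i]) * out[i] + a[i] * out[i + 1]
        return out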

Cites background from "Automated conversion of 2D to 3D image using manifold learning"

  • ...The depth cues used in these approaches are limited, so it is still hard to estimate accurate edge boundaries in depth maps, especially for automatic approaches that rely on a single cue [15]....


  • ...In practice, the depth maps estimated from various cues, especially by automatic 2D to 3D conversion schemes [15], may not align correctly with the corresponding color image, since the stereoscopic information is estimated under limited constraints from incomplete 2D cues....


References
Journal ArticleDOI
22 Dec 2000-Science
TL;DR: Locally linear embedding (LLE) is introduced, an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs that learns the global structure of nonlinear manifolds.
Abstract: Many areas of science depend on exploratory data analysis and visualization. The need to analyze large amounts of multivariate data raises the fundamental problem of dimensionality reduction: how to discover compact representations of high-dimensional data. Here, we introduce locally linear embedding (LLE), an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike clustering methods for local dimensionality reduction, LLE maps its inputs into a single global coordinate system of lower dimensionality, and its optimizations do not involve local minima. By exploiting the local symmetries of linear reconstructions, LLE is able to learn the global structure of nonlinear manifolds, such as those generated by images of faces or documents of text.
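LLE is available off the shelf; for instance, scikit-learn's LocallyLinearEmbedding reproduces the neighborhood-preserving embedding described above (the digits dataset here stands in for the face and text examples in the paper).

    from sklearn.datasets import load_digits
    from sklearn.manifold import LocallyLinearEmbedding

    # Embed 64-dimensional digit images into 2-D while preserving each
    # sample's local neighborhood structure.
    X, _ = load_digits(return_X_y=True)
    lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
    X2 = lle.fit_transform(X)   # (n_samples, 2) global coordinates
    print(X2.shape)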

15,106 citations


Additional excerpts

  • ...In [1], locally linear embedding (LLE), which is a manifold learning technique, is used for dimensionality reduction, with examples shown on images of faces, lips, and handwritten digits....


Journal ArticleDOI
TL;DR: This work considers the problem of estimating detailed 3D structure from a single still image of an unstructured environment and uses a Markov random field (MRF) to infer a set of "plane parameters" that capture both the 3D location and 3D orientation of the patch.
Abstract: We consider the problem of estimating detailed 3D structure from a single still image of an unstructured environment. Our goal is to create 3D models that are both quantitatively accurate as well as visually pleasing. For each small homogeneous patch in the image, we use a Markov random field (MRF) to infer a set of "plane parameters" that capture both the 3D location and 3D orientation of the patch. The MRF, trained via supervised learning, models both image depth cues as well as the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene; this enables the algorithm to capture much more detailed 3D structure than does prior art and also give a much richer experience in the 3D flythroughs created using image-based rendering, even for scenes with significant nonvertical structure. Using this approach, we have created qualitatively correct 3D models for 64.9 percent of 588 images downloaded from the Internet. We have also extended our model to produce large-scale 3D models from a few images.
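For intuition, a tiny sketch of the plane-parameter representation above: each patch's plane is encoded by a vector alpha with plane equation alpha . q = 1 for 3-D points q, so a pixel with unit viewing-ray direction r lies at depth d = 1 / (alpha . r). The toy values are illustrative, and the MRF inference over alpha is omitted.

    import numpy as np

    def patch_depths(alpha, rays):
        # Depth along each ray: the point d * r satisfies alpha . (d * r) = 1.
        return 1.0 / (rays @ alpha)

    # A fronto-parallel plane z = 5 corresponds to alpha = (0, 0, 0.2).
    rays = np.array([[0.0, 0.0, 1.0],
                     [0.1, 0.0, 0.995]])   # approximately unit ray directions
    print(patch_depths(np.array([0.0, 0.0, 0.2]), rays))   # ranges to the plane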

1,522 citations

Proceedings ArticleDOI
09 May 2011
TL;DR: A large-scale, hierarchical multi-view object dataset collected using an RGB-D camera is introduced, along with techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves the quality of results.
Abstract: Over the last decade, the availability of public image repositories and recognition benchmarks has enabled rapid progress in visual object category and instance detection. Today we are witnessing the birth of a new generation of sensing technologies capable of providing high quality synchronized videos of both color and depth, the RGB-D (Kinect-style) camera. With its advanced sensing capabilities and the potential for mass adoption, this technology represents an opportunity to dramatically increase robotic object recognition, manipulation, navigation, and interaction capabilities. In this paper, we introduce a large-scale, hierarchical multi-view object dataset collected using an RGB-D camera. The dataset contains 300 objects organized into 51 categories and has been made publicly available to the research community so as to enable rapid progress based on this promising technology. This paper describes the dataset collection procedure and introduces techniques for RGB-D based object recognition and detection, demonstrating that combining color and depth information substantially improves quality of results.

1,462 citations

Proceedings Article
05 Dec 2005
TL;DR: This work begins by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps, and applies supervised learning to predict the depthmap as a function of the image.
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.
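As a simplified stand-in for the discriminatively trained MRF above, the sketch below solves a Gaussian MRF that trades per-point depth predictions against neighbor smoothness; the quadratic objective reduces to a single sparse linear solve. The unary predictions, 4-connected grid, and uniform weighting are assumptions, not the paper's multiscale model.

    import numpy as np
    from scipy.sparse import diags, eye, identity, kron
    from scipy.sparse.linalg import spsolve

    def diff_op(n):
        # Forward-difference operator mapping n values to n - 1 neighbor gaps.
        return diags([-np.ones(n - 1), np.ones(n - 1)], [0, 1], shape=(n - 1, n))

    def gaussian_mrf_depth(unary, lam=1.0):
        # Minimize sum_i (d_i - u_i)^2 + lam * sum_{i~j} (d_i - d_j)^2,
        # i.e. solve (I + lam * L) d = u with L the grid Laplacian.
        H, W = unary.shape
        Dx = kron(eye(H), diff_op(W))   # horizontal neighbor differences
        Dy = kron(diff_op(H), eye(W))   # vertical neighbor differences
        A = identity(H * W) + lam * (Dx.T @ Dx + Dy.T @ Dy)
        return spsolve(A.tocsr(), unary.ravel()).reshape(H, W)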

1,079 citations

Proceedings Article
16 Jun 2013
TL;DR: An algorithm with a new perspective on layered models; it aims to find a fixed-point function with the structured labels being both the output and the input, which alleviates the burden in learning multiple/different classifiers in different layers.
Abstract: In this paper, we propose a simple but effective solution to the structured labeling problem: a fixed-point model. Recently, layered models with sequential classifiers/regressors have gained an increasing amount of interest for structural prediction. Here, we design an algorithm with a new perspective on layered models; we aim to find a fixed-point function with the structured labels being both the output and the input. Our approach alleviates the burden of learning multiple/different classifiers in different layers. We devise a training strategy for our method and provide justifications for the fixed-point function to be a contraction mapping. The learned function captures rich contextual information and is easy to train and test. On several widely used benchmark datasets, the proposed method shows significant improvement in both performance and efficiency over many state-of-the-art algorithms.
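A minimal sketch of fixed-point inference as described above: the structured labels are fed back into the learned function until they stop changing, and a contraction mapping guarantees convergence to a unique fixed point. The toy linear contraction is illustrative only.

    import numpy as np

    def fixed_point_predict(f, x, y0, max_iters=50, tol=1e-6):
        # Iterate y <- f(x, y); for a contraction in y, Banach's fixed-point
        # theorem guarantees convergence to a unique solution.
        y = y0
        for _ in range(max_iters):
            y_next = f(x, y)
            if np.max(np.abs(y_next - y)) < tol:
                return y_next
            y = y_next
        return y

    # Toy contraction f(x, y) = x + 0.5 * y has unique fixed point y* = 2 * x.
    x = np.array([1.0, -2.0])
    print(fixed_point_predict(lambda x, y: x + 0.5 * y, x, np.zeros_like(x)))
    # -> approximately [ 2. -4.]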

44 citations