
Showing papers on "2D to 3D conversion published in 2018"


Book ChapterDOI
08 Sep 2018
TL;DR: This paper proposes a new method for 3D hand pose estimation from a monocular image via a novel 2.5D pose representation, implicitly learning depth maps and heatmap distributions with a new CNN architecture.
Abstract: Estimating the 3D pose of a hand is an essential part of human-computer interaction. Estimating 3D pose using depth or multi-view sensors has become easier with recent advances in computer vision; however, regressing pose from a single RGB image is much less straightforward. The main difficulty arises from the fact that 3D pose requires some form of depth estimate, which is ambiguous given only an RGB image. In this paper we propose a new method for 3D hand pose estimation from a monocular image through a novel 2.5D pose representation. Our new representation estimates pose up to a scaling factor, which can be estimated additionally if a prior on the hand size is given. We implicitly learn depth maps and heatmap distributions with a novel CNN architecture. Our system achieves state-of-the-art accuracy for 2D and 3D hand pose estimation on several challenging datasets in the presence of severe occlusions.
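As a rough illustration of the scale ambiguity mentioned above, here is a minimal Python sketch of recovering metric scale for a root-relative 3D pose when a hand-size prior (the length of one reference bone) is available. The joint count, bone indices, and 70 mm length are hypothetical, not values from the paper.

```python
import numpy as np

def recover_absolute_scale(rel_pose, bone, known_length_mm):
    """Rescale a root-relative 3D pose, known only up to scale
    (e.g. the output of a 2.5D regressor), to metric units.

    rel_pose:        (J, 3) root-relative joint positions.
    bone:            (parent_idx, child_idx) of a reference bone whose
                     real-world length is assumed known.
    known_length_mm: prior on that bone's length, e.g. from a hand model.
    """
    p, c = bone
    predicted_length = np.linalg.norm(rel_pose[p] - rel_pose[c])
    scale = known_length_mm / predicted_length
    return scale * rel_pose

# toy usage: rescale a hypothetical 21-joint hand so that the
# reference bone (joints 0 -> 1) measures 70 mm
pose = np.random.randn(21, 3)
metric_pose = recover_absolute_scale(pose, bone=(0, 1), known_length_mm=70.0)
```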

286 citations


Journal ArticleDOI
TL;DR: A deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks, named global and local networks, which have contrasting architectures and are designed to capture depth information with complementary attributes.
Abstract: Recent works on machine learning have greatly advanced the accuracy of single image depth estimation. However, the resulting depth images are still over-smoothed and perceptually unsatisfying. This paper casts depth prediction from a single image as a parametric learning problem. Specifically, we propose a deep variational model that effectively integrates heterogeneous predictions from two convolutional neural networks (CNNs), named global and local networks. They have contrasting architectures and are designed to capture depth information with complementary attributes. These intermediate outputs are then combined in an integration network based on the variational framework. By unrolling the optimization steps of Split Bregman iterations in the integration network, our model can be trained in an end-to-end manner. This enables one to simultaneously learn an efficient parameterization of the CNNs and the hyper-parameters of the variational method. Finally, we offer a new dataset of 0.22 million RGB-D images captured by a Microsoft Kinect v2. Our model generates realistic and discontinuity-preserving depth predictions without involving any low-level segmentation or superpixels. Extensive experiments demonstrate the superiority of the proposed method on a range of RGB-D benchmarks, including both indoor and outdoor scenarios.
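The core idea, fusing two CNN depth predictions inside an unrolled optimization, can be sketched generically. The following is not the paper's Split Bregman integration network, only a minimal unrolled variational fusion with a smoothed total-variation prior; the step size, regularization weight, and iteration count are assumptions.

```python
import numpy as np

def fuse_depths(d_global, d_local, lam=0.1, steps=20, tau=0.2, eps=1e-3):
    """Unroll a few gradient steps on
        E(D) = ||D - Dg||^2 + ||D - Dl||^2 + lam * TV(D)
    to fuse a global and a local depth prediction (both (H, W) arrays).
    """
    d = 0.5 * (d_global + d_local)                    # initial estimate
    for _ in range(steps):
        grad = (d - d_global) + (d - d_local)         # data terms
        # smoothed-TV gradient: minus the divergence of the
        # normalized depth-gradient field
        gx = np.diff(d, axis=1, append=d[:, -1:])
        gy = np.diff(d, axis=0, append=d[-1:, :])
        mag = np.sqrt(gx**2 + gy**2 + eps**2)
        nx, ny = gx / mag, gy / mag
        div = (nx - np.roll(nx, 1, axis=1)) + (ny - np.roll(ny, 1, axis=0))
        grad -= lam * div
        d -= tau * grad
    return d
```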

91 citations


Journal ArticleDOI
TL;DR: This work presents an automatic learning-based 2D-to-3D image conversion approach, based on the key hypothesis that color images with similar structure likely present a similar depth structure, and estimates the depth of a color query image using the prior knowledge provided by a repository of color + depth images.
Abstract: There has been a significant increase in the availability of 3D players and displays in recent years. Nonetheless, the amount of 3D content has not grown at a comparable rate. To alleviate this problem, many algorithms for converting images and videos from 2D to 3D have been proposed. Here, we present an automatic learning-based 2D-to-3D image conversion approach, based on the key hypothesis that color images with similar structure likely present a similar depth structure. The presented algorithm estimates the depth of a color query image using the prior knowledge provided by a repository of color + depth images. The algorithm clusters this database according to structural similarity, and then creates a representative of each color-depth image cluster that is used as a prior depth map. The appropriate prior depth map for a given color query image is selected by comparing the structural similarity in the color domain between the query image and the database. The comparison is based on a K-Nearest Neighbor framework that uses a learning procedure to build an adaptive combination of image feature descriptors. The best correspondences determine the cluster and, in turn, the associated prior depth map. Finally, this prior estimate is enhanced through segmentation-guided filtering to obtain the final depth map. The approach has been tested on two publicly available databases and compared with several state-of-the-art algorithms to demonstrate its effectiveness.
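A minimal sketch of the retrieval step described above: choose the prior depth map whose cluster representative is structurally closest to the query. The descriptor choice and the inverse-distance weighting of the k best clusters are assumptions for illustration.

```python
import numpy as np

def select_prior_depth(query_feat, cluster_feats, cluster_priors, k=3):
    """Pick a prior depth map for a query image by structural similarity.

    query_feat:     (F,) descriptor of the query color image
                    (e.g. a GIST- or HOG-like feature; choice assumed).
    cluster_feats:  (C, F) representative descriptor of each
                    color+depth cluster.
    cluster_priors: list of C prior depth maps, one per cluster.
    """
    dists = np.linalg.norm(cluster_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)   # closer cluster, larger weight
    weights /= weights.sum()
    prior = sum(w * cluster_priors[i] for w, i in zip(weights, nearest))
    return prior   # to be refined by segmentation-guided filtering
```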

22 citations


Journal ArticleDOI
TL;DR: This work proposes a system to infer binocular disparity from a monocular video stream in real-time; the disparity is numerically inaccurate but yields a very similar overall depth impression, with plausible layout, sharp edges, fine details, and agreement between luminance and disparity.
Abstract: We propose a system to infer binocular disparity from a monocular video stream in real-time. Different from classic reconstruction of physical depth in computer vision, we compute perceptually plausible disparity that is numerically inaccurate, but results in a very similar overall depth impression with plausible overall layout, sharp edges, fine details and agreement between luminance and disparity. We use several simple monocular cues to estimate disparity maps and confidence maps of low spatial and temporal resolution in real-time. These are complemented by spatially-varying, appearance-dependent and class-specific disparity prior maps, learned from example stereo images. Scene classification selects this prior at runtime. Fusion of prior and cues is done by means of robust MAP inference on a dense spatio-temporal conditional random field with high spatial and temporal resolution. Using normal distributions allows this in constant-time, parallel per-pixel work. We compare our approach to previous 2D-to-3D conversion systems in terms of different metrics, as well as a user study, and validate our notion of perceptually plausible disparity.
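Why do normal distributions allow constant-time, parallel per-pixel work? Because the product of two Gaussians is again Gaussian, MAP fusion of a cue and a prior has a closed form at every pixel. A minimal per-pixel sketch (the array shapes and the two-term setup are illustrative assumptions):

```python
import numpy as np

def fuse_gaussian_disparity(mu_cue, var_cue, mu_prior, var_prior):
    """Per-pixel MAP fusion of a disparity cue and a learned prior,
    each modeled as a normal distribution. All inputs are (H, W):
        var = 1 / (1/var_cue + 1/var_prior)
        mu  = var * (mu_cue/var_cue + mu_prior/var_prior)
    """
    var = 1.0 / (1.0 / var_cue + 1.0 / var_prior)
    mu = var * (mu_cue / var_cue + mu_prior / var_prior)
    return mu, var
```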

10 citations


Journal ArticleDOI
TL;DR: It is shown that the proposed novel, data-driven method for 2-D-to-3-D video conversion significantly outperforms the current state-of-the-art methods and produces high-quality 3-D videos that are almost indistinguishable from videos shot by stereo cameras.
Abstract: A wide adoption of 3-D videos is hindered by the lack of high-quality 3-D content. One promising solution to this problem is data-driven 2-D-to-3-D video conversion. Such approaches are based on learning depth maps from a large dataset of 2-D+Depth images. However, current conversion methods, while general, produce low-quality results with artifacts that are not acceptable to many viewers. We propose a novel, data-driven method for 2-D-to-3-D video conversion. Our method transfers the depth gradients from a large database of 2-D+Depth images. Capturing 2-D+Depth databases, however, is complex and costly, especially for outdoor sports games. We address this problem by creating a synthetic database from computer games and showing that this synthetic database can effectively be used to convert real videos. We propose a spatio-temporal method to ensure the smoothness of the generated depth within individual frames and across successive frames. In addition, we present an object boundary detection method customized for 2-D-to-3-D conversion systems, which produces clear depth boundaries for players. We implement our method and validate it by conducting user studies that evaluate depth perception and visual comfort of the converted 3-D videos. We show that our method produces high-quality 3-D videos that are almost indistinguishable from videos shot by stereo cameras. In addition, our method significantly outperforms the current state-of-the-art methods. For example, up to 20% improvement in the perceived depth is achieved by our method, which translates to improving the mean opinion score from good to excellent.
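Transferring depth gradients rather than depth values implies a reconstruction step: recover a depth map whose gradients match the transferred fields. A generic gradient-domain sketch using Jacobi iterations on the Poisson equation (not the paper's exact spatio-temporal solver; boundary handling and iteration count are assumptions):

```python
import numpy as np

def depth_from_gradients(gx, gy, iters=500):
    """Reconstruct depth D with  lap(D) = div(g)  by Jacobi iteration,
    where (gx, gy) are transferred depth-gradient fields of shape (H, W).
    np.roll gives periodic boundaries, acceptable for a sketch.
    """
    div = (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))
    d = np.zeros_like(gx)
    for _ in range(iters):
        neighbors = (np.roll(d, 1, 0) + np.roll(d, -1, 0) +
                     np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d = 0.25 * (neighbors - div)      # 5-point-stencil Jacobi update
    return d
```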

10 citations


Journal ArticleDOI
TL;DR: A stereoview-to-multiview conversion system, including stereo matching and depth image-based rendering (DIBR) hardware designs, is proposed; it achieves an average peak signal-to-noise ratio of 30.2 dB and a structural similarity of 0.94 on the tested images.
Abstract: In this paper, a stereoview-to-multiview conversion system, which includes stereo matching and depth image-based rendering (DIBR) hardware designs, is proposed. To achieve an efficient architecture, the proposed stereo matching algorithm simply generates the raw matching costs and aggregates them using 1D iterative aggregation schemes. For the DIBR architecture, an inpainting-based method is used to find the most similar patch from the background, according to depth information. Simulation results show that the designed architecture achieves an average peak signal-to-noise ratio of 30.2 dB and a structural similarity of 0.94 on the tested images. The hardware design for the proposed 2D to 3D conversion system operates at a maximum clock frequency of 160.2 MHz, outputting 1080p (1920 × 1080) video at 60 frames per second.
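A software sketch of the hardware-friendly scheme named above: build a raw cost volume, then aggregate along 1D scanlines in a forward and a backward pass before a winner-takes-all decision. The decay weight p1 and the aggregation details are assumptions, not the paper's design.

```python
import numpy as np

def aggregate_costs_1d(cost, p1=0.1):
    """cost: (H, W, D) raw matching-cost volume (e.g. absolute color
    difference per disparity hypothesis). Returns an (H, W) disparity map.
    """
    agg = cost.astype(np.float64).copy()
    h, w, _ = agg.shape
    for x in range(1, w):                   # left-to-right aggregation
        agg[:, x] += p1 * agg[:, x - 1]
    for x in range(w - 2, -1, -1):          # right-to-left aggregation
        agg[:, x] += p1 * agg[:, x + 1]
    return np.argmin(agg, axis=2)           # winner-takes-all
```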

5 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel method for 2D-to-3D video conversion that automatically generates the depth map from boundary information, using a Gaussian model to detect foreground objects and then separate the foreground from the background.
Abstract: This paper proposes a novel method for 2D-to-3D video conversion, based on boundary information, to automatically generate the depth map. First, we use a Gaussian model to detect foreground objects and then separate the foreground from the background. Second, we employ a superpixel algorithm to find edge information; based on the superpixels, we assign hierarchical depth values to an initial depth map. From this depth assignment, we detect edges with Sobel edge detection using two thresholds to strengthen the edge information, and apply a thinning algorithm to identify the boundary pixels and refine the edge detection. Following these results, we refine the assigned foreground depth values, using four scanning paths over the entire image to create a more accurate depth map; this yields the final depth map. Finally, we utilize depth image-based rendering (DIBR) to synthesize left- and right-view images. Combining the depth map with the original 2D video produces a vivid 3D video.
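A minimal sketch of the double-threshold Sobel step; the threshold values are assumptions.

```python
import cv2
import numpy as np

def strengthened_edges(gray, t_low=40.0, t_high=120.0):
    """Double-threshold Sobel edge map for an (H, W) uint8 frame:
    strong edges -> 255, weak edges -> 128, non-edges -> 0.
    """
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx**2 + gy**2)
    edges = np.zeros(gray.shape, dtype=np.uint8)
    edges[mag >= t_low] = 128      # weak edge
    edges[mag >= t_high] = 255     # strong edge
    return edges
```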

5 citations


Proceedings ArticleDOI
01 Dec 2018
TL;DR: This paper introduces a novel interactive depth map creation approach for image sequences, which takes depth scribbles at user-defined keyframes as input and propagates them across the entire sequence using a 3-dimensional geodesic distance transform (3D-GDT).
Abstract: In this paper, we introduce a novel interactive depth map creation approach for image sequences which uses depth scribbles as input at user-defined keyframes. These scribbled depth values are then propagated within these keyframes and across the entire sequence using a 3-dimensional geodesic distance transform (3D-GDT). In order to further improve the depth estimation of the intermediate frames, we make use of a convolutional neural network (CNN) in an unconventional manner. Our process is based on online learning, which allows us to train a disposable network specifically for each sequence, using the user-generated depth at keyframes along with the corresponding RGB images as training pairs. Thus, we actually take advantage of one of the most common issues in deep learning: over-fitting. Furthermore, we integrated this approach into a professional interactive depth map creation application and compared our results against the state of the art in interactive depth map creation.
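A minimal PyTorch sketch of the deliberate over-fitting idea: train a small, disposable network on one sequence's keyframes only, then run it on every frame of that sequence and discard it. The architecture, loss, and hyper-parameters below are placeholders, not the paper's.

```python
import torch
import torch.nn as nn

def overfit_sequence_net(keyframe_rgbs, keyframe_depths, epochs=200):
    """Deliberately over-fit a tiny CNN to one sequence's keyframes.

    keyframe_rgbs:   list of (3, H, W) float tensors
    keyframe_depths: list of (1, H, W) float tensors (scribble-derived)
    """
    net = nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(epochs):
        for rgb, depth in zip(keyframe_rgbs, keyframe_depths):
            opt.zero_grad()
            loss = nn.functional.l1_loss(net(rgb[None]), depth[None])
            loss.backward()
            opt.step()
    return net   # apply to all frames of this sequence, then discard
```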

3 citations


Proceedings ArticleDOI
Haitao Liang, Xiu Su, Yilin Liu, Huaiyuan Xu, Yi Wang, Xiaodong Chen
12 Jan 2018
TL;DR: A hole-filling algorithm that uses depth-map information to process the image after DIBR, effectively improving the subjective and objective quality of the 3D virtual view.
Abstract: In 3D view generation, a new virtual view is synthesized through depth image-based rendering (DIBR) using a single color image and its associated depth map. Holes are unavoidably generated in this 2D to 3D conversion process. We propose a hole-filling method based on the depth map to address the problem. First, we improve the DIBR process by proposing a one-to-four (OTF) algorithm, using the "z-buffer" algorithm to solve the overlap problem. Then, based on the classical patch-based algorithm of Criminisi et al., we propose a hole-filling algorithm that uses depth-map information to process the image after DIBR. To improve the accuracy of the virtual image, inpainting starts from the background side. In the priority calculation, we add a depth term to the usual confidence and data terms, and in the search for the most similar patch in the source region, we define a depth similarity to improve the search accuracy. Experimental results show that the proposed method effectively improves the quality of the 3D virtual view, both subjectively and objectively.
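A sketch of the modified patch priority: alongside Criminisi-style confidence and data terms, a depth term biases filling toward background (larger-depth) patches so inpainting starts from the background side. The multiplicative combination below is an assumed form, not the paper's exact formula.

```python
import numpy as np

def patch_priority(confidence, data_term, depth, point, patch=9):
    """Priority of the fill-front patch centered at `point` (y, x).
    confidence, data_term, depth: (H, W) arrays; the patch is assumed
    to lie fully inside the image for simplicity.
    """
    y, x = point
    r = patch // 2
    c = confidence[y - r:y + r + 1, x - r:x + r + 1].mean()  # C(p)
    z = depth[y - r:y + r + 1, x - r:x + r + 1].mean()       # depth term
    return c * data_term[y, x] * z    # larger z favors background patches
```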

3 citations


Journal ArticleDOI
TL;DR: The experimental results verify that the proposed depth prediction framework outperforms several existing depth estimation methods and is efficient at producing 3D views.
Abstract: With progress in 3D video technology, 2D to 3D conversion has drawn great attention in recent years. Predicting perceptually reasonable depth information from traditional monocular videos is a challenging task in 2D to 3D conversion. To generate convincing depth maps, an efficient depth estimation algorithm utilizing non-parametric learning and bi-directional depth propagation is proposed. First, global depth maps for key frames are generated on the basis of gradient samples and the gradient reconstruction method. Then, foreground objects are extracted and employed to refine the global depth maps, producing more local depth details. Next, the depth information of key frames is propagated using bi-directional motion compensation to recover the forward and backward depth information of non-key frames. Finally, a weighted fusion strategy is designed to integrate the forward and backward depths, predicting the depth information of each non-key frame. The quality of the estimated depth maps is assessed with objective and subjective quality evaluation criteria. The experimental results verify that the proposed depth prediction framework outperforms several existing depth estimation methods and is efficient at producing 3D views.
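The final fusion step can be sketched as temporal weighting of the two propagated estimates. Weighting each direction by its distance to the key frame it came from is a common choice and an assumption here; the abstract does not specify the paper's exact weights.

```python
def fuse_bidirectional_depth(d_fwd, d_bwd, t, t_prev_key, t_next_key):
    """Fuse forward- and backward-propagated depth maps (H, W) for a
    non-key frame at time t lying between two key frames.
    """
    # closer to the previous key frame -> trust the forward propagation
    w_fwd = (t_next_key - t) / (t_next_key - t_prev_key)
    return w_fwd * d_fwd + (1.0 - w_fwd) * d_bwd
```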

3 citations


Patent
30 Nov 2018
TL;DR: In this article, the authors present a method and a device for carrying out 2D to 3D conversion on an image using a pre-built and trained parallax information extraction model.
Abstract: Embodiments of the invention disclose a method and a device for carrying out 2D to 3D conversion on an image. The 2D to 3D conversion efficiency and effect of the image can be improved. The method comprises the steps of: S1, acquiring a to-be-processed 2D image, and inputting the to-be-processed 2D image into a pre-built and trained parallax information extraction model to obtain a parallax information image; and S2, carrying out three-dimensional rendering on the to-be-processed 2D image in combination with the parallax information image, and performing three-dimensional reconstruction on the to-be-processed 2D image.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed residual-driven energy function removes unwanted scribbles successfully while preserving expected input, and it outperforms the state-of-the-art when presented with cross-boundary scribbles.
Abstract: Semi-automatic 2D-to-3D conversion provides a cost-effective solution to the problem of 3D content shortage. The performance of most methods degrades significantly when cross-boundary scribbles are present, due to their inability to remove unwanted input. To address this problem, a residual-driven energy function is proposed to remove unwanted input introduced by cross-boundary scribbles while preserving expected user input. Firstly, the confidence of user input is computed from residuals between the estimated and user-specified depth values, and it is applied to the data fidelity term. Secondly, the residual-driven optimization is performed to estimate dense depth from user scribbles. The procedure is repeated until a maximum number of iterations is reached. Input confidence based on residuals avoids the propagation of unwanted scribbles and thus makes it possible to generate high-quality depth even with cross-boundary input. Experimental results demonstrate that the proposed method removes unwanted scribbles successfully while preserving expected input, and it outperforms the state-of-the-art when presented with cross-boundary scribbles.
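A sketch of the residual-driven loop: solve a weighted least-squares depth interpolation, then down-weight scribbles with large residuals (likely cross-boundary) and re-solve. The Gaussian confidence model and the matting-Laplacian-style smoothness term are assumptions, not the paper's exact formulation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def residual_driven_depth(laplacian, scribble_mask, scribble_depth,
                          lam=100.0, sigma=0.1, iters=5):
    """laplacian:      (N, N) sparse smoothness matrix over flattened pixels
    scribble_mask:  (N,) bool, True where the user scribbled
    scribble_depth: (N,) user-specified depth values (0 elsewhere)
    """
    conf = scribble_mask.astype(float)
    for _ in range(iters):
        A = (laplacian + sp.diags(lam * conf)).tocsr()
        d = spla.spsolve(A, lam * conf * scribble_depth)
        r = np.where(scribble_mask, d - scribble_depth, 0.0)
        conf = scribble_mask * np.exp(-(r / sigma) ** 2)  # down-weight outliers
    return d
```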


Proceedings ArticleDOI
01 Dec 2018
TL;DR: Results indicate that the proposed approach, which estimates depth from single 2D images via transfer learning with a pre-trained deep learning model, outperforms state-of-the-art methods.
Abstract: Nowadays, depth estimation from a single 2D image is a prominent task due to its numerous applications, such as 2D to 3D image/video conversion, robot vision, and self-driving cars. This research proposes a novel automatic technique for depth estimation from single 2D images via transfer learning with a pre-trained deep learning model. This is a challenging problem, as a single 2D image does not carry any cues regarding depth. To tackle it, the pool of available images for which depth is known is exploited, following the hypothesis that color images with similar semantics are likely to have similar depths. Along these lines, the depth of the input image is predicted from the depth maps of semantically similar images in the dataset, retrieved using high-level features of a pre-trained deep learning model followed by a classifier (K-Nearest Neighbor). Afterward, a cross bilateral filter is applied to remove spurious depth variations in the depth map. To evaluate the quality of the presented approach, experiments have been conducted on two publicly available benchmark datasets, NYU (v2) and Make3D. The results indicate that the proposed approach outperforms state-of-the-art methods.
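A plain cross (joint) bilateral filter sketch for the smoothing step named above: depth is averaged within a window, weighted by spatial distance and by color similarity in the guide image. The window radius and sigmas are assumptions; this brute-force version is written for clarity, not speed.

```python
import numpy as np

def cross_bilateral_filter(depth, guide, radius=5, sigma_s=3.0, sigma_r=0.1):
    """depth: (H, W) float; guide: (H, W, 3) float color image in [0, 1]."""
    h, w = depth.shape
    out = np.zeros_like(depth)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            spatial = np.exp(-((yy - y)**2 + (xx - x)**2) / (2 * sigma_s**2))
            color = np.exp(-np.sum((guide[y0:y1, x0:x1] - guide[y, x])**2,
                                   axis=2) / (2 * sigma_r**2))
            wgt = spatial * color
            out[y, x] = (wgt * depth[y0:y1, x0:x1]).sum() / wgt.sum()
    return out
```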

Patent
07 Dec 2018
TL;DR: In this article, a semi-automatic 2D to 3D method with fault tolerance comprises the following steps: key frame extraction, user annotation, annotation extraction and depth conversion, super pixel segmentation, planefitting, representative pixel extraction, energy function construction with fault-tolerant structure correlation, generalized iterative re-weighted least squares method and depth map-based drawing.
Abstract: A semi-automatic 2D to 3D method with fault tolerance comprises the following steps: key frame extraction, user annotation, annotation extraction and depth conversion, super pixel segmentation, planefitting, representative pixel extraction, structural difference description, fault tolerance mechanism, energy function construction with fault-tolerant structure correlation, generalized iterative re-weighted least squares method and depth map-based drawing. The method has the following beneficial effects: a deep propagation energy model related to a fault-tolerant structure is established by means of robust estimation, and local and global constraint relationships are combined, so that the prior knowledge of a scenario can be utilized more effectively, and the quality of the estimated depthmap is enhanced while the user annotation difficulty in semi-automatic 2D to 3D conversion is reduced.

Patent
11 Oct 2018
TL;DR: A virtual-reality headset provides enhanced immersion in a 2D game using a low-latency 2D-to-3D conversion that identifies image object edges in the form of vertically adjacent pixels differing in color by more than a predetermined threshold.
Abstract: A virtual-reality (VR) headset provides enhanced immersion in a two-dimensional (2D) game using a low-latency 2D-to-3D conversion. Columns of pixels are scanned to identify image object edges in the form of vertically adjacent pixels that differ in color by more than a predetermined threshold. To reduce processing time, only every third column is scanned, and within each scanned column only every third pixel is examined. Object strips are identified by correlating vertical pairs of edges. Identified object strips are vertically enlarged to produce one image of a stereo pair; object strips are not resized for the other image of the pair. The images of the stereo pair are presented to the left and right eyes of a player, respectively, to provide a more immersive experience.
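A minimal sketch of the subsampled edge scan described above: examine every third column and, within it, every third pixel, flagging vertically adjacent samples whose colors differ by more than a threshold. The step sizes follow the abstract; the threshold value and data layout are assumptions.

```python
import numpy as np

def find_object_edges(frame, col_step=3, row_step=3, thresh=60):
    """frame: (H, W, 3) uint8. Returns (col, row) samples where a
    vertical color jump exceeds `thresh` (sum of per-channel diffs).
    """
    edges = []
    h, w, _ = frame.shape
    for x in range(0, w, col_step):                    # every third column
        col = frame[::row_step, x].astype(np.int32)    # every third pixel
        jump = np.abs(np.diff(col, axis=0)).sum(axis=1)
        for i in np.nonzero(jump > thresh)[0]:
            edges.append((x, i * row_step))
    return edges   # vertical edge pairs are then correlated into strips
```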

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This paper proposes an algorithm for automatically propagating foreground depth from key frames, where the foreground is segmented and depth-assigned manually with supporting computer tools, and significantly improves the resulting foreground depth map.
Abstract: Depth map estimation is important in 2D to 3D video conversion. Normally, the background is static or changes slowly, while the foreground may change substantially between consecutive frames. A good strategy is to estimate depths for the foreground and background separately and then combine them into the final depth map. In this paper we propose, for non-key frames, an algorithm that automatically propagates foreground depth from key frames, where the foreground is segmented and depth-assigned manually with supporting computer tools. For each non-key frame, the foreground region is segmented independently based on the graph-cut and GMM (Gaussian Mixture Model) algorithms. A superpixel algorithm is then applied to the foreground area only, partitioning it into homogeneous patches. To propagate the foreground depths from key frames, superpixel matching (based on color components and foreground labels) is performed between each non-key frame and its reference frame, with the background removed. We then refine the foreground depths using bilateral filtering. Experiments show that, compared to conventional block matching, optical flow, and superpixel algorithms, our method better resists large foreground motion and erroneous matching caused by background interference (similar colors). Overall, our algorithm improves the resulting foreground depth map significantly.
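A sketch of the superpixel matching step: each foreground superpixel of a non-key frame is matched to the most similar foreground superpixel of its reference frame, so the manually assigned depth can be propagated. Mean-color descriptors are an assumption; the foreground labels are reflected here by masking out background patches.

```python
import numpy as np

def match_superpixels(feat_nonkey, feat_key, is_fg_key):
    """feat_nonkey: (N, F) descriptors of non-key-frame foreground patches
    feat_key:    (M, F) descriptors of reference-frame patches
    is_fg_key:   (M,) bool, True for foreground patches
    Returns, for each non-key patch, the index of its match in the key frame.
    """
    matches = []
    for f in feat_nonkey:
        dist = np.linalg.norm(feat_key - f, axis=1)
        dist[~is_fg_key] = np.inf        # never match into the background
        matches.append(int(np.argmin(dist)))
    return matches
```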

Patent
02 Oct 2018
TL;DR: A method and system for automatically converting 2D to 3D comprises the steps of evaluating the total depth value and determining the total depth value and the display position; segmenting an image to be converted into a specific number of independent image segmentation planes according to a grayscale threshold segmentation method; automatically creating a depth map for each independent segmentation plane, automatically tracking image frames of a video object and the background when converting a video, and associating the depth maps; and generating a 3D stereoscopic image or video through iterative rendering.
Abstract: The invention discloses a method and system for automatically converting 2D to 3D. The method comprises the steps of: evaluating the total depth value, and determining the total depth value and the display position; segmenting an image to be converted into a specific number of independent image segmentation planes according to a grayscale threshold segmentation method; automatically creating a depth map for each independent image segmentation plane, automatically tracking image frames of a video object and the background when converting a video, and associating the depth maps; generating a 3D stereoscopic image or video through iterative rendering; and adjusting the parameters of the generated 3D stereoscopic image with an adjustment device so that it reaches the desired effect. The system comprises devices (A, B, C, D, E, F) corresponding to the steps of the method, plus the adjustment device. The implementation of the invention simplifies the depth-level and dynamic-change steps, obtains a good 3D stereoscopic effect, and realizes automatic conversion of 3D stereoscopic images and video.

Patent
18 Dec 2018
TL;DR: Zhang et al. propose a monocular depth estimation method with augmented ordinal depth relations, comprising: (1) introducing a new relative-depth stereoscopic (RDIS) dataset densely labeled with relative depth; (2) pre-training a ResNet model on the dense RDIS dataset; and (3) recovering metric depth predictions by normalizing each predicted relative depth map so that its mean and standard deviation match the metric ground-truth depth of the training set.
Abstract: The invention discloses a monocular depth estimation method with augmented ordinal depth relations, with the following concrete steps: (1) introducing a new relative-depth stereoscopic (RDIS) dataset densely labeled with relative depth; (2) pre-training a ResNet model on the dense RDIS dataset; (3) recovering metric depth predictions by normalizing each predicted relative depth map so that its mean and standard deviation match the metric ground-truth depth of the training set. In this manner, the invention provides a monocular depth estimation method with augmented ordinal depth relations. Ground-truth relative depth is obtained through existing stereo algorithms and manual post-processing, which greatly improves the performance of single-image depth estimation. The proposed relative-depth learning scheme based on the dense RDIS dataset enables conversion from 2D to 3D.

Journal ArticleDOI
TL;DR: A 'synthetic' database is used to provide the first approximation through comparison techniques, which is then fed to the predictive tool; it is believed that this work will provide a basis for developing an efficient 2D to 3D conversion methodology.
Abstract: Conventional 2D to 3D rendering techniques involve a sequential process: input images are grouped based on edge information, and predictive algorithms assign depth values to pixels with the same hue. The iterative calculations and the volume of data under scrutiny to assign 'real-time' values raise latency issues and cost considerations. For commercial consumption, where speed and accuracy define the viability of a product, there is a need to reorient the approach used in present methodologies. In predictive methodologies, one of the core interests is achieving an initial approximation as close to the 'real' value as possible. In this work, a 'synthetic' database is used to provide the first approximation through comparison techniques, which is then fed to the predictive tool. It is believed that this work will provide a basis for developing an efficient 2D to 3D conversion methodology.

Journal ArticleDOI
TL;DR: A piecewise-continuity regularized low-rank matrix recovery method that demonstrates a significant advantage in preserving continuous depth transitions between neighboring regions in semi-automatic 2D-to-3D conversion.
Abstract: Semi-automatic 2D-to-3D conversion is a promising solution to 3D stereoscopic content creation. However, the continuous depth transition between user-marked neighboring regions is lost when user scribbles are sparse. To help solve this problem, a piecewise-continuity regularized low-rank matrix recovery method is developed. Our approach is based on the fact that a depth-map can be decomposed into a low-rank matrix and an outlier term matrix. First, an initial dense depth-map is interpolated from the user scribbles using a matting Laplacian scheme, under the assumption that the depth-map is piecewise-continuous. Second, a piecewise-continuity constrained low-rank recovery model is developed to remove outliers introduced by the interpolation. Experimental comparisons with existing algorithms show that our method has a significant advantage in preserving continuous depth transitions between neighboring regions.
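The decomposition the abstract relies on, depth = low-rank part + outlier term, can be sketched with a plain RPCA-style alternation: singular-value thresholding for the low-rank part, entrywise shrinkage for the outliers. This omits the paper's piecewise-continuity regularizer; lam and the iteration count are assumptions.

```python
import numpy as np

def low_rank_depth_refine(depth, lam=0.05, iters=50):
    """Split an (H, W) depth map as depth ~ L + S, with L low-rank and
    S sparse outliers, by alternating proximal updates.
    """
    L = np.zeros_like(depth)
    S = np.zeros_like(depth)
    for _ in range(iters):
        # low-rank update: singular-value thresholding of depth - S
        U, sv, Vt = np.linalg.svd(depth - S, full_matrices=False)
        L = (U * np.maximum(sv - lam, 0.0)) @ Vt
        # outlier update: entrywise soft-thresholding of the residual
        R = depth - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    return L, S
```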

Journal Article
TL;DR: Two types of methods are developed: one learns a point mapping from local image attributes, such as color and spatial position; the other globally estimates the entire depth map of a query image directly from a repository of 3D images.
Abstract: In the last few years, the availability of 3D content has remained well below that of its 2D counterpart. Hence, many 2D-to-3D image conversion methods have been proposed. Methods involving human operators have been the most successful, but are also time-consuming and costly. Automatic methods, which make use of a deterministic 3D scene model, have not yet achieved the same level of quality, for they rely on assumptions that are often violated in practice. Here, two types of methods are developed. The first is based on learning a point mapping from local image attributes, such as color and spatial position. The second is based on globally estimating the entire depth map of a query image directly from a repository of 3D images (image + depth pairs or stereo pairs) using a nearest-neighbour regression type idea. The work demonstrates the ability and computational efficiency of the methods on numerous 2D images and discusses their drawbacks and benefits. Keywords: stereoscopic images, image conversion, nearest-neighbour classification, cross-bilateral filtering, 3D images.

Proceedings ArticleDOI
01 Apr 2018
TL;DR: This paper proposes a unique method to convert videos from their original 2D versions to 3D, using optical flow information to calculate object depth and smoothing depth changes within each mean-shift-segmented object.
Abstract: Rapid progress in 3D modeling and presentation techniques not only affects the entertainment industry but also enriches our daily life. People can display and enjoy 3D media conveniently nowadays. However, how to generate 3D media content economically still deserves further research. In this paper, we propose a unique method to convert videos from their original 2D versions to 3D. The method uses optical flow information to calculate the depth of objects and smooths the depth changes within each mean-shift-segmented object. The conversion results are shown by rotating the viewing angle.
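A sketch of the motion-parallax idea: apparent motion magnitude is converted to depth (faster means closer) and smoothed within each segmented object. The inverse-magnitude mapping assumes a translating camera over a mostly static scene, which is an assumption, as is the normalization.

```python
import numpy as np

def depth_from_flow(flow, segments):
    """flow: (H, W, 2) optical-flow field; segments: (H, W) int labels
    (e.g. from mean-shift). Returns a depth map in [0, 1].
    """
    mag = np.linalg.norm(flow, axis=2)
    depth = np.zeros_like(mag)
    for s in np.unique(segments):
        m = segments == s
        depth[m] = 1.0 / (mag[m].mean() + 1e-6)   # one depth per object
    return depth / depth.max()
```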

Journal ArticleDOI
26 Jan 2018
TL;DR: This paper investigates how motion between two images affects the reconstruction process of the KLT algorithm used to convert 2D images to a 3D model, and demonstrates that finding more points in the images improves the algorithm, yielding a more accurate 3D model.
Abstract: This paper investigates how motion between two images affects the reconstruction process of the KLT algorithm, which we use to convert 2D images to a 3D model. The reconstruction is carried out using a single calibrated camera and an algorithm based on only two views of a scene: the SfM technique, based on detecting the correspondence points between the two images, and the epipolar inliers. Using the KLT algorithm with the structure-from-motion method reveals its incompatibility with widely spaced images; it also shows that the reprojection error rate can be reduced by removing the images with the largest error. The experiments consist of three stages. The first stage uses a scene with smooth surfaces; the algorithm shows some deficiencies on smooth surfaces that have few details. The second stage uses a different scene with objects that have more details and rough surfaces; the results become more accurate than for the first scene. The third stage reuses the first scene after adding more detail to the surface of the ball, prompting the algorithm to detect more points; the results become more accurate than those of the first stage. The experiments show the performance of the algorithm on different scenes and demonstrate that the algorithm improves when it finds more points in the images, building a more accurate 3D model.

Journal ArticleDOI
TL;DR: The proposed scribble-confidence approach can tolerate some errors in user input and reduce depth-map artifacts caused by inaccurate user input; it is compared with existing methods on several representative images.
Abstract: Current semiautomatic 2D-to-3D methods assume that user input is perfectly accurate. However, it is difficult to get 100% accurate user scribbles, and even small errors in the input will degrade the conversion quality. This paper addresses the issue with a scribble confidence that considers color differences between labeled pixels and their neighbors. First, it counts the numbers of neighbors that have similar and different color values for each labeled pixel. The ratio between these two numbers at each labeled pixel is regarded as its scribble confidence. Second, the sparse-to-dense depth conversion is formulated as a confidence-weighted optimization problem by introducing a confidence-weighted data cost term and local and k-nearest depth-consistency regularization terms. Finally, the dense depth-map is obtained by solving sparse linear equations. The proposed approach is compared with existing methods on several representative images. The experimental results demonstrate that the proposed method can tolerate some errors in user input and can reduce depth-map artifacts caused by inaccurate user input.
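A direct sketch of the confidence computation the abstract describes: for each labeled pixel, count 8-neighbors with similar and with different colors and turn the counts into a confidence. The color threshold and the use of similar/(similar+different) rather than a raw ratio are assumptions.

```python
import numpy as np

def scribble_confidence(img, labeled_mask, tau=0.05):
    """img: (H, W, 3) float color image in [0, 1];
    labeled_mask: (H, W) bool, True at user-scribbled pixels.
    Returns per-pixel confidence in [0, 1] (zero off the scribbles).
    """
    h, w, _ = img.shape
    conf = np.zeros((h, w))
    for y, x in zip(*np.nonzero(labeled_mask)):
        similar = different = 0
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    if np.linalg.norm(img[ny, nx] - img[y, x]) < tau:
                        similar += 1
                    else:
                        different += 1
        conf[y, x] = similar / max(similar + different, 1)
    return conf
```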