Proceedings ArticleDOI

Depth estimation from single image using Defocus and Texture cues

TL;DR: A model that combines two monocular depth cues, texture and defocus, is presented; it focuses on correcting erroneous regions in the defocus map using the texture energy present in those regions.
Abstract: As imaging is a 2D projection of a 3D scene, depth information is lost at the time of image capture with a conventional camera. This depth information can be inferred back from a set of visual cues present in the image. In this work, we present a model that combines two monocular depth cues, namely texture and defocus. Depth is related to the spatial extent of the defocus blur by assuming that the more an object is blurred, the farther it is from the camera. At first, we estimate the amount of defocus blur present at the edge pixels of an image. This is referred to as the sparse defocus map. Using the sparse defocus map we generate the full defocus map. However, such defocus maps always contain hole regions and depth ambiguity. To handle this problem, an additional depth cue, in our case texture, is integrated to generate a better defocus map. This integration mainly focuses on correcting the erroneous regions in the defocus map using the texture energy present in those regions. The sparse defocus map is corrected using texture-based rules. Hole regions, where there are no significant edges or texture, are detected and corrected in the sparse defocus map. We use region-wise propagation to generate a better full defocus map; its accuracy is increased by the region-wise propagation.
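The blur-at-edge-pixels step described above can be sketched with the standard gradient-ratio approach: re-blur the image with a known Gaussian and compare gradient magnitudes at the edge, since the ratio reveals the unknown blur scale. The 1-D setting and function names below are illustrative, not the paper's implementation:

```python
import numpy as np

def gaussian_blur_1d(signal, sigma):
    # Convolve with a normalized Gaussian kernel (truncated at 4 sigma).
    radius = max(1, int(4 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    return np.convolve(signal, k, mode="same")

def estimate_edge_blur(signal, sigma1=1.0):
    """Gradient-ratio blur estimate at the strongest edge.

    Re-blurring an edge of unknown blur sigma with a known sigma1 scales
    the peak gradient by sigma / sqrt(sigma^2 + sigma1^2), so the ratio
    of the two gradient magnitudes recovers sigma.
    """
    g0 = np.abs(np.gradient(signal))
    g1 = np.abs(np.gradient(gaussian_blur_1d(signal, sigma1)))
    i = np.argmax(g0)           # edge location = gradient maximum
    R = g0[i] / g1[i]           # ratio > 1 for any finite blur
    return sigma1 / np.sqrt(R**2 - 1)

# A step edge blurred with sigma = 2.0; the estimate lands close to 2.0
edge = gaussian_blur_1d(np.repeat([0.0, 1.0], 200), 2.0)
print(estimate_edge_blur(edge))
```

Repeating this at every detected edge pixel yields the sparse defocus map, which the paper then propagates to the full image.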
Citations
Journal ArticleDOI
TL;DR: An automated tool for tree trunk diameter and tree species assessment was developed; it enables fast and accurate estimation even while walking, which reduces the time spent measuring trees.
Abstract: Tree trunk diameter and tree species are two of the most important parameters in analyzing trees in urban areas and forests. Conventionally, diameters have been measured manually, and the species were determined by sight. An automated tool for these assessments was developed. Tree trunks are automatically detected from captured stereo images. Then, tree trunk diameters are estimated, and the species are determined. The developed graphical user interface tool enables fast and accurate estimation even while one is walking, which reduces the time spent in measuring trees.

5 citations

Journal ArticleDOI
TL;DR: In this paper, an endoscopic video acquisition system is designed, key frames of burden surface video in a stable state are extracted using a feature-point optical flow method, and the sparse depth is estimated using a defocus-based method.
Abstract: Continuous and accurate depth information of the blast furnace burden surface is important for optimizing charging operations, thereby reducing energy consumption and CO2 emissions. However, depth estimation from a single image is challenging, especially for burden surface images captured in the harsh internal environment of the blast furnace. In this paper, a novel method based on edge defocus tracking is proposed to estimate the depth of burden surface images with different morphological characteristics. First, an endoscopic video acquisition system is designed, key frames of burden surface video in a stable state are extracted using a feature-point optical flow method, and the sparse depth is estimated using a defocus-based method. Next, the burden surface image is divided into four subregions according to the distribution characteristics of the burden surface, and edge line trajectories together with an eight-direction depth gradient template are designed to develop depth propagation rules. Finally, the depth is propagated from the edges to the entire image using an edge-line tracking method. The experimental results show that the proposed method can accurately and efficiently estimate the depth of the burden surface and provide key data support for optimizing blast furnace operation.

5 citations

Proceedings ArticleDOI
Hualu Li, Jiang Zou, Yutong Li, Yuan Xu, Zhiyong Xiao 
01 Dec 2020
TL;DR: In this paper, a vision system with a 4 by 2 time-of-flight (TOF) camera array was proposed to measure package volume automatically, and a splicing algorithm was developed so that low resolution TOF sensors can be used without sacrificing measurement accuracy.
Abstract: Package volume is critical for logistics companies to manage transportation cost. A vision system with a 4 by 2 time-of-flight (TOF) camera array was proposed to measure package volume automatically. Timing control was implemented to avoid interference between different TOF sensors. A splicing algorithm was developed so that low-resolution TOF sensors can be used without sacrificing measurement accuracy. The vision system covers a 1.6 m by 3.7 m area and processes a video stream at a frame rate greater than 22 fps. The point cloud spacing ranges between 4.1 mm and 5.8 mm. The average measurement error in volume is less than 4.7%.
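Once the spliced TOF point cloud is available, the volume measurement itself reduces to integrating a top-down height map over the grid. The sketch below assumes a regular grid at the paper's reported point spacing; the function name and grid layout are hypothetical, not the authors' algorithm:

```python
import numpy as np

def package_volume(height_map, spacing):
    """Integrate a top-down height map (metres) into a volume.

    height_map: per-point height above the floor, from the fused TOF
    point cloud; spacing: point-cloud grid spacing in metres.
    Each grid point contributes height * spacing^2.
    """
    return float(np.sum(np.clip(height_map, 0.0, None)) * spacing**2)

# A 0.30 m x 0.25 m x 0.20 m box sampled on a 5 mm grid
h = np.zeros((320, 740))          # 1.6 m x 3.7 m coverage at 5 mm
h[:60, :50] = 0.20                # box footprint: 60 x 50 samples
print(round(package_volume(h, 0.005), 6))   # 0.015 m^3
```

At the paper's 4.1-5.8 mm spacing, quantization of the footprint boundary is one plausible contributor to the reported sub-5% volume error.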

2 citations

Proceedings ArticleDOI
01 Jul 2020
TL;DR: The proposed model establishes a relationship between the depths of different individual objects, which is important for several computer vision applications such as robot navigation, obstacle avoidance, object recognition, and self-driving cars.
Abstract: Depth estimation from a single image aims to find the depth and position of individual objects. Based on the Make3D dataset, image segmentation is first performed using a Gaussian mixture model (GMM); features describing the main attributes of each segment are then extracted using PCA; finally, the depth map is generated using the fuzzy c-means (FCM) algorithm. A depth map provides an excellent way to visualize the distance of scene surfaces from a viewpoint. The platform used here is MATLAB R2018a. In our method, the final depth map is rendered in color, where yellow represents far objects and blue indicates the nearest objects. In this perspective, the proposed model establishes a relationship between the depths of different individual objects, which is important for several computer vision applications such as robot navigation, obstacle avoidance, object recognition, and self-driving cars. The major contribution of our study is finding the depth and position of different individual objects.
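The clustering step in this pipeline can be sketched with a minimal fuzzy c-means implementation. Everything below (the function, its parameters, the toy 1-D features) is a stand-in for the paper's unspecified FCM setup, shown only to illustrate the membership/centroid update loop:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Minimal fuzzy c-means: alternate centroid and membership updates.

    X: (n, d) feature vectors (e.g. PCA-reduced segment features);
    returns (centers, memberships).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1
    for _ in range(iters):
        W = U ** m                             # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))         # inverse-distance memberships
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

# Two well-separated 1-D "depth feature" clusters
X = np.array([[0.1], [0.2], [0.15], [5.0], [5.1], [4.9]])
centers, U = fuzzy_cmeans(X)
print(np.sort(centers.ravel()))   # close to the two cluster means
```

In the paper's setting, the cluster memberships (rather than hard labels) would drive the colored far/near rendering of the depth map.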

2 citations

Posted Content
TL;DR: In this article, monocular depth estimation by virtual-world supervision (MonoDEVS) combined with real-world SfM self-supervision is proposed; it compensates for the limitations of SfM self-supervision by leveraging virtual-world images with accurate semantic and depth supervision and by addressing the virtual-to-real domain gap.
Abstract: Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing since it allows for appearance and depth being on direct pixelwise correspondence without further calibration. Best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, also using only a monocular system at training time is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity, diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate the SfM self-supervision limitations by leveraging virtual-world images with accurate semantic and depth supervision and addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.

2 citations

References
Book
01 Dec 2003
TL;DR: 1. Fundamentals of Image Processing, 2. Intensity Transformations and Spatial Filtering, and 3. Frequency Domain Processing.
Abstract: 1. Introduction. 2. Fundamentals. 3. Intensity Transformations and Spatial Filtering. 4. Frequency Domain Processing. 5. Image Restoration. 6. Color Image Processing. 7. Wavelets. 8. Image Compression. 9. Morphological Image Processing. 10. Image Segmentation. 11. Representation and Description. 12. Object Recognition.

6,306 citations


"Depth estimation from single image ..." refers background in this paper

  • ...Edges are localized by first detecting the zero-crossings of the second-order derivative response, ensuring that these zero-crossings are the gradient-maximum pixels, followed by edge thinning [8]....

    [...]

Journal ArticleDOI
TL;DR: A closed-form solution to natural image matting that allows us to find the globally optimal alpha matte by solving a sparse linear system of equations and predicts the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms.
Abstract: Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed - at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte") from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.

1,851 citations


"Depth estimation from single image ..." refers background in this paper

  • ...where d̂ and d are the vector forms of the sparse defocus map and the full defocus map, L is the matting Laplacian matrix, D is a diagonal matrix, and the detailed derivation is given in [10]....

    [...]
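The equation in this excerpt has the closed form d = (L + λD)⁻¹ λD d̂: the matting Laplacian enforces smoothness while the diagonal matrix D pins pixels where the sparse defocus value is known. The sketch below solves that system on a tiny 1-D chain, with a simple chain-graph Laplacian standing in for the matting Laplacian of [10]:

```python
import numpy as np

def propagate_defocus(d_sparse, known, lam=100.0):
    """Solve (L + lam * D) d = lam * D @ d_hat on a 1-D chain.

    L is a chain-graph Laplacian (a stand-in for the matting Laplacian);
    D is diagonal with 1 at pixels whose sparse defocus value is known.
    """
    n = len(d_sparse)
    # Chain Laplacian: degree on the diagonal, -1 for neighbours.
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1
        L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1
        L[i + 1, i] -= 1
    D = np.diag(known.astype(float))
    return np.linalg.solve(L + lam * D, lam * D @ d_sparse)

# Two known edge values; interior pixels interpolate smoothly.
d_hat = np.array([1.0, 0, 0, 0, 3.0])
known = np.array([1, 0, 0, 0, 1])
print(np.round(propagate_defocus(d_hat, known), 2))  # ≈ [1.0 1.5 2.0 2.5 3.0]
```

On real images the system is large but sparse, so a sparse solver (e.g. conjugate gradients) replaces the dense `np.linalg.solve` used here.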

Journal ArticleDOI
TL;DR: This work considers the problem of estimating detailed 3D structure from a single still image of an unstructured environment and uses a Markov random field (MRF) to infer a set of "plane parameters" that capture both the 3D location and 3D orientation of the patch.
Abstract: We consider the problem of estimating detailed 3D structure from a single still image of an unstructured environment. Our goal is to create 3D models that are both quantitatively accurate as well as visually pleasing. For each small homogeneous patch in the image, we use a Markov random field (MRF) to infer a set of "plane parametersrdquo that capture both the 3D location and 3D orientation of the patch. The MRF, trained via supervised learning, models both image depth cues as well as the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene; this enables the algorithm to capture much more detailed 3D structure than does prior art and also give a much richer experience in the 3D flythroughs created using image-based rendering, even for scenes with significant nonvertical structure. Using this approach, we have created qualitatively correct 3D models for 64.9 percent of 588 images downloaded from the Internet. We have also extended our model to produce large-scale 3D models from a few images.

1,522 citations

Journal ArticleDOI
01 Aug 2004
TL;DR: This paper presents a simple colorization method that requires neither precise image segmentation, nor accurate region tracking, and demonstrates that high quality colorizations of stills and movie clips may be obtained from a relatively modest amount of user input.
Abstract: Colorization is a computer-assisted process of adding color to a monochrome image or movie. The process typically involves segmenting images into regions and tracking these regions across image sequences. Neither of these tasks can be performed reliably in practice; consequently, colorization requires considerable user intervention and remains a tedious, time-consuming, and expensive task. In this paper we present a simple colorization method that requires neither precise image segmentation, nor accurate region tracking. Our method is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors. We formalize this premise using a quadratic cost function and obtain an optimization problem that can be solved efficiently using standard techniques. In our approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence. We demonstrate that high quality colorizations of stills and movie clips may be obtained from a relatively modest amount of user input.

1,505 citations

Proceedings Article
05 Dec 2005
TL;DR: This work begins by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps, and applies supervised learning to predict the depthmap as a function of the image.
Abstract: We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.

1,079 citations


"Depth estimation from single image ..." refers background or methods in this paper

  • ...But in the methods of Ashutosh Saxena [4] and of Yousun Kang, Osamu Hasegawa, and Hiroshi Nagahashi [6], learning is a difficult task; hence our algorithm uses the depth information from both texture and defocus blur....

    [...]

  • ...Ashutosh Saxena, Sung H. Chung, and Andrew Y. Ng [4] have combined the two monocular cues, texture and haze, to obtain depth information by supervised learning....

    [...]