Author

Davi Geiger

Bio: Davi Geiger is an academic researcher at New York University. His research spans topics including image segmentation and Markov random fields. He has an h-index of 31 and has co-authored 114 publications that have received 4,372 citations. His previous affiliations include Siemens and Case Western Reserve University.


Papers
Journal ArticleDOI
TL;DR: Explores the information provided by the user's selected points and applies an optimal, dynamic-programming (DP) based method to detect contours and segment the image; the method is exact, non-iterative, and applies to a wide variety of shapes.
Abstract: The problem of segmenting an image into separate regions and tracking them over time is one of the most significant problems in vision. Terzopoulos et al. (1987) proposed an approach to detect the contour regions of complex shapes, assuming a user selected initial contour not very far from the desired solution. We propose to further explore the information provided by the user's selected points and apply an optimal method to detect contours which allows a segmentation of the image. The method is based on dynamic programming (DP), and applies to a wide variety of shapes. It is exact and not iterative. We also consider a multiscale approach capable of speeding up the algorithm by a factor of 20, although at the expense of losing the guaranteed optimality characteristic. The problem of tracking and matching these contours is addressed. For tracking, the final contour obtained at one frame is sampled and used as initial points for the next frame. Then, the same DP process is applied. For matching, a novel strategy is proposed where the solution is a smooth displacement field in which unmatched regions are allowed while cross vectors are not. The algorithm is again based on DP and the optimal solution is guaranteed. We have demonstrated the algorithms on natural objects in a large spectrum of applications, including interactive segmentation and automatic tracking of the regions of interest in medical images.

512 citations
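
The core of the approach is a shortest-path search over candidate contour positions. The following is a minimal sketch of that dynamic-programming idea, not the paper's exact formulation: it assumes each contour sample point has a set of candidate positions (e.g. offsets along the normal of the user-initialized contour), a per-candidate image cost in `unary`, and a quadratic smoothness penalty between neighboring samples.

```python
import numpy as np

def dp_contour(unary, pairwise_weight):
    """Exact, non-iterative DP over a chain: minimize
    sum_i unary[i, s_i] + w * (s_i - s_{i-1})**2, where stage i is the
    i-th contour sample and state s_i is its candidate position."""
    n_stages, n_states = unary.shape
    states = np.arange(n_states)
    smooth = pairwise_weight * (states[:, None] - states[None, :]) ** 2
    cost = unary[0].astype(float)                 # best cost to reach each state
    backptr = np.zeros((n_stages, n_states), dtype=int)
    for i in range(1, n_stages):
        trans = cost[None, :] + smooth            # trans[j, k]: reach j from k
        backptr[i] = np.argmin(trans, axis=1)
        cost = trans[states, backptr[i]] + unary[i]
    path = np.empty(n_stages, dtype=int)          # backtrack the optimum
    path[-1] = int(np.argmin(cost))
    for i in range(n_stages - 1, 0, -1):
        path[i - 1] = backptr[i, path[i]]
    return path

# toy usage: 50 samples, 21 candidate offsets, cost = -edge strength
rng = np.random.default_rng(0)
edge_strength = rng.random((50, 21))
print(dp_contour(-edge_strength, pairwise_weight=0.1)[:5])
```

Because every stage is solved exactly, the returned path is the global optimum of this energy, matching the abstract's "exact and not iterative" claim; a multiscale variant would run the same routine on coarsened candidate grids.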

Journal ArticleDOI
TL;DR: Deterministic approximations to Markov random field (MRF) models are derived, and one of the models is shown to yield, in a natural way, the graduated nonconvexity (GNC) algorithm proposed by A. Blake and A. Zisserman (1987).
Abstract: Deterministic approximations to Markov random field (MRF) models are derived. One of the models is shown to yield, in a natural way, the graduated nonconvexity (GNC) algorithm proposed by A. Blake and A. Zisserman (1987). This model can be applied to smooth a field while preserving its discontinuities. A class of more complex models is then proposed in order to deal with a variety of vision problems. All the theoretical results are obtained in the framework of statistical mechanics and mean field techniques. A parallel, iterative algorithm to solve the deterministic equations of the two models is presented, together with some experiments on synthetic and real images.

486 citations
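
To make the mean-field machinery concrete, here is a minimal 1-D sketch under stated assumptions (a weak-membrane energy with binary line variables, Jacobi updates); it illustrates the idea, not the paper's parallel algorithm or exact model. Averaging the line process at inverse temperature `beta` gives a sigmoid "mean line field" that switches smoothing off across large jumps; annealing `beta` upward recovers a GNC-like continuation.

```python
import numpy as np

def mean_field_smooth(d, lam=2.0, alpha=0.5, beta=10.0, iters=200):
    """Discontinuity-preserving smoothing of a 1-D signal d under
    E(f, l) = sum_i (f_i - d_i)^2
            + sum_i [lam * (1 - l_i) * (f_{i+1} - f_i)^2 + alpha * l_i],
    with the binary line variables l_i replaced by their mean-field
    average at inverse temperature beta."""
    f = d.astype(float)
    for _ in range(iters):
        jump = np.diff(f) ** 2
        l = 1.0 / (1.0 + np.exp(-beta * (lam * jump - alpha)))  # mean line field
        w = lam * (1.0 - l)                  # effective coupling between neighbors
        wl = np.concatenate(([0.0], w))      # coupling to the left neighbor
        wr = np.concatenate((w, [0.0]))      # coupling to the right neighbor
        fl = np.concatenate(([f[0]], f[:-1]))
        fr = np.concatenate((f[1:], [f[-1]]))
        f = (d + wl * fl + wr * fr) / (1.0 + wl + wr)  # one Jacobi sweep
    return f

# toy usage: a noisy step edge stays sharp while flat parts are smoothed
x = np.concatenate((np.zeros(50), np.ones(50)))
x += 0.1 * np.random.default_rng(1).standard_normal(100)
print(mean_field_smooth(x)[48:52].round(2))
```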

Journal ArticleDOI
TL;DR: A Bayesian theory for stereo that incorporates occlusion is described, using adaptive windows and a weak smoothness prior; occlusions are shown to help stereo computation by providing cues for depth discontinuities.
Abstract: Binocular stereo is the process of obtaining depth information from a pair of left and right cameras. In the past occlusions have been regions where stereo algorithms have failed. We show that, on the contrary, they can help stereo computation by providing cues for depth discontinuities.

280 citations

Journal ArticleDOI
TL;DR: This paper identifies a number of possibly desirable properties of a shape similarity method, and determines the extent to which these properties can be captured by approaches that compare local properties of the contours of the shapes, through elastic matching.

236 citations
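
As one way to make "elastic matching of local contour properties" concrete, the following is a hedged sketch: it assumes each shape is reduced to a sequence of local descriptors (e.g. curvature samples along the contour), and the alignment-with-stretching recursion is the standard dynamic-programming one, not necessarily the exact formulation evaluated in the paper.

```python
import numpy as np

def elastic_match_cost(a, b):
    """Elastic (DP) matching of two descriptor sequences a, b:
    D[i, j] is the cheapest alignment of a[:i] with b[:j], where each
    step may match one element to one element or stretch either
    sequence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])        # local dissimilarity
            D[i, j] = local + min(D[i - 1, j - 1],  # match
                                  D[i - 1, j],      # stretch b
                                  D[i, j - 1])      # stretch a
    return D[n, m]

# toy usage: similar curvature profiles should score lower than dissimilar ones
s1 = np.sin(np.linspace(0, 2 * np.pi, 40))
s2 = np.sin(np.linspace(0, 2 * np.pi, 50))          # same shape, resampled
s3 = np.cos(np.linspace(0, 4 * np.pi, 40))
print(elastic_match_cost(s1, s2) < elastic_match_cost(s1, s3))  # expected: True
```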

Book ChapterDOI
02 Jun 1998
TL;DR: A new approach to compute the disparity map by solving a global optimization problem that models occlusions, discontinuities, and epipolar-line interactions is presented.
Abstract: Binocular stereo is the process of obtaining depth information from a pair of left and right views of a scene. We present a new approach to compute the disparity map by solving a global optimization problem that models occlusions, discontinuities, and epipolar-line interactions.

224 citations
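
A compact way to see how occlusions, discontinuities, and the epipolar-line structure enter a global optimization is a scanline DP with explicit occlusion moves. The sketch below is illustrative, with an assumed absolute-difference match cost and a constant occlusion penalty `occ`; it is not the paper's exact energy.

```python
import numpy as np

def scanline_stereo(left, right, occ=2.0):
    """Match one epipolar line pair by DP. States (i, j) pair left
    pixel i with right pixel j; moves are match, left-occlusion, or
    right-occlusion. Returns a disparity per left pixel, -1 where the
    pixel is marked occluded."""
    n, m = len(left), len(right)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = occ * np.arange(m + 1)       # leading right-only pixels
    D[:, 0] = occ * np.arange(n + 1)       # leading left-only pixels
    move = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands = (D[i-1, j-1] + abs(float(left[i-1]) - float(right[j-1])),
                     D[i-1, j] + occ,      # left pixel occluded
                     D[i, j-1] + occ)      # right pixel occluded
            move[i, j] = int(np.argmin(cands))
            D[i, j] = cands[move[i, j]]
    disp = np.full(n, -1)
    i, j = n, m
    while i > 0 and j > 0:                 # backtrack the optimal path
        if move[i, j] == 0:
            disp[i-1] = (i-1) - (j-1)
            i, j = i-1, j-1
        elif move[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return disp
```

On rectified images this runs once per scanline; modeling inter-scanline interactions, as the abstract describes, would require coupling these per-line problems.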


Cited by
Book
01 Jan 1998
TL;DR: A textbook treatment of wavelet analysis, from Fourier methods and time-frequency analysis to wavelet bases, wavelet packets and local cosine bases, approximation, estimation, and transform coding.
Abstract: Introduction to a Transient World. Fourier Kingdom. Discrete Revolution. Time Meets Frequency. Frames. Wavelet Zoom. Wavelet Bases. Wavelet Packet and Local Cosine Bases. An Approximation Tour. Estimations are Approximations. Transform Coding. Appendix A: Mathematical Complements. Appendix B: Software Toolboxes.

17,693 citations

Journal ArticleDOI
TL;DR: This work addresses the task of semantic image segmentation with Deep Learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or ‘atrous convolution’, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed “DeepLab” system sets a new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

11,856 citations
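
Since this entry and the preprint version below both hinge on atrous convolution and ASPP, here is a minimal numpy sketch of the two operations under simplifying assumptions (single channel, 'same' padding, sum fusion instead of the network's learned fusion); it illustrates the mechanism, not the DeepLab implementation.

```python
import numpy as np

def atrous_conv2d(x, k, rate):
    """Single-channel atrous (dilated) correlation with 'same' padding:
    the kernel k is applied with rate-1 holes between taps, enlarging
    the field of view with no extra parameters or computation."""
    kh, kw = k.shape
    ph, pw = (kh - 1) * rate // 2, (kw - 1) * rate // 2
    xp = np.pad(x.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=float)
    for di in range(kh):
        for dj in range(kw):
            out += k[di, dj] * xp[di * rate : di * rate + x.shape[0],
                                  dj * rate : dj * rate + x.shape[1]]
    return out

def aspp(x, kernels, rates):
    """ASPP sketch: parallel atrous branches at several sampling rates,
    fused here by summation."""
    return sum(atrous_conv2d(x, k, r) for k, r in zip(kernels, rates))

# toy usage: one 3x3 kernel reused at four rates
x = np.random.default_rng(0).random((64, 64))
k = np.ones((3, 3)) / 9.0
y = aspp(x, [k] * 4, rates=(6, 12, 18, 24))
print(y.shape)  # (64, 64)
```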

Posted Content
TL;DR: DeepLab, as discussed by the authors, proposes atrous spatial pyramid pooling (ASPP) to segment objects at multiple scales by probing an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view.
Abstract: In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields of view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets a new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.

10,120 citations

Journal ArticleDOI
TL;DR: This work considers the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise, and proposes a general classification algorithm for (image-based) object recognition based on a sparse representation computed by ℓ1-minimization.
Abstract: We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm and corroborate the above claims.

9,658 citations
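
The classification rule is simple once the sparse code is in hand. Below is a hedged sketch of the sparse-representation-classification idea: ISTA (proximal gradient) stands in for whichever ℓ1 solver is used, and the dictionary columns are assumed to be training faces tagged with class labels.

```python
import numpy as np

def ista_l1(A, y, lam=0.01, iters=500):
    """Solve min_x 0.5*||Ax - y||^2 + lam*||x||_1 by ISTA: a gradient
    step on the quadratic term followed by soft-thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))
        x = np.sign(x) * np.maximum(np.abs(x) - lam * step, 0.0)
    return x

def src_classify(A, labels, y, lam=0.01):
    """Sparse-representation classification: code y over the training
    dictionary A (one column per training image), then pick the class
    whose coefficients reconstruct y with the smallest residual."""
    x = ista_l1(A, y, lam)
    classes = np.unique(labels)
    resid = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
             for c in classes]
    return classes[int(np.argmin(resid))]
```

One standard way to realize the occlusion handling the abstract describes is to augment A with an identity block, so gross but sparse pixel errors are absorbed into extra coefficients rather than corrupting the class residuals.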

Journal ArticleDOI
TL;DR: This paper presents a taxonomy of dense, two-frame stereo methods together with a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.

7,458 citations
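
In the taxonomy's terms, even the simplest local method decomposes into matching cost, aggregation, and disparity optimization. A minimal winner-take-all baseline in that mold is sketched below, assuming rectified single-channel images; it illustrates the pipeline, not the paper's C++ platform.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def block_match(left, right, max_disp=16, win=7):
    """Local stereo: squared-difference matching cost, box-window
    aggregation, winner-take-all disparity selection."""
    h, w = left.shape
    big = 1e9                                      # sentinel for invalid shifts
    costs = np.full((max_disp + 1, h, w), big)
    for d in range(max_disp + 1):
        sd = np.full((h, w), big)
        sd[:, d:] = (left[:, d:].astype(float) - right[:, :w - d]) ** 2
        costs[d] = uniform_filter(sd, size=win)    # aggregate over win x win
    return np.argmin(costs, axis=0)                # per-pixel disparity

# toy usage: a shifted copy should come back as a constant disparity
rng = np.random.default_rng(0)
right = rng.random((60, 80))
left = np.roll(right, 5, axis=1)                   # left view shifted by 5
print(np.bincount(block_match(left, right).ravel()).argmax())  # expect 5
```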