Author

David Martin

Bio: David Martin is an academic researcher from Charles III University of Madrid. The author has contributed to research in topics: Kalman filter & Context (language use). The author has an h-index of 39 and has co-authored 143 publications receiving 12,589 citations. Previous affiliations of David Martin include Google & Carlos III Health Institute.


Papers
Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

6,505 citations
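As a rough illustration of the paper's consistency measure, here is a minimal NumPy sketch of the Global Consistency Error (GCE) idea: the per-pixel local refinement error is zero whenever one segmentation refines the other, so human segmentations that differ only in granularity score as consistent. The vectorized counting below is an implementation convenience, not the paper's code.

```python
import numpy as np

def refinement_error(seg_a, seg_b):
    """Mean local refinement error E(seg_a, seg_b) over all pixels.

    For each pixel, measures the fraction of its seg_a region that falls
    outside its seg_b region (zero when seg_a is a refinement of seg_b).
    """
    a, b = seg_a.ravel(), seg_b.ravel()
    n = a.size
    # Joint histogram: n_ab[i, j] = number of pixels with label i in a, j in b.
    n_ab = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(n_ab, (a, b), 1)
    n_a = n_ab.sum(axis=1, keepdims=True)  # region sizes in seg_a
    # A pixel with label pair (i, j) contributes (n_a[i] - n_ab[i,j]) / n_a[i];
    # summing over pixels weights each (i, j) cell by n_ab[i, j].
    with np.errstate(divide="ignore", invalid="ignore"):
        err = np.where(n_a > 0, n_ab * (n_a - n_ab) / n_a, 0.0)
    return err.sum() / n

def gce(seg_a, seg_b):
    """Global Consistency Error: forces all refinements in one direction."""
    return min(refinement_error(seg_a, seg_b), refinement_error(seg_b, seg_a))

# Two segmentations of the same image at different granularities:
fine = np.array([[0, 0, 1, 1], [2, 2, 3, 3]])
coarse = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])
print(gce(fine, coarse))  # 0.0 -- a pure refinement is perfectly consistent
```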

Journal ArticleDOI
TL;DR: The two main results are that cue combination can be performed adequately with a simple linear model and that a proper, explicit treatment of texture is required to detect boundaries in natural images.
Abstract: The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are 1) that cue combination can be performed adequately with a simple linear model and 2) that a proper, explicit treatment of texture is required to detect boundaries in natural images.

2,229 citations
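A hedged sketch of the cue-combination step: the feature columns and data below are synthetic placeholders (the real detector computes oriented gradient features for brightness, color, and texture at each location and orientation), but the simple linear model trained on human-labeled ground truth, with a posterior boundary probability as output, follows the paper's recipe.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder per-pixel cue responses; columns stand in for
# [brightness gradient, color gradient, texture gradient].
n = 10_000
X = rng.normal(size=(n, 3))
w_true = np.array([0.5, 0.8, 1.5])  # texture carries the most signal here
y = (X @ w_true + rng.normal(scale=0.5, size=n) > 1.0).astype(int)  # 1 = human-labeled boundary

# A simple linear model over the cues, trained on the labeled data.
clf = LogisticRegression().fit(X, y)

# The classifier output is the posterior probability of a boundary
# at each location (and, in the full detector, each orientation).
p_boundary = clf.predict_proba(X)[:, 1]
print(clf.coef_, p_boundary[:5])
```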

Journal ArticleDOI
23 Feb 1990 - JAMA
TL;DR: The increased mortality associated with delirium appears to be explained by greater severity of illness; delirium identifies elderly patients at risk for death, longer hospitalization, and institutionalization.
Abstract: The prevalence, risk factors, and outcomes of delirium were studied in 229 elderly patients. Fifty patients (22%) met criteria for delirium; nondelirious elderly constituted the control group. Abnormal sodium levels, illness severity, dementia, fever or hypothermia, psychoactive drug use, and azotemia were associated with risk of delirium. Patients with three or more risk factors had a 60% rate of delirium. Delirious patients stayed 12.1 days in the hospital vs 7.2 days for controls and were more likely to die (8% vs 1%) or be institutionalized (16% vs 3%). Illness severity predicted 6-month mortality, but the effect of delirium was not significant. Delirium occurs commonly in hospitalized elderly, is associated with chronic and acute problems, and identifies elderly at risk for death, longer hospitalization, and institutionalization. The increased mortality associated with delirium appears to be explained by greater severity of illness. (JAMA. 1990;263:1097-1101)

852 citations

01 Aug 2002
TL;DR: A database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes is presented and an error measure is defined which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing "ground truth" segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

789 citations

01 Jan 2002
TL;DR: A battery of segmentation comparison measures is developed that provides “micro-benchmarks” for boundary detection algorithms and pixel affinity functions, as well as a benchmark for complete segmentation algorithms.
Abstract: This thesis presents a novel dataset of 12,000 segmentations of 1,000 natural images by 30 human subjects. The subjects marked the locations of objects in the images, providing ground truth data for learning grouping cues and benchmarking grouping algorithms. We feel that the data-driven approach is critical for two reasons: (1) the data reflects “ecological statistics” that the human visual system has evolved to exploit, and (2) innovations in computational vision should be evaluated quantitatively. We develop a battery of segmentation comparison measures that we use both to validate the consistency of the human data and to provide approaches for evaluating grouping algorithms. In conjunction with the segmentation dataset, the various measures provide “micro-benchmarks” for boundary detection algorithms and pixel affinity functions, as well as a benchmark for complete segmentation algorithms. Using these performance measures, we can systematically improve grouping algorithms with the human ground truth as our goal. Starting at the lowest level, we present local boundary models based on brightness, color, and texture cues, where the cues are individually optimized with respect to the dataset and then combined in a statistically optimal manner with classifiers. The resulting detector is shown to significantly outperform prior state-of-the-art algorithms. Next, we learn from data how to combine the boundary model with patch-based features in a pixel affinity model to settle long-standing debates in computer vision with empirical results: (1) brightness boundaries are more informative than patches, and vice versa for color; (2) texture boundaries and patches are the two most powerful cues; (3) proximity is not a useful cue for grouping; it is simply a result of the process; and (4) both boundary-based and region-based approaches provide significant independent information for grouping.

180 citations
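To make the benchmarking idea concrete, here is a simplified boundary-detection micro-benchmark in NumPy: it sweeps a threshold over a soft boundary map and reports the best F-measure against human labels. The real benchmark additionally matches predicted and human boundary pixels within a small distance tolerance, which this sketch omits; the arrays below are stand-ins.

```python
import numpy as np

def boundary_pr(pred, truth, threshold):
    """Precision/recall of a soft boundary map against binary human boundaries.

    Simplified: counts exact pixel hits, whereas the full benchmark matches
    boundary pixels within a small spatial tolerance before counting.
    """
    hits = pred >= threshold
    tp = np.logical_and(hits, truth).sum()
    precision = tp / max(hits.sum(), 1)
    recall = tp / max(truth.sum(), 1)
    return precision, recall

def f_measure(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

# Sweep thresholds to trace the precision-recall curve and report the best F.
rng = np.random.default_rng(1)
truth = rng.random((64, 64)) < 0.05  # stand-in for human boundary labels
pred = np.clip(truth * 0.7 + rng.random((64, 64)) * 0.4, 0, 1)  # stand-in detector output
scores = [f_measure(*boundary_pr(pred, truth, t)) for t in np.linspace(0.1, 0.9, 9)]
print(max(scores))
```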


Cited by
Journal ArticleDOI
TL;DR: Quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient at inference compared to other architectures, including FCN and DeconvNet.
Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low-resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower-resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well-known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It also has significantly fewer trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and is the most memory-efficient at inference compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/.

13,468 citations
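The index-based upsampling is the architectural core of SegNet. A minimal PyTorch sketch of the mechanism (the original implementation is in Caffe): the encoder's max-pooling returns its argmax indices, the decoder unpools with them (non-linear upsampling with nothing to learn), and trainable convolutions densify the sparse result.

```python
import torch
import torch.nn as nn

# One encoder/decoder stage of the SegNet mechanism.
pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # densifies the sparse map

x = torch.randn(1, 1, 4, 4)
pooled, indices = pool(x)         # encoder: keep the pooling indices
sparse = unpool(pooled, indices)  # decoder: upsample via saved indices, no learning
dense = conv(sparse)              # trainable filters produce dense feature maps
print(x.shape, pooled.shape, dense.shape)
```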

Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

10,696 citations
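A short usage example of SV regression for function estimation, using scikit-learn's SVR rather than the tutorial's own algorithms; the kernel, C, and the width of the epsilon-insensitive tube are the standard knobs the tutorial discusses, and the values below are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

# Fit a noisy sine with Support Vector regression.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print(model.predict([[np.pi / 2]]))  # close to sin(pi/2) = 1
print(len(model.support_))           # only the support vectors define the fit
```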

Journal ArticleDOI
TL;DR: A new superpixel algorithm is introduced, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels and is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
Abstract: Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.

7,849 citations
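SLIC is available off the shelf; a minimal usage example with scikit-image's implementation (parameter values here are illustrative). The algorithm runs k-means in a combined color-plus-position space, restricted to a local neighborhood around each cluster center, which is what makes it fast.

```python
from skimage import data, segmentation

# Compactness trades off color similarity against spatial proximity.
img = data.astronaut()
labels = segmentation.slic(img, n_segments=250, compactness=10.0)
print(labels.shape, labels.max() + 1)  # one integer superpixel label per pixel
```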

Proceedings ArticleDOI
21 Jul 2017
TL;DR: SRGAN proposes a perceptual loss function consisting of an adversarial loss and a content loss; the adversarial loss pushes the solution to the natural image manifold using a discriminator network trained to differentiate between super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

6,884 citations
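A hedged PyTorch sketch of the generator objective: content loss in a VGG feature space plus a weighted adversarial term. The conv5_4 feature layer and the 1e-3 weighting follow the paper, but VGG input normalization and all training details are omitted here.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

# Frozen VGG19 features up to (and including) the ReLU after conv5_4.
vgg_features = vgg19(weights="DEFAULT").features[:36].eval()
for p in vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(sr, hr, d_sr):
    """sr: super-resolved batch, hr: ground-truth batch,
    d_sr: discriminator probabilities for sr being a real image."""
    content = nn.functional.mse_loss(vgg_features(sr), vgg_features(hr))
    adversarial = -torch.log(d_sr + 1e-8).mean()  # push sr toward "real"
    return content + 1e-3 * adversarial
```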

Journal ArticleDOI
TL;DR: A deep learning method for single image super-resolution (SR) is proposed that directly learns an end-to-end mapping between low- and high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.

6,122 citations
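A minimal PyTorch sketch of the three-layer mapping described in the abstract, using the paper's basic 9-1-5 filter sizes; 'same' padding is a simplification of the original valid-convolution setup, and the network expects the low-resolution image already upscaled to the target size (e.g. by bicubic interpolation).

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-stage mapping: patch extraction, non-linear mapping, reconstruction."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4), nn.ReLU(),  # patch extraction
            nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),                   # non-linear mapping
            nn.Conv2d(32, channels, kernel_size=5, padding=2),             # reconstruction
        )

    def forward(self, x):
        return self.net(x)

x = torch.randn(1, 3, 32, 32)  # bicubic-upscaled low-resolution input
print(SRCNN()(x).shape)        # same spatial size as the input
```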