Papers on "Image segmentation" published in 2015

Journal Article•DOI•

The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

Bjoern H. Menze¹, Andras Jakab², Stefan Bauer³, Jayashree Kalpathy-Cramer⁴, Keyvan Farahani⁵, Justin Kirby⁵, Yuliya Burren³, N Porz³, Johannes Slotboom³, Roland Wiest³, Levente Lanczi⁶, Elizabeth R. Gerstner⁴, Marc-André Weber⁷, Tal Arbel⁸, Brian B. Avants⁹, Nicholas Ayache¹⁰, Patricia Buendia, D. Louis Collins⁸, Nicolas Cordier¹⁰, Jason J. Corso¹¹, Antonio Criminisi¹², Tilak Das¹³, Hervé Delingette¹⁰, Çağatay Demiralp¹⁴, Christopher R. Durst¹⁵, Michel Dojat¹⁰, Senan Doyle¹⁰, Joana Festa, Florence Forbes¹⁰, Ezequiel Geremia¹⁰, Ben Glocker¹⁶, Polina Golland¹⁷, Xiaotao Guo¹⁸, Andac Hamamci¹⁹, Khan M. Iftekharuddin²⁰, Raj Jena¹³, Nigel M. John, Ender Konukoglu⁴, Danial Lashkari¹⁷, José Mariz²¹, Raphael Meier³, Sérgio Pereira, Doina Precup⁸, Stephen J. Price¹³, Tammy Riklin Raviv¹⁷, Syed M. S. Reza²⁰, Michael Ryan, Duygu Sarikaya¹¹, Lawrence H. Schwartz¹⁸, Hoo-Chang Shin, Jamie Shotton¹², Carlos A. Silva, Nuno Sousa²¹, Nagesh K. Subbanna⁸, Gábor Székely², Thomas J. Taylor, Owen M. Thomas¹³, Nicholas J. Tustison¹⁵, Gozde Unal¹⁹, Flor Vasseur¹⁰, Max Wintermark¹⁵, Dong Hye Ye²², Liang Zhao¹¹, Binsheng Zhao¹⁸, Darko Zikic¹², Marcel Prastawa²³, Mauricio Reyes³, Koen Van Leemput⁴•Institutions (23)

01 Oct 2015-IEEE Transactions on Medical Imaging

TL;DR: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) was organized in conjunction with the MICCAI 2012 and 2013 conferences; twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients.

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients (manually annotated by up to four raters) and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
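
The Dice score quoted in this abstract measures the overlap between two segmentations, and the fusion result rests on voting across algorithm outputs. The following is a minimal sketch of both ideas, not the benchmark's evaluation code. It assumes masks are flat lists of 0/1 voxel labels, the names `dice_score` and `majority_vote` are hypothetical, and the vote is a plain per-voxel majority rather than the hierarchical scheme the paper describes.

```python
from typing import List, Sequence


def dice_score(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) for two binary masks of equal length."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Convention (an assumption): two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Label a voxel 1 when strictly more than half of the masks do."""
    n = len(masks)
    return [1 if 2 * sum(votes) > n else 0 for votes in zip(*masks)]


# Toy data: one reference mask and three imperfect "algorithm" outputs.
reference = [0, 1, 1, 1, 0, 0]
outputs = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
]
fused = majority_vote(outputs)
individual = [dice_score(o, reference) for o in outputs]
print([round(d, 3) for d in individual])
print(dice_score(fused, reference))
```

In this toy case each output makes a different mistake, so the vote cancels the errors and the fused mask scores higher than any single output. Real algorithm errors are often correlated, so the gain is not guaranteed in general.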

3,699 citations

Proceedings Article•DOI•

Learning Deconvolution Network for Semantic Segmentation

Hyeonwoo Noh¹, Seunghoon Hong¹, Bohyung Han¹•Institutions (1)

Pohang University of Science and Technology¹

07 Dec 2015

TL;DR: A novel semantic segmentation algorithm is proposed that learns a deep deconvolution network on top of the convolutional layers adopted from the VGG 16-layer net and demonstrates outstanding performance on the PASCAL VOC 2012 dataset.

Abstract: We propose a novel semantic segmentation algorithm by learning a deep deconvolution network. We learn the network on top of the convolutional layers adopted from VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixelwise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks by integrating deep deconvolution network and proposal-wise prediction; our segmentation method typically identifies detailed structures and handles objects in multiple scales naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained without using the Microsoft COCO dataset through ensemble with the fully convolutional network.

2,719 citations

Journal Article•DOI•

The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)

Bjoern H. Menze, Mauricio Reyes, Koen Van Leemput, N Porz, Roland Wiest

01 Jan 2015

TL;DR: The set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences are reported, finding that different algorithms worked best for different sub-regions, but that no single algorithm ranked in the top for all sub-regions simultaneously.

Abstract: In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients - manually annotated by up to four raters - and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74-85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all subregions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.

2,316 citations

Proceedings Article•DOI•

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture

David Eigen¹, Rob Fergus²•Institutions (2)

New York University¹, Facebook²

07 Dec 2015

TL;DR: This paper addresses three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling using a multiscale convolutional network that is able to adapt easily to each task using only small modifications.

Abstract: In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional network that is able to adapt easily to each task using only small modifications, regressing from the input image to the output map directly. Our method progressively refines predictions using a sequence of scales, and captures many image details without any superpixels or low-level segmentation. We achieve state-of-the-art performance on benchmarks for all three tasks.

2,046 citations

Proceedings Article•DOI•

Conditional Random Fields as Recurrent Neural Networks

Shuai Zheng¹, Sadeep Jayasumana¹, Bernardino Romera-Paredes¹, Vibhav Vineet², Zhizhong Su, Dalong Du, Chang Huang³, Philip H. S. Torr¹•Institutions (3)

University of Oxford¹, Stanford University², Baidu³

07 Dec 2015

TL;DR: In this article, a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling is introduced.

Abstract: Pixel-level labelling tasks, such as semantic segmentation, play a central role in image understanding. Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects. To solve this problem, we introduce a new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling. To this end, we formulate Conditional Random Fields with Gaussian pairwise potentials and mean-field approximate inference as Recurrent Neural Networks. This network, called CRF-RNN, is then plugged in as a part of a CNN to obtain a deep network that has desirable properties of both CNNs and CRFs. Importantly, our system fully integrates CRF modelling with CNNs, making it possible to train the whole deep network end-to-end with the usual back-propagation algorithm, avoiding offline post-processing methods for object delineation. We apply the proposed method to the problem of semantic image segmentation, obtaining top results on the challenging Pascal VOC 2012 segmentation benchmark.
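
The formulation above rests on unrolling mean-field inference into a fixed number of repeated steps. The sketch below illustrates such an unrolled loop under strong simplifying assumptions that are not taken from the paper: pixels sit on a 1-D line, a single Gaussian kernel over position is used, label compatibility is a fixed Potts penalty, and `weight` and `bandwidth` are hand-set rather than learned. The function name `mean_field` is hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field(unary: List[List[float]], positions: List[float],
               weight: float = 1.0, bandwidth: float = 1.0,
               iterations: int = 5) -> List[List[float]]:
    """Approximate per-pixel label marginals.

    unary[i][l] is the energy (lower is better) of label l at pixel i.
    """
    n, n_labels = len(unary), len(unary[0])
    # Initialise beliefs from the unary energies alone.
    q = [softmax([-u for u in row]) for row in unary]
    for _ in range(iterations):
        new_q = []
        for i in range(n):
            # Message passing: Gaussian-weighted sum of the other pixels' beliefs.
            msg = [0.0] * n_labels
            for j in range(n):
                if j == i:
                    continue
                k = math.exp(-((positions[i] - positions[j]) ** 2) / (2 * bandwidth ** 2))
                for l in range(n_labels):
                    msg[l] += k * q[j][l]
            # Potts compatibility: label l is penalised by the belief mass
            # that nearby pixels place on every other label.
            total = sum(msg)
            pairwise = [weight * (total - msg[l]) for l in range(n_labels)]
            # Add the unary term and normalise.
            new_q.append(softmax([-unary[i][l] - pairwise[l] for l in range(n_labels)]))
        q = new_q
    return q


# Five pixels, two labels: the middle pixel weakly prefers label 1,
# while its neighbours strongly prefer label 0.
unary = [[0.0, 2.0], [0.0, 2.0], [1.0, 0.5], [0.0, 2.0], [0.0, 2.0]]
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
before = [row.index(min(row)) for row in unary]
beliefs = mean_field(unary, positions)
after = [b.index(max(b)) for b in beliefs]
print(before, after)
```

Because every step is a differentiable operation repeated a fixed number of times, a loop like this can in principle be treated as a recurrent layer and trained with back-propagation, which is the property the abstract emphasises.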

1,973 citations

Journal Article•DOI•

Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool

Abdel Aziz Taha¹, Allan Hanbury¹•Institutions (1)

Vienna University of Technology¹

12 Aug 2015-BMC Medical Imaging

TL;DR: An efficient evaluation tool for 3D medical image segmentation is proposed using 20 evaluation metrics based on a comprehensive literature review and guidelines for selecting a subset of these metrics that is suitable for the data and the segmentation task are provided.

Abstract: Medical image segmentation is an important image processing step. Comparing images to evaluate the quality of segmentation is an essential part of measuring progress in this research area. Some of the challenges in evaluating medical segmentation are: metric selection, the use in the literature of multiple definitions for certain metrics, inefficiency of the metric calculation implementations leading to difficulties with large volumes, and lack of support for fuzzy segmentation by existing metrics. First we present an overview of 20 evaluation metrics selected based on a comprehensive literature review. For fuzzy segmentation, which shows the level of membership of each voxel to multiple classes, fuzzy definitions of all metrics are provided. We present a discussion about metric properties to provide a guide for selecting evaluation metrics. Finally, we propose an efficient evaluation tool implementing the 20 selected metrics. The tool is optimized to perform efficiently in terms of speed and required memory, even if the image size is extremely large, as in the case of whole body MRI or CT volume segmentation. An implementation of this tool is available as an open source project. We propose an efficient evaluation tool for 3D medical image segmentation using 20 evaluation metrics and provide guidelines for selecting a subset of these metrics that is suitable for the data and the segmentation task.

1,561 citations

Proceedings Article•DOI•

Hypercolumns for object segmentation and fine-grained localization

Bharath Hariharan¹, Pablo Arbeláez², Ross Girshick³, Jitendra Malik¹•Institutions (3)

University of California, Berkeley¹, University of Los Andes², Microsoft³

07 Jun 2015

TL;DR: In this paper, the authors define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel, and use hypercolumns as pixel descriptors.

Abstract: Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as a feature representation. However, the information in this layer may be too coarse spatially to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [22], where we improve state-of-the-art from 49.7 mean APr [22] to 60.0, keypoint localization, where we get a 3.3 point boost over [20], and part labeling, where we show a 6.6 point gain over a strong baseline.
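
The hypercolumn definition above (the vector of activations of all units above a pixel) can be illustrated with a small sketch. It assumes feature maps are nested lists indexed as `[channel][row][col]` and that coarser layers are sampled by nearest-neighbour lookup. The abstract does not say how layers of different resolution are aligned, so that choice and the function name `hypercolumn` are assumptions made here for illustration.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def hypercolumn(layers: List[FeatureMap], row: int, col: int,
                height: int, width: int) -> List[float]:
    """Concatenate, over all layers, the activations located above pixel (row, col).

    height and width are the image dimensions that (row, col) refers to.
    """
    descriptor: List[float] = []
    for fmap in layers:
        fh, fw = len(fmap[0]), len(fmap[0][0])
        # Map image coordinates onto this layer's (possibly coarser) grid.
        r = min(row * fh // height, fh - 1)
        c = min(col * fw // width, fw - 1)
        descriptor.extend(channel[r][c] for channel in fmap)
    return descriptor


# Toy network: a 4x4 early layer with 1 channel and a 2x2 late layer with 2 channels.
early = [[[float(4 * r + c) for c in range(4)] for r in range(4)]]
late = [[[10.0, 11.0], [12.0, 13.0]],
        [[20.0, 21.0], [22.0, 23.0]]]

descriptor = hypercolumn([early, late], row=3, col=1, height=4, width=4)
print(descriptor)
```

The resulting vector mixes a spatially precise early activation with coarser late ones, which is the "best of both worlds" trade-off the abstract describes.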

1,511 citations

Proceedings Article•DOI•

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

George Papandreou¹, Liang-Chieh Chen², Kevin Murphy¹, Alan L. Yuille²•Institutions (2)

Google¹, University of California, Los Angeles²

07 Dec 2015

TL;DR: Expectation-Maximization (EM) methods are developed for training semantic image segmentation models under weakly supervised and semi-supervised settings; extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark while requiring significantly less annotation effort.

Abstract: Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently significantly pushed the state-of-art in semantic image segmentation. We study the more challenging problem of learning DCNNs for semantic image segmentation from either (1) weakly annotated training data such as bounding boxes or image-level labels or (2) a combination of few strongly labeled and many weakly labeled images, sourced from one or multiple datasets. We develop Expectation-Maximization (EM) methods for semantic image segmentation model training under these weakly supervised and semi-supervised settings. Extensive experimental evaluation shows that the proposed techniques can learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark, while requiring significantly less annotation effort. We share source code implementing the proposed system at https://bitbucket.org/deeplab/deeplab-public.

979 citations

Posted Content•

Attention to Scale: Scale-aware Semantic Image Segmentation

Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu¹, Alan L. Yuille²•Institutions (2)

Baidu¹, Johns Hopkins University²

10 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: An attention mechanism that learns to softly weight the multi-scale features at each pixel location is proposed; it not only outperforms average- and max-pooling but also allows diagnostic visualization of the importance of features at different positions and scales.

Abstract: Incorporating multi-scale features in fully convolutional neural networks (FCNs) has been a key element to achieving state-of-the-art performance on semantic image segmentation. One common way to extract multi-scale features is to feed multiple resized input images to a shared deep network and then merge the resulting features for pixelwise classification. In this work, we propose an attention mechanism that learns to softly weight the multi-scale features at each pixel location. We adapt a state-of-the-art semantic image segmentation model, which we jointly train with multi-scale input images and the attention model. The proposed attention model not only outperforms average- and max-pooling, but allows us to diagnostically visualize the importance of features at different positions and scales. Moreover, we show that adding extra supervision to the output at each scale is essential to achieving excellent performance when merging multi-scale features. We demonstrate the effectiveness of our model with extensive experiments on three challenging datasets, including PASCAL-Person-Part, PASCAL VOC 2012 and a subset of MS-COCO 2014.
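
Soft weighting of multi-scale features, as described above, can be read as a per-pixel softmax over scales followed by a weighted sum. The sketch below shows only that merging step, under stated assumptions: `scores[s][p]` holds one class score for scale `s` at pixel `p`, `attention_logits` has the same shape and is taken as given (in the paper it is produced by a jointly trained attention model), and the function name `merge_scales` is hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def merge_scales(scores: List[List[float]],
                 attention_logits: List[List[float]]) -> List[float]:
    """Merge per-scale score maps with per-pixel soft attention weights."""
    n_pixels = len(scores[0])
    merged = []
    for p in range(n_pixels):
        # One weight per scale at this pixel; the weights sum to 1.
        weights = softmax([logits[p] for logits in attention_logits])
        merged.append(sum(w * s[p] for w, s in zip(weights, scores)))
    return merged


# Two scales, two pixels.
scores = [[1.0, 4.0],   # scale 0
          [3.0, 0.0]]   # scale 1
attention_logits = [[0.0, 5.0],
                    [0.0, -5.0]]
merged = merge_scales(scores, attention_logits)
print(merged)
```

Equal logits at a pixel reduce to average pooling (pixel 0 here), and a very large logit gap approaches selecting a single scale (pixel 1), so the weights themselves show which scale dominates at each position.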

919 citations

Proceedings Article•DOI•

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

Jifeng Dai¹, Kaiming He¹, Jian Sun¹•Institutions (1)

Microsoft¹

07 Dec 2015

TL;DR: This paper proposes a method that achieves competitive accuracy but only requires easily obtained bounding box annotations, and yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT.

Abstract: Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks. Such pixel-accurate supervision demands expensive labeling effort and limits the performance of deep networks that usually benefit from more training data. In this paper, we propose a method that achieves competitive accuracy but only requires easily obtained bounding box annotations. The basic idea is to iterate between automatically generating region proposals and training convolutional networks. These two steps gradually recover segmentation masks for improving the networks, and vice versa. Our method, called "BoxSup", produces competitive results (e.g., 62.0% mAP for validation) supervised by boxes only, on par with strong baselines (e.g., 63.8% mAP) fully supervised by masks under the same setting. By leveraging a large amount of bounding boxes, BoxSup further yields state-of-the-art results on PASCAL VOC 2012 and PASCAL-CONTEXT [26].

908 citations

Proceedings Article•DOI•

Visual saliency based on multiscale deep features

Guanbin Li¹, Yizhou Yu¹•Institutions (1)

University of Hong Kong¹

07 Jun 2015

TL;DR: This paper introduces a neural network architecture with fully connected layers on top of CNNs responsible for feature extraction at three different scales, and proposes a refinement method to enhance the spatial coherence of the saliency results.

Abstract: Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this paper, we discover that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for feature extraction at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotations. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets.

Journal Article•DOI•

Segmentation-Based Image Copy-Move Forgery Detection Scheme

Jian Li¹, Xiaolong Li², Bin Yang², Xingming Sun¹•Institutions (2)

Nanjing University of Information Science and Technology¹, Peking University²

01 Mar 2015-IEEE Transactions on Information Forensics and Security

TL;DR: The main difference to the traditional methods is that the proposed scheme first segments the test image into semantically independent patches prior to keypoint extraction, and the copy-move regions can be detected by matching between these patches.

Abstract: In this paper, we propose a scheme to detect the copy-move forgery in an image, mainly by extracting the keypoints for comparison. The main difference to the traditional methods is that the proposed scheme first segments the test image into semantically independent patches prior to keypoint extraction. As a result, the copy-move regions can be detected by matching between these patches. The matching process consists of two stages. In the first stage, we find the suspicious pairs of patches that may contain copy-move forgery regions, and we roughly estimate an affine transform matrix. In the second stage, an Expectation-Maximization-based algorithm is designed to refine the estimated matrix and to confirm the existence of copy-move forgery. Experimental results prove the good performance of the proposed scheme via comparing it with the state-of-the-art schemes on the public databases.

Journal Article•DOI•

Deep convolutional neural networks for multi-modality isointense infant brain image segmentation.

Wenlu Zhang¹, Rongjian Li¹, Houtao Deng, Li Wang², Weili Lin², Shuiwang Ji¹, Dinggang Shen²,³•Institutions (3)

Old Dominion University¹, University of North Carolina at Chapel Hill², Korea University³

01 Mar 2015-NeuroImage

TL;DR: This paper proposes to use deep convolutional neural networks (CNNs) for segmenting isointense stage brain tissues using multi-modality MR images, and compares the performance of the approach with that of commonly used segmentation methods on a set of manually segmented isointense stage brain images.

Posted Content•

Efficient piecewise training of deep structured models for semantic segmentation

Guosheng Lin¹, Chunhua Shen¹, Anton van den Hengel¹, Ian Reid¹•Institutions (1)

University of Adelaide¹

04 Apr 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.

Abstract: Recent advances in semantic image segmentation have mostly been achieved by training deep convolutional neural networks (CNNs). We show how to improve semantic segmentation through the use of contextual information; specifically, we explore 'patch-patch' context between image regions, and 'patch-background' context. For learning from the patch-patch context, we formulate Conditional Random Fields (CRFs) with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed deep structured model is then applied to avoid repeated expensive CRF inference for back propagation. For capturing the patch-background context, we show that a network design with traditional multi-scale image input and sliding pyramid pooling is effective for improving performance. Our experimental results set new state-of-the-art performance on a number of popular semantic segmentation datasets, including NYUDv2, PASCAL VOC 2012, PASCAL-Context, and SIFT-flow. In particular, we achieve an intersection-over-union score of 78.0 on the challenging PASCAL VOC 2012 dataset.

Journal Article•DOI•

Fast Edge Detection Using Structured Forests

Piotr Dollár¹, C. Lawrence Zitnick¹•Institutions (1)

Microsoft¹

01 Aug 2015-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this paper, the problem of predicting local edge masks in a structured learning framework applied to random decision forests is formulated and a novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated.

Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains realtime performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.

Journal Article•DOI•

Image Segmentation Using K-means Clustering Algorithm and Subtractive Clustering Algorithm

Nameirakpam Dhanachandra¹, Khumanthem Manglem¹, Yambem Jina Chanu¹•Institutions (1)

National Institute of Technology, Manipur¹

01 Jan 2015-Procedia Computer Science

TL;DR: This paper presents the k-means clustering algorithm, an unsupervised algorithm used to segment the area of interest from the background, and subtractive clustering, a data clustering method that generates the centroids based on the potential value of the data points.
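
To make the k-means step concrete, here is a minimal sketch that clusters grayscale pixel intensities into foreground and background. It is an illustration under stated assumptions rather than the paper's pipeline: the image is a flat list of intensities, the function name `kmeans_1d` is hypothetical, and the initial centroids are supplied by hand, whereas the paper obtains them from subtractive clustering.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], centroids: List[float],
              iterations: int = 10) -> Tuple[List[int], List[float]]:
    """Cluster scalar intensities; returns (label per value, final centroids)."""
    centroids = list(centroids)  # copy so the caller's list is not modified
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Toy "image": dark background pixels and a bright region of interest.
pixels = [12.0, 15.0, 10.0, 200.0, 210.0, 14.0, 205.0, 198.0]
labels, centers = kmeans_1d(pixels, centroids=[0.0, 255.0])
print(labels)
print(centers)
```

The result depends on the starting centroids, which is the motivation for choosing them with a data-driven method instead of by hand.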

Proceedings Article•DOI•

Semantic Image Segmentation via Deep Parsing Network

Ziwei Liu¹, Xiaoxiao Li¹, Ping Luo¹, Chen Change Loy¹, Xiaoou Tang¹•Institutions (1)

The Chinese University of Hong Kong¹

07 Dec 2015

TL;DR: Deep Parsing Network (DPN) extends a contemporary convolutional neural network (CNN) architecture to model unary terms, with additional layers devised to approximate the mean field algorithm (MF) for pairwise terms.

Abstract: This paper addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts. Unlike previous works that optimized MRFs using iterative algorithm, we solve MRF by proposing a Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass. Specifically, DPN extends a contemporary CNN architecture to model unary terms and additional layers are carefully devised to approximate the mean field algorithm (MF) for pairwise terms. It has several appealing properties. First, different from the recent works that combined CNN and MRF, where many iterations of MF were required for each training image during back-propagation, DPN is able to achieve high performance by approximating one iteration of MF. Second, DPN represents various types of pairwise terms, making many existing works as its special cases. Third, DPN makes MF easier to be parallelized and speeded up in Graphical Processing Unit (GPU). DPN is thoroughly evaluated on the PASCAL VOC 2012 dataset, where a single DPN model yields a new state-of-the-art segmentation accuracy of 77.5%.

Proceedings Article•DOI•

From image-level to pixel-level labeling with Convolutional Networks

Pedro O. Pinheiro¹, Ronan Collobert¹•Institutions (1)

Idiap Research Institute¹

07 Jun 2015

TL;DR: A Convolutional Neural Network-based model is proposed, which is constrained during training to put more weight on pixels which are important for classifying the image, and which beats the state of the art results in weakly supervised object segmentation task by a large margin.

Abstract: We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task. This problem could be viewed as a kind of weakly supervised segmentation task, and naturally fits the Multiple Instance Learning (MIL) framework: every training image is known to have (or not) at least one pixel corresponding to the image class label, and the segmentation task can be rewritten as inferring the pixels belonging to the class of the object (given one image, and its object class). We propose a Convolutional Neural Network-based model, which is constrained during training to put more weight on pixels which are important for classifying the image. We show that at test time, the model has learned to discriminate the right pixels well enough, such that it performs very well on an existing segmentation benchmark, by adding only few smoothing priors. Our system is trained using a subset of the Imagenet dataset and the segmentation experiments are performed on the challenging Pascal VOC dataset (with no fine-tuning of the model on Pascal VOC). Our model beats the state of the art results in weakly supervised object segmentation task by a large margin. We also compare the performance of our model with state of the art fully-supervised segmentation approaches.

Proceedings Article•DOI•

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation

Deepak Pathak¹, Philipp Krähenbühl¹, Trevor Darrell¹•Institutions (1)

University of California, Berkeley¹

07 Dec 2015

TL;DR: This work proposes Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space of a CNN, and demonstrates the generality of this new learning framework.

Abstract: We present an approach to learn a dense pixel-wise labeling from image-level tags. Each image-level tag imposes constraints on the output labeling of a Convolutional Neural Network (CNN) classifier. We propose Constrained CNN (CCNN), a method which uses a novel loss function to optimize for any set of linear constraints on the output space (i.e. predicted label distribution) of a CNN. Our loss formulation is easy to optimize and can be incorporated directly into standard stochastic gradient descent optimization. The key idea is to phrase the training objective as a biconvex optimization for linear models, which we then relax to nonlinear deep networks. Extensive experiments demonstrate the generality of our new learning framework. The constrained loss yields state-of-the-art results on weakly supervised semantic image segmentation. We further demonstrate that adding slightly more supervision can greatly improve the performance of the learning algorithm.

Posted Content•

Deep clustering: Discriminative embeddings for segmentation and separation

John R. Hershey¹, Zhuo Chen², Jonathan Le Roux¹, Shinji Watanabe¹•Institutions (2)

Mitsubishi Electric Research Laboratories¹, Columbia University²

18 Aug 2015-arXiv: Neural and Evolutionary Computing

TL;DR: Preliminary experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of 6dB, and the same model does surprisingly well with three-speaker mixtures.

Abstract: We address the problem of acoustic source separation in a deep learning framework we call "deep clustering." Rather than directly estimating signals or masking functions, we train a deep network to produce spectrogram embeddings that are discriminative for partition labels given in training data. Previous deep network approaches provide great advantages in terms of learning power and speed, but previously it has been unclear how to use them to separate signals in a class-independent way. In contrast, spectral clustering approaches are flexible with respect to the classes and number of items to be segmented, but it has been unclear how to leverage the learning power and speed of deep networks. To obtain the best of both worlds, we use an objective function to train embeddings that yield a low-rank approximation to an ideal pairwise affinity matrix, in a class-independent way. This avoids the high cost of spectral factorization and instead produces compact clusters that are amenable to simple clustering methods. The segmentations are therefore implicitly encoded in the embeddings, and can be "decoded" by clustering. Preliminary experiments show that the proposed method can separate speech: when trained on spectrogram features containing mixtures of two speakers, and tested on mixtures of a held-out set of speakers, it can infer masking functions that improve signal quality by around 6dB. We show that the model can generalize to three-speaker mixtures despite training only on two-speaker mixtures. The framework can be used without class labels, and therefore has the potential to be trained on a diverse set of sound types, and to generalize to novel sources. We hope that future work will lead to segmentation of arbitrary sounds, with extensions to microphone array methods as well as image segmentation and other domains.
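
The low-rank affinity objective described above can be written compactly. The notation here is assumed for illustration and does not appear in the abstract: $V \in \mathbb{R}^{N \times D}$ stacks one $D$-dimensional embedding $v_i$ per spectrogram element, $Y \in \{0,1\}^{N \times C}$ is the one-hot partition indicator with rows $y_i$, and $YY^{\top}$ plays the role of the ideal pairwise affinity matrix.

```latex
\[
  \mathcal{C}(V)
  \;=\; \bigl\lVert V V^{\top} - Y Y^{\top} \bigr\rVert_F^{2}
  \;=\; \sum_{i,j} \bigl( \langle v_i, v_j \rangle - \langle y_i, y_j \rangle \bigr)^{2}
\]
\[
  \mathcal{C}(V)
  \;=\; \bigl\lVert V^{\top} V \bigr\rVert_F^{2}
  \;-\; 2 \, \bigl\lVert V^{\top} Y \bigr\rVert_F^{2}
  \;+\; \bigl\lVert Y^{\top} Y \bigr\rVert_F^{2}
\]
```

The second line follows by expanding the squared Frobenius norm with the trace identity. It involves only $D \times D$, $D \times C$ and $C \times C$ matrices, so the $N \times N$ affinity matrix never has to be formed explicitly.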

Journal Article•DOI•

SuBSENSE: A Universal Change Detection Method With Local Adaptive Sensitivity

Pierre-Luc St-Charles¹, Guillaume-Alexandre Bilodeau¹, Robert Bergevin²•Institutions (2)

École Polytechnique de Montréal¹, Laval University²

01 Jan 2015-IEEE Transactions on Image Processing

TL;DR: This paper presents a universal pixel-level segmentation method that relies on spatiotemporal binary features as well as color information to detect changes, which allows camouflaged foreground objects to be detected more easily while most illumination variations are ignored.

Abstract: Foreground/background segmentation via change detection in video sequences is often used as a stepping stone in high-level analytics and applications. Despite the wide variety of methods that have been proposed for this problem, none has been able to fully address the complex nature of dynamic scenes in real surveillance tasks. In this paper, we present a universal pixel-level segmentation method that relies on spatiotemporal binary features as well as color information to detect changes. This allows camouflaged foreground objects to be detected more easily while most illumination variations are ignored. Besides, instead of using manually set, frame-wide constants to dictate model sensitivity and adaptation speed, we use pixel-level feedback loops to dynamically adjust our method’s internal parameters without user intervention. These adjustments are based on the continuous monitoring of model fidelity and local segmentation noise levels. This new approach enables us to outperform all 32 previously tested state-of-the-art methods on the 2012 and 2014 versions of the ChangeDetection.net dataset in terms of overall F-Measure. The use of local binary image descriptors for pixel-level modeling also facilitates high-speed parallel implementations: our own version, which used no low-level or architecture-specific instruction, reached real-time processing speed on a midlevel desktop CPU. A complete C++ implementation based on OpenCV is available online.

Journal Article•DOI•

Multi-Atlas Segmentation of Biomedical Images: A Survey

Juan Eugenio Iglesias, Mert R. Sabuncu¹•Institutions (1)

Harvard University¹

01 Aug 2015-Medical Image Analysis

TL;DR: Multi-atlas segmentation (MAS) is becoming one of the most widely used and successful image segmentation techniques in biomedical applications.

Proceedings Article•DOI•

Feedforward semantic segmentation with zoom-out features

Mohammadreza Mostajabi¹, Payman Yadollahpour¹, Gregory Shakhnarovich¹•Institutions (1)

Toyota Technological Institute at Chicago¹

07 Jun 2015

TL;DR: In this article, a feed-forward architecture for semantic segmentation is proposed, which maps small image elements (superpixels) to rich feature representations extracted from a sequence of nested regions of increasing extent.

Abstract: We introduce a purely feed-forward architecture for semantic segmentation. We map small image elements (superpixels) to rich feature representations extracted from a sequence of nested regions of increasing extent. These regions are obtained by “zooming out” from the superpixel all the way to scene-level resolution. This approach exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference. Instead, superpixels are classified by a feedforward multilayer network. Our architecture achieves 69.6% average accuracy on the PASCAL VOC 2012 test set.

Journal Article•DOI•

MRI Segmentation of the Human Brain: Challenges, Methods, and Applications

Ivana Despotovic¹, Bart Goossens¹, Wilfried Philips¹•Institutions (1)

Ghent University¹

01 Mar 2015-Computational and Mathematical Methods in Medicine

TL;DR: This paper first introduces the basic concepts of image segmentation, then explains different MRI preprocessing steps including image registration, bias field correction, and removal of nonbrain tissue.

Abstract: Image segmentation is one of the most important tasks in medical image analysis and is often the first and the most critical step in many clinical applications. In brain MRI analysis, image segmentation is commonly used for measuring and visualizing the brain’s anatomical structures, for analyzing brain changes, for delineating pathological regions, and for surgical planning and image-guided interventions. In the last few decades, various segmentation techniques of different accuracy and degree of complexity have been developed and reported in the literature. In this paper we review the most popular methods commonly used for brain MRI segmentation. We highlight differences between them and discuss their capabilities, advantages, and limitations. To address the complexity and challenges of the brain MRI segmentation problem, we first introduce the basic concepts of image segmentation. Then, we explain different MRI preprocessing steps including image registration, bias field correction, and removal of nonbrain tissue. Finally, after reviewing different brain MRI segmentation methods, we discuss the validation problem in brain MRI segmentation.

Proceedings Article•DOI•

DeepContour: A deep convolutional feature learned by positive-sharing loss for contour detection

Wei Shen¹, Xinggang Wang², Yan Wang³, Xiang Bai², Zhang Zhijiang¹•Institutions (3)

Shanghai University¹, Huazhong University of Science and Technology², Nanyang Technological University³

07 Jun 2015

TL;DR: This work shows that contour detection accuracy can be improved by using deep features learned from convolutional neural networks (CNNs); rather than using the networks as a black-box feature extractor, it customizes the training strategy by partitioning contour (positive) data into subclasses and fitting each subclass with different model parameters.

Abstract: Contour detection serves as the basis of a variety of computer vision tasks such as image segmentation and object recognition. Mainstream work on this problem focuses on designing engineered gradient features. In this work, we show that contour detection accuracy can be improved by instead making use of deep features learned from convolutional neural networks (CNNs). Rather than using the networks as a black-box feature extractor, we customize the training strategy by partitioning contour (positive) data into subclasses and fitting each subclass with different model parameters. A new loss function, named positive-sharing loss, in which each subclass shares the loss for the whole positive class, is proposed to learn the parameters. Compared to the softmax loss function, the proposed one introduces an extra regularizer to emphasize the losses for the positive and negative classes, which helps in exploring more discriminative features. Our experimental results demonstrate that the learned deep features can achieve top performance on the Berkeley Segmentation Dataset and Benchmark (BSDS500) and obtain competitive cross-dataset generalization results on the NYUD dataset.

Posted Content•

Exploring Models and Data for Image Question Answering

Mengye Ren¹, Ryan Kiros¹, Richard S. Zemel²•Institutions (2)

University of Toronto¹, Canadian Institute for Advanced Research²

08 May 2015-arXiv: Learning

TL;DR: In this paper, the authors proposed to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images.

Abstract: This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.

Posted Content•

Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation

George Papandreou, Liang-Chieh Chen, Kevin Murphy, Alan L. Yuille

09 Feb 2015-arXiv: Computer Vision and Pattern Recognition

Posted Content•

Visual Saliency Based on Multiscale Deep Features

Guanbin Li¹, Yizhou Yu¹•Institutions (1)

University of Hong Kong¹

30 Mar 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper discovers that a high-quality visual saliency model can be learned from multiscale features extracted using deep convolutional neural networks (CNNs), which have had many successes in visual recognition tasks.

Abstract: Visual saliency is a fundamental problem in both cognitive and computational sciences, including computer vision. In this CVPR 2015 paper, we discover that a high-quality visual saliency model can be trained with multiscale features extracted using a popular deep learning architecture, convolutional neural networks (CNNs), which have had many successes in visual recognition tasks. For learning such saliency models, we introduce a neural network architecture, which has fully connected layers on top of CNNs responsible for extracting features at three different scales. We then propose a refinement method to enhance the spatial coherence of our saliency results. Finally, aggregating multiple saliency maps computed for different levels of image segmentation can further boost the performance, yielding saliency maps better than those generated from a single segmentation. To promote further research and evaluation of visual saliency models, we also construct a new large database of 4447 challenging images and their pixelwise saliency annotation. Experimental results demonstrate that our proposed method is capable of achieving state-of-the-art performance on all public benchmarks, improving the F-Measure by 5.0% and 13.2% respectively on the MSRA-B dataset and our new dataset (HKU-IS), and lowering the mean absolute error by 5.7% and 35.1% respectively on these two datasets.

Journal Article•DOI•

Image segmentation by generalized hierarchical fuzzy C-means algorithm

Yuhui Zheng¹, Byeungwoo Jeon², Danhua Xu¹, Q. M. Jonathan Wu¹, Hui Zhang¹•Institutions (2)

Nanjing University of Information Science and Technology¹, Sungkyunkwan University²

01 Mar 2015-Journal of Intelligent and Fuzzy Systems

TL;DR: This paper introduces a new generalized hierarchical FCM (GHFCM) that gains robustness to image noise from spatial constraints imposed through the generalized mean, and that uses a more flexible distance function in which the distance itself is treated as a sub-FCM.

Abstract: Fuzzy c-means (FCM) has been considered an effective algorithm for image segmentation. However, it still suffers from two problems: one is insufficient robustness to image noise, and the other is the Euclidean distance in FCM, which is sensitive to outliers. In this paper, we propose two new algorithms, generalized FCM (GFCM) and hierarchical FCM (HFCM), to solve these two problems. From the expression of its mathematical formula, traditional FCM can be considered a linear combination of membership and distance. GFCM is generated by applying the generalized mean to these two terms. We impose the generalized mean on the membership to incorporate local spatial information and cluster information, and on the distance function to incorporate local spatial information and image intensity value. Thus, our GFCM is more robust to image noise owing to the spatial constraints imposed by the generalized mean. To solve the second problem, caused by the Euclidean distance (l2 norm), we introduce a more flexible function which considers the distance function itself as a sub-FCM. Furthermore, the sub-FCM distance function in HFCM is general and flexible enough to deal with non-Euclidean data. Finally, we combine these two algorithms to introduce a new generalized hierarchical FCM (GHFCM). Experimental results demonstrate the improved robustness and effectiveness of the proposed algorithm.
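For orientation, the traditional FCM that the abstract takes as its starting point can be sketched in a few lines. This is the standard textbook algorithm on scalar intensities with a Euclidean distance, not the GFCM, HFCM or GHFCM variants proposed in the paper; the fuzzifier value m = 2, the fixed iteration count and the toy intensities are assumptions made for illustration.

```python
def fcm_memberships(values, centers, m=2.0, eps=1e-12):
    """Membership of each value in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    memberships = []
    for x in values:
        # eps keeps the ratio finite when a value coincides with a center.
        dists = [abs(x - c) + eps for c in centers]
        row = [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_centers(values, memberships, m=2.0):
    """Cluster centers as means of the values weighted by membership**m."""
    centers = []
    for k in range(len(memberships[0])):
        weights = [row[k] ** m for row in memberships]
        centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
    return centers


def fcm(values, centers, m=2.0, iterations=50):
    """Alternate the membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        memberships = fcm_memberships(values, centers, m)
        centers = fcm_centers(values, memberships, m)
    return centers, fcm_memberships(values, centers, m)


# Toy example: six pixel intensities forming a dark group and a bright group.
intensities = [0.0, 0.1, 0.2, 0.9, 1.0, 1.1]
final_centers, final_memberships = fcm(intensities, [0.0, 1.0])
print(final_centers)
print(final_memberships[0])
```

Each pixel is treated independently here and the distance is plain Euclidean, which are the two properties the abstract identifies as the source of noise and outlier sensitivity.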

Posted Content•

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Vijay Badrinarayanan¹, Alex Kendall¹, Roberto Cipolla¹•Institutions (1)

University of Cambridge¹

02 Nov 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: SegNet uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling in the decoder, which eliminates the need for learning to upsample.

Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN and also with the well known DeepLab-LargeFOV and DeconvNet architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at this http URL.
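The index-based upsampling described in the abstract can be illustrated on a single small feature map. The sketch below assumes 2x2 max pooling with stride 2 and even input dimensions, and it omits the trainable convolutions that follow in the decoder; the function names are illustrative and are not taken from the SegNet Caffe implementation.

```python
def max_pool_2x2(x):
    """2x2, stride-2 max pooling that also records where each maximum came from."""
    pooled, indices = [], []
    for r in range(0, len(x), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(x[0]), 2):
            window = [(x[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, height, width):
    """Place each pooled value back at its recorded position; all other cells stay 0."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 4],
]
pooled, indices = max_pool_2x2(feature_map)
print(pooled)    # [[4, 5], [6, 7]]
print(max_unpool_2x2(pooled, indices, 4, 4))
# [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [0, 6, 0, 0]]
```

The upsampled map is sparse, with one non-zero entry per pooling window, which is why the abstract has trainable filters applied afterwards to produce dense feature maps. Only the index positions need to be kept from the encoder, and the upsampling step itself has no parameters to learn.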
