
Showing papers on "Segmentation-based object categorization published in 2016"


Journal ArticleDOI
TL;DR: This paper proposes an automatic segmentation method based on Convolutional Neural Networks (CNN) exploring small 3×3 kernels, which allow a deeper architecture and help against overfitting, given the smaller number of weights in the network.
Abstract: Among brain tumors, gliomas are the most common and aggressive, leading to a very short life expectancy in their highest grade. Thus, treatment planning is a key stage to improve the quality of life of oncological patients. Magnetic resonance imaging (MRI) is a widely used imaging technique to assess these tumors, but the large amount of data produced by MRI prevents manual segmentation in a reasonable time, limiting the use of precise quantitative measurements in clinical practice. Automatic and reliable segmentation methods are therefore required; however, the large spatial and structural variability among brain tumors makes automatic segmentation a challenging problem. In this paper, we propose an automatic segmentation method based on Convolutional Neural Networks (CNN), exploring small 3×3 kernels. The use of small kernels allows for a deeper architecture and has a positive effect against overfitting, given the smaller number of weights in the network. We also investigated the use of intensity normalization as a pre-processing step, which, though not common in CNN-based segmentation methods, proved together with data augmentation to be very effective for brain tumor segmentation in MRI images. Our proposal was validated on the Brain Tumor Segmentation Challenge 2013 database (BRATS 2013), simultaneously obtaining first place for the complete, core, and enhancing regions in the Dice Similarity Coefficient metric (0.88, 0.83, and 0.77) on the Challenge data set, as well as the overall first position on the online evaluation platform. We also participated in the on-site BRATS 2015 Challenge using the same model, obtaining second place, with Dice Similarity Coefficients of 0.78, 0.65, and 0.75 for the complete, core, and enhancing regions, respectively.
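The paper's key design choice, stacking small 3×3 kernels instead of using one large kernel, is easy to verify numerically. Below is a minimal PyTorch sketch (not the authors' code; the channel count is hypothetical): two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution, with fewer weights and one extra nonlinearity.

```python
import torch.nn as nn

C = 64  # hypothetical channel count

# Two stacked 3x3 convolutions: same 5x5 receptive field, deeper network.
stacked_3x3 = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)
single_5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2)

n_stacked = sum(p.numel() for p in stacked_3x3.parameters())
n_single = sum(p.numel() for p in single_5x5.parameters())
print(n_stacked, n_single)  # 73856 vs 102464: fewer weights, less overfitting
```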

1,894 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work presents a new benchmark dataset and evaluation methodology for the area of video object segmentation, named DAVIS (Densely Annotated VIdeo Segmentation), and provides a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics.
Abstract: Over the years, datasets and benchmarks have proven their fundamental importance in computer vision research, enabling targeted progress and objective comparisons in many fields. At the same time, legacy datasets may impede the evolution of a field due to saturated algorithm performance and the lack of contemporary, high-quality data. In this work we present a new benchmark dataset and evaluation methodology for the area of video object segmentation. The dataset, named DAVIS (Densely Annotated VIdeo Segmentation), consists of fifty high-quality, Full HD video sequences, spanning multiple occurrences of common video object segmentation challenges such as occlusions, motion blur and appearance changes. Each video is accompanied by densely annotated, pixel-accurate and per-frame ground truth segmentation. In addition, we provide a comprehensive analysis of several state-of-the-art segmentation approaches using three complementary metrics that measure the spatial extent of the segmentation, the accuracy of the silhouette contours and the temporal coherence. The results uncover strengths and weaknesses of current approaches, opening up promising directions for future work.
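The spatial-extent metric in such benchmarks is commonly the region similarity J, i.e. the Jaccard index between the predicted and ground-truth masks. A minimal NumPy sketch of that measure (the contour-accuracy and temporal-coherence metrics are omitted here):

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index J = |pred AND gt| / |pred OR gt| for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum()) / union

# Per-sequence score: mean J over all annotated frames.
# j_seq = np.mean([region_similarity(p, g) for p, g in zip(preds, gts)])
```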

1,656 citations


Proceedings ArticleDOI
Jifeng Dai1, Kaiming He1, Jian Sun1
27 Jun 2016
TL;DR: This paper presents Multitask Network Cascades for instance-aware semantic segmentation, which consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects, and develops an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
Abstract: Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multi-task Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360 ms per image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by-product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won 1st place.

1,173 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work proposes an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground-plane, and achieves the best detection performance on the challenging KITTI benchmark, among published monocular competitors.
Abstract: The goal of this paper is to perform 3D object detection from a single monocular image in the domain of autonomous driving. Our method first aims to generate a set of candidate class-specific object proposals, which are then run through a standard CNN pipeline to obtain high-quality object detections. The focus of this paper is on proposal generation. In particular, we propose an energy minimization approach that places object candidates in 3D using the fact that objects should be on the ground plane. We then score each candidate box projected to the image plane via several intuitive potentials encoding semantic segmentation, contextual information, size and location priors, and typical object shape. Our experimental evaluation demonstrates that our object proposal generation approach significantly outperforms all monocular approaches, and achieves the best detection performance on the challenging KITTI benchmark among published monocular competitors.
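The geometric core of the proposal step, placing a candidate 3D box on the ground plane and projecting its corners into the image for scoring, can be sketched as follows. This is a NumPy illustration under assumed conventions (KITTI-style camera frame with y pointing down; intrinsics and box dimensions are hypothetical), not the paper's code:

```python
import numpy as np

def box3d_on_ground(x, z, w, h, l, yaw, ground_y=1.65):
    """Corners (3x8) of a 3D box resting on the plane y = ground_y."""
    dx, dz = w / 2.0, l / 2.0
    corners = np.array(
        [[sx * dx, 0.0, sz * dz] for sx in (-1, 1) for sz in (-1, 1)] +
        [[sx * dx, -h, sz * dz] for sx in (-1, 1) for sz in (-1, 1)]).T
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])  # rotation about y axis
    return R @ corners + np.array([[x], [ground_y], [z]])

def project(K, pts3d):
    """Pinhole projection of 3xN camera-frame points to pixel coordinates."""
    uvw = K @ pts3d
    return uvw[:2] / uvw[2]

K = np.array([[721.5, 0.0, 609.6],   # hypothetical intrinsics
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
uv = project(K, box3d_on_ground(x=2.0, z=15.0, w=1.6, h=1.5, l=3.9, yaw=0.2))
# uv (2x8) bounds the candidate in the image plane, ready for scoring.
```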

976 citations


01 Jan 2016
TL;DR: This paper provides a review on the various image segmentation techniques proposed in the literature and shows how to cluster pixels into salient image regions corresponding to individual surfaces, objects, or natural parts of objects.
Abstract: Digital image processing plays a vital role in many applications, retrieving the required information from a given image in a way that does not affect its other features. Image segmentation is one of the most important tasks in image processing: it partitions an image into disjoint subsets such that each subset corresponds to a meaningful part of the image. The goal of image segmentation is to cluster pixels into salient image regions corresponding to individual surfaces, objects, or natural parts of objects. With the growth of research on image segmentation, many segmentation methods have been developed and interpreted differently for content analysis and image understanding across applications. An organized review of image segmentation methods is therefore essential, and this paper provides a review of the various techniques proposed in the literature.
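As a concrete instance of the "cluster pixels into salient regions" view described above, here is a minimal sketch (not from the reviewed paper) of the simplest such method, k-means clustering on per-pixel color features:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image: np.ndarray, k: int = 4) -> np.ndarray:
    """Cluster the pixels of an HxWx3 image into k regions by color."""
    h, w, c = image.shape
    features = image.reshape(-1, c).astype(np.float64)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    return labels.reshape(h, w)  # HxW map of region indices

# segmentation = kmeans_segment(img)
```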

543 citations


Book ChapterDOI
12 Dec 2016
TL;DR: This paper proposes an approach for directly optimizing this intersection-over-union (IoU) measure in deep neural networks and demonstrates that this approach outperforms DNNs trained with standard softmax loss.
Abstract: We consider the problem of learning deep neural networks (DNNs) for object category segmentation, where the goal is to label each pixel in an image as being part of a given object (foreground) or not (background). Deep neural networks are usually trained with simple loss functions (e.g., softmax loss). These loss functions are appropriate for standard classification problems where the performance is measured by the overall classification accuracy. For object category segmentation, the two classes (foreground and background) are very imbalanced. The intersection-over-union (IoU) is usually used to measure the performance of any object category segmentation method. In this paper, we propose an approach for directly optimizing this IoU measure in deep neural networks. Our experimental results on two object category segmentation datasets demonstrate that our approach outperforms DNNs trained with standard softmax loss.
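A common differentiable relaxation of IoU, which may differ in detail from the paper's exact formulation, replaces the set intersection and union with products and sums over predicted foreground probabilities. A minimal PyTorch sketch:

```python
import torch

def soft_iou_loss(probs: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """1 - soft IoU for binary segmentation.
    probs: sigmoid outputs in [0, 1]; target: {0, 1} mask of the same shape."""
    target = target.to(probs.dtype)
    inter = (probs * target).sum(dim=(-2, -1))
    union = (probs + target - probs * target).sum(dim=(-2, -1))
    return (1.0 - (inter + eps) / (union + eps)).mean()

# loss = soft_iou_loss(torch.sigmoid(logits), masks)  # optimize IoU directly
```

Unlike per-pixel softmax loss, every term here is normalized by the union, so the rare foreground class cannot be drowned out by the abundant background.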

541 citations


Posted Content
TL;DR: One-Shot Video Object Segmentation (OSVOS) is based on a fully convolutional neural network architecture that successively transfers generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object in the test sequence.
Abstract: This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one-shot). Although all frames are processed independently, the results are temporally coherent and stable. We perform experiments on two annotated video segmentation databases, which show that OSVOS is fast and improves the state of the art by a significant margin (79.8% vs 68.0%).
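The one-shot step can be sketched as plain fine-tuning of a pretrained foreground-segmentation network on the single annotated frame; the interface of `net` below is a hypothetical stand-in, not the OSVOS release:

```python
import torch
import torch.nn.functional as F

def one_shot_finetune(net, first_frame, first_mask, steps=200, lr=1e-4):
    """Adapt a pretrained network (1x3xHxW image -> 1x1xHxW logits,
    assumed interface) to the one annotated object of a test video."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(net(first_frame), first_mask)
        loss.backward()
        opt.step()
    return net

# Every remaining frame is then segmented independently:
# masks = [torch.sigmoid(net(f)) > 0.5 for f in frames]
```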

523 citations


Journal ArticleDOI
Qiaoliang Li1, Bowei Feng1, Linpei Xie1, Ping Liang1, Huisheng Zhang1, Tianfu Wang1 
TL;DR: A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented, where instead of a single label of the center pixel, the network can output the label map of all pixels for a given image patch.
Abstract: This paper presents a new supervised method for vessel segmentation in retinal images. The method recasts segmentation as a problem of cross-modality data transformation from retinal image to vessel map. A wide and deep neural network with strong induction ability is proposed to model the transformation, and an efficient training strategy is presented. Instead of a single label for the center pixel, the network outputs the label map of all pixels in a given image patch. Our approach outperforms reported state-of-the-art methods in terms of sensitivity, specificity and accuracy. The result of cross-training evaluation indicates its robustness to the training set. The approach needs no hand-designed features and no preprocessing step, reducing the impact of subjective factors. The proposed method has the potential for application in image diagnosis of ophthalmologic diseases, and it may provide a new, general, high-performance computing framework for image segmentation.

516 citations


Journal ArticleDOI
TL;DR: A novel segmentation approach based on deep 3D convolutional encoder networks with shortcut connections with results showing that this method performs comparably to the top-ranked state-of-the-art methods, even when only relatively small data sets are available for training.
Abstract: We propose a novel segmentation approach based on deep 3D convolutional encoder networks with shortcut connections and apply it to the segmentation of multiple sclerosis (MS) lesions in magnetic resonance images. Our model is a neural network that consists of two interconnected pathways, a convolutional pathway, which learns increasingly more abstract and higher-level image features, and a deconvolutional pathway, which predicts the final segmentation at the voxel level. The joint training of the feature extraction and prediction pathways allows for the automatic learning of features at different scales that are optimized for accuracy for any given combination of image types and segmentation task. In addition, shortcut connections between the two pathways allow high- and low-level features to be integrated, which enables the segmentation of lesions across a wide range of sizes. We have evaluated our method on two publicly available data sets (MICCAI 2008 and ISBI 2015 challenges) with the results showing that our method performs comparably to the top-ranked state-of-the-art methods, even when only relatively small data sets are available for training. In addition, we have compared our method with five freely available and widely used MS lesion segmentation methods (EMS, LST-LPA, LST-LGA, Lesion-TOADS, and SLS) on a large data set from an MS clinical trial. The results show that our method consistently outperforms these other methods across a wide range of lesion sizes.
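A minimal PyTorch sketch of the two-pathway idea: a convolutional path that abstracts features, a deconvolutional path that predicts voxel-wise labels, and a shortcut that integrates equal-scale features. It is two scales deep with hypothetical layer sizes, far shallower than the paper's network:

```python
import torch
import torch.nn as nn

class ConvEncoderDecoder3D(nn.Module):
    def __init__(self, in_ch=1, mid=16, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv3d(in_ch, mid, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(
            nn.Conv3d(mid, 2 * mid, 3, stride=2, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(2 * mid, mid, 2, stride=2)
        self.head = nn.Conv3d(mid, out_ch, 1)

    def forward(self, x):
        f1 = self.enc(x)        # high-resolution, low-level features
        f2 = self.down(f1)      # low-resolution, higher-level features
        u = self.up(f2) + f1    # shortcut: integrate both feature levels
        return self.head(u)     # voxel-wise class logits

# logits = ConvEncoderDecoder3D()(torch.randn(1, 1, 32, 64, 64))
```

The summed shortcut is what lets low-level (small-lesion) detail survive the downsampling path, the property the abstract credits for handling lesions across a wide range of sizes.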

432 citations


Journal ArticleDOI
TL;DR: A detailed discussion of the segmentation performance of colour index-based approaches is presented, based on studies from the literature, particularly from 2008 to 2015.

370 citations


Proceedings ArticleDOI
27 Jun 2016
TL;DR: This work formulates a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames for video segmentation, and demonstrates the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme.
Abstract: Video object segmentation is challenging due to fast-moving objects, deforming shapes, and cluttered backgrounds. Optical flow can be used to propagate an object segmentation over time but, unfortunately, flow is often inaccurate, particularly around object boundaries. Such boundaries are precisely where we want our segmentation to be accurate. To obtain accurate segmentation across time, we propose an efficient algorithm that considers video segmentation and optical flow estimation simultaneously. For video segmentation, we formulate a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames. For optical flow estimation, particularly at object boundaries, we compute the flow independently in the segmented regions and recompose the results. We call the process object flow and demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Experiments on the SegTrack v2 and YouTube-Objects datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.

Journal ArticleDOI
TL;DR: The proposed level set method can be directly applied to simultaneous segmentation and bias correction for 3T and 7T magnetic resonance images, and demonstrates superiority over other representative algorithms.
Abstract: It is often a difficult task to accurately segment images with intensity inhomogeneity, because most representative algorithms are region-based and depend on intensity homogeneity of the object of interest. In this paper, we present a novel level set method for image segmentation in the presence of intensity inhomogeneity. The inhomogeneous objects are modeled as Gaussian distributions of different means and variances, in which a sliding window is used to map the original image into another domain where the intensity distribution of each object is still Gaussian but better separated. The means of the Gaussian distributions in the transformed domain can be adaptively estimated by multiplying a bias field with the original signal within the window. A maximum likelihood energy functional is then defined on the whole image region, which combines the bias field, the level set function, and the piecewise constant function approximating the true image signal. The proposed level set method can be directly applied to simultaneous segmentation and bias correction for 3T and 7T magnetic resonance images. Extensive evaluations on synthetic and real images demonstrate the superiority of the proposed method over other representative algorithms.
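The maximum-likelihood energy described above typically takes the following form; this is a sketch consistent with the abstract's description, written in assumed notation rather than the paper's own:

```latex
% Assumed notation: I = observed image, b = bias field, c_k and \sigma_k =
% mean and standard deviation of region k in the transformed domain, and
% \Omega_k = region k as encoded by the level set function. Minimizing E
% maximizes the likelihood of the window-wise Gaussian model.
E\bigl(b, \{c_k\}, \{\sigma_k\}\bigr)
  = \sum_{k=1}^{N} \int_{\Omega_k}
    \left[\, \log \sigma_k
      + \frac{\bigl(I(\mathbf{x}) - b(\mathbf{x})\, c_k\bigr)^{2}}{2\sigma_k^{2}}
    \right] \mathrm{d}\mathbf{x}
```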

Proceedings ArticleDOI
01 Aug 2016
TL;DR: New extensions to the ITK-SNAP interactive image visualization and segmentation tool that support semi-automatic segmentation of multi-modality imaging datasets in a way that utilizes information from all available modalities simultaneously are described.
Abstract: Obtaining quantitative measures from biomedical images often requires segmentation, i.e., finding and outlining the structures of interest. Multi-modality imaging datasets, in which multiple imaging measures are available at each spatial location, are increasingly common, particularly in MRI. In applications where fully automatic segmentation algorithms are unavailable or fail to perform at desired levels of accuracy, semi-automatic segmentation can be a time-saving alternative to manual segmentation, allowing the human expert to guide segmentation, while minimizing the effort expended by the expert on repetitive tasks that can be automated. However, few existing 3D image analysis tools support semi-automatic segmentation of multi-modality imaging data. This paper describes new extensions to the ITK-SNAP interactive image visualization and segmentation tool that support semi-automatic segmentation of multi-modality imaging datasets in a way that utilizes information from all available modalities simultaneously. The approach combines Random Forest classifiers, trained by the user by placing several brushstrokes in the image, with the active contour segmentation algorithm. The new multi-modality semi-automatic segmentation approach is evaluated in the context of high-grade glioblastoma segmentation.
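The classification half of the approach can be sketched with scikit-learn: multi-modality intensity vectors at user-brushed voxels train a Random Forest, whose foreground-probability map would then drive the active contour (omitted here). Function and variable names are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def brushstroke_probability_map(volumes, fg_idx, bg_idx):
    """volumes: co-registered 3D modalities (e.g. T1, T2, FLAIR);
    fg_idx/bg_idx: flat indices of brushed fore-/background voxels."""
    feats = np.stack([v.ravel() for v in volumes], axis=1)  # voxel x modality
    X = np.concatenate([feats[fg_idx], feats[bg_idx]])
    y = np.concatenate([np.ones(len(fg_idx)), np.zeros(len(bg_idx))])
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    prob = rf.predict_proba(feats)[:, 1]        # P(foreground) per voxel
    return prob.reshape(volumes[0].shape)       # drives the active contour
```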

Proceedings ArticleDOI
13 Apr 2016
TL;DR: This paper formulates vessel segmentation as a boundary detection problem and utilizes fully convolutional neural networks (CNNs) to generate a vessel probability map that distinguishes vessels from background in regions of inadequate contrast and is robust to pathological regions in the fundus image.
Abstract: Vessel segmentation is a key step for various medical applications. This paper introduces a deep learning architecture to improve the performance of retinal vessel segmentation. Deep learning architectures have demonstrated a powerful ability to automatically learn rich hierarchical representations. In this paper, we formulate vessel segmentation as a boundary detection problem and utilize fully convolutional neural networks (CNNs) to generate a vessel probability map. Our vessel probability map distinguishes vessels from background even in regions of inadequate contrast, and is robust to pathological regions in the fundus image. Moreover, a fully connected Conditional Random Field (CRF) is employed to combine the discriminative vessel probability map with long-range interactions between pixels. Finally, a binary vessel segmentation result is obtained by our method. We show that our proposed method achieves state-of-the-art vessel segmentation performance on the DRIVE and STARE datasets.

Proceedings ArticleDOI
13 Apr 2016
TL;DR: This paper proposes to use fully convolutional networks (FCNs) for the segmentation of isointense phase brain MR images, and shows that the proposed model significantly outperformed previous methods in terms of accuracy and indicated a better way of integrating multi-modality images, which leads to performance improvement.
Abstract: The segmentation of infant brain tissue images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) plays an important role in studying early brain development. In the isointense phase (approximately 6–8 months of age), WM and GM exhibit similar levels of intensity in both T1 and T2 MR images, resulting in extremely low tissue contrast and thus making tissue segmentation very challenging. Existing methods for tissue segmentation in this isointense phase usually employ patch-based sparse labeling on a single T1, T2 or fractional anisotropy (FA) modality, or on their simply-stacked combinations, without fully exploring the multi-modality information. To address this challenge, we propose to use fully convolutional networks (FCNs) for the segmentation of isointense phase brain MR images. Instead of simply stacking the three modalities, we train one network for each modality image, and then fuse their high-layer features for the final segmentation. Specifically, we run one convolution-pooling stream per modality (T1, T2, and FA), and then combine the streams at a high layer to generate the final segmentation maps as outputs. We compared the performance of our approach with that of commonly used segmentation methods on a set of manually segmented isointense phase brain images. Results showed that our proposed model significantly outperformed previous methods in terms of accuracy, and indicated a better way of integrating multi-modality images, leading to a performance improvement.
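A minimal PyTorch sketch of the fusion strategy: one convolution stream per modality, concatenated at a high layer rather than stacking the modalities at the input. Depths and channel counts are hypothetical:

```python
import torch
import torch.nn as nn

class LateFusionFCN(nn.Module):
    """One stream per modality (e.g. T1, T2, FA), fused at a high layer."""
    def __init__(self, n_modalities=3, mid=32, n_classes=4):
        super().__init__()
        self.streams = nn.ModuleList([
            nn.Sequential(nn.Conv2d(1, mid, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU())
            for _ in range(n_modalities)])
        self.fuse = nn.Conv2d(n_modalities * mid, n_classes, 1)

    def forward(self, modalities):  # list of Bx1xHxW tensors, one per modality
        high = [s(m) for s, m in zip(self.streams, modalities)]
        return self.fuse(torch.cat(high, dim=1))  # fuse high-layer features

# logits = LateFusionFCN()([t1, t2, fa])
```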

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A new energy on the vertices of a regularly sampled spatiotemporal bilateral grid is designed, which can be solved efficiently using a standard graph cut label assignment, and implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges.
Abstract: In this work, we propose a novel approach to video segmentation that operates in bilateral space. We design a new energy on the vertices of a regularly sampled spatiotemporal bilateral grid, which can be solved efficiently using a standard graph cut label assignment. Using a bilateral formulation, the energy that we minimize implicitly approximates long-range, spatio-temporal connections between pixels while still containing only a small number of variables and only local graph edges. We compare to a number of recent methods, and show that our approach achieves state-of-the-art results on multiple benchmarks in a fraction of the runtime. Furthermore, our method scales linearly with image size, allowing for interactive feedback on real-world high resolution video.

Proceedings Article
25 Nov 2016
TL;DR: An adversarial training approach to train semantic segmentation models that can detect and correct higher-order inconsistencies between ground truth segmentation maps and the ones produced by the segmentation net.
Abstract: Adversarial training has been shown to produce state-of-the-art results for generative image modeling. In this paper we propose an adversarial training approach to train semantic segmentation models. We train a convolutional semantic segmentation network along with an adversarial network that discriminates segmentation maps coming either from the ground truth or from the segmentation network. The motivation for our approach is that it can detect and correct higher-order inconsistencies between ground truth segmentation maps and the ones produced by the segmentation net. Our experiments show that our adversarial training approach leads to improved accuracy on the Stanford Background and PASCAL VOC 2012 datasets.
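One training step of such a scheme might look as follows. This PyTorch sketch assumes hypothetical module interfaces (a segmentation net over images, a discriminator over image/label-map pairs) and an assumed loss weighting, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def adversarial_step(seg_net, disc, opt_s, opt_d, image, gt_onehot, lam=0.1):
    pred = torch.softmax(seg_net(image), dim=1)  # predicted label maps

    # Discriminator: ground-truth maps -> "real" (1), predictions -> "fake" (0).
    d_real, d_fake = disc(image, gt_onehot), disc(image, pred.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Segmenter: per-pixel cross-entropy plus a term for fooling the critic,
    # which penalizes higher-order inconsistencies with ground-truth maps.
    ce = F.cross_entropy(seg_net(image), gt_onehot.argmax(dim=1))
    adv = F.binary_cross_entropy_with_logits(
        disc(image, pred), torch.ones_like(d_fake))
    opt_s.zero_grad()
    (ce + lam * adv).backward()
    opt_s.step()
```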

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper addresses the problem of road scene segmentation in conventional RGB images by exploiting recent advances in semantic segmentation via convolutional neural networks (CNNs) and proposes several architecture refinements that provide the best trade-off between segmentation quality and runtime.
Abstract: This paper addresses the problem of road scene segmentation in conventional RGB images by exploiting recent advances in semantic segmentation via convolutional neural networks (CNNs). Segmentation networks are very large and do not currently run at interactive frame rates. To make this technique applicable to robotics we propose several architecture refinements that provide the best trade-off between segmentation quality and runtime. This is achieved by a new mapping between classes and filters at the expansion side of the network. The network is trained end-to-end and yields precise road/lane predictions at the original input resolution in roughly 50ms. Compared to the state of the art, the network achieves top accuracies on the KITTI dataset for road and lane segmentation while providing a 20× speed-up. We demonstrate that the improved efficiency is not due to the road segmentation task. Also on segmentation datasets with larger scene complexity, the accuracy does not suffer from the large speed-up.

Journal ArticleDOI
TL;DR: A deep neural network architecture, FusionNet, is introduced for the automatic segmentation of neuronal structures in connectomics data; its summation-based skip connections allow a much deeper network architecture and improve segmentation accuracy.
Abstract: Electron microscopic connectomics is an ambitious research direction with the goal of studying comprehensive brain connectivity maps by using high-throughput, nano-scale microscopy. One of the main challenges in connectomics research is developing scalable image analysis algorithms that require minimal user intervention. Recently, deep learning has drawn much attention in computer vision because of its exceptional performance in image classification tasks. For this reason, its application to connectomic analyses holds great promise, as well. In this paper, we introduce a novel deep neural network architecture, FusionNet, for the automatic segmentation of neuronal structures in connectomics data. FusionNet leverages the latest advances in machine learning, such as semantic segmentation and residual neural networks, with the novel introduction of summation-based skip connections to allow a much deeper network architecture for a more accurate segmentation. We demonstrate the performance of the proposed method by comparing it with state-of-the-art electron microscopy (EM) segmentation methods from the ISBI EM segmentation challenge. We also show the segmentation results on two different tasks including cell membrane and cell body segmentation and a statistical analysis of cell morphology.

Journal ArticleDOI
TL;DR: A comprehensive review of the recent progress in image segmentation, covering 190 publications, gives an overview of broad segmentation topics including not only the classic unsupervised methods, but also the recent weakly-/semi-supervised methods and the fully-supervised methods.

Proceedings ArticleDOI
01 Jan 2016
TL;DR: A novel weakly-supervised semantic segmentation algorithm based on a Deep Convolutional Neural Network (DCNN) that exploits auxiliary segmentation annotations available for different categories to guide segmentation of images with only image-level class labels.
Abstract: We propose a novel weakly-supervised semantic segmentation algorithm based on a Deep Convolutional Neural Network (DCNN). Contrary to existing weakly-supervised approaches, our algorithm exploits auxiliary segmentation annotations available for different categories to guide segmentation of images with only image-level class labels. To make segmentation knowledge transferable across categories, we design a decoupled encoder-decoder architecture with an attention model. In this architecture, the model generates spatial highlights of each category present in an image using the attention model, and subsequently performs binary segmentation of each highlighted region using the decoder. Combined with the attention model, the decoder trained with segmentation annotations from different categories boosts the accuracy of weakly-supervised semantic segmentation. The proposed algorithm demonstrates substantially improved performance compared to the state-of-the-art weakly-supervised techniques on the PASCAL VOC 2012 dataset when our model is trained with annotations from 60 exclusive categories in the Microsoft COCO dataset.

Posted Content
TL;DR: The authors proposed an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process, and produce detailed instance segmentations.
Abstract: While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; however, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism to model a human-like counting process, and produce detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP, KITTI, and Cityscapes datasets.

Posted Content
TL;DR: This survey gives an overview over different techniques used for pixel-level semantic segmentation such as unsupervised methods, Decision Forests and SVMs and recently published approaches with convolutional neural networks.
Abstract: This survey gives an overview over different techniques used for pixel-level semantic segmentation. Metrics and datasets for the evaluation of segmentation algorithms and traditional approaches for segmentation such as unsupervised methods, Decision Forests and SVMs are described and pointers to the relevant papers are given. Recently published approaches with convolutional neural networks are mentioned and typical problematic situations for segmentation algorithms are examined. A taxonomy of segmentation algorithms is given.

Posted Content
TL;DR: An approach to instance-level image segmentation that is built on top of category-level segmentation, which follows a different pipeline to the popular detect-then-segment approaches that first predict instances' bounding boxes.
Abstract: We propose an approach to instance-level image segmentation that is built on top of category-level segmentation. Specifically, for each pixel in a semantic category mask, its corresponding instance bounding box is predicted using a deep fully convolutional regression network. Thus it follows a different pipeline to the popular detect-then-segment approaches that first predict instances' bounding boxes, which are the current state-of-the-art in instance segmentation. We show that, by leveraging the strength of our state-of-the-art semantic segmentation models, the proposed method can achieve comparable or even better results to detect-then-segment approaches. We make the following contributions. (i) First, we propose a simple yet effective approach to semantic instance segmentation. (ii) Second, we propose an online bootstrapping method during training, which is critically important for achieving good performance for both semantic category segmentation and instance-level segmentation. (iii) As the performance of semantic category segmentation has a significant impact on the instance-level segmentation, which is the second step of our approach, we train fully convolutional residual networks to achieve the best semantic category segmentation accuracy. On the PASCAL VOC 2012 dataset, we obtain the currently best mean intersection-over-union score of 79.1%. (iv) We also achieve state-of-the-art results for instance-level segmentation.

Journal ArticleDOI
01 Sep 2016
TL;DR: A novel approach is presented, named an improved intuitionistic fuzzy c-means (IIFCM), which considers local spatial information in an intuitionistic fuzzy way; it preserves image details, is insensitive to noise, and requires no parameter tuning.
Abstract: [Figure caption: original and segmented simulated brain images by different algorithms: (a) axial view of the original simulated T1-weighted brain image with INU = 0 and 1% noise, (b) skull-stripped simulated brain image, (c) manually segmented CSF, GM, and WM images, (d)–(k) results of the IIFCM, IFCM, FLICM, EnFCM, FGFCM, FCM_S1, FCM_S2, and ImFCM algorithms.] The segmentation of brain magnetic resonance (MR) images plays an important role in computer-aided diagnosis and clinical research. However, due to the presence of noise and uncertainty on the boundaries between different tissues in the brain image, segmentation is a challenging task. Many variants of the standard fuzzy c-means (FCM) algorithm have been proposed to handle the noise. The intuitionistic fuzzy c-means (IFCM) algorithm, one of these variants, is well suited to image segmentation, as it incorporates the advantages of intuitionistic fuzzy set theory. IFCM successfully handles uncertainty, but it is sensitive to noise because it does not incorporate any local spatial information. In this paper, we present a novel approach, named an improved intuitionistic fuzzy c-means (IIFCM), which considers local spatial information in an intuitionistic fuzzy way. The IIFCM preserves image details, is insensitive to noise, and requires no parameter tuning. The segmentation results obtained on a synthetic square image and on real and simulated MRI brain images demonstrate the efficacy of the IIFCM algorithm and its superior performance in comparison to existing segmentation methods. A nonparametric statistical analysis further confirms the significant performance of the IIFCM algorithm in comparison to other existing segmentation algorithms.
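For reference, the baseline that IFCM and the proposed IIFCM extend is standard fuzzy c-means. A minimal NumPy sketch of its alternating membership/center updates (the intuitionistic hesitation term and the local spatial information are the paper's extensions and are omitted):

```python
import numpy as np

def fuzzy_cmeans(x, k=3, m=2.0, iters=100, eps=1e-9, seed=0):
    """Standard FCM on a 1-D array of pixel intensities x."""
    rng = np.random.default_rng(seed)
    v = rng.choice(x, size=k)                      # initial cluster centers
    for _ in range(iters):
        d = np.abs(x[:, None] - v[None, :]) + eps  # pixel-to-center distances
        u = d ** (-2.0 / (m - 1.0))                # unnormalized memberships
        u /= u.sum(axis=1, keepdims=True)          # rows sum to 1
        um = u ** m
        v = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)  # update centers
    return u, v  # memberships (N x k) and centers (k,)

# u, v = fuzzy_cmeans(image.ravel().astype(float), k=3)
```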

Journal ArticleDOI
TL;DR: This work proposes a pipeline for object detection and segmentation in the context of volumetric image parsing, solving a two-step learning problem: anatomical pose estimation and boundary delineation, and introduces Marginal Space Deep Learning (MSDL), a novel framework exploiting both the strengths of efficient object parametrization in hierarchical marginal spaces and the automated feature design of Deep Learning network architectures.
Abstract: Robust and fast solutions for anatomical object detection and segmentation support the entire clinical workflow from diagnosis, patient stratification, therapy planning, and intervention to follow-up. Current state-of-the-art techniques for parsing volumetric medical image data are typically based on machine learning methods that exploit large annotated image databases. Two main challenges need to be addressed: efficiency in scanning high-dimensional parametric spaces, and the need for representative image features, which otherwise require significant manual engineering. We propose a pipeline for object detection and segmentation in the context of volumetric image parsing, solving a two-step learning problem: anatomical pose estimation and boundary delineation. For this task we introduce Marginal Space Deep Learning (MSDL), a novel framework exploiting both the strengths of efficient object parametrization in hierarchical marginal spaces and the automated feature design of Deep Learning (DL) network architectures. In the 3D context, the application of deep learning systems is limited by the very high complexity of the parametrization: nine parameters are necessary to describe a restricted affine transformation in 3D, resulting in a prohibitive number of scanning hypotheses (on the order of billions). The mechanism of marginal space learning provides excellent run-time performance by learning classifiers in clustered, high-probability regions in spaces of gradually increasing dimensionality. To further increase computational efficiency and robustness, our system learns sparse adaptive data sampling patterns that automatically capture the structure of the input. Given the object localization, we propose a DL-based active shape model to estimate the non-rigid object boundary. Experimental results are presented on the aortic valve in ultrasound using an extensive dataset of 2891 volumes from 869 patients, showing significant improvements of up to 45.2% over the state of the art. To our knowledge, this is the first successful demonstration of the potential of DL for detection and segmentation in full 3D data with parametrized representations.

Journal ArticleDOI
TL;DR: The tandem of precision-recall curves for boundaries and for objects-and-parts is proposed as the tool of choice for the supervised evaluation of image segmentation, and the datasets and code of all the measures are made publicly available.
Abstract: This paper tackles the supervised evaluation of image segmentation and object proposal algorithms. It surveys, structures, and deduplicates the measures used to compare both segmentation results and object proposals with a ground truth database; and proposes a new measure: the precision-recall for objects and parts. To compare the quality of these measures, eight state-of-the-art object proposal techniques are analyzed and two quantitative meta-measures involving nine state-of-the-art segmentation methods are presented. The meta-measures consist of assuming some plausible hypotheses about the results and assessing how well each measure reflects these hypotheses. As a conclusion of the performed experiments, this paper proposes the tandem of precision-recall curves for boundaries and for objects-and-parts as the tool of choice for the supervised evaluation of image segmentation. We make the datasets and code of all the measures publicly available.

Proceedings ArticleDOI
01 Dec 2016
TL;DR: The experimental results show that the proposed method for accurate extraction of the lesion region can outperform existing state-of-the-art algorithms in terms of segmentation accuracy.
Abstract: Melanoma is the most aggressive form of skin cancer, and its incidence is on the rise. There is a growing research trend toward computerized analysis of suspicious skin lesions for malignancy using images captured by digital cameras. Analysis of these images is usually challenging due to disturbing factors such as illumination variations and light reflections from the skin surface. One important stage in the diagnosis of melanoma is segmentation of the lesion region from normal skin. In this paper, a method for accurate extraction of the lesion region is proposed, based on deep learning approaches. The input image, after being preprocessed to reduce noisy artifacts, is fed to a deep convolutional neural network (CNN). The CNN combines local and global contextual information and outputs a label for each pixel, producing a segmentation mask that delineates the lesion region. This mask is further refined by post-processing operations. The experimental results show that our proposed method outperforms existing state-of-the-art algorithms in terms of segmentation accuracy.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: Experimental results demonstrate that word features lead to comparable performances to the best systems in the literature, and a further combination of discrete and neural features gives top accuracies.
Abstract: Character-based and word-based methods are two main types of statistical models for Chinese word segmentation, the former exploiting sequence labeling models over characters and the latter typically exploiting a transition-based model, with the advantage that word-level features can be easily utilized. Neural models have been exploited for character-based Chinese word segmentation, giving high accuracies by making use of external character embeddings, yet requiring less feature engineering. In this paper, we study a neural model for word-based Chinese word segmentation, by replacing the manually designed discrete features with neural features in a word-based segmentation framework. Experimental results demonstrate that word features lead to comparable performance to the best systems in the literature, and a further combination of discrete and neural features gives top accuracies.

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video, and demonstrates state-of-the-art segmentation results on the SegTrack v2 dataset.
Abstract: We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals (spatio-temporal tubes that localize the foreground objects) in an unannotated video. In contrast to previous unsupervised methods that either track regions initialized in an arbitrary frame or train a fixed model over a cluster of regions, we instead discover a set of easy-to-group instances of an object and then iteratively update its appearance model to gradually detect harder instances in temporally-adjacent frames. Our method first generates a set of spatio-temporal bounding box proposals, and then refines them to obtain pixel-wise segmentation proposals. We demonstrate state-of-the-art segmentation results on the SegTrack v2 dataset, and bounding box tracking results that perform competitively with state-of-the-art supervised tracking methods.