
Showing papers on "Image segmentation" published in 2013


Journal ArticleDOI
TL;DR: A method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel, alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information.
Abstract: Scene labeling consists of labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information. We report results using multiple postprocessing methods to produce the final labeling. Among those, we propose a technique to automatically retrieve, from a pool of segmentation components, an optimal set of components that best explain the scene; these components are arbitrary, for example, they can be taken from a segmentation tree or from any family of oversegmentations. The system yields record accuracies on the SIFT Flow dataset (33 classes) and the Barcelona dataset (170 classes) and near-record accuracy on the Stanford Background dataset (eight classes), while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than a second, including feature extraction.
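
As a rough illustration of the multiscale dense-feature idea (not the authors' trained network), the sketch below applies the same fixed toy filter bank to several rescaled copies of an image and upsamples the responses back to full resolution, so every pixel ends up with a feature vector mixing fine and coarse context. The filter bank and scales are illustrative stand-ins for the learned convolutional stages.

```python
import numpy as np
from scipy import ndimage

def toy_features(img):
    """Stand-in for a learned convnet stage: a small fixed filter bank."""
    filters = [np.array([[1, 0, -1]] * 3) / 6.0,    # horizontal gradients
               np.array([[1, 0, -1]] * 3).T / 6.0,  # vertical gradients
               np.ones((3, 3)) / 9.0]               # local mean
    return np.stack([ndimage.convolve(img, f) for f in filters], axis=-1)

def multiscale_features(img, scales=(1.0, 0.5, 0.25)):
    h, w = img.shape
    maps = []
    for s in scales:
        small = ndimage.zoom(img, s, order=1)       # rescaled copy
        feat = toy_features(small)
        # Upsample each channel back to the original resolution.
        up = np.stack([ndimage.zoom(feat[..., c],
                                    (h / feat.shape[0], w / feat.shape[1]),
                                    order=1)
                       for c in range(feat.shape[-1])], axis=-1)
        maps.append(up)
    return np.concatenate(maps, axis=-1)            # per-pixel feature vector

img = np.random.rand(64, 64)
print(multiscale_features(img).shape)               # (64, 64, 9)
```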

2,791 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work considers both foreground and background cues in a different way and ranks the similarity of the image elements with foreground or background cues via graph-based manifold ranking, with saliency defined based on their relevances to the given seeds or queries.
Abstract: Most existing bottom-up methods measure the foreground saliency of a pixel or region based on its contrast within a local context or the entire image, whereas a few methods focus on segmenting out background regions and thereby salient objects. Instead of considering the contrast between the salient objects and their surrounding regions, we consider both foreground and background cues in a different way. We rank the similarity of the image elements (pixels or regions) with foreground cues or background cues via graph-based manifold ranking. The saliency of the image elements is defined based on their relevances to the given seeds or queries. We represent the image as a close-loop graph with superpixels as nodes. These nodes are ranked based on their similarity to background and foreground queries, using affinity matrices. Saliency detection is carried out in a two-stage scheme to extract background regions and foreground salient objects efficiently. Experimental results on two large benchmark databases demonstrate that the proposed method performs favorably against state-of-the-art methods in terms of accuracy and speed. We also create a more difficult benchmark database containing 5,172 images to test the proposed saliency model, and make this database publicly available with this paper for further studies in the saliency field.
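
The ranking step this abstract describes has a simple closed form. Below is a minimal sketch on toy "superpixel" features using a dense affinity graph and solving f* = (D - alpha*W)^{-1} y, where y marks the seed/query nodes; the actual method restricts edges to spatial neighbours on a close-loop graph.

```python
import numpy as np

def manifold_rank(features, query_mask, alpha=0.99, sigma=0.1):
    # Affinity between all node pairs (the paper limits edges to spatial
    # neighbours; a dense graph keeps this example short).
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0)
    D = np.diag(W.sum(1))
    y = query_mask.astype(float)            # 1 for seed/query nodes
    return np.linalg.solve(D - alpha * W, y)

feats = np.random.rand(20, 3)               # 20 nodes, e.g. mean Lab colour
queries = np.zeros(20, dtype=bool)
queries[:5] = True                          # first 5 nodes as background seeds
scores = manifold_rank(feats, queries)      # relevance of every node to seeds
print(scores.round(3))
```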

2,278 citations


Journal ArticleDOI
01 Mar 2013
TL;DR: Recent advances in spectral-spatial classification of hyperspectral images are presented in this paper and several techniques are investigated for combining both spatial and spectral information.
Abstract: Recent advances in spectral-spatial classification of hyperspectral images are presented in this paper. Several techniques are investigated for combining both spatial and spectral information. Spatial information is extracted at the object (set of pixels) level rather than at the conventional pixel level. Mathematical morphology is first used to derive the morphological profile of the image, which includes characteristics about the size, orientation, and contrast of the spatial structures present in the image. Then, the morphological neighborhood is defined and used to derive additional features for classification. Classification is performed with support vector machines (SVMs) using the available spectral information and the extracted spatial information. Spatial postprocessing is next investigated to build more homogeneous and spatially consistent thematic maps. To that end, three presegmentation techniques are applied to define regions that are used to regularize the preliminary pixel-wise thematic map. Finally, a multiple-classifier (MC) system is defined to produce relevant markers that are exploited to segment the hyperspectral image with the minimum spanning forest algorithm. Experimental results conducted on three real hyperspectral images with different spatial and spectral resolutions and corresponding to various contexts are presented. They highlight the importance of spectral-spatial strategies for the accurate classification of hyperspectral images and validate the proposed methods.
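
A morphological profile of the kind described here can be sketched in a few lines: one band is opened and closed with structuring elements of increasing size, and the responses are stacked as per-pixel features. Note the paper uses openings and closings by reconstruction; the plain grey-scale operators below are a simplification.

```python
import numpy as np
from scipy import ndimage

def morphological_profile(band, sizes=(3, 5, 7)):
    feats = [band]                          # original band as first feature
    for s in sizes:
        feats.append(ndimage.grey_opening(band, size=(s, s)))   # bright structures
        feats.append(ndimage.grey_closing(band, size=(s, s)))   # dark structures
    return np.stack(feats, axis=-1)         # (H, W, 1 + 2 * len(sizes))

band = np.random.rand(32, 32)               # a single hyperspectral band
print(morphological_profile(band).shape)    # (32, 32, 7)
```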

1,225 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper regards saliency map computation as a regression problem: based on multi-level image segmentation, it uses a supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map.
Abstract: Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we regard saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses a supervised learning approach to map the regional feature vector to a saliency score, and finally fuses the saliency scores across multiple levels, yielding the saliency map. The contributions are two-fold. One is that we show our approach, which integrates the regional contrast, regional property, and regional backgroundness descriptors together to form the master saliency map, is able to produce saliency maps superior to those of existing algorithms, most of which combine saliency maps heuristically computed from different types of features. The other is that we introduce a new regional feature vector, backgroundness, to characterize the background, which can be regarded as a counterpart of the objectness descriptor [2]. The performance evaluation on several popular benchmark data sets validates that our approach outperforms the existing state of the art.
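
The regression view is easy to make concrete. Below is a minimal sketch with synthetic stand-ins for the regional descriptors: a random forest is trained to map regional feature vectors to saliency scores, then applied to regions from two segmentation levels; in the full method each region's score is painted onto its pixels and the per-level maps are fused.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.random((500, 8))              # 8-D regional feature vectors
y_train = rng.random(500)                   # ground-truth regional saliency
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)

# Regions from a fine and a coarse segmentation level of a test image.
levels = [rng.random((30, 8)), rng.random((12, 8))]
region_scores = [model.predict(X) for X in levels]
# The full method rasterizes each level's scores to a pixel map before
# fusing; here we just show that every region gets a learned score.
print([s.shape for s in region_scores])     # [(30,), (12,)]
```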

1,057 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper formulates the problem of predicting local edge masks in a structured learning framework applied to random decision forests and develops a novel approach to learning decision trees that robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated.
Abstract: Edge detection is a critical component of many vision systems, including object detectors and image segmentation algorithms. Patches of edges exhibit well-known forms of local structure, such as straight lines or T-junctions. In this paper we take advantage of the structure present in local image patches to learn both an accurate and computationally efficient edge detector. We formulate the problem of predicting local edge masks in a structured learning framework applied to random decision forests. Our novel approach to learning decision trees robustly maps the structured labels to a discrete space on which standard information gain measures may be evaluated. The result is an approach that obtains real time performance that is orders of magnitude faster than many competing state-of-the-art approaches, while also achieving state-of-the-art edge detection results on the BSDS500 Segmentation dataset and NYU Depth dataset. Finally, we show the potential of our approach as a general purpose edge detector by showing our learned edge models generalize well across datasets.
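
The key trick, mapping structured patch labels to a discrete space so ordinary information gain applies, can be illustrated compactly. In this sketch, k-means on flattened binary edge masks plays the role of the paper's sampled, quantized mapping (a stand-in, not the authors' exact procedure).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
masks = rng.random((200, 16, 16)) > 0.7     # 200 binary 16x16 edge masks
# Map structured labels (masks) into a small discrete label space.
discrete = KMeans(n_clusters=4, n_init=10, random_state=0) \
    .fit_predict(masks.reshape(200, -1).astype(float))

def entropy(labels):
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

split = rng.random(200) > 0.5               # a candidate binary tree split
gain = entropy(discrete) - (split.mean() * entropy(discrete[split])
                            + (~split).mean() * entropy(discrete[~split]))
print(f"information gain of this split: {gain:.4f}")
```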

981 citations


Journal ArticleDOI
TL;DR: A new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the total expectation of labeling error and in which pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel is proposed.
Abstract: Multi-atlas segmentation is an effective approach for automatically labeling objects of interest in biomedical images. In this approach, multiple expert-segmented example images, called atlases, are registered to a target image, and deformed atlas segmentations are combined using label fusion. Among the proposed label fusion strategies, weighted voting with spatially varying weight distributions derived from atlas-target intensity similarity have been particularly successful. However, one limitation of these strategies is that the weights are computed independently for each atlas, without taking into account the fact that different atlases may produce similar label errors. To address this limitation, we propose a new solution for the label fusion problem in which weighted voting is formulated in terms of minimizing the total expectation of labeling error and in which pairwise dependency between atlases is explicitly modeled as the joint probability of two atlases making a segmentation error at a voxel. This probability is approximated using intensity similarity between a pair of atlases and the target image in the neighborhood of each voxel. We validate our method in two medical image segmentation problems: hippocampus segmentation and hippocampus subfield segmentation in magnetic resonance (MR) images. For both problems, we show consistent and significant improvement over label fusion strategies that assign atlas weights independently.
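
The weighted-voting solution has a convenient closed form: if M_x is the pairwise error-dependency matrix at voxel x (estimated in the paper from atlas-target intensity similarity), the expected-error-minimizing weights are w = M_x^{-1} 1 / (1^T M_x^{-1} 1). A toy example:

```python
import numpy as np

def joint_fusion_weights(M):
    ones = np.ones(M.shape[0])
    w = np.linalg.solve(M, ones)            # M^{-1} 1
    return w / w.sum()                      # normalize so weights sum to 1

# Toy dependency matrix for 4 atlases: atlases 0 and 1 tend to make
# correlated errors, so each gets less weight (~0.18) than the two
# independent atlases (~0.32).
M = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.1],
              [0.1, 0.1, 0.1, 1.0]])
print(joint_fusion_weights(M).round(3))
```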

800 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.

699 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This method is fast, fully automatic, and makes minimal assumptions about the video, which enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations.
Abstract: We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.

662 citations


Journal ArticleDOI
TL;DR: This work proposes a generic and simple framework comprising three steps: 1) constructing a cost volume, 2) fast cost volume filtering, and 3) winner-takes-all label selection. It achieves disparity maps in real time whose quality exceeds that of all other fast (local) approaches on the Middlebury stereo benchmark, as well as optical flow fields which contain very fine structures as well as large displacements.
Abstract: Many computer vision tasks can be formulated as labeling problems. The desired solution is often a spatially smooth labeling where label transitions are aligned with color edges of the input image. We show that such solutions can be efficiently achieved by smoothing the label costs with a very fast edge-preserving filter. In this paper, we propose a generic and simple framework comprising three steps: 1) constructing a cost volume, 2) fast cost volume filtering, and 3) Winner-Takes-All label selection. Our main contribution is to show that with such a simple framework state-of-the-art results can be achieved for several computer vision applications. In particular, we achieve 1) disparity maps in real time whose quality exceeds those of all other fast (local) approaches on the Middlebury stereo benchmark, and 2) optical flow fields which contain very fine structures as well as large displacements. To demonstrate robustness, the few parameters of our framework are set to nearly identical values for both applications. Also, competitive results for interactive image segmentation are presented. With this work, we hope to inspire other researchers to leverage this framework to other application areas.
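
The three-step framework is small enough to sketch end to end on a toy stereo pair; a box filter stands in below for the edge-preserving (guided) filter used in the paper.

```python
import numpy as np
from scipy import ndimage

def disparity_map(left, right, max_disp=8, radius=3):
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), 1e6)
    for d in range(max_disp + 1):
        # 1) cost volume: absolute intensity difference at disparity d
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, :w - d])
        # 2) fast cost-volume filtering (edge-preserving in the paper)
        cost[d] = ndimage.uniform_filter(cost[d], size=2 * radius + 1)
    # 3) winner-takes-all label selection
    return cost.argmin(axis=0)

rng = np.random.default_rng(0)
right = rng.random((40, 60))
left = np.roll(right, 4, axis=1)            # ground-truth disparity of 4
print(np.median(disparity_map(left, right)))  # ~4
```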

618 citations


Journal ArticleDOI
TL;DR: Extensive synthetic and real data experiments show that the proposed small target detection method not only works more stably for different target sizes and signal-to-clutter ratio values, but also has better detection performance compared with conventional baseline methods.
Abstract: The robust detection of small targets is one of the key techniques in infrared search and tracking applications. A novel small target detection method in a single infrared image is proposed in this paper. Initially, the traditional infrared image model is generalized to a new infrared patch-image model using local patch construction. Then, because of the non-local self-correlation property of the infrared background image, based on the new model small target detection is formulated as an optimization problem of recovering low-rank and sparse matrices, which is effectively solved using stable principal component pursuit. Finally, a simple adaptive segmentation method is used to segment the target image and the segmentation result can be refined by post-processing. Extensive synthetic and real data experiments show that under different clutter backgrounds the proposed method not only works more stably for different target sizes and signal-to-clutter ratio values, but also has better detection performance compared with conventional baseline methods.
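
The recovery step described here is a robust PCA problem. The sketch below uses a basic inexact-ALM robust PCA (the noiseless cousin of the stable principal component pursuit the paper uses) to split a toy patch matrix into a low-rank background and a sparse target component.

```python
import numpy as np

def shrink(X, tau):                         # soft thresholding
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0)

def rpca(D, lam=None, iters=50):
    m, n = D.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))
    S = np.zeros_like(D); Y = np.zeros_like(D)
    mu = 1.25 / np.linalg.norm(D, 2)
    for _ in range(iters):
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = (U * shrink(sig, 1 / mu)) @ Vt  # singular value thresholding
        S = shrink(D - L + Y / mu, lam / mu)
        Y += mu * (D - L - S)
        mu *= 1.2                           # continuation on the penalty
    return L, S

rng = np.random.default_rng(0)
bg = np.outer(rng.random(30), rng.random(40))       # rank-1 "background"
target = np.zeros((30, 40)); target[10, 20] = 5.0   # one small bright target
L, S = rpca(bg + target)
print(np.unravel_index(np.abs(S).argmax(), S.shape))  # (10, 20): the target
```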

617 citations


Journal ArticleDOI
Maoguo Gong, Yan Liang, Jiao Shi, Wenping Ma, Jingjing Ma
TL;DR: An improved fuzzy C-means (FCM) algorithm for image segmentation is presented by introducing a tradeoff weighted fuzzy factor and a kernel metric; results show that the new algorithm is effective and efficient, and is relatively independent of the type of noise.
Abstract: In this paper, we present an improved fuzzy C-means (FCM) algorithm for image segmentation by introducing a tradeoff weighted fuzzy factor and a kernel metric. The tradeoff weighted fuzzy factor depends on the space distance of all neighboring pixels and their gray-level difference simultaneously. By using this factor, the new algorithm can accurately estimate the damping extent of neighboring pixels. In order to further enhance its robustness to noise and outliers, we introduce a kernel distance measure to its objective function. The new algorithm adaptively determines the kernel parameter by using a fast bandwidth selection rule based on the distance variance of all data points in the collection. Furthermore, the tradeoff weighted fuzzy factor and the kernel distance measure are both parameter free. Experimental results on synthetic and real images show that the new algorithm is effective and efficient, and is relatively independent of the type of noise.
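
For orientation, here is a minimal kernel fuzzy C-means on 1-D pixel intensities, using the Gaussian-kernel distance this family of algorithms builds on; the paper's trade-off weighted fuzzy factor over each pixel's neighbourhood and its automatic bandwidth selection are omitted for brevity.

```python
import numpy as np

def kernel_fcm(x, c=2, m=2.0, sigma=1.0, iters=50):
    v = np.quantile(x, np.linspace(0.1, 0.9, c))   # spread initial centres
    for _ in range(iters):
        K = np.exp(-((x[None, :] - v[:, None]) ** 2) / sigma ** 2)  # (c, n)
        d = np.maximum(1.0 - K, 1e-12)      # kernel-induced distance
        u = d ** (-1.0 / (m - 1))
        u /= u.sum(axis=0, keepdims=True)   # memberships sum to 1 per pixel
        w = (u ** m) * K
        v = (w * x[None, :]).sum(1) / w.sum(1)     # kernel-weighted centres
    return u, v

rng = np.random.default_rng(1)
pixels = np.concatenate([rng.normal(0.2, 0.05, 300),
                         rng.normal(0.8, 0.05, 300)])
u, centres = kernel_fcm(pixels, sigma=0.5)
print(centres.round(2))                     # ~[0.2, 0.8]
```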

Proceedings ArticleDOI
01 Dec 2013
TL;DR: An unsupervised video segmentation approach that simultaneously tracks multiple holistic figure-ground segments and outperforms state-of-the-art approaches on the new SegTrack v2 dataset, showing its efficiency and robustness to challenges in different video sequences.
Abstract: We propose an unsupervised video segmentation approach by simultaneously tracking multiple holistic figure-ground segments. Segment tracks are initialized from a pool of segment proposals generated from a figure-ground segmentation algorithm. Then, online non-local appearance models are trained incrementally for each track using a multi-output regularized least squares formulation. By using the same set of training examples for all segment tracks, a computational trick allows us to track hundreds of segment tracks efficiently, as well as perform optimal online updates in closed form. Moreover, a new composite statistical inference approach is proposed for refining the obtained segment tracks, which breaks down the initial segment proposals and recombines them into better ones by utilizing high-order statistic estimates from the appearance model and enforcing temporal consistency. For evaluating the algorithm, a dataset, SegTrack v2, is collected, with about 1,000 frames with pixel-level annotations. The proposed framework outperforms state-of-the-art approaches on this dataset, showing its efficiency and robustness to challenges in different video sequences.
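
The shared-training-set trick mentioned above comes down to multi-output regularized least squares: because every segment track is fit on the same examples X, one factorization of (X^T X + lambda*I) yields all tracks' appearance models at once. A minimal sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 32))                   # features shared by all tracks
Y = rng.random((500, 200))                  # one label column per segment track
lam = 1.0
G = X.T @ X + lam * np.eye(X.shape[1])      # factor once ...
W = np.linalg.solve(G, X.T @ Y)             # ... solve for all 200 tracks at once
print(W.shape)                              # (32, 200): one model per track
```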

Journal ArticleDOI
TL;DR: The proposed segmentation methods have been evaluated on a database of 650 images with optic disc and optic cup boundaries manually marked by trained professionals, and achieve areas under the curve of 0.800 and 0.822 on two data sets, higher than other methods.
Abstract: Glaucoma is a chronic eye disease that leads to vision loss. As it cannot be cured, detecting the disease in time is important. Current tests using intraocular pressure (IOP) are not sensitive enough for population-based glaucoma screening. Optic nerve head assessment in retinal fundus images is both more promising and superior to IOP measurement. This paper proposes optic disc and optic cup segmentation using superpixel classification for glaucoma screening. In optic disc segmentation, histograms and center surround statistics are used to classify each superpixel as disc or non-disc. A self-assessment reliability score is computed to evaluate the quality of the automated optic disc segmentation. For optic cup segmentation, in addition to the histograms and center surround statistics, the location information is also included in the feature space to boost the performance. The proposed segmentation methods have been evaluated on a database of 650 images with optic disc and optic cup boundaries manually marked by trained professionals. Experimental results show an average overlapping error of 9.5% and 24.1% in optic disc and optic cup segmentation, respectively. The results also show an increase in overlapping error as the reliability score is reduced, which justifies the effectiveness of the self-assessment. The segmented optic disc and optic cup are then used to compute the cup-to-disc ratio for glaucoma screening. Our proposed method achieves areas under the curve of 0.800 and 0.822 on two data sets, which is higher than other methods. The methods can be used for segmentation and glaucoma screening. The self-assessment will be used as an indicator of cases with large errors and enhance the clinical deployment of the automatic segmentation and screening.
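
The final screening quantity is simple once the masks exist. Below is a minimal sketch of the vertical cup-to-disc ratio (CDR) computed from toy disc and cup masks; a larger CDR indicates higher glaucoma risk.

```python
import numpy as np

def vertical_cdr(disc_mask, cup_mask):
    disc_rows = np.flatnonzero(disc_mask.any(axis=1))
    cup_rows = np.flatnonzero(cup_mask.any(axis=1))
    # Vertical extent of the cup divided by that of the disc.
    return (cup_rows[-1] - cup_rows[0] + 1) / (disc_rows[-1] - disc_rows[0] + 1)

yy, xx = np.mgrid[:100, :100]
disc = (yy - 50) ** 2 + (xx - 50) ** 2 < 40 ** 2    # toy disc, radius 40
cup = (yy - 50) ** 2 + (xx - 50) ** 2 < 22 ** 2     # toy cup, radius 22
print(round(vertical_cdr(disc, cup), 2))            # ~0.54
```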

Journal ArticleDOI
TL;DR: The goal is to explore and record how the computed effective properties vary with the different tools and workflows used; this benchmarking is the topic of the two present companion papers.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space is proposed.
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three-dimensional geometric relationships between observed data points, which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human-annotated RGB+D images demonstrate a significant reduction in the occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work improves a state-of-the-art method for estimating human joint locations from videos by incorporating additional segmentation cues and temporal constraints to select the "best" of the K-best estimations, and is able to localize body joints more accurately than existing methods.
Abstract: We address action recognition in videos by modeling the spatial-temporal structures of human poses. We start by improving a state-of-the-art method for estimating human joint locations from videos. More precisely, we obtain the K-best estimations output by the existing method and incorporate additional segmentation cues and temporal constraints to select the "best" one. Then we group the estimated joints into five body parts (e.g. the left arm) and apply data mining techniques to obtain a representation for the spatial-temporal structures of human actions. This representation captures the spatial configurations of body parts in one frame (by spatial-part-sets) as well as the body part movements (by temporal-part-sets) which are characteristic of human actions. It is interpretable, compact, and also robust to errors on joint estimations. Experimental results first show that our approach is able to localize body joints more accurately than existing methods. Next we show that it outperforms state-of-the-art action recognizers on the UCF sport, the Keck Gesture and the MSR-Action3D datasets.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others.
Abstract: We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across other images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

Journal ArticleDOI
Bahriye Akay
01 Jun 2013
TL;DR: Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding, and CPU time results show that the algorithms are scalable and that the running times of the algorithms seem to grow at a linear rate as the problem size increases.
Abstract: Segmentation is a critical task in image processing. Bi-level segmentation involves dividing the whole image into partitions based on a threshold value, whereas multilevel segmentation involves multiple threshold values. A successful segmentation assigns proper threshold values to optimise a criterion such as entropy or between-class variance. High computational cost and inefficiency of an exhaustive search for the optimal thresholds leads to the use of global search heuristics to set the optimal thresholds. An emerging area in global heuristics is swarm-intelligence, which models the collective behaviour of the organisms. In this paper, two successful swarm-intelligence-based global optimisation algorithms, particle swarm optimisation (PSO) and artificial bee colony (ABC), have been employed to find the optimal multilevel thresholds. Kapur's entropy, one of the maximum entropy techniques, and between-class variance have been investigated as fitness functions. Experiments have been performed on test images using various numbers of thresholds. The results were assessed using statistical tools and suggest that Otsu's technique, PSO and ABC show equal performance when the number of thresholds is two, while the ABC algorithm performs better than PSO and Otsu's technique when the number of thresholds is greater than two. Experiments based on Kapur's entropy indicate that the ABC algorithm can be efficiently used in multilevel thresholding. Moreover, segmentation methods are required to have a minimum running time in addition to high performance. Therefore, the CPU times of ABC and PSO have been investigated to check their validity in real-time. The CPU time results show that the algorithms are scalable and that the running times of the algorithms seem to grow at a linear rate as the problem size increases.
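
Kapur's entropy criterion itself is compact. In the sketch below, a crude random search stands in for the ABC/PSO swarm optimizers the paper evaluates, since only the fitness function is the point here; good thresholds land in the valleys between the histogram modes.

```python
import numpy as np

def kapur_entropy(hist, thresholds):
    p = hist / hist.sum()
    edges = [0] + sorted(thresholds) + [len(p)]
    total = 0.0
    for a, b in zip(edges[:-1], edges[1:]):
        seg = p[a:b]
        w = seg.sum()
        if w <= 0:
            return -np.inf                  # empty class: invalid candidate
        q = seg[seg > 0] / w
        total += -(q * np.log(q)).sum()     # entropy of this intensity class
    return total

rng = np.random.default_rng(0)
image = np.concatenate([rng.normal(60, 10, 4000), rng.normal(130, 10, 4000),
                        rng.normal(200, 10, 4000)]).clip(0, 255).astype(int)
hist = np.bincount(image, minlength=256)

best, best_val = None, -np.inf              # random search over 2 thresholds
for _ in range(5000):
    cand = sorted(rng.choice(255, size=2, replace=False) + 1)
    val = kapur_entropy(hist, cand)
    if val > best_val:
        best, best_val = cand, val
print(best)                                 # roughly [95, 165], between the modes
```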

Journal ArticleDOI
TL;DR: This paper introduces a new cluster-based algorithm for co-saliency detection that is mostly bottom-up, without heavy learning, and outperforms most of the state-of-the-art saliency detection methods.
Abstract: Co-saliency is used to discover the common saliency across multiple images, which is a relatively underexplored area. In this paper, we introduce a new cluster-based algorithm for co-saliency detection. Global correspondence between the multiple images is implicitly learned during the clustering process. Three visual attention cues, contrast, spatial, and corresponding, are devised to effectively measure the cluster saliency. The final co-saliency maps are generated by fusing the single-image saliency and multi-image saliency. Our method is mostly bottom-up, requires no heavy learning, and has the property of being simple, general, efficient, and effective. Quantitative and qualitative experiments on a variety of benchmark datasets demonstrate the advantages of the proposed method over the competing co-saliency methods. On single images, our method also outperforms most of the state-of-the-art saliency detection methods. Furthermore, we apply the co-saliency method to four vision applications: co-segmentation, robust image distance, weakly supervised learning, and video foreground detection, which demonstrate the potential usages of the co-saliency map.

Journal ArticleDOI
TL;DR: A new fully automatic image information extraction, generalization and mosaic workflow based on multiscale textural and morphological image feature extraction is presented, and a new systematic approach for quality control and validation allowing global spatial and thematic consistency checking is proposed and applied.
Abstract: A general framework for processing high and very-high resolution imagery in support of a Global Human Settlement Layer (GHSL) is presented, together with a discussion of the results of the first operational test of the production workflow. The test involved the mapping of 24.3 million km2 of the Earth's surface spread over four continents, corresponding to an estimated population of 1.3 billion people in 2010. The resolution of the input image data ranges from 0.5 to 10 meters, collected by a heterogeneous set of platforms including the satellites SPOT (2 and 5), CBERS 2B, RapidEye (2 and 4), WorldView (1 and 2), GeoEye 1, QuickBird 2, Ikonos 2, and airborne sensors. Several imaging modes were tested, including panchromatic, multispectral and pan-sharpened images. A new fully automatic image information extraction, generalization and mosaic workflow is presented that is based on multiscale textural and morphological image feature extraction. New image feature compression and optimization are introduced, together with new learning and classification techniques allowing for the processing of HR/VHR image data using low-resolution thematic layers as reference. A new systematic approach for quality control and validation allowing global spatial and thematic consistency checking is proposed and applied. The quality of the results is discussed by sensor, band, resolution, and eco-region. Critical points, lessons learned and next steps are highlighted.

Journal ArticleDOI
TL;DR: The concept of matched filtering is improved, and the proposed blood vessel segmentation approach is at least comparable with recent state-of-the-art methods, outperforming most of them with an accuracy of 95% evaluated on the new database.
Abstract: Automatic assessment of retinal vessels plays an important role in the diagnosis of various eye diseases, as well as systemic diseases. A public screening is highly desirable for prompt and effective treatment, since such diseases need to be diagnosed at an early stage. Automated and accurate segmentation of the retinal blood vessel tree is one of the challenging tasks in the computer-aided analysis of fundus images today. We improve the concept of matched filtering, and propose a novel and accurate method for segmenting retinal vessels. Our goal is to be able to segment blood vessels with varying vessel diameters in high-resolution colour fundus images. All recent authors compare their vessel segmentation results to each other using only low-resolution retinal image databases. Consequently, we provide a new publicly available high-resolution fundus image database of healthy and pathological retinas. Our performance evaluation shows that the proposed blood vessel segmentation approach is at least comparable with recent state-of-the-art methods. It outperforms most of them with an accuracy of 95% evaluated on the new database.
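
The matched-filtering concept the authors build on can be sketched directly: a zero-mean kernel with a Gaussian cross-section models a dark vessel, it is rotated to several orientations, and each pixel keeps the maximum response. The parameters below are illustrative, not the paper's tuned values.

```python
import numpy as np
from scipy import ndimage

def matched_filter_kernel(sigma=2.0, length=9):
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    profile = -np.exp(-x ** 2 / (2 * sigma ** 2))   # dark vessel cross-section
    kernel = np.tile(profile, (length, 1))          # extend along the vessel
    return kernel - kernel.mean()                   # zero mean: flat areas -> 0

def vessel_response(img, angles=range(0, 180, 15)):
    base = matched_filter_kernel()
    responses = [ndimage.convolve(img, ndimage.rotate(base, a, reshape=True))
                 for a in angles]
    return np.max(responses, axis=0)                # best orientation per pixel

img = np.ones((64, 64))
img[:, 30:33] = 0.3                                 # a dark vertical "vessel"
resp = vessel_response(img)
print(resp[:, 28:35].mean() > resp[:, :20].mean())  # True: vessel responds more
```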

Journal ArticleDOI
TL;DR: This review presents the past and present work on GPU accelerated medical image processing, and is meant to serve as an overview and introduction to existing GPU implementations.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: The extracted primary object regions are then used to build object models for optimized video segmentation; the approach outperforms both unsupervised and supervised state-of-the-art methods.
Abstract: In this paper, we propose a novel approach to extract primary object segments in videos in the 'object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.

Journal ArticleDOI
TL;DR: This paper presents a simple and effective nonparametric approach to image parsing that works by scene-level matching with global image descriptors, followed by superpixel-level matching with local features and efficient Markov Random Field (MRF) optimization for incorporating neighborhood context.
Abstract: This paper presents a simple and effective nonparametric approach to the problem of image parsing, or labeling image regions (in our case, superpixels produced by bottom-up segmentation) with their categories. This approach requires no training, and it can easily scale to datasets with tens of thousands of images and hundreds of labels. It works by scene-level matching with global image descriptors, followed by superpixel-level matching with local features and efficient Markov random field (MRF) optimization for incorporating neighborhood context. Our MRF setup can also compute a simultaneous labeling of image regions into semantic classes (e.g., tree, building, car) and geometric classes (sky, vertical, ground). Our system outperforms the state-of-the-art non-parametric method based on SIFT Flow on a dataset of 2,688 images and 33 labels. In addition, we report per-pixel rates on a larger dataset of 15,150 images and 170 labels. To our knowledge, this is the first complete evaluation of image parsing on a dataset of this size, and it establishes a new benchmark for the problem.
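
The nonparametric core, label transfer from retrieved neighbours, is easy to isolate. The sketch below does plain k-nearest-neighbour voting over synthetic superpixel descriptors; the full system adds scene-level retrieval to shortlist images and an MRF over neighbouring superpixels.

```python
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.random((1000, 16))        # superpixel descriptors
train_labels = rng.integers(0, 33, 1000)    # e.g. 33 SIFT Flow classes
query = rng.random(16)                      # descriptor of one query superpixel

d = np.linalg.norm(train_feats - query, axis=1)
knn = train_labels[np.argsort(d)[:25]]      # 25 nearest neighbours vote
print(np.bincount(knn, minlength=33).argmax())  # the transferred label
```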

Journal ArticleDOI
TL;DR: A systematic survey of graph theoretical methods for image segmentation, where the problem is modeled in terms of partitioning a graph into several sub-graphs such that each of them represents a meaningful object of interest in the image.

01 Jan 2013
TL;DR: This paper studies various Otsu algorithms; Otsu's method is an automatic, threshold-selection, region-based segmentation method and one of the most successful methods for image thresholding because of its simple calculation.
Abstract: Image segmentation is a fundamental task in digital image processing. Among all the segmentation methods, Otsu's method is one of the most successful for image thresholding because of its simple calculation. It is an automatic, threshold-selection, region-based segmentation method. This paper studies various Otsu algorithms.
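
Since the entry is about Otsu's method, here is a minimal self-contained implementation: every threshold is scored by the between-class variance of the two resulting classes and the maximizer is returned, which is exactly the simple calculation the text refers to.

```python
import numpy as np

def otsu_threshold(image):
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                    # class-0 probability per threshold
    mu = np.cumsum(p * np.arange(256))      # class-0 cumulative mean
    mu_t = mu[-1]                           # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return np.nanargmax(sigma_b)            # threshold maximizing variance

rng = np.random.default_rng(0)
img = np.concatenate([rng.normal(70, 12, 5000), rng.normal(180, 12, 5000)])
img = img.clip(0, 255).astype(np.uint8)
print(otsu_threshold(img))                  # ~125, between the two modes
```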

Proceedings ArticleDOI
01 Nov 2013
TL;DR: This survey examines methods that have been proposed to segment 3D point clouds into multiple homogeneous regions and outlines the promising future research directions.
Abstract: 3D point cloud segmentation is the process of classifying point clouds into multiple homogeneous regions; the points in the same region have the same properties. The segmentation is challenging because of the high redundancy, uneven sampling density, and lack of explicit structure of point cloud data. This problem has many applications in robotics, such as intelligent vehicles, autonomous mapping and navigation. Many authors have introduced different approaches and algorithms. In this survey, we examine methods that have been proposed to segment 3D point clouds. The advantages, disadvantages, and design mechanisms of these methods are analyzed and discussed. Finally, we outline promising future research directions.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work introduces a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences that simultaneously regularizes match consistency at multiple spatial extents-ranging from an entire image, to coarse grid cells, to every single pixel.
Abstract: We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences. Dense matching methods typically enforce both appearance agreement between matched pixels as well as geometric smoothness between neighboring pixels. Whereas the prevailing approaches operate at the pixel level, we propose a pyramid graph model that simultaneously regularizes match consistency at multiple spatial extents, ranging from an entire image, to coarse grid cells, to every single pixel. This novel regularization substantially improves pixel-level matching in the face of challenging image variations, while the "deformable" aspect of our model overcomes the strict rigidity of traditional spatial pyramids. Results on LabelMe and Caltech show our approach outperforms state-of-the-art methods (SIFT Flow [15] and PatchMatch [2]), both in terms of accuracy and run time.

Journal ArticleDOI
TL;DR: A general, fully-automated method for multi-organ segmentation of abdominal computed tomography (CT) scans based on a hierarchical atlas registration and weighting scheme that generates target-specific priors from an atlas database by combining aspects from multi-atlas registration and patch-based segmentation, two widely used methods in brain segmentation.
Abstract: A robust automated segmentation of abdominal organs can be crucial for computer aided diagnosis and laparoscopic surgery assistance. Many existing methods are specialized to the segmentation of individual organs and struggle to deal with the variability of the shape and position of abdominal organs. We present a general, fully-automated method for multi-organ segmentation of abdominal computed tomography (CT) scans. The method is based on a hierarchical atlas registration and weighting scheme that generates target-specific priors from an atlas database by combining aspects from multi-atlas registration and patch-based segmentation, two widely used methods in brain segmentation. The final segmentation is obtained by applying an automatically learned intensity model in a graph-cuts optimization step, incorporating high-level spatial knowledge. The proposed approach can deal with high inter-subject variation while being flexible enough to be applied to different organs. We have evaluated the segmentation on a database of 150 manually segmented CT images. The achieved results compare well to state-of-the-art methods, which are usually tailored to more specific questions, with Dice overlap values of 94%, 93%, 70%, and 92% for liver, kidneys, pancreas, and spleen, respectively.

Journal ArticleDOI
TL;DR: The Virtual Skeleton Database is proposed as a centralized storage system where the data necessary to build statistical shape models can be stored and shared; it has proven to be a useful tool for collaborative model building, as a resource for biomechanical population studies, and for enhancing segmentation algorithms.
Abstract: Background: Statistical shape models are widely used in biomedical research. They are routinely implemented for automatic image segmentation or object identification in medical images. In these fields, however, the acquisition of the large training datasets, required to develop these models, is usually a time-consuming process. Even after this effort, the collections of datasets are often lost or mishandled resulting in replication of work. Objective: To solve these problems, the Virtual Skeleton Database (VSD) is proposed as a centralized storage system where the data necessary to build statistical shape models can be stored and shared. Methods: The VSD provides an online repository system tailored to the needs of the medical research community. The processing of the most common image file types, a statistical shape model framework, and an ontology-based search provide the generic tools to store, exchange, and retrieve digital medical datasets. The hosted data are accessible to the community, and collaborative research catalyzes their productivity. Results: To illustrate the need for an online repository for medical research, three exemplary projects of the VSD are presented: (1) an international collaboration to achieve improvement in cochlear surgery and implant optimization, (2) a population-based analysis of femoral fracture risk between genders, and (3) an online application developed for the evaluation and comparison of the segmentation of brain tumors. Conclusions: The VSD is a novel system for scientific collaboration for the medical image community with a data-centric concept and semantically driven search option for anatomical structures. The repository has been proven to be a useful tool for collaborative model building, as a resource for biomechanical population studies, or to enhance segmentation algorithms. [J Med Internet Res 2013;15(11):e245]