
Showing papers on "Segmentation-based object categorization" published in 2013


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information and shows how this contextual information in turn improves object recognition.
Abstract: We address the problems of contour detection, bottom-up grouping and semantic segmentation using RGB-D data. We focus on the challenging setting of cluttered indoor scenes, and evaluate our approach on the recently introduced NYU-Depth V2 (NYUD2) dataset [27]. We propose algorithms for object boundary detection and hierarchical segmentation that generalize the gPb-ucm approach of [2] by making effective use of depth information. We show that our system can label each contour with its type (depth, normal or albedo). We also propose a generic method for long-range amodal completion of surfaces and show its effectiveness in grouping. We then turn to the problem of semantic segmentation and propose a simple approach that classifies superpixels into the 40 dominant object categories in NYUD2. We use both generic and class-specific features to encode the appearance and geometry of objects. We also show how our approach can be used for scene classification, and how this contextual information in turn improves object recognition. In all of these tasks, we report significant improvements over the state-of-the-art.

699 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations that are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space is proposed.
Abstract: Unsupervised over-segmentation of an image into regions of perceptually similar pixels, known as superpixels, is a widely used preprocessing step in segmentation algorithms. Superpixel methods reduce the number of regions that must be considered later by more computationally expensive algorithms, with a minimal loss of information. Nevertheless, as some information is inevitably lost, it is vital that superpixels not cross object boundaries, as such errors will propagate through later steps. Existing methods make use of projected color or depth information, but do not consider three-dimensional geometric relationships between observed data points which can be used to prevent superpixels from crossing regions of empty space. We propose a novel over-segmentation algorithm which uses voxel relationships to produce over-segmentations which are fully consistent with the spatial geometry of the scene in three-dimensional, rather than projective, space. Enforcing the constraint that segmented regions must have spatial connectivity prevents label flow across semantic object boundaries which might otherwise be violated. Additionally, as the algorithm works directly in 3D space, observations from several calibrated RGB+D cameras can be segmented jointly. Experiments on a large data set of human-annotated RGB+D images demonstrate a significant reduction in occurrence of clusters crossing object boundaries, while maintaining speeds comparable to state-of-the-art 2D methods.
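
The core constraint above is that clusters may only grow through occupied, spatially adjacent voxels, so labels cannot leak across empty space. Below is a toy Python illustration of that constraint, not the authors' algorithm; the function name, the grid resolution, and the choice of 26-connectivity are all assumptions for the sketch.

```python
# Toy sketch: flood-fill connected components over occupied voxels only,
# so two surfaces separated by empty space can never share a label.
import numpy as np
from collections import deque

def connected_voxel_components(points, voxel=0.05):
    coords = np.floor(points / voxel).astype(int)     # (N, 3) voxel indices
    occupied = {tuple(c) for c in coords}
    offsets = [(dx, dy, dz) for dx in (-1, 0, 1)
               for dy in (-1, 0, 1) for dz in (-1, 0, 1)
               if (dx, dy, dz) != (0, 0, 0)]          # 26-neighborhood
    labels, next_label = {}, 0
    for seed in occupied:
        if seed in labels:
            continue
        labels[seed] = next_label                     # start a new component
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            for o in offsets:
                nb = (v[0] + o[0], v[1] + o[1], v[2] + o[2])
                if nb in occupied and nb not in labels:
                    labels[nb] = next_label           # grow through occupied voxels only
                    queue.append(nb)
        next_label += 1
    return np.array([labels[tuple(c)] for c in coords])
```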

477 citations


Proceedings ArticleDOI
01 Jan 2013
TL;DR: This work argues that a per-image score instead of one computed over the entire dataset brings a lot more insight, and proposes new ways to evaluate semantic segmentation.
Abstract: In this work, we consider the evaluation of the semantic segmentation task. We discuss the strengths and limitations of the few existing measures, and propose new ways to evaluate semantic segmentation. First, we argue that a per-image score instead of one computed over the entire dataset brings a lot more insight. Second, we propose to take contours more carefully into account. Based on the conducted experiments, we suggest best practices for the evaluation. Finally, we present a user study we conducted to better understand how the quality of image segmentations is perceived by humans.
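
To make the per-image versus dataset-level distinction concrete, here is a minimal sketch contrasting the two aggregations of intersection-over-union; the helper names are hypothetical, and `preds`/`gts` are assumed to be lists of integer label maps of shape (H, W).

```python
# Dataset-level IoU pools pixel counts, so large images dominate;
# per-image IoU scores each image first, exposing per-image failures.
import numpy as np

def iou_per_image(pred, gt, n_classes):
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))

def dataset_iou(preds, gts, n_classes):
    inter = np.zeros(n_classes)
    union = np.zeros(n_classes)
    for pred, gt in zip(preds, gts):        # pool counts over the dataset
        for c in range(n_classes):
            inter[c] += np.logical_and(pred == c, gt == c).sum()
            union[c] += np.logical_or(pred == c, gt == c).sum()
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())

def mean_per_image_iou(preds, gts, n_classes):
    return float(np.mean([iou_per_image(p, g, n_classes)
                          for p, g in zip(preds, gts)]))
```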

439 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others.
Abstract: We present a new unsupervised algorithm to discover and segment out common objects from large and diverse image collections. In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical for datasets collected from Internet search. The key insight to our algorithm is that common object patterns should be salient within each image, while being sparse with respect to smooth transformations across other images. We propose to use dense correspondences between images to capture the sparsity and visual variability of the common object over the entire database, which enables us to ignore noise objects that may be salient within their own images but do not commonly occur in others. We performed extensive numerical evaluation on established co-segmentation datasets, as well as several new datasets generated using Internet search. Our approach is able to effectively segment out the common object for diverse object categories, while naturally identifying images where the common object is not present.

408 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The extracted primary object regions are then used to build object models for optimized video segmentation, and the approach outperforms both unsupervised and supervised state-of-the-art methods.
Abstract: In this paper, we propose a novel approach to extract primary object segments in videos in the 'object proposal' domain. The extracted primary object regions are then used to build object models for optimized video segmentation. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background. The proposed approach is evaluated using several challenging benchmark videos and it outperforms both unsupervised and supervised state-of-the-art methods.
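
The motion scoring idea can be approximated as the average optical-flow gradient magnitude along a proposal's boundary, so proposals that hug motion discontinuities score high. This is a hedged sketch of that intuition, not the paper's exact function; the array shapes and the one-pixel inner boundary are assumptions.

```python
# Score a proposal mask by flow-gradient strength on its boundary.
# `flow` is an (H, W, 2) optical flow field, `mask` a boolean (H, W) array.
import numpy as np
from scipy.ndimage import binary_erosion

def motion_score(flow, mask):
    gy_u, gx_u = np.gradient(flow[..., 0])            # gradients of u channel
    gy_v, gx_v = np.gradient(flow[..., 1])            # gradients of v channel
    grad_mag = np.sqrt(gx_u**2 + gy_u**2 + gx_v**2 + gy_v**2)
    boundary = mask & ~binary_erosion(mask)           # one-pixel inner boundary
    if boundary.sum() == 0:
        return 0.0
    return float(grad_mag[boundary].mean())
```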

354 citations


Journal ArticleDOI
TL;DR: A systematic survey of graph theoretical methods for image segmentation, where the problem is modeled in terms of partitioning a graph into several sub-graphs such that each of them represents a meaningful object of interest in the image.

345 citations


01 Jan 2013
TL;DR: This paper studies various Otsu algorithms; Otsu's method is an automatic, region-based threshold-selection technique and one of the most successful methods for image thresholding because of its simple calculation.
Abstract: Image segmentation is a fundamental task in digital image processing. Among all segmentation methods, the Otsu method is one of the most successful for image thresholding because of its simple calculation. Otsu's method is an automatic, region-based threshold-selection segmentation method. This paper studies various Otsu algorithms.
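
For reference, the "simple calculation" is an exhaustive search for the gray level that maximizes the between-class variance. A minimal NumPy version, assuming a uint8 input image:

```python
# Otsu's method: pick the threshold maximizing between-class variance.
import numpy as np

def otsu_threshold(image):
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # gray-level probabilities
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()      # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t                              # pixels >= best_t are foreground
```

The same result is available as a library call, e.g. OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag.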

344 citations


Proceedings ArticleDOI
01 Nov 2013
TL;DR: This survey examines methods that have been proposed to segment 3D point clouds into multiple homogeneous regions and outlines the promising future research directions.
Abstract: 3D point cloud segmentation is the process of classifying point clouds into multiple homogeneous regions, such that points in the same region have the same properties. The segmentation is challenging because of high redundancy, uneven sampling density, and the lack of explicit structure in point cloud data. This problem has many applications in robotics such as intelligent vehicles, autonomous mapping and navigation. Many authors have introduced different approaches and algorithms. In this survey, we examine methods that have been proposed to segment 3D point clouds. The advantages, disadvantages, and design mechanisms of these methods are analyzed and discussed. Finally, we outline promising future research directions.
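
As an example of the simplest family such surveys cover, here is a hedged sketch of Euclidean cluster extraction, i.e. region growing through a k-d tree within a fixed radius; the radius and minimum cluster size are illustrative parameters, not values from the paper.

```python
# Euclidean clustering: BFS region growing over k-d tree neighborhoods.
import numpy as np
from scipy.spatial import cKDTree
from collections import deque

def euclidean_clusters(points, radius=0.1, min_size=10):
    tree = cKDTree(points)                     # points: (N, 3) array
    labels = np.full(len(points), -1, dtype=int)
    visited = np.zeros(len(points), dtype=bool)
    current = 0
    for i in range(len(points)):
        if visited[i]:
            continue
        queue, members = deque([i]), [i]
        visited[i] = True
        while queue:
            j = queue.popleft()
            for k in tree.query_ball_point(points[j], radius):
                if not visited[k]:
                    visited[k] = True
                    queue.append(k)
                    members.append(k)
        if len(members) >= min_size:           # tiny clusters stay -1 (noise)
            labels[np.array(members)] = current
            current += 1
    return labels
```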

323 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: In this paper, a generalization of SIRFS is proposed, Scene-SIRFS, which takes as input a single RGB-D image and produces as output an improved depth map, a set of surface normals, a reflectance image, a shading image, and a spatially varying model of illumination.
Abstract: In this paper we extend the “shape, illumination and reflectance from shading” (SIRFS) model [3, 4], which recovers intrinsic scene properties from a single image. Though SIRFS performs well on images of segmented objects, it performs poorly on images of natural scenes, which contain occlusion and spatially-varying illumination. We therefore present Scene-SIRFS, a generalization of SIRFS in which we have a mixture of shapes and a mixture of illuminations, and those mixture components are embedded in a “soft” segmentation of the input image. We additionally use the noisy depth maps provided by RGB-D sensors (in this case, the Kinect) to improve shape estimation. Our model takes as input a single RGB-D image and produces as output an improved depth map, a set of surface normals, a reflectance image, a shading image, and a spatially varying model of illumination. The output of our model can be used for graphics applications, or for any application involving RGB-D images.

279 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: It is argued that image segmentation and dense 3D reconstruction contribute valuable information to each other's task, and a rigorous mathematical framework is proposed to formulate and solve a joint segmentation and dense reconstruction problem.
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being 'too noisy'. Unfortunately, these priors generally yield overly smooth reconstructions and/or segmentations in certain regions whereas they fail in other areas to constrain the solution sufficiently. In this paper we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other's task. As a consequence, we propose a rigorous mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. Image segmentations provide geometric cues about which surface orientations are more likely to appear at a certain location in space whereas a dense 3D reconstruction yields a suitable regularization for the segmentation problem by lifting the labeling from 2D images to 3D space. We show how appearance-based cues and 3D surface orientation priors can be learned from training data and subsequently used for class-specific regularization. Experimental results on several real data sets highlight the advantages of our joint formulation.

264 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled.
Abstract: This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. The system combines region-level features with per-exemplar sliding window detectors. Per-exemplar detectors are better suited for our parsing task than traditional bounding box detectors: they perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. The proposed system achieves state-of-the-art accuracy on three challenging datasets, the largest of which contains 45,676 images and 232 labels.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations, and improves the categorization accuracy over the state-of-the-art.
Abstract: We propose a new method for the task of fine-grained visual categorization. The method builds a model of the base-level category that can be fitted to images, producing high-quality foreground segmentation and mid-level part localizations. The model can be learnt from the typical datasets available for fine-grained categorization, where the only annotation provided is a loose bounding box around the instance (e.g. bird) in each image. Both segmentation and part localizations are then used to encode the image content into a highly discriminative visual signature. The model is symbiotic in that part discovery/localization is helped by segmentation and, conversely, the segmentation is helped by the detection (e.g. part layout). Our model builds on top of the part-based object category detector of Felzenszwalb et al., and also on the powerful GrabCut segmentation algorithm of Rother et al., and adds a simple spatial saliency coupling between them. In our evaluation, the model improves the categorization accuracy over the state-of-the-art. It also improves over what can be achieved with an analogous system that runs segmentation and part-localization independently.
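
The GrabCut component the model builds on is available in OpenCV. Below is a minimal, hedged sketch of seeding it with the loose bounding box annotation the paper assumes; the symbiotic coupling with the part-based detector is beyond this snippet, and the function name and iteration count are illustrative.

```python
# Run GrabCut seeded with an (x, y, w, h) bounding box; returns a
# boolean foreground mask for the boxed instance.
import numpy as np
import cv2

def grabcut_from_box(image, box, iters=5):
    mask = np.zeros(image.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)        # internal GMM state buffers
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, box, bgd, fgd, iters, cv2.GC_INIT_WITH_RECT)
    # keep definite and probable foreground pixels
    return np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD))
```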

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work proposes a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut and shows that in many applications this simple term makes NP-hard segmentation functionals unnecessary.
Abstract: Among image segmentation algorithms there are two major groups: (a) methods assuming known appearance models and (b) methods estimating appearance models jointly with segmentation. Typically, the first group optimizes appearance log-likelihoods in combination with some spatial regularization. This problem is relatively simple and many methods guarantee globally optimal results. The second group treats model parameters as additional variables transforming simple segmentation energies into high-order NP-hard functionals (Zhu-Yuille, Chan-Vese, GrabCut, etc). It is known that such methods indirectly minimize the appearance overlap between the segments. We propose a new energy term explicitly measuring L1 distance between the object and background appearance models that can be globally maximized in one graph cut. We show that in many applications our simple term makes NP-hard segmentation functionals unnecessary. Our one cut algorithm effectively replaces approximate iterative optimization techniques based on block coordinate descent.
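
The proposed term itself is easy to state: take unnormalized color histograms of the two segments and measure their L1 distance. The sketch below only evaluates the term for a given binary mask; the single-graph-cut maximization, which is the paper's actual contribution, is omitted, and the 16-bin joint RGB histogram is an assumption.

```python
# L1 distance between foreground and background color histograms;
# large values mean the two segments use nearly disjoint colors.
import numpy as np

def l1_appearance_distance(image, mask, bins=16):
    # image: uint8 (H, W, 3); mask: boolean (H, W)
    q = (image // (256 // bins)).astype(int).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    fg = np.bincount(idx[mask.ravel()], minlength=bins ** 3)
    bg = np.bincount(idx[~mask.ravel()], minlength=bins ** 3)
    return int(np.abs(fg - bg).sum())
```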

Proceedings ArticleDOI
23 Jun 2013
TL;DR: It is shown that combining this with a state-of-the-art classification algorithm leads to significant improvements in performance, especially for datasets which are considered particularly hard for recognition, e.g. bird species.
Abstract: We propose a detection and segmentation algorithm for the purposes of fine-grained recognition. The algorithm first detects low-level regions that could potentially belong to the object and then performs a full-object segmentation through propagation. Apart from segmenting the object, we can also 'zoom in' on the object, i.e. center it, normalize it for scale, and thus discount the effects of the background. We then show that combining this with a state-of-the-art classification algorithm leads to significant improvements in performance, especially for datasets which are considered particularly hard for recognition, e.g. bird species. The proposed algorithm is much more efficient than other known methods in similar scenarios. Our method is also simpler and we apply it here to different classes of objects, e.g. birds, flowers, cats and dogs. We tested the algorithm on a number of benchmark datasets for fine-grained categorization. It outperforms all the known state-of-the-art methods on these datasets, sometimes by as much as 11%. It improves the performance of our baseline algorithm by 3-4%, consistently on all datasets. We also observed more than a 4% improvement in the recognition performance on a challenging large-scale flower dataset, containing 578 species of flowers and 250,000 images.

Journal ArticleDOI
TL;DR: A multi-atlas method for cardiac magnetic resonance (MR) image segmentation that formulates a patch-based label fusion model in a Bayesian framework and improves image registration accuracy by utilizing label information, which in turn improves segmentation accuracy.
Abstract: The evaluation of ventricular function is important for the diagnosis of cardiovascular diseases. It typically involves measurement of the left ventricular (LV) mass and LV cavity volume. Manual delineation of the myocardial contours is time-consuming and dependent on the subjective experience of the expert observer. In this paper, a multi-atlas method is proposed for cardiac magnetic resonance (MR) image segmentation. The proposed method is novel in two aspects. First, it formulates a patch-based label fusion model in a Bayesian framework. Second, it improves image registration accuracy by utilizing label information, which leads to improvement of segmentation accuracy. The proposed method was evaluated on a cardiac MR image set of 28 subjects. The average Dice overlap metric of our segmentation is 0.92 for the LV cavity, 0.89 for the right ventricular cavity and 0.82 for the myocardium. The results show that the proposed method is able to provide accurate information for clinical diagnosis.
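
Stripped of the Bayesian formulation and the label-driven registration refinement, patch-based label fusion reduces to similarity-weighted voting across atlases. A toy single-location sketch follows; the Gaussian weighting and the bandwidth parameter are illustrative choices, not the paper's model.

```python
# Fuse one voxel's label: each atlas votes with a weight given by how
# similar its local patch is to the target patch (SSD-based kernel).
import numpy as np

def fuse_label(target_patch, atlas_patches, atlas_labels, h=0.1):
    votes = {}
    for patch, lab in zip(atlas_patches, atlas_labels):
        ssd = np.sum((target_patch - patch) ** 2)
        w = np.exp(-ssd / (h * target_patch.size))   # patch-similarity weight
        votes[lab] = votes.get(lab, 0.0) + w
    return max(votes, key=votes.get)                 # label with most mass
```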

Journal ArticleDOI
TL;DR: A novel method for the automatic segmentation of brain MRI images by using discriminative dictionary learning and sparse coding techniques, which can learn dictionaries offline and perform segmentations online, enabling a significant speed-up in the segmentation stage.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This work introduces a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.
Abstract: Video segmentation research is currently limited by the lack of a benchmark dataset that covers the large variety of subproblems appearing in video segmentation and that is large enough to avoid overfitting. Consequently, there is little analysis of video segmentation which generalizes across subtasks, and it is not yet clear which information video segmentation should leverage from the still frames, as previously studied in image segmentation, and how it should be combined with video-specific information such as temporal volume, motion and occlusion. In this work we provide such an analysis based on annotations of a large video dataset, where each video is manually segmented by multiple persons. Moreover, we introduce a new volume-based metric that includes the important aspect of temporal consistency, that can deal with segmentation hierarchies, and that reflects the tradeoff between over-segmentation and segmentation accuracy.
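
A simplified spatio-temporal flavor of such a metric, without the hierarchy handling and multi-annotator aspects of the actual benchmark measure, treats each segmentation as a (T, H, W) label volume and scores each ground-truth region by its best-overlapping predicted volume, so labels that stay consistent over time are rewarded.

```python
# Best-overlap volume IoU between two spatio-temporal label videos.
import numpy as np

def volume_iou(pred, gt):
    # pred, gt: integer label arrays of shape (T, H, W)
    scores = []
    for g in np.unique(gt):
        gt_vol = gt == g
        best = 0.0
        for p in np.unique(pred):          # best-matching predicted tube
            pr_vol = pred == p
            inter = np.logical_and(gt_vol, pr_vol).sum()
            union = np.logical_or(gt_vol, pr_vol).sum()
            best = max(best, inter / union)
        scores.append(best)
    return float(np.mean(scores))
```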

Proceedings ArticleDOI
Luming Zhang, Mingli Song, Zicheng Liu, Xiao Liu, Jiajun Bu, Chun Chen
23 Jun 2013
TL;DR: A novel image segmentation algorithm is proposed, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels.
Abstract: Weakly supervised image segmentation is a challenging problem in the computer vision field. In this paper, we present a new weakly supervised image segmentation algorithm by learning the distribution of spatially structured superpixel sets from image-level labels. Specifically, we first extract graphlets from each image, where a graphlet is a small-sized graph consisting of superpixels as its nodes that encapsulates the spatial structure of those superpixels. Then, a manifold embedding algorithm is proposed to transform graphlets of different sizes into equal-length feature vectors. Thereafter, we use a GMM to learn the distribution of the post-embedding graphlets. Finally, we propose a novel image segmentation algorithm, called graphlet cut, that leverages the learned graphlet distribution in measuring the homogeneity of a set of spatially structured superpixels. Experimental results show that the proposed approach outperforms state-of-the-art weakly supervised image segmentation methods, and its performance is comparable to those of the fully supervised segmentation models.

Proceedings ArticleDOI
Zhuoyuan Chen, Hailin Jin, Zhe Lin, Scott Cohen, Ying Wu
23 Jun 2013
TL;DR: An optical flow algorithm for large displacement motions that uses approximate nearest neighbor fields to compute an initial motion field and a robust algorithm to compute a set of similarity transformations as motion candidates for segmentation, adding local deformations to account for deviations from similarity transformations.
Abstract: We present an optical flow algorithm for large displacement motions. Most existing optical flow methods use the standard coarse-to-fine framework to deal with large displacement motions which has intrinsic limitations. Instead, we formulate the motion estimation problem as a motion segmentation problem. We use approximate nearest neighbor fields to compute an initial motion field and use a robust algorithm to compute a set of similarity transformations as the motion candidates for segmentation. To account for deviations from similarity transformations, we add local deformations in the segmentation process. We also observe that small objects can be better recovered using translations as the motion candidates. We fuse the motion results obtained under similarity transformations and under translations together before a final refinement. Experimental validation shows that our method can successfully handle large displacement motions. Although we particularly focus on large displacement motions in this work, we make no sacrifice in terms of overall performance. In particular, our method ranks at the top of the Middlebury benchmark.
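
One step of the candidate-generation stage can be sketched with OpenCV: robustly fit a similarity transformation (rotation, uniform scale, translation) to putative matches, e.g. from an approximate nearest-neighbor field, so that the RANSAC inliers define one motion candidate. The function choice and threshold are assumptions, not the authors' code.

```python
# Fit a 4-DOF similarity transform to matched points with RANSAC.
import numpy as np
import cv2

def similarity_candidate(src_pts, dst_pts):
    # src_pts, dst_pts: (N, 2) float32 arrays of matched positions
    M, inliers = cv2.estimateAffinePartial2D(
        src_pts, dst_pts, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    if M is None:                            # RANSAC found no consistent model
        return None, None
    return M, inliers.ravel().astype(bool)   # 2x3 matrix + inlier mask
```

Repeating the fit on the remaining outliers would yield further motion candidates for the segmentation stage.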

Journal ArticleDOI
TL;DR: An automatic graph-based multi-surface segmentation algorithm that internally uses soft constraints to add prior information from a learned model is proposed, which improves the accuracy of the segmentation and increases the robustness to noise.
Abstract: Optical coherence tomography (OCT) is a well-established imaging modality in ophthalmology and used daily in the clinic. Automatic evaluation of such datasets requires an accurate segmentation of the retinal cell layers. However, due to the naturally low signal-to-noise ratio and the resulting poor image quality, this task remains challenging. We propose an automatic graph-based multi-surface segmentation algorithm that internally uses soft constraints to add prior information from a learned model. This improves the accuracy of the segmentation and increases the robustness to noise. Furthermore, we show that the graph size can be greatly reduced by applying a smart segmentation scheme. This allows the segmentation to be computed in seconds instead of minutes, without deteriorating the segmentation accuracy, making it ideal for a clinical setup. An extensive evaluation on 20 OCT datasets of healthy eyes was performed and showed a mean unsigned segmentation error of 3.05 ± 0.54 μm over all datasets when compared to the average observer, which is lower than the inter-observer variability. Similar performance was measured for the task of drusen segmentation, demonstrating the usefulness of using soft constraints as a tool to deal with pathologies.

Journal ArticleDOI
TL;DR: Experimental results indicate that the presented method is superior to the threshold and Bayesian methods commonly used in PET image segmentation, is more accurate and robust than other PET-CT segmentation methods recently published in the literature, and is general in the sense of simultaneously segmenting multiple scans in real time with the high accuracy needed for routine clinical use.

Journal ArticleDOI
TL;DR: The proposed automatically adjustable algorithm for color image segmentation, which uses a linear support vector machine (SVM) and Otsu's thresholding method for apple sorting and grading, provides an effective and robust means of segmentation and can be easily adapted to other imaging-based agricultural applications.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes an exemplar-based face image segmentation algorithm, taking inspiration from previous works on image parsing for general scenes, that first selects a subset of exemplar images from the database, then computes a nonrigid warp for each exemplar image to align it with the test image.
Abstract: In this work, we propose an exemplar-based face image segmentation algorithm. We take inspiration from previous works on image parsing for general scenes. Our approach assumes a database of exemplar face images, each of which is associated with a hand-labeled segmentation map. Given a test image, our algorithm first selects a subset of exemplar images from the database. Our algorithm then computes a nonrigid warp for each exemplar image to align it with the test image. Finally, we propagate labels from the exemplar images to the test image in a pixel-wise manner, using trained weights to modulate and combine label maps from different exemplars. We evaluate our method on two challenging datasets and compare with two face parsing algorithms and a general scene parsing algorithm. We also compare our segmentation results with contour-based face alignment results; that is, we first run the alignment algorithms to extract contour points and then derive segments from the contours. Our algorithm compares favorably with all previous works on all datasets evaluated.

Journal ArticleDOI
TL;DR: The existing techniques are reviewed and four key techniques including localization of the whole heart, initialization of substructures, refinement of boundary delineation, and regularization of shapes are discussed.
Abstract: Whole heart segmentation from magnetic resonance imaging or computed tomography is a prerequisite for many clinical applications. Since manual delineation can be tedious and subject to bias, automating such segmentation becomes increasingly popular in the image computing field. However, fully automatic whole heart segmentation is challenging and only limited studies were reported in the literature. This article reviews the existing techniques and analyzes the challenges and methodologies. The techniques are classified in terms of the types of the prior models and the algorithms used to fit the model to unseen images. The prior models include the atlases and the deformable models, and the fitting algorithms are further decomposed into four key techniques including localization of the whole heart, initialization of substructures, refinement of boundary delineation, and regularization of shapes. Finally, the validation issues, challenges, and future directions are discussed.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking, and which outperforms the previous state-of-the-art on VOC 2010 test by 4%.
Abstract: In this paper we are interested in how semantic segmentation can help object detection. Towards this goal, we propose a novel deformable part-based model which exploits region-based segmentation algorithms that compute candidate object regions by bottom-up clustering followed by ranking of those regions. Our approach allows every detection hypothesis to select a segment (including void), and scores each box in the image using both the traditional HOG filters as well as a set of novel segmentation features. Thus our model "blends" between the detector and segmentation models. Since our features can be computed very efficiently given the segments, we maintain the same complexity as the original DPM. We demonstrate the effectiveness of our approach in PASCAL VOC 2010, and show that when employing only a root filter our approach outperforms the Dalal & Triggs detector on all classes, achieving 13% higher average AP. When employing the parts, we outperform the original DPM in 19 out of 20 classes, achieving an improvement of 8% AP. Furthermore, we outperform the previous state-of-the-art on VOC 2010 test by 4%.

Journal ArticleDOI
TL;DR: A method based on a mixture of Gaussian functions, whose parameters are calculated using three nature-inspired algorithms, is used to approximate the 1D histogram of a gray-level image and is applied to the multi-threshold problem.
Abstract: In the field of image analysis, segmentation is one of the most important preprocessing steps. One way to achieve segmentation is by means of threshold selection, where each pixel that belongs to a determined class is labeled according to the selected threshold, giving as a result pixel groups that share visual characteristics in the image. Several methods have been proposed in order to solve threshold selection problems; in this work, we use a method based on a mixture of Gaussian functions to approximate the 1D histogram of a gray-level image, whose parameters are calculated using three nature-inspired algorithms (Particle Swarm Optimization, Artificial Bee Colony Optimization and Differential Evolution). Each Gaussian function approximating the histogram represents a pixel class and therefore a threshold point. Experimental results are shown, comparing the algorithms in quantitative and qualitative fashion, along with the main advantages and drawbacks of each as applied to the multi-threshold problem.
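
A hedged sketch of the histogram-mixture idea follows, substituting EM (scikit-learn) for the paper's nature-inspired optimizers and placing thresholds midway between adjacent class means, a common simplification rather than the paper's derivation.

```python
# Fit a Gaussian mixture to pixel intensities; adjacent class means
# yield multi-level threshold points.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_thresholds(gray_image, n_classes=3):
    samples = gray_image.reshape(-1, 1).astype(float)
    gmm = GaussianMixture(n_components=n_classes, random_state=0).fit(samples)
    means = np.sort(gmm.means_.ravel())
    return (means[:-1] + means[1:]) / 2.0   # midpoints between class means
```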

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A novel algorithm for fast tracking of generic objects in videos that makes use of the generalised Hough transform with pixel-based descriptors and a probabilistic segmentation method based on global models for foreground and background is presented.
Abstract: In this paper, we present a novel algorithm for fast tracking of generic objects in videos. The algorithm uses two components: a detector that makes use of the generalised Hough transform with pixel-based descriptors, and a probabilistic segmentation method based on global models for foreground and background. These components are used for tracking in a combined way, and they adapt each other in a co-training manner. Through effective model adaptation and segmentation, the algorithm is able to track objects that undergo rigid and non-rigid deformations and considerable shape and appearance variations. The proposed tracking method has been thoroughly evaluated on challenging standard videos, and outperforms state-of-the-art tracking methods designed for the same task. Finally, the proposed models allow for an extremely efficient implementation, and thus tracking is very fast.
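
The probabilistic segmentation half can be sketched as a pixel-wise foreground posterior under global color models; the Hough-voting detector and the co-training loop that adapts the two components to each other are omitted, and the histogram scheme is an assumption.

```python
# Per-pixel foreground posterior from global color histograms.
# fg_hist, bg_hist: normalized histograms over bins**3 quantized colors.
import numpy as np

def foreground_posterior(image, fg_hist, bg_hist, bins=16, prior=0.5):
    q = (image // (256 // bins)).astype(int).reshape(-1, 3)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    p_fg = fg_hist[idx] * prior                       # likelihood x prior
    p_bg = bg_hist[idx] * (1.0 - prior)
    post = p_fg / np.maximum(p_fg + p_bg, 1e-12)      # per-pixel Bayes rule
    return post.reshape(image.shape[:2])
```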

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A method to produce tentative object segmentation masks that suppress background clutter in the features, shown to improve object detection significantly, complemented by contextual features in the form of a full-image FV descriptor and an inter-category rescoring mechanism.
Abstract: We present an object detection system based on the Fisher vector (FV) image representation computed over SIFT and color descriptors. For computational and storage efficiency, we use a recent segmentation-based method to generate class-independent object detection hypotheses, in combination with data compression techniques. Our main contribution is a method to produce tentative object segmentation masks to suppress background clutter in the features. Re-weighting the local image features based on these masks is shown to improve object detection significantly. We also exploit contextual features in the form of a full-image FV descriptor, and an inter-category rescoring mechanism. Our experiments on the VOC 2007 and 2010 datasets show that our detector improves over the current state-of-the-art detection results.

Journal ArticleDOI
TL;DR: The proposed improvement method, named improved spatial fuzzy c-means (IFCMS), was evaluated on several test images, including both synthetic images and simulated brain MRI images from the McConnell Brain Imaging Center (BrainWeb) database, and demonstrates the efficiency of the ideas presented.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper builds a network that contains segmented as well as unsegmented images, and extracts functional maps between connected image pairs based on image appearance features, which act as general property transporters between the images and are used to transfer segmentations.
Abstract: Joint segmentation of image sets has great importance for object recognition, image classification, and image retrieval. In this paper, we aim to jointly segment a set of images starting from a small number of labeled images or none at all. To allow the images to share segmentation information with each other, we build a network that contains segmented as well as unsegmented images, and extract functional maps between connected image pairs based on image appearance features. These functional maps act as general property transporters between the images and, in particular, are used to transfer segmentations. We define and operate in a reduced functional space optimized so that the functional maps approximately satisfy cycle-consistency under composition in the network. A joint optimization framework is proposed to simultaneously generate all segmentation functions over the images so that they both align with local segmentation cues in each particular image, and agree with each other under network transportation. This formulation allows us to extract segmentations even with no training data, but can also exploit such data when available. The collective effect of the joint processing using functional maps leads to accurate information sharing among images and yields superior segmentation results, as shown on the iCoseg, MSRC, and PASCAL data sets.