
Showing papers on "Image segmentation published in 2005"


Journal ArticleDOI
TL;DR: An algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points and applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views are presented.
Abstract: This paper presents an algebro-geometric solution to the problem of segmenting an unknown number of subspaces of unknown and varying dimensions from sample data points. We represent the subspaces with a set of homogeneous polynomials whose degree is the number of subspaces and whose derivatives at a data point give normal vectors to the subspace passing through the point. When the number of subspaces is known, we show that these polynomials can be estimated linearly from data; hence, subspace segmentation is reduced to classifying one point per subspace. We select these points optimally from the data set by minimizing a certain distance function, thus dealing automatically with moderate noise in the data. A basis for the complement of each subspace is then recovered by applying standard PCA to the collection of derivatives (normal vectors). Extensions of GPCA that deal with data in a high-dimensional space and with an unknown number of subspaces are also presented. Our experiments on low-dimensional data show that GPCA outperforms existing algebraic algorithms based on polynomial factorization and provides a good initialization to iterative techniques such as k-subspaces and expectation maximization. We also present applications of GPCA to computer vision problems such as face clustering, temporal video segmentation, and 3D motion segmentation from point correspondences in multiple affine views.
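The construction can be sketched numerically for the simplest case, two lines through the origin in R^2; the toy data, the degree-2 Veronese map, and all names below are illustrative choices, not the paper's code:

```python
import numpy as np

# Two lines (y = 0 and x = 0): every point satisfies p(x, y) = x*y = 0, a
# homogeneous polynomial whose degree equals the number of subspaces. Its
# coefficients lie in the null space of the Veronese-embedded data matrix,
# and its gradient at a data point is normal to that point's subspace.
t = np.linspace(-1.0, 1.0, 50)
X = np.vstack([np.c_[t, 0 * t], np.c_[0 * t, t]])   # 100 points on two lines

# Degree-2 Veronese map: [x^2, x*y, y^2]
V = np.c_[X[:, 0] ** 2, X[:, 0] * X[:, 1], X[:, 1] ** 2]
c = np.linalg.svd(V)[2][-1]                         # null-space coefficients

def normal_at(p, c):
    """Unit gradient of c0*x^2 + c1*x*y + c2*y^2 at point p."""
    x, y = p
    n = np.array([2 * c[0] * x + c[1] * y, c[1] * x + 2 * c[2] * y])
    return n / np.linalg.norm(n)

n1 = normal_at([1.0, 0.0], c)   # up to sign, the normal of the line y = 0
```

With noise-free data the recovered gradient at (1, 0) is, up to sign, the y-axis direction; the paper's optimal point selection and high-dimensional extensions are not reproduced here.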

1,162 citations


Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and applies a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).
Abstract: We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. (2003) on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
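The pLSA machinery referred to above amounts to a small EM loop over a word-by-document count matrix; a sketch with synthetic counts (sizes, seed, and counts are arbitrary — in the paper the "words" are vector-quantized SIFT-like descriptors):

```python
import numpy as np

rng = np.random.default_rng(1)
n_w, n_d, n_z = 8, 6, 2                              # words, images, topics
N = rng.integers(1, 5, size=(n_w, n_d)).astype(float)  # word-image counts

p_wz = rng.random((n_w, n_z)); p_wz /= p_wz.sum(0)   # P(w|z)
p_zd = rng.random((n_z, n_d)); p_zd /= p_zd.sum(0)   # P(z|d)

for _ in range(100):
    # E-step: posterior P(z|w,d), shape (w, z, d)
    joint = p_wz[:, :, None] * p_zd[None, :, :]
    post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
    # M-step: re-estimate both distributions from expected counts
    Nz = N[:, None, :] * post
    p_wz = Nz.sum(axis=2); p_wz /= (p_wz.sum(0) + 1e-12)
    p_zd = Nz.sum(axis=0); p_zd /= (p_zd.sum(0) + 1e-12)

topic_of_doc = p_zd.argmax(0)   # dominant "object category" per image
```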

1,129 citations


Journal ArticleDOI
TL;DR: A novel method for separating images into texture and piecewise smooth (cartoon) parts, exploiting both the variational and the sparsity mechanisms is presented, combining the basis pursuit denoising (BPDN) algorithm and the total-variation (TV) regularization scheme.
Abstract: The separation of image content into semantic parts plays a vital role in applications such as compression, enhancement, restoration, and more. In recent years, several pioneering works suggested basing such a separation on a variational formulation, while others used independent component analysis and sparsity. This paper presents a novel method for separating images into texture and piecewise smooth (cartoon) parts, exploiting both the variational and the sparsity mechanisms. The method combines the basis pursuit denoising (BPDN) algorithm and the total-variation (TV) regularization scheme. The basic idea presented in this paper is the use of two appropriate dictionaries, one for the representation of textures and the other for the natural scene parts assumed to be piecewise smooth. Both dictionaries are chosen such that they lead to sparse representations over one type of image-content (either texture or piecewise smooth). The use of the BPDN with the two amalgamated dictionaries leads to the desired separation, along with noise removal as a by-product. As the need to choose proper dictionaries is generally hard, a TV regularization is employed to better direct the separation process and reduce ringing artifacts. We present a highly efficient numerical scheme to solve the combined optimization problem posed by our model and show several experimental results that validate the algorithm's performance.
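A drastically simplified stand-in for this two-dictionary idea: a 1-D "texture + spikes" separation with an orthonormal DCT dictionary for oscillatory content and the identity for the sparse part, solved by alternating soft-thresholding. The signal, dictionaries, and parameters are toy choices, not the paper's curvelet/TV machinery:

```python
import numpy as np

n = 64
j = np.arange(n)
# Orthonormal DCT-II matrix; rows are the "texture" atoms
Phi = np.sqrt(2.0 / n) * np.cos(np.pi * (j[None, :] + 0.5) * j[:, None] / n)
Phi[0] /= np.sqrt(2.0)

texture = 3.0 * Phi[5]                       # one oscillatory atom
spikes = np.zeros(n); spikes[20] = 2.0; spikes[45] = -2.0
y = texture + spikes                         # observed mixture

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
lam, x_sp, x_tx = 0.1, np.zeros(n), np.zeros(n)
for _ in range(100):
    # each dictionary is orthonormal, so each lasso subproblem is a
    # closed-form soft-threshold on the current residual
    a_sp = soft(y - x_tx, lam); x_sp = a_sp               # spike part
    a_tx = soft(Phi @ (y - x_sp), lam); x_tx = Phi.T @ a_tx  # texture part
```

Each part ends up sparse in its own dictionary, which is the mechanism the abstract describes; the BPDN bias leaves a small residual of order lambda.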

1,032 citations


MonographDOI
01 Sep 2005
TL;DR: A monograph on modern image analysis tools and their application to image modeling and representation, denoising, deblurring, inpainting, and segmentation.
Abstract: Preface 1. Introduction 2. Some modern image analysis tools 3. Image modeling and representation 4. Image denoising 5. Image deblurring 6. Image inpainting 7. Image processing: segmentation Bibliography Index.

1,025 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: Qualitative and quantitative results on a large data set confirm that the core part of the method is the combination of local and global cues via probabilistic top-down segmentation that allows examining and comparing object hypotheses with high precision down to the pixel level.
Abstract: In this paper, we address the problem of detecting pedestrians in crowded real-world scenes with severe overlaps. Our basic premise is that this problem is too difficult for any type of model or feature alone. Instead, we present an algorithm that integrates evidence in multiple iterations and from different sources. The core part of our method is the combination of local and global cues via probabilistic top-down segmentation. Altogether, this approach allows examining and comparing object hypotheses with high precision down to the pixel level. Qualitative and quantitative results on a large data set confirm that our method is able to reliably detect pedestrians in crowded scenes, even when they overlap and partially occlude each other. In addition, the flexible nature of our approach allows it to operate on very small training sets.

952 citations


Journal ArticleDOI
TL;DR: A new class of bases are introduced, called bandelet bases, which decompose the image along multiscale vectors that are elongated in the direction of a geometric flow, which leads to optimal approximation rates for geometrically regular images.
Abstract: This paper introduces a new class of bases, called bandelet bases, which decompose the image along multiscale vectors that are elongated in the direction of a geometric flow. This geometric flow indicates directions in which the image gray levels have regular variations. The image decomposition in a bandelet basis is implemented with a fast subband-filtering algorithm. Bandelet bases lead to optimal approximation rates for geometrically regular images. For image compression and noise removal applications, the geometric flow is optimized with fast algorithms so that the resulting bandelet basis produces minimum distortion. Comparisons are made with wavelet image compression and noise-removal algorithms.

922 citations


Journal ArticleDOI
Dar-Shyang Lee1
TL;DR: An effective scheme to improve the convergence rate without compromising model stability is proposed by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame.
Abstract: Adaptive Gaussian mixtures have been used for modeling nonstationary temporal distributions of pixels in video surveillance applications. However, a common problem for this approach is balancing between model convergence speed and stability. This paper proposes an effective scheme to improve the convergence rate without compromising model stability. This is achieved by replacing the global, static retention factor with an adaptive learning rate calculated for each Gaussian at every frame. Significant improvements are shown on both synthetic and real video data. Incorporating this algorithm into a statistical framework for background subtraction leads to an improved segmentation performance compared to a standard method.
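The paper's central idea can be sketched for scalar pixel values: each Gaussian keeps a match counter c and learns with rate (1-ALPHA)/c + ALPHA, i.e. roughly 1/c while young (fast convergence), decaying toward the small global rate ALPHA (stability). All constants, the match threshold, and the re-seeding rule below are illustrative, not the paper's exact scheme:

```python
import numpy as np

ALPHA = 0.01                                    # small global rate

def update(mix, x):
    # mix: list of [weight, mean, variance, match_count]
    matched = next((g for g in mix if abs(x - g[1]) < 3.0 * np.sqrt(g[2])), None)
    for g in mix:
        g[0] *= 1.0 - ALPHA                     # decay all weights
    if matched is None:
        weakest = min(mix, key=lambda g: g[0])
        weakest[:] = [ALPHA, x, 100.0, 1.0]     # re-seed weakest Gaussian
    else:
        matched[0] += ALPHA
        matched[3] += 1.0
        rate = (1.0 - ALPHA) / matched[3] + ALPHA   # adaptive learning rate
        d = x - matched[1]
        matched[1] += rate * d
        matched[2] += rate * (d * d - matched[2])
    s = sum(g[0] for g in mix)
    for g in mix:
        g[0] /= s                               # renormalize weights

# stale two-component model suddenly exposed to a new background near 100
mix = [[0.5, 0.0, 25.0, 1.0], [0.5, 50.0, 25.0, 1.0]]
for x in 100.0 + np.random.default_rng(2).normal(0.0, 2.0, 300):
    update(mix, x)

bg = max(mix, key=lambda g: g[0])               # dominant Gaussian
```

Because the rate starts near 1/c, the dominant Gaussian snaps to the new background in a few frames instead of crawling there at the global rate.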

867 citations


Journal ArticleDOI
John C. Fiala1
TL;DR: Reconstruct is a free editor designed to facilitate montaging, alignment, analysis and visualization of serial sections, which can reduce the time and resources expended for serial section studies and allows a larger tissue volume to be analysed more quickly.
Abstract: Many microscopy studies require reconstruction from serial sections, a method of analysis that is sometimes difficult and time-consuming. When each section is cut, mounted and imaged separately, section images must be montaged and realigned to accurately analyse and visualize the three-dimensional (3D) structure. Reconstruct is a free editor designed to facilitate montaging, alignment, analysis and visualization of serial sections. The methods used by Reconstruct for organizing, transforming and displaying data enable the analysis of series with large numbers of sections and images over a large range of magnifications by making efficient use of computer memory. Alignments can correct for some types of non-linear deformations, including cracks and folds, as often encountered in serial electron microscopy. A large number of different structures can be easily traced and placed together in a single 3D scene that can be animated or saved. As a flexible editor, Reconstruct can reduce the time and resources expended for serial section studies and allows a larger tissue volume to be analysed more quickly.

854 citations


Journal ArticleDOI
TL;DR: This work finds that color quantization can be as low as 64 bins per channel, although higher histogram sizes give better segmentation performance, and that the Bayesian classifier with the histogram technique and the multilayer perceptron classifier perform better than other tested classifiers.
Abstract: This work presents a study of three important issues of the color pixel classification approach to skin segmentation: color representation, color quantization, and classification algorithm. Our analysis of several representative color spaces using the Bayesian classifier with the histogram technique shows that skin segmentation based on color pixel classification is largely unaffected by the choice of the color space. However, segmentation performance degrades when only chrominance channels are used in classification. Furthermore, we find that color quantization can be as low as 64 bins per channel, although higher histogram sizes give better segmentation performance. The Bayesian classifier with the histogram technique and the multilayer perceptron classifier are found to perform better compared to other tested classifiers, including three piecewise linear classifiers, three unimodal Gaussian classifiers, and a Gaussian mixture classifier.
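The histogram-plus-Bayes approach studied here is compact enough to sketch: two 64-bins-per-channel RGB histograms (skin / non-skin) and a likelihood-ratio test. The "training" pixels below are synthetic stand-ins, not real skin data:

```python
import numpy as np

BINS = 64                                         # 64 bins per channel

def quantize(rgb):
    idx = (np.asarray(rgb, dtype=float) * BINS // 256).astype(int)
    return tuple(idx)

rng = np.random.default_rng(3)
skin = rng.normal([200, 140, 120], 10, (5000, 3)).clip(0, 255)
nonskin = rng.uniform(0, 255, (5000, 3))

h_skin = np.zeros((BINS,) * 3)
h_non = np.zeros((BINS,) * 3)
for p in skin:
    h_skin[quantize(p)] += 1
for p in nonskin:
    h_non[quantize(p)] += 1
h_skin /= h_skin.sum()                            # P(rgb | skin)
h_non /= h_non.sum()                              # P(rgb | non-skin)

def is_skin(rgb, theta=1.0):
    i = quantize(rgb)
    # likelihood-ratio rule; theta absorbs the class priors
    return h_skin[i] > theta * h_non[i]
```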

810 citations


Proceedings ArticleDOI
20 Jun 2005
TL;DR: The segmentation algorithm works simultaneously across the graph scales, with an inter-scale constraint to ensure communication and consistency between the segmentations at each scale, and incorporates long-range connections with linear-time complexity, providing high-quality segmentations efficiently.
Abstract: We present a multiscale spectral image segmentation algorithm. In contrast to most multiscale image processing, this algorithm works on multiple scales of the image in parallel, without iteration, to capture both coarse and fine level details. The algorithm is computationally efficient, allowing it to segment large images. We use the normalized cut graph partitioning framework of image segmentation. We construct a graph encoding pairwise pixel affinity, and partition the graph for image segmentation. We demonstrate that large image graphs can be compressed into multiple scales capturing image structure at increasingly large neighborhoods. We show that the decomposition of the image segmentation graph into different scales can be determined by ecological statistics on the image grouping cues. Our segmentation algorithm works simultaneously across the graph scales, with an inter-scale constraint to ensure communication and consistency between the segmentations at each scale. As the results show, we incorporate long-range connections with linear-time complexity, providing high-quality segmentations efficiently. Images that previously could not be processed because of their size have been accurately segmented thanks to this method.
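The normalized-cut machinery this method builds on can be sketched at a single scale on a tiny 1-D "image"; the affinity parameters and the dense eigensolver below are illustrative (the paper's contribution is precisely avoiding this dense computation on large graphs):

```python
import numpy as np

img = np.array([0., 0., 0., 0., 1., 1., 1., 1.])   # two flat regions
n = len(img)

# pairwise affinity from intensity difference, neighbors within radius 2
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j and abs(i - j) <= 2:
            W[i, j] = np.exp(-((img[i] - img[j]) ** 2) / 0.1)

d = W.sum(1)
D_isqrt = np.diag(1.0 / np.sqrt(d))
L = D_isqrt @ (np.diag(d) - W) @ D_isqrt           # normalized Laplacian
vals, vecs = np.linalg.eigh(L)
# second-smallest eigenvector, mapped back and thresholded at the median
f = D_isqrt @ vecs[:, 1]
labels = (f > np.median(f)).astype(int)
```

The Fiedler vector splits the graph at its weakest links, here the boundary between the two intensity plateaus.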

635 citations


Book ChapterDOI
06 Sep 2005
TL;DR: A new iris database that contains noisy images is presented, in contrast with existing databases, which are noise-free.
Abstract: This paper presents a new iris database that contains images with noise, in contrast with the existing databases, which are noise-free. UBIRIS is a tool for the development of robust iris recognition algorithms for biometric purposes. We present a detailed description of the many characteristics of UBIRIS and a comparison of several image segmentation approaches used in current iris segmentation methods, which makes evident their low tolerance to noisy images.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This paper embeds the visibility constraint within an energy minimization framework, resulting in a symmetric stereo model that treats left and right images equally and can also incorporate segmentation as a soft constraint.
Abstract: In this paper, we propose a symmetric stereo model to handle occlusion in dense two-frame stereo. Our occlusion reasoning is directly based on the visibility constraint that is more general than both ordering and uniqueness constraints used in previous work. The visibility constraint requires occlusion in one image and disparity in the other to be consistent. We embed the visibility constraint within an energy minimization framework, resulting in a symmetric stereo model that treats left and right images equally. An iterative optimization algorithm is used to approximate the minimum of the energy using belief propagation. Our stereo model can also incorporate segmentation as a soft constraint. Experimental results on the Middlebury stereo images show that our algorithm is state-of-the-art.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian framework for parsing images into their constituent visual patterns is presented, which optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language.
Abstract: In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches--generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this paper, we focus on two types of visual patterns--generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image, and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only, then image parsing reduces to image segmentation; Tu and Zhu, 2002, IEEE Trans. PAMI, 24(5):657--673). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object-specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

Journal ArticleDOI
TL;DR: The proposed demosaicing algorithm estimates missing pixels by interpolating in the direction with fewer color artifacts, and the aliasing problem is addressed by applying filterbank techniques to 2-D directional interpolation.
Abstract: A cost-effective digital camera uses a single image sensor, applying alternating patterns of red, green, and blue color filters to each pixel location. A way to reconstruct a full three-color representation of color images by estimating the missing pixel components in each color plane is called a demosaicing algorithm. This paper presents three inherent problems often associated with demosaicing algorithms that incorporate two-dimensional (2-D) directional interpolation: misguidance color artifacts, interpolation color artifacts, and aliasing. The level of misguidance color artifacts present in two images can be compared using metric neighborhood modeling. The proposed demosaicing algorithm estimates missing pixels by interpolating in the direction with fewer color artifacts. The aliasing problem is addressed by applying filterbank techniques to 2-D directional interpolation. The interpolation artifacts are reduced using a nonlinear iterative procedure. Experimental results using digital images confirm the effectiveness of this approach.
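Direction-adaptive green interpolation, the common core of such methods, can be sketched as follows: at each missing-green site, interpolate along the axis with the smaller gradient. The gradient test is a crude stand-in for the paper's "fewer color artifacts" criterion, and the filterbank and iterative stages are omitted:

```python
import numpy as np

def interp_green(mosaic, green_mask):
    H, W = mosaic.shape
    G = np.where(green_mask, mosaic, 0.0)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            if not green_mask[y, x]:
                dh = abs(mosaic[y, x - 1] - mosaic[y, x + 1])
                dv = abs(mosaic[y - 1, x] - mosaic[y + 1, x])
                if dh <= dv:          # smoother horizontally -> interp along row
                    G[y, x] = 0.5 * (mosaic[y, x - 1] + mosaic[y, x + 1])
                else:                 # smoother vertically -> interp along column
                    G[y, x] = 0.5 * (mosaic[y - 1, x] + mosaic[y + 1, x])
    return G

# vertical edge: direction-adaptive interpolation recovers green exactly,
# where a fixed horizontal average would bleed across the edge
true_g = np.tile(np.r_[np.full(4, 10.0), np.full(4, 200.0)], (8, 1))
mask = (np.add.outer(np.arange(8), np.arange(8)) % 2 == 0)   # checkerboard green
G = interp_green(true_g, mask)
```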

Proceedings ArticleDOI
John Winn1, Nebojsa Jojic1
17 Oct 2005
TL;DR: LOCUS (learning object classes with unsupervised segmentation) is introduced which uses a generative probabilistic model to combine bottom-up cues of color and edge with top-down cues of shape and pose, allowing for significant within-class variation.
Abstract: We address the problem of learning object class models and object segmentations from unannotated images. We introduce LOCUS (learning object classes with unsupervised segmentation) which uses a generative probabilistic model to combine bottom-up cues of color and edge with top-down cues of shape and pose. A key aspect of this model is that the object appearance is allowed to vary from image to image, allowing for significant within-class variation. By iteratively updating the belief in the object's position, size, segmentation and pose, LOCUS avoids making hard decisions about any of these quantities and so allows for each to be refined at any stage. We show that LOCUS successfully learns an object class model from unlabeled images, whilst also giving segmentation accuracies that rival existing supervised methods. Finally, we demonstrate simultaneous recognition and segmentation in novel images using the learned models for a number of object classes, as well as unsupervised object discovery and tracking in video.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work devises a graph cut algorithm for interactive segmentation which incorporates shape priors; positive results on both medical and natural images are demonstrated.
Abstract: Interactive or semi-automatic segmentation is a useful alternative to pure automatic segmentation in many applications. While automatic segmentation can be very challenging, a small amount of user input can often resolve ambiguous decisions on the part of the algorithm. In this work, we devise a graph cut algorithm for interactive segmentation which incorporates shape priors. While traditional graph cut approaches to interactive segmentation are often quite successful, they may fail in cases where there are diffuse edges, or multiple similar objects in close proximity to one another. Incorporation of shape priors within this framework mitigates these problems. Positive results on both medical and natural images are demonstrated.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work addresses the problem of segmenting 3D scan data into objects or object classes by using a recently proposed maximum-margin framework to discriminatively train the model from a set of labeled scans and automatically learn the relative importance of the features for the segmentation task.
Abstract: We address the problem of segmenting 3D scan data into objects or object classes. Our segmentation framework is based on a subclass of Markov random fields (MRFs) which support efficient graph-cut inference. The MRF models incorporate a large set of diverse features and enforce the preference that adjacent scan points have the same classification label. We use a recently proposed maximum-margin framework to discriminatively train the model from a set of labeled scans; as a result we automatically learn the relative importance of the features for the segmentation task. Performing graph-cut inference in the trained MRF can then be used to segment new scenes very efficiently. We test our approach on three large-scale datasets produced by different kinds of 3D sensors, showing its applicability to both outdoor and indoor environments containing diverse objects.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: Probabilistic latent semantic analysis generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available, and the ability of PLSA to automatically extract visually meaningful aspects is exploited to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.
Abstract: We present a new approach to model visual scenes in image collections, based on local invariant features and probabilistic latent space models. Our formulation provides answers to three open questions: (1) whether the invariant local features are suitable for scene (rather than object) classification; (2) whether unsupervised latent space models can be used for feature extraction in the classification task; and (3) whether the latent space formulation can discover visual co-occurrence patterns, motivating novel approaches for image organization and segmentation. Using a 9500-image dataset, our approach is validated on each of these issues. First, we show with extensive experiments on binary and multi-class scene classification tasks, that a bag-of-visterms representation, derived from local invariant descriptors, consistently outperforms state-of-the-art approaches. Second, we show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available. Third, we have exploited the ability of PLSA to automatically extract visually meaningful aspects, to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.

Journal ArticleDOI
01 Jul 2005
TL;DR: This work presents an interactive video cutout system that allows users to quickly extract foreground objects from video sequences for use in a variety of applications including compositing onto new backgrounds and NPR cartoon style rendering.
Abstract: We present an interactive system for efficiently extracting foreground objects from a video. We extend previous min-cut based image segmentation techniques to the domain of video with four new contributions. We provide a novel painting-based user interface that allows users to easily indicate the foreground object across space and time. We introduce a hierarchical mean-shift preprocess in order to minimize the number of nodes that min-cut must operate on. Within the min-cut we also define new local cost functions to augment the global costs defined in earlier work. Finally, we extend 2D alpha matting methods designed for images to work with 3D video volumes. We demonstrate that our matting approach preserves smoothness across both space and time. Our interactive video cutout system allows users to quickly extract foreground objects from video sequences for use in a variety of applications including compositing onto new backgrounds and NPR cartoon style rendering.

Journal ArticleDOI
TL;DR: An automatic seeded region growing algorithm for color image segmentation that produces good results, comparing favorably with some existing algorithms.

Journal ArticleDOI
01 Sep 2005
TL;DR: A robust segmentation technique based on an extension to the traditional fuzzy c-means (FCM) clustering algorithm is proposed and a neighborhood attraction, which is dependent on the relative location and features of neighboring pixels, is shown to improve the segmentation performance dramatically.
Abstract: Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of magnetic resonance (MR) images. Unfortunately, MR images always contain a significant amount of noise caused by operator performance, equipment, and the environment, which can lead to serious inaccuracies with segmentation. A robust segmentation technique based on an extension to the traditional fuzzy c-means (FCM) clustering algorithm is proposed in this paper. A neighborhood attraction, which is dependent on the relative location and features of neighboring pixels, is shown to improve the segmentation performance dramatically. The degree of attraction is optimized by a neural-network model. Simulated and real brain MR images with different noise levels are segmented to demonstrate the superiority of the proposed technique compared to other FCM-based methods. This segmentation method is a key component of an MR image-based classification system for brain tumors, currently being developed.
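The baseline this paper extends is plain fuzzy c-means; its contribution modifies the distance in the membership update with a neighborhood-attraction term (omitted here). A minimal 1-D sketch with synthetic two-tissue intensities, m being the usual fuzziness exponent:

```python
import numpy as np

def fcm(x, c=2, m=2.0, iters=50):
    centers = np.array([x.min(), x.max()], dtype=float)[:c]
    for _ in range(iters):
        d = np.abs(x[:, None] - centers[None, :]) + 1e-9     # (N, c) distances
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                    # fuzzy memberships
        um = u ** m
        centers = (um * x[:, None]).sum(0) / um.sum(0)       # weighted means
    return u, centers

# two intensity populations, as in a noisy two-tissue MR slice
rng = np.random.default_rng(5)
x = np.r_[rng.normal(60.0, 5.0, 200), rng.normal(140.0, 5.0, 200)]
u, centers = fcm(x)
hard = u.argmax(axis=1)          # defuzzified labels
```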

Journal ArticleDOI
TL;DR: Evaluations by comparison with the results of polarizing filters demonstrate the effectiveness of the proposed method, which is based solely on colors, particularly chromaticity, without requiring any geometrical information.
Abstract: In inhomogeneous objects, highlights are linear combinations of diffuse and specular reflection components. To our knowledge, all methods that use a single input image require explicit color segmentation to deal with multicolored surfaces. Unfortunately, for complex textured images, current color segmentation algorithms still struggle to produce correct segmentations. Consequently, a method without explicit color segmentation becomes indispensable, and this work presents such a method. The method is based solely on colors, particularly chromaticity, without requiring any geometrical information. One of the basic ideas is to iteratively compare the intensity logarithmic differentiation of an input image and its specular-free image. A specular-free image is an image that has exactly the same geometrical profile as the diffuse component of the input image and that can be generated by shifting each pixel's intensity and maximum chromaticity nonlinearly. Unlike existing methods using a single image, all processes in the proposed method are done locally, involving a maximum of only two neighboring pixels. This local operation is useful for handling textured objects with complex multicolored scenes. Evaluations by comparison with the results of polarizing filters demonstrate the effectiveness of the proposed method.
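The paper's specular-free image shifts each pixel's maximum chromaticity nonlinearly; a cruder cousin, shown here only to illustrate the idea, subtracts the per-pixel minimum channel. Under the dichromatic model with a white specular term (illumination color normalized out), the specular part adds equally to R, G, B and therefore cancels:

```python
import numpy as np

def specular_free(img):
    # img: (H, W, 3) linear RGB; subtracting the per-pixel minimum channel
    # removes a white specular offset (and part of the diffuse component,
    # which is why this preserves only the geometric profile, not albedo)
    return img - img.min(axis=2, keepdims=True)

# toy pixel: diffuse reddish surface plus a white specular spike of 50
diffuse = np.array([80.0, 30.0, 10.0])
pixel = (diffuse + 50.0).reshape(1, 1, 3)
sf = specular_free(pixel)[0, 0]
```

The result matches what the same operation gives on the diffuse component alone, which is the property the iterative comparison in the paper relies on.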

Proceedings ArticleDOI


20 Jun 2005
TL;DR: A principled Bayesian method for detecting and segmenting instances of a particular object category within an image, providing a coherent methodology for combining top-down and bottom-up cues, and an efficient method, OBJ CUT, to obtain segmentations using this model.
Abstract: In this paper, we present a principled Bayesian method for detecting and segmenting instances of a particular object category within an image, providing a coherent methodology for combining top-down and bottom-up cues. The work draws together two powerful formulations: pictorial structures (PS) and Markov random fields (MRFs), both of which have efficient algorithms for their solution. The resulting combination, which we call the object category specific MRF, suggests a solution to the problem that has long dogged MRFs, namely that they provide a poor prior for specific shapes. In contrast, our model provides a prior that is global across the image plane using the PS. We develop an efficient method, OBJ CUT, to obtain segmentations using this model. Novel aspects of this method include an efficient algorithm for sampling the PS model, and the observation that the expected log likelihood of the model can be increased by a single graph cut. Results are presented on two object categories, cows and horses. We compare our methods to the state of the art in object category specific image segmentation and demonstrate significant improvements.

Journal ArticleDOI
TL;DR: Experimental results obtained on multitemporal SAR images acquired by the ERS-1 satellite confirm the effectiveness of the proposed approach to change detection in multitemporal synthetic aperture radar images.
Abstract: This paper presents a novel approach to change detection in multitemporal synthetic aperture radar (SAR) images. The proposed approach exploits a wavelet-based multiscale decomposition of the log-ratio image (obtained by a comparison of the original multitemporal data) aimed at achieving different scales (levels) of representation of the change signal. Each scale is characterized by a different tradeoff between speckle reduction and preservation of geometrical details. For each pixel, a subset of reliable scales is identified on the basis of a local statistic measure applied to scale-dependent log-ratio images. The final change-detection result is obtained according to an adaptive scale-driven fusion algorithm. Experimental results obtained on multitemporal SAR images acquired by the ERS-1 satellite confirm the effectiveness of the proposed approach.
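The front end of this pipeline can be sketched with toy data: a log-ratio of two speckled images and a crude multiscale stack, with block averages standing in for the paper's wavelet levels (coarser levels suppress speckle at the cost of detail) and a naive average standing in for its adaptive per-pixel fusion:

```python
import numpy as np

rng = np.random.default_rng(6)
shape = (32, 32)
img1 = 100.0 * rng.gamma(4.0, 0.25, shape)     # multiplicative speckle
img2 = 100.0 * rng.gamma(4.0, 0.25, shape)
img2[8:16, 8:16] *= 4.0                        # the changed area

logratio = np.log(img2) - np.log(img1)         # change signal + speckle

def level(x, s):
    # average over s x s blocks, upsampled back to full size
    H, W = x.shape
    b = x.reshape(H // s, s, W // s, s).mean(axis=(1, 3))
    return np.kron(b, np.ones((s, s)))

fused = (logratio + level(logratio, 2) + level(logratio, 4)) / 3.0
change = fused > 0.7                           # global threshold on fusion
```

Even this naive fusion detects the changed block while keeping false alarms sparse, which is the tradeoff the scale-driven fusion in the paper optimizes per pixel.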

Journal ArticleDOI
TL;DR: A validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images demonstrates that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities and shows that simulated data results can be extended to real data.
Abstract: This paper presents a validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different noise and intensity nonuniformities are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. This way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data where a quantitative validation compares the methods' results with an estimated ground truth from manual segmentations by experts. Validity of the various classification methods in the labeling of the image as well as in the tissue volume is estimated with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also demonstrate that partial volume is not perfectly modeled, even though methods that account for mixture classes outperform methods that only consider pure Gaussian classes. Finally, we show that simulated data results can also be extended to real data.

Proceedings ArticleDOI
20 Jun 2005
TL;DR: This work uses a new expectation-maximization (EM) scheme to impose both spatial and color smoothness to infer natural connectivity among pixels, and demonstrates results on a variety of applications including image deblurring, enhanced color transfer, and colorizing gray scale images.
Abstract: We address the problem of regional color transfer between two natural images by probabilistic segmentation. We use a new expectation-maximization (EM) scheme that imposes both spatial and color smoothness to infer natural connectivity among pixels. Unlike previous work, our method takes local color information into account and segments the image with soft region boundaries for seamless color transfer and compositing. Our modified EM method has two advantages for color manipulation: first, subject to different levels of color smoothness in image space, our algorithm produces an optimal number of regions upon convergence, where the color statistics in each region can be adequately characterized by a component of a Gaussian mixture model (GMM). Second, we allow a pixel to belong to several regions according to the probability distribution estimated in the EM step, yielding a transparency-like ratio for compositing different regions seamlessly. Hence, natural color transitions across regions can be achieved, with the necessary intra-region and inter-region smoothness enforced without losing original details. We demonstrate results on a variety of applications including image deblurring, enhanced color transfer, and colorizing grayscale images. Comparisons with previous methods are also presented.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This paper combines the segmentation and matting problems and proposes a unified optimization approach based on belief propagation, which extracts high-quality mattes more efficiently for foregrounds with significant semitransparent regions.
Abstract: Separating a foreground object from the background in a static image involves determining both full and partial pixel coverages, also known as extracting a matte. Previous approaches require the input image to be presegmented into three regions: foreground, background, and unknown, collectively called a trimap. Partial opacity values are then computed only for pixels inside the unknown region. This presegmentation-based approach fails for images with large portions of semitransparent foreground, where the trimap is difficult to create even manually. In this paper, we combine the segmentation and matting problems and propose a unified optimization approach based on belief propagation. We iteratively estimate the opacity value for every pixel in the image based on a small sample of foreground and background pixels marked by the user. Experimental results show that, compared with previous approaches, our method extracts high-quality mattes more efficiently for foregrounds with significant semitransparent regions.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: The major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
Abstract: We present a novel categorical object detection scheme that uses only local contour-based features. A two-stage, partially supervised learning architecture is proposed: a rudimentary detector is learned from a very small set of segmented images and applied to a larger training set of unsegmented images; the second stage bootstraps these detections to learn an improved classifier while explicitly training against clutter. The detectors are learned with a boosting algorithm which creates a location-sensitive classifier using a discriminative set of features from a randomly chosen dictionary of contour fragments. We present results that are very competitive with other state-of-the-art object detection schemes and show robustness to object articulations, clutter, and occlusion. Our major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
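The boosting component can be illustrated with plain discrete AdaBoost over decision stumps. This is only a stand-in for the paper's procedure, which builds a location-sensitive classifier over contour-fragment features and estimates per-feature parameters jointly; the reweighting loop below shows the core mechanism of concentrating on previously misclassified examples.

```python
import numpy as np

def train_stump(X, y, w):
    """Find the best weighted decision stump (feature, threshold, polarity)."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, t, pol, err)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost: reweight examples toward those the last stump missed."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        j, t, pol, err = train_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)      # weak-learner weight
        pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((j, t, pol, alpha))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
                for j, t, p, a in ensemble)
    return np.sign(score)
```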

Proceedings ArticleDOI
17 Oct 2005
TL;DR: It is shown how the segmentation-as-preprocessing paradigm can be used to improve the efficiency and accuracy of model search in an image, by using an over-segmentation of the image into superpixels.
Abstract: In this paper we show how the segmentation-as-preprocessing paradigm can be used to improve the efficiency and accuracy of model search in an image. We operationalize this idea using an over-segmentation of the image into superpixels. The problem domain we explore is human body pose estimation from still images. The superpixels prove useful in two ways. First, we restrict the joint positions in our human body model to lie at the centers of superpixels, which reduces the size of the model search space. In addition, accurate support masks for computing features on the half-limbs of the body model are obtained by using agglomerations of superpixels as half-limb segments. We present results on a challenging dataset of people in sports news images.
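Restricting candidate joint positions to superpixel centers requires nothing more than the centroids of an integer label map. A small NumPy sketch of that step, assuming a superpixel labeling has already been computed by some over-segmentation algorithm (the choice of segmenter is orthogonal to this idea):

```python
import numpy as np

def superpixel_centers(labels):
    """Centroid (row, col) of each superpixel in an integer label map.
    In the pose-search setting, candidate joint positions are restricted
    to this much smaller set of locations instead of every pixel."""
    centers = {}
    for s in np.unique(labels):
        rows, cols = np.nonzero(labels == s)
        centers[int(s)] = (rows.mean(), cols.mean())
    return centers
```

For an image with, say, 10^5 pixels and a few hundred superpixels, the search space for each joint shrinks by roughly three orders of magnitude, which is the efficiency gain the paper exploits.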

Journal ArticleDOI
TL;DR: In this paper, a multiscale object-specific segmentation (MOSS) approach is presented for automatically delineating image-objects (i.e., segments) at multiple scales from a high-spatial resolution remotely sensed forest scene.