Showing papers on "Image segmentation published in 2007"


Journal ArticleDOI
TL;DR: A survey of a specific class of region-based level set segmentation methods and how they can all be derived from a common statistical framework is presented.
Abstract: Since their introduction as a means of front propagation and their first application to edge-based segmentation in the early 1990s, level set methods have become increasingly popular as a general framework for image segmentation. In this paper, we present a survey of a specific class of region-based level set segmentation methods and clarify how they can all be derived from a common statistical framework. Region-based segmentation schemes aim at partitioning the image domain by progressively fitting statistical models to the intensity, color, texture or motion in each of a set of regions. In contrast to edge-based schemes such as the classical Snakes, region-based methods tend to be less sensitive to noise. For typical images, the respective cost functionals tend to have fewer local minima, which makes them particularly well-suited for local optimization methods such as the level set method. We detail a general statistical formulation for level set segmentation. Subsequently, we clarify how the integration of various low level criteria leads to a set of cost functionals. We point out relations between the different segmentation schemes. In experimental results, we demonstrate how the level set function is driven to partition the image plane into domains of coherent color, texture, dynamic texture or motion. Moreover, the Bayesian formulation makes it possible to introduce prior shape knowledge into the level set method. We briefly review a number of advances in this domain.
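As a hedged illustration of the common statistical framework surveyed here (our notation, not necessarily the paper's), a typical two-region instance couples log-likelihood terms for the region models p_1, p_2 with a length penalty on the zero level set of the function phi:

```latex
E(\phi, p_1, p_2) =
  -\int_\Omega H(\phi)\,\log p_1\bigl(I(x)\bigr)\,dx
  -\int_\Omega \bigl(1 - H(\phi)\bigr)\,\log p_2\bigl(I(x)\bigr)\,dx
  + \nu \int_\Omega \bigl|\nabla H(\phi)\bigr|\,dx
```

Here H is the Heaviside function, I(x) the chosen low-level feature (intensity, color, texture, or motion), and nu the contour-length weight; alternating between a gradient descent on phi and re-fitting p_1, p_2 to the two regions gives the generic region-based level set scheme.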

1,117 citations


Journal ArticleDOI
TL;DR: By incorporating local spatial and gray information together, a novel fast and robust FCM framework for image segmentation, i.e., fast generalized fuzzy c-means (FGFCM) clustering algorithms, is proposed and can mitigate the disadvantages of FCM_S and at the same time enhances the clustering performance.
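As a hedged, simplified stand-in for FGFCM (not the authors' algorithm), the sketch below folds local spatial and gray information into a neighborhood-averaged image and then runs plain fuzzy c-means on it; the function name and parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fcm_segment(image, n_clusters=3, m=2.0, n_iter=50, tol=1e-5, seed=0):
    """Fuzzy c-means on a locally averaged gray image (simplified FGFCM-style input)."""
    # Local spatial/gray information: replace each pixel by its 3x3 neighborhood mean.
    smoothed = uniform_filter(image.astype(float), size=3)
    x = smoothed.ravel()[:, None]                          # (N, 1) gray values
    rng = np.random.default_rng(seed)
    centers = rng.choice(x.ravel(), n_clusters, replace=False)[:, None]
    for _ in range(n_iter):
        d = np.abs(x - centers.T) + 1e-12                  # (N, C) distances to centers
        u = d ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                  # fuzzy memberships
        new_centers = (u ** m).T @ x / (u ** m).sum(axis=0)[:, None]
        if np.abs(new_centers - centers).max() < tol:
            centers = new_centers
            break
        centers = new_centers
    labels = np.argmax(u, axis=1).reshape(image.shape)
    return labels, centers.ravel()
```

The actual FGFCM combines spatial and gray-level information into its similarity measure and is engineered for speed; this sketch keeps only the general flavor of clustering a locally smoothed image.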

1,021 citations


Proceedings ArticleDOI
Tie Liu, Jian Sun, Nanning Zheng, Xiaoou Tang, Heung-Yeung Shum
17 Jun 2007
TL;DR: A set of novel features including multi-scale contrast, center-surround histogram, and color spatial distribution are proposed to describe a salient object locally, regionally, and globally for salient object detection.
Abstract: We study visual attention by detecting a salient object in an input image. We formulate salient object detection as an image segmentation problem, where we separate the salient object from the image background. We propose a set of novel features including multi-scale contrast, center-surround histogram, and color spatial distribution to describe a salient object locally, regionally, and globally. A conditional random field is learned to effectively combine these features for salient object detection. We also constructed a large image database containing tens of thousands of carefully labeled images by multiple users. To our knowledge, it is the first large image database for quantitative evaluation of visual attention algorithms. We validate our approach on this image database, which is publicly available with this paper.
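As a hedged sketch of just one of the three cues, multi-scale contrast, here is a simplified map built from squared differences to Gaussian-smoothed versions of the image; the pyramid-based definition, scales, and normalization in the paper may differ, and the function name is ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_contrast(gray, sigmas=(1, 2, 4, 8)):
    """Illustrative multi-scale contrast: per scale, the squared difference between
    a pixel and its Gaussian-smoothed neighborhood, summed over scales."""
    gray = gray.astype(float)
    contrast = np.zeros_like(gray)
    for sigma in sigmas:
        local_mean = gaussian_filter(gray, sigma=sigma)
        contrast += (gray - local_mean) ** 2
    contrast -= contrast.min()
    if contrast.max() > 0:
        contrast /= contrast.max()                 # normalize to [0, 1] as a feature map
    return contrast
```

In the paper, such per-pixel and regional cues are then combined by a learned conditional random field rather than used in isolation.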

1,010 citations


Journal ArticleDOI
TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
Abstract: A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning

962 citations


Journal ArticleDOI
TL;DR: This paper proposes to unify three well-known image variational models, namely the snake model, the Rudin–Osher–Fatemi denoising model and the Mumford–Shah segmentation model, and establishes theorems with proofs to determine a global minimum of the active contour model.
Abstract: The active contour/snake model is one of the most successful variational models in image segmentation. It consists of evolving a contour in images toward the boundaries of objects. Its success is based on strong mathematical properties and efficient numerical schemes based on the level set method. The only drawback of this model is the existence of local minima in the active contour energy, which makes the initial guess critical to get satisfactory results. In this paper, we propose to solve this problem by determining a global minimum of the active contour model. Our approach is based on the unification of image segmentation and image denoising tasks into a global minimization framework. More precisely, we propose to unify three well-known image variational models, namely the snake model, the Rudin–Osher–Fatemi denoising model and the Mumford–Shah segmentation model. We will establish theorems with proofs to determine the existence of a global minimum of the active contour model. From a numerical point of view, we propose a new practical way to solve the active contour propagation problem toward object boundaries through a dual formulation of the minimization problem. The dual formulation, easy to implement, allows us a fast global minimization of the snake energy. It avoids the usual drawback in the level set approach that consists of initializing the active contour in a distance function and re-initializing it periodically during the evolution, which is time-consuming. We apply our segmentation algorithms on synthetic and real-world images, such as texture images and medical images, to emphasize the performances of our model compared with other segmentation models.
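In hedged form (our notation, condensed from the description above), the kind of convex energy that makes a global minimum attainable replaces the contour by a relaxed labeling function u and couples a weighted total variation term with a region term:

```latex
\min_{0 \le u \le 1} \; \int_\Omega g(x)\,\lvert \nabla u(x) \rvert \, dx
\;+\; \lambda \int_\Omega r(x)\, u(x)\, dx ,
\qquad
r(x) = \bigl(c_1 - f(x)\bigr)^2 - \bigl(c_2 - f(x)\bigr)^2
```

where f is the image, g an edge-indicator function, and c_1, c_2 the region averages; thresholding the minimizer u at a level in (0,1) yields the segmentation, and the dual formulation of the weighted TV term is what gives the fast minimization scheme mentioned in the abstract.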

909 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: The proposed region-based active contour model can be used to segment images with intensity inhomogeneity, which overcomes the limitation of piecewise constant models and has promising application to image denoising.
Abstract: Local image information is crucial for accurate segmentation of images with intensity inhomogeneity. However, image information in local region is not embedded in popular region-based active contour models, such as the piecewise constant models. In this paper, we propose a region-based active contour model that is able to utilize image information in local regions. The major contribution of this paper is the introduction of a local binary fitting energy with a kernel function, which enables the extraction of accurate local image information. Therefore, our model can be used to segment images with intensity inhomogeneity, which overcomes the limitation of piecewise constant models. Comparisons with other major region-based models, such as the piece-wise smooth model, show the advantages of our method in terms of computational efficiency and accuracy. In addition, the proposed method has promising application to image denoising.
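A hedged reconstruction of the local binary fitting idea (symbols and the exact regularization terms follow common conventions rather than the paper verbatim): a Gaussian kernel K_sigma localizes the fitting, so that two spatially varying functions f_1(x), f_2(x) approximate the image near x inside and outside the contour,

```latex
\mathcal{E}(\phi, f_1, f_2) \;=\;
\sum_{i=1}^{2} \lambda_i \int_\Omega \Bigl( \int_\Omega
  K_\sigma(x - y)\,\bigl| I(y) - f_i(x) \bigr|^2 \, M_i\bigl(\phi(y)\bigr)\, dy \Bigr) dx
\;+\; \nu \int_\Omega \bigl| \nabla H(\phi) \bigr| \, dx
```

with M_1(phi) = H(phi) and M_2(phi) = 1 - H(phi). Because f_1 and f_2 vary with x, the model can follow slowly varying (inhomogeneous) intensities that defeat piecewise constant models.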

891 citations


Journal ArticleDOI
TL;DR: It is demonstrated how a recently proposed measure of similarity, the normalized probabilistic rand (NPR) index, can be used to perform a quantitative comparison between image segmentation algorithms using a hand-labeled set of ground-truth segmentations.
Abstract: Unsupervised image segmentation is an important component in many image understanding algorithms and practical vision systems. However, evaluation of segmentation algorithms thus far has been largely subjective, leaving a system designer to judge the effectiveness of a technique based only on intuition and results in the form of a few example segmented images. This is largely due to image segmentation being an ill-defined problem: there is no unique ground-truth segmentation of an image against which the output of an algorithm may be compared. This paper demonstrates how a recently proposed measure of similarity, the normalized probabilistic rand (NPR) index, can be used to perform a quantitative comparison between image segmentation algorithms using a hand-labeled set of ground-truth segmentations. We show that the measure allows principled comparisons between segmentations created by different algorithms, as well as segmentations on different images. We outline a procedure for algorithm evaluation through an example evaluation of some familiar algorithms - the mean-shift-based algorithm, an efficient graph-based segmentation algorithm, a hybrid algorithm that combines the strengths of both methods, and expectation maximization. Results are presented on the 300 images in the publicly available Berkeley segmentation data set.
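For orientation, the plain Rand index underlying the NPR measure counts pairwise agreements between a test segmentation and one ground-truth labeling; the probabilistic averaging over multiple hand-labeled ground truths and the normalization that define the NPR index are deliberately omitted from this hedged sketch.

```python
import numpy as np

def rand_index(seg_a, seg_b):
    """Rand index between two label maps of identical shape: the fraction of pixel
    pairs whose same/different-segment relation agrees in both segmentations."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    n = a.size
    # Contingency table between the two labelings.
    _, inv_a = np.unique(a, return_inverse=True)
    _, inv_b = np.unique(b, return_inverse=True)
    table = np.zeros((inv_a.max() + 1, inv_b.max() + 1), dtype=np.int64)
    np.add.at(table, (inv_a, inv_b), 1)
    sum_sq = float((table ** 2).sum())
    sum_a = float((table.sum(axis=1) ** 2).sum())
    sum_b = float((table.sum(axis=0) ** 2).sum())
    # Ordered-pair counts: pairs joined in both segmentations plus pairs separated in both.
    agree = (sum_sq - n) + (n * n - sum_a - sum_b + sum_sq)
    return agree / (n * (n - 1.0))
```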

826 citations


Journal ArticleDOI
TL;DR: In the framework of computer-aided diagnosis of eye diseases, retinal vessel segmentation based on line operators is proposed and two segmentation methods are considered.
Abstract: In the framework of computer-aided diagnosis of eye diseases, retinal vessel segmentation based on line operators is proposed. A line detector, previously used in mammography, is applied to the green channel of the retinal image. It is based on the evaluation of the average grey level along lines of fixed length passing through the target pixel at different orientations. Two segmentation methods are considered. The first uses the basic line detector whose response is thresholded to obtain unsupervised pixel classification. As a further development, we employ two orthogonal line detectors along with the grey level of the target pixel to construct a feature vector for supervised classification using a support vector machine. The effectiveness of both methods is demonstrated through receiver operating characteristic analysis on two publicly available databases of color fundus images.
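A hedged sketch of the basic (unsupervised) line detector described above: for each pixel, average the grey level along short line segments at several orientations, take the strongest response, and subtract the mean of the surrounding square window; the length, number of orientations, and sign convention here are illustrative rather than the paper's exact settings.

```python
import numpy as np

def line_detector_response(green, length=15, n_angles=12):
    """Line strength map: max over orientations of the mean grey level along a line
    of fixed length through each pixel, minus the local window mean. Thresholding
    this response gives an unsupervised pixel classification."""
    green = green.astype(float)
    h, w = green.shape
    half = length // 2
    pad = np.pad(green, half, mode="reflect")
    ys, xs = np.mgrid[0:h, 0:w]
    # Mean over the (length x length) window centred on each pixel.
    win_mean = np.zeros_like(green)
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            win_mean += pad[ys + half + dy, xs + half + dx]
    win_mean /= float(length * length)
    # Strongest average along lines at n_angles orientations.
    line_max = np.full_like(green, -np.inf)
    for k in range(n_angles):
        theta = np.pi * k / n_angles
        line_sum = np.zeros_like(green)
        for t in range(-half, half + 1):
            dy = int(round(t * np.sin(theta)))
            dx = int(round(t * np.cos(theta)))
            line_sum += pad[ys + half + dy, xs + half + dx]
        line_max = np.maximum(line_max, line_sum / float(length))
    return line_max - win_mean
```

The supervised variant described above would add two orthogonal line responses and the target pixel's grey level as a feature vector for a support vector machine.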

819 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper compares four 3D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes.
Abstract: Over the past few years, several methods for segmenting a scene containing multiple rigidly moving objects have been proposed. However, most existing methods have been tested on a handful of sequences only, and each method has been often tested on a different set of sequences. Therefore, the comparison of different methods has been fairly limited. In this paper, we compare four 3D motion segmentation algorithms for affine cameras on a benchmark of 155 motion sequences of checkerboard, traffic, and articulated scenes.

757 citations


Proceedings ArticleDOI
Lu Gan
01 Jul 2007
TL;DR: This paper proposes and study block compressed sensing for natural images, where image acquisition is conducted in a block-by-block manner through the same operator, and shows that the proposed scheme can sufficiently capture the complicated geometric structures of natural images.
Abstract: Compressed sensing (CS) is a new technique for simultaneous data sampling and compression. In this paper, we propose and study block compressed sensing for natural images, where image acquisition is conducted in a block-by-block manner through the same operator. While simpler and more efficient than other CS techniques, the proposed scheme can sufficiently capture the complicated geometric structures of natural images. Our image reconstruction algorithm involves both linear and nonlinear operations such as Wiener filtering, projection onto the convex set and hard thresholding in the transform domain. Several numerical experiments demonstrate that the proposed block CS compares favorably with existing schemes at a much lower implementation cost.
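A hedged sketch of the acquisition side only: split the image into B x B blocks and apply one shared random measurement matrix to every block. The Gaussian operator, subrate, and names below are illustrative; the paper's reconstruction stage (Wiener filtering, projection onto the convex set, hard thresholding in a transform domain) is not reproduced.

```python
import numpy as np

def block_cs_sample(image, block=32, subrate=0.25, seed=0):
    """Block-by-block compressed sensing measurements with a single shared operator."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "pad the image to a multiple of the block size"
    n = block * block
    m = max(1, int(round(subrate * n)))
    phi = rng.standard_normal((m, n)) / np.sqrt(m)   # same operator for every block
    measurements = []
    for i in range(0, h, block):
        for j in range(0, w, block):
            x = image[i:i + block, j:j + block].astype(float).ravel()
            measurements.append(phi @ x)
    return phi, np.stack(measurements)               # (num_blocks, m) measurement vectors
```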

715 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A framework in which Lagrangian particle dynamics is used for the segmentation of high density crowd flows and detection of flow instabilities and the maximum eigenvalue of the tensor is used to construct a finite time Lyapunov exponent (FTLE) field, which reveals the Lagrangian coherent structures (LCS) present in the underlying flow.
Abstract: This paper proposes a framework in which Lagrangian particle dynamics is used for the segmentation of high density crowd flows and detection of flow instabilities. For this purpose, a flow field generated by a moving crowd is treated as an aperiodic dynamical system. A grid of particles is overlaid on the flow field, and is advected using a numerical integration scheme. The evolution of particles through the flow is tracked using a flow map, whose spatial gradients are subsequently used to set up a Cauchy–Green deformation tensor for quantifying the amount by which the neighboring particles have diverged over the length of the integration. The maximum eigenvalue of the tensor is used to construct a finite time Lyapunov exponent (FTLE) field, which reveals the Lagrangian coherent structures (LCS) present in the underlying flow. The LCS divide flow into regions of qualitatively different dynamics and are used to locate boundaries of the flow segments in a normalized cuts framework. Any change in the number of flow segments over time is regarded as an instability, which is detected by establishing correspondences between flow segments over time. The experiments are conducted on a challenging set of videos taken from Google Video and a National Geographic documentary.
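A hedged sketch of the FTLE computation from a flow map (grid spacing, regularization, and names are illustrative): given the final positions of particles that started on a regular grid and were advected for time T, form the Cauchy-Green tensor from the spatial gradients of the flow map and take its largest eigenvalue.

```python
import numpy as np

def ftle_field(flow_map_x, flow_map_y, T, spacing=1.0):
    """Finite-time Lyapunov exponent field from a flow map.
    flow_map_x / flow_map_y hold the final x and y positions of particles advected
    for time T from a regular grid with the given spacing."""
    # Spatial gradients of the flow map (Jacobian entries); axis 0 is y, axis 1 is x.
    dX_dy, dX_dx = np.gradient(flow_map_x, spacing)
    dY_dy, dY_dx = np.gradient(flow_map_y, spacing)
    ftle = np.zeros(flow_map_x.shape, dtype=float)
    for idx in np.ndindex(ftle.shape):
        J = np.array([[dX_dx[idx], dX_dy[idx]],
                      [dY_dx[idx], dY_dy[idx]]])
        C = J.T @ J                                   # Cauchy-Green deformation tensor
        lam_max = np.linalg.eigvalsh(C)[-1]           # maximum eigenvalue
        ftle[idx] = np.log(np.sqrt(max(lam_max, 1e-12))) / abs(T)
    return ftle
```

Ridges of this field are the Lagrangian coherent structures used above as candidate boundaries between flow segments.
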

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A parameter free approach that utilizes multiple cues for image segmentation that takes into account intensity and texture distributions in a local area around each region and incorporates priors based on the geometry of the regions.
Abstract: We present a parameter free approach that utilizes multiple cues for image segmentation. Beginning with an image, we execute a sequence of bottom-up aggregation steps in which pixels are gradually merged to produce larger and larger regions. In each step we consider pairs of adjacent regions and provide a probability measure to assess whether or not they should be included in the same segment. Our probabilistic formulation takes into account intensity and texture distributions in a local area around each region. It further incorporates priors based on the geometry of the regions. Finally, posteriors based on intensity and texture cues are combined using a mixture of experts formulation. This probabilistic approach is integrated into a graph coarsening scheme providing a complete hierarchical segmentation of the image. The algorithm complexity is linear in the number of the image pixels and it requires almost no user-tuned parameters. We test our method on a variety of gray scale images and compare our results to several existing segmentation algorithms.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: An interactive framework for soft segmentation and matting of natural images and videos is presented, based on the optimal, linear time, computation of weighted geodesic distances to the user-provided scribbles, from which the whole data is automatically segmented.
Abstract: An interactive framework for soft segmentation and matting of natural images and videos is presented in this paper. The proposed technique is based on the optimal, linear time, computation of weighted geodesic distances to the user-provided scribbles, from which the whole data is automatically segmented. The weights are based on spatial and/or temporal gradients, without explicit optical flow or any advanced and often computationally expensive feature detectors. These could be naturally added to the proposed framework as well if desired, in the form of weights in the geodesic distances. A localized refinement step follows this fast segmentation in order to accurately compute the corresponding matte function. Additional constraints into the distance definition permit to efficiently handle occlusions such as people or objects crossing each other in a video sequence. The presentation of the framework is complemented with numerous and diverse examples, including extraction of moving foreground from dynamic background, and comparisons with the recent literature.
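A hedged sketch of the core computation, weighted geodesic distances from the user scribbles on a 4-connected pixel grid. The paper computes these distances in linear time; this illustration uses an ordinary Dijkstra propagation with intensity-gradient edge weights, and all names and parameters are ours.

```python
import heapq
import numpy as np

def geodesic_distance(gray, scribble_mask, eps=1e-3):
    """Distance from every pixel to the nearest scribble pixel, where stepping to a
    neighbor costs the absolute intensity difference (a spatial-gradient weight)."""
    h, w = gray.shape
    gray = gray.astype(float)
    dist = np.full((h, w), np.inf)
    heap = []
    for y, x in zip(*np.nonzero(scribble_mask)):
        dist[y, x] = 0.0
        heapq.heappush(heap, (0.0, int(y), int(x)))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y, x]:
            continue                                  # stale queue entry
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + abs(gray[ny, nx] - gray[y, x]) + eps
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, ny, nx))
    return dist
```

Running this once for the foreground scribbles and once for the background scribbles and comparing the two maps per pixel gives a hard segmentation; turning that comparison into a soft weight is what the matting refinement builds on.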

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper introduces an algorithm for learning shapelet features, a set of mid-level features that are built from low-level gradient information that discriminates between pedestrian and non-pedestrian classes on the INRIA dataset.
Abstract: In this paper, we address the problem of detecting pedestrians in still images. We introduce an algorithm for learning shapelet features, a set of mid-level features. These features are focused on local regions of the image and are built from low-level gradient information that discriminates between pedestrian and non-pedestrian classes. Using AdaBoost, these shapelet features are created as a combination of oriented gradient responses. To train the final classifier, we use AdaBoost for a second time to select a subset of our learned shapelets. By first focusing locally on smaller feature sets, our algorithm attempts to harvest more useful information than by examining all the low-level features together. We present quantitative results demonstrating the effectiveness of our algorithm. In particular, we obtain an error rate 14 percentage points lower (at 10^-6 FPPW) than the previous state of the art detector of Dalal and Triggs on the INRIA dataset.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: An efficient implementation of the "probing" technique is discussed, which simplifies the MRF while preserving the global optimum, and a new technique which takes an arbitrary input labeling and tries to improve its energy is presented.
Abstract: Many computer vision applications rely on the efficient optimization of challenging, so-called non-submodular, binary pairwise MRFs. A promising graph cut based approach for optimizing such MRFs known as "roof duality" was recently introduced into computer vision. We study two methods which extend this approach. First, we discuss an efficient implementation of the "probing" technique introduced recently by Boros et al. (2006). It simplifies the MRF while preserving the global optimum. Our code is 400-700 times faster on some graphs than the implementation of the work of Boros et al. (2006). Second, we present a new technique which takes an arbitrary input labeling and tries to improve its energy. We give theoretical characterizations of local minima of this procedure. We applied both techniques to many applications, including image segmentation, new view synthesis, super-resolution, diagram recognition, parameter learning, texture restoration, and image deconvolution. For several applications we see that we are able to find the global minimum very efficiently, and considerably outperform the original roof duality approach. In comparison to existing techniques, such as graph cut, TRW, BP, ICM, and simulated annealing, we nearly always find a lower energy.

Journal ArticleDOI
TL;DR: The steepest descent for minimizing the functional is interpreted as a nonlocal diffusion process, which allows a convenient framework for nonlocal variational minimizations, including variational denoising, Bregman iterations, and the recently proposed inverse scale space.
Abstract: A nonlocal quadratic functional of weighted differences is examined. The weights are based on image features and represent the affinity between different pixels in the image. By prescribing different formulas for the weights, one can generalize many local and nonlocal linear denoising algorithms, including the nonlocal means filter and the bilateral filter. In this framework one can easily show that continuous iterations of the generalized filter obey certain global characteristics and converge to a constant solution. The linear operator associated with the Euler–Lagrange equation of the functional is closely related to the graph Laplacian. We can thus interpret the steepest descent for minimizing the functional as a nonlocal diffusion process. This formulation allows a convenient framework for nonlocal variational minimizations, including variational denoising, Bregman iterations, and the recently proposed inverse scale space. It is also demonstrated how the steepest descent flow can be used for segmentation.
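In hedged form (our notation), the functional and its steepest descent read:

```latex
J(u) \;=\; \tfrac{1}{4} \int_\Omega \!\! \int_\Omega \bigl(u(x) - u(y)\bigr)^2 \, w(x,y)\, dy\, dx ,
\qquad
\partial_t u(x) \;=\; -\int_\Omega \bigl(u(x) - u(y)\bigr)\, w(x,y)\, dy
```

with w(x,y) >= 0 the feature-based affinity between pixels; in the discrete setting the right-hand side is -(Lu) for the graph Laplacian L built from the weights, which is exactly the nonlocal diffusion interpretation mentioned above.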

Journal ArticleDOI
TL;DR: It is shown that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data and can be readily applied to segment real imagery and bioinformatic data.
Abstract: In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and rate-distortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phase-transition-like behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics and incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels.
Abstract: Many problems in vision involve the prediction of a class label for each frame in an unsegmented sequence. In this paper, we develop a discriminative framework for simultaneous sequence segmentation and labeling which can capture both intrinsic and extrinsic class dynamics. Our approach incorporates hidden state variables which model the sub-structure of a class sequence and learn dynamics between class labels. Each class label has a disjoint set of associated hidden states, which enables efficient training and inference in our model. We evaluated our method on the task of recognizing human gestures from unsegmented video streams and performed experiments on three different datasets of head and eye gestures. Our results demonstrate that our model compares favorably to Support Vector Machines, Hidden Markov Models, and Conditional Random Fields on visual gesture recognition tasks.

Journal ArticleDOI
TL;DR: Examples and comparisons are presented to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field.
Abstract: Snakes, or active contours, have been widely used in image processing applications. Typical roadblocks to consistent performance include limited capture range, noise sensitivity, and poor convergence to concavities. This paper proposes a new external force for active contours, called vector field convolution (VFC), to address these problems. VFC is calculated by convolving the edge map generated from the image with the user-defined vector field kernel. We propose two structures for the magnitude function of the vector field kernel, and we provide an analytical method to estimate the parameter of the magnitude function. Mixed VFC is introduced to alleviate the possible leakage problem caused by choosing inappropriate parameters. We also demonstrate that the standard external force and the gradient vector flow (GVF) external force are special cases of VFC in certain scenarios. Examples and comparisons with GVF are presented in this paper to show the advantages of this innovation, including superior noise robustness, reduced computational cost, and the flexibility of tailoring the force field.
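A hedged sketch of the VFC force field (the power-law magnitude is one of the two choices discussed above; radius, gamma, and names are illustrative): build a kernel whose vectors point toward its origin, scale them by the magnitude function, and convolve it with the edge map.

```python
import numpy as np
from scipy.signal import fftconvolve

def vfc_force(edge_map, radius=32, gamma=2.0, eps=1e-8):
    """Vector field convolution external force: the edge map convolved with a vector
    field kernel m(r) * n(r), where n points toward the kernel origin and
    m(r) = (r + eps)^(-gamma) is a power-law magnitude."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1].astype(float)
    r = np.sqrt(x ** 2 + y ** 2) + eps
    m = r ** (-gamma)
    m[radius, radius] = 0.0                  # no self-contribution at the origin
    kx = -x / r * m                          # unit vectors toward the origin, scaled by m
    ky = -y / r * m
    fx = fftconvolve(edge_map, kx, mode="same")
    fy = fftconvolve(edge_map, ky, mode="same")
    return fx, fy
```

The resulting (fx, fy) field replaces the standard external force in the snake evolution, pulling the contour toward edges from a large capture range.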

Journal ArticleDOI
01 Oct 2007
TL;DR: A novel approach that provides effective and robust segmentation of color images by incorporating the advantages of the mean shift segmentation and the normalized cut partitioning methods, which requires low computational complexity and is therefore very feasible for real-time image segmentation processing.
Abstract: In this correspondence, we develop a novel approach that provides effective and robust segmentation of color images. By incorporating the advantages of the mean shift (MS) segmentation and the normalized cut (Ncut) partitioning methods, the proposed method requires low computational complexity and is therefore very feasible for real-time image segmentation processing. It preprocesses an image by using the MS algorithm to form segmented regions that preserve the desirable discontinuity characteristics of the image. The segmented regions are then represented by using the graph structures, and the Ncut method is applied to perform globally optimized clustering. Because the number of the segmented regions is much smaller than that of the image pixels, the proposed method allows a low-dimensional image clustering with significant reduction of the complexity compared to conventional graph-partitioning methods that are directly applied to the image pixels. In addition, the image clustering using the segmented regions, instead of the image pixels, also reduces the sensitivity to noise and results in enhanced image segmentation performance. Furthermore, to avoid some inappropriate partitioning when considering every region as only one graph node, we develop an improved segmentation strategy using multiple child nodes for each region. The superiority of the proposed method is examined and demonstrated through a large number of experiments using color natural scene images.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: Spatial-LTM represents an image containing objects in a hierarchical way by over-segmented image regions of homogeneous appearances and the salient image patches within the regions, enforcing the spatial coherency of the model.
Abstract: We present a novel generative model for simultaneously recognizing and segmenting object and scene classes. Our model is inspired by the traditional bag of words representation of texts and images as well as a number of related generative models, including probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA). A major drawback of the pLSA and LDA models is the assumption that each patch in the image is independently generated given its corresponding latent topic. While such representation provides an efficient computational method, it lacks the power to describe the visually coherent images and scenes. Instead, we propose a spatially coherent latent topic model (spatial-LTM). Spatial-LTM represents an image containing objects in a hierarchical way by over-segmented image regions of homogeneous appearances and the salient image patches within the regions. Only one single latent topic is assigned to the image patches within each region, enforcing the spatial coherency of the model. This idea gives rise to the following merits of spatial-LTM: (1) spatial-LTM provides a unified representation for spatially coherent bag of words topic models; (2) spatial-LTM can simultaneously segment and classify objects, even in the case of occlusion and multiple instances; and (3) spatial-LTM can be trained either unsupervised or supervised, as well as when partial object labels are provided. We verify the success of our model in a number of segmentation and classification experiments.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work represents the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies.
Abstract: Much of the research on video-based human motion capture assumes the body shape is known a priori and is represented coarsely (e.g. using cylinders or superquadrics to model limbs). These body models stand in sharp contrast to the richly detailed 3D body models used by the graphics community. Here we propose a method for recovering such models directly from images. Specifically, we represent the body using a recently proposed triangulated mesh model called SCAPE which employs a low-dimensional, but detailed, parametric model of shape and pose-dependent deformations that is learned from a database of range scans of human bodies. Previous work showed that the parameters of the SCAPE model could be estimated from marker-based motion capture data. Here we go further to estimate the parameters directly from image data. We define a cost function between image observations and a hypothesized mesh and formulate the problem as optimization over the body shape and pose parameters using stochastic search. Our results show that such rich generative models enable the automatic recovery of detailed human shape and pose from images.

Journal ArticleDOI
TL;DR: A new white matter atlas creation method that learns a model of the common white matter structures present in a group of subjects, enabling group comparison of white matter anatomy and results regarding the stability of the method and parameter choices are presented.
Abstract: We propose a new white matter atlas creation method that learns a model of the common white matter structures present in a group of subjects. We demonstrate that our atlas creation method, which is based on group spectral clustering of tractography, discovers structures corresponding to expected white matter anatomy such as the corpus callosum, uncinate fasciculus, cingulum bundles, arcuate fasciculus, and corona radiata. The white matter clusters are augmented with expert anatomical labels and stored in a new type of atlas that we call a high-dimensional white matter atlas. We then show how to perform automatic segmentation of tractography from novel subjects by extending the spectral clustering solution, stored in the atlas, using the Nyström method. We present results regarding the stability of our method and parameter choices. Finally we give results from an atlas creation and automatic segmentation experiment. We demonstrate that our automatic tractography segmentation identifies corresponding white matter regions across hemispheres and across subjects, enabling group comparison of white matter anatomy.

Journal ArticleDOI
TL;DR: This paper presents a method for classification of structural brain magnetic resonance (MR) images, by using a combination of deformation-based morphometry and machine learning methods, which demonstrates not only high classification accuracy but also good stability.
Abstract: This paper presents a method for classification of structural brain magnetic resonance (MR) images, by using a combination of deformation-based morphometry and machine learning methods. A morphological representation of the anatomy of interest is first obtained using a high-dimensional mass-preserving template warping method, which results in tissue density maps that constitute local tissue volumetric measurements. Regions that display strong correlations between tissue volume and classification (clinical) variables are extracted using a watershed segmentation algorithm, taking into account the regional smoothness of the correlation map which is estimated by a cross-validation strategy to achieve robustness to outliers. A volume increment algorithm is then applied to these regions to extract regional volumetric features, from which a feature selection technique using support vector machine (SVM)-based criteria is used to select the most discriminative features, according to their effect on the upper bound of the leave-one-out generalization error. Finally, SVM-based classification is applied using the best set of features, and it is tested using a leave-one-out cross-validation strategy. The results on MR brain images of healthy controls and schizophrenia patients demonstrate not only high classification accuracy (91.8% for female subjects and 90.8% for male subjects), but also good stability with respect to the number of features selected and the size of SVM kernel used

Journal ArticleDOI
TL;DR: A new exemplar-based framework is presented, which treats image completion, texture synthesis, and image inpainting in a unified manner, and manages to resolve what is currently considered as one major limitation of the BP algorithm: its inefficiency in handling MRFs with very large discrete state spaces.
Abstract: In this paper, a new exemplar-based framework is presented, which treats image completion, texture synthesis, and image inpainting in a unified manner. In order to be able to avoid the occurrence of visually inconsistent results, we pose all of the above image-editing tasks in the form of a discrete global optimization problem. The objective function of this problem is always well-defined, and corresponds to the energy of a discrete Markov random field (MRF). For efficiently optimizing this MRF, a novel optimization scheme, called priority belief propagation (BP), is then proposed, which carries two very important extensions over the standard BP algorithm: "priority-based message scheduling" and "dynamic label pruning." These two extensions work in cooperation to deal with the intolerable computational cost of BP, which is caused by the huge number of labels associated with our MRF. Moreover, both of our extensions are generic, since they do not rely on the use of domain-specific prior knowledge. They can, therefore, be applied to any MRF, i.e., to a very wide class of problems in image processing and computer vision, thus managing to resolve what is currently considered as one major limitation of the BP algorithm: its inefficiency in handling MRFs with very large discrete state spaces. Experimental results on a wide variety of input images are presented, which demonstrate the effectiveness of our image-completion framework for tasks such as object removal, texture synthesis, text removal, and image inpainting.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: Non-metric similarities between pairs of images by matching SIFT features are derived and affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
Abstract: Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often, the most natural measures of similarity (e.g., number of matching SIFT features) between pairs of images or image parts are non-metric. Unsupervised categorization by identifying a subset of representative exemplars can be efficiently performed with the recently-proposed 'affinity propagation' algorithm. In contrast to k-centers clustering, which iteratively refines an initial randomly-chosen set of exemplars, affinity propagation simultaneously considers all data points as potential exemplars and iteratively exchanges messages between data points until a good solution emerges. When applied to the Olivetti face data set using a translation-invariant non-metric similarity, affinity propagation achieves a much lower reconstruction error and nearly halves the classification error rate, compared to state-of-the-art techniques. For the more challenging problem of unsupervised categorization of images from the Caltech101 data set, we derived non-metric similarities between pairs of images by matching SIFT features. Affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
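Because affinity propagation is central to the result above, here is a hedged, bare-bones version of its message-passing updates on a similarity matrix S (S may be non-metric, e.g. numbers of matching SIFT features, and S[k, k] holds the exemplar preferences); the damping value, iteration count, and the absence of convergence checks are simplifications.

```python
import numpy as np

def affinity_propagation(S, damping=0.9, n_iter=200):
    """Minimal affinity propagation: returns an exemplar index for every data point."""
    n = S.shape[0]
    R = np.zeros((n, n))          # responsibilities r(i, k)
    A = np.zeros((n, n))          # availabilities  a(i, k)
    rows = np.arange(n)
    for _ in range(n_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        AS = A + S
        top = np.argmax(AS, axis=1)
        first = AS[rows, top]
        AS[rows, top] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, top] = S[rows, top] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))), a(k,k) special-cased
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new.diagonal().copy()
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)
```

Setting every preference S[k, k] to a common value such as the median off-diagonal similarity controls how many exemplars, and hence categories, emerge.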

Journal ArticleDOI
TL;DR: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance, and proposes a method for automatically adjusting the smoothness properties of the empirical distributions.
Abstract: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance. In particular, given the values of a photometric variable (or of a set thereof), which is to be used for classifying the image pixels, the active contours are designed to converge to the shape that results in maximal discrepancy between the empirical distributions of the photometric variable inside and outside of the contours. The above discrepancy is measured by means of the Bhattacharyya distance that proves to be an extremely useful tool for solving the problem at hand. The proposed methodology can be viewed as a generalization of the segmentation methods, in which active contours maximize the difference between a finite number of empirical moments of the "inside" and "outside" distributions. Furthermore, it is shown that the proposed methodology is very versatile and flexible in the sense that it allows one to easily accommodate a diversity of the image features based on which the segmentation should be performed. As an additional contribution, a method for automatically adjusting the smoothness properties of the empirical distributions is proposed. Such a procedure is crucial in situations when the number of data samples (supporting a certain segmentation class) varies considerably in the course of the evolution of the active contour. In this case, the smoothness properties of the empirical distributions have to be properly adjusted to avoid either over- or underestimation artifacts. Finally, a number of relevant segmentation results are demonstrated and some further research directions are discussed.
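In hedged form (our notation), the quantity driving the contour C is the Bhattacharyya coefficient between the empirical "inside" and "outside" distributions of the photometric variable z:

```latex
B(C) \;=\; \int_{\mathcal{Z}} \sqrt{\, p_{\mathrm{in}}(z \mid C)\; p_{\mathrm{out}}(z \mid C) \,}\; dz
```

Small B means well-separated distributions, so the gradient flow that decreases B pushes the contour toward the shape of maximal discrepancy; the kernel bandwidth used to estimate p_in and p_out is what the automatic smoothness adjustment described above tunes.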

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This work explores the segmentation algorithm defined by an l∞ norm, provides a method for the optimization and shows that the resulting algorithm produces an accurate segmentation that demonstrates greater stability with respect to the number of seeds employed than either the graph cuts or random walker methods.
Abstract: In this work, we present a common framework for seeded image segmentation algorithms that yields two of the leading methods as special cases: the graph cuts and the random walker algorithms. The formulation of this common framework naturally suggests a new, third, algorithm that we develop here. Specifically, the former algorithms may be shown to minimize a certain energy with respect to either an l1 or an l2 norm. Here, we explore the segmentation algorithm defined by an l∞ norm, provide a method for the optimization and show that the resulting algorithm produces an accurate segmentation that demonstrates greater stability with respect to the number of seeds employed than either the graph cuts or random walker methods.
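In hedged form (our notation), the common seeded-segmentation energy is a p-norm of weighted label differences over the edges E of the pixel graph, with the seeds as boundary conditions:

```latex
\min_{x}\; \Bigl\lVert\, \bigl( w_{ij}\, \lvert x_i - x_j \rvert \bigr)_{(i,j) \in E} \,\Bigr\rVert_{p}
\quad \text{s.t.} \quad x_i = 1 \ \text{(foreground seeds)}, \qquad x_i = 0 \ \text{(background seeds)}
```

with the final label obtained by thresholding x at 1/2; p = 1 recovers a graph cuts energy and p = 2 a random walker energy (up to how the edge weights are exponentiated), while p = ∞ is the new case studied here.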

Book
12 Oct 2007
TL;DR: This chapter discusses the development of Character Recognition, Evolution and Development, and some of the techniques used to achieve this goal, including Bayes Decision Theory, as well as some new methods based on attributed graph matching.
Abstract: Figures. List of Tables. Preface. Acknowledgments. Acronyms. 1. Introduction: Character Recognition, Evolution and Development. 1.1 Generation and Recognition of Characters. 1.2 History of OCR. 1.3 Development of New Techniques. 1.4 Recent Trends and Movements. 1.5 Organization of the Remaining Chapters. References. 2. Tools for Image Pre-Processing. 2.1 Generic Form Processing System. 2.2 A Stroke Model for Complex Background Elimination. 2.2.1 Global Gray Level Thresholding. 2.2.2 Local Gray Level Thresholding. 2.2.3 Local Feature Thresholding-Stroke Based Model. 2.2.4 Choosing the Most Efficient Character Extraction Method. 2.2.5 Cleaning up Form Items Using Stroke Based Model. 2.3 A Scale-Space Approach for Visual Data Extraction. 2.3.1 Image Regularization. 2.3.2 Data Extraction. 2.3.3 Concluding Remarks. 2.4 Data Pre-Processing. 2.4.1 Smoothing and Noise Removal. 2.4.2 Skew Detection and Correction. 2.4.3 Slant Correction. 2.4.4 Character Normalization. 2.4.5 Contour Tracing/Analysis. 2.4.6 Thinning. 2.5 Chapter Summary. References 72. 3. Feature Extraction, Selection and Creation. 3.1 Feature Extraction. 3.1.1 Moments. 3.1.2 Histogram. 3.1.3 Direction Features. 3.1.4 Image Registration. 3.1.5 Hough Transform. 3.1.6 Line-Based Representation. 3.1.7 Fourier Descriptors. 3.1.8 Shape Approximation. 3.1.9 Topological Features. 3.1.10 Linear Transforms. 3.1.11 Kernels. 3.2 Feature Selection for Pattern Classification. 3.2.1 Review of Feature Selection Methods. 3.3 Feature Creation for Pattern Classification. 3.3.1 Categories of Feature Creation. 3.3.2 Review of Feature Creation Methods. 3.3.3 Future Trends. 3.4 Chapter Summary. References. 4. Pattern Classification Methods. 4.1 Overview of Classification Methods. 4.2 Statistical Methods. 4.2.1 Bayes Decision Theory. 4.2.2 Parametric Methods. 4.2.3 Non-ParametricMethods. 4.3 Artificial Neural Networks. 4.3.1 Single-Layer Neural Network. 4.3.2 Multilayer Perceptron. 4.3.3 Radial Basis Function Network. 4.3.4 Polynomial Network. 4.3.5 Unsupervised Learning. 4.3.6 Learning Vector Quantization. 4.4 Support Vector Machines. 4.4.1 Maximal Margin Classifier. 4.4.2 Soft Margin and Kernels. 4.4.3 Implementation Issues. 4.5 Structural Pattern Recognition. 4.5.1 Attributed String Matching. 4.5.2 Attributed Graph Matching. 4.6 Combining Multiple Classifiers. 4.6.1 Problem Formulation. 4.6.2 Combining Discrete Outputs. 4.6.3 Combining Continuous Outputs. 4.6.4 Dynamic Classifier Selection. 4.6.5 Ensemble Generation. 4.7 A Concrete Example. 4.8 Chapter Summary. References. 5. Word and String Recognition. 5.1 Introduction. 5.2 Character Segmentation. 5.2.1 Overview of Dissection Techniques. 5.2.2 Segmentation of Handwritten Digits. 5.3 Classification-Based String Recognition. 5.3.1 String Classification Model. 5.3.2 Classifier Design for String Recognition. 5.3.3 Search Strategies. 5.3.4 Strategies for Large Vocabulary. 5.4 HMM-Based Recognition. 5.4.1 Introduction to HMMs. 5.4.2 Theory and Implementation. 5.4.3 Application of HMMs to Text Recognition. 5.4.4 Implementation Issues. 5.4.5 Techniques for Improving HMMs' Performance. 5.4.6 Summary to HMM-Based Recognition. 5.5 Holistic Methods For Handwritten Word Recognition. 5.5.1 Introduction to Holistic Methods. 5.5.2 Overview of Holistic Methods. 5.5.3 Summary to Holistic Methods. 5.6 Chapter Summary. References. 6. Case Studies. 6.1 Automatically Generating Pattern Recognizers with Evolutionary Computation. 6.1.1 Motivation. 6.1.2 Introduction. 6.1.3 Hunters and Prey. 6.1.4 Genetic Algorithm. 
6.1.5 Experiments. 6.1.6 Analysis. 6.1.7 Future Directions. 6.2 Offline Handwritten Chinese Character Recognition. 6.2.1 Related Works. 6.2.2 System Overview. 6.2.3 Character Normalization. 6.2.4 Direction Feature Extraction. 6.2.5 Classification Methods. 6.2.6 Experiments. 6.2.7 Concluding Remarks. 6.3 Segmentation and Recognition of Handwritten Dates on Canadian Bank Cheques. 6.3.1 Introduction. 6.3.2 System Architecture. 6.3.3 Date Image Segmentation. 6.3.4 Date Image Recognition. 6.3.5 Experimental Results. 6.3.6 Concluding Remarks. References.

Proceedings ArticleDOI
01 Jan 2007
TL;DR: The multiple segmentation approach is used to evaluate how closely real segments can approach the ground-truth for real objects, and at what cost.
Abstract: Sliding window scanning is the dominant paradigm in object recognition research today. But while much success has been reported in detecting several rectangular-shaped object classes (i.e. faces, cars, pedestrians), results have been much less impressive for more general types of objects. Several researchers have advocated the use of image segmentation as a way to get a better spatial support for objects. In this paper, our aim is to address this issue by studying the following two questions: 1) how important is good spatial support for recognition? 2) can segmentation provide better spatial support for objects? To answer the first, we compare recognition performance using ground-truth segmentation vs. bounding boxes. To answer the second, we use the multiple segmentation approach to evaluate how close can real segments approach the ground-truth for real objects, and at what cost. Our results demonstrate the importance of finding the right spatial support for objects, and the feasibility of doing so without excessive computational burden.