scispace - formally typeset
Search or ask a question

Showing papers on "Image segmentation published in 2008"


Journal ArticleDOI
TL;DR: This paper describes the Semi-Global Matching (SGM) stereo method, which uses a pixelwise, Mutual Information based matching cost for compensating radiometric differences of input images and demonstrates a tolerance against a wide range of radiometric transformations.
Abstract: This paper describes the semiglobal matching (SGM) stereo method. It uses a pixelwise, mutual information (Ml)-based matching cost for compensating radiometric differences of input images. Pixelwise matching is supported by a smoothness constraint that is usually expressed as a global cost function. SGM performs a fast approximation by pathwise optimizations from all directions. The discussion also addresses occlusion detection, subpixel refinement, and multibaseline matching. Additionally, postprocessing steps for removing outliers, recovering from specific problems of structured environments, and the interpolation of gaps are presented. Finally, strategies for processing almost arbitrarily large images and fusion of disparity images using orthographic projection are proposed. A comparison on standard stereo images shows that SGM is among the currently top-ranked algorithms and is best, if subpixel accuracy is considered. The complexity is linear to the number of pixels and disparity range, which results in a runtime of just 1-2 seconds on typical test images. An in depth evaluation of the Ml-based matching cost demonstrates a tolerance against a wide range of radiometric transformations. Finally, examples of reconstructions from huge aerial frame and pushbroom images demonstrate that the presented ideas are working well on practical problems.

3,302 citations


Journal ArticleDOI
TL;DR: A closed-form solution to natural image matting that allows us to find the globally optimal alpha matte by solving a sparse linear system of equations and predicts the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms.
Abstract: Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed - at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte") from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.

1,851 citations


Journal ArticleDOI
TL;DR: This work puts forward ways for handling nonhomogeneous noise and missing information, paving the way to state-of-the-art results in applications such as color image denoising, demosaicing, and inpainting, as demonstrated in this paper.
Abstract: Sparse representations of signals have drawn considerable interest in recent years. The assumption that natural signals, such as images, admit a sparse decomposition over a redundant dictionary leads to efficient algorithms for handling such sources of data. In particular, the design of well adapted dictionaries for images has been a major challenge. The K-SVD has been recently proposed for this task and shown to perform very well for various grayscale image processing tasks. In this paper, we address the problem of learning dictionaries for color images and extend the K-SVD-based grayscale image denoising algorithm that appears in . This work puts forward ways for handling nonhomogeneous noise and missing information, paving the way to state-of-the-art results in applications such as color image denoising, demosaicing, and inpainting, as demonstrated in this paper.

1,818 citations


Journal ArticleDOI
TL;DR: This work proposes a region-based active contour model that draws upon intensity information in local regions at a controllable scale to cope with intensity inhomogeneity and shows desirable performances of this model.
Abstract: Intensity inhomogeneities often occur in real-world images and may cause considerable difficulties in image segmentation. In order to overcome the difficulties caused by intensity inhomogeneities, we propose a region-based active contour model that draws upon intensity information in local regions at a controllable scale. A data fitting energy is defined in terms of a contour and two fitting functions that locally approximate the image intensities on the two sides of the contour. This energy is then incorporated into a variational level set formulation with a level set regularization term, from which a curve evolution equation is derived for energy minimization. Due to a kernel function in the data fitting term, intensity information in local regions is extracted to guide the motion of the contour, which thereby enables our model to cope with intensity inhomogeneity. In addition, the regularity of the level set function is intrinsically preserved by the level set regularization term to ensure accurate computation and avoids expensive reinitialization of the evolving level set function. Experimental results for synthetic and real images show desirable performances of our method.

1,630 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking is presented.
Abstract: We present a privacy-preserving system for estimating the size of inhomogeneous crowds, composed of pedestrians that travel in different directions, without using explicit object segmentation or tracking. First, the crowd is segmented into components of homogeneous motion, using the mixture of dynamic textures motion model. Second, a set of simple holistic features is extracted from each segmented region, and the correspondence between features and the number of people per segment is learned with Gaussian process regression. We validate both the crowd segmentation algorithm, and the crowd counting system, on a large pedestrian dataset (2000 frames of video, containing 49,885 total pedestrian instances). Finally, we present results of the system running on a full hour of video.

1,164 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: The proposed semantic texton forests are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors, and give at least a five-fold increase in execution speed.
Abstract: We propose semantic texton forests, efficient and powerful new low-level features. These are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors. They are extremely fast to both train and test, especially compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons, and (ii) an explicit local classification estimate. Our second contribution, the bag of semantic textons, combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is computed over the whole image for categorization, and over local rectangular regions for segmentation. Including both histogram and region prior allows our segmentation algorithm to exploit both textural and semantic context. Our third contribution is an image-level prior for segmentation that emphasizes those categories that the automatic categorization believes to be present. We evaluate on two datasets including the very challenging VOC 2007 segmentation dataset. Our results significantly advance the state-of-the-art in segmentation accuracy, and furthermore, our use of efficient decision forests gives at least a five-fold increase in execution speed.

1,162 citations


Journal ArticleDOI
TL;DR: A natural framework that allows any region-based segmentation energy to be re-formulated in a local way is proposed and the localization of three well-known energies are demonstrated in order to illustrate how this framework can be applied to any energy.
Abstract: In this paper, we propose a natural framework that allows any region-based segmentation energy to be re-formulated in a local way. We consider local rather than global image statistics and evolve a contour based on local information. Localized contours are capable of segmenting objects with heterogeneous feature profiles that would be difficult to capture correctly using a standard global method. The presented technique is versatile enough to be used with any global region-based active contour energy and instill in it the benefits of localization. We describe this framework and demonstrate the localization of three well-known energies in order to illustrate how our framework can be applied to any energy. We then compare each localized energy to its global counterpart to show the improvements that can be achieved. Next, an in-depth study of the behaviors of these energies in response to the degree of localization is given. Finally, we show results on challenging images to illustrate the robust and accurate segmentations that are possible with this new class of active contour models.

1,149 citations


Journal ArticleDOI
23 Jun 2008
TL;DR: This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner based on higher order conditional random fields and uses potentials defined on sets of pixels generated using unsupervised segmentation algorithms.
Abstract: This paper proposes a novel framework for labelling problems which is able to combine multiple segmentations in a principled manner. Our method is based on higher order conditional random fields and uses potentials defined on sets of pixels (image segments) generated using unsupervised segmentation algorithms. These potentials enforce label consistency in image regions and can be seen as a strict generalization of the commonly used pairwise contrast sensitive smoothness potentials. The higher order potential functions used in our framework take the form of the robust Pn model. This enables the use of powerful graph cut based move making algorithms for performing inference in the framework [14 ]. We test our method on the problem of multi-class object segmentation by augmenting the conventional CRF used for object segmentation with higher order potentials defined on image regions. Experiments on challenging data sets show that integration of higher order potentials quantitatively and qualitatively improves results leading to much better definition of object boundaries. We believe that this method can be used to yield similar improvements for many other labelling problems.

1,069 citations


Journal ArticleDOI
TL;DR: A set of energy minimization benchmarks are described and used to compare the solution quality and runtime of several common energy minimizations algorithms and a general-purpose software interface is provided that allows vision researchers to easily switch between optimization methods.
Abstract: Among the most exciting advances in early vision has been the development of efficient energy minimization algorithms for pixel-labeling tasks such as depth or texture computation. It has been known for decades that such problems can be elegantly expressed as Markov random fields, yet the resulting energy minimization problems have been widely viewed as intractable. Algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: For example, such methods form the basis for almost all the top-performing stereo methods. However, the trade-offs among different energy minimization algorithms are still not well understood. In this paper, we describe a set of energy minimization benchmarks and use them to compare the solution quality and runtime of several common energy minimization algorithms. We investigate three promising methods-graph cuts, LBP, and tree-reweighted message passing-in addition to the well-known older iterated conditional mode (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching, interactive segmentation, and denoising. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods. The benchmarks, code, images, and results are available at http://vision.middlebury.edu/MRF/.

1,065 citations


Journal ArticleDOI
TL;DR: An extensive evaluation of the unsupervised objective evaluation methods that have been proposed in the literature are presented and the advantages and shortcomings of the underlying design mechanisms in these methods are discussed and analyzed.

996 citations


Journal ArticleDOI
TL;DR: This survey covers the historical development and current state of the art in image understanding for iris biometrics and suggests a short list of recommended readings for someone new to the field to quickly grasp the big picture of irisBiometrics.

Book ChapterDOI
12 May 2008
TL;DR: A novel method to determine salient regions in images using low-level features of luminance and color is presented, which is fast, easy to implement and generates high quality saliency maps of the same size and resolution as the input image.
Abstract: Detection of salient image regions is useful for applications like image segmentation, adaptive compression, and region-based image retrieval. In this paper we present a novel method to determine salient regions in images using low-level features of luminance and color. The method is fast, easy to implement and generates high quality saliency maps of the same size and resolution as the input image. We demonstrate the use of the algorithm in the segmentation of semantically meaningful whole objects from digital images.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This article proposes an energy formulation with both sparse reconstruction and class discrimination components, jointly optimized during dictionary learning, for local image discrimination tasks, and paves the way for a novel scene analysis and recognition framework based on simultaneously learning discriminative and reconstructive dictionaries.
Abstract: Sparse signal models have been the focus of much recent research, leading to (or improving upon) state-of-the-art results in signal, image, and video restoration. This article extends this line of research into a novel framework for local image discrimination tasks, proposing an energy formulation with both sparse reconstruction and class discrimination components, jointly optimized during dictionary learning. This approach improves over the state of the art in texture segmentation experiments using the Brodatz database, and it paves the way for a novel scene analysis and recognition framework based on simultaneously learning discriminative and reconstructive dictionaries. Preliminary results in this direction using examples from the Pascal VOC06 and Graz02 datasets are presented as well.

Journal ArticleDOI
TL;DR: This work proposes an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science, that can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, and achieves robust detection for different types of videos taken with stationary cameras.
Abstract: Detection of moving objects in video streams is the first relevant step of information extraction in many computer vision applications. Aside from the intrinsic usefulness of being able to segment video streams into moving and background components, detecting moving objects provides a focus of attention for recognition, classification, and activity analysis, making these later steps more efficient. We propose an approach based on self organization through artificial neural networks, widely applied in human image processing systems and more generally in cognitive science. The proposed approach can handle scenes containing moving backgrounds, gradual illumination variations and camouflage, has no bootstrapping limitations, can include into the background model shadows cast by moving objects, and achieves robust detection for different types of videos taken with stationary cameras. We compare our method with other modeling techniques and report experimental results, both in terms of detection accuracy and in terms of processing speed, for color video sequences that represent typical situations critical for video surveillance systems.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work proposes a principled approach to summarization of visual data based on optimization of a well-defined similarity measure and shows that the same approach can be used to address a variety of other problems, including automatic cropping, completion and synthesis ofVisual data, image collage, object removal, photo reshuffling and more.
Abstract: We propose a principled approach to summarization of visual data (images or video) based on optimization of a well-defined similarity measure. The problem we consider is re-targeting (or summarization) of image/video data into smaller sizes. A good ldquovisual summaryrdquo should satisfy two properties: (1) it should contain as much as possible visual information from the input data; (2) it should introduce as few as possible new visual artifacts that were not in the input data (i.e., preserve visual coherence). We propose a bi-directional similarity measure which quantitatively captures these two requirements: Two signals S and T are considered visually similar if all patches of S (at multiple scales) are contained in T, and vice versa. The problem of summarization/re-targeting is posed as an optimization problem of this bi-directional similarity measure. We show summarization results for image and video data. We further show that the same approach can be used to address a variety of other problems, including automatic cropping, completion and synthesis of visual data, image collage, object removal, photo reshuffling and more.

Journal ArticleDOI
TL;DR: The framework takes advantage of the integration of image processing, geometric analysis and mesh generation techniques, with an accent on full automation and high-level interaction, to be performed in the context of large-scale studies.
Abstract: We present a modeling framework designed for patient-specific computational hemodynamics to be performed in the context of large-scale studies. The framework takes advantage of the integration of image processing, geometric analysis and mesh generation techniques, with an accent on full automation and high-level interaction. Image segmentation is performed using implicit deformable models taking advantage of a novel approach for selective initialization of vascular branches, as well as of a strategy for the segmentation of small vessels. A robust definition of centerlines provides objective geometric criteria for the automation of surface editing and mesh generation. The framework is available as part of an open-source effort, the Vascular Modeling Toolkit, a first step towards the sharing of tools and data which will be necessary for computational hemodynamics to play a role in evidence-based medicine.

Journal ArticleDOI
TL;DR: An automatic four-chamber heart segmentation system for the quantitative functional analysis of the heart from cardiac computed tomography (CT) volumes is proposed and an efficient and robust approach for automatic heart chamber segmentation in 3D CT volumes is developed.
Abstract: We propose an automatic four-chamber heart segmentation system for the quantitative functional analysis of the heart from cardiac computed tomography (CT) volumes. Two topics are discussed: heart modeling and automatic model fitting to an unseen volume. Heart modeling is a nontrivial task since the heart is a complex nonrigid organ. The model must be anatomically accurate, allow manual editing, and provide sufficient information to guide automatic detection and segmentation. Unlike previous work, we explicitly represent important landmarks (such as the valves and the ventricular septum cusps) among the control points of the model. The control points can be detected reliably to guide the automatic model fitting process. Using this model, we develop an efficient and robust approach for automatic heart chamber segmentation in 3D CT volumes. We formulate the segmentation as a two-step learning problem: anatomical structure localization and boundary delineation. In both steps, we exploit the recent advances in learning discriminative models. A novel algorithm, marginal space learning (MSL), is introduced to solve the 9-D similarity transformation search problem for localizing the heart chambers. After determining the pose of the heart chambers, we estimate the 3D shape through learning-based boundary delineation. The proposed method has been extensively tested on the largest dataset (with 323 volumes from 137 patients) ever reported in the literature. To the best of our knowledge, our system is the fastest with a speed of 4.0 s per volume (on a dual-core 3.2-GHz processor) for the automatic segmentation of all four chambers.

Journal ArticleDOI
TL;DR: A review of the state of the art of segmentation and partitioning techniques of boundary meshes and identifies two primarily distinct types of mesh segmentation, namely part segments and surface‐patch segmentation.
Abstract: We present a review of the state of the art of segmentation and partitioning techniques of boundary meshes. Recently, these have become a part of many mesh and object manipulation algorithms in computer graphics, geometric modelling and computer aided design. We formulate the segmentation problem as an optimization problem and identify two primarily distinct types of mesh segmentation, namely part segmentation and surface-patch segmentation. We classify previous segmentation solutions according to the different segmentation goals, the optimization criteria and features used, and the various algorithmic techniques employed. We also present some generic algorithms for the major segmentation techniques.

Journal ArticleDOI
TL;DR: It is shown that kAS substantially outperform IPs for detecting shape-based classes, and the object detector is compared to the recent state-of-the-art system by Dalal and Triggs (2005).
Abstract: We present a family of scale-invariant local shape features formed by chains of k connected roughly straight contour segments (kAS), and their use for object class detection. kAS are able to cleanly encode pure fragments of an object boundary without including nearby clutter. Moreover, they offer an attractive compromise between information content and repeatability and encompass a wide variety of local shape structures. We also define a translation and scale invariant descriptor encoding the geometric configuration of the segments within a kAS, making kAS easy to reuse in other frameworks, for example, as a replacement or addition to interest points (IPs). Software for detecting and describing kAS is released at http://lear.inrialpes.fr/software. We demonstrate the high performance of kAS within a simple but powerful sliding-window object detection scheme. Through extensive evaluations, involving eight diverse object classes and more than 1,400 images, we (1) study the evolution of performance as the degree of feature complexity k varies and determine the best degree, (2) show that kAS substantially outperform IPs for detecting shape-based classes, and (3) compare our object detector to the recent state-of-the-art system by Dalal and Triggs (2005).

Journal ArticleDOI
TL;DR: This paper offers to researchers a link to a public image database to define a common reference point for LPR algorithmic assessment and issues such as processing time, computational power, and recognition rate are addressed.
Abstract: License plate recognition (LPR) algorithms in images or videos are generally composed of the following three processing steps: 1) extraction of a license plate region; 2) segmentation of the plate characters; and 3) recognition of each character This task is quite challenging due to the diversity of plate formats and the nonuniform outdoor illumination conditions during image acquisition Therefore, most approaches work only under restricted conditions such as fixed illumination, limited vehicle speed, designated routes, and stationary backgrounds Numerous techniques have been developed for LPR in still images or video sequences, and the purpose of this paper is to categorize and assess them Issues such as processing time, computational power, and recognition rate are also addressed, when available Finally, this paper offers to researchers a link to a public image database to define a common reference point for LPR algorithmic assessment

Journal ArticleDOI
TL;DR: This paper model the distribution of the texture features using a mixture of Gaussian distributions, allowing the mixture components to be degenerate or nearly-degenerate, and shows that such a mixture distribution can be effectively segmented by a simple agglomerative clustering algorithm derived from a lossy data compression approach.

Journal ArticleDOI
TL;DR: A method to automatically detect the position of the OD in digital retinal fundus images by normalizing luminosity and contrast through out the image using illumination equalization and adaptive histogram equalization methods, which was evaluated using a subset of the STARE project's dataset.
Abstract: Optic disc (OD) detection is a main step while developing automated screening systems for diabetic retinopathy. We present in this paper a method to automatically detect the position of the OD in digital retinal fundus images. The method starts by normalizing luminosity and contrast through out the image using illumination equalization and adaptive histogram equalization methods respectively. The OD detection algorithm is based on matching the expected directional pattern of the retinal blood vessels. Hence, a simple matched filter is proposed to roughly match the direction of the vessels at the OD vicinity. The retinal vessels are segmented using a simple and standard 2-D Gaussian matched filter. Consequently, a vessels direction map of the segmented retinal vessels is obtained using the same segmentation algorithm. The segmented vessels are then thinned, and filtered using local intensity, to represent finally the OD-center candidates. The difference between the proposed matched filter resized into four different sizes, and the vessels' directions at the surrounding area of each of the OD-center candidates is measured. The minimum difference provides an estimate of the OD-center coordinates. The proposed method was evaluated using a subset of the STARE project's dataset, containing 81 fundus images of both normal and diseased retinas, and initially used by literature OD detection methods. The OD-center was detected correctly in 80 out of the 81 images (98.77%). In addition, the OD-center was detected correctly in all of the 40 images (100%) using the publicly available DRIVE dataset.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A novel method to address the problem of estimating the number of people in surveillance scenes with people gathering and waiting by combining a MID based foreground segmentation algorithm and a HOG based head-shoulder detection algorithm to provide an accurate estimation of people counts in the observed area.
Abstract: This paper proposes a novel method to address the problem of estimating the number of people in surveillance scenes with people gathering and waiting. The proposed method combines a MID (mosaic image difference) based foreground segmentation algorithm and a HOG (histograms of oriented gradients) based head-shoulder detection algorithm to provide an accurate estimation of people counts in the observed area. In our framework, the MID-based foreground segmentation module provides active areas for the head-shoulder detection module to detect heads and count the number of people. Numerous experiments are conducted and convincing results demonstrate the effectiveness of our method.

Journal ArticleDOI
Stephen Gould1, Jim Rodgers1, David I. Cohen1, Gal Elidan1, Daphne Koller1 
TL;DR: This work proposes a method for capturing global information from inter-class spatial relationships and encoding it as a local feature and shows that the incorporation of relative location information allows it to significantly outperform the current state-of-the-art.
Abstract: Multi-class image segmentation has made significant advances in recent years through the combination of local and global features. One important type of global feature is that of inter-class spatial relationships. For example, identifying "tree" pixels indicates that pixels above and to the sides are more likely to be "sky" whereas pixels below are more likely to be "grass." Incorporating such global information across the entire image and between all classes is a computational challenge as it is image-dependent, and hence, cannot be precomputed. In this work we propose a method for capturing global information from inter-class spatial relationships and encoding it as a local feature. We employ a two-stage classification process to label all image pixels. First, we generate predictions which are used to compute a local relative location feature from learned relative location maps. In the second stage, we combine this with appearance-based features to provide a final segmentation. We compare our results to recent published results on several multi-class image segmentation databases and show that the incorporation of relative location information allows us to significantly outperform the current state-of-the-art.

Journal ArticleDOI
TL;DR: In this paper, a Bayesian formulation for incorporating soft model assignments into the calculation of affinities is presented. And the resulting soft model assignment is integrated into the multilevel segmentation by weighted aggregation algorithm, and applied to the task of detecting and segmenting brain tumor and edema in multichannel magnetic resonance (MR) volumes.
Abstract: We present a new method for automatic segmentation of heterogeneous image data that takes a step toward bridging the gap between bottom-up affinity-based segmentation methods and top-down generative model based approaches. The main contribution of the paper is a Bayesian formulation for incorporating soft model assignments into the calculation of affinities, which are conventionally model free. We integrate the resulting model-aware affinities into the multilevel segmentation by weighted aggregation algorithm, and apply the technique to the task of detecting and segmenting brain tumor and edema in multichannel magnetic resonance (MR) volumes. The computationally efficient method runs orders of magnitude faster than current state-of-the-art techniques giving comparable or improved results. Our quantitative results indicate the benefit of incorporating model-aware affinities into the segmentation process for the difficult case of glioblastoma multiforme brain tumor.

Journal ArticleDOI
TL;DR: The model-based approach for the fully automatic segmentation of the whole heart (four chambers, myocardium, and great vessels) from 3-D CT images shows better interphase and interpatient shape variability characterization than commonly used principal component analysis.
Abstract: Automatic image processing methods are a pre-requisite to efficiently analyze the large amount of image data produced by computed tomography (CT) scanners during cardiac exams. This paper introduces a model-based approach for the fully automatic segmentation of the whole heart (four chambers, myocardium, and great vessels) from 3-D CT images. Model adaptation is done by progressively increasing the degrees-of-freedom of the allowed deformations. This improves convergence as well as segmentation accuracy. The heart is first localized in the image using a 3-D implementation of the generalized Hough transform. Pose misalignment is corrected by matching the model to the image making use of a global similarity transformation. The complex initialization of the multicompartment mesh is then addressed by assigning an affine transformation to each anatomical region of the model. Finally, a deformable adaptation is performed to accurately match the boundaries of the patient's anatomy. A mean surface-to-surface error of 0.82 mm was measured in a leave-one-out quantitative validation carried out on 28 images. Moreover, the piecewise affine transformation introduced for mesh initialization and adaptation shows better interphase and interpatient shape variability characterization than commonly used principal component analysis.

Journal ArticleDOI
TL;DR: An automatic method for delineating the prostate in three-dimensional magnetic resonance scans is presented, based on nonrigid registration of a set of prelabeled atlas images, and the segmentation quality is especially good at the prostate-rectum interface.
Abstract: An automatic method for delineating the prostate (including the seminal vesicles) in three-dimensional magnetic resonance scans is presented. The method is based on nonrigid registration of a set of prelabeled atlas images. Each atlas image is nonrigidly registered with the target patient image. Subsequently, the deformed atlas label images are fused to yield a single segmentation of the patient image. The proposed method is evaluated on 50 clinical scans, which were manually segmented by three experts. The Dice similarity coefficient (DSC) is used to quantify the overlap between the automatic and manual segmentations. We investigate the impact of several factors on the performance of the segmentation method. For the registration, two similarity measures are compared: Mutual information and a localized version of mutual information. The latter turns out to be superior (median DeltaDSC approximately equal 0.02, p 0.05). To assess the influence of the atlas composition, two atlas sets are compared. The first set consists of 38 scans of healthy volunteers. The second set is constructed by a leave-one-out approach using the 50 clinical scans that are used for evaluation. The second atlas set gives substantially better performance (DeltaDSC=0.04, p < 0.01), stressing the importance of a careful atlas definition. With the best settings, a median DSC of around 0.85 is achieved, which is close to the median interobserver DSC of 0.87. The segmentation quality is especially good at the prostate-rectum interface, where the segmentation error remains below 1 mm in 50% of the cases and below 1.5 mm in 75% of the cases.

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work forms several versions of the connectivity constraint and shows that the corresponding optimization problems are all NP-hard.
Abstract: Graph cut is a popular technique for interactive image segmentation. However, it has certain shortcomings. In particular, graph cut has problems with segmenting thin elongated objects due to the ldquoshrinking biasrdquo. To overcome this problem, we propose to impose an additional connectivity prior, which is a very natural assumption about objects. We formulate several versions of the connectivity constraint and show that the corresponding optimization problems are all NP-hard. For some of these versions we propose two optimization algorithms: (i) a practical heuristic technique which we call DijkstraGC, and (ii) a slow method based on problem decomposition which provides a lower bound on the problem. We use the second technique to verify that for some practical examples DijkstraGC is able to find the global minimum.

Journal ArticleDOI
TL;DR: This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture.
Abstract: A dynamic texture is a spatio-temporal generative model for video, which represents video sequences as observations from a linear dynamical system. This work studies the mixture of dynamic textures, a statistical model for an ensemble of video sequences that is sampled from a finite collection of visual processes, each of which is a dynamic texture. An expectation-maximization (EM) algorithm is derived for learning the parameters of the model, and the model is related to previous works in linear systems, machine learning, time- series clustering, control theory, and computer vision. Through experimentation, it is shown that the mixture of dynamic textures is a suitable representation for both the appearance and dynamics of a variety of visual processes that have traditionally been challenging for computer vision (for example, fire, steam, water, vehicle and pedestrian traffic, and so forth). When compared with state-of-the-art methods in motion segmentation, including both temporal texture methods and traditional representations (for example, optical flow or other localized motion representations), the mixture of dynamic textures achieves superior performance in the problems of clustering and segmenting video of such processes.

Journal ArticleDOI
TL;DR: A model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework is proposed, which defines a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation.
Abstract: Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion. We propose a model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations for proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, are integrated in one theoretically sound framework. We present experimental results and quantitative evaluation, demonstrating that the resulting approach is effective for very challenging data.