
Showing papers on "Image segmentation published in 1997"


Proceedings ArticleDOI
17 Jun 1997
TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. The normalized cut criterion measures both the total dissimilarity between the different groups as well as the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize this criterion. We have applied this approach to segmenting static images and found results very encouraging.

11,827 citations
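For readers who want to see the computational step the abstract refers to, here is a minimal sketch of the normalized-cut relaxation: build a pixel affinity matrix W, solve the generalized eigenvalue problem (D - W) y = lambda D y, and threshold the second-smallest eigenvector to bipartition the graph. The Gaussian affinities on intensity and position and the median split are illustrative choices, not the exact settings of the paper.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(features, positions, sigma_f=0.1, sigma_x=4.0):
    """Bipartition 'pixels' via the normalized-cut relaxation: build affinities W,
    solve (D - W) y = lambda * D y, threshold the second-smallest eigenvector."""
    df = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    dx = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    W = np.exp(-(df / sigma_f) ** 2) * np.exp(-(dx / sigma_x) ** 2)
    D = np.diag(W.sum(axis=1))
    vals, vecs = eigh(D - W, D)        # generalized symmetric eigenproblem
    y = vecs[:, 1]                     # second-smallest eigenvector
    return y > np.median(y)            # simple split; the paper also searches splits

# toy usage: 20 "pixels" with a 1-D intensity feature and a 1-D position
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(0.2, 0.02, 10),
                        rng.normal(0.8, 0.02, 10)])[:, None]
pos = np.arange(20.0)[:, None]
print(ncut_bipartition(feats, pos).astype(int))
```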


Proceedings ArticleDOI
16 Jun 1997
TL;DR: The paper gives an overview of the various tasks involved in motion analysis of the human body, and focuses on three major areas related to interpreting human motion: motion analysis involving human body parts, tracking of human motion using single or multiple cameras, and recognizing human activities from image sequences.
Abstract: Human motion analysis is receiving increasing attention from computer vision researchers. This interest is motivated by a wide spectrum of applications, such as athletic performance analysis, surveillance, man-machine interfaces, content-based image storage and retrieval, and video conferencing. The paper gives an overview of the various tasks involved in motion analysis of the human body. The authors focus on three major areas related to interpreting human motion: 1) motion analysis involving human body parts, 2) tracking of human motion using single or multiple cameras, and 3) recognizing human activities from image sequences. Motion analysis of human body parts involves the low-level segmentation of the human body into segments connected by joints, and recovers the 3D structure of the human body using its 2D projections over a sequence of images. Tracking human motion using a single or multiple camera focuses on higher-level processing, in which moving humans are tracked without identifying specific parts of the body structure. After successfully matching the moving human image from one frame to another in image sequences, understanding the human movements or activities comes naturally, which leads to a discussion of recognizing human activities. The review is illustrated by examples.

1,665 citations


Proceedings Article
01 Aug 1997
TL;DR: A mixture-of-Gaussians classification model for each pixel is learned using an unsupervised technique--an efficient, incremental version of EM, which identifies and eliminates shadows much more effectively than other techniques such as thresholding.
Abstract: "Background subtraction" is an old technique for finding moving objects in a video sequence--for example, cars driving on a freeway. The idea is that subtracting the current image from a time-averaged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image; it fails with slow-moving objects and does not distinguish shadows from moving objects. The basic idea of this paper is that we can classify each pixel using a model of how that pixel looks when it is part of different classes. We learn a mixture-of-Gaussians classification model for each pixel using an unsupervised technique--an efficient, incremental version of EM. Unlike the standard image-averaging approach, this automatically updates the mixture component for each class according to likelihood of membership; hence slow-moving objects are handled perfectly. Our approach also identifies and eliminates shadows much more effectively than other techniques such as thresholding. Application of this method as part of the Roadwatch traffic surveillance project is expected to result in significant improvements in vehicle identification and tracking.

1,003 citations
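A minimal sketch of per-pixel mixture-of-Gaussians classification in the spirit of this abstract. The fixed learning rate, the 2.5-sigma match test, and the component-replacement rule are illustrative stand-ins for the paper's incremental EM updates, not the authors' exact algorithm.

```python
import numpy as np

class PixelMixture:
    """Simplified online Gaussian-mixture model for a single pixel's gray values."""
    def __init__(self, k=3, lr=0.05, init_var=225.0):
        self.w = np.full(k, 1.0 / k)       # mixture weights
        self.mu = np.linspace(0, 255, k)   # component means
        self.var = np.full(k, init_var)    # component variances
        self.lr = lr

    def update(self, x):
        d = np.abs(x - self.mu)
        match = np.argmin(d)
        matched = d[match] < 2.5 * np.sqrt(self.var[match])
        if matched:
            # nudge the matched component toward the observation
            self.mu[match] += self.lr * (x - self.mu[match])
            self.var[match] += self.lr * ((x - self.mu[match]) ** 2 - self.var[match])
            self.w = (1 - self.lr) * self.w
            self.w[match] += self.lr
        else:
            # replace the weakest component with a new one centered at x
            weakest = np.argmin(self.w)
            self.mu[weakest], self.var[weakest], self.w[weakest] = x, 225.0, self.lr
        self.w /= self.w.sum()
        # call the pixel "background" if it matched a heavily weighted component
        return matched and self.w[match] > 0.4

# usage: a pixel that is mostly road (~100) with occasional passing cars (~200)
pm = PixelMixture()
print([pm.update(v) for v in [100, 101, 99, 100, 200, 100, 98, 205, 100]])
```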


Proceedings ArticleDOI
17 Jun 1997
TL;DR: A new external force for active contours is developed, computed as a diffusion of the gradient vectors of a gray-level or binary edge map derived from the image; the resulting field has a large capture range and forces active contours into concave regions.
Abstract: Snakes, or active contours, are used extensively in computer vision and image processing applications, particularly to locate object boundaries. Problems associated with initialization and poor convergence to concave boundaries, however, have limited their utility. This paper develops a new external force for active contours, largely solving both problems. This external force, which we call gradient vector flow (GVF) is computed as a diffusion of the gradient vectors of a gray-level or binary edge map derived from the image. The resultant field has a large capture range and forces active contours into concave regions. Examples on simulated images and one real image are presented.

914 citations
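The diffusion described in the abstract can be written as a pair of updates on the field components (u, v), namely u_t = mu * Laplacian(u) - (u - f_x)(f_x^2 + f_y^2) and symmetrically for v. A minimal sketch of that iteration follows; the step size, iteration count, and periodic boundary handling are illustrative choices.

```python
import numpy as np

def gradient_vector_flow(edge_map, mu=0.2, iters=200, dt=0.25):
    """Iteratively diffuse the gradient (fx, fy) of an edge map f using
    u_t = mu * Laplacian(u) - (u - fx) * (fx^2 + fy^2), and similarly for v.
    The 4-neighbor Laplacian uses periodic (np.roll) boundaries for brevity."""
    fy, fx = np.gradient(edge_map.astype(float))
    mag2 = fx ** 2 + fy ** 2
    u, v = fx.copy(), fy.copy()
    for _ in range(iters):
        lap_u = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                 np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4 * u)
        lap_v = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
                 np.roll(v, 1, 1) + np.roll(v, -1, 1) - 4 * v)
        u += dt * (mu * lap_u - (u - fx) * mag2)
        v += dt * (mu * lap_v - (v - fy) * mag2)
    return u, v

# usage on a toy edge map: a bright square outline
f = np.zeros((64, 64))
f[16:48, 16], f[16:48, 47], f[16, 16:48], f[47, 16:48] = 1, 1, 1, 1
u, v = gradient_vector_flow(f)
print(u.shape, float(np.abs(u).max()))
```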


Proceedings ArticleDOI
26 Oct 1997
TL;DR: An implementation of NeTra, a prototype image retrieval system that uses color, texture, shape and spatial location information in segmented image regions to search and retrieve similar regions from the database, is presented.
Abstract: We present an implementation of NeTra, a prototype image retrieval system that uses color, texture, shape and spatial location information in segmented image regions to search and retrieve similar regions from the database. A distinguishing aspect of this system is its incorporation of a robust automated image segmentation algorithm that allows object- or region-based search. Image segmentation significantly improves the quality of image retrieval when images contain multiple complex objects. Other important components of the system include an efficient color representation, and indexing of color, texture, and shape features for fast search and retrieval. This representation allows the user to compose interesting queries such as "retrieve all images that contain regions that have the color of object A, texture of object B, shape of object C, and lie in the upper one-third of the image", where the individual objects could be regions belonging to different images.

884 citations


Proceedings ArticleDOI
17 Jun 1997
TL;DR: A general technique for the recovery of significant image features is presented, based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients.
Abstract: A general technique for the recovery of significant image features is presented. The technique is based on the mean shift algorithm, a simple nonparametric procedure for estimating density gradients. Drawbacks of the current methods (including robust clustering) are avoided. Feature space of any nature can be processed, and as an example, color image segmentation is discussed. The segmentation is completely autonomous, only its class is chosen by the user. Thus, the same program can produce a high quality edge image, or provide, by extracting all the significant colors, a preprocessor for content-based query systems. A 512/spl times/512 color image is analyzed in less than 10 seconds on a standard workstation. Gray level images are handled as color images having only the lightness coordinate.

790 citations
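A minimal sketch of the mean shift iteration underlying this approach: each feature-space point is repeatedly moved to the mean of its neighbors until it settles near a density mode. The flat kernel, fixed radius, and toy color features are illustrative, not the paper's exact procedure.

```python
import numpy as np

def mean_shift(points, radius=0.15, iters=30):
    """Flat-kernel mean shift: move every feature-space point to the mean of its
    neighbors until it settles near a mode of the underlying density."""
    modes = points.astype(float).copy()
    for _ in range(iters):
        for i, p in enumerate(modes):
            neighbors = points[np.linalg.norm(points - p, axis=1) < radius]
            if len(neighbors):
                modes[i] = neighbors.mean(axis=0)
    return modes

# usage: pixels drawn from two dominant "colors" collapse onto two modes
rng = np.random.default_rng(1)
pix = np.vstack([rng.normal([0.2, 0.3, 0.4], 0.02, (50, 3)),
                 rng.normal([0.8, 0.7, 0.1], 0.02, (50, 3))])
print(np.unique(np.round(mean_shift(pix), 1), axis=0))
```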


Journal ArticleDOI
TL;DR: This work employs the new geometric active contour models, previously formulated, for edge detection and segmentation of magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound medical imagery, and leads to a novel snake paradigm in which the feature of interest may be considered to lie at the bottom of a potential well.
Abstract: We employ the new geometric active contour models, previously formulated, for edge detection and segmentation of magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound medical imagery. Our method is based on defining feature-based metrics on a given image which in turn leads to a novel snake paradigm in which the feature of interest may be considered to lie at the bottom of a potential well. Thus, the snake is attracted very quickly and efficiently to the desired feature.

676 citations
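The "potential well" idea described above is usually written as a level-set curve evolution driven by a feature-based stopping function. The following is a representative geodesic-active-contour form with generic notation; the stopping function g and the inflation constant nu are common choices, not equations quoted from the paper.

```latex
% Representative feature-based (geodesic) active-contour evolution in
% level-set form; g and \nu are generic choices, not the paper's exact terms.
\[
\frac{\partial \psi}{\partial t}
  \;=\; g(|\nabla I|)\,|\nabla \psi|
        \left( \operatorname{div}\!\frac{\nabla \psi}{|\nabla \psi|} + \nu \right)
  \;+\; \nabla g \cdot \nabla \psi,
\qquad
g(|\nabla I|) \;=\; \frac{1}{1 + |\nabla (G_\sigma * I)|^{2}} .
\]
```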


Journal ArticleDOI
TL;DR: The proposed scheme for segmentation is based on the iterative conditional modes (ICM) algorithm, in which measurement model parameters are estimated using local information at each site, and the prior model parameters are estimated using the segmentation after each cycle of iterations.
Abstract: A statistical model is presented that represents the distributions of major tissue classes in single-channel magnetic resonance (MR) cerebral images. Using the model, cerebral images are segmented into gray matter, white matter, and cerebrospinal fluid (CSF). The model accounts for random noise, magnetic field inhomogeneities, and biological variations of the tissues. Intensity measurements are modeled by a finite Gaussian mixture. Smoothness and piecewise contiguous nature of the tissue regions are modeled by a three-dimensional (3-D) Markov random field (MRF). A segmentation algorithm, based on the statistical model, approximately finds the maximum a posteriori (MAP) estimation of the segmentation and estimates the model parameters from the image data. The proposed scheme for segmentation is based on the iterative conditional modes (ICM) algorithm in which measurement model parameters are estimated using local information at each site, and the prior model parameters are estimated using the segmentation after each cycle of iterations. Application of the algorithm to a sample of clinical MR brain scans, comparisons of the algorithm with other statistical methods, and a validation study with a phantom are presented. The algorithm constitutes a significant step toward a complete data driven unsupervised approach to segmentation of MR images in the presence of the random noise and intensity inhomogeneities.

659 citations
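A minimal sketch of an ICM sweep of the kind the abstract describes, combining a Gaussian intensity likelihood with a Potts-style MRF smoothness prior. The parameters, 4-neighborhood, and initialization are illustrative, and the sketch omits the inhomogeneity modeling and per-cycle parameter re-estimation that the paper adds.

```python
import numpy as np

def icm_pass(image, labels, means, variances, beta=1.5):
    """One ICM sweep: each pixel takes the label minimizing
    -log Gaussian likelihood + beta * (number of disagreeing 4-neighbors)."""
    h, w = image.shape
    new = labels.copy()
    for y in range(h):
        for x in range(w):
            best, best_cost = new[y, x], np.inf
            for k in range(len(means)):
                data = 0.5 * np.log(2 * np.pi * variances[k]) \
                     + (image[y, x] - means[k]) ** 2 / (2 * variances[k])
                prior = sum(1 for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                            if 0 <= y + dy < h and 0 <= x + dx < w
                            and labels[y + dy, x + dx] != k)
                cost = data + beta * prior
                if cost < best_cost:
                    best, best_cost = k, cost
            new[y, x] = best
    return new

# usage: noisy two-class image, initialized by simple thresholding
rng = np.random.default_rng(2)
img = np.where(np.arange(32)[:, None] < 16, 50.0, 150.0) + rng.normal(0, 30, (32, 32))
labels = (img > 100).astype(int)
for _ in range(5):
    labels = icm_pass(img, labels, means=[50.0, 150.0], variances=[900.0, 900.0])
print(labels.sum())
```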


Journal ArticleDOI
TL;DR: A methodology for evaluating medical image segmentation algorithms wherein the only information available is boundaries outlined by multiple expert observers is proposed, and the results of the segmentation algorithm can be evaluated against the multiple observers' outlines.
Abstract: Image segmentation is the partition of an image into a set of nonoverlapping regions whose union is the entire image. The image is decomposed into meaningful parts which are uniform with respect to certain characteristics, such as gray level or texture. In this paper, we propose a methodology for evaluating medical image segmentation algorithms wherein the only information available is boundaries outlined by multiple expert observers. In this case, the results of the segmentation algorithm can be evaluated against the multiple observers' outlines. We have derived statistics to enable us to find whether the computer-generated boundaries agree with the observers' hand-outlined boundaries as much as the different observers agree with each other. We illustrate the use of this methodology by evaluating image segmentation algorithms on two different applications in ultrasound imaging. In the first application, we attempt to find the epicardial and endocardial boundaries from cardiac ultrasound images, and in the second application, our goal is to find the fetal skull and abdomen boundaries from prenatal ultrasound images.

572 citations


Proceedings ArticleDOI
17 Jun 1997
TL;DR: A new view-based approach to the representation and recognition of action is presented, using a temporal template-a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence.
Abstract: A new view-based approach to the representation and recognition of action is presented. The basis of the representation is a temporal template-a static vector-image where the vector value at each point is a function of the motion properties at the corresponding spatial location in an image sequence. Using 18 aerobics exercises as a test domain, we explore the representational power of a simple, two component version of the templates: the first value is a binary value indicating the presence of motion, and the second value is a function of the recency of motion in a sequence. We then develop a recognition method which matches these temporal templates against stored instances of views of known actions. The method automatically performs temporal segmentation, is invariant to linear changes in speed, and runs in real-time on a standard platform. We recently incorporated this technique into the KIDSROOM: an interactive, narrative play-space for children.

546 citations
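The two-component template described above corresponds to a binary motion-energy image and a motion-history image whose values encode the recency of motion. A minimal sketch follows, with the frame-difference threshold and decay constant chosen purely for illustration.

```python
import numpy as np

def update_templates(prev_frame, frame, mhi, tau=10, diff_thresh=15):
    """Update the motion-history image (MHI): moving pixels are set to tau,
    all others decay by one.  The binary motion-energy image (MEI) is MHI > 0.
    The threshold and tau are illustrative parameters."""
    moving = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
    mei = mhi > 0
    return mei, mhi

# usage: a bright blob sliding one pixel per frame
frames = [np.zeros((16, 16), np.uint8) for _ in range(5)]
for t, f in enumerate(frames):
    f[6:10, 2 + t:6 + t] = 200
mhi = np.zeros((16, 16), int)
for prev, cur in zip(frames, frames[1:]):
    mei, mhi = update_templates(prev, cur, mhi)
print(int(mei.sum()), int(mhi.max()))
```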


Book ChapterDOI
19 Mar 1997
TL;DR: A line-enhancement filter based on the eigenvalues of the Hessian matrix is developed, aiming at both the discrimination of line structures from other structures and the recovery of original line structures from corrupted ones.
Abstract: This paper describes a method for the enhancement of curvilinear structures like vessels and bronchi in 3D medical images. We develop a line-enhancement filter based on the eigenvalues of the Hessian matrix, aiming at both the discrimination of line structures from other structures and the recovery of original line structures from corrupted ones. The multi-scale responses of the line filters are integrated based on the equalization of the noise level at each scale. The resulting multi-scale line-filtered images provide significantly improved segmentation of curvilinear structures. The line-filtered images are also useful for the direct visualization of curvilinear structures, by combining them with a volume rendering technique, even from conventional MR images. We show the usefulness of the method through the segmentation and visualization of vessels from MRA and MR images, and bronchi from CT images.
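A 2-D analogue of the Hessian-eigenvalue line measure (the paper works on 3-D vessels and bronchi and integrates several scales). The single Gaussian scale and the simple response formula below are illustrative choices in the same spirit, not the paper's filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def line_measure_2d(image, sigma=2.0):
    """Bright-line measure from the eigenvalues of the scale-normalized Hessian:
    strong negative curvature across the line (lambda2 << 0) and weak curvature
    along it (lambda1 ~ 0)."""
    img = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(img)
    gyy, gyx = np.gradient(gy)          # second derivatives
    gxy, gxx = np.gradient(gx)
    hxx, hxy, hyy = sigma**2 * gxx, sigma**2 * gxy, sigma**2 * gyy
    # eigenvalues of [[hxx, hxy], [hxy, hyy]] at every pixel
    tmp = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
    lam2 = (hxx + hyy) / 2 - tmp        # most negative on bright lines
    return np.where(lam2 < 0, -lam2, 0.0)

# usage: a bright horizontal line on a dark background
img = np.zeros((64, 64))
img[32, 8:56] = 1.0
r = line_measure_2d(img)
print(int(np.unravel_index(r.argmax(), r.shape)[0]))   # response peaks on row 32
```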

Journal ArticleDOI
TL;DR: In this article, a fully-automatic 3D-segmentation technique for brain magnetic resonance (MR) images is described, and the impact of noise, inhomogeneity, smoothing, and structure thickness is analyzed quantitatively.
Abstract: Describes a fully-automatic three-dimensional (3-D) segmentation technique for brain magnetic resonance (MR) images. By means of Markov random fields (MRFs) the segmentation algorithm captures three features that are of special importance for MR images, i.e., nonparametric distributions of tissue intensities, neighborhood correlations, and signal inhomogeneities. Detailed simulations and real MR images demonstrate the performance of the segmentation algorithm. In particular, the impact of noise, inhomogeneity, smoothing, and structure thickness is analyzed quantitatively. Even single-echo MR images are well classified into gray matter, white matter, cerebrospinal fluid, scalp-bone, and background. A simulated annealing and an iterated conditional modes implementation are presented.

Journal ArticleDOI
TL;DR: This paper presents a general technique for thresholding digital images based on Renyi's entropy, which includes two well-known, previously proposed global thresholding methods.
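Only the TL;DR is shown for this entry, but Renyi-entropy thresholding is commonly posed as choosing the gray level that maximizes the sum of the Renyi entropies of the below- and above-threshold distributions. A minimal sketch of that textbook formulation follows; the entropy order alpha and the exact criterion may differ in detail from the paper's.

```python
import numpy as np

def renyi_threshold(gray, alpha=0.5, levels=256):
    """Pick the threshold t maximizing H_alpha(background) + H_alpha(foreground),
    where H_alpha(p) = log(sum p_i^alpha) / (1 - alpha)."""
    hist, _ = np.histogram(gray, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    best_t, best_score = 0, -np.inf
    for t in range(1, levels - 1):
        p0, p1 = p[:t].sum(), p[t:].sum()
        if p0 <= 0 or p1 <= 0:
            continue
        h0 = np.log(np.sum((p[:t] / p0) ** alpha)) / (1 - alpha)
        h1 = np.log(np.sum((p[t:] / p1) ** alpha)) / (1 - alpha)
        if h0 + h1 > best_score:
            best_t, best_score = t, h0 + h1
    return best_t

# usage: bimodal synthetic image; the threshold lands between the two modes
rng = np.random.default_rng(3)
img = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)]).clip(0, 255)
print(renyi_threshold(img))
```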

Journal ArticleDOI
TL;DR: A color segmentation algorithm is presented which combines region growing and region merging processes to generate a non-partitioned segmentation of the image being processed into spatially disconnected but colorimetrically similar regions.

Journal ArticleDOI
TL;DR: The segmentation procedure has been found to be very robust, producing good results not only on granite images, but on a wide range of other noisy color images as well, subject to the termination criterion.
Abstract: A new method is proposed for processing randomly textured color images. The method is based on a bottom-up segmentation algorithm that takes into consideration both color and texture properties of the image. An LUV gradient is introduced, which provides both a color similarity measure and a basis for applying the watershed transform. The patches of the watershed mosaic are merged according to their color contrast until a termination criterion is met. This criterion is based on the topology of the typical processed image. The resulting algorithm does not require any additional information, be it various thresholds, marker extraction rules, or suchlike, thus being suitable for automatic processing of color images. The algorithm is demonstrated within the framework of the problem of automatic granite inspection. The segmentation procedure has been found to be very robust, producing good results not only on granite images, but on a wide range of other noisy color images as well, subject to the termination criterion.

Journal ArticleDOI
TL;DR: It is argued that LEGION provides a novel and effective framework for image segmentation and figure-ground segregation and exhibits a natural capacity in segmenting images.
Abstract: We study image segmentation on the basis of locally excitatory, globally inhibitory oscillator networks (LEGION), whereby the phases of oscillators encode the binding of pixels. We introduce a lateral potential for each oscillator so that only oscillators with strong connections from their neighborhood can develop high potentials. Based on the concept of the lateral potential, a solution to remove noisy regions in an image is proposed for LEGION, so that it suppresses the oscillators corresponding to noisy regions but without affecting those corresponding to major regions. We show that the resulting oscillator network separates an image into several major regions, plus a background consisting of all noisy regions, and we illustrate network properties by computer simulation. The network exhibits a natural capacity in segmenting images. The oscillatory dynamics leads to a computer algorithm, which is applied successfully to segmenting real gray-level images. A number of issues regarding biological plausibility and perceptual organization are discussed. We argue that LEGION provides a novel and effective framework for image segmentation and figure-ground segregation.

Journal ArticleDOI
TL;DR: The authors show that replacing the class "other", which includes all tissue not modeled explicitly by Gaussians with small variance, by a uniform probability density, and amending the expectation-maximization (EM) algorithm appropriately, gives significantly better results.
Abstract: The authors propose a modification of the technique of Wells et al. (ibid., vol. 15, no. 4, p. 429-42, 1996) for bias field estimation and segmentation of magnetic resonance (MR) images. They show that replacing the class "other", which includes all tissue not modeled explicitly by Gaussians with small variance, by a uniform probability density, and amending the expectation-maximization (EM) algorithm appropriately, gives significantly better results. The authors next consider the estimation and filtering of high-frequency information in MR images, comprising noise, intertissue boundaries, and within-tissue microstructures. The authors conclude that post-filtering is preferable to the prefiltering that has been proposed previously. The authors observe that the performance of any segmentation algorithm, in particular that of Wells et al. (and the authors' refinements of it), is affected substantially by the number and selection of the tissue classes that are modeled explicitly, the corresponding defining parameters and, critically, the spatial distribution of tissues in the image. The authors present an initial exploration to choose automatically the number of classes and the associated parameters that give the best output. This requires the authors to define what is meant by "best output", and for this they propose the application of minimum entropy. The methods developed have been implemented and are illustrated throughout on simulated and real data (brain and breast MR).

Journal ArticleDOI
TL;DR: A highly efficient system that can rapidly detect human face regions in MPEG video sequences by detecting faces directly in the compressed domain; there is no need to carry out the inverse DCT transform, so the algorithm can run faster than real time.
Abstract: Human faces provide a useful cue in indexing video content. We present a highly efficient system that can rapidly detect human face regions in MPEG video sequences. The underlying algorithm takes the inverse-quantized discrete cosine transform (DCT) coefficients of MPEG video as the input, and outputs the locations of the detected face regions. The algorithm consists of three stages, where chrominance, shape, and frequency information are used, respectively. By detecting faces directly in the compressed domain, there is no need to carry out the inverse DCT transform, so the algorithm can run faster than real time. In our experiments, the algorithm detected 85-92% of the faces in three test sets, including both intraframe and interframe coded image frames from news video. The average run time ranges from 13-33 ms per frame. The algorithm can be applied to unconstrained JPEG images or motion JPEG video as well.
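A minimal sketch of the chrominance stage of such a compressed-domain detector: mark 8x8 blocks whose DC chrominance coefficients fall inside a skin-tone box in (Cb, Cr) space. The box below is a commonly used range, not the thresholds from the paper, and the shape and frequency stages are omitted.

```python
import numpy as np

def skin_blocks(dc_cb, dc_cr, cb_range=(77, 127), cr_range=(133, 173)):
    """Mark 8x8 blocks whose chrominance DC coefficients fall inside a skin-tone
    box in (Cb, Cr) space; the box is a common heuristic, not the paper's values."""
    return ((dc_cb >= cb_range[0]) & (dc_cb <= cb_range[1]) &
            (dc_cr >= cr_range[0]) & (dc_cr <= cr_range[1]))

# usage: a 6x6 grid of blocks with a 2x2 "face" patch of skin-like chroma
cb = np.full((6, 6), 100)
cr = np.full((6, 6), 120)
cb[2:4, 2:4], cr[2:4, 2:4] = 105, 150
print(skin_blocks(cb, cr).astype(int))
```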

Journal ArticleDOI
TL;DR: This fully automated technique produces reliable and reproducible MR image segmentation and classification while eliminating intra- and interobserver variability.
Abstract: Presents a fully automated process for segmentation and classification of multispectral magnetic resonance (MR) images. This hybrid neural network method uses a Kohonen self-organizing neural network for segmentation and a multilayer backpropagation neural network for classification. To separate different tissue types, this process uses the standard T1-, T2-, and PD-weighted MR images acquired in clinical examinations. Volumetric measurements of brain structures, relative to intracranial volume, were calculated for an index transverse section in 14 normal subjects (median age 25 years; 7 male, 7 female). This index slice was at the level of the basal ganglia, included both genu and splenium of the corpus callosum, and generally showed the putamen and lateral ventricle. An intraclass correlation of this automated segmentation and classification of tissues with the accepted standard of radiologist identification for the index slice in the 14 volunteers demonstrated coefficients (r_i) of 0.91, 0.95, and 0.98 for white matter, gray matter, and ventricular cerebrospinal fluid (CSF), respectively. An analysis of variance for estimates of brain parenchyma volumes in 5 volunteers imaged 5 times each demonstrated high intrasubject reproducibility with a significance of at least p<0.05 for white matter, gray matter, and white/gray partial volumes. The population variation, across 14 volunteers, demonstrated little deviation from the averages for gray and white matter, while partial volume classes exhibited a slightly higher degree of variability. This fully automated technique produces reliable and reproducible MR image segmentation and classification while eliminating intra- and interobserver variability.

Journal ArticleDOI
TL;DR: Presents an automated, knowledge-based method for segmenting chest computed tomography datasets and suggests that use of expert knowledge provides an increased level of automation compared with low-level segmentation techniques and may better discriminate between structures of similar attenuation and anatomic contiguity.
Abstract: Presents an automated, knowledge-based method for segmenting chest computed tomography (CT) datasets. Anatomical knowledge including expected volume, shape, relative position, and X-ray attenuation of organs provides feature constraints that guide the segmentation process. Knowledge is represented at a high level using an explicit anatomical model. The model is stored in a frame-based semantic network and anatomical variability is incorporated using fuzzy sets. A blackboard architecture permits the data representation and processing algorithms in the model domain to be independent of those in the image domain. Knowledge-constrained segmentation routines extract contiguous three-dimensional (3-D) sets of voxels, and their feature-space representations are posted on the blackboard. An inference engine uses fuzzy logic to match image to model objects based on the feature constraints. Strict separation of model and image domains allows for systematic extension of the knowledge base. In preliminary experiments, the method has been applied to a small number of thoracic CT datasets. Based on subjective visual assessment by experienced thoracic radiologists, basic anatomic structures such as the lungs, central tracheobronchial tree, chest wall, and mediastinum were successfully segmented. To demonstrate the extensibility of the system, knowledge was added to represent the more complex anatomy of lung lesions in contact with vessels or the chest wall. Visual inspection of these segmented lesions was also favorable. These preliminary results suggest that use of expert knowledge provides an increased level of automation compared with low-level segmentation techniques. Moreover, the knowledge-based approach may better discriminate between structures of similar attenuation and anatomic contiguity. Further validation is required.

Proceedings ArticleDOI
20 Jun 1997
TL;DR: A new image representation is presented which provides a transformation from the raw pixel data to a small set of localized coherent regions in color and texture space based on segmentation using the expectation-maximization algorithm on combined color andtexture features.
Abstract: Retrieving images from large and varied collections using image content as a key is a challenging and important problem. In this paper, we present a new image representation which provides a transformation from the raw pixel data to a small set of localized coherent regions in color and texture space. This so-called "blobworld" representation is based on segmentation using the expectation-maximization algorithm on combined color and texture features. The texture features we use for the segmentation arise from a new approach to texture description and scale selection. We describe a system that uses the blobworld representation to retrieve images. An important and unique aspect of the system is that, in the context of similarity-based querying, the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, the outcome of many queries on these systems can be quite inexplicable, despite the availability of knobs for adjusting the similarity metric.
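A minimal sketch of EM-based grouping of per-pixel features into a few coherent regions, as the abstract describes, using scikit-learn's Gaussian mixture. The color-plus-position features and the fixed number of components are illustrative; the blobworld system also uses texture and scale features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def blob_regions(image_rgb, n_regions=4, random_state=0):
    """Group pixels into a few coherent regions by fitting a Gaussian mixture (EM)
    to per-pixel features: here, normalized color plus (x, y) position."""
    h, w, _ = image_rgb.shape
    yy, xx = np.mgrid[0:h, 0:w]
    feats = np.column_stack([image_rgb.reshape(-1, 3) / 255.0,
                             xx.ravel() / w, yy.ravel() / h])
    gmm = GaussianMixture(n_components=n_regions, covariance_type='full',
                          random_state=random_state).fit(feats)
    return gmm.predict(feats).reshape(h, w)

# usage: synthetic image with two colored halves -> two region labels
img = np.zeros((40, 40, 3), np.uint8)
img[:, :20] = (200, 50, 50)
img[:, 20:] = (50, 50, 200)
labels = blob_regions(img, n_regions=2)
print(np.unique(labels[:, :20]), np.unique(labels[:, 20:]))
```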

Proceedings ArticleDOI
17 Jun 1997
TL;DR: This paper proposes and examines non-parametric statistical tests to define similarity and homogeneity measures for textures and demonstrates that these similarity measures are useful both for texture-based image retrieval and for unsupervised texture segmentation, and hence offer a unified approach to these closely related tasks.
Abstract: In this paper we propose and examine non-parametric statistical tests to define similarity and homogeneity measures for textures. The statistical tests are applied to the coefficients of images filtered by a multi-scale Gabor filter bank. We demonstrate that these similarity measures are useful both for texture-based image retrieval and for unsupervised texture segmentation, and hence offer a unified approach to these closely related tasks. We present results on Brodatz-like micro-textures and a collection of real-world images.

Book
01 Dec 1997
TL;DR: An introduction to computer vision and image processing, to the CVIPtools software and its library functions, and to programming with CVIPtools.
Abstract: I. COMPUTER VISION AND IMAGE PROCESSING FUNDAMENTALS. 1. Introduction to Computer Vision and Image Processing. Overview: Computer Imaging. Computer Vision. Image Processing. Computer Imaging Systems. The CVIPtools Software. Human Visual Perception. Image Representation. Digital Image File Formats. References. 2. Image Analysis. Introduction. Preprocessing. Edge/Line Detection. Segmentation. Discrete Transforms. Feature Extraction and Analysis. References. 3. Image Restoration. Introduction. Noise. Noise Removal Using Spatial Filters. Frequency Domain Filters. Geometric Transforms. References. 4. Image Enhancement. Introduction. Gray-Scale Modification. Image Sharpening. Image Smoothing. References. 5. Image Compression. Introduction. Lossless Compression Methods. Lossy Compression Methods. References. II. CVIPtools. 6. Using CVIPtools. Introduction and Overview. The Graphical User Interface. Examples. 7. CVIPtools Applications. Introduction. Automatic Skin Tumor Border Identification. Helicopter Image Enhancement and Analysis. Wavelet/Vector Quantization Compression. Image Segmentation Using a Deformable Template Algorithm. Visual Acuity/Night Vision Simulation. 8. Programming with CVIPtools. Introduction to CVIPlab. CVIP Laboratory Exercises. The CVIPtcl and CVIPwish Shells. 9. CVIPtools Library Functions. Introduction. Arithmetic and Logic Library_libarithlogic. Band Image Library_libband. Color Image Library_libcolor. Compression Library_libcompress. Conversion Library_libconverter. Display Library_libdisplay. Feature Extraction Library_libfeature. Geometry Library_libgeometry. Histogram Library_libhisto. Image Library_libimage. Data Mapping Library_libmap. Morphological Library_libmorph. Noise Library_libnoise. Segmentation Library_libsegment. Spatial Filter Library_libspatialfilter. Transform Library_libtransform. III. APPENDICES. A. The CVIPtools CD-ROM. B. Setting Up and Updating Your CVIPtools Environment. Getting CVIPtools software updates. To get via the WWW. C. CVIPtools Functions. Toolkit Libraries. Toolbox Libraries. D. CVIPtcl Command List and Corresponding CVIPtools Functions. E. CVIPtcl Function Usage Notes. F. CVIP Resources. Index.

Journal ArticleDOI
TL;DR: Experimental results prove that the approach using the variable duration outperforms the method using fixed duration in terms of both accuracy and speed.
Abstract: A fast method of handwritten word recognition suitable for real-time applications is presented in this paper. Preprocessing, segmentation and feature extraction are implemented using a chain code representation of the word contour. Dynamic matching between characters of a lexicon entry and segment(s) of the input word image is used to rank the lexicon entries in order of best match. A variable duration for each character is defined and used during the matching. Experimental results prove that our approach using the variable duration outperforms the method using a fixed duration in terms of both accuracy and speed. The entire recognition process takes about 200 msec on a single SPARC-10 platform, and a recognition accuracy of 96.8 percent is achieved for a lexicon size of 10, on a database of postal words captured at 212 dpi.

Journal ArticleDOI
TL;DR: An iterative algorithm which allows the probabilities of transitions to be estimated directly from the images under investigation, which may provide remarkably better detection accuracy than the "Post Classification Comparison" algorithm, which is based on the separate classifications of the two images.
Abstract: The authors propose a supervised nonparametric technique based on the "compound classification rule" for minimum error, to detect land-cover transitions between two remote-sensing images acquired at different times. Thanks to a simplifying hypothesis, the compound classification rule is transformed into a form easier to compute. In the obtained rule, an important role is played by the probabilities of transitions, which take into account the temporal dependence between two images. In order to avoid requiring that training sets be representative of all possible types of transitions, the authors propose an iterative algorithm which allows the probabilities of transitions to be estimated directly from the images under investigation. Experimental results on two Thematic Mapper images confirm that the proposed algorithm may provide remarkably better detection accuracy than the "Post Classification Comparison" algorithm, which is based on the separate classifications of the two images.

Proceedings ArticleDOI
17 Jun 1997
TL;DR: A novel boundary detection scheme based on "edge flow" that utilizes a predictive coding model to identify the direction of change in color and texture at each image location at a given scale, and constructs an edge flow vector.
Abstract: A novel boundary detection scheme based on "edge flow" is proposed in this paper. This scheme utilizes a predictive coding model to identify the direction of change in color and texture at each image location at a given scale, and constructs an edge flow vector. By iteratively propagating the edge flow, the boundaries can be detected at image locations which encounter two opposite directions of flow in the stable state. A user defined image scale is the only significant control parameter that is needed by the algorithm. The scheme facilitates integration of color and texture into a single framework for boundary detection.

Proceedings ArticleDOI
18 Aug 1997
TL;DR: A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture, using document characteristics to determine (surface) attributes, often used in document segmentation.
Abstract: A new method is presented for adaptive document image binarization, where the page is considered as a collection of subcomponents such as text, background and picture. The problems caused by noise, illumination and many source type related degradations are addressed. The algorithm uses document characteristics to determine (surface) attributes, often used in document segmentation. Using characteristic analysis, two new algorithms are applied to determine a local threshold for each pixel. An algorithm based on soft decision control is used for thresholding the background and picture regions. An approach utilizing local mean and variance of gray values is applied to textual regions. Tests were performed with images including different types of document components and degradations. The results show that the method adapts and performs well in each case.
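The "local mean and variance" rule for textual regions is commonly written as the threshold T = m * (1 + k * (s / R - 1)). A minimal sketch of that rule follows; the window size, k, and R are typical values rather than settings quoted from the paper, and the soft-decision path for background and picture regions is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold(gray, window=25, k=0.2, R=128.0):
    """Per-pixel threshold T = m * (1 + k * (s / R - 1)) from the local mean m and
    standard deviation s of gray values; pixels below T are classified as ink."""
    g = gray.astype(float)
    mean = uniform_filter(g, window)
    sq_mean = uniform_filter(g * g, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    T = mean * (1.0 + k * (std / R - 1.0))
    return g < T     # True = text (dark) pixels

# usage: a dark text stroke on an unevenly lit background
x = np.linspace(120, 220, 64)
page = np.tile(x, (64, 1))
page[30:34, 10:54] = 40          # the "stroke"
binary = local_threshold(page)
print(int(binary.sum()))          # roughly the stroke area
```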

Journal ArticleDOI
TL;DR: The experiments with synthetic, Brodatz texture, and real satellite images show that the multiresolution technique results in a better segmentation and requires lesser computation than the single resolution algorithm.
Abstract: This paper presents multiresolution models for Gauss-Markov random fields (GMRFs) with applications to texture segmentation. Coarser resolution sample fields are obtained by subsampling the sample field at fine resolution. Although the Markov property is lost under such resolution transformation, coarse resolution non-Markov random fields can be effectively approximated by Markov fields. We present two techniques to estimate the GMRF parameters at coarser resolutions from the fine resolution parameters, one by minimizing the Kullback-Leibler distance and another based on local conditional distribution invariance. We also allude to the fact that different GMRF parameters at the fine resolution can result in the same probability measure after subsampling and present the results for first- and second-order cases. We apply this multiresolution model to texture segmentation. Different texture regions in an image are modeled by GMRFs and the associated parameters are assumed to be known. Parameters at lower resolutions are estimated from the fine resolution parameters. The coarsest resolution data is first segmented and the segmentation results are propagated upward to the finer resolution. We use the iterated conditional mode (ICM) minimization at all resolutions. Our experiments with synthetic, Brodatz texture, and real satellite images show that the multiresolution technique results in a better segmentation and requires lesser computation than the single resolution algorithm.

Proceedings ArticleDOI
17 Jun 1997
TL;DR: This work presents a variant of the EM algorithm that can segment image sequences by fitting multiple smooth flow fields to the spatiotemporal data and shows how the estimation of a single smooth flow field can be performed in closed form, thus making the multiple model estimation computationally feasible.
Abstract: Grouping based on common motion, or "common fate" provides a powerful cue for segmenting image sequences. Recently a number of algorithms have been developed that successfully perform motion segmentation by assuming that the motion of each group can be described by a low dimensional parametric model (e.g. affine). Typically the assumption is that motion segments correspond to planar patches in 3D undergoing rigid motion. Here we develop an alternative approach, where the motion of each group is described by a smooth dense flow field and the stability of the estimation is ensured by means of a prior distribution on the class of flow fields. We present a variant of the EM algorithm that can segment image sequences by fitting multiple smooth flow fields to the spatiotemporal data. Using the method of Green's functions, we show how the estimation of a single smooth flow field can be performed in closed form, thus making the multiple model estimation computationally feasible. Furthermore, the number of models is estimated automatically using similar methods to those used in the parametric approach. We illustrate the algorithm's performance on synthetic and real image sequences.

Journal ArticleDOI
TL;DR: It is argued that the issues of scale selection and structure detection cannot be treated separately and a new concept of scale is presented that represents image structures at different scales, and not the image itself.
Abstract: This paper is concerned with the detection of low-level structure in images. It describes an algorithm for image segmentation at multiple scales. The detected regions are homogeneous and surrounded by closed edge contours. Previous approaches to multiscale segmentation represent an image at different scales using a scale-space. However, structure is only represented implicitly in this representation, structures at coarser scales are inherently smoothed, and the problem of structure extraction is unaddressed. This paper argues that the issues of scale selection and structure detection cannot be treated separately. A new concept of scale is presented that represents image structures at different scales, and not the image itself. This scale is integrated into a nonlinear transform which makes structure explicit in the transformed domain. Structures that are stable (locally invariant) to changes in scale are identified as being perceptually relevant. The transform can be viewed as collecting spatially distributed evidence for edges and regions, and making it available at contour locations, thereby facilitating integrated detection of edges and regions without restrictive models of geometry or homogeneity. In this sense, it performs Gestalt analysis. All scale parameters of the transform are automatically determined, and the structure of any arbitrary geometry can be identified without any smoothing, even at coarse scales.