
Showing papers on "Contextual image classification published in 2004"


Journal ArticleDOI
TL;DR: This paper addresses the classification of hyperspectral remote sensing images by support vector machines, assessing the potential of SVM classifiers in hyperdimensional feature spaces and comparing four multiclass strategies, and concludes that SVMs are a valid and effective alternative to conventional pattern recognition approaches.
Abstract: This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines (SVMs). First, we propose a theoretical discussion and experimental analysis aimed at understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces. Then, we assess the effectiveness of SVMs with respect to conventional feature-reduction-based approaches and their performances in hypersubspaces of various dimensionalities. To sustain such an analysis, the performances of SVMs are compared with those of two other nonparametric classifiers (i.e., radial basis function neural networks and the K-nearest neighbor classifier). Finally, we study the potentially critical issue of applying binary SVMs to multiclass problems in hyperspectral data. In particular, four different multiclass strategies are analyzed and compared: the one-against-all, the one-against-one, and two hierarchical tree-based strategies. Different performance indicators have been used to support our experimental studies in a detailed and accurate way, i.e., the classification accuracy, the computational time, the stability to parameter setting, and the complexity of the multiclass architecture. The results obtained on a real Airborne Visible/Infrared Imaging Spectroradiometer hyperspectral dataset allow us to conclude that, whatever the multiclass strategy adopted, SVMs are a valid and effective alternative to conventional pattern recognition approaches (feature-reduction procedures combined with a classification method) for the classification of hyperspectral remote sensing data.

3,607 citations
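The one-against-all and one-against-one multiclass strategies analyzed in this paper can be illustrated independently of the underlying binary machine. A minimal sketch, using a closed-form ridge least-squares scorer as a stand-in for a binary SVM and a synthetic three-class dataset (all names and numbers here are illustrative, not the paper's setup):

```python
import numpy as np
from itertools import combinations

def fit_binary(X, y):
    # Closed-form ridge least-squares scorer, standing in for a binary SVM
    # (any binary decision function works for illustrating the strategies).
    Xb = np.hstack([X, np.ones((len(X), 1))])            # append a bias column
    return np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(Xb.shape[1]), Xb.T @ y)

def decision(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def predict_one_vs_all(X_tr, y_tr, X_te, classes):
    # One binary problem per class (class c versus the rest); largest score wins.
    W = [fit_binary(X_tr, np.where(y_tr == c, 1.0, -1.0)) for c in classes]
    scores = np.column_stack([decision(w, X_te) for w in W])
    return np.array(classes)[scores.argmax(axis=1)]

def predict_one_vs_one(X_tr, y_tr, X_te, classes):
    # One binary problem per pair of classes; each casts a vote, majority wins.
    votes = np.zeros((len(X_te), len(classes)))
    for i, j in combinations(range(len(classes)), 2):
        mask = np.isin(y_tr, [classes[i], classes[j]])
        w = fit_binary(X_tr[mask], np.where(y_tr[mask] == classes[i], 1.0, -1.0))
        d = decision(w, X_te)
        votes[d > 0, i] += 1
        votes[d <= 0, j] += 1
    return np.array(classes)[votes.argmax(axis=1)]

# Three well-separated Gaussian clusters as a stand-in "spectral" dataset.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(30, 2)) for m in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 30)
acc_ova = (predict_one_vs_all(X, y, X, [0, 1, 2]) == y).mean()
acc_ovo = (predict_one_vs_one(X, y, X, [0, 1, 2]) == y).mean()
```

With real hyperspectral data the binary scorer would be a kernel SVM; the strategies themselves are unchanged.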


Journal ArticleDOI
TL;DR: Support Vector Tracking integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker and maximizes the SVM classification score to account for large motions between successive frames.
Abstract: Support Vector Tracking (SVT) integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker. Instead of minimizing an intensity difference function between successive frames, SVT maximizes the SVM classification score. To account for large motions between successive frames, we build pyramids from the support vectors and use a coarse-to-fine approach in the classification stage. We show results of using SVT for vehicle tracking in image sequences.

1,131 citations


Journal ArticleDOI
TL;DR: Quantitative evaluation and comparison show that the proposed Bayesian framework for foreground object detection in complex environments provides much improved results.
Abstract: This paper addresses the problem of background modeling for foreground object detection in complex environments. A Bayesian framework that incorporates spectral, spatial, and temporal features to characterize the background appearance is proposed. Under this framework, the background is represented by the most significant and frequent features, i.e., the principal features , at each pixel. A Bayes decision rule is derived for background and foreground classification based on the statistics of principal features. Principal feature representation for both the static and dynamic background pixels is investigated. A novel learning method is proposed to adapt to both gradual and sudden "once-off" background changes. The convergence of the learning process is analyzed and a formula to select a proper learning rate is derived. Under the proposed framework, a novel algorithm for detecting foreground objects from complex environments is then established. It consists of change detection, change classification, foreground segmentation, and background maintenance. Experiments were conducted on image sequences containing targets of interest in a variety of environments, e.g., offices, public buildings, subway stations, campuses, parking lots, airports, and sidewalks. Good results of foreground detection were obtained. Quantitative evaluation and comparison with the existing method show that the proposed method provides much improved results.

1,120 citations


Journal ArticleDOI
TL;DR: The widely used comparison of thematic map accuracies via kappa coefficients assumes the samples are independent, an assumption commonly violated when the same ground data sites are used for each map; statistically rigorous alternatives for related and independent samples are discussed.
Abstract: The accuracy of thematic maps derived by image classification analyses is often compared in remote sensing studies. This comparison is typically achieved by a basic subjective assessment of the observed difference in accuracy but should be undertaken in a statistically rigorous fashion. One approach for the evaluation of the statistical significance of a difference in map accuracy that has been widely used in remote sensing research is based on the comparison of the kappa coefficient of agreement derived for each map. The conventional approach to the comparison of kappa coefficients assumes that the samples used in their calculation are independent, an assumption that is commonly unsatisfied because the same sample of ground data sites is often used for each map. Alternative methods to evaluate the statistical significance of differences in accuracy are available for both related and independent samples. Approaches for map comparison based on the kappa coefficient and proportion of correctly allocated cases, the two most widely used metrics of thematic map accuracy in remote sensing, are discussed. An example illustrates how classifications based on the same sample of ground data sites may be compared rigorously and highlights the importance of distinguishing between one- and two-sided statistical tests in the comparison of classification accuracy statements.

1,003 citations
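The kappa-based significance test discussed above can be sketched as follows. Note two hedges: this uses a simplified large-sample variance approximation rather than the full delta-method formula, and it is only valid under the independence assumption the paper cautions about; the error matrices are hypothetical.

```python
import numpy as np

def kappa_and_var(cm):
    # Kappa coefficient of agreement from an error (confusion) matrix, with a
    # simplified large-sample variance, var = po(1-po) / (n (1-pe)^2); the full
    # delta-method variance has additional terms.
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                           # observed proportion correct
    pe = (cm.sum(axis=0) @ cm.sum(axis=1)) / n**2   # chance agreement
    kappa = (po - pe) / (1 - pe)
    var = po * (1 - po) / (n * (1 - pe) ** 2)
    return kappa, var

def z_independent(cm1, cm2):
    # Test statistic for a difference in kappa. Only valid when the two maps
    # were assessed with *independent* samples -- the assumption the paper
    # shows is commonly unsatisfied in practice.
    k1, v1 = kappa_and_var(cm1)
    k2, v2 = kappa_and_var(cm2)
    return (k1 - k2) / np.sqrt(v1 + v2)

cm_a = [[45, 5], [4, 46]]      # hypothetical error matrices for two maps
cm_b = [[35, 15], [12, 38]]
k_a = kappa_and_var(cm_a)[0]
z = z_independent(cm_a, cm_b)  # |z| > 1.96 -> significant at 5%, two-sided
```

The one- versus two-sided distinction the paper highlights enters when converting z to a p-value.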


Journal ArticleDOI
TL;DR: Although each classifier could yield a very accurate classification, > 90% correct, the classifiers differed in the ability to correctly label individual cases and so may be suitable candidates for an ensemble-based approach to classification.
Abstract: Support vector machines (SVMs) have considerable potential as classifiers of remotely sensed data. A constraint on their application in remote sensing has been their binary nature, requiring multiclass classifications to be based upon a large number of binary analyses. Here, an approach for multiclass classification of airborne sensor data by a single SVM analysis is evaluated against a series of classifiers that are widely used in remote sensing, with particular regard to the effect of training set size on classification accuracy. In addition to the SVM, the same datasets were classified using a discriminant analysis, decision tree, and multilayer perceptron neural network. The accuracy statements of the classifications derived from the different classifiers were compared in a statistically rigorous fashion that accommodated for the related nature of the samples used in the analyses. For each classification technique, accuracy was positively related with the size of the training set. In general, the most accurate classifications were derived from the SVM approach, and with the largest training set the SVM classification was significantly (p < 0.05) more accurate than the classifications derived from the other techniques. Although each classifier could yield a very accurate classification, > 90% correct, the classifiers differed in the ability to correctly label individual cases and so may be suitable candidates for an ensemble-based approach to classification.

962 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: An approach to labeling images, in which each pixel is assigned to one of a finite set of labels, using contextual features that are incorporated into a probabilistic framework combining the outputs of several components.
Abstract: We propose an approach to include contextual features for labeling images, in which each pixel is assigned to one of a finite set of labels. The features are incorporated into a probabilistic framework, which combines the outputs of several components. Components differ in the information they encode. Some focus on the image-label mapping, while others focus solely on patterns within the label field. Components also differ in their scale, as some focus on fine-resolution patterns while others on coarser, more global structure. A supervised version of the contrastive divergence algorithm is applied to learn these features from labeled image data. We demonstrate performance on two real-world image databases and compare it to a classifier and a Markov random field.

820 citations


Journal ArticleDOI
TL;DR: It is shown that an efficient face detection system does not require any costly local preprocessing before classification of image areas, and provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases.
Abstract: In this paper, we present a novel face detection approach based on a convolutional neural architecture, designed to robustly detect highly variable face patterns, rotated up to ±20 degrees in image plane and turned up to ±60 degrees, in complex real world images. The proposed system automatically synthesizes simple problem-specific feature extractors from a training set of face and nonface patterns, without making any assumptions or using any hand-made design concerning the features to extract or the areas of the face pattern to analyze. The face detection procedure acts like a pipeline of simple convolution and subsampling modules that treat the raw input image as a whole. We therefore show that an efficient face detection system does not require any costly local preprocessing before classification of image areas. The proposed scheme provides very high detection rate with a particularly low level of false positives, demonstrated on difficult test sets, without requiring the use of multiple networks for handling difficult cases. We present extensive experimental results illustrating the efficiency of the proposed approach on difficult test sets and including an in-depth sensitivity analysis with respect to the degrees of variability of the face patterns.

610 citations
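The convolution-and-subsampling pipeline acting on the raw image can be sketched as below; the kernel, nonlinearity, and pooling here are illustrative stand-ins, not the paper's learned feature extractors.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # Plain "valid" 2-D convolution (cross-correlation, as in CNN convention).
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def subsample(a, s=2):
    # Average-pooling subsampling module.
    h, w = (a.shape[0] // s) * s, (a.shape[1] // s) * s
    return a[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))

# One convolution + squashing + subsampling stage applied to the raw image as
# a whole; the detector in the paper stacks several such stages with *learned*
# kernels and needs no hand-crafted local preprocessing.
img = np.random.default_rng(1).random((16, 16))
feat = subsample(np.tanh(conv2d_valid(img, np.ones((3, 3)) / 9.0)))
```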


Proceedings ArticleDOI
B. Froba1, A. Ernst1
17 May 2004
TL;DR: Illumination-invariant local structure features for object detection, computed with a modified census transform that enhances the original work of Zabih and Woodfill, and an efficient four-stage classifier for rapid detection are proposed.
Abstract: Illumination variation is a big problem in object recognition, which usually requires a costly compensation prior to classification. It would be desirable to have an image-to-image transform which uncovers only the structure of an object for an efficient matching. In this context the contribution of our work is two-fold. First, we introduce illumination-invariant local structure features for object detection. For an efficient computation we propose a modified census transform which enhances the original work of Zabih and Woodfill. We show some shortcomings and how to get over them with the modified version. Secondly, we introduce an efficient four-stage classifier for rapid detection. Each single-stage classifier is a linear classifier, which consists of a set of feature lookup-tables. We show that the first stage, which evaluates only 20 features, filters out more than 99% of all background positions. Thus, the classifier structure is much simpler than previously described multi-stage approaches, while having similar capabilities. The combination of illumination-invariant features together with a simple classifier leads to a real-time system on standard computers (60 ms, image size: 288×384, 2 GHz Pentium). Detection results are presented on two commonly used databases in this field, namely the MIT+CMU set of 130 images and the BioID set of 1526 images. We achieve detection rates of more than 90% with a very low false positive rate of 10⁻⁷%. We also provide a demo program that can be found on the Internet at http://www.iis.fraunhofer.de/bv/biometrie/download/.

534 citations
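A sketch of the modified census transform idea: each 3×3 neighbourhood is encoded by comparing its pixels with the neighbourhood mean (rather than the centre pixel, as in the original census transform), which makes the code exactly invariant to positive gain and offset changes of illumination. Boundary handling and bit ordering are illustrative assumptions.

```python
import numpy as np

def modified_census_3x3(img):
    # Encode every 3x3 neighbourhood as a 9-bit index: bit b is set when the
    # b-th neighbourhood pixel exceeds the neighbourhood *mean* (the original
    # census transform of Zabih and Woodfill compares with the centre pixel,
    # which leaves the kernel uninformative on flat patches).
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    shifts = [img[dy:dy + h - 2, dx:dx + w - 2]
              for dy in range(3) for dx in range(3)]
    mean = sum(shifts) / 9.0
    out = np.zeros((h - 2, w - 2), dtype=np.uint16)
    for bit, s in enumerate(shifts):
        out |= (s > mean).astype(np.uint16) << bit
    return out

# Invariance check: a gain/offset change of illumination leaves the code intact.
img = np.random.default_rng(2).integers(0, 256, (12, 12)).astype(float)
codes = modified_census_3x3(img)
same = np.array_equal(codes, modified_census_3x3(2.0 * img + 10.0))
```

The 9-bit codes can then index the per-stage feature lookup-tables directly.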


Journal ArticleDOI
TL;DR: The results illustrate the potential to direct training data acquisition strategies to target the most useful training samples to allow efficient and accurate image classification.

524 citations


Journal ArticleDOI
TL;DR: An unsupervised terrain and land-use classification algorithm using polarimetric synthetic aperture radar data using a combination of a scattering model-based decomposition developed by Freeman and Durden and the maximum-likelihood classifier based on the complex Wishart distribution is proposed.
Abstract: In this paper, we propose an unsupervised terrain and land-use classification algorithm using polarimetric synthetic aperture radar data. Unlike other algorithms that classify pixels statistically and ignore their scattering characteristics, this algorithm not only uses a statistical classifier, but also preserves the purity of dominant polarimetric scattering properties. This algorithm uses a combination of a scattering model-based decomposition developed by Freeman and Durden and the maximum-likelihood classifier based on the complex Wishart distribution. The first step is to apply the Freeman and Durden decomposition to divide pixels into three scattering categories: surface scattering, volume scattering, and double-bounce scattering. To preserve the purity of scattering characteristics, pixels in a scattering category are restricted to be classified with other pixels in the same scattering category. An efficient and effective class initialization scheme is also devised to initially merge many small clusters in each scattering category by applying a merge criterion developed based on the Wishart distance measure. Then, the iterative Wishart classifier is applied. The stability in convergence is much superior to that of the previous algorithm using the entropy/anisotropy/Wishart classifier. Finally, an automated color rendering scheme is proposed, based on the classes' scattering category to code the pixels to resemble their natural color. This algorithm is also flexible and computationally efficient. The effectiveness of this algorithm is demonstrated using the Jet Propulsion Laboratory's AIRSAR and the German Aerospace Center's (DLR) E-SAR L-band polarimetric synthetic aperture radar images.

448 citations
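The Wishart distance underlying the maximum-likelihood step can be sketched directly; the class centres and pixel matrix below are hypothetical, and the Freeman-Durden category restriction is only noted in a comment.

```python
import numpy as np

def wishart_distance(T, V):
    # d(T, V) = ln|V| + tr(V^{-1} T): the distance minimised by the
    # maximum-likelihood classifier under the complex Wishart distribution.
    _, logdet = np.linalg.slogdet(V)
    return logdet + np.real(np.trace(np.linalg.solve(V, T)))

def classify(T, centers):
    # Assign the pixel to the nearest class centre. Restricting the candidate
    # centres to the pixel's Freeman-Durden scattering category would add the
    # paper's scattering-purity constraint.
    return int(np.argmin([wishart_distance(T, V) for V in centers]))

# Two hypothetical class centres (Hermitian positive definite) and a pixel
# coherency matrix close to the first one.
V0 = np.diag([2.0, 1.0, 0.5])
V1 = np.diag([0.5, 1.0, 2.0])
pix = np.diag([1.9, 1.1, 0.6])
label = classify(pix, [V0, V1])
```

Iterating "classify all pixels, recompute the centres as per-class means" gives the iterative Wishart classifier.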


Journal ArticleDOI
TL;DR: The new method provides greater weight to samples near the expected decision boundary, which tends to provide for increased classification accuracy and to reduce the effect of the singularity problem.
Abstract: In this paper, a new nonparametric feature extraction method is proposed for high-dimensional multiclass pattern recognition problems. It is based on a nonparametric extension of scatter matrices. There are at least two advantages to using the proposed nonparametric scatter matrices. First, they are generally of full rank. This provides the ability to specify the number of extracted features desired and to reduce the effect of the singularity problem. This is in contrast to parametric discriminant analysis, which usually only can extract L-1 (number of classes minus one) features. In a real situation, this may not be enough. Second, the nonparametric nature of scatter matrices reduces the effects of outliers and works well even for nonnormal datasets. The new method provides greater weight to samples near the expected decision boundary. This tends to provide for increased classification accuracy.

Proceedings ArticleDOI
14 Jun 2004
TL;DR: The functional and architectural breakdown of a monocular pedestrian detection system is described and the approach for single-frame classification based on a novel scheme of breaking down the class variability by repeatedly training a set of relatively simple classifiers on clusters of the training set is described.
Abstract: We describe the functional and architectural breakdown of a monocular pedestrian detection system. We describe in detail our approach for single-frame classification based on a novel scheme of breaking down the class variability by repeatedly training a set of relatively simple classifiers on clusters of the training set. Single-frame classification performance results and system level performance figures for daytime conditions are presented with a discussion about the remaining gap to meet a daytime normal weather condition production system.

Journal ArticleDOI
TL;DR: A mathematical model that relies on the Fisher distribution and the log-moment estimation and which is relevant for one-look data is used, and its accuracy for urban areas at high resolution is proved.
Abstract: We propose a classification method suitable for high-resolution synthetic aperture radar (SAR) images over urban areas. When processing SAR images, there is a strong need for statistical models of scattering to take into account multiplicative noise and high dynamics. For instance, the classification process needs to be based on the use of statistics. Our main contribution is the choice of an accurate model for high-resolution SAR images over urban areas and its use in a Markovian classification algorithm. Clutter in SAR images becomes non-Gaussian when the resolution is high or when the area is man-made. Many models have been proposed to fit with non-Gaussian scattering statistics (K, Weibull, Log-normal, Nakagami-Rice, etc.), but none of them is flexible enough to model all kinds of surfaces in our context. As a consequence, we use a mathematical model that relies on the Fisher distribution and the log-moment estimation and which is relevant for one-look data. This estimation method is based on the second-kind statistics, which are detailed in the paper. We also prove its accuracy for urban areas at high resolution. The quality of the classification that is obtained by mixing this model and a Markovian segmentation is high and enables us to distinguish between ground, buildings, and vegetation.

Journal ArticleDOI
TL;DR: A human recognition algorithm by combining static and dynamic body biometrics, fused on the decision level using different combinations of rules to improve the performance of both identification and verification is described.
Abstract: Vision-based human identification at a distance has recently gained growing interest from computer vision researchers. This paper describes a human recognition algorithm by combining static and dynamic body biometrics. For each sequence involving a walker, temporal pose changes of the segmented moving silhouettes are represented as an associated sequence of complex vector configurations and are then analyzed using the Procrustes shape analysis method to obtain a compact appearance representation, called static information of body. In addition, a model-based approach is presented under a Condensation framework to track the walker and to further recover joint-angle trajectories of lower limbs, called dynamic information of gait. Both static and dynamic cues obtained from walking video may be independently used for recognition using the nearest exemplar classifier. They are fused on the decision level using different combinations of rules to improve the performance of both identification and verification. Experimental results of a dataset including 20 subjects demonstrate the feasibility of the proposed algorithm.
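The Procrustes comparison of complex shape configurations described above can be sketched as follows; the nearest-exemplar step reduces to an argmin over distances, and the shapes below are toy examples, not gait silhouettes.

```python
import numpy as np

def procrustes_distance(z1, z2):
    # Full Procrustes distance between planar shapes stored as complex vectors
    # (x + iy per landmark): centring removes translation, normalising removes
    # scale, and taking the modulus of the inner product removes rotation.
    z1 = z1 - z1.mean(); z1 = z1 / np.linalg.norm(z1)
    z2 = z2 - z2.mean(); z2 = z2 / np.linalg.norm(z2)
    sim = abs(np.vdot(z1, z2))
    return np.sqrt(max(0.0, 1.0 - sim ** 2))

def nearest_exemplar(z, exemplars):
    # Nearest-exemplar classification in Procrustes shape space.
    return int(np.argmin([procrustes_distance(z, e) for e in exemplars]))

# A square, and the same square rotated, scaled, and translated: the
# Procrustes distance between them is (numerically) zero.
square = np.array([0, 1, 1 + 1j, 1j])
moved = 2.0 * np.exp(0.7j) * square + (3 + 4j)
line = np.array([0, 1, 2, 3], dtype=complex)
d = procrustes_distance(square, moved)
```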

Journal ArticleDOI
TL;DR: This work considers the problem of clustering dynamic point sets in a metric space and proposes a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications.
Abstract: Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have a provably good performance, and which we believe should also perform well in practice. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters.
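The flavour of incremental clustering with a diameter guarantee can be conveyed by the simplest greedy rule, shown below. This is a sketch only: it omits the merging machinery the paper's doubling-style algorithms use to bound the number of clusters, and the paper in fact shows that naive greedy rules can perform poorly.

```python
import math

def incremental_centers(points, r):
    # Greedy incremental rule: a new point opens a cluster only if it lies
    # farther than r from every existing centre, so every point ends up within
    # r of some centre and every cluster has diameter at most 2r. (The paper's
    # provably good algorithms additionally merge centres to keep their number
    # bounded as points keep arriving; that step is omitted here.)
    centers = []
    for p in points:
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    return centers

# Four points in two tight groups: two clusters of radius <= 1 suffice.
centers = incremental_centers([(0, 0), (0.1, 0), (5, 0), (5.2, 0)], r=1.0)
```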

Proceedings ArticleDOI
24 Oct 2004
TL;DR: A number of features are proposed which could be used by a classifier to identify the source camera of an image in a blind manner and shown reasonable accuracy in distinguishing images from the two and five different camera models using the proposed features.
Abstract: An interesting problem in digital forensics is that given a digital image, would it be possible to identify the camera model which was used to obtain the image. In this paper we look at a simplified version of this problem by trying to distinguish between images captured by a limited number of camera models. We propose a number of features which could be used by a classifier to identify the source camera of an image in a blind manner. We also provide experimental results and show reasonable accuracy in distinguishing images from the two and five different camera models using the proposed features.
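A sketch of the kind of global colour statistics that can serve as source-camera features; this is an illustrative subset (per-band means and inter-band correlations), not the paper's full feature set.

```python
import numpy as np

def camera_features(rgb):
    # Global colour statistics of the kind usable as blind source-camera
    # features: per-band means plus the three inter-band correlations.
    # (Illustrative subset only; such proposals also draw on wavelet-domain
    # and image-quality measures.)
    r, g, b = (rgb[..., i].ravel().astype(float) for i in range(3))
    corr = lambda u, v: np.corrcoef(u, v)[0, 1]
    return np.array([r.mean(), g.mean(), b.mean(),
                     corr(r, g), corr(r, b), corr(g, b)])

# Feature vectors like this would be fed to a classifier trained per camera.
rgb = np.random.default_rng(5).random((8, 8, 3))
rgb[..., 1] = rgb[..., 0]          # make green an exact copy of red
feats = camera_features(rgb)       # then corr(r, g) is 1 by construction
```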

Proceedings ArticleDOI
17 May 2004
TL;DR: A novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but also classifying the hand shape.
Abstract: The ability to detect a person's unconstrained hand in a natural video sequence has applications in sign language, gesture recognition and HCI. This paper presents a novel, unsupervised approach to training an efficient and robust detector which is capable of not only detecting the presence of human hands within an image but also classifying the hand shape. A database of images is first clustered using a k-method clustering algorithm with a distance metric based upon shape context. From this, a tree structure of boosted cascades is constructed. The head of the tree provides a general hand detector, while the individual branches of the tree classify a valid shape as belonging to one of the predetermined clusters exemplified by an indicative hand shape. Preliminary experiments carried out showed that the approach boasts a promising 99.8% success rate on hand detection and 97.4% success at classification. Although we demonstrate the approach within the domain of hand shape, it is equally applicable to other problems where both detection and classification are required for objects that display high variability in appearance.

Journal ArticleDOI
TL;DR: Support vector machines yield better outcomes than neural networks regarding accuracy, simplicity, and robustness, and training neural and neurofuzzy models is unfeasible when working with high-dimensional input spaces and great amounts of training data.
Abstract: We propose the use of support vector machines (SVMs) for automatic hyperspectral data classification and knowledge discovery. In the first stage of the study, we use SVMs for crop classification and analyze their performance in terms of efficiency and robustness, as compared to extensively used neural and fuzzy methods. Efficiency is assessed by evaluating accuracy and statistical differences in several scenes. Robustness is analyzed in terms of: (1) suitability to working conditions when a feature selection stage is not possible and (2) performance when different levels of Gaussian noise are introduced at their inputs. In the second stage of this work, we analyze the distribution of the support vectors (SVs) and perform sensitivity analysis on the best classifier in order to analyze the significance of the input spectral bands. For classification purposes, six hyperspectral images acquired with the 128-band HyMAP spectrometer during the DAISEX-1999 campaign are used. Six crop classes were labeled for each image. A reduced set of labeled samples is used to train the models, and the entire images are used to assess their performance. Several conclusions are drawn: (1) SVMs yield better outcomes than neural networks regarding accuracy, simplicity, and robustness; (2) training neural and neurofuzzy models is unfeasible when working with high-dimensional input spaces and great amounts of training data; (3) SVMs perform similarly for different training subsets with varying input dimension, which indicates that noisy bands are successfully detected; and (4) a valuable ranking of bands through sensitivity analysis is achieved.

Journal ArticleDOI
TL;DR: The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model, and an overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences.
Abstract: A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin-color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and predictions of the Markov model. The evolution of the skin-color distribution at each frame is parameterized by translation, scaling, and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and resampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using maximum likelihood estimation and also evolve over time. The accuracy of the new dynamic skin color segmentation algorithm is compared to that obtained via a static color model. Segmentation accuracy is evaluated using labeled ground-truth video sequences taken from staged experiments and popular movies. An overall increase in segmentation accuracy of up to 24 percent is observed in 17 out of 21 test sequences. In all but one case, the skin-color classification rates for our system were higher, with background classification rates comparable to those of the static segmentation.

Proceedings ArticleDOI
24 Oct 2004
TL;DR: Novel hashing algorithms employing transforms based on matrix invariants are proposed; they first construct a secondary image, derived from the input image by pseudo-randomly extracting features that approximately capture semi-global geometric characteristics.
Abstract: In this paper we suggest viewing images (as well as attacks on them) as a sequence of linear operators and propose novel hashing algorithms employing transforms that are based on matrix invariants. To derive this sequence, we simply cover a two-dimensional representation of an image by a sequence of (possibly overlapping) rectangles R/sub i/ whose sizes and locations are chosen randomly from a suitable distribution. The restriction of the image (representation) to each R/sub i/ gives rise to a matrix A/sub i/. The fact that A/sub i/'s will overlap and are random makes the sequence (respectively) a redundant and non-standard representation of images, but is crucial for our purposes. Our algorithms first construct a secondary image, derived from the input image by pseudo-randomly extracting features that approximately capture semi-global geometric characteristics. From the secondary image (which does not perceptually resemble the input), we further extract the final features which can be used as a hash value (and can be further suitably quantized). In this paper, we use spectral matrix invariants as embodied by singular value decomposition. Surprisingly, formation of the secondary image turns out to be quite important since it not only introduces further robustness (i.e., resistance against standard signal processing transformations), but also enhances the security properties (i.e., resistance against intentional attacks). Indeed, our experiments reveal that our hashing algorithms extract most of the geometric information from the images and hence are robust to severe perturbations (e.g., up to 50% cropping by area with 20-degree rotations) on images while avoiding misclassification. Our methods are general enough to yield a watermark embedding scheme, which will be studied in another paper.
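The rectangle-plus-matrix-invariant idea can be sketched with leading singular values as the invariant; this omits the secondary-image construction and the quantization step, and all parameter names and values are illustrative.

```python
import numpy as np

def svd_hash(img, n_rects=8, rect=16, seed=7):
    # Pseudo-randomly place (possibly overlapping) rectangles -- the seed plays
    # the role of the secret key -- and keep the top singular value of each
    # sub-matrix as one hash coordinate. Leading singular values capture coarse
    # geometric structure and move little under small perturbations.
    rng = np.random.default_rng(seed)
    h, w = img.shape
    feats = []
    for _ in range(n_rects):
        y = rng.integers(0, h - rect + 1)
        x = rng.integers(0, w - rect + 1)
        s = np.linalg.svd(img[y:y + rect, x:x + rect], compute_uv=False)
        feats.append(s[0])
    return np.array(feats)

img = np.add.outer(np.arange(64.0), np.arange(64.0))      # smooth test image
noisy = img + np.random.default_rng(3).normal(0.0, 0.01, img.shape)
h_img, h_noisy = svd_hash(img), svd_hash(noisy)           # nearly identical
```

The hash is deterministic for a fixed key but nearly unchanged under small perturbations, which is the robustness the paper measures.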

Journal ArticleDOI
TL;DR: This paper presents a multiscale approach to image texture where first and second-order statistical measures were derived from different sizes of processing windows and were used as additional information in a supervised classification.
Abstract: Image texture is a complex visual perception. With the ever-increasing spatial resolution of remotely sensed data, the role of image texture in image classification has increased. Current approaches to image texture analysis rely on a single band of spatial information to characterize texture. This paper presents a multiscale approach to image texture in which first- and second-order statistical measures were derived from different sizes of processing windows and used as additional information in a supervised classification. By using several bands of textural information processed with different window sizes (from 5×5 to 15×15), classification of the main forest stands in the image was improved by up to a maximum of 40%. A geostatistical analysis indicated that there was no single window size that would adequately characterize the range of textural conditions present in this image. A number of different statistical texture measures were compared for this image. While all of the different texture measures provided a degree of improvement (from 4 to 13% overall), the multiscale approach achieved a higher degree of classification accuracy regardless of which statistical procedure was used. When compared with single-band texture measures, the level of overall improvement varied between 4 and 8%. The results indicate that this multiscale approach is an improvement over the current single-band approach to analysing image texture.
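A first-order texture measure (local variance) at several window sizes can be computed efficiently with summed-area tables; a sketch, assuming square windows and keeping only fully-contained window positions (the checkerboard image is a toy stand-in).

```python
import numpy as np

def _box_sum(a, win):
    # Summed-area table makes every win x win window sum O(1).
    c = np.cumsum(np.cumsum(a, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

def local_variance(img, win):
    # First-order texture band: variance of grey levels in a win x win window,
    # one output pixel per fully-contained window position.
    img = np.asarray(img, dtype=float)
    n = win * win
    mean = _box_sum(img, win) / n
    mean_sq = _box_sum(img ** 2, win) / n
    return mean_sq - mean ** 2

# Multiscale stack: one texture band per window size, here 5x5, 9x9, 15x15,
# to be appended to the spectral bands before supervised classification.
img = (np.indices((32, 32)).sum(axis=0) % 2).astype(float)   # checkerboard
stack = [local_variance(img, w) for w in (5, 9, 15)]
```

Second-order (co-occurrence) measures would slot into the same multiscale stacking scheme.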

Journal ArticleDOI
TL;DR: In this paper, the authors give an overview of different ways to use satellite images for land cover area estimation, including regression, calibration and small area estimators, combining exhaustive but inaccurate information (from satellite images) with accurate information on a sample (most often ground surveys).
Abstract: This article gives an overview of different ways to use satellite images for land cover area estimation. Approaches are grouped into three categories. (1) Estimates coming essentially from remote sensing. Ground data, are used as an auxiliary tool, mainly as training data for image classification, or sub-pixel analysis. Area estimates from pixel counting are sometimes used without a solid statistical justification. (2) Methods, such as regression, calibration and small area estimators, combining exhaustive but inaccurate information (from satellite images) with accurate information on a sample (most often ground surveys). (3) Satellite images can support area frame surveys in several ways: to define sampling units, for stratification; as graphic documents for the ground survey, or for quality control. Cost-efficiency is discussed. Operational use of remote sensing is easier now with cheaper Landsat Thematic Mapper images and computing, but many administrations are reluctant to integrate remote sensing in ...
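The regression estimator of category (2) combines the cheap image-derived proportion, known everywhere, with accurate ground data available only on a sample. A minimal sketch with hypothetical numbers:

```python
import numpy as np

def regression_estimator(image_mean, sample_image, sample_ground):
    # Y_reg = y_bar + b * (X_bar - x_bar): the accurate but sparse ground-survey
    # mean y_bar is adjusted using the exhaustive image-derived proportion
    # (X_bar over the whole region, x_bar over the sampled sites only), with b
    # the least-squares slope of ground values on image values.
    x = np.asarray(sample_image, dtype=float)
    y = np.asarray(sample_ground, dtype=float)
    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y.mean() + b * (image_mean - x.mean())

# Hypothetical numbers: pixel counting says the crop covers 30% of the region;
# on the four sampled sites, ground truth relates linearly to the image value.
area = regression_estimator(0.30, [0.1, 0.2, 0.3, 0.4],
                            [0.14, 0.23, 0.32, 0.41])
```

This is why pixel counting alone lacks a solid statistical justification: the regression step corrects its systematic classification bias using the ground sample.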

Journal ArticleDOI
TL;DR: It is demonstrated in both evaluation studies that segmentations produced by combining multiple individual registration-based segmentations are more accurate for the two classifier fusion methods proposed, which weight the individual classifiers according to their EM-based performance estimates, than for simple sum rule fusion, which weights each classifier equally.
Abstract: It is well known in the pattern recognition community that the accuracy of classifications obtained by combining decisions made by independent classifiers can be substantially higher than the accuracy of the individual classifiers. We have previously shown this to be true for atlas-based segmentation of biomedical images. The conventional method for combining individual classifiers weights each classifier equally (vote or sum rule fusion). In this paper, we propose two methods that estimate the performances of the individual classifiers and combine the individual classifiers by weighting them according to their estimated performance. The two methods are multiclass extensions of an expectation-maximization (EM) algorithm for ground truth estimation of binary classification based on decisions of multiple experts (Warfield et al., 2004). The first method performs parameter estimation independently for each class with a subsequent integration step. The second method considers all classes simultaneously. We demonstrate the efficacy of these performance-based fusion methods by applying them to atlas-based segmentations of three-dimensional confocal microscopy images of bee brains. In atlas-based image segmentation, multiple classifiers arise naturally by applying different registration methods to the same atlas, or the same registration method to different atlases, or both. We perform a validation study designed to quantify the success of classifier combination methods in atlas-based segmentation. By applying random deformations, a given ground truth atlas is transformed into multiple segmentations that could result from imperfect registrations of an image to multiple atlas images. In a second evaluation study, multiple actual atlas-based segmentations are combined and their accuracies computed by comparing them to a manual segmentation. 
We demonstrate in both evaluation studies that segmentations produced by combining multiple individual registration-based segmentations are more accurate for the two classifier fusion methods we propose, which weight the individual classifiers according to their EM-based performance estimates, than for simple sum rule fusion, which weights each classifier equally.

Journal ArticleDOI
TL;DR: Texture was more effective for improving the classification accuracy of land use classes at finer resolution levels because the more heterogeneous are the land use/cover units and the more fragmented are the landscapes, the finer the resolution required.
Abstract: The purpose of this paper is to evaluate spatial resolution effects on image classification. Classification maps were generated with a maximum likelihood (ML) classifier applied to three multi-spectral bands and variance texture images. A total of eight urban land use/cover classes were obtained at six spatial resolution levels based on a series of aggregated Colour Infrared Digital Orthophoto Quarter Quadrangle (DOQQ) subsets in urban and rural fringe areas of the San Diego metropolitan area. The classification results were compared using overall and individual classification accuracies. Classification accuracies were shown to be influenced by image spatial resolution, the window size used in texture extraction, and differences in spatial structure within and between categories. The more heterogeneous the land use/cover units and the more fragmented the landscapes, the finer the resolution required. Texture was more effective for improving the classification accuracy of land use classes at finer resolution levels. For spectrally homogeneous classes, a small window is preferable; for spectrally heterogeneous classes, a large window size is required.

Journal ArticleDOI
TL;DR: A new analysis is provided that shows under what conditions unlabeled data can be used in learning to improve classification performance, and how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection.
Abstract: Automatic classification is one of the basic tasks required in any pattern recognition and human computer interaction application. In this paper, we discuss training probabilistic classifiers with labeled and unlabeled data. We provide a new analysis that shows under what conditions unlabeled data can be used in learning to improve classification performance. We also show that, if the conditions are violated, using unlabeled data can be detrimental to classification performance. We discuss the implications of this analysis to a specific type of probabilistic classifiers, Bayesian networks, and propose a new structure learning algorithm that can utilize unlabeled data to improve classification. Finally, we show how the resulting algorithms are successfully employed in two applications related to human-computer interaction and pattern recognition: facial expression recognition and face detection.
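A toy version of learning from labeled and unlabeled data can be sketched as follows. This is my own simplification to one dimension, two classes, and fixed unit variance, not the paper's Bayesian network algorithm; when the model assumptions are violated, the same updates can pull the parameters the wrong way, matching the caution above:

```python
import math

def fit_semi_supervised(labeled, unlabeled, iters=20, var=1.0):
    """EM for a 1-D two-class Gaussian classifier with fixed variance.
    labeled: list of (x, y) pairs with y in {0, 1}; unlabeled: list of x."""
    mu = [sum(x for x, y in labeled if y == c) /
          max(1, sum(1 for _, y in labeled if y == c)) for c in (0, 1)]
    for _ in range(iters):
        # E-step: soft class-1 responsibilities for the unlabeled points
        resp = []
        for x in unlabeled:
            p = [math.exp(-(x - m) ** 2 / (2 * var)) for m in mu]
            resp.append(p[1] / (p[0] + p[1]))
        # M-step: refit means from hard labels plus soft unlabeled labels
        for c in (0, 1):
            w = [(x, 1.0 if y == c else 0.0) for x, y in labeled]
            w += [(x, r if c == 1 else 1 - r)
                  for x, r in zip(unlabeled, resp)]
            tot = sum(wi for _, wi in w)
            mu[c] = sum(x * wi for x, wi in w) / tot
    return mu

def predict(x, mu):
    """Assign the class whose mean is nearest (equal priors, equal variance)."""
    return 0 if abs(x - mu[0]) <= abs(x - mu[1]) else 1
```

When the two-Gaussian assumption holds, the unlabeled points sharpen the estimated means beyond what the two labeled points alone provide.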

Book ChapterDOI
01 Jan 2004
TL;DR: This chapter explores how segmentation and object-based methods improve on traditional pixel-based image analysis/classification methods and describes different approaches to image segmentation.
Abstract: The continuously improving spatial resolution of remote sensing (RS) sensors sets new demand for applications utilising this information. The need for the more efficient extraction of information from high resolution RS imagery and the seamless integration of this information into Geographic Information System (GIS) databases is driving geo-information theory and methodology into new territory. As the dimension of the ground instantaneous field of view (GIFOV), or pixel (picture element) size, decreases many more fine landscape features can be readily delineated, at least visually. The challenge has been to produce proven man-machine methods that externalize and improve on human interpretation skills. Some of the most promising results in this research programme have come from the adoption of image segmentation algorithms and the development of so-called object-based classification methodologies. In this chapter we describe different approaches to image segmentation and explore how segmentation and object-based methods improve on traditional pixel-based image analysis/classification methods. According to Schowengerdt () the traditional image processing/image classification methodology is referred to as an image-centred approach. Here, the primary goal is to produce a map describing the spatial relationships between phenomena of interest. A second type, the data-centred approach, is pursued when the user is primarily interested in estimating parameters for individual phenomena based on the data values. Due to recent developments in image processing the two approaches appear to be converging: from image- and data-centred views to an information-centred approach. For instance, for change detection and environmental monitoring tasks we must not only extract information from the spectral and temporal data dimensions. We must also integrate these estimates into a spatial framework and make a priori and a posteriori utilization of GIS databases.
A decision support system must encapsulate manager knowledge, context/ecological knowledge and planning knowledge. Technically, this necessitates a closer integration of remote sensing and GIS methods. Ontologically, it demands a new methodology that can provide a flexible, demand-driven generation of information and, consequently, hierarchically structured semantic rules describing the relationships between the different levels of spatial entities. Several of the aspects of geo-information involved cannot be obtained by pixel information as such but can only be achieved with an exploitation of neighbourhood information and context of the objects of interest. The relationship between ground objects and image objects
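A minimal region-growing segmentation, one of the simplest members of the family of algorithms this chapter surveys, can be sketched as follows. This is an illustrative toy (not any specific published algorithm): 4-connected pixels join a region while they stay within a tolerance of the region's seed value, turning pixels into image objects:

```python
from collections import deque

def segment(image, tol=1.0):
    """Greedy region growing on a 2-D grid of values; returns a label map
    in which each connected, spectrally similar region is one image object."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for si in range(h):
        for sj in range(w):
            if labels[si][sj] != -1:
                continue
            seed, queue = image[si][sj], deque([(si, sj)])
            labels[si][sj] = next_label
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < h and 0 <= nj < w and labels[ni][nj] == -1
                            and abs(image[ni][nj] - seed) <= tol):
                        labels[ni][nj] = next_label
                        queue.append((ni, nj))
            next_label += 1
    return labels
```

Once pixels are grouped into labeled objects, per-object features (mean spectrum, shape, neighbourhood context) can be computed and classified, which is precisely the information that pixel-wise processing cannot exploit.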

Journal ArticleDOI
TL;DR: An image retrieval framework is proposed that integrates a region-based representation that is efficient in storage and complexity with effective on-line learning capability, and a region weighting strategy is introduced to optimally weight the regions and enable the system to self-improve.
Abstract: An image retrieval framework is proposed that integrates a region-based representation that is efficient in storage and complexity with effective on-line learning capability. The framework consists of methods for region-based image representation and comparison, indexing using modified inverted files, relevance feedback, and learning region weighting. By exploiting a vector quantization method, both compact and sparse (vector) region-based image representations are achieved. Using the compact representation, an indexing scheme similar to the inverted file technology and an image similarity measure based on Earth Mover's Distance are presented. Moreover, the vector representation facilitates a weighted query point movement algorithm and the compact representation enables a classification-based algorithm for relevance feedback. Based on users' feedback information, a region weighting strategy is also introduced to optimally weight the regions and enable the system to self-improve. Experimental results on a database of 10 000 general-purpose images demonstrate the efficiency and effectiveness of the proposed framework.
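The inverted-file indexing idea can be sketched as follows. This is a hedged toy with invented names: region signatures are reduced to sets of vector-quantization codewords, and a finer measure such as Earth Mover's Distance would rank the retrieved candidates afterwards:

```python
from collections import defaultdict

def build_inverted_file(images):
    """images: {image_id: set of region codewords from vector quantization}.
    The inverted file maps each codeword to the images containing it."""
    index = defaultdict(set)
    for image_id, codewords in images.items():
        for codeword in codewords:
            index[codeword].add(image_id)
    return index

def candidates(index, query_codewords):
    """Candidate set for a query: images sharing at least one region
    codeword; only these need to be scored by the full similarity measure."""
    found = set()
    for codeword in query_codewords:
        found |= index.get(codeword, set())
    return found
```

As with text retrieval, the benefit is that a query never touches images that share no codeword with it, which is what makes the region-based representation efficient in complexity as well as storage.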

Journal ArticleDOI
TL;DR: A robust and automated segmentation technique, based on dynamic programming, to segment mass lesions from surrounding tissue and an efficient algorithm to guarantee resulting contours to be closed is presented.
Abstract: Mass segmentation plays a crucial role in computer-aided diagnosis (CAD) systems for classification of suspicious regions as normal, benign, or malignant. In this article we present a robust and automated segmentation technique, based on dynamic programming, to segment mass lesions from surrounding tissue. In addition, we propose an efficient algorithm to guarantee resulting contours to be closed. The segmentation method based on dynamic programming was quantitatively compared with two other automated segmentation methods (region growing and the discrete contour model) on a dataset of 1210 masses. For each mass an overlap criterion was calculated to determine the similarity with manual segmentation. The mean overlap percentage for dynamic programming was 0.69, for the other two methods 0.60 and 0.59, respectively. The difference in overlap percentage was statistically significant. To study the influence of the segmentation method on the performance of a CAD system two additional experiments were carried out. The first experiment studied the detection performance of the CAD system for the different segmentation methods. Free-response receiver operating characteristics analysis showed that the detection performance was nearly identical for the three segmentation methods. In the second experiment the ability of the classifier to discriminate between malignant and benign lesions was studied. For region based evaluation the area Az under the receiver operating characteristics curve was 0.74 for dynamic programming, 0.72 for the discrete contour model, and 0.67 for region growing. The difference in Az values obtained by the dynamic programming method and region growing was statistically significant. The differences between other methods were not significant.
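An overlap criterion of the kind used above can be sketched as follows. The article does not restate its exact formula, so this shows the common intersection-over-union form on flattened binary masks:

```python
def overlap(auto_mask, manual_mask):
    """Overlap between an automated segmentation and the manual outline:
    intersection area divided by union area (1.0 = perfect agreement).
    Masks are flat sequences of 0/1 pixel values of equal length."""
    inter = sum(1 for a, m in zip(auto_mask, manual_mask) if a and m)
    union = sum(1 for a, m in zip(auto_mask, manual_mask) if a or m)
    return inter / union if union else 1.0
```

Averaging this score over all 1210 masses is how the per-method mean overlap percentages (0.69 vs. 0.60 and 0.59) would be obtained under this definition.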

Journal ArticleDOI
TL;DR: A method of content-based image classification is presented that uses a neural network with shape-based texture features extracted from wavelet-transformed images; among the various texture features, the diagonal moment was the most effective.

Journal ArticleDOI
TL;DR: Experimental results confirm the effectiveness of the proposed system, which exhibits both high classification accuracy and good stability versus parameter settings, and point out that properly integrating a pattern recognition procedure with an accurate feature extraction phase represents an effective approach to SAR data analysis.
Abstract: A novel system for the classification of multitemporal synthetic aperture radar (SAR) images is presented. It has been developed by integrating an analysis of the multitemporal SAR signal physics with a pattern recognition approach. The system is made up of a feature-extraction module and a neural-network classifier, as well as a set of standard preprocessing procedures. The feature-extraction module derives a set of features from a series of multitemporal SAR images. These features are based on the concepts of long-term coherence and backscattering temporal variability and have been defined according to an analysis of the multitemporal SAR signal behavior in the presence of different land-cover classes. The neural-network classifier (which is based on a radial basis function neural architecture) properly exploits the multitemporal features for producing accurate land-cover maps. Thanks to the effectiveness of the extracted features, the number of measures that can be provided as input to the classifier is significantly smaller than the number of available multitemporal images. This reduces the complexity of the neural architecture (and consequently increases the generalization capabilities of the classifier) and relaxes the requirements relating to the number of training patterns to be used for classifier learning. Experimental results (obtained on a multitemporal series of European Remote Sensing 1 satellite SAR images) confirm the effectiveness of the proposed system, which exhibits both high classification accuracy and good stability versus parameter settings. These results also point out that properly integrating a pattern recognition procedure (based on machine learning) with an accurate feature extraction phase (based on the SAR sensor physics understanding) represents an effective approach to SAR data analysis.
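Per-pixel temporal features in the spirit of the backscattering temporal variability concept can be sketched as follows. This is a toy with invented names: the paper's features also rely on long-term coherence, which requires complex SAR image pairs and is not reproduced here:

```python
from statistics import mean, pstdev

def multitemporal_features(series):
    """series: one pixel's backscatter values across the multitemporal SAR
    stack. Returns a few compact features that summarize the whole series,
    so the classifier input stays much smaller than the number of images:
    a stable surface keeps the temporal standard deviation low, while a
    seasonally changing cover class (e.g. crops) raises it."""
    mu = mean(series)
    sigma = pstdev(series)
    cv = sigma / mu if mu else 0.0  # relative temporal variability
    return {"temporal_mean": mu, "temporal_stdev": sigma, "variability": cv}
```

Feeding such condensed features, rather than the full image series, to the radial basis function network is what reduces the architecture's complexity and its demand for training patterns.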