
Showing papers on "Contextual image classification published in 2002"


Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation is derived that allows detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis is presented.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.
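
For illustration, here is a minimal numpy sketch of the rotation-invariant "uniform" LBP occurrence histogram for the simplest configuration (8 neighbors at radius 1, nearest-pixel sampling). The operator in the paper samples circular neighborhoods with interpolation and generalizes to any number of neighbors and any radius, so treat this as a simplified sketch rather than the authors' implementation.

```python
import numpy as np

def lbp_riu2_histogram(img, P=8):
    """Rotation-invariant 'uniform' LBP (riu2) occurrence histogram for an
    8-neighbour, radius-1 ring with nearest-pixel sampling (the full operator
    samples circular neighbourhoods with interpolation for any (P, R))."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    # 8 circular neighbours (row, col offsets) in consecutive angular order
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    center = img[1:-1, 1:-1]
    # Sign of neighbour minus centre: invariant to any monotonic grey-scale change
    bits = np.stack([(img[1 + dr:H - 1 + dr, 1 + dc:W - 1 + dc] >= center).astype(int)
                     for dr, dc in offsets], axis=-1)
    # 'Uniform' patterns have at most two 0/1 transitions around the circle;
    # their rotation-invariant code is the number of 1-bits, all others map to P + 1
    transitions = np.sum(bits != np.roll(bits, 1, axis=-1), axis=-1)
    codes = np.where(transitions <= 2, bits.sum(axis=-1), P + 1)
    hist = np.bincount(codes.ravel(), minlength=P + 2).astype(float)
    return hist / hist.sum()   # the occurrence histogram used as the texture feature
```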

14,245 citations


Journal ArticleDOI
TL;DR: A novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic is presented, in which a neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings.
Abstract: We present a novel algorithm for fuzzy segmentation of magnetic resonance imaging (MRI) data and estimation of intensity inhomogeneities using fuzzy logic. MRI intensity inhomogeneities can be attributed to imperfections in the radio-frequency coils or to problems associated with the acquisition sequences. The result is a slowly varying shading artifact over the image that can produce errors with conventional intensity-based classification. Our algorithm is formulated by modifying the objective function of the standard fuzzy c-means (FCM) algorithm to compensate for such inhomogeneities and to allow the labeling of a pixel (voxel) to be influenced by the labels in its immediate neighborhood. The neighborhood effect acts as a regularizer and biases the solution toward piecewise-homogeneous labelings. Such a regularization is useful in segmenting scans corrupted by salt and pepper noise. Experimental results on both synthetic images and MR data are given to demonstrate the effectiveness and efficiency of the proposed algorithm.
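
The general shape of such a neighborhood-regularized fuzzy c-means objective can be written as follows. The notation is ours (memberships u_ik, centroids v_i, observed intensities y_k, bias/inhomogeneity term b_k, neighborhood N_k of pixel k, weight alpha), and the exact formulation in the paper may differ.

```latex
J \;=\; \sum_{i=1}^{C}\sum_{k=1}^{N} u_{ik}^{\,p}
        \Bigl[\, \lVert y_k - b_k - v_i \rVert^{2}
        \;+\; \frac{\alpha}{|\mathcal{N}_k|}
              \sum_{r \in \mathcal{N}_k} \lVert y_r - b_r - v_i \rVert^{2} \Bigr],
\qquad \text{subject to } \sum_{i=1}^{C} u_{ik} = 1 .
```

The second term couples each pixel's membership to the intensities in its neighborhood, which is what biases the solution toward piecewise-homogeneous labelings.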

1,786 citations


Journal ArticleDOI
TL;DR: Algorithms for vision-based detection and classification of vehicles in monocular image sequences of traffic scenes recorded by a stationary camera are presented, based on the establishment of correspondences between regions and vehicles as the vehicles move through the image sequence.
Abstract: This paper presents algorithms for vision-based detection and classification of vehicles in monocular image sequences of traffic scenes recorded by a stationary camera. Processing is done at three levels: raw images, region level, and vehicle level. Vehicles are modeled as rectangular patches with certain dynamic behavior. The proposed method is based on the establishment of correspondences between regions and vehicles, as the vehicles move through the image sequence. Experimental results from highway scenes are provided which demonstrate the effectiveness of the method. We also briefly describe an interactive camera calibration tool that we have developed for recovering the camera parameters using features in the image selected by the user.

833 citations


Proceedings ArticleDOI
20 May 2002
TL;DR: This work describes a representation of gait appearance based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion that contains enough information to perform well on human identification and gender classification tasks.
Abstract: We describe a representation of gait appearance for the purpose of person identification and classification. This gait representation is based on simple features such as moments extracted from orthogonal view video silhouettes of human walking motion. Despite its simplicity, the resulting feature vector contains enough information to perform well on human identification and gender classification tasks. We explore the recognition behaviors of two different methods to aggregate features over time under different recognition tasks. We demonstrate the accuracy of recognition using gait video sequences collected over different days and times and under varying lighting environments. In addition, we show results for gender classification based on our gait appearance features using a support-vector machine.
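
As a rough illustration of moment-type silhouette features (our own simplified example, not the paper's exact feature set), the centroid, second-order central moments, and orientation of a binary silhouette can be computed per frame and then aggregated over the sequence:

```python
import numpy as np

def silhouette_moments(mask):
    """Centroid, second-order central moments, and major-axis orientation of a
    binary silhouette (one frame). Illustrative only; the paper's gait
    representation aggregates region-wise moment features over a sequence."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()                    # centroid
    mu20 = ((xs - cx) ** 2).mean()                   # spread in x
    mu02 = ((ys - cy) ** 2).mean()                   # spread in y
    mu11 = ((xs - cx) * (ys - cy)).mean()            # covariance term
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # orientation of the major axis
    return np.array([cx, cy, mu20, mu02, mu11, theta])

# Per-frame features can then be averaged (or otherwise aggregated) over the
# walking sequence to form a fixed-length gait feature vector.
```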

775 citations


Journal ArticleDOI
TL;DR: Nonlinear support vector machines are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from the FERET (FacE REcognition Technology) face database, demonstrating robustness and stability with respect to scale and the degree of facial detail.
Abstract: Nonlinear support vector machines (SVMs) are investigated for appearance-based gender classification with low-resolution "thumbnail" faces processed from 1,755 images from the FERET (FacE REcognition Technology) face database. The performance of SVMs (3.4% error) is shown to be superior to traditional pattern classifiers (linear, quadratic, Fisher linear discriminant, nearest-neighbor) as well as more modern techniques, such as radial basis function (RBF) classifiers and large ensemble-RBF networks. Furthermore, the difference in classification performance with low-resolution "thumbnails" (21×12 pixels) and the corresponding higher-resolution images (84×48 pixels) was found to be only 1%, thus demonstrating robustness and stability with respect to scale and the degree of facial detail.
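
A minimal sketch of appearance-based gender classification with a nonlinear SVM on flattened thumbnails; scikit-learn, the file names, and the hyperparameters are our assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical files: flattened 21x12 grayscale thumbnails and 0/1 gender labels
X = np.load("thumbnails.npy")
y = np.load("genders.npy")

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Nonlinear (RBF-kernel) SVM on raw pixel intensities, standardized per feature
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
clf.fit(X_tr, y_tr)
print("error rate: %.1f%%" % (100 * (1 - clf.score(X_te, y_te))))
```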

641 citations


Journal ArticleDOI
TL;DR: An active shape model segmentation scheme is presented that is steered by optimal local features, contrary to normalized first order derivative profiles, as in the original formulation, using a nonlinear kNN-classifier to find optimal displacements for landmarks.
Abstract: An active shape model segmentation scheme is presented that is steered by optimal local features, contrary to normalized first order derivative profiles, as in the original formulation [Cootes and Taylor, 1995, 1999, and 2001]. A nonlinear kNN-classifier is used, instead of the linear Mahalanobis distance, to find optimal displacements for landmarks. For each of the landmarks that describe the shape, at each resolution level taken into account during the segmentation optimization procedure, a distinct set of optimal features is determined. The selection of features is automatic, using the training images and sequential feature forward and backward selection. The new approach is tested on synthetic data and in four medical segmentation tasks: segmenting the right and left lung fields in a database of 230 chest radiographs, and segmenting the cerebellum and corpus callosum in a database of 90 slices from MRI brain images. In all cases, the new method produces significantly better results in terms of an overlap error measure (p<0.001 using a paired T-test) than the original active shape model scheme.

592 citations


Journal ArticleDOI
TL;DR: The ability of SVM to outperform several well-known methods developed for the widely studied problem of MC detection suggests that SVM is a promising technique for object detection in a medical imaging application.
Abstract: We investigate an approach based on support vector machines (SVMs) for detection of microcalcification (MC) clusters in digital mammograms, and propose a successive enhancement learning scheme for improved performance. SVM is a machine-learning method, based on the principle of structural risk minimization, which performs well when applied to data outside the training set. We formulate MC detection as a supervised-learning problem and apply SVM to develop the detection algorithm. We use the SVM to detect at each location in the image whether an MC is present or not. We tested the proposed method using a database of 76 clinical mammograms containing 1120 MCs. We use free-response receiver operating characteristic curves to evaluate detection performance, and compare the proposed algorithm with several existing methods. In our experiments, the proposed SVM framework outperformed all the other methods tested. In particular, a sensitivity as high as 94% was achieved by the SVM method at an error rate of one false-positive cluster per image. The ability of SVM to outperform several well-known methods developed for the widely studied problem of MC detection suggests that SVM is a promising technique for object detection in a medical imaging application.
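
At test time, a per-location SVM detector of this kind can be applied with a sliding window; the window size, stride, threshold, and raw-pixel features below are assumptions of this sketch, not the paper's configuration.

```python
import numpy as np

def detect_mc_candidates(image, clf, win=9, stride=1, threshold=0.0):
    """Slide a small window over the mammogram and evaluate a trained binary
    classifier at every location; clf is assumed to expose a decision_function,
    as an SVM trained on MC / non-MC windows would. Returns the centres and
    scores of windows whose score exceeds `threshold`."""
    H, W = image.shape
    detections = []
    for r in range(0, H - win + 1, stride):
        for c in range(0, W - win + 1, stride):
            patch = image[r:r + win, c:c + win].reshape(1, -1)
            score = clf.decision_function(patch)[0]   # signed distance to the SVM boundary
            if score > threshold:
                detections.append((r + win // 2, c + win // 2, score))
    return detections
```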

574 citations


Dissertation
01 Jan 2002

570 citations


Journal ArticleDOI
TL;DR: A new automated method that performs unsupervised pixel purity determination and endmember extraction from multidimensional datasets; this is achieved by using both spatial and spectral information in a combined manner.
Abstract: Spectral mixture analysis provides an efficient mechanism for the interpretation and classification of remotely sensed multidimensional imagery. It aims to identify a set of reference signatures (also known as endmembers) that can be used to model the reflectance spectrum at each pixel of the original image. Thus, the modeling is carried out as a linear combination of a finite number of ground components. Although spectral mixture models have proved to be appropriate for the purpose of large hyperspectral dataset subpixel analysis, few methods are available in the literature for the extraction of appropriate endmembers in spectral unmixing. Most approaches have been designed from a spectroscopic viewpoint and, thus, tend to neglect the existing spatial correlation between pixels. This paper presents a new automated method that performs unsupervised pixel purity determination and endmember extraction from multidimensional datasets; this is achieved by using both spatial and spectral information in a combined manner. The method is based on mathematical morphology, a classic image processing technique that can be applied to the spectral domain while being able to keep its spatial characteristics. The proposed methodology is evaluated through a specifically designed framework that uses both simulated and real hyperspectral data.

556 citations


Journal ArticleDOI
TL;DR: This work applies the multiresolution wavelet transform to extract the waveletface and performs the linear discriminant analysis on waveletfaces to reinforce discriminant power.
Abstract: Feature extraction, discriminant analysis, and classification rules are three crucial issues for face recognition. We present hybrid approaches to handle these three issues together. For feature extraction, we apply the multiresolution wavelet transform to extract the waveletface. We also perform the linear discriminant analysis on waveletfaces to reinforce discriminant power. During classification, the nearest feature plane (NFP) and nearest feature space (NFS) classifiers are explored for robust decisions in the presence of wide facial variations. Their relationships to conventional nearest neighbor and nearest feature line classifiers are demonstrated. In the experiments, the discriminant waveletface incorporated with the NFS classifier achieves the best face recognition performance.

483 citations


Journal ArticleDOI
Rainer Lienhart, A. Wernicke
TL;DR: This work proposes a novel method for localizing and segmenting text in complex images and videos that is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video.
Abstract: Many images, especially those used for page design on Web pages, as well as videos contain visible text. If these text occurrences could be detected, segmented, and recognized automatically, they would be a valuable source of high-level semantics for indexing and retrieval. We propose a novel method for localizing and segmenting text in complex images and videos. Text lines are identified by using a complex-valued multilayer feed-forward network trained to detect text at a fixed scale and position. The network's output at all scales and positions is integrated into a single text-saliency map, serving as a starting point for candidate text lines. In the case of video, these candidate text lines are refined by exploiting the temporal redundancy of text in video. Localized text lines are then scaled to a fixed height of 100 pixels and segmented into a binary image with black characters on white background. For videos, temporal redundancy is exploited to improve segmentation performance. Input images and videos can be of any size due to a true multiresolution approach. Moreover, the system is not only able to locate and segment text occurrences into large binary images, but is also able to track each text line with sub-pixel accuracy over the entire occurrence in a video, so that one text bitmap is created for all instances of that text line. Therefore, our text segmentation results can also be used for object-based video encoding such as that enabled by MPEG-4.

Journal ArticleDOI
TL;DR: Experiments show that anomaly classification performs very differently from anomaly detection, which can be implemented in a three-stage process, first by anomaly detection to find potential targets, followed by target discrimination to cluster the detected anomalies into separate target classes, and concluded by a classifier to achieve target classification.
Abstract: Anomaly detection becomes increasingly important in hyperspectral image analysis, since hyperspectral imagers can now uncover many material substances which were previously unresolved by multispectral sensors. Two types of anomaly detection are of interest and considered in this paper. One was previously developed by Reed and Yu to detect targets whose signatures are distinct from their surroundings. Another was designed to detect targets with low probabilities in an unknown image scene. Interestingly, they both take the same form as a matched filter. Moreover, they can be implemented in real-time processing, provided that the sample covariance matrix is replaced by the sample correlation matrix. One disadvantage of an anomaly detector is the lack of ability to discriminate the detected targets from one another. In order to resolve this problem, the concept of target discrimination measures is introduced to cluster different types of anomalies into separate target classes. By using these class means as target information, the detected anomalies can be further classified. With inclusion of target discrimination in anomaly detection, anomaly classification can be implemented in a three-stage process, first by anomaly detection to find potential targets, followed by target discrimination to cluster the detected anomalies into separate target classes, and concluded by a classifier to achieve target classification. Experiments show that anomaly classification performs very differently from anomaly detection.

Journal ArticleDOI
TL;DR: The dyadic discrete wavelet transform approach is shown to significantly increase the overall classification accuracy and is tested using hyperspectral data for various agricultural applications.
Abstract: In this paper, the dyadic discrete wavelet transform is proposed for feature extraction from a high-dimensional data space. The wavelet's inherent multiresolutional properties are discussed in terms related to multispectral and hyperspectral remote sensing. Furthermore, various wavelet-based features are applied to the problem of automatic classification of specific ground vegetations from hyperspectral signatures. The wavelet transform features are evaluated using an automated statistical classifier. The system is tested using hyperspectral data for various agricultural applications. The experimental results demonstrate the promising discriminant capability of the wavelet-based features. The automated classification system consistently provides over 95% and 80% classification accuracy for endmember and mixed-signature applications, respectively. When compared to conventional feature extraction methods, the wavelet transform approach is shown to significantly increase the overall classification accuracy.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The method consists of three major components: image preprocessing, feature extraction, and classifier design; an efficient approach called the nearest feature line (NFL) is used for iris matching.
Abstract: Proposes a method for personal identification based on iris recognition. The method consists of three major components: image preprocessing, feature extraction and classifier design. A bank of circular symmetric filters is used to capture local iris characteristics to form a fixed length feature vector. In iris matching, an efficient approach called nearest feature line (NFL) is used. Constraints are imposed on the original NFL method to improve performance. Experimental results show that the proposed method has an encouraging performance.
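
The nearest feature line distance used for matching has a simple closed form: the query vector is compared against the line through every pair of enrolled vectors of a class. Below is a small numpy sketch following the standard NFL definition, without the paper's additional constraints.

```python
import numpy as np
from itertools import combinations

def nfl_distance(x, prototypes):
    """Distance from a query feature vector x to the nearest feature line
    spanned by any pair of enrolled prototype vectors of one class."""
    best = np.inf
    for x1, x2 in combinations(prototypes, 2):
        d = x2 - x1
        mu = np.dot(x - x1, d) / np.dot(d, d)    # position of the projection on the line
        p = x1 + mu * d                          # projection (foot point) on the feature line
        best = min(best, np.linalg.norm(x - p))
    return best

def classify_nfl(x, class_prototypes):
    """Assign x to the class whose nearest feature line is closest.
    class_prototypes: dict mapping class label -> list of feature vectors."""
    return min(class_prototypes, key=lambda c: nfl_distance(x, class_prototypes[c]))
```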

Journal ArticleDOI
TL;DR: The searching capability of genetic algorithms has been exploited for automatically evolving the number of clusters as well as a proper clustering of any data set, and the proposed technique is able to distinguish some characteristic landcover types in the image.

01 Jan 2002
TL;DR: In this article, the use of principal component analysis (PCA) as a preprocessing technique for the classification of hyperspectral images was studied; the results showed that using only the first few principal component images can yield about a 70 percent correct classification rate.
Abstract: The availability of hyperspectral images expands the capability of using image classification to study detailed characteristics of objects, but at a cost of having to deal with huge data sets. This work studies the use of the principal component analysis as a preprocessing technique for the classification of hyperspectral images. Two hyperspectral data sets, HYDICE and AVIRIS, were used for the study. A brief presentation of the principal component analysis approach is followed by an examination of the information contents of the principal component image bands, which revealed that only the first few bands contain significant information. The use of the first few principal component images can yield about a 70 percent correct classification rate. This study suggests the benefit and efficiency of using the principal component analysis technique as a preprocessing step for the classification of hyperspectral images.
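
A sketch of this preprocess-then-classify pipeline, assuming a hyperspectral cube of shape (rows, cols, bands) and a per-pixel label map; scikit-learn's PCA and a Gaussian quadratic classifier stand in for the specific classifier used in the study.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def pca_then_classify(cube, labels, n_components=5):
    """Project hyperspectral pixels onto the first few principal components,
    then train a simple Gaussian (quadratic) classifier on the labelled pixels.
    cube: (rows, cols, bands) reflectance values; labels: (rows, cols) with
    0 marking unlabelled pixels (an assumption of this sketch)."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    y = labels.ravel()

    pca = PCA(n_components=n_components)
    X_pc = pca.fit_transform(X)                  # the first few PC "images", flattened

    labelled = y > 0
    clf = QuadraticDiscriminantAnalysis().fit(X_pc[labelled], y[labelled])
    class_map = clf.predict(X_pc).reshape(rows, cols)
    return class_map, pca.explained_variance_ratio_
```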

Journal ArticleDOI
TL;DR: Experimental results demonstrate the effectiveness of SVMs in texture classification, and it is shown that SVMs can incorporate conventional texture feature extraction methods within their own architecture, while also providing solutions to problems inherent in these methods.
Abstract: This paper investigates the application of support vector machines (SVMs) in texture classification. Instead of relying on an external feature extractor, the SVM receives the gray-level values of the raw pixels, as SVMs can generalize well even in high-dimensional spaces. Furthermore, it is shown that SVMs can incorporate conventional texture feature extraction methods within their own architecture, while also providing solutions to problems inherent in these methods. One-against-others decomposition is adopted to apply binary SVMs to multitexture classification, plus a neural network is used as an arbitrator to make final classifications from several one-against-others SVM outputs. Experimental results demonstrate the effectiveness of SVMs in texture classification.

Journal ArticleDOI
TL;DR: Several single and multiple classifiers that are appropriate for the classification of multisource remote sensing and geographic data are considered; in the experiments, the multiple classifiers outperform the single classifiers in terms of overall accuracies.
Abstract: The combination of multisource remote sensing and geographic data is believed to offer improved accuracies in land cover classification. For such classification, the conventional parametric statistical classifiers, which have been applied successfully in remote sensing for the last two decades, are not appropriate, since a convenient multivariate statistical model does not exist for the data. In this paper, several single and multiple classifiers, that are appropriate for the classification of multisource remote sensing and geographic data are considered. The focus is on multiple classifiers: bagging algorithms, boosting algorithms, and consensus-theoretic classifiers. These multiple classifiers have different characteristics. The performance of the algorithms in terms of accuracies is compared for two multisource remote sensing and geographic datasets. In the experiments, the multiple classifiers outperform the single classifiers in terms of overall accuracies.
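
scikit-learn offers direct analogues of the bagging and boosting strategies discussed; the sketch below compares them against a single decision tree on synthetic stand-in data, with the base learner and parameters chosen by us rather than taken from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in for stacked multisource features (spectral bands + GIS layers) and land-cover labels
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)

classifiers = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "bagging (50 trees)":   BaggingClassifier(n_estimators=50, random_state=0),
    "boosting (AdaBoost)":  AdaBoostClassifier(n_estimators=50, random_state=0),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: overall accuracy {acc:.3f}")
```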

Proceedings ArticleDOI
20 May 2002
TL;DR: This paper presents progress toward an integrated, robust, real-time face detection and demographic analysis system and combines estimates from many facial detections in order to reduce the error rate.
Abstract: This paper presents progress toward an integrated, robust, real-time face detection and demographic analysis system. Faces are detected and extracted using the fast algorithm proposed by P. Viola and M.J. Jones (2001). Detected faces are passed to a demographic (gender and ethnicity) classifier which uses the same architecture as the face detector. This demographic classifier is extremely fast, and delivers error rates slightly better than the best-known classifiers. To counter the unconstrained and noisy sensing environment, demographic information is integrated across time for each individual. Therefore, the final demographic classification combines estimates from many facial detections in order to reduce the error rate. The entire system processes 10 frames per second on an 800-MHz Intel Pentium III.


Proceedings ArticleDOI
10 Dec 2002
TL;DR: This paper wants to show the possibilities of simple generalizations of the two-class classification, using voting and combinations of approximate posterior probabilities.
Abstract: The generalization from two-class classification to multiclass classification is not straightforward for discriminants which are not based on density estimation. Simple combining methods use voting, but this has the drawback of inconsequent labelings and ties. More advanced methods map the discriminant outputs to approximate posterior probability estimates and combine these, while other methods use error-correcting output codes. In this paper we want to show the possibilities of simple generalizations of the two-class classification, using voting and combinations of approximate posterior probabilities.
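
A minimal sketch of the simplest of these generalizations, one-versus-one voting over binary discriminants, with exact ties broken by the summed decision values (a common heuristic, not necessarily the scheme evaluated in the paper).

```python
import numpy as np
from itertools import combinations
from sklearn.base import clone
from sklearn.svm import LinearSVC

def one_vs_one_vote(X_train, y_train, X_test, base=LinearSVC()):
    """Train one binary discriminant per pair of classes; classify test points
    by majority vote, breaking exact ties with the summed decision values."""
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)))
    scores = np.zeros_like(votes)
    for i, j in combinations(range(len(classes)), 2):
        mask = np.isin(y_train, [classes[i], classes[j]])
        clf = clone(base).fit(X_train[mask], (y_train[mask] == classes[j]).astype(int))
        d = clf.decision_function(X_test)     # positive values favour classes[j]
        votes[:, j] += d > 0
        votes[:, i] += d <= 0
        scores[:, j] += d
        scores[:, i] -= d
    winners = []
    for v, s in zip(votes, scores):
        tied = np.flatnonzero(v == v.max())   # the tie problem mentioned in the abstract
        winners.append(classes[tied[np.argmax(s[tied])]])
    return np.array(winners)
```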

Journal ArticleDOI
TL;DR: In this article, a new dataset from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor was used with support vector machine (SVM)-based algorithms for classification processing.

Proceedings ArticleDOI
20 May 2002
TL;DR: It is contended that the planar dynamics of a walking person are encoded in a 2D plot consisting of the pairwise image similarities of the sequence of images of the person, and that gait recognition can be achieved via standard pattern classification of these plots.
Abstract: A motion-based, correspondence-free technique for human gait recognition in monocular video is presented. We contend that the planar dynamics of a walking person are encoded in a 2D plot consisting of the pairwise image similarities of the sequence of images of the person, and that gait recognition can be achieved via standard pattern classification of these plots. We use background modelling to track the person for a number of frames and extract a sequence of segmented images of the person. The self-similarity plot is computed via correlation of each pair of images in this sequence. For recognition, the method applies principal component analysis to reduce the dimensionality of the plots, then uses the k-nearest neighbor rule in this reduced space to classify an unknown person. This method is robust to tracking and segmentation errors, and to variation in clothing and background. It is also invariant to small changes in camera viewpoint and walking speed. The method is tested on outdoor sequences of 44 people with 4 sequences of each taken on two different days, and achieves a classification rate of 77%. It is also tested on indoor sequences of 7 people walking on a treadmill, taken from 8 different viewpoints and on 7 different days. A classification rate of 78% is obtained for near-fronto-parallel views, and 65% on average over all views.
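
A sketch of the core self-similarity computation, assuming a sequence of equally sized segmented frames; normalized cross-correlation is used here as the pairwise similarity, which may differ from the paper's exact measure.

```python
import numpy as np

def self_similarity_plot(frames):
    """Pairwise similarity matrix of a sequence of segmented images.
    frames: array of shape (T, H, W). Returns a (T, T) plot in which the
    periodic planar dynamics of the walk show up as a lattice-like pattern."""
    T = len(frames)
    F = frames.reshape(T, -1).astype(float)
    F -= F.mean(axis=1, keepdims=True)
    F /= np.linalg.norm(F, axis=1, keepdims=True) + 1e-12
    return F @ F.T          # normalized cross-correlation of every pair of frames

# For recognition, the plots can be flattened, reduced with PCA, and classified
# with a k-nearest-neighbour rule in the reduced space, as the abstract describes.
```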

Journal ArticleDOI
TL;DR: An adaptive Bayesian contextual classification procedure that utilizes both spectral and spatial interpixel dependency contexts in estimation of statistics and classification is proposed, which can reach classification accuracies similar to that obtained by a pixelwise maximum likelihood pixel classifier with a very large training sample set.
Abstract: An adaptive Bayesian contextual classification procedure that utilizes both spectral and spatial interpixel dependency contexts in estimation of statistics and classification is proposed. Essentially, this classifier is the constructive coupling of an adaptive classification procedure and a Bayesian contextual classification procedure. In this classifier, the joint prior probabilities of the classes of each pixel and its spatial neighbors are modeled by the Markov random field. The estimation of statistics and classification are performed in a recursive manner to allow the establishment of the positive-feedback process in a computationally efficient manner. Experiments with real hyperspectral data show that, starting with a small training sample set, this classifier can reach classification accuracies similar to that obtained by a pixelwise maximum likelihood pixel classifier with a very large training sample set. Additionally, classification maps are produced that have significantly less speckle error.

Proceedings ArticleDOI
07 Nov 2002
TL;DR: Extensive experimentation and comparisons using real data, different features, and different classifiers demonstrate the superiority of the proposed approach which has achieved an average accuracy of 94.81% on completely novel test images.
Abstract: On-road vehicle detection is an important problem with application to driver assistance systems and autonomous, self-guided vehicles. The focus of this paper is on the problem of feature extraction and classification for rear-view vehicle detection. Specifically, we propose using Gabor filters for vehicle feature extraction and support vector machines (SVM) for vehicle detection. Gabor filters provide a mechanism for obtaining some degree of invariance to intensity due to global illumination, selectivity in scale, and selectivity in orientation. Basically, they are orientation and scale tunable edge and line detectors. Vehicles do contain strong edges and lines at different orientation and scales, thus, the statistics of these features (e.g., mean, standard deviation, and skewness) could be very powerful for vehicle detection. To provide robustness, these statistics are not extracted from the whole image but rather are collected from several subimages obtained by subdividing the original image into subwindows. These features are then used to train a SVM classifier. Extensive experimentation and comparisons using real data, different features (e.g., based on principal components analysis (PCA)), and different classifiers (e.g., neural networks (NN)) demonstrate the superiority of the proposed approach which has achieved an average accuracy of 94.81% on completely novel test images.
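
A sketch of Gabor-statistics feature extraction of the kind described above; the kernel parameters, number of orientations, and subwindow grid are our assumptions rather than the paper's configuration.

```python
import numpy as np
from scipy.ndimage import convolve
from scipy.stats import skew

def gabor_kernel(ksize=17, sigma=3.0, theta=0.0, lam=8.0, gamma=0.5):
    """Real part of a Gabor kernel with orientation theta and wavelength lam."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lam)

def gabor_statistics(window, n_orient=4, grid=3):
    """Mean, standard deviation, and skewness of Gabor responses, collected from
    a grid x grid set of subwindows and concatenated into one feature vector."""
    feats = []
    for k in range(n_orient):
        resp = convolve(window.astype(float), gabor_kernel(theta=k * np.pi / n_orient))
        h, w = resp.shape
        for i in range(grid):
            for j in range(grid):
                sub = resp[i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid].ravel()
                feats += [sub.mean(), sub.std(), skew(sub)]
    return np.array(feats)

# Feature vectors computed this way for vehicle and non-vehicle windows can then
# be used to train a classifier such as sklearn.svm.SVC(kernel="rbf").
```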

Journal ArticleDOI
01 Nov 2002
TL;DR: A robust support vector machine for pattern classification, which aims at solving the over-fitting problem when outliers exist in the training data set, and the generalization performance is improved significantly compared to that of the standard SVM training.
Abstract: This paper proposes a robust support vector machine for pattern classification, which aims at solving the over-fitting problem when outliers exist in the training data set. During the robust training phase, the distance between each data point and the center of class is used to calculate the adaptive margin. The incorporation of the average techniques to the standard support vector machine (SVM) training makes the decision function less detoured by outliers, and controls the amount of regularization automatically. Experiments for the bullet hole classification problem show that the number of the support vectors is reduced, and the generalization performance is improved significantly compared to that of the standard SVM training.

Journal ArticleDOI
TL;DR: For the linear discrimination of two stimuli in white Gaussian noise in the presence of internal noise, a method is described for estimating linear classification weights from the sum of noise images segregated by stimulus and response.
Abstract: For the linear discrimination of two stimuli in white Gaussian noise in the presence of internal noise, a method is described for estimating linear classification weights from the sum of noise images segregated by stimulus and response. The recommended method for combining the two response images for the same stimulus is to difference the average images. Weights are derived for combining images over stimuli and observers. Methods for estimating the level of internal noise are described with emphasis on the case of repeated presentations of the same noise sample. Simple tests for particular hypotheses about the weights are shown based on observer agreement with a noiseless version of the hypothesis.
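
The recommended weight-estimation rule, differencing the average noise images for the two responses to each stimulus and then combining over stimuli, is compact enough to sketch directly, assuming the trial-wise noise fields and 0/1 stimulus and response codes are available.

```python
import numpy as np

def classification_image(noise, stimulus, response):
    """Estimate linear classification weights from trial-wise noise fields.
    noise: (n_trials, H, W) noise images; stimulus, response: 0/1 arrays.
    For each stimulus, the average noise images for the two responses are
    differenced, and the two differences are then summed over stimuli."""
    weights = np.zeros(noise.shape[1:])
    for s in (0, 1):
        mean_r1 = noise[(stimulus == s) & (response == 1)].mean(axis=0)
        mean_r0 = noise[(stimulus == s) & (response == 0)].mean(axis=0)
        weights += mean_r1 - mean_r0
    return weights
```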

Journal ArticleDOI
TL;DR: An image analysis and feature extraction algorithm was developed based on an expert's image reading, and the automatically extracted features give the expert new insights into the necessary features and the classification knowledge.

Journal ArticleDOI
01 Jul 2002
TL;DR: This paper examines the research issues in image mining and current developments in image mining, particularly image mining frameworks, state-of-the-art techniques, and systems, and identifies some future research directions for image mining.
Abstract: Advances in image acquisition and storage technology have led to tremendous growth in very large and detailed image databases. These images, if analyzed, can reveal useful information to the human users. Image mining deals with the extraction of implicit knowledge, image data relationship, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to image domain. It is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, database, and artificial intelligence. In this paper, we will examine the research issues in image mining, current developments in image mining, particularly, image mining frameworks, state-of-the-art techniques and systems. We will also identify some future research directions for image mining.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: Results on combining depth information from a laser range-finder and color and texture image cues to segment ill-structured dirt, gravel, and asphalt roads as input to an autonomous road following system are described.
Abstract: We describe results on combining depth information from a laser range-finder and color and texture image cues to segment ill-structured dirt, gravel, and asphalt roads as input to an autonomous road following system. A large number of registered laser and camera images were captured at frame-rate on a variety of rural roads, allowing laser features such as 3-D height and smoothness to be correlated with image features such as color histograms and Gabor filter responses. A small set of road models was generated by training separate neural networks on labeled feature vectors clustered by road "type." By first classifying the type of a novel road image, an appropriate second-stage classifier was selected to segment individual pixels, achieving a high degree of accuracy on arbitrary images from the dataset. Segmented images combined with laser range information and the vehicle's inertial navigation data were used to construct 3-D maps suitable for path planning.