Showing papers on "Contextual image classification published in 2005"
••
TL;DR: In this paper, the authors present a systematic survey of the common processing steps and core decision rules in modern change detection algorithms, including significance and hypothesis testing, predictive models, the shading model, and background modeling.
Abstract: Detecting regions of change in multiple images of the same scene taken at different times is of widespread interest due to a large number of applications in diverse disciplines, including remote sensing, surveillance, medical diagnosis and treatment, civil infrastructure, and underwater sensing. This paper presents a systematic survey of the common processing steps and core decision rules in modern change detection algorithms, including significance and hypothesis testing, predictive models, the shading model, and background modeling. We also discuss important preprocessing methods, approaches to enforcing the consistency of the change mask, and principles for evaluating and comparing the performance of change detection algorithms. It is hoped that our classification of algorithms into a relatively small number of categories will provide useful guidance to the algorithm designer.
1,693 citations
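As a rough illustration of the simple-differencing-plus-significance-test decision rule that falls under one of the categories surveyed above, here is a minimal sketch; the robust noise estimate and the z-score threshold are illustrative assumptions, not the survey's prescription.

```python
import numpy as np

def change_mask(img1, img2, z_thresh=3.0):
    """Flag pixels whose temporal difference is significant relative to noise.

    img1, img2: co-registered grayscale images of the same scene (arrays).
    """
    diff = img2.astype(float) - img1.astype(float)
    med = np.median(diff)
    # Robust noise estimate via the median absolute deviation of the difference.
    sigma = 1.4826 * np.median(np.abs(diff - med))
    return np.abs(diff - med) > z_thresh * max(sigma, 1e-12)
```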
••
TL;DR: A method based on mathematical morphology for preprocessing of the hyperspectral data is proposed, using opening and closing morphological transforms to isolate bright and dark structures in images, where bright/dark means brighter/darker than the surrounding features in the images.
Abstract: Classification of hyperspectral data with high spatial resolution from urban areas is investigated. A method based on mathematical morphology for preprocessing of the hyperspectral data is proposed. In this approach, opening and closing morphological transforms are used in order to isolate bright (opening) and dark (closing) structures in images, where bright/dark means brighter/darker than the surrounding features in the images. A morphological profile is constructed based on the repeated use of openings and closings with a structuring element of increasing size, starting with one original image. In order to apply the morphological approach to hyperspectral data, principal components of the hyperspectral imagery are computed. The most significant principal components are used as base images for an extended morphological profile, i.e., a profile based on more than one original image. In experiments, two hyperspectral urban datasets are classified. The proposed method is used as a preprocessing method for a neural network classifier and compared to more conventional classification methods with different types of statistical computations and feature extraction.
1,308 citations
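A minimal sketch of the morphological profile described above, using scikit-image openings and closings on a single base image (for example, one principal component of the hyperspectral cube); the structuring-element radii are illustrative.

```python
import numpy as np
from skimage.morphology import opening, closing, disk

def morphological_profile(base_image, radii=(2, 4, 6, 8)):
    """Stack the base image with openings/closings by structuring elements of increasing size."""
    layers = [base_image]
    for r in radii:
        se = disk(r)
        layers.append(opening(base_image, se))  # keeps bright structures larger than se
        layers.append(closing(base_image, se))  # keeps dark structures larger than se
    return np.stack(layers)  # an "extended" profile concatenates such stacks over several PCs
```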
••
[...]
TL;DR: This work considers tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background, and combines them into a strong classifier using AdaBoost.
Abstract: We consider tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background. The ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as either belonging to the object or the background, giving a confidence map. The peak of the map, and hence the new position of the object, is found using mean shift. Temporal coherence is maintained by updating the ensemble with new weak classifiers that are trained online during tracking. We show a realization of this method and demonstrate it on several video sequences.
1,143 citations
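A compact sketch of the per-frame step (train a boosted ensemble on object vs. background pixels, then score the next frame); the per-pixel feature extraction, the mean-shift peak search, and the online ensemble update are omitted, and scikit-learn's batch AdaBoost stands in for the paper's online-trained weak classifiers.

```python
from sklearn.ensemble import AdaBoostClassifier

def confidence_map(curr_feats, curr_labels, next_feats, shape):
    """curr_feats: (n_pixels, d) per-pixel features; curr_labels: 1 = object, 0 = background.
    next_feats: per-pixel features of the next frame in raster order; shape: (H, W)."""
    clf = AdaBoostClassifier(n_estimators=20)    # ensemble of weak (stump) classifiers
    clf.fit(curr_feats, curr_labels)
    conf = clf.predict_proba(next_feats)[:, 1]   # per-pixel object confidence
    return conf.reshape(shape)                   # mean shift would then locate the peak
```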
••
17 Oct 2005TL;DR: This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and develops a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).
Abstract: We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. (2003) on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
1,129 citations
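A rough sketch of the bag-of-visual-words pipeline this line of work relies on. Scikit-learn offers no pLSA, so LatentDirichletAllocation is used here as a stand-in topic model; the vocabulary size and topic count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

def discover_topics(descriptors_per_image, n_words=300, n_topics=5):
    """descriptors_per_image: list of (n_i, d) arrays of SIFT-like descriptors per image.
    Returns each image's topic mixture."""
    vocab = KMeans(n_clusters=n_words, n_init=4).fit(np.vstack(descriptors_per_image))
    counts = np.zeros((len(descriptors_per_image), n_words))
    for i, desc in enumerate(descriptors_per_image):
        words, freq = np.unique(vocab.predict(desc), return_counts=True)
        counts[i, words] = freq                  # bag-of-visual-words histogram
    return LatentDirichletAllocation(n_components=n_topics).fit_transform(counts)
```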
••
TL;DR: This work investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier system, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited.
Abstract: Statistical classification of hyperspectral data is challenging because the inputs are high in dimension and represent multiple classes that are sometimes quite mixed, while the amount and quality of ground truth in the form of labeled data is typically limited. The resulting classifiers are often unstable and have poor generalization. This work investigates two approaches based on the concept of random forests of classifiers implemented within a binary hierarchical multiclassifier system, with the goal of achieving improved generalization of the classifier in analysis of hyperspectral data, particularly when the quantity of training data is limited. A new classifier is proposed that incorporates bagging of training samples and adaptive random subspace feature selection within a binary hierarchical classifier (BHC), such that the number of features that is selected at each node of the tree is dependent on the quantity of associated training data. Results are compared to a random forest implementation based on the framework of classification and regression trees. For both methods, classification results obtained from experiments on data acquired by the National Aeronautics and Space Administration (NASA) Airborne Visible/Infrared Imaging Spectrometer instrument over the Kennedy Space Center, Florida, and by Hyperion on the NASA Earth Observing 1 satellite over the Okavango Delta of Botswana are superior to those from the original best basis BHC algorithm and a random subspace extension of the BHC.
984 citations
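For orientation, a minimal random-forest baseline on labeled pixel spectra, using synthetic stand-in data; the paper's actual contribution, the hierarchical BHC variant with adaptive random subspace selection, is not implemented here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled hyperspectral pixels: rows are spectra, columns are bands.
X, y = make_classification(n_samples=2000, n_features=180, n_informative=30,
                           n_classes=5, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging of training samples plus a random feature subset at each split is the
# random-forest idea that the paper's hierarchical (BHC) variant builds on.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", oob_score=True)
rf.fit(X_train, y_train)
print("out-of-bag accuracy:", rf.oob_score_)
print("test accuracy:", rf.score(X_test, y_test))
```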
••
17 Oct 2005
TL;DR: An optimally compact visual dictionary is learned by pair-wise merging of visual words from an initially large dictionary, and a novel statistical measure of discrimination is proposed which is optimized by each merge operation.
Abstract: This paper presents a new algorithm for the automatic recognition of object classes from images (categorization). Compact and yet discriminative appearance-based object class models are automatically learned from a set of training images. The method is simple and extremely fast, making it suitable for many applications such as semantic image retrieval, Web search, and interactive image editing. It classifies a region according to the proportions of different visual words (clusters in feature space). The specific visual words and the typical proportions in each object are learned from a segmented training set. The main contribution of this paper is twofold: i) an optimally compact visual dictionary is learned by pair-wise merging of visual words from an initially large dictionary. The final visual words are described by GMMs. ii) A novel statistical measure of discrimination is proposed which is optimized by each merge operation. High classification accuracy is demonstrated for nine object classes on photographs of real objects viewed under general lighting conditions, poses and viewpoints. The set of test images used for validation comprises: i) photographs acquired by us, ii) images from the Web, and iii) images from the recently released Pascal dataset. The proposed algorithm performs well on both texture-rich objects (e.g. grass, sky, trees) and structure-rich ones (e.g. cars, bikes, planes).
968 citations
••
17 Oct 2005
TL;DR: It is shown that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or mutual information based feature selection starting from a dense codebook further improves the performance.
Abstract: Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for texture analysis and scene classification. Codebooks are usually constructed by using a method such as k-means to cluster the descriptor vectors of patches sampled either densely ('textons') or sparsely ('bags of features' based on key-points or salience measures) from a set of training images. This works well for texture analysis in homogeneous images, but the images that arise in natural object recognition tasks have far less uniform statistics. We show that for dense sampling, k-means over-adapts to this, clustering centres almost exclusively around the densest few regions in descriptor space and thus failing to code other informative regions. This gives suboptimal codes that are no better than using randomly selected centres. We describe a scalable acceptance-radius based clusterer that generates better codebooks and study its performance on several image classification tasks. We also show that dense representations outperform equivalent keypoint based ones on these tasks and that SVM or mutual information based feature selection starting from a dense codebook further improves the performance.
817 citations
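A toy sketch of a greedy acceptance-radius clusterer in the spirit described above (the paper's clusterer is more elaborate and designed to scale); the radius is an assumed parameter that would be tuned to the descriptor space.

```python
import numpy as np

def radius_codebook(descriptors, radius):
    """Greedy acceptance-radius clustering of local descriptors.

    A descriptor starts a new codebook centre unless it lies within `radius`
    of an existing centre, spreading centres more evenly over descriptor
    space than k-means, which over-adapts to the densest regions.
    """
    centres = []
    for d in descriptors:
        if not centres or np.min(np.linalg.norm(np.asarray(centres) - d, axis=1)) > radius:
            centres.append(d)
    return np.asarray(centres)
```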
••
TL;DR: This work finds that color quantization can be as low as 64 bins per channel, although higher histogram sizes give better segmentation performance, and that the Bayesian classifier with the histogram technique and the multilayer perceptron classifier perform better than the other tested classifiers.
Abstract: This work presents a study of three important issues of the color pixel classification approach to skin segmentation: color representation, color quantization, and classification algorithm. Our analysis of several representative color spaces using the Bayesian classifier with the histogram technique shows that skin segmentation based on color pixel classification is largely unaffected by the choice of the color space. However, segmentation performance degrades when only chrominance channels are used in classification. Furthermore, we find that color quantization can be as low as 64 bins per channel, although higher histogram sizes give better segmentation performance. The Bayesian classifier with the histogram technique and the multilayer perceptron classifier are found to perform better compared to other tested classifiers, including three piecewise linear classifiers, three unimodal Gaussian classifiers, and a Gaussian mixture classifier.
810 citations
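A minimal sketch of the Bayesian histogram pixel classifier studied above, assuming 8-bit RGB input; the 64-bins-per-channel quantization matches the paper's finding, while the skin prior is an illustrative assumption.

```python
import numpy as np

def fit_histogram(pixels, bins=64):
    """3D colour histogram (bins per channel) normalised to a probability.

    pixels: (n, 3) array of 8-bit RGB training pixels (skin or non-skin).
    """
    hist, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    return hist / hist.sum()

def classify_skin(image, skin_hist, nonskin_hist, prior_skin=0.3, bins=64):
    """Label each pixel skin/non-skin with a Bayesian likelihood-ratio test."""
    idx = np.minimum((image // (256 // bins)).astype(int), bins - 1)
    p_skin = skin_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    p_nonskin = nonskin_hist[idx[..., 0], idx[..., 1], idx[..., 2]]
    return p_skin * prior_skin > p_nonskin * (1 - prior_skin)
```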
••
17 Oct 2005
TL;DR: A new model, TSI-pLSA, is developed, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner, and can handle the high intra-class variability and large proportion of unrelated images returned by search engines.
Abstract: Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand-prepared datasets.
807 citations
••
20 Jun 2005
TL;DR: The system operates in real time, achieves 93% correct generalization to novel subjects for a 7-way forced choice on the Cohn-Kanade expression dataset, and classifies facial action units with a mean accuracy of 94.8%.
Abstract: We present a systematic comparison of machine learning methods applied to the problem of fully automatic recognition of facial expressions. We report results on a series of experiments comparing recognition engines, including AdaBoost, support vector machines, and linear discriminant analysis. We also explored feature selection techniques, including the use of AdaBoost for feature selection prior to classification by SVM or LDA. Best results were obtained by selecting a subset of Gabor filters using AdaBoost followed by classification with support vector machines. The system operates in real-time, and obtained 93% correct generalization to novel subjects for a 7-way forced choice on the Cohn-Kanade expression dataset. The outputs of the classifiers change smoothly as a function of time and thus can be used to measure facial expression dynamics. We applied the system to fully automated recognition of facial actions (FACS). The present system classifies 17 action units, whether they occur singly or in combination with other actions, with a mean accuracy of 94.8%. We present preliminary results for applying this system to spontaneous facial expressions.
654 citations
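A sketch of the "AdaBoost for feature selection, then SVM" recipe that performed best above, on synthetic stand-in data; real Gabor-filter responses and the Cohn-Kanade dataset are not used here, and the feature and estimator counts are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

# Synthetic stand-in for Gabor-filter responses: rows are face images, columns are filter outputs.
X, y = make_classification(n_samples=600, n_features=400, n_informative=40, random_state=0)

# AdaBoost over decision stumps: each boosting round effectively selects one feature.
ada = AdaBoostClassifier(n_estimators=60, random_state=0).fit(X, y)
selected = np.argsort(ada.feature_importances_)[::-1][:60]

# SVM trained only on the AdaBoost-selected subset of features.
svm = SVC(kernel="linear").fit(X[:, selected], y)
print("training accuracy on selected features:", svm.score(X[:, selected], y))
```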
••
TL;DR: A new Euclidean distance for images, which is robust to small perturbation of images and can be embedded in most image classification techniques such as SVM, LDA, and PCA, is presented.
Abstract: We present a new Euclidean distance for images, which we call image Euclidean distance (IMED). Unlike the traditional Euclidean distance, IMED takes into account the spatial relationships of pixels. Therefore, it is robust to small perturbation of images. We argue that IMED is the only intuitively reasonable Euclidean distance for images. IMED is then applied to image recognition. The key advantage of this distance measure is that it can be embedded in most image classification techniques such as SVM, LDA, and PCA. The embedding is rather efficient by involving a transformation referred to as standardizing transform (ST). We show that ST is a transform domain smoothing. Using the face recognition technology (FERET) database and two state-of-the-art face identification algorithms, we demonstrate a consistent performance improvement of the algorithms embedded with the new metric over their original versions.
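Because the standardizing transform is shown to be a transform-domain smoothing, an approximate IMED can be sketched as "smooth both images, then take the ordinary Euclidean distance"; the Gaussian width below is an assumption standing in for the paper's exact pixel-distance matrix G.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def imed_approx(img_a, img_b, sigma=1.0):
    """Approximate image Euclidean distance via smoothing + ordinary L2 distance."""
    a = gaussian_filter(img_a.astype(float), sigma)
    b = gaussian_filter(img_b.astype(float), sigma)
    return np.linalg.norm(a - b)
```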
••
06 Jul 2005
TL;DR: This paper discusses the most important stages of a fully implemented emotion recognition system, including data analysis and classification, and uses a music induction method which elicits natural emotional reactions from the subject.
Abstract: Little attention has been paid so far to physiological signals for emotion recognition compared to audio-visual emotion channels, such as facial expressions or speech. In this paper, we discuss the most important stages of a fully implemented emotion recognition system including data analysis and classification. For collecting physiological signals in different affective states, we used a music induction method which elicits natural emotional reactions from the subject. Four-channel biosensors are used to obtain electromyogram, electrocardiogram, skin conductivity and respiration changes. After calculating a sufficient amount of features from the raw signals, several feature selection/reduction methods are tested to extract a new feature set consisting of the most significant features for improving classification performance. Three well-known classifiers, linear discriminant function, k-nearest neighbour and multilayer perceptron, are then used to perform supervised classification.
••
TL;DR: In this article, a generalized belief propagation algorithm is used to recover shading and reflectance intrinsic images from a single image using both color information and a classifier trained to recognize gray-scale patterns.
Abstract: Interpreting real-world images requires the ability to distinguish the different characteristics of the scene that lead to its final appearance. Two of the most important of these characteristics are the shading and reflectance of each point in the scene. We present an algorithm that uses multiple cues to recover shading and reflectance intrinsic images from a single image. Using both color information and a classifier trained to recognize gray-scale patterns, given the lighting direction, each image derivative is classified as being caused by shading or a change in the surface's reflectance. The classifiers gather local evidence about the surface's form and color, which is then propagated using the generalized belief propagation algorithm. The propagation step disambiguates areas of the image where the correct classification is not clear from local evidence. We use real-world images to demonstrate results and show how each component of the system affects the results.
••
20 Jun 2005
TL;DR: This work addresses the problem of segmenting 3D scan data into objects or object classes by using a recently proposed maximum-margin framework to discriminatively train the model from a set of labeled scans and automatically learn the relative importance of the features for the segmentation task.
Abstract: We address the problem of segmenting 3D scan data into objects or object classes. Our segmentation framework is based on a subclass of Markov random fields (MRFs) which support efficient graph-cut inference. The MRF models incorporate a large set of diverse features and enforce the preference that adjacent scan points have the same classification label. We use a recently proposed maximum-margin framework to discriminatively train the model from a set of labeled scans; as a result we automatically learn the relative importance of the features for the segmentation task. Performing graph-cut inference in the trained MRF can then be used to segment new scenes very efficiently. We test our approach on three large-scale datasets produced by different kinds of 3D sensors, showing its applicability to both outdoor and indoor environments containing diverse objects.
••
17 Oct 2005
TL;DR: Probabilistic latent semantic analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available; the ability of PLSA to automatically extract visually meaningful aspects is then exploited to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.
Abstract: We present a new approach to model visual scenes in image collections, based on local invariant features and probabilistic latent space models. Our formulation provides answers to three open questions: (1) whether the invariant local features are suitable for scene (rather than object) classification; (2) whether unsupervised latent space models can be used for feature extraction in the classification task; and (3) whether the latent space formulation can discover visual co-occurrence patterns, motivating novel approaches for image organization and segmentation. Using a 9500-image dataset, our approach is validated on each of these issues. First, we show with extensive experiments on binary and multi-class scene classification tasks, that a bag-of-visterm representation, derived from local invariant descriptors, consistently outperforms state-of-the-art approaches. Second, we show that probabilistic latent semantic analysis (PLSA) generates a compact scene representation, discriminative for accurate classification, and significantly more robust when less training data are available. Third, we have exploited the ability of PLSA to automatically extract visually meaningful aspects, to propose new algorithms for aspect-based image ranking and context-sensitive image segmentation.
••
TL;DR: Experimental results reveal that, by designing morphological filtering methods that take into account the complementary nature of spatial and spectral information in a simultaneous manner, it is possible to alleviate the problems related to each of them when taken separately.
Abstract: This work describes sequences of extended morphological transformations for filtering and classification of high-dimensional remotely sensed hyperspectral datasets. The proposed approaches are based on the generalization of concepts from mathematical morphology theory to multichannel imagery. A new vector organization scheme is described, and fundamental morphological vector operations are defined by extension. Extended morphological transformations, characterized by simultaneously considering the spatial and spectral information contained in hyperspectral datasets, are applied to agricultural and urban classification problems where efficacy in discriminating between subtly different ground covers is required. The methods are tested using real hyperspectral imagery collected by the National Aeronautics and Space Administration Jet Propulsion Laboratory Airborne Visible-Infrared Imaging Spectrometer and the German Aerospace Agency Digital Airborne Imaging Spectrometer (DAIS 7915). Experimental results reveal that, by designing morphological filtering methods that take into account the complementary nature of spatial and spectral information in a simultaneous manner, it is possible to alleviate the problems related to each of them when taken separately.
••
01 Sep 2005
TL;DR: A robust segmentation technique based on an extension to the traditional fuzzy c-means (FCM) clustering algorithm is proposed and a neighborhood attraction, which is dependent on the relative location and features of neighboring pixels, is shown to improve the segmentation performance dramatically.
Abstract: Image segmentation is an indispensable process in the visualization of human tissues, particularly during clinical analysis of magnetic resonance (MR) images. Unfortunately, MR images always contain a significant amount of noise caused by operator performance, equipment, and the environment, which can lead to serious inaccuracies with segmentation. A robust segmentation technique based on an extension to the traditional fuzzy c-means (FCM) clustering algorithm is proposed in this paper. A neighborhood attraction, which is dependent on the relative location and features of neighboring pixels, is shown to improve the segmentation performance dramatically. The degree of attraction is optimized by a neural-network model. Simulated and real brain MR images with different noise levels are segmented to demonstrate the superiority of the proposed technique compared to other FCM-based methods. This segmentation method is a key component of an MR image-based classification system for brain tumors, currently being developed.
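For reference, a plain fuzzy c-means sketch on 1D intensities; the paper's method adds a neighborhood-attraction term (with its degree optimized by a neural network) to these updates, which this sketch omits.

```python
import numpy as np

def fcm(values, n_clusters=3, m=2.0, n_iter=50):
    """Plain fuzzy c-means on a 1D array of pixel intensities.

    Returns (cluster centres, membership matrix of shape (n, n_clusters)).
    """
    rng = np.random.default_rng(0)
    u = rng.dirichlet(np.ones(n_clusters), size=len(values))   # fuzzy memberships
    for _ in range(n_iter):
        w = u ** m
        centres = (w.T @ values) / w.sum(axis=0)               # weighted cluster means
        dist = np.abs(values[:, None] - centres[None, :]) + 1e-12
        u = 1.0 / (dist ** (2 / (m - 1)))                      # standard FCM membership update
        u /= u.sum(axis=1, keepdims=True)
    return centres, u
```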
••
TL;DR: A validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images demonstrates that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities and shows that simulated data results can be extended to real data.
Abstract: This paper presents a validation study on statistical nonsupervised brain tissue classification techniques in magnetic resonance (MR) images. Several image models assuming different hypotheses regarding the intensity distribution model, the spatial model and the number of classes are assessed. The methods are tested on simulated data for which the classification ground truth is known. Different noise and intensity nonuniformities are added to simulate real imaging conditions. No enhancement of the image quality is considered either before or during the classification process. This way, the accuracy of the methods and their robustness against image artifacts are tested. Classification is also performed on real data where a quantitative validation compares the methods' results with an estimated ground truth from manual segmentations by experts. Validity of the various classification methods in the labeling of the image as well as in the tissue volume is estimated with different local and global measures. Results demonstrate that methods relying on both intensity and spatial information are more robust to noise and field inhomogeneities. We also demonstrate that partial volume is not perfectly modeled, even though methods that account for mixture classes outperform methods that only consider pure Gaussian classes. Finally, we show that simulated data results can also be extended to real data.
••
TL;DR: Results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.
Abstract: In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.
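A compact sketch of the CLUE idea: retrieve the images nearest the query, then cluster that neighbourhood. Scikit-learn's spectral clustering stands in for the paper's graph-theoretic clustering algorithm, and the neighbourhood size, affinity scaling, and cluster count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_neighbourhood(features, query, k=40, n_clusters=4):
    """features: (n_images, d) feature matrix; query: (d,) query feature.
    Returns (indices of the k retrieved images, their cluster labels)."""
    dists = np.linalg.norm(features - query, axis=1)
    nearest = np.argsort(dists)[:k]                    # images in the vicinity of the query
    sub = features[nearest]
    pairwise = np.linalg.norm(sub[:, None] - sub[None, :], axis=2)
    affinity = np.exp(-pairwise / (pairwise.mean() + 1e-12))
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    return nearest, labels
```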
••
TL;DR: A new biometric approach to personal identification using eigenfinger and eigenpalm features, with fusion applied at the matching-score level, is described; its effectiveness is shown in terms of recognition rate, equal error rate, and total error rate.
Abstract: This paper presents a multimodal biometric identification system based on the features of the human hand. We describe a new biometric approach to personal identification using eigenfinger and eigenpalm features, with fusion applied at the matching-score level. The identification process can be divided into the following phases: capturing the image; preprocessing; extracting and normalizing the palm and strip-like finger subimages; extracting the eigenpalm and eigenfinger features based on the K-L transform; matching and fusion; and, finally, a decision based on the (k, l)-NN classifier and thresholding. The system was tested on a database of 237 people (1,820 hand images). The experimental results showed the effectiveness of the system in terms of the recognition rate (100 percent), the equal error rate (EER = 0.58 percent), and the total error rate (TER = 0.72 percent).
••
17 Oct 2005
TL;DR: The major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
Abstract: We present a novel categorical object detection scheme that uses only local contour-based features. A two-stage, partially supervised learning architecture is proposed: a rudimentary detector is learned from a very small set of segmented images and applied to a larger training set of un-segmented images; the second stage bootstraps these detections to learn an improved classifier while explicitly training against clutter. The detectors are learned with a boosting algorithm which creates a location-sensitive classifier using a discriminative set of features from a randomly chosen dictionary of contour fragments. We present results that are very competitive with other state-of-the-art object detection schemes and show robustness to object articulations, clutter, and occlusion. Our major contributions are the application of boosted local contour-based features for object detection in a partially supervised learning framework, and an efficient new boosting procedure for simultaneously selecting features and estimating per-feature parameters.
••
TL;DR: This paper solves the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques and uses level-set methods to implement the resulting evolution.
Abstract: In this paper, we present a new information-theoretic approach to image segmentation. We cast the segmentation problem as the maximization of the mutual information between the region labels and the image pixel intensities, subject to a constraint on the total length of the region boundaries. We assume that the probability densities associated with the image pixel intensities within each region are completely unknown a priori, and we formulate the problem based on nonparametric density estimates. Due to the nonparametric structure, our method does not require the image regions to have a particular type of probability distribution and does not require the extraction and use of a particular statistic. We solve the information-theoretic optimization problem by deriving the associated gradient flows and applying curve evolution techniques. We use level-set methods to implement the resulting evolution. The experimental results based on both synthetic and real images demonstrate that the proposed technique can solve a variety of challenging image segmentation problems. Furthermore, our method, which does not require any training, performs as well as methods based on training.
••
17 Oct 2005
TL;DR: Experimental results show that the proposed joint Haar-like feature for detecting faces in images yields higher classification performance than Viola and Jones' detector, which uses a single feature for each weak classifier.
Abstract: In this paper, we propose a new distinctive feature, called joint Haar-like feature, for detecting faces in images. This is based on co-occurrence of multiple Haar-like features. Feature co-occurrence, which captures the structural similarities within the face class, makes it possible to construct an effective classifier. The joint Haar-like feature can be calculated very fast and has robustness against addition of noise and change in illumination. A face detector is learned by stagewise selection of the joint Haar-like features using AdaBoost. A small number of distinctive features achieve both computational efficiency and accuracy. Experimental results with 5,676 face images and 30,000 nonface images show that our detector yields higher classification performance than Viola and Jones' detector, which uses a single feature for each weak classifier. Given the same number of features, our method reduces the error by 37%. Our detector is 2.6 times as fast as Viola and Jones' detector to achieve the same performance.
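A toy sketch of integral-image Haar-like features and a "joint" feature formed from the co-occurrence of several binarised responses; the rectangle layout and thresholds are illustrative, and the AdaBoost stagewise-selection step is omitted.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums so any rectangle sum can be read off in constant time."""
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from an integral image (exclusive upper bounds)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def haar_two_rect(ii, r, c, h, w):
    """Simple two-rectangle Haar-like feature: left half minus right half."""
    return box_sum(ii, r, c, r + h, c + w // 2) - box_sum(ii, r, c + w // 2, r + h, c + w)

def joint_feature(ii, rects, thresholds):
    """A 'joint' feature: the tuple of binarised responses of several Haar-like
    features, capturing their co-occurrence within one weak classifier."""
    return tuple(int(haar_two_rect(ii, *rect) > t) for rect, t in zip(rects, thresholds))
```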
••
17 Oct 2005
TL;DR: This paper adopts an appearance-based strategy, and conducts experiments on a new database which contains several samples of each of eleven material categories, imaged under a variety of pose, illumination and scale conditions, demonstrating that very significant gains can be achieved via different SVM-based classification techniques.
Abstract: Although a considerable amount of work has been published on material classification, relatively little of it studies situations with considerable variation within each class. Many experiments use the exact same sample, or different patches from the same image, for training and test sets. Thus, such studies are vulnerable to effectively recognising one particular sample of a material as opposed to the material category. In contrast, this paper places firm emphasis on the capability to generalise to previously unseen instances of materials. We adopt an appearance-based strategy, and conduct experiments on a new database which contains several samples of each of eleven material categories, imaged under a variety of pose, illumination and scale conditions. Together, these sources of intra-class variation provide a stern challenge indeed for recognition. Somewhat surprisingly, the difference in performance between various state-of-the-art texture descriptors proves rather small in this task. On the other hand, we clearly demonstrate that very significant gains can be achieved via different SVM-based classification techniques. Selecting appropriate kernel parameters proves crucial. This motivates a novel recognition scheme based on a decision tree. Each node contains an SVM to split one class from all others with a kernel parameter optimal for that particular node. Hence, each decision is made using a different, optimal, class-specific metric. Experiments show the superiority of this approach over several state-of-the-art classifiers.
••
TL;DR: This paper investigates several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs), formulates differentiation of malignant from benign MCs as a supervised learning problem, and applies these learning methods to develop the classification algorithm.
Abstract: In this paper, we investigate several state-of-the-art machine-learning methods for automated classification of clustered microcalcifications (MCs). The classifier is part of a computer-aided diagnosis (CADx) scheme that is aimed at assisting radiologists in making more accurate diagnoses of breast cancer on mammograms. The methods we considered were: support vector machine (SVM), kernel Fisher discriminant (KFD), relevance vector machine (RVM), and committee machines (ensemble averaging and AdaBoost), of which most have been developed recently in statistical learning theory. We formulated differentiation of malignant from benign MCs as a supervised learning problem, and applied these learning methods to develop the classification algorithm. As input, these methods used image features automatically extracted from clustered MCs. We tested these methods using a database of 697 clinical mammograms from 386 cases, which included a wide spectrum of difficult-to-classify cases. We analyzed the distribution of the cases in this database using the multidimensional scaling technique, which reveals that in the feature space the malignant cases are not trivially separable from the benign ones. We used receiver operating characteristic (ROC) analysis to evaluate and to compare classification performance by the different methods. In addition, we also investigated how to combine information from multiple-view mammograms of the same case so that the best decision can be made by a classifier. In our experiments, the kernel-based methods (i.e., SVM, KFD, and RVM) yielded the best performance (A_z = 0.85 for SVM), significantly outperforming a well-established, clinically proven CADx approach based on a neural network (A_z = 0.80).
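A minimal SVM-plus-ROC sketch of the kind of comparison reported above, on synthetic stand-in features; no mammographic data or clinically validated features are involved.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for image features extracted from clustered microcalcifications.
X, y = make_classification(n_samples=700, n_features=20, weights=[0.6, 0.4], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

svm = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
scores = svm.predict_proba(X_te)[:, 1]
print("ROC area (A_z):", roc_auc_score(y_te, scores))  # the paper's figure of merit
```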
••
20 Jun 2005
TL;DR: This work presents a novel, generic image classification method based on a recent machine learning algorithm (ensembles of extremely randomized decision trees) that is generic and robust to illumination, scale, and viewpoint changes.
Abstract: We present a novel, generic image classification method based on a recent machine learning algorithm (ensembles of extremely randomized decision trees). Images are classified using randomly extracted subwindows that are suitably normalized to yield robustness to certain image transformations. Our method is evaluated on four very different, publicly available datasets (COIL-100, ZuBuD, ETH-80, WANG). Our results show that our automatic approach is generic and robust to illumination, scale, and viewpoint changes. An extension of the method is proposed to improve its robustness with respect to rotation changes.
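A toy sketch of the random-subwindow plus extremely-randomized-trees pipeline on synthetic images; the paper's subwindow normalization step is omitted, and the window size, counts, and toy image classes are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

def random_subwindows(img, n=50, size=16):
    """Extract n randomly positioned size x size subwindows, flattened to vectors."""
    h, w = img.shape
    rows = rng.integers(0, h - size + 1, n)
    cols = rng.integers(0, w - size + 1, n)
    return np.array([img[r:r + size, c:c + size].ravel() for r, c in zip(rows, cols)])

# Toy stand-in for two image classes: dark vs. bright 64x64 images.
train_imgs = [rng.normal(mu, 1, (64, 64)) for mu in (0, 3) for _ in range(10)]
train_labels = [0] * 10 + [1] * 10

Xw = np.vstack([random_subwindows(im) for im in train_imgs])
yw = np.repeat(train_labels, 50)  # each subwindow inherits its source image's label
clf = ExtraTreesClassifier(n_estimators=100).fit(Xw, yw)

# Classify a new image by voting over the predictions for its subwindows.
votes = clf.predict(random_subwindows(rng.normal(3, 1, (64, 64))))
print("predicted class:", np.bincount(votes).argmax())
```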
••
01 Jun 2005
TL;DR: This paper addresses the problem of human-action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time.
Abstract: This paper addresses the problem of human-action recognition by introducing a sparse representation of image sequences as a collection of spatiotemporal events that are localized at points that are salient both in space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods not only in space but also in time. An appropriate distance metric between two collections of spatiotemporal salient points is introduced, which is based on the chamfer distance and an iterative linear time-warping technique that deals with time expansion or time-compression issues. A classification scheme that is based on relevance vector machines and on the proposed distance measure is proposed. Results on real image sequences from a small database depicting people performing 19 aerobic exercises are presented.
••
TL;DR: A change detection model based on Neighborhood Correlation Image logic is introduced; it exploits the fact that the same geographic area in two dates of imagery will tend to be highly correlated if little change has occurred, and uncorrelated when change occurs.
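A minimal sketch of a Neighborhood Correlation Image: the per-pixel correlation of two co-registered image dates within a sliding window; the window size is an assumed parameter.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_correlation(img1, img2, size=5):
    """Per-pixel correlation of two image dates within a size x size window.

    High correlation suggests little change; low correlation suggests change.
    """
    a, b = img1.astype(float), img2.astype(float)
    mean_a, mean_b = uniform_filter(a, size), uniform_filter(b, size)
    cov = uniform_filter(a * b, size) - mean_a * mean_b
    var_a = uniform_filter(a * a, size) - mean_a ** 2
    var_b = uniform_filter(b * b, size) - mean_b ** 2
    return cov / np.sqrt(np.maximum(var_a * var_b, 1e-12))
```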
••
17 Oct 2005
TL;DR: A two-layer hierarchical formulation is presented that exploits different levels of contextual information in images for robust classification; it is general enough to be applied to different domains ranging from pixelwise image labeling to contextual object detection.
Abstract: We present a two-layer hierarchical formulation to exploit different levels of contextual information in images for robust classification. Each layer is modeled as a conditional field that allows one to capture arbitrary observation-dependent label interactions. The proposed framework has two main advantages. First, it encodes both the short-range interactions (e.g., pixelwise label smoothing) as well as the long-range interactions (e.g., relative configurations of objects or regions) in a tractable manner. Second, the formulation is general enough to be applied to different domains ranging from pixelwise image labeling to contextual object detection. The parameters of the model are learned using a sequential maximum-likelihood approximation. The benefits of the proposed framework are demonstrated on four different datasets and comparison results are presented.
••
TL;DR: A classification strategy is described that allows the identification of samples drawn from unknown classes through the application of a suitable Bayesian decision rule, based on support vector machines (SVMs) for the estimation of probability density functions and on a recursive procedure to generate prior probability estimates for known and unknown classes.
Abstract: A general problem of supervised remotely sensed image classification assumes prior knowledge to be available for all the thematic classes that are present in the considered dataset. However, the ground-truth map representing that prior knowledge usually does not really describe all the land-cover typologies in the image, and the generation of a complete training set often represents a time-consuming, difficult and expensive task. This problem affects the performance of supervised classifiers, which erroneously assign each sample drawn from an unknown class to one of the known classes. In the present paper, a classification strategy is described that allows the identification of samples drawn from unknown classes through the application of a suitable Bayesian decision rule. The proposed approach is based on support vector machines (SVMs) for the estimation of probability density functions and on a recursive procedure to generate prior probability estimates for known and unknown classes. In the experiments, both a synthetic dataset and two real datasets were used.
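A simplified sketch of classification with rejection of unknown classes: a probabilistic SVM stands in for the paper's SVM-based density estimates, and a fixed posterior threshold stands in for the recursive prior-estimation procedure and Bayesian decision rule.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for labeled pixels of the known land-cover classes.
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=6, n_clusters_per_class=1, random_state=0)

# Probabilistic SVM outputs stand in for the paper's class-conditional density estimates.
svm = SVC(probability=True).fit(X, y)

def classify_with_rejection(samples, threshold=0.6):
    """Assign the most probable known class, or -1 ("unknown") when no known
    class is probable enough."""
    proba = svm.predict_proba(samples)
    labels = proba.argmax(axis=1)
    labels[proba.max(axis=1) < threshold] = -1
    return labels

print(classify_with_rejection(X[:5]))
```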