
Showing papers on "Contextual image classification published in 2007"


Journal ArticleDOI
TL;DR: It is suggested that designing a suitable image‐processing procedure is a prerequisite for a successful classification of remotely sensed data into a thematic map and the selection of a suitable classification method is especially significant for improving classification accuracy.
Abstract: Image classification is a complex process that may be affected by many factors. This paper examines current practices, problems, and prospects of image classification. The emphasis is placed on the summarization of major advanced classification approaches and the techniques used for improving classification accuracy. In addition, some important issues affecting classification performance are discussed. This literature review suggests that designing a suitable image-processing procedure is a prerequisite for a successful classification of remotely sensed data into a thematic map. Effective use of multiple features of remotely sensed data and the selection of a suitable classification method are especially significant for improving classification accuracy. Non-parametric classifiers such as neural networks, decision trees, and knowledge-based classifiers have increasingly become important approaches for multisource data classification. The integration of remote sensing, geographical information systems (GIS), and expert systems is emerging as a new research frontier. More research, however, is needed to identify and reduce uncertainties in the image-processing chain to improve classification accuracy.

2,741 citations


Journal ArticleDOI
TL;DR: The task of multi-label classification is introduced, the sparse related literature is organized into a structured presentation, and comparative experimental results are reported for several multi-label classification methods.
Abstract: Nowadays, multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification. This paper introduces the task of multi-label classification, organizes the sparse related literature into a structured presentation and reports comparative experimental results for certain multi-label classification methods. It also contributes the definition of concepts for the quantification of the multi-label nature of a data set.
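
The label cardinality and density measures this paper defines are straightforward to compute. Below is a minimal sketch on synthetic data, pairing them with a binary-relevance baseline; the scikit-learn dataset is a stand-in, not one of the paper's benchmarks:

```python
# Label cardinality/density as defined in the paper, plus a binary-relevance
# baseline; the synthetic dataset stands in for a real multi-label corpus.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=200, n_classes=5, n_labels=2,
                                      random_state=0)
print("label cardinality:", Y.sum(axis=1).mean())            # avg labels/example
print("label density:", Y.sum(axis=1).mean() / Y.shape[1])   # cardinality / |L|

# Binary relevance: one independent binary classifier per label.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print("subset accuracy:", (clf.predict(X) == Y).all(axis=1).mean())
```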

2,592 citations


Proceedings ArticleDOI
17 Jun 2007
TL;DR: This work shows that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms, and proposes to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images.
Abstract: Within the field of pattern classification, the Fisher kernel is a powerful framework which combines the strengths of generative and discriminative approaches. The idea is to characterize a signal with a gradient vector derived from a generative probability model and to subsequently feed this representation to a discriminative classifier. We propose to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images. We show that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms. Our approach demonstrates excellent performance on two challenging databases: an in-house database of 19 object/scene categories and the recently released VOC 2006 database. It is also very practical: it has low computational needs both at training and test time and vocabularies trained on one set of categories can be applied to another set without any significant loss in performance.
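
The Fisher-vector construction can be sketched compactly. The snippet below keeps only the gradient with respect to the GMM means and an approximate normalization, and the random descriptors stand in for real low-level features:

```python
# Fisher-vector sketch: gradient of the mean log-likelihood w.r.t. the GMM means,
# with an approximate normalization; random descriptors replace real features.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train_desc = rng.normal(size=(2000, 64))       # pooled low-level descriptors
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(train_desc)                            # the "visual vocabulary"

def fisher_vector(descriptors, gmm):
    q = gmm.predict_proba(descriptors)                  # responsibilities (N, K)
    diff = descriptors[:, None, :] - gmm.means_[None]   # (N, K, D)
    grad = (q[:, :, None] * diff / gmm.covariances_[None]).mean(axis=0)
    return (grad / np.sqrt(gmm.weights_)[:, None]).ravel()

fv = fisher_vector(rng.normal(size=(300, 64)), gmm)     # one image's signature
print(fv.shape)                                         # (8 * 64,) -> SVM input
```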

1,874 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.
Abstract: We explore the problem of classifying images by the object categories they contain in the case of a large number of object categories. To this end we combine three ingredients: (i) shape and appearance representations that support spatial pyramid matching over a region of interest. This generalizes the representation of Lazebnik et al. (2006) from an image to a region of interest (ROI), and from appearance (visual words) alone to appearance and local shape (edge distributions); (ii) automatic selection of the regions of interest in training. This provides a method of inhibiting background clutter and adding invariance to the object instance's position; and (iii) the use of random forests (and random ferns) as a multi-way classifier. The advantage of such classifiers (over multi-way SVM for example) is the ease of training and testing. Results are reported for classification of the Caltech-101 and Caltech-256 data sets. We compare the performance of the random forest/ferns classifier with a benchmark multi-way SVM classifier. It is shown that selecting the ROI adds about 5% to the performance and, together with the other improvements, the result is about a 10% improvement over the state of the art for Caltech-256.
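
As an illustration of the first ingredient, here is a simplified spatial-pyramid histogram over an ROI (appearance words only, no shape channel); all names and sizes are hypothetical:

```python
# Appearance-only spatial-pyramid histogram over a region of interest; the
# resulting vectors would be stacked per image and fed to a random forest.
import numpy as np

def pyramid_histogram(word_ids, xy, roi, vocab=100, levels=2):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4 grids inside roi."""
    x0, y0, x1, y1 = roi
    keep = (xy[:, 0] >= x0) & (xy[:, 0] < x1) & (xy[:, 1] >= y0) & (xy[:, 1] < y1)
    word_ids, xy = word_ids[keep], xy[keep]
    feats = []
    for level in range(levels + 1):
        c = 2 ** level
        ix = ((xy[:, 0] - x0) / (x1 - x0) * c).astype(int)
        iy = ((xy[:, 1] - y0) / (y1 - y0) * c).astype(int)
        for cell in range(c * c):
            feats.append(np.bincount(word_ids[ix * c + iy == cell], minlength=vocab))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
words, locs = rng.integers(0, 100, 500), rng.uniform(0, 1, (500, 2))
print(pyramid_histogram(words, locs, roi=(0.2, 0.2, 0.8, 0.8)).shape)  # (2100,)
```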

1,401 citations


Journal ArticleDOI
TL;DR: A general tensor discriminant analysis (GTDA) is developed as a preprocessing step for LDA and, combined with Gabor-based representations, achieves good performance for gait recognition based on image sequences from the University of South Florida (USF) HumanID Database.
Abstract: Traditional image representations are not suited to conventional classification methods such as the linear discriminant analysis (LDA) because of the undersample problem (USP): the dimensionality of the feature space is much higher than the number of training samples. Motivated by the successes of the two-dimensional LDA (2DLDA) for face recognition, we develop a general tensor discriminant analysis (GTDA) as a preprocessing step for LDA. The benefits of GTDA, compared with existing preprocessing methods such as the principal components analysis (PCA) and 2DLDA, include the following: 1) the USP is reduced in subsequent classification by, for example, LDA, 2) the discriminative information in the training tensors is preserved, and 3) GTDA provides stable recognition rates because the alternating projection optimization algorithm to obtain a solution of GTDA converges, whereas that of 2DLDA does not. We use human gait recognition to validate the proposed GTDA. The averaged gait images are utilized for gait representation. Given the popularity of Gabor-function-based image decompositions for image understanding and object recognition, we develop three different Gabor-function-based image representations: 1) GaborD is the sum of Gabor filter responses over directions, 2) GaborS is the sum of Gabor filter responses over scales, and 3) GaborSD is the sum of Gabor filter responses over scales and directions. The GaborD, GaborS, and GaborSD representations are applied to the problem of recognizing people from their averaged gait images. A large number of experiments were carried out to evaluate the effectiveness (recognition rate) of gait recognition based on first obtaining a Gabor, GaborD, GaborS, or GaborSD image representation, then using GTDA to extract features and, finally, using LDA for classification. The proposed methods achieved good performance for gait recognition based on image sequences from the University of South Florida (USF) HumanID Database. Experimental comparisons are made with nine state-of-the-art classification methods in gait recognition.
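
Of the representations listed, GaborD is the simplest to write down: sum the Gabor magnitude responses over a set of directions at a fixed scale. A sketch using scikit-image, with illustrative parameters and a random stand-in for an averaged gait image:

```python
# GaborD sketch: sum of Gabor magnitude responses over directions at one scale.
import numpy as np
from skimage.filters import gabor

img = np.random.rand(64, 64)                      # stand-in averaged gait image
thetas = np.linspace(0, np.pi, 8, endpoint=False)
gabor_d = sum(np.hypot(*gabor(img, frequency=0.25, theta=t)) for t in thetas)
# GaborS would instead sum over frequencies; GaborSD over both loops.
print(gabor_d.shape)
```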

1,160 citations


Journal ArticleDOI
TL;DR: It is suggested that the inner-distance can be used as a replacement for the Euclidean distance to build more accurate descriptors for complex shapes, especially for those with articulated parts.
Abstract: Part structure and articulation are of fundamental importance in computer and human vision. We propose using the inner-distance to build shape descriptors that are robust to articulation and capture part structure. The inner-distance is defined as the length of the shortest path between landmark points within the shape silhouette. We show that it is articulation insensitive and more effective at capturing part structures than the Euclidean distance. This suggests that the inner-distance can be used as a replacement for the Euclidean distance to build more accurate descriptors for complex shapes, especially for those with articulated parts. In addition, texture information along the shortest path can be used to further improve shape classification. With this idea, we propose three approaches to using the inner-distance. The first method combines the inner-distance and multidimensional scaling (MDS) to build articulation invariant signatures for articulated shapes. The second method uses the inner-distance to build a new shape descriptor based on shape contexts. The third one extends the second one by considering the texture information along shortest paths. The proposed approaches have been tested on a variety of shape databases, including an articulated shape data set, MPEG7 CE-Shape-1, Kimia silhouettes, the ETH-80 data set, two leaf data sets, and a human motion silhouette data set. In all the experiments, our methods demonstrate effective performance compared with other algorithms.
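
On a convex shape the inner-distance equals the Euclidean distance; the two differ exactly when the straight segment leaves the silhouette. A toy computation on an L-shaped silhouette, using shapely and networkx (a simplified illustration, not the authors' implementation):

```python
# Toy inner-distance on an L-shaped silhouette: edges connect landmarks whose
# straight segment stays inside the shape; shortest paths route around corners.
import networkx as nx
from shapely.geometry import LineString, Polygon

silhouette = Polygon([(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)])  # "L" shape
landmarks = [(0.5, 0.5), (3.5, 0.5), (0.5, 2.5)]

g = nx.Graph()
for i, p in enumerate(landmarks):
    for j, q in enumerate(landmarks[:i]):
        seg = LineString([p, q])
        if silhouette.contains(seg):              # segment stays inside the shape
            g.add_edge(i, j, weight=seg.length)

# Landmarks 1 and 2 must route around the corner via landmark 0: distance 5.0,
# versus a Euclidean distance of about 3.6 that cuts through empty space.
print(nx.shortest_path_length(g, 1, 2, weight="weight"))
```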

1,123 citations


Journal ArticleDOI
Shai Avidan1
TL;DR: This work considers tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background and is combined into a strong classifier using AdaBoost.
Abstract: We consider tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background. The ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as either belonging to the object or the background, giving a confidence map. The peak of the map and, hence, the new position of the object, is found using mean shift. Temporal coherence is maintained by updating the ensemble with new weak classifiers that are trained online during tracking. We show a realization of this method and demonstrate it on several video sequences.
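
The classification step can be miniaturized as follows, with scikit-learn's batch AdaBoost over decision stumps standing in for the paper's online ensemble updates; features and sizes are made up:

```python
# Boosted stumps separate object pixels from background, then score a new frame.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
obj = rng.normal(1.0, 0.5, size=(500, 11))     # per-pixel features on the object
bg = rng.normal(-1.0, 0.5, size=(500, 11))     # per-pixel features on background
X, y = np.vstack([obj, bg]), np.r_[np.ones(500), np.zeros(500)]

strong = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=20)
strong.fit(X, y)

next_frame = rng.normal(0.0, 1.0, size=(40 * 40, 11))   # flattened 40x40 patch
confidence_map = strong.predict_proba(next_frame)[:, 1].reshape(40, 40)
# The mean-shift peak of `confidence_map` would give the object's new position.
print(confidence_map.shape)
```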

1,109 citations


Journal ArticleDOI
TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.
Abstract: A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density is estimated for each image, and the mixtures associated with all images annotated with a common semantic label are pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.
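
The central density-based idea shrinks to a few lines: fit one mixture per semantic class over pooled localized features and label a new image by the class under which its bag of features is most likely. This is a toy stand-in for the hierarchical-EM pooling the paper uses, with hypothetical class names:

```python
# One Gaussian mixture per semantic class; a new image's feature bag is labeled
# by maximum mean log-likelihood across the class densities.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pooled = {"sky": rng.normal(0, 1, (500, 16)), "grass": rng.normal(3, 1, (500, 16))}
models = {c: GaussianMixture(4, random_state=0).fit(f) for c, f in pooled.items()}

bag = rng.normal(3, 1, size=(60, 16))            # localized features of a new image
scores = {c: m.score(bag) for c, m in models.items()}   # mean log-likelihood
print(max(scores, key=scores.get))               # -> "grass"
```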

962 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper uses a number of sport games such as snow boarding, rock climbing or badminton to demonstrate event classification and proposes a first attempt to classify events in static images by integrating scene and object categorizations.
Abstract: We propose a first attempt to classify events in static images by integrating scene and object categorizations. We define an event in a static image as a human activity taking place in a specific environment. In this paper, we use a number of sport games such as snow boarding, rock climbing or badminton to demonstrate event classification. Our goal is to classify the event in the image as well as to provide a number of semantic labels to the objects and scene environment within the image. For example, given a rowing scene, our algorithm recognizes the event as rowing by classifying the environment as a lake and recognizing the critical objects in the image as athletes, rowing boat, water, etc. We achieve this integrative and holistic recognition through a generative graphical model. We have assembled a highly challenging database of 8 widely varied sport events. We show that our system is capable of classifying these event classes at 73.4% accuracy. While each component of the model contributes to the final recognition, using scene or objects alone cannot achieve this performance.

858 citations


Proceedings ArticleDOI
01 Dec 2007
TL;DR: This paper employs a probabilistic neural network (PNN) with image and data processing techniques to implement general-purpose automated leaf recognition for plant classification with an accuracy greater than 90%.
Abstract: In this paper, we employ a probabilistic neural network (PNN) with image and data processing techniques to implement general-purpose automated leaf recognition for plant classification. 12 leaf features are extracted and orthogonalized into 5 principal variables, which constitute the input vector of the PNN. The PNN is trained by 1800 leaves to classify 32 kinds of plants with an accuracy greater than 90%. Compared with other approaches, our algorithm is an accurate artificial intelligence approach that is fast in execution and easy to implement.
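
The described pipeline maps directly to a short sketch: PCA from 12 features to 5 inputs, then a Parzen-window PNN. The random data below stands in for the real leaf features:

```python
# PCA to 5 principal variables, then a probabilistic neural network
# (Parzen-window classifier) over the training exemplars.
import numpy as np
from sklearn.decomposition import PCA

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """Pick the class whose summed Gaussian kernels best explain each test vector."""
    d2 = ((X_test[:, None, :] - X_train[None]) ** 2).sum(-1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    classes = np.unique(y_train)
    scores = np.stack([k[:, y_train == c].sum(axis=1) for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]

rng = np.random.default_rng(0)
feats = rng.normal(size=(1800, 12))           # stand-in for 12 extracted features
labels = rng.integers(0, 32, size=1800)       # 32 plant species
inputs = PCA(n_components=5).fit_transform(feats)
print(pnn_predict(inputs, labels, inputs[:10]))
```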

823 citations


Journal ArticleDOI
TL;DR: A multitask learning procedure based on boosted decision stumps reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views), considerably reducing the computational cost of multiclass object detection.
Abstract: We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (runtime) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. We present a multitask learning procedure, based on boosted decision stumps, that reduces the computational and sample complexity by finding common features that can be shared across the classes (and/or views). The detectors for each class are trained jointly, rather than independently. For a given performance level, the total number of features required and, therefore, the runtime cost of the classifier, is observed to scale approximately logarithmically with the number of classes. The features selected by joint training are generic edge-like features, whereas the features chosen by training each class separately tend to be more object-specific. The generic features generalize better and considerably reduce the computational cost of multiclass object detection.

Journal ArticleDOI
TL;DR: The introduction of the composite-kernel framework drastically improves results, and the new fast formulation scales almost linearly in computational cost, rather than cubically as in the original method, thus allowing the use of this method in remote-sensing applications.
Abstract: This paper presents a semi-supervised graph-based method for the classification of hyperspectral images. The method is designed to handle the special characteristics of hyperspectral images, namely, high-input dimension of pixels, low number of labeled samples, and spatial variability of the spectral signature. To alleviate these problems, the method incorporates three ingredients, respectively. First, being a kernel-based method, it combats the curse of dimensionality efficiently. Second, following a semi-supervised approach, it exploits the wealth of unlabeled samples in the image, and naturally gives relative importance to the labeled ones through a graph-based methodology. Finally, it incorporates contextual information through a full family of composite kernels. Noting that the graph method relies on inverting a huge kernel matrix formed by both labeled and unlabeled samples, we introduce the Nyström method in the formulation to speed up the classification process. The presented semi-supervised graph-based method is compared to state-of-the-art support vector machines in the classification of hyperspectral data. The proposed method produces better classification maps, which capture the intrinsic structure collectively revealed by labeled and unlabeled points. Good and stable accuracy is produced in ill-posed classification problems (high-dimensional spaces and low number of labeled samples). In addition, the introduction of the composite-kernel framework drastically improves results, and the new fast formulation scales almost linearly in computational cost, rather than cubically as in the original method, thus allowing the use of this method in remote-sensing applications.
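
As a rough stand-in for the paper's method (no composite kernels, no Nyström speed-up), scikit-learn's LabelSpreading shows the semi-supervised mechanics: a handful of labeled pixels propagate their labels through a graph over all pixels:

```python
# Graph-based semi-supervised pixel classification with only ten labels.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
pixels = np.vstack([rng.normal(0, 1, (200, 30)), rng.normal(3, 1, (200, 30))])
labels = np.full(400, -1)                  # -1 marks unlabeled pixels
labels[:5], labels[200:205] = 0, 1         # five labeled pixels per class

model = LabelSpreading(kernel="rbf", gamma=0.05).fit(pixels, labels)
print((model.transduction_[200:] == 1).mean())   # fraction recovered in class 1
```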

Journal ArticleDOI
TL;DR: A simple context-based scene recognition algorithm for mobile robotics applications is presented that is biologically plausible and has low computational complexity, sharing its low-level features with a model for visual attention that may operate concurrently on a robot.
Abstract: We describe and validate a simple context-based scene recognition algorithm for mobile robotics applications. The system can differentiate outdoor scenes from various sites on a college campus using a multiscale set of early-visual features, which capture the "gist" of the scene into a low-dimensional signature vector. Distinct from previous approaches, the algorithm presents the advantage of being biologically plausible and of having low computational complexity, sharing its low-level features with a model for visual attention that may operate concurrently on a robot. We compare classification accuracy using scenes filmed at three outdoor sites on campus (13,965 to 34,711 frames per site). Dividing each site into nine segments, we obtain segment classification rates between 84.21 percent and 88.62 percent. Combining scenes from all sites (75,073 frames in total) yields 86.45 percent correct classification, demonstrating the generalization and scalability of the approach.
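
A toy version of a "gist"-style signature: pool edge energy on a coarse grid into a low-dimensional vector. The actual system uses biologically inspired multiscale features shared with a saliency model; this is only an illustration of the signature idea:

```python
# Coarse-grid pooling of edge energy into a low-dimensional scene signature.
import numpy as np
from skimage.filters import sobel_h, sobel_v
from skimage.transform import resize

def gist_signature(img, grid=4):
    energy = np.hypot(sobel_h(img), sobel_v(img))    # edge-energy map
    return resize(energy, (grid, grid), anti_aliasing=True).ravel()

print(gist_signature(np.random.rand(128, 128)).shape)   # (16,) signature vector
```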

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper shows that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust, and that scales well to handle a large number of classes.
Abstract: While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a Naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well to handle a large number of classes. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image datasets containing very significant perspective changes.
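
The independence-between-groups assumption is the heart of the random-ferns idea. A compact sketch, with sizes and binary tests chosen purely for illustration:

```python
# Random-ferns sketch: each fern is a small group of binary pixel comparisons
# assumed independent of the other groups (the semi-naive Bayes step).
import numpy as np

rng = np.random.default_rng(0)
n_ferns, s, dim, n_classes = 10, 6, 100, 3
pairs = rng.integers(0, dim, size=(n_ferns, s, 2))   # random pixel-pair tests

def fern_codes(patches, pairs):
    """Turn each fern's S binary comparisons into one integer in [0, 2**S)."""
    bits = patches[:, pairs[:, :, 0]] > patches[:, pairs[:, :, 1]]   # (N, F, S)
    return (bits * (1 << np.arange(bits.shape[2]))).sum(axis=2)      # (N, F)

# Training: per-class code histograms with Laplace smoothing.
X, y = rng.random((300, dim)), rng.integers(0, n_classes, 300)
counts = np.ones((n_classes, n_ferns, 2 ** s))
for code, label in zip(fern_codes(X, pairs), y):
    counts[label, np.arange(n_ferns), code] += 1
log_p = np.log(counts / counts.sum(axis=2, keepdims=True))

# Testing: sum log-posteriors across ferns, pick the best class.
test = fern_codes(rng.random((5, dim)), pairs)
scores = np.array([[log_p[c, np.arange(n_ferns), t].sum()
                    for c in range(n_classes)] for t in test])
print(scores.argmax(axis=1))
```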

Proceedings ArticleDOI
17 Jun 2007
TL;DR: This paper introduces an algorithm for learning shapelet features, a set of mid-level features that are built from low-level gradient information that discriminates between pedestrian and non-pedestrian classes on the INRIA dataset.
Abstract: In this paper, we address the problem of detecting pedestrians in still images. We introduce an algorithm for learning shapelet features, a set of mid-level features. These features are focused on local regions of the image and are built from low-level gradient information that discriminates between pedestrian and non-pedestrian classes. Using AdaBoost, these shapelet features are created as a combination of oriented gradient responses. To train the final classifier, we use AdaBoost for a second time to select a subset of our learned shapelets. By first focusing locally on smaller feature sets, our algorithm attempts to harvest more useful information than by examining all the low-level features together. We present quantitative results demonstrating the effectiveness of our algorithm. In particular, we obtain an error rate 14 percentage points lower (at 10^-6 FPPW) than the previous state of the art detector of Dalal and Triggs on the INRIA dataset.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A hierarchical model that can be characterized as a constellation of bags-of-features and that is able to combine both spatial and spatial-temporal features is proposed and shown to improve the classification performance over bag-of-features models.
Abstract: We present a novel model for human action categorization. A video sequence is represented as a collection of spatial and spatial-temporal features by extracting static and dynamic interest points. We propose a hierarchical model that can be characterized as a constellation of bags-of-features and that is able to combine both spatial and spatial-temporal features. Given a novel video sequence, the model is able to categorize human actions on a frame-by-frame basis. We test the model on a publicly available human action dataset [2] and show that our new method performs well on the classification task. We also conducted control experiments to show that the use of the proposed mixture of hierarchical models improves the classification performance over bag-of-features models. An additional experiment shows that using both dynamic and static features provides a richer representation of human actions when compared to the use of a single feature type, as demonstrated by our evaluation in the classification task.

Journal ArticleDOI
TL;DR: This paper presents a technique for dimensionality reduction to deal with hyperspectral images based on a hierarchical clustering structure to group bands to minimize the intracluster variance and maximize the intercluster variance.
Abstract: Hyperspectral imaging involves large amounts of information. This paper presents a technique for dimensionality reduction to deal with hyperspectral images. The proposed method is based on a hierarchical clustering structure to group bands to minimize the intracluster variance and maximize the intercluster variance. This aim is pursued using information measures, such as distances based on mutual information or Kullback-Leibler divergence, in order to reduce data redundancy and non-useful information among image bands. Experimental results include a comparison among some relevant and recent methods for hyperspectral band selection using no labeled information, showing their performance with regard to pixel image classification tasks. The technique that is presented has a stable behavior for different image data sets and a noticeable accuracy, mainly when selecting small sets of bands.
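
A rough sketch of the band-grouping idea under stated assumptions: mutual information estimated by discretizing bands, average-linkage hierarchical clustering, and one representative band kept per cluster. Binning and sizes are illustrative:

```python
# Band grouping with a mutual-information distance, then band selection.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
cube = rng.random((1000, 50))                 # pixels x spectral bands
binned = (cube * 15).astype(int)              # discretize bands to estimate MI

n = cube.shape[1]
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i):
        mi = mutual_info_score(binned[:, i], binned[:, j])
        dist[i, j] = dist[j, i] = 1.0 / (1.0 + mi)   # high MI -> small distance

tree = linkage(squareform(dist), method="average")
groups = fcluster(tree, t=10, criterion="maxclust")
print(sorted(np.flatnonzero(groups == g)[0] for g in np.unique(groups)))
```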

Journal ArticleDOI
TL;DR: This paper presents a method for classification of structural brain magnetic resonance (MR) images, by using a combination of deformation-based morphometry and machine learning methods, which demonstrates not only high classification accuracy but also good stability.
Abstract: This paper presents a method for classification of structural brain magnetic resonance (MR) images, by using a combination of deformation-based morphometry and machine learning methods. A morphological representation of the anatomy of interest is first obtained using a high-dimensional mass-preserving template warping method, which results in tissue density maps that constitute local tissue volumetric measurements. Regions that display strong correlations between tissue volume and classification (clinical) variables are extracted using a watershed segmentation algorithm, taking into account the regional smoothness of the correlation map which is estimated by a cross-validation strategy to achieve robustness to outliers. A volume increment algorithm is then applied to these regions to extract regional volumetric features, from which a feature selection technique using support vector machine (SVM)-based criteria is used to select the most discriminative features, according to their effect on the upper bound of the leave-one-out generalization error. Finally, SVM-based classification is applied using the best set of features, and it is tested using a leave-one-out cross-validation strategy. The results on MR brain images of healthy controls and schizophrenia patients demonstrate not only high classification accuracy (91.8% for female subjects and 90.8% for male subjects), but also good stability with respect to the number of features selected and the size of the SVM kernel used.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: Non-metric similarities between pairs of images are derived by matching SIFT features, and affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
Abstract: Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often, the most natural measures of similarity (e.g., number of matching SIFT features) between pairs of images or image parts is non-metric. Unsupervised categorization by identifying a subset of representative exemplars can be efficiently performed with the recently-proposed 'affinity propagation' algorithm. In contrast to k-centers clustering, which iteratively refines an initial randomly-chosen set of exemplars, affinity propagation simultaneously considers all data points as potential exemplars and iteratively exchanges messages between data points until a good solution emerges. When applied to the Olivetti face data set using a translation-invariant non-metric similarity, affinity propagation achieves a much lower reconstruction error and nearly halves the classification error rate, compared to state-of-the-art techniques. For the more challenging problem of unsupervised categorization of images from the Caltech101 data set, we derived non-metric similarities between pairs of images by matching SIFT features. Affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
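
Affinity propagation accepts an arbitrary precomputed similarity matrix, which is what makes it suitable for non-metric similarities. A minimal scikit-learn sketch, where negative squared distances stand in for SIFT-match similarity scores:

```python
# Exemplar-based clustering on a precomputed (possibly non-metric) similarity matrix.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, (30, 8)) for c in (0, 2, 4)])
sim = -((X[:, None] - X[None]) ** 2).sum(-1)       # similarity: higher is closer

ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(sim)
print(len(ap.cluster_centers_indices_), "exemplars emerge from the data")
```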

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A time-efficient action detection method based on dynamic learning of subspaces for tensor CCA for the case that actions are not aligned in the space-time domain is proposed.
Abstract: We introduce a new framework, namely tensor canonical correlation analysis (TCCA) which is an extension of classical canonical correlation analysis (CCA) to multidimensional data arrays (or tensors) and apply this for action/gesture classification in videos. By tensor CCA, joint space-time linear relationships of two video volumes are inspected to yield flexible and descriptive similarity features of the two videos. The TCCA features are combined with a discriminative feature selection scheme and a nearest neighbor classifier for action classification. In addition, we propose a time-efficient action detection method based on dynamic learning of subspaces for tensor CCA for the case that actions are not aligned in the space-time domain. The proposed method delivered significantly better accuracy and comparable detection speed over state-of-the-art methods on the KTH action data set as well as self-recorded hand gesture data sets.

Journal ArticleDOI
TL;DR: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance, and proposes a method for automatically adjusting the smoothness properties of the empirical distributions.
Abstract: This paper addresses the problem of image segmentation by means of active contours, whose evolution is driven by the gradient flow derived from an energy functional that is based on the Bhattacharyya distance. In particular, given the values of a photometric variable (or of a set thereof), which is to be used for classifying the image pixels, the active contours are designed to converge to the shape that results in maximal discrepancy between the empirical distributions of the photometric variable inside and outside of the contours. The above discrepancy is measured by means of the Bhattacharyya distance that proves to be an extremely useful tool for solving the problem at hand. The proposed methodology can be viewed as a generalization of the segmentation methods, in which active contours maximize the difference between a finite number of empirical moments of the "inside" and "outside" distributions. Furthermore, it is shown that the proposed methodology is very versatile and flexible in the sense that it allows one to easily accommodate a diversity of the image features based on which the segmentation should be performed. As an additional contribution, a method for automatically adjusting the smoothness properties of the empirical distributions is proposed. Such a procedure is crucial in situations when the number of data samples (supporting a certain segmentation class) varies considerably in the course of the evolution of the active contour. In this case, the smoothness properties of the empirical distributions have to be properly adjusted to avoid either over- or underestimation artifacts. Finally, a number of relevant segmentation results are demonstrated and some further research directions are discussed.
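
The quantity at the heart of the method is easy to state. A sketch of the Bhattacharyya coefficient between "inside" and "outside" histograms, on synthetic values (low overlap indicates well-separated regions, which is what the contour evolution seeks):

```python
# Bhattacharyya coefficient between inside/outside empirical distributions.
import numpy as np

def bhattacharyya(p, q):
    """Overlap in [0, 1] between two histograms (1 = identical, 0 = disjoint)."""
    p, q = p / p.sum(), q / q.sum()
    return np.sqrt(p * q).sum()

vals_in = np.random.normal(0.3, 0.1, 5000)    # photometric variable inside ...
vals_out = np.random.normal(0.7, 0.1, 5000)   # ... and outside the contour
h_in, _ = np.histogram(vals_in, bins=32, range=(0, 1))
h_out, _ = np.histogram(vals_out, bins=32, range=(0, 1))
print(bhattacharyya(h_in, h_out))             # near 0: well-separated regions
```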

Journal ArticleDOI
TL;DR: The results of large-scale experiments demonstrate that the novel automatic target recognition (ATR) scheme outperforms the state-of-the-art systems reported in the literature.
Abstract: This paper proposes a novel automatic target recognition (ATR) system for classification of three types of ground vehicles in the moving and stationary target acquisition and recognition (MSTAR) public release database. First, MSTAR image chips are represented as fine and raw feature vectors, where raw features compensate for the target pose estimation error that corrupts fine image features. Then, the chips are classified by using the adaptive boosting (AdaBoost) algorithm with the radial basis function (RBF) network as the base learner. Since the RBF network is a binary classifier, the multiclass problem was decomposed into a set of binary ones through the error-correcting output codes (ECOC) method, specifying a dictionary of code words for the set of three possible classes. AdaBoost combines the classification results of the RBF network for each binary problem into a code word, which is then "decoded" as one of the code words (i.e., ground-vehicle classes) in the specified dictionary. Along with classification, within the AdaBoost framework, we also conduct efficient fusion of the fine and raw image-feature vectors. The results of large-scale experiments demonstrate that our ATR scheme outperforms the state-of-the-art systems reported in the literature.
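
The ECOC decomposition itself is generic and easy to demonstrate. Below, scikit-learn's OutputCodeClassifier with an RBF SVM stands in for the paper's AdaBoost-RBF base learner, on synthetic stand-in features:

```python
# ECOC: decompose a three-class problem into binary learners via a code-word dictionary.
from sklearn.datasets import make_classification
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)  # stand-in features
ecoc = OutputCodeClassifier(SVC(kernel="rbf"), code_size=2.0, random_state=0)
print(ecoc.fit(X[:200], y[:200]).score(X[200:], y[200:]))
```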

Proceedings ArticleDOI
TL;DR: A framework for compressive classification that operates directly on the compressive measurements without first reconstructing the image is proposed, and the effectiveness of the smashed filter for target classification using very few measurements is demonstrated.
Abstract: The theory of compressive sensing (CS) enables the reconstruction of a sparse or compressible image or signal from a small set of linear, non-adaptive (even random) projections. However, in many applications, including object and target recognition, we are ultimately interested in making a decision about an image rather than computing a reconstruction. We propose here a framework for compressive classification that operates directly on the compressive measurements without first reconstructing the image. We dub the resulting dimensionally reduced matched filter the smashed filter. The first part of the theory maps traditional maximum likelihood hypothesis testing into the compressive domain; we find that the number of measurements required for a given classification performance level does not depend on the sparsity or compressibility of the images but only on the noise level. The second part of the theory applies the generalized maximum likelihood method to deal with unknown transformations such as the translation, scale, or viewing angle of a target object. We exploit the fact that the set of transformed images forms a low-dimensional, nonlinear manifold in the high-dimensional image space. We find that the number of measurements required for a given classification performance level grows linearly in the dimensionality of the manifold but only logarithmically in the number of pixels/samples and image classes. Using both simulations and measurements from a new single-pixel compressive camera, we demonstrate the effectiveness of the smashed filter for target classification using very few measurements.
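
The matched-filter-in-the-compressive-domain idea fits in a few lines: project the class templates with the same measurement matrix and pick the nearest one, never reconstructing the image. Sizes below are illustrative:

```python
# Smashed-filter sketch: classify from random projections y = Phi @ x by
# nearest projected template, with no image reconstruction.
import numpy as np

rng = np.random.default_rng(0)
n, m = 1024, 40                                # pixels vs. compressive measurements
templates = rng.random((5, n))                 # one template image per class
phi = rng.normal(size=(m, n)) / np.sqrt(m)     # random measurement matrix

x = templates[2] + 0.05 * rng.normal(size=n)   # noisy instance of class 2
y = phi @ x                                    # all we ever observe

proj = templates @ phi.T                       # project the templates instead
print(np.argmin(((proj - y) ** 2).sum(axis=1)))  # -> 2 (ML match, Gaussian noise)
```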

Journal ArticleDOI
TL;DR: A multiobjective optimization algorithm is utilized to tackle the problem of fuzzy partitioning where a number of fuzzy cluster validity indexes are simultaneously optimized, and the resultant set of near-Pareto-optimal solutions contains a number of nondominated solutions, which the user can judge relatively and pick up the most promising one according to the problem requirements.
Abstract: An important approach for unsupervised landcover classification in remote sensing images is the clustering of pixels in the spectral domain into several fuzzy partitions. In this paper, a multiobjective optimization algorithm is utilized to tackle the problem of fuzzy partitioning where a number of fuzzy cluster validity indexes are simultaneously optimized. The resultant set of near-Pareto-optimal solutions contains a number of nondominated solutions, which the user can judge relatively and pick up the most promising one according to the problem requirements. Real-coded encoding of the cluster centers is used for this purpose. Results demonstrating the effectiveness of the proposed technique are provided for numeric remote sensing data described in terms of feature vectors. Different landcover regions in remote sensing imagery have also been classified using the proposed technique to establish its efficiency.

Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper treats tracking as a foreground/background classification problem and proposes an online semi-supervised learning framework that improves each individual classifier using the information from other features, thus leading to a more robust tracker.
Abstract: This paper treats tracking as a foreground/background classification problem and proposes an online semi-supervised learning framework. Initialized with a small number of labeled samples, semi-supervised learning treats each new sample as unlabeled data. Classification of new data and updating of the classifier are achieved simultaneously in a co-training framework. The object is represented using independent features and an online support vector machine (SVM) is built for each feature. The predictions from different features are fused by combining the confidence map from each classifier using a classifier weighting method which creates a final classifier that performs better than any classifier based on a single feature. The semi-supervised learning approach then uses the output of the combined confidence map to generate new samples and update the SVMs online. With this approach, the tracker gains increasing knowledge of the object and background and continually improves itself over time. Compared to other discriminative trackers, the online semi-supervised learning approach improves each individual classifier using the information from other features, thus leading to a more robust tracker. Experiments show that this framework performs better than state-of-the-art tracking algorithms on challenging sequences.

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A family of kernels between images, defined as kernels between their respective segmentation graphs, based on soft matching of subtree-patterns of the respective graphs, leveraging the natural structure of images while remaining robust to the associated segmentation process uncertainty.
Abstract: We propose a family of kernels between images, defined as kernels between their respective segmentation graphs. The kernels are based on soft matching of subtree-patterns of the respective graphs, leveraging the natural structure of images while remaining robust to the associated segmentation process uncertainty. Indeed, output from morphological segmentation is often represented by a labelled graph, each vertex corresponding to a segmented region, with edges joining neighboring regions. However, such image representations have mostly remained underused for learning tasks, partly because of the observed instability of the segmentation process and the inherent hardness of inexact graph matching with uncertain graphs. Our kernels count common virtual substructures amongst images, which enables to perform efficient supervised classification of natural images with a support vector machine. Moreover, the kernel machinery allows us to take advantage of recent advances in kernel-based learning: (i) semi-supervised learning reduces the required number of labelled images, while (ii) multiple kernel learning algorithms efficiently select the most relevant similarity measures between images within our family.

Journal ArticleDOI
TL;DR: A new color transform model is proposed to find important "vehicle color" for quickly locating possible vehicle candidates, and three important features, including corners, edge maps, and coefficients of wavelet transforms, are used to construct a cascade multichannel classifier.
Abstract: This paper presents a novel vehicle detection approach for detecting vehicles from static images using color and edges. Different from traditional methods, which use motion features to detect vehicles, this method introduces a new color transform model to find important "vehicle color" for quickly locating possible vehicle candidates. Since vehicles have various colors under different weather and lighting conditions, few works have been proposed for detecting vehicles using color. The proposed new color transform model has excellent capabilities to identify vehicle pixels from background, even though the pixels are lighted under varying illuminations. After finding possible vehicle candidates, three important features, including corners, edge maps, and coefficients of wavelet transforms, are used for constructing a cascade multichannel classifier. According to this classifier, an effective scanning can be performed to verify all possible candidates quickly. The scanning process can be quickly achieved because most background pixels are eliminated in advance by the color feature. Experimental results show that the integration of global color features and local edge features is powerful in the detection of vehicles. The average accuracy rate of vehicle detection is 94.9%.

Journal ArticleDOI
TL;DR: It is shown experimentally that the proposed nonlinear image deformation models perform very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity.
Abstract: We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that, among the discussed models, there is one approach that combines simplicity of implementation, low computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54 percent on the MNIST benchmark is achieved, as well as the lowest reported error rate, specifically 1.26 percent, in the 2005 international ImageCLEF evaluation of medical image categorization.
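
A minimal sketch of a distance in this deformation-model family: every pixel of one image may match its best counterpart within a small warp window of the other, absorbing local deformations. The window size and grayscale inputs are assumptions for illustration:

```python
# Per-pixel warp-window distance: local deformations cost nothing if a nearby
# pixel of the reference image matches.
import numpy as np

def idm_distance(a, b, w=2):
    h, wd = a.shape
    total = 0.0
    for i in range(h):
        for j in range(wd):
            win = b[max(0, i - w):i + w + 1, max(0, j - w):j + w + 1]
            total += ((win - a[i, j]) ** 2).min()   # best match in the window
    return total

rng = np.random.default_rng(0)
img = rng.random((16, 16))
warped = np.roll(img, 1, axis=1)                    # a small local deformation
print(idm_distance(img, warped) < ((img - warped) ** 2).sum())  # True
```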

Journal ArticleDOI
TL;DR: This letter introduces an embedded-feature-selection (EFS) algorithm that is tailored to operate with support vector machines (SVMs) to perform band selection and classification simultaneously.
Abstract: Hyperspectral images consist of a large number of bands, which require sophisticated analysis to extract useful information. One approach to reducing computational cost, simplifying information representation, and accelerating knowledge discovery is to eliminate bands that do not add value to the classification and analysis method being applied. In particular, algorithms that perform band elimination should be designed to take advantage of the structure of the classification method used. This letter introduces an embedded-feature-selection (EFS) algorithm that is tailored to operate with support vector machines (SVMs) to perform band selection and classification simultaneously. We have successfully applied this algorithm to determine a reasonable subset of bands on some sample AVIRIS images without any user-defined stopping criteria, a problem that arises when benchmarking recursive-feature-elimination methods for SVMs.
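
Scikit-learn's recursive feature elimination with a linear SVM is a convenient stand-in for SVM-driven band selection (the letter's EFS folds selection into SVM training itself; this sketch only illustrates the SVM-based ranking of bands, on a synthetic "image"):

```python
# SVM-driven band ranking and selection on synthetic stand-in pixels.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=100, n_informative=8,
                           random_state=0)          # pixels x spectral bands
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X, y)
print("selected bands:", list(rfe.support_.nonzero()[0]))
print("accuracy on selected bands:", rfe.score(X, y))
```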

Journal ArticleDOI
TL;DR: An iris classification method is proposed that divides the segmented and normalized iris image into six regions, makes an independent feature extraction and comparison for each region, and combines each of the dissimilarity values through a classification rule.
Abstract: This paper focuses on noncooperative iris recognition, i.e., the capture of iris images at large distances, under less controlled lighting conditions, and without active participation of the subjects. This increases the probability of capturing very heterogeneous images (regarding focus, contrast, or brightness) and with several noise factors (iris obstructions and reflections). Current iris recognition systems are unable to deal with noisy data and substantially increase their error rates, especially the false rejections, in these conditions. We propose an iris classification method that divides the segmented and normalized iris image into six regions, makes an independent feature extraction and comparison for each region, and combines each of the dissimilarity values through a classification rule. Experiments show a substantial decrease, higher than 40 percent, of the false rejection rates in the recognition of noisy iris images.