scispace - formally typeset

Showing papers on "Contextual image classification" published in 2008


Proceedings Article
23 Jun 2008
TL;DR: It is argued that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: quantization of local image descriptors (used to generate "bags-of-words" codebooks) and computation of 'image-to-image' distance instead of 'image-to-class' distance.
Abstract: State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric nearest-neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless. We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) quantization of local image descriptors (used to generate "bags-of-words" codebooks); (ii) computation of 'image-to-image' distance instead of 'image-to-class' distance. We propose a trivial NN-based classifier, NBNN (Naive-Bayes nearest-neighbor), which employs NN-distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'image-to-class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN. Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the top leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256 and Graz-01).

1,228 citations
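
The NBNN decision rule described above fits in a few lines. This is a minimal sketch, not the authors' implementation: descriptors are plain lists of floats, nearest neighbours are found by brute force, and `nbnn_classify` is a hypothetical helper name. For each class, the image-to-class distance sums, over every query descriptor, the squared distance to its nearest descriptor in the pool of all training descriptors of that class; no codebook is built.

```python
def nbnn_classify(query_descriptors, class_descriptors):
    """Naive-Bayes Nearest-Neighbor sketch (hypothetical helper).

    query_descriptors: list of descriptor vectors from the query image.
    class_descriptors: dict mapping class label -> list of descriptor vectors
        pooled from all training images of that class (no quantization).
    Returns (best_label, dict of image-to-class distances).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    totals = {}
    for label, pool in class_descriptors.items():
        # Image-to-class distance: each query descriptor contributes the
        # squared distance to its nearest neighbour in this class's pool.
        totals[label] = sum(min(sq_dist(d, p) for p in pool)
                            for d in query_descriptors)
    # The class with the smallest total distance wins.
    return min(totals, key=totals.get), totals
```

In practice the brute-force `min` would be replaced by a nearest-neighbour index; the decision rule itself stays the same.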


Proceedings Article
23 Jun 2008
TL;DR: The proposed semantic texton forests are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors, and give at least a five-fold increase in execution speed.
Abstract: We propose semantic texton forests, efficient and powerful new low-level features. These are ensembles of decision trees that act directly on image pixels, and therefore do not need the expensive computation of filter-bank responses or local descriptors. They are extremely fast to both train and test, especially compared with k-means clustering and nearest-neighbor assignment of feature descriptors. The nodes in the trees provide (i) an implicit hierarchical clustering into semantic textons, and (ii) an explicit local classification estimate. Our second contribution, the bag of semantic textons, combines a histogram of semantic textons over an image region with a region prior category distribution. The bag of semantic textons is computed over the whole image for categorization, and over local rectangular regions for segmentation. Including both histogram and region prior allows our segmentation algorithm to exploit both textural and semantic context. Our third contribution is an image-level prior for segmentation that emphasizes those categories that the automatic categorization believes to be present. We evaluate on two datasets including the very challenging VOC 2007 segmentation dataset. Our results significantly advance the state-of-the-art in segmentation accuracy, and furthermore, our use of efficient decision forests gives at least a five-fold increase in execution speed.

1,162 citations
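
A rough sketch of how a tree acting directly on pixels can yield a "bag of semantic textons". Assumptions, all illustrative: the tree is generated at random instead of being trained on labelled pixels, each split compares the difference of two nearby pixel values with a threshold (one plausible pixel test), only leaf nodes count as textons, and the region prior category distribution is omitted. All function names are hypothetical.

```python
import random

def make_random_tree(depth, max_offset, rng):
    """Random split tests for a complete binary tree stored as an array.

    Internal node i has children 2*i + 1 and 2*i + 2. Each test is
    (dy1, dx1, dy2, dx2, threshold): go right if
    pixel(y+dy1, x+dx1) - pixel(y+dy2, x+dx2) > threshold.
    A real semantic texton forest would learn these tests from labelled pixels.
    """
    tests = []
    for _ in range(2 ** depth - 1):
        offsets = [rng.randint(-max_offset, max_offset) for _ in range(4)]
        tests.append((*offsets, rng.uniform(-0.5, 0.5)))
    return tests

def pixel(img, y, x):
    # Clamp coordinates so tests near the border stay inside the image.
    h, w = len(img), len(img[0])
    return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def leaf_index(img, y, x, tests, depth):
    """Route pixel (y, x) down the tree; the leaf reached is its texton."""
    node = 0
    for _ in range(depth):
        dy1, dx1, dy2, dx2, t = tests[node]
        go_right = pixel(img, y + dy1, x + dx1) - pixel(img, y + dy2, x + dx2) > t
        node = 2 * node + (2 if go_right else 1)
    return node - (2 ** depth - 1)  # leaves numbered 0 .. 2**depth - 1

def bag_of_textons(img, tests, depth, region=None):
    """Histogram of leaf textons over the whole image or a rectangle
    region = (y0, y1, x0, x1), end-exclusive."""
    h, w = len(img), len(img[0])
    y0, y1, x0, x1 = region or (0, h, 0, w)
    hist = [0] * (2 ** depth)
    for y in range(y0, y1):
        for x in range(x0, x1):
            hist[leaf_index(img, y, x, tests, depth)] += 1
    return hist
```

Because each pixel needs only `depth` comparisons of raw values, no filter-bank responses or descriptors are computed, which is the source of the speed advantage the abstract describes.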


Journal Article
TL;DR: An overview of how to use remote sensing imagery to classify and map vegetation cover is presented, focusing on the comparisons of popular remote sensing sensors, commonly adopted image processing methods and prevailing classification accuracy assessments.
Abstract: Aims: Mapping vegetation through remotely sensed images involves various considerations, processes and techniques. The increasing availability of remotely sensed images, due to the rapid advancement of remote sensing technology, expands the range of imagery sources to choose from. Various sources of imagery are known for their differences in spectral, spatial, radiometric and temporal characteristics and are thus suitable for different purposes of vegetation mapping. Generally, a vegetation classification system must first be developed in order to classify and map vegetation cover from remotely sensed images at either the community level or the species level. Then, correlations between the vegetation types (communities or species) within this classification system and the discernible spectral characteristics of the imagery have to be identified. These spectral classes of the imagery are finally translated into the vegetation types in the image interpretation process, which is also called image processing. This paper presents an overview of how to use remote sensing imagery to classify and map vegetation cover. Methods: Specifically, this paper focuses on comparisons of popular remote sensing sensors, commonly adopted image processing methods and prevailing classification accuracy assessments. Important findings: The basic concepts, available imagery sources and classification techniques of remote sensing imagery related to vegetation mapping were introduced, analyzed and compared. The advantages and limitations of using remote sensing imagery for vegetation cover mapping were provided to reiterate the importance of a thorough understanding of the related concepts and careful design of the technical procedures, which can be utilized to study vegetation cover from remotely sensed images.

1,102 citations


Journal Article
TL;DR: An approach is proposed that extends a method based on extracting several principal components from the hyperspectral data and building morphological profiles, which can be used together in one extended morphological profile for the classification of urban structures; the extension fuses this morphological information with the original spectral data.
Abstract: A method is proposed for the classification of urban hyperspectral data with high spatial resolution. The approach is an extension of previous approaches and uses both the spatial and spectral information for classification. One previous approach is based on using several principal components (PCs) from the hyperspectral data and building several morphological profiles (MPs). These profiles can be used all together in one extended MP. A shortcoming of that approach is that it was primarily designed for classification of urban structures and it does not fully utilize the spectral information in the data. Similarly, the commonly used pixelwise classification of hyperspectral data is solely based on the spectral content and lacks information on the structure of the features in the image. The proposed method overcomes these problems and is based on the fusion of the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector. After a reduction of the dimensionality, the final classification is achieved by using a support vector machine classifier. The proposed approach is tested in experiments on ROSIS data from urban areas. Significant improvements are achieved in terms of accuracies when compared to results obtained for approaches based on the use of MPs based on PCs only and conventional spectral classification. For instance, with one data set, the overall accuracy is increased from 79% to 83% without any feature reduction and to 87% with feature reduction. The proposed approach also shows excellent results with a limited training set.

1,092 citations
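
The fusion step described in the abstract (concatenating morphological attributes with the original spectra) can be sketched with NumPy as follows. Assumptions: morphological profiles in this line of work are usually built with opening and closing by reconstruction, but this sketch substitutes plain grey-level opening and closing with square structuring elements for brevity; the dimensionality reduction and SVM stages are omitted; `fused_features`, `opening` and `closing` are hypothetical names.

```python
import numpy as np

def _filter(img, size, op):
    # Grey-level erosion (op=np.min) or dilation (op=np.max) with a
    # size x size square structuring element; borders padded by replication.
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(size) for dx in range(size)]
    return op(np.stack(windows), axis=0)

def opening(img, size):
    return _filter(_filter(img, size, np.min), size, np.max)

def closing(img, size):
    return _filter(_filter(img, size, np.max), size, np.min)

def fused_features(cube, n_pcs=3, sizes=(3, 5, 7)):
    """cube: (H, W, B) hyperspectral image.
    Returns an (H*W, B + n_pcs*(1 + 2*len(sizes))) matrix: the original spectra
    concatenated with a morphological profile of the first n_pcs PCs."""
    h, w, b = cube.shape
    spectra = cube.reshape(-1, b).astype(float)
    centered = spectra - spectra.mean(axis=0)
    # Principal components from the SVD of the centred pixel matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = (centered @ vt[:n_pcs].T).reshape(h, w, n_pcs)
    profile = []
    for k in range(n_pcs):
        pc = pcs[:, :, k]
        profile.append(pc)
        for s in sizes:
            profile.append(opening(pc, s))
            profile.append(closing(pc, s))
    mp = np.stack(profile, axis=-1).reshape(h * w, -1)
    # Fusion: concatenate the spectral and morphological attribute vectors.
    return np.hstack([spectra, mp])
```

Each row of the result would then go through feature reduction and an SVM, as the abstract describes.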


Journal Article
TL;DR: This work proposes a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of existing approaches, and it suggests a conceptually novel technique for extending the common learning-by-pairwise-comparison approach to the multilabel scenario, a setting previously not amenable to the pairwise decomposition technique.
Abstract: Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Existing approaches to label ranking implicitly operate on an underlying (utility) scale that is not calibrated, in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning-by-pairwise-comparison approach to the multilabel scenario, a setting previously not amenable to the pairwise decomposition technique. The key idea of the approach is to introduce an artificial calibration label that, in each example, separates the relevant from the irrelevant labels. We show that this technique can be viewed as a combination of pairwise preference learning and the conventional relevance classification technique, where a separate classifier is trained to predict whether a label is relevant or not. Empirical results in the areas of text categorization, image classification and gene analysis underscore the merits of the calibrated model in comparison to state-of-the-art multilabel learning methods.

825 citations
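
The role of the artificial calibration label at prediction time can be illustrated with pairwise voting. Assumption: `prefers(a, b)` is a hypothetical stand-in for the trained pairwise models; for pairs involving the calibration label it plays the part of the relevance classifier mentioned in the abstract. Labels that collect more votes than the calibration label are predicted relevant.

```python
def calibrated_label_ranking(labels, prefers):
    """labels: list of label names.
    prefers(a, b): hypothetical pairwise model, True if a is preferred over b
        for the current instance. The calibration label is "__calib__".
    Returns (ranking without the calibration label, set of relevant labels)."""
    CAL = "__calib__"
    candidates = list(labels) + [CAL]
    votes = {c: 0 for c in candidates}
    # Round-robin voting over all pairs, including pairs with the calibration label.
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            votes[a if prefers(a, b) else b] += 1
    ranking = sorted(candidates, key=lambda c: -votes[c])
    # Everything ranked above the calibration label is predicted relevant.
    relevant = set(ranking[:ranking.index(CAL)])
    return [c for c in ranking if c != CAL], relevant
```

With ties the sort is stable, so the calibration label, appended last, ends up below tied real labels; a different tie rule would be an equally valid design choice.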


Proceedings Article
23 Jun 2008
TL;DR: A simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages and converges to a globally optimal solution typically in sublinear time is proposed.
Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To perform localization, one can take a sliding window approach, but this strongly increases the computational cost, because the classifier function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of classifier functions over all possible subimages. It converges to a globally optimal solution typically in sublinear time. We show how our method is applicable to different object detection and retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest neighbor classifiers based on the χ²-distance. We demonstrate state-of-the-art performance of the resulting systems on the UIUC Cars dataset, the PASCAL VOC 2006 dataset and in the PASCAL VOC 2007 competition.

801 citations
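
Below is a sketch of branch-and-bound subwindow search for the simplest case, where a box's score is the sum of per-pixel weights. That is what a linear classifier on a bag-of-words histogram gives if each pixel carries the weight of its visual word. A set of boxes is four intervals (top, bottom, left, right). Its upper bound is the positive weight inside the largest box in the set plus the negative weight inside the smallest. Best-first search splits the widest interval until a single box remains. The function names and this particular bound are illustrative assumptions; the paper covers a larger class of classifier functions.

```python
import heapq

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def box_sum(ii, t, b, l, r):
    # Sum over rows t..b and columns l..r (inclusive); empty boxes sum to 0.
    if t > b or l > r:
        return 0.0
    return ii[b + 1][r + 1] - ii[t][r + 1] - ii[b + 1][l] + ii[t][l]

def subwindow_search(weights):
    """Return ((top, bottom, left, right), score) of the box maximizing the
    sum of weights, found by best-first branch and bound."""
    h, w = len(weights), len(weights[0])
    pos = integral_image([[max(v, 0.0) for v in row] for row in weights])
    neg = integral_image([[min(v, 0.0) for v in row] for row in weights])

    def bound(s):
        (t0, t1), (b0, b1), (l0, l1), (r0, r1) = s
        # Positive part of the largest box + negative part of the smallest box.
        return box_sum(pos, t0, b1, l0, r1) + box_sum(neg, t1, b0, l1, r0)

    start = ((0, h - 1), (0, h - 1), (0, w - 1), (0, w - 1))
    heap = [(-bound(start), start)]
    while heap:
        neg_bound, s = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in s]
        k = max(range(4), key=lambda i: widths[i])
        if widths[k] == 0:
            # A single box: its bound is its exact score, and it beats every
            # remaining bound in the queue, so it is globally optimal.
            return tuple(lo for lo, _ in s), -neg_bound
        lo, hi = s[k]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = s[:k] + (part,) + s[k + 1:]
            (t0, _), (_, b1), (l0, _), (_, r1) = child
            if t0 <= b1 and l0 <= r1:  # skip sets containing no valid box
                heapq.heappush(heap, (-bound(child), child))
    return None, float("-inf")
```

Integral images make every bound evaluation constant-time, which is what lets the search avoid scoring each candidate subwindow explicitly.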


Journal Article
TL;DR: This work introduces a novel vocabulary using dense color SIFT descriptors and investigates the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM).
Abstract: We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.

778 citations
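
The pLSA step that turns a bag-of-visual-words count matrix into per-image topic vectors can be sketched with the standard EM updates. This is a generic textbook pLSA in NumPy, not code from the paper. `plsa` is a hypothetical name and the iteration count is arbitrary. The returned `p_z_d` rows are the topic distribution vectors that a k-NN or SVM classifier would then be trained on.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """counts: (n_images, n_words) visual-word counts n(d, w).
    Returns P(w|z) (topics x words), P(z|d) (images x topics) and the
    log-likelihood recorded at the start of each EM iteration."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    history = []
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z); shape (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        p_w_d = joint.sum(axis=1)
        history.append(float((counts * np.log(p_w_d + 1e-12)).sum()))
        post = joint / (p_w_d[:, None, :] + 1e-12)
        weighted = counts[:, None, :] * post  # n(d,w) P(z|d,w)
        # M-step: re-estimate both conditional distributions.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d, history
```

EM never decreases the likelihood, so `history` should be non-decreasing up to rounding, which is a cheap sanity check on an implementation.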


Journal Article
TL;DR: An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented and the often used, but very simple, color histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.
Abstract: An experimental comparison of a large number of different image descriptors for content-based image retrieval is presented. Many of the papers describing new techniques and descriptors for content-based image retrieval describe their newly proposed methods as most appropriate without giving an in-depth comparison with all methods that were proposed earlier. In this paper, we first give an overview of a large variety of features for content-based image retrieval and compare them quantitatively on four different tasks: stock photo retrieval, personal photo collection retrieval, building retrieval, and medical image retrieval. For the experiments, five different, publicly available image databases are used and the retrieval performance of the features is analyzed in detail. This allows for a direct comparison of all features considered in this work and furthermore will allow a comparison of newly proposed features to these in the future. Additionally, the correlation of the features is analyzed, which opens the way for a simple and intuitive method to find an initial set of suitable features for a new task. The article concludes with recommendations on which features perform well for which type of data. Interestingly, the often used, but very simple, color histogram performs well in the comparison and thus can be recommended as a simple baseline for many applications.

641 citations
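
The color-histogram baseline recommended above can be sketched as a quantized joint RGB histogram plus a similarity ranking. Assumptions: 4 levels per channel and histogram intersection as the similarity are illustrative choices; the comparison measure actually used in the article is not stated here. All names are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """pixels: iterable of (r, g, b) tuples with 8-bit values.
    Returns a normalized joint RGB histogram with bins_per_channel**3 bins."""
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to one of bins_per_channel levels.
        qr, qg, qb = (c * bins_per_channel // 256 for c in (r, g, b))
        hist[(qr * bins_per_channel + qg) * bins_per_channel + qb] += 1
        count += 1
    return [h / count for h in hist]

def intersection(h1, h2):
    # Histogram intersection: 1.0 for identical normalized histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def rank_database(query_hist, database):
    """database: dict name -> histogram. Most similar images first."""
    return sorted(database, key=lambda name: -intersection(query_hist, database[name]))
```

Example usage: a mostly red query image ranks a red database image ahead of a blue one.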


Journal Article
TL;DR: This work presents an online method that detects when an image comes from an already perceived scene using local shape and color information; it extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability.
Abstract: In robotic applications of visual simultaneous localization and mapping techniques, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape and color information. Our approach extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in both indoor and outdoor image sequences taken with a handheld camera.

521 citations
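
One predict/update cycle of a discrete Bayes filter over loop-closure hypotheses can look as follows. Everything specific is an assumption for illustration. The hypothesis set contains "no loop closure" plus "loop with past image i". The transition values `p_new`, `p_lost` and `spread` are invented. The observation likelihoods would in practice come from bag-of-words similarity. The paper's actual models are not reproduced here.

```python
def bayes_loop_closure_step(belief, likelihood, p_new=0.9, p_lost=0.1,
                            spread=(0.1, 0.8, 0.1)):
    """Index 0 = "no loop closure"; index 1 + i = "loop with past image i".
    belief: posterior from the previous step (may cover fewer past images).
    likelihood: hypothetical likelihood of the current image per hypothesis.
    Returns the normalized posterior."""
    n = len(likelihood) - 1
    # Newly available past images start with zero probability mass.
    belief = list(belief) + [0.0] * (n + 1 - len(belief))
    pred = [0.0] * (n + 1)
    if n == 0:
        pred[0] = 1.0
    else:
        # From "no loop closure": stay with p_new, else move uniformly to a past image.
        pred[0] += belief[0] * p_new
        for i in range(n):
            pred[1 + i] += belief[0] * (1 - p_new) / n
        # From "loop with image i": lose the loop with p_lost, otherwise move to
        # image i or a neighbour (clamped at the ends of the sequence).
        for i in range(n):
            mass = belief[1 + i]
            pred[0] += mass * p_lost
            for off, w in zip((-1, 0, 1), spread):
                j = min(max(i + off, 0), n - 1)
                pred[1 + j] += mass * (1 - p_lost) * w
    # Update: weight the prediction by the observation likelihood and normalize.
    post = [p * l for p, l in zip(pred, likelihood)]
    z = sum(post)
    return [p / z for p in post]
```

Filtering over time is what suppresses one-off false matches under perceptual aliasing: a loop closure must be supported by consistent evidence across consecutive images before its probability dominates.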


Proceedings Article
23 Jun 2008
TL;DR: A method is developed for constructing mid-level motion features from low-level optical flow information; these features are tuned to discriminate between different classes of action and are efficient to compute at run-time.
Abstract: This paper presents a method for human action recognition based on patterns of motion. Previous approaches to action recognition use either local features describing small patches or large-scale features describing the entire human figure. We develop a method for constructing mid-level motion features which are built from low-level optical flow information. These features are focused on local regions of the image sequence and are created using a variant of AdaBoost. These features are tuned to discriminate between different classes of action, and are efficient to compute at run-time. A battery of classifiers based on these mid-level features is created and used to classify input sequences. State-of-the-art results are presented on a variety of standard datasets.

519 citations
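
Since the features above are created with a variant of AdaBoost, the underlying boosting loop is worth seeing. This is plain discrete AdaBoost with threshold stumps, written as a generic illustration; it is not the paper's variant. The feature vectors could be, for example, pooled optical-flow values, which is an assumption.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, x):
    _, f, thr, pol = stump
    return pol if x[f] >= thr else -pol

def adaboost(X, y, rounds=10):
    """X: list of feature vectors; y: labels in {-1, +1}.
    Returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight: misclassified examples gain weight for the next round.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score >= 0 else -1
```

Each stump acts as a selected feature-plus-threshold, so boosting doubles as feature selection, which is the property the paper exploits when building its mid-level features.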


Journal Article
TL;DR: The elevation channel of the first LIDAR return was very effective for the separation of species with similar spectral signatures but different mean heights, and the SVM classifier proved to be very robust and accurate in the exploitation of the considered multisource data.
Abstract: In this paper, we propose an analysis of the joint effect of hyperspectral and light detection and ranging (LIDAR) data on the classification of complex forest areas. In greater detail, we present: 1) an advanced system for the joint use of hyperspectral and LIDAR data in complex classification problems; 2) an investigation of the effectiveness of the very promising support vector machines (SVMs) and Gaussian maximum likelihood with leave-one-out-covariance algorithm classifiers for the analysis of complex forest scenarios characterized by a high number of species in a multisource framework; and 3) an analysis of the effectiveness of different LIDAR returns and channels (elevation and intensity) for increasing the classification accuracy obtained with hyperspectral images, particularly in relation to the discrimination of very similar classes. Several experiments carried out on a complex forest area in Italy provide interesting conclusions on the effectiveness and potential of the joint use of hyperspectral and LIDAR data and on the accuracy of the different classification techniques analyzed in the proposed system. In particular, the elevation channel of the first LIDAR return was very effective for the separation of species with similar spectral signatures but different mean heights, and the SVM classifier proved to be very robust and accurate in the exploitation of the considered multisource data.

Journal Article
TL;DR: An overview of the main concepts underlying ANNs, including the main architectures and learning algorithms, is presented, and the main tasks that involve ANNs in remote sensing are described.
Abstract: Artificial neural networks (ANNs) have become a popular tool in the analysis of remotely sensed data. Although significant progress has been made in image classification based upon neural networks, a number of issues remain to be resolved. This paper reviews remotely sensed data analysis with neural networks. First, we present an overview of the main concepts underlying ANNs, including the main architectures and learning algorithms. Then, the main tasks that involve ANNs in remote sensing are described. The limitations and crucial issues relating to the application of the neural network approach are discussed. A brief review of the implementation of ANNs in some of the most popular image processing software packages is presented. Finally, we discuss the application perspectives of neural networks in remote sensing image analysis.

Journal Article
TL;DR: A new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale, is proposed and compared with other methods based on contour and local descriptors in a detailed evaluation over 17 challenging categories.
Abstract: Psychophysical studies show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detections. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multiscale and multiclass visual object recognition.
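
The basic chamfer matching idea behind the contour fragments can be sketched as follows. Assumptions: this is plain chamfer matching (mean distance from translated template contour points to the nearest image edge pixel, via a brute-force Euclidean distance transform), not the paper's novel formulation, and it searches translation only. The names are hypothetical.

```python
import math

def distance_transform(edges):
    """edges: 2-D list of 0/1. Euclidean distance from every pixel to the
    nearest edge pixel (brute force, fine for small illustrative images)."""
    pts = [(y, x) for y, row in enumerate(edges) for x, v in enumerate(row) if v]
    h, w = len(edges), len(edges[0])
    return [[min(math.hypot(y - py, x - px) for py, px in pts) for x in range(w)]
            for y in range(h)]

def chamfer_score(dt, template_pts, oy, ox):
    # Mean distance from the translated template points to image edges.
    return sum(dt[y + oy][x + ox] for y, x in template_pts) / len(template_pts)

def best_match(edges, template_pts):
    """Exhaustive sliding search; returns (score, offset_y, offset_x)."""
    dt = distance_transform(edges)
    h, w = len(edges), len(edges[0])
    th = max(y for y, _ in template_pts) + 1
    tw = max(x for _, x in template_pts) + 1
    return min((chamfer_score(dt, template_pts, oy, ox), oy, ox)
               for oy in range(h - th + 1) for ox in range(w - tw + 1))
```

The distance transform is computed once per image, after which each candidate position costs only one lookup per template point.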

Journal Article
TL;DR: The experimental result shows that the proposed unsupervised band selection algorithms based on band similarity measurement can yield a better result in terms of information conservation and class separability than other widely used techniques.
Abstract: Band selection is a common approach to reduce the data dimensionality of hyperspectral imagery. It extracts several bands of importance in some sense by taking advantage of high spectral correlation. Driven by detection or classification accuracy, one would expect that, using a subset of original bands, the accuracy is unchanged or tolerably degraded, whereas computational burden is significantly relaxed. When the desired object information is known, this task can be achieved by finding the bands that contain the most information about these objects. When the desired object information is unknown, i.e., unsupervised band selection, the objective is to select the most distinctive and informative bands. It is expected that these bands can provide an overall satisfactory detection and classification performance. In this letter, we propose unsupervised band selection algorithms based on band similarity measurement. The experimental result shows that our approach can yield a better result in terms of information conservation and class separability than other widely used techniques.
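
Unsupervised band selection driven by band similarity can be sketched as a greedy search for mutually dissimilar bands. Assumption: absolute correlation serves as the similarity measure here, which may differ from the measure used in the letter. `select_bands` is a hypothetical name, and `k` must be at least 2.

```python
import numpy as np

def select_bands(pixels, k):
    """pixels: (n_pixels, n_bands) array. Greedily picks k mutually
    dissimilar bands (k >= 2) and returns their indices in ascending order."""
    sim = np.abs(np.corrcoef(pixels, rowvar=False))
    n = sim.shape[0]
    # Seed with the least similar pair (push the diagonal out of reach).
    i, j = np.unravel_index(np.argmin(sim + 2 * np.eye(n)), sim.shape)
    selected = [int(i), int(j)]
    while len(selected) < k:
        remaining = [b for b in range(n) if b not in selected]
        # Add the band whose highest similarity to the selected set is lowest.
        selected.append(min(remaining, key=lambda b: sim[b, selected].max()))
    return sorted(selected)
```

Highly correlated neighbouring bands carry nearly the same information, so keeping only one band from each correlated group preserves most of the content while reducing the dimensionality.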

Journal Article
TL;DR: One of the findings was that the automatic face alignment methods did not increase the gender classification rates, but manual alignment increased classification rates a little, which suggests that automatic alignment would be useful when the alignment methods are further improved.
Abstract: We present a systematic study on gender classification with automatically detected and aligned faces. We experimented with 120 combinations of automatic face detection, face alignment, and gender classification. One of the findings was that the automatic face alignment methods did not increase the gender classification rates. However, manual alignment increased classification rates a little, which suggests that automatic alignment would be useful when the alignment methods are further improved. We also found that the gender classification methods performed almost equally well with different input image sizes. In any case, the best classification rate was achieved with a support vector machine. A neural network and Adaboost achieved almost as good classification rates as the support vector machine and could be used in applications where classification speed is considered more important than the maximum classification accuracy.

Journal Article
TL;DR: A general framework based on kernel methods for the integration of heterogeneous sources of information for multitemporal classification of remote sensing images and the development of nonlinear kernel classifiers for the well-known difference and ratioing change detection methods is presented.
Abstract: The multitemporal classification of remote sensing images is a challenging problem, in which the efficient combination of different sources of information (e.g., temporal, contextual, or multisensor) can improve the results. In this paper, we present a general framework based on kernel methods for the integration of heterogeneous sources of information. Using the theoretical principles in this framework, three main contributions are presented. First, a novel family of kernel-based methods for multitemporal classification of remote sensing images is presented. The second contribution is the development of nonlinear kernel classifiers for the well-known difference and ratioing change detection methods by formulating them in an adequate high-dimensional feature space. Finally, the presented methodology allows the integration of contextual information and multisensor images with different levels of nonlinear sophistication. The binary support vector (SV) classifier and the one-class SV domain description classifier are evaluated by using both linear and nonlinear kernel functions. Good performance on synthetic and real multitemporal classification scenarios illustrates the generalization of the framework and the capabilities of the proposed algorithms.
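
Two ways of combining co-registered pixels from two dates inside a kernel can be sketched as follows. The first is a weighted sum of per-date kernels. The second is a kernel version of image differencing, equal to the inner product of φ(x_t2) − φ(x_t1) with φ(x'_t2) − φ(x'_t1) in the kernel's feature space. The exact kernels defined in the paper may differ. `mu`, `gamma` and all function names are illustrative assumptions.

```python
import math

def rbf(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def linear(a, b):
    return sum(x * y for x, y in zip(a, b))

def summation_kernel(p, q, k=rbf, mu=0.5):
    """p, q: (x_t1, x_t2) pixel-vector pairs from two dates.
    Weighted sum of per-date kernels; mu in [0, 1] balances the dates."""
    return mu * k(p[0], q[0]) + (1 - mu) * k(p[1], q[1])

def difference_kernel(p, q, k=rbf):
    """Inner product of the feature-space differences phi(x_t2) - phi(x_t1):
    a nonlinear analogue of the difference change-detection method."""
    (a1, a2), (b1, b2) = p, q
    return k(a2, b2) - k(a2, b1) - k(a1, b2) + k(a1, b1)

def gram(samples, kernel):
    return [[kernel(p, q) for q in samples] for p in samples]
```

With the linear kernel, `difference_kernel` reduces to the dot product of raw difference images, which makes a handy sanity check. A Gram matrix built this way could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.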

Journal Article
TL;DR: 3D point cloud data from LiDAR can be used as a tool for extracting simple roughness information relevant to below-canopy flow conditions, as well as roughness information relevant to the more complex tree morphology that affects the flow as it enters the canopy.

Journal Article
TL;DR: This work introduces Extremely Randomized Clustering Forests (ensembles of randomly created clustering trees) and shows that they provide more accurate results, much faster training and testing, and good resistance to background clutter.
Abstract: Some of the most effective recent methods for content-based image classification work by quantizing image descriptors and accumulating histograms of the resulting visual word codes. Good results require large numbers of descriptors and large codebooks, and this becomes slow when using k-means. First, we introduce Extremely Randomized Clustering Forests (ensembles of randomly created clustering trees) and show that they provide more accurate results, much faster training and testing, and good resistance to background clutter. Second, an efficient image classification method is proposed. It combines ERC-Forests and saliency maps very closely with the extraction of image information. For a given image, a classifier builds a saliency map online and uses it to classify the image. We show in several state-of-the-art image classification tasks that this method can speed up the classification process enormously. Finally, we show that the proposed ERC-Forests can also be used very successfully for learning distances between images. The distance computation algorithm consists of learning the characteristic differences between local descriptors sampled from pairs of same or different objects. These differences are vector quantized by ERC-Forests and the similarity measure is computed from this quantization. The similarity measure has been evaluated on four very different datasets and always outperforms the state-of-the-art competitive approaches.
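
The way randomized clustering trees replace k-means quantization can be sketched as follows. Every leaf becomes a visual word, and an image's histogram counts leaves over all trees. Assumption: the paper's trees may use image labels to choose among candidate random splits, whereas this sketch draws a random dimension and a random threshold without any labels. Depth and leaf-size limits and all names are hypothetical.

```python
import random

def build_tree(descs, rng, counter, depth=0, max_depth=6, min_size=4):
    """Randomized clustering tree: split on a random dimension at a random
    threshold between that dimension's min and max. Leaves get consecutive ids."""
    if depth < max_depth and len(descs) >= min_size:
        d = rng.randrange(len(descs[0]))
        lo = min(v[d] for v in descs)
        hi = max(v[d] for v in descs)
        if lo < hi:
            t = rng.uniform(lo, hi)
            left = [v for v in descs if v[d] < t]
            right = [v for v in descs if v[d] >= t]
            if left and right:
                return ("split", d, t,
                        build_tree(left, rng, counter, depth + 1, max_depth, min_size),
                        build_tree(right, rng, counter, depth + 1, max_depth, min_size))
    counter[0] += 1
    return ("leaf", counter[0] - 1)

def quantize(node, v):
    # Descend to a leaf: the leaf id is the visual word for descriptor v.
    while node[0] == "split":
        _, d, t, left, right = node
        node = left if v[d] < t else right
    return node[1]

def build_forest(descs, n_trees=3, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        counter = [0]
        forest.append((build_tree(descs, rng, counter), counter[0]))
    return forest  # list of (root, number_of_leaves)

def forest_histogram(forest, image_descs):
    """Bag-of-visual-words vector with one bin per leaf across all trees."""
    hist = []
    for tree, n_leaves in forest:
        h = [0] * n_leaves
        for v in image_descs:
            h[quantize(tree, v)] += 1
        hist.extend(h)
    return hist
```

Quantizing a descriptor costs one comparison per tree level instead of a distance to every codebook centre, which is the speed argument made against k-means.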

Journal Article
TL;DR: Image fusion procedures for the fusion of multi-spectral ASTER data and a RadarSAT-1 SAR scene are explored to determine which fusion procedure merged the largest amount of SAR texture into the ASTER scenes while also preserving the spectral content.
Abstract: The use of disparate data sources within a pixel-level image fusion procedure has been well documented for pan-sharpening studies. The present paper explores various image fusion procedures for the fusion of multi-spectral ASTER data and a RadarSAT-1 SAR scene. The research sought to determine which fusion procedure merged the largest amount of SAR texture into the ASTER scenes while also preserving the spectral content. An additional application-based maximum likelihood classification assessment was also undertaken. Three SAR scenes were tested, namely one backscatter scene and two textural measures calculated using grey-level co-occurrence matrices (GLCM). Each of these was fused to the ASTER data using the following established approaches: Brovey transformation, Intensity-Hue-Saturation (IHS), Principal Component Substitution, discrete wavelet transformation, and a modified discrete wavelet transformation using the IHS approach. The resulting data sets were assessed using qualitative and quantitative (entropy, universal image quality index, maximum likelihood classification) approaches. Results from the study indicated that while all post-fusion data sets contained more information (entropy analysis), only the frequency-based fusion approaches managed to preserve the spectral quality of the original imagery. Furthermore, the results indicated that the textural (mean, contrast) SAR scenes did not add any significant amount of information to the post-fusion imagery. Classification accuracy was not improved when comparing ASTER optical data and pseudo-optical bands generated from the fusion analysis. Accuracies ranged from 68.4% for the ASTER data to well below 50% for the component substitution methods. Frequency-based approaches also returned lower accuracies when compared to the unfused optical data. The present study essentially replicated pan-sharpening studies, using the high-resolution SAR scene as a pseudo-panchromatic band.
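
Two of the ingredients above can be sketched with NumPy: the Brovey transformation, in its common ratio form where each multispectral band is scaled by pan / (sum of bands), and the histogram entropy used to measure information content. Assumptions: the paper's exact scaling and binning are not given here, and the SAR scene simply plays the role of the pseudo-panchromatic band. Function names are hypothetical.

```python
import numpy as np

def brovey(ms, pan, eps=1e-12):
    """ms: (H, W, B) multispectral bands resampled to the grid of pan;
    pan: (H, W) high-resolution band (here a SAR scene used as pseudo-pan).
    Each band is multiplied by pan / sum(ms bands)."""
    total = ms.sum(axis=-1, keepdims=True)
    return ms * (pan[..., None] / (total + eps))

def entropy(band, levels=256):
    """Shannon entropy (bits) of a band's grey-level histogram."""
    hist, _ = np.histogram(band, bins=levels)
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())
```

Because the fused bands always sum to the pan values, Brovey injects the pan texture at the cost of distorting band ratios. That is consistent with the abstract's finding that component-substitution-style methods preserved spectral quality less well than frequency-based ones.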

Proceedings Article
23 Jun 2008
TL;DR: An auto-context algorithm is proposed that learns an integrated low-level and context model; it is very general, easy to implement, and has the potential to be used for a wide variety of multi-variate labeling problems.
Abstract: The notion of using context information for solving high-level vision problems has been increasingly realized in the field. However, how to learn an effective and efficient context model, together with the image appearance, remains mostly unknown. The current literature using Markov random fields (MRFs) and conditional random fields (CRFs) often involves specific algorithm design, in which the modeling and computing stages are studied in isolation. In this paper, we propose an auto-context algorithm. Given a set of training images and their corresponding label maps, we first learn a classifier on local image patches. The discriminative probability (or classification confidence) maps produced by the learned classifier are then used as context information, in addition to the original image patches, to train a new classifier. The algorithm then iterates to approach the ground truth. Auto-context learns an integrated low-level and context model, and is very general and easy to implement. Under nearly identical parameter settings in training, we apply the algorithm to three challenging vision applications: object segmentation, human body configuration, and scene region labeling. It typically takes about 30 to 70 seconds to run the algorithm in testing. Moreover, the scope of the proposed algorithm goes beyond high-level vision: it has the potential to be used for a wide variety of multi-variate labeling problems.

Journal Article
TL;DR: A novel method, referred to as LRTAdr-(K1, K2, D3), is proposed, which performs both spatial lower rank approximation and spectral DR, and thus jointly achieves noise reduction and DR in hyperspectral image analysis.
Abstract: In hyperspectral image (HSI) analysis, classification requires spectral dimensionality reduction (DR). While common DR methods use linear algebra, we propose a multilinear algebra method to jointly achieve noise reduction and DR. Multilinear tools consider HSI data as a whole by jointly processing the spatial and spectral ways. The lower rank-(K1, K2, K3) tensor approximation [LRTA-(K1, K2, K3)] has been successfully applied to denoise multiway data such as color images. First, we demonstrate that LRTA-(K1, K2, K3) performs well as a denoising preprocessing step to improve classification results. Then, we propose a novel method, referred to as LRTAdr-(K1, K2, D3), which performs both spatial lower rank approximation and spectral DR. The classification algorithm Spectral Angle Mapper is applied to the output of the following three DR and noise-reduction methods to compare their efficiency: the proposed LRTAdr-(K1, K2, D3), PCAdr, and PCAdr associated with Wiener filtering or soft shrinkage of wavelet transform coefficients.
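
A rank-(K1, K2, K3) approximation of an (H, W, B) cube, and a DR variant that keeps the spectral mode in a D3-dimensional subspace, can be sketched with a truncated higher-order SVD. Assumption: the paper's LRTA may rely on an iterative alternating least-squares scheme; truncated HOSVD is a simpler, non-iterative stand-in. All function names are hypothetical.

```python
import numpy as np

def unfold(t, mode):
    # Mode-n unfolding: the chosen mode becomes the rows.
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def mode_product(t, m, mode):
    """Multiply tensor t by matrix m (J x I_mode) along the given mode."""
    res = np.tensordot(m, np.moveaxis(t, mode, 0), axes=(1, 0))
    return np.moveaxis(res, 0, mode)

def truncated_hosvd(t, ranks):
    """Rank-(K1, K2, K3) approximation via truncated HOSVD.
    Returns (approximation, core tensor, factor matrices)."""
    factors = []
    for mode, k in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(t, mode), full_matrices=False)
        factors.append(u[:, :k])
    core = t
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    approx = core
    for mode, u in enumerate(factors):
        approx = mode_product(approx, u, mode)  # map back to the full space
    return approx, core, factors

def lrta_dr(cube, k1, k2, d3):
    """Spatial low-rank filtering plus spectral DR: the spatial modes are
    mapped back to full size, but the spectral mode stays d3-dimensional."""
    _, core, (u1, u2, _) = truncated_hosvd(cube, (k1, k2, d3))
    return mode_product(mode_product(core, u1, 0), u2, 1)
```

The output of `lrta_dr` has shape (H, W, D3) and would be the input to a classifier such as the Spectral Angle Mapper.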

Proceedings Article
08 Dec 2008
TL;DR: Through experiments on the text-aided image classification and cross-language classification tasks, it is demonstrated that the translated learning framework can greatly outperform many state-of-the-art baseline methods.
Abstract: This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance classification in other, entirely different feature spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when labeled images are difficult to obtain. An important aspect of translated learning is to build a "bridge" that links one feature space (known as the "source space") to another space (known as the "target space") through a translator, in order to migrate knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which are in turn translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods.
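The chain "class → source features → target features" can be illustrated by composing conditional probability tables, which is what a Markov chain over those variables amounts to. The sketch below is a simplified reading under stated assumptions: the translator table, the toy numbers, and the KL-based decision rule are illustrative choices, not the authors' exact model or estimates.

```python
import numpy as np

def row_normalise(m):
    return m / m.sum(axis=1, keepdims=True)

def translate_class_models(p_source_given_class, p_target_given_source):
    """Markov chain step: p(y_t | c) = sum over y_s of p(y_s | c) * p(y_t | y_s)."""
    return p_source_given_class @ p_target_given_source

def classify(target_feature_counts, p_target_given_class, eps=1e-9):
    """Assign each target instance to the class whose translated feature model
    has the smallest KL(p(y_t | x) || p(y_t | c))."""
    p_x = row_normalise(target_feature_counts + eps)
    q = p_target_given_class + eps
    # KL for every (instance, class) pair: sum p log p - sum p log q
    kl = (p_x * np.log(p_x)).sum(axis=1, keepdims=True) - p_x @ np.log(q).T
    return kl.argmin(axis=1)
```

With a hypothetical two-class model over three text words, `[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]`, and a made-up 3 × 4 word-to-image-feature translator, each translated row still sums to 1, and image instances whose feature counts concentrate on the features the translator maps from a class's typical words are assigned to that class.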

Journal ArticleDOI
TL;DR: A greater awareness of the problems encountered in accuracy assessment may help ensure that perceptions of classification accuracy are realistic and reduce unfair criticism of thematic maps derived from remote sensing.
Abstract: Thematic mapping via a classification analysis is one of the most common applications of remote sensing. The accuracy of image classifications is, however, often viewed negatively. Here, it is suggested that the approach to the evaluation of image classification accuracy typically adopted in remote sensing may often be unfair, commonly being rather harsh and misleading. It is stressed that the widely used target accuracy of 85% can be inappropriate and that the approach to accuracy assessment adopted commonly in remote sensing is pessimistically biased. Moreover, the maps produced by other communities, which are often used unquestioningly, may have a low accuracy if evaluated from the standard perspective adopted in remote sensing. A greater awareness of the problems encountered in accuracy assessment may help ensure that perceptions of classification accuracy are realistic and reduce unfair criticism of thematic maps derived from remote sensing.
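For readers less familiar with the quantities being debated, a standard confusion-matrix summary is sketched below: overall, user's and producer's accuracy, plus a normal-approximation interval for overall accuracy. It assumes rows are mapped classes, columns are reference classes, and the sample is a simple random sample; the matrix in the usage note is hypothetical.

```python
import numpy as np

def accuracy_summary(confusion):
    """Rows: mapped class; columns: reference class."""
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    overall = correct.sum() / confusion.sum()
    users = correct / confusion.sum(axis=1)      # 1 - commission error, per mapped class
    producers = correct / confusion.sum(axis=0)  # 1 - omission error, per reference class
    return overall, users, producers

def overall_accuracy_interval(confusion, z=1.96):
    """Normal-approximation confidence interval for overall accuracy
    (assumes simple random sampling of the reference sites)."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    p = np.trace(confusion) / n
    half = z * np.sqrt(p * (1 - p) / n)
    return p - half, p + half
```

For the hypothetical matrix `[[45, 5], [10, 40]]`, overall accuracy is 0.85 but the 95% interval is roughly 0.78 to 0.92. With 100 test sites, a point estimate alone therefore says little about whether a map truly meets an 85% target.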

Journal ArticleDOI
TL;DR: An active learning technique that efficiently updates existing classifiers by using fewer labeled data points than semisupervised methods is proposed that is well suited for learning or adapting classifiers when there is substantial change in the spectral signatures between labeled and unlabeled data.
Abstract: Obtaining training data for land cover classification using remotely sensed data is time-consuming and expensive, especially for relatively inaccessible locations. Therefore, designing classifiers that use as few labeled data points as possible is highly desirable. Existing approaches typically make use of small-sample techniques and semisupervision to deal with the lack of labeled data. In this paper, we propose an active learning technique that efficiently updates existing classifiers by using fewer labeled data points than semisupervised methods. Further, unlike semisupervised methods, our proposed technique is well suited for learning or adapting classifiers when there is substantial change in the spectral signatures between labeled and unlabeled data. Thus, our active learning approach is also useful for classifying a series of spatially/temporally related images, wherein the spectral signatures vary across the images. Our interleaved semisupervised active learning method was tested on both single and spatially/temporally related hyperspectral data sets. We present empirical results that establish the superior performance of our proposed approach versus other active learning and semisupervised methods.
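The general pool-based active learning loop (fit, query the most informative unlabeled points, add their labels, repeat) can be sketched as follows. This uses generic margin-based uncertainty sampling with a soft nearest-centroid classifier, a common textbook choice swapped in for illustration, not the selection criterion of the paper; `oracle` is a hypothetical callable standing in for the analyst who supplies labels.

```python
import numpy as np

def class_posteriors(X, centroids, temperature=1.0):
    """Soft nearest-centroid classifier: softmax over negative squared distances."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def fit_centroids(X, y, n_classes):
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def select_queries(X_pool, centroids, batch_size):
    """Margin sampling: pick pool points whose top two posteriors are closest."""
    p = np.sort(class_posteriors(X_pool, centroids), axis=1)
    margin = p[:, -1] - p[:, -2]
    return np.argsort(margin)[:batch_size]

def active_learning(X_lab, y_lab, X_pool, oracle, n_classes, rounds=3, batch_size=5):
    """Each round: refit, query the most ambiguous pool points, add their labels."""
    X_pool = X_pool.copy()
    for _ in range(rounds):
        centroids = fit_centroids(X_lab, y_lab, n_classes)
        idx = select_queries(X_pool, centroids, batch_size)
        X_lab = np.vstack([X_lab, X_pool[idx]])
        y_lab = np.concatenate([y_lab, oracle(X_pool[idx])])
        X_pool = np.delete(X_pool, idx, axis=0)
    return fit_centroids(X_lab, y_lab, n_classes), X_lab, y_lab
```

Starting from one labeled point per class, three rounds of five queries grow the labeled set to 17 points, all drawn from near the current decision boundary, which is where additional labels are usually most useful.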

Journal ArticleDOI
TL;DR: The theoretical analysis of the effects of PCA on the discrimination power of the projected subspace is presented from a general pattern classification perspective for two possible scenarios: when PCA is used as a simple dimensionality reduction tool and when it is used to recondition an ill-posed LDA formulation.
Abstract: Dimensionality reduction is a necessity in most hyperspectral imaging applications. Tradeoffs exist between unsupervised statistical methods, which are typically based on principal components analysis (PCA), and supervised ones, which are often based on Fisher's linear discriminant analysis (LDA), and proponents for each approach exist in the remote sensing community. Recently, a combined approach known as subspace LDA has been proposed, where PCA is employed to recondition ill-posed LDA formulations. The key idea behind this approach is to use a PCA transformation as a preprocessor to discard the null space of rank-deficient scatter matrices, so that LDA can be applied on this reconditioned space. Thus, in theory, the subspace LDA technique benefits from the advantages of both methods. In this letter, we present a theoretical analysis of the effects (often ill effects) of PCA on the discrimination power of the projected subspace. The theoretical analysis is presented from a general pattern classification perspective for two possible scenarios: (1) when PCA is used as a simple dimensionality reduction tool and (2) when it is used to recondition an ill-posed LDA formulation. We also provide experimental evidence of the ineffectiveness of both scenarios for hyperspectral target recognition applications.
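The subspace LDA pipeline the letter analyzes (PCA first, so that the within-class scatter becomes invertible, then Fisher LDA in the reduced space) can be sketched in numpy. This is a generic textbook version with illustrative names, assuming the number of retained principal components does not exceed N − C (samples minus classes).

```python
import numpy as np

def pca_basis(X, n_components):
    """Top principal directions of the centred data (columns of the result)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T

def lda_basis(X, y, n_components):
    """Fisher LDA: leading eigenvectors of Sw^-1 Sb (Sw must be invertible)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xk = X[y == c]
        mk = Xk.mean(axis=0)
        Sw += (Xk - mk).T @ (Xk - mk)
        diff = (mk - mean)[:, None]
        Sb += len(Xk) * diff @ diff.T
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:n_components]].real

def subspace_lda(X, y, n_pca, n_lda):
    """PCA reconditions the rank-deficient scatter matrices, then LDA runs there."""
    W_pca = pca_basis(X, n_pca)
    Z = (X - X.mean(axis=0)) @ W_pca
    return W_pca @ lda_basis(Z, y, n_lda)
```

With 30 samples of 100 bands, the full-dimensional within-class scatter is singular, whereas after projecting to 20 components it is generically full rank. The letter's point is that whatever discriminative information lies in the discarded directions is lost before LDA ever sees it.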

Journal ArticleDOI
TL;DR: An approach for one-shot multiclass classification of multispectral data was evaluated and was more accurate than the approaches based on a series of binary classifications and had other advantages relative to the binary SVM-based approaches.
Abstract: Support vector machines (SVMs) have considerable potential for supervised classification analyses, but their binary nature has been a constraint on their use in remote sensing. This typically requires a multiclass analysis to be broken down into a series of binary classifications, following either the one-against-one or one-against-all strategies. However, the binary SVM can be extended for a one-shot multiclass classification needing a single optimization operation. Here, an approach for one-shot multiclass classification of multispectral data was evaluated against approaches based on binary SVM for a set of five-class classifications. The one-shot multiclass classification was more accurate (92.00%) than the approaches based on a series of binary classifications (89.22% and 91.33%). Additionally, the one-shot multiclass SVM had other advantages relative to the binary SVM-based approaches, notably the need to be optimized only once for the parameters C and γ, as opposed to five times for one-against-all and ten times for the one-against-one approach, and it used fewer support vectors: 215, as compared to 243 and 246 for the binary-based approaches. Similar trends were also apparent in results of analyses of a data set of larger dimensionality. It was also apparent that the conventional one-against-all strategy could not be guaranteed to yield a complete confusion matrix, which can greatly limit the assessment and later use of a classification derived by that method.
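The five and ten optimization counts follow from how the binary decompositions scale with the number of classes k: one-against-all trains k classifiers and one-against-one trains k(k − 1)/2, whereas the one-shot formulation needs a single optimization. The small sketch below (function names are illustrative) reproduces those counts and shows the majority-vote decoding commonly used with one-against-one; the tie-breaking rule is an arbitrary choice for the sketch.

```python
from collections import Counter
from itertools import combinations

def optimisation_runs(n_classes):
    """Number of SVM optimisations (each needing its own search over C and gamma)."""
    return {
        "one-against-all": n_classes,
        "one-against-one": len(list(combinations(range(n_classes), 2))),
        "one-shot multiclass": 1,
    }

def one_against_one_vote(pairwise_winner, n_classes):
    """Combine pairwise decisions by majority vote; ties go to the lowest class index."""
    votes = Counter(pairwise_winner[pair] for pair in combinations(range(n_classes), 2))
    return max(range(n_classes), key=lambda k: (votes[k], -k))
```

For the five-class case, `optimisation_runs(5)` gives 5, 10 and 1 optimizations respectively, matching the figures above.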

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A partially-blurred-image classification and analysis framework for automatically detecting images containing blurred regions and recognizing the blur types for those regions without needing to perform blur kernel estimation and image deblurring is proposed.
Abstract: In this paper, we propose a partially-blurred-image classification and analysis framework for automatically detecting images containing blurred regions and recognizing the blur types for those regions without needing to perform blur kernel estimation and image deblurring. We develop several blur features modeled by image color, gradient, and spectrum information, and use feature parameter training to robustly classify blurred images. Our blur detection is based on image patches, making region-wise training and classification in one image efficient. Extensive experiments show that our method works satisfactorily on challenging image data, which establishes a technical foundation for solving several computer vision problems, such as motion analysis and image restoration, using the blur information.
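Because the detection is patch-based, a region-wise score map is the natural intermediate result. The sketch below computes one very simple gradient-based cue per non-overlapping patch. It is an illustrative stand-in only: the paper's actual color, gradient and spectrum features and its trained classifier are not reproduced, and the box filter is just a convenient way to synthesize blur for testing.

```python
import numpy as np

def patch_sharpness(image, patch=16):
    """Mean gradient magnitude per non-overlapping patch; low values suggest blur."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    h = (image.shape[0] // patch) * patch
    w = (image.shape[1] // patch) * patch
    blocks = mag[:h, :w].reshape(h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(1, 3))

def box_blur(image, k=5):
    """Separable k x k box filter with edge padding (k odd); output keeps the input size."""
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    kernel = np.ones(k) / k
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)
```

Blurring the right half of a 64 × 64 noise image and calling `patch_sharpness(mixed, patch=16)` yields a 4 × 4 map in which the two right-hand patch columns score far lower than the two left-hand ones; a real system would feed such per-patch features to a trained classifier rather than thresholding them directly.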

Journal ArticleDOI
TL;DR: A multi-purpose image classifier that can be applied to a wide variety of image classification tasks without modifications or fine-tuning, and yet provides classification accuracy comparable to state-of-the-art task-specific image classifiers.

Journal ArticleDOI
TL;DR: This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote‐sensing image classification.
Abstract: Land use classification is an important part of many remote sensing applications. A lot of research has gone into the application of statistical and neural network classifiers to remote-sensing images. This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote-sensing image classification. Standard classifiers such as Artificial Neural Networks (ANNs) need a number of training samples that increases exponentially with the dimension of the input feature space. With a limited number of training samples, the classification rate thus decreases as the dimensionality increases. SVMs are independent of the dimensionality of the feature space, as the main idea behind this classification technique is to separate the classes with a surface that maximizes the margin between them, using boundary pixels to create the decision surface. Results from SVMs are compared with traditional Maximum Likelihood Classification (MLC) and an ANN classifier. The findings suggest that the ANN and SVM classifiers perform better than the traditional MLC. The SVM and the ANN show comparable results. However, accuracy depends on factors such as the number of hidden nodes (in the case of the ANN) and kernel parameters (in the case of the SVM). The training time taken by the SVM is several orders of magnitude less.
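The margin-maximization idea can be made concrete with a linear SVM trained by stochastic sub-gradient descent on the regularized hinge loss (a Pegasos-style update). This is a minimal sketch that assumes labels in {−1, +1} and a linear kernel. It is not the kernel SVM implementation used in the study, and for brevity the bias is folded in as a constant feature, so it is regularized too.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)); y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])  # constant feature acts as the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (Xb[i] @ w) < 1:  # inside the margin: hinge term is active
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:                       # outside the margin: only shrink w
                w = (1 - eta * lam) * w
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)
```

Only points that fall inside the margin contribute to the updates, which mirrors the abstract's remark that boundary pixels define the decision surface; on two well-separated Gaussian clusters the learned `w` classifies every training point correctly.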

Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper proposes a novel optimization framework that unifies codebook generation with classifier training, and demonstrates the value of unifying representation and classification into a single optimization framework.
Abstract: The idea of representing images using a bag of visual words is currently popular in object category recognition. Since this representation is typically constructed using unsupervised clustering, the resulting visual words may not capture the desired information. Recent work has explored the construction of discriminative visual codebooks that explicitly consider object category information. However, since the codebook generation process is still disconnected from that of classifier training, the set of resulting visual words, while individually discriminative, may not be those best suited for the classifier. This paper proposes a novel optimization framework that unifies codebook generation with classifier training. In our approach, each image feature is encoded by a sequence of "visual bits" optimized for each category. An image, which can contain objects from multiple categories, is represented using aggregates of visual bits for each category. Classifiers associated with different categories determine how well a given image corresponds to each category. Based on the performance of these classifiers on the training data, we augment the visual words by generating additional bits. The classifiers are then updated to incorporate the new representation. These two phases are repeated until the desired performance is achieved. Experiments compare our approach to standard clustering-based methods and to state-of-the-art discriminative visual codebook generation. The significant improvements over previous techniques clearly demonstrate the value of unifying representation and classification into a single optimization framework.
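For reference, the unsupervised baseline this paper contrasts itself with (a k-means codebook followed by a per-image histogram of nearest visual words) can be sketched as below. This shows only the standard clustering-based pipeline, not the visual-bits method; descriptor extraction is assumed to have happened elsewhere, and the function names are illustrative.

```python
import numpy as np

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the centres form the visual codebook."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def bag_of_words(image_descriptors, codebook):
    """Histogram of nearest codewords for one image's local descriptors."""
    d2 = ((image_descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(codebook))
```

Note that nothing in `kmeans` sees class labels, which is exactly the disconnect between codebook generation and classifier training that the abstract describes.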