
Showing papers on "Contextual image classification published in 2010"


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM, using the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation.
Abstract: The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With a linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest-neighbor search and then solving a constrained least square fitting problem, bearing computational complexity of O(M + K²). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical value of LLC for real applications.
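The approximated LLC step described above (K-nearest-neighbor search followed by a constrained least-squares fit with an analytical solution) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the function name and the regularization constant `beta` are assumptions.

```python
import numpy as np

def llc_approx(x, B, k=5, beta=1e-4):
    """Approximated LLC coding of one descriptor x (d,) over a codebook B (M, d).

    1) K-nearest-neighbour search in the codebook.
    2) Constrained least-squares fit (codes sum to one) on the local base,
       solved analytically, with a small regulariser for stability.
    Returns a length-M code with at most k non-zeros.
    """
    # 1) find the k nearest codebook atoms
    dists = np.sum((B - x) ** 2, axis=1)
    idx = np.argsort(dists)[:k]
    # 2) solve the constrained least-squares problem on the local base
    z = B[idx] - x                        # shift atoms to the descriptor
    C = z @ z.T                           # local covariance (k, k)
    C += beta * np.trace(C) * np.eye(k)   # regularise for numerical stability
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                          # enforce the sum-to-one constraint
    code = np.zeros(B.shape[0])
    code[idx] = w
    return code
```

Per the abstract, the per-descriptor codes would then be combined by max pooling, e.g. `np.max(codes, axis=0)` over all descriptors of an image.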

3,307 citations


Book ChapterDOI
05 Sep 2010
TL;DR: In an evaluation involving hundreds of thousands of training images, it is shown that classifiers learned on Flickr groups perform surprisingly well and that they can complement classifier learned on more carefully annotated datasets.
Abstract: The Fisher kernel (FK) is a generic framework which combines the benefits of generative and discriminative approaches. In the context of image classification the FK was shown to extend the popular bag-of-visual-words (BOV) by going beyond count statistics. However, in practice, this enriched representation has not yet shown its superiority over the BOV. In the first part we show that with several well-motivated modifications over the original framework we can boost the accuracy of the FK. On PASCAL VOC 2007 we increase the Average Precision (AP) from 47.9% to 58.3%. Similarly, we demonstrate state-of-the-art accuracy on CalTech 256. A major advantage is that these results are obtained using only SIFT descriptors and costless linear classifiers. Equipped with this representation, we can now explore image classification on a larger scale. In the second part, as an application, we compare two abundant resources of labeled images to learn classifiers: ImageNet and Flickr groups. In an evaluation involving hundreds of thousands of training images we show that classifiers learned on Flickr groups perform surprisingly well (although they were not intended for this purpose) and that they can complement classifiers learned on more carefully annotated datasets.
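The FK representation the paper builds on can be sketched as follows. The GMM parameters are assumed given, only the gradient with respect to the means is shown, and the signed-square-root and L2 normalizations are included as examples of the kind of well-motivated modifications the paper describes; exact formulas and names here are illustrative.

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma2):
    """Fisher vector of descriptors X (n, d) under a diagonal-covariance GMM.

    pi: (K,) mixture weights, mu: (K, d) means, sigma2: (K, d) variances.
    Uses only the gradient w.r.t. the means, then applies signed-square-root
    and L2 normalisation.
    """
    n, d = X.shape
    # posterior responsibilities gamma (n, K), computed in log space
    log_p = -0.5 * (((X[:, None, :] - mu[None]) ** 2) / sigma2[None]).sum(-1)
    log_p -= 0.5 * np.log(sigma2).sum(-1)[None]
    log_p += np.log(pi)[None]
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # gradient w.r.t. the means, normalised per component
    diff = (X[:, None, :] - mu[None]) / np.sqrt(sigma2)[None]   # (n, K, d)
    G = (gamma[:, :, None] * diff).sum(0) / (n * np.sqrt(pi)[:, None])
    fv = G.ravel()
    # signed square-root, then L2 normalisation
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-12)
```

The resulting K·d-dimensional vector goes beyond BOV count statistics while remaining usable with linear classifiers, as the abstract emphasizes.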

2,961 citations


Journal ArticleDOI
TL;DR: A novel approach to face identification that formulates the pattern recognition problem in terms of linear regression, building on the fundamental concept that patterns from a single-object class lie on a linear subspace, and introduces a novel Distance-based Evidence Fusion (DEF) algorithm.
Abstract: In this paper, we present a novel approach of face identification by formulating the pattern recognition problem in terms of linear regression. Using a fundamental concept that patterns from a single-object class lie on a linear subspace, we develop a linear model representing a probe image as a linear combination of class-specific galleries. The inverse problem is solved using the least-squares method and the decision is ruled in favor of the class with the minimum reconstruction error. The proposed Linear Regression Classification (LRC) algorithm falls in the category of nearest subspace classification. The algorithm is extensively evaluated on several standard databases under a number of exemplary evaluation protocols reported in the face recognition literature. A comparative study with state-of-the-art algorithms clearly reflects the efficacy of the proposed approach. For the problem of contiguous occlusion, we propose a Modular LRC approach, introducing a novel Distance-based Evidence Fusion (DEF) algorithm. The proposed methodology achieves the best results ever reported for the challenging problem of scarf occlusion.
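The nearest-subspace decision rule described above is compact enough to sketch directly; variable names are illustrative.

```python
import numpy as np

def lrc_predict(probe, galleries):
    """Linear Regression Classification: represent the probe (d,) as a linear
    combination of each class-specific gallery (d, n_c), and rule in favour of
    the class with the minimum reconstruction error (nearest subspace)."""
    errors = []
    for G in galleries:                               # one matrix per class
        beta, *_ = np.linalg.lstsq(G, probe, rcond=None)  # least-squares fit
        errors.append(np.linalg.norm(probe - G @ beta))   # reconstruction error
    return int(np.argmin(errors)), errors
```

The Modular LRC extension mentioned in the abstract would run this per image block and fuse the per-block evidence with DEF.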

972 citations


Journal ArticleDOI
TL;DR: It is demonstrated that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model, and the proposed model performs consistently.
Abstract: This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes severely degrade the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
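The contrast between hard assignment and one soft variant (a Gaussian-kernel weighting of codewords, one of several the paper investigates) can be sketched as below; `sigma` and the function name are illustrative.

```python
import numpy as np

def codeword_histograms(features, codebook, sigma=1.0):
    """Build hard- and soft-assignment codeword histograms for one image.

    features: (n, d) continuous descriptors; codebook: (K, d) visual words.
    Hard assignment votes only for the nearest word; soft assignment spreads
    each feature's vote over all words with Gaussian weights.
    Both histograms are L1-normalised.
    """
    d2 = ((features[:, None, :] - codebook[None]) ** 2).sum(-1)   # (n, K)
    hard = np.zeros_like(d2)
    hard[np.arange(len(d2)), d2.argmin(1)] = 1.0                  # nearest word
    w = np.exp(-d2 / (2 * sigma ** 2))
    soft = w / w.sum(1, keepdims=True)      # per-feature soft responsibilities
    return hard.sum(0) / len(features), soft.sum(0) / len(features)
```

With features near a boundary between two codewords, the soft histogram splits the vote rather than committing to one word, which is the ambiguity modeling the abstract argues for.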

854 citations


Proceedings ArticleDOI
25 Oct 2010
TL;DR: This work investigates and develops methods to extract and combine low-level features that represent the emotional content of an image, and uses these for image emotion classification.
Abstract: Images can affect people on an emotional level. Since the emotions that arise in the viewer of an image are highly subjective, they are rarely indexed. However there are situations when it would be helpful if images could be retrieved based on their emotional content. We investigate and develop methods to extract and combine low-level features that represent the emotional content of an image, and use these for image emotion classification. Specifically, we exploit theoretical and empirical concepts from psychology and art theory to extract image features that are specific to the domain of artworks with emotional expression. For testing and training, we use three data sets: the International Affective Picture System (IAPS); a set of artistic photography from a photo sharing site (to investigate whether the conscious use of colors and textures displayed by the artists improves the classification); and a set of peer rated abstract paintings to investigate the influence of the features and ratings on pictures without contextual content. Improved classification results are obtained on the International Affective Picture System (IAPS), compared to state of the art work.

734 citations


Journal ArticleDOI
TL;DR: This paper shows that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust, and it scales well as the number of classes grows.
Abstract: While feature point recognition is a key component of modern approaches to object detection, existing approaches require computationally expensive patch preprocessing to handle perspective distortion. In this paper, we show that formulating the problem in a naive Bayesian classification framework makes such preprocessing unnecessary and produces an algorithm that is simple, efficient, and robust. Furthermore, it scales well as the number of classes grows. To recognize the patches surrounding keypoints, our classifier uses hundreds of simple binary features and models class posterior probabilities. We make the problem computationally tractable by assuming independence between arbitrary sets of features. Even though this is not strictly true, we demonstrate that our classifier nevertheless performs remarkably well on image data sets containing very significant perspective changes.
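The paper's formulation groups binary features and assumes independence only between groups ("ferns"). A minimal sketch of that idea follows; the class name, random pixel-pair tests, and Laplace smoothing constants are assumptions, not the authors' exact design.

```python
import numpy as np

class FernClassifier:
    """Semi-naive Bayesian patch classifier: binary pixel-pair tests are
    grouped into ferns; tests within a fern are modelled jointly, and
    independence is assumed only between ferns."""

    def __init__(self, n_ferns=10, fern_size=6, patch_px=32, seed=0):
        rng = np.random.default_rng(seed)
        self.S = fern_size
        # random pixel pairs defining the binary intensity-comparison tests
        self.pairs = rng.integers(0, patch_px * patch_px,
                                  size=(n_ferns, fern_size, 2))

    def _fern_indices(self, patch):
        flat = patch.ravel()
        bits = flat[self.pairs[..., 0]] > flat[self.pairs[..., 1]]  # (F, S)
        return bits.astype(int) @ (1 << np.arange(self.S))          # (F,)

    def fit(self, patches, labels):
        n_classes = labels.max() + 1
        # Laplace-smoothed joint counts per fern outcome and class
        self.logp = np.ones((len(self.pairs), 2 ** self.S, n_classes))
        for patch, c in zip(patches, labels):
            idx = self._fern_indices(patch)
            self.logp[np.arange(len(idx)), idx, c] += 1
        self.logp = np.log(self.logp / self.logp.sum(1, keepdims=True))
        return self

    def predict(self, patch):
        # sum of per-fern class log-posteriors (independence between ferns)
        idx = self._fern_indices(patch)
        return int(self.logp[np.arange(len(idx)), idx].sum(0).argmax())
```

No patch preprocessing is needed here; robustness to viewpoint comes from training on (warped) examples, consistent with the abstract's argument.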

726 citations


Journal ArticleDOI
TL;DR: The classification maps obtained by considering different APs result in a better description of the scene than those obtained with an MP, and the usefulness of APs in modeling the spatial information present in the images is proved.
Abstract: Morphological attribute profiles (APs) are defined as a generalization of the recently proposed morphological profiles (MPs). APs provide a multilevel characterization of an image created by the sequential application of morphological attribute filters that can be used to model different kinds of structural information. Depending on the type of attribute considered in the morphological attribute transformation, different parametric features can be modeled. The generation of APs, thanks to an efficient implementation, strongly reduces the computational load required for the computation of conventional MPs. Moreover, the characterization of the image with different attributes leads to a more complete description of the scene and to a more accurate modeling of the spatial information than with the use of conventional morphological filters based on a predefined structuring element. Here, the features extracted by the proposed operators were used for the classification of two very high resolution panchromatic images acquired by QuickBird on the city of Trento, Italy. The experimental analysis proved the usefulness of APs in modeling the spatial information present in the images. The classification maps obtained by considering different APs result in a better description of the scene (both in terms of thematic and geometric accuracy) than those obtained with an MP.

721 citations


Journal ArticleDOI
TL;DR: A novel method for accurate spectral-spatial classification of hyperspectral images by means of a Markov random field regularization is presented, which improves classification accuracies when compared to other classification approaches.
Abstract: The high number of spectral bands acquired by hyperspectral sensors increases the capability to distinguish physical materials and objects, presenting new challenges to image analysis and classification. This letter presents a novel method for accurate spectral-spatial classification of hyperspectral images. The proposed technique consists of two steps. In the first step, a probabilistic support vector machine pixelwise classification of the hyperspectral image is applied. In the second step, spatial contextual information is used for refining the classification results obtained in the first step. This is achieved by means of a Markov random field regularization. Experimental results are presented for three hyperspectral airborne images and compared with those obtained by recently proposed advanced spectral-spatial classification techniques. The proposed method improves classification accuracies when compared to other classification approaches.
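The second step described above (spatial refinement of pixelwise class probabilities with a Markov random field) can be sketched as below. Iterated conditional modes (ICM) is used here purely as a simple stand-in optimizer; the parameter `beta` and the function name are assumptions.

```python
import numpy as np

def mrf_refine(prob, beta=0.75, n_iter=5):
    """Refine per-pixel class probabilities prob (H, W, C) with a Potts-type
    MRF prior, using ICM as a simple stand-in for the paper's optimiser."""
    H, W, C = prob.shape
    unary = -np.log(prob + 1e-12)            # data term from the classifier
    labels = prob.argmax(-1)                 # initial pixelwise decision
    for _ in range(n_iter):
        for y in range(H):
            for x in range(W):
                energy = unary[y, x].copy()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < H and 0 <= nx < W:
                        # Potts penalty for disagreeing with the neighbour
                        energy += beta * (np.arange(C) != labels[ny, nx])
                labels[y, x] = energy.argmin()
    return labels
```

An isolated pixel whose spectral evidence weakly favors a different class than all its neighbors gets flipped, which is the kind of salt-and-pepper cleanup that spectral-spatial regularization provides.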

697 citations


Book ChapterDOI
05 Sep 2010
TL;DR: In this article, the authors proposed a new framework for image classification using local visual descriptors, which performs a nonlinear feature transformation on descriptors and aggregates the results together to form image-level representations, and finally applies a classification model.
Abstract: This paper introduces a new framework for image classification using local visual descriptors. The pipeline first performs a non-linear feature transformation on descriptors, then aggregates the results together to form image-level representations, and finally applies a classification model. For all three steps we suggest novel solutions which make our approach appealing in theory, more scalable in computation, and transparent in classification. Our experiments demonstrate that the proposed classification method achieves state-of-the-art accuracy on the well-known PASCAL benchmarks.

559 citations


Book ChapterDOI
05 Sep 2010
TL;DR: A study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes finds that computational issues become crucial in algorithm design and conventional wisdom from a couple of hundred image categories does not necessarily hold when the number of categories increases.
Abstract: Image classification is a critical task for both humans and computers. One of the challenges lies in the large scale of the semantic space. In particular, humans can recognize tens of thousands of object classes and scenes. No computer vision algorithm today has been tested at this scale. This paper presents a study of large scale categorization including a series of challenging experiments on classification with more than 10,000 image classes. We find that a) computational issues become crucial in algorithm design; b) conventional wisdom from a couple of hundred image categories on relative performance of different classifiers does not necessarily hold when the number of categories increases; c) there is a surprisingly strong relationship between the structure of WordNet (developed for studying language) and the difficulty of visual categorization; d) classification can be improved by exploiting the semantic hierarchy. Toward the future goal of developing automatic vision algorithms to recognize tens of thousands or even millions of image categories, we make a series of observations and arguments about dataset scale, category density, and image hierarchy.

559 citations


Journal ArticleDOI
TL;DR: The proposed approach can provide classification accuracies that are similar or higher than those achieved by other supervised methods for the considered scenes, and indicates that the use of a spatial prior can greatly improve the final results with respect to a case in which only the learned class densities are considered.
Abstract: This paper presents a new semisupervised segmentation algorithm, suited to high-dimensional data, of which remotely sensed hyperspectral image data sets are an example. The algorithm implements two main steps: 1) semisupervised learning of the posterior class distributions followed by 2) segmentation, which infers an image of class labels from a posterior distribution built on the learned class distributions and on a Markov random field. The posterior class distributions are modeled using multinomial logistic regression, where the regressors are learned using both labeled and, through a graph-based technique, unlabeled samples. Such unlabeled samples are actively selected based on the entropy of the corresponding class label. The prior on the image of labels is a multilevel logistic model, which enforces segmentation results in which neighboring labels belong to the same class. The maximum a posteriori segmentation is computed by the α-expansion min-cut-based integer optimization algorithm. Our experimental results, conducted using synthetic and real hyperspectral image data sets collected by the Airborne Visible/Infrared Imaging Spectrometer system of the National Aeronautics and Space Administration Jet Propulsion Laboratory over the regions of Indian Pines, IN, and Salinas Valley, CA, reveal that the proposed approach can provide classification accuracies that are similar or higher than those achieved by other supervised methods for the considered scenes. Our results also indicate that the use of a spatial prior can greatly improve the final results with respect to a case in which only the learned class densities are considered, confirming the importance of jointly considering spatial and spectral information in hyperspectral image segmentation.
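The active-selection step described above (choosing unlabeled samples by the entropy of their class label) is easy to sketch in isolation; the function name is illustrative.

```python
import numpy as np

def select_unlabeled_by_entropy(prob, n_select):
    """Rank unlabelled samples by the entropy of their posterior class
    distribution prob (n, C) and return the indices of the most uncertain
    ones, which are the most informative to add to the graph-based learner."""
    H = -(prob * np.log(prob + 1e-12)).sum(1)   # per-sample entropy
    return np.argsort(H)[::-1][:n_select]       # most uncertain first
```

A sample with a near-uniform posterior is selected before one the classifier is already confident about.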

Journal ArticleDOI
TL;DR: The results show that the novel variant named elongated quinary patterns (EQP) is the best-performing method among those proposed in this work for extracting information from a texture across all the tested datasets.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A framework for supervised similarity learning based on embedding the input data from two arbitrary spaces into the Hamming space is proposed and the utility and efficiency of such a generic approach is demonstrated on several challenging applications including cross-representation shape retrieval and alignment of multi-modal medical images.
Abstract: Visual understanding is often based on measuring similarity between observations. Learning similarities specific to a certain perception task from a set of examples has been shown advantageous in various computer vision and pattern recognition problems. In many important applications, the data that one needs to compare come from different representations or modalities, and the similarity between such data operates on objects that may have different and often incommensurable structure and dimensionality. In this paper, we propose a framework for supervised similarity learning based on embedding the input data from two arbitrary spaces into the Hamming space. The mapping is expressed as a binary classification problem with positive and negative examples, and can be efficiently learned using boosting algorithms. The utility and efficiency of such a generic approach are demonstrated on several challenging applications including cross-representation shape retrieval and alignment of multi-modal medical images.
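The core idea (map each modality into the Hamming space and compare there) can be sketched as below. In the paper the two mappings are learned with boosting; random linear projections stand in here purely for illustration.

```python
import numpy as np

def hamming_embed(X, P):
    """Map data X (n, d) into the Hamming space via the signs of linear
    projections P (d, m). Learned projections (e.g. boosted stumps) would
    replace the random ones used here."""
    return (X @ P > 0).astype(np.uint8)

def cross_modal_similarity(code_a, code_b):
    """Similarity of two binary codes: fraction of agreeing bits,
    i.e. one minus the normalised Hamming distance."""
    return 1.0 - np.mean(code_a != code_b)
```

Because each modality gets its own projection matrix, objects of different dimensionality become directly comparable once embedded.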

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work considers a scenario where keywords are associated with the training images, e.g. as found on photo sharing websites, and learns a strong Multiple Kernel Learning (MKL) classifier using both the image content and keywords, and uses it to score unlabeled images.
Abstract: In image categorization the goal is to decide if an image belongs to a certain category or not. A binary classifier can be learned from manually labeled images; while using more labeled examples improves performance, obtaining the image labels is a time consuming process. We are interested in how other sources of information can aid the learning process given a fixed amount of labeled images. In particular, we consider a scenario where keywords are associated with the training images, e.g. as found on photo sharing websites. The goal is to learn a classifier for images alone, but we will use the keywords associated with labeled and unlabeled images to improve the classifier using semi-supervised learning. We first learn a strong Multiple Kernel Learning (MKL) classifier using both the image content and keywords, and use it to score unlabeled images. We then learn classifiers on visual features only, either support vector machines (SVM) or least-squares regression (LSR), from the MKL output values on both the labeled and unlabeled images. In our experiments on 20 classes from the PASCAL VOC'07 set and 38 from the MIR Flickr set, we demonstrate the benefit of our semi-supervised approach over only using the labeled images. We also present results for a scenario where we do not use any manual labeling but directly learn classifiers from the image tags. The semi-supervised approach also improves classification accuracy in this case.
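The three-stage scheme described above can be sketched end to end. To stay self-contained, simple ridge/least-squares regressors stand in for both the MKL classifier and the final visual-only model; all names and the regularization constant are assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    """Closed-form ridge regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def distill_to_visual(Xv_lab, Xt_lab, y, Xv_unl, Xt_unl, lam=1e-2):
    """Semi-supervised scheme sketched from the paper:
    1) train a strong classifier on joint visual+tag features of the
       labelled images (a least-squares model stands in for MKL),
    2) score both labelled and unlabelled images with it,
    3) fit a visual-only regressor (the LSR variant) to those scores,
       so the final classifier needs no tags at test time."""
    J_lab = np.hstack([Xv_lab, Xt_lab])
    w_joint = ridge_fit(J_lab, y, lam)                       # step 1
    scores = np.concatenate([J_lab @ w_joint,
                             np.hstack([Xv_unl, Xt_unl]) @ w_joint])  # step 2
    Xv_all = np.vstack([Xv_lab, Xv_unl])
    return ridge_fit(Xv_all, scores, lam)                    # step 3
```

The unlabeled images contribute through their joint-model scores, so the visual-only model benefits from the tags without ever seeing them at test time.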

Proceedings Article
06 Dec 2010
TL;DR: A model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations is described, showing that it can perform at least as well as a model trained on whole images.
Abstract: We describe a model based on a Boltzmann machine with third-order connections that can learn how to accumulate information about a shape over several fixations. The model uses a retina that only has enough high resolution pixels to cover a small area of the image, so it must decide on a sequence of fixations and it must combine the "glimpse" at each fixation with the location of the fixation before integrating the information with information from other glimpses of the same object. We evaluate this model on a synthetic dataset and two image classification datasets, showing that it can perform at least as well as a model trained on whole images.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: Experiments show that the supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases and implying its great potential in handling large scale datasets in real applications.
Abstract: In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation-invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large scale datasets in real applications.
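The max-pooling-over-a-spatial-pyramid step that produces the image-level features can be sketched as below; the pyramid levels and function name are illustrative choices.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool sparse codes (n, K) over a spatial pyramid.

    xy holds each descriptor's normalised position in [0, 1)^2.
    For each pyramid level L, the image is split into an L x L grid and the
    element-wise max of the codes is taken per cell; all cell vectors are
    concatenated into the image-level feature.
    """
    pooled = []
    for L in levels:
        cell = np.minimum((xy * L).astype(int), L - 1)   # (n, 2) cell indices
        for cy in range(L):
            for cx in range(L):
                mask = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                pooled.append(codes[mask].max(0) if mask.any()
                              else np.zeros(codes.shape[1]))
    return np.concatenate(pooled)
```

With levels (1, 2, 4) the output has (1 + 4 + 16)·K dimensions; the level-1 cell is simply the global max over all descriptor codes.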

Book ChapterDOI
05 Sep 2010
TL;DR: KSR is essentially the sparse coding technique in a high-dimensional feature space mapped by an implicit mapping function; it outperforms sparse coding and EMK, and achieves state-of-the-art performance for image classification and face recognition on publicly available datasets.
Abstract: Recent research has shown the effectiveness of using sparse coding (Sc) to solve many computer vision problems. Motivated by the fact that the kernel trick can capture the nonlinear similarity of features, which may reduce the feature quantization error and boost the sparse coding performance, we propose Kernel Sparse Representation (KSR). KSR is essentially the sparse coding technique in a high-dimensional feature space mapped by an implicit mapping function. We apply KSR to both image classification and face recognition. By incorporating KSR into Spatial Pyramid Matching (SPM), we propose KSRSPM for image classification. KSRSPM can further reduce the information loss in the feature quantization step compared with Spatial Pyramid Matching using Sparse Coding (ScSPM). KSRSPM can be regarded both as a generalization of Efficient Match Kernel (EMK) and as an extension of ScSPM. Compared with sparse coding, KSR can learn more discriminative sparse codes for face recognition. Extensive experimental results show that KSR outperforms sparse coding and EMK, and achieves state-of-the-art performance for image classification and face recognition on publicly available datasets.

Proceedings Article
06 Dec 2010
TL;DR: This work highlights the kernel view of orientation histograms, and shows that they are equivalent to a certain type of match kernels over image patches, and designs a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes into compact patch-level features.
Abstract: The design of low-level image features is critical for computer vision algorithms. Orientation histograms, such as those in SIFT [16] and HOG [3], are the most successful and popular features for visual object and scene recognition. We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches. This novel view allows us to design a family of kernel descriptors which provide a unified and principled framework to turn pixel attributes (gradient, color, local binary pattern, etc.) into compact patch-level features. In particular, we introduce three types of match kernels to measure similarities between image patches, and construct compact low-dimensional kernel descriptors from these match kernels using kernel principal component analysis (KPCA) [23]. Kernel descriptors are easy to design and can turn any type of pixel attribute into patch-level features. They outperform carefully tuned and sophisticated features including SIFT and deep belief networks. We report superior performance on standard image classification benchmarks: Scene-15, Caltech-101, CIFAR10 and CIFAR10-ImageNet.

Proceedings ArticleDOI
21 Jun 2010
TL;DR: A fast method for segmentation of large-size long-range 3D point clouds that especially lends itself for later classification of objects that requires less runtime while at the same time yielding segmentation results that are better suited forLater classification of the identified objects.
Abstract: This paper describes a fast method for segmentation of large-size long-range 3D point clouds that especially lends itself for later classification of objects. Our approach is targeted at high-speed autonomous ground robot mobility, so real-time performance of the segmentation method plays a critical role. This is especially true as segmentation is considered only a necessary preliminary for the more important task of object classification that is itself computationally very demanding. Efficiency is achieved in our approach by splitting the segmentation problem into two simpler subproblems of lower complexity: local ground plane estimation followed by fast 2D connected components labeling. The method's performance is evaluated on real data acquired in different outdoor scenes, and the results are compared to those of existing methods. We show that our method requires less runtime while at the same time yielding segmentation results that are better suited for later classification of the identified objects.
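The two subproblems named above (local ground plane estimation followed by fast 2D connected-components labeling) can be sketched as below. The per-cell minimum-height ground estimate, the grid resolution, and the thresholds are illustrative simplifications of the paper's method.

```python
import numpy as np
from collections import deque

def segment_point_cloud(points, cell=0.5, ground_tol=0.3):
    """Segment a point cloud (n, 3) in two stages, sketched from the paper:
    1) estimate a local ground height per 2D grid cell (minimum z here),
       and drop points near the local ground;
    2) label the remaining occupied cells by 4-connected components (BFS).
    Returns one label per point; 0 means ground / unlabelled."""
    ij = np.floor(points[:, :2] / cell).astype(int)
    ij -= ij.min(0)                                  # shift grid to start at 0
    H, W = ij.max(0) + 1
    # 1) local ground estimate and removal of near-ground points
    ground = np.full((H, W), np.inf)
    np.minimum.at(ground, (ij[:, 0], ij[:, 1]), points[:, 2])
    above = points[:, 2] > ground[ij[:, 0], ij[:, 1]] + ground_tol
    occ = np.zeros((H, W), bool)
    occ[ij[above, 0], ij[above, 1]] = True
    # 2) fast 2D connected-components labelling over occupied cells
    label = np.zeros((H, W), int)
    nxt = 0
    for sy, sx in zip(*np.nonzero(occ)):
        if label[sy, sx]:
            continue
        nxt += 1
        label[sy, sx] = nxt
        queue = deque([(sy, sx)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < H and 0 <= nx < W and occ[ny, nx] and not label[ny, nx]:
                    label[ny, nx] = nxt
                    queue.append((ny, nx))
    return label[ij[:, 0], ij[:, 1]] * above          # ground points get 0
```

Reducing segmentation to a 2D grid problem is what gives the approach its low complexity relative to full 3D clustering.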

Journal ArticleDOI
TL;DR: Experimental results of SDSOI on a large image set captured by optical sensors from multiple satellites show that the approach is effective in distinguishing between ships and nonships, and obtains a satisfactory ship detection performance.
Abstract: Ship detection from remote sensing imagery is very important, with a wide array of applications in areas such as fishery management, vessel traffic services, and naval warfare. This paper focuses on the issue of ship detection from spaceborne optical images (SDSOI). Although the advantages of synthetic-aperture radar (SAR) mean that most current ship detection approaches are based on SAR images, SAR still has disadvantages, such as the limited number of SAR sensors, the relatively long revisit cycle, and the relatively low resolution. With the increasing number of optical sensors and the resulting improvement in their continuous coverage, SDSOI can partly overcome the shortcomings of SAR-based approaches and should be investigated to help satisfy the requirements of real-time ship monitoring. In SDSOI, several factors such as clouds, ocean waves, and small islands affect the performance of ship detection. This paper proposes a novel hierarchical, complete, and operational SDSOI approach based on shape and texture features, which is considered a sequential coarse-to-fine elimination process of false alarms. First, simple shape analysis is adopted to eliminate evident false candidates generated by image segmentation with global and local information and to extract ship candidates with as few missed detections as possible. Second, a novel semisupervised hierarchical classification approach based on various features is presented to distinguish between ships and nonships to remove most false alarms.
Besides a complete and operational SDSOI approach, the other contributions of our approach include the following three aspects: 1) it classifies ship candidates by using their class probability distributions rather than the directly extracted features; 2) the relevant classes are automatically built from the samples' appearances and their feature attributes in a semisupervised mode; and 3) besides commonly used shape and texture features, a new texture operator, i.e., local multiple patterns, is introduced to enhance the representation ability of the feature set in feature extraction. Experimental results of SDSOI on a large image set captured by optical sensors from multiple satellites show that our approach is effective in distinguishing between ships and nonships, and obtains a satisfactory ship detection performance.

Journal ArticleDOI
TL;DR: The results based on a QuickBird satellite image indicate that segmentation accuracies decrease with increasing segmentation scales and the negative impacts of under-segmentation errors become significantly large at large scales.
Abstract: The advantages of object-based classification over the traditional pixel-based approach are well documented. However, the potential limitations of object-based classification remain less explored. In this letter, we assess the advantages and limitations of an object-based approach to remote sensing image classification relative to a pixel-based approach. We first quantified the negative impacts of under-segmentation errors on the potential accuracy of object-based classification by developing a new segmentation accuracy measure. Then we evaluated the advantages and limitations of object-based classification by quantifying their overall effects relative to pixel-based classification, with respect to their classification units and features at multiple segmentation scales. The results based on a QuickBird satellite image indicate that (1) segmentation accuracies decrease with increasing segmentation scales and the negative impacts of under-segmentation errors become significantly large at large scales and (2...
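The letter develops its own new segmentation accuracy measure; for illustration of the kind of quantity involved, a common area-based under-segmentation error can be sketched as below (this is not the letter's measure).

```python
import numpy as np

def under_segmentation_error(ref_mask, seg_labels):
    """Area-based under-segmentation error for one reference object.

    ref_mask: boolean (H, W) mask of the reference object;
    seg_labels: integer (H, W) segment labels.
    For every segment overlapping the object, accumulate the area that
    spills outside the object, normalised by the object's area. Larger
    values mean worse under-segmentation."""
    spill = 0
    for s in np.unique(seg_labels[ref_mask]):
        spill += np.sum((seg_labels == s) & ~ref_mask)
    return spill / np.sum(ref_mask)
```

At coarse segmentation scales, segments grow and increasingly spill across object boundaries, which is the scale-dependent degradation the letter quantifies.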

Proceedings ArticleDOI
03 Dec 2010
TL;DR: An SRC oriented unsupervised MFL algorithm is proposed in this paper and the experimental results on benchmark face databases demonstrated the improvements brought by the proposed M FL algorithm over original SRC.
Abstract: Face recognition (FR) is an active yet challenging topic in computer vision applications. As a powerful tool to represent high dimensional data, recently sparse representation based classification (SRC) has been successfully used for FR. This paper discusses the metaface learning (MFL) of face images under the framework of SRC. Although directly using the training samples as dictionary bases can achieve good FR performance, a well-learned dictionary matrix can lead to a higher FR rate with fewer dictionary atoms. An SRC oriented unsupervised MFL algorithm is proposed in this paper, and the experimental results on benchmark face databases demonstrate the improvements brought by the proposed MFL algorithm over original SRC.
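The SRC framework that MFL builds on can be sketched as below: code the probe sparsely over the dictionary, then classify by per-class reconstruction residual. ISTA is used here as a simple stand-in l1 solver, and the parameter values are illustrative.

```python
import numpy as np

def ista_lasso(D, y, lam=0.05, n_iter=200):
    """Solve min_x 0.5||y - Dx||^2 + lam||x||_1 by ISTA (proximal gradient)."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = x - (D.T @ (D @ x - y)) / L      # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return x

def src_classify(D, labels, y, lam=0.05):
    """Sparse-representation-based classification: code the probe y over the
    dictionary D (columns are training faces, or learned metafaces), then
    pick the class whose atoms give the smallest reconstruction residual."""
    x = ista_lasso(D, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
                 for c in classes]
    return int(classes[np.argmin(residuals)])
```

MFL's contribution is replacing the raw training-sample dictionary `D` with a compact learned one; the classification rule itself stays the same.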

Proceedings ArticleDOI
03 May 2010
TL;DR: Experiments show that the interest points in conjunction with a boosted patch classifier are significantly better in detecting body parts in depth images than state-of-the-art sliding-window based detectors.
Abstract: We deal with the problem of detecting and identifying body parts in depth images at video frame rates. Our solution involves a novel interest point detector for mesh and range data that is particularly well suited for analyzing human shape. The interest points, which are based on identifying geodesic extrema on the surface mesh, coincide with salient points of the body, which can be classified as, e.g., hand, foot or head using local shape descriptors. Our approach also provides a natural way of estimating a 3D orientation vector for a given interest point. This can be used to normalize the local shape descriptors to simplify the classification problem as well as to directly estimate the orientation of body parts in space. Experiments involving ground truth labels acquired via an active motion capture system show that our interest points in conjunction with a boosted patch classifier are significantly better in detecting body parts in depth images than state-of-the-art sliding-window based detectors.

Journal ArticleDOI
TL;DR: The proposed approach gives rise to an operational classifier, as opposed to previously presented transductive or Laplacian support vector machines (TSVM or LapSVM, respectively), which constitutes a general framework for building computationally efficient semisupervised methods.
Abstract: A framework for semisupervised remote sensing image classification based on neural networks is presented. The methodology consists of adding a flexible embedding regularizer to the loss function used for training neural networks. Training is done using stochastic gradient descent with additional balancing constraints to avoid falling into local minima. The method constitutes a generalization of both supervised and unsupervised methods and can handle millions of unlabeled samples. Therefore, the proposed approach gives rise to an operational classifier, as opposed to previously presented transductive or Laplacian support vector machines (TSVM or LapSVM, respectively). The proposed methodology constitutes a general framework for building computationally efficient semisupervised methods. The method is compared with LapSVM and TSVM in semisupervised scenarios, to SVM in supervised settings, and to online and batch k-means for unsupervised learning. Results demonstrate the improved classification accuracy and scalability of this approach on several hyperspectral image classification problems.
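The core recipe, a supervised loss plus a graph-embedding regularizer on unlabeled pairs, trained by stochastic gradient descent, can be sketched for a linear model. This is a deliberately reduced stand-in for the paper's flexible embedding term and balancing constraints; the squared loss, learning rate, and pair-sampling scheme are assumptions.

```python
import numpy as np

def semisup_sgd(Xl, yl, Xu, W, lam=0.1, lr=0.05, epochs=200, seed=0):
    """SGD on: squared loss over labeled data (Xl, yl) plus a graph
    regularizer sum_ij W_ij * (f(x_i) - f(x_j))^2 over unlabeled Xu,
    for a linear model f(x) = w.x."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Xl.shape[1])
    pairs = np.argwhere(W > 0)
    for _ in range(epochs):
        i = rng.integers(len(Xl))
        err = Xl[i] @ w - yl[i]
        grad = err * Xl[i]                       # supervised gradient
        if len(pairs):
            a, b = pairs[rng.integers(len(pairs))]
            diff = Xu[a] - Xu[b]
            grad = grad + lam * W[a, b] * (diff @ w) * diff  # embedding term
        w -= lr * grad
    return w
```

Because each step touches one labeled sample and one unlabeled pair, the cost per update is independent of the dataset size, which is what lets this style of training scale to millions of unlabeled samples.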

Proceedings ArticleDOI
13 Jun 2010
TL;DR: Experimental results on challenging real-world datasets show that the feature combination capability of the proposed algorithm is competitive to the state-of-the-art multiple kernel learning methods.
Abstract: We address the problem of computing joint sparse representation of visual signal across multiple kernel-based representations. Such a problem arises naturally in supervised visual recognition applications where one aims to reconstruct a test sample with multiple features from as few training subjects as possible. We cast the linear version of this problem into a multi-task joint covariate selection model [15], which can be very efficiently optimized via a kernelizable accelerated proximal gradient method. Furthermore, two kernel-view extensions of this method are provided to handle the situations where descriptors and similarity functions are in the form of kernel matrices. We then investigate two applications of our algorithm to feature combination: 1) fusing gray-level and LBP features for face recognition, and 2) combining multiple kernels for object categorization. Experimental results on challenging real-world datasets show that the feature combination capability of our proposed algorithm is competitive to the state-of-the-art multiple kernel learning methods.
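The linear multi-task joint covariate selection model can be sketched with proximal gradient descent: stack one coefficient vector per feature view into a matrix and apply a row-wise l2 penalty (the l2,1 norm), which forces all views to select the same few training atoms. This is a minimal numpy illustration under those assumptions, not the accelerated or kernelized solver of the paper.

```python
import numpy as np

def joint_sparse(Xs, ys, lam=0.1, n_iter=300):
    """Minimize sum_t 0.5 * ||y_t - X_t a_t||^2 + lam * sum_j ||A[j, :]||_2
    over A = [a_1 ... a_T] by proximal gradient; the row-wise penalty
    yields joint sparsity across the T tasks (feature views)."""
    p, T = Xs[0].shape[1], len(Xs)
    A = np.zeros((p, T))
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs)   # step-size bound
    for _ in range(n_iter):
        G = np.column_stack([X.T @ (X @ A[:, t] - y)
                             for t, (X, y) in enumerate(zip(Xs, ys))])
        Z = A - G / L                                # gradient step
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - (lam / L) / np.maximum(norms, 1e-12), 0.0)
        A = Z * shrink                               # row-wise group shrinkage
    return A
```

Rows of `A` that survive the shrinkage correspond to training subjects used jointly by every feature view, which is exactly the "few training subjects" behavior the abstract describes.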

Journal ArticleDOI
01 Sep 2010
TL;DR: In this paper, a 41-D feature vector is constructed for each pixel in the field of view of the image, encoding information on the local intensity structure, spatial properties, and geometry at multiple scales.
Abstract: This paper presents a method for automated vessel segmentation in retinal images. For each pixel in the field of view of the image, a 41-D feature vector is constructed, encoding information on the local intensity structure, spatial properties, and geometry at multiple scales. An AdaBoost classifier is trained on 789,914 gold standard examples of vessel and nonvessel pixels, then used for classifying previously unseen images. The algorithm was tested on the public digital retinal images for vessel extraction (DRIVE) set, frequently used in the literature and consisting of 40 manually labeled images with gold standard. Results were compared experimentally with those of eight algorithms as well as the additional manual segmentation provided by DRIVE. Training was confined to the dedicated training set of the DRIVE database, and the feature-based AdaBoost classifier (FABC) was tested on the 20 images from the test set. FABC achieved an area under the receiver operating characteristic (ROC) curve of 0.9561, in line with state-of-the-art approaches, while outperforming their accuracy (0.9597 versus 0.9473 for the nearest performer).
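The per-pixel feature construction can be illustrated by stacking multi-scale filter responses into one vector per pixel. This much-reduced sketch (7-D instead of 41-D) uses Gaussian smoothing and gradient magnitude at a few scales; the specific filters and scales are assumptions, not the paper's descriptor.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pixel_features(img, scales=(1, 2, 4)):
    """Build a per-pixel feature vector from multi-scale responses:
    raw intensity, plus smoothed intensity and gradient magnitude
    at each scale. Returns an (n_pixels, n_features) matrix suitable
    as input to a pixel classifier such as AdaBoost."""
    feats = [img]
    for s in scales:
        sm = gaussian_filter(img, s)
        gy, gx = np.gradient(sm)
        feats += [sm, np.hypot(gx, gy)]
    return np.stack(feats, axis=-1).reshape(-1, 1 + 2 * len(scales))
```

Each row of the returned matrix is one pixel's descriptor; pairing it with the gold-standard vessel/nonvessel label for that pixel gives exactly the kind of training set the abstract describes.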

Journal ArticleDOI
TL;DR: Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer's and user's accuracies.
Abstract: The accuracy of supervised land cover classifications depends on factors such as the chosen classification algorithm, adequate training data, the input data characteristics, and the selection of features. Hyperspectral imaging provides more detailed spectral and spatial information on the land cover than other remote sensing resources. Over the past ten years, traditional and formerly widely accepted statistical classification methods have been superseded by more recent machine learning algorithms, e.g., support vector machines (SVMs), or by multiple classifier systems (MCS). This can be explained by limitations of statistical approaches with regard to high-dimensional data, multimodal classes, and often limited availability of training data. In the presented study, MCSs based on SVM and random feature selection (RFS) are applied to explore the potential of a synergetic use of the two concepts. We investigated how the number of selected features and the size of the MCS influence classification accuracy using two hyperspectral data sets from different environmental settings. In addition, experiments were conducted with a varying number of training samples. Accuracies are compared with regular SVM and random forests. Experimental results clearly demonstrate that the generation of an SVM-based classifier system with RFS significantly improves overall classification accuracy as well as producer's and user's accuracies. In addition, the ensemble strategy results in smoother, i.e., more realistic, classification maps than those from stand-alone SVM. Findings from the experiments were successfully transferred onto an additional hyperspectral data set.
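The MCS-with-RFS scheme can be sketched as: train each ensemble member on a random feature subset, then combine predictions by majority vote. In this illustrative numpy version a regularized least-squares linear classifier stands in for the SVM base learner, and the member count, subset size, and ridge constant are assumptions.

```python
import numpy as np

def rfs_ensemble_fit(X, y, n_models=11, n_feats=3, seed=0):
    """Classifier ensemble with random feature selection: each member
    is trained on its own random subset of the feature columns."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], size=n_feats, replace=False)
        Xs = np.column_stack([X[:, idx], np.ones(len(X))])   # add bias
        w = np.linalg.solve(Xs.T @ Xs + 1e-3 * np.eye(n_feats + 1), Xs.T @ y)
        models.append((idx, w))
    return models

def rfs_ensemble_predict(models, X):
    """Majority vote over the ensemble members' sign predictions."""
    votes = np.zeros(len(X))
    for idx, w in models:
        Xs = np.column_stack([X[:, idx], np.ones(len(X))])
        votes += np.sign(Xs @ w)
    return np.sign(votes)
```

Because each member sees different features, their errors decorrelate, which is the mechanism behind the smoother classification maps reported in the abstract.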

Journal ArticleDOI
01 Oct 2010
TL;DR: The proposed scheme improves classification accuracies, when compared to previously proposed classification techniques, and provides accurate segmentation and classification maps.
Abstract: A new method for segmentation and classification of hyperspectral images is proposed. The method is based on the construction of a minimum spanning forest (MSF) from region markers. Markers are defined automatically from classification results. For this purpose, pixelwise classification is performed, and the most reliable classified pixels are chosen as markers. Each classification-derived marker is associated with a class label. Each tree in the MSF grown from a marker forms a region in the segmentation map. By assigning the class of each marker to all the pixels within the region grown from that marker, a spectral-spatial classification map is obtained. Furthermore, the classification map is refined using the results of a pixelwise classification and a majority voting within the spatially connected regions. Experimental results are presented for three hyperspectral airborne images. The use of different dissimilarity measures for the construction of the MSF is investigated. The proposed scheme improves classification accuracies when compared to previously proposed classification techniques, and provides accurate segmentation and classification maps.
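Growing an MSF from markers can be sketched as Prim-style region growing on a 4-connected pixel grid: always extend the cheapest frontier edge, so each unlabeled pixel joins the tree of the marker that reaches it at lowest cost. This toy version uses absolute intensity difference as the dissimilarity measure (the paper investigates several); single-band input and 4-connectivity are simplifying assumptions.

```python
import heapq
import numpy as np

def msf_segment(img, markers):
    """Grow a minimum spanning forest from labeled marker pixels
    (markers > 0) on a 4-connected grid; edge weight = absolute
    intensity difference. Returns a full label map."""
    h, w = img.shape
    labels = markers.copy()
    heap = [(0.0, r, c, labels[r, c])
            for r in range(h) for c in range(w) if labels[r, c]]
    heapq.heapify(heap)
    while heap:
        cost, r, c, lab = heapq.heappop(heap)
        if labels[r, c] == 0:
            labels[r, c] = lab              # pixel joins this marker's tree
        elif labels[r, c] != lab:
            continue                        # already claimed by another tree
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and labels[nr, nc] == 0:
                wgt = abs(float(img[r, c]) - float(img[nr, nc]))
                heapq.heappush(heap, (wgt, nr, nc, lab))
    return labels
```

With class labels attached to the markers, the returned label map is directly the spectral-spatial classification map described in the abstract, before the majority-voting refinement.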

Proceedings ArticleDOI
03 Dec 2010
TL;DR: This work conducts a controlled online search to collect frontal face images of 150 pairs of public figures and celebrities, along with images of their parents or children, and proposes and evaluates a set of low-level image features for the challenge of kinship verification.
Abstract: We tackle the challenge of kinship verification using novel feature extraction and selection methods, automatically classifying pairs of face images as “related” or “unrelated” (in terms of kinship). First, we conducted a controlled online search to collect frontal face images of 150 pairs of public figures and celebrities, along with images of their parents or children. Next, we propose and evaluate a set of low-level image features for this classification problem. After selecting the most discriminative inherited facial features, we demonstrate a classification accuracy of 70.67% on a test set of image pairs using K-Nearest-Neighbors. Finally, we present an evaluation of human performance on this problem.
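The K-Nearest-Neighbors verification step can be sketched by encoding each face pair as the absolute difference of its two feature vectors and voting among the k closest training pairs. The difference encoding and k value here are illustrative assumptions, not necessarily the paper's choices.

```python
import numpy as np

def knn_verify(train_pairs, train_labels, test_pair, k=3):
    """KNN kinship verification: each pair of face feature vectors is
    represented by their absolute difference; the test pair gets the
    majority label (1 = related, 0 = unrelated) of its k nearest
    training pairs in that difference space."""
    feats = np.abs(train_pairs[:, 0] - train_pairs[:, 1])
    q = np.abs(test_pair[0] - test_pair[1])
    idx = np.argsort(np.linalg.norm(feats - q, axis=1))[:k]
    return int(np.round(train_labels[idx].mean()))
```

The intuition matches the abstract's "inherited facial features": related pairs produce small differences in the discriminative feature dimensions, so they cluster together in the difference space.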

Journal ArticleDOI
TL;DR: A shape-based, hierarchical part-template matching approach to simultaneous human detection and segmentation combining local part-based and global shape-template-based schemes is proposed.
Abstract: We propose a shape-based, hierarchical part-template matching approach to simultaneous human detection and segmentation combining local part-based and global shape-template-based schemes. The approach relies on the key idea of matching a part-template tree to images hierarchically to detect humans and estimate their poses. For learning a generic human detector, a pose-adaptive feature computation scheme is developed based on a tree matching approach. Instead of traditional concatenation-style image location-based feature encoding, we extract features adaptively in the context of human poses and train a kernel-SVM classifier to separate human/nonhuman patterns. Specifically, the features are collected in the local context of poses by tracing around the estimated shape boundaries. We also introduce an approach to multiple occluded human detection and segmentation based on an iterative occlusion compensation scheme. The output of our learned generic human detector can be used as an initial set of human hypotheses for the iterative optimization. We evaluate our approaches on three public pedestrian data sets (INRIA, MIT-CBCL, and USC-B) and two crowded sequences from Caviar Benchmark and Munich Airport data sets.