
Showing papers on "Feature vector" published in 2008


Book ChapterDOI
20 Oct 2008
TL;DR: It is shown how both an object class specific representation and a discriminative recognition model can be learned using the AdaBoost algorithm, which allows many different kinds of simple features to be combined into a single similarity function.
Abstract: Viewpoint invariant pedestrian recognition is an important yet under-addressed problem in computer vision. This is likely due to the difficulty in matching two objects with unknown viewpoint and pose. This paper presents a method of performing viewpoint invariant pedestrian recognition using an efficiently and intelligently designed object representation, the ensemble of localized features (ELF). Instead of designing a specific feature by hand to solve the problem, we define a feature space using our intuition about the problem and let a machine learning algorithm find the best representation. We show how both an object class specific representation and a discriminative recognition model can be learned using the AdaBoost algorithm. This approach allows many different kinds of simple features to be combined into a single similarity function. The method is evaluated using a viewpoint invariant pedestrian recognition dataset and the results are shown to be superior to all previous benchmarks for both recognition and reacquisition of pedestrians.
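As a rough illustration of the boosted-similarity idea (not the authors' exact ELF feature channels), the sketch below trains AdaBoost over decision stumps on absolute differences of localized feature vectors to decide "same pedestrian" versus "different pedestrian"; the data and feature sizes are synthetic placeholders.

```python
# Hypothetical sketch: learning a similarity function with AdaBoost over
# localized feature differences, in the spirit of the ELF approach.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Pretend each pedestrian image is described by a bank of simple localized
# features (e.g. color and texture histograms per horizontal strip).
n_people, n_images_per_person, n_features = 50, 4, 200
prototypes = rng.normal(size=(n_people, n_features))
images = prototypes[:, None, :] + 0.3 * rng.normal(
    size=(n_people, n_images_per_person, n_features))

# Build same/different pairs; each pair is represented by the absolute
# difference of the two feature vectors.
pairs, labels = [], []
for p in range(n_people):
    pairs.append(np.abs(images[p, 0] - images[p, 1])); labels.append(1)   # same
    q = (p + 1) % n_people
    pairs.append(np.abs(images[p, 0] - images[q, 0])); labels.append(0)   # different
X, y = np.array(pairs), np.array(labels)

# AdaBoost over decision stumps combines many weak single-feature tests
# into one similarity score (the signed decision value).
model = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)
similarity = model.decision_function(np.abs(images[0, 2] - images[0, 3])[None, :])
print("same-person similarity score:", similarity[0])
```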

1,554 citations


Journal ArticleDOI
TL;DR: An approach is proposed that is based on using several principal components from the hyperspectral data and building morphological profiles, which can be used all together in one extended morphological profile for the classification of urban structures.
Abstract: A method is proposed for the classification of urban hyperspectral data with high spatial resolution. The approach is an extension of previous approaches and uses both the spatial and spectral information for classification. One previous approach is based on using several principal components (PCs) from the hyperspectral data and building several morphological profiles (MPs). These profiles can be used all together in one extended MP. A shortcoming of that approach is that it was primarily designed for classification of urban structures and it does not fully utilize the spectral information in the data. Similarly, the commonly used pixelwise classification of hyperspectral data is solely based on the spectral content and lacks information on the structure of the features in the image. The proposed method overcomes these problems and is based on the fusion of the morphological information and the original hyperspectral data, i.e., the two vectors of attributes are concatenated into one feature vector. After a reduction of the dimensionality, the final classification is achieved by using a support vector machine classifier. The proposed approach is tested in experiments on ROSIS data from urban areas. Significant improvements are achieved in terms of accuracies when compared to results obtained for approaches based on the use of MPs based on PCs only and conventional spectral classification. For instance, with one data set, the overall accuracy is increased from 79% to 83% without any feature reduction and to 87% with feature reduction. The proposed approach also shows excellent results with a limited training set.
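A minimal sketch of the fusion idea follows, assuming scikit-learn and SciPy: morphological opening/closing profiles are built on a few principal components, concatenated with the original spectra, reduced, and fed to an SVM. The synthetic cube, window sizes, and component counts are illustrative only, not the paper's ROSIS setup.

```python
# Minimal sketch of the fusion idea: morphological profiles on a few PCs,
# concatenated with the original spectra, reduced and fed to an SVM.
import numpy as np
from scipy.ndimage import grey_opening, grey_closing
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(0)
H, W, B = 40, 40, 60                      # rows, cols, spectral bands
cube = rng.random((H, W, B))              # stand-in for a hyperspectral scene
labels = rng.integers(0, 3, size=(H, W))  # placeholder ground truth

# 1) A few principal components of the spectral data.
pcs = PCA(n_components=3).fit_transform(cube.reshape(-1, B)).reshape(H, W, 3)

# 2) Morphological profile: openings and closings with growing windows.
profile = []
for k in range(pcs.shape[-1]):
    for size in (3, 5, 7):
        profile.append(grey_opening(pcs[..., k], size=(size, size)))
        profile.append(grey_closing(pcs[..., k], size=(size, size)))
morph = np.stack(profile, axis=-1).reshape(-1, len(profile))

# 3) Fuse: concatenate spectral and morphological attributes per pixel,
#    reduce dimensionality, classify with an SVM.
fused = np.hstack([cube.reshape(-1, B), morph])
reduced = PCA(n_components=20).fit_transform(fused)
clf = SVC(kernel="rbf", gamma="scale").fit(reduced, labels.ravel())
print("training accuracy:", clf.score(reduced, labels.ravel()))
```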

1,092 citations


Proceedings Article
13 Jul 2008
TL;DR: A new dimensionality reduction method is proposed to find a latent space that minimizes the distance between the distributions of the data in different domains; this latent space can be treated as a bridge for transferring knowledge from the source domain to the target domain.
Abstract: Transfer learning addresses the problem of how to utilize plenty of labeled data in a source domain to solve related but different problems in a target domain, even when the training and testing problems have different distributions or features. In this paper, we consider transfer learning via dimensionality reduction. To solve this problem, we learn a low-dimensional latent feature space where the distributions between the source domain data and the target domain data are the same or close to each other. Onto this latent feature space, we project the data in related domains where we can apply standard learning algorithms to train classification or regression models. Thus, the latent feature space can be treated as a bridge of transferring knowledge from the source domain to the target domain. The main contribution of our work is that we propose a new dimensionality reduction method to find a latent space, which minimizes the distance between distributions of the data in different domains in a latent space. The effectiveness of our approach to transfer learning is verified by experiments in two real world applications: indoor WiFi localization and binary text classification.
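The sketch below is a simplified linear variant of this idea, not the paper's kernel-based algorithm: it finds a projection that keeps the overall data scatter large while shrinking an empirical MMD-style discrepancy between the projected source and target data, via a generalized eigenproblem. Dimensions and the regularizer are assumptions.

```python
# Simplified linear variant of the idea (not the paper's method): find a
# projection W that preserves data scatter while making the projected source
# and target distributions close in MMD terms.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 10))          # source domain
Xt = rng.normal(0.5, 1.2, size=(80, 10))           # shifted target domain
X = np.vstack([Xs, Xt]).T                           # d x n combined data
ns, nt, n = len(Xs), len(Xt), len(Xs) + len(Xt)

# MMD coefficient matrix M and centering matrix H.
e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
M = np.outer(e, e)
Hc = np.eye(n) - np.ones((n, n)) / n

A = X @ Hc @ X.T                               # total scatter (to be preserved)
B = X @ M @ X.T + 1e-3 * np.eye(X.shape[0])    # cross-domain discrepancy (to be suppressed)

# Generalized eigenproblem A w = lambda B w; large lambda means high scatter
# and low discrepancy. The top-k eigenvectors span the latent space.
vals, vecs = eigh(A, B)
W = vecs[:, -3:]                               # 3-dimensional latent space
Zs, Zt = Xs @ W, Xt @ W                        # both domains projected into it
print("projected source/target means:", Zs.mean(0).round(2), Zt.mean(0).round(2))
```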

640 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work introduces a novel approach to object categorization that incorporates two types of context, co-occurrence and relative location, with local appearance-based features, and uses a conditional random field (CRF) to maximize object label agreement according to both semantic and spatial relevance.
Abstract: In this work we introduce a novel approach to object categorization that incorporates two types of context, co-occurrence and relative location, with local appearance-based features. Our approach, named CoLA (for co-occurrence, location and appearance), uses a conditional random field (CRF) to maximize object label agreement according to both semantic and spatial relevance. We model relative location between objects using simple pairwise features. By vector quantizing this feature space, we learn a small set of prototypical spatial relationships directly from the data. We evaluate our results on two challenging datasets: PASCAL 2007 and MSRC. The results show that combining co-occurrence and spatial context improves accuracy in as many as half of the categories compared to using co-occurrence alone.
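One ingredient of this pipeline, the vector quantization of pairwise relative-location features, can be sketched as below with k-means on synthetic detections; the exact feature definition here (normalized offset plus log area ratio) and the codebook size are assumptions for illustration.

```python
# Sketch of one ingredient: vector-quantize simple pairwise relative-location
# features to learn a few prototypical spatial relationships.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def box_center_scale(box):
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0, (x1 - x0) * (y1 - y0)

# Random detections: each row is (x0, y0, x1, y1) in a unit image.
boxes = np.sort(rng.random((60, 4)).reshape(60, 2, 2), axis=1).reshape(60, 4)

pair_feats = []
for i in range(len(boxes)):
    for j in range(len(boxes)):
        if i == j:
            continue
        cxi, cyi, ai = box_center_scale(boxes[i])
        cxj, cyj, aj = box_center_scale(boxes[j])
        # relative offset of object j with respect to object i, plus relative area
        pair_feats.append([cxj - cxi, cyj - cyi, np.log(aj / ai)])
pair_feats = np.array(pair_feats)

# A small codebook of spatial relationships ("above", "left-of", "around", ...).
codebook = KMeans(n_clusters=8, n_init=10, random_state=0).fit(pair_feats)
print("prototype relationships (dx, dy, log area ratio):")
print(codebook.cluster_centers_.round(2))
```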

558 citations


Journal ArticleDOI
TL;DR: In this article, the authors propose Features Annealed Independence Rules (FAIR) to select a subset of important features for high-dimensional classification, and establish the conditions under which all the important features can be selected by the two-sample t-statistic.
Abstract: Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is largely poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as bad as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as badly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistics, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
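A minimal numpy sketch of the FAIR idea follows: rank features by two-sample t-statistics, keep only the top m, and classify with the diagonal independence rule restricted to those features. The number of selected features is fixed here for illustration, whereas the paper derives it from an error bound.

```python
# Minimal sketch of the FAIR idea on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
p, n1, n2, m = 1000, 30, 30, 20
X1 = rng.normal(0.0, 1.0, size=(n1, p))
X2 = rng.normal(0.0, 1.0, size=(n2, p))
X2[:, :15] += 1.0                      # only the first 15 features carry signal

mu1, mu2 = X1.mean(0), X2.mean(0)
s1, s2 = X1.var(0, ddof=1), X2.var(0, ddof=1)

# Two-sample t-statistic per feature; keep the m largest in absolute value.
t = (mu1 - mu2) / np.sqrt(s1 / n1 + s2 / n2)
selected = np.argsort(-np.abs(t))[:m]

# Independence (diagonal) rule restricted to the selected features:
# assign class 1 if the diagonal discriminant score is positive.
pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
def classify(x):
    d = (x - (mu1 + mu2) / 2.0) * (mu1 - mu2) / pooled
    return 1 if d[selected].sum() > 0 else 2

test = rng.normal(0.0, 1.0, size=p)    # a class-1-like test point
print("predicted class:", classify(test))
```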

542 citations


Book ChapterDOI
04 Dec 2008
TL;DR: A repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications, is introduced to make the different approaches in graph-based machine learning better comparable.
Abstract: In recent years the use of graph-based representation has gained popularity in pattern recognition and machine learning. As a matter of fact, object representation by means of graphs has a number of advantages over feature vectors. Therefore, various algorithms for graph-based machine learning have been proposed in the literature. However, in contrast to the emerging interest in graph-based representation, a lack of standardized graph data sets for benchmarking can be observed. Common practice is that researchers use their own data sets, and this practice hampers the objective evaluation of the proposed methods. In order to make the different approaches in graph-based machine learning better comparable, the present paper aims at introducing a repository of graph data sets and corresponding benchmarks, covering a wide spectrum of different applications.

484 citations


Journal ArticleDOI
TL;DR: An overview of the SVM, covering both one-class and two-class SVM methods, is first presented, followed by its use in landslide susceptibility mapping, where it is concluded that the two-class SVM possesses better prediction efficiency than logistic regression and the one-class SVM.

450 citations


Journal ArticleDOI
TL;DR: A general framework based on kernel methods for the integration of heterogeneous sources of information for multitemporal classification of remote sensing images and the development of nonlinear kernel classifiers for the well-known difference and ratioing change detection methods is presented.
Abstract: The multitemporal classification of remote sensing images is a challenging problem, in which the efficient combination of different sources of information (e.g., temporal, contextual, or multisensor) can improve the results. In this paper, we present a general framework based on kernel methods for the integration of heterogeneous sources of information. Using the theoretical principles in this framework, three main contributions are presented. First, a novel family of kernel-based methods for multitemporal classification of remote sensing images is presented. The second contribution is the development of nonlinear kernel classifiers for the well-known difference and ratioing change detection methods by formulating them in an adequate high-dimensional feature space. Finally, the presented methodology allows the integration of contextual information and multisensor images with different levels of nonlinear sophistication. The binary support vector (SV) classifier and the one-class SV domain description classifier are evaluated by using both linear and nonlinear kernel functions. Good performance on synthetic and real multitemporal classification scenarios illustrates the generalization of the framework and the capabilities of the proposed algorithms.
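A compact sketch of the composite-kernel idea for two acquisition dates follows: a weighted sum of per-date RBF kernels and a kernel on the difference image is handed to an SVM with a precomputed Gram matrix. The weights, gamma, and synthetic data are illustrative assumptions, not the paper's configuration.

```python
# Composite multitemporal kernel fed to an SVM with a precomputed Gram matrix.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_tr, n_te, bands = 200, 50, 6
X1_tr, X2_tr = rng.random((n_tr, bands)), rng.random((n_tr, bands))   # date 1, date 2
X1_te, X2_te = rng.random((n_te, bands)), rng.random((n_te, bands))
y_tr = rng.integers(0, 2, n_tr)                                       # changed / unchanged

def composite(A1, A2, B1, B2, w=(0.4, 0.4, 0.2), gamma=1.0):
    """Weighted sum of date-1, date-2 and difference-image RBF kernels."""
    return (w[0] * rbf_kernel(A1, B1, gamma=gamma)
            + w[1] * rbf_kernel(A2, B2, gamma=gamma)
            + w[2] * rbf_kernel(A1 - A2, B1 - B2, gamma=gamma))

K_tr = composite(X1_tr, X2_tr, X1_tr, X2_tr)
K_te = composite(X1_te, X2_te, X1_tr, X2_tr)      # test rows vs. training columns

clf = SVC(kernel="precomputed").fit(K_tr, y_tr)
print("predicted labels for the first five test pixels:", clf.predict(K_te)[:5])
```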

355 citations


Proceedings Article
13 Jul 2008
TL;DR: A novel algorithm is proposed to efficiently find the global optimal feature subset such that the subset-level score is maximized, and extensive experiments demonstrate the effectiveness of the proposed algorithm in comparison with the traditional methods for feature selection.
Abstract: Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on the corresponding score (subset-level score), which is calculated in a trace ratio form. Since the number of all possible feature subsets is very huge, it is often prohibitively expensive in computational cost to search in a brute force manner for the feature subset with the maximum subset-level score. Instead of calculating the scores of all the feature subsets, traditional methods calculate the score for each feature, and then select the leading features based on the rank of these feature-level scores. However, selecting the feature subset based on the feature-level score cannot guarantee the optimum of the subset-level score. In this paper, we directly optimize the subset-level score, and propose a novel algorithm to efficiently find the global optimal feature subset such that the subset-level score is maximized. Extensive experiments demonstrate the effectiveness of our proposed algorithm in comparison with the traditional methods for feature selection.
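A simplified sketch of the subset-level (trace-ratio) optimization is given below, using per-feature Fisher-style between-class and within-class quantities: the subset of size m maximizing a_i - lambda*b_i is re-selected and lambda is updated to the resulting ratio until convergence. Sizes and data are synthetic assumptions.

```python
# Simplified sketch of subset-level (trace-ratio) feature selection.
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 150, 40, 5
y = rng.integers(0, 3, n)
X = rng.normal(size=(n, p))
X[:, :3] += y[:, None]                    # first three features are discriminative

mu = X.mean(0)
a = np.zeros(p)                           # between-class scatter per feature
b = np.zeros(p)                           # within-class scatter per feature
for c in np.unique(y):
    Xc = X[y == c]
    a += len(Xc) * (Xc.mean(0) - mu) ** 2
    b += len(Xc) * Xc.var(0)

lam, selected = 0.0, None
for _ in range(50):
    new_sel = np.argsort(-(a - lam * b))[:m]    # best subset for the current lambda
    new_lam = a[new_sel].sum() / b[new_sel].sum()
    if selected is not None and set(new_sel) == set(selected):
        break                                   # subset-level score has converged
    selected, lam = new_sel, new_lam

print("selected features:", sorted(selected), "subset-level score:", round(lam, 3))
```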

343 citations


Proceedings Article
08 Dec 2008
TL;DR: Through experiments on the text-aided image classification and cross-language classification tasks, it is demonstrated that the translated learning framework can greatly outperform many state-of-the-art baseline methods.
Abstract: This paper investigates a new machine learning strategy called translated learning. Unlike many previous learning tasks, we focus on how to use labeled data from one feature space to enhance the classification of other entirely different learning spaces. For example, we might wish to use labeled text data to help learn a model for classifying image data, when the labeled images are difficult to obtain. An important aspect of translated learning is to build a "bridge" to link one feature space (known as the "source space") to another space (known as the "target space") through a translator in order to migrate the knowledge from source to target. The translated learning solution uses a language model to link the class labels to the features in the source spaces, which in turn is translated to the features in the target spaces. Finally, this chain of linkages is completed by tracing back to the instances in the target spaces. We show that this path of linkage can be modeled using a Markov chain and risk minimization. Through experiments on the text-aided image classification and cross-language classification tasks, we demonstrate that our translated learning framework can greatly outperform many state-of-the-art baseline methods.

305 citations


Journal ArticleDOI
TL;DR: A method for the detection of double JPEG compression and a maximum-likelihood estimator of the primary quality factor are presented, essential for construction of accurate targeted and blind steganalysis methods for JPEG images.
Abstract: This paper presents a method for the detection of double JPEG compression and a maximum-likelihood estimator of the primary quality factor. These methods are essential for construction of accurate targeted and blind steganalysis methods for JPEG images. The proposed methods use support vector machine classifiers with feature vectors formed by histograms of low-frequency discrete cosine transformation coefficients. The performance of the algorithms is compared to selected prior art.
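The feature-vector side of such a detector can be sketched as below: histograms of a few low-frequency block-DCT coefficients collected over 8x8 blocks of a grayscale image, which would then be handed to an SVM. The chosen coefficient positions and bin range are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch of the feature side only: histograms of low-frequency block-DCT coefficients.
import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)   # stand-in for a decoded JPEG

LOW_FREQ = [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0)]        # low-frequency DCT modes
BINS = np.arange(-20.5, 21.5, 1.0)                         # integer-centered bins

coeffs = {pos: [] for pos in LOW_FREQ}
for r in range(0, img.shape[0], 8):
    for c in range(0, img.shape[1], 8):
        block = dctn(img[r:r + 8, c:c + 8] - 128.0, norm="ortho")
        for pos in LOW_FREQ:
            coeffs[pos].append(block[pos])

# One histogram per coefficient position, concatenated into the feature vector.
feature_vector = np.concatenate(
    [np.histogram(np.round(coeffs[pos]), bins=BINS, density=True)[0] for pos in LOW_FREQ])
print("feature vector length:", feature_vector.shape[0])
```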

Journal ArticleDOI
TL;DR: This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote‐sensing image classification.
Abstract: Land use classification is an important part of many remote sensing applications. A lot of research has gone into the application of statistical and neural network classifiers to remote-sensing images. This research involves the study and implementation of a new pattern recognition technique introduced within the framework of statistical learning theory called Support Vector Machines (SVMs), and its application to remote-sensing image classification. Standard classifiers such as the Artificial Neural Network (ANN) need a number of training samples that increases exponentially with the dimension of the input feature space. With a limited number of training samples, the classification rate thus decreases as the dimensionality increases. SVMs are independent of the dimensionality of the feature space, as the main idea behind this classification technique is to separate the classes with a surface that maximizes the margin between them, using boundary pixels to create the decision surface. Results from SVMs are compared with traditional Maximum Likelihood Classification (MLC) and an ANN classifier. The findings suggest that the ANN and SVM classifiers perform better than the traditional MLC. The SVM and the ANN show comparable results. However, accuracy is dependent on factors such as the number of hidden nodes (in the case of ANN) and kernel parameters (in the case of SVM). The training time taken by the SVM is several orders of magnitude less.

Journal ArticleDOI
TL;DR: A novel minutiae-based fingerprint matching algorithm is presented that ranks 1st on DB3, the most difficult database in FVC2002, and ranks 2nd on average across all 4 databases.

Proceedings ArticleDOI
23 Oct 2008
TL;DR: The evaluation indicates that the proposed online-update methods are accurate in approximating a full retrain of a RKMF model while the runtime of online-updating is in the range of milliseconds even for huge datasets like Netflix.
Abstract: Regularized matrix factorization models are known to generate high-quality rating predictions for recommender systems. One of the major drawbacks of matrix factorization is that, once computed, the model is static. For real-world applications, dynamically updating a model is one of the most important tasks. Especially when ratings on new users or new items come in, updating the feature matrices is crucial. In this paper, we generalize regularized matrix factorization (RMF) to regularized kernel matrix factorization (RKMF). Kernels provide a flexible method for deriving new matrix factorization methods. Furthermore, with kernels, nonlinear interactions between feature vectors are possible. We propose a generic method for learning RKMF models. From this method we derive an online-update algorithm for RKMF models that allows the new-user/new-item problem to be solved. Our evaluation indicates that our proposed online-update methods are accurate in approximating a full retrain of an RKMF model while the runtime of online updating is in the range of milliseconds, even for huge datasets like Netflix.
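The online-update idea can be sketched for the plain regularized MF special case (i.e., the linear-kernel member of the RKMF family): after a batch factorization, a new user's factor vector is fit with a few SGD steps against the fixed item factors. Learning rate, regularization, and data below are assumptions.

```python
# Sketch of the online update for the plain regularized MF special case.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, lr, reg = 100, 50, 8, 0.02, 0.05
P = 0.1 * rng.normal(size=(n_users, k))       # user factors
Q = 0.1 * rng.normal(size=(n_items, k))       # item factors
ratings = [(u, i, float(rng.integers(1, 6)))  # (user, item, rating) triples
           for u in range(n_users) for i in rng.choice(n_items, 10, replace=False)]

# Batch training: plain SGD on the regularized squared error.
for _ in range(20):
    for u, i, r in ratings:
        e = r - P[u] @ Q[i]
        P[u] += lr * (e * Q[i] - reg * P[u])
        Q[i] += lr * (e * P[u] - reg * Q[i])

# Online update for a new user: only the new factor vector is trained,
# so the cost is milliseconds rather than a full retrain.
new_ratings = [(3, 5.0), (7, 1.0), (12, 4.0)]  # (item, rating) pairs for the new user
p_new = 0.1 * rng.normal(size=k)
for _ in range(50):
    for i, r in new_ratings:
        e = r - p_new @ Q[i]
        p_new += lr * (e * Q[i] - reg * p_new)

print("predicted rating of item 3 for the new user:", round(float(p_new @ Q[3]), 2))
```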

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method yields higher retrieval accuracy than some conventional methods even though its feature vector dimension is not higher than those of the latter for six test DBs.
Abstract: In this paper, we propose a content-based image retrieval method based on an efficient combination of multiresolution color and texture features. As its color features, color autocorrelograms of the hue and saturation component images in HSV color space are used. As its texture features, BDIP and BVLC moments of the value component image are adopted. The color and texture features are extracted in the multiresolution wavelet domain and combined. The dimension of the combined feature vector is determined at a point where the retrieval accuracy becomes saturated. Experimental results show that the proposed method yields higher retrieval accuracy than some conventional methods even though its feature vector dimension is not higher than those of the latter for six test DBs. In particular, it demonstrates excellent retrieval accuracy for queries and target images of various resolutions. In addition, the proposed method almost always shows performance gains in precision versus recall and in ANMRR over the other methods.
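A heavily simplified sketch of this kind of combined descriptor is given below: a distance-1 hue autocorrelogram stands in for the color part, block variances of the value channel stand in for the BDIP/BVLC texture moments, and repeated 2x downsampling stands in for the wavelet multiresolution decomposition. All of these substitutions are assumptions; they only illustrate the "concatenate color and texture features across resolutions" structure.

```python
# Simplified color+texture feature vector on a synthetic HSV image.
import numpy as np

rng = np.random.default_rng(0)
hsv = rng.random((64, 64, 3))          # stand-in for an HSV image with values in [0, 1]

def hue_autocorrelogram(hue, n_bins=8):
    """P(neighbor at distance 1 has the same quantized hue), per hue bin."""
    q = np.minimum((hue * n_bins).astype(int), n_bins - 1)
    feat = np.zeros(n_bins)
    for axis in (0, 1):                # vertical and horizontal neighbors
        a, b = (q[1:, :], q[:-1, :]) if axis == 0 else (q[:, 1:], q[:, :-1])
        for c in range(n_bins):
            mask = a == c
            feat[c] += (b[mask] == c).mean() if mask.any() else 0.0
    return feat / 2.0

def block_variance(value, block=4):
    """Variance of non-overlapping blocks: a crude local-texture measure."""
    hb = (value.shape[0] // block) * block
    wb = (value.shape[1] // block) * block
    v = value[:hb, :wb].reshape(hb // block, block, wb // block, block)
    return v.var(axis=(1, 3)).ravel()

feature_vector = []
img = hsv
for level in range(3):                 # crude multiresolution by repeated 2x downsampling
    feature_vector.append(hue_autocorrelogram(img[..., 0]))
    feature_vector.append(block_variance(img[..., 2])[:16])
    img = img[::2, ::2]
feature_vector = np.concatenate(feature_vector)
print("combined color+texture feature length:", feature_vector.shape[0])
```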

31 Dec 2008
TL;DR: This study indicates that linear SVMs with simple feature rankings are effective on the data sets in the Causality Challenge, and shows that a feature ranking using weights from linear SVM models yields good performance, even when the training and testing data are not identically distributed.
Abstract: Feature ranking is useful to gain knowledge of data and identify relevant features. This article explores the performance of combining linear support vector machines with various feature ranking methods, and reports the experiments conducted when participating in the Causality Challenge. Experiments show that a feature ranking using weights from linear SVM models yields good performance, even when the training and testing data are not identically distributed. Checking the difference of Area Under Curve (AUC) with and without removing each feature also gives similar rankings. Our study indicates that linear SVMs with simple feature rankings are effective on the data sets in the Causality Challenge.
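The weight-based ranking is straightforward to sketch: train a linear SVM and order features by the absolute values of its coefficients. Synthetic data stands in here for the Causality Challenge sets.

```python
# Minimal sketch of ranking features by the absolute weights of a linear SVM.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))
y = (X[:, 0] - 2 * X[:, 3] + 0.5 * rng.normal(size=300) > 0).astype(int)

clf = LinearSVC(C=1.0, dual=False, max_iter=5000).fit(X, y)
ranking = np.argsort(-np.abs(clf.coef_.ravel()))
print("top-5 features by |w|:", ranking[:5])   # features 0 and 3 should lead
```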

Journal ArticleDOI
TL;DR: A new variant of the k-nearest neighbor (kNN) classifier based on the maximal margin principle is presented, characterized by resulting global decision boundaries of the piecewise linear type.
Abstract: In this paper, we present a new variant of the k-nearest neighbor (kNN) classifier based on the maximal margin principle. The proposed method relies on classifying a given unlabeled sample by first finding its k-nearest training samples. A local partition of the input feature space is then carried out by means of local support vector machine (SVM) decision boundaries determined after training a multiclass SVM classifier on the considered k training samples. The labeling of the unknown sample is done by looking at the local decision region to which it belongs. The method is characterized by resulting global decision boundaries of the piecewise linear type. However, the entire process can be kernelized through the determination of the k-nearest training samples in the transformed feature space by using a distance function simply reformulated on the basis of the adopted kernel. To illustrate the performance of the proposed method, an experimental analysis on three different remote sensing datasets is reported and discussed.
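A small sketch of this local-SVM variant of kNN follows: each test sample is classified by an SVM trained only on its k nearest training samples (the kernelized neighbor search described in the abstract is omitted here for brevity).

```python
# Classify each test sample with an SVM trained on its k nearest training samples.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_tr = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_tr = np.array([0] * 100 + [1] * 100)
X_te = rng.normal(1.5, 1.5, (10, 2))

k = 15
nn = NearestNeighbors(n_neighbors=k).fit(X_tr)

preds = []
for x in X_te:
    idx = nn.kneighbors(x[None, :], return_distance=False)[0]
    labels = y_tr[idx]
    if len(np.unique(labels)) == 1:
        preds.append(labels[0])                     # all neighbors agree
    else:
        local = SVC(kernel="linear").fit(X_tr[idx], labels)
        preds.append(local.predict(x[None, :])[0])  # local maximal-margin rule
print("predictions:", preds)
```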

01 Jan 2008
TL;DR: The results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a vital indicator of classification performance.
Abstract: Dimensionality reduction and feature subset selection are two techniques for reducing the attribute space of a feature set, which is an important component of both supervised and unsupervised classification or regression problems. While in feature subset selection a subset of the original attributes is extracted, dimensionality reduction in general produces linear combinations of the original attribute set. In this paper we investigate the relationship between several attribute space reduction techniques and the resulting classification accuracy for two very different application areas. On the one hand, we consider e-mail filtering, where the feature space contains various properties of e-mail messages, and on the other hand, we consider drug discovery problems, where quantitative representations of molecular structures are encoded in terms of information-preserving descriptor values. Subsets of the original attributes constructed by filter and wrapper techniques, as well as subsets of linear combinations of the original attributes constructed by three different variants of principal component analysis (PCA), are compared in terms of the classification performance achieved with various machine learning algorithms as well as in terms of runtime performance. We successively reduce the size of the attribute sets and investigate the changes in the classification results. Moreover, we explore the relationship between the variance captured in the linear combinations within PCA and the resulting classification accuracy. The results show that the classification accuracy based on PCA is highly sensitive to the type of data and that the variance captured by the principal components is not necessarily a vital indicator of classification performance.
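The kind of comparison the study performs can be sketched as below: the same classifier is evaluated with PCA-based reduction versus filter-style subset selection at several reduced sizes. Synthetic data stands in for the e-mail and drug-discovery sets, and logistic regression is just one of the possible learners.

```python
# PCA-based reduction vs. filter-style feature subset selection, same classifier.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 100))
y = (X[:, :5].sum(axis=1) + 0.5 * rng.normal(size=400) > 0).astype(int)

for k in (5, 20, 50):
    pca_pipe = Pipeline([("red", PCA(n_components=k)),
                         ("clf", LogisticRegression(max_iter=2000))])
    sel_pipe = Pipeline([("red", SelectKBest(f_classif, k=k)),
                         ("clf", LogisticRegression(max_iter=2000))])
    print(f"k={k:3d}  PCA acc={cross_val_score(pca_pipe, X, y, cv=5).mean():.3f}"
          f"  subset acc={cross_val_score(sel_pipe, X, y, cv=5).mean():.3f}")
```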

Journal ArticleDOI
TL;DR: This study tested support vector machines (SVMs) as a method for constructing nonlinear transition rules for cellular automata (CA) and demonstrated that the proposed model can achieve high accuracy and overcome some limitations of existing CA models in simulating complex urban systems.

Journal ArticleDOI
TL;DR: This paper presents a novel approach to unsupervised change detection in multispectral remote-sensing images by using a selective Bayesian thresholding for deriving a pseudotraining set that is necessary for initializing an adequately defined binary semisupervised support vector machine classifier.
Abstract: This paper presents a novel approach to unsupervised change detection in multispectral remote-sensing images. The proposed approach aims at extracting the change information by jointly analyzing the spectral channels of multitemporal images in the original feature space without any training data. This is accomplished by using a selective Bayesian thresholding for deriving a pseudotraining set that is necessary for initializing an adequately defined binary semisupervised support vector machine classifier. Starting from these initial seeds, the classifier performs change detection in the original multitemporal feature space by gradually considering unlabeled patterns in the definition of the decision boundary between changed and unchanged pixels according to a semisupervised learning algorithm. This algorithm models the full complexity of the change-detection problem, which is only partially represented by the seed pixels included in the pseudotraining set. The values of the classifier parameters are then defined according to a novel unsupervised model-selection technique based on a similarity measure between change-detection maps obtained with different settings. Experimental results obtained on different multispectral remote-sensing images confirm the effectiveness of the proposed approach.
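A simplified sketch of this pipeline is given below: seed pixels are taken from the tails of the difference magnitude (a crude stand-in for the selective Bayesian thresholding), and a standard SVM is retrained over a few self-training rounds that absorb confident unlabeled pixels. The paper's semisupervised SVM and its model-selection scheme are not reproduced; thresholds and rounds are assumptions.

```python
# Simplified seed-and-self-train sketch of unsupervised change detection.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pixels, bands = 2000, 4
X1 = rng.random((n_pixels, bands))                     # date-1 spectra
change_mask = rng.random(n_pixels) < 0.2
X2 = X1 + 0.05 * rng.normal(size=(n_pixels, bands))
X2[change_mask] += 0.6                                 # simulated changed pixels

feats = np.hstack([X1, X2])                            # joint multitemporal features
mag = np.linalg.norm(X2 - X1, axis=1)

# Pseudo-training set from the extreme tails of the difference magnitude.
lo, hi = np.quantile(mag, [0.30, 0.95])
seed_idx = np.where((mag <= lo) | (mag >= hi))[0]
labels = -np.ones(n_pixels, dtype=int)                 # -1 means unlabeled
labels[seed_idx] = (mag[seed_idx] >= hi).astype(int)   # 1 = changed, 0 = unchanged

clf = SVC(kernel="rbf", gamma="scale")
for _ in range(3):                                     # self-training rounds
    mask = labels >= 0
    clf.fit(feats[mask], labels[mask])
    scores = clf.decision_function(feats)
    confident = (np.abs(scores) > 1.0) & (labels < 0)  # well outside the margin band
    labels[confident] = (scores[confident] > 0).astype(int)

pred = (clf.decision_function(feats) > 0).astype(int)
print("agreement with simulated change mask:", round(float((pred == change_mask).mean()), 3))
```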

Posted Content
TL;DR: The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A set of methods for building informative and robust feature point representations, used for accurately labeling points in a 3D point cloud, based on the type of surface the point is lying on, are proposed.
Abstract: This paper proposes a set of methods for building informative and robust feature point representations, used for accurately labeling points in a 3D point cloud, based on the type of surface the point is lying on. The feature space comprises a multi-value histogram which characterizes the local geometry around a query point, is pose and sampling density invariant, and can cope well with noisy sensor data. We characterize 3D geometric primitives of interest and describe methods for obtaining discriminating features used in a machine learning algorithm. To validate our approach, we perform an in-depth analysis using different classifiers and show results with both synthetically generated datasets and real-world scans.
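A common simplified stand-in for such local-geometry features is sketched below: eigenvalue-based shape descriptors (linearity, planarity, scattering, curvature) computed from the covariance of each point's k-neighborhood. The paper's multi-value histogram representation is richer; neighborhood size and the synthetic cloud are assumptions.

```python
# Eigenvalue-based local shape features for labeling points in a 3D cloud.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
plane = np.column_stack([rng.random(300), rng.random(300), 0.01 * rng.normal(size=300)])
blob = rng.normal(0, 0.2, size=(300, 3)) + [2.0, 0.0, 0.0]
points = np.vstack([plane, blob])                       # a planar patch and a scattered blob

k = 20
nn = NearestNeighbors(n_neighbors=k).fit(points)
_, idx = nn.kneighbors(points)

features = []
for neigh in idx:
    cov = np.cov(points[neigh].T)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]        # lambda1 >= lambda2 >= lambda3
    linearity = (lam[0] - lam[1]) / lam[0]
    planarity = (lam[1] - lam[2]) / lam[0]
    scattering = lam[2] / lam[0]
    curvature = lam[2] / lam.sum()
    features.append([linearity, planarity, scattering, curvature])
features = np.array(features)

# Planar points should show high planarity and low curvature; the blob the opposite.
print("mean planarity, plane vs. blob:", features[:300, 1].mean().round(2),
      features[300:, 1].mean().round(2))
```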

Journal ArticleDOI
TL;DR: An adaptive mean-shift (MS) analysis framework is proposed for object extraction and classification of hyperspectral imagery over urban areas and it is shown that the proposed MS-based analysis system is robust and obviously outperforms the other methods.
Abstract: In this paper, an adaptive mean-shift (MS) analysis framework is proposed for object extraction and classification of hyperspectral imagery over urban areas. The basic idea is to apply an MS to obtain an object-oriented representation of hyperspectral data and then use support vector machine to interpret the feature set. In order to employ MS for hyperspectral data effectively, a feature-extraction algorithm, nonnegative matrix factorization, is utilized to reduce the high-dimensional feature space. Furthermore, two bandwidth-selection algorithms are proposed for the MS procedure. One is based on the local structures, and the other exploits separability analysis. Experiments are conducted on two hyperspectral data sets, the DC Mall hyperspectral digital-imagery collection experiment and the Purdue campus hyperspectral mapper images. We evaluate and compare the proposed approach with the well-known commercial software eCognition (object-based analysis approach) and an effective spectral/spatial classifier for hyperspectral data, namely, the derivative of the morphological profile. Experimental results show that the proposed MS-based analysis system is robust and obviously outperforms the other methods.
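The processing chain can be sketched on synthetic nonnegative data as below: NMF reduces the spectral dimension, mean shift produces an object-oriented partition, and an SVM classifies per-segment mean features. The bandwidth here is scikit-learn's default estimate rather than either of the paper's two selection schemes, and the data sizes are illustrative.

```python
# NMF band reduction -> mean-shift segmentation -> SVM on per-segment features.
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.decomposition import NMF
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pixels, bands = 1000, 50
X = rng.random((n_pixels, bands))                # reflectance-like, nonnegative
y = rng.integers(0, 3, n_pixels)                 # placeholder class labels

# 1) Reduce the spectral dimension with nonnegative matrix factorization.
Z = NMF(n_components=5, init="nndsvda", max_iter=500).fit_transform(X)

# 2) Mean shift in the reduced feature space gives an object-oriented partition.
bw = estimate_bandwidth(Z, quantile=0.2, n_samples=300)
segments = MeanShift(bandwidth=bw, bin_seeding=True).fit_predict(Z)

# 3) Represent each pixel by its segment's mean feature vector and classify.
seg_means = np.vstack([Z[segments == s].mean(axis=0) for s in np.unique(segments)])
pixel_feats = seg_means[segments]
clf = SVC(kernel="rbf", gamma="scale").fit(pixel_feats, y)
print("number of segments:", len(np.unique(segments)),
      "| training accuracy:", round(clf.score(pixel_feats, y), 3))
```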

Journal ArticleDOI
TL;DR: Using wndchrm can allow scientists to perform automated biological image analysis while avoiding the costly challenge of implementing computer vision and pattern recognition algorithms.
Abstract: Biological imaging is an emerging field, covering a wide range of applications in biological and clinical research. However, while machinery for automated experimenting and data acquisition has been developing rapidly in the past years, automated image analysis often introduces a bottleneck in high content screening. Wndchrm is an open source utility for biological image analysis. The software works by first extracting image content descriptors from the raw image, image transforms, and compound image transforms. Then, the most informative features are selected, and the feature vector of each image is used for classification and similarity measurement. Wndchrm has been tested using several publicly available biological datasets, and provided results which are favorably comparable to the performance of task-specific algorithms developed for these datasets. The simple user interface allows researchers who are not knowledgeable in computer vision methods and have no background in computer programming to apply image analysis to their data. We suggest that wndchrm can be effectively used for a wide range of biological image analysis tasks. Using wndchrm can allow scientists to perform automated biological image analysis while avoiding the costly challenge of implementing computer vision and pattern recognition algorithms.


Patent
27 Mar 2008
TL;DR: In this article, a tangible computer readable medium encoded with instructions for automatically generating metadata is described, wherein execution of said instructions by one or more processors causes said one or more processors to perform the steps comprising: a. creating at least one feature vector for each document in a dataset; b. extracting said one feature vector; c. recording said feature vector as a digital object; and d. augmenting metadata using said digital object to reduce the volume of said dataset, said augmenting capable of allowing a user to perform a search on said dataset.
Abstract: A tangible computer readable medium encoded with instructions for automatically generating metadata, wherein said execution of said instructions by one or more processors causes said “one or more processors” to perform the steps comprising: a. creating at least one feature vector for each document in a dataset; b. extracting said one feature vector; c. recording said feature vector as a digital object; and d. augmenting metadata using said digital object to reduce the volume of said dataset, said augmenting capable of allowing a user to perform a search on said dataset.


Journal ArticleDOI
TL;DR: A generative model is proposed that creates a one-to-many mapping from an idealized "identity" space to the observed data space, together with a probabilistic distance metric that allows a full posterior over possible matches to be established.
Abstract: Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized "identity" space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model "tied" factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data. We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance by using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches.
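The generative side of the model can be sketched with a few lines of numpy: an observed feature vector is a pose-specific linear transform of a pose-free identity vector, plus a pose-specific offset and Gaussian noise. The EM estimation of the loadings and noise described in the abstract is omitted here, and all dimensions are illustrative.

```python
# Generative sketch of "tied" factor analysis: x = F_pose h + m_pose + noise.
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_id, n_poses = 50, 8, 3

F = rng.normal(size=(n_poses, d_obs, d_id))   # pose-contingent factor loadings
m = rng.normal(size=(n_poses, d_obs))         # pose-contingent offsets
sigma = 0.1

def generate(identity, pose):
    """Observation for one individual in one pose; identity h is shared across poses."""
    return F[pose] @ identity + m[pose] + sigma * rng.normal(size=d_obs)

h = rng.normal(size=d_id)                     # one individual's identity vector
frontal, profile = generate(h, pose=0), generate(h, pose=1)
# Same identity, different poses: the raw feature vectors differ a lot, but both
# are explained by the same latent h, which is what the matching posterior exploits.
print("feature distance across poses:", round(float(np.linalg.norm(frontal - profile)), 2))
```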

Proceedings ArticleDOI
25 Oct 2008
TL;DR: This work proposes a head word feature and presents two approaches to augmenting the semantic features of such head words using WordNet, yielding a compact yet effective feature set.
Abstract: Question classification plays an important role in question answering. Features are the key to obtaining an accurate question classifier. In contrast to Li and Roth (2002)'s approach, which makes use of a very rich feature space, we propose a compact yet effective feature set. In particular, we propose a head word feature and present two approaches to augmenting the semantic features of such head words using WordNet. In addition, Lesk's word sense disambiguation (WSD) algorithm is adapted and the depth of the hypernym feature is optimized. With the further addition of other standard features such as unigrams, our linear SVM and Maximum Entropy (ME) models reach accuracies of 89.2% and 89.0% respectively over a standard benchmark dataset, which outperform the best previously reported accuracy of 86.2%.
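The feature construction can be sketched as follows, assuming NLTK with the WordNet corpus installed (nltk.download('wordnet')) and assuming the head word has already been extracted (the paper uses a syntactic parser for that, and adapted Lesk rather than the first-sense heuristic used here): the head word's hypernym chain is added to unigram features and fed to a linear SVM.

```python
# Head word + WordNet hypernym features combined with unigrams, then a linear SVM.
from nltk.corpus import wordnet as wn
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def features(question, head_word, depth=6):
    feats = {f"uni={w.lower()}": 1 for w in question.split()}
    feats[f"head={head_word}"] = 1
    synsets = wn.synsets(head_word, pos=wn.NOUN)
    if synsets:
        node = synsets[0]                       # first sense as a simple WSD stand-in
        for _ in range(depth):
            hypers = node.hypernyms()
            if not hypers:
                break
            node = hypers[0]
            feats[f"hyper={node.name()}"] = 1   # synset names such as 'entity.n.01'
    return feats

train = [("What river flows through Paris ?", "river", "LOC"),
         ("Who wrote Hamlet ?", "author", "HUM"),
         ("What is the capital of France ?", "capital", "LOC"),
         ("Who invented the telephone ?", "inventor", "HUM")]

vec = DictVectorizer()
X = vec.fit_transform([features(q, h) for q, h, _ in train])
y = [label for _, _, label in train]
clf = LinearSVC().fit(X, y)
print(clf.predict(vec.transform([features("What lake is the deepest ?", "lake")])))
```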

Journal ArticleDOI
TL;DR: An approach to identify informative training samples is demonstrated for the classification of agricultural classes in the south-western part of Punjab state, India, and it was apparent that the intelligently defined training set contained a greater proportion of support vectors than that acquired by the conventional approach.
Abstract: The accuracy of a supervised classification is dependent to a large extent on the training data used. The aim in training is often to capture a large training set to fully describe the classes spectrally, commonly with the requirements of a conventional statistical classifier in mind. However, it is not always necessary to provide a complete description of the classes, especially if using a support vector machine (SVM) as the classifier. An SVM seeks to fit an optimal hyperplane between the classes and uses only some of the training samples that lie at the edge of the class distributions in feature space (support vectors). This should allow the definition of the most informative training samples prior to the analysis. An approach to identify informative training samples was demonstrated for the classification of agricultural classes in the south-western part of Punjab state, India. A small, intelligently selected training dataset was acquired in the field with the aid of ancillary information. This dataset contained the data from training sites that were predicted before the classification to be amongst the most informative for an SVM classification. The intelligent training collection scheme yielded a classification of comparable accuracy, ∼91%, to one derived using a larger training set acquired by a conventional approach. Moreover, from inspection of the training sets it was apparent that the intelligently defined training set contained a greater proportion of support vectors (0.70), useful training sites, than that acquired by the conventional approach (0.41). By focusing on the most informative training samples, the intelligent scheme required less investment in training than the conventional approach and its adoption would have reduced the total financial outlay in classification production and evaluation by ∼26%. Additionally, the analysis highlighted the possibility to further reduce the training set size without any significant negative impact on classification accuracy.
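The diagnostic used in this analysis, the proportion of training samples that end up as support vectors, is easy to sketch with a fitted SVC's support indices; the synthetic data and parameters below are illustrative only.

```python
# Fraction of training samples that become support vectors of the fitted SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (120, 4)), rng.normal(2, 1, (120, 4))])
y = np.array([0] * 120 + [1] * 120)

clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X, y)
sv_fraction = len(clf.support_) / len(X)
print(f"support vectors: {len(clf.support_)} of {len(X)} "
      f"({sv_fraction:.2f} of the training set)")
```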