scispace - formally typeset
Search or ask a question

Showing papers on "Feature vector published in 2006"


Journal ArticleDOI
TL;DR: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features that is assessed in the face recognition problem under different challenges.
Abstract: This paper presents a novel and efficient facial image representation based on local binary pattern (LBP) texture features. The face image is divided into several regions from which the LBP feature distributions are extracted and concatenated into an enhanced feature vector to be used as a face descriptor. The performance of the proposed method is assessed in the face recognition problem under different challenges. Other applications and several extensions are also discussed

5,563 citations


Journal ArticleDOI
TL;DR: In this paper, a method for automated segmentation of the vasculature in retinal images is presented, which produces segmentations by classifying each image pixel as vessel or non-vessel, based on the pixel's feature vector.
Abstract: We present a method for automated segmentation of the vasculature in retinal images. The method produces segmentations by classifying each image pixel as vessel or nonvessel, based on the pixel's feature vector. Feature vectors are composed of the pixel's intensity and two-dimensional Gabor wavelet transform responses taken at multiple scales. The Gabor wavelet is capable of tuning to specific frequencies, thus allowing noise filtering and vessel enhancement in a single step. We use a Bayesian classifier with class-conditional probability density functions (likelihoods) described as Gaussian mixtures, yielding a fast classification, while being able to model complex decision surfaces. The probability distributions are estimated based on a training set of labeled pixels obtained from manual segmentations. The method's performance is evaluated on publicly available DRIVE (Staal et al.,2004) and STARE (Hoover et al.,2000) databases of manually labeled images. On the DRIVE database, it achieves an area under the receiver operating characteristic curve of 0.9614, being slightly superior than that presented by state-of-the-art approaches. We are making our implementation available as open source MATLAB scripts for researchers interested in implementation details, evaluation, or development of methods

1,435 citations


Journal ArticleDOI
TL;DR: This work proposes a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple- instance learning problem to a standard supervised learning problem that does not impose the assumption relating instance labels to bag labels.
Abstract: Multiple-instance problems arise from the situations where training class labels are attached to sets of samples (named bags), instead of individual samples within each bag (called instances). Most previous multiple-instance learning (MIL) algorithms are developed based on the assumption that a bag is positive if and only if at least one of its instances is positive. Although the assumption works well in a drug activity prediction problem, it is rather restrictive for other applications, especially those in the computer vision area. We propose a learning method, MILES (multiple-instance learning via embedded instance selection), which converts the multiple-instance learning problem to a standard supervised learning problem that does not impose the assumption relating instance labels to bag labels. MILES maps each bag into a feature space defined by the instances in the training bags via an instance similarity measure. This feature mapping often provides a large number of redundant or irrelevant features. Hence, 1-norm SVM is applied to select important features as well as construct classifiers simultaneously. We have performed extensive experiments. In comparison with other methods, MILES demonstrates competitive classification accuracy, high computation efficiency, and robustness to labeling uncertainty

766 citations


Journal ArticleDOI
TL;DR: This work considers the application of SVMs to speaker and language recognition and uses a sequence kernel that compares sequences of feature vectors and produces a measure of similarity to build upon a simpler mean-squared error classifier to produce a more accurate system.

542 citations


Proceedings Article
01 Jan 2006
TL;DR: A practical procedure for applying WCCN to an SVM-based speaker recognition system where the input feature vectors reside in a high-dimensional space and achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over the previous baseline.
Abstract: This paper extends the within-class covariance normalization (WCCN) technique described in [1, 2] for training generalized linear kernels. We describe a practical procedure for applying WCCN to an SVM-based speaker recognition system where the input feature vectors reside in a high-dimensional space. Our approach involves using principal component analysis (PCA) to split the original feature space into two subspaces: a low-dimensional “PCA space” and a high-dimensional “PCA-complement space.” After performing WCCN in the PCA space, we concatenate the resulting feature vectors with a weighted version of their PCAcomplements. When applied to a state-of-the-art MLLR-SVM speaker recognition system, this approach achieves improvements of up to 22% in EER and 28% in minimum decision cost function (DCF) over our previous baseline. We also achieve substantial improvements over an MLLR-SVM system that performs WCCN in the PCA space but discards the PCA-complement.

461 citations


Proceedings Article
04 Dec 2006
TL;DR: This paper formalizes multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels, and proposes the MIMLBOOST and MIMLSVM algorithms which achieve good performance in an application to scene classification.
Abstract: In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels Such a problem can occur in many real-world tasks, eg an image usually contains multiple patches each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multi-instance learning and multi-label learning Then, we propose the MIMLBOOST and MIMLSVM algorithms which achieve good performance in an application to scene classification

455 citations


Journal ArticleDOI
TL;DR: This paper investigates the feasibility of an audio-based context recognition system developed and compared to the accuracy of human listeners in the same task, with particular emphasis on the computational complexity of the methods.
Abstract: The aim of this paper is to investigate the feasibility of an audio-based context recognition system. Here, context recognition refers to the automatic classification of the context or an environment around a device. A system is developed and compared to the accuracy of human listeners in the same task. Particular emphasis is placed on the computational complexity of the methods, since the application is of particular interest in resource-constrained portable devices. Simplistic low-dimensional feature vectors are evaluated against more standard spectral features. Using discriminative training, competitive recognition accuracies are achieved with very low-order hidden Markov models (1-3 Gaussian components). Slight improvement in recognition accuracy is observed when linear data-driven feature transformations are applied to mel-cepstral features. The recognition rate of the system as a function of the test sequence length appears to converge only after about 30 to 60 s. Some degree of accuracy can be achieved even with less than 1-s test sequence lengths. The average reaction time of the human listeners was 14 s, i.e., somewhat smaller, but of the same order as that of the system. The average recognition accuracy of the system was 58% against 69%, obtained in the listening tests in recognizing between 24 everyday contexts. The accuracies in recognizing six high-level classes were 82% for the system and 88% for the subjects.

436 citations


Journal ArticleDOI
TL;DR: This paper presents a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity, and uses Boosting to learn a subset of feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category.
Abstract: This paper explores the power and the limitations of weakly supervised categorization. We present a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity. A variety of local descriptors can be applied to form a set of feature vectors for each local region. Boosting is used to learn a subset of such feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category. This combination of individual extractors and descriptors leads to recognition rates that are superior to other approaches which use only one specific extractor/descriptor setting. To explore the limitation of our system, we had to set up new, highly complex image databases that show the objects of interest at varying scales and poses, in cluttered background, and under considerable occlusion. We obtain classification results up to 81 percent ROC-equal error rate on the most complex of our databases. Our approach outperforms all comparable solutions on common databases.

422 citations


Journal ArticleDOI
TL;DR: This paper provides an overview of the recently proposed rule extraction techniques for SVMs and introduces two others taken from the artificial neural networks domain, being Trepan and G-REX, which rank at the top of comprehensible classification techniques.
Abstract: In recent years, Support Vector Machines (SVMs) were successfully applied to a wide range of applications. Their good performance is achieved by an implicit non-linear transformation of the original problem to a high-dimensional (possibly infinite) feature space in which a linear decision hyperplane is constructed that yields a nonlinear classifier in the input space. However, since the classifier is described as a complex mathematical function, it is rather incomprehensible for humans. This opacity property prevents them from being used in many real- life applications where both accuracy and comprehensibility are required, such as medical diagnosis and credit risk evaluation. To overcome this limitation, rules can be extracted from the trained SVM that are interpretable by humans and keep as much of the accuracy of the SVM as possible. In this paper, we will provide an overview of the recently proposed rule extraction techniques for SVMs and introduce two others taken from the artificial neural networks domain, being Trepan and G-REX. The described techniques are compared using publicly avail- able datasets, such as Ripley's synthetic dataset and the multi-class iris dataset. We will also look at medical diagnosis and credit scoring where comprehensibility is a key requirement and even a regulatory recommendation. Our experiments show that the SVM rule extraction techniques lose only a small percentage in performance compared to SVMs and therefore rank at the top of comprehensible classification techniques.

392 citations


Proceedings ArticleDOI
17 Jun 2006
TL;DR: It is shown that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are good for producing decision surfaces from wellbehaved feature vectors, but cannot learn complicated invariances.
Abstract: The detection and recognition of generic object categories with invariance to viewpoint, illumination, and clutter requires the combination of a feature extractor and a classifier. We show that architectures such as convolutional networks are good at learning invariant features, but not always optimal for classification, while Support Vector Machines are good at producing decision surfaces from wellbehaved feature vectors, but cannot learn complicated invariances. We present a hybrid system where a convolutional network is trained to detect and recognize generic objects, and a Gaussian-kernel SVM is trained from the features learned by the convolutional network. Results are given on a large generic object recognition task with six categories (human figures, four-legged animals, airplanes, trucks, cars, and "none of the above"), with multiple instances of each object category under various poses, illuminations, and backgrounds. On the test set, which contains different object instances than the training set, an SVM alone yields a 43.3% error rate, a convolutional net alone yields 7.2% and an SVM on top of features produced by the convolutional net yields 5.9%.

373 citations


Dissertation
17 Jul 2006
TL;DR: This thesis introduces grids of locally normalised Histograms of Oriented Gradients (HOG) as descriptors for object detection in static images and proposes descriptors based on oriented histograms of differential optical flow to detect moving humans in videos.
Abstract: This thesis targets the detection of humans and other object classes in images and videos. Our focus is on developing robust feature extraction algorithms that encode image regions as highdimensional feature vectors that support high accuracy object/non-object decisions. To test our feature sets we adopt a relatively simple learning framework that uses linear Support Vector Machines to classify each possible image region as an object or as a non-object. The approach is data-driven and purely bottom-up using low-level appearance and motion vectors to detect objects. As a test case we focus on person detection as people are one of the most challenging object classes with many applications, for example in film and video analysis, pedestrian detection for smart cars and video surveillance. Nevertheless we do not make any strong class specific assumptions and the resulting object detection framework also gives state-of-the-art performance for many other classes including cars, motorbikes, cows and sheep. This thesis makes four main contributions. Firstly, we introduce grids of locally normalised Histograms of Oriented Gradients (HOG) as descriptors for object detection in static images. The HOG descriptors are computed over dense and overlapping grids of spatial blocks, with image gradient orientation features extracted at fixed resolution and gathered into a highdimensional feature vector. They are designed to be robust to small changes in image contour locations and directions, and significant changes in image illumination and colour, while remaining highly discriminative for overall visual form. We show that unsmoothed gradients, fine orientation voting, moderately coarse spatial binning, strong normalisation and overlapping blocks are all needed for good performance. Secondly, to detect moving humans in videos, we propose descriptors based on oriented histograms of differential optical flow. These are similar to static HOG descriptors, but instead of image gradients, they are based on local differentials of dense optical flow. They encode the noisy optical flow estimates into robust feature vectors in a manner that is robust to the overall camera motion. Several variants are proposed, some capturing motion boundaries while others encode the relative motions of adjacent image regions. Thirdly, we propose a general method based on kernel density estimation for fusing multiple overlapping detections, that takes into account the number of detections, their confidence scores and the scales of the detections. Lastly, we present work in progress on a parts based approach to person detection that first detects local body parts like heads, torso, and legs and then fuses them to create a global overall person detector.

Journal ArticleDOI
01 Jan 2006
TL;DR: This paper introduces a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization without relying on more complex dependence models.
Abstract: Most previous works of feature selection emphasized only the reduction of high dimensionality of the feature space. But in cases where many features are highly redundant with each other, we must utilize other means, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization without relying on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results are given on a number of dataset, showing that our feature selection method is more effective than Koller and Sahami's method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th international conference on machine learning], which is one of greedy feature selection methods, and conventional information gain which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes produces more improvements of conventional machine learning algorithms over support vector machines which are known to give the best classification accuracy.

Journal ArticleDOI
TL;DR: A novel real-time electromyogram (EMG) pattern recognition for the control of a multifunction myoelectric hand from four channel EMG signals using a wavelet packet transform and a linear-nonlinear feature projection composed of principal components analysis (PCA) and a self-organizing feature map (SOFM).
Abstract: This paper proposes a novel real-time electromyogram (EMG) pattern recognition for the control of a multifunction myoelectric hand from four channel EMG signals. To extract a feature vector from the EMG signal, we use a wavelet packet transform that is a generalized version of wavelet transform. For dimensionality reduction and nonlinear mapping of the features, we also propose a linear-nonlinear feature projection composed of principal components analysis (PCA) and a self-organizing feature map (SOFM). The dimensionality reduction by PCA simplifies the structure of the classifier and reduces processing time for the pattern recognition. The nonlinear mapping by SOFM transforms the PCA-reduced features into a new feature space with high class separability. Finally, a multilayer perceptron (MLP) is used as the classifier. Using an analysis of class separability by feature projections, we show that the recognition accuracy depends more on the class separability of the projected features than on the MLP's class separation ability. Consequently, the proposed linear-nonlinear projection method improves class separability and recognition accuracy. We implement a real-time control system for a multifunction virtual hand. Our experimental results show that all processes, including virtual hand control, are completed within 125 ms, and the proposed method is applicable to real-time myoelectric hand control without an operational time delay

Journal ArticleDOI
TL;DR: A novel pattern recognition framework that integrates Gabor image representation, a novel multiclass kernel Fisher analysis (KFA) method, and fractional power polynomial models for improving pattern recognition performance is presented.
Abstract: This paper presents a novel pattern recognition framework by capitalizing on dimensionality increasing techniques. In particular, the framework integrates Gabor image representation, a novel multiclass kernel Fisher analysis (KFA) method, and fractional power polynomial models for improving pattern recognition performance. Gabor image representation, which increases dimensionality by incorporating Gabor filters with different scales and orientations, is characterized by spatial frequency, spatial locality, and orientational selectivity for coping with image variabilities such as illumination variations. The KFA method first performs nonlinear mapping from the input space to a high-dimensional feature space, and then implements the multiclass Fisher discriminant analysis in the feature space. The significance of the nonlinear mapping is that it increases the discriminating power of the KFA method, which is linear in the feature space but nonlinear in the input space. The novelty of the KFA method comes from the fact that 1) it extends the two-class kernel Fisher methods by addressing multiclass pattern classification problems and 2) it improves upon the traditional generalized discriminant analysis (GDA) method by deriving a unique solution (compared to the GDA solution, which is not unique). The fractional power polynomial models further improve performance of the proposed pattern recognition framework. Experiments on face recognition using both the FERET database and the FRGC (face recognition grand challenge) databases show the feasibility of the proposed framework. In particular, experimental results using the FERET database show that the KFA method performs better than the GDA method and the fractional power polynomial models help both the KFA method and the GDA method improve their face recognition performance. Experimental results using the FRGC databases show that the proposed pattern recognition framework improves face recognition performance upon the BEE baseline algorithm and the LDA-based baseline algorithm by large margins.

Journal ArticleDOI
TL;DR: It is shown that there is no statistically significant difference in performance of the proposed method for PQ classification when different wavelets are chosen, which means one can choose the wavelet with short wavelet filter length to achieve good classification results as well as small computational cost.
Abstract: This paper proposed a novel approach for the Power Quality (PQ) disturbances classification based on the wavelet transform and self organizing learning array (SOLAR) system. Wavelet transform is utilized to extract feature vectors for various PQ disturbances based on the multiresolution analysis (MRA). These feature vectors then are applied to a SOLAR system for training and testing. SOLAR has three advantageous over a typical neural network: data driven learning, local interconnections and entropy based self-organization. Several typical PQ disturbances are taken into consideration in this paper. Comparison research between the proposed method, the support vector machine (SVM) method and existing literature reports show that the proposed method can provide accurate classification results. By the hypothesis test of the averages, it is shown that there is no statistically significant difference in performance of the proposed method for PQ classification when different wavelets are chosen. This means one can choose the wavelet with short wavelet filter length to achieve good classification results as well as small computational cost. Gaussian white noise is considered and the Monte Carlo method is used to simulate the performance of the proposed method in different noise conditions.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: This paper proposes a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade, and uses a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space.
Abstract: Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequency in the payload of n-grams. A simple model of normal traffic is then constructed according to this description of the packets' content. It has been demonstrated that anomaly detectors based on payload statistics can be "evaded" by mimicry attacks using byte substitution and padding techniques. In this paper we propose a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade. We propose a new technique to extract the features from the payload. We use a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space. Accuracy and hardness of evasion are obtained by constructing our anomaly-based IDS using an ensemble of one-class SVM classifiers that work on different feature spaces.

Book ChapterDOI
16 Aug 2006
TL;DR: Two kernel-based reinforcement learning algorithms, the e – KRL and the least squares kernel based reinforcement learning (LS-KRL) are proposed and an example shows that the proposed methods can deal effectively with the reinforcement learning problem without having to explore many states.
Abstract: We consider the problem of approximating the cost-to-go functions in reinforcement learning By mapping the state implicitly into a feature space, we perform a simple algorithm in the feature space, which corresponds to a complex algorithm in the original state space Two kernel-based reinforcement learning algorithms, the e -insensitive kernel based reinforcement learning (e – KRL) and the least squares kernel based reinforcement learning (LS-KRL) are proposed An example shows that the proposed methods can deal effectively with the reinforcement learning problem without having to explore many states

Journal ArticleDOI
TL;DR: New data-driven models based on Statistical Learning Theory that were used to forecast flows at two time scales: seasonal flow volumes and hourly stream flows showed a promising performance in solving site-specific, real-time water resources management problems.

Journal ArticleDOI
TL;DR: This paper focuses on optimizing vector quantization (VQ) based speaker identification, which reduces the number of test vectors by pre-quantizing the test sequence prior to matching, and thenumber of speakers by pruning out unlikely speakers during the identification process.
Abstract: In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We apply the algorithms also to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the length of test utterance is 30.4 s.

Patent
Pavel Berkhim1, Zhichen Xu1, Jianchang Mao1, Daniel E. Rose1, Abe Taha1, Farzin Maghoul1 
02 Aug 2006
TL;DR: In this paper, the authors proposed a method for trust propagation in which a first feature vector for a first user, calculating a second feature for a second user and comparing the similarity value with the second feature vector to calculate a similarity value.
Abstract: The present invention is directed towards systems and methods for trust propagation. The method according to one embodiment comprises calculating a first feature vector for a first user, calculating a second feature for a second user and comparing the first feature vector with the second feature vector to calculate a similarity value. A determination is made as to whether the similarity value falls within a threshold. If the similarity value falls within the threshold, a relationship is recorded between the first user and the second user in a first user profile and a second user profile.

Journal ArticleDOI
TL;DR: An ensemble learning framework based on random sampling on all three key components of a classification system: the feature space, training samples, and subspace parameters is developed, and a robust random sampling face recognition system integrating shape, texture, and Gabor responses is constructed.
Abstract: Subspace face recognition often suffers from two problems: (1) the training sample set is small compared with the high dimensional feature vector; (2) the performance is sensitive to the subspace dimension. Instead of pursuing a single optimal subspace, we develop an ensemble learning framework based on random sampling on all three key components of a classification system: the feature space, training samples, and subspace parameters. Fisherface and Null Space LDA (N-LDA) are two conventional approaches to address the small sample size problem. But in many cases, these LDA classifiers are overfitted to the training set and discard some useful discriminative information. By analyzing different overfitting problems for the two kinds of LDA classifiers, we use random subspace and bagging to improve them respectively. By random sampling on feature vectors and training samples, multiple stabilized Fisherface and N-LDA classifiers are constructed and the two groups of complementary classifiers are integrated using a fusion rule, so nearly all the discriminative information is preserved. In addition, we further apply random sampling on parameter selection in order to overcome the difficulty of selecting optimal parameters in our algorithms. Then, we use the developed random sampling framework for the integration of multiple features. A robust random sampling face recognition system integrating shape, texture, and Gabor responses is finally constructed.

01 Jan 2006
TL;DR: This work applies the information bottleneck method to find word-clusters that preserve the information about document categories and use these clusters as features for classification, and shows that when the training sample is small word clusters can yield significant improvement in classification accuracy.
Abstract: The recently introduced Information Bottleneck method [21] provides an information theoretic framework, for extracting features of one variable, that are relevant for the values of another variable. Several previous works already suggested applying this method for document clustering, gene expression data analysis, spectral analysis and more. In this work we present a novel implementation of this method for supervised text classification. Specifically, we apply the information bottleneck method to find word-clusters that preserve the information about document categories and use these clusters as features for classification. Previous work [1] used a similar clustering procedure to show that word-clusters can significantly reduce the feature space dimensionality, with only a minor change in classification accuracy. In this work we reproduce these results and go further to show that when the training sample is small word clusters can yield significant improvement in classification accuracy (up to 18%) over the performance using the words directly.

Journal ArticleDOI
TL;DR: A new multivariate search technique is introduced, that is less sensitive to the noise in the data and computationally feasible as well and the robustness and reliability of the novel multivariate feature selection method are compared.

Book ChapterDOI
07 May 2006
TL;DR: A novel algorithm called Common Discriminant Feature Extraction specially tailored to the inter-modality face recognition problem is proposed and two nonlinear extensions of the algorithm are developed: one is based on kernelization, while the other is a multi-mode framework.
Abstract: Recently, the wide deployment of practical face recognition systems gives rise to the emergence of the inter-modality face recognition problem. In this problem, the face images in the database and the query images captured on spot are acquired under quite different conditions or even using different equipments. Conventional approaches either treat the samples in a uniform model or introduce an intermediate conversion stage, both of which would lead to severe performance degradation due to the great discrepancies between different modalities. In this paper, we propose a novel algorithm called Common Discriminant Feature Extraction specially tailored to the inter-modality problem. In the algorithm, two transforms are simultaneously learned to transform the samples in both modalities respectively to the common feature space. We formulate the learning objective by incorporating both the empirical discriminative power and the local smoothness of the feature transformation. By explicitly controlling the model complexity through the smoothness constraint, we can effectively reduce the risk of overfitting and enhance the generalization capability. Furthermore, to cope with the nongaussian distribution and diverse variations in the sample space, we develop two nonlinear extensions of the algorithm: one is based on kernelization, while the other is a multi-mode framework. These extensions substantially improve the recognition performance in complex situation. Extensive experiments are conducted to test our algorithms in two application scenarios: optical image-infrared image recognition and photo-sketch recognition. Our algorithms show excellent performance in the experiments.

Journal ArticleDOI
TL;DR: A novel feature selection method named filtered and supported sequential forward search (FS_SFS) in the context of support vector machines (SVM) is presented, which has two important properties to reduce the time of computation.

Journal Article
TL;DR: Wang et al. as mentioned in this paper proposed a Common Discriminant Feature Extraction (CDFE) algorithm for inter-modality face recognition, where two transforms are simultaneously learned to transform the samples in both modalities respectively to the common feature space.
Abstract: Recently, the wide deployment of practical face recognition systems gives rise to the emergence of the inter-modality face recognition problem. In this problem, the face images in the database and the query images captured on spot are acquired under quite different conditions or even using different equipments. Conventional approaches either treat the samples in a uniform model or introduce an intermediate conversion stage, both of which would lead to severe performance degradation due to the great discrepancies between different modalities. In this paper, we propose a novel algorithm called Common Discriminant Feature Extraction specially tailored to the inter-modality problem. In the algorithm, two transforms are simultaneously learned to transform the samples in both modalities respectively to the common feature space. We formulate the learning objective by incorporating both the empirical discriminative power and tlie local smoothness of the feature transformation. By explicitly controlling the model complexity through the smoothness constraint, we can effectively reduce the risk of overfitting and enhance the generalization capability. Furthermore, to cope with the nongaussian distribution and diverse variations in the sample space, we develop two non-linear extensions of the algorithm: one is based on kernelization, while the other is a multi-mode framework. These extensions substantially improve the recognition performance in complex situation. Extensive experiments are conducted to test our algorithms in two application scenarios: optical image-infrared image recognition and photo-sketch recognition. Our algorithms show excellent performance in the experiments.

Journal ArticleDOI
TL;DR: A system for human behaviour recognition in video sequences that combines Bayesian networks and belief propagation, non-parametric sampling from a previously learned database of actions, and Hidden Markov Models which encode scene rules are used to smooth sequences of actions.

Patent
01 Sep 2006
TL;DR: In this article, the authors predicted the language of origin of a word or named entity using estimates of frequency of occurrence of the word or entity in different languages in a variety of different languages.
Abstract: The language of origin of a word or named entity is predicted using estimates of frequency of occurrence of the word or named entity in different languages. In one embodiment, the normalized frequency of occurrence of the word or named entity in a variety of different languages is estimated and the values are used as features in a feature vector which is scored and used to identify language of origin.

Book ChapterDOI
21 Aug 2006
TL;DR: Following the rising interest towards the Support Vector Machine, various studies showed that SVM outperforms other classification algorithms, so should the authors just not bother about other classificationgorithms and opt always for SVM?
Abstract: Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformation whereas some others compared the performance of different algorithms. Recently, following the rising interest towards the Support Vector Machine, various studies showed that SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM ?

Proceedings ArticleDOI
27 Oct 2006
TL;DR: This paper presents a HMM-based methodology for action recogni-tion using star skeleton as a representative descriptor of human posture, and implements a system to automatically recognize ten different types of actions.
Abstract: This paper presents a HMM-based methodology for action recogni-tion using star skeleton as a representative descriptor of human posture. Star skeleton is a fast skeletonization technique by connecting from centroid of target object to contour extremes. To use star skeleton as feature for action recognition, we clearly define the fea-ture as a five-dimensional vector in star fashion because the head and four limbs are usually local extremes of human shape. In our proposed method, an action is composed of a series of star skeletons over time. Therefore, time-sequential images expressing human action are transformed into a feature vector sequence. Then the fea-ture vector sequence must be transformed into symbol sequence so that HMM can model the action. We design a posture codebook, which contains representative star skeletons of each action type and define a star distance to measure the similarity between feature vec-tors. Each feature vector of the sequence is matched against the codebook and is assigned to the symbol that is most similar. Conse-quently, the time-sequential images are converted to a symbol posture sequence. We use HMMs to model each action types to be recognized. In the training phase, the model parameters of the HMM of each category are optimized so as to best describe the training symbol sequences. For human action recognition, the model which best matches the observed symbol sequence is selected as the recog-nized category. We implement a system to automatically recognize ten different types of actions, and the system has been tested on real human action videos in two cases. One case is the classification of 100 video clips, each containing a single action type. A 98% recog-nition rate is obtained. The other case is a more realistic situation in which human takes a series of actions combined. An action-series recognition is achieved by referring a period of posture history using a sliding window scheme. The experimental results show promising performance.