Showing papers on "Feature (machine learning)" published in 2007


Journal ArticleDOI
TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in very mathematical fashion and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations


Journal ArticleDOI
TL;DR: A hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation is described.
Abstract: We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.

1,779 citations
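
The alternation of template matching and maximum pooling described in the abstract above can be illustrated with a minimal one-dimensional sketch. This is not the authors' system: the Gaussian tuning function, the 1-D signals, and the names `template_match` and `max_pool` are all assumptions chosen for illustration.

```python
import math

def template_match(signal, template, sigma=1.0):
    """Gaussian-tuned response of `template` at every valid position of `signal`."""
    n = len(template)
    responses = []
    for start in range(len(signal) - n + 1):
        patch = signal[start:start + n]
        sq_dist = sum((p - t) ** 2 for p, t in zip(patch, template))
        responses.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return responses

def max_pool(responses, width):
    """Maximum over non-overlapping windows of `width` responses."""
    return [max(responses[i:i + width]) for i in range(0, len(responses), width)]

# The same pattern at two nearby positions (hypothetical toy data).
signal = [1, 2, 1, 0, 0, 0, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0, 0]
template = [1, 2, 1]

pooled = max_pool(template_match(signal, template), width=3)
pooled_shifted = max_pool(template_match(shifted, template), width=3)
print(pooled[0], pooled_shifted[0])
```

The first pooled unit responds identically to both inputs, which is the sense in which pooling buys tolerance to small shifts while the matching stage keeps selectivity for the template.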


Journal ArticleDOI
TL;DR: The basis of the supervised pattern recognition techniques mostly used in food analysis are reviewed, making special emphasis on the practical requirements of the measured data and discussing common misconceptions and errors that might arise.

854 citations


Proceedings Article
01 Jan 2007
TL;DR: An overview of the set of features, related, among others, to timbre, tonality, rhythm or form, that can be extracted with the MIRtoolbox, an integrated set of functions written in Matlab dedicated to the extraction of musical features from audio files.
Abstract: We present the MIRtoolbox, an integrated set of functions written in Matlab, dedicated to the extraction of musical features from audio files. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms, and integrating different variants proposed by alternative approaches (including new strategies we have developed) that users can select and parametrize. This paper offers an overview of the set of features, related, among others, to timbre, tonality, rhythm or form, that can be extracted with the MIRtoolbox. One particular analysis is provided as an example. The toolbox also includes functions for statistical analysis, segmentation and clustering. Particular attention has been paid to the design of a syntax that offers both simplicity of use and transparent adaptiveness to a multiplicity of possible input types. Each feature extraction method can accept as argument an audio file, or any preliminary result from intermediary stages of the chain of operations. Also, the same syntax can be used for analyses of single audio files, batches of files, series of audio segments, multi-channel signals, etc. For that purpose, the data and methods of the toolbox are organised in an object-oriented architecture.
1 MOTIVATION AND APPROACH
MIRtoolbox is a Matlab toolbox dedicated to the extraction of musically-related features from audio recordings. It has been designed in particular with the objective of enabling the computation of a large range of features from databases of audio files, which can then be subjected to statistical analyses. Few software packages have been proposed in this area. One particularity of our own approach lies in the use of the Matlab computing environment, which offers good visualisation capabilities and gives access to a large variety of other toolboxes. In particular, the MIRtoolbox makes use of functions available in public-domain toolboxes such as the Auditory Toolbox [6], NetLab [5] and SOMtoolbox [10]. Other toolboxes, such as the Statistics toolbox or the Neural Network toolbox from MathWorks, can be directly used for further analyses of the features extracted by MIRtoolbox without having to export the data from one program to another. Such a computational framework, because of its general objectives, could be useful to the research community in Music Information Retrieval (MIR), but also for educational purposes. For that reason, particular attention has been paid to the ease of use of the toolbox. In particular, complex analytic processes can be designed using a very simple syntax, whose expressive power comes from the use of an object-oriented paradigm. The different musical features extracted from the audio files are highly interdependent: in particular, as can be seen in Figure 1, some features are based on the same initial computations. In order to improve the computational efficiency, it is important to avoid redundant computations of these common components. Each of these intermediary components, and the final musical features, are therefore considered as building blocks that can be freely combined with one another. Besides, in keeping with the objective of optimal ease of use of the toolbox, each building block has been conceived in such a way that it can adapt to the type of input data. For instance, the computation of the MFCCs can be based on the waveform of the initial audio signal, or on intermediary representations such as the spectrum or the mel-scale spectrum (see Figure 1). Similarly, autocorrelation is computed for different ranges of delays depending on the type of input data (audio waveform, envelope, spectrum). This decomposition of all feature extraction algorithms into a common set of building blocks has the advantage of offering a synthetic overview of the different approaches studied in this domain of research.
2 FEATURE EXTRACTION
2.1 Feature overview
Figure 1 shows an overview of the main features implemented in the toolbox. All the different processes start from the audio signal (on the left) and form a chain of operations proceeding to the right. Each musical feature is related to one of the musical dimensions traditionally defined in music theory. Boldface characters highlight features related to pitch and tonality. Bold italics indicate features related to rhythm. Simple italics highlight a large set of features that can be associated with timbre and dynamics. Among them, all the operators in grey italics can be [...]
[Figure 1 labels recovered from the page: audio signal waveform, zero-crossing rate, RMS energy, envelope, low energy rate, attack slope, attack time, envelope autocorrelation, tempo, onsets]
© 2007 Austrian Computer Society (OCG)

677 citations
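
Two of the simplest features named in the MIRtoolbox overview, zero-crossing rate and RMS energy, are computed directly from the audio waveform. The sketch below is plain Python rather than the toolbox's Matlab API; the function names and the toy frame are assumptions made for illustration, and the toolbox's own definitions may differ in normalisation.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def rms_energy(samples):
    """Root mean square of the sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical four-sample frame that alternates sign on every sample.
frame = [0.5, -0.5, 0.5, -0.5]
print(zero_crossing_rate(frame), rms_energy(frame))
```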


Journal ArticleDOI
TL;DR: Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method and can be as competitive as the support vector machines classifiers.
Abstract: In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the "nearest shrunken centroids" (NSC) (Tibshirani and others, 2003) into the classical discriminant analysis. The SCRDA method is specially designed for classification problems in high dimension low sample size situations, for example, microarray data. Through both simulated data and real life data, it is shown that this method performs very well in multivariate classification problems, often outperforms the PAM method (using the NSC algorithm) and can be as competitive as the support vector machines classifiers. It is also suitable for feature elimination purposes and can be used as a gene selection method. The open source R package for this method (named "rda") is available on CRAN (http://www.r-project.org) for download and testing.

602 citations
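
The "nearest shrunken centroids" idea that SCRDA builds on rests on soft-thresholding: each class centroid is pulled toward the overall centroid, and features whose deviation falls below the threshold stop contributing. The sketch below shows only that shrinkage step, under simplifying assumptions: it omits the per-feature standardisation used by NSC and is not the SCRDA procedure itself. The names `soft_threshold`, `shrunken_centroid` and `delta` are illustrative.

```python
def soft_threshold(value, delta):
    """Shrink `value` toward zero by `delta`, clipping at zero."""
    magnitude = max(abs(value) - delta, 0.0)
    return magnitude if value >= 0 else -magnitude

def shrunken_centroid(class_mean, overall_mean, delta):
    """Shrink each feature of a class centroid toward the overall centroid."""
    return [
        overall + soft_threshold(cm - overall, delta)
        for cm, overall in zip(class_mean, overall_mean)
    ]

# Hypothetical two-feature example: the second feature barely differs
# from the overall mean, so shrinkage removes it entirely.
centroid = shrunken_centroid([2.0, 0.1], [0.0, 0.0], delta=0.5)
print(centroid)
```

A feature whose shrunken deviation is zero for every class plays no role in classification, which is why the same mechanism doubles as a gene selection method.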


Proceedings ArticleDOI
17 Jun 2007
TL;DR: A hierarchical model that can be characterized as a constellation of bags-of-features and that is able to combine both spatial and spatial-temporal features is proposed and shown to improve the classification performance over bag of feature models.
Abstract: We present a novel model for human action categorization. A video sequence is represented as a collection of spatial and spatial-temporal features by extracting static and dynamic interest points. We propose a hierarchical model that can be characterized as a constellation of bags-of-features and that is able to combine both spatial and spatial-temporal features. Given a novel video sequence, the model is able to categorize human actions on a frame-by-frame basis. We test the model on a publicly available human action dataset [2] and show that our new method performs well on the classification task. We also conducted control experiments to show that the use of the proposed mixture of hierarchical models improves the classification performance over bag of feature models. An additional experiment shows that using both dynamic and static features provides a richer representation of human actions when compared to the use of a single feature type, as demonstrated by our evaluation in the classification task.

486 citations


Journal ArticleDOI
TL;DR: This paper investigates a novel approach based on fuzzy-rough sets, fuzzy rough feature selection (FRFS), that addresses problems and retains dataset semantics and is applied to two challenging domains where a feature reducing step is important; namely, web content classification and complex systems monitoring.
Abstract: Attribute selection (AS) refers to the problem of selecting those input attributes or features that are most predictive of a given outcome; a problem encountered in many areas such as machine learning, pattern recognition and signal processing. Unlike other dimensionality reduction methods, attribute selectors preserve the original meaning of the attributes after reduction. This has found application in tasks that involve datasets containing huge numbers of attributes (in the order of tens of thousands) which, for some learning algorithms, might be impossible to process further. Recent examples include text processing and web content classification. AS techniques have also been applied to small and medium-sized datasets in order to locate the most informative attributes for later use. One of the many successful applications of rough set theory has been to this area. The rough set ideology of using only the supplied data and no other information has many benefits in AS, where most other methods require supplementary knowledge. However, the main limitation of rough set-based attribute selection in the literature is the restrictive requirement that all data is discrete. In classical rough set theory, it is not possible to consider real-valued or noisy data. This paper investigates a novel approach based on fuzzy-rough sets, fuzzy rough feature selection (FRFS), that addresses these problems and retains dataset semantics. FRFS is applied to two challenging domains where a feature reducing step is important; namely, web content classification and complex systems monitoring. The utility of this approach is demonstrated and is compared empirically with several dimensionality reducers. In the experimental studies, FRFS is shown to equal or improve classification accuracy when compared to the results from unreduced data. 
Classifiers that use a lower dimensional set of attributes which are retained by fuzzy-rough reduction outperform those that employ more attributes returned by the existing crisp rough reduction method. In addition, it is shown that FRFS is more powerful than the other AS techniques in the comparative study.

408 citations


Proceedings Article
01 Nov 2007
TL;DR: In this paper, the authors explore information retrieval methods such as the tf-idf structure with support vector machines, and parametric and nonparametric methods with supervised and unsupervised classification techniques, in authorship attribution.
Abstract: Authorship attribution is the process of determining the writer of a document. Many classification techniques have been applied to this task in the literature. In this paper we explore information retrieval methods such as the tf-idf structure with support vector machines, and parametric and nonparametric methods with supervised and unsupervised (clustering) classification techniques, in authorship attribution. We performed various experiments with articles gathered from the Turkish newspaper Milliyet. We performed experiments on different features extracted from these texts with different classifiers, and combined these results to improve our success rates. We identified which classifiers give satisfactory results on which feature sets. According to the experiments, the success rates change dramatically with different combinations; however, the best among them are support vector classifier with bag of words, and Gaussian with function words.

400 citations


Journal ArticleDOI
TL;DR: A simple and efficient hybrid attribute reduction algorithm based on a generalized fuzzy-rough model based on fuzzy relations is introduced and the technique of variable precision fuzzy inclusion in computing decision positive region can get the optimal classification performance.

390 citations


Journal Article
TL;DR: The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears.
Abstract: In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences, generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.

383 citations
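
The weighted histogram intersection described in the pyramid match abstract can be sketched for one-dimensional, non-negative integer features. This is a simplified illustration, not the authors' implementation: bin widths double at each level, matches first seen at level i are weighted by 1/2^i, and both the weighting and the names `histogram`, `intersection` and `pyramid_match` are assumptions of this sketch.

```python
from collections import Counter

def histogram(points, bin_size):
    """Count points per bin for bins of width `bin_size`."""
    return Counter(p // bin_size for p in points)

def intersection(h1, h2):
    """Histogram intersection: number of points matched within shared bins."""
    return sum(min(count, h2[b]) for b, count in h1.items())

def pyramid_match(xs, ys, num_levels):
    """Sum over levels of (newly matched pairs) / (bin width at that level)."""
    score = 0.0
    previous = 0
    for level in range(num_levels + 1):
        bin_size = 2 ** level
        matches = intersection(histogram(xs, bin_size), histogram(ys, bin_size))
        score += (matches - previous) / bin_size  # count only new matches
        previous = matches
    return score

print(pyramid_match([1, 5], [2, 9], num_levels=4))  # 0.3125
print(pyramid_match([1, 5], [1, 5], num_levels=4))  # 2.0
```

In the first call, the pair (1, 2) first shares a bin at width 4 and the pair (5, 9) at width 16, giving 1/4 + 1/16. Identical sets match entirely at the finest level and receive the maximum score.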


Journal ArticleDOI
TL;DR: It is suggested that hierarchical Bayesian models can help to explain how overhypotheses about feature variability and the grouping of categories into ontological kinds like objects and substances are acquired.
Abstract: Inductive learning is impossible without overhypotheses, or constraints on the hypotheses considered by the learner. Some of these overhypotheses must be innate, but we suggest that hierarchical Bayesian models can help to explain how the rest are acquired. To illustrate this claim, we develop models that acquire two kinds of overhypotheses--overhypotheses about feature variability (e.g. the shape bias in word learning) and overhypotheses about the grouping of categories into ontological kinds like objects and substances.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper develops a strategy to set minimum support in frequent pattern mining for generating useful patterns, and demonstrates that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets.
Abstract: The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification, and provide solid reasons supporting this methodology. It was well known that feature combinations (patterns) could capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% in UCI datasets) using the so-selected discriminative frequent patterns.
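
The link between pattern frequency and discriminative power mentioned in the abstract above can be made concrete with information gain. The sketch below is a generic textbook computation, not the paper's minimum-support strategy; the function names and the four-example toy dataset are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(labels).values()
    )

def information_gain(pattern_present, labels):
    """Reduction in label entropy from knowing whether a pattern occurs."""
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for p, y in zip(pattern_present, labels) if p == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

labels = ["a", "a", "b", "b"]
frequent = [True, True, False, False]   # occurs in every "a" example
rare = [True, False, False, False]      # occurs in a single example

print(information_gain(frequent, labels))
print(information_gain(rare, labels))
```

The rare pattern is perfectly "pure" where it occurs, yet its gain stays well below that of the frequent one because it covers so few examples, which is the intuition behind bounding discriminative power by support.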

Journal ArticleDOI
TL;DR: A new classification method, referred to as move median centers (MMC) hypersphere classifier, for the leaf database based on digital morphological feature is proposed, which is more robust than the one based on contour features since those significant curvature points are hard to find.

Journal ArticleDOI
TL;DR: An approach is presented for automatic visual recognition of expressive face and upper-body gestures from video sequences, suitable for use in a vision-based affective multi-modal framework; it fuses facial expression and affective body gesture information at the feature level and at the decision level.

Proceedings Article
03 Dec 2007
TL;DR: This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions, which addresses both the limitations of popular ARD update rules and the unclear relation of ARD to MAP estimation, and suggests alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations.
Abstract: Automatic relevance determination (ARD) and the closely-related sparse Bayesian learning (SBL) framework are effective tools for pruning large numbers of irrelevant features leading to a sparse explanatory subset. However, popular update rules used for ARD are either difficult to extend to more general problems of interest or are characterized by non-ideal convergence properties. Moreover, it remains unclear exactly how ARD relates to more traditional MAP estimation-based methods for learning sparse representations (e.g., the Lasso). This paper furnishes an alternative means of expressing the ARD cost function using auxiliary functions that naturally addresses both of these issues. First, the proposed reformulation of ARD can naturally be optimized by solving a series of re-weighted l1 problems. The result is an efficient, extensible algorithm that can be implemented using standard convex programming toolboxes and is guaranteed to converge to a local minimum (or saddle point). Secondly, the analysis reveals that ARD is exactly equivalent to performing standard MAP estimation in weight space using a particular feature- and noise-dependent, non-factorial weight prior. We then demonstrate that this implicit prior maintains several desirable advantages over conventional priors with respect to feature selection. Overall these results suggest alternative cost functions and update procedures for selecting features and promoting sparse solutions in a variety of general situations. In particular, the methodology readily extends to handle problems such as non-negative sparse coding and covariance component estimation.

Journal ArticleDOI
TL;DR: A trainable feature extractor based on the LeNet5 convolutional neural network architecture is introduced to solve the first problem in a black box scheme without prior knowledge on the data and the results show that the system can outperform both SVMs and LeNet5 while providing performances comparable to the best performance on this database.

Journal ArticleDOI
TL;DR: This paper evaluates six different neural network system architectures for multi-class pattern classification along the dimensions of imbalanced data, large number of pattern classes, large vs. small training data through experiments conducted on well-known benchmark data.

Journal ArticleDOI
TL;DR: Results confirm that the proposed method is applicable to real-time EMG pattern recognition for multifunction myoelectric hand control and produces a better performance for the class separability, plus the LDA-projected features improve the classification accuracy with a short processing time.
Abstract: Electromyographic (EMG) pattern recognition is essential for the control of a multifunction myoelectric hand. The main goal of this study was to develop an efficient feature-projection method for EMG pattern recognition. To this end, a linear supervised feature projection is proposed that utilizes a linear discriminant analysis (LDA). First, a wavelet packet transform (WPT) is performed to extract a feature vector from four-channel EMG signals. To dimensionally reduce and cluster the WPT features, an LDA, then, incorporates class information into the learning procedure, and identifies a linear matrix to maximize the class separability for the projected features. Finally, a multilayer perceptron classifies the LDA-reduced features into nine hand motions. To evaluate the performance of the LDA for WPT features, the LDA is compared with three other feature-projection methods. From a visualization and quantitative comparison, it is shown that the LDA produces a better performance for the class separability, plus the LDA-projected features improve the classification accuracy with a short processing time. A real-time pattern-recognition system is then implemented for a multifunction myoelectric hand. Experiments show that the proposed method achieves a 97.4% recognition accuracy, and all processes, including the generation of control commands for the myoelectric hand, are completed within 97 ms. Consequently, these results confirm that the proposed method is applicable to real-time EMG pattern recognition for multifunction myoelectric hand control.
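
The idea of a supervised linear projection that maximizes class separability can be sketched with the two-class Fisher discriminant in two dimensions, where the direction is w = Sw^{-1}(m1 - m0). This is a deliberately reduced illustration: the paper applies LDA to wavelet packet features with nine classes, whereas the toy points, the two-class restriction and all function names below are assumptions.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter2(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy

def fisher_direction(class0, class1):
    """Direction w = Sw^{-1} (m1 - m0) for two classes of 2-D points."""
    m0, m1 = mean2(class0), mean2(class1)
    a0, b0, c0 = scatter2(class0, m0)
    a1, b1, c1 = scatter2(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1   # within-class scatter [[a, b], [b, c]]
    det = a * c - b * b                   # assumed non-zero in this sketch
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    return ((c * dx - b * dy) / det, (a * dy - b * dx) / det)

def project(point, w):
    """Scalar projection of a 2-D point onto direction w."""
    return point[0] * w[0] + point[1] * w[1]

class0 = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
class1 = [(4.0, 0.0), (5.0, 1.0), (6.0, 0.0)]
w = fisher_direction(class0, class1)
print(w, [project(p, w) for p in class0 + class1])
```

Here the classes differ only along the first axis, so the direction found ignores the second coordinate and the projected classes do not overlap.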

Journal ArticleDOI
TL;DR: This system aims at applications in the field of human-robot interaction, where it is important to do run-on recognition in real-time to allow for robot egomotion and not to rely on manual initialization.

Proceedings Article
06 Jan 2007
TL;DR: A feature by itself may have little correlation with the target concept, but when combined with other features it can be strongly correlated with the target concept; this paper proposes to handle such feature interaction efficiently during feature selection.
Abstract: Feature interaction presents a challenge to feature selection for classification. A feature by itself may have little correlation with the target concept, but when it is combined with some other features, they can be strongly correlated with the target concept. Unintentional removal of these features can result in poor classification performance. Handling feature interaction can be computationally intractable. Recognizing the presence of feature interaction, we propose to efficiently handle feature interaction to achieve efficient feature selection and present extensive experimental results of evaluation.
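
A standard way to see the feature interaction described above is the XOR problem: each feature alone is uncorrelated with the target, yet the two together determine it exactly. The sketch below is a generic illustration, not the paper's algorithm; the toy data and the `pearson` helper are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
target = [a ^ b for a, b in zip(f1, f2)]    # XOR of the two features
combined = [a ^ b for a, b in zip(f1, f2)]  # the interaction term itself

print(pearson(f1, target), pearson(f2, target), pearson(combined, target))
```

A selector that scores features one at a time would discard both `f1` and `f2` here, which is the unintentional removal the abstract warns about.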

Journal ArticleDOI
TL;DR: A hybrid method based on ant colony optimization and artificial neural networks (ANNs) to address feature selection is presented, yielding promising results.
Abstract: One of the significant research problems in multivariate analysis is the selection of a subset of input variables that can predict the desired output with an acceptable level of accuracy. This goal is attained through the elimination of the variables that produce noise or are strictly correlated with other already selected variables. Feature subset selection (selection of the input variables) is important in correlation analysis and in the field of classification and modeling. This paper presents a hybrid method based on ant colony optimization and artificial neural networks (ANNs) to address feature selection. The proposed hybrid model is demonstrated using data sets from the domain of medical diagnosis, yielding promising results.

Journal ArticleDOI
TL;DR: The proposed methodology to capture facial physiological patterns using the bioheat information contained in thermal imagery has merit and demonstrates the feasibility of the physiological framework in face recognition and open the way for further methodological and experimental research in the area.
Abstract: The current dominant approaches to face recognition rely on facial characteristics that are on or over the skin. Some of these characteristics have low permanency, can be altered, and their phenomenology varies significantly with environmental factors (e.g., lighting). Many methodologies have been developed to address these problems to various degrees. However, the current framework of face recognition research has a potential weakness due to its very nature. We present a novel framework for face recognition based on physiological information. The motivation behind this effort is to capitalize on the permanency of innate characteristics that are under the skin. To establish feasibility, we propose a specific methodology to capture facial physiological patterns using the bioheat information contained in thermal imagery. First, the algorithm delineates the human face from the background using the Bayesian framework. Then, it localizes the superficial blood vessel network using image morphology. The extracted vascular network produces contour shapes that are characteristic to each individual. The branching points of the skeletonized vascular network are referred to as thermal minutia points (TMPs) and constitute the feature database. To render the method robust to facial pose variations, we collect for each subject to be stored in the database five different pose images (center, midleft profile, left profile, midright profile, and right profile). During the classification stage, the algorithm first estimates the pose of the test image. Then, it matches the local and global TMP structures extracted from the test image with those of the corresponding pose images in the database. We have conducted experiments on a multipose database of thermal facial images collected in our laboratory, as well as on the time-gap database of the University of Notre Dame. The good experimental results show that the proposed methodology has merit, especially with respect to the problem of low permanence over time. More importantly, the results demonstrate the feasibility of the physiological framework in face recognition and open the way for further methodological and experimental research in the area.

Book ChapterDOI
27 Aug 2007
TL;DR: In this paper, a discriminative face representation derived by the Linear Discriminant Analysis (LDA) of multi-scale local binary pattern histograms is proposed for face recognition.
Abstract: A novel discriminative face representation derived by the Linear Discriminant Analysis (LDA) of multi-scale local binary pattern histograms is proposed for face recognition. The face image is first partitioned into several non-overlapping regions. In each region, multi-scale local binary uniform pattern histograms are extracted and concatenated into a regional feature. The features are then projected on the LDA space to be used as a discriminative facial descriptor. The method is implemented and tested in face identification on the standard Feret database and in face verification on the XM2VTS database with very promising results.
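
The local binary pattern histograms that this descriptor starts from can be sketched in their most basic form: each pixel is compared with its eight immediate neighbours and the comparison results form an 8-bit code. The sketch covers only this single-scale 3x3 operator; the multi-scale, uniform-pattern and LDA stages of the paper are not shown, and the bit ordering, the `>=` comparison and the function names are assumptions.

```python
def lbp_code(image, row, col):
    """8-bit local binary pattern of the pixel at (row, col)."""
    center = image[row][col]
    # Neighbours visited clockwise, starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

# Hypothetical 3x3 patch: bright corners, dark edges around a mid-grey centre.
patch = [[9, 1, 9],
         [1, 5, 1],
         [9, 1, 9]]
print(lbp_code(patch, 1, 1))  # 85, i.e. binary 01010101
```

Computing one such histogram per image region and concatenating them gives the kind of regional feature the abstract describes.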

Proceedings ArticleDOI
17 Jun 2007
TL;DR: A feature mining paradigm for image classification is proposed and several feature mining strategies are examined to alleviate the burden of manual feature design, which is a key problem in computer vision and machine learning.
Abstract: The efficiency and robustness of a vision system is often largely determined by the quality of the image features available to it. In data mining, one typically works with immense volumes of raw data, which demands effective algorithms to explore the data space. In analogy to data mining, the space of meaningful features for image analysis is also quite vast. Recently, the challenges associated with these problem areas have become more tractable through progress made in machine learning and concerted research effort in manual feature design by domain experts. In this paper, we propose a feature mining paradigm for image classification and examine several feature mining strategies. We also derive a principled approach for dealing with features with varying computational demands. Our goal is to alleviate the burden of manual feature design, which is a key problem in computer vision and machine learning. We include an in-depth empirical study on three typical data sets and offer theoretical explanations for the performance of various feature mining strategies. As a final confirmation of our ideas, we show results of a system that, utilizing feature mining strategies, matches or outperforms the best reported results on pedestrian classification (where considerable effort has been devoted to expert feature design).

Proceedings ArticleDOI
17 Jun 2007
TL;DR: An algorithm is proposed that learns a similarity measure for comparing never seen objects, using extremely randomized trees that are fast to learn, robust due to the redundant information they carry, and proven to be very good clusterers.
Abstract: In this paper we propose and evaluate an algorithm that learns a similarity measure for comparing never seen objects. The measure is learned from pairs of training images labeled "same" or "different". This is far less informative than the commonly used individual image labels (e.g., "car model X"), but it is cheaper to obtain. The proposed algorithm learns the characteristic differences between local descriptors sampled from pairs of "same" and "different" images. These differences are vector quantized by an ensemble of extremely randomized binary trees, and the similarity measure is computed from the quantized differences. The extremely randomized trees are fast to learn, robust due to the redundant information they carry and they have been proved to be very good clusterers. Furthermore, the trees efficiently combine different feature types (SIFT and geometry). We evaluate our innovative similarity measure on four very different datasets and consistently outperform the state-of-the-art competitive approaches.

Journal ArticleDOI
TL;DR: A parametric bootstrap model for more accurate estimation of the prediction error that is tailored to the microarray data by borrowing from the extensive research in identifying differentially expressed genes, especially the local false discovery rate is proposed.
Abstract: Motivation: Logistic regression is a standard method for building prediction models for a binary outcome and has been extended for disease classification with microarray data by many authors. A feature (gene) selection step, however, must be added to penalized logistic modeling due to a large number of genes and a small number of subjects. Model selection for this two-step approach requires new statistical tools because prediction error estimation ignoring the feature selection step can be severely downward biased. Generic methods such as cross-validation and non-parametric bootstrap can be very ineffective due to the big variability in the prediction error estimate. Results: We propose a parametric bootstrap model for more accurate estimation of the prediction error that is tailored to the microarray data by borrowing from the extensive research in identifying differentially expressed genes, especially the local false discovery rate. The proposed method provides guidance on the two critical issues in model selection: the number of genes to include in the model and the optimal shrinkage for the penalized logistic regression. We show that selecting more than 20 genes usually helps little in further reducing the prediction error. Application to Golub's leukemia data and our own cervical cancer data leads to highly accurate prediction models. Availability: R library GeneLogit at http://geocities.com/jg_liao Contact: [email protected]

Proceedings ArticleDOI
27 Aug 2007
TL;DR: This paper reports on classification results for emotional user states (4 classes, German database of children interacting with a pet robot), where six sites computed acoustic and linguistic features independently from each other, following in part different strategies.
Abstract: In this paper, we report on classification results for emotional user states (4 classes, German database of children interacting with a pet robot). Six sites computed acoustic and linguistic features independently from each other, following in part different strategies. A total of 4244 features were pooled together and grouped into 12 low level descriptor types and 6 functional types. For each of these groups, classification results using Support Vector Machines and Random Forests are reported for the full set of features, and for 150 features each with the highest individual Information Gain Ratio. The performance for the different groups varies mostly between ≈ 50% and ≈ 60%. Index Terms: emotional user states, automatic classification, feature types, functionals
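
The selection step mentioned above (keeping the features with the highest individual Information Gain Ratio) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the features have already been discretized (the abstract does not say how continuous acoustic features were binned), and the function names `entropy`, `gain_ratio` and `top_k_features` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    n = len(labels)
    split_info = entropy(feature)
    if split_info == 0.0:
        # A constant feature cannot separate the classes.
        return 0.0
    # Group the labels by feature value to compute the conditional entropy H(Y | X).
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return (entropy(labels) - conditional) / split_info


def top_k_features(columns, labels, k):
    """Return the names of the k features with the highest gain ratio."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With `k=150` and one column per pooled feature, `top_k_features` would give a reduced set of the kind described in the abstract. Each feature is scored individually, so redundancy between selected features is not taken into account.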

01 Jan 2007
TL;DR: If sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical and the differences in performance between different features become insignificant as the feature-space dimension is sufficiently large.
Abstract: In this paper, we examine the role of feature selection in face recognition from the perspective of sparse representation. We cast the recognition problem as finding a sparse representation of the test image features w.r.t. the training set. The sparse representation can be accurately and efficiently computed by ℓ1-minimization. The proposed simple algorithm generalizes conventional face recognition classifiers such as nearest neighbors and nearest subspaces. Using face recognition under varying illumination and expression as an example, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficient and whether the sparse representation is correctly found. We conduct extensive experiments to validate the significance of imposing sparsity using the Extended Yale B database and the AR database. Our thorough evaluation shows that, using conventional features such as Eigenfaces and facial parts, the proposed algorithm achieves much higher recognition accuracy on face images with variation in either illumination or expression. Furthermore, other unconventional features such as severely downsampled images and randomly projected features perform almost equally well with the increase of feature dimensions. The differences in performance between different features become insignificant as the feature-space dimension is sufficiently large.
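
The classification idea in this abstract (represent the test sample sparsely over all training samples, then pick the class whose samples explain it best) can be sketched as below. This is a hedged illustration under stated assumptions: it solves an ℓ1-regularized least-squares (lasso) relaxation by coordinate descent rather than the exact ℓ1-minimization the paper refers to, and the names `lasso_cd` and `classify` and the parameter `lam` are hypothetical.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lasso_cd(atoms, y, lam=0.01, sweeps=200):
    """Coordinate descent for min_x 0.5*||y - sum_j x[j]*atoms[j]||^2 + lam*||x||_1."""
    x = [0.0] * len(atoms)
    residual = list(y)  # equals y - A x; x starts at zero
    for _ in range(sweeps):
        for j, a in enumerate(atoms):
            norm_sq = dot(a, a)
            if norm_sq == 0.0:
                continue
            # Correlation of atom j with the residual that excludes its own contribution.
            rho = dot(a, residual) + x[j] * norm_sq
            new_xj = soft_threshold(rho, lam) / norm_sq
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * ai for r, ai in zip(residual, a)]
                x[j] = new_xj
    return x


def classify(atoms, atom_labels, y, lam=0.01):
    """Assign y to the class whose training samples give the smallest reconstruction error."""
    x = lasso_cd(atoms, y, lam)
    best_label, best_err = None, float("inf")
    for label in sorted(set(atom_labels)):
        # Reconstruct y using only the coefficients that belong to this class.
        recon = [0.0] * len(y)
        for a, lab, coef in zip(atoms, atom_labels, x):
            if lab == label:
                recon = [r + coef * ai for r, ai in zip(recon, a)]
        err = sum((yi - ri) ** 2 for yi, ri in zip(y, recon))
        if err < best_err:
            best_label, best_err = label, err
    return best_label
```

Here each entry of `atoms` is one training feature vector (for example a downsampled image or a random projection), so changing the feature type only changes how those vectors are produced, not the classifier.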

Book ChapterDOI
19 Sep 2007
TL;DR: The multimodal approach gave an improvement of more than 10% with respect to the most successful unimodal system and the fusion performance at the feature level showed better results than the one performed at the decision level.
Abstract: In this paper we present a multimodal approach for the recognition of eight emotions that integrates information from facial expressions, body movement and gestures, and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First, individual classifiers were trained for each modality. Then the data were fused at the feature level and at the decision level. Fusing multimodal data markedly increased the recognition rates compared with the unimodal systems: the multimodal approach gave an improvement of more than 10% with respect to the most successful unimodal system. Further, fusion performed at the feature level showed better results than fusion performed at the decision level.
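
The difference between the two fusion strategies named in this abstract can be sketched as follows. This is only an illustration of the general idea: it swaps in a nearest-centroid classifier with softmax scores in place of the Bayesian classifier used in the paper, it handles two modalities rather than three, and all function names are hypothetical.

```python
import math


def centroids(samples, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def posteriors(model, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in model.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def feature_level_predict(train_a, train_b, labels, xa, xb):
    """Concatenate the two modalities, then train and apply a single classifier."""
    fused = [a + b for a, b in zip(train_a, train_b)]
    post = posteriors(centroids(fused, labels), xa + xb)
    return max(post, key=post.get)


def decision_level_predict(train_a, train_b, labels, xa, xb):
    """Train one classifier per modality, then average their class scores."""
    pa = posteriors(centroids(train_a, labels), xa)
    pb = posteriors(centroids(train_b, labels), xb)
    avg = {y: (pa[y] + pb[y]) / 2 for y in pa}
    return max(avg, key=avg.get)
```

Feature-level fusion lets one classifier see interactions between modalities, while decision-level fusion only combines per-modality outputs. Which one works better on a given corpus is an empirical question that this toy sketch does not settle.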

Proceedings ArticleDOI
15 Apr 2007
TL;DR: The gammatone features presented here lead to competitive results on the EPPS English task, and considerable improvements were obtained by subsequent combination with a number of standard acoustic features, i.e. MFCC, PLP, MF-PLP, and VTLN plus voicedness.
Abstract: In this work, an acoustic feature set based on a gammatone filterbank is introduced for large vocabulary speech recognition. The gammatone features presented here lead to competitive results on the EPPS English task, and considerable improvements were obtained by subsequent combination with a number of standard acoustic features, i.e. MFCC, PLP, MF-PLP, and VTLN plus voicedness. Best results were obtained when combining gammatone features with all other features using weighted ROVER, resulting in a relative improvement of about 12% in word error rate compared to the best single feature system. We also found that ROVER gives better results for feature combination than both log-linear model combination and LDA.