
Showing papers on "Feature (machine learning) published in 2010"


Proceedings Article
21 Jun 2010
TL;DR: A novel 3D CNN model for action recognition that extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
Abstract: We consider the fully automated recognition of actions in uncontrolled environments. Most existing work relies on domain knowledge to construct complex handcrafted features from inputs, and the environments are usually assumed to be controlled. Convolutional neural networks (CNNs) are a class of deep models that can act directly on raw inputs, thus automating the process of feature construction. However, such models are currently limited to 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation is obtained by combining information from all channels. We apply the developed model to recognize human actions in real-world environments, and it achieves superior performance without relying on handcrafted features.
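As a concrete illustration of the 3D convolution idea, here is a minimal sketch in PyTorch (a framework that postdates the paper): the kernel spans the temporal axis as well as the two spatial axes, so each response aggregates motion across adjacent frames. The layer sizes and the single grayscale input channel are illustrative; this is not the paper's multi-channel architecture.

```python
# Minimal 3D-CNN sketch (illustrative shapes, not the paper's architecture).
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        # Input: (batch, channels=1, frames=7, height=60, width=40)
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=(3, 5, 5)),  # kernel spans time + space
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                  # pool spatially only
            nn.Conv3d(16, 32, kernel_size=(3, 5, 5)),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),                  # collapse to one vector
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        return self.classifier(self.features(clip).flatten(1))

model = Tiny3DCNN()
clip = torch.randn(2, 1, 7, 60, 40)  # two 7-frame grayscale clips
print(model(clip).shape)             # torch.Size([2, 6])
```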

4,087 citations


Book ChapterDOI
05 Sep 2010
TL;DR: This work proposes to use binary strings as an efficient feature point descriptor, which is called BRIEF, and shows that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests.
Abstract: We propose to use binary strings as an efficient feature point descriptor, which we call BRIEF. We show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done. As a result, BRIEF is very fast both to build and to match. We compare it against SURF and U-SURF on standard benchmarks and show that it yields a similar or better recognition performance, while running in a fraction of the time required by either.
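The following NumPy sketch shows the BRIEF idea in miniature (the paper's sampling pattern, patch size and pre-smoothing differ): a fixed set of random pixel pairs defines one intensity test per bit, and descriptors are compared with the Hamming distance.

```python
# Toy BRIEF-style descriptor: one bit per pairwise intensity test.
import numpy as np

rng = np.random.default_rng(0)
PATCH, N_BITS = 31, 256
# Fixed random test locations (x1, y1, x2, y2), shared by all descriptors.
pairs = rng.integers(0, PATCH, size=(N_BITS, 4))

def brief(patch):
    """patch: (PATCH, PATCH) grayscale array, ideally pre-smoothed."""
    p1 = patch[pairs[:, 1], pairs[:, 0]]
    p2 = patch[pairs[:, 3], pairs[:, 2]]
    return (p1 < p2).astype(np.uint8)

def hamming(d1, d2):
    return int(np.count_nonzero(d1 != d2))

a = rng.random((PATCH, PATCH))
b = a + 0.01 * rng.random((PATCH, PATCH))  # slightly perturbed copy
print(hamming(brief(a), brief(b)), "of", N_BITS, "bits differ")
```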

3,558 citations


Book ChapterDOI
05 Sep 2010
TL;DR: This paper introduces a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution.
Abstract: Domain adaptation is an important emerging topic in computer vision. In this paper, we present one of the first studies of domain shift in the context of object recognition. We introduce a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution. The transformation is learned in a supervised manner and can be applied to categories for which there are no labeled examples in the new domain. While we focus our evaluation on object recognition tasks, the transform-based adaptation technique we develop is general and could be applied to nonimage data. Another contribution is a new multi-domain object database, freely available for download. We experimentally demonstrate the ability of our method to improve recognition on categories with few or no target domain labels and moderate to large changes in the imaging conditions.

2,624 citations


Journal ArticleDOI
TL;DR: Altmann et al. introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias, based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting.
Abstract: Motivation: In life sciences, interpretability of machine learning models is as important as their prediction accuracy. Linear models are probably the most frequently used methods for assessing feature relevance, despite their relative inflexibility. However, in the past years effective estimators of feature relevance have been derived for highly complex or non-parametric models such as support vector machines and RandomForest (RF) models. Recently, it has been observed that RF models are biased in such a way that categorical variables with a large number of categories are preferred. Results: In this work, we introduce a heuristic for normalizing feature importance measures that can correct the feature importance bias. The method is based on repeated permutations of the outcome vector for estimating the distribution of measured importance for each variable in a non-informative setting. The P-value of the observed importance provides a corrected measure of feature importance. We apply our method to simulated data and demonstrate that (i) non-informative predictors do not receive significant P-values, (ii) informative variables can successfully be recovered among non-informative variables and (iii) P-values computed with permutation importance (PIMP) are very helpful for deciding the significance of variables, and therefore improve model interpretability. Furthermore, PIMP was used to correct RF-based importance measures for two real-world case studies. We propose an improved RF model that uses the significant variables with respect to the PIMP measure and show that its prediction accuracy is superior to that of other existing models. Availability: R code for the method presented in this article is available at http://www.mpi-inf.mpg.de/~altmann/download/PIMP.R Contact: altmann@mpi-inf.mpg.de, laura.tolosi@mpi-inf.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.
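As a rough illustration, the scikit-learn sketch below runs the core PIMP loop under simplifying assumptions (empirical p-values only, a fixed number of permutations; the authors' R implementation at the URL above is the reference).

```python
# PIMP-style permutation of the outcome vector to get per-feature p-values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
observed = rf.feature_importances_

rng = np.random.default_rng(0)
S = 50  # number of outcome permutations
null = np.empty((S, X.shape[1]))
for s in range(S):
    y_perm = rng.permutation(y)  # break every feature-outcome association
    null[s] = RandomForestClassifier(
        n_estimators=100, random_state=s).fit(X, y_perm).feature_importances_

# Empirical p-value: how often a permuted importance reaches the observed one.
p_values = (1 + (null >= observed).sum(axis=0)) / (S + 1)
print(np.round(p_values, 3))  # informative features get small p-values
```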

925 citations


Journal ArticleDOI
TL;DR: The aim of this work is to analyze the missing data problem in pattern classification tasks, and to summarize and compare some of the well-known methods used for handling missing values.
Abstract: Pattern classification has been successfully applied in many problem domains, such as biometric recognition, document classification or medical diagnosis. Missing or unknown data are a common drawback that pattern recognition techniques need to deal with when solving real-life classification tasks. Machine learning approaches and methods imported from statistical learning theory have been most intensively studied and used in this subject. The aim of this work is to analyze the missing data problem in pattern classification tasks, and to summarize and compare some of the well-known methods used for handling missing values.
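To make the surveyed options concrete, here is a small scikit-learn sketch contrasting two common families of imputation, statistical filling with the feature mean and machine-learning-based filling with k nearest neighbours; the imputers and the toy matrix are illustrative, not taken from the paper.

```python
# Two imputation strategies for missing values (NaN) in a feature matrix.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, np.nan, 6.0],
              [5.0, 4.0, 9.0],
              [np.nan, 8.0, 12.0]])

print(SimpleImputer(strategy="mean").fit_transform(X))  # column means
print(KNNImputer(n_neighbors=2).fit_transform(X))       # neighbour averages
```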

625 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category by extracting local motion and appearance features, quantizing them to a visual vocabulary, and learning class-specific distance functions that select the most informative neighborhood configurations.
Abstract: Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.

562 citations


Journal ArticleDOI
TL;DR: It is found that non-causal feature selection methods cannot be interpreted causally even when they achieve excellent predictivity, so only local causal techniques should be used when insight into causal structure is sought.
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show how local techniques can be used for scalable and accurate global causal graph learning.

521 citations


Journal Article
TL;DR: The method is based on fundamental concepts from coalitional game theory: predictions are explained by the contributions of individual feature values, and a sampling-based approximation overcomes the method's initial exponential time complexity.
Abstract: We present a general method for explaining individual predictions of classification models. The method is based on fundamental concepts from coalitional game theory and predictions are explained with contributions of individual feature values. We overcome the method's initial exponential time complexity with a sampling-based approximation. In the experimental part of the paper we use the developed method on models generated by several well-known machine learning algorithms on both synthetic and real-world data sets. The results demonstrate that the method is efficient and that the explanations are intuitive and useful.
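A minimal sketch of the sampling-based approximation described above, under simplifying assumptions (one randomly drawn background instance per Monte Carlo draw, a generic scikit-learn model; this is not the authors' code).

```python
# Monte Carlo estimate of one feature's contribution to one prediction:
# average, over random feature orderings, the change in model output when
# the feature's value is revealed on top of the features revealed before it.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)
rng = np.random.default_rng(0)

def contribution(x, j, n_draws=100):
    total = 0.0
    for _ in range(n_draws):
        z = X[rng.integers(len(X))].copy()           # background instance
        order = rng.permutation(len(x))
        before = order[:int(np.argmax(order == j))]  # revealed before j
        with_j, without_j = z.copy(), z.copy()
        with_j[before] = x[before]
        without_j[before] = x[before]
        with_j[j] = x[j]                  # the only difference is feature j
        total += (model.predict_proba(with_j[None])[0, 1]
                  - model.predict_proba(without_j[None])[0, 1])
    return total / n_draws

x = X[0]
print([round(contribution(x, j), 3) for j in range(len(x))])
```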

469 citations


Posted Content
TL;DR: The nearest neighbor (NN) technique is simple, efficient and effective in fields such as pattern recognition, text categorization and object recognition; this survey divides NN techniques into structure-less variants, which overcome the memory limitation, and structure-based variants, which reduce the computational complexity.
Abstract: The nearest neighbor (NN) technique is very simple, highly efficient and effective in the fields of pattern recognition, text categorization, object recognition, etc. Its simplicity is its main advantage, but its disadvantages cannot be ignored: the memory requirement and the computational complexity both matter. Many techniques have been developed to overcome these limitations. NN techniques are broadly classified into structure-less and structure-based techniques. In this paper, we present a survey of such techniques. Weighted kNN, Model-based kNN, Condensed NN, Reduced NN and Generalized NN are structure-less techniques, whereas k-d tree, ball tree, Principal Axis Tree, Nearest Feature Line, Tunable NN and Orthogonal Search Tree are structure-based algorithms developed on the basis of kNN. Structure-less methods overcome the memory limitation, while structure-based techniques reduce the computational complexity.
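As a concrete illustration of the structure-less versus structure-based distinction, the scikit-learn sketch below answers the same query by brute-force search and by a k-d tree; the data and parameters are arbitrary.

```python
# Brute-force NN search vs. a structure-based k-d tree: same neighbours,
# but the tree prunes most of the distance computations.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random((10000, 3))
query = rng.random((1, 3))

brute = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X)
tree = NearestNeighbors(n_neighbors=5, algorithm="kd_tree").fit(X)

print(brute.kneighbors(query)[1])  # indices from exhaustive comparison
print(tree.kneighbors(query)[1])   # identical indices via the k-d tree
```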

443 citations


Journal ArticleDOI
TL;DR: This paper proposes local Gabor XOR patterns (LGXP), which encodes the Gabor phase by using the local XOR pattern (LXP) operator, and introduces block-based Fisher's linear discriminant (BFLD) to reduce the dimensionality of the proposed descriptor and at the same time enhance its discriminative power.
Abstract: Gabor features have been known to be effective for face recognition. However, only a few approaches utilize phase feature and they usually perform worse than those using magnitude feature. To investigate the potential of Gabor phase and its fusion with magnitude for face recognition, in this paper, we first propose local Gabor XOR patterns (LGXP), which encodes the Gabor phase by using the local XOR pattern (LXP) operator. Then, we introduce block-based Fisher's linear discriminant (BFLD) to reduce the dimensionality of the proposed descriptor and at the same time enhance its discriminative power. Finally, by using BFLD, we fuse local patterns of Gabor magnitude and phase for face recognition. We evaluate our approach on FERET and FRGC 2.0 databases. In particular, we perform comparative experimental studies of different local Gabor patterns. We also make a detailed comparison of their combinations with BFLD, as well as the fusion of different descriptors by using BFLD. Extensive experimental results verify the effectiveness of our LGXP descriptor and also show that our fusion approach outperforms most of the state-of-the-art approaches.

390 citations


Book
03 Mar 2010
TL;DR: An accompanying manual to Theodoridis/Koutroumbas, Pattern Recognition, that includes Matlab code of the most common methods and algorithms in the book, together with a descriptive summary and solved examples, and including real-life data sets in imaging and audio recognition.
Abstract: An accompanying manual to Theodoridis/Koutroumbas, Pattern Recognition, that includes Matlab code of the most common methods and algorithms in the book, together with a descriptive summary and solved examples, including real-life data sets in imaging and audio recognition. *Matlab code and descriptive summary of the most common methods and algorithms in Theodoridis/Koutroumbas, Pattern Recognition 4e. *Solved examples in Matlab, including real-life data sets in imaging and audio recognition. *Available separately or at a special package price with the main text (ISBN for package: 978-0-12-374491-3).

Proceedings ArticleDOI
25 Oct 2010
TL;DR: This paper shows how to deploy a unique local Reference Frame to improve the accuracy and reduce the memory footprint of the well-known 3D Shape Context descriptor.
Abstract: The use of robust feature descriptors is now key for many 3D tasks such as 3D object recognition and surface alignment. Many descriptors proposed in the literature are based on a non-unique local Reference Frame and hence require the computation of multiple descriptions at each feature point. In this paper we show how to deploy a unique local Reference Frame to improve the accuracy and reduce the memory footprint of the well-known 3D Shape Context descriptor. We validate our proposal by means of an experimental analysis carried out on a large dataset of 3D scenes and addressing an object recognition scenario.

Book ChapterDOI
05 Sep 2010
TL;DR: This paper proposes an approach for human action recognition that integrates multiple feature channels from several entities such as objects, scenes and people, and formulates the problem in a multiple instance learning (MIL) framework based on these feature channels.
Abstract: In many cases, human actions can be identified not only by the singular observation of the human body in motion, but also by properties of the surrounding scene and the related objects. In this paper, we look into this problem and propose an approach for human action recognition that integrates multiple feature channels from several entities such as objects, scenes and people. We formulate the problem in a multiple instance learning (MIL) framework, based on multiple feature channels. By using a discriminative approach, we join multiple feature channels embedded in the MIL space. Our experiments over the large YouTube dataset show that scene and object information can be used to complement person features for human action recognition.

Proceedings ArticleDOI
22 Feb 2010
TL;DR: A novel local feature descriptor, the Local Directional Pattern (LDP), for recognizing human face, which is obtained by computing the edge response values in all eight directions at each pixel position and generating a code from the relative strength magnitude.
Abstract: This paper presents a novel local feature descriptor, the Local Directional Pattern (LDP), for recognizing human face. A LDP feature is obtained by computing the edge response values in all eight directions at each pixel position and generating a code from the relative strength magnitude. Each face is represented as a collection of LDP codes for the recognition process.
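A rough NumPy/SciPy sketch of the LDP computation as described: convolve with the eight Kirsch compass masks and set one code bit per strongest direction. Taking the k = 3 strongest responses and using a random test image are assumptions; for recognition, each face would then be represented by a collection of these codes, as the abstract notes.

```python
# Local Directional Pattern codes from Kirsch edge responses (sketch).
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    # Eight compass masks: rotate the border weights one step (45 degrees)
    # at a time around the centre cell.
    ring = np.array([5, 5, 5, -3, -3, -3, -3, -3])
    pos = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    masks = []
    for shift in range(8):
        m = np.zeros((3, 3), dtype=int)
        for (r, c), w in zip(pos, np.roll(ring, shift)):
            m[r, c] = w
        masks.append(m)
    return masks

def ldp_codes(image, k=3):
    responses = np.stack([np.abs(convolve(image, m, mode="nearest"))
                          for m in kirsch_masks()])  # (8, H, W)
    top = np.argsort(responses, axis=0)[-k:]         # k strongest directions
    codes = np.zeros(image.shape, dtype=np.uint8)
    for idx in top:
        codes |= (1 << idx).astype(np.uint8)         # one bit per direction
    return codes

img = np.random.default_rng(0).random((64, 64))
print(np.unique(ldp_codes(img)).size, "distinct LDP codes in the image")
```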

Proceedings ArticleDOI
13 Jun 2010
TL;DR: It is demonstrated that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.
Abstract: We present a method to classify and localize human actions in video using a Hough transform voting framework. Random trees are trained to learn a mapping between densely-sampled feature patches and their corresponding votes in a spatio-temporal-action Hough space. The leaves of the trees form a discriminative multi-class codebook that share features between the action classes and vote for action centers in a probabilistic manner. Using low-level features such as gradients and optical flow, we demonstrate that Hough-voting can achieve state-of-the-art performance on several datasets covering a wide range of action-recognition scenarios.

Journal ArticleDOI
TL;DR: A novel method based on logistic regression with combined L1- and L2-norm regularization is described; it achieves the twin objectives of identifying relevant discriminative brain regions across multiple conditions or groups and accurately classifying fMRI data.
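A generic stand-in (not the authors' implementation) for combined L1/L2-regularized logistic regression, using scikit-learn's elastic-net penalty; in the paper's setting the columns of X would be fMRI-derived features, and nonzero weights would mark candidate discriminative regions.

```python
# Elastic-net (L1 + L2) logistic regression: sparse yet stable weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=100, n_informative=8,
                           random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)
selected = np.flatnonzero(clf.coef_)  # features with nonzero weight
print(len(selected), "features kept out of", X.shape[1])
```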

Journal ArticleDOI
TL;DR: Because the ability to bind proteins selectively can be a key feature of high-value probes and drugs, synthesizing compounds having the features identified in this study may result in improved performance of screening collections.
Abstract: Using a diverse collection of small molecules generated from a variety of sources, we measured protein-binding activities of each individual compound against each of 100 diverse (sequence-unrelated) proteins using small-molecule microarrays. We also analyzed structural features, including complexity, of the small molecules. We found that compounds from different sources (commercial, academic, natural) have different protein-binding behaviors and that these behaviors correlate with general trends in stereochemical and shape descriptors for these compound collections. Increasing the content of sp3-hybridized and stereogenic atoms relative to compounds from commercial sources, which comprise the majority of current screening collections, improved binding selectivity and frequency. The results suggest structural features that synthetic chemists can target when synthesizing screening collections for biological discovery. Because binding proteins selectively can be a key feature of high-value probes and drugs, synthesizing compounds having features identified in this study may result in improved performance of screening collections.
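As a rough illustration of the descriptors the study highlights, the sketch below computes the fraction of sp3-hybridized carbons and the number of stereocenters with RDKit; using RDKit and these two example molecules is an assumption, not the authors' descriptor pipeline.

```python
# Shape/stereochemistry descriptors: Fsp3 and stereocenter count (RDKit).
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

for name, smiles in [("benzene", "c1ccccc1"),
                     ("glucose", "OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O")]:
    mol = Chem.MolFromSmiles(smiles)
    fsp3 = rdMolDescriptors.CalcFractionCSP3(mol)
    stereo = len(Chem.FindMolChiralCenters(mol, includeUnassigned=True))
    print(f"{name}: Fsp3={fsp3:.2f}, stereocenters={stereo}")
```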

Journal ArticleDOI
TL;DR: This paper gives an overview of the major technological perspectives and the fundamental progress of speech recognition, and surveys the techniques developed at each stage of speech recognition.
Abstract: Speech is the most prominent and primary mode of communication among human beings. Communication between humans and computers takes place through the human-computer interface, and speech has the potential to be an important mode of interaction with computers. This paper gives an overview of the major technological perspectives and the fundamental progress of speech recognition, and surveys the techniques developed at each stage of speech recognition, helping the reader to choose among techniques based on their relative merits and demerits. A comparative study of the different techniques is carried out stage by stage. The paper concludes with a discussion of future directions for developing human-computer interface techniques for the Marathi language.

Proceedings Article
02 Jun 2010
TL;DR: This work shows how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods, and applies this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation.
Abstract: We show how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods. In particular, each component multinomial of a generative model can be turned into a miniature logistic regression model if feature locality permits. The intuitive EM algorithm still applies, but with a gradient-based M-step familiar from discriminative training of logistic regression models. We apply this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation, incorporating a few linguistically-motivated features into the standard generative model for each task. These feature-enhanced models each outperform their basic counterparts by a substantial margin, and even compete with and surpass more complex state-of-the-art models.

Book ChapterDOI
22 Aug 2010
TL;DR: Bayesian decision making (BDM) results in the highest correct classification rate with relatively small computational cost, and principal component analysis (PCA) and sequential forward feature selection (SFFS) methods are employed for feature reduction.
Abstract: This paper provides a comparative study on the different techniques of classifying human activities that are performed using bodyworn miniature inertial and magnetic sensors. The classification techniques implemented and compared in this study are: Bayesian decision making (BDM), the least-squares method (LSM), the k-nearest neighbor algorithm (k-NN), dynamic time warping (DTW), support vector machines (SVM), and artificial neural networks (ANN). Daily and sports activities are classified using five sensor units worn by eight subjects on the chest, the arms, and the legs. Each sensor unit comprises a triaxial gyroscope, a triaxial accelerometer, and a triaxial magnetometer. Principal component analysis (PCA) and sequential forward feature selection (SFFS) methods are employed for feature reduction. For a small number of features, SFFS demonstrates better performance and should be preferable especially in real-time applications. The classifiers are validated using different cross-validation techniques. Among the different classifiers we have considered, BDM results in the highest correct classification rate with relatively small computational cost.
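To make the two feature-reduction routes concrete, here is a scikit-learn sketch comparing PCA with sequential forward feature selection on synthetic data; the classifier, feature counts and data are illustrative rather than the study's sensor features.

```python
# PCA (unsupervised projection) vs. SFFS (classifier-wrapping selection).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=0)
knn = KNeighborsClassifier(n_neighbors=7)

X_pca = PCA(n_components=8).fit_transform(X)
sffs = SequentialFeatureSelector(knn, n_features_to_select=8,
                                 direction="forward").fit(X, y)
X_sffs = sffs.transform(X)

for name, Xr in [("PCA", X_pca), ("SFFS", X_sffs)]:
    print(name, cross_val_score(knn, Xr, y, cv=5).mean().round(3))
```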

Proceedings Article
23 Aug 2010
TL;DR: It is found that features based on in-domain language models have the highest predictive power; entity density and POS features, in particular nouns, are individually very useful but highly correlated.
Abstract: Several sets of explanatory variables - including shallow, language modeling, POS, syntactic, and discourse features - are compared and evaluated in terms of their impact on predicting the grade level of reading material for primary school students. We find that features based on in-domain language models have the highest predictive power. Entity-density (a discourse feature) and POS-features, in particular nouns, are individually very useful but highly correlated. Average sentence length (a shallow feature) is more useful - and less expensive to compute - than individual syntactic features. A judicious combination of features examined here results in a significant improvement over the state of the art.

Proceedings Article
01 Dec 2010
TL;DR: It is shown that pre-training initializes weights to a point in the space where fine-tuning can be effective, and is thus crucial both for training deep structured models and for the recognition performance of a CD-DBN-HMM-based large-vocabulary speech recognizer.
Abstract: Recently, deep learning techniques have been successfully applied to automatic speech recognition tasks: first to phonetic recognition with context-independent deep belief network (DBN) hidden Markov models (HMMs), and later to large-vocabulary continuous speech recognition using context-dependent (CD) DBN-HMMs. In this paper, we report our most recent experiments designed to understand the roles of the two main phases of DBN learning, pre-training and fine-tuning, in the recognition performance of a CD-DBN-HMM-based large-vocabulary speech recognizer. As expected, we show that pre-training can initialize weights to a point in the space where fine-tuning can be effective and is thus crucial in training deep structured models. However, a moderate increase in the amount of unlabeled pre-training data has an insignificant effect on the final recognition results, as long as the original training size is sufficiently large to initialize the DBN weights. On the other hand, with additional labeled training data, the fine-tuning phase of DBN training can significantly improve recognition accuracy.

Journal ArticleDOI
TL;DR: This paper proposes a convex energy-based framework to jointly perform feature selection and SVM parameter learning for linear and non-linear kernels.

Journal ArticleDOI
TL;DR: The multimodal approach increased the recognition rate by more than 10% when compared to the most successful unimodal system, and the best pairing is ‘gesture-speech’.
Abstract: In this paper a study on multimodal automatic emotion recognition during a speech-based interaction is presented. A database was constructed consisting of people pronouncing a sentence in a scenario where they interacted with an agent using speech. Ten people pronounced a sentence corresponding to a command while making 8 different emotional expressions. Gender was equally represented, with speakers of several different native languages including French, German, Greek and Italian. Facial expression, gesture and acoustic analysis of speech were used to extract features relevant to emotion. For the automatic classification of unimodal data, bimodal data and multimodal data, a system based on a Bayesian classifier was used. After performing an automatic classification of each modality, the different modalities were combined using a multimodal approach. Fusion of the modalities at the feature level (before running the classifier) and at the results level (combining results from classifier from each modality) were compared. Fusing the multimodal data resulted in a large increase in the recognition rates in comparison to the unimodal systems: the multimodal approach increased the recognition rate by more than 10% when compared to the most successful unimodal system. Bimodal emotion recognition based on all combinations of the modalities (i.e., ‘face-gesture’, ‘face-speech’ and ‘gesture-speech’) was also investigated. The results show that the best pairing is ‘gesture-speech’. Using all three modalities resulted in a 3.3% classification improvement over the best bimodal results.
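A minimal sketch of the two fusion schemes the paper compares, with Gaussian naive Bayes standing in for the paper's Bayesian classifier and synthetic feature columns standing in for the face and speech modalities.

```python
# Feature-level fusion (concatenate, one classifier) vs. decision-level
# fusion (one classifier per modality, average the posteriors).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
face_tr, speech_tr = Xtr[:, :10], Xtr[:, 10:]   # pretend modality split
face_te, speech_te = Xte[:, :10], Xte[:, 10:]

feat = GaussianNB().fit(np.hstack([face_tr, speech_tr]), ytr)
print("feature-level:", feat.score(np.hstack([face_te, speech_te]), yte))

m1 = GaussianNB().fit(face_tr, ytr)
m2 = GaussianNB().fit(speech_tr, ytr)
proba = (m1.predict_proba(face_te) + m2.predict_proba(speech_te)) / 2
print("decision-level:", (proba.argmax(axis=1) == yte).mean())
```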

Proceedings ArticleDOI
01 Dec 2010
TL;DR: A comprehensive survey of recent developments on gait recognition approaches, focusing on three major issues involved in a general gait recognition system, namely gait image representation, feature dimensionality reduction and gait classification.
Abstract: Human identification by gait has created a great deal of interest in the computer vision community due to its advantage of inconspicuous recognition at a relatively far distance. This paper provides a comprehensive survey of recent developments in gait recognition approaches. The survey focuses on three major issues involved in a general gait recognition system, namely gait image representation, feature dimensionality reduction and gait classification. Also, a review of the available public gait datasets is presented. The concluding discussions outline a number of research challenges and provide promising future directions for the field.

Journal ArticleDOI
TL;DR: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates.
Abstract: Context: In the software industry, project managers usually rely on their previous experience to estimate the number of man-hours required for each software project. The accuracy of such estimates is a key factor for the efficient application of human resources. Machine learning techniques such as radial basis function (RBF) neural networks, multi-layer perceptron (MLP) neural networks, support vector regression (SVR), bagging predictors and regression-based trees have recently been applied for estimating software development effort. Some works have demonstrated that the level of accuracy in software effort estimates strongly depends on the values of the parameters of these methods. In addition, it has been shown that the selection of the input features may also have an important influence on estimation accuracy. Objective: This paper proposes and investigates the use of a genetic algorithm method to simultaneously (1) select an optimal input feature subset and (2) optimize the parameters of machine learning methods, aiming at a higher accuracy level for the software effort estimates. Method: Simulations are carried out using six benchmark data sets of software projects, namely, Desharnais, NASA, COCOMO, Albrecht, Kemerer and Koten and Gray. The results are compared to those obtained by methods proposed in the literature using neural networks, support vector machines, multiple additive regression trees, bagging, and Bayesian statistical models. Results: In all data sets, the simulations have shown that the proposed GA-based method was able to improve the performance of the machine learning methods. The simulations have also demonstrated that the proposed method outperforms some recent methods reported in the recent literature for software effort estimation. Furthermore, the use of GA for feature selection considerably reduced the number of input features for five of the data sets used in our analysis. Conclusions: The combination of input feature selection and parameter optimization of machine learning methods improves the accuracy of software development effort estimates. In addition, this reduces model complexity, which may help in understanding the relevance of each input feature. Therefore, some input parameters can be ignored without loss of accuracy in the estimations.
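A compact sketch of the idea under strong simplifications: one chromosome encodes a binary feature mask plus a single SVR hyperparameter, and fitness is cross-validated error. The operators here (truncation selection, mutation only, no crossover) and all sizes are deliberately minimal and are not the paper's GA configuration.

```python
# Toy evolutionary search over (feature subset, SVR regularization C).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=150, n_features=12, n_informative=5,
                       noise=10, random_state=0)
rng = np.random.default_rng(0)
N_FEAT, POP, GENS = X.shape[1], 20, 15

def fitness(chrom):
    mask, log_c = chrom[:N_FEAT].astype(bool), chrom[N_FEAT]
    if not mask.any():
        return -np.inf
    return cross_val_score(SVR(C=10.0 ** log_c), X[:, mask], y, cv=3,
                           scoring="neg_mean_absolute_error").mean()

# Chromosome: N_FEAT binary genes + one real gene (log10 of C).
pop = np.hstack([rng.integers(0, 2, (POP, N_FEAT)).astype(float),
                 rng.uniform(-2, 2, (POP, 1))])
for _ in range(GENS):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]          # keep the best half
    children = parents[rng.integers(0, len(parents), POP - len(parents))].copy()
    flip = rng.random(children[:, :N_FEAT].shape) < 0.1    # bit-flip mutation
    children[:, :N_FEAT] = np.where(flip, 1 - children[:, :N_FEAT],
                                    children[:, :N_FEAT])
    children[:, N_FEAT] += rng.normal(0, 0.2, len(children))  # mutate C gene
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(c) for c in pop])]
print("features:", np.flatnonzero(best[:N_FEAT]).tolist(),
      " C =", round(10.0 ** best[N_FEAT], 3))
```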

Journal ArticleDOI
TL;DR: An evaluation of various methods for face recognition, using wavelet decomposition and the Eigenfaces method (based on Principal Component Analysis, PCA) for feature extraction and a distance classifier or SVM for the classification step; the best result, 98.1% accuracy, is obtained with the Wavelet-SVM approach.
Abstract: In this study, we present an evaluation of various methods for face recognition. As feature extraction techniques, we use wavelet decomposition and the Eigenfaces method, which is based on Principal Component Analysis (PCA). After generating feature vectors, a distance classifier and Support Vector Machines (SVMs) are used for the classification step. We examine classification accuracy as a function of the size of the training set, the chosen feature extractor-classifier pair and the kernel function chosen for the SVM classifier. As the test set, we used the ORL face database, a standard database for face recognition applications that contains 400 images of 40 people. At the end of the overall separation task, we obtained a classification accuracy of 98.1% with the Wavelet-SVM approach on a 240-image training set.
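A minimal sketch of this pipeline with PyWavelets and scikit-learn; conveniently, scikit-learn ships the same ORL/AT&T data as the Olivetti faces set (downloaded on first use). The wavelet choice, PCA dimension and SVM settings here are illustrative, so the resulting accuracy will not necessarily match the paper's 98.1%.

```python
# Wavelet features -> PCA -> SVM on the ORL (Olivetti) faces.
import numpy as np
import pywt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()
# One level of 2D Haar decomposition; keep the low-frequency band cA.
feats = np.array([pywt.dwt2(img, "haar")[0].ravel() for img in faces.images])

Xtr, Xte, ytr, yte = train_test_split(feats, faces.target, test_size=0.4,
                                      stratify=faces.target, random_state=0)
clf = make_pipeline(PCA(n_components=60), SVC(kernel="rbf", C=10))
clf.fit(Xtr, ytr)
print("accuracy:", round(clf.score(Xte, yte), 3))
```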

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A novel solution that creates a View Transformation Model (VTM) across different points of view using Support Vector Regression (SVR) to achieve view-independent gait recognition, together with a new method that seeks a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle.
Abstract: Gait is a well-recognized biometric feature that is used to identify a human at a distance. However, in real environments, appearance changes of individuals due to viewing angle changes cause many difficulties for gait recognition. This paper re-formulates this problem as a regression problem. A novel solution is proposed to create a View Transformation Model (VTM) across different points of view using Support Vector Regression (SVR). To facilitate the regression, a new method is proposed to seek a local Region of Interest (ROI) under one viewing angle for predicting the corresponding motion information under another viewing angle. The well-constructed VTM is thus able to transfer gait information from one viewing angle to another, achieving view-independent gait recognition: gait features under various viewing angles are normalized to a common viewing angle before similarity measurement is carried out. Extensive experimental results on a widely adopted benchmark dataset demonstrate that the proposed algorithm achieves significantly better performance than existing methods in the literature.
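A toy sketch of the regression formulation only (not the paper's VTM, which additionally learns regions of interest): support vector regressors map gait features observed under one viewing angle to the corresponding features under a reference angle, using synthetic stand-in data.

```python
# SVR-based view transformation: predict reference-view features from
# side-view features, one regressor per output dimension.
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
side = rng.random((200, 16))                       # features at angle A
frontal = side @ rng.random((16, 16)) + 0.05 * rng.random((200, 16))

vtm = MultiOutputRegressor(SVR(kernel="rbf", C=10.0)).fit(side, frontal)
err = np.abs(vtm.predict(side[:5]) - frontal[:5]).mean()
print("mean absolute prediction error:", float(err))
```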

Book ChapterDOI
22 Nov 2010
TL;DR: The proposed method combines Synthetic Minority Over-sampling Technique (SMOTE) and Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data and shows that the proposed combination techniques can improve the performance for the class imbalance problem.
Abstract: In classification, when the distribution of the training data among classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are difficult to recognize fully. In this paper, a method is proposed to enhance classification accuracy for the minority classes. The proposed method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist classification of imbalanced data, several classification algorithms have been used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority class and the majority class were obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combination techniques can improve performance for the class imbalance problem.
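A minimal sketch of the over-sampling half of the combination, using the imbalanced-learn implementation of SMOTE (an assumption; the CMTNN-based cleaning step is not shown). Any of the paper's classifiers can then be trained on the balanced set.

```python
# SMOTE: synthesize minority points by interpolating between neighbours.
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
print("before:", Counter(y))
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_bal))
```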

Journal ArticleDOI
TL;DR: A novel technique for incremental recognition of the user's emotional state as it is applied in a sensitive artificial listener (SAL) system designed for socially competent human-machine communication.
Abstract: The automatic estimation of human affect from the speech signal is an important step towards making virtual agents more natural and human-like. In this paper, we present a novel technique for incremental recognition of the user's emotional state as it is applied in a sensitive artificial listener (SAL) system designed for socially competent human-machine communication. Our method is capable of using acoustic, linguistic, as well as long-range contextual information in order to continuously predict the current quadrant in a two-dimensional emotional space spanned by the dimensions valence and activation. The main system components are a hierarchical dynamic Bayesian network (DBN) for detecting linguistic keyword features and long short-term memory (LSTM) recurrent neural networks which model phoneme context and emotional history to predict the affective state of the user. Experimental evaluations on the SAL corpus of non-prototypical real-life emotional speech data consider a number of variants of our recognition framework: continuous emotion estimation from low-level feature frames is evaluated as a new alternative to the common approach of computing statistical functionals of given speech turns. Further performance gains are achieved by discriminatively training LSTM networks and by using bidirectional context information, leading to a quadrant prediction F1-measure of up to 51.3 %, which is only 7.6 % below the average inter-labeler consistency.