
Showing papers on "Feature (machine learning) published in 2011"


Journal ArticleDOI
TL;DR: This work proposes a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation and proposes both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components.
Abstract: Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a reproducing kernel Hilbert space using maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. Furthermore, in order to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA in a semisupervised learning setting, which encodes label information into transfer components learning. We call this extension semisupervised TCA. The main contribution of our work is that we propose a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
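The quantity TCA minimizes between domains is the maximum mean discrepancy (MMD) in an RKHS. As a rough illustration of that criterion only (not the authors' implementation, and with an arbitrary RBF bandwidth), the sketch below estimates the empirical MMD between source and target samples:

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    # Squared empirical maximum mean discrepancy between source Xs and target Xt
    Kss = rbf_kernel(Xs, Xs, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()

Xs = np.random.randn(100, 5)          # toy source samples
Xt = np.random.randn(80, 5) + 0.5     # toy target samples with a distribution shift
print(mmd2(Xs, Xt))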

3,195 citations


Proceedings Article
28 Jun 2011
TL;DR: A deep learning approach is proposed which learns to extract a meaningful representation for each review in an unsupervised fashion and clearly outperforms state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products.
Abstract: The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.
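The unsupervised representation in this line of work is typically learned with stacked denoising autoencoders. The single-layer numpy sketch below is a minimal simplification, not the authors' architecture or hyper-parameters: it corrupts bag-of-words inputs with masking noise, learns to reconstruct the clean input, and returns the hidden activations as review features for a downstream sentiment classifier.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=50, noise=0.3, lr=0.1, epochs=20, seed=0):
    # One denoising autoencoder layer: corrupt the input, reconstruct the clean input.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.01, (d, n_hidden)); b = np.zeros(n_hidden)
    V = rng.normal(0, 0.01, (n_hidden, d)); c = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)        # masking noise
        H = sigmoid(X_noisy @ W + b)                       # hidden code
        R = sigmoid(H @ V + c)                             # reconstruction
        dR = (R - X) * R * (1 - R)                         # squared-error backprop
        dH = (dR @ V.T) * H * (1 - H)
        V -= lr * (H.T @ dR) / n; c -= lr * dR.mean(axis=0)
        W -= lr * (X_noisy.T @ dH) / n; b -= lr * dH.mean(axis=0)
    return lambda Xnew: sigmoid(np.asarray(Xnew, float) @ W + b)   # feature extractor

rng = np.random.default_rng(1)
reviews_bow = (rng.random((200, 100)) < 0.05).astype(float)  # toy binary bag-of-words
encode = train_dae(reviews_bow)
features = encode(reviews_bow[:10])   # high-level features for a sentiment classifier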

1,769 citations


Proceedings ArticleDOI
03 Oct 2011
TL;DR: The “German Traffic Sign Recognition Benchmark” is a multi-category classification competition held at IJCNN 2011, and a comprehensive, lifelike dataset of more than 50,000 traffic sign images has been collected.
Abstract: The “German Traffic Sign Recognition Benchmark” is a multi-category classification competition held at IJCNN 2011. Automatic recognition of traffic signs is required in advanced driver assistance systems and constitutes a challenging real-world computer vision and pattern recognition problem. A comprehensive, lifelike dataset of more than 50,000 traffic sign images has been collected. It reflects the strong variations in visual appearance of signs due to distance, illumination, weather conditions, partial occlusions, and rotations. The images are complemented by several precomputed feature sets to allow for applying machine learning algorithms without background knowledge in image processing. The dataset comprises 43 classes with unbalanced class frequencies. Participants have to classify two test sets of more than 12,500 images each. Here, the results on the first of these sets, which was used in the first evaluation stage of the two-fold challenge, are reported. The methods employed by the participants who achieved the best results are briefly described and compared to human traffic sign recognition performance and baseline results.

902 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented, which learns a single over-complete dictionary and an optimal linear classifier jointly and yields dictionaries so that feature points with the same class labels have similar sparse codes.
Abstract: A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparse-code error’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single over-complete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions.
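The unified objective can be handled with K-SVD because the reconstruction, discriminative sparse-code, and classification error terms stack into a single dictionary learning problem on [Y; sqrt(alpha) Q; sqrt(beta) H], which jointly recovers D, the label-consistency transform A, and the classifier W. The sketch below shows only that stacking trick; scikit-learn's DictionaryLearning is used as a stand-in for K-SVD, and the toy data, weights, and atom-to-class assignment in Q are illustrative assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning

def lc_ksvd_stack(Y, Q, H, alpha=1.0, beta=1.0):
    # Y: signals (d x N), Q: discriminative sparse-code targets (K x N),
    # H: one-hot labels (C x N). Stacking makes the joint objective a plain
    # dictionary learning problem.
    return np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])

d, N, K, C = 20, 100, 15, 3
rng = np.random.default_rng(0)
Y = rng.standard_normal((d, N))
labels = rng.integers(0, C, N)
H = np.eye(C)[labels].T                                  # C x N one-hot labels
Q = np.repeat(np.eye(C), K // C, axis=0)[:, labels]      # atoms 0-4 -> class 0, etc.

Z = lc_ksvd_stack(Y, Q, H)
# DictionaryLearning stands in for K-SVD (the authors use K-SVD proper).
learner = DictionaryLearning(n_components=K, transform_algorithm="omp",
                             transform_n_nonzero_coefs=5, random_state=0)
X = learner.fit_transform(Z.T).T                         # sparse codes, K x N
D_stacked = learner.components_.T                        # (d + K + C) x K
D, A, W = D_stacked[:d], D_stacked[d:d + K], D_stacked[d + K:]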

780 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words approach to handle noisy observations, and reliably recognizes ongoing activities from streaming videos with high accuracy.
Abstract: In this paper, we present a novel approach of human activity prediction. Human activity prediction is a probabilistic process of inferring ongoing activities from videos only containing onsets (i.e. the beginning part) of the activities. The goal is to enable early recognition of unfinished activities as opposed to the after-the-fact classification of completed activities. Activity prediction methodologies are particularly necessary for surveillance systems which are required to prevent crimes and dangerous activities from occurring. We probabilistically formulate the activity prediction problem, and introduce new methodologies designed for the prediction. We represent an activity as an integral histogram of spatio-temporal features, efficiently modeling how feature distributions change over time. A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words approach to handle noisy observations. Our experiments confirm that our approach reliably recognizes ongoing activities from streaming videos with high accuracy.
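The integral histogram behind this representation is simply a running (prefix) sum of per-frame visual-word histograms, so the feature distribution over any observed time interval falls out as a difference of two prefix sums. A minimal sketch of that bookkeeping (feature extraction and the prediction model itself are omitted, and the visual-word ids are made up):

import numpy as np

def integral_histogram(frame_words, vocab_size):
    # frame_words[t] = list of visual-word ids (< vocab_size) detected in frame t
    T = len(frame_words)
    hist = np.zeros((T + 1, vocab_size))
    for t, words in enumerate(frame_words):
        hist[t + 1] = hist[t] + np.bincount(words, minlength=vocab_size)
    return hist                      # hist[t] = histogram of frames [0, t)

def interval_histogram(ihist, start, end):
    # Histogram of visual words in frames [start, end), in O(vocab_size) time
    return ihist[end] - ihist[start]

frames = [[0, 2, 2], [1], [0, 3, 3, 3]]          # toy per-frame visual words
ih = integral_histogram(frames, vocab_size=4)
print(interval_histogram(ih, 1, 3))              # words occurring in frames 1 and 2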

617 citations


Book ChapterDOI
17 Mar 2011
TL;DR: This article surveys some representative link prediction methods by categorizing them by the type of models, largely considering three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model, and second, the probabilistic approaches which model the joint-probability among the entities in a network by Bayesian graphical models.
Abstract: Link prediction is an important task for analyzing social networks which also has applications in other domains like information retrieval, bioinformatics and e-commerce. There exist a variety of techniques for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ from each other with respect to model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods by categorizing them by the type of the models. We largely consider three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model; second, the probabilistic approaches which model the joint probability among the entities in a network by Bayesian graphical models; and, finally, the linear algebraic approach which computes the similarity between the nodes in a network by rank-reduced similarity matrices. We discuss various existing link prediction models that fall in these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion on recent developments and future research directions.

566 citations


Journal ArticleDOI
TL;DR: This paper investigates a simple but powerful approach to make robust use of HOG features for face recognition by proposing to extract HOG descriptors from a regular grid and identifying the necessity of performing dimensionality reduction to remove noise and make the classification process less prone to overfitting.

553 citations


Journal ArticleDOI
TL;DR: This paper makes a comparative study of the effectiveness of ensemble techniques for sentiment classification, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure.

543 citations


Proceedings ArticleDOI
18 Sep 2011
TL;DR: This paper applies large-scale algorithms for learning the features automatically from unlabeled data to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.
Abstract: Reading text from photographs is a challenging problem that has received a significant amount of attention. Two key components of most systems are (i) text detection from images and (ii) character recognition, and many recent methods have been proposed to design better feature representations and models for both. In this paper, we apply methods recently developed in machine learning -- specifically, large-scale algorithms for learning the features automatically from unlabeled data -- and show that they allow us to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.

402 citations


Proceedings Article
12 Dec 2011
TL;DR: An algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident, and is named CODA (Co-training for domain adaptation).
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.

402 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work proposes a method for recognizing attributes, such as the gender, hair style and types of clothes of people under large variation in viewpoint, pose, articulation and occlusion typical of personal photo album images, using a part-based approach based on poselets.
Abstract: We propose a method for recognizing attributes, such as the gender, hair style and types of clothes of people under large variation in viewpoint, pose, articulation and occlusion typical of personal photo album images. Robust attribute classifiers under such conditions must be invariant to pose, but inferring the pose in itself is a challenging problem. We use a part-based approach based on poselets. Our parts implicitly decompose the aspect (the pose and viewpoint). We train attribute classifiers for each such aspect and we combine them together in a discriminative model. We propose a new dataset of 8000 people with annotated attributes. Our method performs very well on this dataset, significantly outperforming a baseline built on the spatial pyramid match kernel method. On gender recognition we outperform a commercial face recognition system.

Journal ArticleDOI
TL;DR: The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Abstract: This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large speech exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of noise and speech exemplars, and we derive an algorithm for recovering this sparse linear combination of exemplars from the observed noisy speech. We describe how the framework can be used for doing hybrid exemplar-based/HMM recognition by using the exemplar-activations together with the phonetic information associated with the exemplars. As an alternative to hybrid recognition, the framework also allows us to take a source separation approach which enables exemplar-based feature enhancement as well as missing data mask estimation. We evaluate the performance of these exemplar-based methods in connected digit recognition on the AURORA-2 database. Our results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB. Although not as effective as two baseline recognizers at higher SNRs, the novel approach offers a promising direction of future research on exemplar-based ASR.
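At the heart of the framework is finding nonnegative activations of a joint [speech | noise] exemplar dictionary that explain the observed noisy spectro-temporal patch; the speech part of the reconstruction then drives feature enhancement or mask estimation. The authors use sparsity-penalized multiplicative updates, whereas the sketch below substitutes SciPy's plain non-negative least squares as a simpler stand-in, with random toy exemplars:

import numpy as np
from scipy.optimize import nnls

def decompose(noisy, speech_exemplars, noise_exemplars):
    # Stack speech and noise exemplars into one dictionary (columns = exemplars)
    A = np.hstack([speech_exemplars, noise_exemplars])
    x, _ = nnls(A, noisy)                        # nonnegative exemplar activations
    n_s = speech_exemplars.shape[1]
    speech_part = speech_exemplars @ x[:n_s]     # reconstructed clean-speech part
    noise_part = noise_exemplars @ x[n_s:]       # reconstructed noise part
    return x, speech_part, noise_part

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((40, 30)))        # 30 toy speech exemplars
N = np.abs(rng.standard_normal((40, 10)))        # 10 toy noise exemplars
observed = 0.8 * S[:, 3] + 0.5 * N[:, 2]         # toy noisy observation
acts, s_hat, n_hat = decompose(observed, S, N)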

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This work describes the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate, and obtains an even better recognition rate by further training the nets.
Abstract: We describe the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate of 98.98%. We obtain an even better recognition rate of 99.15% by further training the nets. Our fast, fully parameterizable GPU implementation of a Convolutional Neural Network does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way. A CNN/MLP committee further boosts recognition performance.

Proceedings Article
28 Jun 2011
TL;DR: This paper formulates the problem of multi-task learning of shared feature representations among tasks, while simultaneously determining "with whom" each task should share, as a mixed integer program and provides an alternating minimization technique to solve the optimization problem of jointly identifying grouping structures and parameters.
Abstract: In multi-task learning (MTL), multiple tasks are learnt jointly. A major assumption for this paradigm is that all those tasks are indeed related so that the joint training is appropriate and beneficial. In this paper, we study the problem of multi-task learning of shared feature representations among tasks, while simultaneously determining "with whom" each task should share. We formulate the problem as a mixed integer program and provide an alternating minimization technique to solve the optimization problem of jointly identifying grouping structures and parameters. The algorithm monotonically decreases the objective function and converges to a local optimum. Compared to the standard MTL paradigm where all tasks are in a single group, our algorithm improves its performance with statistical significance for three out of the four datasets we have studied. We also demonstrate its advantage over other task grouping techniques investigated in the literature.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches by reducing the modality gap at the feature extraction stage.
Abstract: Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to the randomized forest to further boost the performance. We create the largest face sketch database including sketches of 1,194 people from the FERET database. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: A home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor, and two multi-modality fusion schemes, which naturally combine color and depth information.
Abstract: In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database (i.e., named as RGBD-HuDaAct), which contains synchronized color-depth video streams, for the task of human daily activity recognition. This database aims at encouraging more research efforts on human activity recognition based on multi-modality sensor combination (e.g., color plus depth). 2) Two multi-modality fusion schemes, which naturally combine color and depth information, have been developed from two state-of-the-art feature representation methods for action recognition, i.e., spatio-temporal interest points (STIPs) and motion history images (MHIs). These depth-extended feature representation methods are evaluated comprehensively and superior recognition performances over their uni-modality (e.g., color only) counterparts are demonstrated.

Journal ArticleDOI
TL;DR: A new human face recognition algorithm is presented based on bidirectional two-dimensional principal component analysis (B2DPCA) and extreme learning machine (ELM); the subband that exhibits the maximum standard deviation is dimensionally reduced using an improved dimensionality reduction technique.

Journal ArticleDOI
TL;DR: Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.
Abstract: This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation approach. Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
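The point of the matrix-free variant is that conjugate gradient only ever needs Hessian-vector products, and those can be formed on a much smaller subsample than the gradient. A sketch of one such inexact Newton step for L2-regularized logistic regression (an illustrative objective, not the paper's speech-recognition model):

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_cg_step(w, X, y, lam=1e-2, hess_sample=200, seed=None):
    n = X.shape[0]
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n + lam * w                 # gradient on the full batch

    # Hessian-vector products use only a small random subsample of the data.
    idx = np.random.default_rng(seed).choice(n, hess_sample, replace=False)
    Xs = X[idx]
    d = sigmoid(Xs @ w) * (1 - sigmoid(Xs @ w))
    def hv(v):
        return Xs.T @ (d * (Xs @ v)) / hess_sample + lam * v

    H = LinearOperator((w.size, w.size), matvec=hv)
    step, _ = cg(H, -grad, maxiter=20)                 # inexact Newton direction
    return w + step

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 20))
y = (X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(5000) > 0).astype(float)
w = np.zeros(20)
for _ in range(5):
    w = newton_cg_step(w, X, y)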

Journal ArticleDOI
TL;DR: The present study investigates the neural code of facial identity perception with the aim of ascertaining its distributed nature and informational basis, and uses a sequence of multivariate pattern analyses applied to functional magnetic resonance imaging (fMRI) data to map out and characterize a cortical system responsible for individuation.
Abstract: Face individuation is one of the most impressive achievements of our visual system, and yet uncovering the neural mechanisms subserving this feat appears to elude traditional approaches to functional brain data analysis. The present study investigates the neural code of facial identity perception with the aim of ascertaining its distributed nature and informational basis. To this end, we use a sequence of multivariate pattern analyses applied to functional magnetic resonance imaging (fMRI) data. First, we combine information-based brain mapping and dynamic discrimination analysis to locate spatiotemporal patterns that support face classification at the individual level. This analysis reveals a network of fusiform and anterior temporal areas that carry information about facial identity and provides evidence that the fusiform face area responds with distinct patterns of activation to different face identities. Second, we assess the information structure of the network using recursive feature elimination. We find that diagnostic information is distributed evenly among anterior regions of the mapped network and that a right anterior region of the fusiform gyrus plays a central role within the information network mediating face individuation. These findings serve to map out and characterize a cortical system responsible for individuation. More generally, in the context of functionally defined networks, they provide an account of distributed processing grounded in information-based architectures.
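The recursive feature elimination used in the second analysis repeatedly trains a classifier and drops the least informative features (here, voxels); scikit-learn's RFE wrapped around a linear SVM gives the basic flavour, with random stand-in data and none of the study's preprocessing or cross-validation:

import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))        # 60 trials x 500 voxels (toy stand-in)
y = rng.integers(0, 2, 60)                # two face identities

selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)
selector.fit(X, y)
informative_voxels = np.where(selector.support_)[0]   # voxels surviving elimination
print(selector.ranking_[:10])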

Proceedings ArticleDOI
01 Dec 2011
TL;DR: This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Abstract: To date, there has been limited work in applying Deep Belief Networks (DBNs) for acoustic modeling in LVCSR tasks, with past work using standard speech features. However, a typical LVCSR system makes use of both feature and model-space speaker adaptation and discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide linear speed-up in the number of machines, without impacting WER.

Proceedings ArticleDOI
09 Feb 2011
TL;DR: This paper models the grouping of synonym product features in sentiment analysis of product reviews as a semi-supervised learning problem, exploiting lexical characteristics to automatically identify some labeled examples, and the proposed method outperforms existing state-of-the-art methods.
Abstract: In sentiment analysis of product reviews, one important problem is to produce a summary of opinions based on product features/attributes (also called aspects). However, for the same feature, people can express it with many different words or phrases. To produce a useful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. Although several methods have been proposed to extract product features from reviews, limited work has been done on clustering or grouping of synonym features. This paper focuses on this task. Classic methods for solving this problem are based on unsupervised learning using some forms of distributional similarity. However, we found that these methods do not do well. We then model it as a semi-supervised learning problem. Lexical characteristics of the problem are exploited to automatically identify some labeled examples. Empirical evaluation shows that the proposed method outperforms existing state-of-the-art methods by a large margin.

Book ChapterDOI
05 Sep 2011
TL;DR: This work proposes to employ "oblique" random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF.
Abstract: In his original paper on random forests, Breiman proposed two different decision tree ensembles: one generated from "orthogonal" trees with thresholds on individual features in every split, and one from "oblique" trees separating the feature space by randomly oriented hyperplanes. In spite of a rising interest in the random forest framework, however, ensembles built from orthogonal trees (RF) have gained most, if not all, attention so far. In the present work we propose to employ "oblique" random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF. This oRF outperforms RF, as well as other classifiers, on nearly all data sets but those with discrete factorial features. Learned node models perform distinctively better than random splits. An oRF feature importance score proves preferable to standard RF feature importance scores such as Gini or permutation importance. The topology of the oRF decision space appears to be smoother and better adapted to the data, resulting in improved generalization performance. Overall, the oRF proposed here may be preferred over standard RF on most learning tasks involving numerical and spectral data.
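The difference from a standard random forest sits entirely in the node-splitting rule: instead of thresholding one feature, each internal node fits a linear discriminative model on a random feature subset and routes samples by the sign of its decision function. A minimal sketch of one such split, using ridge regression as the node model (one of several linear learners one could plug in; the surrounding bagging and tree recursion are omitted):

import numpy as np
from sklearn.linear_model import RidgeClassifier

def oblique_split(X, y, n_feat=None, seed=0):
    # Learn an oblique (multivariate) split: a hyperplane from a linear model
    # fitted on a random feature subset, instead of a single-feature threshold.
    rng = np.random.default_rng(seed)
    n_feat = n_feat or max(1, int(np.sqrt(X.shape[1])))
    feats = rng.choice(X.shape[1], n_feat, replace=False)
    model = RidgeClassifier().fit(X[:, feats], y)
    side = model.decision_function(X[:, feats]) > 0      # route samples left/right
    return feats, model, side

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
feats, node_model, goes_right = oblique_split(X, y)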

Journal ArticleDOI
TL;DR: A new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS, is presented, which consistently performs better at selecting subsets of salient features and thereby yields better classification accuracies.

Proceedings ArticleDOI
09 May 2011
TL;DR: This work defines a view-to-object distance where a novel view is compared simultaneously to all views of a previous object, and shows that this measure leads to superior classification performance on object category and instance recognition.
Abstract: In this work we address joint object category and instance recognition in the context of RGB-D (depth) cameras. Motivated by local distance learning, where a novel view of an object is compared to individual views of previously seen objects, we define a view-to-object distance where a novel view is compared simultaneously to all views of a previous object. This novel distance is based on a weighted combination of feature differences between views. We show, through jointly learning per-view weights, that this measure leads to superior classification performance on object category and instance recognition. More importantly, the proposed distance allows us to find a sparse solution via Group-Lasso regularization, where a small subset of representative views of an object is identified and used, with the rest discarded. This significantly reduces computational cost without compromising recognition accuracy. We evaluate the proposed technique, Instance Distance Learning (IDL), on the RGB-D Object Dataset, which consists of 300 object instances in 51 everyday categories and about 250,000 views of objects with both RGB color and depth. We empirically compare IDL to several alternative state-of-the-art approaches and also validate the use of visual and shape cues and their combination.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
Abstract: Human age estimation has recently become an active research topic in computer vision and pattern recognition, because of many potential applications in reality. In this paper we propose to use the kernel partial least squares (KPLS) regression for age estimation. The KPLS (or linear PLS) method has several advantages over previous approaches: (1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques; (2) the KPLS can find a small number of latent variables, e.g., 20, to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications; and (3) the KPLS regression has an output vector that can contain multiple labels, so that several related problems, e.g., age estimation, gender classification, and ethnicity estimation can be solved altogether. This is the first time that the kernel PLS method is introduced and applied to solve a regression problem in computer vision with high accuracy. Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
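Scikit-learn ships linear PLS but not kernel PLS; one common rough approximation of the kernel variant is to run linear PLS on an RBF kernel matrix (the empirical kernel map). The sketch below does exactly that with about 20 latent variables as mentioned in the abstract; the kernel choice, bandwidth, and toy data are assumptions, and this is not the authors' exact algorithm:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 1000))           # toy high-dimensional face features
ages = rng.uniform(10, 70, 300)                      # toy age labels

K_train = rbf_kernel(X_train, X_train, gamma=1e-3)   # empirical kernel map
pls = PLSRegression(n_components=20)                 # ~20 latent variables
pls.fit(K_train, ages)

X_test = rng.standard_normal((5, 1000))
K_test = rbf_kernel(X_test, X_train, gamma=1e-3)
predicted_ages = pls.predict(K_test).ravel()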

Journal ArticleDOI
TL;DR: Standard machine learning techniques naive Bayes and SVM are incorporated into the domain of online Cantonese-written restaurant reviews to automatically classify user reviews as positive or negative, finding that accuracy is influenced by interaction between the classification models and the feature options.
Abstract: Research highlights: Naive Bayes and SVM are used for Cantonese sentiment classification; accuracy is influenced by the interaction between classification models and features; the naive Bayes classifier achieves accuracy as good as or better than SVM; character-based bigrams are better features than unigrams and trigrams in capturing Cantonese sentiment. Cantonese is an important dialect in some regions of Southern China. Local online users often represent their opinions and experiences on the web with written Cantonese. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews makes it difficult to give an unbiased evaluation of a product, and the Cantonese reviews are unintelligible for Mandarin Chinese speakers. In this paper, the standard machine learning techniques naive Bayes and SVM are incorporated into the domain of online Cantonese-written restaurant reviews to automatically classify user reviews as positive or negative. The effects of feature presentations and feature sizes on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The naive Bayes classifier achieves accuracy as good as or better than SVM. Character-based bigrams prove to be better features than unigrams and trigrams in capturing Cantonese sentiment orientation.
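The best-performing configuration reported, a naive Bayes classifier over character bigrams, maps directly onto standard scikit-learn components; the two toy reviews below are made-up placeholders, not data from the study:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["好好味，一定再嚟", "服務態度差，唔會再幫襯"]   # placeholder Cantonese reviews
labels = ["positive", "negative"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 2)),  # character bigram features
    MultinomialNB(),
)
clf.fit(reviews, labels)
print(clf.predict(["好味"]))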

Proceedings ArticleDOI
21 Mar 2011
TL;DR: A method for automatic emotion recognition that uses support vector machine (SVM) and largest margin nearest neighbour (LMNN) and compares the results to the pre-computed FERA 2011 emotion challenge baseline.
Abstract: We propose a method for automatic emotion recognition as part of the FERA 2011 competition. The system extracts pyramid of histogram of gradients (PHOG) and local phase quantisation (LPQ) features for encoding the shape and appearance information. For selecting the key frames, K-means clustering is applied to the normalised shape vectors derived from constraint local model (CLM) based face tracking on the image sequences. Shape vectors closest to the cluster centers are then used to extract the shape and appearance features. We demonstrate the results on the SSPNET GEMEP-FERA dataset. It comprises both person-specific and person-independent partitions. For emotion classification we use support vector machine (SVM) and largest margin nearest neighbour (LMNN) and compare our results to the pre-computed FERA 2011 emotion challenge baseline.

Proceedings ArticleDOI
07 Nov 2011
TL;DR: Experimental results indicate that physical features are always among the top features selected by different feature selection methods and the recognition accuracy is generally improved to 90%, or 8% better than when only statistical features are used.
Abstract: Human activity recognition is important for many applications. This paper describes a human activity recognition framework based on feature selection techniques. The objective is to identify the most important features to recognize human activities. We first design a set of new features (called physical features) based on the physical parameters of human motion to augment the commonly used statistical features. To systematically analyze the impact of the physical features on the performance of the recognition system, a single-layer feature selection framework is developed. Experimental results indicate that physical features are always among the top features selected by different feature selection methods and the recognition accuracy is generally improved to 90%, or 8% better than when only statistical features are used. Moreover, we show that the performance is further improved by 3.8% by extending the single-layer framework to a multi-layer framework which takes advantage of the inherent structure of human activities and performs feature selection and classification in a hierarchical manner.
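A rough sketch of the idea: compute a few statistical features plus "physical" ones (e.g. signal magnitude area, movement intensity, tilt) per accelerometer window, then rank them with a standard filter-style selector. The feature definitions and the selector below are illustrative; the paper's full feature set and its single-/multi-layer selection framework are not reproduced.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def window_features(acc):
    # acc: (T, 3) accelerometer window; mix of statistical and physical features.
    mag = np.linalg.norm(acc, axis=1)
    return np.array([
        acc.mean(0).mean(), acc.std(0).mean(),       # statistical features
        np.abs(acc).sum() / len(acc),                # signal magnitude area (physical)
        mag.mean(),                                  # movement intensity (physical)
        np.arctan2(acc[:, 2].mean(),                 # mean tilt angle (physical)
                   np.hypot(acc[:, 0].mean(), acc[:, 1].mean())),
    ])

rng = np.random.default_rng(0)
windows = rng.standard_normal((200, 50, 3))          # 200 toy accelerometer windows
X = np.array([window_features(w) for w in windows])
y = rng.integers(0, 5, 200)                          # 5 toy activity classes
selector = SelectKBest(mutual_info_classif, k=3).fit(X, y)
print(selector.get_support())                        # which features survive selection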

Journal ArticleDOI
TL;DR: This paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches, and derives theoretical results on how to minimize the expected loss under standard 0-1 loss functions.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This paper proposes an method to leverage topics at multiple granularity, which can model the short text more precisely and compared the proposed method with the state-of-the-art baseline over one open data set.
Abstract: Understanding the rapidly growing short text is very important. Short text is different from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more text; the other is to derive latent topics from existing large corpus, which are used as features to enrich the representation of short text. The latter approach is elegant and efficient in most cases. The major trend along this direction is to derive latent topics of certain granularity through well-known topic models such as latent Dirichlet allocation (LDA). However, topics of certain granularity are usually not sufficient to set up effective feature spaces. In this paper, we move forward along this direction by proposing an method to leverage topics at multiple granularity, which can model the short text more precisely. Taking short text classification as an example, we compared our proposed method with the state-of-the-art baseline over one open data set. Our method reduced the classification error by 20.25% and 16.68% respectively on two classifiers.