
Showing papers on "Feature (machine learning) published in 2011"


Journal ArticleDOI
TL;DR: This work proposes a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation and proposes both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components.
Abstract: Domain adaptation allows knowledge from a source domain to be transferred to a different but related target domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we first propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a reproducing kernel Hilbert space using maximum mean discrepancy. In the subspace spanned by these transfer components, data properties are preserved and data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. Furthermore, in order to uncover the knowledge hidden in the relations between the data labels from the source and target domains, we extend TCA in a semisupervised learning setting, which encodes label information into transfer components learning. We call this extension semisupervised TCA. The main contribution of our work is that we propose a novel dimensionality reduction framework for reducing the distance between domains in a latent space for domain adaptation. We propose both unsupervised and semisupervised feature extraction approaches, which can dramatically reduce the distance between domain distributions by projecting data onto the learned transfer components. Finally, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on five toy datasets and two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
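The quantity TCA minimizes between domains is the maximum mean discrepancy (MMD) in an RKHS. As a rough illustration of that criterion only (not the authors' implementation, and with an arbitrary RBF bandwidth), the sketch below estimates the empirical MMD between source and target samples:

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(Xs, Xt, gamma=1.0):
    # Squared empirical maximum mean discrepancy between source Xs and target Xt
    Kss = rbf_kernel(Xs, Xs, gamma)
    Ktt = rbf_kernel(Xt, Xt, gamma)
    Kst = rbf_kernel(Xs, Xt, gamma)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()

Xs = np.random.randn(100, 5)          # toy source samples
Xt = np.random.randn(80, 5) + 0.5     # toy target samples with a distribution shift
print(mmd2(Xs, Xt))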

3,195 citations


Proceedings Article
28 Jun 2011
TL;DR: A deep learning approach is proposed which learns to extract a meaningful representation for each review in an unsupervised fashion and clearly outperforms state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products.
Abstract: The exponential increase in the availability of online reviews and recommendations makes sentiment classification an interesting topic in academic and industrial research. Reviews can span so many different domains that it is difficult to gather annotated training data for all of them. Hence, this paper studies the problem of domain adaptation for sentiment classifiers, whereby a system is trained on labeled reviews from one source domain but is meant to be deployed on another. We propose a deep learning approach which learns to extract a meaningful representation for each review in an unsupervised fashion. Sentiment classifiers trained with this high-level feature representation clearly outperform state-of-the-art methods on a benchmark composed of reviews of 4 types of Amazon products. Furthermore, this method scales well and allowed us to successfully perform domain adaptation on a larger industrial-strength dataset of 22 domains.
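The unsupervised representation in this line of work is typically learned with stacked denoising autoencoders. The single-layer numpy sketch below is a minimal simplification, not the authors' architecture or hyper-parameters: it corrupts bag-of-words inputs with masking noise, learns to reconstruct the clean input, and returns the hidden activations as review features for a downstream sentiment classifier.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=50, noise=0.3, lr=0.1, epochs=20, seed=0):
    # One denoising autoencoder layer: corrupt the input, reconstruct the clean input.
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0, 0.01, (d, n_hidden)); b = np.zeros(n_hidden)
    V = rng.normal(0, 0.01, (n_hidden, d)); c = np.zeros(d)
    for _ in range(epochs):
        X_noisy = X * (rng.random(X.shape) > noise)        # masking noise
        H = sigmoid(X_noisy @ W + b)                       # hidden code
        R = sigmoid(H @ V + c)                             # reconstruction
        dR = (R - X) * R * (1 - R)                         # squared-error backprop
        dH = (dR @ V.T) * H * (1 - H)
        V -= lr * (H.T @ dR) / n; c -= lr * dR.mean(axis=0)
        W -= lr * (X_noisy.T @ dH) / n; b -= lr * dH.mean(axis=0)
    return lambda Xnew: sigmoid(np.asarray(Xnew, float) @ W + b)   # feature extractor

rng = np.random.default_rng(1)
reviews_bow = (rng.random((200, 100)) < 0.05).astype(float)  # toy binary bag-of-words
encode = train_dae(reviews_bow)
features = encode(reviews_bow[:10])   # high-level features for a sentiment classifier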

1,769 citations


Proceedings ArticleDOI
03 Oct 2011
TL;DR: The “German Traffic Sign Recognition Benchmark” is a multi-category classification competition held at IJCNN 2011, and a comprehensive, lifelike dataset of more than 50,000 traffic sign images has been collected.
Abstract: The “German Traffic Sign Recognition Benchmark” is a multi-category classification competition held at IJCNN 2011. Automatic recognition of traffic signs is required in advanced driver assistance systems and constitutes a challenging real-world computer vision and pattern recognition problem. A comprehensive, lifelike dataset of more than 50,000 traffic sign images has been collected. It reflects the strong variations in visual appearance of signs due to distance, illumination, weather conditions, partial occlusions, and rotations. The images are complemented by several precomputed feature sets to allow for applying machine learning algorithms without background knowledge in image processing. The dataset comprises 43 classes with unbalanced class frequencies. Participants have to classify two test sets of more than 12,500 images each. Here, the results on the first of these sets, which was used in the first evaluation stage of the two-fold challenge, are reported. The methods employed by the participants who achieved the best results are briefly described and compared to human traffic sign recognition performance and baseline results.

902 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented, which learns a single over-complete dictionary and an optimal linear classifier jointly and yields dictionaries so that feature points with the same class labels have similar sparse codes.
Abstract: A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistent constraint called ‘discriminative sparse-code error’ and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single over-complete dictionary and an optimal linear classifier jointly. It yields dictionaries so that feature points with the same class labels have similar sparse codes. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse coding techniques for face and object category recognition under the same learning conditions.
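The unified objective can be handled with K-SVD because the reconstruction, discriminative sparse-code, and classification error terms stack into a single dictionary learning problem on [Y; sqrt(alpha) Q; sqrt(beta) H], which jointly recovers D, the label-consistency transform A, and the classifier W. The sketch below shows only that stacking trick; scikit-learn's DictionaryLearning is used as a stand-in for K-SVD, and the toy data, weights, and atom-to-class assignment in Q are illustrative assumptions.

import numpy as np
from sklearn.decomposition import DictionaryLearning

def lc_ksvd_stack(Y, Q, H, alpha=1.0, beta=1.0):
    # Y: signals (d x N), Q: discriminative sparse-code targets (K x N),
    # H: one-hot labels (C x N). Stacking makes the joint objective a plain
    # dictionary learning problem.
    return np.vstack([Y, np.sqrt(alpha) * Q, np.sqrt(beta) * H])

d, N, K, C = 20, 100, 15, 3
rng = np.random.default_rng(0)
Y = rng.standard_normal((d, N))
labels = rng.integers(0, C, N)
H = np.eye(C)[labels].T                                  # C x N one-hot labels
Q = np.repeat(np.eye(C), K // C, axis=0)[:, labels]      # atoms 0-4 -> class 0, etc.

Z = lc_ksvd_stack(Y, Q, H)
# DictionaryLearning stands in for K-SVD (the authors use K-SVD proper).
learner = DictionaryLearning(n_components=K, transform_algorithm="omp",
                             transform_n_nonzero_coefs=5, random_state=0)
X = learner.fit_transform(Z.T).T                         # sparse codes, K x N
D_stacked = learner.components_.T                        # (d + K + C) x K
D, A, W = D_stacked[:d], D_stacked[d:d + K], D_stacked[d + K:]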

780 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words approach to handle noisy observations, and reliably recognizes ongoing activities from streaming videos with high accuracy.
Abstract: In this paper, we present a novel approach of human activity prediction. Human activity prediction is a probabilistic process of inferring ongoing activities from videos only containing onsets (i.e. the beginning part) of the activities. The goal is to enable early recognition of unfinished activities as opposed to the after-the-fact classification of completed activities. Activity prediction methodologies are particularly necessary for surveillance systems which are required to prevent crimes and dangerous activities from occurring. We probabilistically formulate the activity prediction problem, and introduce new methodologies designed for the prediction. We represent an activity as an integral histogram of spatio-temporal features, efficiently modeling how feature distributions change over time. A new recognition methodology named dynamic bag-of-words is developed, which considers the sequential nature of human activities while maintaining the advantages of the bag-of-words approach to handle noisy observations. Our experiments confirm that our approach reliably recognizes ongoing activities from streaming videos with high accuracy.
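The integral histogram behind this representation is simply a running (prefix) sum of per-frame visual-word histograms, so the feature distribution over any observed time interval falls out as a difference of two prefix sums. A minimal sketch of that bookkeeping (feature extraction and the prediction model itself are omitted, and the visual-word ids are made up):

import numpy as np

def integral_histogram(frame_words, vocab_size):
    # frame_words[t] = list of visual-word ids (< vocab_size) detected in frame t
    T = len(frame_words)
    hist = np.zeros((T + 1, vocab_size))
    for t, words in enumerate(frame_words):
        hist[t + 1] = hist[t] + np.bincount(words, minlength=vocab_size)
    return hist                      # hist[t] = histogram of frames [0, t)

def interval_histogram(ihist, start, end):
    # Histogram of visual words in frames [start, end), in O(vocab_size) time
    return ihist[end] - ihist[start]

frames = [[0, 2, 2], [1], [0, 3, 3, 3]]          # toy per-frame visual words
ih = integral_histogram(frames, vocab_size=4)
print(interval_histogram(ih, 1, 3))              # words occurring in frames 1 and 2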

617 citations


Book ChapterDOI
17 Mar 2011
TL;DR: This article surveys some representative link prediction methods by categorizing them by the type of models, largely considering three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model, and second, the probabilistic approaches which model the joint-probability among the entities in a network by Bayesian graphical models.
Abstract: Link prediction is an important task for analyzing social networks which also has applications in other domains like information retrieval, bioinformatics and e-commerce. There exist a variety of techniques for link prediction, ranging from feature-based classification and kernel-based methods to matrix factorization and probabilistic graphical models. These methods differ from each other with respect to model complexity, prediction performance, scalability, and generalization ability. In this article, we survey some representative link prediction methods by categorizing them by the type of the models. We largely consider three types of models: first, the traditional (non-Bayesian) models which extract a set of features to train a binary classification model; second, the probabilistic approaches which model the joint probability among the entities in a network by Bayesian graphical models; and, finally, the linear algebraic approach which computes the similarity between the nodes in a network by rank-reduced similarity matrices. We discuss various existing link prediction models that fall in these broad categories and analyze their strengths and weaknesses. We conclude the survey with a discussion on recent developments and future research directions.

566 citations


Journal ArticleDOI
TL;DR: This paper investigates a simple but powerful approach to make robust use of HOG features for face recognition by proposing to extract HOG descriptors from a regular grid and identifying the necessity of performing dimensionality reduction to remove noise and make the classification process less prone to overfitting.

553 citations


Journal ArticleDOI
TL;DR: This paper makes a comparative study of the effectiveness of ensemble techniques for sentiment classification, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure.

543 citations


Proceedings ArticleDOI
18 Sep 2011
TL;DR: This paper applies large-scale algorithms for learning the features automatically from unlabeled data to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.
Abstract: Reading text from photographs is a challenging problem that has received a significant amount of attention. Two key components of most systems are (i) text detection from images and (ii) character recognition, and many recent methods have been proposed to design better feature representations and models for both. In this paper, we apply methods recently developed in machine learning -- specifically, large-scale algorithms for learning the features automatically from unlabeled data -- and show that they allow us to construct highly effective classifiers for both detection and recognition to be used in a high accuracy end-to-end system.

402 citations


Proceedings Article
12 Dec 2011
TL;DR: An algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident, and is named CODA (Co-training for domain adaptation).
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly outperforms the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.

402 citations


Proceedings ArticleDOI
06 Nov 2011
TL;DR: This work proposes a method for recognizing attributes, such as the gender, hair style and types of clothes of people under large variation in viewpoint, pose, articulation and occlusion typical of personal photo album images, using a part-based approach based on poselets.
Abstract: We propose a method for recognizing attributes, such as the gender, hair style and types of clothes of people under large variation in viewpoint, pose, articulation and occlusion typical of personal photo album images. Robust attribute classifiers under such conditions must be invariant to pose, but inferring the pose in itself is a challenging problem. We use a part-based approach based on poselets. Our parts implicitly decompose the aspect (the pose and viewpoint). We train attribute classifiers for each such aspect and we combine them together in a discriminative model. We propose a new dataset of 8000 people with annotated attributes. Our method performs very well on this dataset, significantly outperforming a baseline built on the spatial pyramid match kernel method. On gender recognition we outperform a commercial face recognition system.

Journal ArticleDOI
TL;DR: The results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB.
Abstract: This paper proposes to use exemplar-based sparse representations for noise robust automatic speech recognition. First, we describe how speech can be modeled as a linear combination of a small number of exemplars from a large speech exemplar dictionary. The exemplars are time-frequency patches of real speech, each spanning multiple time frames. We then propose to model speech corrupted by additive noise as a linear combination of noise and speech exemplars, and we derive an algorithm for recovering this sparse linear combination of exemplars from the observed noisy speech. We describe how the framework can be used for doing hybrid exemplar-based/HMM recognition by using the exemplar-activations together with the phonetic information associated with the exemplars. As an alternative to hybrid recognition, the framework also allows us to take a source separation approach which enables exemplar-based feature enhancement as well as missing data mask estimation. We evaluate the performance of these exemplar-based methods in connected digit recognition on the AURORA-2 database. Our results show that the hybrid system performed substantially better than source separation or missing data mask estimation at lower signal-to-noise ratios (SNRs), achieving up to 57.1% accuracy at SNR = -5 dB. Although not as effective as two baseline recognizers at higher SNRs, the novel approach offers a promising direction of future research on exemplar-based ASR.
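At the heart of the framework is finding nonnegative activations of a joint [speech | noise] exemplar dictionary that explain the observed noisy spectro-temporal patch; the speech part of the reconstruction then drives feature enhancement or mask estimation. The authors use sparsity-penalized multiplicative updates, whereas the sketch below substitutes SciPy's plain non-negative least squares as a simpler stand-in, with random toy exemplars:

import numpy as np
from scipy.optimize import nnls

def decompose(noisy, speech_exemplars, noise_exemplars):
    # Stack speech and noise exemplars into one dictionary (columns = exemplars)
    A = np.hstack([speech_exemplars, noise_exemplars])
    x, _ = nnls(A, noisy)                        # nonnegative exemplar activations
    n_s = speech_exemplars.shape[1]
    speech_part = speech_exemplars @ x[:n_s]     # reconstructed clean-speech part
    noise_part = noise_exemplars @ x[n_s:]       # reconstructed noise part
    return x, speech_part, noise_part

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((40, 30)))        # 30 toy speech exemplars
N = np.abs(rng.standard_normal((40, 10)))        # 10 toy noise exemplars
observed = 0.8 * S[:, 3] + 0.5 * N[:, 2]         # toy noisy observation
acts, s_hat, n_hat = decompose(observed, S, N)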

Proceedings ArticleDOI
03 Oct 2011
TL;DR: This work describes the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate, and obtains an even better recognition rate by further training the nets.
Abstract: We describe the approach that won the preliminary phase of the German traffic sign recognition benchmark with a better-than-human recognition rate of 98.98%. We obtain an even better recognition rate of 99.15% by further training the nets. Our fast, fully parameterizable GPU implementation of a Convolutional Neural Network does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way. A CNN/MLP committee further boosts recognition performance.

Proceedings Article
28 Jun 2011
TL;DR: This paper formulates the problem of multi-task learning of shared feature representations among tasks, while simultaneously determining "with whom" each task should share, as a mixed integer program and provides an alternating minimization technique to solve the optimization problem of jointly identifying grouping structures and parameters.
Abstract: In multi-task learning (MTL), multiple tasks are learnt jointly. A major assumption for this paradigm is that all those tasks are indeed related so that the joint training is appropriate and beneficial. In this paper, we study the problem of multi-task learning of shared feature representations among tasks, while simultaneously determining "with whom" each task should share. We formulate the problem as a mixed integer program and provide an alternating minimization technique to solve the optimization problem of jointly identifying grouping structures and parameters. The algorithm monotonically decreases the objective function and converges to a local optimum. Compared to the standard MTL paradigm where all tasks are in a single group, our algorithm improves its performance with statistical significance for three out of the four datasets we have studied. We also demonstrate its advantage over other task grouping techniques investigated in the literature.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches by reducing the modality gap at the feature extraction stage.
Abstract: Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to the randomized forest to further boost the performance. We create the largest face sketch database including sketches of 1,194 people from the FERET database. Experiments on this large scale dataset show that our approach significantly outperforms the state-of-the-art methods.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: A home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor, and two multi-modality fusion schemes, which naturally combine color and depth information.
Abstract: In this paper, we present a home-monitoring oriented human activity recognition benchmark database, based on the combination of a color video camera and a depth sensor. Our contributions are two-fold: 1) We have created a publicly releasable human activity video database (i.e., named as RGBD-HuDaAct), which contains synchronized color-depth video streams, for the task of human daily activity recognition. This database aims at encouraging more research efforts on human activity recognition based on multi-modality sensor combination (e.g., color plus depth). 2) Two multi-modality fusion schemes, which naturally combine color and depth information, have been developed from two state-of-the-art feature representation methods for action recognition, i.e., spatio-temporal interest points (STIPs) and motion history images (MHIs). These depth-extended feature representation methods are evaluated comprehensively and superior recognition performances over their uni-modality (e.g., color only) counterparts are demonstrated.

Journal ArticleDOI
TL;DR: A new human face recognition algorithm is presented based on bidirectional two-dimensional principal component analysis (B2DPCA) and extreme learning machine (ELM); the subband that exhibits the maximum standard deviation is dimensionally reduced using an improved dimensionality reduction technique.

Journal ArticleDOI
TL;DR: Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration.
Abstract: This paper describes how to incorporate sampled curvature information in a Newton-CG method and in a limited memory quasi-Newton method for statistical learning. The motivation for this work stems from supervised machine learning applications involving a very large number of training points. We follow a batch approach, also known in the stochastic optimization literature as a sample average approximation approach. Curvature information is incorporated in two subsampled Hessian algorithms, one based on a matrix-free inexact Newton iteration and one on a preconditioned limited memory BFGS iteration. A crucial feature of our technique is that Hessian-vector multiplications are carried out with a significantly smaller sample size than is used for the function and gradient. The efficiency of the proposed methods is illustrated using a machine learning application involving speech recognition.
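The point of the matrix-free variant is that conjugate gradient only ever needs Hessian-vector products, and those can be formed on a much smaller subsample than the gradient. A sketch of one such inexact Newton step for L2-regularized logistic regression (an illustrative objective, not the paper's speech-recognition model):

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_cg_step(w, X, y, lam=1e-2, hess_sample=200, seed=None):
    n = X.shape[0]
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n + lam * w                 # gradient on the full batch

    # Hessian-vector products use only a small random subsample of the data.
    idx = np.random.default_rng(seed).choice(n, hess_sample, replace=False)
    Xs = X[idx]
    d = sigmoid(Xs @ w) * (1 - sigmoid(Xs @ w))
    def hv(v):
        return Xs.T @ (d * (Xs @ v)) / hess_sample + lam * v

    H = LinearOperator((w.size, w.size), matvec=hv)
    step, _ = cg(H, -grad, maxiter=20)                 # inexact Newton direction
    return w + step

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 20))
y = (X @ rng.standard_normal(20) + 0.1 * rng.standard_normal(5000) > 0).astype(float)
w = np.zeros(20)
for _ in range(5):
    w = newton_cg_step(w, X, y)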

Journal ArticleDOI
TL;DR: The present study investigates the neural code of facial identity perception with the aim of ascertaining its distributed nature and informational basis, and uses a sequence of multivariate pattern analyses applied to functional magnetic resonance imaging (fMRI) data to map out and characterize a cortical system responsible for individuation.
Abstract: Face individuation is one of the most impressive achievements of our visual system, and yet uncovering the neural mechanisms subserving this feat appears to elude traditional approaches to functional brain data analysis. The present study investigates the neural code of facial identity perception with the aim of ascertaining its distributed nature and informational basis. To this end, we use a sequence of multivariate pattern analyses applied to functional magnetic resonance imaging (fMRI) data. First, we combine information-based brain mapping and dynamic discrimination analysis to locate spatiotemporal patterns that support face classification at the individual level. This analysis reveals a network of fusiform and anterior temporal areas that carry information about facial identity and provides evidence that the fusiform face area responds with distinct patterns of activation to different face identities. Second, we assess the information structure of the network using recursive feature elimination. We find that diagnostic information is distributed evenly among anterior regions of the mapped network and that a right anterior region of the fusiform gyrus plays a central role within the information network mediating face individuation. These findings serve to map out and characterize a cortical system responsible for individuation. More generally, in the context of functionally defined networks, they provide an account of distributed processing grounded in information-based architectures.
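The recursive feature elimination used in the second analysis repeatedly trains a classifier and drops the least informative features (here, voxels); scikit-learn's RFE wrapped around a linear SVM gives the basic flavour, with random stand-in data and none of the study's preprocessing or cross-validation:

import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 500))        # 60 trials x 500 voxels (toy stand-in)
y = rng.integers(0, 2, 60)                # two face identities

selector = RFE(SVC(kernel="linear"), n_features_to_select=50, step=0.1)
selector.fit(X, y)
informative_voxels = np.where(selector.support_)[0]   # voxels surviving elimination
print(selector.ranking_[:10])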

Proceedings ArticleDOI
01 Dec 2011
TL;DR: This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.
Abstract: To date, there has been limited work in applying Deep Belief Networks (DBNs) for acoustic modeling in LVCSR tasks, with past work using standard speech features. However, a typical LVCSR system makes use of both feature and model-space speaker adaptation and discriminative training. This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task. In addition, we provide a recipe for data parallelization of DBN training, showing that data parallelization can provide linear speed-up in the number of machines, without impacting WER.

Proceedings ArticleDOI
09 Feb 2011
TL;DR: This paper models the grouping of synonym product features in sentiment analysis of product reviews as a semi-supervised learning problem, exploiting lexical characteristics to automatically identify some labeled examples, and the proposed method outperforms existing state-of-the-art methods.
Abstract: In sentiment analysis of product reviews, one important problem is to produce a summary of opinions based on product features/attributes (also called aspects). However, for the same feature, people can express it with many different words or phrases. To produce a useful summary, these words and phrases, which are domain synonyms, need to be grouped under the same feature group. Although several methods have been proposed to extract product features from reviews, limited work has been done on clustering or grouping of synonym features. This paper focuses on this task. Classic methods for solving this problem are based on unsupervised learning using some forms of distributional similarity. However, we found that these methods do not do well. We then model it as a semi-supervised learning problem. Lexical characteristics of the problem are exploited to automatically identify some labeled examples. Empirical evaluation shows that the proposed method outperforms existing state-of-the-art methods by a large margin.

Book ChapterDOI
05 Sep 2011
TL;DR: This work proposes to employ "oblique" random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF.
Abstract: In his original paper on random forests, Breiman proposed two different decision tree ensembles: one generated from "orthogonal" trees with thresholds on individual features in every split, and one from "oblique" trees separating the feature space by randomly oriented hyperplanes. In spite of a rising interest in the random forest framework, however, ensembles built from orthogonal trees (RF) have gained most, if not all, attention so far. In the present work we propose to employ "oblique" random forests (oRF) built from multivariate trees which explicitly learn optimal split directions at internal nodes using linear discriminative models, rather than using random coefficients as in the original oRF. This oRF outperforms RF, as well as other classifiers, on nearly all data sets but those with discrete factorial features. Learned node models perform distinctively better than random splits. An oRF feature importance score proves preferable to standard RF feature importance scores such as Gini or permutation importance. The topology of the oRF decision space appears to be smoother and better adapted to the data, resulting in improved generalization performance. Overall, the oRF proposed here may be preferred over standard RF on most learning tasks involving numerical and spectral data.
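The difference from a standard random forest sits entirely in the node-splitting rule: instead of thresholding one feature, each internal node fits a linear discriminative model on a random feature subset and routes samples by the sign of its decision function. A minimal sketch of one such split, using ridge regression as the node model (one of several linear learners one could plug in; the surrounding bagging and tree recursion are omitted):

import numpy as np
from sklearn.linear_model import RidgeClassifier

def oblique_split(X, y, n_feat=None, seed=0):
    # Learn an oblique (multivariate) split: a hyperplane from a linear model
    # fitted on a random feature subset, instead of a single-feature threshold.
    rng = np.random.default_rng(seed)
    n_feat = n_feat or max(1, int(np.sqrt(X.shape[1])))
    feats = rng.choice(X.shape[1], n_feat, replace=False)
    model = RidgeClassifier().fit(X[:, feats], y)
    side = model.decision_function(X[:, feats]) > 0      # route samples left/right
    return feats, model, side

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
feats, node_model, goes_right = oblique_split(X, y)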

Journal ArticleDOI
TL;DR: A new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS, is presented, which consistently performs better at selecting subsets of salient features and thereby yields better classification accuracies.

Proceedings ArticleDOI
09 May 2011
TL;DR: This work defines a view-to-object distance where a novel view is compared simultaneously to all views of a previous object, and shows that this measure leads to superior classification performance on object category and instance recognition.
Abstract: In this work we address joint object category and instance recognition in the context of RGB-D (depth) cameras. Motivated by local distance learning, where a novel view of an object is compared to individual views of previously seen objects, we define a view-to-object distance where a novel view is compared simultaneously to all views of a previous object. This novel distance is based on a weighted combination of feature differences between views. We show, through jointly learning per-view weights, that this measure leads to superior classification performance on object category and instance recognition. More importantly, the proposed distance allows us to find a sparse solution via Group-Lasso regularization, where a small subset of representative views of an object is identified and used, with the rest discarded. This significantly reduces computational cost without compromising recognition accuracy. We evaluate the proposed technique, Instance Distance Learning (IDL), on the RGB-D Object Dataset, which consists of 300 object instances in 51 everyday categories and about 250,000 views of objects with both RGB color and depth. We empirically compare IDL to several alternative state-of-the-art approaches and also validate the use of visual and shape cues and their combination.

Proceedings ArticleDOI
20 Jun 2011
TL;DR: Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
Abstract: Human age estimation has recently become an active research topic in computer vision and pattern recognition, because of many potential applications in reality. In this paper we propose to use the kernel partial least squares (KPLS) regression for age estimation. The KPLS (or linear PLS) method has several advantages over previous approaches: (1) the KPLS can reduce feature dimensionality and learn the aging function simultaneously in a single learning framework, instead of performing each task separately using different techniques; (2) the KPLS can find a small number of latent variables, e.g., 20, to project thousands of features into a very low-dimensional subspace, which may have great impact on real-time applications; and (3) the KPLS regression has an output vector that can contain multiple labels, so that several related problems, e.g., age estimation, gender classification, and ethnicity estimation can be solved altogether. This is the first time that the kernel PLS method is introduced and applied to solve a regression problem in computer vision with high accuracy. Experimental results on a very large database show that the KPLS is significantly better than the popular SVM method, and outperforms the state-of-the-art approaches in human age estimation.
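Scikit-learn ships linear PLS but not kernel PLS; one common rough approximation of the kernel variant is to run linear PLS on an RBF kernel matrix (the empirical kernel map). The sketch below does exactly that with about 20 latent variables as mentioned in the abstract; the kernel choice, bandwidth, and toy data are assumptions, and this is not the authors' exact algorithm:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 1000))           # toy high-dimensional face features
ages = rng.uniform(10, 70, 300)                      # toy age labels

K_train = rbf_kernel(X_train, X_train, gamma=1e-3)   # empirical kernel map
pls = PLSRegression(n_components=20)                 # ~20 latent variables
pls.fit(K_train, ages)

X_test = rng.standard_normal((5, 1000))
K_test = rbf_kernel(X_test, X_train, gamma=1e-3)
predicted_ages = pls.predict(K_test).ravel()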

Journal ArticleDOI
TL;DR: Standard machine learning techniques naive Bayes and SVM are incorporated into the domain of online Cantonese-written restaurant reviews to automatically classify user reviews as positive or negative, finding that accuracy is influenced by interaction between the classification models and the feature options.
Abstract: Research highlights: Naive Bayes and SVM are used for Cantonese sentiment classification; accuracy is influenced by the interaction between classification models and features; the naive Bayes classifier achieves accuracy as good as or better than SVM; character-based bigrams are better features than unigrams and trigrams in capturing Cantonese sentiment. Cantonese is an important dialect in some regions of Southern China. Local online users often represent their opinions and experiences on the web with written Cantonese. Although the information in those reviews is valuable to potential consumers and sellers, the huge amount of web reviews makes it difficult to give an unbiased evaluation of a product, and the Cantonese reviews are unintelligible for Mandarin Chinese speakers. In this paper, the standard machine learning techniques naive Bayes and SVM are incorporated into the domain of online Cantonese-written restaurant reviews to automatically classify user reviews as positive or negative. The effects of feature presentations and feature sizes on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The naive Bayes classifier achieves accuracy as good as or better than SVM. Character-based bigrams prove to be better features than unigrams and trigrams in capturing Cantonese sentiment orientation.
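The best-performing configuration reported, a naive Bayes classifier over character bigrams, maps directly onto standard scikit-learn components; the two toy reviews below are made-up placeholders, not data from the study:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = ["好好味，一定再嚟", "服務態度差，唔會再幫襯"]   # placeholder Cantonese reviews
labels = ["positive", "negative"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 2)),  # character bigram features
    MultinomialNB(),
)
clf.fit(reviews, labels)
print(clf.predict(["好味"]))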

Proceedings ArticleDOI
21 Mar 2011
TL;DR: A method for automatic emotion recognition that uses support vector machine (SVM) and largest margin nearest neighbour (LMNN) and compares the results to the pre-computed FERA 2011 emotion challenge baseline.
Abstract: We propose a method for automatic emotion recognition as part of the FERA 2011 competition. The system extracts pyramid of histogram of gradients (PHOG) and local phase quantisation (LPQ) features for encoding the shape and appearance information. For selecting the key frames, K-means clustering is applied to the normalised shape vectors derived from constraint local model (CLM) based face tracking on the image sequences. Shape vectors closest to the cluster centers are then used to extract the shape and appearance features. We demonstrate the results on the SSPNET GEMEP-FERA dataset. It comprises both person-specific and person-independent partitions. For emotion classification we use support vector machine (SVM) and largest margin nearest neighbour (LMNN) and compare our results to the pre-computed FERA 2011 emotion challenge baseline.

Proceedings ArticleDOI
07 Nov 2011
TL;DR: Experimental results indicate that physical features are always among the top features selected by different feature selection methods and the recognition accuracy is generally improved to 90%, or 8% better than when only statistical features are used.
Abstract: Human activity recognition is important for many applications. This paper describes a human activity recognition framework based on feature selection techniques. The objective is to identify the most important features to recognize human activities. We first design a set of new features (called physical features) based on the physical parameters of human motion to augment the commonly used statistical features. To systematically analyze the impact of the physical features on the performance of the recognition system, a single-layer feature selection framework is developed. Experimental results indicate that physical features are always among the top features selected by different feature selection methods and the recognition accuracy is generally improved to 90%, or 8% better than when only statistical features are used. Moreover, we show that the performance is further improved by 3.8% by extending the single-layer framework to a multi-layer framework which takes advantage of the inherent structure of human activities and performs feature selection and classification in a hierarchical manner.
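A rough sketch of the idea: compute a few statistical features plus "physical" ones (e.g. signal magnitude area, movement intensity, tilt) per accelerometer window, then rank them with a standard filter-style selector. The feature definitions and the selector below are illustrative; the paper's full feature set and its single-/multi-layer selection framework are not reproduced.

import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def window_features(acc):
    # acc: (T, 3) accelerometer window; mix of statistical and physical features.
    mag = np.linalg.norm(acc, axis=1)
    return np.array([
        acc.mean(0).mean(), acc.std(0).mean(),       # statistical features
        np.abs(acc).sum() / len(acc),                # signal magnitude area (physical)
        mag.mean(),                                  # movement intensity (physical)
        np.arctan2(acc[:, 2].mean(),                 # mean tilt angle (physical)
                   np.hypot(acc[:, 0].mean(), acc[:, 1].mean())),
    ])

rng = np.random.default_rng(0)
windows = rng.standard_normal((200, 50, 3))          # 200 toy accelerometer windows
X = np.array([window_features(w) for w in windows])
y = rng.integers(0, 5, 200)                          # 5 toy activity classes
selector = SelectKBest(mutual_info_classif, k=3).fit(X, y)
print(selector.get_support())                        # which features survive selection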

Journal ArticleDOI
TL;DR: This paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches, and derives theoretical results on how to minimize the expected loss under standard 0-1 loss functions.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This paper proposes an method to leverage topics at multiple granularity, which can model the short text more precisely and compared the proposed method with the state-of-the-art baseline over one open data set.
Abstract: Understanding the rapidly growing short text is very important. Short text is different from traditional documents in its shortness and sparsity, which hinders the application of conventional machine learning and text mining algorithms. Two major approaches have been exploited to enrich the representation of short text. One is to fetch contextual information of a short text to directly add more text; the other is to derive latent topics from existing large corpus, which are used as features to enrich the representation of short text. The latter approach is elegant and efficient in most cases. The major trend along this direction is to derive latent topics of certain granularity through well-known topic models such as latent Dirichlet allocation (LDA). However, topics of certain granularity are usually not sufficient to set up effective feature spaces. In this paper, we move forward along this direction by proposing an method to leverage topics at multiple granularity, which can model the short text more precisely. Taking short text classification as an example, we compared our proposed method with the state-of-the-art baseline over one open data set. Our method reduced the classification error by 20.25% and 16.68% respectively on two classifiers.