
Showing papers on "Support vector machine published in 2013"


Journal ArticleDOI
TL;DR: This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem, and introduces a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel.
Abstract: To date, almost all experimental evaluations of machine learning-based recognition algorithms in computer vision have taken the form of “closed set” recognition, whereby all testing classes are known at training time. A more realistic scenario for vision applications is “open set” recognition, where incomplete knowledge of the world is present at training time, and unknown classes can be submitted to an algorithm during testing. This paper explores the nature of open set recognition and formalizes its definition as a constrained minimization problem. The open set recognition problem is not well addressed by existing algorithms because it requires strong generalization. As a step toward a solution, we introduce a novel “1-vs-set machine,” which sculpts a decision space from the marginal distances of a 1-class or binary SVM with a linear kernel. This methodology applies to several different applications in computer vision where open set recognition is a challenging problem, including object recognition and face verification. We consider both in this work, with large scale cross-dataset experiments performed over the Caltech 256 and ImageNet sets, as well as face matching experiments performed over the Labeled Faces in the Wild set. The experiments highlight the effectiveness of machines adapted for open set evaluation compared to existing 1-class and binary SVMs for the same tasks.

1,029 citations
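The open-set intuition behind the 1-vs-set machine, bounding the accept region from both sides rather than accepting every score above a single threshold, can be sketched as follows. This is an illustrative simplification, not the authors' exact constrained minimization; the thresholds and toy scores are assumed values:

```python
import numpy as np

def one_vs_set_predict(scores, t_lo, t_hi):
    """Label a sample positive only if its SVM decision score falls in
    the slab [t_lo, t_hi]; everything outside is rejected as unknown.
    This mimics the open-set idea of sculpting the decision space with
    a second hyperplane instead of a single open-ended threshold."""
    scores = np.asarray(scores)
    return np.where((scores >= t_lo) & (scores <= t_hi), 1, -1)

# Toy scores: implausibly large scores may come from unknown classes
# far from the training data, so they are rejected as well.
labels = one_vs_set_predict([0.5, 1.2, 5.0, -0.3], t_lo=0.0, t_hi=2.0)
```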


Journal Article
TL;DR: In this article, a convergence analysis of stochastic dual coordinate ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
Abstract: Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to its strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it has so far lacked a good convergence analysis. This paper presents a new analysis of Stochastic Dual Coordinate Ascent (SDCA) showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD. This analysis justifies the effectiveness of SDCA for practical applications.

986 citations
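The SDCA coordinate step for the hinge loss has a closed form, which is what makes the method so cheap per iteration. Below is a minimal single-thread sketch with assumed toy data, not the paper's algorithm or analysis:

```python
import numpy as np

def sdca_hinge(X, y, lam=0.01, epochs=50, seed=0):
    """Minimal SDCA sketch for the L2-regularized hinge-loss SVM
    (1/n) sum_i max(0, 1 - y_i w.x_i) + (lam/2) ||w||^2.
    Each step maximizes the dual in one variable alpha_i in [0, 1] in
    closed form, keeping w = (1/(lam*n)) sum_i alpha_i y_i x_i in sync."""
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * X[i].dot(w)
            # closed-form coordinate maximizer, clipped to the box [0, 1]
            a_new = np.clip((1.0 - margin) * lam * n / X[i].dot(X[i]) + alpha[i], 0.0, 1.0)
            w += (a_new - alpha[i]) * y[i] * X[i] / (lam * n)
            alpha[i] = a_new
    return w

# Toy separable data: the learned w should classify every point correctly.
X = np.array([[2.0, 0.0], [1.0, 0.5], [-2.0, 0.0], [-1.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = sdca_hinge(X, y)
```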


Journal ArticleDOI
TL;DR: The experimental results show that the two PSO-based multi-objective algorithms can automatically evolve a set of nondominated solutions and the first algorithm outperforms the two conventional methods, the single objective method, and the two-stage algorithm.
Abstract: Classification problems often have a large number of features in the data sets, but not all of them are useful for classification. Irrelevant and redundant features may even reduce the performance. Feature selection aims to choose a small number of relevant features to achieve similar or even better classification performance than using all features. It has two main conflicting objectives of maximizing the classification performance and minimizing the number of features. However, most existing feature selection algorithms treat the task as a single objective problem. This paper presents the first study on multi-objective particle swarm optimization (PSO) for feature selection. The task is to generate a Pareto front of nondominated solutions (feature subsets). We investigate two PSO-based multi-objective feature selection algorithms. The first algorithm introduces the idea of nondominated sorting into PSO to address feature selection problems. The second algorithm applies the ideas of crowding, mutation, and dominance to PSO to search for the Pareto front solutions. The two multi-objective algorithms are compared with two conventional feature selection methods, a single objective feature selection method, a two-stage feature selection algorithm, and three well-known evolutionary multi-objective algorithms on 12 benchmark data sets. The experimental results show that the two PSO-based multi-objective algorithms can automatically evolve a set of nondominated solutions. The first algorithm outperforms the two conventional methods, the single objective method, and the two-stage algorithm. It achieves comparable results with the existing three well-known multi-objective algorithms in most cases. The second algorithm achieves better results than the first algorithm and all other methods mentioned previously.

855 citations
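The Pareto front of nondominated feature subsets that these algorithms evolve reduces to a simple dominance check between objective pairs. A minimal sketch with hypothetical (error, feature-count) values, both minimized:

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly
    better in at least one (both objectives are minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the nondominated points: the trade-off set between
    classification error and number of selected features."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (error, n_features) pairs for four hypothetical feature subsets;
# (0.25, 4) is dominated by (0.20, 3) and drops out of the front.
candidates = [(0.10, 5), (0.20, 3), (0.30, 2), (0.25, 4)]
front = pareto_front(candidates)
```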


Proceedings Article
05 Dec 2013
TL;DR: In this paper, a simple unbiased estimator of any loss is provided, and performance bounds for empirical risk minimization in the presence of iid data with noisy labels are obtained, leading to an efficient algorithm for empirical minimization.
Abstract: In this paper, we theoretically study the problem of binary classification in the presence of random classification noise—the learner, instead of seeing the true labels, sees labels that have independently been flipped with some small probability. Moreover, random label noise is class-conditional— the flip probability depends on the class. We provide two approaches to suitably modify any given surrogate loss function. First, we provide a simple unbiased estimator of any loss, and obtain performance bounds for empirical risk minimization in the presence of iid data with noisy labels. If the loss function satisfies a simple symmetry condition, we show that the method leads to an efficient algorithm for empirical minimization. Second, by leveraging a reduction of risk minimization under noisy labels to classification with weighted 0-1 loss, we suggest the use of a simple weighted surrogate loss, for which we are able to obtain strong empirical risk bounds. This approach has a very remarkable consequence — methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant. On a synthetic non-separable dataset, our methods achieve over 88% accuracy even when 40% of the labels are corrupted, and are competitive with respect to recently proposed methods for dealing with label noise in several benchmark datasets.

815 citations
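The first approach, the unbiased loss estimator, has a short closed form: given the class-conditional flip rates, a weighted combination of the loss on the noisy label and on its negation is, in expectation over the noise, exactly the loss on the clean label. The sketch below checks this defining property numerically; the flip rates and score are assumed toy values:

```python
def unbiased_loss(loss, t, y, rho_pos, rho_neg):
    """Corrected loss evaluated on an observed (possibly noisy) label y,
    with rho_pos = P(flip | clean y = +1), rho_neg = P(flip | clean y = -1).
    Its expectation over the label noise equals loss(t, clean_y)."""
    rho_y = rho_pos if y == 1 else rho_neg       # flip rate of the observed class
    rho_other = rho_neg if y == 1 else rho_pos   # flip rate of the opposite class
    return ((1 - rho_other) * loss(t, y) - rho_y * loss(t, -y)) / (1 - rho_pos - rho_neg)

hinge = lambda t, y: max(0.0, 1.0 - y * t)
rho_pos, rho_neg = 0.2, 0.3
t, y_clean = 0.4, 1
# Expectation over the noise: with prob (1 - rho_pos) we observe +1, else -1.
expected = (1 - rho_pos) * unbiased_loss(hinge, t, 1, rho_pos, rho_neg) \
         + rho_pos * unbiased_loss(hinge, t, -1, rho_pos, rho_neg)
```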


Proceedings ArticleDOI
01 Sep 2013
TL;DR: This paper presents a method of image restoration for projective ground images which lie on a projection orthogonal to the camera axis and proposes instant estimation of a blur kernel arising from the projective transform and the subsequent interpolation of sparse data.
Abstract: This paper presents a method of image restoration for projective ground images which lie on a projection orthogonal to the camera axis. The ground images are initially transformed using homography, and then the proposed image restoration is applied. The process is performed in the dual-tree complex wavelet transform domain in conjunction with L0 reweighting and L2 minimisation (L0RL2) employed to solve this ill-posed problem. We also propose instant estimation of a blur kernel arising from the projective transform and the subsequent interpolation of sparse data. Subjective results show significant improvement of image quality. Furthermore, classification of surface type at various distances (evaluated using a support vector machine classifier) is also improved for the images restored using our proposed algorithm.

764 citations


Posted Content
TL;DR: The results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
Abstract: Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

760 citations
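The one-vs-rest squared hinge ("L2-SVM") objective that replaces cross-entropy in the top layer can be written down directly. A minimal numpy sketch with assumed toy scores, shown next to softmax cross-entropy for comparison:

```python
import numpy as np

def l2_svm_loss(scores, label):
    """One-vs-rest squared hinge loss: targets are +1 for the true class
    and -1 for every other class, and each violated margin is penalized
    quadratically (the 'L2' in L2-SVM)."""
    t = -np.ones_like(scores)
    t[label] = 1.0
    return np.sum(np.maximum(0.0, 1.0 - t * scores) ** 2)

def cross_entropy_loss(scores, label):
    """Softmax cross-entropy, the loss being replaced."""
    z = scores - np.max(scores)                  # shift for numerical stability
    log_probs = z - np.log(np.sum(np.exp(z)))
    return -log_probs[label]

loss_confident = l2_svm_loss(np.array([2.0, -1.0, -1.0]), 0)  # all margins met
loss_uniform = l2_svm_loss(np.zeros(3), 0)                    # every margin violated
```

Unlike cross-entropy, the squared hinge is exactly zero once every margin is satisfied, so confident correct predictions stop contributing gradient.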


Journal ArticleDOI
TL;DR: An empirical comparison between SVM and ANN regarding document-level sentiment analysis is presented, indicating that ANNs produce results superior or at least comparable to those of SVMs, even in the context of unbalanced data.
Abstract: Document-level sentiment classification aims to automate the task of classifying a textual review, which is given on a single topic, as expressing a positive or negative sentiment. In general, supervised methods consist of two stages: (i) extraction/selection of informative features and (ii) classification of reviews by using learning models like Support Vector Machines (SVM) and Naïve Bayes (NB). SVMs have been extensively and successfully used as a sentiment learning approach, while Artificial Neural Networks (ANN) have rarely been considered in comparative studies in the sentiment analysis literature. This paper presents an empirical comparison between SVM and ANN regarding document-level sentiment analysis. We discuss requirements, resulting models and contexts in which both approaches achieve better levels of classification accuracy. We adopt a standard evaluation context with popular supervised methods for feature selection and weighting in a traditional bag-of-words model. Except for some unbalanced data contexts, our experiments indicated that ANNs produce results superior or at least comparable to those of SVMs. Especially on the benchmark dataset of movie reviews, ANN outperformed SVM by a statistically significant difference, even in the context of unbalanced data. Our results have also confirmed some potential limitations of both models, which have rarely been discussed in the sentiment classification literature, such as the computational cost of SVMs at running time and of ANNs at training time.

616 citations


Journal ArticleDOI
TL;DR: The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods.
Abstract: The literature on short-term traffic flow forecasting has undergone great development recently. Many works, describing a wide variety of different approaches, which very often share similar features and ideas, have been published. However, publications presenting new prediction algorithms usually employ different settings, data sets, and performance measurements, making it difficult to infer a clear picture of the advantages and limitations of each model. The aim of this paper is twofold. First, we review existing approaches to short-term traffic flow forecasting methods under the common view of probabilistic graphical models, presenting an extensive experimental comparison, which proposes a common baseline for their performance analysis and provides the infrastructure to operate on a publicly available data set. Second, we present two new support vector regression models, which are specifically devised to benefit from typical traffic flow seasonality and are shown to represent an interesting compromise between prediction accuracy and computational efficiency. The SARIMA model coupled with a Kalman filter is the most accurate model; however, the proposed seasonal support vector regressor turns out to be highly competitive when performing forecasts during the most congested periods.

580 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: The decision function for verification is proposed to be viewed as a joint model of a distance metric and a locally adaptive thresholding rule; the inference on the decision function is formulated as a second-order large-margin regularization problem, and an efficient algorithm is provided in its dual form.
Abstract: This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person has not been seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual form. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only classical metric learning algorithms including LMNN and ITML, but also the state-of-the-art in the computer vision community.

533 citations


Journal ArticleDOI
TL;DR: This work proposes to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs.
Abstract: Formulating speech separation as a binary classification problem has been shown to be effective. While good separation performance is achieved in matched test conditions using kernel support vector machines (SVMs), separation in unmatched conditions involving new speakers and environments remains a big challenge. A simple yet effective method to cope with the mismatch is to include many different acoustic conditions into the training set. However, large-scale training is almost intractable for kernel machines due to computational complexity. To enable training on relatively large datasets, we propose to learn more linearly separable and discriminative features from raw acoustic features and train linear SVMs, which are much easier and faster to train than kernel SVMs. For feature learning, we employ standard pre-trained deep neural networks (DNNs). The proposed DNN-SVM system is trained on a variety of acoustic conditions within a reasonable amount of time. Experiments on various test mixtures demonstrate good generalization to unseen speakers and background noises.

460 citations


Journal ArticleDOI
TL;DR: A new family of generalized composite kernels which exhibit great flexibility when combining the spectral and the spatial information contained in the hyperspectral data, without any weight parameters are constructed.
Abstract: This paper presents a new framework for the development of generalized composite kernel machines for hyperspectral image classification. We construct a new family of generalized composite kernels which exhibit great flexibility when combining the spectral and the spatial information contained in the hyperspectral data, without any weight parameters. The classifier adopted in this work is the multinomial logistic regression, and the spatial information is modeled from extended multiattribute profiles. In order to illustrate the good performance of the proposed framework, support vector machines are also used for evaluation purposes. Our experimental results with real hyperspectral images collected by the National Aeronautics and Space Administration Jet Propulsion Laboratory's Airborne Visible/Infrared Imaging Spectrometer and the Reflective Optics Spectrographic Imaging System indicate that the proposed framework leads to state-of-the-art classification performance in complex analysis scenarios.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper proposes a simple, efficient, and effective method to learn parts incrementally, starting from a single part occurrence with an Exemplar SVM, and can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods.
Abstract: The automatic discovery of distinctive parts for an object or scene class is challenging since it requires simultaneously to learn the part appearance and also to identify the part occurrences in images. In this paper, we propose a simple, efficient, and effective method to do so. We address this problem by learning parts incrementally, starting from a single part occurrence with an Exemplar SVM. In this manner, additional part instances are discovered and aligned reliably before being considered as training examples. We also propose entropy-rank curves as a means of evaluating the distinctiveness of parts shareable between categories and use them to select useful parts out of a set of candidates. We apply the new representation to the task of scene categorisation on the MIT Scene 67 benchmark. We show that our method can learn parts which are significantly more informative and for a fraction of the cost, compared to previous part-learning methods such as Singh et al. [28]. We also show that a well constructed bag of words or Fisher vector model can substantially outperform the previous state-of-the-art classification performance on this data.

Journal ArticleDOI
TL;DR: A novel PSO-SVM model is proposed that hybridizes particle swarm optimization (PSO) and SVM to improve EMG signal classification accuracy; the results validate the superiority of the SVM method compared to conventional machine learning methods.

Journal ArticleDOI
TL;DR: A new multifeature model, aiming to construct a support vector machine (SVM) ensemble combining multiple spectral and spatial features at both pixel and object levels is proposed, which provides more accurate classification results compared to the voting and probabilistic models.
Abstract: In recent years, the resolution of remotely sensed imagery has become increasingly high in both the spectral and spatial domains, which simultaneously provides more plentiful spectral and spatial information. Accordingly, the accurate interpretation of high-resolution imagery depends on effective integration of the spectral, structural and semantic features contained in the images. In this paper, we propose a new multifeature model, aiming to construct a support vector machine (SVM) ensemble combining multiple spectral and spatial features at both pixel and object levels. The features employed in this study include a gray-level co-occurrence matrix, differential morphological profiles, and an urban complexity index. Subsequently, three algorithms are proposed to integrate the multifeature SVMs: certainty voting, probabilistic fusion, and an object-based semantic approach, respectively. The proposed algorithms are compared with other multifeature SVM methods including the vector stacking, feature selection, and composite kernels. Experiments are conducted on the hyperspectral digital imagery collection experiment DC Mall data set and two WorldView-2 data sets. It is found that the multifeature model with semantic-based postprocessing provides more accurate classification results (an accuracy improvement of 1-4% for the three experimental data sets) compared to the voting and probabilistic models.

Journal ArticleDOI
TL;DR: D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met.
Abstract: We propose a distributed algorithm, named Distributed Alternating Direction Method of Multipliers (D-ADMM), for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem there is a private cost function and a private constraint set at each node. The goal is to minimize the sum of all the cost functions, constraining the solution to be in the intersection of all the constraint sets. D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met. We use D-ADMM to solve the following problems from signal processing and control: average consensus, compressed sensing, and support vector machines. Our simulations show that D-ADMM requires less communications than state-of-the-art algorithms to achieve a given accuracy level. Algorithms with low communication requirements are important, for example, in sensor networks, where sensors are typically battery-operated and communicating is the most energy consuming operation.
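For intuition about the average-consensus problem listed above, the sketch below solves it on a toy ring network with a much simpler linear gossip iteration. This is not D-ADMM itself (which handles general separable costs and constraint sets); the network, values, and step size are assumptions:

```python
import numpy as np

def consensus_average(values, neighbors, step=0.25, iters=300):
    """Each node repeatedly moves toward its neighbors' values using only
    local communication. Because the updates are symmetric, the sum of
    all values is preserved, so on a connected graph every node converges
    to the global average."""
    x = np.asarray(values, dtype=float)
    for _ in range(iters):
        x = x + step * np.array([sum(x[j] - x[i] for j in neighbors[i])
                                 for i in range(len(x))])
    return x

# 4-node ring; every node should end up at the average of [1, 2, 3, 4].
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
x = consensus_average([1.0, 2.0, 3.0, 4.0], ring)
```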

Journal ArticleDOI
TL;DR: Two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers, are considered; a new metric learning approach is introduced for the latter, along with an extension of the NCM classifier that allows for richer class representations.
Abstract: We study large-scale image classification methods that can incorporate new classes and training images continuously over time at negligible cost. To this end, we consider two distance-based classifiers, the k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers, and introduce a new metric learning approach for the latter. We also introduce an extension of the NCM classifier to allow for richer class representations. Experiments on the ImageNet 2010 challenge dataset, which contains over 10^6 training images of 1,000 classes, show that, surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs which obtain current state-of-the-art performance. Experimentally, we study the generalization performance to classes that were not used to learn the metrics. Using a metric learned on 1,000 classes, we show results for the ImageNet-10K dataset which contains 10,000 classes, and obtain performance that is competitive with the current state-of-the-art while being orders of magnitude faster. Furthermore, we show how a zero-shot class prior based on the ImageNet hierarchy can improve performance when few training images are available.
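The NCM classifier itself is only a few lines, which is exactly what makes adding a new class or image so cheap: the only per-class state is a mean. A minimal Euclidean sketch (the paper learns a metric instead of using plain Euclidean distance), with toy data assumed:

```python
import numpy as np

def ncm_fit(X, y):
    """Store one mean vector per class; adding a class or a new image
    is just a cheap mean update."""
    classes = np.unique(y)
    means = np.vstack([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def ncm_predict(X, classes, means):
    """Assign each sample to the class whose mean is closest."""
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
classes, means = ncm_fit(X, y)
pred = ncm_predict(np.array([[0.2, 0.3], [4.8, 5.5]]), classes, means)
```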

Journal ArticleDOI
01 Feb 2013
TL;DR: A forecasting model based on chaotic mapping, firefly algorithm, and support vector regression (SVR) is proposed to predict stock market price and performs best based on two error measures, namely mean squared error (MSE) and mean absolute percent error (MAPE).
Abstract: Due to the inherent non-linearity and non-stationary characteristics of financial stock market price time series, conventional modeling techniques such as the Box-Jenkins autoregressive integrated moving average (ARIMA) are not adequate for stock market price forecasting. In this paper, a forecasting model based on chaotic mapping, firefly algorithm, and support vector regression (SVR) is proposed to predict stock market price. The forecasting model has three stages. In the first stage, a delay coordinate embedding method is used to reconstruct unseen phase space dynamics. In the second stage, a chaotic firefly algorithm is employed to optimize SVR hyperparameters. Finally in the third stage, the optimized SVR is used to forecast stock market price. The significance of the proposed algorithm is 3-fold. First, it integrates both chaos theory and the firefly algorithm to optimize SVR hyperparameters, whereas previous studies employ a genetic algorithm (GA) to optimize these parameters. Second, it uses a delay coordinate embedding method to reconstruct phase space dynamics. Third, it has high prediction accuracy due to its implementation of structural risk minimization (SRM). To show the applicability and superiority of the proposed algorithm, we selected the three most challenging stock market time series data from NASDAQ historical quotes, namely Intel, National Bank shares and Microsoft daily closed (last) stock price, and applied the proposed algorithm to these data. Compared with genetic algorithm-based SVR (SVR-GA), chaotic genetic algorithm-based SVR (SVR-CGA), firefly-based SVR (SVR-FA), artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFIS), the proposed model performs best based on two error measures, namely mean squared error (MSE) and mean absolute percent error (MAPE).
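The first-stage delay coordinate embedding can be sketched directly: each observation is replaced by a vector of lagged copies of itself, reconstructing a point in phase space. The embedding dimension and delay below are assumed toy values rather than the data-driven choices a real model would use:

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Delay-coordinate embedding: row t is
    (x_t, x_{t-tau}, ..., x_{t-(dim-1)*tau}), a reconstructed
    phase-space point built purely from lagged observations."""
    series = np.asarray(series)
    start = (dim - 1) * tau                      # first index with all lags available
    return np.vstack([
        [series[t - k * tau] for k in range(dim)]
        for t in range(start, len(series))
    ])

# Embedding a toy series 0..9 with dim=3, tau=2 yields 6 phase-space points.
emb = delay_embed(np.arange(10.0), dim=3, tau=2)
```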

Journal ArticleDOI
TL;DR: Two important improvements to the SVR-based load forecasting method are introduced: a procedure for generating model inputs with subsequent model input selection using feature selection algorithms, and particle-swarm-based global optimization of the SVR hyper-parameters; both reduce operator interaction.
Abstract: This paper presents a generic strategy for short-term load forecasting (STLF) based on the support vector regression machines (SVR). Two important improvements to the SVR based load forecasting method are introduced, i.e., procedure for generation of model inputs and subsequent model input selection using feature selection algorithms. One of the objectives of the proposed strategy is to reduce the operator interaction in the model-building procedure. The proposed use of feature selection algorithms for automatic model input selection and the use of the particle swarm global optimization based technique for the optimization of SVR hyper-parameters reduces the operator interaction. To confirm the effectiveness of the proposed modeling strategy, the model has been trained and tested on two publicly available and well-known load forecasting data sets and compared to the state-of-the-art STLF algorithms yielding improved accuracy.

Journal ArticleDOI
TL;DR: Novel cooperative spectrum sensing algorithms for cognitive radio (CR) networks, based on machine learning techniques for pattern classification, outperform the existing state-of-the-art CSS techniques.
Abstract: We propose novel cooperative spectrum sensing (CSS) algorithms for cognitive radio (CR) networks based on machine learning techniques which are used for pattern classification. In this regard, unsupervised (e.g., K-means clustering and Gaussian mixture model (GMM)) and supervised (e.g., support vector machine (SVM) and weighted K-nearest-neighbor (KNN)) learning-based classification techniques are implemented for CSS. For a radio channel, the vector of the energy levels estimated at CR devices is treated as a feature vector and fed into a classifier to decide whether the channel is available or not. The classifier categorizes each feature vector into either of the two classes, namely, the "channel available class" and the "channel unavailable class". Prior to the online classification, the classifier needs to go through a training phase. For classification, the K-means clustering algorithm partitions the training feature vectors into K clusters, where each cluster corresponds to a combined state of primary users (PUs) and then the classifier determines the class the test energy vector belongs to. The GMM obtains a mixture of Gaussian density functions that well describes the training feature vectors. In the case of the SVM, the support vectors (i.e., a subset of training vectors which fully specify the decision function) are obtained by maximizing the margin between the separating hyperplane and the training feature vectors. Furthermore, the weighted KNN classification technique is proposed for CSS for which the weight of each feature vector is calculated by evaluating the area under the receiver operating characteristic (ROC) curve of that feature vector. The performance of each classification technique is quantified in terms of the average training time, the sample classification delay, and the ROC curve. Our comparative results clearly reveal that the proposed algorithms outperform the existing state-of-the-art CSS techniques.
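The K-means branch of the classifier reduces to clustering the energy vectors and reading the low-energy cluster as "channel available". A bare-bones stand-in with K = 2, deterministic initialization, and assumed toy energies (the paper additionally maps clusters to combined PU states):

```python
import numpy as np

def two_means(E, iters=10):
    """Tiny 2-means clustering of energy feature vectors (one entry per
    CR device). The cluster whose centroid has the lower total energy is
    interpreted as 'channel available' (primary users idle)."""
    totals = E.sum(axis=1)
    # deterministic init: lowest- and highest-energy samples as centroids
    centers = np.vstack([E[np.argmin(totals)], E[np.argmax(totals)]]).astype(float)
    for _ in range(iters):
        assign = np.argmin(((E[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        for k in range(2):
            if np.any(assign == k):
                centers[k] = E[assign == k].mean(axis=0)
    available = int(np.argmin(centers.sum(axis=1)))  # low-energy centroid
    return assign == available                       # True = channel available

# Two low-energy vectors (idle channel) vs. two high-energy ones (PU active).
E = np.array([[0.1, 0.2], [0.2, 0.1], [2.0, 2.1], [2.1, 1.9]])
free = two_means(E)
```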

Journal ArticleDOI
TL;DR: A new visualization approach based on a Sensitivity Analysis (SA) to extract human understandable knowledge from supervised learning black box data mining models, such as Neural Networks, Support Vector Machines and ensembles, including Random Forests.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition are proposed, and a new descriptor for spatio-temporal feature extraction from color and depth images is introduced.
Abstract: We propose a set of features derived from skeleton tracking of the human body and depth maps for the purpose of action recognition. The descriptors proposed are easy to implement, produce relatively small-sized feature sets, and the multi-class classification scheme is fast and suitable for real-time applications. We intuitively characterize actions using pairwise affinities between view-invariant joint angle features over the performance of an action. Additionally, a new descriptor for spatio-temporal feature extraction from color and depth images is introduced. This descriptor involves an application of a modified histogram of oriented gradients (HOG) algorithm. The application produces a feature set at every frame, and these features are collected into a 2D array to which the same algorithm is then applied again (the approach is termed HOG2). Both feature sets are evaluated in a bag-of-words scheme using a linear SVM, showing state-of-the-art results on public datasets from different domains of human-computer interaction.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel approach based on Support Vector Machine and Bayesian filtering is proposed for online lane change intention prediction that is able to predict driver intention to change lanes on average 1.3 seconds in advance, with a maximum prediction horizon of 3.29 seconds.
Abstract: Predicting driver behavior is a key component for Advanced Driver Assistance Systems (ADAS). In this paper, a novel approach based on Support Vector Machine and Bayesian filtering is proposed for online lane change intention prediction. The approach uses the multiclass probabilistic outputs of the Support Vector Machine as an input to the Bayesian filter, and the output of the Bayesian filter is used for the final prediction of lane changes. A lane tracker integrated in a passenger vehicle is used for real-world data collection for the purpose of training and testing. Data from different drivers on different highways were used to evaluate the robustness of the approach. The results demonstrate that the proposed approach is able to predict driver intention to change lanes on average 1.3 seconds in advance, with a maximum prediction horizon of 3.29 seconds.
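Feeding the SVM's multiclass probabilistic outputs into a Bayesian filter is a standard predict-update recursion over discrete intention states. The states, transition matrix, and SVM outputs below are assumed toy values, not the paper's:

```python
import numpy as np

def bayes_filter_step(belief, likelihood, transition):
    """One discrete Bayesian filter update: propagate the intention
    belief through the transition model, then reweight it by the
    classifier's probabilistic output and renormalize."""
    predicted = transition.T @ belief
    posterior = likelihood * predicted
    return posterior / posterior.sum()

# States: 0 = keep lane, 1 = change lane.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])                       # intentions are sticky over time
belief = np.array([0.95, 0.05])                  # start almost sure of lane keeping
# Feed in several consecutive SVM outputs that favor 'change lane'.
for svm_prob in [np.array([0.3, 0.7])] * 5:
    belief = bayes_filter_step(belief, svm_prob, T)
```

Accumulating evidence over frames is what smooths the raw per-frame SVM outputs into a stable early prediction.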

Proceedings ArticleDOI
11 Aug 2013
TL;DR: This work applies two modifications in order to make one-class SVMs more suitable for unsupervised anomaly detection, robust one-class SVMs and eta one-class SVMs, with the key idea that outliers should contribute less to the decision boundary than normal instances.
Abstract: Support Vector Machines (SVMs) have been one of the most successful machine learning techniques for the past decade. For anomaly detection, a semi-supervised variant, the one-class SVM, also exists. Here, only normal data is required for training before anomalies can be detected. In theory, the one-class SVM could also be used in an unsupervised anomaly detection setup, where no prior training is conducted. Unfortunately, it turns out that a one-class SVM is sensitive to outliers in the data. In this work, we apply two modifications in order to make one-class SVMs more suitable for unsupervised anomaly detection: robust one-class SVMs and eta one-class SVMs. The key idea of both modifications is that outliers should contribute less to the decision boundary than normal instances. Experiments performed on datasets from the UCI machine learning repository show that our modifications are very promising: compared with other standard unsupervised anomaly detection algorithms, the enhanced one-class SVMs are superior on two out of four datasets. In particular, the proposed eta one-class SVM has shown the most promising results.

Journal ArticleDOI
TL;DR: A new kernel is derived by establishing a connection with the Riemannian geometry of symmetric positive definite matrices, effectively replacing the traditional spatial filtering approach for motor imagery EEG-based classification in brain-computer interface applications.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A transductive learning method is introduced, referred to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. It achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject.
Abstract: Automatic facial action unit (AFA) detection from video is a long-standing problem in facial expression analysis. Most approaches emphasize choices of features and classifiers. They neglect individual differences in target persons. People vary markedly in facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) and behavior. Individual differences can dramatically influence how well generic classifiers generalize to previously unseen persons. While a possible solution would be to train person-specific classifiers, that often is neither feasible nor theoretically compelling. The alternative that we propose is to personalize a generic classifier in an unsupervised manner (no additional labels for the test subjects are required). We introduce a transductive learning method, which we refer to as Selective Transfer Machine (STM), to personalize a generic classifier by attenuating person-specific biases. STM achieves this effect by simultaneously learning a classifier and re-weighting the training samples that are most relevant to the test subject. To evaluate the effectiveness of STM, we compared STM to generic classifiers and to cross-domain learning methods in three major databases: CK+, GEMEP-FERA and RU-FACS. STM outperformed generic classifiers in all three.
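The sample-reweighting half of this idea can be sketched in a few lines. The example below (invented 1-D data, a nearest-class-mean classifier standing in for STM's SVM, and a Gaussian similarity weight standing in for its distribution-matching term) shows how weighting source samples by proximity to the unlabeled test subject personalizes the resulting classifier:

```python
import math

# Illustrative sketch, not the STM optimisation itself: re-weight source
# training samples by similarity to the unlabeled test subject's data, then
# train on the weighted samples. Data and bandwidth are invented.

def reweight(train_x, test_x, bandwidth=1.0):
    """Weight each training sample by similarity to the test-data mean."""
    mu = sum(test_x) / len(test_x)
    return [math.exp(-((x - mu) ** 2) / (2 * bandwidth ** 2)) for x in train_x]

def weighted_class_means(train_x, train_y, weights):
    means = {}
    for label in set(train_y):
        num = sum(w * x for x, y, w in zip(train_x, train_y, weights) if y == label)
        den = sum(w for y, w in zip(train_y, weights) if y == label)
        means[label] = num / den
    return means

# Source subjects (1-D feature): class 0 clusters near 0 and 5, class 1 near 2 and 7.
train_x = [0.0, 0.2, 5.0, 5.2, 2.0, 2.1, 7.0, 7.2]
train_y = [0,   0,   0,   0,   1,   1,   1,   1]
test_x  = [0.1, 0.3, 2.2]      # the new subject lives in the low-feature regime

w = reweight(train_x, test_x)
means = weighted_class_means(train_x, train_y, w)
# Nearest weighted class mean acts as the personalised classifier.
predict = lambda x: min(means, key=lambda c: abs(means[c] - x))
print(predict(0.2), predict(2.0))
```

After reweighting, the far-away clusters (near 5 and 7) contribute almost nothing to the class means, so the decision boundary adapts to the test subject's own regime.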

Journal ArticleDOI
TL;DR: A framework to classify time series based on a bag-of-features representation (TSBF) that provides a feature-based approach that can handle warping (although differently from DTW), and experimental results show that TSBF provides better results than competitive methods on benchmark datasets from the UCR time series database.
Abstract: Time series classification is an important task with many challenging applications. A nearest neighbor (NN) classifier with dynamic time warping (DTW) distance is a strong solution in this context. On the other hand, feature-based approaches have been proposed as both classifiers and to provide insight into the series, but these approaches have problems handling translations and dilations in local patterns. Considering these shortcomings, we present a framework to classify time series based on a bag-of-features representation (TSBF). Multiple subsequences selected from random locations and of random lengths are partitioned into shorter intervals to capture the local information. Consequently, features computed from these subsequences measure properties at different locations and dilations when viewed from the original series. This provides a feature-based approach that can handle warping (although differently from DTW). Moreover, a supervised learner (that handles mixed data types, different units, etc.) integrates location information into a compact codebook through class probability estimates. Additionally, relevant global features can easily supplement the codebook. TSBF is compared to NN classifiers and other alternatives (bag-of-words strategies, sparse spatial sample kernels, shapelets). Our experimental results show that TSBF provides better results than competitive methods on benchmark datasets from the UCR time series database.
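The subsequence-and-interval feature step can be sketched briefly. The code below is a simplified stand-in for TSBF's first stage (parameter counts and the statistics chosen per interval are illustrative, not the paper's exact configuration):

```python
import random
import statistics

# Simplified sketch of the TSBF feature step: sample subsequences at random
# locations and lengths, split each into intervals, and describe each interval
# with simple statistics (mean, std, slope). Parameters are illustrative.

def interval_features(series, n_subseq=5, n_intervals=3, rng=None):
    rng = rng or random.Random(0)          # fixed seed for reproducibility
    feats = []
    for _ in range(n_subseq):
        length = rng.randrange(n_intervals * 2, len(series) + 1)
        start = rng.randrange(0, len(series) - length + 1)
        sub = series[start:start + length]
        step = len(sub) // n_intervals
        row = []
        for i in range(n_intervals):
            seg = sub[i * step:(i + 1) * step]
            mean = statistics.fmean(seg)
            std = statistics.pstdev(seg)
            slope = (seg[-1] - seg[0]) / max(len(seg) - 1, 1)
            row += [mean, std, slope]
        feats.append(row)
    return feats   # one feature row per subsequence: the "bag" of features

series = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 1]
bag = interval_features(series)
print(len(bag), len(bag[0]))
```

In the full method, a supervised learner then turns these per-subsequence rows into class probability estimates, which are summarized into a compact codebook per series.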

Journal ArticleDOI
TL;DR: A novel nonparametric approach for traffic classification is proposed that can improve classification performance effectively by incorporating correlated information into the classification process; its performance benefit is analyzed from both theoretical and empirical perspectives.
Abstract: Traffic classification has wide applications in network management, from security monitoring to quality of service measurements. Recent research tends to apply machine learning techniques to flow statistical feature based classification methods. The nearest neighbor (NN)-based method has exhibited superior classification performance. It also has several important advantages, such as no requirement of a training procedure, no risk of overfitting of parameters, and naturally being able to handle a huge number of classes. However, the performance of the NN classifier can be severely affected if the size of the training data is small. In this paper, we propose a novel nonparametric approach for traffic classification, which can improve the classification performance effectively by incorporating correlated information into the classification process. We analyze the new classification approach and its performance benefit from both theoretical and empirical perspectives. A large number of experiments are carried out on two real-world traffic data sets to validate the proposed approach. The results show the traffic classification performance can be improved significantly even under the extremely difficult circumstance of very few training samples.
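One way to picture "incorporating correlated information" is to classify a bag of correlated flows jointly rather than one flow at a time. The toy below (invented flows and feature vectors, a simple majority vote standing in for the paper's aggregation) shows why this is more robust than per-flow NN when training data is scarce:

```python
from collections import Counter

# Toy sketch: flows known to be correlated (e.g., from one application
# session) are classified as a bag. Each flow votes with its nearest
# neighbour's class; the bag takes the majority label, so a single noisy
# flow cannot flip the result. Flows and features are invented.

train = [((100, 2), "web"), ((1500, 40), "video"), ((120, 3), "web")]

def nn_label(flow):
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], flow)))[1]

def classify_bag(flows):
    votes = Counter(nn_label(f) for f in flows)
    return votes.most_common(1)[0][0]

# Three correlated flows from the same session: one is noisy.
bag = [(110, 2), (95, 4), (1400, 35)]
print(classify_bag(bag))   # bag-level majority label
```

With tiny training sets, individual NN votes are unreliable, but the bag-level decision averages out those errors, which is the effect the paper analyzes.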

Journal ArticleDOI
TL;DR: In this paper, a support vector machine (SVM) was used to estimate the state of charge (SOC) of a high capacity LiFeMnPO4 battery cell from an experimental dataset using a SVM approach.
Abstract: The aim of this study is to estimate the state of charge (SOC) of a high-capacity lithium iron manganese phosphate (LiFeMnPO4) battery cell from an experimental dataset using a support vector machine (SVM) approach. SVM is a type of learning machine based on statistical learning theory. Many applications require accurate measurement of battery SOC in order to give users an indication of available runtime. It is particularly important for electric vehicles or portable devices. In this paper, the proposed SOC estimator extracts model parameters from battery charging/discharging testing cycles, using cell current, cell voltage, and cell temperature as independent variables. Tests are carried out on a 60 Ah lithium-ion cell with the dynamic stress test cycle to set up the SVM model. The SVM SOC estimator maintains a high level of accuracy, with an estimation error better than 6% over all ranges of operation, whether the battery is charged/discharged at constant current or it is operating in a variable current profile.
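The regression setup described here can be sketched with a tiny kernel regressor. Everything below is hypothetical: the three training triples, the RBF bandwidth, and the Nadaraya-Watson form standing in for the paper's trained SVM model:

```python
import math

# Hypothetical sketch: estimating SOC from (current, voltage, temperature)
# with an RBF kernel regressor, a stand-in for the paper's SVM model. The
# real model is fit to charge/discharge cycles of a 60 Ah cell; these
# training triples and the bandwidth are invented.

train = [  # (current_A, voltage_V, temp_C) -> SOC fraction
    ((-10.0, 3.20, 25.0), 0.10),
    ((-10.0, 3.30, 25.0), 0.50),
    ((-10.0, 3.45, 25.0), 0.90),
]

def rbf(x, z, gamma=500.0):
    # Large gamma because voltage varies on a small scale here.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def estimate_soc(x):
    ks = [rbf(x, xi) for xi, _ in train]
    return sum(k * y for k, (_, y) in zip(ks, train)) / sum(ks)

soc = estimate_soc((-10.0, 3.31, 25.0))   # query near the 50% SOC point
print(round(soc, 3))
```

In practice the inputs would be normalized per dimension before kernel evaluation, since current, voltage, and temperature live on very different scales.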

01 Jan 2013
TL;DR: A depth image based real-time skeleton fitting algorithm for the hand, using an object recognition by parts approach, and the use of this hand modeler in an American Sign Language (ASL) digit recognition application are described.
Abstract: This paper describes a depth image based real-time skeleton fitting algorithm for the hand, using an object recognition by parts approach, and the use of this hand modeler in an American Sign Language (ASL) digit recognition application. In particular, we created a realistic 3D hand model that represents the hand with 21 different parts. Random decision forests (RDF) are trained on synthetic depth images generated by animating the hand model, which are then used to perform per pixel classification and assign each pixel to a hand part. The classification results are fed into a local mode finding algorithm to estimate the joint locations for the hand skeleton. The system can process depth images retrieved from Kinect in real time at 30 fps. As an application of the system, we also describe a support vector machine (SVM) based recognition module for the ten digits of ASL based on our method, which attains a recognition rate of 99.9% on live depth images in real time.
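The step from per-pixel part labels to joint locations can be illustrated compactly. The label map below is invented, and the per-part centroid is a crude stand-in for the paper's local mode finding:

```python
from collections import defaultdict

# Minimal sketch of the pipeline's skeleton-fitting step: after per-pixel
# hand-part classification, each joint is estimated from the pixels assigned
# to its part. Here: the centroid, a simple stand-in for local mode finding.
# The label map is invented (0 = background, 1 and 2 = hand parts).

label_map = [
    [0, 1, 1, 0],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
]

def joint_positions(labels):
    acc = defaultdict(lambda: [0.0, 0.0, 0])   # part -> [sum_r, sum_c, count]
    for r, row in enumerate(labels):
        for c, part in enumerate(row):
            if part:
                acc[part][0] += r
                acc[part][1] += c
                acc[part][2] += 1
    return {p: (sr / n, sc / n) for p, (sr, sc, n) in acc.items()}

print(joint_positions(label_map))
```

The estimated joint coordinates (here one per part) would then be concatenated into the feature vector fed to the SVM digit classifier.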

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper revisits some of the core assumptions in HOG+SVM and shows that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component.
Abstract: The current state of the art solutions for object detection describe each class by a set of models trained on discovered sub-classes (so called "components"), with each model itself composed of collections of interrelated parts (deformable models). These detectors build upon the now classic Histogram of Oriented Gradients + linear SVM combo. In this paper we revisit some of the core assumptions in HOG+SVM and show that by properly designing the feature pooling, feature selection, preprocessing, and training methods, it is possible to reach top quality, at least for pedestrian detection, using a single rigid component. We provide experiments for a large design space, that give insights into the design of classifiers, as well as relevant information for practitioners. Our best detector is fully feed-forward, has a single unified architecture, uses only histograms of oriented gradients and colour information in monocular static images, and improves over 23 other methods on the INRIA, ETH and Caltech-USA datasets, reducing the average miss-rate over HOG+SVM by more than 30%.
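The HOG building block the paper revisits is just an orientation histogram with gradient-magnitude votes. The toy below computes one cell's histogram on an invented patch; real HOG adds overlapping cells, block normalisation, and soft binning:

```python
import math

# Toy sketch of one HOG cell: central-difference gradients, unsigned
# orientation binning, gradient-magnitude votes. A real HOG pipeline adds
# block normalisation, overlapping cells, and bin interpolation.

def cell_hog(patch, n_bins=9):
    h = [0.0] * n_bins
    for r in range(1, len(patch) - 1):
        for c in range(1, len(patch[0]) - 1):
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            mag = math.hypot(gx, gy)
            ang = math.atan2(gy, gx) % math.pi        # unsigned orientation
            h[min(int(ang / math.pi * n_bins), n_bins - 1)] += mag
    return h

# Vertical edge: intensity changes only across columns, so the gradient
# points horizontally and all votes land in the first orientation bin.
patch = [[0, 0, 10, 10]] * 4
hist = cell_hog(patch)
print(hist.index(max(hist)))   # dominant orientation bin
```

The paper's point is that careful choices around exactly this kind of pooling (cell sizes, normalisation, channel selection) recover most of the accuracy usually attributed to deformable parts.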