
Showing papers on "Support vector machine published in 2012"


Proceedings Article
03 Dec 2012
TL;DR: This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation and shows that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.
Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

5,654 citations
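The core loop the abstract describes (model the validation loss with a GP, then pick the next experiment by maximizing an acquisition function) can be sketched in a few dozen lines. The toy 1-D objective, RBF length-scale, grid of candidates, and expected-improvement acquisition below are illustrative choices, not the paper's actual settings:

```python
import numpy as np
from math import erf

# Toy Bayesian optimization of a 1-D "validation loss": a zero-mean GP surrogate
# with an RBF kernel plus an expected-improvement (EI) acquisition.

def objective(x):                       # stand-in for a costly training run
    return (x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

def rbf(a, b, ls=0.15):                 # squared-exponential kernel
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = 1.0 - np.sum(Ks * (K_inv @ Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, best):
    z = (best - mu) / sd
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * cdf + sd * pdf

grid = np.linspace(0.0, 1.0, 201)       # candidate hyperparameter values
X = np.array([0.1, 0.9])                # two initial "experiments"
y = objective(X)
for _ in range(8):                      # sequential experiment selection
    mu, sd = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sd, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
```

The paper's contributions (cost-aware acquisition, parallel experiment selection, kernel and hyperparameter treatment) all slot into this skeleton by changing the acquisition function and the GP model.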


Journal ArticleDOI
01 Apr 2012
TL;DR: ELM provides a unified learning platform with a wide range of feature mappings and can be applied in regression and multiclass classification applications directly, and in theory, ELM can approximate any target continuous function and classify any disjoint regions.
Abstract: Due to the simplicity of their implementations, least square support vector machine (LS-SVM) and proximal support vector machine (PSVM) have been widely used in binary classification applications. The conventional LS-SVM and PSVM cannot be used in regression and multiclass classification applications directly, although variants of LS-SVM and PSVM have been proposed to handle such cases. This paper shows that both LS-SVM and PSVM can be simplified further and a unified learning framework of LS-SVM, PSVM, and other regularization algorithms, referred to as the extreme learning machine (ELM), can be built. ELM works for the "generalized" single-hidden-layer feedforward networks (SLFNs), but the hidden layer (also called the feature mapping) in ELM need not be tuned. Such SLFNs include but are not limited to SVM, polynomial network, and the conventional feedforward neural networks. This paper shows the following: 1) ELM provides a unified learning platform with a wide range of feature mappings and can be applied in regression and multiclass classification applications directly; 2) from the optimization method point of view, ELM has milder optimization constraints compared to LS-SVM and PSVM; 3) in theory, compared to ELM, LS-SVM and PSVM achieve suboptimal solutions and require higher computational complexity; and 4) in theory, ELM can approximate any target continuous function and classify any disjoint regions. As verified by the simulation results, ELM tends to have better scalability and achieve similar (for regression and binary class cases) or much better (for multiclass cases) generalization performance at much faster learning speed (up to thousands of times) than traditional SVM and LS-SVM.

4,835 citations
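The ELM recipe the abstract summarizes (a random, untuned hidden layer followed by a regularized least-squares solve for the output weights) is short enough to sketch directly. The network size, sigmoid activation, regularization constant C, and the XOR toy problem below are illustrative choices:

```python
import numpy as np

# Minimal ELM sketch: random hidden layer + ridge solve for output weights.

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=40, C=1e4):
    W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer output
    # Output weights: beta = (I/C + H^T H)^{-1} H^T T, a regularized LS solve.
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return (1.0 / (1.0 + np.exp(-(X @ W + b)))) @ beta

# XOR: a problem no single linear model solves, but a random hidden layer does.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([-1., 1., 1., -1.])
W, b, beta = elm_train(X, T)
pred = np.sign(elm_predict(X, W, b, beta))
```

Nothing in the hidden layer is tuned, which is the paper's point: all learning happens in the single linear solve for beta.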


Book ChapterDOI
07 Oct 2012
TL;DR: Using the well-established theory of Circulant matrices, this work provides a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform, which can be done in the dual space of kernel machines as fast as with linear classifiers.
Abstract: Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of Circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper (see Algorithm 1).

2,197 citations
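The circulant structure the abstract refers to can be demonstrated in a toy 1-D, linear-kernel setting: ridge regression over all cyclic shifts of a base sample has a closed-form FFT solution. The delta target and regularizer below are illustrative; the paper works with images, Gaussian targets, and kernelized (dual) solutions:

```python
import numpy as np

# Ridge regression over ALL cyclic shifts of x, solved two ways:
# explicitly in O(n^3), and via the FFT diagonalization in O(n log n).

n, lam = 32, 0.1
rng = np.random.default_rng(1)
x = rng.standard_normal(n)              # base sample
y = np.zeros(n); y[0] = 1.0             # desired response: peak at zero shift

# Explicit solve; rows of X are all cyclic shifts of x.
X = np.stack([np.roll(x, i) for i in range(n)])
w_direct = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Same solution in the Fourier domain. (Whether x_hat or its conjugate
# appears depends on the shift convention; this form matches np.roll(x, i).)
xf, yf = np.fft.fft(x), np.fft.fft(y)
wf = xf * yf / (np.abs(xf) ** 2 + lam)
w_fft = np.real(np.fft.ifft(wf))

# "Detection": responses over all shifts of a test signal, also via FFT.
z = np.roll(x, 5)                       # test signal = base sample shifted by 5
resp = np.real(np.fft.ifft(np.conj(np.fft.fft(z)) * wf))
shift_hat = (n - np.argmax(resp)) % n   # recover the shift from the peak
```

The same diagonalization applied in the dual is what lets the tracker train and detect with kernels at FFT speed.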


Book ChapterDOI
Léon Bottou1
01 Jan 2012
TL;DR: This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.
Abstract: Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.

1,666 citations
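The chapter's recommendations can be illustrated with a minimal SGD sketch: a linear SVM trained by stochastic subgradient steps on the hinge loss with an O(1/t) step-size decay. The synthetic data, rate constants, and epoch count below are illustrative choices, not the chapter's experiments:

```python
import numpy as np

# SGD for a linear SVM: one random example per update, decaying step size.

rng = np.random.default_rng(0)
lam, eta0 = 0.01, 1.0

Xall = rng.standard_normal((400, 2))
f = Xall[:, 0] + 0.5 * Xall[:, 1]
keep = np.abs(f) > 0.3                  # keep a margin so the data is separable
X, y = Xall[keep], np.where(f[keep] > 0, 1.0, -1.0)

w, t = np.zeros(2), 0
for epoch in range(20):
    for i in rng.permutation(len(X)):   # one random example per update
        eta = eta0 / (1.0 + lam * eta0 * t)   # O(1/t) step-size schedule
        t += 1
        if y[i] * (X[i] @ w) < 1:       # hinge-loss subgradient
            w -= eta * (lam * w - y[i] * X[i])
        else:
            w -= eta * lam * w
acc = np.mean(np.sign(X @ w) == y)
```

Each step touches a single example, which is why the per-iteration cost stays constant as the training set grows, the property the chapter argues makes SGD attractive for large data.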


Posted Content
TL;DR: In this paper, a learning algorithm's generalization performance is modeled as a sample from a Gaussian process and the tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next.
Abstract: Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.

1,110 citations


Proceedings Article
03 Dec 2012
TL;DR: In this paper, a Deep Boltzmann Machine (DBM) is proposed for learning a generative model of data that consists of multiple and diverse input modalities, which can be used to extract a unified representation that fuses modalities together.
Abstract: A Deep Boltzmann Machine is described for learning a generative model of data that consists of multiple and diverse input modalities. The model can be used to extract a unified representation that fuses modalities together. We find that this representation is useful for classification and information retrieval tasks. The model works by learning a probability density over the space of multimodal inputs. It uses states of latent variables as representations of the input. The model can extract this representation even when some modalities are absent by sampling from the conditional distribution over them and filling them in. Our experimental results on bi-modal data consisting of images and text show that the Multimodal DBM can learn a good generative model of the joint space of image and text inputs that is useful for information retrieval from both unimodal and multimodal queries. We further demonstrate that this model significantly outperforms SVMs and LDA on discriminative tasks. Finally, we compare our model to other deep learning methods, including autoencoders and deep belief networks, and show that it achieves noticeable gains.

1,002 citations


Journal ArticleDOI
TL;DR: This paper proposes an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argues that they are ideally suited for steganalysis.
Abstract: Today, the most accurate steganalysis methods for digital media are built as supervised classifiers on feature vectors extracted from the media. The tool of choice for the machine learning seems to be the support vector machine (SVM). In this paper, we propose an alternative and well-known machine learning tool, ensemble classifiers implemented as random forests, and argue that they are ideally suited for steganalysis. Ensemble classifiers scale much more favorably w.r.t. the number of training examples and the feature dimensionality with performance comparable to the much more complex SVMs. The significantly lower training complexity opens up the possibility for the steganalyst to work with rich (high-dimensional) cover models and train on larger training sets, two key elements that appear necessary to reliably detect modern steganographic algorithms. Ensemble classification is portrayed here as a powerful developer tool that allows fast construction of steganography detectors with markedly improved detection accuracy across a wide range of embedding methods. The power of the proposed framework is demonstrated on three steganographic methods that hide messages in JPEG images.

967 citations
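The ensemble idea described above can be sketched with simple Fisher-style linear base learners, each trained on a random subset of the features and fused by majority vote. The synthetic "cover"/"stego" features, dimensions, and ensemble size below are all illustrative, not the paper's actual feature sets:

```python
import numpy as np

# Random-subspace ensemble: many weak linear (FLD) learners, majority vote.

rng = np.random.default_rng(0)
d, d_sub, L, n = 20, 6, 31, 150

cover = rng.standard_normal((n, d))               # class -1
stego = rng.standard_normal((n, d)) + 1.0         # class +1: mean shift
X = np.vstack([cover, stego])
y = np.r_[-np.ones(n), np.ones(n)]

learners = []
for _ in range(L):
    idx = rng.choice(d, d_sub, replace=False)     # random feature subspace
    Xs = X[:, idx]
    mu0, mu1 = Xs[y < 0].mean(0), Xs[y > 0].mean(0)
    Sw = np.cov(Xs[y < 0], rowvar=False) + np.cov(Xs[y > 0], rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(d_sub), mu1 - mu0)   # FLD direction
    b = -w @ (mu0 + mu1) / 2
    learners.append((idx, w, b))

votes = np.sum([np.sign(X[:, i] @ w + b) for i, w, b in learners], axis=0)
acc = np.mean(np.sign(votes) == y)
```

Because each base learner sees only a low-dimensional subspace and is a closed-form solve, training cost grows gently with the full feature dimension, which is the scalability argument the abstract makes against SVMs.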


Journal ArticleDOI
TL;DR: This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems.
Abstract: Large scale nonlinear support vector machines (SVMs) can be approximated by linear ones using a suitable feature map. The linear SVMs are in general much faster to learn and evaluate (test) than the original nonlinear SVMs. This work introduces explicit feature maps for the additive class of kernels, such as the intersection, Hellinger's, and χ2 kernels, commonly used in computer vision, and enables their use in large scale problems. In particular, we: 1) provide explicit feature maps for all additive homogeneous kernels along with closed-form expressions for all common kernels; 2) derive corresponding approximate finite-dimensional feature maps based on a spectral analysis; and 3) quantify the error of the approximation, showing that the error is independent of the data dimension and decays exponentially fast with the approximation order for selected kernels such as χ2. We demonstrate that the approximations have indistinguishable performance from the full kernels yet greatly reduce the train/test times of SVMs. We also compare with two other approximation methods: the Nyström approximation of Perronnin et al. [1], which is data dependent, and the explicit map of Maji and Berg [2] for the intersection kernel, which, as in the case of our approximations, is data independent. The approximations are evaluated on a number of standard data sets, including Caltech-101 [3], Daimler-Chrysler pedestrians [4], and INRIA pedestrians [5].

804 citations
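The spectral construction the abstract describes can be sketched for the χ2 kernel k(x, y) = 2xy/(x + y): its signature sech(λ/2) has Fourier transform sech(πω), which is sampled at 2n+1 frequencies to get a finite feature map. The order n and sampling period L below are illustrative choices, not tuned values:

```python
import numpy as np

# Explicit finite-dimensional feature map approximating the chi-squared kernel.

def chi2_feature_map(x, n=3, L=0.5):
    """Map a positive scalar x to a (2n+1)-dim vector Psi(x)."""
    kappa = lambda w: 1.0 / np.cosh(np.pi * w)   # spectrum of the chi2 kernel
    feats = [np.sqrt(x * L * kappa(0.0))]
    for j in range(1, n + 1):
        c = np.sqrt(2 * x * L * kappa(j * L))
        feats += [c * np.cos(j * L * np.log(x)), c * np.sin(j * L * np.log(x))]
    return np.array(feats)

x, y = 0.3, 0.7
exact = 2 * x * y / (x + y)                      # true chi2 kernel value
approx = chi2_feature_map(x) @ chi2_feature_map(y)
```

For vector inputs the map is applied per dimension and the results concatenated, so a linear SVM on Psi(x) approximates the nonlinear additive-kernel SVM; the approximation error shrinks exponentially in n, as the paper quantifies.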


Book ChapterDOI
03 Dec 2012
TL;DR: This paper presents a system for human physical Activity Recognition using smartphone inertial sensors and proposes a novel hardware-friendly approach for multiclass classification that adapts the standard Support Vector Machine and exploits fixed-point arithmetic for computational cost reduction.
Abstract: Activity-Based Computing [1] aims to capture the state of the user and their environment by exploiting heterogeneous sensors in order to provide adaptation to exogenous computing resources. When these sensors are attached to the subject's body, they permit continuous monitoring of numerous physiological signals. This has appealing use in healthcare applications, e.g. the exploitation of Ambient Intelligence (AmI) in daily activity monitoring for elderly people. In this paper, we present a system for human physical Activity Recognition (AR) using smartphone inertial sensors. As these mobile phones are limited in terms of energy and computing power, we propose a novel hardware-friendly approach for multiclass classification. This method adapts the standard Support Vector Machine (SVM) and exploits fixed-point arithmetic for computational cost reduction. A comparison with the traditional SVM shows a significant improvement in terms of computational costs while maintaining similar accuracy, which can contribute to developing more sustainable systems for AmI.

802 citations
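The fixed-point idea can be illustrated by evaluating a trained linear SVM's decision function with integer-only arithmetic after quantizing weights and features to Q8.8 fixed point. The weights, bias, and inputs below are illustrative stand-ins, not the paper's trained model:

```python
import numpy as np

# Evaluate a linear SVM decision function in Q8.8 fixed-point arithmetic.

SCALE = 256                                       # 8 fractional bits (Q8.8)

def to_fixed(a):
    return np.round(np.asarray(a) * SCALE).astype(np.int32)

def fixed_decision(xq, wq, bq):
    # products are Q16.16; shift right by 8 once to return to Q8.8
    return (np.sum(xq.astype(np.int64) * wq, axis=-1) >> 8) + bq

w = np.array([0.75, -1.25, 0.5])                  # float SVM weights (assumed)
b = -0.1
X = np.array([[1.0, 0.2, 0.4], [0.1, 0.9, -0.3], [0.5, 0.5, 2.0]])

float_dec = X @ w + b                             # reference float evaluation
fixed_dec = fixed_decision(to_fixed(X), to_fixed(w), to_fixed(b))
```

Only integer multiplies, adds, and shifts are used at prediction time, which is what makes the approach attractive on energy-constrained phone hardware; the quantization error is bounded by the fractional resolution, so predicted labels match the float model on inputs with any reasonable margin.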


Journal ArticleDOI
TL;DR: In this paper, pixel-based and object-based image analysis approaches for classifying broad land cover classes over agricultural landscapes are compared using three supervised machine learning algorithms: decision tree (DT), random forest (RF), and the support vector machine (SVM).

785 citations


Proceedings Article
26 Jun 2012
TL;DR: In this paper, the authors investigate a family of poisoning attacks against Support Vector Machines (SVM) and demonstrate that an intelligent adversary can predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data.
Abstract: We investigate a family of poisoning attacks against Support Vector Machines (SVM). Such attacks inject specially crafted training data that increases the SVM's test error. Central to the motivation for these attacks is the fact that most learning algorithms assume that their training data comes from a natural or well-behaved distribution. However, this assumption does not generally hold in security-sensitive settings. As we demonstrate, an intelligent adversary can, to some extent, predict the change of the SVM's decision function due to malicious input and use this ability to construct malicious data. The proposed attack uses a gradient ascent strategy in which the gradient is computed based on properties of the SVM's optimal solution. This method can be kernelized and enables the attack to be constructed in the input space even for non-linear kernels. We experimentally demonstrate that our gradient ascent procedure reliably identifies good local maxima of the non-convex validation error surface, which significantly increases the classifier's test error.


Posted Content
TL;DR: A wavelet scattering network as discussed by the authors computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification, cascading wavelet transform convolutions with nonlinear modulus and averaging operators.
Abstract: A wavelet scattering network computes a translation invariant image representation, which is stable to deformations and preserves high frequency information for classification. It cascades wavelet transform convolutions with non-linear modulus and averaging operators. The first network layer outputs SIFT-type descriptors whereas the next layers provide complementary invariant information which improves classification. The mathematical analysis of wavelet scattering networks explains important properties of deep convolution networks for classification. A scattering representation of stationary processes incorporates higher order moments and can thus discriminate textures having the same Fourier power spectrum. State of the art classification results are obtained for handwritten digits and texture discrimination, using a Gaussian kernel SVM and a generative PCA classifier.

Journal ArticleDOI
TL;DR: A hybrid model of integrating the synergy of two superior classifiers: Convolutional Neural Network (CNN) and Support Vector Machine (SVM) which have proven results in recognizing different types of patterns is presented.

Journal ArticleDOI
TL;DR: This paper proposes a general methodology, namely multi-modal multi-task (M3T) learning, to jointly predict multiple variables from multi- modal data, which can achieve better performance on both regression and classification tasks than the conventional learning methods.

Journal ArticleDOI
TL;DR: Comprehensive experiments on three domain adaptation data sets demonstrate that DTMKL-based methods outperform existing cross-domain learning and multiple kernel learning methods.
Abstract: Cross-domain learning methods have shown promising results by leveraging labeled patterns from the auxiliary domain to learn a robust classifier for the target domain which has only a limited number of labeled samples. To cope with the considerable change between feature distributions of different domains, we propose a new cross-domain kernel learning framework into which many existing kernel methods can be readily incorporated. Our framework, referred to as Domain Transfer Multiple Kernel Learning (DTMKL), simultaneously learns a kernel function and a robust classifier by minimizing both the structural risk functional and the distribution mismatch between the labeled and unlabeled samples from the auxiliary and target domains. Under the DTMKL framework, we also propose two novel methods by using SVM and prelearned classifiers, respectively. Comprehensive experiments on three domain adaptation data sets (i.e., TRECVID, 20 Newsgroups, and email spam data sets) demonstrate that DTMKL-based methods outperform existing cross-domain learning and multiple kernel learning methods.

Proceedings Article
03 Dec 2012
TL;DR: A new loss-augmented inference algorithm that is quadratic in the code length and inspired by latent structural SVMs is developed, showing strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes.
Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes.

Journal ArticleDOI
TL;DR: Different approaches to each of these phases that are able to deal with the regression problem are discussed, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields.
Abstract: The goal of ensemble regression is to combine several models in order to improve the prediction accuracy in learning problems with a numerical target variable. The process of ensemble learning can be divided into three phases: the generation phase, the pruning phase, and the integration phase. We discuss different approaches to each of these phases that are able to deal with the regression problem, categorizing them in terms of their relevant characteristics and linking them to contributions from different fields. Furthermore, this work makes it possible to identify interesting areas for future research.

Journal ArticleDOI
TL;DR: This work introduced a machine learning approach, namely an ensemble of classifiers, to solve a gas discrimination problem over extended periods of time with high accuracy rates and performs better than the baseline competing methods.
Abstract: Sensor drift remains the most challenging problem in chemical sensing. To address this problem we have collected an extensive dataset for six different volatile organic compounds over a period of three years under tightly controlled operating conditions using an array of 16 metal-oxide gas sensors. The recordings were made using the same sensor array and a robust gas delivery system. To the best of our knowledge, this is one of the most comprehensive datasets available for the design and development of drift compensation methods, and it is freely available online. We introduce a machine learning approach, namely an ensemble of classifiers, to solve a gas discrimination problem over extended periods of time with high accuracy rates. Experiments clearly indicate the presence of drift in the sensors during the period of three years and that it degrades the performance of the classifiers. Our proposed ensemble method based on support vector machines uses a weighted combination of classifiers trained at different points of time. As our experimental results illustrate, the ensemble of classifiers is able to cope well with sensor drift and performs better than the baseline competing methods.

Journal ArticleDOI
TL;DR: Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data, and the results indicated that SVMs had superior generalization capability, particularly with respect to small training sample sizes.
Abstract: Support vector machine (SVM) was applied for land-cover characterization using MODIS time-series data. Classification performance was examined with respect to training sample size, sample variability, and landscape homogeneity (purity). The results were compared to two conventional nonparametric image classification algorithms: multilayer perceptron neural networks (NN) and classification and regression trees (CART). For 2001 MODIS time-series data, SVM generated overall accuracies ranging from 77% to 80% for training sample sizes from 20 to 800 pixels per class, compared to 67–76% and 62–73% for NN and CART, respectively. These results indicated that SVMs had superior generalization capability, particularly with respect to small training sample sizes. There was also less variability of SVM performance when classification trials were repeated using different training sets. Additionally, classification accuracies were directly related to sample homogeneity/heterogeneity. The overall accuracies for the SVM algorithm were 91% (Kappa = 0.77) and 64% (Kappa = 0.34) for homogeneous and heterogeneous pixels, respectively. The inclusion of heterogeneous pixels in the training sample did not increase overall accuracies. Also, the SVM performance was examined for the classification of multiple-year MODIS time-series data at annual intervals. Finally, using only the SVM output values, a method was developed to directly classify pixel purity. Approximately 65% of pixels within the Albemarle–Pamlico Basin study area were labeled as "functionally homogeneous" with an overall classification accuracy of 91% (Kappa = 0.79). The results indicated a high potential for regional-scale operational land-cover characterization applications.

Proceedings ArticleDOI
16 Jun 2012
TL;DR: A complex human activity dataset depicting two person interactions, including synchronized video, depth and motion capture data is created, and techniques related to Multiple Instance Learning (MIL) are explored, finding that the MIL based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.
Abstract: Human activity recognition has potential to impact a wide range of applications from surveillance to human computer interfaces to content based video retrieval. Recently, the rapid development of inexpensive depth sensors (e.g. Microsoft Kinect) provides adequate accuracy for real-time full-body human tracking for activity recognition applications. In this paper, we create a complex human activity dataset depicting two person interactions, including synchronized video, depth and motion capture data. Moreover, we use our dataset to evaluate various features typically used for indexing and retrieval of motion capture data, in the context of real-time detection of interaction activities via Support Vector Machines (SVMs). Experimentally, we find that the geometric relational features based on distance between all pairs of joints outperforms other feature choices. For whole sequence classification, we also explore techniques related to Multiple Instance Learning (MIL) in which the sequence is represented by a bag of body-pose features. We find that the MIL based classifier outperforms SVMs when the sequences extend temporally around the interaction of interest.

Journal ArticleDOI
TL;DR: The proposed framework employs local Fisher's discriminant analysis to reduce the dimensionality of the data while preserving its multimodal structure, while a subsequent Gaussian mixture model or support vector machine provides effective classification of the reduced-dimension multimodal data.
Abstract: Hyperspectral imagery typically provides a wealth of information captured in a wide range of the electromagnetic spectrum for each pixel in the image; however, when used in statistical pattern-classification tasks, the resulting high-dimensional feature spaces often tend to result in ill-conditioned formulations. Popular dimensionality-reduction techniques such as principal component analysis, linear discriminant analysis, and their variants typically assume a Gaussian distribution. The quadratic maximum-likelihood classifier commonly employed for hyperspectral analysis also assumes single-Gaussian class-conditional distributions. Departing from this single-Gaussian assumption, a classification paradigm designed to exploit the rich statistical structure of the data is proposed. The proposed framework employs local Fisher's discriminant analysis to reduce the dimensionality of the data while preserving its multimodal structure, while a subsequent Gaussian mixture model or support vector machine provides effective classification of the reduced-dimension multimodal data. Experimental results on several different multiple-class hyperspectral-classification tasks demonstrate that the proposed approach significantly outperforms several traditional alternatives.

Journal ArticleDOI
TL;DR: GPR proved to be a fast and accurate nonlinear retrieval algorithm that can be potentially implemented for operational monitoring applications and provided confidence intervals of the estimates and insight in relevant bands, which are key advantages over the other methods.

Journal ArticleDOI
Caifeng Shan1
TL;DR: This paper investigates gender recognition on real-life faces using the recently built database, the Labeled Faces in the Wild (LFW); Local Binary Patterns (LBP) are employed to describe faces, and AdaBoost is used to select the discriminative LBP features.

Journal ArticleDOI
TL;DR: The proposed system is accurate at high vehicle speeds, operates under a range of weather conditions, runs at an average speed of 20 frames per second, and recognizes all classes of ideogram-based (nontext) traffic symbols from an online road sign database.
Abstract: This paper proposes a novel system for the automatic detection and recognition of traffic signs. The proposed system detects candidate regions as maximally stable extremal regions (MSERs), which offers robustness to variations in lighting conditions. Recognition is based on a cascade of support vector machine (SVM) classifiers that were trained using histogram of oriented gradient (HOG) features. The training data are generated from synthetic template images that are freely available from an online database; thus, real footage road signs are not required as training data. The proposed system is accurate at high vehicle speeds, operates under a range of weather conditions, runs at an average speed of 20 frames per second, and recognizes all classes of ideogram-based (nontext) traffic symbols from an online road sign database. Comprehensive comparative results to illustrate the performance of the system are presented.

Journal ArticleDOI
TL;DR: By combining a clustering method, an ant colony algorithm, and a support vector machine, an efficient and reliable classifier is developed to judge whether a network visit is normal or not.
Abstract: The efficiency of intrusion detection depends mainly on the dimensionality of the data features. Using a gradual feature-removal method, 19 critical features are chosen to represent the various types of network visits. By combining a clustering method, an ant colony algorithm, and a support vector machine (SVM), an efficient and reliable classifier is developed to judge whether a network visit is normal or not. Moreover, the accuracy achieves 98.6249% in 10-fold cross validation and the average Matthews correlation coefficient (MCC) achieves 0.861161.

Book ChapterDOI
07 Oct 2012
TL;DR: This work revisits a much older technique, viz. Linear Discriminant Analysis (LDA), and shows that LDA models can be trained almost trivially, with little or no loss in performance relative to linear SVMs.
Abstract: Object detection has over the past few years converged on using linear SVMs over HOG features. Training linear SVMs however is quite expensive, and can become intractable as the number of categories increases. In this work we revisit a much older technique, viz. Linear Discriminant Analysis, and show that LDA models can be trained almost trivially, and with little or no loss in performance. The covariance matrices we estimate capture properties of natural images. Whitening HOG features with these covariances thus removes naturally occurring correlations between the HOG features. We show that these whitened features (which we call WHO) are considerably better than the original HOG features for computing similarities, and prove their usefulness in clustering. Finally, we use our findings to produce an object detection system that is competitive on PASCAL VOC 2007 while being considerably easier to train and test.
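The "trained almost trivially" claim rests on the LDA closed form: the detector weights are just w = Σ⁻¹(μ_pos − μ_neg) with a shared covariance, a single linear solve instead of an SVM optimization. The 2-D correlated Gaussian features below stand in for HOG features; all numbers are illustrative:

```python
import numpy as np

# Closed-form LDA: class means + one shared covariance, no iterative training.

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])            # shared covariance
Lc = np.linalg.cholesky(Sigma)
neg = rng.standard_normal((300, 2)) @ Lc.T            # "background" class
pos = rng.standard_normal((300, 2)) @ Lc.T + [2.5, 0.5]   # "object" class

mu_pos, mu_neg = pos.mean(0), neg.mean(0)
Sigma_hat = np.cov(np.vstack([pos - mu_pos, neg - mu_neg]), rowvar=False)
w = np.linalg.solve(Sigma_hat, mu_pos - mu_neg)       # closed-form LDA weights
b = -w @ (mu_pos + mu_neg) / 2                        # midpoint threshold

X = np.vstack([neg, pos])
y = np.r_[-np.ones(300), np.ones(300)]
acc = np.mean(np.sign(X @ w + b) == y)
```

Multiplying features by Σ^(-1/2) instead gives whitened features in the spirit of the paper's WHO; the classifier above is then an ordinary dot product in that whitened space.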

Book ChapterDOI
07 Oct 2012
TL;DR: The goal is to devise classifiers which can incorporate images and classes on-the-fly at (near) zero cost and to explore k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers.
Abstract: We are interested in large-scale image classification and especially in the setting where images corresponding to new or existing classes are continuously added to the training set. Our goal is to devise classifiers which can incorporate such images and classes on-the-fly at (near) zero cost. We cast this problem into one of learning a metric which is shared across all classes and explore k-nearest neighbor (k-NN) and nearest class mean (NCM) classifiers. We learn metrics on the ImageNet 2010 challenge data set, which contains more than 1.2M training images of 1K classes. Surprisingly, the NCM classifier compares favorably to the more flexible k-NN classifier, and has comparable performance to linear SVMs. We also study generalization performance, in particular by applying the learned metric to the ImageNet-10K dataset, where we obtain competitive performance. Finally, we explore zero-shot classification, and show how the zero-shot model can be combined very effectively with small training datasets.
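The (near) zero cost of adding a class follows from how an NCM classifier works: a new class is just a new mean vector. A minimal sketch, assuming plain Euclidean distance in place of the learned shared metric:

```python
import numpy as np

class NearestClassMean:
    """Nearest class mean classifier: adding a class amounts to storing
    its mean, so new classes are incorporated without retraining.
    The paper additionally learns a shared (Mahalanobis-style) metric;
    here we use plain Euclidean distance for simplicity."""
    def __init__(self):
        self.means = {}

    def add_class(self, label, X):
        self.means[label] = X.mean(axis=0)

    def predict(self, X):
        labels = list(self.means)
        M = np.stack([self.means[c] for c in labels])
        d = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
        return np.array(labels)[d.argmin(axis=1)]

rng = np.random.default_rng(2)
clf = NearestClassMean()
clf.add_class(0, rng.normal(size=(50, 3)) + np.array([0, 0, 0]))
clf.add_class(1, rng.normal(size=(50, 3)) + np.array([4, 0, 0]))
# A new class arrives later and is added on-the-fly, no retraining:
clf.add_class(2, rng.normal(size=(50, 3)) + np.array([0, 4, 0]))
pred = clf.predict(np.array([[0.1, 0.0, 0.0],
                             [4.0, 0.2, 0.0],
                             [0.0, 4.1, 0.0]]))
```

With a learned metric, distances are computed after projecting all points with a shared matrix, which is what lets one metric serve every class, including those added later.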

Journal ArticleDOI
TL;DR: A new framework called domain adaptation machine (DAM) is proposed for the multiple source domain adaptation problem and a new domain-dependent regularizer based on smoothness assumption is proposed, which enforces that the target classifier shares similar decision values with the relevant base classifiers on the unlabeled instances from the target domain.
Abstract: In this paper, we propose a new framework called domain adaptation machine (DAM) for the multiple source domain adaptation problem. Under this framework, we learn a robust decision function (referred to as target classifier) for label prediction of instances from the target domain by leveraging a set of base classifiers which are prelearned by using labeled instances either from the source domains or from the source domains and the target domain. With the base classifiers, we propose a new domain-dependent regularizer based on smoothness assumption, which enforces that the target classifier shares similar decision values with the relevant base classifiers on the unlabeled instances from the target domain. This newly proposed regularizer can be readily incorporated into many kernel methods (e.g., support vector machines (SVM), support vector regression, and least-squares SVM (LS-SVM)). For domain adaptation, we also develop two new domain adaptation methods referred to as FastDAM and UniverDAM. In FastDAM, we introduce our proposed domain-dependent regularizer into LS-SVM as well as employ a sparsity regularizer to learn a sparse target classifier with the support vectors only from the target domain, which thus makes the label prediction on any test instance very fast. In UniverDAM, we additionally make use of the instances from the source domains as Universum to further enhance the generalization ability of the target classifier. We evaluate our two methods on the challenging TRECVID 2005 dataset for the large-scale video concept detection task as well as on the 20 newsgroups and email spam datasets for document retrieval. Comprehensive experiments demonstrate that FastDAM and UniverDAM outperform the existing multiple source domain adaptation methods for the two applications.
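The domain-dependent regularizer can be illustrated with least squares instead of the paper's kernel machines: fit the target classifier to its labeled data while pulling its decision values on unlabeled target instances toward the relevance-weighted base classifiers. A hedged sketch, where the relevance weights `gammas` and the base predictions are given rather than learned:

```python
import numpy as np

def dam_ridge(X_l, y_l, X_u, base_preds, gammas, lam=1.0, reg=1e-6):
    """Least-squares sketch of DAM's smoothness regularizer: minimize
    ||X_l w - y_l||^2 + lam * ||X_u w - f_bar||^2 + reg * ||w||^2,
    where f_bar is the relevance-weighted average of the base
    classifiers' decision values on unlabeled target data X_u.
    (The paper plugs the regularizer into SVM/SVR/LS-SVM instead.)"""
    gammas = np.asarray(gammas, dtype=float)
    gammas = gammas / gammas.sum()
    f_bar = base_preds.T @ gammas  # weighted base decision values, one per x in X_u
    A = X_l.T @ X_l + lam * (X_u.T @ X_u) + reg * np.eye(X_l.shape[1])
    b = X_l.T @ y_l + lam * (X_u.T @ f_bar)
    return np.linalg.solve(A, b)

# Toy setup: a relevant and an irrelevant base classifier on 40
# unlabeled target points, plus 10 labeled target points.
rng = np.random.default_rng(3)
w_true = np.array([1.0, -1.0])
X_l = rng.normal(size=(10, 2)); y_l = X_l @ w_true
X_u = rng.normal(size=(40, 2))
base_preds = np.stack([X_u @ np.array([1.1, -0.9]),   # relevant source
                       X_u @ np.array([-1.0, 1.0])])  # irrelevant source
w = dam_ridge(X_l, y_l, X_u, base_preds, gammas=[0.95, 0.05])
```

Since the weights sum to one, the sum-of-squared-differences regularizer in the paper differs from the single term above only by a constant, so the minimizer is pulled toward the same weighted consensus.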

Posted Content
TL;DR: A new learning method for heterogeneous domain adaptation (HDA), in which the data from the source domain and the target domain are represented by heterogeneous features with different dimensions, and it is demonstrated that HFA outperforms the existing HDA methods.
Abstract: We propose a new learning method for heterogeneous domain adaptation (HDA), in which the data from the source domain and the target domain are represented by heterogeneous features with different dimensions. Using two different projection matrices, we first transform the data from two domains into a common subspace in order to measure the similarity between the data from two domains. We then propose two new feature mapping functions to augment the transformed data with their original features and zeros. The existing learning methods (e.g., SVM and SVR) can be readily incorporated with our newly proposed augmented feature representations to effectively utilize the data from both domains for HDA. Using the hinge loss function in SVM as an example, we introduce the detailed objective function in our method called Heterogeneous Feature Augmentation (HFA) for a linear case and also describe its kernelization in order to efficiently cope with the data with very high dimensions. Moreover, we also develop an alternating optimization algorithm to effectively solve the nontrivial optimization problem in our HFA method. Comprehensive experiments on two benchmark datasets clearly demonstrate that HFA outperforms the existing HDA methods.
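The augmented feature maps are simple to state concretely: source points become [P x; x; 0] and target points become [Q x; 0; x], so both land in one common space where a standard SVM can be trained. A minimal sketch, assuming the projection matrices P and Q are given (in the paper they are learned via the alternating optimization):

```python
import numpy as np

def augment_source(Xs, P, d_t):
    """HFA-style augmentation for source rows: [P x ; x ; 0_{d_t}]."""
    return np.hstack([Xs @ P.T, Xs, np.zeros((Xs.shape[0], d_t))])

def augment_target(Xt, Q, d_s):
    """HFA-style augmentation for target rows: [Q x ; 0_{d_s} ; x]."""
    return np.hstack([Xt @ Q.T, np.zeros((Xt.shape[0], d_s)), Xt])

# Toy dimensions: 5-d source features, 3-d target features,
# 2-d common subspace.
rng = np.random.default_rng(4)
P, Q = rng.normal(size=(2, 5)), rng.normal(size=(2, 3))
Xs, Xt = rng.normal(size=(6, 5)), rng.normal(size=(4, 3))
As = augment_source(Xs, P, d_t=3)
At = augment_target(Xt, Q, d_s=5)
```

Both augmented sets have dimension (common + source + target), so any off-the-shelf classifier can now be trained on their union, which is what lets SVM or SVR be "readily incorporated" as the abstract says.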