
Showing papers on "Support vector machine published in 2009"


Proceedings ArticleDOI
20 Jun 2009
TL;DR: An extension of the SPM method is developed by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and a linear SPM kernel based on SIFT sparse codes is proposed, leading to state-of-the-art performance on several benchmarks using a single type of descriptor.
Abstract: Recently, SVMs using the spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity between O(n^2) and O(n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks using a single type of descriptor.
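The feature pipeline the abstract describes is easy to prototype with off-the-shelf tools. Below is a minimal sketch, assuming a pre-trained codebook and pre-extracted SIFT descriptors with their pixel positions; names like `codebook` and `positions` are hypothetical inputs, not the authors' released code.

```python
# Sketch of the linear-SPM pipeline: sparse coding of local descriptors,
# then max pooling over a 1x1 / 2x2 / 4x4 spatial pyramid, then a linear SVM.
import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

def scspm_feature(descriptors, positions, codebook, img_size, levels=(1, 2, 4)):
    """Encode local descriptors sparsely, then max-pool over an SPM grid."""
    coder = SparseCoder(dictionary=codebook, transform_algorithm="lasso_lars",
                        transform_alpha=0.15)
    codes = np.abs(coder.transform(descriptors))   # (n_desc, n_atoms)
    h, w = img_size
    pooled = []
    for cells in levels:                           # pyramid levels
        for i in range(cells):
            for j in range(cells):
                in_cell = ((positions[:, 0] * cells // h == i) &
                           (positions[:, 1] * cells // w == j))
                cell_codes = codes[in_cell]
                pooled.append(cell_codes.max(axis=0) if len(cell_codes)
                              else np.zeros(codebook.shape[0]))
    return np.concatenate(pooled)                  # feature for a linear SVM

# X = np.vstack([scspm_feature(d, p, codebook, size) for d, p, size in images])
# clf = LinearSVC(C=1.0).fit(X, y)                 # O(n) training, O(1) testing
```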

3,017 citations


Journal ArticleDOI
TL;DR: This paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs and presents an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing.
Abstract: Discriminative training approaches like structural SVMs have shown much promise for building highly complex and accurate models in areas like natural language processing, protein structure prediction, and information retrieval. However, current training algorithms are computationally expensive or intractable on large datasets. To overcome this bottleneck, this paper explores how cutting-plane methods can provide fast training not only for classification SVMs, but also for structural SVMs. We show that for an equivalent "1-slack" reformulation of the linear SVM training problem, our cutting-plane method has time complexity linear in the number of training examples. In particular, the number of iterations does not depend on the number of training examples, and it is linear in the desired precision and the regularization parameter. Furthermore, we present an extensive empirical evaluation of the method applied to binary classification, multi-class classification, HMM sequence tagging, and CFG parsing. The experiments show that the cutting-plane algorithm is broadly applicable and fast in practice. On large datasets, it is typically several orders of magnitude faster than conventional training methods derived from decomposition methods like SVM-light, or conventional cutting-plane methods. Implementations of our methods are available at www.joachims.org.
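For intuition, here is a minimal sketch of the 1-slack cutting-plane idea for a plain binary linear SVM (no bias term), with a generic off-the-shelf solver standing in for the authors' specialized QP code: each iteration finds the single most violated aggregated constraint, adds it to the working set, and re-solves the small problem until the violation is within the precision eps.

```python
import numpy as np
from scipy.optimize import minimize

def one_slack_svm(X, y, C=1.0, eps=1e-3, max_iter=100):
    n, d = X.shape
    w = np.zeros(d)
    cuts = []                                   # (a_k, b_k): xi >= b_k - a_k.w
    for _ in range(max_iter):
        viol = (1 - y * (X @ w)) > 0            # most violated 1-slack constraint
        a = (y[viol, None] * X[viol]).sum(0) / n
        b = viol.sum() / n
        xi = max([bk - ak @ w for ak, bk in cuts], default=0.0)
        if b - a @ w <= max(xi, 0.0) + eps:     # converged to precision eps
            return w
        cuts.append((a, b))
        def obj(z):                             # z = (w, xi)
            return 0.5 * z[:d] @ z[:d] + C * z[d]
        cons = [{"type": "ineq", "fun": (lambda z, ak=ak, bk=bk:
                 z[d] + ak @ z[:d] - bk)} for ak, bk in cuts]
        cons.append({"type": "ineq", "fun": lambda z: z[d]})
        z0 = np.concatenate([w, [0.0]])
        w = minimize(obj, z0, constraints=cons, method="SLSQP").x[:d]
    return w
```

The working set grows by one constraint per iteration, and (per the abstract) the number of iterations needed does not grow with n, which is where the overall linear training time comes from.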

1,134 citations


Journal ArticleDOI
TL;DR: A survey of time series prediction applications using a novel machine learning approach: support vector machines (SVM).
Abstract: Time series prediction techniques have been used in many real-world applications such as financial market prediction, electric utility load forecasting, weather and environmental state prediction, and reliability forecasting. The underlying system models and time series data generating processes are generally complex for these applications and the models for these systems are usually not known a priori. Accurate and unbiased estimation of the time series data produced by these systems cannot always be achieved using well known linear techniques, and thus the estimation process requires more advanced time series prediction algorithms. This paper provides a survey of time series prediction applications using a novel machine learning approach: support vector machines (SVM). The underlying motivation for using SVMs is the ability of this methodology to accurately forecast time series data when the underlying system processes are typically nonlinear, non-stationary and not defined a priori. SVMs have also been proven to outperform other non-linear techniques including neural-network based non-linear prediction techniques such as multi-layer perceptrons. The ultimate goal is to provide the reader with insight into the applications using SVM for time series prediction, to give a brief tutorial on SVMs for time series prediction, to outline some of the advantages and challenges in using SVMs for time series prediction, and to provide a source for the reader to locate books, technical journals, and other online SVM research resources.
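The basic recipe surveyed here is a sliding-window embedding plus support vector regression. A minimal sketch on synthetic data follows; the lag length and SVR parameters are illustrative, not values from the survey.

```python
# Embed the series into (lag-vector, next-value) pairs and fit an RBF SVR.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.normal(size=1000)

lag = 12
X = np.array([series[t - lag:t] for t in range(lag, len(series))])
y = series[lag:]

split = 800                                     # train on the past only
model = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("test RMSE:", np.sqrt(np.mean((pred - y[split:]) ** 2)))
```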

907 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: Several models that aim at learning the correct weighting of different features from training data are studied, including multiple kernel learning as well as simple baseline methods, and ensemble methods inspired by Boosting are derived.
Abstract: A key ingredient in the design of visual object classification systems is the identification of relevant class-specific aspects while being robust to intra-class variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance within each class. In recent years, substantial performance gains on challenging benchmark datasets have been reported in the literature. This progress can be attributed to two developments: the design of highly discriminative and robust image features and the combination of multiple complementary features based on different aspects such as shape, color or texture. In this paper we study several models that aim at learning the correct weighting of different features from training data. These include multiple kernel learning as well as simple baseline methods. Furthermore, we derive ensemble methods inspired by Boosting which easily extend to several multiclass settings. All methods are thoroughly evaluated on object classification datasets using a multitude of feature descriptors. The key results are that even very simple baseline methods, which are orders of magnitude faster than the learning techniques, are highly competitive with multiple kernel learning. Furthermore, the Boosting-type methods are found to produce consistently better results in all experiments. We provide insight into when combination methods can be expected to work and how the benefit of complementary features can be exploited most efficiently.
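The simplest of the baselines the abstract alludes to is a uniform average of normalized base kernels, one per feature channel, fed to a single SVM. A minimal sketch, assuming placeholder feature matrices per channel:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def normalize(K):
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)                   # unit self-similarity

def average_kernel(feature_mats):
    Ks = [normalize(rbf_kernel(X)) for X in feature_mats]
    return sum(Ks) / len(Ks)                    # uniform weights: the baseline

# K = average_kernel([X_shape, X_color, X_texture])   # hypothetical channels
# clf = SVC(kernel="precomputed", C=1.0).fit(K, y)
```

MKL replaces the uniform weights with learned ones; the paper's point is that this baseline is already highly competitive at a fraction of the cost.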

898 citations


Proceedings ArticleDOI
01 Sep 2009
TL;DR: This work uses multiple kernel learning of Varma and Ray (ICCV 2007) to learn an optimal combination of exponential χ2 kernels, each of which captures a different feature channel.
Abstract: Our objective is to obtain a state-of-the-art object category detector by employing a state-of-the-art image classifier to search for the object in all possible image sub-windows. We use multiple kernel learning of Varma and Ray (ICCV 2007) to learn an optimal combination of exponential χ2 kernels, each of which captures a different feature channel. Our features include the distribution of edges, dense and sparse visual words, and feature descriptors at different levels of spatial organization.
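The per-channel kernel here is the exponential χ2 kernel over nonnegative histograms, which scikit-learn provides directly. In the sketch below, fixed illustrative weights stand in for the MKL-learned combination, since the MKL solver itself is out of scope.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def combined_kernel(channels, weights, gamma=1.0):
    """channels: list of (n_samples, n_bins) histogram matrices per feature."""
    return sum(w * chi2_kernel(H, gamma=gamma)   # exp(-gamma * chi2 distance)
               for w, H in zip(weights, channels))

# K = combined_kernel([edge_hists, word_hists], weights=[0.6, 0.4])  # illustrative
# det = SVC(kernel="precomputed").fit(K, y)      # classifier scored on sub-windows
```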

873 citations


Journal ArticleDOI
01 Feb 2009
TL;DR: Of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency.
Abstract: Traditional classification algorithms can be limited in their performance on highly unbalanced data sets. A popular stream of work for countering the problem of class imbalance has been the application of a variety of sampling strategies. In this paper, we focus on designing modifications to support vector machines (SVMs) to appropriately tackle the problem of class imbalance. We incorporate different “rebalance” heuristics in SVM modeling, including cost-sensitive learning, and over- and undersampling. These SVM-based strategies are compared with various state-of-the-art approaches on a variety of data sets by using various metrics, including G-mean, area under the receiver operating characteristic curve, F-measure, and area under the precision/recall curve. We show that we are able to surpass or match the previously known best algorithms on each data set. In particular, of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency. GSVM-RU is effective, as it can minimize the negative effect of information loss while maximizing the positive effect of data cleaning in the undersampling process. GSVM-RU is efficient by extracting far fewer support vectors and, hence, greatly speeding up SVM prediction.
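Two of the "rebalance" heuristics compared above are easy to sketch with scikit-learn: cost-sensitive class weights and random undersampling of the majority class. GSVM-RU itself (granular, repetitive undersampling) is more involved and not reproduced here; labels are assumed to be +1/-1.

```python
import numpy as np
from sklearn.svm import SVC

def cost_sensitive_svm(X, y):
    # weight each class inversely to its frequency
    return SVC(kernel="rbf", class_weight="balanced").fit(X, y)

def undersampled_svm(X, y, rng=np.random.default_rng(0)):
    pos, neg = np.where(y == 1)[0], np.where(y == -1)[0]
    keep = rng.choice(neg, size=len(pos), replace=False)  # shrink majority class
    idx = np.concatenate([pos, keep])
    return SVC(kernel="rbf").fit(X[idx], y[idx])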

860 citations


Journal ArticleDOI
TL;DR: The feasibility of classifying different human activities based on micro-Doppler signatures is investigated, and the potential of classifying human activities over extended time durations, through walls, and at oblique angles with respect to the radar is discussed.
Abstract: The feasibility of classifying different human activities based on micro-Doppler signatures is investigated. Measured data of 12 human subjects performing seven different activities are collected using a Doppler radar. The seven activities include running, walking, walking while holding a stick, crawling, boxing while moving forward, boxing while standing in place, and sitting still. Six features are extracted from the Doppler spectrogram. A support vector machine (SVM) is then trained using the measurement features to classify the activities. A multiclass classification is implemented using a decision-tree structure. Optimal parameters for the SVM are found through a fourfold cross-validation. The resulting classification accuracy is found to be more than 90%. The potential of classifying human activities over extended time durations, through walls, and at oblique angles with respect to the radar is also investigated and discussed.
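The model-selection step described above (fourfold cross-validation over SVM parameters) maps directly onto a grid search. A short sketch, assuming spectrogram features have already been extracted; the grid values are illustrative:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)  # fourfold CV
# search.fit(features, activity_labels)        # hypothetical inputs
# print(search.best_params_, search.best_score_)
```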

756 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: A large-margin formulation and algorithm for structured output prediction that allows the use of latent variables; its generality and performance are demonstrated through three applications: motif finding, noun-phrase coreference resolution, and optimizing precision at k in information retrieval.
Abstract: We present a large-margin formulation and algorithm for structured output prediction that allows the use of latent variables. Our proposal covers a large range of application problems, with an optimization problem that can be solved efficiently using Concave-Convex Programming. The generality and performance of the approach are demonstrated through three applications including motif finding, noun-phrase coreference resolution, and optimizing precision at k in information retrieval.
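Why Concave-Convex Programming applies is worth one line of math. In the usual structural-SVM notation (joint feature map Φ, loss Δ, latent variables h, which the paper's symbols are assumed to match), the training objective is a difference of two convex functions of w:

```latex
\[
\min_{w}\ \frac{1}{2}\|w\|^{2}
 + C\sum_{i=1}^{n}\Big[\max_{(\hat{y},\hat{h})}\big(w^{\top}\Phi(x_i,\hat{y},\hat{h})
   + \Delta(y_i,\hat{y},\hat{h})\big)
 - \max_{h}\, w^{\top}\Phi(x_i,y_i,h)\Big]
\]
% CCCP alternates until convergence: (1) impute h_i^* = argmax_h w^T Phi(x_i, y_i, h)
% under the current w; (2) fix the h_i^* and solve the resulting convex
% structural-SVM problem in which the second max becomes a linear term.
```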

729 citations


Proceedings Article
07 Dec 2009
TL;DR: A new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets is introduced; these kernels can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that the authors call multilayer kernel machines (MKMs).
Abstract: We introduce a new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets. These kernel functions can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that we call multilayer kernel machines (MKMs). We evaluate SVMs and MKMs with these kernel functions on problems designed to illustrate the advantages of deep architectures. On several problems, we obtain better results than previous, leading benchmarks from both SVMs with Gaussian kernels as well as deep belief nets.
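A concrete member of this kernel family, written from the published formulas as we read them, is the n = 1 arc-cosine kernel, which mimics one layer of rectified linear units; composing the kernel with itself gives the "deep" variant used in MKMs.

```python
import numpy as np

def arccos1_kernel(X, Y):
    """n=1 arc-cosine kernel: (1/pi) * |x||y| * (sin t + (pi - t) cos t)."""
    nx = np.linalg.norm(X, axis=1)[:, None]
    ny = np.linalg.norm(Y, axis=1)[None, :]
    cos = np.clip((X @ Y.T) / (nx * ny + 1e-12), -1.0, 1.0)
    t = np.arccos(cos)
    return (nx * ny / np.pi) * (np.sin(t) + (np.pi - t) * np.cos(t))

def multilayer(X, layers=3):
    """Compose the kernel with itself in the induced feature space."""
    K = X @ X.T
    for _ in range(layers):
        d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
        cos = np.clip(K / np.outer(d, d), -1.0, 1.0)
        t = np.arccos(cos)
        K = (np.outer(d, d) / np.pi) * (np.sin(t) + (np.pi - t) * np.cos(t))
    return K   # use with SVC(kernel="precomputed")
```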

705 citations


Journal ArticleDOI
TL;DR: A new spectral-spatial classification scheme for hyperspectral images is proposed that improves the classification accuracies and provides classification maps with more homogeneous regions, when compared to pixelwise classification.
Abstract: A new spectral-spatial classification scheme for hyperspectral images is proposed. The method combines the results of a pixelwise support vector machine classification and the segmentation map obtained by partitional clustering using majority voting. The ISODATA algorithm and Gaussian mixture resolving techniques are used for image clustering. Experimental results are presented for two hyperspectral airborne images. The developed classification scheme improves the classification accuracies and provides classification maps with more homogeneous regions, when compared to pixelwise classification. The proposed method performs particularly well for classification of images with large spatial structures and when different classes have dissimilar spectral responses and a comparable number of pixels.
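The combination rule itself is compact: every segment produced by the clustering step takes the majority label of the pixelwise SVM result. A minimal sketch, with `svm_labels` and `segment_ids` as hypothetical per-pixel arrays:

```python
import numpy as np

def majority_vote(svm_labels, segment_ids):
    out = svm_labels.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        vals, counts = np.unique(svm_labels[mask], return_counts=True)
        out[mask] = vals[np.argmax(counts)]      # majority class of the segment
    return out
```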

704 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: An uncertainty measure is proposed that generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that active learning can handle a large number of classes and large data sizes efficiently.
Abstract: One of the principal bottlenecks in applying learning techniques to classification problems is the large amount of labeled training data required. Especially for images and video, providing training data is very expensive in terms of human time and effort. In this paper we propose an active learning approach to tackle the problem. Instead of passively accepting random training examples, the active learning algorithm iteratively selects unlabeled examples for the user to label, so that human effort is focused on labeling the most “useful” examples. Our method relies on the idea of uncertainty sampling, in which the algorithm selects unlabeled examples that it finds hardest to classify. Specifically, we propose an uncertainty measure that generalizes margin-based uncertainty to the multi-class case and is easy to compute, so that active learning can handle a large number of classes and large data sizes efficiently. We demonstrate results for letter and digit recognition on datasets from the UCI repository, object recognition results on the Caltech-101 dataset, and scene categorization results on a dataset of 13 natural scene categories. The proposed method gives large reductions in the number of training examples required over random selection to achieve similar classification accuracy, with little computational overhead.
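One concrete reading of the proposed measure is a best-versus-second-best margin: score each unlabeled sample by the gap between its two highest class probabilities (or suitably normalized decision values) and query the smallest gaps. A minimal sketch:

```python
import numpy as np

def bvsb_query(proba, k):
    """proba: (n_unlabeled, n_classes) class probabilities; returns k indices."""
    top2 = np.sort(proba, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]             # small margin = high uncertainty
    return np.argsort(margin)[:k]

# idx = bvsb_query(svm.predict_proba(X_unlabeled), k=50)   # send to the labeler
# (predict_proba requires SVC(probability=True), or substitute decision values)
```

This costs one sort per sample, which is what makes the approach scale to many classes and large pools.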

Journal ArticleDOI
TL;DR: This study investigates several widely used unsupervised and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms, and proposes a new simple supervised term weighting method, tf.rf, to improve the terms' discriminating power for the text categorization task.
Abstract: In the vector space model (VSM), text representation is the task of transforming the content of a textual document into a vector in the term space so that the document can be recognized and classified by a computer or a classifier. Different terms (i.e. words, phrases, or any other indexing units used to identify the contents of a text) have different importance in a text. Term weighting methods assign appropriate weights to the terms to improve the performance of text categorization. In this study, we investigate several widely used unsupervised (traditional) and supervised term weighting methods on benchmark data collections in combination with SVM and kNN algorithms. In consideration of the distribution of relevant documents in the collection, we propose a new simple supervised term weighting method, i.e. tf.rf, to improve the terms' discriminating power for the text categorization task. The controlled experimental results show that these supervised term weighting methods have mixed performance. Specifically, our proposed supervised term weighting method, tf.rf, has a consistently better performance than other term weighting methods, while other supervised term weighting methods based on information theory or statistical metrics perform the worst in all experiments. On the other hand, the popularly used tf.idf method has not shown a uniformly good performance across different data sets.
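A sketch of tf.rf as commonly stated in the literature: rf = log2(2 + a / max(1, c)), where a and c count training documents containing the term in the positive and negative category respectively, and the final weight is tf times rf. The weighting is per category, i.e. recomputed for each one-vs-rest split.

```python
import numpy as np

def tf_rf(tf_counts, y):
    """tf_counts: (n_docs, n_terms) term counts; y: +1/-1 category labels."""
    presence = (tf_counts > 0).astype(float)
    a = presence[y == 1].sum(axis=0)             # positive docs containing term
    c = presence[y == -1].sum(axis=0)            # negative docs containing term
    rf = np.log2(2.0 + a / np.maximum(1.0, c))   # relevance frequency
    return tf_counts * rf                        # tf.rf weighted matrix
```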

Journal ArticleDOI
TL;DR: This study verified the effectiveness and robustness of SVMs in the classification of remotely sensed images and showed that SVMs, especially with the radial basis function kernel, outperform the maximum likelihood classifier in terms of overall and individual class accuracies.

Proceedings ArticleDOI
01 Sep 2009
TL;DR: This paper describes a human detection method that augments widely used edge-based features with texture and color information, providing a much richer descriptor set; the method is shown to outperform state-of-the-art techniques on three varied datasets.
Abstract: Significant research has been devoted to detecting people in images and videos. In this paper we describe a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set. This augmentation results in an extremely high-dimensional feature space (more than 170,000 dimensions). In such high-dimensional spaces, classical machine learning algorithms such as SVMs are nearly intractable with respect to training. Furthermore, the number of training samples is much smaller than the dimensionality of the feature space, by at least an order of magnitude. Finally, the extraction of features from a densely sampled grid structure leads to a high degree of multicollinearity. To circumvent these data characteristics, we employ Partial Least Squares (PLS) analysis, an efficient dimensionality reduction technique, one which preserves significant discriminative information, to project the data onto a much lower dimensional subspace (20 dimensions, reduced from the original 170,000). Our human detection system, employing PLS analysis over the enriched descriptor set, is shown to outperform state-of-the-art techniques on three varied datasets including the popular INRIA pedestrian dataset, the low-resolution gray-scale DaimlerChrysler pedestrian dataset, and the ETHZ pedestrian dataset consisting of full-length videos of crowded scenes.
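The PLS step is the interesting part computationally: a supervised projection of the 170,000-dimensional descriptor onto about 20 latent directions that are predictive of the label. A minimal sketch with scikit-learn; the linear SVM here simply stands in for whatever final classifier is trained in the reduced space.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import LinearSVC

def pls_reduce_and_train(X, y, n_components=20):
    pls = PLSRegression(n_components=n_components).fit(X, y)  # supervised projection
    Z = pls.transform(X)                                      # (n, 20) latent scores
    return pls, LinearSVC().fit(Z, y)

# pls, clf = pls_reduce_and_train(train_features, train_labels)  # hypothetical data
# pred = clf.predict(pls.transform(test_features))
```

Unlike PCA, PLS uses the labels when choosing directions, which is why so few components can preserve the discriminative information.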

Journal ArticleDOI
TL;DR: A least squares version of the recently proposed twin support vector machine (TSVM) for binary classification has comparable classification accuracy to that of TSVM but with considerably less computational time.
Abstract: In this paper we formulate a least squares version of the recently proposed twin support vector machine (TSVM) for binary classification. This formulation leads to an extremely simple and fast algorithm for generating binary classifiers based on two non-parallel hyperplanes. Here we attempt to solve two modified primal problems of TSVM, instead of the two dual problems usually solved. We show that the solution of the two modified primal problems reduces to solving just two systems of linear equations, as opposed to solving two quadratic programming problems along with two systems of linear equations in TSVM. Classification using a nonlinear kernel also leads to systems of linear equations. Our experiments on publicly available datasets indicate that the proposed least squares TSVM has comparable classification accuracy to that of TSVM but with considerably less computational time. Since linear least squares TSVM can easily handle large datasets, we further investigate its efficiency for text categorization applications. Computational results demonstrate the effectiveness of the proposed method over linear proximal SVM on all the text corpora considered.
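The computational core, as we read the LSTSVM formulation, is two small linear solves. With H = [A e] built from class +1 points and G = [B e] from class -1 points, each augmented vector z = [w; b] comes from a regularized normal system; a new point is assigned to the nearer of the two planes.

```python
import numpy as np

def lstsvm_planes(A, B, c1=1.0, c2=1.0):
    e_a, e_b = np.ones((len(A), 1)), np.ones((len(B), 1))
    H, G = np.hstack([A, e_a]), np.hstack([B, e_b])
    z1 = -np.linalg.solve(H.T @ H / c1 + G.T @ G, G.T @ e_b).ravel()
    z2 = np.linalg.solve(G.T @ G / c2 + H.T @ H, H.T @ e_a).ravel()
    return z1, z2                    # each is (w, b) for one hyperplane

def predict(x, z1, z2):
    xa = np.append(x, 1.0)
    d1 = abs(xa @ z1) / np.linalg.norm(z1[:-1])  # distance to plane 1
    d2 = abs(xa @ z2) / np.linalg.norm(z2[:-1])  # distance to plane 2
    return 1 if d1 < d2 else -1
```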

Proceedings Article
01 Jan 2009
TL;DR: Details of the new paradigm and corresponding algorithms are discussed, some new algorithms are introduced, several specific forms of privileged information are considered, and the superiority of the new learning paradigm over the classical learning paradigm when solving practical problems is demonstrated.
Abstract: In the Afterword to the second edition of the book "Estimation of Dependences Based on Empirical Data" by V. Vapnik, an advanced learning paradigm called Learning Using Hidden Information (LUHI) was introduced. This Afterword also suggested an extension of the SVM method (the so-called SVMγ+ method) to implement algorithms which address the LUHI paradigm (Vapnik, 1982-2006, Sections 2.4.2 and 2.5.3 of the Afterword). See also (Vapnik, Vashist, & Pavlovitch, 2008, 2009) for further development of the algorithms. In contrast to the existing machine learning paradigm, where a teacher does not play an important role, the advanced learning paradigm considers some elements of human teaching. In the new paradigm, along with examples, a teacher can provide students with hidden information that exists in explanations, comments, comparisons, and so on. This paper discusses details of the new paradigm and corresponding algorithms, introduces some new algorithms, considers several specific forms of privileged information, demonstrates the superiority of the new learning paradigm over the classical learning paradigm when solving practical problems, and discusses general questions related to the new ideas.

Journal ArticleDOI
01 May 2009
TL;DR: Experimental results show that the proposed model outperforms the SVR model with non-filtered forecasting variables and a random walk model.
Abstract: As financial time series are inherently noisy and non-stationary, they are regarded as one of the most challenging applications of time series forecasting. Due to the advantages of generalization capability in obtaining a unique solution, support vector regression (SVR) has also been successfully applied in financial time series forecasting. In the modeling of financial time series using SVR, one of the key problems is the inherent high noise. Thus, detecting and removing the noise are important but difficult tasks when building an SVR forecasting model. To alleviate the influence of noise, a two-stage modeling approach using independent component analysis (ICA) and support vector regression is proposed for financial time series forecasting. ICA is a novel statistical signal processing technique that was originally proposed to find the latent source signals from observed mixture signals without having any prior knowledge of the mixing mechanism. The proposed approach first applies ICA to the forecasting variables to generate the independent components (ICs). After identifying and removing the ICs containing the noise, the remaining ICs are used to reconstruct the forecasting variables, which contain less noise and serve as the input variables of the SVR forecasting model. In order to evaluate the performance of the proposed approach, the Nikkei 225 opening index and TAIEX closing index are used as illustrative examples. Experimental results show that the proposed model outperforms the SVR model with non-filtered forecasting variables and a random walk model.
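A sketch of the two-stage scheme with scikit-learn: run FastICA on the forecasting variables, zero out the components judged to be noise, reconstruct, then fit SVR. How the noisy ICs are identified is the paper's own criterion; here `noise_ics` is simply a hypothetical list of component indices.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVR

def ica_denoise(X, noise_ics):
    ica = FastICA(n_components=X.shape[1], random_state=0)
    S = ica.fit_transform(X)              # independent components of the inputs
    S[:, noise_ics] = 0.0                 # drop the ICs identified as noise
    return ica.inverse_transform(S)       # reconstructed, denoised variables

# X_clean = ica_denoise(X_train, noise_ics=[2])        # illustrative choice
# model = SVR(kernel="rbf", C=10.0).fit(X_clean, y_train)
```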

Journal ArticleDOI
TL;DR: Two active learning algorithms, based on predefined heuristics, are proposed for semiautomatic definition of training samples in remote sensing image classification; they reach the same level of accuracy as larger data sets with far fewer labeled samples.
Abstract: In this paper, we propose two active learning algorithms for semiautomatic definition of training samples in remote sensing image classification. Based on predefined heuristics, the classifier ranks the unlabeled pixels and automatically chooses those that are considered the most valuable for its improvement. Once the pixels have been selected, the analyst labels them manually and the process is iterated. Starting with a small and nonoptimal training set, the model itself builds the optimal set of samples which minimizes the classification error. We have applied the proposed algorithms to a variety of remote sensing data, including very high resolution and hyperspectral images, using support vector machines. Experimental results confirm the consistency of the methods. The required number of training samples can be reduced to 10% using the methods proposed, reaching the same level of accuracy as larger data sets. A comparison with a state-of-the-art active learning method, margin sampling, is provided, highlighting advantages of the methods proposed. The effect of spatial resolution and separability of the classes on the quality of the selection of pixels is also discussed.
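The margin-sampling baseline the paper compares against fits in a few lines: query the unlabeled pixels closest to the current SVM decision boundary. Sketch for the binary case (a multiclass SVM would aggregate per-class decision values):

```python
import numpy as np

def margin_sampling(svm, X_unlabeled, k):
    dist = np.abs(svm.decision_function(X_unlabeled))   # distance to hyperplane
    return np.argsort(dist)[:k]                         # most ambiguous pixels
```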

Journal Article
TL;DR: This work considers regularized support vector machines and shows that they are precisely equivalent to a new robust optimization formulation, thus establishing robustness as the reason regularized SVMs generalize well, and gives a new proof of consistency of (kernelized) SVMs.
Abstract: We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection against noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization provides a robust optimization interpretation for the success of regularized SVMs. We use this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
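A paraphrase of the equivalence in its simplest (hinge-loss, non-kernelized) form, stated here from memory of the result rather than the paper's exact theorem: for an uncertainty set bounding the total perturbation by c, with dual norm ||·||*, the robust problem collapses to the standard norm-regularized SVM. See the paper for the precise uncertainty set and conditions.

```latex
\[
\min_{w,b}\ \max_{\sum_i \|\delta_i\| \le c}\ \sum_{i=1}^{n}
  \big[\,1-y_i\big(w^{\top}(x_i-\delta_i)+b\big)\big]_{+}
\;=\;
\min_{w,b}\ c\,\|w\|_{*}
 + \sum_{i=1}^{n}\big[\,1-y_i\big(w^{\top}x_i+b\big)\big]_{+}
\]
```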

Journal ArticleDOI
TL;DR: In this article, a support vector machine (SVM) is used to predict hourly building cooling load; it achieves better accuracy and generalization than the traditional back-propagation (BP) neural network model.

Journal ArticleDOI
TL;DR: A novel wrapper algorithm for feature selection using support vector machines with kernel functions is presented; it is based on sequential backward selection, using the number of errors on a validation subset as the measure for deciding which feature to remove in each iteration.
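A generic sequential-backward-selection wrapper of the kind the TL;DR describes, sketched here as an illustration rather than the paper's exact rule: repeatedly drop the feature whose removal yields the fewest validation errors for a kernel SVM.

```python
import numpy as np
from sklearn.svm import SVC

def backward_select(X_tr, y_tr, X_val, y_val, n_keep):
    feats = list(range(X_tr.shape[1]))
    while len(feats) > n_keep:
        errs = []
        for f in feats:                            # try removing each feature
            sub = [g for g in feats if g != f]
            clf = SVC(kernel="rbf").fit(X_tr[:, sub], y_tr)
            errs.append((clf.predict(X_val[:, sub]) != y_val).sum())
        feats.remove(feats[int(np.argmin(errs))])  # drop the least useful one
    return feats
```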

Journal ArticleDOI
TL;DR: A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients, and two feature selection algorithms were implemented that incorporate reliability information into the feature selection process.
Abstract: The application of multivoxel pattern analysis methods has attracted increasing attention, particularly for brain state prediction and real-time functional MRI applications. Support vector classification is the most popular of these techniques, owing to reports that it has better prediction accuracy and is less sensitive to noise. Support vector classification was applied to learn functional connectivity patterns that distinguish patients with depression from healthy volunteers. In addition, two feature selection algorithms were implemented (one filter method, one wrapper method) that incorporate reliability information into the feature selection process. These reliability-based feature selection methods were compared with two previously proposed feature selection methods. A support vector classifier was trained that reliably distinguishes healthy volunteers from clinically depressed patients. The reliability-based feature selection methods outperformed the previously utilized methods. The proposed framework for applying support vector classification to functional connectivity data is applicable to other disease states beyond major depression.

Book
28 Sep 2009
TL;DR: This book presents chemometric pattern recognition methods, from exploratory analysis and preprocessing through two-class, one-class, and multiclass classifiers (including support vector machines), validation and optimization, and variable selection, illustrated on twelve spectroscopic and chromatographic case studies.
Abstract: Acknowledgements. Preface. 1 Introduction. 1.1 Past, Present and Future. 1.2 About this Book. Bibliography. 2 Case Studies. 2.1 Introduction. 2.2 Datasets, Matrices and Vectors. 2.3 Case Study 1: Forensic Analysis of Banknotes. 2.4 Case Study 2: Near Infrared Spectroscopic Analysis of Food. 2.5 Case Study 3: Thermal Analysis of Polymers. 2.6 Case Study 4: Environmental Pollution using Headspace Mass Spectrometry. 2.7 Case Study 5: Human Sweat Analysed by Gas Chromatography Mass Spectrometry. 2.8 Case Study 6: Liquid Chromatography Mass Spectrometry of Pharmaceutical Tablets. 2.9 Case Study 7: Atomic Spectroscopy for the Study of Hypertension. 2.10 Case Study 8: Metabolic Profiling of Mouse Urine by Gas Chromatography of Urine Extracts. 2.11 Case Study 9: Nuclear Magnetic Resonance Spectroscopy for Saliva Analysis of the Effect of Mouthwash. 2.12 Case Study 10: Simulations. 2.13 Case Study 11: Null Dataset. 2.14 Case Study 12: GCMS and Microbiology of Mouse Scent Marks. Bibliography. 3 Exploratory Data Analysis. 3.1 Introduction. 3.2 Principal Components Analysis. 3.2.1 Background. 3.2.2 Scores and Loadings. 3.2.3 Eigenvalues. 3.2.4 PCA Algorithm. 3.2.5 Graphical Representation. 3.3 Dissimilarity Indices, Principal Co-ordinates Analysis and Ranking. 3.3.1 Dissimilarity. 3.3.2 Principal Co-ordinates Analysis. 3.3.3 Ranking. 3.4 Self Organizing Maps. 3.4.1 Background. 3.4.2 SOM Algorithm. 3.4.3 Initialization. 3.4.4 Training. 3.4.5 Map Quality. 3.4.6 Visualization. Bibliography. 4 Preprocessing. 4.1 Introduction. 4.2 Data Scaling. 4.2.1 Transforming Individual Elements. 4.2.2 Row Scaling. 4.2.3 Column Scaling. 4.3 Multivariate Methods of Data Reduction. 4.3.1 Largest Principal Components. 4.3.2 Discriminatory Principal Components. 4.3.3 Partial Least Squares Discriminatory Analysis Scores. 4.4 Strategies for Data Preprocessing. 4.4.1 Flow Charts. 4.4.2 Level 1. 4.4.3 Level 2. 4.4.4 Level 3. 4.4.5 Level 4. Bibliography. 5 Two Class Classifiers. 5.1 Introduction. 5.1.1 Two Class Classifiers. 5.1.2 Preprocessing. 5.1.3 Notation. 5.1.4 Autoprediction and Class Boundaries. 5.2 Euclidean Distance to Centroids. 5.3 Linear Discriminant Analysis. 5.4 Quadratic Discriminant Analysis. 5.5 Partial Least Squares Discriminant Analysis. 5.5.1 PLS Method. 5.5.2 PLS Algorithm. 5.5.3 PLS-DA. 5.6 Learning Vector Quantization. 5.6.1 Voronoi Tessellation and Codebooks. 5.6.2 LVQ1. 5.6.3 LVQ3. 5.6.4 LVQ Illustration and Summary of Parameters. 5.7 Support Vector Machines. 5.7.1 Linear Learning Machines. 5.7.2 Kernels. 5.7.3 Controlling Complexity and Soft Margin SVMs. 5.7.4 SVM Parameters. Bibliography. 6 One Class Classifiers. 6.1 Introduction. 6.2 Distance Based Classifiers. 6.3 PC Based Models and SIMCA. 6.4 Indicators of Significance. 6.4.1 Gaussian Density Estimators and Chi-Squared. 6.4.2 Hotelling's T2. 6.4.3 D-Statistic. 6.4.4 Q-Statistic or Squared Prediction Error. 6.4.5 Visualization of D- and Q-Statistics for Disjoint PC Models. 6.4.6 Multivariate Normality and What to do if it Fails. 6.5 Support Vector Data Description. 6.6 Summarizing One Class Classifiers. 6.6.1 Class Membership Plots. 6.6.2 ROC Curves. Bibliography. 7 Multiclass Classifiers. 7.1 Introduction. 7.2 EDC, LDA and QDA. 7.3 LVQ. 7.4 PLS. 7.4.1 PLS2. 7.4.2 PLS1. 7.5 SVM. 7.6 One against One Decisions. Bibliography. 8 Validation and Optimization. 8.1 Introduction. 8.1.1 Validation. 8.1.2 Optimization. 8.2 Classification Abilities, Contingency Tables and Related Concepts. 8.2.1 Two Class Classifiers. 8.2.2 Multiclass Classifiers. 8.2.3 One Class Classifiers. 8.3 Validation. 8.3.1 Testing Models. 8.3.2 Test and Training Sets. 8.3.3 Predictions. 8.3.4 Increasing the Number of Variables for the Classifier. 8.4 Iterative Approaches for Validation. 8.4.1 Predictive Ability, Model Stability, Classification by Majority Vote and Cross Classification Rate. 8.4.2 Number of Iterations. 8.4.3 Test and Training Set Boundaries. 8.5 Optimizing PLS Models. 8.5.1 Number of Components: Cross-Validation and Bootstrap. 8.5.2 Thresholds and ROC Curves. 8.6 Optimizing Learning Vector Quantization Models. 8.7 Optimizing Support Vector Machine Models. Bibliography. 9 Determining Potential Discriminatory Variables. 9.1 Introduction. 9.1.1 Two Class Distributions. 9.1.2 Multiclass Distributions. 9.1.3 Multilevel and Multiway Distributions. 9.1.4 Sample Sizes. 9.1.5 Modelling after Variable Reduction. 9.1.6 Preliminary Variable Reduction. 9.2 Which Variables are most Significant? 9.2.1 Basic Concepts: Statistical Indicators and Rank. 9.2.2 T-Statistic and Fisher Weights. 9.2.3 Multiple Linear Regression, ANOVA and the F-Ratio. 9.2.4 Partial Least Squares. 9.2.5 Relationship between the Indicator Functions. 9.3 How Many Variables are Significant? 9.3.1 Probabilistic Approaches. 9.3.2 Empirical Methods: Monte Carlo. 9.3.3 Cost/Benefit of Increasing the Number of Variables. Bibliography. 10 Bayesian Methods and Unequal Class Sizes. 10.1 Introduction. 10.2 Contingency Tables and Bayes' Theorem. 10.3 Bayesian Extensions to Classifiers. Bibliography. 11 Class Separation Indices. 11.1 Introduction. 11.2 Davies Bouldin Index. 11.3 Silhouette Width and Modified Silhouette Width. 11.3.1 Silhouette Width. 11.3.2 Modified Silhouette Width. 11.4 Overlap Coefficient. Bibliography. 12 Comparing Different Patterns. 12.1 Introduction. 12.2 Correlation Based Methods. 12.2.1 Mantel Test. 12.2.2 RV Coefficient. 12.3 Consensus PCA. 12.4 Procrustes Analysis. Bibliography. Index.

Journal ArticleDOI
TL;DR: A simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages and converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search.

Abstract: Most successful object recognition systems rely on binary classification, deciding only if an object is present or not, but not providing information on the actual object location. To estimate the object's location, one can take a sliding window approach, but this strongly increases the computational cost because the classifier or similarity function has to be evaluated over a large set of candidate subwindows. In this paper, we propose a simple yet powerful branch-and-bound scheme that allows efficient maximization of a large class of quality functions over all possible subimages. It converges to a globally optimal solution typically in linear or even sublinear time, in contrast to the quadratic scaling of exhaustive or sliding window search. We show how our method is applicable to different object detection and image retrieval scenarios. The achieved speedup allows the use of classifiers for localization that formerly were considered too slow for this task, such as SVMs with a spatial pyramid kernel or nearest-neighbor classifiers based on the χ2 distance. We demonstrate state-of-the-art localization performance of the resulting systems on the UIUC Cars data set, the PASCAL VOC 2006 data set, and in the PASCAL VOC 2007 competition.
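A skeleton of the search, under stated assumptions: rectangle *sets* are represented by an interval per side (top, bottom, left, right); each step splits the widest interval and keeps a priority queue ordered by an upper bound `f_hat` on the quality of any rectangle in the set. For the guarantee to hold, `f_hat` must upper-bound the true quality on every set and equal it on singletons; for linear SVMs over bag-of-words scores such a bound can be built from integral images of positive and negative per-point weights (not shown here).

```python
import heapq

def ess(img_w, img_h, f_hat):
    """f_hat(box_set) -> upper bound; box_set = (T, B, L, R) interval pairs."""
    root = ((0, img_h), (0, img_h), (0, img_w), (0, img_w))
    heap = [(-f_hat(root), root)]
    while heap:
        _, box = heapq.heappop(heap)
        widths = [hi - lo for lo, hi in box]
        if max(widths) == 0:                      # single rectangle: global max
            return box
        i = widths.index(max(widths))             # split the widest interval
        lo, hi = box[i]
        mid = (lo + hi) // 2
        for part in ((lo, mid), (mid + 1, hi)):
            child = box[:i] + (part,) + box[i + 1:]
            heapq.heappush(heap, (-f_hat(child), child))
```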

BookDOI
23 Oct 2009
TL;DR: This book surveys kernel methods for remote sensing data analysis, covering supervised and semi-supervised image classification (including a domain adaptation SVM with a circular validation strategy for updating land-cover maps), target and anomaly detection, regression, and kernel-based feature extraction.
Abstract: About the editors. List of authors. Preface. Acknowledgments. List of symbols. List of abbreviations. I Introduction. 1 Machine learning techniques in remote sensing data analysis (Bjorn Waske, Mathieu Fauvel, Jon Atli Benediktsson and Jocelyn Chanussot). 1.1 Introduction. 1.2 Supervised classification: algorithms and applications. 1.3 Conclusion. Acknowledgments. References. 2 An introduction to kernel learning algorithms (Peter V. Gehler and Bernhard Scholkopf). 2.1 Introduction. 2.2 Kernels. 2.3 The representer theorem. 2.4 Learning with kernels. 2.5 Conclusion. References. II Supervised image classification. 3 The Support Vector Machine (SVM) algorithm for supervised classification of hyperspectral remote sensing data (J. Anthony Gualtieri). 3.1 Introduction. 3.2 Aspects of hyperspectral data and its acquisition. 3.3 Hyperspectral remote sensing and supervised classification. 3.4 Mathematical foundations of supervised classification. 3.5 From structural risk minimization to a support vector machine algorithm. 3.6 Benchmark hyperspectral data sets. 3.7 Results. 3.8 Using spatial coherence. 3.9 Why do SVMs perform better than other methods? 3.10 Conclusions. References. 4 On training and evaluation of SVM for remote sensing applications (Giles M. Foody). 4.1 Introduction. 4.2 Classification for thematic mapping. 4.3 Overview of classification by a SVM. 4.4 Training stage. 4.5 Testing stage. 4.6 Conclusion. Acknowledgments. References. 5 Kernel Fisher's Discriminant with heterogeneous kernels (M. Murat Dundar and Glenn Fung). 5.1 Introduction. 5.2 Linear Fisher's Discriminant. 5.3 Kernel Fisher Discriminant. 5.4 Kernel Fisher's Discriminant with heterogeneous kernels. 5.5 Automatic kernel selection KFD algorithm. 5.6 Numerical results. 5.7 Conclusion. References. 6 Multi-temporal image classification with kernels (Jordi Munoz-Mari, Luis Gomez-Chova, Manel Martinez-Ramon, Jose Luis Rojo-Alvarez, Javier Calpe-Maravilla and Gustavo Camps-Valls). 6.1 Introduction. 6.2 Multi-temporal classification and change detection with kernels. 6.3 Contextual and multi-source data fusion with kernels. 6.4 Multi-temporal/-source urban monitoring. 6.5 Conclusions. Acknowledgments. References. 7 Target detection with kernels (Nasser M. Nasrabadi). 7.1 Introduction. 7.2 Kernel learning theory. 7.3 Linear subspace-based anomaly detectors and their kernel versions. 7.4 Results. 7.5 Conclusion. References. 8 One-class SVMs for hyperspectral anomaly detection (Amit Banerjee, Philippe Burlina and Chris Diehl). 8.1 Introduction. 8.2 Deriving the SVDD. 8.3 SVDD function optimization. 8.4 SVDD algorithms for hyperspectral anomaly detection. 8.5 Experimental results. 8.6 Conclusions. References. III Semi-supervised image classification. 9 A domain adaptation SVM and a circular validation strategy for land-cover maps updating (Mattia Marconcini and Lorenzo Bruzzone). 9.1 Introduction. 9.2 Literature survey. 9.3 Proposed domain adaptation SVM. 9.4 Proposed circular validation strategy. 9.5 Experimental results. 9.6 Discussions and conclusion. References. 10 Mean kernels for semi-supervised remote sensing image classification (Luis Gomez-Chova, Javier Calpe-Maravilla, Lorenzo Bruzzone and Gustavo Camps-Valls). 10.1 Introduction. 10.2 Semi-supervised classification with mean kernels. 10.3 Experimental results. 10.4 Conclusions. Acknowledgments. References. IV Function approximation and regression. 11 Kernel methods for unmixing hyperspectral imagery (Joshua Broadwater, Amit Banerjee and Philippe Burlina). 11.1 Introduction. 11.2 Mixing models. 11.3 Proposed kernel unmixing algorithm. 11.4 Experimental results of the kernel unmixing algorithm. 11.5 Development of physics-based kernels for unmixing. 11.6 Physics-based kernel results. 11.7 Summary. References. 12 Kernel-based quantitative remote sensing inversion (Yanfei Wang, Changchun Yang and Xiaowen Li). 12.1 Introduction. 12.2 Typical kernel-based remote sensing inverse problems. 12.3 Well-posedness and ill-posedness. 12.4 Regularization. 12.5 Optimization techniques. 12.6 Kernel-based BRDF model inversion. 12.7 Aerosol particle size distribution function retrieval. 12.8 Conclusion. Acknowledgments. References. 13 Land and sea surface temperature estimation by support vector regression (Gabriele Moser and Sebastiano B. Serpico). 13.1 Introduction. 13.2 Previous work. 13.3 Methodology. 13.4 Experimental results. 13.5 Conclusions. Acknowledgments. References. V Kernel-based feature extraction. 14 Kernel multivariate analysis in remote sensing feature extraction (Jeronimo Arenas-Garcia and Kaare Brandt Petersen). 14.1 Introduction. 14.2 Multivariate analysis methods. 14.3 Kernel multivariate analysis. 14.4 Sparse Kernel OPLS. 14.5 Experiments: pixel-based hyperspectral image classification. 14.6 Conclusions. Acknowledgments. References. 15 KPCA algorithm for hyperspectral target/anomaly detection (Yanfeng Gu). 15.1 Introduction. 15.2 Motivation. 15.3 Kernel-based feature extraction in hyperspectral images. 15.4 Kernel-based target detection in hyperspectral images. 15.5 Kernel-based anomaly detection in hyperspectral images. 15.6 Conclusions. Acknowledgments. References. 16 Remote sensing data classification with kernel nonparametric feature extractions (Bor-Chen Kuo, Jinn-Min Yang and Cheng-Hsuan Li). 16.1 Introduction. 16.2 Related feature extractions. 16.3 Kernel-based NWFE and FLFE. 16.4 Eigenvalue resolution with regularization. 16.5 Experiments. 16.6 Comments and conclusions. References. Index.

Journal ArticleDOI
TL;DR: The authors' best machine learning technique applied to spatio-temporal patterns of EEG synchronization outperformed previous seizure prediction methods on the Freiburg dataset.

Proceedings Article
01 Jan 2009
TL;DR: A new speaker verification architecture based on Joint Factor Analysis (JFA) as a feature extractor is presented; the cosine kernel in the new total factor space is used to design two different systems: the first is support vector machine based, and the second uses this kernel directly as a decision score.
Abstract: This paper presents a new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor. In this modeling, the JFA is used to define a new low-dimensional space named the total variability factor space, instead of the separate channel and speaker variability spaces of classical JFA. The main contribution of this approach is the use of the cosine kernel in the new total factor space to design two different systems: the first system is support vector machine based, and the second uses this kernel directly as a decision score. This last scoring method makes the process faster and less computationally complex than other classical methods. We tested several intersession compensation methods in total factor space, and found that the combination of Linear Discriminant Analysis and Within Class Covariance Normalization achieved the best performance. We achieved remarkable results using the fast scoring method based only on the cosine kernel, especially for male trials, yielding an EER of 1.12% and a MinDCF of 0.0094 on the English trials of the NIST 2008 SRE dataset. Index Terms: Total variability space, cosine kernel, fast scoring, support vector machines.
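A minimal sketch of the fast scoring path, under the assumptions that `W` holds total-variability factors with speaker labels, that there are enough speakers for the illustrative LDA dimension, and that each speaker has several utterances: project with LDA, whiten with WCCN (the inverse Cholesky factor of the within-class covariance), and use the cosine between two utterances' vectors directly as the decision score.

```python
import numpy as np
from scipy.linalg import cholesky, inv
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_backend(W, speakers, dim=200):         # dim is illustrative
    lda = LinearDiscriminantAnalysis(n_components=dim).fit(W, speakers)
    Z = lda.transform(W)
    # average within-class covariance of the LDA-projected factors
    ids = np.unique(speakers)
    S = sum(np.cov(Z[speakers == s].T) for s in ids) / len(ids)
    A = inv(cholesky(S, lower=True))             # WCCN projection
    return lda, A

def score(lda, A, w1, w2):
    a, b = A @ lda.transform([w1])[0], A @ lda.transform([w2])[0]
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine decision
```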

Proceedings ArticleDOI
14 Jun 2009
TL;DR: A new data-dependent regularizer based on a smoothness assumption is introduced into Least-Squares SVM (LS-SVM); it enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain.
Abstract: We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as the target classifier) for label prediction of patterns from the target domain by leveraging a set of pre-computed classifiers (referred to as auxiliary/source classifiers) independently learned with the labeled patterns from multiple source domains. We introduce a new data-dependent regularizer based on a smoothness assumption into Least-Squares SVM (LS-SVM), which enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain. In addition, we employ a sparsity regularizer to learn a sparse target classifier. Comprehensive experiments on the challenging TRECVID 2005 corpus demonstrate that DAM outperforms the existing multiple source domain adaptation methods for video concept detection in terms of effectiveness and efficiency.
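The shape of the objective, as the abstract describes it and with notation assumed here rather than taken from the paper: f is the target classifier, f_s the pre-computed source classifiers, β_s their relevance weights, x_u the unlabeled target patterns, and Ω collects the LS-SVM loss plus the sparsity regularizer.

```latex
\[
\min_{f}\ \Omega(f)
 \;+\; \gamma \sum_{s} \beta_{s} \sum_{u} \big(f(x_u)-f_s(x_u)\big)^{2}
\]
```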

Journal ArticleDOI
TL;DR: In this paper, an advanced learning paradigm called Learning Using Hidden Information (LUHI) is introduced, where a teacher can provide students with hidden information that exists in explanations, comments, comparisons, and so on.