
Showing papers on "Support vector machine published in 2004"


Journal ArticleDOI
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.
Abstract: In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.

10,696 citations
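
As a concrete illustration of the function-estimation setting the tutorial covers, here is a minimal epsilon-insensitive SV regression sketch using scikit-learn; it is not code from the tutorial, and the synthetic data, kernel, and parameter choices are assumptions for the example.

```python
# Illustrative sketch (not from the tutorial): epsilon-insensitive SV regression
# with scikit-learn on assumed synthetic 1-D data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(X).ravel() + rng.normal(scale=0.1, size=200)

# epsilon sets the width of the insensitive tube; C trades flatness against errors.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
model.fit(X, y)
print("support vectors:", model.support_vectors_.shape[0])
```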


Proceedings Article
01 Jan 2004
TL;DR: This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches and is shown to be simple, computationally efficient, and intrinsically invariant.
Abstract: We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naive Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for simultaneously classifying seven semantic visual categories. These results clearly demonstrate that the method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.

5,046 citations
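
A minimal sketch of the bag-of-keypoints pipeline the abstract describes: vector-quantize local descriptors into a visual vocabulary, histogram each image over the vocabulary, and train an SVM. Random vectors stand in for the affine-invariant patch descriptors, so everything except the pipeline structure is an assumption.

```python
# Bag-of-keypoints sketch under simplifying assumptions: random stand-ins replace
# real affine-invariant patch descriptors.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_images, n_words = 40, 16
# Each "image" yields a variable number of 32-D descriptors.
images = [rng.normal(loc=(i % 2), size=(rng.integers(50, 80), 32))
          for i in range(n_images)]
labels = np.array([i % 2 for i in range(n_images)])

# Build the visual vocabulary by clustering all descriptors.
codebook = KMeans(n_clusters=n_words, n_init=5, random_state=0)
codebook.fit(np.vstack(images))

def bag_of_words(descriptors):
    # Vector-quantize descriptors and form a normalized visual-word histogram.
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()

X = np.array([bag_of_words(d) for d in images])
clf = SVC(kernel="linear").fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```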


Journal ArticleDOI
TL;DR: This paper addresses the classification of hyperspectral remote sensing images by support vector machines (SVMs), assesses the potential of SVM classifiers in hyperdimensional feature spaces, and concludes that SVMs are a valid and effective alternative to conventional pattern recognition approaches.
Abstract: This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines (SVMs). First, we propose a theoretical discussion and experimental analysis aimed at understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces. Then, we assess the effectiveness of SVMs with respect to conventional feature-reduction-based approaches and their performances in hypersubspaces of various dimensionalities. To sustain such an analysis, the performances of SVMs are compared with those of two other nonparametric classifiers (i.e., radial basis function neural networks and the K-nearest neighbor classifier). Finally, we study the potentially critical issue of applying binary SVMs to multiclass problems in hyperspectral data. In particular, four different multiclass strategies are analyzed and compared: the one-against-all, the one-against-one, and two hierarchical tree-based strategies. Different performance indicators have been used to support our experimental studies in a detailed and accurate way, i.e., the classification accuracy, the computational time, the stability to parameter setting, and the complexity of the multiclass architecture. The results obtained on a real Airborne Visible/Infrared Imaging Spectroradiometer hyperspectral dataset allow us to conclude that, whatever the multiclass strategy adopted, SVMs are a valid and effective alternative to conventional pattern recognition approaches (feature-reduction procedures combined with a classification method) for the classification of hyperspectral remote sensing data.

3,607 citations


Journal ArticleDOI
TL;DR: The Support Vector Data Description (SVDD) is presented, which obtains a spherically shaped boundary around a dataset and, analogous to the Support Vector Classifier, can be made flexible by using other kernel functions.
Abstract: Data domain description concerns the characterization of a data set. A good description covers all target data but includes no superfluous space. The boundary of a dataset can be used to detect novel data or outliers. We will present the Support Vector Data Description (SVDD), which is inspired by the Support Vector Classifier. It obtains a spherically shaped boundary around a dataset and, analogous to the Support Vector Classifier, it can be made flexible by using other kernel functions. The method is made robust against outliers in the training set and is capable of tightening the description by using negative examples. We show characteristics of the Support Vector Data Descriptions using artificial and real data.

2,789 citations
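
scikit-learn does not ship SVDD under that name; the closely related one-class nu-SVM (OneClassSVM), which for RBF kernels yields an equivalent boundary, can serve as a hedged stand-in to illustrate data domain description. The data and parameters below are assumptions.

```python
# Sketch of data domain description via scikit-learn's OneClassSVM, the closely
# related nu-SVM formulation (equivalent to SVDD for RBF kernels).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 2))           # target class only
novel = rng.normal(loc=4.0, size=(10, 2))    # novelties / outliers

# nu upper-bounds the fraction of training points left outside the description.
dd = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(target)
print("flagged as novel:", (dd.predict(novel) == -1).mean())
```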


Journal ArticleDOI
TL;DR: In this paper, the authors present two approaches for obtaining class probabilities, which can be reduced to linear systems and are easy to implement, and show conceptually and experimentally that the proposed approaches are more stable than the two existing popular methods: voting and the method by Hastie and Tibshirani (1998).
Abstract: Pairwise coupling is a popular multi-class classification method that combines all comparisons for each pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement. We show conceptually and experimentally that the proposed approaches are more stable than the two existing popular methods: voting and the method by Hastie and Tibshirani (1998).

1,888 citations
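
A sketch in the spirit of the paper's second approach: given pairwise probability estimates r[i, j] for P(class i | class i or j), class probabilities are recovered by solving a small equality-constrained quadratic program via its KKT linear system. The toy matrix r is an assumption for the example.

```python
# Pairwise coupling sketch: recover class probabilities p from pairwise
# estimates r by minimizing sum_{i != j} (r[j,i]*p[i] - r[i,j]*p[j])^2
# subject to sum(p) = 1, solved through the KKT linear system.
import numpy as np

def couple(r):
    k = r.shape[0]
    Q = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                Q[i, i] = sum(r[s, i] ** 2 for s in range(k) if s != i)
            else:
                Q[i, j] = -r[j, i] * r[i, j]
    # Lagrangian system for: minimize p'Qp subject to sum(p) = 1.
    A = np.block([[Q, np.ones((k, 1))], [np.ones((1, k)), np.zeros((1, 1))]])
    b = np.concatenate([np.zeros(k), [1.0]])
    return np.linalg.solve(A, b)[:k]

# Toy pairwise estimates for 3 classes (r[i, j] + r[j, i] = 1).
r = np.array([[0.0, 0.7, 0.8],
              [0.3, 0.0, 0.6],
              [0.2, 0.4, 0.0]])
print(couple(r))
```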


Journal Article
TL;DR: It is argued that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines.
Abstract: We consider the problem of multiclass classification. Our main thesis is that a simple "one-vs-all" scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.

1,841 citations
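
A minimal one-vs-all sketch along the lines the paper argues for: one regularized binary SVM per class and a winner-take-all rule over real-valued decision scores (scikit-learn's OneVsRestClassifier wraps the same idea). Dataset and parameters are illustrative.

```python
# One-vs-all with per-class linear SVMs and winner-take-all prediction.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Train one binary SVM per class: class k vs. the rest.
machines = {k: LinearSVC(C=1.0, max_iter=10000).fit(X, (y == k).astype(int))
            for k in classes}

# Predict by the largest decision score across the binary machines.
scores = np.column_stack([machines[k].decision_function(X) for k in classes])
pred = classes[scores.argmax(axis=1)]
print("training accuracy:", (pred == y).mean())
```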


Journal ArticleDOI
TL;DR: This work describes a new analytical prescription for setting the value of the insensitive zone epsilon as a function of training sample size, and compares the generalization performance of SVM regression under sparse sample settings with regression using 'least-modulus' loss (epsilon=0) and standard squared loss.

1,781 citations


Proceedings ArticleDOI
Andrew Y. Ng
04 Jul 2004
TL;DR: A lower-bound is given showing that any rotationally invariant algorithm---including logistic regression with L2 regularization, SVMs, and neural networks trained by backpropagation---has a worst-case sample complexity that grows at least linearly in the number of irrelevant features.
Abstract: We consider supervised learning in the presence of very many irrelevant features, and study two different regularization methods for preventing overfitting. Focusing on logistic regression, we show that using L1 regularization of the parameters, the sample complexity (i.e., the number of training examples required to learn "well") grows only logarithmically in the number of irrelevant features. This logarithmic rate matches the best known bounds for feature selection, and indicates that L1 regularized logistic regression can be effective even if there are exponentially more irrelevant features than training examples. We also give a lower-bound showing that any rotationally invariant algorithm---including logistic regression with L2 regularization, SVMs, and neural networks trained by backpropagation---has a worst case sample complexity that grows at least linearly in the number of irrelevant features.

1,742 citations
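
The contrast the abstract draws can be sketched as follows: with many irrelevant features, L1-regularized logistic regression recovers a sparse weight vector while L2 does not. The synthetic setup (5 relevant features out of 1,000) is an assumption and does not reproduce the paper's sample-complexity experiments.

```python
# L1 vs. L2 regularization with many irrelevant features (assumed synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, d_relevant = 100, 1000, 5
X = rng.normal(size=(n, d))
w = np.zeros(d)
w[:d_relevant] = 2.0
y = (X @ w + rng.normal(size=n) > 0).astype(int)

for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
    clf = LogisticRegression(penalty=penalty, C=1.0, solver=solver, max_iter=5000)
    clf.fit(X, y)
    # L1 should zero out most of the 995 irrelevant coefficients; L2 will not.
    print(penalty, "nonzero coefficients:", np.count_nonzero(clf.coef_))
```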


Proceedings ArticleDOI
04 Jul 2004
TL;DR: Experimental results are presented showing that an SMO-based algorithm, derived from a novel dual formulation of the QCQP as a second-order cone programming problem, is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
Abstract: While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.

1,625 citations
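
The following sketch only illustrates the object being optimized, a conic (nonnegative) combination of kernel matrices fed to an SVM through a precomputed Gram matrix; the weights are fixed by hand, and the paper's SMO-based algorithm for learning them is not reproduced.

```python
# Conic combination of two kernels used via a precomputed Gram matrix.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
weights = [0.7, 0.3]  # nonnegative combination coefficients, fixed by hand here
K = weights[0] * rbf_kernel(X, gamma=0.5) + weights[1] * polynomial_kernel(X, degree=2)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```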


Proceedings ArticleDOI
22 Aug 2004
TL;DR: An approach to multi-task learning is presented based on the minimization of regularization functionals, similar to existing ones such as the one for Support Vector Machines, that have been successfully used in the past for single-task learning.
Abstract: Past empirical work has shown that learning multiple related tasks from data simultaneously can be advantageous in terms of predictive performance relative to learning these tasks independently. In this paper we present an approach to multi-task learning based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines (SVMs), that have been successfully used in the past for single-task learning. Our approach allows us to model the relation between tasks in terms of a novel kernel function that uses a task-coupling parameter. We implement an instance of the proposed approach similar to SVMs and test it empirically using simulated as well as real data. The experimental results show that the proposed method performs better than existing multi-task learning methods and largely outperforms single-task learning using SVMs.

1,617 citations
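
One common form of task-coupling kernel in this spirit evaluates K((x, s), (x', t)) = (mu + [s == t]) * k(x, x'), pooling tasks when mu is large. The exact form, scaling, and data below are illustrative assumptions, not the paper's precise formulation.

```python
# A task-coupling kernel sketch: examples carry a task id, and the kernel mixes
# a shared term with a task-specific term controlled by mu.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two related binary tasks on 2-D inputs; task id stored alongside each example.
X = rng.normal(size=(120, 2))
task = rng.integers(0, 2, size=120)
y = (X[:, 0] + 0.3 * task * X[:, 1] > 0).astype(int)

mu = 0.5  # task-coupling parameter: large mu pools tasks, small mu separates them
K = (mu + (task[:, None] == task[None, :])) * rbf_kernel(X, gamma=1.0)

clf = SVC(kernel="precomputed").fit(K, y)
print("training accuracy:", clf.score(K, y))
```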


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A real-time version of the system can detect and classify objects in natural scenes at around 10 frames per second; on a highly cluttered segmentation/recognition task, SVMs proved impractical, while convolutional nets yielded 16.7% error.
Abstract: We assess the applicability of several popular learning methods for the problem of recognizing generic visual categories with invariance to pose, lighting, and surrounding clutter. A large dataset comprising stereo image pairs of 50 uniform-colored toys under 36 azimuths, 9 elevations, and 6 lighting conditions was collected (for a total of 194,400 individual images). The objects were 10 instances of 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. Five instances of each category were used for training, and the other five for testing. Low-resolution grayscale images of the objects with various amounts of variability and surrounding clutter were used for training and testing. Nearest neighbor methods, support vector machines, and convolutional networks, operating on raw pixels or on PCA-derived features were tested. Test error rates for unseen object instances placed on uniform backgrounds were around 13% for SVM and 7% for convolutional nets. On a segmentation/recognition task with highly cluttered images, SVM proved impractical, while convolutional nets yielded 16.7% error. A real-time version of the system was implemented that can detect and classify objects in natural scenes at around 10 frames per second.

Proceedings ArticleDOI
04 Jul 2004
TL;DR: This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
Abstract: Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.

Book ChapterDOI
20 Sep 2004
TL;DR: In this paper, the Enron corpus is introduced as a new test bed for email folder prediction, and baseline results of a state-of-the-art classifier (Support Vector Machines) are provided under various conditions.
Abstract: Automated classification of email messages into user-specific folders and information extraction from chronologically ordered email streams have become interesting areas in text learning research. However, the lack of large benchmark collections has been an obstacle for studying the problems and evaluating the solutions. In this paper, we introduce the Enron corpus as a new test bed. We analyze its suitability with respect to email folder prediction, and provide the baseline results of a state-of-the-art classifier (Support Vector Machines) under various conditions, including the cases of using individual sections (From, To, Subject and body) alone as the input to the classifier, and using all the sections in combination with regression weights.
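
A hedged sketch of the baseline setup: TF-IDF features over one email section (here the body) and a linear SVM. The four toy emails and folder names are invented stand-ins; the Enron corpus itself must be obtained separately.

```python
# Email folder prediction sketch on hypothetical data: TF-IDF body text + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

emails = ["meeting moved to friday", "forward gas contract attached",
          "lunch on thursday?", "contract revisions for the gas deal"]
folders = ["calendar", "deals", "calendar", "deals"]

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(emails, folders)
print(clf.predict(["revised contract attached"]))
```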

Proceedings ArticleDOI
Tong Zhang
04 Jul 2004
TL;DR: Stochastic gradient descent algorithms on regularized forms of linear prediction methods, related to online algorithms such as the perceptron, are studied, and numerical rates of convergence for such algorithms are obtained.
Abstract: Linear prediction methods, such as least squares for regression, logistic regression and support vector machines for classification, have been extensively used in statistics and machine learning. In this paper, we study stochastic gradient descent (SGD) algorithms on regularized forms of linear prediction methods. This class of methods, related to online algorithms such as the perceptron, is both efficient and very simple to implement. We obtain numerical rates of convergence for such algorithms, and discuss their implications. Experiments on text data are provided to demonstrate numerical and statistical consequences of our theoretical findings.
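
A minimal instance of the class of methods studied: stochastic gradient descent on an L2-regularized hinge loss (a linear SVM objective), here via scikit-learn's SGDClassifier on assumed synthetic data.

```python
# SGD on a regularized linear prediction method: hinge loss + L2 penalty.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# alpha is the regularization strength; loss="hinge" gives a linear SVM objective.
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```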

Journal ArticleDOI
TL;DR: The feasibility of applying SVR to travel-time prediction is demonstrated, showing that SVR is applicable and performs well for traffic data analysis.
Abstract: Travel time is a fundamental measure in transportation. Accurate travel-time prediction also is crucial to the development of intelligent transportation systems and advanced traveler information systems. We apply support vector regression (SVR) for travel-time prediction and compare its results to other baseline travel-time prediction methods using real highway traffic data. Since support vector machines have greater generalization ability and guarantee global minima for given training data, it is believed that SVR will perform well for time series analysis. Compared to other baseline predictors, our results show that the SVR predictor can significantly reduce both relative mean errors and root-mean-squared errors of predicted travel times. We demonstrate the feasibility of applying SVR in travel-time prediction and prove that SVR is applicable and performs well for traffic data analysis.

Journal ArticleDOI
TL;DR: Support Vector Tracking integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker and maximizes the SVM classification score to account for large motions between successive frames.
Abstract: Support Vector Tracking (SVT) integrates the Support Vector Machine (SVM) classifier into an optic-flow-based tracker. Instead of minimizing an intensity difference function between successive frames, SVT maximizes the SVM classification score. To account for large motions between successive frames, we build pyramids from the support vectors and use a coarse-to-fine approach in the classification stage. We show results of using SVT for vehicle tracking in image sequences.

Book ChapterDOI
20 Sep 2004
TL;DR: An algorithm is proposed for overcoming the problems of imbalanced datasets, in which negative instances heavily outnumber the positive instances, based on a variant of the SMOTE algorithm by Chawla et al. combined with Veropoulos et al.'s different error costs algorithm.
Abstract: Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive instances (e.g. in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems which is based on a variant of the SMOTE algorithm by Chawla et al., combined with Veropoulos et al.'s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM and show that our algorithm outperforms all of them.
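
The different-error-costs ingredient alone can be sketched with per-class weights, which raise the misclassification penalty for the rare positive class; SMOTE-style oversampling, the other ingredient, is available separately (e.g., in the imbalanced-learn package). Data and weights below are assumptions.

```python
# Class-weighted SVM for an imbalanced problem (assumed synthetic data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_neg = rng.normal(size=(500, 2))
X_pos = rng.normal(loc=1.5, size=(25, 2))   # heavily outnumbered positives
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 500 + [1] * 25)

# class_weight raises C for the minority class, shifting the boundary toward it.
clf = SVC(kernel="rbf", class_weight={0: 1, 1: 20}).fit(X, y)
print("fraction of positives recovered:", clf.predict(X_pos).mean())
```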

Journal Article
TL;DR: It is shown that this feature selection method outperforms other classical algorithms, and that a naive Bayesian classifier built with features selected that way achieves error rates similar to those of state-of-the-art methods such as boosting or SVMs.
Abstract: We propose in this paper a very fast feature selection technique based on conditional mutual information. By picking features which maximize their mutual information with the class to predict conditional to any feature already picked, it ensures the selection of features which are both individually informative and two-by-two weakly dependent. We show that this feature selection method outperforms other classical algorithms, and that a naive Bayesian classifier built with features selected that way achieves error rates similar to those of state-of-the-art methods such as boosting or SVMs. The implementation we propose selects 50 features among 40,000, based on a training set of 500 examples in a tenth of a second on a standard 1 GHz PC.
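
A compact, unoptimized sketch of the CMIM criterion for binary features: greedily pick the feature whose information about the class, conditioned on every feature already picked, remains largest. The plug-in empirical estimates and synthetic data are assumptions; the paper's fast implementation is not reproduced.

```python
# CMIM sketch for binary features with plug-in empirical information estimates.
import numpy as np

def mutual_info(a, b):
    # I(A; B) for binary arrays, from the empirical joint distribution.
    mi = 0.0
    for va in (0, 1):
        for vb in (0, 1):
            p_ab = np.mean((a == va) & (b == vb))
            p_a, p_b = np.mean(a == va), np.mean(b == vb)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def cond_mutual_info(a, b, c):
    # I(A; B | C) = sum over values of C of P(c) * I(A; B | C = c).
    cmi = 0.0
    for vc in (0, 1):
        mask = c == vc
        if mask.sum() > 1:
            cmi += mask.mean() * mutual_info(a[mask], b[mask])
    return cmi

def cmim(X, y, k):
    picked = []
    # Running score: min over picked g of I(y; X_f | X_g), seeded with I(y; X_f).
    score = np.array([mutual_info(X[:, f], y) for f in range(X.shape[1])])
    for _ in range(k):
        f = int(np.argmax(score))
        picked.append(f)
        score[f] = -np.inf
        for g in range(X.shape[1]):
            if np.isfinite(score[g]):
                score[g] = min(score[g], cond_mutual_info(X[:, g], y, X[:, f]))
    return picked

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
X = rng.integers(0, 2, size=(400, 30))
X[:, 0] = y
X[:, 1] = y  # two informative but mutually redundant features
print(cmim(X, y, 3))  # one of {0, 1} is picked first; its redundant twin is not
```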

Journal ArticleDOI
TL;DR: A nonlinear version of the recursive least squares (RLS) algorithm is presented that uses a sequential sparsification process, admitting a new input sample into the kernel representation only if its feature-space image cannot be sufficiently well approximated by combining the images of previously admitted samples.
Abstract: We present a nonlinear version of the recursive least squares (RLS) algorithm. Our algorithm performs linear regression in a high-dimensional feature space induced by a Mercer kernel and can therefore be used to recursively construct minimum mean-squared-error solutions to nonlinear least-squares problems that are frequently encountered in signal processing applications. In order to regularize solutions and keep the complexity of the algorithm bounded, we use a sequential sparsification process that admits into the kernel representation a new input sample only if its feature space image cannot be sufficiently well approximated by combining the images of previously admitted samples. This sparsification procedure allows the algorithm to operate online, often in real time. We analyze the behavior of the algorithm, compare its scaling properties to those of support vector machines, and demonstrate its utility in solving two signal processing problems-time-series prediction and channel equalization.
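
A simplified sketch of the sparsification idea only: a sample joins the kernel dictionary when its feature-space image is poorly approximated by the current dictionary (an approximate-linear-dependence-style test), after which the weights are refit in batch by reduced-rank kernel ridge regression rather than by the paper's exact recursive updates. All thresholds and data are assumptions.

```python
# Dictionary sparsification sketch with a batch reduced-rank ridge refit.
import numpy as np

def rbf(a, b, gamma=1.0):
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def sparsify(X, nu=0.01):
    dictionary = [X[0]]
    for x in X[1:]:
        D = np.array(dictionary)
        K = rbf(D, D) + 1e-8 * np.eye(len(D))
        k = rbf(D, x[None, :]).ravel()
        a = np.linalg.solve(K, k)
        delta = 1.0 - k @ a           # approximation error; k(x, x) = 1 for RBF
        if delta > nu:                # poorly approximated: admit the sample
            dictionary.append(x)
    return np.array(dictionary)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)

D = sparsify(X)
K_nd = rbf(X, D)
# Reduced-rank kernel ridge regression on the dictionary basis.
alpha = np.linalg.solve(K_nd.T @ K_nd + 0.1 * rbf(D, D), K_nd.T @ y)
pred = K_nd @ alpha
print("dictionary size:", len(D), "train MSE:", np.mean((pred - y) ** 2))
```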

Journal ArticleDOI
TL;DR: Although each classifier could yield a very accurate classification (>90% correct), the classifiers differed in their ability to correctly label individual cases and so may be suitable candidates for an ensemble-based approach to classification.
Abstract: Support vector machines (SVMs) have considerable potential as classifiers of remotely sensed data. A constraint on their application in remote sensing has been their binary nature, requiring multiclass classifications to be based upon a large number of binary analyses. Here, an approach for multiclass classification of airborne sensor data by a single SVM analysis is evaluated against a series of classifiers that are widely used in remote sensing, with particular regard to the effect of training set size on classification accuracy. In addition to the SVM, the same datasets were classified using a discriminant analysis, decision tree, and multilayer perceptron neural network. The accuracy statements of the classifications derived from the different classifiers were compared in a statistically rigorous fashion that accommodated the related nature of the samples used in the analyses. For each classification technique, accuracy was positively related with the size of the training set. In general, the most accurate classifications were derived from the SVM approach, and with the largest training set the SVM classification was significantly (p < 0.05) more accurate than those derived from the other classifiers. Although each classifier could yield a very accurate classification (>90% correct), the classifiers differed in the ability to correctly label individual cases and so may be suitable candidates for an ensemble-based approach to classification.

Journal ArticleDOI
01 Sep 2004
TL;DR: A relatively new machine learning technique, support vector machines (SVM), is introduced to the problem in an attempt to provide a model with better explanatory power, and the relative importance of the input financial variables is obtained from the neural network models.
Abstract: Corporate credit rating analysis has attracted considerable research interest in the literature. Recent studies have shown that Artificial Intelligence (AI) methods achieved better performance than traditional statistical methods. This article introduces a relatively new machine learning technique, support vector machines (SVM), to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy around 80% for both BNN and SVM methods for the United States and Taiwan markets. However, only slight improvement of SVM was observed. Another direction of the research is to improve the interpretability of the AI-based models. We applied recent research results in neural network model interpretation and obtained the relative importance of the input financial variables from the neural network models. Based on these results, we conducted a market comparative analysis on the differences of determining factors in the United States and Taiwan markets.

Journal ArticleDOI
TL;DR: Online learning in a reproducing kernel Hilbert space is considered, yielding simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection, together with worst-case loss bounds.
Abstract: Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper, we consider online learning in a reproducing kernel Hilbert space. By considering classical stochastic gradient descent within a feature space and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds, and moreover, we show the convergence of the hypothesis to the minimizer of the regularized risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection.
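
A minimal sketch of stochastic gradient descent in an RKHS for online classification with the hinge loss, in the spirit of the algorithms described: old expansion coefficients shrink at each step (the regularizer's gradient), and a margin violation adds the new point as a kernel term. Step size, regularization, and the data stream are assumptions.

```python
# Online kernel learning sketch: SGD in an RKHS with hinge loss.
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * ((a - b) ** 2).sum(-1))

rng = np.random.default_rng(0)
eta, lam = 0.2, 0.01        # learning rate and regularization strength
centers, coefs = [], []     # kernel expansion: f(x) = sum_i coefs[i] * k(centers[i], x)

for t in range(500):
    x = rng.normal(size=2)
    y = 1.0 if x[0] + x[1] > 0 else -1.0
    f = sum(c * rbf(np.array(xc), x) for xc, c in zip(centers, coefs))
    coefs = [c * (1 - eta * lam) for c in coefs]   # shrink: gradient of the RKHS norm
    if y * f < 1:                                  # hinge-loss margin violation
        centers.append(x)
        coefs.append(eta * y)

print("expansion size:", len(centers))
```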

Proceedings ArticleDOI
21 Jul 2004
TL;DR: This work extends previous work on tree kernels to estimate the similarity between the dependency trees of sentences, and uses this kernel within a Support Vector Machine to detect and classify relations between entities in the Automatic Content Extraction (ACE) corpus of news articles.
Abstract: We extend previous work on tree kernels to estimate the similarity between the dependency trees of sentences. Using this kernel within a Support Vector Machine, we detect and classify relations between entities in the Automatic Content Extraction (ACE) corpus of news articles. We examine the utility of different features such as WordNet hypernyms, parts of speech, and entity types, and find that the dependency tree kernel achieves a 20% F1 improvement over a "bag-of-words" kernel.

Book
01 Aug 2004
TL;DR: This book presents different ideas for the design of kernel functions specifically adapted to various biological data, and covers different approaches to learning from heterogeneous data.
Abstract: Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information. This book provides a detailed overview of current research in kernel methods and their applications to computational biology. Following three introductory chapters -- an introduction to molecular and computational biology, a short review of kernel methods that focuses on intuitive concepts rather than technical details, and a detailed survey of recent applications of kernel methods in computational biology -- the book is divided into three sections that reflect three general trends in current research. The first part presents different ideas for the design of kernel functions specifically adapted to various biological data; the second part covers different approaches to learning from heterogeneous data; and the third part offers examples of successful applications of support vector machine methods.

Journal ArticleDOI
TL;DR: The MSVM is proposed, which extends the binary SVM to the multicategory case and has good theoretical properties, and an approximate leave-one-out cross-validation function is derived, analogous to the binary case.
Abstract: Two-category support vector machines (SVM) have been very popular in the machine learning community for classification problems. Solving multicategory problems by a series of binary classifiers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We propose the multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case and has good theoretical properties. The proposed method provides a unifying framework when there are either equal or unequal misclassification costs. As a tuning criterion for the MSVM, an approximate leave-one-out cross-validation function, called Generalized Approximate Cross Validation, is derived, analogous to the binary case. The effectiveness of the MSVM is demonstrated through the applications to cancer classification using microarray data and cloud classification with satellite radiance profiles.

Journal ArticleDOI
TL;DR: How SVM, a new learning technique, is successfully applied to load forecasting is discussed in detail. Important conclusions are that temperature might not be useful in such a mid-term load forecasting problem and that introducing time-series concepts may improve the forecasting.
Abstract: Load forecasting is usually made by constructing models on relative information, such as climate and previous load demand data. In 2001, EUNITE network organized a competition aiming at mid-term load forecasting (predicting daily maximum load of the next 31 days). During the competition we proposed a support vector machine (SVM) model, which was the winning entry, to solve the problem. In this paper, we discuss in detail how SVM, a new learning technique, is successfully applied to load forecasting. In addition, motivated by the competition results and the approaches by other participants, more experiments and deeper analyses are conducted and presented here. Some important conclusions from the results are that temperature (or other types of climate information) might not be useful in such a mid-term load forecasting problem and that the introduction of time-series concepts may improve the forecasting.

Book ChapterDOI
26 May 2004
TL;DR: A new technique is presented for combining text features and features indicating relationships between classes, which can be used with any discriminative algorithm and beats the accuracy of existing methods with statistically significant improvements.
Abstract: In this paper we present methods of enhancing existing discriminative classifiers for multi-labeled predictions. Discriminative methods like support vector machines perform very well for uni-labeled text classification tasks. Multi-labeled classification is a harder task subject to relatively less attention. In the multi-labeled setting, classes are often related to each other or part of an is-a hierarchy. We present a new technique for combining text features and features indicating relationships between classes, which can be used with any discriminative algorithm. We also present two enhancements to the margin of SVMs for building better models in the presence of overlapping classes. We present results of experiments on real world text benchmark datasets. Our new methods beat the accuracy of existing methods with statistically significant improvements.

Journal ArticleDOI
TL;DR: This paper describes recent research applying machine learning methods to the problem of classifying the cognitive state of a human subject based on fMRI data observed over a single time interval, and presents case studies in which classifiers are successfully trained to distinguish cognitive states.
Abstract: Over the past decade, functional Magnetic Resonance Imaging (fMRI) has emerged as a powerful new instrument to collect vast quantities of data about activity in the human brain. A typical fMRI experiment can produce a three-dimensional image related to the human subject's brain activity every half second, at a spatial resolution of a few millimeters. As in other modern empirical sciences, this new instrumentation has led to a flood of new data, and a corresponding need for new data analysis methods. We describe recent research applying machine learning methods to the problem of classifying the cognitive state of a human subject based on fMRI data observed over a single time interval. In particular, we present case studies in which we have successfully trained classifiers to distinguish cognitive states such as (1) whether the human subject is looking at a picture or a sentence, (2) whether the subject is reading an ambiguous or non-ambiguous sentence, and (3) whether the word the subject is viewing is a word describing food, people, buildings, etc. This learning problem provides an interesting case study of classifier learning from extremely high dimensional (10^5 features), extremely sparse (tens of training examples), noisy data. This paper summarizes the results obtained in these three case studies, as well as lessons learned about how to successfully apply machine learning methods to train classifiers in such settings.

Journal Article
TL;DR: An algorithm is derived that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
Abstract: The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning parameters: the regularization cost parameter, and the kernel parameters. It seems a common practice is to use a default value for the cost parameter, often leading to the least restrictive model. In this paper we argue that the choice of the cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. We illustrate our algorithm on some examples, and use our representation to give further insight into the range of SVM solutions.
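
The path algorithm itself is not reproduced here; a brute-force grid over the cost parameter merely illustrates the paper's starting point, that the choice of C can matter. The dataset and grid are assumptions.

```python
# Brute-force stand-in for the exact regularization path: sweep C and compare.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    acc = cross_val_score(SVC(kernel="rbf", gamma="scale", C=C), X, y, cv=5).mean()
    print(f"C={C:>6}: cv accuracy={acc:.3f}")
```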

Journal ArticleDOI
TL;DR: This paper presents a new learning technique, which extends Multiple-Instance Learning (MIL), and its application to the problem of region-based image categorization, and provides experimental results on an image categorization problem and a drug activity prediction problem.
Abstract: Designing computer programs to automatically categorize images using low-level features is a challenging research topic in computer vision. In this paper, we present a new learning technique, which extends Multiple-Instance Learning (MIL), and its application to the problem of region-based image categorization. Images are viewed as bags, each of which contains a number of instances corresponding to regions obtained from image segmentation. The standard MIL problem assumes that a bag is labeled positive if at least one of its instances is positive; otherwise, the bag is negative. In the proposed MIL framework, DD-SVM, a bag label is determined by some number of instances satisfying various properties. DD-SVM first learns a collection of instance prototypes according to a Diverse Density (DD) function. Each instance prototype represents a class of instances that is more likely to appear in bags with the specific label than in the other bags. A nonlinear mapping is then defined using the instance prototypes and maps every bag to a point in a new feature space, named the bag feature space. Finally, standard support vector machines are trained in the bag feature space. We provide experimental results on an image categorization problem and a drug activity prediction problem.