
Showing papers on "Relevance vector machine published in 2002"


Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
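As a rough sketch of this setup (not the authors' code), the pipeline can be assembled with scikit-learn: bag-of-words features feeding a linear SVM, using binary presence features, which the paper found to suit SVMs for sentiment. The toy reviews below are placeholders for the movie-review corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["a gripping, wonderful film", "dull plot and wooden acting"]  # toy stand-ins
labels = [1, 0]  # 1 = positive sentiment, 0 = negative

clf = make_pipeline(
    CountVectorizer(binary=True),  # binary presence features, per the paper's finding
    LinearSVC(),
)
clf.fit(reviews, labels)
print(clf.predict(["a wonderful, gripping story"]))
```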

6,626 citations


Book
12 Nov 2002
TL;DR: Support Vector Machines; Basic Methods of Least Squares Support Vector Machines; Bayesian Inference for LS-SVM Models; Robustness; Large Scale Problems; LS-SVM for Unsupervised Learning; LS-SVM for Recurrent Networks and Control.
Abstract: Support Vector Machines. Basic Methods of Least Squares Support Vector Machines. Bayesian Inference for LS-SVM Models. Robustness. Large Scale Problems. LS-SVM for Unsupervised Learning. LS-SVM for Recurrent Networks and Control.
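The defining trait of LS-SVMs is that training reduces to solving one linear system rather than a quadratic program. A minimal sketch under standard assumptions (RBF kernel, the regression-style KKT system; `gamma` is the regularization constant, and labels are in {-1, +1} for classification):

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Gaussian kernel matrix between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    # Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def lssvm_predict(Xnew, Xtr, b, alpha, sigma=1.0):
    # sign of f(x) = sum_i alpha_i K(x, x_i) + b
    return np.sign(rbf(Xnew, Xtr, sigma) @ alpha + b)
```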

2,983 citations


Proceedings Article
01 Jan 2002
TL;DR: The proposed extensions of the Support Vector Machine learning approach lead to mixed integer quadratic programs that can be solved heuristically, and a generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods.
Abstract: This paper presents two new formulations of multiple-instance learning as a maximum margin problem. The proposed extensions of the Support Vector Machine (SVM) learning approach lead to mixed integer quadratic programs that can be solved heuristically. Our generalization of SVMs makes a state-of-the-art classification technique, including non-linear classification via kernels, available to an area that up to now has been largely dominated by special purpose methods. We present experimental results on a pharmaceutical data set and on applications in automated image indexing and document categorization.
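One of the heuristic schemes for such formulations treats the unknown instance labels in positive bags as the integer variables and alternates between SVM training and relabeling. A sketch of that alternation (the bag layout and the simple stopping rule are assumptions of this example, not the paper's exact procedure):

```python
import numpy as np
from sklearn.svm import SVC

def mi_svm(bags, bag_labels, C=1.0, n_iter=20):
    # bags: list of (n_i, d) arrays; bag_labels: +1 / -1 per bag.
    # Instances in negative bags stay negative; instances in positive bags
    # start positive and are re-imputed from the current classifier.
    X = np.vstack(bags)
    y = np.concatenate([np.full(len(b), lab) for b, lab in zip(bags, bag_labels)])
    idx = np.cumsum([0] + [len(b) for b in bags])
    clf = SVC(kernel="rbf", C=C)
    for _ in range(n_iter):
        clf.fit(X, y)
        f = clf.decision_function(X)
        y_new = y.copy()
        for k, lab in enumerate(bag_labels):
            if lab == 1:
                s = slice(idx[k], idx[k + 1])
                y_new[s] = np.where(f[s] > 0, 1, -1)
                if y_new[s].max() < 1:                   # a positive bag must keep
                    y_new[idx[k] + np.argmax(f[s])] = 1  # at least one positive
        if np.array_equal(y_new, y):
            break                                        # labels stable: done
        y = y_new
    return clf
```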

1,556 citations


Journal ArticleDOI
TL;DR: This paper applies a fuzzy membership to each input point and reformulates the SVMs such that different input points can make different contributions to the learning of the decision surface.
Abstract: A support vector machine (SVM) learns the decision surface from two distinct classes of input points. In many applications, each input point may not be fully assigned to one of these two classes. In this paper, we apply a fuzzy membership to each input point and reformulate the SVMs such that different input points can make different contributions to the learning of the decision surface. We call the proposed method fuzzy SVMs (FSVMs).
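The reformulated objective weights each slack by its membership, so a membership near zero nearly removes that point's penalty. scikit-learn's per-sample weights scale C per point in exactly this way; the membership heuristic below (normalized distance to the class mean) is one simple illustrative choice, not necessarily the paper's:

```python
import numpy as np
from sklearn.svm import SVC

def memberships(X, y, eps=1e-3):
    # Heuristic fuzzy memberships: near 1 at the class mean, small for far points.
    s = np.empty(len(y))
    for c in np.unique(y):
        m = X[y == c].mean(axis=0)
        d = np.linalg.norm(X[y == c] - m, axis=1)
        s[y == c] = 1.0 - d / (d.max() + eps)
    return np.clip(s, eps, 1.0)

X = np.random.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=memberships(X, y))  # effective penalty is C * s_i
```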

1,374 citations


Book
01 Apr 2002
TL;DR: This book develops a statistical learning model of text classification for SVMs, efficient performance estimators, and training algorithms for inductive and transductive text classification with SVMs.
Abstract: Foreword T. Mitchell, K. Morik. Preface. Acknowledgments. Notation. 1. Introduction. 2. Text Classification. 3. Support Vector Machines. Part Theory. 4. A Statistical Learning Model of Text Classification for SVMs. 5. Efficient Performance Estimators for SVMs. Part Methods. 6. Inductive Text Classification. 7. Transductive Text Classification. Part Algorithms. 8. Training Inductive Support Vector Machines. 9. Training Transductive Support Vector Machines. 10. Conclusions. Bibliography. Appendices. Index.

909 citations


Book
30 Apr 2002
TL;DR: Learning To Classify Text Using Support Vector Machines, as discussed by the authors, presents a new approach to generating text classifiers from examples, combining high performance and efficiency with theoretical understanding and improved robustness.
Abstract: Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in training and classification, and it comes with a learning theory that can guide real-world applications. Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.

829 citations


Journal ArticleDOI
TL;DR: This work reports the lowest test error published to date on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
Abstract: Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than previous SVM methods.
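One of the techniques reviewed in this line of work, the virtual support vector method, encodes invariances by retraining on transformed copies of the support vectors only. A sketch, where `transforms` stands in for whatever invariance applies to the data and the hyperparameters are placeholders:

```python
import numpy as np
from sklearn.svm import SVC

def virtual_sv(X, y, transforms, C=10.0):
    # 1) train, 2) transform only the support vectors, 3) retrain on the union
    base = SVC(kernel="rbf", C=C).fit(X, y)
    sv, sv_y = X[base.support_], y[base.support_]
    X_aug = np.vstack([X] + [t(sv) for t in transforms])
    y_aug = np.concatenate([y] + [sv_y] * len(transforms))
    return SVC(kernel="rbf", C=C).fit(X_aug, y_aug)

# e.g. one-pixel shifts of flattened image rows (illustrative only):
# clf = virtual_sv(X, y, [lambda S: np.roll(S, 1, axis=1),
#                         lambda S: np.roll(S, -1, axis=1)])
```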

633 citations


Proceedings Article
01 Jan 2002
TL;DR: A framework for sparse Gaussian process (GP) methods is presented that uses forward selection with criteria based on information-theoretic principles, allows for Bayesian model selection, and is less complex to implement.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d2), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.
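A faithful implementation scores candidate points with the paper's information-theoretic criterion; the sketch below substitutes a much cruder score (largest residual under a subset-of-regressors predictor) purely to convey the greedy forward-selection structure and the O(n · d²)-style cost:

```python
import numpy as np

def sparse_gp_forward(X, y, kernel, d, noise=0.1):
    # Greedily grow an active set of d points (d << n); at each step refit a
    # subset-of-regressors predictor and add the worst-predicted point.
    # Stand-in criterion only; the paper uses an information-gain score.
    active = [int(np.argmax(np.abs(y)))]
    while True:
        Kmm = kernel(X[active], X[active]) + 1e-8 * np.eye(len(active))
        Knm = kernel(X, X[active])
        w = np.linalg.solve(Knm.T @ Knm + noise ** 2 * Kmm, Knm.T @ y)
        if len(active) == d:
            return active, w       # predict with kernel(Xnew, X[active]) @ w
        resid = np.abs(y - Knm @ w)
        resid[active] = -np.inf    # never re-select an active point
        active.append(int(np.argmax(resid)))
```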

590 citations


Journal ArticleDOI
TL;DR: This paper explains why the standard support vector machine is not suitable for the nonstandard situation, and introduces a simple procedure for adapting the support vector machine methodology to the nonstandard situation.
Abstract: The majority of classification algorithms are developed for the standard situation, in which it is assumed that the examples in the training set come from the same distribution as that of the target population, and that the costs of misclassification into different classes are the same. However, these assumptions are often violated in real-world settings. For some classification methods, this can be taken care of simply with a change of thresholds; for others, additional effort is required. In this paper, we explain why the standard support vector machine is not suitable for the nonstandard situation, and introduce a simple procedure for adapting the support vector machine methodology to the nonstandard situation. Theoretical justification for the procedure is provided. A simulation study illustrates that the modified support vector machine significantly improves upon the standard support vector machine in the nonstandard situation. The computational load of the proposed procedure is the same as that of the standard support vector machine. The procedure reduces to the standard support vector machine in the standard situation.
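In practice, adaptations of this kind boil down to penalizing the two classes' errors differently, reflecting unequal costs or a sampling-biased training set. A minimal sketch using scikit-learn's class weights, a standard variant rather than the paper's exact procedure:

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.randn(300, 2)
y = (X[:, 0] > 0.8).astype(int)  # toy, imbalanced classes

# class_weight multiplies C per class: here an error on class 1 costs 5x
# as much as an error on class 0, shifting the decision boundary accordingly.
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 5.0}).fit(X, y)
```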

385 citations


Journal ArticleDOI
TL;DR: A modified version of support vector machines, called C-ascending support vector machine, is proposed to model non-stationary financial time series, where the recent ε-insensitive errors are penalized more heavily than the distant ε-insensitive errors.
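The idea can be approximated with per-sample penalties that ascend toward the present, so recent ε-insensitive errors cost more. A sketch using scikit-learn's SVR on lagged features; the linear ramp is an illustrative choice, not the authors' weighting function:

```python
import numpy as np
from sklearn.svm import SVR

n, lags = 500, 5
s = np.sin(np.arange(n) / 25.0) + 0.1 * np.random.randn(n)      # toy series
X = np.column_stack([s[i:n - lags + i] for i in range(lags)])   # lagged inputs
y = s[lags:]                                                    # one-step-ahead target

weights = np.linspace(0.2, 1.0, len(y))   # recent samples weigh more
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X, y, sample_weight=weights)    # scales the per-sample penalty C
```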

376 citations


Book ChapterDOI
TL;DR: A brief introduction to SVMs is given and their numerous applications are summarized; SVMs show good generalization performance on many real-life data sets and the approach is properly motivated theoretically.
Abstract: In this paper, we present a comprehensive survey on applications of Support Vector Machines (SVMs) for pattern recognition. Since SVMs show good generalization performance on many real-life data sets and the approach is properly motivated theoretically, they have been applied to a wide range of applications. This paper gives a brief introduction to SVMs and summarizes their numerous applications.

Journal ArticleDOI
TL;DR: The paper discusses implementation issues related to the tuning of the hyperparameters of a support vector machine (SVM) with L2 soft margin, for which the radius/margin bound is taken as the index to be minimized, and iterative techniques are employed for computing the radius and margin.
Abstract: The paper discusses implementation issues related to the tuning of the hyperparameters of a support vector machine (SVM) with L2 soft margin, for which the radius/margin bound is taken as the index to be minimized, and iterative techniques are employed for computing the radius and margin. The implementation is shown to be feasible and efficient, even for large problems having more than 10000 support vectors.
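Sketching the standard form of the index (the usual statement, not a quotation from the paper): for the L2 soft margin the box constraint folds into the kernel, \( \tilde{K} = K + \tfrac{1}{C} I \), and the quantity minimized over the hyperparameters \(\theta\) is

\[
  T(C, \theta) \;=\; R^{2}\,\lVert w \rVert^{2},
\]

where \(R\) is the radius of the smallest sphere enclosing the (modified) feature-space images and \(1/\lVert w \rVert\) is the margin, both computed from \(\tilde{K}\). Since \(T\) upper-bounds the leave-one-out error up to a constant, minimizing it iteratively over \((C, \theta)\) is a principled tuning procedure.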

Journal ArticleDOI
TL;DR: This work discusses the relation between ε-support vector regression (ε-SVR) and ν-support vector regression (ν-SVR), and focuses on properties that are different from those of C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC).
Abstract: We discuss the relation between ε-support vector regression (ε-SVR) and ν-support vector regression (ν-SVR). In particular, we focus on properties that are different from those of C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC). We then discuss some issues that do not occur in the case of classification: the possible range of ε and the scaling of target values. A practical decomposition method for ν-SVR is implemented, and computational experiments are conducted. We show some interesting numerical observations specific to regression.
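For reference, the ν-SVR primal makes ε itself an optimization variable, traded off against the new parameter ν (standard formulation, stated here as background rather than quoted from the paper):

\[
  \min_{w,\,b,\,\varepsilon,\,\xi,\,\xi^{*}}\;
  \tfrac{1}{2}\lVert w\rVert^{2}
  + C\Bigl(\nu\,\varepsilon + \tfrac{1}{n}\sum_{i=1}^{n}(\xi_i + \xi_i^{*})\Bigr)
  \quad\text{s.t.}\;\;
  -\varepsilon - \xi_i^{*} \,\le\, y_i - w^{\top}\phi(x_i) - b \,\le\, \varepsilon + \xi_i,\;\;
  \xi_i,\,\xi_i^{*} \ge 0,\;\; \varepsilon \ge 0 .
\]

The known consequence is that ν upper-bounds the fraction of errors and lower-bounds the fraction of support vectors, which removes the need to fix ε in advance; the possible range of ε in ε-SVR is precisely what ties the two formulations together.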

Journal ArticleDOI
TL;DR: This tutorial surveys this subject with a principal focus on the most well-known models based on kernel substitution, namely, support vector machines.

Journal ArticleDOI
TL;DR: The support vector machine method is applied to the prediction of protein structural class; the results indicate that the structural class of a protein is considerably correlated with its amino acid composition, and that the SVM can serve as a powerful computational tool for predicting the structural classes of proteins.
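The feature vector in such studies is typically the 20-dimensional amino acid composition. A minimal sketch of that representation feeding an SVM (the sequence data and class labels are placeholders):

```python
import numpy as np
from sklearn.svm import SVC

AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def composition(seq):
    # Fraction of each amino acid in the sequence: a 20-dim feature vector.
    seq = seq.upper()
    return np.array([seq.count(a) for a in AA], dtype=float) / max(len(seq), 1)

# With `sequences` (list of strings) and `classes` (structural classes such as
# all-alpha, all-beta, alpha+beta, alpha/beta, encoded as integers):
# X = np.array([composition(s) for s in sequences])
# clf = SVC(kernel="rbf").fit(X, classes)
```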

01 Jan 2002
TL;DR: A high-frequency device, comprising a number of solenoid coils which are arranged in a row along the edge of a board, and which is provided with apertures which are coaxial with the coils.
Abstract: A high-frequency device, comprising a number of solenoid coils which are arranged in a row along the edge of a board. Parallel to the row of coils there is provided a plate which extends perpendicularly to the board and which is provided with apertures which are coaxial with the coils. Adjusting cores for adjusting the coils are provided in the apertures.

Journal ArticleDOI
TL;DR: Support vector machines (SVMs), a recent approach to classification, implement classifiers of adjustable flexibility that are automatically optimised on the training data, in a principled way, for good generalisation performance.

Journal ArticleDOI
TL;DR: A framework for interpreting Support Vector Machines as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors is described, which allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters and how to obtain predictive class probabilities.
Abstract: I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This probabilistic interpretation can provide intuitive guidelines for choosing a ‘good’ SVM kernel. Beyond this, it allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters (the misclassification penalty C, and any parameters specifying the kernel) and how to obtain predictive class probabilities rather than the conventional deterministic class label predictions. Hyperparameters can be set by maximizing the evidence; I explain how the latter can be defined and properly normalized. Both analytical approximations and numerical methods (Monte Carlo chaining) for estimating the evidence are discussed. I also compare different methods of estimating class probabilities, ranging from simple evaluation at the MAP or at the posterior average to full averaging over the posterior. A simple toy application illustrates the various concepts and techniques.
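The core identification can be stated compactly (the standard form of the argument, paraphrased): with a zero-mean GP prior of covariance K over the latent values f = (f(x_1), ..., f(x_n)) and a hinge-loss "likelihood", the negative log-posterior is

\[
  -\log P(f \mid D) \;=\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i f(x_i)\bigr)
  \;+\; \tfrac{1}{2}\, f^{\top} K^{-1} f \;+\; \mathrm{const},
\]

which is exactly the kernelized SVM objective, so the SVM solution is the MAP estimate; evidence maximization and predictive class probabilities then follow from the usual Bayesian machinery applied to this model.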

Journal ArticleDOI
TL;DR: An algorithm is presented that allows unnecessary support vectors to be recognized and eliminated while leaving the solution otherwise unchanged, and in most cases the procedure leads to a reduction in the number of support vectors.
Abstract: This paper demonstrates that standard algorithms for training support vector machines generally produce solutions with a greater number of support vectors than are strictly necessary. An algorithm is presented that allows unnecessary support vectors to be recognized and eliminated while leaving the solution otherwise unchanged. The algorithm is applied to a variety of benchmark data sets (for both classification and regression) and in most cases the procedure leads to a reduction in the number of support vectors. In some cases the reduction is substantial.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The proposed transformation is based on simplifying the original problem and employing the Kesler construction, which can be carried out by the use of a properly defined kernel only, and is comparable with the one-against-all decomposition solved by the state-of-the-art sequential minimal optimizer algorithm.
Abstract: We propose a transformation from the multi-class support vector machine (SVM) classification problem to the single-class SVM problem, which is more convenient for optimization. The proposed transformation is based on simplifying the original problem and employing the Kesler construction, which can be carried out by the use of a properly defined kernel only. The experiments conducted indicate that the proposed method is comparable with the one-against-all decomposition solved by the state-of-the-art sequential minimal optimizer algorithm.

Journal ArticleDOI
01 Nov 2002
TL;DR: A robust support vector machine for pattern classification, which aims at solving the over-fitting problem when outliers exist in the training data set, and the generalization performance is improved significantly compared to that of the standard SVM training.
Abstract: This paper proposes a robust support vector machine for pattern classification, which aims at solving the over-fitting problem that arises when outliers exist in the training data set. During the robust training phase, the distance between each data point and the center of its class is used to calculate an adaptive margin. The incorporation of this averaging technique into standard support vector machine (SVM) training makes the decision function less detoured by outliers and controls the amount of regularization automatically. Experiments on the bullet hole classification problem show that the number of support vectors is reduced and the generalization performance is improved significantly compared to that of standard SVM training.

Proceedings Article
01 Jan 2002
TL;DR: Numerical results show improvement in test set accuracy after the incorporation of prior knowledge into ordinary, data-based linear support vector machine classifiers, and one experiment shows that a linear classifier based solely on prior knowledge far outperforms the direct application of prior knowledge rules to classify data.
Abstract: Prior knowledge in the form of multiple polyhedral sets, each belonging to one of two categories, is introduced into a reformulation of a linear support vector machine classifier. The resulting formulation leads to a linear program that can be solved efficiently. Real world examples, from DNA sequencing and breast cancer prognosis, demonstrate the effectiveness of the proposed method. Numerical results show improvement in test set accuracy after the incorporation of prior knowledge into ordinary, data-based linear support vector machine classifiers. One experiment also shows that a linear classifier, based solely on prior knowledge, far outperforms the direct application of prior knowledge rules to classify data.
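The central device (sketched here via the standard LP-duality argument rather than quoted from the paper) turns each knowledge-set implication into linear constraints. For a nonempty polyhedral set {x : Bx ≤ b} required to lie on the positive side of the classifier,

\[
  \{x : Bx \le b\} \subseteq \{x : w^{\top}x \ge \gamma + 1\}
  \;\Longleftrightarrow\;
  \exists\, u \ge 0:\;\; B^{\top}u + w = 0,\;\; b^{\top}u + \gamma + 1 \le 0,
\]

and these conditions, relaxed with slack variables, are appended to the linear-programming SVM, so the whole problem remains a linear program.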

Proceedings Article
01 Jan 2002
TL;DR: A procedure for rule extraction from support vector machines is proposed: the SVM+Prototypes method, which gives explanation ability to SVMs.
Abstract: Support vector machines (SVMs) are learning systems based on statistical learning theory that exhibit good generalization ability on real data sets. Nevertheless, a possible limitation of SVMs is that they generate black box models. In this work, a procedure for rule extraction from support vector machines is proposed: the SVM+Prototypes method. This method gives explanation ability to SVMs. Once the decision function has been determined by means of an SVM, a clustering algorithm is used to determine prototype vectors for each class. These points are combined with the support vectors using geometric methods to define ellipsoids in the input space, which are later transferred to if-then rules. By using the support vectors, we can establish the limits of these regions.

Proceedings Article
01 Jan 2002
TL;DR: A fully-automated pattern search methodology for model selection of support vector machines (SVMs) for regression and classification and has proven to be very effective on benchmark tests and in high-variance drug design domains with high potential of overfitting.
Abstract: We develop a fully-automated pattern search methodology for model selection of support vector machines (SVMs) for regression and classification. Pattern search (PS) is a derivative-free optimization method suitable for low-dimensional optimization problems for which it is difficult or impossible to calculate derivatives. This methodology was motivated by an application in drug design in which regression models are constructed based on a few high-dimensional exemplars. Automatic model selection in such underdetermined problems is essential to avoid overfitting and overestimates of generalization capability caused by selecting parameters based on testing results. We focus on SVM model selection for regression based on leave-one-out (LOO) and cross-validated estimates of mean squared error, but the search strategy is applicable to any model criterion. Because the resulting error surface produces an extremely noisy map of the model quality with many local minima, the generalization capacity of any single locally optimal model exhibits high variance. Thus several locally optimal SVM models are generated and then bagged or averaged to produce the final SVM. This strategy of pattern search combined with model averaging has proven to be very effective on benchmark tests and in high-variance drug design domains with high potential for overfitting.
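A compass-style pattern search over (log10 C, log10 gamma) is easy to sketch with scikit-learn; this is a minimal stand-in for the authors' implementation, and it omits the bagging of several local optima that the paper adds on top:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

def pattern_search(X, y, p0=(0.0, 0.0), step=1.0, min_step=0.05):
    # Derivative-free compass search minimizing cross-validated MSE.
    def loss(p):
        model = SVR(C=10.0 ** p[0], gamma=10.0 ** p[1])
        return -cross_val_score(model, X, y, cv=5,
                                scoring="neg_mean_squared_error").mean()
    p = np.asarray(p0, dtype=float)
    best = loss(p)
    while step > min_step:
        moved = False
        for d in ([step, 0], [-step, 0], [0, step], [0, -step]):
            cand = loss(p + d)
            if cand < best:
                p, best, moved = p + d, cand, True
                break
        if not moved:
            step /= 2.0        # no improving direction: shrink the pattern
    return 10.0 ** p[0], 10.0 ** p[1], best
```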

Proceedings Article
01 Jan 2002
TL;DR: A very fast and simple incremental support vector machine (SVM) classifier is proposed which is capable of modifying an existing linear classifier by both retiring old data and adding new data.
Abstract: Using a recently introduced proximal support vector machine classifier [4], a very fast and simple incremental support vector machine (SVM) classifier is proposed which is capable of modifying an existing linear classifier by both retiring old data and adding new data. A very important feature of the proposed single-pass algorithm, which allows it to handle massive datasets, is that huge blocks of data, say of the order of millions of points, can be compressed into matrices of size (n + 1) × (n + 1), where n is the usually small (typically less than 100) dimension of the input space in which the data resides. To demonstrate the effectiveness of the algorithm, we classify a dataset of 1 billion points in 10-dimensional input space into two classes in less than 2.5 hours on a 400 MHz Pentium II processor.
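Because the proximal SVM solution is a small linear system, (I/ν + EᵀE)[w; γ] = Eᵀy with E = [A, −e], incrementality reduces to keeping running sums of EᵀE and Eᵀy; blocks are added or retired by adding or subtracting their contributions. A sketch under those assumptions (labels in {−1, +1}; not the authors' code):

```python
import numpy as np

class IncrementalPSVM:
    # Stores only (n+1)x(n+1) sums, so arbitrarily large data streams can be
    # absorbed block by block, and old blocks retired by subtraction.
    def __init__(self, n_features, nu=1.0):
        self.M = np.zeros((n_features + 1, n_features + 1))  # running E^T E
        self.v = np.zeros(n_features + 1)                    # running E^T y
        self.nu = nu

    def _block(self, X, y):
        E = np.hstack([X, -np.ones((len(y), 1))])            # E = [A, -e]
        return E.T @ E, E.T @ y

    def add(self, X, y):
        M, v = self._block(X, y); self.M += M; self.v += v

    def retire(self, X, y):
        M, v = self._block(X, y); self.M -= M; self.v -= v

    def solve(self):
        wg = np.linalg.solve(np.eye(len(self.v)) / self.nu + self.M, self.v)
        self.w, self.gamma = wg[:-1], wg[-1]
        return self

    def predict(self, X):
        return np.sign(X @ self.w - self.gamma)
```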

Proceedings Article
01 Jan 2002
TL;DR: The generalization ability of the fuzzy support vector machine is the same as or better than that of the support vector machine for pairwise classification.
Abstract: Since support vector machines for pattern classification are based on two-class classification problems, unclassifiable regions exist when they are extended to n(> 2)-class problems. In our previous work, to solve this problem, we developed fuzzy support vector machines for one-to-(n−1) classification. In this paper, we extend our method to pairwise classification. Namely, using the decision functions obtained by training the support vector machines for classes i and j (j ≠ i, j = 1, ..., n), for class i we define a truncated polyhedral pyramidal membership function. The membership functions are defined so that, for the data in the classifiable regions, the classification results are the same for the two methods. Thus, the generalization ability of the fuzzy support vector machine is the same as or better than that of the support vector machine for pairwise classification. We evaluate our method for four benchmark data sets and demonstrate the superiority of our method.

Proceedings ArticleDOI
07 Aug 2002
TL;DR: A fast iterative algorithm for identifying the support vectors of a given set of points using a greedy approach to pick points for inclusion in the candidate set, which is extremely competitive as compared to other conventional iterative algorithms like SMO and the NPA.
Abstract: We present a fast iterative algorithm for identifying the support vectors of a given set of points. Our algorithm works by maintaining a candidate support vector set. It uses a greedy approach to pick points for inclusion in the candidate set. When the addition of a point to the candidate set is blocked because of other points already present in the set, we use a backtracking approach to prune away such points. To speed up convergence we initialize our algorithm with the nearest pair of points from opposite classes. We then use an optimization based approach to increase or prune the candidate support vector set. The algorithm makes repeated passes over the data to satisfy the KKT constraints. The memory requirements of our algorithm scale as O(|S|²) in the average case, where |S| is the size of the support vector set. We show that the algorithm is extremely competitive as compared to other conventional iterative algorithms like SMO and the NPA. We present results on a variety of real life datasets to validate our claims.

Journal ArticleDOI
TL;DR: This paper concentrates on the derivation of the evidence and error bar approximation for regression problems, and an error bar formula is derived based on the ε-insensitive loss function.
Abstract: In this paper, we elaborate on the well-known relationship between Gaussian Processes (GP) and Support Vector Machines (SVM) under some convex assumptions for the loss functions. This paper concentrates on the derivation of the evidence and error bar approximation for regression problems. An error bar formula is derived based on the ε-insensitive loss function.

Proceedings ArticleDOI
18 Nov 2002
TL;DR: Four types of decision trees based on separability measured by the Euclidean distances between class centers and Mahalanobis-distance-based classifiers are proposed, demonstrating the effectiveness of these methods over conventional SVMs using benchmark data sets.
Abstract: In this paper, we propose decision-tree-based multiclass support vector machines. In training, at the top node, we determine the hyperplane that separates a class (or some classes) from the others. If the separated classes include plural classes, at the node connected to the top node, we determine the hyperplane that separates the classes. We repeat this procedure until only one class remains in the separated region. This can resolve the unclassifiable regions that exist in the conventional SVMs, but a new problem arises. Namely, the division of the feature space depends on the structure of a decision tree. To maintain high generalization ability, the most separable classes should be separated at the upper nodes of a decision tree. For this, we propose four types of decision trees based on separability measured by the Euclidean distances between class centers and Mahalanobis-distance-based classifiers. We demonstrate the effectiveness of our methods over conventional SVMs using benchmark data sets.
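A sketch of one of the four variants (splitting off, at each node, the class whose center is farthest in Euclidean distance from the remaining centers; the Mahalanobis-based variants differ only in the separability measure):

```python
import numpy as np
from sklearn.svm import SVC

def build_tree(X, y, classes=None):
    # Recursively peel off the most separable class with a binary SVM.
    classes = sorted(set(y)) if classes is None else classes
    if len(classes) == 1:
        return classes[0]                       # leaf: a single class remains
    centers = {c: X[y == c].mean(axis=0) for c in classes}
    far = max(classes, key=lambda c: np.mean(
        [np.linalg.norm(centers[c] - centers[o]) for o in classes if o != c]))
    clf = SVC(kernel="rbf").fit(X, np.where(y == far, 1, -1))
    keep = y != far
    return clf, far, build_tree(X[keep], y[keep], [c for c in classes if c != far])

def tree_predict(node, x):
    if not isinstance(node, tuple):
        return node
    clf, far, rest = node
    return far if clf.decision_function([x])[0] > 0 else tree_predict(rest, x)
```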

Proceedings ArticleDOI
10 Dec 2002
TL;DR: This work introduces a new class of kernels for support vector machines which incorporate tangent distance and therefore are applicable in cases where such transformation invariances are known.
Abstract: When dealing with pattern recognition problems one encounters different types of a-priori knowledge. It is important to incorporate such knowledge into the classification method at hand. A very common type of a-priori knowledge is transformation invariance of the input data, e.g. geometric transformations of image data such as shifts, scaling, etc. Distance-based classification methods can make use of this by a modified distance measure called tangent distance. We introduce a new class of kernels for support vector machines which incorporate tangent distance and are therefore applicable in cases where such transformation invariances are known. We report experimental results which show that the performance of our method is comparable to other state-of-the-art methods, while problems of existing ones are avoided.