
Showing papers by "Nello Cristianini published in 1999"


Proceedings Article
29 Nov 1999
TL;DR: An algorithm, DAGSVM, is presented which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision node of the DDAG; it is substantially faster to train and evaluate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both.
Abstract: We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N-class problem, the DDAG contains N(N - 1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting bound on the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision-node of the DDAG. The DAGSVM is substantially faster to train and evaluate than either the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms.
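To make the evaluation step concrete, the following is a minimal sketch of how a DDAG routes a test point through its pairwise nodes. The `pairwise` mapping and its signed-score convention are illustrative assumptions, not the authors' implementation:

```python
def ddag_predict(x, classes, pairwise):
    """Route x through the DDAG. pairwise[(i, j)] is assumed to be a
    trained two-class decision function returning a positive score for
    class i and a negative score for class j (hypothetical interface)."""
    remaining = list(classes)
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]   # node tests first vs. last class
        if pairwise[(i, j)](x) > 0:
            remaining.pop()                  # score favours i: eliminate j
        else:
            remaining.pop(0)                 # score favours j: eliminate i
    return remaining[0]
```

Note that although N(N - 1)/2 classifiers are trained, each prediction evaluates only N - 1 of them, one per eliminated class, which is where the evaluation speed-up over Max Wins comes from.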

1,857 citations


01 Jan 1999
TL;DR: Two schemes for adjusting the sensitivity and specificity of Support Vector Machines are discussed, their performance is described using receiver operating characteristic (ROC) curves, and their use on real-life medical diagnostic tasks is illustrated.
Abstract: For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis, where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and describe their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks.
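The simpler kind of adjustment can be illustrated by shifting the threshold on a trained SVM's real-valued output and tracing the resulting ROC curve. The sketch below assumes precomputed decision values and ±1 labels; it illustrates the idea rather than reproducing the paper's exact scheme:

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep a decision threshold over SVM outputs f(x) and record
    (1 - specificity, sensitivity) pairs. scores are real-valued
    decision values, labels are +1/-1 ground truth (assumed inputs)."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    curve = []
    for t in np.sort(scores):
        pred = np.where(scores >= t, 1, -1)
        sensitivity = np.mean(pred[labels == 1] == 1)    # true positive rate
        specificity = np.mean(pred[labels == -1] == -1)  # true negative rate
        curve.append((1.0 - specificity, sensitivity))
    return curve
```

A second natural scheme, weighting the two error types differently during training (for example, separate penalty parameters for positive and negative slack variables), moves the learned boundary itself rather than just the threshold.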

750 citations




Proceedings Article
06 Jul 1999
TL;DR: It is shown that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik.
Abstract: A number of results have bounded the generalization error of a classifier in terms of its margin on the training points. There has been some debate about whether the minimum margin is the best measure of the distribution of training set margin values with which to estimate the generalization error. Freund and Schapire [7] have shown how a different function of the margin distribution can be used to bound the number of mistakes of an on-line learning algorithm for a perceptron, as well as an expected error bound. Shawe-Taylor and Cristianini [13] showed that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size when using threshold linear classifiers. We show that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik [4]. We generalise the basic result to function classes with bounded fat-shattering dimension and the ℓ1 measure for slack variables, which gives rise to Vapnik's box constraint algorithm. Finally, application to regression is considered, which includes standard least squares as a special case.
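The change-of-kernel observation has a compact concrete form in the 2-norm slack case: solving the hard-margin problem on a diagonally shifted Gram matrix recovers the soft-margin solution. A minimal numpy sketch, with `lam` an assumed regularisation parameter:

```python
import numpy as np

def soft_margin_gram(K, lam):
    """Change-of-kernel view of the 2-norm soft margin: training a
    hard-margin machine on K + lam*I yields the same dual solution as
    the 2-norm slack formulation (lam playing the role of 1/C)."""
    return K + lam * np.eye(K.shape[0])
```

The ℓ1 case discussed in the paper is not a diagonal shift; in the dual it instead caps each multiplier at C, which is exactly Vapnik's box constraint.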

73 citations



Proceedings Article
27 Jun 1999
TL;DR: Experimental results are given which demonstrate that considerable advantage can be derived from using the margin information; the same strategy is applied to the problem of transduction, where the positions of the testing points are revealed to the training algorithm.
Abstract: The problem of controlling the capacity of decision trees is considered for the case where the decision nodes implement linear threshold functions. In addition to the standard early stopping and pruning procedures, we implement a strategy based on the margins of the decision boundaries at the nodes. The approach is motivated by bounds on generalization error obtained in terms of the margins of the individual classifiers. Experimental results are given which demonstrate that considerable advantage can be derived from using the margin information. The same strategy is applied to the problem of transduction, where the positions of the testing points are revealed to the training algorithm. This information is used to generate an alternative training criterion motivated by transductive theory. In the transductive case, the results are not as encouraging, suggesting that little, if any, consistent advantage is culled from using the unlabelled data in the proposed fashion. This conclusion does not contradict theoretical results, but leaves open the theoretical and practical question of whether more effective use can be made of the additional information.
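The quantity driving the margin-based strategy can be computed directly from a node's linear threshold function. The helper below is a hypothetical illustration of the geometric margin that the cited generalization bounds are stated in terms of:

```python
import numpy as np

def node_margin(w, b, X, y):
    """Geometric margin of a linear threshold node w.x + b on the
    examples (X, y) that reach it: the smallest signed distance
    y_i * (w . x_i + b) / ||w|| over the node's data."""
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))
```

Under the strategy described above, a split achieving only a small margin on the data reaching its node would be a natural candidate for early stopping or pruning.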

41 citations