
Showing papers by "Nello Cristianini" published in 1998


Proceedings Article
24 Jul 1998
TL;DR: This paper proposes an adaptation of the Adatron algorithm for classification with kernels in high dimensional spaces that can find a solution very rapidly, with an exponentially fast rate of convergence towards the optimal solution.

Abstract: Support Vector Machines work by mapping training data for classification tasks into a high dimensional feature space. In the feature space they then find a maximal margin hyperplane which separates the data. This hyperplane is usually found using a quadratic programming routine which is computationally intensive and non-trivial to implement. In this paper we propose an adaptation of the Adatron algorithm for classification with kernels in high dimensional spaces. The algorithm is simple and can find a solution very rapidly, with an exponentially fast rate of convergence (in the number of iterations) towards the optimal solution. Experimental results with real and artificial datasets are provided.
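For concreteness, here is a minimal sketch of a kernelised Adatron-style update of the kind the abstract describes. It is not the paper's exact procedure: the RBF kernel, learning rate, epoch count, and the hard-margin, no-bias setting are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix for an RBF kernel (an illustrative kernel choice).
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_adatron(K, y, eta=0.1, epochs=100):
    """Adatron-style updates on the dual variables.

    K : (n, n) kernel (Gram) matrix, y : labels in {-1, +1}.
    Returns dual coefficients alpha of a large-margin separator.
    """
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            # Functional margin of point i under the current hypothesis.
            z = y[i] * np.sum(alpha * y * K[:, i])
            # Gradient-ascent style step, clipped so that alpha stays >= 0.
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - z))
    return alpha

def predict(K_test_train, alpha, y):
    # Sign of the kernel expansion on new points.
    return np.sign(K_test_train @ (alpha * y))
```

On separable data each update pushes the functional margin of the visited point towards 1, which is the sense in which this simple loop approximates the maximal margin hyperplane normally obtained from a quadratic programming solver.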

290 citations


Proceedings Article
01 Dec 1998
TL;DR: In this procedure model selection and learning are not separate, but kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error.
Abstract: The kernel parameter is one of the few tunable parameters in Support Vector machines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection, and its value is usually found by means of a validation set. We present an algorithm which can automatically perform model selection with little additional computational cost and with no need for a validation set. In this procedure model selection and learning are not separate: kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.
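A minimal sketch of this idea: instead of holding out a validation set, a candidate kernel parameter is scored by an upper bound on generalisation computed from the trained machine itself. The radius/margin-style score, the use of scikit-learn's SVC as the trainer, and the simple scan over candidate widths are all illustrative assumptions; the paper adjusts the kernel dynamically during learning rather than rescanning from scratch.

```python
import numpy as np
from sklearn.svm import SVC

def radius_margin_score(K, y, C=1e3):
    """Stand-in generalisation bound of the form R^2 * ||w||^2 = (R / margin)^2.

    Not necessarily the bound optimised in the paper; it only illustrates
    scoring a kernel parameter without a validation set.
    """
    svm = SVC(kernel="precomputed", C=C).fit(K, y)
    sv = svm.support_
    coef = svm.dual_coef_[0]                    # alpha_i * y_i for the support vectors
    w_sq = coef @ K[np.ix_(sv, sv)] @ coef      # ||w||^2; margin = 1 / ||w||
    # Radius estimate: largest kernel-space distance from a point to the data mean.
    R_sq = np.max(np.diag(K) - 2 * K.mean(axis=1) + K.mean())
    return R_sq * w_sq

def select_rbf_width(X, y, sigmas):
    """Pick the RBF width whose trained machine minimises the bound."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    scores = [radius_margin_score(np.exp(-d2 / (2 * s ** 2)), y) for s in sigmas]
    return sigmas[int(np.argmin(scores))]
```

Because the score is computed on the training sample alone, no data has to be set aside for validation, which is the practical point of the approach.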

171 citations


01 Jan 1998
TL;DR: The present invention is directed to non-hygroscopic, water-soluble sugar compositions which are prepared by grinding together in a dry, solid state, a white sugar component and a "pulverizing aid" in the form of a water-soluble maltodextrin having a measurable dextrose equivalent value not substantially above 20.
Abstract: The present invention is directed to non-hygroscopic, water-soluble sugar compositions which are prepared by grinding together in a dry, solid state, a white sugar component and a "pulverizing aid" in the form of a water-soluble maltodextrin having a measurable dextrose equivalent value not substantially above 20, said "pulverizing aid" being employed in amounts ranging from about 5 to about 20% by weight of said total composition, the resulting product having an average particle size such that 95% by weight of the composition passes through a 325 mesh, said composition being further characterized as having a ratio of weight average particle size to number average particle size of less than 2. The compositions are free-flowing powders useful in preparing icings, buttercreams and fudges.

118 citations


01 Jan 1998
TL;DR: It is shown that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size.

Abstract: A number of results have bounded the generalization of a classifier in terms of its margin on the training points. There has been some debate about whether the minimum margin is the best measure of the distribution of training set margin values with which to estimate the generalization. Freund and Schapire have shown how a different function of the margin distribution can be used to bound the number of mistakes of an on-line learning algorithm for a perceptron, as well as to give an expected error bound. We show that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size. Algorithms arising from the approach are related to those of Cortes and Vapnik. We generalise the basic result to function classes with bounded fat-shattering dimension and to the 1-norm of the slack variables, which gives rise to Vapnik's box constraint algorithm. We also extend the results to the regression case and obtain bounds on the probability that a randomly chosen test point will have error greater than a given value. The bounds apply to the $\epsilon$-insensitive loss function proposed by Vapnik for Support Vector Machine regression. A special case of this bound gives a bound on the probabilities in terms of the least squares error on the training set, showing a quadratic decline in probability with margin.
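The quantities these bounds are stated in can be made concrete with a short sketch: for a target margin, each training point contributes a slack measuring how far it falls short of that margin, and the bounds are then functions of the margin together with the 1-norm or 2-norm of the slack vector (or, for regression, of the $\epsilon$-insensitive errors). The linear-classifier setup and function names below are illustrative, not from the paper.

```python
import numpy as np

def margin_slacks(w, b, X, y, gamma):
    """Margin slack vector for a linear classifier f(x) = <w, x> + b.

    xi_i = max(0, gamma - y_i * f(x_i)) measures how far point i falls
    short of the target margin gamma; bounds of the kind discussed above
    are stated in terms of gamma together with ||xi||_1 or ||xi||_2.
    """
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, gamma - margins)
    return xi, xi.sum(), np.sqrt((xi ** 2).sum())   # vector, 1-norm, 2-norm

def eps_insensitive_loss(f_vals, y, eps):
    """Vapnik's epsilon-insensitive loss used in Support Vector regression:
    errors smaller than eps are ignored, larger ones are counted linearly."""
    return np.maximum(0.0, np.abs(f_vals - y) - eps)
```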

43 citations


Proceedings Article
24 Jul 1998
TL;DR: A novel theoretical analysis of the classifiers produced by Bayesian algorithms for Neural Networks, based on Data-Dependent VC theory, proves that they can be expected to be large margin hyperplanes in a Hilbert space, and presents experimental evidence that the predictions of the model are correct.

Abstract: Bayesian algorithms for Neural Networks are known to produce classifiers which are very resistant to overfitting. It is often claimed that one of the main distinctive features of Bayesian Learning Algorithms is that they don't simply output one hypothesis, but rather an entire probability distribution over a hypothesis set: the Bayes posterior. An alternative perspective is that they output a linear combination of classifiers, whose coefficients are given by Bayes' theorem. One of the concepts used to deal with thresholded convex combinations is the 'margin' of the hyperplane with respect to the training sample, which is correlated with the predictive power of the hypothesis itself. We provide a novel theoretical analysis of such classifiers, based on Data-Dependent VC theory, proving that they can be expected to be large margin hyperplanes in a Hilbert space. We then present experimental evidence that the predictions of our model are correct, i.e. that Bayesian classifiers really find hypotheses which have large margin on the training examples. This not only explains the remarkable resistance to overfitting exhibited by such classifiers, but also places them in the same class as other systems, like Support Vector Machines and Adaboost, which have similar performance.
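To make the 'thresholded convex combination' reading concrete, the sketch below forms a posterior-weighted vote of sampled hypotheses and measures its margin on each training example. The sampling scheme, weighting, and function names are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def bayes_vote_margins(hypotheses, weights, X, y):
    """Margins of a posterior-weighted vote of binary classifiers.

    hypotheses : list of functions mapping an (n, d) array to labels in {-1, +1}
    weights    : posterior weights (non-negative; normalised below)
    Returns y_i * f(x_i) for the convex combination f = sum_k w_k h_k.
    A positive value means the vote classifies point i correctly, and its
    size is the margin the abstract argues should be large.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    votes = np.stack([h(X) for h in hypotheses])   # (k, n) matrix of predictions
    f = w @ votes                                  # convex combination, values in [-1, 1]
    return y * f
```

In the paper's analysis such a combination behaves as a hyperplane in a Hilbert space, so consistently large values of these margins on the training sample are exactly what the Data-Dependent VC argument predicts for Bayesian classifiers.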

17 citations