
Showing papers on "Empirical risk minimization" published in 1991


Proceedings Article
Vladimir Vapnik
02 Dec 1991
TL;DR: Learning is posed as function estimation under two principles, empirical risk minimization and structural risk minimization; systematic improvements in prediction power are illustrated in application to zip-code recognition.
Abstract: Learning is posed as a problem of function estimation, for which two principles of solution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements of the function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.
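The contrast between the two principles can be made concrete. In Vapnik's framework (a standard formulation, sketched from general knowledge rather than from this paper's text), empirical risk minimization minimizes the training-sample average of the loss, while structural risk minimization minimizes a capacity-penalized bound over a nested structure of function classes:

```latex
% Expected (true) risk over the unknown distribution P(x, y):
R(\alpha) = \int L\bigl(y, f(x, \alpha)\bigr)\, dP(x, y)

% Empirical risk over a training sample of size \ell:
R_{\mathrm{emp}}(\alpha) = \frac{1}{\ell} \sum_{i=1}^{\ell} L\bigl(y_i, f(x_i, \alpha)\bigr)

% ERM picks \alpha minimizing R_emp; SRM instead minimizes a bound of the form
R(\alpha) \le R_{\mathrm{emp}}(\alpha) + \Phi\!\left(\tfrac{h}{\ell}\right)

% over a nested structure S_1 \subset S_2 \subset \dots of function classes,
% where h is the capacity (VC dimension) of the class containing f(\cdot,\alpha).
```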

770 citations


Proceedings ArticleDOI
08 Jul 1991
TL;DR: The authors discuss an original approach to neural modeling: searching, with learning methods, for a synaptic learning rule that is biologically plausible and yields networks able to learn to perform difficult tasks.
Abstract: The authors discuss an original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks. The proposed method of automatically finding the learning rule relies on the idea of considering the synaptic modification rule as a parametric function. This function has local inputs and is the same in many neurons. The parameters that define this function can be estimated with known learning methods. For this optimization, particular attention is given to gradient descent and genetic algorithms. In both cases, estimation of this function consists of a joint global optimization of the synaptic modification function and the networks that are learning to perform some tasks. Both the network architecture and the learning function can be designed within constraints derived from biological knowledge.
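The idea of treating the synaptic modification rule as a parametric function to be optimized can be sketched in miniature. Everything below is an illustrative assumption: a hypothetical four-parameter local rule, a one-neuron inner task, and plain random search standing in for the genetic algorithms and gradient descent the authors consider:

```python
import numpy as np

def meta_learn_rule(n_candidates=300, inner_steps=30, seed=0):
    """Toy version of learning a learning rule: the synaptic update is a
    parametric function of local quantities, and its parameters theta are
    found by global search (random search here, standing in for the
    genetic algorithms / gradient descent discussed in the abstract).
    The rule form and the task below are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    xs = np.array([1.0, -1.0, 0.5])      # inputs for the tiny inner task
    ts = 2.0 * xs                        # targets: neuron should learn w* = 2

    def final_loss(theta):
        # Inner loop: train one linear neuron with the candidate rule.
        w = 0.0
        for _ in range(inner_steps):
            for x, t in zip(xs, ts):
                y = w * x
                # Parametric local rule: delta_w = theta . [x*t, x*y, x, 1]
                w += theta @ np.array([x * t, x * y, x, 1.0])
        return float(np.mean((ts - w * xs) ** 2))

    baseline = final_loss(np.zeros(4))   # rule that never updates anything
    best_theta, best_loss = None, np.inf
    for _ in range(n_candidates):
        theta = rng.uniform(-0.5, 0.5, size=4)
        loss = final_loss(theta)
        if loss < best_loss:
            best_theta, best_loss = theta, loss
    return best_theta, best_loss, baseline
```

Note that the delta rule itself is one point in this search space (theta = [eta, -eta, 0, 0] gives delta_w = eta * x * (t - y)), so a successful search recovers a rule of that flavor.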

293 citations


Proceedings Article
Isabelle Guyon, Vladimir Vapnik, Bernhard E. Boser, Léon Bottou, Sara A. Solla
02 Dec 1991
TL;DR: The method of Structural Risk Minimization is used to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
Abstract: The method of Structural Risk Minimization refers to tuning the capacity of the classifier to the available amount of training data. This capacity is influenced by several factors, including: (1) properties of the input space, (2) nature and structure of the classifier, and (3) learning algorithm. Actions based on these three factors are combined here to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
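A toy version of capacity control in the spirit of factor (2), the structure of the classifier: a nested family of polynomial-feature linear classifiers scored by training error plus a per-parameter penalty. The model family and the penalty value are illustrative assumptions, not the paper's actual capacity measure:

```python
import numpy as np

def srm_select(x, y, max_degree=4, penalty=0.05):
    """Toy structural risk minimization: a nested structure of
    polynomial-feature linear classifiers (degree 1 subset of 2 ...),
    scored by training error plus a capacity penalty. The penalty of
    0.05 per parameter is an illustrative stand-in for a VC-style
    capacity term, not the method used in the paper."""
    best_degree, best_score = None, np.inf
    for d in range(1, max_degree + 1):
        A = np.vander(x, d + 1)                    # polynomial features up to x^d
        w, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares linear classifier
        err = np.mean(np.sign(A @ w) != y)         # training error rate
        score = err + penalty * (d + 1)            # empirical risk + capacity term
        if score < best_score:
            best_degree, best_score = d, score
    return best_degree, best_score
```

On data that a low-capacity class already separates, the penalized score prefers the smallest degree, which is the point of searching a nested structure rather than a single large class.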

115 citations


Proceedings ArticleDOI
03 Jan 1991
TL;DR: An algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials is presented.
Abstract: We present an algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials. The bounds are logarithmic in the number of variables. Furthermore, the algorithm is shown to be optimally robust with respect to noise in the data (again to within a constant factor). We also discuss an application of our methods to the iterative solution of sparse systems of linear equations.
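The trial model analyzed above (predict, observe the true value, incur squared loss, update) can be sketched with a generic online gradient update. This is not the paper's optimal algorithm, only an illustration of the setting it analyzes:

```python
import numpy as np

def online_linear_learn(stream, n_features, eta=0.1):
    """Generic online learner for linear functions (Widrow-Hoff style).
    Not the paper's optimal algorithm; illustrates the trial model:
    predict, observe the true value, incur squared loss, update."""
    w = np.zeros(n_features)
    total_sq_error = 0.0
    for x, y in stream:
        y_hat = w @ x               # prediction for this trial
        err = y - y_hat
        total_sq_error += err ** 2  # loss accumulated against the worst-case bound
        w += eta * err * x          # gradient-descent update on squared loss
    return w, total_sq_error
```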

70 citations


Proceedings Article
24 Aug 1991
TL;DR: Comparative experiments show that the derived Bayesian algorithm is consistently as good as or better than several mature AI and statistical families of tree learning algorithms currently in use, although sometimes at greater computational cost.
Abstract: This paper describes how a competitive tree learning algorithm can be derived from first principles. The algorithm approximates the Bayesian decision theoretic solution to the learning task. Comparative experiments with the algorithm and the several mature AI and statistical families of tree learning algorithms currently in use show that the derived Bayesian algorithm is consistently as good as or better, although sometimes at greater computational cost. Using the same strategy, we can design algorithms for many other supervised and model learning tasks given just a probabilistic representation for the kind of knowledge to be learned. As an illustration, a second learning algorithm is derived for learning Bayesian networks from data. Implications for incremental learning and the use of multiple models are also discussed.
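A miniature example of deriving a tree learner from Bayesian first principles: a one-level "tree" (a stump) whose split is chosen by marginal likelihood under a uniform Beta(1,1) prior on each leaf's class probability. This is an illustrative reconstruction of the general approach, not the paper's algorithm:

```python
from math import factorial

def leaf_evidence(n0, n1):
    """Marginal likelihood of binary labels in a leaf under a uniform
    Beta(1,1) prior on the leaf's class probability: n0! n1! / (n0+n1+1)!"""
    return factorial(n0) * factorial(n1) / factorial(n0 + n1 + 1)

def best_stump(x, y):
    """Bayesian split selection in miniature: choose the threshold whose
    two leaves maximize the marginal likelihood of the labels, and compare
    against not splitting at all (threshold None)."""
    no_split = leaf_evidence(y.count(0), y.count(1))
    best = (None, no_split)               # (threshold, evidence)
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2                 # candidate threshold between points
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ev = leaf_evidence(left.count(0), left.count(1)) \
           * leaf_evidence(right.count(0), right.count(1))
        if ev > best[1]:
            best = (t, ev)
    return best
```

Because splitting is only accepted when it raises the marginal likelihood, the prior acts as a built-in complexity control, which is the decision-theoretic analogue of pruning heuristics in conventional tree learners.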

38 citations


Proceedings ArticleDOI
26 Jun 1991
TL;DR: This paper addresses the problem of supervised learning in two types of artificial neurons, an ADALINE with differentiable activation function and an ADALINE feeding a discrete dynamical system, and proposes learning laws based on the Widrow-Hoff learning algorithm.
Abstract: This paper addresses the problem of supervised learning in two types of artificial neurons: (i) an ADALINE (Adaptive Linear Element) with differentiable activation function (the McCulloch-Pitts type neuron), and (ii) an ADALINE feeding a discrete dynamical system. Supervised learning occurs when the neuron is supplied with both the input and the correct output values. Learning algorithms are then used to adjust the adaptable parameters (weights) based on the error of the computed output. We propose learning laws for both types of neurons, based on the Widrow-Hoff learning algorithm. We then give sufficiency conditions under which the learning parameters converge, i.e. learning takes place, and we also investigate conditions under which they diverge.
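The Widrow-Hoff (LMS) rule and its convergence behavior can be sketched as follows. The step-size condition in the comment is the standard LMS result (roughly 0 < eta < 2/lambda_max), offered as an illustration of the kind of sufficiency condition discussed, not the paper's exact statement:

```python
import numpy as np

def lms_train(X, y, eta, epochs=200):
    """Widrow-Hoff (LMS) rule for an ADALINE with linear activation.
    Illustrative sketch: the weights converge roughly when
    0 < eta < 2 / lambda_max, where lambda_max is the largest eigenvalue
    of the input correlation matrix (a standard LMS condition, not the
    paper's exact statement); larger eta makes the updates diverge."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            w += eta * (t - x @ w) * x   # delta rule: step along the error
    return w
```

Running the same data with a small and a large step size shows both regimes: the small one settles on the least-squares weights, the large one blows up.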

4 citations


Proceedings ArticleDOI
18 Nov 1991
TL;DR: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented.
Abstract: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented. This modified algorithm is characterized by the membership indicator that attempts to balance the variations of all clusters and by the learning rate that is dynamically adjusted based on the estimated deviation of the current partition from an optimal one. Simulations comparing this new algorithm with other k-means competitive learning algorithms on stationary and nonstationary problems are presented.
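For reference, plain k-means competitive learning looks as follows; the paper's modifications (the balancing membership indicator and the deviation-driven learning rate) are not reproduced here:

```python
import numpy as np

def competitive_kmeans(stream, k, dim, eta=0.05, rng=None):
    """Plain k-means competitive learning: for each input, the nearest
    centroid (the 'winner') moves a step toward that input. The paper's
    balanced membership indicator and adaptive learning rate are not
    reproduced in this sketch; eta is fixed."""
    if rng is None:
        rng = np.random.default_rng(0)
    centroids = rng.normal(size=(k, dim))            # random initial centroids
    for x in stream:
        winner = np.argmin(np.linalg.norm(centroids - x, axis=1))
        centroids[winner] += eta * (x - centroids[winner])
    return centroids
```

The fixed eta is exactly what the paper's dynamically adjusted rate replaces: with a constant step the centroids keep jittering under stationary inputs and adapt slowly under nonstationary ones.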

3 citations


Book ChapterDOI
01 Jan 1991
TL;DR: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions, and an upper bound on the time taken by the perceptron algorithm to learn a half-space is given.
Abstract: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions. The definition is more useful for the analysis of computational complexity of learning algorithms than Valiant's distribution-independent learning protocol. We give an upper bound on the time taken by the perceptron algorithm to learn a half-space under this definition. The definition makes obvious the effects of the distribution's characteristics on learning performance. We investigate perceptron-like algorithms that choose their own training examples, and show how this affects learning time.
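The perceptron algorithm whose learning time the chapter bounds is the classic mistake-driven update (a textbook sketch, independent of the distributional restrictions analyzed here):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Classic perceptron for learning a half-space w.x >= 0.
    Labels y are in {-1, +1}; the weights change only on mistakes."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in zip(X, y):
            if t * (x @ w) <= 0:     # misclassified (or on the boundary)
                w += t * x           # perceptron update toward the example
                mistakes += 1
        if mistakes == 0:
            break                    # converged: every example is correct
    return w
```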