Showing papers on "Empirical risk minimization published in 1991"
•
02 Dec 1991
TL;DR: Systematic improvements in prediction power from empirical and structural risk minimization are illustrated in application to zip-code recognition.
Abstract: Learning is posed as a problem of function estimation, for which two principles of solution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements of the function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.
770 citations
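The empirical risk minimization principle described above can be illustrated with a minimal sketch: from a fixed class of functions, pick the one minimizing the average loss on the training sample. The linear model, squared loss, and synthetic data below are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

# Minimal sketch of empirical risk minimization (ERM): choose the
# parameter vector that minimizes the average squared loss over a
# training sample. Data and dimensions are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 training inputs
w_true = np.array([1.0, -2.0, 0.5])   # unknown target function
y = X @ w_true                        # noiseless labels

# For squared loss over linear functions, the empirical minimizer
# has the closed least-squares form.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

empirical_risk = np.mean((X @ w_hat - y) ** 2)
```

For other losses or function classes the empirical minimizer generally has no closed form and is found numerically.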
•
08 Jul 1991
TL;DR: An original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks is discussed.
Abstract: Summary form only given, as follows. The authors discuss an original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks. The proposed method of automatically finding the learning rule relies on the idea of considering the synaptic modification rule as a parametric function. This function has local inputs and is the same in many neurons. The parameters that define this function can be estimated with known learning methods. For this optimization, particular attention is given to gradient descent and genetic algorithms. In both cases, estimation of this function consists of a joint global optimization of the synaptic modification function and the networks that are learning to perform some tasks. Both network architecture and the learning function can be designed within constraints derived from biological knowledge.
293 citations
•
02 Dec 1991
TL;DR: The method of Structural Risk Minimization is used to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
Abstract: The method of Structural Risk Minimization refers to tuning the capacity of the classifier to the available amount of training data. This capacity is influenced by several factors, including: (1) properties of the input space, (2) the nature and structure of the classifier, and (3) the learning algorithm. Actions based on these three factors are combined here to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
115 citations
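Structural risk minimization, as described above, orders hypothesis classes by capacity and selects the class that generalizes best rather than the one with the lowest training error. A hedged sketch of that idea, using polynomial degree as an illustrative stand-in for the capacity structure and held-out error as the selection criterion (the paper's actual capacity-control actions are not reproduced):

```python
import numpy as np

# Sketch of structure-based model selection: fit a nested sequence of
# hypothesis classes S1 ⊂ S2 ⊂ ... (polynomials of increasing degree)
# and keep the one with the lowest held-out error. Data are illustrative.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 40)
y = 2 * x + 0.1 * rng.normal(size=40)   # truly linear target plus noise

x_tr, y_tr = x[:30], y[:30]             # training split
x_va, y_va = x[30:], y[30:]             # held-out split

best_deg, best_err = None, np.inf
for deg in range(1, 9):                  # increasing-capacity structure
    coef = np.polyfit(x_tr, y_tr, deg)
    err = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    if err < best_err:
        best_deg, best_err = deg, err
```

High-degree fits can drive the training error to near zero while the held-out error grows; selecting by held-out error is the capacity-control step.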
•
03 Jan 1991
TL;DR: An algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials is presented.
Abstract: We present an algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials. The bounds are logarithmic in the number of variables. Furthermore, the algorithm is shown to be optimally robust with respect to noise in the data (again to within a constant factor). We also discuss an application of our methods to the iterative solution of sparse systems of linear equations.
70 citations
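The on-line protocol in the abstract above can be sketched simply: at each trial the learner predicts a linear function's value, sees the true outcome, accumulates squared error, and updates. The update below is the classical gradient-descent (Widrow-Hoff) rule used as a stand-in; the paper's own algorithm and its constant-factor-optimal, logarithmic bounds are not reproduced here.

```python
import numpy as np

# Sketch of the on-line learning protocol for linear functions with
# a gradient-descent update. Target, step size, and trial sequence
# are illustrative assumptions.
rng = np.random.default_rng(2)
w_target = np.array([0.5, -1.0])     # unknown linear function
w = np.zeros(2)                      # learner's hypothesis
eta = 0.1                            # step size
total_sq_err = 0.0
for _ in range(500):                 # sequence of trials
    x = rng.normal(size=2)
    y_hat = w @ x                    # learner predicts first
    y = w_target @ x                 # then sees the true outcome
    total_sq_err += (y_hat - y) ** 2
    w += eta * (y - y_hat) * x       # gradient-descent update
```

The quantity of interest in this setting is `total_sq_err`, the cumulative loss over the whole trial sequence, rather than the error of the final hypothesis.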
•
24 Aug 1991
TL;DR: Comparative experiments show that the derived Bayesian algorithm is consistently as good as or better than the several mature AI and statistical families of tree-learning algorithms currently in use, although sometimes at greater computational cost.
Abstract: This paper describes how a competitive tree-learning algorithm can be derived from first principles. The algorithm approximates the Bayesian decision-theoretic solution to the learning task. Comparative experiments against the several mature AI and statistical families of tree-learning algorithms currently in use show that the derived Bayesian algorithm is consistently as good as or better, although sometimes at greater computational cost. Using the same strategy, we can design algorithms for many other supervised and model learning tasks given just a probabilistic representation of the kind of knowledge to be learned. As an illustration, a second learning algorithm is derived for learning Bayesian networks from data. Implications for incremental learning and the use of multiple models are also discussed.
38 citations
•
26 Jun 1991
TL;DR: This paper addresses the problem of supervised learning in two types of artificial neurons, an ADALINE with differentiable activation function and an ADALINE feeding a discrete dynamical system, and proposes learning laws based on the Widrow-Hoff learning algorithm.
Abstract: This paper addresses the problem of supervised learning in two types of artificial neurons: (i) an ADALINE (Adaptive Linear Element) with differentiable activation function (the McCulloch-Pitts type neuron); (ii) an ADALINE feeding a discrete dynamical system. Supervised learning occurs when the neuron is supplied with both the input and the correct output values. Learning algorithms are then used to adjust the adaptable parameters (weights) based on the error of the computed output. We propose learning laws for both types of neurons, based on the Widrow-Hoff learning algorithm. We then give sufficient conditions under which the learning parameters converge, i.e. learning takes place. We also investigate conditions under which the learning parameters diverge.
4 citations
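The convergence-versus-divergence behavior mentioned in the abstract above can be illustrated with the Widrow-Hoff (LMS) rule on a single linear neuron. For a fixed input x, the error e = d - w·x contracts by the factor (1 - eta·||x||²) each step, so 0 < eta < 2/||x||² yields convergence and larger eta yields divergence. The numerical values below are illustrative, not the paper's conditions.

```python
import numpy as np

# Widrow-Hoff (LMS) rule on one linear neuron, run with a stable and
# an unstable step size. Input, target, and step sizes are illustrative.
x = np.array([1.0, 2.0])              # fixed input, ||x||^2 = 5
d = 3.0                               # desired output for this input

def run_lms(eta, steps=100):
    w = np.zeros(2)
    for _ in range(steps):
        w += eta * (d - w @ x) * x    # Widrow-Hoff update
    return abs(d - w @ x)             # remaining output error

err_stable   = run_lms(eta=0.2)       # 0.2 < 2/5 = 0.4: converges
err_unstable = run_lms(eta=0.5)       # 0.5 > 0.4: diverges
```

The same step-size threshold, averaged over the input distribution, is the usual sufficient condition for LMS convergence in mean.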
•
18 Nov 1991
TL;DR: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented.
Abstract: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented. This modified algorithm is characterized by the membership indicator that attempts to balance the variations of all clusters and by the learning rate that is dynamically adjusted based on the estimated deviation of the current partition from an optimal one. Simulations comparing this new algorithm with other k-means competitive learning algorithms on stationary and nonstationary problems are presented.
3 citations
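The baseline that the modified algorithm above builds on, k-means competitive learning, can be sketched as follows: each input moves the nearest ("winning") prototype toward it. The paper's membership indicator and deviation-based learning-rate schedule are not reproduced; this sketch uses a simple per-cluster count-based rate as an illustrative stand-in, on synthetic one-dimensional data.

```python
import numpy as np

# Winner-take-all competitive learning with k = 2 prototypes on
# two well-separated 1-D clusters. Data and initialization are illustrative.
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-5, 0.5, 200),
                       rng.normal(5, 0.5, 200)])
rng.shuffle(data)

protos = np.array([-1.0, 1.0])                # initial prototypes
counts = np.zeros(2)                          # per-cluster win counts
for x in data:
    j = np.argmin(np.abs(protos - x))         # competition: nearest wins
    counts[j] += 1
    protos[j] += (x - protos[j]) / counts[j]  # running-mean update
```

The 1/count learning rate makes each prototype the running mean of its wins; the paper's contribution is replacing such a decaying rate with one that keeps adapting when the input statistics drift.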
•
01 Jan 1991
TL;DR: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions, and an upper bound on the time taken by the perceptron algorithm to learn a half-space is given.
Abstract: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions. The definition is more useful for the analysis of computational complexity of learning algorithms than Valiant's distribution-independent learning protocol. We give an upper bound on the time taken by the perceptron algorithm to learn a half-space under this definition. The definition makes obvious the effects of the distribution's characteristics on learning performance. We investigate perceptron-like algorithms that choose their own training examples, and show how this affects learning time.
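The perceptron algorithm analyzed above can be sketched on a linearly separable sample: cycle through labeled examples and update the weight vector only on mistakes. The distribution, margin, and data below are illustrative assumptions; the paper's distribution-restricted time bounds are not reproduced.

```python
import numpy as np

# Classical perceptron learning a half-space from a separable sample.
# Target half-space, margin, and sample are illustrative.
rng = np.random.default_rng(4)
w_true = np.array([1.0, 1.0])                 # target half-space normal
X = rng.uniform(-1, 1, size=(200, 2))
X = X[np.abs(X @ w_true) > 0.2]               # enforce a margin
y = np.sign(X @ w_true)                       # +/-1 labels

w = np.zeros(2)
for _ in range(1000):                         # passes over the sample
    mistakes = 0
    for x, label in zip(X, y):
        if np.sign(w @ x) != label:           # mistake-driven update
            w += label * x
            mistakes += 1
    if mistakes == 0:                         # a clean pass: done
        break
```

By the classical mistake bound, the total number of updates is at most (R/γ)², where R bounds the example norms and γ is the margin, which is what makes the margin (and hence the example distribution) decisive for learning time.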