Showing papers on "Empirical risk minimization published in 1991"
•
02 Dec 1991
TL;DR: Systematic improvements in prediction power from empirical and structural risk minimization are illustrated in application to zip-code recognition.
Abstract: Learning is posed as a problem of function estimation, for which two principles of solution are considered: empirical risk minimization and structural risk minimization. These two principles are applied to two different statements of the function estimation problem: global and local. Systematic improvements in prediction power are illustrated in application to zip-code recognition.
770 citations
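The empirical risk minimization principle described above can be illustrated with a minimal sketch: from a fixed class of functions, pick the one minimizing the average loss on the training sample. The linear model, squared loss, and synthetic data below are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

# Minimal sketch of empirical risk minimization (ERM): choose the
# parameter vector that minimizes the average squared loss over a
# training sample. Data and dimensions are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # 50 training inputs
w_true = np.array([1.0, -2.0, 0.5])   # unknown target function
y = X @ w_true                        # noiseless labels

# For squared loss over linear functions, the empirical minimizer
# has the closed least-squares form.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

empirical_risk = np.mean((X @ w_hat - y) ** 2)
```

For other losses or function classes the empirical minimizer generally has no closed form and is found numerically.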
•
08 Jul 1991
TL;DR: An original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks is discussed.
Abstract: Summary form only given, as follows. The authors discuss an original approach to neural modeling based on the idea of searching, with learning methods, for a synaptic learning rule which is biologically plausible and yields networks that are able to learn to perform difficult tasks. The proposed method of automatically finding the learning rule relies on the idea of considering the synaptic modification rule as a parametric function. This function has local inputs and is the same in many neurons. The parameters that define this function can be estimated with known learning methods. For this optimization, particular attention is given to gradient descent and genetic algorithms. In both cases, estimation of this function consists of a joint global optimization of the synaptic modification function and the networks that are learning to perform some tasks. Both network architecture and the learning function can be designed within constraints derived from biological knowledge.
293 citations
•
02 Dec 1991
TL;DR: The method of Structural Risk Minimization is used to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
Abstract: The method of Structural Risk Minimization refers to tuning the capacity of the classifier to the available amount of training data. This capacity is influenced by several factors, including: (1) properties of the input space, (2) the nature and structure of the classifier, and (3) the learning algorithm. Actions based on these three factors are combined here to control the capacity of linear classifiers and improve generalization on the problem of handwritten digit recognition.
115 citations
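Structural risk minimization, as described above, orders hypothesis classes by capacity and selects the class that generalizes best rather than the one with the lowest training error. A hedged sketch of that idea, using polynomial degree as an illustrative stand-in for the capacity structure and held-out error as the selection criterion (the paper's actual capacity-control actions are not reproduced):

```python
import numpy as np

# Sketch of structure-based model selection: fit a nested sequence of
# hypothesis classes S1 ⊂ S2 ⊂ ... (polynomials of increasing degree)
# and keep the one with the lowest held-out error. Data are illustrative.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 40)
y = 2 * x + 0.1 * rng.normal(size=40)   # truly linear target plus noise

x_tr, y_tr = x[:30], y[:30]             # training split
x_va, y_va = x[30:], y[30:]             # held-out split

best_deg, best_err = None, np.inf
for deg in range(1, 9):                  # increasing-capacity structure
    coef = np.polyfit(x_tr, y_tr, deg)
    err = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    if err < best_err:
        best_deg, best_err = deg, err
```

High-degree fits can drive the training error to near zero while the held-out error grows; selecting by held-out error is the capacity-control step.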
•
03 Jan 1991
TL;DR: An algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials is presented.
Abstract: We present an algorithm for the on-line learning of linear functions which is optimal to within a constant factor with respect to bounds on the sum of squared errors for a worst case sequence of trials. The bounds are logarithmic in the number of variables. Furthermore, the algorithm is shown to be optimally robust with respect to noise in the data (again to within a constant factor). We also discuss an application of our methods to the iterative solution of sparse systems of linear equations.
70 citations
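The on-line protocol in the abstract above can be sketched simply: at each trial the learner predicts a linear function's value, sees the true outcome, accumulates squared error, and updates. The update below is the classical gradient-descent (Widrow-Hoff) rule used as a stand-in; the paper's own algorithm and its constant-factor-optimal, logarithmic bounds are not reproduced here.

```python
import numpy as np

# Sketch of the on-line learning protocol for linear functions with
# a gradient-descent update. Target, step size, and trial sequence
# are illustrative assumptions.
rng = np.random.default_rng(2)
w_target = np.array([0.5, -1.0])     # unknown linear function
w = np.zeros(2)                      # learner's hypothesis
eta = 0.1                            # step size
total_sq_err = 0.0
for _ in range(500):                 # sequence of trials
    x = rng.normal(size=2)
    y_hat = w @ x                    # learner predicts first
    y = w_target @ x                 # then sees the true outcome
    total_sq_err += (y_hat - y) ** 2
    w += eta * (y - y_hat) * x       # gradient-descent update
```

The quantity of interest in this setting is `total_sq_err`, the cumulative loss over the whole trial sequence, rather than the error of the final hypothesis.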
•
24 Aug 1991
TL;DR: Comparative experiments show that the derived Bayesian algorithm is consistently as good as or better than the several mature AI and statistical families of tree-learning algorithms currently in use, although sometimes at greater computational cost.
Abstract: This paper describes how a competitive tree-learning algorithm can be derived from first principles. The algorithm approximates the Bayesian decision-theoretic solution to the learning task. Comparative experiments against the several mature AI and statistical families of tree-learning algorithms currently in use show that the derived Bayesian algorithm is consistently as good as or better, although sometimes at greater computational cost. Using the same strategy, we can design algorithms for many other supervised and model learning tasks given just a probabilistic representation of the kind of knowledge to be learned. As an illustration, a second learning algorithm is derived for learning Bayesian networks from data. Implications for incremental learning and the use of multiple models are also discussed.
38 citations
•
26 Jun 1991
TL;DR: This paper addresses the problem of supervised learning in two types of artificial neurons, an ADALINE with differentiable activation function and an ADALINE feeding a discrete dynamical system, and proposes learning laws based on the Widrow-Hoff learning algorithm.
Abstract: This paper addresses the problem of supervised learning in two types of artificial neurons: (i) an ADALINE (Adaptive Linear Element) with differentiable activation function (the McCulloch-Pitts type neuron); (ii) an ADALINE feeding a discrete dynamical system. Supervised learning occurs when the neuron is supplied with both the input and the correct output values. Learning algorithms are then used to adjust the adaptable parameters (weights) based on the error of the computed output. We propose learning laws for both types of neurons, based on the Widrow-Hoff learning algorithm. We then give sufficient conditions under which the learning parameters converge, i.e. learning takes place. We also investigate conditions under which the learning parameters diverge.
4 citations
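The convergence-versus-divergence behavior mentioned in the abstract above can be illustrated with the Widrow-Hoff (LMS) rule on a single linear neuron. For a fixed input x, the error e = d - w·x contracts by the factor (1 - eta·||x||²) each step, so 0 < eta < 2/||x||² yields convergence and larger eta yields divergence. The numerical values below are illustrative, not the paper's conditions.

```python
import numpy as np

# Widrow-Hoff (LMS) rule on one linear neuron, run with a stable and
# an unstable step size. Input, target, and step sizes are illustrative.
x = np.array([1.0, 2.0])              # fixed input, ||x||^2 = 5
d = 3.0                               # desired output for this input

def run_lms(eta, steps=100):
    w = np.zeros(2)
    for _ in range(steps):
        w += eta * (d - w @ x) * x    # Widrow-Hoff update
    return abs(d - w @ x)             # remaining output error

err_stable   = run_lms(eta=0.2)       # 0.2 < 2/5 = 0.4: converges
err_unstable = run_lms(eta=0.5)       # 0.5 > 0.4: diverges
```

The same step-size threshold, averaged over the input distribution, is the usual sufficient condition for LMS convergence in mean.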
•
18 Nov 1991
TL;DR: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented.
Abstract: A modified k-means competitive learning algorithm that can perform efficiently in situations where the input statistics are changing, such as in nonstationary environments, is presented. This modified algorithm is characterized by the membership indicator that attempts to balance the variations of all clusters and by the learning rate that is dynamically adjusted based on the estimated deviation of the current partition from an optimal one. Simulations comparing this new algorithm with other k-means competitive learning algorithms on stationary and nonstationary problems are presented.
3 citations
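The baseline that the modified algorithm above builds on, k-means competitive learning, can be sketched as follows: each input moves the nearest ("winning") prototype toward it. The paper's membership indicator and deviation-based learning-rate schedule are not reproduced; this sketch uses a simple per-cluster count-based rate as an illustrative stand-in, on synthetic one-dimensional data.

```python
import numpy as np

# Winner-take-all competitive learning with k = 2 prototypes on
# two well-separated 1-D clusters. Data and initialization are illustrative.
rng = np.random.default_rng(3)
data = np.concatenate([rng.normal(-5, 0.5, 200),
                       rng.normal(5, 0.5, 200)])
rng.shuffle(data)

protos = np.array([-1.0, 1.0])                # initial prototypes
counts = np.zeros(2)                          # per-cluster win counts
for x in data:
    j = np.argmin(np.abs(protos - x))         # competition: nearest wins
    counts[j] += 1
    protos[j] += (x - protos[j]) / counts[j]  # running-mean update
```

The 1/count learning rate makes each prototype the running mean of its wins; the paper's contribution is replacing such a decaying rate with one that keeps adapting when the input statistics drift.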
•
01 Jan 1991
TL;DR: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions, and an upper bound on the time taken by the perceptron algorithm to learn a half-space is given.
Abstract: A formal definition of learning is proposed in which the probability distribution of examples is restricted to a family of reasonable distributions. The definition is more useful for the analysis of computational complexity of learning algorithms than Valiant's distribution-independent learning protocol. We give an upper bound on the time taken by the perceptron algorithm to learn a half-space under this definition. The definition makes obvious the effects of the distribution's characteristics on learning performance. We investigate perceptron-like algorithms that choose their own training examples, and show how this affects learning time.
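The perceptron algorithm analyzed above can be sketched on a linearly separable sample: cycle through labeled examples and update the weight vector only on mistakes. The distribution, margin, and data below are illustrative assumptions; the paper's distribution-restricted time bounds are not reproduced.

```python
import numpy as np

# Classical perceptron learning a half-space from a separable sample.
# Target half-space, margin, and sample are illustrative.
rng = np.random.default_rng(4)
w_true = np.array([1.0, 1.0])                 # target half-space normal
X = rng.uniform(-1, 1, size=(200, 2))
X = X[np.abs(X @ w_true) > 0.2]               # enforce a margin
y = np.sign(X @ w_true)                       # +/-1 labels

w = np.zeros(2)
for _ in range(1000):                         # passes over the sample
    mistakes = 0
    for x, label in zip(X, y):
        if np.sign(w @ x) != label:           # mistake-driven update
            w += label * x
            mistakes += 1
    if mistakes == 0:                         # a clean pass: done
        break
```

By the classical mistake bound, the total number of updates is at most (R/γ)², where R bounds the example norms and γ is the margin, which is what makes the margin (and hence the example distribution) decisive for learning time.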