
Showing papers on "Empirical risk minimization published in 1994"


Journal ArticleDOI
TL;DR: The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (“digitized”) domain, and upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, half-spaces, intersections of half-spaces, and semi-algebraic sets.
Abstract: The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (“digitized”) domain. In particular, upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, half-spaces, intersections of half-spaces, and semi-algebraic sets. The learning model considered is the standard model for on-line learning from counterexamples.
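
The counterexample model for axis-parallel rectangles admits a compact illustration: a learner that keeps the tightest box consistent with the positive counterexamples seen so far errs only on positive points, and every mistake widens the box along at least one axis. The sketch below simulates this on a small digitized grid; the query stream and grid size are invented, and the paper's algorithms and bounds are more refined than this conservative learner.

```python
# Sketch: on-line learning of an axis-parallel rectangle over a discrete
# grid from counterexamples. The hypothesis is the tightest box spanned
# by the positive counterexamples seen so far; since that box always
# lies inside the target, the learner only errs on positive points, and
# each mistake expands the box along at least one axis.

def learn_rectangle(target_lo, target_hi, queries):
    """target_lo/target_hi: per-axis bounds of the hidden rectangle.
    queries: stream of grid points; the true label is revealed
    (as a counterexample) whenever the prediction is wrong."""
    lo, hi = None, None              # current hypothesis box (empty at start)
    mistakes = 0
    for x in queries:
        truth = all(l <= xi <= h for l, xi, h in zip(target_lo, x, target_hi))
        guess = (lo is not None and
                 all(l <= xi <= h for l, xi, h in zip(lo, x, hi)))
        if guess != truth:
            mistakes += 1            # counterexample received: a positive point
            if lo is None:
                lo, hi = list(x), list(x)
            else:
                lo = [min(l, xi) for l, xi in zip(lo, x)]
                hi = [max(h, xi) for h, xi in zip(hi, x)]
    return mistakes

# Hidden rectangle [2,5] x [1,4] on an 8x8 grid, scanned in row order:
m = learn_rectangle([2, 1], [5, 4], [(x, y) for x in range(8) for y in range(8)])
```

On this stream the learner makes 7 mistakes, roughly the sum of the target's side lengths, before its box coincides with the hidden rectangle.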

38 citations


Proceedings Article
05 Oct 1994
TL;DR: Efficient learning is cast as a resource optimization problem, and a heuristic learning algorithm that approximately solves it is introduced; its performance improvements on synthetic and real-world problems are documented.
Abstract: This article shows how rational analysis can be used to minimize learning cost for a general class of statistical learning problems. We discuss the factors that influence learning cost and show that the problem of efficient learning can be cast as a resource optimization problem. Solutions found in this way can be significantly more efficient than the best solutions that do not account for these factors. We introduce a heuristic learning algorithm that approximately solves this optimization problem and document its performance improvements on synthetic and real-world problems.

15 citations


Proceedings ArticleDOI
06 Apr 1994
TL;DR: The main result of this paper shows that this particular type of learning can be done using the well-known technique of Boolean expression minimization, and the Boolean formulation unifies the various techniques suggested previously for hierarchical generalizations.
Abstract: Concept learning through hierarchical generalization is an important technique in machine learning. The main result of this paper shows that this particular type of learning can be done using the well-known technique of Boolean expression minimization. The Boolean formulation unifies the various techniques suggested previously for hierarchical generalizations. It gives better conceptual clarity and a computationally efficient method for this type of learning. In particular, learning from relational databases can also be cast in the framework of Boolean minimization.
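
One step of the machinery behind such generalization can be sketched as the Quine-McCluskey combining rule: two example descriptions (minterms over attributes) that differ in exactly one attribute merge into a more general term with a don't-care in that position. This is an illustration of the underlying Boolean-minimization operation, not the paper's full framework.

```python
# Sketch of the core generalization step in Boolean expression
# minimization: terms over {'0', '1', '-'} ('-' = don't care) that
# differ in exactly one position combine into a single more general
# term -- the Quine-McCluskey combining rule.

def merge(a, b):
    """Combine two terms if they differ in exactly one position."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + '-' + a[i + 1:]

def generalize(terms):
    """Close a set of terms under the combining rule."""
    terms = set(terms)
    changed = True
    while changed:
        changed = False
        for a in list(terms):
            for b in list(terms):
                m = merge(a, b)
                if m is not None and m not in terms:
                    terms.add(m)
                    changed = True
    return terms
```

For instance, `generalize({'110', '100', '101', '111'})` produces the term `'1--'` among its results, i.e. the concept "first attribute true, the others irrelevant", generalizing all four examples.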

6 citations


Journal ArticleDOI
TL;DR: The condition for the per-trial asymptotic convergence of the feedback error learning method is derived; it relates the learning rate to the α function, which is calculated from the input-output relationship of the system.
Abstract: This paper deals with the improvement of learning speed based on an analysis of the convergence of the feedback error learning method. We derive the condition for the asymptotic convergence of the feedback error learning method for each trial. This condition relates the learning rate to the α function, which is calculated from the input-output relationship of the system. Using the α function, we propose a high-speed learning method for a tracking control system. We present simulation results for the tracking control of a one-link robot manipulator in two cases: (1) the general feedback error learning method and (2) the proposed high-speed learning method. The simulation results show the effectiveness of the proposed conditions and learning method.
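
The scheme can be sketched on a toy scalar plant: a one-parameter feedforward controller is trained using the feedback controller's correction as its error signal. The plant, gains and trial schedule below are invented for illustration; the abstract's point is that per-trial convergence requires the learning rate to satisfy a condition derived from the system's input-output relationship.

```python
# Toy sketch of feedback error learning. The feedforward weight w is
# updated from the feedback correction u_fb; for the scalar plant
# y = a*u the ideal feedforward weight is 1/a.

def run_trials(a=2.0, K=1.0, eta=0.1, trials=300):
    """a: unknown plant gain; K: feedback gain; eta: learning rate."""
    w = 0.0
    targets = [1.0, 0.5, 1.5]            # desired outputs, cycled per trial
    for t in range(trials):
        yd = targets[t % 3]
        u_ff = w * yd                    # feedforward command
        y = a * u_ff                     # plant response
        u_fb = K * (yd - y)              # feedback correction
        w += eta * u_fb * yd             # learn from the feedback error
    return w

w = run_trials()                         # converges to 1/a = 0.5
```

Each trial contracts the error in w by a factor 1 − ηKa·yd², so convergence demands a learning rate small relative to the plant gain and command magnitude: raising η to 1.0 here makes that factor exceed 1 in magnitude on the yd = 1.5 trials and the loop diverges, which is the flavor of the condition relating the learning rate to the α function.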

4 citations


Book ChapterDOI
01 Jan 1994
TL;DR: The worst-case behavior of a family of learning algorithms based on Sutton's method of temporal differences is studied, and general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(A) algorithm are proved.
Abstract: We study the worst-case behavior of a family of learning algorithms based on Sutton's [7] method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we are able to prove general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(A) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and are proved without making any statistical assumptions of any kind about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.
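
The update being analyzed can be sketched with a linear predictor and an eligibility trace. The parameters and the constant toy sequence below are invented, and the paper studies a slightly modified variant with worst-case (assumption-free) bounds; this is only the basic TD(λ) mechanism.

```python
# Minimal TD(lambda) sketch: a linear predictor estimates the
# discounted sum of future reinforcements and is updated from the
# temporal-difference error via an eligibility trace.

def td_lambda(xs, rs, gamma=0.9, lam=0.8, eta=0.05):
    dim = len(xs[0])
    w = [0.0] * dim
    e = [0.0] * dim                              # eligibility trace
    for t in range(len(xs) - 1):
        pred = sum(wi * xi for wi, xi in zip(w, xs[t]))
        pred_next = sum(wi * xi for wi, xi in zip(w, xs[t + 1]))
        delta = rs[t] + gamma * pred_next - pred  # TD error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, xs[t])]
        w = [wi + eta * delta * ei for wi, ei in zip(w, e)]
    return w

# Constant feature, reinforcement 1 per step: the discounted return is
# 1 / (1 - gamma) = 10, and the prediction converges to it.
w = td_lambda([[1.0, 0.0]] * 2001, [1.0] * 2000)
```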

3 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: Convergence properties of radial basis function (RBF) networks are studied for a large class of basis functions and the universal approximation property of the nets is shown.
Abstract: In this paper, convergence properties of radial basis function (RBF) networks are studied for a large class of basis functions. The universal approximation property of the nets is shown. Parameters of RBF nets are learned through empirical risk minimization. The optimal nets are shown to be consistent in nonparametric classification. The tools used in the analysis include the Vapnik-Chervonenkis (VC) dimension and covering numbers.
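
The training principle in the abstract can be sketched concretely: a Gaussian RBF network whose output weights are fit by empirical risk minimization, here via stochastic gradient descent on the mean squared error over the training sample. The centers, width and toy data are invented, and the paper treats learned RBF nets in far greater generality (consistency via VC dimension and covering numbers).

```python
import math

# Sketch: Gaussian RBF network with fixed centers; output weights are
# chosen by minimizing the empirical squared risk over the sample.

def rbf_features(x, centers, width=1.0):
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def fit_erm(data, centers, eta=0.5, epochs=500):
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in data:                 # gradient step on empirical risk
            phi = rbf_features(x, centers)
            err = sum(wi * pi for wi, pi in zip(w, phi)) - y
            w = [wi - eta * err * pi for wi, pi in zip(w, phi)]
    return w

def predict(x, w, centers):
    phi = rbf_features(x, centers)
    return 1 if sum(wi * pi for wi, pi in zip(w, phi)) > 0 else -1

# Toy 1-D sample: class -1 near 0, class +1 near 3.
data = [(0.0, -1), (0.2, -1), (3.0, 1), (3.2, 1)]
centers = [0.0, 3.0]
w = fit_erm(data, centers)
```

After training, the net separates the two clusters; the consistency results in the paper concern what happens as the sample and the network grow, not this fixed toy case.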

3 citations


Proceedings ArticleDOI
02 Mar 1994
TL;DR: A new strategy is described for coping with the local minima inherent in the cost function f(W, D); it adaptively changes the learning rate and manipulates the gradient estimator simultaneously.
Abstract: One of the major problems in supervised learning of neural networks is the inevitable local minima inherent in the cost function f(W, D). This often renders powerless the classic gradient-descent-based learning algorithms that compute the weight update at each iteration according to ΔW(t) = −η·∇_W f(W, D). In this paper we describe a new strategy to solve this problem, which adaptively changes the learning rate and manipulates the gradient estimator simultaneously. The idea is to implicitly convert the local-minima-laden cost function f(·) into a sequence of its smoothed versions {f_{β_t}}, t = 1, …, T, which, controlled by the parameter β_t, bears few details at t = 1 and gradually more later on; the learning is actually performed on this sequence of functionals. The corresponding smoothed global minima obtained in this way, {W*_t}, t = 1, …, T, thus progressively approximate W*, the desired global minimum. Experimental results on a nonconvex function minimization problem and a typical neural network learning task are given; analyses and discussions of some important issues are provided. © 1994 SPIE, The International Society for Optical Engineering.
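
The smoothing sequence can be illustrated in one dimension. The sketch below uses an invented cost function, fixed-offset averaging as a crude stand-in for the paper's smoothing, and a hand-picked β schedule; it shows plain gradient descent stalling in a local basin while descending on the graduated sequence {f_β}, with β shrinking to 0, tracks the smoothed minimum into the global basin.

```python
# Sketch: minimize a sequence of smoothed versions of a non-convex
# cost instead of the raw cost. f has a local minimum near +1 and the
# global minimum near -1; descent from w = 1 on f alone gets stuck.

def f(w):
    return (w * w - 1) ** 2 + 0.3 * w

def smoothed(w, beta):
    # crude fixed-offset average standing in for Gaussian smoothing
    offs = (-1.5, -0.5, 0.5, 1.5)
    return sum(f(w + beta * o) for o in offs) / len(offs)

def grad(w, beta, h=1e-4):
    return (smoothed(w + h, beta) - smoothed(w - h, beta)) / (2 * h)

def descend(w, schedule, eta=0.02, steps=500):
    for beta in schedule:          # beta -> 0 gradually restores detail
        for _ in range(steps):
            w -= eta * grad(w, beta)
    return w

w_smooth = descend(1.0, (2.0, 1.0, 0.5, 0.25, 0.0))   # reaches ~ -1
w_plain = descend(1.0, (0.0,))                        # stuck near +1
```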

1 citation


Proceedings ArticleDOI
09 Oct 1994
TL;DR: The author shows that in the regime where the overlap between the classes is large, algorithms with low empirical error do worse in terms of generalization, a phenomenon known as over-training.
Abstract: This paper considers a simple two-class pattern classification problem from two points of view, namely that of empirical risk minimization and that of maximum-likelihood estimation. The main focus is an exact solution for the generalization error resulting from the two approaches, emphasizing the finite-sample behavior, which is very different for the two methods. Focusing on the case of normal input distributions and linear threshold classifiers, the author uses statistical mechanics techniques to calculate the empirical and expected (or generalization) errors for the maximum-likelihood and minimal-empirical-error estimation methods, as well as several other algorithms. In the case of spherically symmetric distributions within each class, the author finds that the simple Hebb rule, corresponding to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, the author shows that in the regime where the overlap between the classes is large, algorithms with low empirical error do worse in terms of generalization, a phenomenon known as over-training.
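
A toy version of this setting can be sketched with two overlapping spherical Gaussian classes in the plane. The Hebb rule (the sample average of y·x, which is the maximum-likelihood direction for this model) is contrasted with a crude stand-in for minimal-empirical-error estimation: the best of many random directions on the training set. The sample sizes, class separation and search below are invented for illustration and do not reproduce the paper's statistical-mechanics analysis.

```python
import random

# Two overlapping Gaussian classes: label y = +/-1, mean +/-(sep/2, 0).
random.seed(0)

def sample(n, sep=1.0, dim=2):
    data = []
    for _ in range(n):
        y = random.choice([-1, 1])
        x = [random.gauss(y * sep / 2 if i == 0 else 0.0, 1.0)
             for i in range(dim)]
        data.append((x, y))
    return data

def accuracy(w, data):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0)
               for x, y in data)
    return hits / len(data)

train, test = sample(40), sample(2000)

# Hebb / maximum-likelihood direction: label-weighted input average.
hebb = [sum(y * x[i] for x, y in train) / len(train) for i in range(2)]

# Minimal-empirical-error stand-in: the direction with the best
# *training* accuracy among many random candidates.
cands = [[random.gauss(0, 1) for _ in range(2)] for _ in range(500)]
erm = max(cands, key=lambda v: accuracy(v, train))
```

Comparing `accuracy(hebb, test)` with `accuracy(erm, test)` over small, strongly overlapping samples gives a feel for the over-training effect described above: fitting the training labels more tightly need not improve generalization.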

1 citation