
Showing papers on "Empirical risk minimization published in 1994"


Journal ArticleDOI
TL;DR: The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (“digitized”) domain, and upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, half-spaces, intersections of half-spaces, and semi-algebraic sets.
Abstract: The complexity of on-line learning is investigated for the basic classes of geometrical objects over a discrete (“digitized”) domain. In particular, upper and lower bounds are derived for the complexity of learning algorithms for axis-parallel rectangles, rectangles in general position, balls, half-spaces, intersections of half-spaces, and semi-algebraic sets. The learning model considered is the standard model for on-line learning from counterexamples.
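
The counterexample model for axis-parallel rectangles admits a compact illustration: a learner that keeps the tightest box consistent with the positive counterexamples seen so far errs only on positive points, and every mistake widens the box along at least one axis. The sketch below simulates this on a small digitized grid; the query stream and grid size are invented, and the paper's algorithms and bounds are more refined than this conservative learner.

```python
# Sketch: on-line learning of an axis-parallel rectangle over a discrete
# grid from counterexamples. The hypothesis is the tightest box spanned
# by the positive counterexamples seen so far; since that box always
# lies inside the target, the learner only errs on positive points, and
# each mistake expands the box along at least one axis.

def learn_rectangle(target_lo, target_hi, queries):
    """target_lo/target_hi: per-axis bounds of the hidden rectangle.
    queries: stream of grid points; the true label is revealed
    (as a counterexample) whenever the prediction is wrong."""
    lo, hi = None, None              # current hypothesis box (empty at start)
    mistakes = 0
    for x in queries:
        truth = all(l <= xi <= h for l, xi, h in zip(target_lo, x, target_hi))
        guess = (lo is not None and
                 all(l <= xi <= h for l, xi, h in zip(lo, x, hi)))
        if guess != truth:
            mistakes += 1            # counterexample received: a positive point
            if lo is None:
                lo, hi = list(x), list(x)
            else:
                lo = [min(l, xi) for l, xi in zip(lo, x)]
                hi = [max(h, xi) for h, xi in zip(hi, x)]
    return mistakes

# Hidden rectangle [2,5] x [1,4] on an 8x8 grid, scanned in row order:
m = learn_rectangle([2, 1], [5, 4], [(x, y) for x in range(8) for y in range(8)])
```

On this stream the learner makes 7 mistakes, roughly the sum of the target's side lengths, before its box coincides with the hidden rectangle.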

38 citations


Proceedings Article
05 Oct 1994
TL;DR: Efficient learning is cast as a resource optimization problem, and a heuristic learning algorithm that approximately solves it is introduced; its performance improvements on synthetic and real-world problems are documented.
Abstract: This article shows how rational analysis can be used to minimize learning cost for a general class of statistical learning problems. We discuss the factors that influence learning cost and show that the problem of efficient learning can be cast as a resource optimization problem. Solutions found in this way can be significantly more efficient than the best solutions that do not account for these factors. We introduce a heuristic learning algorithm that approximately solves this optimization problem and document its performance improvements on synthetic and real-world problems.

15 citations


Proceedings ArticleDOI
06 Apr 1994
TL;DR: The main result of this paper shows that this particular type of learning can be done using the well-known technique of Boolean expression minimization, and the Boolean formulation unifies the various techniques suggested previously for hierarchical generalizations.
Abstract: Concept learning through hierarchical generalization is an important technique in machine learning. The main result of this paper shows that this particular type of learning can be done using the well-known technique of Boolean expression minimization. The Boolean formulation unifies the various techniques suggested previously for hierarchical generalizations. It gives better conceptual clarity and a computationally efficient method for this type of learning. In particular, learning from relational databases can also be cast in the framework of Boolean minimization.
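
One step of the machinery behind such generalization can be sketched as the Quine-McCluskey combining rule: two example descriptions (minterms over attributes) that differ in exactly one attribute merge into a more general term with a don't-care in that position. This is an illustration of the underlying Boolean-minimization operation, not the paper's full framework.

```python
# Sketch of the core generalization step in Boolean expression
# minimization: terms over {'0', '1', '-'} ('-' = don't care) that
# differ in exactly one position combine into a single more general
# term -- the Quine-McCluskey combining rule.

def merge(a, b):
    """Combine two terms if they differ in exactly one position."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + '-' + a[i + 1:]

def generalize(terms):
    """Close a set of terms under the combining rule."""
    terms = set(terms)
    changed = True
    while changed:
        changed = False
        for a in list(terms):
            for b in list(terms):
                m = merge(a, b)
                if m is not None and m not in terms:
                    terms.add(m)
                    changed = True
    return terms
```

For instance, `generalize({'110', '100', '101', '111'})` produces the term `'1--'` among its results, i.e. the concept "first attribute true, the others irrelevant", generalizing all four examples.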

6 citations


Journal ArticleDOI
TL;DR: The condition for the per-trial asymptotic convergence of the feedback error learning method is derived; it relates the learning rate to the α function, which is calculated from the input-output relationship of the system.
Abstract: This paper deals with the improvement of learning speed based on an analysis of the convergence of the feedback error learning method. We derive the condition for the asymptotic convergence of the feedback error learning method for each trial. This condition relates the learning rate to the α function, which is calculated from the input-output relationship of the system. Using the α function, we propose a high-speed learning method for a tracking control system. We present simulation results for the tracking control of a one-link robot manipulator in two cases: (1) the general feedback error learning method and (2) the proposed high-speed learning method. The simulation results show the effectiveness of the proposed conditions and learning method.
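
The scheme can be sketched on a toy scalar plant: a one-parameter feedforward controller is trained using the feedback controller's correction as its error signal. The plant, gains and trial schedule below are invented for illustration; the abstract's point is that per-trial convergence requires the learning rate to satisfy a condition derived from the system's input-output relationship.

```python
# Toy sketch of feedback error learning. The feedforward weight w is
# updated from the feedback correction u_fb; for the scalar plant
# y = a*u the ideal feedforward weight is 1/a.

def run_trials(a=2.0, K=1.0, eta=0.1, trials=300):
    """a: unknown plant gain; K: feedback gain; eta: learning rate."""
    w = 0.0
    targets = [1.0, 0.5, 1.5]            # desired outputs, cycled per trial
    for t in range(trials):
        yd = targets[t % 3]
        u_ff = w * yd                    # feedforward command
        y = a * u_ff                     # plant response
        u_fb = K * (yd - y)              # feedback correction
        w += eta * u_fb * yd             # learn from the feedback error
    return w

w = run_trials()                         # converges to 1/a = 0.5
```

Each trial contracts the error in w by a factor 1 − ηKa·yd², so convergence demands a learning rate small relative to the plant gain and command magnitude: raising η to 1.0 here makes that factor exceed 1 in magnitude on the yd = 1.5 trials and the loop diverges, which is the flavor of the condition relating the learning rate to the α function.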

4 citations


Book ChapterDOI
01 Jan 1994
TL;DR: The worst-case behavior of a family of learning algorithms based on Sutton's method of temporal differences is studied, and general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(A) algorithm are proved.
Abstract: We study the worst-case behavior of a family of learning algorithms based on Sutton's [7] method of temporal differences. In our on-line learning framework, learning takes place in a sequence of trials, and the goal of the learning algorithm is to estimate a discounted sum of all the reinforcements that will be received in the future. In this setting, we are able to prove general upper bounds on the performance of a slightly modified version of Sutton's so-called TD(A) algorithm. These bounds are stated in terms of the performance of the best linear predictor on the given training sequence, and are proved without making any statistical assumptions of any kind about the process producing the learner's observed training sequence. We also prove lower bounds on the performance of any algorithm for this learning problem, and give a similar analysis of the closely related problem of learning to predict in a model in which the learner must produce predictions for a whole batch of observations before receiving reinforcement.
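
The update being analyzed can be sketched with a linear predictor and an eligibility trace. The parameters and the constant toy sequence below are invented, and the paper studies a slightly modified variant with worst-case (assumption-free) bounds; this is only the basic TD(λ) mechanism.

```python
# Minimal TD(lambda) sketch: a linear predictor estimates the
# discounted sum of future reinforcements and is updated from the
# temporal-difference error via an eligibility trace.

def td_lambda(xs, rs, gamma=0.9, lam=0.8, eta=0.05):
    dim = len(xs[0])
    w = [0.0] * dim
    e = [0.0] * dim                              # eligibility trace
    for t in range(len(xs) - 1):
        pred = sum(wi * xi for wi, xi in zip(w, xs[t]))
        pred_next = sum(wi * xi for wi, xi in zip(w, xs[t + 1]))
        delta = rs[t] + gamma * pred_next - pred  # TD error
        e = [gamma * lam * ei + xi for ei, xi in zip(e, xs[t])]
        w = [wi + eta * delta * ei for wi, ei in zip(w, e)]
    return w

# Constant feature, reinforcement 1 per step: the discounted return is
# 1 / (1 - gamma) = 10, and the prediction converges to it.
w = td_lambda([[1.0, 0.0]] * 2001, [1.0] * 2000)
```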

3 citations


Proceedings ArticleDOI
09 Oct 1994
TL;DR: Convergence properties of radial basis function (RBF) networks are studied for a large class of basis functions and the universal approximation property of the nets is shown.
Abstract: In this paper, convergence properties of radial basis function (RBF) networks are studied for a large class of basis functions. The universal approximation property of the nets is shown. Parameters of RBF nets are learned through empirical risk minimization. The optimal nets are shown to be consistent in nonparametric classification. The tools used in the analysis include the Vapnik-Chervonenkis (VC) dimension and covering numbers.
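
The training principle in the abstract can be sketched concretely: a Gaussian RBF network whose output weights are fit by empirical risk minimization, here via stochastic gradient descent on the mean squared error over the training sample. The centers, width and toy data are invented, and the paper treats learned RBF nets in far greater generality (consistency via VC dimension and covering numbers).

```python
import math

# Sketch: Gaussian RBF network with fixed centers; output weights are
# chosen by minimizing the empirical squared risk over the sample.

def rbf_features(x, centers, width=1.0):
    return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

def fit_erm(data, centers, eta=0.5, epochs=500):
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in data:                 # gradient step on empirical risk
            phi = rbf_features(x, centers)
            err = sum(wi * pi for wi, pi in zip(w, phi)) - y
            w = [wi - eta * err * pi for wi, pi in zip(w, phi)]
    return w

def predict(x, w, centers):
    phi = rbf_features(x, centers)
    return 1 if sum(wi * pi for wi, pi in zip(w, phi)) > 0 else -1

# Toy 1-D sample: class -1 near 0, class +1 near 3.
data = [(0.0, -1), (0.2, -1), (3.0, 1), (3.2, 1)]
centers = [0.0, 3.0]
w = fit_erm(data, centers)
```

After training, the net separates the two clusters; the consistency results in the paper concern what happens as the sample and the network grow, not this fixed toy case.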

3 citations


Proceedings ArticleDOI
02 Mar 1994
TL;DR: A new strategy is described for coping with the local minima inherent in the cost function f(W, D); it adaptively changes the learning rate and manipulates the gradient estimator simultaneously.
Abstract: One of the major problems in supervised learning of neural networks is the inevitable local minima inherent in the cost function f(W, D). This often renders powerless the classic gradient-descent-based learning algorithms that compute the weight update at each iteration according to ΔW(t) = −η·∇_W f(W, D). In this paper we describe a new strategy to solve this problem, which adaptively changes the learning rate and manipulates the gradient estimator simultaneously. The idea is to implicitly convert the local-minima-laden cost function f(·) into a sequence of its smoothed versions {f_{β_t}}, t = 1, …, T, which, controlled by the parameter β_t, bears few details at t = 1 and gradually more later on; the learning is actually performed on this sequence of functionals. The corresponding smoothed global minima obtained in this way, {W*_t}, t = 1, …, T, thus progressively approximate W*, the desired global minimum. Experimental results on a nonconvex function minimization problem and a typical neural network learning task are given; analyses and discussions of some important issues are provided. © 1994 SPIE, The International Society for Optical Engineering.
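
The smoothing sequence can be illustrated in one dimension. The sketch below uses an invented cost function, fixed-offset averaging as a crude stand-in for the paper's smoothing, and a hand-picked β schedule; it shows plain gradient descent stalling in a local basin while descending on the graduated sequence {f_β}, with β shrinking to 0, tracks the smoothed minimum into the global basin.

```python
# Sketch: minimize a sequence of smoothed versions of a non-convex
# cost instead of the raw cost. f has a local minimum near +1 and the
# global minimum near -1; descent from w = 1 on f alone gets stuck.

def f(w):
    return (w * w - 1) ** 2 + 0.3 * w

def smoothed(w, beta):
    # crude fixed-offset average standing in for Gaussian smoothing
    offs = (-1.5, -0.5, 0.5, 1.5)
    return sum(f(w + beta * o) for o in offs) / len(offs)

def grad(w, beta, h=1e-4):
    return (smoothed(w + h, beta) - smoothed(w - h, beta)) / (2 * h)

def descend(w, schedule, eta=0.02, steps=500):
    for beta in schedule:          # beta -> 0 gradually restores detail
        for _ in range(steps):
            w -= eta * grad(w, beta)
    return w

w_smooth = descend(1.0, (2.0, 1.0, 0.5, 0.25, 0.0))   # reaches ~ -1
w_plain = descend(1.0, (0.0,))                        # stuck near +1
```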

1 citation


Proceedings ArticleDOI
09 Oct 1994
TL;DR: The author shows that in the regime where the overlap between the classes is large, algorithms with low empirical error do worse in terms of generalization, a phenomenon known as over-training.
Abstract: This paper considers a simple two-class pattern classification problem from two points of view, namely that of empirical risk minimization and that of maximum-likelihood estimation. The main focus is an exact solution for the generalization error resulting from the two approaches, emphasizing the finite-sample behavior, which is very different for the two methods. Focusing on the case of normal input distributions and linear threshold classifiers, the author uses statistical mechanics techniques to calculate the empirical and expected (or generalization) errors for the maximum-likelihood and minimal-empirical-error estimation methods, as well as several other algorithms. In the case of spherically symmetric distributions within each class, the author finds that the simple Hebb rule, corresponding to maximum-likelihood parameter estimation, outperforms the other, more complex algorithms based on error minimization. Moreover, the author shows that in the regime where the overlap between the classes is large, algorithms with low empirical error do worse in terms of generalization, a phenomenon known as over-training.
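
A toy version of this setting can be sketched with two overlapping spherical Gaussian classes in the plane. The Hebb rule (the sample average of y·x, which is the maximum-likelihood direction for this model) is contrasted with a crude stand-in for minimal-empirical-error estimation: the best of many random directions on the training set. The sample sizes, class separation and search below are invented for illustration and do not reproduce the paper's statistical-mechanics analysis.

```python
import random

# Two overlapping Gaussian classes: label y = +/-1, mean +/-(sep/2, 0).
random.seed(0)

def sample(n, sep=1.0, dim=2):
    data = []
    for _ in range(n):
        y = random.choice([-1, 1])
        x = [random.gauss(y * sep / 2 if i == 0 else 0.0, 1.0)
             for i in range(dim)]
        data.append((x, y))
    return data

def accuracy(w, data):
    hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y > 0)
               for x, y in data)
    return hits / len(data)

train, test = sample(40), sample(2000)

# Hebb / maximum-likelihood direction: label-weighted input average.
hebb = [sum(y * x[i] for x, y in train) / len(train) for i in range(2)]

# Minimal-empirical-error stand-in: the direction with the best
# *training* accuracy among many random candidates.
cands = [[random.gauss(0, 1) for _ in range(2)] for _ in range(500)]
erm = max(cands, key=lambda v: accuracy(v, train))
```

Comparing `accuracy(hebb, test)` with `accuracy(erm, test)` over small, strongly overlapping samples gives a feel for the over-training effect described above: fitting the training labels more tightly need not improve generalization.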

1 citation