
Showing papers on "Empirical risk minimization published in 1998"


01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
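The empirical risk minimization principle at the heart of this theory can be sketched in a few lines: among a class of candidate functions, select the one with the lowest average loss on the training sample. The threshold-classifier hypothesis class and the toy data below are illustrative choices, not taken from the book.

```python
# Minimal sketch of empirical risk minimization (ERM): pick, from a
# finite hypothesis class, the function with the lowest average loss
# on the sample. Hypothesis class and data are illustrative.

def empirical_risk(f, data):
    """Average 0-1 loss of hypothesis f on the sample."""
    return sum(f(x) != y for x, y in data) / len(data)

def erm(hypotheses, data):
    """Return the hypothesis minimizing the empirical risk."""
    return min(hypotheses, key=lambda f: empirical_risk(f, data))

# Toy example: threshold classifiers on the real line.
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
hypotheses = [lambda x, t=t / 10: int(x > t) for t in range(10)]
best = erm(hypotheses, data)
print(empirical_risk(best, data))  # 0.0 on this sample
```

The book's central question is when this procedure is consistent, i.e. when low empirical risk implies low expected risk as the sample grows.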

26,531 citations


Proceedings Article
01 Dec 1998
TL;DR: A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms, and allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search algorithm.
Abstract: A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. In addition, it allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm. And these algorithms converge for POMDPs without requiring a proper belief state. Simulation results are given, and several areas for future research are discussed.

284 citations


Journal ArticleDOI
TL;DR: This approach differs from previous complexity regularization neural-network function learning schemes in that it operates with random covering numbers and l1 metric entropy, making it possible to consider much broader families of activation functions, namely functions of bounded variation.
Abstract: We apply the method of complexity regularization to derive estimation bounds for nonlinear function estimation using a single hidden layer radial basis function network. Our approach differs from previous complexity regularization neural-network function learning schemes in that we operate with random covering numbers and l1 metric entropy, making it possible to consider much broader families of activation functions, namely functions of bounded variation. Some constraints previously imposed on the network parameters are also eliminated this way. The network is trained by means of complexity regularization involving empirical risk minimization. Bounds on the expected risk in terms of the sample size are obtained for a large class of loss functions. Rates of convergence to the optimal loss are also derived.
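The model-selection rule behind complexity regularization can be illustrated with a hedged sketch: fit RBF networks of increasing size k by least squares, then pick the k minimizing empirical risk plus a complexity penalty. The penalty form c*k*log(n)/n, the Gaussian basis width, and the data are illustrative assumptions; the paper derives its penalties from covering numbers and metric entropy.

```python
import numpy as np

# Hedged sketch of complexity regularization as a model-selection rule:
# among single-hidden-layer RBF fits of increasing size k, pick the one
# minimizing empirical risk plus a complexity penalty. The penalty
# c*k*log(n)/n is an illustrative stand-in, not the paper's bound.

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

def rbf_design(x, centers, width=0.3):
    """Gaussian RBF design matrix with fixed centers and width."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width**2))

def fit_rbf(x, y, k):
    """Least-squares fit of a k-center RBF expansion."""
    centers = np.linspace(-1, 1, k)
    Phi = rbf_design(x, centers)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w

def empirical_risk(x, y, centers, w):
    return np.mean((rbf_design(x, centers) @ w - y) ** 2)

c = 0.05  # illustrative penalty constant
scores = {}
for k in range(1, 15):
    centers, w = fit_rbf(x, y, k)
    scores[k] = empirical_risk(x, y, centers, w) + c * k * np.log(n) / n
best_k = min(scores, key=scores.get)
print(best_k)
```

The penalized score balances fit against network size, which is the mechanism by which the paper's bounds control expected risk.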

90 citations


01 Jan 1998
TL;DR: A vector space based method is presented that performs a linear mapping from documents to scalar utility values and thus guarantees transitivity; the approach is extended to polynomial utility functions by using the potential function method, which makes it possible to incorporate higher-order correlations of features into the utility function at minimal computational cost.
Abstract: In this paper we investigate the problem of learning a preference relation from a given set of ranked documents. We show that the Bayes optimal decision function, when applied to learning a preference relation, may violate transitivity. This is undesirable for information retrieval, because it is in conflict with a document ranking based on the user's preferences. To overcome this problem we present a vector space based method that performs a linear mapping from documents to scalar utility values and thus guarantees transitivity. The learning of the relation between documents is formulated as a classification problem on pairs of documents and is solved using the principle of structural risk minimization for good generalization. The approach is extended to polynomial utility functions by using the potential function method (the so-called "kernel trick"), which makes it possible to incorporate higher-order correlations of features into the utility function at minimal computational cost. The resulting algorithm is tested on an example with artificial data. The algorithm successfully learns the utility function underlying the training examples and shows good classification performance.
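A minimal sketch of the pairwise formulation, under illustrative assumptions (a linear utility and a perceptron-style training loop; the paper itself uses structural risk minimization): each pair where document i outranks document j yields a difference vector x_i - x_j that should score positive, and the learned weight vector defines a transitive scalar utility.

```python
import numpy as np

# Hedged sketch: learn a linear utility u(x) = w.x by classifying
# difference vectors of document pairs. Any linear utility induces a
# transitive ordering by construction. Data and the perceptron-style
# loop are illustrative, not from the paper.

rng = np.random.default_rng(1)
d = 5
w_true = rng.standard_normal(d)           # hidden utility direction
docs = rng.standard_normal((40, d))
order = np.argsort(-docs @ w_true)        # ground-truth ranking

# Training pairs: (higher-ranked doc) - (lower-ranked doc), label +1.
pairs = [docs[order[i]] - docs[order[j]]
         for i in range(len(order)) for j in range(i + 1, len(order))]

w = np.zeros(d)
for _ in range(50):                       # perceptron updates on pairs
    for z in pairs:
        if w @ z <= 0:
            w += z

frac_correct = np.mean([w @ z > 0 for z in pairs])
print(frac_correct)
```

Replacing the inner product with a kernel on document pairs gives the polynomial-utility extension the abstract mentions.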

71 citations


01 Jan 1998
TL;DR: Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, eligibility traces did not permit the optimal policy to be learned as quickly as it was using macro- actions.
Abstract: Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on the speed with which the learning process propagates accurate value information. We empirically measure the separate contributions of these two effects in gridworld and simulated robotic environments. In these environments, both effects were significant, but the effect of value propagation was larger. We also compare the accelerations of value propagation due to macro-actions and eligibility traces in the gridworld environment. Although eligibility traces increased the rate of convergence to the optimal value function compared to learning with macro-actions but without eligibility traces, eligibility traces did not permit the optimal policy to be learned as quickly as it was using macro-actions.
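The value-propagation effect the paper measures can be sketched on a toy task (illustrative, not the paper's gridworld): on a 10-state corridor, TD(lambda) with accumulating eligibility traces spreads the goal reward to every visited state within a single episode, while TD(0) updates only the state adjacent to the goal.

```python
# Hedged sketch of value propagation with eligibility traces on a
# deterministic corridor: states 0..9 step right to a terminal goal at
# state 10, which pays reward 1. One episode suffices to show the
# difference between TD(0) and TD(lambda).

def run(lam, n=10, alpha=0.5, gamma=1.0):
    V = [0.0] * (n + 1)            # V[n] is the terminal goal state
    e = [0.0] * (n + 1)            # eligibility traces
    for s in range(n):             # one deterministic walk to the goal
        r = 1.0 if s + 1 == n else 0.0
        delta = r + gamma * V[s + 1] - V[s]
        e[s] += 1.0                # accumulating trace for current state
        for i in range(n):
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam    # decay all traces
    return V

v_td0 = run(lam=0.0)
v_lam = run(lam=0.9)
print(sum(v > 0 for v in v_td0[:10]))   # 1: only the state next to goal
print(sum(v > 0 for v in v_lam[:10]))   # 10: credit reaches every state
```

A macro-action that jumps several states toward the goal would produce a similar multi-step backup, which is the comparison the paper draws.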

47 citations


03 Apr 1998
TL;DR: Empirical Risk Approximation is proposed as a new induction principle for unsupervised learning, developed in analogy to the highly successful statistical learning theory of classification and regression due to Vapnik and Chervonenkis.
Abstract: Unsupervised learning algorithms are designed to extract structure from data without reference to explicit teacher information. The quality of the learned structure is determined by a cost function which guides the learning process. This paper proposes Empirical Risk Approximation as a new induction principle for unsupervised learning. The complexity of the unsupervised learning models is automatically controlled by the two conditions for learning: (i) the empirical risk of learning should uniformly converge towards the expected risk; (ii) the hypothesis class should retain a minimal variety for consistent inference. The maximum entropy principle with deterministic annealing as an efficient search strategy arises from the Empirical Risk Approximation principle as the optimal inference strategy for large learning problems. Parameter selection of learnable data structures is demonstrated for the case of k-means clustering. What is unsupervised learning? Learning algorithms are designed with the goal in mind that they should extract structure from data. Two classes of algorithms have been widely discussed in the literature: supervised and unsupervised learning. The distinction between the two classes relates to supervision or teacher information which is either available to the learning algorithm or missing in the learning process. This paper presents a theory of unsupervised learning which has been developed in analogy to the highly successful statistical learning theory of classification and regression [Vapnik, 1982; Vapnik, 1995]. In supervised learning of classification boundaries or of regression functions the learning algorithm is provided with example points and selects the best candidate function from a set of functions, called the hypothesis class.
Statistical learning theory, developed by Vapnik and Chervonenkis in a series of seminal papers (see [Vapnik, 1982; Vapnik, 1995]), measures the amount of information in a data set which can be used to determine the parameters of the classification or regression models. Computational learning theory [Valiant, 1984] addresses computational problems of supervised learning in addition to the statistical constraints. In this paper I propose a theoretical framework for unsupervised learning based on optimization of a quality functional for structures in data. The learning algorithm extracts an underlying structure from a sample data set under the guidance of a quality measure denoted as learning costs. The extracted structure of the data is encoded by a loss function and it is assumed to produce a learning risk below a predefined risk threshold. This induction principle is referred to as Empirical Risk Approximation (ERA) and is summarized …
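The deterministic-annealing inference strategy that arises from the paper's principle can be sketched for k-means: soft (Gibbs) assignments at temperature T, center re-estimation, and a cooling schedule toward the hard clustering limit. The data, the cooling schedule, and the symmetry-breaking initialization below are all illustrative choices, not the paper's.

```python
import numpy as np

# Hedged sketch of deterministic-annealing k-means: assignments are
# softened by a temperature T, centers are re-estimated, and T is
# cooled toward the hard k-means limit. Two well-separated Gaussian
# clusters serve as toy data.

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-3, 0.5, (50, 2)),    # cluster near (-3, -3)
               rng.normal(3, 0.5, (50, 2))])    # cluster near (3, 3)
mu = np.array([[0.0, 0.0], [0.1, 0.1]])         # nearly coincident centers

T = 8.0
while T > 0.01:
    for _ in range(5):                          # EM-style updates at fixed T
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        p = np.exp(-(d2 - d2.min(1, keepdims=True)) / T)  # stabilized Gibbs
        p /= p.sum(1, keepdims=True)            # soft assignments
        mu = (p.T @ X) / p.sum(0)[:, None]      # re-estimate centers
    T *= 0.8                                    # cooling schedule
print(np.sort(mu[:, 0]))
```

At high T both centers sit near the global mean; as T falls the solution splits, which is the annealed model-complexity control the abstract describes.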

32 citations


Proceedings ArticleDOI
23 May 1998
TL;DR: A new approach is presented to the composition of learning algorithms (in various models) for classes of constant VC-dimension into learning algorithms for more complicated classes, and it is shown that if a class of constant VC-dimension is PAC-learnable from a class of constant VC-dimension then it is SQ-learnable and PAC-learnable with malicious noise.

14 citations


Journal ArticleDOI
TL;DR: This paper demonstrates worst-case upper bounds on the absolute loss for the Perceptron learning algorithm and the Exponentiated Update learning algorithm, which is related to the Weighted Majority algorithm.

9 citations


Book ChapterDOI
Kuniaki Uehara1
14 Dec 1998
TL;DR: A framework called random case analysis is proposed, which can predict various aspects of a learning algorithm's behavior, requires less computational time than other theoretical analyses, and is easily applied to practical learning algorithms.
Abstract: In machine learning, it is important to reduce the computational time needed to analyze learning algorithms. Some researchers have attempted to understand learning algorithms by experimenting with them on a variety of domains. Others have presented theoretical analyses of learning algorithms using approximate mathematical models. A mathematical model has the deficiency that, if it is too simplified, it may lose the essential behavior of the original algorithm. Furthermore, experimental analyses are based only on informal analyses of the learning task, whereas theoretical analyses address the worst case. Therefore, the results of theoretical analyses are often quite different from empirical results. In our framework, called random case analysis, we adopt the idea of randomized algorithms. Random case analysis can predict various aspects of a learning algorithm's behavior and requires less computational time than other theoretical analyses. Furthermore, the framework is easily applied to practical learning algorithms.

5 citations


Proceedings ArticleDOI
10 Nov 1998
TL;DR: Experimental studies exhibit that the string representation of genetic algorithms (GA) is a key issue in determining the suitable network structures and the performances of function approximation for the two learning algorithms.
Abstract: Neural networks based on wavelets are constructed to study function learning problems. Two types of learning algorithms, the overall multilevel learning (OML) and the pyramidal multilevel learning (PML) with genetic neuron selection, are comparatively studied for convergence rate and accuracy using data samples of a piecewise-defined signal. Moreover, the two algorithms are examined using orthogonal and non-orthogonal bases. Experimental studies exhibit that the string representation of genetic algorithms (GA) is a key issue in determining the suitable network structures and the function-approximation performance of the two learning algorithms.
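The genetic neuron selection idea can be sketched under illustrative assumptions (a Haar wavelet basis, a simple mutation-and-selection GA, and a toy piecewise signal; none of these come from the paper): a bitstring marks which wavelet atoms are active, and fitness is the least-squares fit error of the active atoms plus a small size penalty.

```python
import numpy as np

# Hedged sketch of GA-based neuron (atom) selection for a wavelet
# network: each individual is a bitstring over a Haar dictionary, and
# fitness is the negative penalized least-squares error on a piecewise
# signal. Basis, GA operators, and signal are illustrative.

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 128, endpoint=False)
signal = np.where(t < 0.5, 1.0, -1.0) + 0.3 * np.sin(8 * np.pi * t)

def haar_atoms(levels=4):
    """Constant atom plus Haar wavelets down to the given level."""
    atoms = [np.ones_like(t)]
    for j in range(levels):
        for k in range(2 ** j):
            a = np.zeros_like(t)
            lo, mid, hi = k / 2**j, (k + 0.5) / 2**j, (k + 1) / 2**j
            a[(t >= lo) & (t < mid)] = 1.0
            a[(t >= mid) & (t < hi)] = -1.0
            atoms.append(a * 2 ** (j / 2))
    return np.array(atoms)

atoms = haar_atoms()

def fitness(bits):
    """Negative MSE of the best fit using the active atoms, minus a size cost."""
    idx = np.flatnonzero(bits)
    if idx.size == 0:
        return -np.inf
    Phi = atoms[idx].T
    w, *_ = np.linalg.lstsq(Phi, signal, rcond=None)
    return -np.mean((Phi @ w - signal) ** 2) - 0.001 * idx.size

pop = rng.integers(0, 2, (20, len(atoms)))
for _ in range(40):
    scores = np.array([fitness(b) for b in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep best half
    children = parents[rng.integers(0, 10, 20)].copy()
    flip = rng.random(children.shape) < 0.05         # bit-flip mutation
    children[flip] ^= 1
    pop = children
best = max(pop, key=fitness)
print(round(float(-fitness(best)), 3))
```

The bitstring encoding here is exactly the "string representation" the abstract flags: it determines which network structures the GA can reach.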

3 citations


Proceedings Article
01 Jan 1998
TL;DR: It is pointed out that algorithms based on explicit modeling of the elites' distribution tend to converge to undesirable local optima, and the algorithm is modified to overcome this defect.
Abstract: Population search algorithms for optimization problems, such as genetic algorithms, are an effective way to find an optimal value, especially when we have little information about the objective function. Baluja has proposed effective algorithms that model the distribution of elites explicitly with a statistical model. We propose such an algorithm based on Gaussian modeling of elites, and analyze the convergence property of the algorithm by defining the objective function as a stochastic model. We point out that algorithms based on explicit modeling of the elites' distribution tend to converge to undesirable local optima, and we modify the algorithm to overcome this defect.
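The elite-modeling scheme can be sketched as a Gaussian estimation-of-distribution loop (objective and settings are illustrative, not the paper's): sample a population, keep the fittest fraction, refit the Gaussian to those elites, and repeat. The rapid shrinkage of the fitted variance is one way such algorithms can converge prematurely, which is the defect the abstract analyzes.

```python
import numpy as np

# Hedged sketch of a Gaussian estimation-of-distribution algorithm:
# repeatedly fit a Gaussian to the elite samples and resample from it.
# The toy objective has a single maximum at x = 2.

rng = np.random.default_rng(3)

def f(x):
    return -(x - 2.0) ** 2            # fitness, maximized at x = 2

mean, std = 0.0, 2.0
for _ in range(30):
    pop = rng.normal(mean, std, 100)
    elites = pop[np.argsort(f(pop))[-20:]]   # top 20% by fitness
    mean, std = elites.mean(), elites.std() + 1e-12
print(round(float(mean), 2))
```

A simple guard in the same spirit as the abstract's modification (though the paper's actual fix may differ) is to enforce a lower bound on std so the search distribution cannot collapse before locating the optimum.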

Journal ArticleDOI
TL;DR: A number of recent results in statistical learning theory are summarised in the context of nonlinear system identification, leading to several characterisation results.