
Showing papers on "Empirical risk minimization" published in 2003


Proceedings Article
21 Aug 2003
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.

3,908 citations
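To make the harmonic-function step of the abstract above concrete, here is a minimal NumPy sketch of the matrix-method solution f_u = (D_uu - W_uu)^{-1} W_ul f_l; the weight matrix, index conventions and binary labels are illustrative assumptions, not the authors' code.

```python
import numpy as np

def harmonic_label_propagation(W, y_labeled, labeled_idx):
    """Sketch: mean of the Gaussian random field via the harmonic solution.

    W           -- (n, n) symmetric non-negative similarity (edge weight) matrix
    y_labeled   -- 0/1 labels of the labeled vertices
    labeled_idx -- indices of the labeled vertices in W
    Returns soft scores in [0, 1] for the unlabeled vertices.
    """
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    D = np.diag(W.sum(axis=1))
    L = D - W                                        # combinatorial graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    # Harmonic functions: f_u solves L_uu f_u = W_ul f_l
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return f_u
```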


Journal ArticleDOI
TL;DR: This work extends the setting studied so far to job-dependent learning curves, that is, it allows learning in the production process to be faster for some jobs than for others, and shows that in this new, arguably more realistic setting, makespan and total flow-time minimization on a single machine, a due-date assignment problem, and total flow-time minimization on unrelated parallel machines remain polynomially solvable.

304 citations


Proceedings Article
09 Dec 2003
TL;DR: This paper proposes three theoretical methods for taking into account this distribution P(x) for regularization and provides links to existing graph-based semi-supervised learning algorithms.
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

144 citations


Posted Content
TL;DR: In this paper, robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition; kernel logistic regression, support vector machines, least squares, and the AdaBoost loss function are treated as special cases.
Abstract: The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds of the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given.

100 citations
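For background, the robustness notion referred to above is Hampel's influence function of a map T from distributions to classifiers, evaluated at a contaminating point z:

    \mathrm{IF}(z; T, P) \;=\; \lim_{t \downarrow 0} \frac{T\big((1-t)P + t\,\Delta_z\big) - T(P)}{t},

where \Delta_z is the point mass at z. A bounded influence function means that a single (possibly mislabeled) observation can change the learned classifier only by a bounded amount; this is the standard definition from robust statistics, given here only as orientation.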


Journal ArticleDOI
TL;DR: New tools from probability theory are presented that make it possible to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.
Abstract: We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools make it possible to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

61 citations


Journal ArticleDOI
TL;DR: A general technique is proposed for solving support vector classifiers (SVCs) with an arbitrary loss function, relying on the application of an iterative reweighted least squares (IRWLS) procedure.
Abstract: In this paper, we propose a general technique for solving support vector classifiers (SVCs) for an arbitrary loss function, relying on the application of an iterative reweighted least squares (IRWLS) procedure. We further show that three properties of the SVC solution can be written as conditions over the loss function. This technique allows the implementation of the empirical risk minimization (ERM) inductive principle on large margin classifiers obtaining, at the same time, very compact (in terms of number of support vectors) solutions. The improvements obtained by changing the SVC loss function are illustrated with synthetic and real data examples.

53 citations
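The paper's IRWLS-SVC updates are not reproduced here; as a generic, hedged illustration of the reweighting idea (repeatedly re-solve a weighted least-squares problem whose weights come from the current fit), the sketch below shows the textbook IRLS loop for L2-regularized logistic regression, which is a different loss but the same computational pattern.

```python
import numpy as np

def irls_logistic(X, y, lam=1e-3, n_iter=25):
    """Generic iterative reweighted least squares (IRLS) illustration.

    Fits L2-regularized logistic regression: at every step the current fit
    defines per-sample weights and the problem is re-solved as a weighted
    least-squares system.  (Textbook IRLS only; NOT the paper's IRWLS-SVC.)

    X -- (n, d) design matrix, y -- labels in {0, 1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # current probabilities
        s = p * (1.0 - p)                          # per-sample weights
        # Working response of the weighted least-squares problem
        z = X @ w + (y - p) / np.maximum(s, 1e-12)
        A = X.T @ (s[:, None] * X) + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w
```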


Journal ArticleDOI
15 Sep 2003
TL;DR: This paper establishes weak and strong universal consistency of regression estimates based on normalized radial basis function networks when the network parameters are chosen by empirical risk minimization.
Abstract: This paper establishes weak and strong universal consistency of regression estimates based on normalized radial basis function networks when the network parameters are chosen by empirical risk minimization.

23 citations
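As an illustration of the estimator being analyzed, here is a small sketch of a normalized Gaussian RBF regression estimate whose output weights are chosen by empirical risk minimization (least squares); the fixed centers, the width h, and the tiny ridge term are assumptions for this sketch, not the paper's construction.

```python
import numpy as np

def nrbf_design(X, centers, h):
    """Normalized Gaussian RBF features: phi_k(x) = K_h(x - v_k) / sum_j K_h(x - v_j)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * h ** 2))
    return K / K.sum(axis=1, keepdims=True)

def fit_normalized_rbf(X, y, centers, h, ridge=1e-8):
    """Choose the output weights by minimizing the empirical squared risk."""
    Phi = nrbf_design(X, centers, h)
    c = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)
    return c

# Prediction at new points: nrbf_design(X_new, centers, h) @ c
```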


Journal ArticleDOI
TL;DR: New adaptive bounds designed for learning algorithms that operate by making a sequence of choices are demonstrated, similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998).
Abstract: A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency.

21 citations
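For orientation, an Occam-style bound of the kind referred to above assigns each hypothesis h a prior weight p(h) and guarantees, with probability at least 1 - \delta over m training examples,

    \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) + \sqrt{\frac{\ln(1/p(h)) + \ln(1/\delta)}{2m}}.

If h is reached through a sequence of choices from sets C_1, \dots, C_k, taking p(h) = \prod_i 1/|C_i| turns the penalty into \sum_i \ln|C_i|, so the bound adapts to the choices the algorithm actually made. This is a standard way to state such bounds, given only as background rather than as the paper's exact statement.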


Journal Article
TL;DR: A nonlinear predictive control framework is presented, in which nonlinear plants are modeled on a support vector machine and the predictive control law is derived by a new stochastic search optimization algorithm.
Abstract: Support vector machines (SVMs) are a new-generation machine learning technique based on statistical learning theory. They handle small-sample learning problems better by using structural risk minimization in place of empirical risk minimization. Moreover, SVMs can turn a nonlinear learning problem into a linear one, reducing algorithmic complexity through the kernel function idea. A nonlinear predictive control framework is presented in which nonlinear plants are modeled by a support vector machine, and the predictive control law is derived by a new stochastic search optimization algorithm. Finally, a simulation example is given to demonstrate the proposed approach.

11 citations
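A minimal sketch of the overall scheme (SVM plant model plus a stochastic search over candidate inputs) is given below; the lag structure, the one-step cost, and the uniform random search are illustrative assumptions and do not reproduce the paper's specific optimization algorithm.

```python
import numpy as np
from sklearn.svm import SVR

def fit_plant_model(y_hist, u_hist):
    """Fit an SVR mapping (y_t, y_{t-1}, u_t) -> y_{t+1} from logged 1-D arrays."""
    X = np.column_stack([y_hist[1:-1], y_hist[:-2], u_hist[1:-1]])
    target = y_hist[2:]
    return SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, target)

def choose_control(model, y_t, y_tm1, y_ref, u_bounds=(-1.0, 1.0),
                   n_candidates=200, rng=None):
    """Pick the input whose predicted next output is closest to the reference."""
    rng = np.random.default_rng(rng)
    u_cand = rng.uniform(*u_bounds, size=n_candidates)
    X_cand = np.column_stack([np.full(n_candidates, y_t),
                              np.full(n_candidates, y_tm1),
                              u_cand])
    cost = (model.predict(X_cand) - y_ref) ** 2
    return u_cand[np.argmin(cost)]
```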


Proceedings ArticleDOI
02 Nov 2003
TL;DR: The range of application of the key theorem of learning theory is extended by replacing the probability measure space with a credibility measure space, and new concepts and a new theorem are given on this classical theoretical foundation.
Abstract: In the 1970s, Vladimir N. Vapnik proposed statistical learning theory, which is regarded as the best available theory for statistical estimation and predictive learning from small samples. It systematically investigates the conditions under which the empirical risk minimization principle is valid and the relations between the empirical risk and the expected risk for finite samples. The key theorem of learning theory plays an important role in this framework, paving the way for the subsequent theory and its applications. However, some of its results and definitions apply only to a fixed probability measure, which restricts the theorem's range of application. In this paper, we extend that range by replacing the probability measure space with a credibility measure space. In the new measure space, we give new concepts and a new theorem on the classical theoretical foundation.

11 citations


01 Jan 2003
TL;DR: The aim of this work is to generalize "on line" endogenous algorithms based on the empirical risk minimization principle in order to apply them to time series analysis and forecasting.
Abstract: Over the last decades, learning an input-output mapping from a set of samples using neural networks has been regarded as performing approximation of multidimensional functions, regression, or classification. In this process, a wide variety of methods and approximations have been applied to many areas, such as medicine, business and finance, and the social sciences. The aim of this work is to generalize "on line" endogenous algorithms based on the empirical risk minimization principle in order to apply them to time series analysis and forecasting. We postulate a model based on admissible kernel functions and regularization theory, following the philosophy of support vector machines, a new and emerging choice for solving the problem of function approximation. The new algorithm, called INAPA-PRED (Improved Neural model with Automatic Parameter Adjustment for PREDiction), is derived, and we demonstrate its capacity to yield quality predictions that can be very useful in many areas. In addition, we extend this method to exogenous time series, hybridizing it with techniques such as Independent Component Analysis (ICA) or Genetic Algorithms (GA) to reduce the neural approximation error. Finally, we demonstrate the benefits of the proposed methods when they are applied to chaotic time series in the experimental sections. We have mainly worked with series from finance and business, although these models could be applied to many other domains.
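As a hedged illustration of the general idea (kernel functions plus regularization applied to time-series prediction), the sketch below embeds a scalar series into lag vectors and fits a kernel ridge regressor; this is not the INAPA-PRED algorithm, and the embedding order, kernel, and regularization constant are arbitrary choices.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def lag_embed(series, order):
    """Turn a scalar time series into (lag vector -> next value) pairs."""
    X = np.array([series[t - order:t] for t in range(order, len(series))])
    y = np.array(series[order:])
    return X, y

# Fit a kernelized, regularized predictor on the lag vectors and forecast one step ahead.
series = np.sin(0.3 * np.arange(300)) + 0.05 * np.random.randn(300)
X, y = lag_embed(series, order=5)
model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(X, y)
next_value = model.predict(series[-5:].reshape(1, -1))
```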

Book ChapterDOI
22 Sep 2003
TL;DR: This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class situations.
Abstract: Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It would appear to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase. Moreover, we generalize the error-bounding approach from binary classification to multi-class situations. Our empirical experiments indicate that the proposed new bounds clearly outperform earlier bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
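To illustrate what a Rademacher penalty looks like computationally, here is a brute-force Monte Carlo sketch for a finite, explicitly listed hypothesis set; the paper's contribution is computing the supremum efficiently over all prunings of a tree, which this sketch does not attempt.

```python
import numpy as np

def rademacher_penalty(loss_matrix, n_draws=100, rng=None):
    """Monte Carlo sketch of a Rademacher penalty for a FINITE hypothesis set.

    loss_matrix -- (n_hypotheses, n_samples) 0/1 losses of each candidate
                   hypothesis (e.g., each pruning) on each training example.
    Returns an estimate of E_sigma[ max_h (1/n) | sum_i sigma_i * loss_{h,i} | ].
    """
    rng = np.random.default_rng(rng)
    n_h, n = loss_matrix.shape
    vals = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # random Rademacher signs
        vals.append(np.abs(loss_matrix @ sigma).max() / n)
    return float(np.mean(vals))
```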

Book ChapterDOI
04 Jun 2003
TL;DR: The general problem of reconstructing an unknown function from a finite collection of samples is considered in the case where the position of each input vector in the training set is not fixed beforehand but is part of the learning process.
Abstract: The general problem of reconstructing an unknown function from a finite collection of samples is considered, in the case where the position of each input vector in the training set is not fixed beforehand but is part of the learning process. In particular, the consistency of the Empirical Risk Minimization (ERM) principle is analyzed when the points in the input space are generated by a purely deterministic algorithm (deterministic learning). When the output generation is not subject to noise, classical number-theoretic results involving discrepancy and variation make it possible to establish a sufficient condition for the consistency of the ERM principle. In addition, the adoption of low-discrepancy sequences permits a learning rate of O(1/L) to be achieved, where L is the size of the training set. An extension to the noisy case is discussed.
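For concreteness, the deterministic, low-discrepancy training inputs the abstract refers to can be generated, in one dimension, by the van der Corput sequence; the sketch below is only an illustration (Halton or Sobol sequences play the analogous role in several dimensions).

```python
def van_der_corput(n, base=2):
    """First n points of the van der Corput low-discrepancy sequence in [0, 1).

    Each point reverses the base-b digits of its index after the radix point.
    """
    points = []
    for k in range(1, n + 1):
        x, denom, q = 0.0, 1.0, k
        while q > 0:
            q, digit = divmod(q, base)
            denom *= base
            x += digit / denom
        points.append(x)
    return points
```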

Proceedings Article
Tong Zhang
09 Dec 2003
TL;DR: It is shown that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem, which will be useful for statistical inference tasks beyond classification.
Abstract: The purpose of this paper is to investigate infinity-sample properties of risk minimization based multi-category classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the infinity-sample consistency of classifiers obtained in the risk minimization framework. Examples are provided for two specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information will be useful for statistical inference tasks beyond classification.
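As a concrete binary illustration of the last point: with the logistic loss \phi(f, y) = \ln(1 + e^{-y f}), y \in \{-1, +1\}, the minimizer of the expected risk at a point x is

    f^*(x) = \ln\frac{\eta(x)}{1 - \eta(x)}, \qquad \eta(x) = P(y = +1 \mid x),

so the conditional probability can be read off as \eta(x) = 1/(1 + e^{-f^*(x)}). The paper's multi-category formulations generalize this kind of relationship; the binary case is shown here only as a familiar special case.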

Journal Article
TL;DR: The results, as shown in the tables, indicate that the model achieves accurate learning precision and better prediction generalization ability compared with traditional linear regression and back-propagation neural network methods.
Abstract: In this paper, we first present a novel support vector regression (SVR) algorithm based on the structural risk minimization inductive principle instead of the empirical risk minimization principle. The algorithm solves nonlinear regression problems mainly by introducing the ε-insensitive loss function and a kernel function. We then develop an equipment cost estimation model using the SVR. Costs of different types of avionics equipment are estimated in this paper. The results, as shown in the tables, indicate that the model achieves accurate learning precision and better prediction generalization ability compared with traditional linear regression and back-propagation neural network methods.
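For orientation, the standard ε-insensitive support vector regression problem alluded to above is

    \min_{w, b, \xi, \xi^*} \; \tfrac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*)
    \quad \text{s.t.} \quad y_i - (\langle w, \phi(x_i)\rangle + b) \le \epsilon + \xi_i, \;\; (\langle w, \phi(x_i)\rangle + b) - y_i \le \epsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0,

where the \|w\|^2 term controls capacity (the structural risk minimization part), the slacks measure training errors outside the ε-tube, and \phi is the feature map induced by the kernel. This is the textbook formulation, shown only for context; the paper's cost-estimation model has its own details.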

01 Jan 2003
TL;DR: In this article, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning, and it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large.
Abstract: In this paper, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning. We identify a class of empirical risk minimization algorithms that are PAC, and show that the generic version of the support vector regression method belongs to the class for any given Mercer kernel. Moreover, it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large. The generalization ability of the usual support vector regression method is deduced from these results.

Proceedings ArticleDOI
09 Dec 2003
TL;DR: This paper identifies a class of empirical risk minimization algorithms that are PAC, and shows that the generic version of the support vector regression method belongs to the class for any given Mercer kernel.
Abstract: In this paper, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning. We identify a class of empirical risk minimization algorithms that are PAC, and show that the generic version of the support vector regression method belongs to the class for any given Mercer kernel. Moreover, it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large. The generalization ability of the usual support vector regression method is deduced from these results.
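For context, one standard distribution-free way to state the PAC property investigated here: an empirical risk minimization algorithm returning \hat{f}_m over a class F is PAC if for every \epsilon, \delta \in (0, 1) there is a sample size m(\epsilon, \delta) such that, for every distribution P and every m \ge m(\epsilon, \delta),

    P^m\big\{\, R(\hat{f}_m) - \inf_{f \in F} R(f) > \epsilon \,\big\} \le \delta,

where R(f) = \mathbb{E}[\ell(f(X), Y)] is the expected risk. This is given only as orientation; the exact definitions and the role of the regularization parameter are as described in the abstract.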