
Showing papers on "Empirical risk minimization" published in 2003


Proceedings Article
21 Aug 2003
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.

3,908 citations
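To make the harmonic-function step of the abstract above concrete, here is a minimal NumPy sketch of the matrix-method solution f_u = (D_uu - W_uu)^{-1} W_ul f_l; the weight matrix, index conventions and binary labels are illustrative assumptions, not the authors' code.

```python
import numpy as np

def harmonic_label_propagation(W, y_labeled, labeled_idx):
    """Sketch: mean of the Gaussian random field via the harmonic solution.

    W           -- (n, n) symmetric non-negative similarity (edge weight) matrix
    y_labeled   -- 0/1 labels of the labeled vertices
    labeled_idx -- indices of the labeled vertices in W
    Returns soft scores in [0, 1] for the unlabeled vertices.
    """
    n = W.shape[0]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    D = np.diag(W.sum(axis=1))
    L = D - W                                        # combinatorial graph Laplacian
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    # Harmonic functions: f_u solves L_uu f_u = W_ul f_l
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return f_u
```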


Journal ArticleDOI
TL;DR: This work extends the setting studied so far to job-dependent learning curves, that is, it allows learning in the production process to be faster for some jobs than for others, and shows that in this new, arguably more realistic setting, makespan and total flow-time minimization on a single machine, a due-date assignment problem, and total flow-time minimization on unrelated parallel machines remain polynomially solvable.

304 citations


Proceedings Article
09 Dec 2003
TL;DR: This paper proposes three theoretical methods for taking into account this distribution P(x) for regularization and provides links to existing graph-based semi-supervised learning algorithms.
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

144 citations


Posted Content
TL;DR: In this paper, robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition; kernel logistic regression, support vector machines, least squares, and the AdaBoost loss function are treated as special cases.
Abstract: The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds of the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given.

100 citations
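For background, the robustness notion referred to above is Hampel's influence function of a map T from distributions to classifiers, evaluated at a contaminating point z:

    \mathrm{IF}(z; T, P) \;=\; \lim_{t \downarrow 0} \frac{T\big((1-t)P + t\,\Delta_z\big) - T(P)}{t},

where \Delta_z is the point mass at z. A bounded influence function means that a single (possibly mislabeled) observation can change the learned classifier only by a bounded amount; this is the standard definition from robust statistics, given here only as orientation.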


Journal ArticleDOI
TL;DR: New tools from probability theory are presented that make it possible to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.
Abstract: We present new tools from probability theory that can be applied to the analysis of learning algorithms. These tools make it possible to derive new bounds on the generalization performance of learning algorithms and to propose alternative measures of the complexity of the learning task, which in turn can be used to derive new learning algorithms.

61 citations


Journal ArticleDOI
TL;DR: A general technique is proposed for solving support vector classifiers (SVCs) with an arbitrary loss function, relying on the application of an iterative reweighted least squares (IRWLS) procedure.
Abstract: In this paper, we propose a general technique for solving support vector classifiers (SVCs) for an arbitrary loss function, relying on the application of an iterative reweighted least squares (IRWLS) procedure. We further show that three properties of the SVC solution can be written as conditions over the loss function. This technique allows the implementation of the empirical risk minimization (ERM) inductive principle on large margin classifiers obtaining, at the same time, very compact (in terms of number of support vectors) solutions. The improvements obtained by changing the SVC loss function are illustrated with synthetic and real data examples.

53 citations
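The paper's IRWLS-SVC updates are not reproduced here; as a generic, hedged illustration of the reweighting idea (repeatedly re-solve a weighted least-squares problem whose weights come from the current fit), the sketch below shows the textbook IRLS loop for L2-regularized logistic regression, which is a different loss but the same computational pattern.

```python
import numpy as np

def irls_logistic(X, y, lam=1e-3, n_iter=25):
    """Generic iterative reweighted least squares (IRLS) illustration.

    Fits L2-regularized logistic regression: at every step the current fit
    defines per-sample weights and the problem is re-solved as a weighted
    least-squares system.  (Textbook IRLS only; NOT the paper's IRWLS-SVC.)

    X -- (n, d) design matrix, y -- labels in {0, 1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # current probabilities
        s = p * (1.0 - p)                          # per-sample weights
        # Working response of the weighted least-squares problem
        z = X @ w + (y - p) / np.maximum(s, 1e-12)
        A = X.T @ (s[:, None] * X) + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ (s * z))
    return w
```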


Journal ArticleDOI
15 Sep 2003
TL;DR: This paper establishes weak and strong universal consistency of regression estimates based on normalized radial basis function networks when the network parameters are chosen by empirical risk minimization.
Abstract: This paper establishes weak and strong universal consistency of regression estimates based on normalized radial basis function networks when the network parameters are chosen by empirical risk minimization.

23 citations
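As an illustration of the estimator being analyzed, here is a small sketch of a normalized Gaussian RBF regression estimate whose output weights are chosen by empirical risk minimization (least squares); the fixed centers, the width h, and the tiny ridge term are assumptions for this sketch, not the paper's construction.

```python
import numpy as np

def nrbf_design(X, centers, h):
    """Normalized Gaussian RBF features: phi_k(x) = K_h(x - v_k) / sum_j K_h(x - v_j)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * h ** 2))
    return K / K.sum(axis=1, keepdims=True)

def fit_normalized_rbf(X, y, centers, h, ridge=1e-8):
    """Choose the output weights by minimizing the empirical squared risk."""
    Phi = nrbf_design(X, centers, h)
    c = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ y)
    return c

# Prediction at new points: nrbf_design(X_new, centers, h) @ c
```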


Journal ArticleDOI
TL;DR: New adaptive bounds designed for learning algorithms that operate by making a sequence of choices are demonstrated, similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998).
Abstract: A major topic in machine learning is to determine good upper bounds on the true error rates of learned hypotheses based upon their empirical performance on training data. In this paper, we demonstrate new adaptive bounds designed for learning algorithms that operate by making a sequence of choices. These bounds, which we call Microchoice bounds, are similar to Occam-style bounds and can be used to make learning algorithms self-bounding in the style of Freund (1998). We then show how to combine these bounds with Freund's query-tree approach producing a version of Freund's query-tree structure that can be implemented with much more algorithmic efficiency.

21 citations
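For orientation, an Occam-style bound of the kind referred to above assigns each hypothesis h a prior weight p(h) and guarantees, with probability at least 1 - \delta over m training examples,

    \mathrm{err}(h) \;\le\; \widehat{\mathrm{err}}(h) + \sqrt{\frac{\ln(1/p(h)) + \ln(1/\delta)}{2m}}.

If h is reached through a sequence of choices from sets C_1, \dots, C_k, taking p(h) = \prod_i 1/|C_i| turns the penalty into \sum_i \ln|C_i|, so the bound adapts to the choices the algorithm actually made. This is a standard way to state such bounds, given only as background rather than as the paper's exact statement.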


Journal Article
TL;DR: A nonlinear predictive control framework is presented, in which nonlinear plants are modeled on a support vector machine and the predictive control law is derived by a new stochastic search optimization algorithm.
Abstract: Support vector machines (SVMs) are a new-generation machine learning technique based on statistical learning theory. They handle small-sample learning problems better by using structural risk minimization in place of empirical risk minimization. Moreover, SVMs can turn a nonlinear learning problem into a linear one, reducing algorithmic complexity through the kernel function idea. A nonlinear predictive control framework is presented in which nonlinear plants are modeled by a support vector machine, and the predictive control law is derived by a new stochastic search optimization algorithm. Finally, a simulation example is given to demonstrate the proposed approach.

11 citations
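A minimal sketch of the overall scheme (SVM plant model plus a stochastic search over candidate inputs) is given below; the lag structure, the one-step cost, and the uniform random search are illustrative assumptions and do not reproduce the paper's specific optimization algorithm.

```python
import numpy as np
from sklearn.svm import SVR

def fit_plant_model(y_hist, u_hist):
    """Fit an SVR mapping (y_t, y_{t-1}, u_t) -> y_{t+1} from logged 1-D arrays."""
    X = np.column_stack([y_hist[1:-1], y_hist[:-2], u_hist[1:-1]])
    target = y_hist[2:]
    return SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, target)

def choose_control(model, y_t, y_tm1, y_ref, u_bounds=(-1.0, 1.0),
                   n_candidates=200, rng=None):
    """Pick the input whose predicted next output is closest to the reference."""
    rng = np.random.default_rng(rng)
    u_cand = rng.uniform(*u_bounds, size=n_candidates)
    X_cand = np.column_stack([np.full(n_candidates, y_t),
                              np.full(n_candidates, y_tm1),
                              u_cand])
    cost = (model.predict(X_cand) - y_ref) ** 2
    return u_cand[np.argmin(cost)]
```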


Proceedings ArticleDOI
02 Nov 2003
TL;DR: The range of application of the key theorem of learning theory is extended by replacing the probability measure space with a credibility measure space, and new concepts and a new theorem are given on this classical theoretical foundation.
Abstract: In the 1970s, Vladimir N. Vapnik proposed statistical learning theory, which is regarded as the best available theory for statistical estimation and predictive learning from small samples. It systematically investigates the conditions under which the empirical risk minimization principle is valid and the relations between the empirical risk and the expected risk for finite samples. The key theorem of learning theory plays an important role in this framework, paving the way for the subsequent theory and its applications. However, some of its results and definitions apply only to a fixed probability measure, which restricts the theorem's range of application. In this paper, we extend that range by replacing the probability measure space with a credibility measure space. In the new measure space, we give new concepts and a new theorem on the classical theoretical foundation.

11 citations


01 Jan 2003
TL;DR: The aim of this work is to generalize "on line" endogenous algorithms based on the empirical risk minimization principle in order to apply them to time series analysis and forecasting.
Abstract: Over the last decades, learning an input-output mapping from a set of samples using neural networks has been regarded as performing approximation of multidimensional functions, regression, or classification. In this process, a wide variety of methods and approximations have been applied to many areas, such as medicine, business and finance, and the social sciences. The aim of this work is to generalize "on line" endogenous algorithms based on the empirical risk minimization principle in order to apply them to time series analysis and forecasting. We postulate a model based on admissible kernel functions and regularization theory, following the philosophy of support vector machines, a new and emerging choice for solving the problem of function approximation. The new algorithm, called INAPA-PRED (Improved Neural model with Automatic Parameter Adjustment for PREDiction), is derived, and we demonstrate its capacity to yield quality predictions that can be very useful in many areas. In addition, we extend this method to exogenous time series, hybridizing it with techniques such as Independent Component Analysis (ICA) or Genetic Algorithms (GA) to reduce the neural approximation error. Finally, we demonstrate the benefits of the proposed methods when they are applied to chaotic time series in the experimental sections. We have mainly worked with series from finance and business, although these models could be applied to many other domains.
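As a hedged illustration of the general idea (kernel functions plus regularization applied to time-series prediction), the sketch below embeds a scalar series into lag vectors and fits a kernel ridge regressor; this is not the INAPA-PRED algorithm, and the embedding order, kernel, and regularization constant are arbitrary choices.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def lag_embed(series, order):
    """Turn a scalar time series into (lag vector -> next value) pairs."""
    X = np.array([series[t - order:t] for t in range(order, len(series))])
    y = np.array(series[order:])
    return X, y

# Fit a kernelized, regularized predictor on the lag vectors and forecast one step ahead.
series = np.sin(0.3 * np.arange(300)) + 0.05 * np.random.randn(300)
X, y = lag_embed(series, order=5)
model = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(X, y)
next_value = model.predict(series[-5:].reshape(1, -1))
```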

Book ChapterDOI
22 Sep 2003
TL;DR: This paper applies Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase, and generalizes the error-bounding approach from binary classification to multi-class situations.
Abstract: Rademacher penalization is a modern technique for obtaining data-dependent bounds on the generalization error of classifiers. It would appear to be limited to relatively simple hypothesis classes because of computational complexity issues. In this paper we nevertheless apply Rademacher penalization to the practically important hypothesis class of unrestricted decision trees by considering the prunings of a given decision tree rather than the tree-growing phase. Moreover, we generalize the error-bounding approach from binary classification to multi-class situations. Our empirical experiments indicate that the proposed new bounds clearly outperform earlier bounds for decision tree prunings and provide non-trivial error estimates on real-world data sets.
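To illustrate what a Rademacher penalty looks like computationally, here is a brute-force Monte Carlo sketch for a finite, explicitly listed hypothesis set; the paper's contribution is computing the supremum efficiently over all prunings of a tree, which this sketch does not attempt.

```python
import numpy as np

def rademacher_penalty(loss_matrix, n_draws=100, rng=None):
    """Monte Carlo sketch of a Rademacher penalty for a FINITE hypothesis set.

    loss_matrix -- (n_hypotheses, n_samples) 0/1 losses of each candidate
                   hypothesis (e.g., each pruning) on each training example.
    Returns an estimate of E_sigma[ max_h (1/n) | sum_i sigma_i * loss_{h,i} | ].
    """
    rng = np.random.default_rng(rng)
    n_h, n = loss_matrix.shape
    vals = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # random Rademacher signs
        vals.append(np.abs(loss_matrix @ sigma).max() / n)
    return float(np.mean(vals))
```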

Book ChapterDOI
04 Jun 2003
TL;DR: The general problem of reconstructing an unknown function from a finite collection of samples is considered in the case where the position of each input vector in the training set is not fixed beforehand but is part of the learning process.
Abstract: The general problem of reconstructing an unknown function from a finite collection of samples is considered, in the case where the position of each input vector in the training set is not fixed beforehand but is part of the learning process. In particular, the consistency of the Empirical Risk Minimization (ERM) principle is analyzed when the points in the input space are generated by a purely deterministic algorithm (deterministic learning). When the output generation is not subject to noise, classical number-theoretic results involving discrepancy and variation make it possible to establish a sufficient condition for the consistency of the ERM principle. In addition, the adoption of low-discrepancy sequences permits a learning rate of O(1/L) to be achieved, where L is the size of the training set. An extension to the noisy case is discussed.
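For concreteness, the deterministic, low-discrepancy training inputs the abstract refers to can be generated, in one dimension, by the van der Corput sequence; the sketch below is only an illustration (Halton or Sobol sequences play the analogous role in several dimensions).

```python
def van_der_corput(n, base=2):
    """First n points of the van der Corput low-discrepancy sequence in [0, 1).

    Each point reverses the base-b digits of its index after the radix point.
    """
    points = []
    for k in range(1, n + 1):
        x, denom, q = 0.0, 1.0, k
        while q > 0:
            q, digit = divmod(q, base)
            denom *= base
            x += digit / denom
        points.append(x)
    return points
```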

Proceedings Article
Tong Zhang
09 Dec 2003
TL;DR: It is shown that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem, which will be useful for statistical inference tasks beyond classification.
Abstract: The purpose of this paper is to investigate infinity-sample properties of risk minimization based multi-category classification methods. These methods can be considered as natural extensions of binary large margin classification. We establish conditions that guarantee the infinity-sample consistency of classifiers obtained in the risk minimization framework. Examples are provided for two specific forms of the general formulation, which extend a number of known methods. Using these examples, we show that some risk minimization formulations can also be used to obtain conditional probability estimates for the underlying problem. Such conditional probability information will be useful for statistical inference tasks beyond classification.
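As a concrete binary illustration of the last point: with the logistic loss \phi(f, y) = \ln(1 + e^{-y f}), y \in \{-1, +1\}, the minimizer of the expected risk at a point x is

    f^*(x) = \ln\frac{\eta(x)}{1 - \eta(x)}, \qquad \eta(x) = P(y = +1 \mid x),

so the conditional probability can be read off as \eta(x) = 1/(1 + e^{-f^*(x)}). The paper's multi-category formulations generalize this kind of relationship; the binary case is shown here only as a familiar special case.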

Journal Article
TL;DR: The results, as shown in the tables, indicate that the model achieves accurate learning precision and better prediction generalization ability compared with traditional linear regression and back-propagation neural network methods.
Abstract: In this paper, we first present a novel support vector regression (SVR) algorithm based on the structural risk minimization inductive principle instead of the empirical risk minimization principle. The algorithm solves nonlinear regression problems mainly by introducing the ε-insensitive loss function and a kernel function. We then develop an equipment cost estimation model using the SVR. Costs of different types of avionics equipment are estimated in this paper. The results, as shown in the tables, indicate that the model achieves accurate learning precision and better prediction generalization ability compared with traditional linear regression and back-propagation neural network methods.
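For orientation, the standard ε-insensitive support vector regression problem alluded to above is

    \min_{w, b, \xi, \xi^*} \; \tfrac{1}{2}\|w\|^2 + C \sum_i (\xi_i + \xi_i^*)
    \quad \text{s.t.} \quad y_i - (\langle w, \phi(x_i)\rangle + b) \le \epsilon + \xi_i, \;\; (\langle w, \phi(x_i)\rangle + b) - y_i \le \epsilon + \xi_i^*, \;\; \xi_i, \xi_i^* \ge 0,

where the \|w\|^2 term controls capacity (the structural risk minimization part), the slacks measure training errors outside the ε-tube, and \phi is the feature map induced by the kernel. This is the textbook formulation, shown only for context; the paper's cost-estimation model has its own details.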

01 Jan 2003
TL;DR: In this article, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning, and it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large.
Abstract: In this paper, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning. We identify a class of empirical risk minimization algorithms that are PAC, and show that the generic version of the support vector regression method belongs to the class for any given Mercer kernel. Moreover, it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large. The generalization ability of the usual support vector regression method is deduced from these results.

Proceedings ArticleDOI
09 Dec 2003
TL;DR: This paper identifies a class of empirical risk minimization algorithms that are PAC, and shows that the generic version of the support vector regression method belongs to the class for any given Mercer kernel.
Abstract: In this paper, the generalization ability of empirical risk minimization algorithms is investigated in the context of distribution-free probably approximately correct (PAC) learning. We identify a class of empirical risk minimization algorithms that are PAC, and show that the generic version of the support vector regression method belongs to the class for any given Mercer kernel. Moreover, it is shown that a regularized approximation of the generic support vector method is PAC to any given accuracy when the regularization parameter is sufficiently large. The generalization ability of the usual support vector regression method is deduced from these results.
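For context, one standard distribution-free way to state the PAC property investigated here: an empirical risk minimization algorithm returning \hat{f}_m over a class F is PAC if for every \epsilon, \delta \in (0, 1) there is a sample size m(\epsilon, \delta) such that, for every distribution P and every m \ge m(\epsilon, \delta),

    P^m\big\{\, R(\hat{f}_m) - \inf_{f \in F} R(f) > \epsilon \,\big\} \le \delta,

where R(f) = \mathbb{E}[\ell(f(X), Y)] is the expected risk. This is given only as orientation; the exact definitions and the role of the regularization parameter are as described in the abstract.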