
Showing papers by "Jerome H. Friedman published in 2001"



Journal ArticleDOI
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Abstract: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.
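A minimal sketch of the least-squares case of this paradigm, in which each stage fits a regression tree to the current residuals (the negative gradient of squared-error loss). It assumes scikit-learn's DecisionTreeRegressor as the base learner; the shrinkage rate, tree depth, and number of stages are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_ls(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Least-squares gradient boosting: each stage fits a regression tree
    to the current residuals (the negative gradient of L2 loss)."""
    f0 = np.mean(y)                       # initial constant model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_stages):
        residuals = y - pred              # negative gradient for squared error
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def boosted_predict(X, f0, trees, learning_rate=0.1):
    # Stagewise additive expansion: constant plus shrunken tree contributions.
    return f0 + learning_rate * sum(t.predict(X) for t in trees)
```

The shrinkage factor plays the role of a step length in the paper's steepest-descent view; smaller values generally require more stages.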

17,764 citations



Book ChapterDOI
01 Jan 2001
TL;DR: The first three examples described in Chapter 1 have several components in common: for each there is a set of variables that might be denoted as inputs, which are measured or preset.
Abstract: The first three examples described in Chapter 1 have several components in common. For each there is a set of variables that might be denoted as inputs, which are measured or preset. These have some influence on one or more outputs. For each example the goal is to use the inputs to predict the values of the outputs. This exercise is called supervised learning.
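A minimal sketch of this input/output setting, assuming synthetic data and scikit-learn's LinearRegression (both illustrative choices, not part of the chapter):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # inputs: measured or preset variables
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=100)  # output influenced by the inputs

# Supervised learning: use the inputs to predict the values of the output.
model = LinearRegression().fit(X, y)
predictions = model.predict(X[:5])
```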

181 citations


Book ChapterDOI
01 Jan 2001
TL;DR: A linear regression model assumes the regression function E(Y|X) is linear in the inputs; applying linear methods to transformations of the inputs considerably expands their scope, and these generalizations are called basis-function methods.
Abstract: A linear regression model assumes that the regression function E(Y|X) is linear in the inputs X_1, ..., X_p. Linear models were largely developed in the precomputer age of statistics, but even in today’s computer era there are still good reasons to study and use them. They are simple and often provide an adequate and interpretable description of how the inputs affect the output. For prediction purposes they can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases, low signal-to-noise ratio or sparse data. Finally, linear methods can be applied to transformations of the inputs and this considerably expands their scope. These generalizations are sometimes called basis-function methods, and are discussed in Chapter 5.
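A short sketch of the basis-function idea: the model stays linear in the coefficients while the raw inputs are replaced by transformed features. scikit-learn's PolynomialFeatures and the cubic expansion are illustrative assumptions, not the chapter's construction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=200)  # nonlinear truth

# The model remains linear in its coefficients, but the input is replaced
# by basis functions h_m(x) -- here x, x^2, x^3.
H = PolynomialFeatures(degree=3, include_bias=False).fit_transform(x)
fit = LinearRegression().fit(H, y)
```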

104 citations


Journal ArticleDOI
TL;DR: This article explores some of the causes of this situation, and asks why statisticians should be interested in taking part in the development of new methods for handling large and complex data sets.
Abstract: The nature of data is changing rapidly. Data sets are becoming increasingly large and complex. The modern methodology for analyzing these new kinds of data comes from the fields of database management, artificial intelligence, pattern recognition, and data visualization. So far, statistics as a discipline has played only a minor role. This article explores some of the causes of this situation, and asks why statisticians should be interested in taking part in the development of new methods for handling large and complex data sets.

57 citations


Book ChapterDOI
01 Jan 2001
TL;DR: This chapter revisits the classification problem and focuses on linear methods for classification, procedures whose decision boundaries between the predicted regions of input space are linear.
Abstract: In this chapter we revisit the classification problem and focus on linear methods for classification. Since our predictor G(x) takes values in a discrete set G, we can always divide the input space into a collection of regions labeled according to the classification. We saw in Chapter 2 that the boundaries of these regions can be rough or smooth, depending on the prediction function. For an important class of procedures, these decision boundaries are linear; this is what we will mean by linear methods for classification.
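A brief sketch of one such procedure, linear discriminant analysis, whose decision boundary is linear in the inputs. scikit-learn and the two-Gaussian synthetic data set are assumptions for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(2)
# Two Gaussian classes with a shared covariance, so the optimal boundary is linear.
X = np.vstack([rng.normal(loc=-1.0, size=(100, 2)),
               rng.normal(loc=+1.0, size=(100, 2))])
y = np.repeat([0, 1], 100)

lda = LinearDiscriminantAnalysis().fit(X, y)
w, b = lda.coef_[0], lda.intercept_[0]   # decision boundary: w @ x + b = 0
```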

36 citations


Book ChapterDOI
01 Jan 2001
TL;DR: The true regression function f(X) = E(Y|X) will typically be nonlinear and nonadditive in X, so representing it by a linear model is usually a convenient, and sometimes a necessary, approximation.
Abstract: We have already made use of models linear in the input features, both for regression and classification. Linear regression, linear discriminant analysis, logistic regression and separating hyperplanes all rely on a linear model. It is extremely unlikely that the true function f(X) is actually linear in X. In regression problems, f(X) = E(Y|X) will typically be nonlinear and nonadditive in X, and representing f(X) by a linear model is usually a convenient, and sometimes a necessary, approximation. Convenient because a linear model is easy to interpret, and is the first-order Taylor approximation to f(X). Sometimes necessary, because with N small and/or p large, a linear model might be all we are able to fit to the data without overfitting. Likewise in classification, a linear, Bayes-optimal decision boundary implies that some monotone transformation of Pr(Y = 1|X) is linear in X. This is inevitably an approximation.
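A small sketch of the closing classification remark: in logistic regression the monotone transformation is the logit, so log(Pr(Y=1|X) / (1 - Pr(Y=1|X))) is modeled as linear in X. scikit-learn and the synthetic data are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
p = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 2.0 * X[:, 1])))  # true Pr(Y=1|X)
y = rng.uniform(size=500) < p

clf = LogisticRegression().fit(X, y)
# decision_function returns the fitted log-odds, a linear function of X:
# log(p_hat / (1 - p_hat)) = intercept_ + coef_ @ x
logits = clf.decision_function(X[:3])
```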

19 citations