
Showing papers by "Jerome H. Friedman" published in 2009


Book ChapterDOI
01 Jan 2009
TL;DR: The generalization performance of a learning method is its prediction capability on independent test data; assessing it guides the choice of learning method or model and gives a measure of the quality of the ultimately chosen model.
Abstract: The generalization performance of a learning method relates to its prediction capability on independent test data. Assessment of this performance is extremely important in practice, since it guides the choice of learning method or model, and gives us a measure of the quality of the ultimately chosen model.

220 citations
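As a minimal, illustrative sketch of the assessment idea described above (my own setup, not the chapter's code): estimate generalization error with a held-out test set and with K-fold cross-validation on a synthetic regression dataset; the model, dataset, and hyperparameters are assumptions chosen only for illustration.

```python
# Sketch: training error vs. held-out test error vs. 5-fold cross-validation.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeRegressor(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# Training error is optimistic; error on independent test data estimates
# the prediction capability the chapter is concerned with.
train_mse = np.mean((model.predict(X_train) - y_train) ** 2)
test_mse = np.mean((model.predict(X_test) - y_test) ** 2)

# Cross-validation estimates test error without sacrificing a fixed test set.
cv_mse = -cross_val_score(model, X, y, cv=5,
                          scoring="neg_mean_squared_error").mean()
print(f"train MSE {train_mse:.2f}  test MSE {test_mse:.2f}  CV MSE {cv_mse:.2f}")
```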


Book ChapterDOI
01 Jan 2009
TL;DR: Boosting is one of the most powerful learning ideas introduced in the last ten years; originally designed for classification, it can, as this chapter shows, profitably be extended to regression as well.
Abstract: Boosting is one of the most powerful learning ideas introduced in the last ten years. It was originally designed for classification problems, but as will be seen in this chapter, it can profitably be extended to regression as well. The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.” From this perspective boosting bears a resemblance to bagging and other committee-based approaches (Section 8.8). However we shall see that the connection is at best superficial and that boosting is fundamentally different.

192 citations
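A minimal sketch of the "committee of weak classifiers" idea from the abstract above (an assumed scikit-learn setup, not the book's implementation): boost depth-1 decision trees ("stumps") with AdaBoost and compare a single stump against the boosted committee.

```python
# Sketch: a single weak classifier vs. a boosted committee of weak classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 tree is a weak classifier on its own.
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# AdaBoost reweights the training data at each round and combines many
# stumps by a weighted vote (the "committee").
boosted = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("stump accuracy:  ", stump.score(X_test, y_test))
print("boosted accuracy:", boosted.score(X_test, y_test))
```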


Book ChapterDOI
01 Jan 2009
TL;DR: This chapter begins the discussion of some specific methods for supervised learning by describing five related techniques: generalized additive models, trees, multivariate adaptive regression splines, the patient rule induction method, and hierarchical mixtures of experts.
Abstract: In this chapter we begin our discussion of some specific methods for supervised learning. These techniques each assume a (different) structured form for the unknown regression function, and by doing so they finesse the curse of dimensionality. Of course, they pay the possible price of misspecifying the model, and so in each case there is a tradeoff that has to be made. They take off where Chapters 3–6 left off. We describe five related techniques: generalized additive models, trees, multivariate adaptive regression splines, the patient rule induction method, and hierarchical mixtures of experts.

58 citations
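To make the "structured form finesses the curse of dimensionality" point concrete, here is a rough backfitting sketch for one of the five techniques named above, a generalized additive model y = alpha + sum_j f_j(x_j). The data, the number of iterations, and the use of a cubic polynomial as a stand-in univariate smoother are all my own assumptions for illustration, not the chapter's algorithm as printed.

```python
# Sketch: backfitting an additive model with a crude univariate smoother.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-2, 2, size=(n, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + rng.normal(0, 0.2, n)

alpha = y.mean()
f = np.zeros((n, X.shape[1]))           # current estimates of f_j(x_ij)

for _ in range(20):                     # backfitting iterations
    for j in range(X.shape[1]):
        # Partial residual: what is left for f_j to explain.
        others = [k for k in range(X.shape[1]) if k != j]
        r = y - alpha - f[:, others].sum(axis=1)
        coefs = np.polyfit(X[:, j], r, deg=3)   # stand-in "smoother"
        f[:, j] = np.polyval(coefs, X[:, j])
        f[:, j] -= f[:, j].mean()       # center each component for identifiability

fitted = alpha + f.sum(axis=1)
print("residual variance:", np.var(y - fitted))
```

Each f_j is fit one coordinate at a time, which is what keeps the procedure tractable as the number of features grows.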


Book ChapterDOI
01 Jan 2009
TL;DR: In this chapter, the authors describe generalizations of linear decision boundaries for classification, including flexible discriminant analysis, which facilitates construction of nonlinear boundaries in a manner very similar to support vector machines.
Abstract: In this chapter we describe generalizations of linear decision boundaries for classification. Optimal separating hyperplanes are introduced in Chapter 4 for the case when two classes are linearly separable. Here we cover extensions to the nonseparable case, where the classes overlap. These techniques are then generalized to what is known as the support vector machine, which produces nonlinear boundaries by constructing a linear boundary in a large, transformed version of the feature space. The second set of methods generalize Fisher’s linear discriminant analysis (LDA). The generalizations include flexible discriminant analysis which facilitates construction of nonlinear boundaries in a manner very similar to the support vector machines, penalized discriminant analysis for problems such as signal and image classification where the large number of features are highly correlated, and mixture discriminant analysis for irregularly shaped classes.

45 citations
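A minimal sketch of the kernel idea in the abstract above (assumed scikit-learn setup, synthetic two-class data with overlap): an SVM with an RBF kernel constructs a linear boundary in an enlarged, implicitly transformed feature space, which appears nonlinear in the original space.

```python
# Sketch: linear vs. RBF-kernel support vector machine on overlapping classes.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear boundary in the original feature space.
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# RBF kernel: a linear boundary in a large transformed feature space,
# nonlinear when viewed in the original coordinates.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```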


Book ChapterDOI
01 Jan 2009

29 citations


Book ChapterDOI
01 Jan 2009

28 citations


Book ChapterDOI
01 Jan 2009
TL;DR: For most of this book, the fitting (learning) of models has been achieved by minimizing a sum of squares for regression or by minimizing cross-entropy for classification; both are instances of the maximum likelihood approach to fitting.
Abstract: For most of this book, the fitting (learning) of models has been achieved by minimizing a sum of squares for regression, or by minimizing cross-entropy for classification. In fact, both of these minimizations are instances of the maximum likelihood approach to fitting.

22 citations
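A small numerical check of the claim above (my own illustration, not the chapter's): for a linear model with Gaussian errors, minimizing the sum of squares and maximizing the likelihood give the same coefficients. The data-generating values and the fixed noise level are assumptions.

```python
# Sketch: least squares and Gaussian maximum likelihood coincide.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one feature
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(0, 0.5, n)

# Least squares: minimize the residual sum of squares.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Maximum likelihood: minimize the Gaussian negative log-likelihood in beta.
# For fixed sigma this is the residual sum of squares up to constants.
def neg_log_lik(beta, sigma=0.5):
    resid = y - X @ beta
    return 0.5 * np.sum(resid ** 2) / sigma ** 2 + n * np.log(sigma)

beta_ml = minimize(neg_log_lik, x0=np.zeros(2)).x

print("least squares:     ", beta_ls)
print("maximum likelihood:", beta_ml)   # agrees up to optimizer tolerance
```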


Book ChapterDOI
01 Jan 2009
TL;DR: Because these model-free methods are highly unstructured, they typically aren't useful for understanding the nature of the relationship between the features and the class outcome, but as black box prediction engines they can be very effective and are often among the best performers on real data problems.
Abstract: In this chapter we discuss some simple and essentially model-free methods for classification and pattern recognition. Because they are highly unstructured, they typically aren’t useful for understanding the nature of the relationship between the features and class outcome. However, as black box prediction engines, they can be very effective, and are often among the best performers in real data problems. The nearest-neighbor technique can also be used in regression; this was touched on in Chapter 2 and works reasonably well for low-dimensional problems. However, with high-dimensional features, the bias-variance tradeoff does not work as favorably for nearest-neighbor regression as it does for classification.

19 citations
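A minimal sketch of the nearest-neighbor classifier described above as a black-box predictor (assumed scikit-learn setup and synthetic data; the candidate values of k are arbitrary choices for illustration), with the neighborhood size chosen by cross-validation.

```python
# Sketch: k-nearest-neighbor classification with k chosen by cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1500, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pick the neighborhood size k by 5-fold cross-validation on the training set.
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5).mean()
          for k in (1, 5, 15, 31)}
best_k = max(scores, key=scores.get)

knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("chosen k:", best_k, " test accuracy:", knn.score(X_test, y_test))
```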


Book ChapterDOI
01 Jan 2009

6 citations