Author

Jerome H. Friedman

Other affiliations: University of Washington
Bio: Jerome H. Friedman is an academic researcher at Stanford University. His work has contributed to research topics including the lasso (statistics) and multivariate statistics. He has an h-index of 70 and has co-authored 155 publications receiving 138,619 citations. Previous affiliations include the University of Washington.


Papers
01 Jan 2003
TL;DR: An introduction to support vector machines and boosted decision trees for predictive learning, tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.
Abstract: Prediction involves estimating the unknown value of an attribute of a system under study given the values of other measured attributes. In prediction (machine) learning the prediction rule is derived from data consisting of previously solved cases. Most methods for predictive learning were originated many years ago at the dawn of the computer age. Recently two new techniques have emerged that have revitalized the field. These are support vector machines and boosted decision trees. This paper provides an introduction to these two new methods tracing their respective ancestral roots to standard kernel methods and ordinary decision trees.

14 citations
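To make the paper's two subjects concrete, here is a minimal sketch fitting both method families on the same synthetic task; the use of scikit-learn is our choice, since the paper itself prescribes no software:

# Both methods via scikit-learn (an assumption of ours; the paper names no software).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Support vector machine: a kernel method trained with a regularized hinge loss.
svm = SVC(kernel="rbf").fit(X_train, y_train)

# Boosted decision trees: an additive ensemble of shallow trees built stagewise.
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=3).fit(X_train, y_train)

print("SVM accuracy:", svm.score(X_test, y_test))
print("boosted trees accuracy:", gbt.score(X_test, y_test))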

Journal ArticleDOI
TL;DR: While Cherkassky and Ma (2003) raise some interesting issues in comparing techniques for model selection, their article appears to be written largely in protest of comparisons made in the book The Elements of Statistical Learning (2001); Cherkassky and Ma feel that the structural risk minimization (SRM) method is falsely represented there.
Abstract: While Cherkassky and Ma (2003) raise some interesting issues in comparing techniques for model selection, their article appears to be written largely in protest of comparisons made in our book, The Elements of Statistical Learning (2001). Cherkassky and Ma feel that we falsely represented the structural risk minimization (SRM) method, which they defend strongly here. In a two-page section of our book (pp. 212-213), we made an honest attempt to compare the SRM method with two related techniques, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Apparently, we did not apply SRM in the optimal way. We are also accused of using contrived examples, designed to make SRM look bad. Alas, we did introduce some careless errors in our original simulation, errors that were corrected in the second and subsequent printings. Some of these errors were pointed out to us by Cherkassky and Ma (we supplied them with our source code), and as a result we replaced the assessment "SRM performs poorly overall" with the more moderate "the performance of SRM is mixed" (p. 212). These and other corrections can be seen in the errata section online at http://www-stat.stanford.edu/ElemStatLearn.

13 citations
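For readers unfamiliar with the two criteria mentioned, a hedged sketch of AIC/BIC model selection for Gaussian regression (AIC = 2k - 2 log L, BIC = k log n - 2 log L); the polynomial setup is purely illustrative, not the book's actual simulation:

import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)

for degree in range(1, 9):
    X = np.vander(x, degree + 1)                 # polynomial design matrix
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(rss[0]) if rss.size else float(((y - X @ beta) ** 2).sum())
    k = degree + 2                               # coefficients plus the noise variance
    neg2loglik = n * np.log(rss / n)             # Gaussian -2 log-likelihood, up to a constant
    print(f"degree {degree}: AIC={neg2loglik + 2 * k:.1f}  BIC={neg2loglik + np.log(n) * k:.1f}")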

Journal ArticleDOI
TL;DR: Contrast trees represent an approach for assessing the accuracy of many types of machine-learning estimates that are not amenable to standard validation methods and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method.
Abstract: A method for decision tree induction is presented. Given a set of predictor variables x and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y | x and z | x, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p(y | x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.

12 citations
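A toy sketch of the core contrast-tree idea, finding the region of x where the means of y and z differ most; the paper grows full trees with general discrepancy measures, so this single-split version is our own simplification:

import numpy as np

def best_contrast_split(x, y, z, min_leaf=20):
    order = np.argsort(x)
    x, y, z = x[order], y[order], z[order]
    best_split, best_score = None, 0.0
    for i in range(min_leaf, len(x) - min_leaf):
        # Discrepancy within each child region: |mean(y) - mean(z)|.
        score = max(abs(y[:i].mean() - z[:i].mean()),
                    abs(y[i:].mean() - z[i:].mean()))
        if score > best_score:
            best_split, best_score = x[i], score
    return best_split, best_score

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 1000)
y = rng.normal(size=1000)
z = y + 0.5 * (x > 0.7)              # the two outcomes disagree only for large x
print(best_contrast_split(x, y, z))  # split near 0.7 with discrepancy near 0.5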

Posted Content
TL;DR: A new method for supervised learning that combines the lasso (ℓ1) penalty with a quadratic penalty that shrinks the coefficient vector toward the feature matrix's leading principal components (PCs).
Abstract: We propose a new method for supervised learning, especially suited to wide data where the number of features is much greater than the number of observations. The method combines the lasso (ℓ1) sparsity penalty with a quadratic penalty that shrinks the coefficient vector toward the leading principal components of the feature matrix. We call the proposed method the "principal components lasso" ("pcLasso"). The method can be especially powerful if the features are pre-assigned to groups (such as cell pathways, assays, or protein-interaction networks). In that case, pcLasso shrinks each group-wise component of the solution toward the leading principal components of that group. In the process, it also carries out selection of the feature groups. We provide some theory for this method and illustrate it on a number of simulated and real data examples.

10 citations
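As we read the abstract, the pcLasso objective couples squared error, an ℓ1 penalty, and a quadratic penalty that is cheapest along the leading principal components. The exact penalty matrix below is our assumption (the authors' R package pcLasso is the reference implementation), and we solve it by proximal gradient descent:

import numpy as np

def pclasso_ista(X, y, lam=25.0, theta=1.0, n_iter=500):
    """Proximal gradient descent on
    1/2 ||y - X b||^2 + lam ||b||_1 + theta/2 b' V diag(d1^2 - dj^2) V' b."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    # Quadratic penalty matrix: zero cost along the leading principal component.
    Q = Vt.T @ np.diag(d[0] ** 2 - d ** 2) @ Vt
    step = 1.0 / (d[0] ** 2 * (1.0 + theta))      # 1 / Lipschitz bound of the smooth part
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) + theta * (Q @ beta)
        beta = beta - step * grad
        beta = np.sign(beta) * np.maximum(np.abs(beta) - step * lam, 0.0)  # soft-threshold
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))                    # wide data: p >> n
y = X[:, :5] @ np.ones(5) + rng.normal(size=50)
print("nonzero coefficients:", np.count_nonzero(pclasso_ista(X, y)))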


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations
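The uniform fit/predict interface the abstract emphasizes, in a minimal example (the SourceForge URL above is historical; the project now lives at https://scikit-learn.org):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every estimator exposes the same fit/predict/score surface.
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("predictions:", clf.predict(X_test[:5]))
print("accuracy:", clf.score(X_test, y_test))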

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.

47,038 citations
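DESeq2 itself is an R/Bioconductor package; purely as a language-consistent illustration, the toy numpy sketch below mimics its central idea of shrinking noisy per-gene dispersion estimates toward a fitted mean-dispersion trend. The trend form and the fixed shrinkage weight are our assumptions, not DESeq2's actual estimator:

import numpy as np

rng = np.random.default_rng(0)
n_genes = 2000
mu = rng.uniform(10, 1000, n_genes)                             # per-gene mean counts
alpha_hat = (0.05 + 4.0 / mu) * rng.lognormal(0, 0.5, n_genes)  # noisy gene-wise dispersions

# Fit a parametric mean-dispersion trend alpha(mu) = a0 + a1/mu by least squares.
A = np.column_stack([np.ones(n_genes), 1.0 / mu])
a0, a1 = np.linalg.lstsq(A, alpha_hat, rcond=None)[0]
trend = a0 + a1 / mu

# Shrink each gene-wise estimate toward the trend on the log scale
# (a fixed weight here; DESeq2 derives it from the estimates' sampling variance).
w = 0.6
alpha_shrunk = np.exp(w * np.log(trend) + (1 - w) * np.log(alpha_hat))
print("log-dispersion spread before/after:",
      np.std(np.log(alpha_hat)).round(3), np.std(np.log(alpha_shrunk)).round(3))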

Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

46,906 citations
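The detect-describe-match pipeline the paper introduces is available in OpenCV; a minimal example using SIFT keypoints with Lowe's ratio test (the image filenames are hypothetical):

import cv2

img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input images
img2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# k-nearest-neighbour matching with the paper's ratio test: keep a match only
# if it is clearly better than the second-best candidate.
matcher = cv2.BFMatcher()
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]
print(len(good), "reliable matches")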

Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.

40,826 citations
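From Python, LIBSVM is most conveniently reached through scikit-learn, whose SVC class wraps it; a short example touching the abstract's list of features, multiclass classification, probability estimates, and parameter selection:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Parameter selection: cross-validated grid search over C and the RBF width gamma.
grid = GridSearchCV(SVC(probability=True),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}, cv=5)
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
print("class probabilities:", grid.predict_proba(X_test[:1]).round(3))  # multiclass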

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.

40,785 citations
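The lasso's hallmark, coefficients that are exactly zero, can be seen in a few lines; note that scikit-learn's Lasso solves the equivalent penalized form min (1/(2n))||y - Xb||^2 + alpha ||b||_1 rather than the constrained form stated in the abstract:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)    # only two features matter

lasso = Lasso(alpha=0.3).fit(X, y)
print("nonzero coefficients:", np.flatnonzero(lasso.coef_))  # typically [0 1]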