
Showing papers by "Jerome H. Friedman published in 2020"


25 Apr 2020
TL;DR: In this book, the authors describe the important ideas in statistics, data mining, and machine learning in a common conceptual framework; the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

730 citations
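As a small illustration of one of the book's central topics, gradient boosting, the following Python sketch fits a toy regression by repeatedly fitting shallow trees to residuals. It is demonstration code written for this summary on synthetic data, not code from the book; the data, tree depth, learning rate, and number of stages are arbitrary choices.

# Minimal gradient-boosting sketch with squared-error loss (illustrative only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=500)

learning_rate, n_stages = 0.1, 100
prediction = np.full_like(y, y.mean())        # F_0: the constant fit
trees = []
for _ in range(n_stages):
    residual = y - prediction                 # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

def boosted_predict(X_new):
    # Sum the constant fit and all stage-wise tree contributions.
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out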


Journal ArticleDOI
TL;DR: In this article, an automated method that guides the extraction of expert knowledge and its integration into machine-learned models is presented; it improves performance on out-of-sample data and allows training with less data.
Abstract: Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.

64 citations
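The rule-filtering step described in the abstract above can be pictured with a short sketch. Everything here (the Rule fields, the example rules, and the disagreement threshold) is a hypothetical illustration of the idea, not the authors' implementation.

# Keep only rules where clinician-assessed risk and empirical risk roughly agree.
from dataclasses import dataclass

@dataclass
class Rule:
    description: str        # e.g. "age > 75 and lactate > 4" (hypothetical rule)
    empirical_risk: float   # relative risk estimated from the training data
    expert_risk: float      # relative risk assessed by clinicians

def filter_rules(rules, max_disagreement=1.0):
    """Drop rules whose expert-assessed risk deviates too far from the data,
    on the view that large disagreement may signal miscoding or confounding."""
    return [r for r in rules
            if abs(r.expert_risk - r.empirical_risk) <= max_disagreement]

candidate_rules = [
    Rule("age > 75 and lactate > 4", empirical_risk=3.2, expert_risk=3.0),
    Rule("heart_rate < 40", empirical_risk=0.8, expert_risk=2.6),  # flagged
]
kept = filter_rules(candidate_rules)   # keeps only the first rule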


Journal ArticleDOI
TL;DR: A generalization of the lasso that allows the model coefficients to vary as a function of a general set of some prespecified modifying variables, which might be variables such as gender, age, or time is proposed.
Abstract: We propose a generalization of the lasso that allows the model coefficients to vary as a function of a general set of some prespecified modifying variables. These modifiers might be variables such as gender, age, or time.

22 citations
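The idea of coefficients that vary with a modifier can be pictured with a simplified sketch: include x_j * z interaction features and fit an ordinary lasso. This plain interaction expansion is only a stand-in used for illustration; the paper's actual method uses a dedicated hierarchical penalty rather than an off-the-shelf lasso, and the data and penalty level below are arbitrary.

# Toy illustration: the effect of x_0 on y differs with a binary modifier z.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 300, 5
X = rng.normal(size=(n, p))
z = rng.integers(0, 2, size=n)               # modifier, e.g. gender coded 0/1
y = X[:, 0] * (1.0 + 1.5 * z) + 0.5 * rng.normal(size=n)

design = np.hstack([X, X * z[:, None]])      # main effects plus x_j * z terms
fit = Lasso(alpha=0.05).fit(design, y)
beta_main, beta_inter = fit.coef_[:p], fit.coef_[p:]
# Effective coefficient of x_0 as a function of z: beta_main[0] + beta_inter[0] * z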


Journal ArticleDOI
TL;DR: Contrast trees represent an approach for assessing the accuracy of many types of machine-learning estimates that are not amenable to standard validation methods; they can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method.
Abstract: A method for decision tree induction is presented. Given a set of predictor variables x and two outcome variables y and z associated with each x, the goal is to identify those values of x for which the respective distributions of y|x and z|x, or selected properties of those distributions such as means or quantiles, are most different. Contrast trees provide a lack-of-fit measure for statistical models of such statistics, or for the complete conditional distribution p(y|x), as a function of x. They are easily interpreted and can be used as diagnostic tools to reveal and then understand the inaccuracies of models produced by any learning method. A corresponding contrast-boosting strategy is described for remedying any uncovered errors, thereby producing potentially more accurate predictions. This leads to a distribution-boosting strategy for directly estimating the full conditional distribution of y at each x under no assumptions concerning its shape, form, or parametric representation.

12 citations
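A hedged sketch of the core idea: pick a split on x so that the two outcomes y and z differ as much as possible within a resulting region. The actual algorithm grows a full tree and supports other discrepancy measures (quantiles, full distributions); this toy shows a single split with a difference-of-means criterion on synthetic data.

# Find the single split on a one-dimensional x that exposes the largest
# within-region discrepancy |mean(y) - mean(z)|.
import numpy as np

def best_single_split(x, y, z):
    best_split, best_disc = None, -np.inf
    for s in np.unique(x)[1:]:
        left, right = x < s, x >= s
        disc = max(abs(y[left].mean() - z[left].mean()),
                   abs(y[right].mean() - z[right].mean()))
        if disc > best_disc:
            best_split, best_disc = s, disc
    return best_split, best_disc

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
y = rng.normal(loc=np.where(x > 7, 3.0, 0.0))   # y and z agree except where x > 7
z = rng.normal(size=1000)
split, disc = best_single_split(x, y, z)        # split lands near x = 7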



Posted Content
TL;DR: Regression procedures based on an optimal transformation strategy are presented for estimating the location, scale, and shape of p(y|x) as general functions of x, in the possible presence of imperfect (discrete, truncated, or censored) training data.
Abstract: The goal of regression analysis is to predict the value of a numeric outcome variable y given a vector of joint values of other (predictor) variables x. Usually a particular x-vector does not specify a repeatable value for y, but rather a probability distribution of possible y-values, p(y|x). This distribution has a location, scale and shape, all of which can depend on x, and are needed to infer likely values for y given x. Regression methods usually assume that training data y-values are perfect numeric realizations from some well-behaved p(y|x). Often actual training data y-values are discrete, truncated and/or arbitrarily censored. Regression procedures based on an optimal transformation strategy are presented for estimating location, scale and shape of p(y|x) as general functions of x, in the possible presence of such imperfect training data. In addition, validation diagnostics are presented to ascertain the quality of the solutions.

3 citations
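One standard way to write the location-scale-shape decomposition the abstract refers to is shown below; this particular parameterization is a common convention used here for illustration, not necessarily the paper's exact formulation.

% Conditional distribution of y given x, split into a location mu(x), a scale
% sigma(x), and a standardized residual whose shape g_x may itself depend on x.
y = \mu(x) + \sigma(x)\,\varepsilon, \qquad
\varepsilon \sim g_{x}(\cdot), \quad
\mathbb{E}[\varepsilon \mid x] = 0, \quad
\operatorname{Var}(\varepsilon \mid x) = 1 .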


Journal ArticleDOI
TL;DR: The authors see more of a continuum between the old and new methodology, and an opportunity for both to improve through their synergy.
Abstract: Professor Efron has presented us with a thought‐provoking paper on the relationship between prediction, estimation, and attribution in the modern era of data science. While we appreciate many of his arguments, we see more of a continuum between the old and new methodology, and the opportunity for both to improve through their synergy.

2 citations


Journal ArticleDOI
TL;DR: The observation that decision trees are boosting algorithms, as cited in the authors' work and acknowledged by Nock and Nielsen (2), was first established by refs. 3 and 4.
Abstract: The observation that decision trees are boosting algorithms, as cited in our work (1) and acknowledged by Nock and Nielsen (2), was first established by refs. 3 and 4. This was later used by refs. 5 and 6 to develop, to the best of our knowledge, the first decision tree algorithms based purely on boosting. This work, cited in our article, precedes refs. 7 and 8 cited by Nock and Nielsen (2). The original and important contributions of refs. 7 and 8 as they pertain to this discussion were to theoretically prove convergence rates for decision tree algorithms built with boosting, along with …

Journal ArticleDOI
TL;DR: In this article, the authors respond to Efron's thought-provoking paper on the relationship between prediction, estimation, and attribution in the modern era of data science; while they appreciate many of his insights, they do not agree with all of them.
Abstract: Professor Efron has presented us with a thought-provoking paper on the relationship between prediction, estimation, and attribution in the modern era of data science. While we appreciate many of hi...