Author

Jerome H. Friedman

Other affiliations: University of Washington
Bio: Jerome H. Friedman is an academic researcher from Stanford University. The author has contributed to research in topics: Lasso (statistics) & Multivariate statistics. The author has an h-index of 70 and has co-authored 155 publications receiving 138,619 citations. Previous affiliations of Jerome H. Friedman include University of Washington.


Papers
Posted Content
TL;DR: A fast algorithm is presented that provides regularized solutions for all constraint values t, for any differentiable function f and loss L, and any constraint P that is a monotone increasing function of the absolute value of each parameter.
Abstract: Many regression and classification procedures fit a parameterized function $f(x;w)$ of predictor variables $x$ to data $\{x_{i},y_{i}\}_1^N$ based on some loss criterion $L(y,f)$. Often, regularization is applied to improve accuracy by placing a constraint $P(w)\leq t$ on the values of the parameters $w$. Although efficient methods exist for finding solutions to these constrained optimization problems for all values of $t\geq0$ in the special case when $f$ is a linear function, none are available when $f$ is non-linear (e.g. Neural Networks). Here we present a fast algorithm that provides all such solutions for any differentiable function $f$ and loss $L$, and any constraint $P$ that is an increasing monotone function of the absolute value of each parameter. Applications involving sparsity inducing regularization of arbitrary Neural Networks are discussed. Empirical results indicate that these sparse solutions are usually superior to their dense counterparts in both accuracy and interpretability. This improvement in accuracy can often make Neural Networks competitive with, and sometimes superior to, state-of-the-art methods in the analysis of tabular data.
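The paper's path algorithm itself is not reproduced in this abstract; as a hedged illustration of the general setting it describes (fitting a differentiable nonlinear f under an L1-type penalty, traced over a grid of penalty strengths), here is a minimal numpy sketch using proximal gradient descent. The toy model, step size, and grid are all hypothetical choices, not the paper's method:

```python
# Hypothetical sketch: trace L1-regularized solutions for a small nonlinear
# model f(x; w) over a grid of penalties via proximal gradient descent.
# This is NOT the paper's algorithm, only a generic illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.tanh(X[:, 0]) - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)

def f(X, w):                      # a simple nonlinear model: tanh of a linear score
    return np.tanh(X @ w)

def grad_loss(X, y, w):           # gradient of squared-error loss via the chain rule
    r = f(X, w) - y
    return X.T @ (2 * r * (1 - np.tanh(X @ w) ** 2)) / len(y)

def soft_threshold(w, t):         # proximal operator of the L1 penalty
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

eta = 0.1                         # fixed step size (hypothetical)
w = np.zeros(X.shape[1])
path = []
for lam in np.geomspace(1.0, 1e-3, 30):    # strong -> weak regularization
    for _ in range(300):
        w = soft_threshold(w - eta * grad_loss(X, y, w), eta * lam)
    path.append((lam, w.copy()))

for lam, w_lam in path[::10]:
    print(f"lambda={lam:.4f}  nonzeros={np.count_nonzero(w_lam)}")
```

Warm-starting each penalty value from the previous solution, as above, is the standard trick for tracing such paths cheaply.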

3 citations

Journal ArticleDOI
TL;DR: There is more of a continuum between the old and new methodologies, and an opportunity for both to improve through their synergy.
Abstract: Professor Efron has presented us with a thought‐provoking paper on the relationship between prediction, estimation, and attribution in the modern era of data science. While we appreciate many of his arguments, we see more of a continuum between the old and new methodology, and the opportunity for both to improve through their synergy.

2 citations

Book ChapterDOI
08 Jun 2005
TL;DR: A new class of similarity functions, SFs, is introduced that can be used to discover properties in the feature space X and to group them with standard clustering techniques.
Abstract: Variability and noise in data-set entries make it hard to discover important regularities among association rules in mining problems. Flexible and robust similarity measures between association rules are therefore needed. This paper introduces a new class of similarity functions, SFs, that can be used to discover properties in the feature space X and to group them with standard clustering techniques. Properties of the proposed SFs are investigated, and experiments on simulated data sets are shown to evaluate the grouping performance.
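The chapter's actual SF definitions are not given in this abstract; as a purely hypothetical illustration of the pipeline it describes (a similarity between association rules, then standard clustering), the sketch below substitutes a simple Jaccard similarity on the items appearing in each rule:

```python
# Hypothetical illustration only: Jaccard similarity between association rules
# (treated as item sets), fed to standard hierarchical clustering.
# The chapter's actual SFs are not reproduced here.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rules = [                                  # toy rules: (antecedent, consequent)
    ({"bread", "butter"}, {"milk"}),
    ({"bread"}, {"milk"}),
    ({"beer"}, {"chips"}),
    ({"beer", "chips"}, {"salsa"}),
]

def jaccard(rule_a, rule_b):               # similarity on the union of items
    a = rule_a[0] | rule_a[1]
    b = rule_b[0] | rule_b[1]
    return len(a & b) / len(a | b)

n = len(rules)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - jaccard(rules[i], rules[j])

labels = fcluster(linkage(squareform(dist), method="average"),
                  t=2, criterion="maxclust")
print(labels)   # groups the bread/milk rules apart from the beer rules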

2 citations

01 Nov 1981
TL;DR: In this paper, an importance sampling technique is proposed to improve the efficiency of an acceptance/rejection generating method by adaptively partitioning the sampling region so that the variation of density values within each subregion is relatively small.
Abstract: Monte Carlo calculations often require generation of a random sample of n-dimensional points drawn from a specified multivariate probability distribution. We present an importance sampling technique that can often greatly improve the efficiency of an acceptance/rejection generating method. The importance sampling function is defined as piecewise constant on a set of subregions, which are obtained by adaptively partitioning the sampling region so that the variation of density values within each subregion is relatively small. The partitioning strategy is based on multiparameter optimization to estimate the maximum and minimum of the original density function in each subregion.
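As a hedged one-dimensional sketch of the idea (adaptive partitioning into cells where the density varies little, a piecewise-constant envelope per cell, then acceptance/rejection), here is a toy version; the report's multidimensional, optimization-based partitioning is not reproduced:

```python
# 1-D toy of the idea: split the domain until the density varies little within
# each cell, use the cell's (approximate) maximum as a piecewise-constant
# envelope, then do acceptance/rejection within cells chosen by their mass.
import numpy as np

rng = np.random.default_rng(1)
density = lambda x: np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2)  # unnormalized target

def partition(lo, hi, depth=0):
    xs = np.linspace(lo, hi, 16)
    fmax, fmin = density(xs).max(), density(xs).min()
    # grid max is only an approximate envelope; the report estimates true
    # extrema in each subregion by multiparameter optimization
    if depth >= 8 or fmax - fmin < 0.05 * fmax + 1e-12:
        return [(lo, hi, fmax)]             # variation small: keep the cell
    mid = 0.5 * (lo + hi)
    return partition(lo, mid, depth + 1) + partition(mid, hi, depth + 1)

cells = partition(0.0, 1.0)
mass = np.array([(hi - lo) * m for lo, hi, m in cells])
probs = mass / mass.sum()

def sample(n):
    out = []
    while len(out) < n:
        lo, hi, m = cells[rng.choice(len(cells), p=probs)]  # pick cell by mass
        x = rng.uniform(lo, hi)
        if rng.uniform(0, m) < density(x):                  # accept/reject
            out.append(x)
    return np.array(out)

xs = sample(1000)
print(f"sample mean ≈ {xs.mean():.3f} (target mode at 0.30)")
```

Because the envelope hugs the density inside each small cell, far fewer proposals are rejected than with a single global bound.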

2 citations

Journal ArticleDOI
TL;DR: The predictions of the multiperipheral model are compared with inclusive data in K+p and πp reactions, and the relation of this work to the multi-Regge model and to other approaches to inclusive analysis is discussed.
Abstract: The predictions of the multiperipheral model are compared to inclusive data in K+p and πp reactions. We compare with topological longitudinal momentum distributions, double differential distributions, multiplicity cross-sections, the π+/π− ratio, asymmetry characteristics, isotropy in the c.m., and Regge behavior near the kinematical limit. The agreement is reasonably good. We discuss the relation of this work to earlier work on the multi-Regge model, to results of other models, and to the results obtained by other types of approaches to the inclusive analysis. 1. INTRODUCTION. During the last two years the inclusive type of reaction(1,2) a + b → c + anything has become a popular means of studying high energy collisions. Two different approaches to this study can perhaps be distinguished. On the one hand, detailed studies have been made of the momentum distribution of particle "c" in the momentum regions near the kinematical limit. For example, comparisons of a given reaction (e.g. π + p → π + anything for slow π in the lab.)(3) at several energies have been made to test the Yang conjecture(2) of limiting distributions. Comparisons of the π distributions from proton targets with different incident particles have been made(4) to test the factorization hypothesis(5). Finally, studies of a single reaction at a single energy have been made to test the quantitative predictions of the Regge limit near the kinematical boundary(6). The advantage of this type of approach is that by examining this momentum range in such detail with these various methods, one can perhaps obtain insight into the precise character of the production process. However, the scope of the knowledge is limited: for example, little is said about the distribution at pL ≈ 0, or about its dependence on prong number, or about correlations between the spectra of different types of secondaries (for example, in a pp reaction the relation between fast produced π spectra and inelastic p spectra). On the other hand, various dynamical models have been proposed that describe the spectra over the entire momentum range. For example, we list: (a) the multiperipheral model in the exclusive form of ABFST(7), Chew and Pignotti(8), and CLA(9), and in the inclusive form of Caneschi and Pignotti(10); (b) the thermodynamical model

2 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
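As a brief usage illustration (not part of the abstract), fitting and evaluating one of scikit-learn's estimators takes only a few lines; the dataset and model choice here are arbitrary:

```python
# Minimal scikit-learn usage sketch: fit and evaluate a classifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```

The uniform fit/predict/score API across estimators is what the abstract's emphasis on consistency refers to.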

47,974 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
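DESeq2 itself is an R/Bioconductor package; as a heavily hedged toy (a normal–normal shrinkage of per-gene log fold changes toward a zero-centered prior, illustrating the shrinkage idea only, not DESeq2's actual estimators), the core effect looks like this:

```python
# Toy normal-normal shrinkage of log2 fold changes toward a zero-centered
# prior. Illustrates why shrinkage stabilizes noisy per-gene estimates;
# this is NOT DESeq2's actual method.
import numpy as np

rng = np.random.default_rng(2)
true_lfc = np.where(rng.random(1000) < 0.1,          # 10% of genes change
                    rng.normal(0, 2, 1000), 0.0)
se = rng.uniform(0.2, 1.5, 1000)                     # per-gene standard errors
obs_lfc = true_lfc + rng.normal(0, se)               # noisy observed estimates

tau2 = 1.0                                           # assumed prior variance
shrunk = obs_lfc * tau2 / (tau2 + se ** 2)           # posterior mean, N(0, tau2) prior

print(f"raw    MSE: {np.mean((obs_lfc - true_lfc) ** 2):.3f}")
print(f"shrunk MSE: {np.mean((shrunk - true_lfc) ** 2):.3f}")
```

Genes with large standard errors are pulled strongly toward zero, which is the stability-versus-noise trade-off the abstract describes.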

47,038 citations

Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
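As a short illustration of the matching pipeline the abstract describes, OpenCV exposes a SIFT implementation; the ratio test below is the standard nearest-neighbor heuristic, and the image file names are placeholders:

```python
# SIFT feature matching with the nearest-neighbor ratio test, via OpenCV.
# Image paths are placeholders; requires opencv-python >= 4.4.
import cv2

img1 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)     # two nearest neighbors per feature

# keep a match only if it is clearly better than the second-best candidate
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} matches passed the ratio test")
```

The surviving matches would then feed the Hough-transform clustering and least-squares pose verification stages described above.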

46,906 citations

Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
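LIBSVM has bindings in many languages; scikit-learn's SVC, for instance, is backed by it, so a minimal usage sketch (dataset and parameters chosen arbitrarily) looks like:

```python
# Minimal SVM fit; scikit-learn's SVC wraps LIBSVM under the hood.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, probability=True)   # probability estimates enabled
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.3f}")
```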

40,826 citations

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
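As a hedged sketch of the lasso's key mechanism (the soft-thresholding update used in modern coordinate-descent solvers; a toy implementation, not the paper's original quadratic-programming formulation):

```python
# Toy coordinate-descent lasso. Illustrates why the L1 constraint sets some
# coefficients exactly to zero; not the paper's original QP formulation.
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ w + X[:, j] * w[j]          # partial residual w/o j
            rho = X[:, j] @ r_j / n
            z = (X[:, j] @ X[:, j]) / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft-threshold
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0]) + rng.normal(size=100)
print(np.round(lasso_cd(X, y, lam=0.5), 2))   # irrelevant coefficients land at 0
```

The soft-threshold step is where the exact zeros come from: any coordinate whose correlation with the residual falls below lam is clipped to 0 rather than merely shrunk.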

40,785 citations