Author

Jerome H. Friedman

Other affiliations: University of Washington
Bio: Jerome H. Friedman is an academic researcher from Stanford University. The author has contributed to research in topics: Lasso (statistics) & Multivariate statistics. The author has an h-index of 70 and has co-authored 155 publications receiving 138,619 citations. Previous affiliations of Jerome H. Friedman include the University of Washington.


Papers
ReportDOI
01 Jul 1982
TL;DR: A variable-span scatterplot smoother based on local linear fits is described, with a rejection rule that makes it resistant to outliers; computationally efficient algorithms using updating formulas, together with corresponding FORTRAN subroutines, are presented.
Abstract: A variable-span scatterplot smoother based on local linear fits is described. Local cross-validation is used to estimate the optimal span as a function of abscissa value. A rejection rule is suggested to make the smoother resistant to outliers. Computationally efficient algorithms making use of updating formulas, and corresponding FORTRAN subroutines, are presented.
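
As a rough illustration of the idea (not the report's FORTRAN implementation, which uses updating formulas for speed and smooths the cross-validation residuals before selecting the span), here is a toy Python sketch of a variable-span local linear smoother with per-point leave-one-out span selection; the candidate spans and demo data are invented for the example:

```python
import numpy as np

def local_linear_fit(x, y, x0, span):
    """Fit a least-squares line to the span-fraction of points nearest x0
    and evaluate it at x0."""
    k = max(2, int(span * len(x)))
    idx = np.argsort(np.abs(x - x0))[:k]
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    return intercept + slope * x0

def variable_span_smooth(x, y, spans=(0.05, 0.2, 0.5)):
    """At each point, pick the candidate span with the smallest
    leave-one-out squared residual, then fit with that span."""
    n = len(x)
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i  # leave point i out
        errs = [(y[i] - local_linear_fit(x[mask], y[mask], x[i], s)) ** 2
                for s in spans]
        out[i] = local_linear_fit(x[mask], y[mask], x[i],
                                  spans[int(np.argmin(errs))])
    return out

# tiny demo on noisy synthetic data
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(4 * np.pi * x) + rng.normal(0, 0.3, size=200)
smoothed = variable_span_smooth(x, y)
```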

72 citations

Journal ArticleDOI
TL;DR: In this paper, three statistical approaches (CARP, Patient Rule Induction Method (PRIM), and ModeMap) have been used to define compositional populations within a large database (n > 13,000) of Cr-pyrope garnets from the subcontinental lithospheric mantle (SCLM).
Abstract: Three novel statistical approaches (Cluster Analysis by Regressive Partitioning [CARP], Patient Rule Induction Method [PRIM], and ModeMap) have been used to define compositional populations within a large database (n > 13,000) of Cr-pyrope garnets from the subcontinental lithospheric mantle (SCLM). The variables used are the major oxides and proton-microprobe data for Zn, Ga, Sr, Y, and Zr. Because the rules defining these populations (classes) are expressed in simple compositional variables, they are easily applied to new samples and other databases. The classes defined by the three methods show strong similarities and correlations, suggesting that they are statistically meaningful. The geological significance of the classes has been tested by classifying garnets from 184 mantle-derived peridotite xenoliths and from a smaller database (n > 5400) of garnets analyzed for >20 trace elements by laser ablation microprobe–inductively coupled plasma-mass spectrometry (LAM–ICPMS). The relative abundances of these classes in the lithospheric mantle vary widely across different tectonic settings, and some classes are absent or very rare in either Archean or Phanerozoic SCLM. Their distribution with depth also varies widely within individual lithospheric sections and between different sections of similar tectonothermal age. These garnet classes therefore are a useful tool for mapping the geology of the SCLM. Archean SCLM sections show high degrees of depletion and varying degrees of metasomatism, and they are commonly strongly layered. Several Proterozoic SCLM sections show a concentration of more depleted material near their base, grading upward into more fertile lherzolites. The distribution of garnet classes reflecting low-T phlogopite-related metasomatism and high-T melt-related metasomatism suggests that many of these Proterozoic SCLM sections consist of strongly metasomatized Archean SCLM. The garnet-facies SCLM beneath Phanerozoic terrains is only mildly depleted relative to Primitive Upper Mantle (PUM) compositions. These data emphasize the secular evolution of SCLM composition defined earlier [Griffin et al., 1998, 1999a] and suggest that at least part of this evolutionary trend reflects reworking and refertilization of SCLM formed in Archean time.
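
Of the three methods, PRIM is concrete enough to sketch: it finds a "box" (simple per-variable bounds, exactly the kind of compositional rules the abstract mentions) inside which the mean of a target is high. Below is a hedged, minimal Python toy of PRIM's peeling phase only; the published method also "pastes" boxes back out and extracts multiple boxes, and the thresholds here are invented:

```python
import numpy as np

def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Greedily peel away an alpha-fraction slice (low or high end of one
    variable) whenever doing so raises the mean of y inside the box.
    Returns per-variable [lower, upper] bounds and the inside mask."""
    n, p = X.shape
    inside = np.ones(n, dtype=bool)
    box = [[-np.inf, np.inf] for _ in range(p)]
    while inside.mean() > min_support:
        best = None  # (mean, variable, side, cut, new inside-mask)
        for j in range(p):
            lo, hi = np.quantile(X[inside, j], [alpha, 1 - alpha])
            for side, cut, keep in (("lo", lo, X[:, j] >= lo),
                                    ("hi", hi, X[:, j] <= hi)):
                trial = inside & keep
                if trial.any():
                    m = y[trial].mean()
                    if best is None or m > best[0]:
                        best = (m, j, side, cut, trial)
        if best is None or best[0] <= y[inside].mean():
            break  # no peel improves the box mean
        _, j, side, cut, inside = best
        box[j][0 if side == "lo" else 1] = cut
    return box, inside
```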

72 citations

BookDOI
01 Jan 1986
TL;DR: The Third International Workshop on Data Analysis in Astronomy, held at the Ettore Majorana Centre for Scientific Culture, Erice, Sicily, Italy, on June 20-27, 1988, was the natural evolution of the two previous workshops.
Abstract: The book reports the main results presented at the Third International Workshop on Data Analysis in Astronomy, held at the Ettore Majorana Centre for Scientific Culture, Erice, Sicily, Italy, on June 20-27, 1988. The Workshop was the natural evolution of the two previous ones. The main goal of the first edition (Erice 1984) was to start a scientific interaction between astronomers and computer scientists. The aim of the second (Erice 1986) was to look at the progress in data analysis methods and dedicated hardware technology. Data analysis problems become harder whenever the data are poor in statistics or the signal is weak and embedded in structured background. Experiments collecting data of such a nature require new and non-standard methodologies. Possibilistic approaches could be merged with statistical ones in order to formalize all the knowledge used by scientists to reach conclusions. Moreover, the last decade has been characterized by very fast development of intelligent systems for data analysis (knowledge-based systems, …) that would be useful to support astronomers in complex decision making. For these reasons, the last edition of the workshop was intended to provide an overview of the state of the art in data analysis methodologies and tools on the new frontiers of astrophysics (γ-astronomy, neutrino astronomy, gravitational waves, background radiation, and the extreme cosmic-ray energy spectrum). The book is organized in two sections: data analysis methods and tools, and new frontiers in astronomy.

71 citations

Journal ArticleDOI
TL;DR: In this paper, graphical methods for comparing multivariate samples are presented; they are based on minimal spanning tree techniques developed for multivariate two-sample tests and are illustrated through examples using both real and artificial data.
Abstract: Some graphical methods for comparing multivariate samples are presented. These methods are based on minimal spanning tree techniques developed for multivariate two-sample tests. The utility of these methods is illustrated through examples using both real and artificial data.
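
The core quantity behind these displays, the minimal spanning tree (MST) of the pooled sample and its cross-sample edges (the statistic of the Friedman-Rafsky two-sample test), is easy to sketch. A minimal Python version using SciPy, with invented demo data, might look like:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def cross_sample_edges(X, labels):
    """Build the Euclidean MST over the pooled sample and count edges that
    join points from different samples; unusually few cross-edges are
    evidence that the two samples differ."""
    mst = minimum_spanning_tree(squareform(pdist(X)))
    i, j = mst.nonzero()
    return int(np.sum(labels[i] != labels[j])), len(i)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
labels = np.repeat([0, 1], 50)
print(cross_sample_edges(X, labels))  # (cross edges, total edges = n - 1)
```

The paper's graphical methods go further by plotting the tree itself; the count above is just the summary statistic those plots are built around.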

66 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
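
The ease of use and API consistency the abstract emphasizes come from a uniform estimator interface. A typical fit/score round-trip (illustrative usage, not taken from the paper) looks like:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)           # every estimator exposes fit()
print(clf.score(X_te, y_te))  # and predict()/score()
```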

47,974 citations

Journal ArticleDOI
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
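
DESeq2 itself is an R/Bioconductor package. Purely as a toy illustration of the shrinkage idea the abstract describes (a zero-centered prior pulling noisy fold-change estimates toward zero), consider this Python sketch; the prior width here is an arbitrary assumption, whereas DESeq2 estimates its priors from the data:

```python
def shrink_lfc(lfc_mle, se, prior_sd=1.0):
    """Posterior mean of a log fold change under a Normal(0, prior_sd^2)
    prior and a Normal(lfc_mle, se^2) likelihood: noisy estimates (large
    se) are pulled strongly toward zero; precise ones barely move."""
    w = prior_sd**2 / (prior_sd**2 + se**2)
    return w * lfc_mle

print(shrink_lfc(3.0, se=2.0))  # noisy estimate, heavily shrunk (0.6)
print(shrink_lfc(3.0, se=0.2))  # precise estimate, nearly unchanged
```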

47,038 citations

Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
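
SIFT is available in OpenCV (as cv2.SIFT_create in recent versions, 4.4+). A minimal matching sketch with the paper's nearest-neighbor ratio test, using hypothetical file names, might look like:

```python
import cv2

img1 = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# nearest-neighbor matching; keep a match only if its best distance is
# well below the second best (Lowe's ratio test)
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(f"{len(good)} putative matches")
```

The full recognition pipeline described in the abstract would then feed these matches into a Hough-transform clustering step and a least-squares pose verification, which are omitted here.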

46,906 citations

Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
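
scikit-learn's SVC is documented as being built on LIBSVM, so the probability estimates and parameter selection the article discusses can be illustrated from Python (the grid values below are arbitrary choices for the example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# cross-validated selection of C and the RBF kernel width;
# probability=True enables LIBSVM's probability estimates
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC(probability=True)),
    {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.001]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```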

40,826 citations

Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
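
The constrained problem is equivalent to the penalized form most software solves, minimizing ||y - Xβ||² + λΣ|βⱼ|. A small Python illustration of the "exact zeros" property using scikit-learn's Lasso (synthetic data and an arbitrary penalty, purely for demonstration):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
# only the first three predictors actually matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(size=100)

lasso = Lasso(alpha=0.5).fit(X, y)
print(np.round(lasso.coef_, 2))  # most coefficients are exactly 0
```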

40,785 citations