scispace - formally typeset
Author

Jake Vanderplas

Bio: Jake Vanderplas is an academic researcher from University of Washington. The author has contributed to research in topic(s): Python (programming language) & Weak gravitational lensing. The author has an hindex of 30, co-authored 56 publication(s) receiving 77174 citation(s).

Papers published on a yearly basis

Papers
More filters
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

33,540 citations

Posted Content
TL;DR: Scikit-learn as mentioned in this paper is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from this http URL.

28,898 citations

Journal ArticleDOI
TL;DR: SciPy as discussed by the authors is an open-source scientific computing library for the Python programming language, which has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year.
Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

6,244 citations

Journal ArticleDOI
TL;DR: SciPy as discussed by the authors is an open source scientific computing library for the Python programming language, which includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics.
Abstract: SciPy is an open source scientific computing library for the Python programming language. SciPy 1.0 was released in late 2017, about 16 years after the original version 0.1 release. SciPy has become a de facto standard for leveraging scientific algorithms in the Python programming language, with more than 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories, and millions of downloads per year. This includes usage of SciPy in almost half of all machine learning projects on GitHub, and usage by high profile projects including LIGO gravitational wave analysis and creation of the first-ever image of a black hole (M87). The library includes functionality spanning clustering, Fourier transforms, integration, interpolation, file I/O, linear algebra, image processing, orthogonal distance regression, minimization algorithms, signal processing, sparse matrix handling, computational geometry, and statistics. In this work, we provide an overview of the capabilities and development practices of the SciPy library and highlight some recent technical developments.

3,147 citations

Posted Content
TL;DR: The Large Synoptic Survey Telescope (LSST) as discussed by the authors will have an effective aperture of 6.7 meters and an imaging camera with field of view of 9.6 degrees.
Abstract: A survey that can cover the sky in optical bands over wide fields to faint magnitudes with a fast cadence will enable many of the exciting science opportunities of the next decade. The Large Synoptic Survey Telescope (LSST) will have an effective aperture of 6.7 meters and an imaging camera with field of view of 9.6 deg^2, and will be devoted to a ten-year imaging survey over 20,000 deg^2 south of +15 deg. Each pointing will be imaged 2000 times with fifteen second exposures in six broad bands from 0.35 to 1.1 microns, to a total point-source depth of r~27.5. The LSST Science Book describes the basic parameters of the LSST hardware, software, and observing plans. The book discusses educational and outreach opportunities, then goes on to describe a broad range of science that LSST will revolutionize: mapping the inner and outer Solar System, stellar populations in the Milky Way and nearby galaxies, the structure of the Milky Way disk and halo and other objects in the Local Volume, transient and variable objects both at low and high redshift, and the properties of normal and active galaxies at low and high redshift. It then turns to far-field cosmological topics, exploring properties of supernovae to z~1, strong and weak lensing, the large-scale distribution of galaxies and baryon oscillations, and how these different probes may be combined to constrain cosmological models and the physics of dark energy.

1,127 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: In this article, a combination of seven-year data from WMAP and improved astrophysical data rigorously tests the standard cosmological model and places new constraints on its basic parameters and extensions.
Abstract: The combination of seven-year data from WMAP and improved astrophysical data rigorously tests the standard cosmological model and places new constraints on its basic parameters and extensions. By combining the WMAP data with the latest distance measurements from the baryon acoustic oscillations (BAO) in the distribution of galaxies and the Hubble constant (H0) measurement, we determine the parameters of the simplest six-parameter ΛCDM model. The power-law index of the primordial power spectrum is ns = 0.968 ± 0.012 (68% CL) for this data combination, a measurement that excludes the Harrison–Zel’dovich–Peebles spectrum by 99.5% CL. The other parameters, including those beyond the minimal set, are also consistent with, and improved from, the five-year results. We find no convincing deviations from the minimal model. The seven-year temperature power spectrum gives a better determination of the third acoustic peak, which results in a better determination of the redshift of the matter-radiation equality epoch. Notable examples of improved parameters are the total mass of neutrinos, � mν < 0.58 eV (95% CL), and the effective number of neutrino species, Neff = 4.34 +0.86 −0.88 (68% CL), which benefit from better determinations of the third peak and H0. The limit on a constant dark energy equation of state parameter from WMAP+BAO+H0, without high-redshift Type Ia supernovae, is w =− 1.10 ± 0.14 (68% CL). We detect the effect of primordial helium on the temperature power spectrum and provide a new test of big bang nucleosynthesis by measuring Yp = 0.326 ± 0.075 (68% CL). We detect, and show on the map for the first time, the tangential and radial polarization patterns around hot and cold spots of temperature fluctuations, an important test of physical processes at z = 1090 and the dominance of adiabatic scalar fluctuations. The seven-year polarization data have significantly improved: we now detect the temperature–E-mode polarization cross power spectrum at 21σ , compared with 13σ from the five-year data. With the seven-year temperature–B-mode cross power spectrum, the limit on a rotation of the polarization plane due to potential parity-violating effects has improved by 38% to Δα =− 1. 1 ± 1. 4(statistical) ± 1. 5(systematic) (68% CL). We report significant detections of the Sunyaev–Zel’dovich (SZ) effect at the locations of known clusters of galaxies. The measured SZ signal agrees well with the expected signal from the X-ray data on a cluster-by-cluster basis. However, it is a factor of 0.5–0.7 times the predictions from “universal profile” of Arnaud et al., analytical models, and hydrodynamical simulations. We find, for the first time in the SZ effect, a significant difference between the cooling-flow and non-cooling-flow clusters (or relaxed and non-relaxed clusters), which can explain some of the discrepancy. This lower amplitude is consistent with the lower-than-theoretically expected SZ power spectrum recently measured by the South Pole Telescope Collaboration.

10,928 citations

Proceedings ArticleDOI
13 Aug 2016
TL;DR: XGBoost as discussed by the authors proposes a sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning to achieve state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

10,428 citations

Journal ArticleDOI
TL;DR: The second Gaia data release, Gaia DR2 as mentioned in this paper, is a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products.
Abstract: Context. We present the second Gaia data release, Gaia DR2, consisting of astrometry, photometry, radial velocities, and information on astrophysical parameters and variability, for sources brighter than magnitude 21. In addition epoch astrometry and photometry are provided for a modest sample of minor planets in the solar system. Aims: A summary of the contents of Gaia DR2 is presented, accompanied by a discussion on the differences with respect to Gaia DR1 and an overview of the main limitations which are still present in the survey. Recommendations are made on the responsible use of Gaia DR2 results. Methods: The raw data collected with the Gaia instruments during the first 22 months of the mission have been processed by the Gaia Data Processing and Analysis Consortium (DPAC) and turned into this second data release, which represents a major advance with respect to Gaia DR1 in terms of completeness, performance, and richness of the data products. Results: Gaia DR2 contains celestial positions and the apparent brightness in G for approximately 1.7 billion sources. For 1.3 billion of those sources, parallaxes and proper motions are in addition available. The sample of sources for which variability information is provided is expanded to 0.5 million stars. This data release contains four new elements: broad-band colour information in the form of the apparent brightness in the GBP (330-680 nm) and GRP (630-1050 nm) bands is available for 1.4 billion sources; median radial velocities for some 7 million sources are presented; for between 77 and 161 million sources estimates are provided of the stellar effective temperature, extinction, reddening, and radius and luminosity; and for a pre-selected list of 14 000 minor planets in the solar system epoch astrometry and photometry are presented. Finally, Gaia DR2 also represents a new materialisation of the celestial reference frame in the optical, the Gaia-CRF2, which is the first optical reference frame based solely on extragalactic sources. There are notable changes in the photometric system and the catalogue source list with respect to Gaia DR1, and we stress the need to consider the two data releases as independent. Conclusions: Gaia DR2 represents a major achievement for the Gaia mission, delivering on the long standing promise to provide parallaxes and proper motions for over 1 billion stars, and representing a first step in the availability of complementary radial velocity and source astrophysical information for a sample of stars in the Gaia survey which covers a very substantial fraction of the volume of our galaxy.

7,024 citations

Journal ArticleDOI
Peter A. R. Ade1, Nabila Aghanim2, C. Armitage-Caplan3, Monique Arnaud4  +324 moreInstitutions (70)
TL;DR: In this paper, the authors present the first cosmological results based on Planck measurements of the cosmic microwave background (CMB) temperature and lensing-potential power spectra, which are extremely well described by the standard spatially-flat six-parameter ΛCDM cosmology with a power-law spectrum of adiabatic scalar perturbations.
Abstract: This paper presents the first cosmological results based on Planck measurements of the cosmic microwave background (CMB) temperature and lensing-potential power spectra. We find that the Planck spectra at high multipoles (l ≳ 40) are extremely well described by the standard spatially-flat six-parameter ΛCDM cosmology with a power-law spectrum of adiabatic scalar perturbations. Within the context of this cosmology, the Planck data determine the cosmological parameters to high precision: the angular size of the sound horizon at recombination, the physical densities of baryons and cold dark matter, and the scalar spectral index are estimated to be θ∗ = (1.04147 ± 0.00062) × 10-2, Ωbh2 = 0.02205 ± 0.00028, Ωch2 = 0.1199 ± 0.0027, and ns = 0.9603 ± 0.0073, respectively(note that in this abstract we quote 68% errors on measured parameters and 95% upper limits on other parameters). For this cosmology, we find a low value of the Hubble constant, H0 = (67.3 ± 1.2) km s-1 Mpc-1, and a high value of the matter density parameter, Ωm = 0.315 ± 0.017. These values are in tension with recent direct measurements of H0 and the magnitude-redshift relation for Type Ia supernovae, but are in excellent agreement with geometrical constraints from baryon acoustic oscillation (BAO) surveys. Including curvature, we find that the Universe is consistent with spatial flatness to percent level precision using Planck CMB data alone. We use high-resolution CMB data together with Planck to provide greater control on extragalactic foreground components in an investigation of extensions to the six-parameter ΛCDM model. We present selected results from a large grid of cosmological models, using a range of additional astrophysical data sets in addition to Planck and high-resolution CMB data. None of these models are favoured over the standard six-parameter ΛCDM cosmology. The deviation of the scalar spectral index from unity isinsensitive to the addition of tensor modes and to changes in the matter content of the Universe. We find an upper limit of r0.002< 0.11 on the tensor-to-scalar ratio. There is no evidence for additional neutrino-like relativistic particles beyond the three families of neutrinos in the standard model. Using BAO and CMB data, we find Neff = 3.30 ± 0.27 for the effective number of relativistic degrees of freedom, and an upper limit of 0.23 eV for the sum of neutrino masses. Our results are in excellent agreement with big bang nucleosynthesis and the standard value of Neff = 3.046. We find no evidence for dynamical dark energy; using BAO and CMB data, the dark energy equation of state parameter is constrained to be w = -1.13-0.10+0.13. We also use the Planck data to set limits on a possible variation of the fine-structure constant, dark matter annihilation and primordial magnetic fields. Despite the success of the six-parameter ΛCDM model in describing the Planck data at high multipoles, we note that this cosmology does not provide a good fit to the temperature power spectrum at low multipoles. The unusual shape of the spectrum in the multipole range 20 ≲ l ≲ 40 was seen previously in the WMAP data and is a real feature of the primordial CMB anisotropies. The poor fit to the spectrum at low multipoles is not of decisive significance, but is an “anomaly” in an otherwise self-consistent analysis of the Planck temperature data.

6,641 citations

Proceedings ArticleDOI
13 Aug 2016
TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally varound the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.

6,284 citations