Showing papers by "Jake Vanderplas" published in 2011


Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations
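
The API consistency that the abstract emphasizes can be illustrated with a minimal sketch. It assumes a recent scikit-learn installation; the toy dataset and the choice of `LogisticRegression` and `KMeans` are illustrative assumptions, not taken from the paper. A supervised and an unsupervised estimator are driven through the same `fit`/`predict` calls.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy data (an assumption for illustration): 150 points in 3 clusters.
X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# Supervised estimator: fit on features and labels, then predict labels.
clf = LogisticRegression(max_iter=1000).fit(X, y)
labels_supervised = clf.predict(X)

# Unsupervised estimator: same method names, but fit needs no labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels_unsupervised = km.predict(X)

print(labels_supervised[:5], labels_unsupervised[:5])
```

Because both objects expose the same method names, swapping one estimator for another usually changes only the constructor line.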


Proceedings Article
14 Jun 2011
TL;DR: Generative models for detecting group anomalies, which are larger scale phenomena that only become apparent when groups of points are considered, are proposed.
Abstract: Statistical anomaly detection typically focuses on finding individual point anomalies. Often the most interesting or unusual things in a data set are not odd individual points, but rather larger scale phenomena that only become apparent when groups of points are considered. In this paper, we propose generative models for detecting such group anomalies. We evaluate our methods on synthetic data as well as astronomical data from the Sloan Digital Sky Survey. The empirical results show that the proposed models are effective in detecting group anomalies.

95 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study observable deviations from GR in the disks of late-type dwarf galaxies moving under gravity and find four distinct observable effects: displacement of the stellar disk from the HI disk, warping of the stellar disk along the direction of the external force, enhancement of the HI rotation curve relative to that of the stellar disk, and asymmetry in the stellar rotation curve.
Abstract: In modified gravity theories that seek to explain cosmic acceleration, dwarf galaxies in low density environments can be subject to enhanced forces. The class of scalar-tensor theories, which includes f(R) gravity, predicts such a force enhancement (massive galaxies like the Milky Way can evade it through a screening mechanism that protects the interior of the galaxy from this "fifth" force). We study observable deviations from GR in the disks of late-type dwarf galaxies moving under gravity. The fifth force acts on the dark matter and HI gas disk, but not on the stellar disk owing to the self-screening of main sequence stars. We find four distinct observable effects in such disk galaxies: 1. A displacement of the stellar disk from the HI disk. 2. Warping of the stellar disk along the direction of the external force. 3. Enhancement of the rotation curve measured from the HI gas compared to that of the stellar disk. 4. Asymmetry in the rotation curve of the stellar disk. We estimate that the spatial effects can be up to 1 kpc and the rotation velocity effects about 10 km/s in infalling dwarf galaxies. Such deviations are measurable: we expect that with a careful analysis of a sample of nearby dwarf galaxies one can improve astrophysical constraints on gravity theories by over three orders of magnitude, and even solar system constraints by one order of magnitude. Thus effective tests of gravity along the lines suggested by Hui, Nicolis, and Stubbs (2009) and Jain (2011) can be carried out with low-redshift galaxies, though care must be exercised in understanding possible complications from astrophysical effects.

68 citations


Journal ArticleDOI
TL;DR: In this article, the authors study observable deviations from GR in the disks of late-type dwarf galaxies moving under gravity and find four distinct observable effects: displacement of the stellar disk from the HI disk, warping of the stellar disk along the direction of the external force, enhancement of the HI rotation curve relative to that of the stellar disk, and asymmetry in the stellar rotation curve.
Abstract: In modified gravity theories that seek to explain cosmic acceleration, dwarf galaxies in low density environments can be subject to enhanced forces. The class of scalar-tensor theories, which includes f(R) gravity, predicts such a force enhancement (massive galaxies like the Milky Way can evade it through a screening mechanism that protects the interior of the galaxy from this "fifth" force). We study observable deviations from GR in the disks of late-type dwarf galaxies moving under gravity. The fifth force acts on the dark matter and HI gas disk, but not on the stellar disk owing to the self-screening of main sequence stars. We find four distinct observable effects in such disk galaxies: 1. A displacement of the stellar disk from the HI disk. 2. Warping of the stellar disk along the direction of the external force. 3. Enhancement of the rotation curve measured from the HI gas compared to that of the stellar disk. 4. Asymmetry in the rotation curve of the stellar disk. We estimate that the spatial effects can be up to 1 kpc and the rotation velocity effects about 10 km/s in infalling dwarf galaxies. Such deviations are measurable: we expect that with a careful analysis of a sample of nearby dwarf galaxies one can improve astrophysical constraints on gravity theories by over three orders of magnitude, and even solar system constraints by one order of magnitude. Thus effective tests of gravity along the lines suggested by Hui et al. (2009) and Jain (2011) can be carried out with low-redshift galaxies, though care must be exercised in understanding possible complications from astrophysical effects.

54 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigated the use of dimensionality reduction techniques for the classification of stellar spectra selected from the Sloan Digital Sky Survey (SDSS) using local linear embedding (LLE), a technique that preserves the local structure within high-dimensional data sets.
Abstract: We investigate the use of dimensionality reduction techniques for the classification of stellar spectra selected from the Sloan Digital Sky Survey. Using local linear embedding (LLE), a technique that preserves the local (and possibly nonlinear) structure within high-dimensional data sets, we show that the majority of stellar spectra can be represented as a one-dimensional sequence within a three-dimensional space. The position along this sequence is highly correlated with spectral temperature. Deviations from this 'stellar locus' are indicative of spectra with strong emission lines (including misclassified galaxies) or broad absorption lines (e.g., carbon stars). Based on this analysis, we propose a hierarchical classification scheme using LLE that progressively identifies and classifies stellar spectra in a manner that requires no feature extraction and that can reproduce the classic MK classifications to an accuracy of one type.

39 citations
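
The embedding step described in the abstract can be sketched as follows. This is a toy illustration, not the authors' pipeline: it assumes scikit-learn's `LocallyLinearEmbedding` as the LLE implementation, and the synthetic "spectra", the temperature-like parameter `t`, and the neighbor count are all hypothetical choices. Each spectrum is a point in a high-dimensional space, and LLE maps it into three dimensions.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
n_spectra, n_wavelengths = 200, 50

# Hypothetical temperature-like parameter that orders the spectra.
t = np.sort(rng.uniform(0.0, 1.0, n_spectra))
wave = np.linspace(0.0, 1.0, n_wavelengths)

# Toy "spectra": a smooth bump whose position shifts with t, plus small noise.
spectra = np.exp(-0.5 * ((wave[None, :] - t[:, None]) / 0.2) ** 2)
spectra += rng.normal(scale=0.01, size=spectra.shape)

# Embed the 50-dimensional spectra into a three-dimensional space.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=3, random_state=0)
embedding = lle.fit_transform(spectra)

print(embedding.shape)
```

Since the toy spectra depend on a single parameter, the embedded points would be expected to trace a curve in the three-dimensional space, loosely analogous to the one-dimensional sequence the abstract reports.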


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a hierarchical classification scheme using LLE that progressively identifies and classifies stellar spectra in a manner that requires no feature extraction and that can reproduce the classic MK classifications to an accuracy of one type.
Abstract: We investigate the use of dimensionality reduction techniques for the classification of stellar spectra selected from the SDSS. Using local linear embedding (LLE), a technique that preserves the local (and possibly non-linear) structure within high-dimensional data sets, we show that the majority of stellar spectra can be represented as a one-dimensional sequence within a three-dimensional space. The position along this sequence is highly correlated with spectral temperature. Deviations from this "stellar locus" are indicative of spectra with strong emission lines (including misclassified galaxies) or broad absorption lines (e.g., carbon stars). Based on this analysis, we propose a hierarchical classification scheme using LLE that progressively identifies and classifies stellar spectra in a manner that requires no feature extraction and that can reproduce the classic MK classifications to an accuracy of one type.

34 citations


Journal ArticleDOI
TL;DR: In this paper, truncation of singular values is used to construct three-dimensional mass maps from gravitational lensing shear data, yielding near-optimal angular resolution and allowing cluster-sized halos to be de-blended robustly.
Abstract: We present a new method for constructing three-dimensional mass maps from gravitational lensing shear data. We solve the lensing inversion problem using truncation of singular values (within the context of generalized least-squares estimation) without a priori assumptions about the statistical nature of the signal. This singular value framework allows a quantitative comparison between different filtering methods: we evaluate our method beside the previously explored Wiener-filter approaches. Our method yields near-optimal angular resolution of the lensing reconstruction and allows cluster-sized halos to be de-blended robustly. It allows for mass reconstructions which are two to three orders of magnitude faster than the Wiener-filter approach; in particular, we estimate that an all-sky reconstruction with arcminute resolution could be performed on a timescale of hours. We find, however, that linear, non-parametric reconstructions have a fundamental limitation in the resolution achieved in the redshift direction.

33 citations
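
The idea of truncating singular values inside a least-squares inversion can be sketched with NumPy. This is a generic toy problem under stated assumptions, not the paper's lensing operator: the smoothing matrix `M`, the diagonal noise model, and the helper name `truncated_svd_solve` are all hypothetical.

```python
import numpy as np

def truncated_svd_solve(M, d, noise_var, n_keep):
    """Estimate s in d = M s + noise, keeping the n_keep largest singular values."""
    w = 1.0 / np.sqrt(noise_var)      # whitening weights for diagonal noise
    M_w = M * w[:, None]              # noise-weighted operator
    d_w = d * w                       # noise-weighted data
    U, sigma, Vt = np.linalg.svd(M_w, full_matrices=False)
    coeffs = (U[:, :n_keep].T @ d_w) / sigma[:n_keep]
    return Vt[:n_keep].T @ coeffs

rng = np.random.default_rng(0)
n_data, n_signal = 60, 40

# Hypothetical ill-conditioned forward operator: Gaussian smoothing.
x_d = np.linspace(0.0, 1.0, n_data)[:, None]
x_s = np.linspace(0.0, 1.0, n_signal)[None, :]
M = np.exp(-0.5 * ((x_d - x_s) / 0.1) ** 2)

s_true = np.sin(2.0 * np.pi * x_s.ravel())
noise_var = np.full(n_data, 0.01)
d = M @ s_true + rng.normal(scale=np.sqrt(noise_var))

s_trunc = truncated_svd_solve(M, d, noise_var, n_keep=8)
s_noisy = truncated_svd_solve(M, d, noise_var, n_keep=20)

err_trunc = np.linalg.norm(s_trunc - s_true)
err_noisy = np.linalg.norm(s_noisy - s_true)
print(err_trunc, err_noisy)
```

Keeping too many singular values divides the noise by very small numbers, so in this toy setup the reconstruction with 20 modes is far noisier than the one with 8. The cutoff acts as the filter, which is what makes a comparison with other linear filters possible within one framework.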


Journal ArticleDOI
TL;DR: In this article, the authors explore the utility of Karhunen-Loève (KL) analysis in solving practical problems in the analysis of gravitational shear surveys and develop a method to use two-dimensional KL eigenmodes of shear to interpolate noisy shear measurements across masked regions.
Abstract: We explore the utility of Karhunen-Loève (KL) analysis in solving practical problems in the analysis of gravitational shear surveys. Shear catalogs from large-field weak lensing surveys will be subject to many systematic limitations, notably incomplete coverage and pixel-level masking due to foreground sources. We develop a method to use two-dimensional KL eigenmodes of shear to interpolate noisy shear measurements across masked regions. We explore the results of this method with simulated shear catalogs, using statistics of high-convergence regions in the resulting map. We find that the KL procedure not only minimizes the bias due to masked regions in the field, it also reduces spurious peak counts from shape noise by a factor of ~3 in the cosmologically sensitive regime. This indicates that KL reconstructions of masked shear are not only useful for creating robust convergence maps from masked shear catalogs, but also offer promise of improved parameter constraints within studies of shear peak statistics.

1 citation
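
Interpolating across a masked region with KL eigenmodes can be sketched in one dimension with NumPy. This is a simplified stand-in for the two-dimensional shear case, under loud assumptions: the squared-exponential covariance, the noise level, the mask position, and the plain least-squares fit are illustrative choices, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_modes = 100, 10
x = np.linspace(0.0, 1.0, n_pix)

# Assumed signal covariance (hypothetical): squared-exponential, length 0.1.
cov = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.1) ** 2)

# KL eigenmodes are the covariance eigenvectors, sorted by decreasing eigenvalue.
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evals = np.clip(evals[order], 0.0, None)   # guard against tiny negative values
evecs = evecs[:, order]
modes = evecs[:, :n_modes]

# Simulate a signal with that covariance, add noise, and mask ten pixels.
signal = evecs @ (np.sqrt(evals) * rng.normal(size=n_pix))
noisy = signal + rng.normal(scale=0.3, size=n_pix)
observed = np.ones(n_pix, dtype=bool)
observed[40:50] = False

# Fit mode coefficients on observed pixels only, then evaluate everywhere.
coeffs, *_ = np.linalg.lstsq(modes[observed], noisy[observed], rcond=None)
reconstruction = modes @ coeffs

rms_noisy = np.sqrt(np.mean((noisy - signal) ** 2))
rms_recon = np.sqrt(np.mean((reconstruction - signal) ** 2))
print(rms_noisy, rms_recon)
```

Because only a few smooth modes are kept, the fit both fills the masked pixels and suppresses pixel-level noise, which mirrors the two benefits the abstract describes.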