Author

Jean-Michel Loubes

Bio: Jean-Michel Loubes is an academic researcher from Institut de Mathématiques de Toulouse. The author has contributed to research in topics: Estimator & Inverse problem. The author has an h-index of 23 and has co-authored 184 publications receiving 9,133 citations. Previous affiliations of Jean-Michel Loubes include Centre national de la recherche scientifique & Département de Mathématiques.


Papers
Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
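As an illustration of the path computation the abstract describes, here is a minimal sketch using scikit-learn's lars_path, an independent implementation rather than the code released with the paper; the synthetic data and all parameter values are placeholders.

```python
# Minimal sketch: tracing the LARS and Lasso-modified LARS paths with
# scikit-learn (an independent implementation, not the paper's own code).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Synthetic data standing in for a "large collection of possible covariates".
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

# method="lar" gives plain Least Angle Regression; method="lasso" applies the
# simple modification that yields all Lasso solutions along the same path.
alphas, active, coefs = lars_path(X, y, method="lasso")

# coefs has one column per step of the path; the active set grows one
# covariate at a time (and may shrink under the Lasso modification).
print("number of steps on the path:", coefs.shape[1])
print("order in which covariates entered:", active)
```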

7,828 citations

Journal ArticleDOI
TL;DR: The LARS method, as discussed by the authors, is based on a recursive procedure that selects, at each step, the covariates having the largest absolute correlation with the response variable; with slight modifications it recovers the estimates given by the Lasso and Stagewise.
Abstract: DISCUSSION OF "LEAST ANGLE REGRESSION" BY EFRON ET AL. By Jean-Michel Loubes and Pascal Massart, Université Paris-Sud. The issue of model selection has drawn the attention of both applied and theoretical statisticians for a long time. Indeed, there has been an enormous range of contribution in model selection proposals, including work by Akaike (1973), Mallows (1973), Foster and George (1994), Birgé and Massart (2001a) and Abramovich, Benjamini, Donoho and Johnstone (2000). Over the last decade, modern computer-driven methods have been developed such as All Subsets, Forward Selection, Forward Stagewise or Lasso. Such methods are useful in the setting of the standard linear model, where we observe noisy data and wish to predict the response variable using only a few covariates, since they provide automatically linear models that fit the data. The procedure described in this paper is, on the one hand, numerically very efficient and, on the other hand, very general, since, with slight modifications, it enables us to recover the estimates given by the Lasso and Stagewise. 1. Estimation procedure. The "LARS" method is based on a recursive procedure selecting, at each step, the covariates having largest absolute correlation with the response y. In the case of an orthogonal design, the estimates can then be viewed as an l…
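A toy sketch of the selection rule quoted in this discussion, choosing the covariate with the largest absolute correlation with the current residual; the simulated data and the column standardization are assumptions made only for illustration.

```python
# Toy illustration of the rule that drives LARS: at each step the covariate
# most correlated with the current residual enters the active set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
X /= np.linalg.norm(X, axis=0)          # standardize columns, as LARS assumes
y = X[:, 2] + 0.5 * X[:, 5] + 0.1 * rng.standard_normal(50)

residual = y - y.mean()
correlations = X.T @ residual           # correlations with the response
j = int(np.argmax(np.abs(correlations)))
print("first covariate selected:", j)   # expected: index 2 for this data
```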

341 citations

Journal ArticleDOI
TL;DR: A new distance, the symmetrized segment-path distance (SSPD), is introduced and compared to existing trajectory distances through the clustering results they yield under both hierarchical clustering and affinity propagation.
Abstract: In this paper, we tackle the issue of clustering trajectories of geolocalized observations based on the distance between trajectories. We first provide a comprehensive review of the different distances used in the literature to compare trajectories. Then, based on the limitations of these methods, we introduce a new distance: the symmetrized segment-path distance (SSPD). We compare this new distance to the others according to their corresponding clustering results obtained using both the hierarchical clustering and affinity propagation methods. We finally present a Python package, trajectory distance, which contains methods for computing the SSPD distance and the other distances reviewed in this paper.
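Below is a hedged sketch of a symmetrized segment-path style distance between two 2-D trajectories: each point of one trajectory is matched to the closest segment of the other polyline, the resulting distances are averaged, and the two directions are symmetrized. This is a simplified reading of SSPD written from the description above, not the reference implementation from the authors' package.

```python
# Simplified segment-path distance between 2-D trajectories (not the
# reference implementation of SSPD).
import numpy as np

def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    ab, ap = b - a, p - a
    denom = float(np.dot(ab, ab))
    t = 0.0 if denom == 0.0 else float(np.clip(np.dot(ap, ab) / denom, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

def point_to_trajectory(p, traj):
    """Distance from point p to the polyline defined by traj (n x 2 array)."""
    return min(point_to_segment(p, traj[i], traj[i + 1])
               for i in range(len(traj) - 1))

def spd(t1, t2):
    """One-sided segment-path distance: mean distance of t1's points to t2."""
    return float(np.mean([point_to_trajectory(p, t2) for p in t1]))

def sspd(t1, t2):
    """Symmetrized segment-path distance."""
    return 0.5 * (spd(t1, t2) + spd(t2, t1))

t1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1]])
t2 = np.array([[0.0, 0.5], [1.0, 0.6], [2.0, 0.4]])
print(sspd(t1, t2))
```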

127 citations

Journal ArticleDOI
TL;DR: In this article, the existence of Wasserstein barycenters of random distributions defined on a geodesic space was proved and the consistency of this barycenter in a general setting was established.
Abstract: In this paper, based on the Fréchet mean, we define a notion of barycenter corresponding to a usual notion of statistical mean. We prove the existence of Wasserstein barycenters of random distributions defined on a geodesic space (E, d). We also prove the consistency of this barycenter in a general setting, that includes taking barycenters of empirical versions of the distributions or of a growing set of distributions.
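A small numerical illustration, not taken from the paper: on the real line (a geodesic space), the Wasserstein barycenter of a family of distributions can be computed by averaging their quantile functions; the Gaussian inputs and the probability grid are arbitrary choices.

```python
# 1-D Wasserstein barycenter by quantile averaging (illustrative example only).
import numpy as np

rng = np.random.default_rng(0)
samples = [rng.normal(loc=m, scale=s, size=2000)
           for m, s in [(-2.0, 1.0), (0.0, 0.5), (3.0, 2.0)]]

# Empirical quantile functions evaluated on a common grid of probabilities.
probs = np.linspace(0.005, 0.995, 200)
quantiles = np.array([np.quantile(x, probs) for x in samples])

# Equal-weight barycenter: its quantile function is the mean of the inputs'.
barycenter_quantiles = quantiles.mean(axis=0)
print(barycenter_quantiles[:5])
```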

101 citations

Journal ArticleDOI
TL;DR: In this article, the authors tackle the problem of comparing distributions of random variables and defining a mean pattern between a sample of random events using barycenters of measures in the Wasserstein space, and propose an iterative version as an estimation of the mean distribution.
Abstract: In this paper we tackle the problem of comparing distributions of random variables and defining a mean pattern between a sample of random events. Using barycenters of measures in the Wasserstein space, we propose an iterative version as an estimation of the mean distribution. Moreover, when the distributions are a common measure warped by a centered random operator, the barycenter makes it possible to recover this distribution template.
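A toy simulation of the template-recovery claim under assumptions of my own (one-dimensional distributions and centred location-scale warps): averaging quantile functions of the warped samples approximately returns the template's quantile function. This stands in for, and is not, the iterative estimator proposed in the paper.

```python
# Toy template recovery under random, centred location-scale warps (1-D).
import numpy as np

rng = np.random.default_rng(1)
probs = np.linspace(0.01, 0.99, 99)
template_q = np.quantile(rng.normal(0.0, 1.0, size=5000), probs)  # template: N(0, 1)

warped_q = []
for _ in range(30):
    shift = rng.normal(0.0, 0.5)           # centred random shift
    scale = 1.0 + rng.normal(0.0, 0.1)     # random scale with mean one
    warped_q.append(scale * template_q + shift)

# Averaging the warped quantile functions (the 1-D Wasserstein barycenter)
# approximately recovers the template's quantile function.
barycenter_q = np.mean(warped_q, axis=0)
print("max gap between barycenter and template quantiles:",
      np.max(np.abs(barycenter_q - template_q)))
```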

88 citations


Cited by
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
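A minimal usage sketch of the estimator API the abstract describes, run on one of the bundled datasets; the particular classifier and split are arbitrary choices.

```python
# Minimal scikit-learn workflow: load data, split, fit an estimator, score it.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```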

47,974 citations

Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
Abstract: We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
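A hedged sketch contrasting the elastic net and the lasso using scikit-learn's coordinate-descent solvers rather than the LARS-EN algorithm proposed in the paper; the simulated p ≫ n design and the penalty values are placeholders.

```python
# Elastic net vs. lasso on a p >> n design with correlated predictors
# (scikit-learn solvers, not the paper's LARS-EN).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# p much larger than n, with a low effective rank to induce correlated features.
X, y = make_regression(n_samples=50, n_features=200, n_informative=10,
                       effective_rank=20, noise=1.0, random_state=0)

# l1_ratio mixes the lasso (l1) and ridge (l2) penalties.
enet = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000).fit(X, y)
lasso = Lasso(alpha=0.1, max_iter=10_000).fit(X, y)

print("nonzero coefficients, elastic net:", int(np.sum(enet.coef_ != 0)))
print("nonzero coefficients, lasso:      ", int(np.sum(lasso.coef_ != 0)))
```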

16,538 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include l(1) (the lasso), l(2) (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
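A bare-bones, from-scratch sketch of the idea in the abstract: cyclical coordinate descent with soft-thresholding, rerun along a decreasing grid of penalties with warm starts. It is an illustration written under my own simplifying assumptions, not the glmnet implementation, and the data and penalty grid are arbitrary.

```python
# Cyclical coordinate descent for the lasso, run along a regularization path.
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, beta_init=None, n_sweeps=200):
    """Minimize (1/2n)||y - X beta||^2 + lam * ||beta||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p) if beta_init is None else beta_init.copy()
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]        # partial residual
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_sq[j]
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
true_beta = np.zeros(30)
true_beta[:3] = [2.0, -1.5, 1.0]
y = X @ true_beta + 0.1 * rng.standard_normal(100)

# Decreasing penalties, each solve warm-started from the previous solution.
beta = None
for lam in [0.5, 0.1, 0.02]:
    beta = lasso_cd(X, y, lam, beta)
    print(f"lambda={lam}: {int(np.sum(np.abs(beta) > 1e-8))} nonzero coefficients")
```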

13,656 citations

Proceedings ArticleDOI
13 Aug 2016
TL;DR: In this article, the authors propose LIME, a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem.
Abstract: Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one. In this work, we propose LIME, a novel explanation technique that explains the predictions of any classifier in an interpretable and faithful manner, by learning an interpretable model locally around the prediction. We also propose a method to explain models by presenting representative individual predictions and their explanations in a non-redundant way, framing the task as a submodular optimization problem. We demonstrate the flexibility of these methods by explaining different models for text (e.g. random forests) and image classification (e.g. neural networks). We show the utility of explanations via novel experiments, both simulated and with human subjects, on various scenarios that require trust: deciding if one should trust a prediction, choosing between models, improving an untrustworthy classifier, and identifying why a classifier should not be trusted.
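A simplified, from-scratch sketch of the local-surrogate idea behind LIME, not the authors' released library: perturb the instance to be explained, query the black-box model, weight the perturbations by proximity, and read the explanation off a locally fitted weighted linear model. LIME itself fits a sparse linear model; a plain ridge surrogate is used here for brevity, and the dataset, kernel width and model choices are all my own assumptions.

```python
# Local linear surrogate explanation of a single black-box prediction
# (simplified LIME-style sketch, not the LIME library).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

rng = np.random.default_rng(0)
x0 = X[0]                                    # instance to explain
scale = X.std(axis=0)

# Perturb around x0 and query the black box.
Z = x0 + rng.normal(0.0, 1.0, size=(500, X.shape[1])) * scale
pred = black_box.predict_proba(Z)[:, 1]

# Proximity kernel: nearby perturbations count more (width chosen arbitrarily).
dist = np.linalg.norm((Z - x0) / scale, axis=1)
weights = np.exp(-(dist ** 2) / (2 * (0.75 * np.sqrt(X.shape[1])) ** 2))

# Interpretable surrogate fitted locally; its coefficients are the explanation.
surrogate = Ridge(alpha=1.0).fit((Z - x0) / scale, pred, sample_weight=weights)
top = np.argsort(np.abs(surrogate.coef_))[::-1][:5]
print("locally most influential feature indices:", top)
```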

11,104 citations

Journal ArticleDOI
TL;DR: It is demonstrated theoretically and empirically that a greedy algorithm called orthogonal matching pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal.
Abstract: This paper demonstrates theoretically and empirically that a greedy algorithm called orthogonal matching pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m²) measurements. The new results for OMP are comparable with recent results for another approach called basis pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
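A minimal textbook-style implementation of OMP on simulated compressed-sensing data, not the paper's experimental code: at each step the column most correlated with the residual joins the active set, and the coefficients on that set are re-fitted by least squares. The measurement count is taken on the order of m ln d as in the abstract, with the constant chosen arbitrarily.

```python
# Orthogonal matching pursuit: greedy sparse recovery from random measurements.
import numpy as np

def omp(Phi, y, m):
    """Recover an m-sparse signal from y = Phi @ x by orthogonal matching pursuit."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(m):
        j = int(np.argmax(np.abs(Phi.T @ residual)))   # most correlated column
        support.append(j)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef          # re-orthogonalized residual
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
d, m = 256, 8
n_measurements = int(4 * m * np.log(d))                # on the order of m ln d
Phi = rng.standard_normal((n_measurements, d)) / np.sqrt(n_measurements)

x = np.zeros(d)
x[rng.choice(d, m, replace=False)] = rng.standard_normal(m)
y = Phi @ x

x_hat = omp(Phi, y, m)
print("support recovered:", set(np.flatnonzero(x_hat)) == set(np.flatnonzero(x)))
```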

8,604 citations