Author

Quentin Bertrand

Bio: Quentin Bertrand is an academic researcher at the French Institute for Research in Computer Science and Automation. He has contributed to research in topics including optimization problems and Lasso (statistics), has an h-index of 3, and has co-authored 12 publications receiving 42 citations.

Papers
Posted Content
TL;DR: The authors proposed an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems, which scales to high-dimensional data by leveraging the sparsity of the solutions.
Abstract: Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem that can be solved by gradient descent. The key challenge for these methods is the estimation of the gradient with respect to the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible, yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods at optimizing the error on held-out data or the Stein Unbiased Risk Estimator (SURE).

32 citations
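The key computational step in the abstract above, differentiating the Lasso solution through its optimality conditions while restricting the linear system to the active set, can be illustrated in a few lines. The sketch below shows that general principle only, not the authors' algorithm; the synthetic data, the value of alpha, and the use of scikit-learn's Lasso solver are assumptions made for the example.

# Minimal sketch: hypergradient of a held-out loss with respect to the Lasso
# regularization parameter, via implicit differentiation on the support.
# Uses scikit-learn's objective 1/(2n) ||y - Xw||^2 + alpha ||w||_1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, n_val, p = 100, 50, 200
X, X_val = rng.standard_normal((n, p)), rng.standard_normal((n_val, p))
w_true = np.zeros(p); w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)
y_val = X_val @ w_true + 0.1 * rng.standard_normal(n_val)

alpha = 0.1                                   # assumed value for the example
w_hat = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
S = np.flatnonzero(w_hat)                     # support of the Lasso solution

# On the support, the optimality conditions give
#   X_S^T X_S w_S = X_S^T y - n * alpha * sign(w_S),
# so the Jacobian d w_S / d alpha solves a small |S| x |S| linear system
# (no inversion of a p x p matrix is needed).
X_S = X[:, S]
dw_S = np.linalg.solve(X_S.T @ X_S, -n * np.sign(w_hat[S]))

# Hypergradient of the validation loss C(w) = 1/(2 n_val) ||y_val - X_val w||^2
grad_w = -X_val[:, S].T @ (y_val - X_val[:, S] @ w_hat[S]) / n_val
print("dC/dalpha ~", grad_w @ dw_S)

A single such hypergradient can then drive gradient descent on alpha instead of a grid search.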

Posted Content
TL;DR: In this paper, a concomitant estimator is proposed that copes with complex noise structure by using non-averaged measurements; the resulting optimization problem is convex and amenable, thanks to smoothing theory, to state-of-the-art optimization techniques that leverage the sparsity of the solutions.
Abstract: Sparsity promoting norms are frequently used in high dimensional regression. A limitation of such Lasso-type estimators is that the optimal regularization parameter depends on the unknown noise level. Estimators such as the concomitant Lasso address this dependence by jointly estimating the noise level and the regression coefficients. Additionally, in many applications, the data is obtained by averaging multiple measurements: this reduces the noise variance, but it dramatically reduces sample sizes and prevents refined noise modeling. In this work, we propose a concomitant estimator that can cope with complex noise structure by using non-averaged measurements. The resulting optimization problem is convex and amenable, thanks to smoothing theory, to state-of-the-art optimization techniques that leverage the sparsity of the solutions. Practical benefits are demonstrated on toy datasets, realistic simulated data and real neuroimaging data.

13 citations
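The concomitant idea above, estimating the noise level jointly with the regression coefficients, can be sketched in its simplest smoothed form. The code below is a single-measurement smoothed concomitant Lasso solved by alternating minimization; it is not the non-averaged, complex-noise estimator proposed in the paper, and the regularization level, the smoothing floor sigma_min, and the use of scikit-learn's Lasso for the coefficient step are assumptions made for the example.

# Minimal sketch: smoothed concomitant Lasso via alternating minimization of
#   min_{w, sigma >= sigma_min}  ||y - Xw||^2 / (2 n sigma) + sigma / 2 + lam ||w||_1
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p); w_true[:5] = 1.0
y = X @ w_true + 0.5 * rng.standard_normal(n)

lam, sigma_min = 0.1, 1e-3                    # assumed values for the example
sigma, w = np.linalg.norm(y) / np.sqrt(n), np.zeros(p)
for _ in range(20):
    # w-step: for fixed sigma, this is a standard Lasso with parameter lam * sigma
    w = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, y).coef_
    # sigma-step: closed form, clipped below by the smoothing parameter
    sigma = max(np.linalg.norm(y - X @ w) / np.sqrt(n), sigma_min)
print("estimated noise level:", round(sigma, 3))

In this simplified variant, the lower bound sigma_min acts as the smoothing that keeps the joint problem well behaved.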

Posted Content
TL;DR: This work proposes an accelerated version of coordinate descent using extrapolation, showing considerable speed-ups in practice compared to inertially accelerated coordinate descent and extrapolated (proximal) gradient descent.
Abstract: Acceleration of first-order methods is mainly obtained via inertial techniques à la Nesterov, or via nonlinear extrapolation. The latter has seen a recent surge of interest, with successful applications to gradient and proximal gradient techniques. On multiple machine learning problems, coordinate descent achieves performance significantly superior to full-gradient methods. Speeding up coordinate descent in practice is not easy: inertially accelerated versions of coordinate descent are theoretically accelerated, but might not always lead to practical speed-ups. We propose an accelerated version of coordinate descent using extrapolation, showing considerable speed-ups in practice compared to inertially accelerated coordinate descent and extrapolated (proximal) gradient descent. Experiments on least squares, Lasso, elastic net and logistic regression validate the approach.

7 citations
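The extrapolation mechanism described above can be illustrated directly on the Lasso: run cyclic coordinate descent, keep a small window of past iterates, and periodically replace the current point by an affine combination of them. The sketch below shows such an Anderson-type step with an objective-decrease safeguard; the window size K, the problem sizes and the safeguard are choices made for the example, not the authors' exact algorithm.

# Minimal sketch: Anderson-type extrapolation of cyclic coordinate descent
# iterates on the Lasso (objective 1/(2n)||y - Xw||^2 + alpha ||w||_1).
import numpy as np

def lasso_obj(X, y, w, alpha):
    n = len(y)
    return 0.5 * np.sum((y - X @ w) ** 2) / n + alpha * np.abs(w).sum()

def cd_epoch(X, y, w, alpha, norms2):
    """One pass of cyclic coordinate descent with soft-thresholding."""
    n = len(y)
    r = y - X @ w
    for j in range(X.shape[1]):
        old = w[j]
        z = old + X[:, j] @ r / norms2[j]
        w[j] = np.sign(z) * max(abs(z) - n * alpha / norms2[j], 0.0)
        if w[j] != old:
            r -= (w[j] - old) * X[:, j]
    return w

rng = np.random.default_rng(0)
n, p, alpha, K = 100, 300, 0.05, 5            # assumed sizes and window
X = rng.standard_normal((n, p))
y = X @ np.append(np.ones(5), np.zeros(p - 5)) + 0.1 * rng.standard_normal(n)
norms2 = (X ** 2).sum(axis=0)

w, history = np.zeros(p), []
for t in range(100):
    w = cd_epoch(X, y, w.copy(), alpha, norms2)
    history.append(w.copy())
    if len(history) == K + 1:
        # Extrapolation: combine the last K iterates with weights c minimizing
        # ||U c|| subject to sum(c) = 1, where U holds successive differences.
        U = np.diff(np.array(history), axis=0).T          # shape (p, K)
        z = np.linalg.solve(U.T @ U + 1e-12 * np.eye(K), np.ones(K))
        c = z / z.sum()
        w_acc = np.array(history)[1:].T @ c
        if lasso_obj(X, y, w_acc, alpha) < lasso_obj(X, y, w, alpha):
            w = w_acc                                      # keep only if it helps
        history = []
print("objective:", lasso_obj(X, y, w, alpha))

In practice the extrapolated point often jumps far ahead of the plain coordinate descent iterate, which is where the practical speed-ups reported above come from.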

Proceedings Article
12 Jul 2020
TL;DR: This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems, and demonstrates that the proposed method outperforms a large number of standard methods to optimize the error on held-out data, or the Stein Unbiased Risk Estimator (SURE).
Abstract: Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem that can be solved by gradient descent. The key challenge for these methods is the estimation of the gradient w.r.t. the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible, yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods at optimizing the error on held-out data or the Stein Unbiased Risk Estimator (SURE).

6 citations

Posted Content
TL;DR: This work shows that cyclic coordinate descent achieves model identification in finite time for a wide class of functions, and proves explicit local linear convergence rates for coordinate descent that match empirical results well.
Abstract: For composite nonsmooth optimization problems, the Forward-Backward algorithm achieves model identification (e.g., support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer, and model identification has only been shown for specific estimators, for instance the support-vector machine. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.

4 citations
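The identification result above is easy to observe numerically: after finitely many epochs the support of the coordinate descent iterate stops changing, and from that point on the suboptimality gap shrinks by a roughly constant factor per epoch. The sketch below is only an illustrative experiment on a small synthetic Lasso problem; the problem sizes, regularization level and number of epochs are arbitrary choices.

# Illustrative sketch: cyclic coordinate descent on the Lasso identifies the
# support after finitely many epochs, then converges locally linearly.
import numpy as np

rng = np.random.default_rng(0)
n, p, alpha = 50, 100, 0.1
X = rng.standard_normal((n, p))
y = X @ np.append(np.ones(3), np.zeros(p - 3)) + 0.1 * rng.standard_normal(n)
norms2 = (X ** 2).sum(axis=0)

w, r = np.zeros(p), y.copy()
supports, objs = [], []
for epoch in range(200):
    for j in range(p):
        old = w[j]
        z = old + X[:, j] @ r / norms2[j]
        w[j] = np.sign(z) * max(abs(z) - n * alpha / norms2[j], 0.0)
        r -= (w[j] - old) * X[:, j]
    supports.append(frozenset(np.flatnonzero(w)))
    objs.append(0.5 * r @ r / n + alpha * np.abs(w).sum())

# Last epoch at which the support changed (model identification happens here)
t_id = max(t for t in range(len(supports)) if t == 0 or supports[t] != supports[t - 1])
print("support identified after epoch", t_id, "| support size:", len(supports[-1]))
# Past identification, successive suboptimality gaps shrink geometrically
gaps = np.array(objs) - objs[-1]
print("suboptimality gaps after identification:", gaps[t_id:t_id + 5])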


Cited by
01 Jan 1962
TL;DR: The scalar ε-algorithm is extended to slowly convergent sequences of vectors and matrices, and several possible definitions of the inverse are considered, including the primitive (componentwise) inverse and the Samelson inverse of a vector.
Abstract: is slowly convergent, then (in certain cases) the numerical convergence of the sequence $\varepsilon_{2s}^{(0)}$, $s = 0, 1, \ldots$ to the limit (or antilimit), with which the sequence (2) may be associated, is far more rapid. In the application of the ε-algorithm so far the $\varepsilon_s^{(m)}$ have been scalar quantities; it is the purpose of this paper to extend the inquiry to the cases in which the $S_m$ are a sequence of slowly convergent arrays. In particular, the cases in which the $S_m$ are (a) vectors, (b) square matrices, (c) triangular matrices will be considered. The sums and differences of these entities are of course already well defined, but the choice of an inverse must be given some consideration. Four possibilities will be considered. They are: (1) Primitive inverse. Regarding each component separately, this is equivalent to the simultaneous application of the scalar ε-algorithm to the components of (a), (b) and (c). (2) The Samelson inverse of a vector. Here an extremely elegant and profound idea, due to K. Samelson, is introduced. The inverse of a vector ...

180 citations
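The entry above concerns the extension of the scalar ε-algorithm to sequences of vectors and matrices, built on a notion of vector inverse such as the Samelson inverse, v^{-1} = v / ||v||^2 in the real case. Below is a minimal sketch of the first accelerated column, $\varepsilon_2^{(m)}$, of the vector ε-algorithm using that inverse; the linearly convergent test sequence is an arbitrary choice made for illustration.

# Minimal sketch: one stage of the vector epsilon-algorithm with the
# Samelson inverse v^{-1} = v / ||v||^2 (real case).
import numpy as np

def samelson_inv(v):
    return v / (v @ v)

def eps2(S):
    """Column eps_2^{(m)} computed from a list of vectors S_0, S_1, ..."""
    e1 = [samelson_inv(S[m + 1] - S[m]) for m in range(len(S) - 1)]
    return [S[m + 1] + samelson_inv(e1[m + 1] - e1[m]) for m in range(len(S) - 2)]

# Slowly (linearly) convergent vector sequence S_{m+1} = A S_m + b
rng = np.random.default_rng(0)
A = 0.95 * np.eye(3) + 0.02 * rng.standard_normal((3, 3))
b = rng.standard_normal(3)
limit = np.linalg.solve(np.eye(3) - A, b)

S = [np.zeros(3)]
for _ in range(10):
    S.append(A @ S[-1] + b)

accel = eps2(S)
print("error of S_10          :", np.linalg.norm(S[-1] - limit))
print("error of eps_2^{(8)}   :", np.linalg.norm(accel[-1] - limit))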

01 Jan 2016
Exploring Artificial Intelligence in the New Millennium

137 citations

Journal ArticleDOI
TL;DR: The resulting algorithm, Champagne with noise learning, is quite robust to initialization and computationally efficient, and its performance is consistently superior to that of Champagne without noise learning.

25 citations

Proceedings Article
31 Jan 2022
TL;DR: SABA, an adaptation of the celebrated SAGA algorithm to this framework, has an $O(\frac1T)$ convergence rate and achieves linear convergence under a Polyak-Łojasiewicz assumption; it is the first stochastic algorithm for bilevel optimization that verifies either of these properties.
Abstract: Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under a Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.

24 citations
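The framework described above evolves the inner variable, the linear-system variable and the outer variable simultaneously. Below is a deterministic, non-variance-reduced sketch of that single-loop structure on a toy bilevel problem (ridge-penalty tuning); it is not SABA itself, and the toy problem, the step sizes, and the parameterization lam = exp(x) are assumptions made for the example.

# Deterministic sketch of a single-loop bilevel method where the inner
# variable z, the linear-system variable v and the outer variable x are all
# updated at every iteration (no variance reduction, unlike SABA).
# Toy problem (assumed for the example):
#   inner:  z*(x) = argmin_z 0.5||A z - b||^2 + 0.5 * exp(x) * ||z||^2
#   outer:  min_x  0.5||C z*(x) - d||^2
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 50, 30, 20
A, C = rng.standard_normal((n, p)), rng.standard_normal((m, p))
z_true = rng.standard_normal(p)
b = A @ z_true + 0.1 * rng.standard_normal(n)
d = C @ z_true + 0.1 * rng.standard_normal(m)

z, v, x = np.zeros(p), np.zeros(p), 0.0
eta, rho = 1e-3, 1e-2                          # inner and outer step sizes (assumed)
for _ in range(5000):
    lam = np.exp(x)
    grad_z_g = A.T @ (A @ z - b) + lam * z     # gradient of the inner objective
    grad_z_f = C.T @ (C @ z - d)               # gradient of the outer loss in z
    Hv = A.T @ (A @ v) + lam * v               # Hessian-vector product of the inner problem
    # All three directions are computed at the current point and applied together
    z -= eta * grad_z_g
    v -= eta * (Hv - grad_z_f)                 # drives v towards H^{-1} grad_z_f
    x -= rho * (-lam * (z @ v))                # approximate hypergradient step
print("lam =", np.exp(x), "| outer loss =", 0.5 * np.sum((C @ z - d) ** 2))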

Proceedings Article
27 Jan 2022
TL;DR: In this paper, the authors observe that the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not yet been established, and they present an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition.
Abstract: Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.

11 citations
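A classical acceleration mechanism for fixed-point iterations with a nonexpansive operator T is Halpern-type anchoring, which blends each step toward the starting point with a weight decaying like 1/k and improves the fixed-point residual from O(1/sqrt(k)) (for plain averaged iterations) to O(1/k). The sketch below compares the two on a toy nonexpansive map, a slow planar rotation; this is a representative anchoring scheme chosen for illustration, not necessarily the exact mechanism of the paper above.

# Sketch: Halpern-type anchoring for a nonexpansive fixed-point iteration,
# compared with a plain Krasnosel'skii-Mann (averaged) iteration.
import numpy as np

theta = 0.05
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # nonexpansive (isometry), fixed point 0
T = lambda x: R @ x

x0 = np.array([1.0, 0.0])
x_km, x_h = x0.copy(), x0.copy()
for k in range(1000):
    x_km = 0.5 * x_km + 0.5 * T(x_km)             # Krasnosel'skii-Mann averaging
    beta = 1.0 / (k + 2)
    x_h = beta * x0 + (1 - beta) * T(x_h)         # Halpern anchoring toward x0

print("KM      residual ||x - Tx||:", np.linalg.norm(x_km - T(x_km)))
print("Halpern residual ||x - Tx||:", np.linalg.norm(x_h - T(x_h)))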