Author

Quentin Bertrand

Bio: Quentin Bertrand is an academic researcher at the French Institute for Research in Computer Science and Automation. He has contributed to research in topics including optimization problems and Lasso (statistics), has an h-index of 3, and has co-authored 12 publications receiving 42 citations.

Papers
Posted Content
TL;DR: The authors proposed an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems, which scales to high-dimensional data by leveraging the sparsity of the solutions.
Abstract: Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem that can be solved by gradient descent. The key challenge for these methods is the estimation of the gradient with respect to the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible, yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods at optimizing the error on held-out data or the Stein Unbiased Risk Estimator (SURE).

32 citations
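The key computational step in the abstract above, differentiating the Lasso solution through its optimality conditions while restricting the linear system to the active set, can be illustrated in a few lines. The sketch below shows that general principle only, not the authors' algorithm; the synthetic data, the value of alpha, and the use of scikit-learn's Lasso solver are assumptions made for the example.

# Minimal sketch: hypergradient of a held-out loss with respect to the Lasso
# regularization parameter, via implicit differentiation on the support.
# Uses scikit-learn's objective 1/(2n) ||y - Xw||^2 + alpha ||w||_1.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, n_val, p = 100, 50, 200
X, X_val = rng.standard_normal((n, p)), rng.standard_normal((n_val, p))
w_true = np.zeros(p); w_true[:5] = 1.0
y = X @ w_true + 0.1 * rng.standard_normal(n)
y_val = X_val @ w_true + 0.1 * rng.standard_normal(n_val)

alpha = 0.1                                   # assumed value for the example
w_hat = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
S = np.flatnonzero(w_hat)                     # support of the Lasso solution

# On the support, the optimality conditions give
#   X_S^T X_S w_S = X_S^T y - n * alpha * sign(w_S),
# so the Jacobian d w_S / d alpha solves a small |S| x |S| linear system
# (no inversion of a p x p matrix is needed).
X_S = X[:, S]
dw_S = np.linalg.solve(X_S.T @ X_S, -n * np.sign(w_hat[S]))

# Hypergradient of the validation loss C(w) = 1/(2 n_val) ||y_val - X_val w||^2
grad_w = -X_val[:, S].T @ (y_val - X_val[:, S] @ w_hat[S]) / n_val
print("dC/dalpha ~", grad_w @ dw_S)

A single such hypergradient can then drive gradient descent on alpha instead of a grid search.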

Posted Content
TL;DR: In this paper, a concomitant estimator is proposed that copes with complex noise structure by using non-averaged measurements; the resulting optimization problem is convex and amenable, thanks to smoothing theory, to state-of-the-art optimization techniques that leverage the sparsity of the solutions.
Abstract: Sparsity promoting norms are frequently used in high dimensional regression. A limitation of such Lasso-type estimators is that the optimal regularization parameter depends on the unknown noise level. Estimators such as the concomitant Lasso address this dependence by jointly estimating the noise level and the regression coefficients. Additionally, in many applications, the data is obtained by averaging multiple measurements: this reduces the noise variance, but it dramatically reduces sample sizes and prevents refined noise modeling. In this work, we propose a concomitant estimator that can cope with complex noise structure by using non-averaged measurements. The resulting optimization problem is convex and amenable, thanks to smoothing theory, to state-of-the-art optimization techniques that leverage the sparsity of the solutions. Practical benefits are demonstrated on toy datasets, realistic simulated data and real neuroimaging data.

13 citations
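The concomitant idea above, estimating the noise level jointly with the regression coefficients, can be sketched in its simplest smoothed form. The code below is a single-measurement smoothed concomitant Lasso solved by alternating minimization; it is not the non-averaged, complex-noise estimator proposed in the paper, and the regularization level, the smoothing floor sigma_min, and the use of scikit-learn's Lasso for the coefficient step are assumptions made for the example.

# Minimal sketch: smoothed concomitant Lasso via alternating minimization of
#   min_{w, sigma >= sigma_min}  ||y - Xw||^2 / (2 n sigma) + sigma / 2 + lam ||w||_1
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
w_true = np.zeros(p); w_true[:5] = 1.0
y = X @ w_true + 0.5 * rng.standard_normal(n)

lam, sigma_min = 0.1, 1e-3                    # assumed values for the example
sigma, w = np.linalg.norm(y) / np.sqrt(n), np.zeros(p)
for _ in range(20):
    # w-step: for fixed sigma, this is a standard Lasso with parameter lam * sigma
    w = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, y).coef_
    # sigma-step: closed form, clipped below by the smoothing parameter
    sigma = max(np.linalg.norm(y - X @ w) / np.sqrt(n), sigma_min)
print("estimated noise level:", round(sigma, 3))

In this simplified variant, the lower bound sigma_min acts as the smoothing that keeps the joint problem well behaved.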

Posted Content
TL;DR: This work proposes an accelerated version of coordinate descent using extrapolation, showing considerable speed-ups in practice compared to inertially accelerated coordinate descent and extrapolated (proximal) gradient descent.
Abstract: Acceleration of first-order methods is mainly obtained via inertial techniques à la Nesterov, or via nonlinear extrapolation. The latter has seen a recent surge of interest, with successful applications to gradient and proximal gradient techniques. On multiple machine learning problems, coordinate descent achieves performance significantly superior to full-gradient methods. Speeding up coordinate descent in practice is not easy: inertially accelerated versions of coordinate descent are theoretically accelerated, but might not always lead to practical speed-ups. We propose an accelerated version of coordinate descent using extrapolation, showing considerable speed-ups in practice compared to inertially accelerated coordinate descent and extrapolated (proximal) gradient descent. Experiments on least squares, Lasso, elastic net and logistic regression validate the approach.

7 citations
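The extrapolation mechanism described above can be illustrated directly on the Lasso: run cyclic coordinate descent, keep a small window of past iterates, and periodically replace the current point by an affine combination of them. The sketch below shows such an Anderson-type step with an objective-decrease safeguard; the window size K, the problem sizes and the safeguard are choices made for the example, not the authors' exact algorithm.

# Minimal sketch: Anderson-type extrapolation of cyclic coordinate descent
# iterates on the Lasso (objective 1/(2n)||y - Xw||^2 + alpha ||w||_1).
import numpy as np

def lasso_obj(X, y, w, alpha):
    n = len(y)
    return 0.5 * np.sum((y - X @ w) ** 2) / n + alpha * np.abs(w).sum()

def cd_epoch(X, y, w, alpha, norms2):
    """One pass of cyclic coordinate descent with soft-thresholding."""
    n = len(y)
    r = y - X @ w
    for j in range(X.shape[1]):
        old = w[j]
        z = old + X[:, j] @ r / norms2[j]
        w[j] = np.sign(z) * max(abs(z) - n * alpha / norms2[j], 0.0)
        if w[j] != old:
            r -= (w[j] - old) * X[:, j]
    return w

rng = np.random.default_rng(0)
n, p, alpha, K = 100, 300, 0.05, 5            # assumed sizes and window
X = rng.standard_normal((n, p))
y = X @ np.append(np.ones(5), np.zeros(p - 5)) + 0.1 * rng.standard_normal(n)
norms2 = (X ** 2).sum(axis=0)

w, history = np.zeros(p), []
for t in range(100):
    w = cd_epoch(X, y, w.copy(), alpha, norms2)
    history.append(w.copy())
    if len(history) == K + 1:
        # Extrapolation: combine the last K iterates with weights c minimizing
        # ||U c|| subject to sum(c) = 1, where U holds successive differences.
        U = np.diff(np.array(history), axis=0).T          # shape (p, K)
        z = np.linalg.solve(U.T @ U + 1e-12 * np.eye(K), np.ones(K))
        c = z / z.sum()
        w_acc = np.array(history)[1:].T @ c
        if lasso_obj(X, y, w_acc, alpha) < lasso_obj(X, y, w, alpha):
            w = w_acc                                      # keep only if it helps
        history = []
print("objective:", lasso_obj(X, y, w, alpha))

In practice the extrapolated point often jumps far ahead of the plain coordinate descent iterate, which is where the practical speed-ups reported above come from.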

Proceedings Article
12 Jul 2020
TL;DR: This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems, and demonstrates that the proposed method outperforms a large number of standard methods to optimize the error on held-out data, or the Stein Unbiased Risk Estimator (SURE).
Abstract: Setting regularization parameters for Lasso-type estimators is notoriously difficult, though crucial in practice. The most popular hyperparameter optimization approach is grid-search using held-out validation data. Grid-search, however, requires choosing a predefined grid for each parameter, which scales exponentially in the number of parameters. Another approach is to cast hyperparameter optimization as a bi-level optimization problem that can be solved by gradient descent. The key challenge for these methods is the estimation of the gradient w.r.t. the hyperparameters. Computing this gradient via forward or backward automatic differentiation is possible, yet usually suffers from high memory consumption. Alternatively, implicit differentiation typically involves solving a linear system, which can be prohibitive and numerically unstable in high dimension. In addition, implicit differentiation usually assumes smooth loss functions, which is not the case for Lasso-type problems. This work introduces an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems. Our approach scales to high-dimensional data by leveraging the sparsity of the solutions. Experiments demonstrate that the proposed method outperforms a large number of standard methods at optimizing the error on held-out data or the Stein Unbiased Risk Estimator (SURE).

6 citations

Posted Content
TL;DR: This work shows that cyclic coordinate descent achieves model identification in finite time for a wide class of functions, and proves explicit local linear convergence rates for coordinate descent that match empirical results well.
Abstract: For composite nonsmooth optimization problems, the Forward-Backward algorithm achieves model identification (e.g., support identification for the Lasso) after a finite number of iterations, provided the objective function is regular enough. Results concerning coordinate descent are scarcer, and model identification has only been shown for specific estimators, for instance the support-vector machine. In this work, we show that cyclic coordinate descent achieves model identification in finite time for a wide class of functions. In addition, we prove explicit local linear convergence rates for coordinate descent. Extensive experiments on various estimators and on real datasets demonstrate that these rates match empirical results well.

4 citations
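The identification result above is easy to observe numerically: after finitely many epochs the support of the coordinate descent iterate stops changing, and from that point on the suboptimality gap shrinks by a roughly constant factor per epoch. The sketch below is only an illustrative experiment on a small synthetic Lasso problem; the problem sizes, regularization level and number of epochs are arbitrary choices.

# Illustrative sketch: cyclic coordinate descent on the Lasso identifies the
# support after finitely many epochs, then converges locally linearly.
import numpy as np

rng = np.random.default_rng(0)
n, p, alpha = 50, 100, 0.1
X = rng.standard_normal((n, p))
y = X @ np.append(np.ones(3), np.zeros(p - 3)) + 0.1 * rng.standard_normal(n)
norms2 = (X ** 2).sum(axis=0)

w, r = np.zeros(p), y.copy()
supports, objs = [], []
for epoch in range(200):
    for j in range(p):
        old = w[j]
        z = old + X[:, j] @ r / norms2[j]
        w[j] = np.sign(z) * max(abs(z) - n * alpha / norms2[j], 0.0)
        r -= (w[j] - old) * X[:, j]
    supports.append(frozenset(np.flatnonzero(w)))
    objs.append(0.5 * r @ r / n + alpha * np.abs(w).sum())

# Last epoch at which the support changed (model identification happens here)
t_id = max(t for t in range(len(supports)) if t == 0 or supports[t] != supports[t - 1])
print("support identified after epoch", t_id, "| support size:", len(supports[-1]))
# Past identification, successive suboptimality gaps shrink geometrically
gaps = np.array(objs) - objs[-1]
print("suboptimality gaps after identification:", gaps[t_id:t_id + 5])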


Cited by
01 Jan 1962
TL;DR: The scalar ε-algorithm is extended to slowly convergent sequences of vectors and matrices, and several possible definitions of the inverse are considered, including the primitive (componentwise) inverse and the Samelson inverse of a vector.
Abstract: is slowly convergent, then (in certain cases) the numerical convergence of the sequence $\varepsilon_{2s}^{(0)}$, $s = 0, 1, \ldots$ to the limit (or antilimit), with which the sequence (2) may be associated, is far more rapid. In the application of the ε-algorithm so far the $\varepsilon_s^{(m)}$ have been scalar quantities; it is the purpose of this paper to extend the inquiry to the cases in which the $S_m$ are a sequence of slowly convergent arrays. In particular, the cases in which the $S_m$ are (a) vectors, (b) square matrices, (c) triangular matrices will be considered. The sums and differences of these entities are of course already well defined, but the choice of an inverse must be given some consideration. Four possibilities will be considered. They are: (1) Primitive inverse. Regarding each component separately, this is equivalent to the simultaneous application of the scalar ε-algorithm to the components of (a), (b) and (c). (2) The Samelson inverse of a vector. Here an extremely elegant and profound idea, due to K. Samelson, is introduced. The inverse of a vector ...

180 citations
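The entry above concerns the extension of the scalar ε-algorithm to sequences of vectors and matrices, built on a notion of vector inverse such as the Samelson inverse, v^{-1} = v / ||v||^2 in the real case. Below is a minimal sketch of the first accelerated column, $\varepsilon_2^{(m)}$, of the vector ε-algorithm using that inverse; the linearly convergent test sequence is an arbitrary choice made for illustration.

# Minimal sketch: one stage of the vector epsilon-algorithm with the
# Samelson inverse v^{-1} = v / ||v||^2 (real case).
import numpy as np

def samelson_inv(v):
    return v / (v @ v)

def eps2(S):
    """Column eps_2^{(m)} computed from a list of vectors S_0, S_1, ..."""
    e1 = [samelson_inv(S[m + 1] - S[m]) for m in range(len(S) - 1)]
    return [S[m + 1] + samelson_inv(e1[m + 1] - e1[m]) for m in range(len(S) - 2)]

# Slowly (linearly) convergent vector sequence S_{m+1} = A S_m + b
rng = np.random.default_rng(0)
A = 0.95 * np.eye(3) + 0.02 * rng.standard_normal((3, 3))
b = rng.standard_normal(3)
limit = np.linalg.solve(np.eye(3) - A, b)

S = [np.zeros(3)]
for _ in range(10):
    S.append(A @ S[-1] + b)

accel = eps2(S)
print("error of S_10          :", np.linalg.norm(S[-1] - limit))
print("error of eps_2^{(8)}   :", np.linalg.norm(accel[-1] - limit))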

01 Jan 2016
Exploring Artificial Intelligence in the New Millennium

137 citations

Journal ArticleDOI
TL;DR: The resulting algorithm, Champagne with noise learning, is quite robust to initialization and computationally efficient, and its performance is consistently superior to that of Champagne without noise learning.

25 citations

Proceedings Article
31 Jan 2022
TL;DR: SABA, an adaptation of the celebrated SAGA algorithm to this framework, has an $O(\frac1T)$ convergence rate and achieves linear convergence under a Polyak-Łojasiewicz assumption; it is the first stochastic algorithm for bilevel optimization that verifies either of these properties.
Abstract: Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates. To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates. The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables are subject to variance reduction. We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under a Polyak-Łojasiewicz assumption. This is the first stochastic algorithm for bilevel optimization that verifies either of these properties. Numerical experiments validate the usefulness of our method.

24 citations
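The framework described above evolves the inner variable, the linear-system variable and the outer variable simultaneously. Below is a deterministic, non-variance-reduced sketch of that single-loop structure on a toy bilevel problem (ridge-penalty tuning); it is not SABA itself, and the toy problem, the step sizes, and the parameterization lam = exp(x) are assumptions made for the example.

# Deterministic sketch of a single-loop bilevel method where the inner
# variable z, the linear-system variable v and the outer variable x are all
# updated at every iteration (no variance reduction, unlike SABA).
# Toy problem (assumed for the example):
#   inner:  z*(x) = argmin_z 0.5||A z - b||^2 + 0.5 * exp(x) * ||z||^2
#   outer:  min_x  0.5||C z*(x) - d||^2
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 50, 30, 20
A, C = rng.standard_normal((n, p)), rng.standard_normal((m, p))
z_true = rng.standard_normal(p)
b = A @ z_true + 0.1 * rng.standard_normal(n)
d = C @ z_true + 0.1 * rng.standard_normal(m)

z, v, x = np.zeros(p), np.zeros(p), 0.0
eta, rho = 1e-3, 1e-2                          # inner and outer step sizes (assumed)
for _ in range(5000):
    lam = np.exp(x)
    grad_z_g = A.T @ (A @ z - b) + lam * z     # gradient of the inner objective
    grad_z_f = C.T @ (C @ z - d)               # gradient of the outer loss in z
    Hv = A.T @ (A @ v) + lam * v               # Hessian-vector product of the inner problem
    # All three directions are computed at the current point and applied together
    z -= eta * grad_z_g
    v -= eta * (Hv - grad_z_f)                 # drives v towards H^{-1} grad_z_f
    x -= rho * (-lam * (z @ v))                # approximate hypergradient step
print("lam =", np.exp(x), "| outer loss =", 0.5 * np.sum((C @ z - d) ** 2))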

Proceedings Article
27 Jan 2022
TL;DR: In this paper, the authors observe that the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not yet been established, and they present an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition.
Abstract: Despite the broad use of fixed-point iterations throughout applied mathematics, the optimal convergence rate of general fixed-point problems with nonexpansive nonlinear operators has not been established. This work presents an acceleration mechanism for fixed-point iterations with nonexpansive operators, contractive operators, and nonexpansive operators satisfying a Hölder-type growth condition. We then provide matching complexity lower bounds to establish the exact optimality of the acceleration mechanisms in the nonexpansive and contractive setups. Finally, we provide experiments with CT imaging, optimal transport, and decentralized optimization to demonstrate the practical effectiveness of the acceleration mechanism.

11 citations
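A classical acceleration mechanism for fixed-point iterations with a nonexpansive operator T is Halpern-type anchoring, which blends each step toward the starting point with a weight decaying like 1/k and improves the fixed-point residual from O(1/sqrt(k)) (for plain averaged iterations) to O(1/k). The sketch below compares the two on a toy nonexpansive map, a slow planar rotation; this is a representative anchoring scheme chosen for illustration, not necessarily the exact mechanism of the paper above.

# Sketch: Halpern-type anchoring for a nonexpansive fixed-point iteration,
# compared with a plain Krasnosel'skii-Mann (averaged) iteration.
import numpy as np

theta = 0.05
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # nonexpansive (isometry), fixed point 0
T = lambda x: R @ x

x0 = np.array([1.0, 0.0])
x_km, x_h = x0.copy(), x0.copy()
for k in range(1000):
    x_km = 0.5 * x_km + 0.5 * T(x_km)             # Krasnosel'skii-Mann averaging
    beta = 1.0 / (k + 2)
    x_h = beta * x0 + (1 - beta) * T(x_h)         # Halpern anchoring toward x0

print("KM      residual ||x - Tx||:", np.linalg.norm(x_km - T(x_km)))
print("Halpern residual ||x - Tx||:", np.linalg.norm(x_h - T(x_h)))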