Open Access · Journal Article · DOI

Optimization Methods for Large-Scale Machine Learning

Léon Bottou, +2 more
08 May 2018
Vol. 60, Iss. 2, pp. 223-311
TLDR
The authors provide a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications and discuss how optimization problems arise in machine learning and what makes them challenging.
Abstract
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
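The stochastic gradient (SG) method highlighted in the abstract can be summarized in a few lines. The following is a minimal sketch, not the paper's own implementation, of mini-batch SGD applied to an L2-regularized logistic-regression objective in the spirit of the text-classification case study; the synthetic data, step size, batch size, and function names are illustrative assumptions.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam=1e-4):
    """Gradient of the L2-regularized logistic loss on a (mini-)batch.

    X: (n, d) features, y: (n,) labels in {-1, +1}.
    """
    margins = y * (X @ w)
    # d/dw of mean(log(1 + exp(-y * x.w))) + (lam / 2) * ||w||^2
    coef = -y / (1.0 + np.exp(margins))
    return X.T @ coef / len(y) + lam * w

def sgd(X, y, step=0.1, batch_size=16, epochs=10, seed=0):
    """Plain mini-batch stochastic gradient descent with a fixed step size."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One pass over the data in shuffled mini-batches.
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            w -= step * logistic_loss_grad(w, X[idx], y[idx])
    return w

if __name__ == "__main__":
    # Illustrative synthetic data standing in for a text-classification task.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 20))
    true_w = rng.standard_normal(20)
    y = np.sign(X @ true_w + 0.1 * rng.standard_normal(1000))
    w = sgd(X, y)
    print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.3f}")
```

Each update uses only a small random subset of the data, which is what makes the method attractive at large scale; the fixed step size is the simplest choice and stands in for the diminishing or adaptive schedules analyzed in the paper.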


Citations
Journal ArticleDOI

Generalizing from a Few Examples: A Survey on Few-shot Learning

TL;DR: A thorough survey of Few-shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
Posted Content

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

TL;DR: In this paper, the authors investigate the cause of the generalization drop in the large-batch regime and present numerical evidence supporting the view that large-batch methods tend to converge to sharp minima of the training and testing functions.
Posted Content

Generalizing from a Few Examples: A Survey on Few-Shot Learning

TL;DR: A thorough survey of Few-Shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
Journal ArticleDOI

Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders

TL;DR: The method is shown to significantly outperform even the optimal linear-subspace ROM on benchmark advection-dominated problems, demonstrating its ability to overcome the intrinsic $n$-width limitations of linear subspaces.
Journal ArticleDOI

Solving inverse problems using data-driven models

TL;DR: This survey paper aims to give an account of some of the main contributions in data-driven inverse problems.