Open Access · Journal Article · DOI

Optimization Methods for Large-Scale Machine Learning

Léon Bottou, +2 more
08 May 2018
Vol. 60, Iss. 2, pp. 223-311
TLDR
The authors provide a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications and discuss how optimization problems arise in machine learning and what makes them challenging.
Abstract
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
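The stochastic gradient (SG) method highlighted in the abstract can be summarized in a few lines. The following is a minimal sketch, not the paper's own implementation, of mini-batch SGD applied to an L2-regularized logistic-regression objective in the spirit of the text-classification case study; the synthetic data, step size, batch size, and function names are illustrative assumptions.

```python
import numpy as np

def logistic_loss_grad(w, X, y, lam=1e-4):
    """Gradient of the L2-regularized logistic loss on a (mini-)batch.

    X: (n, d) features, y: (n,) labels in {-1, +1}.
    """
    margins = y * (X @ w)
    # d/dw of mean(log(1 + exp(-y * x.w))) + (lam / 2) * ||w||^2
    coef = -y / (1.0 + np.exp(margins))
    return X.T @ coef / len(y) + lam * w

def sgd(X, y, step=0.1, batch_size=16, epochs=10, seed=0):
    """Plain mini-batch stochastic gradient descent with a fixed step size."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # One pass over the data in shuffled mini-batches.
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            w -= step * logistic_loss_grad(w, X[idx], y[idx])
    return w

if __name__ == "__main__":
    # Illustrative synthetic data standing in for a text-classification task.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 20))
    true_w = rng.standard_normal(20)
    y = np.sign(X @ true_w + 0.1 * rng.standard_normal(1000))
    w = sgd(X, y)
    print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.3f}")
```

Each update uses only a small random subset of the data, which is what makes the method attractive at large scale; the fixed step size is the simplest choice and stands in for the diminishing or adaptive schedules analyzed in the paper.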


Citations
Journal ArticleDOI

Generalizing from a Few Examples: A Survey on Few-shot Learning

TL;DR: A thorough survey of Few-shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
Posted Content

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima

TL;DR: In this paper, the authors investigate the cause of the generalization drop in the large-batch regime and present numerical evidence supporting the view that large-batch methods tend to converge to sharp minima of the training and testing functions.
Posted Content

Generalizing from a Few Examples: A Survey on Few-Shot Learning

TL;DR: A thorough survey of Few-Shot Learning (FSL) that categorizes FSL methods from three perspectives: data, which uses prior knowledge to augment the supervised experience; model, which uses prior knowledge to reduce the size of the hypothesis space; and algorithm, which uses prior knowledge to alter the search for the best hypothesis in the given hypothesis space.
Journal ArticleDOI

Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders

TL;DR: The method is shown to significantly outperform even the optimal linear-subspace ROM on benchmark advection-dominated problems, demonstrating its ability to overcome the intrinsic $n$-width limitations of linear subspaces.
Journal ArticleDOI

Solving inverse problems using data-driven models

TL;DR: This survey paper aims to give an account of some of the main contributions in data-driven inverse problems.