Topic

Stochastic optimization

About: Stochastic optimization is a research topic. Over its lifetime, 14,753 publications have been published within this topic, receiving 567,014 citations. The topic is also known as: stochastic optimisation.


Papers

Adam: A Method for Stochastic Optimization
Open access · Proceedings Article
Diederik P. Kingma, Jimmy Ba (2 institutions)
01 Jan 2015
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.


Topics: Stochastic optimization (63%), Convex optimization (54%), Rate of convergence (52%)

78,539 Citations
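As a rough illustration of the adaptive estimates of lower-order moments described in the abstract above, here is a minimal NumPy sketch of an Adam-style update applied to a toy noisy quadratic. The function name adam_step, the toy objective, and the number of steps are illustrative assumptions; the decay rates, step size, and epsilon follow commonly cited defaults.

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (first moment) and the
    # squared gradient (second moment), bias-corrected for their zero
    # initialization, followed by a per-coordinate rescaled step.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = ||theta||^2 from noisy stochastic gradients.
rng = np.random.default_rng(0)
theta = np.ones(3)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta + 0.1 * rng.standard_normal(3)  # noisy gradient estimate
    theta, m, v = adam_step(theta, grad, m, v, t)

Because the step is divided coordinate-wise by sqrt(v_hat), rescaling a gradient coordinate by a constant leaves the update essentially unchanged, which is the diagonal-rescaling invariance the abstract mentions.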


Adam: A Method for Stochastic Optimization (arXiv preprint)
Open access · Posted Content
Diederik P. Kingma, Jimmy Ba (2 institutions)
22 Dec 2014 - arXiv: Learning
Abstract: identical to that of the conference version listed above.

23,369 Citations


A Stochastic Approximation Method
Open access · Journal Article · DOI: 10.1214/AOMS/1177729586
Herbert Robbins, Sutton Monro (1 institution)
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x_1, x_2, ... in such a way that x_n will tend to θ in probability.


7,621 Citations
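A minimal sketch of the successive-experiment scheme the abstract describes, under assumed ingredients: a hypothetical noisy response with M(x) = 2x + 1, target level α = 5 (so the root is θ = 2), and the classic step sizes a_n = 1/n. The update x_{n+1} = x_n + a_n (α - y_n) moves the current level toward the root when M is monotone increasing.

import numpy as np

rng = np.random.default_rng(0)

def noisy_response(x):
    # Noisy observation y_n of the unknown monotone response M(x) = 2x + 1.
    return 2 * x + 1 + rng.standard_normal()

alpha = 5.0   # target level; the true root of M(x) = alpha is x = 2
x = 0.0       # initial level x_1
for n in range(1, 10001):
    y = noisy_response(x)
    a_n = 1.0 / n                 # steps with sum a_n = inf and sum a_n^2 < inf
    x = x + a_n * (alpha - y)     # correct the level toward the target

# After many steps, x approximates theta = 2; under the paper's conditions
# the iterates converge to theta in probability.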


01 Jan 1983
Topics: Stochastic optimization (77%)

6,990 Citations


Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
Open access · Journal Article
John C. Duchi, Elad Hazan, Yoram Singer (3 institutions)
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.


Topics: Subgradient method (69%), Online machine learning (63%), Empirical risk minimization (59%)

6,957 Citations
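As an illustrative sketch of the adaptive per-coordinate scaling the abstract describes, the following implements the widely used diagonal variant: accumulate squared (sub)gradients per coordinate and divide each coordinate's step by the square root of that sum, so rarely active but informative features keep relatively large steps. The toy sparse regression problem and the hyper-parameter values are assumptions for the example, not taken from the paper.

import numpy as np

def adagrad_step(theta, grad, g_accum, lr=0.1, eps=1e-8):
    # Diagonal AdaGrad-style update: each coordinate's step size shrinks with
    # that coordinate's accumulated squared gradient history.
    g_accum = g_accum + grad ** 2
    theta = theta - lr * grad / (np.sqrt(g_accum) + eps)
    return theta, g_accum

# Toy usage: least squares with mostly-zero (sparse) features.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
theta = np.zeros(3)
g_accum = np.zeros(3)
for _ in range(2000):
    x = rng.binomial(1, 0.2, size=3) * rng.standard_normal(3)  # sparse features
    y = x @ w_true + 0.01 * rng.standard_normal()
    grad = 2 * (x @ theta - y) * x          # (sub)gradient of the squared error
    theta, g_accum = adagrad_step(theta, grad, g_accum)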


Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2022    18
2021    732
2020    802
2019    707
2018    684
2017    850

Top Attributes

Topic's top 5 most impactful authors

Yuri Ermoliev - 31 papers, 1.3K citations
Roger J.-B. Wets - 28 papers, 2.1K citations
Alejandro Ribeiro - 19 papers, 713 citations
Werner Römisch - 18 papers, 556 citations
Andrzej Ruszczyński - 18 papers, 1.9K citations

Network Information
Related Topics (5)
Optimization problem - 96.4K papers, 2.1M citations - 93% related
Linear programming - 32.1K papers, 920.3K citations - 92% related
Constrained optimization - 11.2K papers, 386.6K citations - 92% related
Quadratic programming - 13.9K papers, 427.2K citations - 92% related
Nonlinear programming - 19.4K papers, 656.6K citations - 92% related