Stochastic gradient descent

About: Stochastic gradient descent is a research topic. Over its lifetime, 6,111 publications have been published within this topic, receiving 246,716 citations. The topic is also known as SGD.


Open access · Journal Article · DOI: 10.1214/AOMS/1177729586
Herbert Robbins, Sutton Monro · Institutions (1)
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.

7,621 Citations
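The scheme in the abstract above can be sketched numerically. This is a hedged illustration, not the authors' procedure: the decreasing gains a_n = c/n and the toy response model M(x) = 2x plus Gaussian noise are assumptions chosen so the iterates x_n approach the root θ of M(x) = α.

```python
import random

def robbins_monro(observe, alpha, x0=0.0, c=1.0, steps=5000):
    """Robbins-Monro iteration: x_{n+1} = x_n - (c/n) * (y_n - alpha),
    where y_n is a noisy observation of M(x_n)."""
    x = x0
    for n in range(1, steps + 1):
        y = observe(x)                 # run the experiment at level x
        x = x - (c / n) * (y - alpha)  # step toward the root of M(x) = alpha
    return x

# Toy model: M(x) = 2x observed with Gaussian noise; M(theta) = 4 at theta = 2.
random.seed(0)
theta = robbins_monro(lambda x: 2 * x + random.gauss(0.0, 0.1), alpha=4.0)
```

The 1/n gains are what make the noise average out while still allowing the iterates to travel any required distance.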

Open access · Posted Content
22 Dec 2012 · arXiv: Learning
Abstract: We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.

Topics: Stochastic gradient descent (66%), Gradient descent (63%), Online machine learning (63%)

5,567 Citations
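The update described in the abstract above can be sketched on a scalar parameter. This is an illustrative assumption-laden sketch (the decay rate rho, stabilizer eps, step budget, and the toy objective f(x) = |x| are choices made here, not from the paper): two decaying averages, of squared gradients and of squared updates, set the step size, so no learning rate is tuned by hand.

```python
import math

def adadelta(grad, x, rho=0.95, eps=1e-6, steps=10000):
    """ADADELTA on a scalar parameter, with no manual learning rate.
    Keeps running averages of squared gradients (eg2) and squared
    updates (ed2); each step scales the gradient by their RMS ratio."""
    eg2 = 0.0  # decaying average of g^2
    ed2 = 0.0  # decaying average of dx^2
    for _ in range(steps):
        g = grad(x)
        eg2 = rho * eg2 + (1 - rho) * g * g
        dx = -math.sqrt(ed2 + eps) / math.sqrt(eg2 + eps) * g
        ed2 = rho * ed2 + (1 - rho) * dx * dx
        x += dx
    return x

# Toy objective f(x) = |x|: gradient is the sign of x, minimum at 0.
x_final = adadelta(lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0), x=1.0)
```

Note how the units work out: the RMS-of-updates numerator gives the step the same "units" as the parameter, one of the paper's motivations over plain AdaGrad-style scaling.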

Open access · Book Chapter · DOI: 10.1007/978-3-319-58347-1_10
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, +4 more · Institutions (3)
Abstract: We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for a descriptor learning task in the context of a person re-identification application.

Topics: Semi-supervised learning (60%), Domain (software engineering) (58%), Feature learning (56%)

4,760 Citations
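The gradient reversal layer mentioned in the abstract above is conceptually tiny: identity on the forward pass, gradient multiplied by a negative factor on the backward pass. A minimal framework-free sketch (the class name, interface, and lam parameter are illustrative, not the authors' code):

```python
class GradientReversal:
    """Identity in the forward pass; multiplies the incoming gradient
    by -lam in the backward pass, so the feature extractor upstream is
    trained to *confuse* the domain classifier placed after this layer."""
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lam * grad_output  # reversed (and scaled) gradient

grl = GradientReversal(lam=0.5)
out = grl.forward(3.0)
grad = grl.backward(2.0)
```

In an autograd framework this would be a custom function with these two rules; the reversal is what turns ordinary backpropagation into the adversarial source-vs-target objective.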

Open access · Book Chapter · DOI: 10.1007/978-3-7908-2604-3_16
01 Jan 2010
Abstract: During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.

Topics: Stochastic gradient descent (72%), Gradient method (69%), Gradient descent (65%)

4,576 Citations
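Plain and averaged SGD, as contrasted in the abstract above, differ only in what you report at the end: the last iterate, or the running (Polyak–Ruppert) mean of all iterates. A hedged sketch (the 1-D least-squares model, learning rate, and toy data are assumptions made here, not from the chapter):

```python
import random

def sgd(data, lr=0.1, epochs=50, w0=0.0):
    """Plain and averaged SGD for the 1-D least-squares loss (w*x - y)^2 / 2.
    Returns (last iterate, running average of iterates)."""
    w, wbar, t = w0, 0.0, 0
    for _ in range(epochs):
        random.shuffle(data)             # one pass = one shuffled epoch
        for x, y in data:
            w -= lr * (w * x - y) * x    # per-example gradient step
            t += 1
            wbar += (w - wbar) / t       # incremental mean of the iterates
    return w, wbar

# Noise-free data from y = 3x; both estimates should approach w* = 3.
random.seed(1)
data = [(0.5, 1.5), (1.0, 3.0), (2.0, 6.0)]
w_last, w_avg = sgd(data)
```

On noisy data the averaged iterate is the one with the strong asymptotic guarantees the chapter refers to; on this noise-free toy both simply converge.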

Open access · Proceedings Article
01 Jan 2017
Abstract: Generative Adversarial Networks (GANs) excel at creating realistic images with complex models for which maximum likelihood is infeasible. However, the convergence of GAN training has still not been proved. We propose a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions. TTUR has an individual learning rate for both the discriminator and the generator. Using the theory of stochastic approximation, we prove that the TTUR converges under mild assumptions to a stationary local Nash equilibrium. The convergence carries over to the popular Adam optimization, for which we prove that it follows the dynamics of a heavy ball with friction and thus prefers flat minima in the objective landscape. For the evaluation of the performance of GANs at image generation, we introduce the 'Fréchet Inception Distance' (FID), which captures the similarity of generated images to real ones better than the Inception Score. In experiments, TTUR improves learning for DCGANs and Improved Wasserstein GANs (WGAN-GP), outperforming conventional GAN training on CelebA, CIFAR-10, SVHN, LSUN Bedrooms, and the One Billion Word Benchmark.

3,731 Citations
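The two ingredients named in the abstract above are both small in code: TTUR is just two separate learning rates, and in one dimension the Fréchet distance between Gaussians has a closed form. A hedged sketch (the learning-rate values, simultaneous update order, and the 1-D reduction of FID are illustrative assumptions):

```python
def ttur_step(d, g, grad_d, grad_g, lr_d=3e-4, lr_g=1e-4):
    """One two-time-scale update: the discriminator ascends its objective
    with a larger learning rate than the generator's descent step."""
    d_new = d + lr_d * grad_d(d, g)  # discriminator: gradient ascent
    g_new = g - lr_g * grad_g(d, g)  # generator: gradient descent
    return d_new, g_new

def fid_1d(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two 1-D Gaussians; the general FID
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)) reduces to
    (mu1 - mu2)^2 + (sigma1 - sigma2)^2 in one dimension."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2
```

Identical distributions give fid_1d(0, 1, 0, 1) == 0; the faster discriminator time scale is what lets the stochastic-approximation argument treat the discriminator as approximately converged at each generator step.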

No. of papers in the topic in previous years (chart omitted)

Top Attributes

Topic's top 5 most impactful authors

Francis Bach · 31 papers, 2.9K citations
Dan Alistarh · 22 papers, 1.1K citations
Praneeth Netrapalli · 19 papers, 675 citations
Nathan Srebro · 17 papers, 3.1K citations
Alejandro Ribeiro · 17 papers, 371 citations

Network Information
Related Topics (5)
Gradient descent · 16.3K papers, 466.1K citations · 93% related
Supervised learning · 20.8K papers, 710.5K citations · 90% related
Semi-supervised learning · 12.1K papers, 611.2K citations · 90% related
Recurrent neural network · 29.2K papers, 890K citations · 90% related
Kernel method · 11.3K papers, 501K citations · 89% related