Adam: A Method for Stochastic Optimization
Citations
Cites methods from "Adam: A Method for Stochastic Optimization"
...We train with a batch size of 4 for 200k iterations using Adam [51] with a learning rate of 1×10−3 without weight decay or dropout....
...We use Adam [51] with a learning rate of 1×10−3....
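
The setup quoted above corresponds to a plain Adam configuration with learning rate 1e-3 and no weight decay. A minimal sketch in a PyTorch-style API (the model here is a hypothetical stand-in, not taken from the cited papers) might be:

    import torch

    # Hypothetical stand-in model; the cited papers train their own architectures
    # with a batch size of 4 for 200k iterations.
    model = torch.nn.Linear(128, 10)

    # Adam with learning rate 1e-3 and no weight decay, matching the quoted setup.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)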
Cites background or methods from "Adam: A Method for Stochastic Optimization"
...For larger data-sets, such as the data-driven model discovery examples discussed in section 4, a more computationally efficient mini-batch setting can be readily employed using stochastic gradient descent and its modern variants [36,37]....
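
As a toy illustration of the mini-batch setting mentioned above, the sketch below runs stochastic gradient descent on randomly sampled mini-batches of a synthetic least-squares problem; all sizes and step sizes are made up for illustration and are not taken from the cited work.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 20))          # synthetic "large" data-set
    y = rng.normal(size=10_000)
    w = np.zeros(20)
    lr, batch_size = 1e-2, 64                  # illustrative values only

    for step in range(1_000):
        idx = rng.integers(0, len(X), size=batch_size)   # draw a mini-batch
        xb, yb = X[idx], y[idx]
        grad = 2.0 * xb.T @ (xb @ w - yb) / batch_size   # gradient of the batch MSE
        w -= lr * grad                                   # plain SGD step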
Cites methods from "Adam: A Method for Stochastic Optimization"
...We use the Adam optimizer [32] with a learning rate of 2e-4, and use 8 GPUs each with a minibatch of 8 examples from which the negative samples in the contrastive loss are drawn....
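
The quoted recipe (Adam at 2e-4 with negatives drawn from the same minibatch) is the usual in-batch-negatives contrastive setup. A minimal single-GPU sketch with hypothetical encoders and random inputs, not the cited paper's actual model or data, could look like:

    import torch
    import torch.nn.functional as F

    # Hypothetical encoders and random inputs standing in for the real model and data.
    query_enc = torch.nn.Linear(256, 128)
    key_enc = torch.nn.Linear(256, 128)
    optimizer = torch.optim.Adam(
        list(query_enc.parameters()) + list(key_enc.parameters()), lr=2e-4)

    queries = torch.randn(8, 256)              # one minibatch of 8 paired examples
    keys = torch.randn(8, 256)
    q = F.normalize(query_enc(queries), dim=1)
    k = F.normalize(key_enc(keys), dim=1)
    logits = q @ k.t() / 0.07                  # pairwise similarities; off-diagonal entries act as negatives
    labels = torch.arange(8)                   # each query's positive is its own paired key
    loss = F.cross_entropy(logits, labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()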
References
"Adam: A Method for Stochastic Optim..." refers background or methods in this paper
...Objectives may also have other sources of noise than data subsampling, such as dropout (Hinton et al., 2012b) regularization....
...SGD proved itself as an efficient and effective optimization method that was central in many machine learning success stories, such as recent advances in deep learning (Deng et al., 2013; Krizhevsky et al., 2012; Hinton & Salakhutdinov, 2006; Hinton et al., 2012a; Graves et al., 2013)....
...…the advantages of two recently popular methods: AdaGrad (Duchi et al., 2011), which works well with sparse gradients, and RMSProp (Tieleman & Hinton, 2012), which works well in on-line and non-stationary settings; important connections to these and other stochastic optimization methods are…...
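
For reference, the update rule that combines these ideas, as given in the paper (gradient g_t, step size α, exponential decay rates β1 and β2, and a small constant ε), is:

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2,
    \hat{m}_t = m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t),
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon).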