Adam: A Method for Stochastic Optimization
Citations
Cites methods from "Adam: A Method for Stochastic Optimization"
...We use the Adam optimizer (Kingma & Ba, 2014) with a learning rate of 3 × 10⁻⁴ for the Jaytracer and GQN datasets and 5 × 10⁻⁴ for the CLEVR dataset....
[...]
...NeRF is trained with a batch size of 256 rays, using Adam with learning rate 1 × 10⁻³, for 5 × 10⁶ iterations....
[...]
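For concreteness, the following is a minimal sketch of using Adam with learning rates like those quoted above. PyTorch is assumed (the snippets do not name a framework), and the model, batch, and loss are placeholders rather than the cited setups:

    import torch

    model = torch.nn.Linear(128, 3)  # placeholder model, not the cited architecture

    # Learning rates quoted above: 3e-4 (Jaytracer/GQN), 5e-4 (CLEVR), 1e-3 (NeRF).
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

    for step in range(1000):              # placeholder training loop
        x = torch.randn(256, 128)         # stand-in batch
        loss = model(x).pow(2).mean()     # dummy loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()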
...We use Adam (Kingma & Ba, 2014) and β-annealing of the KL term in Eq....
[...]
...We use a 128-dimensional latent variable, Adam with a learning rate of 5 × 10⁻⁴ for 10⁶ iterations, and β = 1 × 10⁻⁶, which is annealed to 1 × 10⁻⁴ from iteration 40k to 140k....
[...]
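The β-annealing quoted above ramps the KL weight from 1 × 10⁻⁶ to 1 × 10⁻⁴ between iterations 40k and 140k. A minimal sketch, assuming a linear ramp (the citing paper may use a different interpolation), with hypothetical names:

    def kl_beta(step, beta_start=1e-6, beta_end=1e-4,
                ramp_start=40_000, ramp_end=140_000):
        """Linearly anneal the KL weight beta over a range of iterations."""
        if step <= ramp_start:
            return beta_start
        if step >= ramp_end:
            return beta_end
        frac = (step - ramp_start) / (ramp_end - ramp_start)
        return beta_start + frac * (beta_end - beta_start)

    # Usage (names hypothetical): loss = recon_loss + kl_beta(step) * kl_term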
Cites methods from "Adam: A Method for Stochastic Optimization"
...As for the optimization algorithm, we chose the Adaptive Moment Estimator (ADAM) [25]....
[...]
...In our early experiments, ADAM converged much faster than stochastic gradient descent and NADAM [11]....
[...]
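The comparison above is between ADAM, plain stochastic gradient descent, and NADAM. A small sketch of how one might switch between them to reproduce such a comparison, assuming PyTorch (NAdam is only available in recent PyTorch versions; the cited experiments' exact setup is not given in these snippets):

    import torch

    def make_optimizer(params, name="adam", lr=1e-3):
        # The quoted experiments found ADAM converged much faster than SGD and NADAM.
        if name == "adam":
            return torch.optim.Adam(params, lr=lr)
        if name == "sgd":
            return torch.optim.SGD(params, lr=lr, momentum=0.9)
        if name == "nadam":
            return torch.optim.NAdam(params, lr=lr)
        raise ValueError(f"unknown optimizer: {name}")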
References
"Adam: A Method for Stochastic Optim..." refers background or methods in this paper
...Objectives may also have other sources of noise than data subsampling, such as dropout (Hinton et al., 2012b) regularization....
[...]
...SGD proved itself as an efficient and effective optimization method that was central in many machine learning success stories, such as recent advances in deep learning (Deng et al., 2013; Krizhevsky et al., 2012; Hinton & Salakhutdinov, 2006; Hinton et al., 2012a; Graves et al., 2013)....
[...]
...the advantages of two recently popular methods: AdaGrad (Duchi et al., 2011), which works well with sparse gradients, and RMSProp (Tieleman & Hinton, 2012), which works well in on-line and non-stationary settings; important connections to these and other stochastic optimization methods are....
[...]
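The snippet above describes how Adam combines the strengths of AdaGrad and RMSProp: an exponentially decaying first-moment (momentum-like) estimate together with an RMSProp-style second-moment estimate, both bias-corrected. A minimal NumPy sketch of one update step following Algorithm 1 of the paper, using the paper's default hyperparameters:

    import numpy as np

    def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Biased first- and second-moment estimates.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias-corrected estimates (t is the 1-based timestep).
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Parameter update.
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v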