Snapshot Ensembles: Train 1, Get M for Free
Citations
6,909 citations
Cites methods from "Snapshot Ensembles: Train 1, Get M ..."
...SGDR has been successfully adopted to lead to new state-of-the-art results for popular image classification benchmarks (Huang et al., 2017; Gastaldi, 2017; Zoph et al., 2017), and we therefore tried extending it to Adam....
[...]
3,497 citations
1,656 citations
Cites background or methods or result from "Snapshot Ensembles: Train 1, Get M ..."
...Our results reproduce the finding by Huang et al. (2016a) that intermediate models generated by SGDR can be used to build efficient ensembles at no cost....
[...]
...Alternative network structures should be also considered; e.g., soon after our initial arXiv report (Loshchilov & Hutter, 2016), Zhang et al. (2016); Huang et al. (2016b); Han et al. (2016) reported that WRNs models can be replaced by more memory-efficient models....
[...]
...state-of-the-art results on CIFAR-10, CIFAR-100, SVHN, ImageNet, PASCAL VOC and MS COCO datasets were obtained by Residual Neural Networks (He et al., 2015; Huang et al., 2016c; He et al., 2016; Zagoruyko & Komodakis, 2016) trained without the use of advanced methods such as AdaDelta and Adam....
[...]
...Three runs (N = 3) of SGDR with M = 3 snapshots per run are sufficient to greatly improve the results to 3.25% on CIFAR-10 and 16.64% on CIFAR-100 outperforming the results of Huang et al. (2016a)....
[...]
...Our initial arXiv report on SGDR (Loshchilov & Hutter, 2016) inspired a follow-up study by Huang et al. (2016a) in which the authors suggest to take M snapshots of the models obtained by SGDR (in their paper referred to as cyclical learning rate schedule and cosine annealing cycles) right before M...
[...]
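The snapshot procedure described in the excerpt above can be sketched as follows. This is a minimal illustration, not code taken from either paper: the function names and the example values (T = 300 iterations, M = 3 cycles, initial rate 0.1) are assumptions made for this sketch. It assumes a cyclic cosine schedule in which the learning rate restarts at its initial value at the start of each cycle and decays toward zero by the end, with one snapshot saved on the last iteration of each cycle, that is, right before the next restart.

```python
import math


def cosine_cycle_lr(t, total_iters, num_cycles, lr0):
    """Learning rate at 1-based iteration t under a restarting cosine schedule.

    Hypothetical helper: the budget of total_iters is split into num_cycles
    equal cycles, and the rate is annealed from lr0 toward 0 inside each one.
    """
    cycle_len = math.ceil(total_iters / num_cycles)
    pos = (t - 1) % cycle_len  # position inside the current cycle
    return 0.5 * lr0 * (math.cos(math.pi * pos / cycle_len) + 1.0)


def snapshot_iterations(total_iters, num_cycles):
    """Iterations at which a snapshot would be saved: the last step of each cycle."""
    cycle_len = math.ceil(total_iters / num_cycles)
    return [min(k * cycle_len, total_iters) for k in range(1, num_cycles + 1)]


if __name__ == "__main__":
    T, M, lr0 = 300, 3, 0.1  # illustrative values only
    for t in snapshot_iterations(T, M):
        # The rate is near its minimum when each snapshot is taken.
        print(t, cosine_cycle_lr(t, T, M, lr0))
```

In an actual training loop, the model weights would be copied at each iteration returned by snapshot_iterations, and the saved models' predictions combined at test time.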
1,257 citations
1,173 citations
Cites background or methods from "Snapshot Ensembles: Train 1, Get M ..."
...Nevertheless, state-of-the-art results for popular image classification datasets, such as CIFAR-10 and CIFAR-100 Krizhevsky (2009), are still obtained by applying SGD with momentum (Huang et al., 2016; 2017; Loshchilov & Hutter, 2016; Gastaldi, 2017)....
[...]
...SGDR has been successfully adopted to lead to new state-of-the-art results for popular image classification benchmarks (Huang et al., 2017; Gastaldi, 2017), and we therefore tried extending it to Adam shortly afterwards....
[...]
References
123,388 citations
111,197 citations
"Snapshot Ensembles: Train 1, Get M ..." refers background in this paper
...Stochastic Gradient Descent (SGD) (Bottou, 2010) and its accelerated variants (Kingma & Ba, 2014; Duchi et al., 2011) have become the de-facto approaches for optimizing deep neural networks....
[...]
73,978 citations
33,597 citations
30,843 citations
"Snapshot Ensembles: Train 1, Get M ..." refers methods in this paper
...Our approach is naturally compatible with other methods to improve the accuracy, such as data augmentation, stochastic depth (Huang et al., 2016b), or batch normalization (Ioffe & Szegedy, 2015)....
[...]