Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning
Citations
1,733 citations
Cites background or methods from "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning"
...MoCo [17] and BYOL [15] do not directly share the weights between the two branches, though in theory the momentum encoder should converge to the same state as the trainable encoder....
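The momentum encoder referenced here is typically maintained as an exponential moving average (EMA) of the trainable encoder's weights, which is why the two should converge to the same state. A minimal PyTorch sketch, assuming two architecturally identical modules `encoder` and `momentum_encoder` (names illustrative):

```python
import torch

@torch.no_grad()
def momentum_update(encoder, momentum_encoder, m: float = 0.99):
    """EMA step: the momentum encoder slowly tracks the trainable encoder."""
    for p, p_m in zip(encoder.parameters(), momentum_encoder.parameters()):
        p_m.data.mul_(m).add_(p.data, alpha=1.0 - m)
```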
[...]
...clustering, BYOL [15] relies only on positive pairs, but it does not collapse when a momentum encoder is used....
[...]
...We use batch normalization (BN) [22] synchronized across devices, following [8, 15, 7]....
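Synchronized BN computes batch statistics over the combined batch from all devices rather than per-GPU. A sketch of one common way to enable this in PyTorch, assuming a distributed process group has already been initialized (e.g. via torchrun):

```python
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50()
# Replace every BatchNorm layer with a variant that synchronizes its
# mean/variance statistics across all participating GPUs.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```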
[...]
...A prediction MLP head [15], denoted as h, transforms the output of one view and matches it to the other view....
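A sketch of how such a prediction head is typically applied, using the symmetrized negative-cosine matching of BYOL/SimSiam-style methods; `h` is the predictor module and `z1`, `z2` the two views' projections (assumed inputs), with a stop-gradient on the matched branch:

```python
import torch.nn.functional as F

def match_loss(h, z1, z2):
    """Transform each view's output with h and match it to the other view."""
    p1, p2 = h(z1), h(z2)
    # Detach the target side so no gradient flows into the matched branch.
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean() +
             F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
```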
[...]
...Our method does not require a large-batch optimizer such as LARS [38] (unlike [8, 15, 7])....
[...]
949 citations
Cites background or methods from "Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning"
...the big convolutional ResNets in prior art [10, 17]....
[...]
...We adopt a symmetrized loss [17, 7, 12]: $\mathcal{L}_{\mathrm{ctr}}(q_1, k_2) + \mathcal{L}_{\mathrm{ctr}}(q_2, k_1)$....
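One common instantiation of this symmetrized loss takes ctr to be InfoNCE over in-batch negatives; a sketch assuming L2-normalized embeddings of shape (N, D), with temperature and other details varying across the cited works:

```python
import torch
import torch.nn.functional as F

def ctr(q, k, tau: float = 0.2):
    """InfoNCE: the positive for q[i] is k[i]; all other k[j] are negatives."""
    logits = q @ k.t() / tau                        # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def symmetrized_loss(q1, k1, q2, k2):
    return ctr(q1, k2) + ctr(q2, k1)
```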
[...]
...A large batch is also beneficial for accuracy in recent self-supervised learning methods [9, 17, 7]....
[...]
...prior works [8, 15] that train self-supervised Transformers with masked auto-encoding, we study the frameworks that are based on Siamese networks, including MoCo [19] and others [9, 17, 7]....
[...]
...lr is the hyper-parameter being set [19, 9, 17]....
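Here lr is a base value: the effective learning rate in these setups follows the linear scaling rule lr × BatchSize / 256. A one-line sketch (the base value below is illustrative, not a quoted setting):

```python
def effective_lr(base_lr: float, batch_size: int) -> float:
    """Linear scaling rule: tune base_lr once, then scale with batch size."""
    return base_lr * batch_size / 256

print(effective_lr(0.3, 4096))  # 4.8
```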
[...]
References
55,235 citations
"Bootstrap Your Own Latent: A New Ap..." refers background in this paper
...Learning good image representations is a key challenge in computer vision [1, 2, 3] as it allows for efficient training on downstream tasks [4, 5, 6, 7]....
[...]
38,211 citations
"Bootstrap Your Own Latent: A New Ap..." refers background in this paper
...This is similar to GANs [66], where there is no loss that is jointly minimized w.r.t. both the discriminator and generator parameters....
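Concretely, the target network's parameters receive no gradient from the loss; only the online network and predictor are optimized, with the target instead following the online weights via EMA. A sketch of this asymmetric setup, assuming `online`, `predictor`, and `target` modules (hyper-parameter values illustrative):

```python
import itertools
import torch
import torch.nn as nn

def build_optimizer(online: nn.Module, predictor: nn.Module, target: nn.Module):
    """No loss is jointly minimized over both parameter sets: the target
    branch is frozen w.r.t. the objective, echoing the GAN analogy above."""
    for p in target.parameters():
        p.requires_grad_(False)
    return torch.optim.SGD(
        itertools.chain(online.parameters(), predictor.parameters()),
        lr=0.2, momentum=0.9,
    )
```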
[...]
30,843 citations
"Bootstrap Your Own Latent: A New Ap..." refers methods in this paper
...This MLP consists of a linear layer with output size 4096 followed by batch normalization [55], rectified linear units (ReLU) [56], and a final linear layer with output dimension 256....
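The description maps directly onto a small PyTorch module; a sketch using the layer sizes from the excerpt (the input width depends on the encoder):

```python
import torch.nn as nn

def projection_mlp(in_dim: int) -> nn.Sequential:
    """Linear(in_dim -> 4096) -> BatchNorm -> ReLU -> Linear(4096 -> 256)."""
    return nn.Sequential(
        nn.Linear(in_dim, 4096),
        nn.BatchNorm1d(4096),
        nn.ReLU(inplace=True),
        nn.Linear(4096, 256),
    )
```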
[...]