Spatial transformer networks
Citations
14,807 citations
Cites background from "Spatial transformer networks"
...More recent work has sought to better model spatial dependence [1, 31] and incorporate spatial attention [19]....
[...]
...The benefits of such a mechanism have been shown across a range of tasks, from localisation and understanding in images [3, 19] to sequence-based models [2, 28]....
[...]
14,299 citations
Cites methods from "Spatial transformer networks"
...We use bilinear interpolation [17] to compute the exact values of the input features at four regularly sampled locations in each RoI bin, and aggregate the result (using max or average)....
[...]
...So even though RoIWarp also adopts bilinear resampling motivated by [17], it performs on par with RoIPool as shown by experiments (more details in Table 2c), demonstrating the crucial role of alignment....
[...]
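The excerpt above describes the sampling scheme behind RoIAlign: bilinearly interpolate the feature map at a few regularly spaced points inside each RoI bin, with no coordinate rounding, then aggregate by mean or max. A minimal NumPy sketch of that scheme for a single-channel feature map and one bin — the function names and the fixed 2×2 sampling grid are our illustrative choices, not the paper's implementation:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Interpolate feat at a real-valued (y, x) location from its four
    integer-grid neighbours (standard bilinear interpolation)."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # clamp at the border
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def roi_align_bin(feat, y_lo, x_lo, y_hi, x_hi, reduce=np.mean):
    """Value of one RoI bin: bilinearly sample the feature map at a
    regular 2x2 grid of points inside the bin (the quarter points),
    then aggregate with mean or max -- no rounding of bin coordinates."""
    ys = [y_lo + (y_hi - y_lo) * f for f in (0.25, 0.75)]
    xs = [x_lo + (x_hi - x_lo) * f for f in (0.25, 0.75)]
    return reduce([bilinear_sample(feat, y, x) for y in ys for x in xs])
```

For example, on `feat = np.arange(16, dtype=float).reshape(4, 4)` (linear in both coordinates), `roi_align_bin(feat, 0.0, 0.0, 2.0, 2.0)` returns `5.0`, the value at the bin's centre — exactly because no sampling location was snapped to the integer grid, which is the alignment property the excerpt emphasises.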
9,457 citations
Cites background or result from "Spatial transformer networks"
...[9] introduces the idea of spatial transformer to align 2D images through sampling and interpolation, achieved by a specifically tailored layer implemented on GPU....
[...]
...Our input form of point clouds allows us to achieve this goal in a much simpler way compared with [9]....
[...]
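The PointNet excerpt contrasts its input-space alignment with the spatial transformer's mechanism of differentiable sampling and interpolation over a 2D grid. A minimal NumPy sketch of that mechanism — an affine grid generator plus a bilinear sampler, following the STN formulation; the function names, shapes, and normalised-coordinate convention here are our assumptions, not the paper's GPU layer:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinear interpolation of feat at a real-valued (y, x) location."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # clamp at the border
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def affine_grid(theta, out_h, out_w):
    """Grid generator: for each target pixel, compute its source location
    theta @ [x_t, y_t, 1] in normalised [-1, 1] coordinates."""
    yt, xt = np.meshgrid(np.linspace(-1, 1, out_h),
                         np.linspace(-1, 1, out_w), indexing="ij")
    tgt = np.stack([xt, yt, np.ones_like(xt)], axis=-1)  # (H, W, 3)
    return tgt @ theta.T                                 # (H, W, 2): source (x, y)

def grid_sample(feat, grid):
    """Sampler: read the input feature map at each source location
    via bilinear interpolation (every step is differentiable)."""
    h, w = feat.shape
    out = np.empty(grid.shape[:2])
    for i in range(grid.shape[0]):
        for j in range(grid.shape[1]):
            xs, ys = grid[i, j]
            # map normalised coords back to pixel indices, clamped to the image
            px = np.clip((xs + 1) * (w - 1) / 2, 0, w - 1)
            py = np.clip((ys + 1) * (h - 1) / 2, 0, h - 1)
            out[i, j] = bilinear_sample(feat, py, px)
    return out
```

With the identity transform `theta = [[1, 0, 0], [0, 1, 0]]`, `grid_sample(feat, affine_grid(theta, *feat.shape))` reproduces `feat`; a learned `theta` instead warps the input. PointNet's point is that for point-cloud input the same alignment reduces to multiplying coordinates by a learned matrix, with no resampling grid needed.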
5,757 citations
Cites background from "Spatial transformer networks"
...The significance of attention has been studied extensively in the previous literature [12,13,14,15,16,17]....
[...]
References
"Spatial transformer networks" refers to methods in this paper
...We consider a strong baseline CNN model – an Inception architecture with batch normalisation [15] pre-trained on ImageNet [22] and fine-tuned on CUB – which by itself achieves state-of-the-art accuracy of 82....
[...]