Mimetics: Towards Understanding Human Actions Out of Context
Citations
Cites background from "Mimetics: Towards Understanding Hum..."
...
HDM05 [34] | 100 | 1500 | Fixed | Lab/Prompted | Non-contextual | Not specified | High | Fixed
3D-Iconic dataset [39] | 20 | 1739 | Fixed | Lab/Prompted | Non-contextual | Not specified | High | Fixed
Florence-3D [40] | 9 | 215 | Fixed | Lab/Prompted | Non-contextual | Not specified | High | Fixed
NTU-60 [41] | 60 | 56880 | Fixed | Lab/Prompted | Non-contextual | 1-10 seconds | High | Fixed
Large-RGB+D [56] | 94 | 4953 | Fixed | Lab/Prompted | Non-contextual | Not specified | High | Fixed/Moving
Kinetics-skeleton [55] | 400 | 300,000 | Fixed | Wild | Contextual | 10 seconds | High | Fixed/Moving
NTU-120 [30] | 120 | 114,480 | Fixed | Lab/Prompted | Non-contextual | 1-10 seconds | High | Fixed
Mimetics [53] | 50 | 713 | Fixed | Wild | Non-contextual | 1-10 seconds | Moderate | Fixed
Skeletics-152 | 152 | 125,657 | Fixed | Wild | Contextual | 10 seconds | High | Fixed/Moving
Skeleton-Mimetics | 23 | 319 | Fixed | Wild | Non-contextual | 1-10 seconds | Moderate | Fixed
Metaphorics N....
[...]
Cites background or methods or result from "Mimetics: Towards Understanding Hum..."
...Our model also obtains an improvement on Mimetics [55], a dataset with out-of-context actions using only pose heatmaps and without any tracking....
[...]
...But, studies [26, 27, 55] show that RGB and flow-based models capture a lot of dataset biases....
[...]
...But, these approaches have certain limitations like overfitting on small datasets [11], requirement for access to multiple pose modalities [59, 55], pose-tracking [55]....
[...]
...We also evaluate our method on the Mimetics [55] dataset....
[...]
...Mimetics [55] is a test set for fifty actions from the Kinetics dataset....
[...]
References
"Mimetics: Towards Understanding Hum..." refers methods in this paper
...While improvements of this approach have been proposed [14], most state-of-the-art methods now use a 3D deep convolutional network [5, 43, 44, 50], optionally in combination with a two-stream architecture....
[...]
...Different strategies have been deployed to handle video processing with CNNs such as two-stream architectures [14, 38], Recurrent Neural Networks (RNNs) [9], or spatio-temporal 3D convolutions [5, 13, 43]....
[...]