Action Recognition with Improved Trajectories
Citations
Cites background or methods from "Action Recognition with Improved Tr..."
...We apply the same process with iDT [44] as well as Imagenet features [7] and compare the results in Figure 5....
[...]
...C3D is 91x faster than improved dense trajectories [44] and 274x faster than Brox’s GPU implementation in OpenCV....
[...]
...proposed improved Dense Trajectories (iDT) [44] which is currently the state-of-the-art hand-crafted feature....
[...]
...For iDT, we use the code kindly provided by the authors [44]....
[...]
...Baselines: We compare C3D feature with a few baselines: the current best hand-crafted features, namely improved dense trajectories (iDT) [44], and the popular deep image features, namely Imagenet [16], using Caffe's Imagenet pre-trained model....
[...]
Cites background, methods, or results from "Action Recognition with Improved Tr..."
...There still remain some essential ingredients of the state-of-the-art shallow representation [26], which are missed in our current architecture....
[...]
...Recent improvements of trajectory-based hand-crafted representations include compensation of global (camera) motion [10, 16, 26], and the use of the Fisher vector encoding [22] (in [26]) or its deeper variant [23] (in [21])....
[...]
...deep architecture significantly outperforms that of [14] and is competitive with the state of the art shallow representations [20, 21, 26] in spite of being trained on relatively small datasets....
[...]
...The combination of the two nets further improves the results (in line with the single-split experiments above), and is comparable to the very recent state-of-the-art hand-crafted models [20, 21, 26]....
[...]
...The importance of camera motion compensation has been previously highlighted in [10, 26], where a global motion component was estimated and subtracted from the dense flow....
[...]
References
Additional excerpts
...In section 4.3, we compare the performance of action recognition with or without human detection....
[...]
"Action Recognition with Improved Tr..." refers to methods in this paper
...Finally, the datasets and experimental setup are presented....
[...]
"Action Recognition with Improved Tr..." refers to methods in this paper
...It is trained using the PASCAL VOC07 training data for humans as well as near-frontal upper-bodies from [10]....
[...]
...In contrast, camera motion is successfully compensated (the right two columns of Figure 4), when the human bounding boxes are used to remove matches not corresponding to camera motion....
[...]
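The compensation step quoted above can be sketched in a few lines: frame-to-frame feature matches that fall inside detected human bounding boxes are discarded before fitting the camera-motion model. Note the paper estimates a homography with RANSAC; this illustrative sketch fits a plain least-squares affine model instead, and all function and variable names are my own.

```python
import numpy as np

def estimate_camera_motion(pts_prev, pts_curr, human_boxes):
    """Fit a 2D affine camera-motion model to feature matches,
    ignoring matches inside human bounding boxes (x0, y0, x1, y1).
    Sketch only: the paper uses a RANSAC homography instead."""
    def outside(p):
        x, y = p
        return not any(x0 <= x <= x1 and y0 <= y <= y1
                       for x0, y0, x1, y1 in human_boxes)
    keep = [i for i, p in enumerate(pts_prev) if outside(p)]
    src = np.asarray([pts_prev[i] for i in keep], dtype=float)
    dst = np.asarray([pts_curr[i] for i in keep], dtype=float)
    # Solve dst ~= [x, y, 1] @ A.T for the 2x3 affine matrix A.
    X = np.hstack([src, np.ones((len(src), 1))])
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A.T  # 2x3 affine camera-motion matrix
```

Removing the human matches matters because a moving actor otherwise biases the motion estimate toward the foreground rather than the camera.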
"Action Recognition with Improved Tr..." refers to methods in this paper
...This work was supported by Quaero (funded by OSEO, French State agency for innovation), the European integrated project AXES, the MSR/INRIA joint project and the ERC advanced grant ALLEGRO....
[...]
Frequently Asked Questions (9)
Q2. How many sports are represented by the Olympic Sports dataset?
There are 16 sports actions (such as high-jump, pole-vault, basketball lay-up, discus), represented by a total of 783 video sequences.
Q3. How do the authors normalize the histogram-based descriptors?
To normalize the histogram-based descriptors, i.e., HOG, HOF and MBH, the authors apply the recent RootSIFT [2] approach, i.e., they take the square root of each dimension after L1 normalization.
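The RootSIFT-style normalization described in this answer is short enough to sketch directly (a minimal NumPy version; the function name is my own):

```python
import numpy as np

def rootsift(desc, eps=1e-12):
    """L1-normalize a descriptor, then take the square root of each
    dimension, as described above. For nonnegative histogram
    descriptors (HOG, HOF, MBH) the result has unit L2 norm."""
    desc = np.asarray(desc, dtype=float)
    desc = desc / (np.abs(desc).sum() + eps)  # L1 normalization
    return np.sqrt(desc)                      # element-wise square root
```

Because Euclidean distance between square-rooted vectors equals the Hellinger distance between the originals, this makes linear comparisons behave like a histogram-friendly kernel.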
Q4. What is the method for estimating dense trajectories?
Jain et al. [14] decompose visual motion into dominant and residual motions both for extracting trajectories and computing descriptors.
Q5. Does the compensating camera motion improve the performance of the HOG?
Since HOG is designed to capture static appearance information, the authors do not expect that compensating camera motion significantly improves its performance.
Q6. How can the authors correct the optical flow?
Given the camera motion, the authors can correct the optical flow, so that the motion vectors of human actors are independent of camera motion.
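The correction described in this answer can be sketched as follows: the homography estimated from background matches predicts, for every pixel, the flow induced by camera motion alone, and subtracting that prediction leaves the actors' own motion. This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def correct_flow(flow, H):
    """Remove camera-induced motion from a dense optical flow field.
    flow: (h, w, 2) observed flow; H: 3x3 homography estimated from
    background matches. Returns the residual (actor) motion."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(float)
    warped = pts @ H.T                         # apply the homography
    warped = warped[..., :2] / warped[..., 2:3]  # back to inhomogeneous
    camera_flow = warped - pts[..., :2]        # flow due to camera alone
    return flow - camera_flow
```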
Q7. What is the way to use local space-time features?
A bag-of-features representation of these local features can be directly used for action classification and achieves state-of-the-art performance (see [1] for a recent survey).
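The bag-of-features representation mentioned here reduces to hard-assigning each local descriptor to its nearest visual word and histogramming the assignments (a minimal sketch; the codebook would normally come from k-means on training descriptors):

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return an
    L1-normalized histogram over the vocabulary."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length vector can be fed to any standard classifier (e.g. an SVM) regardless of how many local features the video produced.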
Q8. How many Gaussians are used to estimate the GMM?
The authors set the number of Gaussians to K = 256 and randomly sample a subset of 256,000 features from the training set to estimate the GMM.
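The vocabulary estimation described in this answer can be sketched with scikit-learn (an assumption: the authors' own implementation is not specified here, and diagonal covariances are the usual choice for Fisher vectors):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm(train_features, k=256, n_sample=256_000, seed=0):
    """Randomly sample a subset of training features, then estimate a
    K-component GMM, as described above (sklearn-based sketch)."""
    rng = np.random.default_rng(seed)
    n = min(n_sample, len(train_features))
    idx = rng.choice(len(train_features), size=n, replace=False)
    gmm = GaussianMixture(n_components=k, covariance_type="diag",
                          random_state=seed)
    gmm.fit(train_features[idx])
    return gmm
```

Subsampling keeps EM tractable: fitting on 256,000 descriptors is far cheaper than on the full training set while still estimating 256 Gaussians reliably.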
Q9. What is the funding source for this work?
This work was supported by Quaero (funded by OSEO, French State agency for innovation), the European integrated project AXES, the MSR/INRIA joint project and the ERC advanced grant ALLEGRO.