Author
Zhihong Fu
Bio: Zhihong Fu is an academic researcher from Beihang University. The author has contributed to research in topics: Frame (networking) & Eye tracking. The author has an h-index of 1, co-authored 2 publications receiving 6 citations.
Topics: Frame (networking), Eye tracking, BitTorrent tracker
Papers
01 Jun 2021
TL;DR: This paper proposes a novel tracking framework built on top of a space-time memory network that makes full use of historical information related to the target to better adapt to appearance variations during tracking.
Abstract: Boosting performance of the offline trained siamese trackers is getting harder nowadays since the fixed information of the template cropped from the first frame has been almost thoroughly mined, but they are poorly capable of resisting target appearance changes. Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance, hindering them from real-time tracking and practical applications. In this paper, we propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target for better adapting to appearance variations during tracking. Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame. Furthermore, the pixel-level similarity computation of the memory network enables our tracker to generate much more accurate bounding boxes of the target. Extensive experiments and comparisons with many competitive trackers on challenging large-scale benchmarks, OTB-2015, TrackingNet, GOT-10k, LaSOT, UAV123, and VOT2018, show that, without bells and whistles, our tracker outperforms all previous state-of-the-art real-time methods while running at 37 FPS. The code is available at https://github.com/fzh0917/STMTrack.
73 citations
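The memory mechanism and pixel-level similarity computation described in the abstract above can be illustrated with a generic space-time memory read. This is a minimal sketch, not the authors' implementation: the function names (`memory_read`, `softmax`, `dot`), the use of plain dot-product similarity, and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(memory_keys, memory_values, query_keys):
    """Hypothetical memory read: for every pixel of the current frame,
    return a similarity-weighted sum of the stored memory values.

    memory_keys / memory_values hold one vector per memory pixel, pooled
    over all stored historical frames; query_keys holds one vector per
    pixel of the current frame.
    """
    dim = len(memory_values[0])
    readout = []
    for q in query_keys:
        # Pixel-level similarity between this query pixel and every memory pixel.
        weights = softmax([dot(k, q) for k in memory_keys])
        readout.append([
            sum(w * v[d] for w, v in zip(weights, memory_values))
            for d in range(dim)
        ])
    return readout


# Toy data (assumed): memory pixels 0 and 2 belong to the target (value 10),
# memory pixel 1 belongs to the background (value 0).
memory_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
memory_values = [[10.0], [0.0], [10.0]]
query_keys = [[5.0, 0.0], [0.0, 5.0]]
readout = memory_read(memory_keys, memory_values, query_keys)
```

In this toy case the first query pixel resembles the stored target pixels and reads out a value close to 10, while the second resembles the background pixel and reads out a value close to 0, which is the sense in which stored history can steer attention toward target regions.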
Posted Content
TL;DR: A novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target for better adapting to appearance variations during tracking is proposed.
Abstract: Boosting performance of the offline trained siamese trackers is getting harder nowadays since the fixed information of the template cropped from the first frame has been almost thoroughly mined, but they are poorly capable of resisting target appearance changes. Existing trackers with template updating mechanisms rely on time-consuming numerical optimization and complex hand-designed strategies to achieve competitive performance, hindering them from real-time tracking and practical applications. In this paper, we propose a novel tracking framework built on top of a space-time memory network that is competent to make full use of historical information related to the target for better adapting to appearance variations during tracking. Specifically, a novel memory mechanism is introduced, which stores the historical information of the target to guide the tracker to focus on the most informative regions in the current frame. Furthermore, the pixel-level similarity computation of the memory network enables our tracker to generate much more accurate bounding boxes of the target. Extensive experiments and comparisons with many competitive trackers on challenging large-scale benchmarks, OTB-2015, TrackingNet, GOT-10k, LaSOT, UAV123, and VOT2018, show that, without bells and whistles, our tracker outperforms all previous state-of-the-art real-time methods while running at 37 FPS. The code is available at this https URL.
67 citations
Cited by
TL;DR: In this paper, an adaptive multi-model combination is realized by combining the color histogram with the Kernel Correlation Filter algorithm, and the sparse representation method is introduced into the training process to improve the stability of the proposed object tracking algorithm.
Abstract: To address the failure of object tracking algorithms under occlusion, the paper improves the Kernel Correlation Filter algorithm. First, an occlusion condition is added to the Kernel Correlation Filter algorithm: if there is no occlusion, the Kernel Correlation Filter algorithm is used for object tracking; if there is occlusion, the improved algorithm based on the Unscented Rauch-Tung-Striebel Smoother is used. Second, the predicted position of the object is fed back to the Kernel Correlation Filter algorithm. Finally, an adaptive multi-model combination is realized by combining the color histogram with the Kernel Correlation Filter algorithm, and the sparse representation method is introduced into the training process to improve the stability of the proposed object tracking algorithm. Experimental results on the OTB-2013 dataset show that the proposed object tracking algorithm reduces occlusion interference during tracking and improves the accuracy rate and success rate.
73 citations
21 Mar 2022
TL;DR: This paper proposes a compact tracking framework, termed as MixFormer, built upon transformers, to utilize the flexibility of attention operations, and proposes a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration.
Abstract: Tracking often uses a multistage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed as MixFormer, built upon transformers. Our core design is to utilize the flexibility of attention operations, and propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration. This synchronous modeling scheme allows to extract target-specific discriminative features and perform extensive communication between target and search area. Based on MAM, we build our MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. Our MixFormer sets a new state-of-the-art performance on five tracking benchmarks, including LaSOT, TrackingNet, VOT2020, GOT-10k, and UAV123. In particular, our MixFormer-L achieves NP score of 79.9% on LaSOT, 88.9% on TrackingNet and EAO of 0.555 on VOT2020. We also perform in-depth ablation studies to demonstrate the effectiveness of simultaneous feature extraction and information integration. Code and trained models are publicly available at https://github.com/MCG-NJU/MixFormer.
66 citations
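The mixed attention idea in the MixFormer abstract above, including the asymmetric variant, can be sketched as follows. This is an illustrative toy, not the MixFormer code: it omits learned projections and multiple heads, and it assumes that "asymmetric" means target tokens attend only to target tokens while search tokens attend to both. The names `attend` and `mixed_attention` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(dim)
        ])
    return out


def mixed_attention(target_tokens, search_tokens, asymmetric=True):
    """Hypothetical mixed attention over target and search tokens.

    Tokens act as their own queries, keys, and values here (identity
    projections, an assumption made to keep the sketch short).
    """
    mixed = target_tokens + search_tokens
    # Assumed asymmetry: target queries skip the search-area keys, so
    # target features do not depend on the current search region.
    target_context = target_tokens if asymmetric else mixed
    new_target = attend(target_tokens, target_context, target_context)
    # Search queries always see both target and search tokens.
    new_search = attend(search_tokens, mixed, mixed)
    return new_target, new_search


target = [[1.0, 0.0], [0.0, 1.0]]
search_a = [[1.0, 1.0]]
search_b = [[-3.0, 2.0]]

target_a, search_out = mixed_attention(target, search_a)
target_b, _ = mixed_attention(target, search_b)
```

Under the assumed asymmetric scheme, `target_a` and `target_b` are identical even though the search tokens differ, so target features could in principle be computed once and reused across frames. That reuse is one plausible reading of how the scheme reduces computational cost.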
21 Mar 2022
TL;DR: This work trains the proposed tracker end-to-end and validates its performance by conducting comprehensive experiments on multiple tracking datasets, achieving an AUC of 68.5% on the challenging LaSOT [14] dataset.
Abstract: Optimization based tracking methods have been widely successful by integrating a target model prediction module, providing effective global reasoning by minimizing an objective function. While this inductive bias integrates valuable domain knowledge, it limits the expressivity of the tracking network. In this work, we therefore propose a tracker architecture employing a Transformer-based model prediction module. Transformers capture global relations with little inductive bias, allowing it to learn the prediction of more powerful target models. We further extend the model predictor to estimate a second set of weights that are applied for accurate bounding box regression. The resulting tracker ToMP relies on training and on test frame information in order to predict all weights transductively. We train the proposed tracker end-to-end and validate its performance by conducting comprehensive experiments on multiple tracking datasets. ToMP sets a new state of the art on three benchmarks, achieving an AUC of 68.5% on the challenging LaSOT [14] dataset. The code and trained models are available at https://github.com/visionml/pytracking
44 citations
22 Mar 2022
TL;DR: A novel one-stream tracking framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows and achieves state-of-the-art performance on multiple benchmarks.
Abstract: The current popular two-stream, two-stage tracking framework extracts the template and the search region features separately and then performs relation modeling, thus the extracted features lack the awareness of the target and have limited target-background discriminability. To tackle the above issue, we propose a novel one-stream tracking (OSTrack) framework that unifies feature learning and relation modeling by bridging the template-search image pairs with bidirectional information flows. In this way, discriminative target-oriented features can be dynamically extracted by mutual guidance. Since no extra heavy relation modeling module is needed and the implementation is highly parallelized, the proposed tracker runs at a fast speed. To further improve the inference efficiency, an in-network candidate early elimination module is proposed based on the strong similarity prior calculated in the one-stream framework. As a unified framework, OSTrack achieves state-of-the-art performance on multiple benchmarks, in particular, it shows impressive results on the one-shot tracking benchmark GOT-10k, i.e., achieving 73.7% AO, improving the existing best result (SwinTrack) by 4.3%. Besides, our method maintains a good performance-speed trade-off and shows faster convergence. The code and models are available at https://github.com/botaoye/OSTrack.
37 citations
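The candidate early elimination step mentioned in the OSTrack abstract above can be pictured as a top-k filter over search-region tokens ranked by their similarity to the template. The sketch below is an assumption-laden illustration rather than the OSTrack module: the function name `eliminate_candidates`, the `keep_ratio` parameter and its default, and the toy similarity scores are all hypothetical.

```python
import math


def eliminate_candidates(search_tokens, similarity, keep_ratio=0.7):
    """Keep the search tokens most similar to the template.

    similarity[i] is an assumed per-token score (for example, attention
    received from the template); tokens with low scores are dropped so
    that later layers process fewer candidates.
    """
    keep = max(1, math.ceil(keep_ratio * len(search_tokens)))
    ranked = sorted(
        range(len(search_tokens)),
        key=lambda i: similarity[i],
        reverse=True,
    )
    # Restore spatial order so kept tokens can be mapped back to positions.
    kept_indices = sorted(ranked[:keep])
    return [search_tokens[i] for i in kept_indices], kept_indices


tokens = ["a", "b", "c", "d"]
scores = [0.1, 0.9, 0.05, 0.6]
kept_tokens, kept_indices = eliminate_candidates(tokens, scores, keep_ratio=0.5)
```

Returning the kept indices alongside the tokens is a design choice of this sketch: it lets a caller place the surviving tokens back at their original spatial positions before box prediction.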
01 Jun 2022
TL;DR: This paper proposes a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration, which extracts target-specific discriminative features and performs extensive communication between the target and the search area.
Abstract: Tracking often uses a multistage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify the process of feature extraction and target information integration, we present a compact tracking framework, termed as MixFormer, built upon transformers. Our core design is to utilize the flexibility of attention operations, and propose a Mixed Attention Module (MAM) for simultaneous feature extraction and target information integration. This synchronous modeling scheme allows to extract target-specific discriminative features and perform extensive communication between target and search area. Based on MAM, we build our MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking, we devise an asymmetric attention scheme in MAM to reduce computational cost, and propose an effective score prediction module to select high-quality templates. Our MixFormer sets a new state-of-the-art performance on five tracking benchmarks, including LaSOT, TrackingNet, VOT2020, GOT-10k, and UAV123. In particular, our MixFormer-L achieves NP score of 79.9% on LaSOT, 88.9% on TrackingNet and EAO of 0.555 on VOT2020. We also perform in-depth ablation studies to demonstrate the effectiveness of simultaneous feature extraction and information integration. Code and trained models are publicly available at https://github.com/MCG-NJU/MixFormer.
31 citations