Author
Tian Li
Bio: Tian Li is an academic researcher from Zhangzhou Normal University. The author has contributed to research in topics: Video tracking & Tracking (particle physics). The author has an h-index of 2 and has co-authored 3 publications receiving 6 citations.
Papers
TL;DR: This paper proposes a tracking method for UAV scenes that exploits background cues and an aberrance response suppression mechanism to track in 4 degrees of freedom, and is superior for small-target tracking in UAV scenarios.
Abstract: Real-time object tracking for unmanned aerial vehicles (UAVs) is an essential and challenging research topic in computer vision. However, the scenarios that UAVs deal with are complicated, and UAV tracking targets are small, so general trackers often fail to realize their full performance in UAV scenarios. In this paper, we propose a tracking method for UAV scenes that exploits background cues and an aberrance response suppression mechanism to track in 4 degrees of freedom. Firstly, we cast the tracking task as a similarity measurement problem, which we decompose into two subproblems for optimization. Secondly, to alleviate the small-target problem in UAV scenes, we make full use of background cues, and to reduce interference from background information, we employ an aberrance response suppression mechanism. Then, to obtain accurate target state information, we introduce a log-polar coordinate system: phase correlation computed in log-polar coordinates yields the rotation and scale changes of the target. Finally, the target state, comprising displacement, scale, and rotation angle, is obtained through response fusion. We evaluate our approach in extensive experiments on several UAV datasets, including UAV123, DBT70, and UAVDT2019. Compared with current advanced trackers, our method is superior for UAV small-target tracking.
13 citations
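The log-polar phase-correlation step described in the abstract above can be illustrated with a short sketch. The NumPy code below is a minimal, hypothetical reconstruction, not the paper's implementation: it resamples a patch onto a log-polar grid, where rotation becomes a cyclic row shift and scaling becomes a column shift, then recovers both via phase correlation. Grid resolution, interpolation, and sign conventions are illustrative assumptions.

```python
import numpy as np

def log_polar(img, n_angles=180, n_radii=120):
    """Resample a square grayscale patch onto a log-polar grid: rows
    index rotation angle, columns index log-radius, so a rotation of
    the input becomes a cyclic row shift and a uniform scaling becomes
    a column shift."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    radii = max_r ** (np.arange(n_radii) / (n_radii - 1))  # log-spaced radii
    ys = cy + radii[None, :] * np.sin(angles[:, None])
    xs = cx + radii[None, :] * np.cos(angles[:, None])
    ys = np.clip(np.round(ys).astype(int), 0, h - 1)
    xs = np.clip(np.round(xs).astype(int), 0, w - 1)
    return img[ys, xs]

def phase_correlation_peak(a, b):
    """Return the (row, col) shift of b relative to a via the peak of
    the normalized cross-power spectrum."""
    f = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    resp = np.fft.ifft2(f / (np.abs(f) + 1e-8)).real
    peak = np.unravel_index(np.argmax(resp), resp.shape)
    # Peaks past the midpoint wrap around to negative shifts.
    return [p if p <= s // 2 else p - s for p, s in zip(peak, resp.shape)]

def rotation_and_scale(prev_patch, curr_patch, n_angles=180, n_radii=120):
    """Estimate rotation (degrees) and scale ratio between two patches."""
    a = log_polar(prev_patch, n_angles, n_radii)
    b = log_polar(curr_patch, n_angles, n_radii)
    d_ang, d_rad = phase_correlation_peak(a, b)
    max_r = (min(prev_patch.shape) - 1) / 2.0
    return 360.0 * d_ang / n_angles, max_r ** (d_rad / (n_radii - 1))
```

The key property this exploits is that in log-polar coordinates, rotation and scaling both reduce to translations, which phase correlation locates as a single peak.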
TL;DR: A parallel dual network, constructed from two networks and an adjustment module, enables detection of tracking failures as well as target relocation and re-tracking, improving tracking precision while maintaining real-time performance.
Abstract: Visual object tracking plays an essential role in solving many basic problems in computer vision. To improve tracking accuracy, previous methods have sought to prevent tracking failures by improving the ability to describe the target, but few of them consider how to relocate and re-track the target after a tracking failure. In this paper, we propose a parallel dual network for visual object tracking, constructed from two networks and an adjustment module, which enables detection of tracking failures as well as target relocation and re-tracking. Firstly, we employ the Siamese matching method and the correlation filter method to build a tracking network and an inspection network; both networks track the target simultaneously to obtain two tracking results. Secondly, an adjustment module compares the overlap ratio of the two tracking results with a set threshold, then fuses them or selects the better one. Finally, the fusion or selection result is output and the tracker is updated. We perform comprehensive experiments on five benchmarks: VOT2016, UAV123, Temple Color-128, OTB-100, and OTB-50. The results demonstrate that, compared with other state-of-the-art algorithms, the proposed tracking method improves tracking precision while maintaining real-time performance.
10 citations
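The adjustment module described above admits a compact sketch. The Python code below is a plausible rendering under assumptions the abstract does not pin down: the overlap measure is taken to be IoU, the threshold is set to 0.5, fusion is a simple box average, and selection falls back to the higher-scoring branch.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def adjust(track_box, track_score, inspect_box, inspect_score, thresh=0.5):
    """If the two branches agree (overlap above the threshold), fuse
    their boxes by averaging; otherwise treat the disagreement as a
    possible tracking failure and select the more confident branch."""
    if iou(track_box, inspect_box) >= thresh:
        return tuple((a + b) / 2.0 for a, b in zip(track_box, inspect_box))
    return track_box if track_score >= inspect_score else inspect_box
```

In a full tracker, the same decision would typically also gate the model update, since updating the template on a failed prediction corrupts it.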
TL;DR: This paper exploits multi-cue cascades to build a robust end-to-end visual tracker that cascades the response of each level by fully exploring the complementary properties of different levels of learning, and takes surrounding background information into account in the high-level learning procedure.
Abstract: Generic object tracking is a fundamental vision task. Numerous attempts have been made to utilize handcrafted features such as HOG, deep convolutional features pretrained independently on other vision tasks, and hierarchical features. These methods achieve a good balance between accuracy and speed in visual tracking; however, they only partially exploit the complementary characteristics of deep and shallow features and ignore surrounding background information. In this paper, we exploit multi-cue cascades to build a robust end-to-end visual tracker, which cascades the response of each level by fully exploring the complementary properties of different levels of learning. Firstly, we crop out image patches and extract features to construct the corresponding levels of learning; each level is utilized to cope with different challenges. Secondly, this multi-level learning procedure is embedded into a dynamic Siamese network for end-to-end training. Additionally, we take surrounding background information into account in the high-level learning procedure. Finally, the outputs of each level are fused, yielding a favorable accuracy-robustness trade-off. Extensive experiments on OTB-2013, OTB-2015, and VOT2016 demonstrate that the proposed tracker performs favorably in comparison with state-of-the-art trackers, while being more robust against background clutter.
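The final fusion step, where the outputs of each level are combined, could look like the following minimal NumPy sketch; uniform weights and a simple weighted sum are assumptions, since the abstract does not specify the fusion rule.

```python
import numpy as np

def fuse_responses(responses, weights=None):
    """Combine per-level response maps with a weighted sum and return
    the fused map together with the location of its peak (the
    predicted target position)."""
    if weights is None:
        weights = [1.0 / len(responses)] * len(responses)  # assume uniform
    fused = sum(w * r for w, r in zip(weights, responses))
    peak = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, peak
```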
Cited by
TL;DR: A comprehensive Siamese network consisting of a mutual learning subnetwork (M-net) and a feature fusion subnetwork (F-net) realizes object tracking and achieves competitive results while maintaining considerable real-time speed.
Abstract: Recently, Siamese-based trackers have shown outstanding performance in the visual object tracking community, but they seldom pay attention to inter-branch interaction or intra-branch fusion of features from different convolutional layers. In this paper, we build a comprehensive Siamese network consisting of a mutual learning subnetwork (M-net) and a feature fusion subnetwork (F-net) to realize object tracking; each of them is a Siamese network with a special function. M-net is designed to help the two branches mine dependencies from each other, so that the object template is adaptively updated to a certain extent. F-net fuses different levels of convolutional features to make full use of spatial and semantic information. We also design a global-local channel attention (GLCA) module in F-net to capture channel dependencies for proper feature fusion. Our method takes ResNet as the feature extractor and is trained offline in an end-to-end style. We evaluate our method on several well-known benchmarks, including OTB2013, OTB2015, VOT2015, VOT2016, NFS, and TC128. Extensive experimental results demonstrate that our method achieves competitive results while maintaining considerable real-time speed.
27 citations
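The GLCA module is only named in the abstract, so the PyTorch block below is a speculative reconstruction rather than the paper's design: a squeeze-and-excitation-style global branch (average-pooled over all spatial positions) combined additively with a local branch that gates channels per position. The reduction ratio and the additive combination are assumptions.

```python
import torch
import torch.nn as nn

class GLCA(nn.Module):
    """Speculative global-local channel attention: a global branch
    pooled over all spatial positions plus a local, per-position
    branch, combined into a sigmoid gate applied to the input."""

    def __init__(self, channels, reduction=16):  # reduction is an assumption
        super().__init__()
        self.global_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.local_gate = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )

    def forward(self, x):
        # Global gate broadcasts over H and W; local gate varies per position.
        gate = torch.sigmoid(self.global_gate(x) + self.local_gate(x))
        return x * gate
```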
TL;DR: A dual inspection mechanism that identifies missed targets in suspicious areas to assist a single-stage detection branch, and shares dual decisions so that the feature-level multi-instance detection module produces reliable results.
Abstract: Unmanned aerial vehicles (UAVs) are utilized instead of humans to complete aerial assignments in various fields. With the development of computer vision, object detection has become one of the core technologies in UAV applications. However, object detection of small targets often suffers from missed detections, and its performance falls far short of that on large targets. In this paper, we propose a dual inspection mechanism that identifies missed targets in suspicious areas to assist a single-stage detection branch, and shares dual decisions so that the feature-level multi-instance detection module produces reliable results. Firstly, we confirm that missed targets exist among the detection results that fall below the confidence threshold; for this reason, the feature vector provided by a denoising sparse autoencoder is computed, and this part of the results is filtered again. Secondly, we empirically show that single detection results are not reliable enough and that multiple attributes of the target need to be considered. Motivated by this, the initial and secondary detection results are combined and ranked by importance. Finally, we assign a corresponding confidence to the top-ranked instances, making it possible for them to become objects again. Experimental results show that our mechanism improves mAP by 2.7% on the VisDrone2020 dataset, 1.0% on the UAVDT dataset, and 1.8% on the MS COCO dataset. The proposed detection mechanism achieves state-of-the-art levels on these datasets and performs especially well on small object detection.
20 citations
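The re-ranking step above, where low-confidence detections are re-scored and the best candidates are promoted, might be sketched as follows. The score-combination weights, the promotion rule, and the top-k cutoff are all illustrative assumptions; the paper's importance ranking uses autoencoder features, which this sketch abstracts into a precomputed secondary score.

```python
def rescue_missed(detections, secondary_scores, conf_thresh=0.3, top_k=5):
    """Second-pass rescue of low-confidence detections: candidates below
    the confidence threshold are re-scored with a secondary
    (feature-level) score, merged with their original scores, ranked,
    and the top-ranked ones are promoted back to the threshold."""
    kept = [d for d in detections if d["score"] >= conf_thresh]
    suspicious = [d for d in detections if d["score"] < conf_thresh]
    # secondary_scores is assumed aligned with `suspicious`, e.g. derived
    # from denoising-sparse-autoencoder feature similarities.
    for det, s in zip(suspicious, secondary_scores):
        det["importance"] = 0.5 * det["score"] + 0.5 * s  # assumed weights
    suspicious.sort(key=lambda d: d["importance"], reverse=True)
    for det in suspicious[:top_k]:
        det["score"] = conf_thresh  # promoted: may become an object again
        kept.append(det)
    return kept
```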
10 Jan 2021
TL;DR: In this article, a deep reinforcement learning (RL) based single object tracker was proposed to track an object of interest in drone images by estimating a series of actions to find the location of the object in the next frame.
Abstract: There is an increasing demand for camera-equipped drones and their applications in many domains, varying from agriculture to entertainment and from sports events to surveillance. In such drone applications, an essential and common task is visually tracking an object of interest. Drone (or UAV) images have different properties from ground-taken (natural) images, and those differences introduce additional complexities when existing object trackers are applied directly to drone applications. Important differences include (i) smaller object sizes to be tracked and (ii) different orientations and viewing angles that yield different textures and features. Therefore, new algorithms trained on drone images are needed for drone-based applications. In this paper, we introduce a deep reinforcement learning (RL) based single object tracker that tracks an object of interest in drone images by estimating a series of actions to find the location of the object in the next frame. This is the first work introducing a single object tracker using a deep RL-based technique for drone images. Our proposed solution introduces a novel reward function that aims to reduce the total number of actions taken to estimate the object's location in the next frame, as well as a different backbone network for use on low-resolution images. Additionally, we introduce a set of new actions into the action library to better deal with the above-mentioned complexities. We compare our proposed solution to a state-of-the-art tracking algorithm from the recent literature and demonstrate up to 3.87% improvement in precision and 3.6% improvement in IoU on the VisDrone2019 dataset. We also provide additional results on the OTB-100 dataset, showing up to 3.15% improvement in precision when compared to the same previous state-of-the-art algorithm. Lastly, we analyze the ability of our proposed solution to handle some of the challenges faced during tracking, including but not limited to occlusion, deformation, and scale variation.
15 citations
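The action-estimation loop of such an RL tracker can be sketched generically. Everything below is hypothetical: the action set, step sizes, and step budget are stand-ins (the paper's actual action library is not given here), and `policy` abstracts the trained network; the novel reward that penalizes long action sequences is mimicked only by the hard step budget.

```python
# Hypothetical discrete action set: (dx, dy, scale) applied to the box.
ACTIONS = {
    "left":    (-0.05, 0.0, 1.0),
    "right":   (0.05, 0.0, 1.0),
    "up":      (0.0, -0.05, 1.0),
    "down":    (0.0, 0.05, 1.0),
    "bigger":  (0.0, 0.0, 1.1),
    "smaller": (0.0, 0.0, 0.9),
    "stop":    (0.0, 0.0, 1.0),
}

def track_one_frame(policy, frame, box, max_steps=10):
    """Iteratively shift and rescale the previous box until the policy
    emits 'stop' or the step budget runs out; the budget stands in for
    a reward that penalizes long action sequences."""
    x, y, w, h = box
    for _ in range(max_steps):
        action = policy(frame, (x, y, w, h))  # returns a key of ACTIONS
        if action == "stop":
            break
        dx, dy, ds = ACTIONS[action]
        x, y = x + dx * w, y + dy * h  # shifts proportional to box size
        w, h = w * ds, h * ds          # multiplicative scale change
    return (x, y, w, h)
```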
TL;DR: A guideline for designing a slim backbone is proposed: the dimension of the output should be smaller than that of the input for every layer of the network. The resulting tracker achieves an AUC of 60.9% on the UAV123 dataset and reaches 30 frames per second on an NVIDIA Jetson TX2, which can be embedded in UAVs.
5 citations
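The slim-backbone guideline above is easy to check mechanically. The PyTorch snippet below is a hedged utility sketch, with the probe input shape as an assumption; it hooks every leaf layer and reports those whose output holds more elements than their input. Shape-preserving layers such as activations are tolerated, since a strict reading of the guideline would flag every ReLU.

```python
import torch
import torch.nn as nn

def check_slim(backbone: nn.Module, input_shape=(1, 3, 224, 224)):
    """Return the names of leaf layers whose output holds more
    elements than their input, i.e. layers that grow the feature
    dimension and so violate the slim-backbone guideline."""
    sizes, hooks = {}, []

    def make_hook(name):
        def hook(module, inp, out):
            sizes[name] = (inp[0].numel(), out.numel())
        return hook

    for name, module in backbone.named_modules():
        if len(list(module.children())) == 0:  # leaf layers only
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        backbone(torch.zeros(input_shape))
    for h in hooks:
        h.remove()
    return [name for name, (i, o) in sizes.items() if o > i]
```

Run on a standard classification backbone, a check like this would flag, for example, early convolutions that expand channels faster than they downsample spatially, which is exactly the kind of layer the guideline tells the designer to avoid.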