Author
Yang Li
Other affiliations: East China Normal University
Bio: Yang Li is an academic researcher from Zhejiang University. The author has contributed to research in topics: Video tracking & Deep learning. The author has an hindex of 12, co-authored 25 publications receiving 3164 citations. Previous affiliations of Yang Li include East China Normal University.
Papers
More filters
••
06 Sep 2014TL;DR: This paper presents a very appealing tracker based on the correlation filter framework and suggests an effective scale adaptive scheme to tackle the problem of the fixed template size in kernel correlation filter tracker.
Abstract: Although the correlation filter-based trackers achieve the competitive results both on accuracy and robustness, there is still a need to improve the overall tracking capability. In this paper, we presented a very appealing tracker based on the correlation filter framework. To tackle the problem of the fixed template size in kernel correlation filter tracker, we suggest an effective scale adaptive scheme. Moreover, the powerful features including HoG and color-naming are integrated together to further boost the overall tracking performance. The extensive empirical evaluations on the benchmark videos and VOT 2014 dataset demonstrate that the proposed tracker is very promising for the various challenging scenarios. Our method successfully tracked the targets in about 72% videos and outperformed the state-of-the-art trackers on the benchmark dataset with 51 sequences.
1,298 citations
••
University of Ljubljana1, University of Birmingham2, Czech Technical University in Prague3, Linköping University4, Austrian Institute of Technology5, Carnegie Mellon University6, Parthenope University of Naples7, University of Isfahan8, Autonomous University of Madrid9, University of Ottawa10, University of Oxford11, Hong Kong Baptist University12, Kyiv Polytechnic Institute13, Middle East Technical University14, Hacettepe University15, King Abdullah University of Science and Technology16, Pohang University of Science and Technology17, University of Nottingham18, University at Albany, SUNY19, Chinese Academy of Sciences20, Dalian University of Technology21, Xi'an Jiaotong University22, Indian Institute of Space Science and Technology23, Hong Kong University of Science and Technology24, ASELSAN25, Australian National University26, Commonwealth Scientific and Industrial Research Organisation27, University of Missouri28, University of Verona29, Universidade Federal de Itajubá30, United States Naval Research Laboratory31, Marquette University32, Graz University of Technology33, Naver Corporation34, Imperial College London35, Electronics and Telecommunications Research Institute36, Zhejiang University37, University of Surrey38, Harbin Institute of Technology39, Lehigh University40
TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-the-art trackers makes the VOT 2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. The VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).
744 citations
••
University of Ljubljana1, University of Birmingham2, Czech Technical University in Prague3, Linköping University4, Austrian Institute of Technology5, Autonomous University of Madrid6, Parthenope University of Naples7, University of Isfahan8, University of Oxford9, Superior National School of Advanced Techniques10, Middle East Technical University11, Dalian University of Technology12, Chinese Academy of Sciences13, ASELSAN14, United States Naval Research Laboratory15, National University of Defense Technology16, University of Science and Technology of China17, Electronics and Telecommunications Research Institute18, Zhejiang University19, Beijing University of Posts and Telecommunications20, Huazhong University of Science and Technology21, University of Missouri22, Carnegie Mellon University23, General Electric24, King Abdullah University of Science and Technology25, University of California, Merced26, University of Surrey27, University at Albany, SUNY28
TL;DR: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative; results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years.
Abstract: The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a realtime tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website1.
485 citations
••
University of Ljubljana1, Austrian Institute of Technology2, University of Birmingham3, Czech Technical University in Prague4, Parthenope University of Naples5, Panasonic6, Pohang University of Science and Technology7, Linköping University8, Graz University of Technology9, Zhejiang University10, Shanghai Jiao Tong University11, Seoul National University12, Electronics and Telecommunications Research Institute13, University of Coimbra14, Autonomous University of Barcelona15, University of Surrey16, École Polytechnique Fédérale de Lausanne17, Chinese Academy of Sciences18, University of Oxford19, Harbin Institute of Technology20
TL;DR: The evaluation protocol of the VOT2013 challenge and the results of a comparison of 27 trackers on the benchmark dataset are presented, offering a more systematic comparison of the trackers.
Abstract: The Visual Object Tracking challenge 2014, VOT2014, aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 38 trackers are presented. The number of tested trackers makes VOT 2014 the largest benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the appendix. Features of the VOT2014 challenge that go beyond its VOT2013 predecessor are introduced: (i) a new VOT2014 dataset with full annotation of targets by rotated bounding boxes and per-frame attribute, (ii) extensions of the VOT2013 evaluation methodology, (iii) a new unit for tracking speed assessment less dependent on the hardware and (iv) the VOT2014 evaluation toolkit that significantly speeds up execution of experiments. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).
391 citations
••
07 Jun 2015TL;DR: A tracking reliability metric is presented to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework.
Abstract: Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework. As the reliable patches distributed over the image, we exploit the motion trajectories to distinguish them from the background. Therefore, the visual object can be defined as the clustering of homo-trajectory patches, where a Hough voting-like scheme is employed to estimate the target state. Encouraging experimental results on a large set of sequences showed that the proposed approach is very effective and in comparison to the state-of-the-art trackers. The full source code of our implementation will be publicly available.
374 citations
Cited by
More filters
••
TL;DR: An extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria is carried out to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.
2,974 citations
••
08 Oct 2016TL;DR: A basic tracking algorithm is equipped with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video and achieves state-of-the-art performance in multiple benchmarks.
Abstract: The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object’s appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.
2,936 citations
••
18 Jun 2018TL;DR: The Siamese region proposal network (Siamese-RPN) is proposed which is end-to-end trained off-line with large-scale image pairs for visual object tracking and consists of SiAMESe subnetwork for feature extraction and region proposal subnetwork including the classification branch and regression branch.
Abstract: Visual object tracking has been a fundamental topic in recent years and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks. However, most of these trackers can hardly get top performance with real-time speed. In this paper, we propose the Siamese region proposal network (Siamese-RPN) which is end-to-end trained off-line with large-scale image pairs. Specifically, it consists of Siamese subnetwork for feature extraction and region proposal subnetwork including the classification branch and regression branch. In the inference phase, the proposed framework is formulated as a local one-shot detection task. We can pre-compute the template branch of the Siamese subnetwork and formulate the correlation layers as trivial convolution layers to perform online tracking. Benefit from the proposal refinement, traditional multi-scale test and online fine-tuning can be discarded. The Siamese-RPN runs at 160 FPS while achieving leading performance in VOT2015, VOT2016 and VOT2017 real-time challenges.
2,016 citations
••
21 Jul 2017TL;DR: This work revisit the core DCF formulation and introduces a factorized convolution operator, which drastically reduces the number of parameters in the model, and a compact generative model of the training sample distribution that significantly reduces memory and time complexity, while providing better diversity of samples.
Abstract: In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model, (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples, (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top ranked method [12] in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.
1,993 citations
••
27 Jun 2016TL;DR: A novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network using a large set of videos with tracking ground-truths to obtain a generic target representation.
Abstract: We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking groundtruths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify target in each domain. We train each domain in the network iteratively to obtain generic target representations in the shared layers. When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer, which is updated online. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state. The proposed algorithm illustrates outstanding performance in existing tracking benchmarks.
1,960 citations