scispace - formally typeset
Search or ask a question
Posted Content

A Survey of Appearance Models in Visual Object Tracking

TL;DR: This survey provides a detailed review of the existing 2D appearance models for visual object tracking and takes a module-based architecture that enables readers to easily grasp the key points ofVisual object tracking.
Abstract: Visual object tracking is a significant computer vision task which can be applied to many domains such as visual surveillance, human computer interaction, and video compression. In the literature, researchers have proposed a variety of 2D appearance models. To help readers swiftly learn the recent advances in 2D appearance models for visual object tracking, we contribute this survey, which provides a detailed review of the existing 2D appearance models. In particular, this survey takes a module-based architecture that enables readers to easily grasp the key points of visual object tracking. In this survey, we first decompose the problem of appearance modeling into two different processing stages: visual representation and statistical modeling. Then, different 2D appearance models are categorized and discussed with respect to their composition modules. Finally, we address several issues of interest as well as the remaining challenges for future research on this topic. The contributions of this survey are four-fold. First, we review the literature of visual representations according to their feature-construction mechanisms (i.e., local and global). Second, the existing statistical modeling schemes for tracking-by-detection are reviewed according to their model-construction mechanisms: generative, discriminative, and hybrid generative-discriminative. Third, each type of visual representations or statistical modeling techniques is analyzed and discussed from a theoretical or practical viewpoint. Fourth, the existing benchmark resources (e.g., source code and video datasets) are examined in this survey.
Citations
More filters
Proceedings ArticleDOI
23 Jun 2013
TL;DR: Large scale experiments are carried out with various evaluation criteria to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking is one of the most important components in numerous applications of computer vision. While much progress has been made in recent years with efforts on sharing code and datasets, it is of great importance to develop a library and benchmark to gauge the state of the art. After briefly reviewing recent advances of online object tracking, we carry out large scale experiments with various evaluation criteria to understand how these algorithms perform. The test image sequences are annotated with different attributes for performance evaluation and analysis. By analyzing quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

3,828 citations


Cites background from "A Survey of Appearance Models in Vi..."

  • ...Object representation is one of major components in any visual tracker and numerous schemes have been presented [35]....

    [...]

Journal ArticleDOI
TL;DR: An extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria is carried out to identify effective approaches for robust tracking and provide potential future research directions in this field.
Abstract: Object tracking has been one of the most important and active research areas in the field of computer vision. A large number of tracking algorithms have been proposed in recent years with demonstrated success. However, the set of sequences used for evaluation is often not sufficient or is sometimes biased for certain types of algorithms. Many datasets do not have common ground-truth object positions or extents, and this makes comparisons among the reported quantitative results difficult. In addition, the initial conditions or parameters of the evaluated tracking algorithms are not the same, and thus, the quantitative results reported in literature are incomparable or sometimes contradictory. To address these issues, we carry out an extensive evaluation of the state-of-the-art online object-tracking algorithms with various evaluation criteria to understand how these methods perform within the same framework. In this work, we first construct a large dataset with ground-truth object positions and extents for tracking and introduce the sequence attributes for the performance analysis. Second, we integrate most of the publicly available trackers into one code library with uniform input and output formats to facilitate large-scale performance evaluation. Third, we extensively evaluate the performance of 31 algorithms on 100 sequences with different initialization settings. By analyzing the quantitative results, we identify effective approaches for robust tracking and provide potential future research directions in this field.

2,974 citations


Cites methods from "A Survey of Appearance Models in Vi..."

  • ...In this section, we review some recent algorithms for object tracking in terms of target representation scheme, search mechanism, and model update....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: This paper adaptively learn correlation filters on each convolutional layer to encode the target appearance and hierarchically infer the maximum response of each layer to locate targets.
Abstract: Visual object tracking is challenging as target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter and occlusion. In this paper, we exploit features extracted from deep convolutional neural networks trained on object recognition datasets to improve tracking accuracy and robustness. The outputs of the last convolutional layers encode the semantic information of targets and such representations are robust to significant appearance variations. However, their spatial resolution is too coarse to precisely localize targets. In contrast, earlier convolutional layers provide more precise localization but are less invariant to appearance changes. We interpret the hierarchies of convolutional layers as a nonlinear counterpart of an image pyramid representation and exploit these multiple levels of abstraction for visual tracking. Specifically, we adaptively learn correlation filters on each convolutional layer to encode the target appearance. We hierarchically infer the maximum response of each layer to locate targets. Extensive experimental results on a largescale benchmark dataset show that the proposed algorithm performs favorably against state-of-the-art methods.

1,812 citations


Cites methods from "A Survey of Appearance Models in Vi..."

  • ...We refer the readers to a comprehensive review on visual tracking in [33, 22, 26]....

    [...]

Book ChapterDOI
06 Sep 2014
TL;DR: It is shown that the proposed multi-expert restoration scheme significantly improves the robustness of the base tracker, especially in scenarios with frequent occlusions and repetitive appearance variations.
Abstract: We propose a multi-expert restoration scheme to address the model drift problem in online tracking. In the proposed scheme, a tracker and its historical snapshots constitute an expert ensemble, where the best expert is selected to restore the current tracker when needed based on a minimum entropy criterion, so as to correct undesirable model updates. The base tracker in our formulation exploits an online SVM on a budget algorithm and an explicit feature mapping method for efficient model update and inference. In experiments, our tracking method achieves substantially better overall performance than 32 trackers on a benchmark dataset of 50 video sequences under various evaluation settings. In addition, in experiments with a newly collected dataset of challenging sequences, we show that the proposed multi-expert restoration scheme significantly improves the robustness of our base tracker, especially in scenarios with frequent occlusions and repetitive appearance variations.

1,174 citations


Cites background from "A Survey of Appearance Models in Vi..."

  • ...For more comprehensive literature reviews, readers are directed to [18,28]....

    [...]

Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4, Roman Pflugfelder5, Luka Cehovin1, Tomas Vojir3, Gustav Häger4, Alan Lukežič1, Gustavo Fernandez5, Abhinav Gupta6, Alfredo Petrosino7, Alireza Memarmoghadam8, Alvaro Garcia-Martin9, Andres Solis Montero10, Andrea Vedaldi11, Andreas Robinson4, Andy J. Ma12, Anton Varfolomieiev13, A. Aydin Alatan14, Aykut Erdem15, Bernard Ghanem16, Bin Liu, Bohyung Han17, Brais Martinez18, Chang-Ming Chang19, Changsheng Xu20, Chong Sun21, Daijin Kim17, Dapeng Chen22, Dawei Du20, Deepak Mishra23, Dit-Yan Yeung24, Erhan Gundogdu25, Erkut Erdem15, Fahad Shahbaz Khan4, Fatih Porikli26, Fatih Porikli27, Fei Zhao20, Filiz Bunyak28, Francesco Battistone7, Gao Zhu26, Giorgio Roffo29, Gorthi R. K. Sai Subrahmanyam23, Guilherme Sousa Bastos30, Guna Seetharaman31, Henry Medeiros32, Hongdong Li26, Honggang Qi20, Horst Bischof33, Horst Possegger33, Huchuan Lu21, Hyemin Lee17, Hyeonseob Nam34, Hyung Jin Chang35, Isabela Drummond30, Jack Valmadre11, Jae-chan Jeong36, Jaeil Cho36, Jae-Yeong Lee36, Jianke Zhu37, Jiayi Feng20, Jin Gao20, Jin-Young Choi, Jingjing Xiao2, Ji-Wan Kim36, Jiyeoup Jeong, João F. Henriques11, Jochen Lang10, Jongwon Choi, José M. Martínez9, Junliang Xing20, Junyu Gao20, Kannappan Palaniappan28, Karel Lebeda38, Ke Gao28, Krystian Mikolajczyk35, Lei Qin20, Lijun Wang21, Longyin Wen19, Luca Bertinetto11, Madan Kumar Rapuru23, Mahdieh Poostchi28, Mario Edoardo Maresca7, Martin Danelljan4, Matthias Mueller16, Mengdan Zhang20, Michael Arens, Michel Valstar18, Ming Tang20, Mooyeol Baek17, Muhammad Haris Khan18, Naiyan Wang24, Nana Fan39, Noor M. Al-Shakarji28, Ondrej Miksik11, Osman Akin15, Payman Moallem8, Pedro Senna30, Philip H. S. Torr11, Pong C. Yuen12, Qingming Huang39, Qingming Huang20, Rafael Martin-Nieto9, Rengarajan Pelapur28, Richard Bowden38, Robert Laganiere10, Rustam Stolkin2, Ryan Walsh32, Sebastian B. Krah, Shengkun Li19, Shengping Zhang39, Shizeng Yao28, Simon Hadfield38, Simone Melzi29, Siwei Lyu19, Siyi Li24, Stefan Becker, Stuart Golodetz11, Sumithra Kakanuru23, Sunglok Choi36, Tao Hu20, Thomas Mauthner33, Tianzhu Zhang20, Tony P. Pridmore18, Vincenzo Santopietro7, Weiming Hu20, Wenbo Li40, Wolfgang Hübner, Xiangyuan Lan12, Xiaomeng Wang18, Xin Li39, Yang Li37, Yiannis Demiris35, Yifan Wang21, Yuankai Qi39, Zejian Yuan22, Zexiong Cai12, Zhan Xu37, Zhenyu He39, Zhizhen Chi21 
08 Oct 2016
TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.
Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, with a large number of trackers being published at major computer vision conferences and journals in the recent years. The number of tested state-of-the-art trackers makes the VOT 2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. The VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit as well as the results are publicly available at the challenge website (http://votchallenge.net).

744 citations

References
More filters
Journal ArticleDOI
01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Abstract: Random forests are a combination of tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. The generalization error for forests converges a.s. to a limit as the number of trees in the forest becomes large. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Using a random selection of features to split each node yields error rates that compare favorably to Adaboost (Y. Freund & R. Schapire, Machine Learning: Proceedings of the Thirteenth International conference, aaa, 148–156), but are more robust with respect to noise. Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the splitting. Internal estimates are also used to measure variable importance. These ideas are also applicable to regression.

79,257 citations

Proceedings ArticleDOI
20 Jun 2005
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.

31,952 citations

Journal ArticleDOI
01 Nov 1973
TL;DR: These results indicate that the easily computable textural features based on gray-tone spatial dependancies probably have a general applicability for a wide variety of image-classification applications.
Abstract: Texture is one of the important characteristics used in identifying objects or regions of interest in an image, whether the image be a photomicrograph, an aerial photograph, or a satellite image. This paper describes some easily computable textural features based on gray-tone spatial dependancies, and illustrates their application in category-identification tasks of three different kinds of image data: photomicrographs of five kinds of sandstones, 1:20 000 panchromatic aerial photographs of eight land-use categories, and Earth Resources Technology Satellite (ERTS) multispecial imagery containing seven land-use categories. We use two kinds of decision rules: one for which the decision regions are convex polyhedra (a piecewise linear decision rule), and one for which the decision regions are rectangular parallelpipeds (a min-max decision rule). In each experiment the data set was divided into two parts, a training set and a test set. Test set identification accuracy is 89 percent for the photomicrographs, 82 percent for the aerial photographic imagery, and 83 percent for the satellite imagery. These results indicate that the easily computable textural features probably have a general applicability for a wide variety of image-classification applications.

20,442 citations

Journal ArticleDOI
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Abstract: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion.Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Shapire and Friedman, Hastie and Tibshirani are discussed.

17,764 citations


"A Survey of Appearance Models in Vi..." refers methods in this paper

  • ...In essence, this BDAM is an extension of the GradientBoost algorithm [Friedman 2001] and works similarly to the AnyBoost algorithm [Mason et al. 1999]....

    [...]

Book ChapterDOI
01 Jan 2001
TL;DR: In this paper, the clssical filleting and prediclion problem is re-examined using the Bode-Shannon representation of random processes and the?stat-tran-sition? method of analysis of dynamic systems.
Abstract: The clssical filleting and prediclion problem is re-examined using the Bode-Shannon representation of random processes and the ?stat-tran-sition? method of analysis of dynamic systems. New result are: (1) The formulation and Methods of solution of the problm apply, without modification to stationary and nonstationary stalistics end to growing-memory and infinile -memory filters. (2) A nonlinear difference (or differential) equalion is dericed for the covariance matrix of the optimal estimalion error. From the solution of this equation the coefficients of the difference, (or differential) equation of the optimal linear filter are obtained without further caleulations. (3) Tke fillering problem is shoum to be the dual of the nois-free regulator problem. The new method developed here, is applied to do well-known problems, confirming and extending, earlier results. The discussion is largely, self-contatained, and proceeds from first principles; basic concepts of the theory of random processes are reviewed in the Appendix.

15,391 citations