Author

Rama Krishna Sai Subrahmanyam Gorthi

Bio: Rama Krishna Sai Subrahmanyam Gorthi is an academic researcher from the Indian Institutes of Technology. The author has contributed to research in the topics of Video tracking and Deep learning, has an h-index of 6, and has co-authored 25 publications receiving 817 citations. Previous affiliations of Rama Krishna Sai Subrahmanyam Gorthi include the Indian Institute of Space Science and Technology.

Papers
Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4 +155 more · Institutions (47)
23 Jan 2019
TL;DR: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative; results of over eighty trackers are presented, many of them state-of-the-art trackers published at major computer vision conferences or in journals in recent years.
Abstract: The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis, as well as a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking sub-challenge has been added to the set of standard VOT sub-challenges. The new sub-challenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both the standard short-term and the new long-term tracking sub-challenges. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
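The “real-time” experiment described above evaluates a tracker against a simulated sensor that keeps delivering frames whether or not the tracker has finished the previous one. Below is a minimal Python sketch of that idea; the `tracker.update(frame)` interface, the 25 fps sensor rate, and the skip-and-repeat policy are illustrative assumptions, not the VOT toolkit's actual implementation.

```python
import time

def run_realtime_experiment(tracker, frames, frame_interval=1.0 / 25):
    """Sketch of a real-time evaluation loop: frames arrive at a fixed sensor
    rate; if the tracker is still busy when a frame arrives, that frame is not
    processed and the last available prediction is repeated for it."""
    predictions = []
    next_frame_time = 0.0   # simulated sensor clock
    tracker_free_at = 0.0   # time at which the tracker finishes its current frame
    last_prediction = None

    for frame in frames:
        if next_frame_time >= tracker_free_at:
            # Tracker is idle: process this frame and measure how long it took.
            start = time.perf_counter()
            last_prediction = tracker.update(frame)
            tracker_free_at = next_frame_time + (time.perf_counter() - start)
        # Otherwise the tracker is still busy and the frame is effectively skipped.
        predictions.append(last_prediction)
        next_frame_time += frame_interval
    return predictions
```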

639 citations

Proceedings ArticleDOI
Matej Kristan1, Amanda Berg2, Linyu Zheng3, Litu Rout4 +176 more · Institutions (43)
01 Oct 2019
TL;DR: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative; results of 81 trackers are presented, many of them state-of-the-art trackers published at major computer vision conferences or in journals in recent years.
Abstract: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) the VOT-ST2019 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2019 challenge focused on "real-time" short-term tracking in RGB, and (iii) VOT-LT2019 focused on long-term tracking, namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) the VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support standard short-term tracking, long-term tracking and tracking with multi-channel imagery. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.

393 citations

Book ChapterDOI
Matej Kristan1, Ales Leonardis2, Jiří Matas3, Michael Felsberg4, Roman Pflugfelder5, Roman Pflugfelder6, Joni-Kristian Kamarainen, Martin Danelljan7, Luka Čehovin Zajc1, Alan Lukežič1, Ondrej Drbohlav3, Linbo He4, Yushan Zhang4, Yushan Zhang8, Song Yan, Jinyu Yang2, Gustavo Fernandez6, Alexander G. Hauptmann9, Alireza Memarmoghadam10, Alvaro Garcia-Martin11, Andreas Robinson4, Anton Varfolomieiev12, Awet Haileslassie Gebrehiwot11, Bedirhan Uzun13, Bin Yan14, Bing Li15, Chen Qian, Chi-Yi Tsai16, Christian Micheloni17, Dong Wang14, Fei Wang, Fei Xie18, Felix Järemo Lawin4, Fredrik K. Gustafsson19, Gian Luca Foresti17, Goutam Bhat7, Guangqi Chen, Haibin Ling20, Haitao Zhang, Hakan Cevikalp13, Haojie Zhao14, Haoran Bai21, Hari Chandana Kuchibhotla22, Hasan Saribas, Heng Fan20, Hossein Ghanei-Yakhdan23, Houqiang Li24, Houwen Peng25, Huchuan Lu14, Hui Li26, Javad Khaghani27, Jesús Bescós11, Jianhua Li14, Jianlong Fu25, Jiaqian Yu28, Jingtao Xu28, Josef Kittler29, Jun Yin, Junhyun Lee30, Kaicheng Yu31, Kaiwen Liu15, Kang Yang32, Kenan Dai14, Li Cheng27, Li Zhang33, Lijun Wang14, Linyuan Wang, Luc Van Gool7, Luca Bertinetto, Matteo Dunnhofer17, Miao Cheng, Mohana Murali Dasari22, Ning Wang32, Pengyu Zhang14, Philip H. S. Torr33, Qiang Wang, Radu Timofte7, Rama Krishna Sai Subrahmanyam Gorthi22, Seokeon Choi34, Seyed Mojtaba Marvasti-Zadeh27, Shaochuan Zhao26, Shohreh Kasaei35, Shoumeng Qiu15, Shuhao Chen14, Thomas B. Schön19, Tianyang Xu29, Wei Lu, Weiming Hu15, Wengang Zhou24, Xi Qiu, Xiao Ke36, Xiaojun Wu26, Xiaolin Zhang15, Xiaoyun Yang, Xue-Feng Zhu26, Yingjie Jiang26, Yingming Wang14, Yiwei Chen28, Yu Ye36, Yuezhou Li36, Yuncon Yao18, Yunsung Lee30, Yuzhang Gu15, Zezhou Wang14, Zhangyong Tang26, Zhen-Hua Feng29, Zhijun Mai37, Zhipeng Zhang15, Zhirong Wu25, Ziang Ma 
23 Aug 2020
TL;DR: A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and the introduction of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges.
Abstract: The Visual Object Tracking challenge VOT2020 is the eighth annual tracker benchmarking activity organized by the VOT initiative. Results of 58 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The VOT2020 challenge was composed of five sub-challenges focusing on different tracking domains: (i) the VOT-ST2020 challenge focused on short-term tracking in RGB, (ii) the VOT-RT2020 challenge focused on “real-time” short-term tracking in RGB, (iii) VOT-LT2020 focused on long-term tracking, namely coping with target disappearance and reappearance, (iv) the VOT-RGBT2020 challenge focused on short-term tracking in RGB and thermal imagery and (v) the VOT-RGBD2020 challenge focused on long-term tracking in RGB and depth imagery. Only the VOT-ST2020 datasets were refreshed. A significant novelty is the introduction of a new VOT short-term tracking evaluation methodology and the introduction of segmentation ground truth in the VOT-ST2020 challenge; bounding boxes will no longer be used in the VOT-ST challenges. A new VOT Python toolkit that implements all these novelties was introduced. The performance of the tested trackers typically far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
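Since VOT-ST2020 replaces bounding-box ground truth with segmentation masks, per-frame accuracy becomes a mask overlap rather than a box overlap. The NumPy sketch below contrasts the two overlap measures; it illustrates the general idea only and is not the VOT toolkit's implementation.

```python
import numpy as np

def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mask_iou(pred, gt):
    """Intersection-over-union of two binary segmentation masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0
```

A mask-based ground truth rewards trackers that localize the object outline precisely, which a loose but well-centred bounding box would hide.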

158 citations

Journal ArticleDOI
TL;DR: The results obtained highlight that deep convolutional neural networks can indeed be effectively applied to phase unwrapping, and the proposed framework will hopefully pave the way for the development of a new set of deep learning based phase unwrapping methods.
Abstract: Phase unwrapping is a crucial signal processing problem in several applications; it aims to restore the original phase from the wrapped phase. In this letter, we propose a novel framework for unwrapping the phase using a deep fully convolutional neural network termed PhaseNet. We reformulate the problem of directly obtaining the continuous original phase as that of obtaining the wrap-count (the integer jump of 2π) at each pixel by semantic segmentation, and this is accomplished through a suitable deep learning framework. The proposed architecture consists of an encoder network and a corresponding decoder network, followed by a pixel-wise classification layer. The relationship between the absolute phase and the wrap-count is leveraged to generate abundant simulated data of several random shapes. This encourages the network to learn continuity in wrapped phase maps rather than specific patterns in the training data. We compare the proposed framework with the widely adopted quality-guided phase unwrapping algorithm and also with MATLAB's well-known unwrap function for varying noise levels. The proposed framework is found to be robust to noise and computationally fast. The results obtained highlight that deep convolutional neural networks can indeed be effectively applied to phase unwrapping, and the proposed framework will hopefully pave the way for the development of a new set of deep learning based phase unwrapping methods.
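The key relationship the letter exploits is that the absolute phase equals the wrapped phase plus 2πk, where k is the integer wrap-count the network predicts per pixel. Below is a short NumPy sketch of how simulated (wrapped phase, wrap-count) training pairs can be generated from a random smooth surface; the Gaussian-bump shapes and the noise model are illustrative assumptions, not the paper's exact simulation.

```python
import numpy as np

def wrap(phase):
    """Wrap an absolute phase map into (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

def make_training_pair(size=256, noise_std=0.1, rng=np.random.default_rng()):
    """Simulate one training pair: a noisy wrapped-phase input and the per-pixel
    wrap-count label derived from the known absolute phase."""
    y, x = np.mgrid[0:size, 0:size] / size
    cx, cy = rng.uniform(0.2, 0.8, size=2)
    amp, sigma = rng.uniform(10, 40), rng.uniform(0.1, 0.3)
    absolute = amp * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))

    wrapped = wrap(absolute)
    wrap_count = np.round((absolute - wrapped) / (2 * np.pi)).astype(np.int64)
    noisy_input = wrap(absolute + rng.normal(0.0, noise_std, absolute.shape))
    return noisy_input, wrap_count

# At inference time the absolute phase is recovered as wrapped + 2*pi*predicted_count.
```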

155 citations

Journal ArticleDOI
TL;DR: The proposed deep learning framework for phase unwrapping does not require post-processing, is highly robust to noise, accurately unwraps the phase even at a severe noise level of −5 dB, and can handle phase maps with relatively high dynamic ranges.
Abstract: Phase unwrapping is a classical ill-posed problem in many practical applications of significance such as 3D profiling through fringe projection, synthetic aperture radar and magnetic resonance imaging. Conventional phase unwrapping techniques estimate the phase either by integrating along a confined path (referred to as path-following methods) or by minimizing an energy function between the wrapped phase and the approximated true phase (referred to as minimum-norm approaches). However, these conventional methods face critical challenges such as error accumulation and high computational time, and they often fail under low-SNR conditions. To address these problems, this paper proposes a novel deep learning framework for unwrapping the phase, referred to as “PhaseNet 2.0”. The phase unwrapping problem is formulated as a dense classification problem and a fully convolutional DenseNet-based neural network is trained to predict the wrap-count at each pixel from the wrapped phase maps. To train this network, we simulate arbitrary shapes and propose a new loss function that integrates the residues by minimizing the difference of gradients and also uses an L1 loss to overcome the class imbalance problem. The proposed method, unlike our previous approach PhaseNet, does not require post-processing, is highly robust to noise, accurately unwraps the phase even at a severe noise level of −5 dB, and can unwrap phase maps even at relatively high dynamic ranges. Simulation results from the proposed framework are compared with different classes of existing phase unwrapping methods for varying SNR values and discontinuities, and these evaluations demonstrate the advantages of the proposed framework. We also demonstrate the generality of the proposed method on 3D reconstruction of synthetic CAD models that have diverse structures and finer geometric variations. Finally, the proposed method is applied to real data for 3D profiling of objects using the fringe projection technique and digital holographic interferometry. The proposed framework achieves significant improvements over existing methods while being highly efficient, with interactive frame rates on modern GPUs.
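The loss described above combines an L1 term on the predicted wrap-count with a term that matches spatial gradients of the reconstructed phase against those of the true phase. A hedged PyTorch sketch of such a composite loss follows; the use of a soft (expected) wrap-count from the class probabilities and the equal weighting are assumptions, not the paper's exact formulation.

```python
import math
import torch
import torch.nn.functional as F

def phase_unwrap_loss(logits, true_count, wrapped, lambda_grad=1.0):
    """Composite loss sketch: L1 on the soft wrap-count prediction plus an L1
    penalty on the difference of spatial gradients of the reconstructed phase.
    logits:     (B, K, H, W) per-pixel scores over K wrap-count classes
    true_count: (B, H, W)    integer wrap-count labels in [0, K)
    wrapped:    (B, H, W)    wrapped-phase input"""
    k = torch.arange(logits.shape[1], device=logits.device, dtype=wrapped.dtype)
    soft_count = (logits.softmax(dim=1) * k.view(1, -1, 1, 1)).sum(dim=1)

    count_l1 = F.l1_loss(soft_count, true_count.to(wrapped.dtype))

    pred_phase = wrapped + 2.0 * math.pi * soft_count
    true_phase = wrapped + 2.0 * math.pi * true_count.to(wrapped.dtype)
    grad_l1 = (
        F.l1_loss(pred_phase[:, 1:, :] - pred_phase[:, :-1, :],
                  true_phase[:, 1:, :] - true_phase[:, :-1, :])
        + F.l1_loss(pred_phase[:, :, 1:] - pred_phase[:, :, :-1],
                    true_phase[:, :, 1:] - true_phase[:, :, :-1])
    )
    return count_l1 + lambda_grad * grad_l1
```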

85 citations


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one: it seemed an odd beast at first, an intruder hovering on the edge of reality, and familiarity only intensified the sense of its surreal nature.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

Reference EntryDOI
15 Oct 2004

2,118 citations

Proceedings ArticleDOI
01 Jun 2019
TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.
Abstract: In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state-of-the-art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017.
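The core idea, augmenting a fully-convolutional Siamese matching loss with a binary segmentation loss, can be written as a simple multi-task objective. The sketch below captures only that general shape; the tensor layout and the weighting factor are assumptions, not SiamMask's actual training code.

```python
import torch.nn.functional as F

def siamese_plus_mask_loss(score_logits, score_labels, mask_logits, mask_labels,
                           lambda_mask=1.0):
    """Multi-task loss sketch: response-map classification (is this location the
    target?) plus per-pixel binary segmentation of the target mask.
    lambda_mask trades off the two terms and would be tuned in practice."""
    cls_loss = F.binary_cross_entropy_with_logits(score_logits, score_labels)
    seg_loss = F.binary_cross_entropy_with_logits(mask_logits, mask_labels)
    return cls_loss + lambda_mask * seg_loss
```

Once trained, the same network produces class-agnostic masks and rotated boxes online at 55 frames per second, as reported in the abstract above.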

1,162 citations

Journal ArticleDOI
TL;DR: GOT-10k is a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, and the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects.
Abstract: We introduce here a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k. Specifically, GOT-10k is built upon the backbone of the WordNet structure [1] and it populates the majority of over 560 classes of moving objects and 87 motion patterns, magnitudes wider than the most recent similar-scale counterparts [19], [20], [23], [26]. By releasing this large high-diversity database, we aim to provide a unified training and evaluation platform for the development of class-agnostic, generic-purposed short-term trackers. The features of GOT-10k and the contributions of this article are summarized in the following. (1) GOT-10k offers over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. (2) GOT-10k is by far the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects. (3) For the first time, GOT-10k introduces the one-shot protocol for tracker evaluation, where the training and test classes are zero-overlapped. The protocol avoids evaluation results biased towards familiar objects and it promotes generalization in tracker development. (4) GOT-10k offers additional labels such as motion classes and object visible ratios, facilitating the development of motion-aware and occlusion-aware trackers. (5) We conduct extensive tracking experiments with 39 typical tracking algorithms and their variants on GOT-10k and analyze their results in this paper. (6) Finally, we develop a comprehensive platform for the tracking community that offers full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The annotations of GOT-10k’s test data are kept private to avoid tuning parameters on them.
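The one-shot protocol mentioned in point (3) requires that the object classes used for training and those used for evaluation do not overlap at all. A tiny sketch of how such a split can be checked follows; the per-video `object_class` field is a hypothetical data model, not the GOT-10k toolkit's API.

```python
def is_one_shot_split(train_videos, test_videos):
    """Return True if no object class appears in both the training and the
    test set, i.e. the split satisfies a zero-overlap (one-shot) protocol."""
    train_classes = {v["object_class"] for v in train_videos}
    test_classes = {v["object_class"] for v in test_videos}
    return train_classes.isdisjoint(test_classes)
```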

852 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: An end-to-end tracking architecture, capable of fully exploiting both target and background appearance information for target model prediction, derived from a discriminative learning loss by designing a dedicated optimization process that is capable of predicting a powerful model in only a few iterations.
Abstract: The current drive towards end-to-end trainable computer vision systems imposes major challenges for the task of visual tracking. In contrast to most other vision problems, tracking requires the learning of a robust target-specific appearance model online, during the inference stage. To be end-to-end trainable, the online learning of the target model thus needs to be embedded in the tracking architecture itself. Due to the imposed challenges, the popular Siamese paradigm simply predicts a target feature template, while ignoring the background appearance information during inference. Consequently, the predicted model possesses limited target-background discriminability. We develop an end-to-end tracking architecture capable of fully exploiting both target and background appearance information for target model prediction. Our architecture is derived from a discriminative learning loss by designing a dedicated optimization process that is capable of predicting a powerful model in only a few iterations. Furthermore, our approach is able to learn key aspects of the discriminative loss itself. The proposed tracker sets a new state-of-the-art on 6 tracking benchmarks, achieving an EAO score of 0.440 on VOT2018, while running at over 40 FPS. The code and models are available at https://github.com/visionml/pytracking.
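The abstract's central point is that the target model is predicted at inference time by running a handful of optimizer steps on a discriminative loss computed over both target and background features. The toy sketch below captures only that general shape, a few plain gradient steps on a ridge-regularized squared error; the actual method uses a learned, steepest-descent-based optimizer and a richer loss.

```python
import torch

def predict_target_model(features, labels, num_steps=5, lr=0.1, reg=0.01):
    """Fit linear weights w in a few gradient steps so that features @ w matches
    a target/background label map (e.g. 1 near the target centre, 0 elsewhere).
    features: (N, D) feature vectors, labels: (N,) desired responses.
    Illustrative stand-in for the paper's learned model-prediction module."""
    n, d = features.shape
    w = torch.zeros(d, dtype=features.dtype)
    for _ in range(num_steps):
        residual = features @ w - labels                      # (N,)
        grad = 2.0 / n * features.T @ residual + 2.0 * reg * w
        w = w - lr * grad                                     # one gradient step
    return w
```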

761 citations