Author
Bao Xin Chen
Bio: Bao Xin Chen is an academic researcher from York University. The author has contributed to research in topics: Bulk synchronous parallel & Computer science. The author has an hindex of 6, co-authored 13 publications receiving 346 citations.
Papers
More filters
••
01 Oct 2019
TL;DR: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative; results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years.
Abstract: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOTST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" shortterm tracking in RGB, (iii) VOT-LT2019 focused on longterm tracking namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support both standard shortterm, long-term tracking and tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.
393 citations
••
10 Jul 2017TL;DR: A stereo vision based CNN tracker for a person following robot able to track a person in real-time using an online convolutional neural network that enables the robot to follow a target under challenging situations.
Abstract: In this paper, we introduce a stereo vision based CNN tracker for a person following robot. The tracker is able to track a person in real-time using an online convolutional neural network. Our approach enables the robot to follow a target under challenging situations such as occlusions, appearance changes, pose changes, crouching, illumination changes or people wearing the same clothes in different environments. The robot follows the target around corners even when it is momentarily unseen by estimating and replicating the local path of the target. We build an extensive dataset for person following robots under challenging situations. We evaluate the proposed system quantitatively by comparing our tracking approach with existing real-time tracking algorithms.
54 citations
••
16 May 2017TL;DR: A modified Online Ada-Boosting (OAB) tracking algorithm with integrated scene depth information obtained from a stereo camera which runs in real-time on a mobile robot.
Abstract: Person following behavior is an important task for social robots To enable robots to follow a person, we have to track the target in real-time without critical failures There are many situations where the robot will potentially loose tracking in a dynamic environment, eg, occlusion, illumination, pose-changes, etc Often, people use a complex tracking algorithm to improve robustness However, the trade-off is that their approaches may not able to run in real-time on mobile robots In this paper, we present Selected Online Ada-Boosting (SOAB) technique, a modified Online Ada-Boosting (OAB) tracking algorithm with integrated scene depth information obtained from a stereo camera which runs in real-time on a mobile robot We build and share our results on the performance of our technique on a new stereo dataset for the task of person following The dataset covers different challenging situations like squatting, partial and complete occlusion of the target being tracked, people wearing similar clothes, appearance changes, walking facing the front and back side of the person to the robot, and normal walking
42 citations
••
07 Jul 2019TL;DR: In this article, a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP) is presented, which improves the state-of-the-art Stale Parallel (SSP) paradigm by dynamically determining the staleness threshold at the run time.
Abstract: Deep learning is a popular machine learning technique and has been applied to many real-world problems, ranging from computer vision to natural language processing. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to train a large model over large datasets. A popular solution is to distribute and parallelize the training process across multiple machines using the parameter server framework. In this paper, we present a distributed paradigm on the parameter server framework called Dynamic Stale Synchronous Parallel (DSSP) which improves the state-of-the-art Stale Synchronous Parallel (SSP) paradigm by dynamically determining the staleness threshold at the run time. Conventionally to run distributed training in SSP, the user needs to specify a particular stalenes threshold as a hyper-parameter. However, a user does not usually know how to set the threshold and thus often finds a threshold value through trial and error, which is time-consuming. Based on workers' recent processing time, our approach DSSP adaptively adjusts the threshold per iteration at running time to reduce the waiting time of faster workers for synchronization of the globally shared parameters (the weights of the model), and consequently increases the frequency of parameters updates (increases iteration through-put), which speedups the convergence rate. We compare DSSP with other paradigms such as Bulk Synchronous Parallel (BSP), Asynchronous Parallel (ASP), and SSP by running deep neural networks (DNN) models over GPU clusters in both homogeneous and heterogeneous environments. The results show that in a heterogeneous environment where the cluster consists of mixed models of GPUs, DSSP converges to a higher accuracy much earlier than SSP and BSP and performs similarly to ASP. In a homogeneous distributed cluster, DSSP has more stable and slightly better performance than SSP and ASP, and converges much faster than BSP.
33 citations
•
TL;DR: A novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size with the segmentation(mask) on the target for online and real-time visual object tracking.
Abstract: In this paper, we demonstrate a novel algorithm that uses ellipse fitting to estimate the bounding box rotation angle and size with the segmentation(mask) on the target for online and real-time visual object tracking. Our method, SiamMask E, improves the bounding box fitting procedure of the state-of-the-art object tracking algorithm SiamMask and still retains a fast-tracking frame rate (80 fps) on a system equipped with GPU (GeForce GTX 1080 Ti or higher). We tested our approach on the visual object tracking datasets (VOT2016, VOT2018, and VOT2019) that were labeled with rotated bounding boxes. By comparing with the original SiamMask, we achieved an improved Accuracy of 0.645 and 0.303 EAO on VOT2019, which is 0.049 and 0.02 higher than the original SiamMask. Our project website is available at this http URL.
32 citations
Cited by
More filters
••
TL;DR: A large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k, and the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects.
Abstract: We introduce here a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k. Specifically, GOT-10k is built upon the backbone of WordNet structure [1] and it populates the majority of over 560 classes of moving objects and 87 motion patterns, magnitudes wider than the most recent similar-scale counterparts [19] , [20] , [23] , [26] . By releasing the large high-diversity database, we aim to provide a unified training and evaluation platform for the development of class-agnostic, generic purposed short-term trackers. The features of GOT-10k and the contributions of this article are summarized in the following. (1) GOT-10k offers over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. (2) GOT-10k is by far the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects. (3) For the first time, GOT-10k introduces the one-shot protocol for tracker evaluation, where the training and test classes are zero-overlapped . The protocol avoids biased evaluation results towards familiar objects and it promotes generalization in tracker development. (4) GOT-10k offers additional labels such as motion classes and object visible ratios, facilitating the development of motion-aware and occlusion-aware trackers. (5) We conduct extensive tracking experiments with 39 typical tracking algorithms and their variants on GOT-10k and analyze their results in this paper. (6) Finally, we develop a comprehensive platform for the tracking community that offers full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The annotations of GOT-10k’s test data are kept private to avoid tuning parameters on it.
852 citations
••
01 Oct 2019
TL;DR: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative; results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years.
Abstract: The Visual Object Tracking challenge VOT2019 is the seventh annual tracker benchmarking activity organized by the VOT initiative. Results of 81 trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis as well as the standard VOT methodology for long-term tracking analysis. The VOT2019 challenge was composed of five challenges focusing on different tracking domains: (i) VOTST2019 challenge focused on short-term tracking in RGB, (ii) VOT-RT2019 challenge focused on "real-time" shortterm tracking in RGB, (iii) VOT-LT2019 focused on longterm tracking namely coping with target disappearance and reappearance. Two new challenges have been introduced: (iv) VOT-RGBT2019 challenge focused on short-term tracking in RGB and thermal imagery and (v) VOT-RGBD2019 challenge focused on long-term tracking in RGB and depth imagery. The VOT-ST2019, VOT-RT2019 and VOT-LT2019 datasets were refreshed while new datasets were introduced for VOT-RGBT2019 and VOT-RGBD2019. The VOT toolkit has been updated to support both standard shortterm, long-term tracking and tracking with multi-channel imagery. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website.
393 citations
••
14 Jun 2020
TL;DR: SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN, making SiamB Ban more flexible and general.
Abstract: Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations. To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at https://github.com/hqucv/siamban.
358 citations
•
TL;DR: SiamBAN as discussed by the authors views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN.
Abstract: Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target. Unfortunately, they typically call for tedious and heuristic configurations. To address this issue, we propose a simple yet effective visual tracking framework (named Siamese Box Adaptive Network, SiamBAN) by exploiting the expressive power of the fully convolutional network (FCN). SiamBAN views the visual tracking problem as a parallel classification and regression problem, and thus directly classifies objects and regresses their bounding boxes in a unified FCN. The no-prior box design avoids hyper-parameters associated with the candidate boxes, making SiamBAN more flexible and general. Extensive experiments on visual tracking benchmarks including VOT2018, VOT2019, OTB100, NFS, UAV123, and LaSOT demonstrate that SiamBAN achieves state-of-the-art performance and runs at 40 FPS, confirming its effectiveness and efficiency. The code will be available at this https URL.
275 citations
••
TL;DR: A novel method for infrared and visible image fusion where the nest connection-based network and spatial/channel attention models are developed that describe the importance of each spatial position and of each channel with deep features is proposed.
Abstract: In this article, we propose a novel method for infrared and visible image fusion where we develop nest connection-based network and spatial/channel attention models. The nest connection-based network can preserve significant amounts of information from input data in a multiscale perspective. The approach comprises three key elements: encoder, fusion strategy, and decoder, respectively. In our proposed fusion strategy, spatial attention models and channel attention models are developed that describe the importance of each spatial position and of each channel with deep features. First, the source images are fed into the encoder to extract multiscale deep features. The novel fusion strategy is then developed to fuse these features for each scale. Finally, the fused image is reconstructed by the nest connection-based decoder. Experiments are performed on publicly available data sets. These exhibit that our proposed approach has better fusion performance than other state-of-the-art methods. This claim is justified through both subjective and objective evaluations. The code of our fusion method is available at https://github.com/hli1221/imagefusion-nestfuse .
235 citations