Proceedings Article

Computationally efficient deep tracker: Guided MDNet

TL;DR: The paper proposes an improvement to the Multi-Domain Convolutional Neural Network tracker (MDNet), which tracks an unknown object in a video stream: feeding the network guided samples instead of random ones to reduce the computation required during tracking.
Abstract: The main objective of the paper is to recommend an essential improvement to the existing Multi-Domain Convolutional Neural Network tracker (MDNet), which is used to track an unknown object in a video stream. MDNet handles major tracking challenges such as fast motion, background clutter, out-of-view targets, and scale variation through offline training and online tracking. We pre-train the Convolutional Neural Network (CNN) offline using many videos with ground truth to obtain a target representation in the network. During online tracking, MDNet evaluates a large number of randomly sampled windows around the previous target to estimate the target in the current frame, which makes tracking computationally expensive at test time. The major contribution of the paper is to give MDNet guided samples rather than random samples, so that the computation and time required by the CNN while tracking are greatly reduced. The proposed algorithm is evaluated on videos from the ALOV300++ and VOT datasets, and the results are compared with state-of-the-art trackers.
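The cost described in the abstract can be illustrated with a minimal sketch of sampling-based online tracking. This is not the authors' implementation: `sample_candidates`, `track_step`, the box format `(cx, cy, w, h)`, and all numeric defaults are hypothetical names and placeholder values chosen for illustration. The point it shows is that the scoring function (the CNN, in MDNet) runs once per candidate window, so the per-frame cost grows with the number of samples.

```python
import random


def sample_candidates(prev_box, n, trans_sigma=0.3, scale_sigma=0.5, rng=None):
    """Draw n random candidate windows around prev_box = (cx, cy, w, h).

    trans_sigma and scale_sigma are illustrative placeholders, not values
    taken from the paper.
    """
    rng = rng or random.Random()
    cx, cy, w, h = prev_box
    mean_size = (w + h) / 2.0
    boxes = []
    for _ in range(n):
        # Gaussian jitter of the centre, proportional to the target size.
        dx = rng.gauss(0.0, trans_sigma * mean_size)
        dy = rng.gauss(0.0, trans_sigma * mean_size)
        # Multiplicative scale jitter; always positive.
        s = 1.05 ** rng.gauss(0.0, scale_sigma)
        boxes.append((cx + dx, cy + dy, w * s, h * s))
    return boxes


def track_step(prev_box, score_fn, sampler, n):
    """Score every candidate once and return the best-scoring window."""
    candidates = sampler(prev_box, n)
    return max(candidates, key=score_fn)
```

Under these assumptions, a guided tracker would pass a different `sampler` that returns fewer, better-placed windows, so `score_fn` is called fewer times per frame. How the samples are guided is not specified in the abstract, so it is not sketched here.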
Citations
Journal Article
TL;DR: In this paper, the YOLOv3 pretraining model is used for ship detection, recognition, and counting in the context of intelligent maritime surveillance, timely ocean rescue, and computer-aided decision-making.
Abstract: Automatic ship detection, recognition, and counting are crucial for intelligent maritime surveillance, timely ocean rescue, and computer-aided decision-making. YOLOv3 pretraining model is used for ...

7 citations

Book Chapter
03 Jul 2019
TL;DR: A novel face recognition method for population search and criminal pursuit in smart cities and a cloud server architecture for face recognition in smart city environments are proposed.
Abstract: Face recognition technology can be applied to many aspects of the smart city, and the combination of face recognition and deep learning can bring new applications to public security. Deep learning machine vision and video-based image retrieval can quickly address the current problems of finding missing children and arresting criminal suspects. The main purpose of this paper is to propose a novel face recognition method for population search and criminal pursuit in smart cities. In large and medium-sized security deployments, the most similar face images can be accurately retrieved from tens of millions of photos. Storing and processing this variety of information requires a powerful information processing center. To fundamentally support the safe operation of a large system, a cloud-based network architecture is considered and a smart city cloud computing data center is built. In addition, this paper proposes a cloud server architecture for face recognition in smart city environments.

1 citation

01 Jan 2018
TL;DR: Visual tracking is a computer vision problem where the task is to follow a target through a video sequence.
Abstract: Visual tracking is a computer vision problem where the task is to follow a target through a video sequence. Tracking has many important real-world applications in several fields such as autonomous v ...

Cites methods from "Computationally efficient deep trac..."

  • ...Approaches such as MDnet [37] and SiamFC [2] train their networks to output the location of the target....


References
Proceedings Article
13 Jun 2010
TL;DR: A new type of correlation filter is presented, the Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame. Occlusion is detected from the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears.
Abstract: Although not commonly used, correlation filters can track complex objects through rotations, occlusions and other distractions at over 20 times the rate of current state-of-the-art techniques. The oldest and simplest correlation filters use simple templates and generally fail when applied to tracking. More modern approaches such as ASEF and UMACE perform better, but their training needs are poorly suited to tracking. Visual tracking requires robust filters to be trained from a single frame and dynamically adapted as the appearance of the target object changes. This paper presents a new type of correlation filter, a Minimum Output Sum of Squared Error (MOSSE) filter, which produces stable correlation filters when initialized using a single frame. A tracker based upon MOSSE filters is robust to variations in lighting, scale, pose, and nonrigid deformations while operating at 669 frames per second. Occlusion is detected based upon the peak-to-sidelobe ratio, which enables the tracker to pause and resume where it left off when the object reappears.
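The peak-to-sidelobe ratio mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the list-of-rows response format, and the `exclude` half-width of the window masked out around the peak are assumptions. The ratio compares the correlation peak with the mean and standard deviation of the remaining "sidelobe" values, so a low value suggests the target is occluded or lost.

```python
import math


def peak_to_sidelobe_ratio(response, exclude=5):
    """Return (peak - sidelobe_mean) / sidelobe_std for a 2-D response.

    response is a list of rows of floats. Cells within `exclude` rows and
    columns of the peak are left out of the sidelobe statistics; the default
    half-width is an illustrative choice.
    """
    # Locate the peak value and its position.
    peak, pr, pc = max(
        (v, r, c) for r, row in enumerate(response) for c, v in enumerate(row)
    )
    # Everything outside the square window around the peak is the sidelobe.
    sidelobe = [
        v
        for r, row in enumerate(response)
        for c, v in enumerate(row)
        if abs(r - pr) > exclude or abs(c - pc) > exclude
    ]
    if not sidelobe:
        raise ValueError("response is too small for the chosen exclude window")
    mean = sum(sidelobe) / len(sidelobe)
    var = sum((v - mean) ** 2 for v in sidelobe) / len(sidelobe)
    # Small constant avoids division by zero on a perfectly flat sidelobe.
    return (peak - mean) / (math.sqrt(var) + 1e-12)
```

A tracker using this sketch would compare the returned value against a threshold of its own choosing and skip the filter update while the ratio stays below it.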

2,948 citations


"Computationally efficient deep trac..." refers to background in this paper

  • ...Correlation filter trackers outstands many other trackers in terms of their computational efficiency and competitive performance [22], [8], [19], [20]....


Proceedings Article
01 Jan 2014
TL;DR: This paper presents a novel approach to robust scale estimation that can handle large scale variations in complex image sequences and shows promising results in terms of accuracy and efficiency.
Abstract: Robust scale estimation is a challenging problem in visual object tracking. Most existing methods fail to handle large scale variations in complex image sequences. This paper presents a novel appro ...

2,038 citations

Proceedings Article
27 Jun 2016
TL;DR: A novel visual tracking algorithm based on representations from a discriminatively trained Convolutional Neural Network, which is pretrained on a large set of videos with tracking ground-truths to obtain a generic target representation.
Abstract: We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in each domain. We train each domain in the network iteratively to obtain generic target representations in the shared layers. When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer, which is updated online. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state. The proposed algorithm illustrates outstanding performance in existing tracking benchmarks.

1,960 citations

Posted Content
TL;DR: The authors propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN), which is pretrained using a large set of videos with tracking ground-truths to obtain a generic target representation.
Abstract: We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in each domain. We train the network with respect to each domain iteratively to obtain generic target representations in the shared layers. When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer, which is updated online. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state. The proposed algorithm illustrates outstanding performance compared with state-of-the-art methods in existing tracking benchmarks.

1,818 citations

Proceedings Article
20 Jun 2009
TL;DR: It is shown that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids the classifier degradation and drift caused by incorrectly labeled training examples, and can therefore lead to a more robust tracker with fewer parameter tweaks.
Abstract: In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called “tracking by detection” have been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrades the classifier and can cause further drift. In this paper we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems, and can therefore lead to a more robust tracker with fewer parameter tweaks. We present a novel online MIL algorithm for object tracking that achieves superior results with real-time performance.

1,752 citations