Computationally efficient deep tracker: Guided MDNet
02 Mar 2017, pp. 1-6
TL;DR: The paper recommends an essential improvement to the existing Multi-Domain Convolutional Neural Network tracker (MDNet), which is used to track an unknown object in a video stream.
Abstract: The main objective of the paper is to recommend an essential improvement to the existing Multi-Domain Convolutional Neural Network tracker (MDNet), which is used to track an unknown object in a video stream. MDNet is able to handle major tracking challenges such as fast motion, background clutter, out-of-view targets, and scale variations through offline training and online tracking. We pre-train the Convolutional Neural Network (CNN) offline using many videos with ground truth to obtain a target representation in the network. In online tracking, MDNet uses a large number of randomly sampled windows around the previous target to estimate the target in the current frame, which makes tracking computationally expensive at test time. The major contribution of the paper is to give guided samples to MDNet rather than random samples, so that the computation and time required by the CNN while tracking can be greatly reduced. The proposed algorithm is evaluated on videos from the ALOV300++ dataset and the VOT dataset, and the results are compared with state-of-the-art trackers.
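As a rough illustration of the cost argument in the abstract, the sketch below contrasts drawing many random candidate windows around the previous target with drawing a smaller set around a guided position. The Gaussian sampling scheme, the sample counts (256 and 32), the sigma values, and both function names are assumptions made for illustration; the abstract does not specify how the guided samples are generated.

```python
import math
import random


def random_candidates(prev_box, n=256, trans_sigma=0.3, scale_sigma=0.05, rng=None):
    """Draw n candidate windows around prev_box = (cx, cy, w, h).

    Assumed scheme (not taken from the paper): Gaussian jitter on the
    centre, proportional to the box size, and on the log-scale.
    """
    rng = rng or random.Random(0)
    cx, cy, w, h = prev_box
    size = (w + h) / 2.0
    boxes = []
    for _ in range(n):
        s = math.exp(rng.gauss(0.0, scale_sigma))  # multiplicative scale change
        boxes.append((
            cx + rng.gauss(0.0, trans_sigma * size),
            cy + rng.gauss(0.0, trans_sigma * size),
            w * s,
            h * s,
        ))
    return boxes


def guided_candidates(prev_box, predicted_shift, n=32, rng=None):
    """Hypothetical guided sampler: shift the centre by an externally
    supplied motion estimate, then draw fewer, tighter samples."""
    cx, cy, w, h = prev_box
    dx, dy = predicted_shift
    return random_candidates((cx + dx, cy + dy, w, h), n=n, trans_sigma=0.1, rng=rng)
```

Since every candidate window requires a forward pass through the network, the per-frame cost scales with the number of candidates, which is why a smaller guided set would reduce computation.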
Citations
[...]
TL;DR: A novel face recognition method for population search and criminal pursuit in smart cities and a cloud server architecture for face recognition in smart city environments are proposed.
Abstract: Face recognition technology can be applied to many aspects of a smart city, and combining face recognition with deep learning can bring new applications to public security. Deep learning machine vision technology and video-based image retrieval technology can quickly and easily address the current problem of finding missing children and arresting criminal suspects. The main purpose of this paper is to propose a novel face recognition method for population search and criminal pursuit in smart cities. In large and medium-sized security systems, the most similar face images can be accurately searched among tens of millions of photos. The storage requires a powerful information processing center for a variety of information storage and processing. To fundamentally support the safe operation of a large system, a cloud-based network architecture is considered and a smart city cloud computing data center is built. In addition, this paper proposes a cloud server architecture for face recognition in smart city environments.
1 citation
[...]
TL;DR: In this paper, the YOLOv3 pretraining model is used for ship detection, recognition, and counting in the context of intelligent maritime surveillance, timely ocean rescue, and computer-aided decision-making.
Abstract: Automatic ship detection, recognition, and counting are crucial for intelligent maritime surveillance, timely ocean rescue, and computer-aided decision-making. YOLOv3 pretraining model is used for ...
1 citation
[...]
01 Jan 2018
TL;DR: Visual tracking is a computer vision problem where the task is to follow a target through a video sequence.
Abstract: Visual tracking is a computer vision problem where the task is to follow a target through a video sequence. Tracking has many important real-world applications in several fields such as autonomous v ...
Cites methods from "Computationally efficient deep trac..."
[...]
References
Proceedings Article•
[...]
TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved considerably better ImageNet classification error rates than the previous state of the art.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
73,871 citations
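The "dropout" regularization mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration using the common "inverted" variant, which rescales surviving activations at training time; the function name, the default drop rate of 0.5, and the list-based interface are assumptions for illustration, not details taken from the abstract.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Randomly zero each activation with probability p_drop.

    Inverted dropout: survivors are divided by the keep probability so
    the expected activation is unchanged and inference needs no rescaling.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)  # identity at inference time
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because a different random subset of units is silenced on every training step, units cannot rely on specific co-activated partners, which is the intuition behind its regularizing effect.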
Proceedings Article•
[...]
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
49,857 citations
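One way to see why very small (3x3) filters make deeper networks practical is to compare a stack of 3x3 layers with a single larger filter covering the same receptive field. The sketch below is a back-of-the-envelope calculation assuming stride-1 convolutions, equal input and output channel counts, and no bias terms; the function names are illustrative.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (in pixels per side) of stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1


def conv_weights(kernel, channels):
    """Weight count of one conv layer with `channels` inputs and outputs, no bias."""
    return kernel * kernel * channels * channels


def compare(num_layers, channels):
    """Return (receptive field, weights in the 3x3 stack, weights in one large filter)."""
    rf = stacked_receptive_field(num_layers)
    stacked = num_layers * conv_weights(3, channels)
    single = conv_weights(rf, channels)
    return rf, stacked, single
```

For example, `compare(3, 64)` gives a 7x7 receptive field with 110,592 weights for the stack versus 200,704 for a single 7x7 layer, and the stack also applies three non-linearities instead of one.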
[...]
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.
18,335 citations
"Computationally efficient deep trac..." refers methods in this paper
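The "mean IU" figure quoted in the segmentation abstract above is the intersection-over-union averaged across classes. A minimal sketch of that metric, computed from a pixel-level confusion matrix, is given below; the function name and the convention of skipping classes that never appear are assumptions for illustration.

```python
def mean_iu(confusion):
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as
    class j. Classes with no true or predicted pixels are skipped
    (an assumed convention).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp   # wrongly labelled as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0
```

Unlike plain pixel accuracy, this metric weights every class equally, so a model cannot score well merely by labelling the dominant background class correctly.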
[...]
[...]
TL;DR: RCNN as discussed by the authors combines CNNs with bottom-up region proposals to localize and segment objects, and when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
15,107 citations
Posted Content•
[...]
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
13,081 citations
"Computationally efficient deep trac..." refers methods in this paper
[...]