Proceedings ArticleDOI

Deep GoogLeNet Features for Visual Object Tracking

TL;DR: This study demonstrates, for the first time, the viability of features extracted from deep layers of the GoogLeNet CNN architecture for object tracking, integrating GoogLeNet features into a discriminative correlation filter based tracking framework.
Abstract: Convolutional Neural Networks (CNNs) have recently become very popular in visual object tracking due to their strong feature representation capabilities. Almost all current CNN-based trackers use features extracted from the shallow convolutional layers of the VGGNet architecture. This paper presents an investigation of the impact of deep convolutional layer features in an object tracking framework. In this study, we demonstrate for the first time the viability of features extracted from deep layers of the GoogLeNet CNN architecture for the purpose of object tracking. We integrated GoogLeNet features into a discriminative correlation filter based tracking framework. Our experimental results show that the GoogLeNet features provide significant computational advantages over the conventionally used VGGNet features, without much compromise in tracking performance. It was observed that features obtained from the inception modules of GoogLeNet have high channel depth. Principal Component Analysis (PCA) was therefore employed to reduce the dimensionality of the extracted features, which greatly reduces the computational cost and thus improves the speed of the tracking process. Extensive evaluations have been performed on three benchmark datasets (OTB, ALOV300++ and VOT2016), with performance measured in terms of metrics such as F-score, One Pass Evaluation, robustness and accuracy.
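
To make the described pipeline concrete, below is a minimal Python sketch (not the authors' code) of the two steps the abstract outlines: pulling a deep feature map from a GoogLeNet inception module and compressing its channels with PCA before handing it to a correlation-filter tracker. The chosen layer (inception4e), the input size and the number of PCA components are illustrative assumptions, and a recent torchvision is assumed.

```python
# Minimal sketch: deep GoogLeNet features + PCA compression for a DCF-style tracker.
import torch
import torchvision
from sklearn.decomposition import PCA

model = torchvision.models.googlenet(weights="IMAGENET1K_V1").eval()

features = {}
def hook(_module, _inp, out):
    features["deep"] = out.detach()

model.inception4e.register_forward_hook(hook)   # a deep inception module (832 channels)

patch = torch.rand(1, 3, 224, 224)              # stand-in for a target image patch
with torch.no_grad():
    model(patch)

fmap = features["deep"][0]                      # (C, H, W), high channel depth
c, h, w = fmap.shape
pixels = fmap.permute(1, 2, 0).reshape(-1, c).numpy()   # one sample per spatial cell

pca = PCA(n_components=64)                      # compress channels, e.g. 832 -> 64
reduced = pca.fit_transform(pixels).reshape(h, w, 64)
print(reduced.shape)                            # compact feature map for the correlation filter
```
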
Citations
Journal ArticleDOI
01 Apr 2020-Symmetry
TL;DR: The main idea is to collect all COVID-19 images available at the time of writing and use a GAN to generate additional images, helping to detect the virus from the available X-ray images with the highest possible accuracy.
Abstract: The coronavirus (COVID-19) pandemic is putting healthcare systems across the world under unprecedented and increasing pressure, according to the World Health Organization (WHO). With advances in computer algorithms and especially Artificial Intelligence, detecting this type of virus in its early stages will help patients recover quickly and relieve pressure on healthcare systems. In this paper, a GAN with deep transfer learning for coronavirus detection in chest X-ray images is presented. The lack of COVID-19 datasets, especially of chest X-ray images, is the main motivation of this study. The main idea is to collect all COVID-19 images available at the time of writing and use a GAN to generate additional images, helping to detect the virus from the available X-ray images with the highest possible accuracy. The dataset used in this research was collected from different sources and is available for researchers to download and use. The collected dataset contains 307 images across four classes: COVID-19, normal, pneumonia bacterial, and pneumonia virus. Three deep transfer models are selected for investigation: AlexNet, GoogLeNet, and ResNet18. These models were chosen because their architectures contain a relatively small number of layers, which reduces complexity, memory consumption and execution time for the proposed model. Three scenarios are tested in the paper: the first includes four classes from the dataset, the second includes three classes, and the third includes two classes. All scenarios include the COVID-19 class, as its detection is the main target of this research. In the first scenario, GoogLeNet is selected as the main deep transfer model, achieving 80.6% testing accuracy. In the second scenario, AlexNet is selected as the main deep transfer model, achieving 85.2% testing accuracy, while in the third scenario, which includes two classes (COVID-19 and normal), GoogLeNet is selected as the main deep transfer model, achieving 100% testing accuracy and 99.9% validation accuracy. All the performance measurements strengthen the results obtained in this research.
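
As a rough illustration of the deep-transfer-learning step described above (not the authors' released code), the sketch below replaces the classification head of a pretrained GoogLeNet for the four chest X-ray classes. The dataset path, preprocessing and hyperparameters are assumptions, and a recent torchvision is assumed.

```python
# Hedged sketch: transfer learning with GoogLeNet for 4-class chest X-ray classification.
import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms

num_classes = 4          # COVID-19, normal, pneumonia bacterial, pneumonia virus
model = torchvision.models.googlenet(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, num_classes)   # new classification head

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("covid_xray/train", transform=tfm)  # hypothetical path
loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:                 # one illustrative epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```
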

391 citations


Cites methods from "Deep GoogLeNet Features for Visual ..."

  • ...The used deep transfer learning CNN models investigated in this research are Alexnet [29], Resnet18 [39], Googlenet [60], The mentioned CNN models had a few numbers of layers if it is compared to large CNN models such as Xception [40], Densenet [42], and Inceptionresnet [61] which consist of 71, 201 and 164 layers accordingly....

Journal ArticleDOI
TL;DR: In this paper, cluster-based one-shot learning is introduced for detecting COVID-19 from chest X-ray images; it has the advantage of learning from only a few samples, in contrast to the many samples required by deep learning architectures.
Abstract: Coronavirus disease (COVID-19) had infected more than 28.3 million people around the globe and killed 913K people worldwide as of 11 September 2020. To combat the spread of COVID-19 in this pandemic, effective testing methodologies and immediate medical treatment are much needed. Chest X-rays are a widely available modality for immediate diagnosis of COVID-19. Hence, automating the detection of COVID-19 from chest X-ray images using machine learning approaches is in great demand. A model for detecting COVID-19 from chest X-ray images is proposed in this paper. A novel concept of cluster-based one-shot learning is introduced in this work. The introduced concept has the advantage of learning from a few samples, in contrast to the many samples required by deep learning architectures. The proposed model is a multi-class classification model, as it classifies images of four classes, viz., pneumonia bacterial, pneumonia virus, normal, and COVID-19. The proposed model is based on an ensemble of Generalized Regression Neural Network (GRNN) and Probabilistic Neural Network (PNN) classifiers at the decision level. The effectiveness of the proposed model has been demonstrated through extensive experimentation on a publicly available dataset consisting of 306 images. The proposed cluster-based one-shot learning has been found to be more effective with the GRNN and PNN ensemble model in distinguishing COVID-19 images from those of the other three classes. It has also been experimentally observed that the model outperforms contemporary deep learning architectures. The concept of cluster-based one-shot learning is the first of its kind in the literature and is expected to open up several new dimensions in the field of machine learning that warrant further research for various applications.
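
The decision-level ensemble can be pictured with the short numpy sketch below. It is one plausible reading of the GRNN/PNN fusion, not the authors' implementation; the kernel widths, feature vectors and score-averaging fusion rule are assumptions made for illustration.

```python
# Simplified sketch of a decision-level GRNN + PNN ensemble over class prototypes.
import numpy as np

def class_scores(x, prototypes, labels, num_classes, sigma=1.0):
    """Gaussian-kernel class scores, the common core of PNN and one-hot GRNN."""
    d2 = np.sum((prototypes - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))               # kernel response to each prototype
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        scores[c] = k[mask].mean() if mask.any() else 0.0
    return scores / (scores.sum() + 1e-12)

def ensemble_predict(x, prototypes, labels, num_classes, sigma_pnn=0.5, sigma_grnn=2.0):
    """Fuse the two classifiers at decision level by averaging their score vectors."""
    p_pnn = class_scores(x, prototypes, labels, num_classes, sigma_pnn)
    p_grnn = class_scores(x, prototypes, labels, num_classes, sigma_grnn)
    return int(np.argmax(0.5 * (p_pnn + p_grnn)))

# Toy usage: one prototype ("shot") per class, as in one-shot learning.
protos = np.random.rand(4, 128)                        # 4 classes, 128-D image features
y = np.array([0, 1, 2, 3])
print(ensemble_predict(np.random.rand(128), protos, y, num_classes=4))
```
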

40 citations

Posted ContentDOI
27 Jul 2020
TL;DR: Experiments conducted on publicly available chest X-ray images demonstrate that the proposed cluster-based one-shot approach detects COVID-19 with high precision and outperforms many existing convolutional neural network based methods proposed in the literature.
Abstract: Coronavirus disease (COVID-19) had infected more than 28.3 million people around the globe and killed 913K people worldwide as of 11 September 2020. To combat the spread of COVID-19 in this pandemic, effective testing methodologies and immediate medical treatment are much needed. Chest X-rays are a widely available modality for immediate diagnosis of COVID-19. Hence, automating the detection of COVID-19 from chest X-ray images using machine learning approaches is in great demand. A model for detecting COVID-19 from chest X-ray images is proposed in this paper. A novel concept of cluster-based one-shot learning is introduced in this work. The introduced concept has the advantage of learning from a few samples, in contrast to the many samples required by deep learning architectures. The proposed model is a multi-class classification model, as it classifies images of four classes, viz., pneumonia bacterial, pneumonia virus, normal, and COVID-19. The proposed model is based on an ensemble of Generalized Regression Neural Network (GRNN) and Probabilistic Neural Network (PNN) classifiers at the decision level. The effectiveness of the proposed model has been demonstrated through extensive experimentation on a publicly available dataset consisting of 306 images. The proposed cluster-based one-shot learning has been found to be more effective with the GRNN and PNN ensemble model in distinguishing COVID-19 images from those of the other three classes. It has also been experimentally observed that the model outperforms contemporary deep learning architectures. The concept of cluster-based one-shot learning is the first of its kind in the literature and is expected to open up several new dimensions in the field of machine learning that warrant further research for various applications.

31 citations


Cites result from "Deep GoogLeNet Features for Visual ..."

  • ...From the Table 1 it is evident that the proposed idea with only 2 samples (one from each class) we were able to achieve 100% detection rate and the same has been compared with well-known deep learning models such as AlexNet [24], GoogLeNet [25] and ResNet [26]....

Journal ArticleDOI
TL;DR: YOLO, YOLO-conv, GoogLeNet and ResNet18 are computationally efficient detectors that require less processing time and are suitable for real-time detection, whereas ResNet50 is computationally expensive.
Abstract: Parking has been a common problem for several years in many cities around the globe. The search for parking space leads to congestion, frustration and increased air pollution. Information on vacant parking spaces would help reduce congestion and the resulting air pollution. The aim of this study is therefore to determine vehicle occupancy in an open parking lot using deep learning. A thermal camera was used to collect videos under varying environmental conditions, and frames from these videos were extracted to prepare the dataset. The frames in the dataset were manually labelled, as no pre-labelled thermal images were available. Vehicle detection with deep learning algorithms was implemented to perform multi-object detection. Multiple deep learning networks, such as YOLO, YOLO-conv, GoogLeNet, ResNet18 and ResNet50, with varying layers and architectures, were evaluated on vehicle detection. ResNet18 performed better than the other detectors, with an average precision of 96.16 and a log-average miss rate of 19.40. The detected results were compared with a template of parking spaces to identify vehicle occupancy information. YOLO, YOLO-conv, GoogLeNet and ResNet18 are computationally efficient detectors that require less processing time and are suitable for real-time detection, whereas ResNet50 is computationally expensive.
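
The final occupancy step, matching detected vehicle boxes against a fixed template of parking spaces, can be sketched as below. The IoU threshold and box coordinates are illustrative and not taken from the paper.

```python
# Sketch: mark parking spaces as occupied when a detected vehicle box overlaps them.
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def occupancy(spaces, detections, thr=0.3):
    """A space is occupied if any detected vehicle overlaps it enough."""
    return {i: any(iou(s, d) >= thr for d in detections) for i, s in enumerate(spaces)}

# Toy usage with two parking spaces and one detected vehicle.
spaces = [(0, 0, 50, 100), (60, 0, 110, 100)]
detections = [(5, 10, 48, 95)]
print(occupancy(spaces, detections))   # {0: True, 1: False}
```
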

18 citations

Journal ArticleDOI
TL;DR: Experiments based on OSM data from Tianjin city, China, revealed that compared with state‐of‐the‐art methods, the proposed method effectively identified more types of complex junctions and achieved a significantly higher identification accuracy.

11 citations

References
Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
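
The core building block the abstract refers to can be sketched compactly. The snippet below is an illustrative PyTorch rendering of one Inception module, with parallel 1x1, 3x3, 5x5 and pooled branches concatenated along the channel axis; channel counts follow the paper's inception(3a) block, but this is not the reference implementation.

```python
# Illustrative Inception module: four parallel branches concatenated channel-wise.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# inception(3a): 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
block = InceptionModule(192, 64, 96, 128, 16, 32, 32)
print(block(torch.rand(1, 192, 28, 28)).shape)   # torch.Size([1, 256, 28, 28])
```
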

40,257 citations

Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Posted Content
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.
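
As a rough sketch of the pipeline this abstract describes (region proposals warped to a fixed size, then passed through a CNN to obtain fixed-length features), the snippet below uses a pretrained AlexNet backbone as a stand-in. Proposal generation (selective search in the paper) and the downstream per-class linear SVMs are omitted, and all names are illustrative assumptions rather than the released R-CNN code.

```python
# Sketch of the R-CNN feature stage: warp each proposal and extract CNN features.
import torch
import torchvision
from torchvision import transforms

backbone = torchvision.models.alexnet(weights="IMAGENET1K_V1").eval()
featurizer = torch.nn.Sequential(backbone.features, backbone.avgpool, torch.nn.Flatten())

warp = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])

def rcnn_features(image, proposals):
    """image: PIL.Image; proposals: list of (x1, y1, x2, y2) region boxes."""
    crops = [warp(image.crop(box)) for box in proposals]   # warp each region to a fixed size
    with torch.no_grad():
        return featurizer(torch.stack(crops))              # (num_proposals, 9216) feature vectors

# The per-class scoring stage (linear SVMs in the paper) would then operate on
# these fixed-length feature vectors.
```
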

13,081 citations


"Deep GoogLeNet Features for Visual ..." refers background in this paper

  • ...Deep convolutional neural networks have clearly shown excellent performance in object recognition and object detection problems [6], [7], [21], and are therefore of interest for visual object tracking....

Journal ArticleDOI
TL;DR: The goal of this article is to review state-of-the-art tracking methods, classify them into different categories, identify new trends, and discuss important issues related to tracking, including the use of appropriate image features, selection of motion models, and detection of objects.
Abstract: The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. Object tracking, in general, is a challenging problem. Difficulties in tracking objects can arise due to abrupt object motion, changing appearance patterns of both the object and the scene, nonrigid object structures, object-to-object and object-to-scene occlusions, and camera motion. Tracking is usually performed in the context of higher-level applications that require the location and/or shape of the object in every frame. Typically, assumptions are made to constrain the tracking problem in the context of a particular application. In this survey, we categorize the tracking methods on the basis of the object and motion representations used, provide detailed descriptions of representative methods in each category, and examine their pros and cons. Moreover, we discuss the important issues related to tracking including the use of appropriate image features, selection of motion models, and detection of objects.

5,318 citations

Journal ArticleDOI
TL;DR: A new kernelized correlation filter is derived, that unlike other kernel algorithms has the exact same complexity as its linear counterpart, which is called dual correlation filter (DCF), which outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite being implemented in a few lines of code.
Abstract: The core component of most modern trackers is a discriminative classifier, tasked with distinguishing between the target and the surrounding environment. To cope with natural image changes, this classifier is typically trained with translated and scaled sample patches. Such sets of samples are riddled with redundancies—any overlapping pixels are constrained to be the same. Based on this simple observation, we propose an analytic model for datasets of thousands of translated patches. By showing that the resulting data matrix is circulant, we can diagonalize it with the discrete Fourier transform, reducing both storage and computation by several orders of magnitude. Interestingly, for linear regression our formulation is equivalent to a correlation filter, used by some of the fastest competitive trackers. For kernel regression, however, we derive a new kernelized correlation filter (KCF), that unlike other kernel algorithms has the exact same complexity as its linear counterpart. Building on it, we also propose a fast multi-channel extension of linear correlation filters, via a linear kernel, which we call dual correlation filter (DCF). Both KCF and DCF outperform top-ranking trackers such as Struck or TLD on a 50 videos benchmark, despite running at hundreds of frames-per-second, and being implemented in a few lines of code (Algorithm 1). To encourage further developments, our tracking framework was made open-source.
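
The Fourier-domain trick at the heart of this reference can be condensed into a few numpy lines. The sketch below covers only the single-channel, linear-kernel case: a Gaussian label, no windowing, and a regularisation value chosen for illustration rather than taken from the paper.

```python
# Sketch of correlation-filter training/detection done element-wise in the Fourier domain.
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    """Desired response: a Gaussian peak, rolled so the peak sits at (0, 0)."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * sigma ** 2))
    return np.roll(np.roll(g, -h // 2, axis=0), -w // 2, axis=1)

def train(x, y, lam=1e-4):
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    kxx = np.conj(X) * X / x.size              # linear-kernel auto-correlation (Fourier domain)
    return Y / (kxx + lam)                     # filter coefficients alpha_hat

def detect(alpha_hat, x, z):
    kxz = np.conj(np.fft.fft2(x)) * np.fft.fft2(z) / x.size
    return np.real(np.fft.ifft2(alpha_hat * kxz))   # response map; argmax gives the shift

# Toy usage on a random grayscale patch.
patch = np.random.rand(64, 64)
alpha = train(patch, gaussian_label(64, 64))
shift = np.unravel_index(np.argmax(detect(alpha, patch, np.roll(patch, 3, axis=1))), (64, 64))
print(shift)   # approximately (0, 3): the detected translation
```
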

4,994 citations


"Deep GoogLeNet Features for Visual ..." refers methods in this paper

  • ...Some of the popularly used features are Histogram of Oriented gradients (HOG) [16], Color names [13] and CNN features [8]–[10]....

  • ...Feature representations such as HOG [16], Color Names etc. [13], [18] have been successfully employed in DCF based tracking frameworks....

  • ...Till 2015, most of the trackers used the hand-crafted appearance features, such as HOG and color names for modelling the target object....