Journal ArticleDOI

Adversarial Adaptation From Synthesis to Reality in Fast Detector for Smoke Detection

04 Mar 2019-IEEE Access (IEEE)-Vol. 7, pp 29471-29483
TL;DR: This paper proposes a smoke detection method based on two state-of-the-art fast detectors, the single-shot multi-box detector (SSD) and a multi-scale deep convolutional neural network (MSCNN), trained on synthetic smoke image samples, and designs an adversarial training strategy that optimizes the adapted detectors to learn a domain-invariant representation for smoke detection.
Abstract: Video smoke detection is a promising method for early fire prevention. However, applying video smoke detection in real-world detection systems remains challenging, owing to the limited smoke image samples available for training and the lack of efficient detection algorithms. This paper proposes a method based on two state-of-the-art fast detectors, the single-shot multi-box detector (SSD) and a multi-scale deep convolutional neural network (MSCNN), for smoke detection using synthetic smoke image samples. The virtual data can automatically offer rich samples with ground-truth annotations. However, the learning of smoke representation in the detectors is restricted by the appearance gap between real and synthetic smoke samples, which causes a significant performance drop. To train a strong detector with synthetic smoke samples, we incorporate domain adaptation into the fast detectors. A series of branches with the same structure as the detection branches are integrated into the fast detectors for domain adaptation. We design an adversarial training strategy to optimize the model of the adapted detectors so that they learn a domain-invariant representation for smoke detection. Domain discrimination, domain confusion, and detection are combined in the iterative training procedure. The performance of the proposed approach surpasses the original baseline in our experiments.
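As an illustration of the iterative schedule the abstract describes (detection on labelled synthetic samples, domain discrimination, then domain confusion via inverted labels), here is a minimal PyTorch-style sketch. The module names, the single domain branch, and the 0.1 trade-off weight are assumptions for illustration, not the paper's code:

```python
import torch
import torch.nn.functional as F

def train_step(backbone, det_head, dom_head, opt_task, opt_dom,
               syn_imgs, syn_targets, real_imgs, det_loss_fn):
    """One iteration of the detection / discrimination / confusion cycle."""
    feats_syn = backbone(syn_imgs)     # labelled synthetic batch
    feats_real = backbone(real_imgs)   # unlabelled real batch

    # (a) Domain discrimination: teach the domain branch to separate
    # synthetic (label 0) from real (label 1); detach() freezes the backbone.
    zeros = torch.zeros(len(syn_imgs), device=syn_imgs.device)
    ones = torch.ones(len(real_imgs), device=real_imgs.device)
    d_logits = torch.cat([dom_head(feats_syn.detach()).flatten(),
                          dom_head(feats_real.detach()).flatten()])
    loss_dom = F.binary_cross_entropy_with_logits(d_logits,
                                                  torch.cat([zeros, ones]))
    opt_dom.zero_grad(); loss_dom.backward(); opt_dom.step()

    # (b) Detection plus domain confusion: update backbone and detection
    # head; inverted domain labels push the features toward invariance.
    loss_det = det_loss_fn(det_head(feats_syn), syn_targets)
    g_logits = torch.cat([dom_head(feats_syn).flatten(),
                          dom_head(feats_real).flatten()])
    inv_labels = torch.cat([torch.ones_like(zeros), torch.zeros_like(ones)])
    loss_conf = F.binary_cross_entropy_with_logits(g_logits, inv_labels)
    opt_task.zero_grad()
    (loss_det + 0.1 * loss_conf).backward()  # 0.1: assumed trade-off weight
    opt_task.step()
    return loss_det.item(), loss_dom.item(), loss_conf.item()
```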


Citations
Book
26 Jun 2021
TL;DR: The synthetic-to-real domain adaptation problem that inevitably arises in applications of synthetic data is discussed, including synthetic-to-real refinement with GAN-based models and domain adaptation at the feature/model level without explicit data transformations.
Abstract: Synthetic data is an increasingly popular tool for training deep learning models, especially in computer vision but also in other areas. In this work, we attempt to provide a comprehensive survey of the various directions in the development and application of synthetic data. First, we discuss synthetic datasets for basic computer vision problems, both low-level (e.g., optical flow estimation) and high-level (e.g., semantic segmentation), synthetic environments and datasets for outdoor and urban scenes (autonomous driving), indoor scenes (indoor navigation), aerial navigation, simulation environments for robotics, applications of synthetic data outside computer vision (in neural programming, bioinformatics, NLP, and more); we also survey the work on improving synthetic data development and alternative ways to produce it such as GANs. Second, we discuss in detail the synthetic-to-real domain adaptation problem that inevitably arises in applications of synthetic data, including synthetic-to-real refinement with GAN-based models and domain adaptation at the feature/model level without explicit data transformations. Third, we turn to privacy-related applications of synthetic data and review the work on generating synthetic datasets with differential privacy guarantees. We conclude by highlighting the most promising directions for further work in synthetic data studies.

177 citations


Cites background from "Adversarial Adaptation From Synthes..."

  • ...Recent applications of synthetic data for object detection include the detection of objects in vending machines [623], objects in piles for training robotic arms [75], computer game objects [560], smoke detection [663], deformable part models [681], face detection in biomedical literature [140], drone detection [504], and more....


  • ...[663] use adversarial domain adaptation to transfer object detection models—single-shot multi-box detector (SSD) [372] and multi-scale deep CNN (MSCNN) [81]—from synthetic samples to real videos in the smoke detection problem....


Journal ArticleDOI
TL;DR: This review focuses on video flame- and smoke-based fire detection algorithms for both indoor and outdoor environments; it also covers the latest trend in the literature, hybrid approaches that combine handcrafted features with deep learning.
Abstract: This review focuses on video flame- and smoke-based fire detection algorithms for both indoor and outdoor environments, analyzing and discussing them in a taxonomical manner for the last two decades. These algorithms are mainly based on handcrafted features, with or without classifiers, and on deep learning approaches. Flame detection and smoke detection are treated separately. Their static and dynamic characteristics are elaborated for the handcrafted-feature approach. The blending of features obtained from these characteristics is the focus of most of the research, and these concepts are analyzed critically. The fusion of visible and thermal images, leading to multi-fusion and multimodal approaches, is also covered as a step towards more accurate detection results. How the handcrafted-feature approach tackles the problems of flame and smoke detection is discussed, as are its still-unsolved weaknesses. Some of these weaknesses can be tackled by deep learning; a taxonomical study of this literature, focusing on flame and smoke detection, is presented, and the strengths and weaknesses of this approach are discussed along with possible solutions. The latest trend in the literature, the hybrid approach utilizing both handcrafted features and deep learning, is also discussed; it aims to minimize the weaknesses still present in current systems.

100 citations

Journal ArticleDOI
TL;DR: An energy-friendly edge intelligence-assisted smoke detection method is proposed using deep convolutional neural networks for foggy surveillance environments, considering all necessary requirements regarding accuracy, running time, and deployment feasibility for smoke detection in an industrial setting.
Abstract: Smoke detection in foggy surveillance environments is a challenging task and plays a key role in disaster management for industrial systems. Current smoke detection methods are applicable only to normal surveillance videos, providing unsatisfactory results for video streams captured in foggy environments, due to challenges related to clutter and unclear contents. In this paper, an energy-friendly, edge intelligence-assisted smoke detection method using deep convolutional neural networks is proposed for foggy surveillance environments. Our method uses a lightweight architecture, considering all necessary requirements regarding accuracy, running time, and deployment feasibility for smoke detection in an industrial setting, compared to other complex and computationally expensive architectures including AlexNet, GoogleNet, and the visual geometry group (VGG) network. Experiments are conducted on available benchmark smoke detection datasets, and the obtained results show better performance of the proposed method over the state of the art for early smoke detection in foggy surveillance.
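The paper's exact lightweight architecture is not reproduced on this page, but the trade-off it describes (edge-deployable models versus heavy networks such as AlexNet or VGG) usually comes down to cheaper convolutions. A hedged sketch of the standard technique, depthwise-separable convolutions, with hypothetical layer sizes:

```python
import torch.nn as nn

# Depthwise-separable convolution: a per-channel 3x3 conv followed by a
# 1x1 pointwise conv, cutting parameters and FLOPs versus a full 3x3 conv.
def separable_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),  # depthwise
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

# Hypothetical smoke / non-smoke classifier for surveillance frames.
smoke_net = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    separable_block(32, 64, stride=2),
    separable_block(64, 128, stride=2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 2),
)
```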

88 citations


Cites methods from "Adversarial Adaptation From Synthes..."

  • ...The most recent methods are presented in [26] and [27] based on visual geometry group...


Journal ArticleDOI
Rui Ba, Chen Chen, Jing Yuan, Weiguo Song, Siuming Lo 
TL;DR: This paper presents a new large-scale satellite imagery smoke detection benchmark based on Moderate Resolution Imaging Spectroradiometer (MODIS) data, namely USTC_SmokeRS, and proposes a new convolution neural network (CNN) model, SmokeNet, which incorporates spatial and channel-wise attention in CNN to enhance feature representation for scene classification.
Abstract: A variety of environmental analysis applications have been advanced by the use of satellite remote sensing. Smoke detection based on satellite imagery is imperative for wildfire detection and monitoring. However, the commonly used smoke detection methods mainly focus on smoke discrimination from a few specific classes, which reduces their applicability in different regions of various classes. To this end, in this paper, we present a new large-scale satellite imagery smoke detection benchmark based on Moderate Resolution Imaging Spectroradiometer (MODIS) data, namely USTC_SmokeRS, consisting of 6225 satellite images from six classes (i.e., cloud, dust, haze, land, seaside, and smoke) and covering various areas/regions over the world. To build a baseline for smoke detection in satellite imagery, we evaluate several state-of-the-art deep learning-based image classification models. Moreover, we propose a new convolution neural network (CNN) model, SmokeNet, which incorporates spatial and channel-wise attention in CNN to enhance feature representation for scene classification. The experimental results of our method using different proportions (16%, 32%, 48%, and 64%) of training images reveal that our model outperforms other approaches with higher accuracy and Kappa coefficient. Specifically, the proposed SmokeNet model trained with 64% training images achieves the best accuracy of 92.75% and Kappa coefficient of 0.9130. The model trained with 16% training images can also improve the classification accuracy and Kappa coefficient by at least 4.99% and 0.06, respectively, over the state-of-the-art models.
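SmokeNet's exact layers are not given on this page, but the abstract's key ingredient, spatial plus channel-wise attention inside a CNN, can be sketched generically. A minimal illustration (a squeeze-and-excitation-style channel gate followed by a convolutional spatial gate; all sizes are arbitrary):

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel-wise and spatial attention over a feature map;
    an illustration of the idea, not SmokeNet's actual design."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(          # one gate per channel
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial = nn.Sequential(          # one gate per location
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        x = x * self.channel(x).view(n, c, 1, 1)  # reweight channels
        return x * self.spatial(x)                # reweight locations

out = ChannelSpatialAttention(64)(torch.randn(2, 64, 32, 32))  # same shape
```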

78 citations

Journal ArticleDOI
TL;DR: A novel Attention Enhanced Bidirectional Long Short-Term Memory Network (ABi-LSTM) for video-based forest fire smoke recognition that can not only capture discriminative spatiotemporal features from image patch sequences but also pay different levels of attention to different patches.
Abstract: Detecting forest fire smoke during the initial stages is vital for preventing forest fire events. Recent studies have shown that exploring spatial and temporal features of the image sequence is important for this task. Nevertheless, since long-distance wildfire smoke usually moves slowly and lacks salient features, accurate smoke detection is still a challenging task. In this paper, we propose a novel Attention Enhanced Bidirectional Long Short-Term Memory Network (ABi-LSTM) for video-based forest fire smoke recognition. The proposed ABi-LSTM consists of a spatial feature extraction network, a bidirectional long short-term memory network (Bi-LSTM), and a temporal attention subnetwork, which together not only capture discriminative spatiotemporal features from image patch sequences but also pay different levels of attention to different patches. Experiments show that our ABi-LSTM achieves the best accuracy and fewer false alarms across different types of scenarios. The ABi-LSTM model achieves an accuracy of 97.8%, a 4.4% improvement over the image-based deep learning model.
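The composition the abstract describes (per-patch spatial features, a Bi-LSTM over time, and temporal attention pooling) is straightforward to sketch. A minimal version, with feature extraction left abstract and all sizes assumed rather than taken from the paper:

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Sketch of the ABi-LSTM idea: per-step feature vectors -> Bi-LSTM ->
    temporal attention pooling -> smoke / no-smoke logits."""
    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)    # one score per time step
        self.cls = nn.Linear(2 * hidden, 2)

    def forward(self, seq):                     # seq: (batch, T, feat_dim)
        h, _ = self.lstm(seq)                   # (batch, T, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (w * h).sum(dim=1)            # weighted temporal pooling
        return self.cls(context)

logits = AttnBiLSTM()(torch.randn(4, 10, 256))  # 4 clips, 10 steps each
```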

64 citations

References
Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
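The recipe the abstract describes is easy to make concrete: stages of stacked 3x3 convolutions separated by pooling, with depth doing the work. A sketch of the VGG-16 convolutional trunk (13 of its 16 weight layers; the remaining three are fully connected):

```python
import torch.nn as nn

# One VGG stage: n_convs 3x3 convolutions, then 2x2 max pooling.
def vgg_stage(cin, cout, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# VGG-16's trunk is five such stages with 2, 2, 3, 3, 3 convolutions.
trunk = nn.Sequential(vgg_stage(3, 64, 2), vgg_stage(64, 128, 2),
                      vgg_stage(128, 256, 3), vgg_stage(256, 512, 3),
                      vgg_stage(512, 512, 3))
```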

55,235 citations


"Adversarial Adaptation From Synthes..." refers background in this paper

  • ...Different from the other branches, the branches located at the lower layers Conv2 and Conv4_3 use the L2 normalization technique to scale the feature norm in SSD_ZF and SSD_VGG16, respectively, and the branch at Conv4_3 uses a buffer convolutional layer in MSCNN (see the L2 normalization sketch after these excerpts)....


  • ...The mean location accuracy of the adapted SSD_VGG16 with discarding negatives is close to that with remaining negatives....


  • ...Meanwhile, the missing detection error of the adapted SSD_ZF model is much higher than that of the basic SSD_ZF model, and the missing detection error of the adapted MSCNN model is slightly higher than that of the basic MSCNN model, while the missing detection errors of the adapted SSD_VGG16 model and the basic SSD_VGG16 model are very close....


  • ...To further examine the performance differences between the models of the basic detectors and the adapted detectors, we look at the confusing and missing detection errors of the basic SSD_ZF, SSD_VGG16, and MSCNN models and the adapted SSD_ZF, adapted SSD_VGG16, and adapted MSCNN models, as shown in Figure 6....


  • ...Obviously, the adapted SSD_ZF model and adapted MSCNN model cause fewer confusing detection errors than the basic SSD_ZF model and basic MSCNN model, respectively, while the adapted SSD_VGG16 model causes more confusing detection errors than the basic SSD_VGG16, although the confusing detection errors of the basic SSD_VGG16 model and adapted SSD_VGG16 model are close....

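The first excerpt above mentions L2 normalization on low-level feature maps. A minimal sketch of the L2Norm layer popularized by SSD, which rescales each position's feature vector to unit norm and multiplies by a learnable per-channel scale (the initial scale of 20 follows the SSD paper; its placement in the smoke detectors' extra branches is as the excerpt describes):

```python
import torch
import torch.nn as nn

class L2Norm(nn.Module):
    """L2-normalize along the channel dimension, then apply a learnable
    per-channel scale, keeping low-layer feature norms comparable to
    those of deeper layers."""
    def __init__(self, channels, scale=20.0):
        super().__init__()
        self.weight = nn.Parameter(torch.full((channels,), scale))

    def forward(self, x):                       # x: (N, C, H, W)
        x = x / x.norm(p=2, dim=1, keepdim=True).clamp_min(1e-10)
        return x * self.weight.view(1, -1, 1, 1)
```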

Journal ArticleDOI
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.
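The minimax game in the abstract translates almost line for line into a training loop. A minimal non-saturating sketch (the generator is trained with inverted labels, as referenced in the excerpt below; G, D, and the optimizers are assumed to exist, and D is assumed to return one logit per sample):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, opt_g, opt_d, real, z_dim=64):
    """One iteration: D learns to separate real from generated samples,
    then G is updated to fool D."""
    ones = torch.ones(len(real), 1, device=real.device)
    zeros = torch.zeros(len(real), 1, device=real.device)
    fake = G(torch.randn(len(real), z_dim, device=real.device))

    # Discriminator: real -> 1, fake -> 0 (detach() so G is not updated).
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones) +
              F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: maximize D's mistake by labelling fakes as real
    # (the "inverted labels" form of the minimax objective).
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```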

38,211 citations


"Adversarial Adaptation From Synthes..." refers background in this paper

  • ...Like training GANs, it is typical to train the generator with the standard loss function with inverted labels [42], namely, to encourage a common feature space M_backbone(X_d) through an adversarial objective with respect to the domain discriminator....


Proceedings ArticleDOI
27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
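The abstract's "detection as regression" framing can be summarized by the shape of the network's output alone. A schematic sketch (the original YOLO used fully connected layers for the final prediction; a 1x1 convolution is used here for brevity, and the backbone channel count is assumed):

```python
import torch.nn as nn

# Each cell of an S x S grid regresses B boxes (x, y, w, h, confidence)
# plus C class probabilities, so one forward pass yields all detections.
S, B, C = 7, 2, 20                      # grid size, boxes per cell, classes
head = nn.Conv2d(1024, B * 5 + C, 1)    # 1024: assumed backbone channels

# Given backbone features of shape (N, 1024, S, S), head(features) has
# shape (N, B*5 + C, S, S); each cell's vector is decoded into box
# offsets, objectness confidences, and class probabilities.
```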

27,256 citations


Additional excerpts

  • ...One-stage detectors are applied over a regular, dense sampling of object locations, scales, and aspect ratios, based on deep networks such as YOLO [23] and SSD [24]....


Posted Content
TL;DR: Faster R-CNN as discussed by the authors proposes a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
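The RPN the abstract describes is a small fully convolutional head over shared backbone features. A minimal sketch (channel sizes follow the common VGG-16 configuration; one objectness logit per anchor is used here instead of the paper's two-way softmax):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """3x3 conv over shared features, then two sibling 1x1 convs that
    predict, for each of num_anchors anchors at every position, an
    objectness score and 4 box-regression deltas."""
    def __init__(self, in_ch=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, 3, padding=1)
        self.obj = nn.Conv2d(512, num_anchors, 1)      # objectness logits
        self.reg = nn.Conv2d(512, num_anchors * 4, 1)  # box deltas

    def forward(self, feats):
        h = torch.relu(self.conv(feats))
        return self.obj(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
```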

23,183 citations
