Posted Content

Unsupervised Abnormality Detection Using Heterogeneous Autonomous Systems

TL;DR: A heterogeneous system is demonstrated that estimates the degree of anomaly of an unmanned surveillance drone by inspecting IMU (Inertial Measurement Unit) sensor data and real-time images in an unsupervised approach.
Abstract: Anomaly detection (AD) in a surveillance scenario is an emerging and challenging field of research. For autonomous vehicles like drones or cars, it is immensely important to distinguish between normal and abnormal states in real-time. Additionally, we also need to detect any device malfunction. But the nature and degree of abnormality may vary depending upon the actual environment and adversary. As a result, it is impractical to model all cases a priori and use supervised methods to classify. Also, an autonomous vehicle provides various data types like images and other analog or digital sensor data, all of which can be useful in anomaly detection if leveraged fruitfully. To that effect, in this paper, a heterogeneous system is proposed which estimates the degree of abnormality of an unmanned surveillance drone, analyzing real-time images and IMU (Inertial Measurement Unit) sensor data in an unsupervised manner. Here, we have demonstrated a Convolutional Neural Network (CNN) architecture, named AngleNet, to estimate the angle between a normal image and another image under consideration, which provides us with a measure of the device's anomaly. Moreover, the IMU data are used in an autoencoder to predict abnormality. Finally, the results from these two algorithms are ensembled to estimate the final degree of abnormality. The proposed method performs satisfactorily on the IEEE SP Cup-2020 dataset with an accuracy of 97.3%. Additionally, we have also tested this approach on an in-house dataset to validate its robustness.
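The abstract describes combining an image-based angle estimate (AngleNet) with an IMU autoencoder's reconstruction error into one abnormality score. A minimal sketch of how such an ensemble score might be formed is shown below; the function names, the normalization, and the equal weighting are illustrative assumptions, since the abstract does not specify how the two scores are combined.

```python
import numpy as np

def imu_anomaly_score(imu_window, reconstruct, threshold=1.0):
    """Reconstruction-error score for a window of IMU samples.
    `reconstruct` stands in for a trained autoencoder's forward pass
    (hypothetical interface, not from the paper)."""
    err = np.mean((imu_window - reconstruct(imu_window)) ** 2)
    return min(err / threshold, 1.0)  # clip to [0, 1]

def angle_anomaly_score(angle_deg, max_tilt_deg=90.0):
    """Map an estimated deviation angle (e.g. from AngleNet) to [0, 1]."""
    return min(abs(angle_deg) / max_tilt_deg, 1.0)

def ensemble_score(image_score, imu_score, w_image=0.5):
    """Weighted combination of the two pipelines' scores."""
    return w_image * image_score + (1 - w_image) * imu_score
```

A perfectly reconstructed window yields an IMU score of 0, so the final score is driven entirely by the image pipeline in that case.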
Citations
Posted Content
TL;DR: In this article, the authors survey anomaly detection methods for intelligent video surveillance systems, present a systematic categorization of these methodologies for ease of understanding, and discuss the challenges and opportunities of anomaly detection at the edge.
Abstract: The current concept of Smart Cities influences urban planners and researchers to provide modern, secured and sustainable infrastructure and give a decent quality of life to residents. To fulfill this need, video surveillance cameras have been deployed to enhance the safety and well-being of citizens. Despite technical developments in modern science, abnormal event detection in surveillance video systems is challenging and requires exhaustive human effort. In this paper, we survey various methodologies developed to detect anomalies in intelligent video surveillance. Firstly, we revisit the surveys on anomaly detection in the last decade. We then present a systematic categorization of methodologies developed for ease of understanding. Considering that the notion of anomaly depends on context, we identify different objects-of-interest and publicly available datasets in anomaly detection. Since anomaly detection is considered a time-critical application of computer vision, our emphasis is on anomaly detection using edge devices and approaches explicitly designed for them. Further, we discuss the challenges and opportunities involved in anomaly detection at the edge.

17 citations

Proceedings ArticleDOI
28 Nov 2020
TL;DR: In this article, the basic Kretschmann configuration and a narrow groove grating were optimized to detect two different types of lipids, phospholipid and egg yolk, used as the analyte (sensing layer), with two different proteins, tryptophan and bovine serum albumin (BSA), used as the ligand (binding site).
Abstract: Surface Plasmon Resonance (SPR) is an important bio-sensing technique for real-time label-free detection. However, it is pivotal to optimize various parameters of the sensor configuration for efficient and highly sensitive sensing. To that effect, we focus on optimizing two different SPR structures - the basic Kretschmann configuration and narrow groove grating. Our analysis aims to detect two different types of lipids known as phospholipid and egg yolk, which are used as analyte (sensing layer), and two different types of proteins, namely tryptophan and bovine serum albumin (BSA), are used as ligand (binding site). For both configurations, we investigate all possible lipid-protein combinations to understand the effect of various parameters on sensitivity, minimum reflectivity and full width half maximum (FWHM). Lipids are the structural building blocks of cell membranes, and mutation of these layers by viruses and bacteria is one of the prime reasons for many diseases in our body. Hence, improving the performance of an SPR sensor to detect very small changes in lipid holds immense significance. We use the finite-difference time-domain (FDTD) technique to perform quantitative analysis to get an optimized structure. We find that sensitivity increases when lipid concentration is increased and it is the highest (21.95°/RIU) for the phospholipid and tryptophan combination when the metal and lipid layer thicknesses are 45 nm and 30 nm respectively. However, metal layer thickness does not cause any significant variation in sensitivity, but as it increases to 50 nm, minimum reflectivity and full width half maximum (FWHM) decrease to their lowest. In the case of the narrow groove grating structure, a broad range of wavelengths can generate SPR and the sensitivity is highest (900 nm/RIU) for a configuration of 10 nm groove width and 70 nm groove height at a resonance wavelength of 1411 nm.

4 citations

Posted Content
TL;DR: This work uses the finite-difference time-domain (FDTD) technique to perform quantitative analysis and finds that sensitivity increases with lipid concentration and is highest for the phospholipid and tryptophan combination when the metal and lipid layer thicknesses are 45 nm and 30 nm respectively.
Abstract: Surface Plasmon Resonance (SPR) is an important bio-sensing technique for real-time label-free detection. However, it is pivotal to optimize various parameters of the sensor configuration for efficient and highly sensitive sensing. To that effect, we focus on optimizing two different SPR structures -- the basic Kretschmann configuration and narrow groove grating. Our analysis aims to detect two different types of lipids known as phospholipid and egg yolk, which are used as analyte (sensing layer), and two different types of proteins, namely tryptophan and bovine serum albumin (BSA), are used as ligand (binding site). For both configurations, we investigate all possible lipid-protein combinations to understand the effect of various parameters on sensitivity, minimum reflectivity and full width half maximum (FWHM). Lipids are the structural building blocks of cell membranes, and mutation of these layers by viruses and bacteria is one of the prime reasons for many diseases in our body. Hence, improving the performance of an SPR sensor to detect very small changes in lipid holds immense significance. We use the finite-difference time-domain (FDTD) technique to perform quantitative analysis to get an optimized structure. We find that sensitivity increases when lipid concentration is increased and it is the highest (21.95°/RIU) for the phospholipid and tryptophan combination when the metal and lipid layer thicknesses are 45 nm and 30 nm respectively. However, metal layer thickness does not cause any significant variation in sensitivity, but as it increases to 50 nm, minimum reflectivity and full width half maximum (FWHM) decrease to their lowest. In the case of the narrow groove grating structure, a broad range of wavelengths can generate SPR and the sensitivity is highest (900 nm/RIU) for a configuration of 10 nm groove width and 70 nm groove height at a resonance wavelength of 1411 nm.
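The reported sensitivities (°/RIU for the Kretschmann configuration, nm/RIU for the grating) correspond to the standard angular and spectral interrogation conventions. As a reminder of those definitions (the usual convention, not necessarily the exact formulation used in the paper), with $n_s$ the refractive index of the sensing layer:

```latex
S_\theta = \frac{\Delta\theta_{\mathrm{res}}}{\Delta n_s}
\quad \left[{}^{\circ}/\mathrm{RIU}\right],
\qquad
S_\lambda = \frac{\Delta\lambda_{\mathrm{res}}}{\Delta n_s}
\quad \left[\mathrm{nm}/\mathrm{RIU}\right]
```

Here $\Delta\theta_{\mathrm{res}}$ and $\Delta\lambda_{\mathrm{res}}$ are the shifts in resonance angle and resonance wavelength per unit change in refractive index (RIU).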

4 citations

Posted Content
TL;DR: An unsupervised ensemble anomaly detection system detects device anomalies of an unmanned drone by synergistically analyzing multimodal data such as images and IMU sensor data; adversarial attacks are applied to test the robustness of the proposed approach and its integrated defense mechanism.
Abstract: Autonomous aerial surveillance using drone feed is an interesting and challenging research domain. To ensure safety from intruders and potential objects posing threats to the zone being protected, it is crucial to be able to distinguish between normal and abnormal states in real-time. Additionally, we also need to consider any device malfunction. However, the inherent uncertainty embedded within the type and level of abnormality makes supervised techniques less suitable, since the adversary may present a unique anomaly for intrusion. As a result, an unsupervised method for anomaly detection is preferable, taking the unpredictable nature of attacks into account. Again in our case, the autonomous drone provides heterogeneous data streams consisting of images and other analog or digital sensor data, all of which can play a role in anomaly detection if they are ensembled synergistically. To that end, an ensemble detection mechanism is proposed here which estimates the degree of abnormality by analyzing the real-time image and IMU (Inertial Measurement Unit) sensor data in an unsupervised manner. First, we have implemented a Convolutional Neural Network (CNN) regression block, named AngleNet, to estimate the angle between a reference image and the current test image, which provides us with a measure of the anomaly of the device. Moreover, the IMU data are used in autoencoders to predict abnormality. Finally, the results from these two pipelines are ensembled to estimate the final degree of abnormality. Furthermore, we have applied adversarial attacks to test the robustness and security of the proposed approach and its integrated defense mechanism. The proposed method performs satisfactorily on the IEEE SP Cup-2020 dataset with an accuracy of 97.8%. Additionally, we have also tested this approach on an in-house dataset to validate its robustness.
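This abstract mentions testing robustness with an adversarial attack but does not name the attack used. The Fast Gradient Sign Method (FGSM) is a common choice for such evaluations; the sketch below illustrates one FGSM step on a toy linear model with an analytic input gradient. All names and values here are illustrative, not taken from the paper.

```python
import numpy as np

def fgsm_perturb(x, grad_x, eps=0.05):
    """One Fast Gradient Sign Method step: nudge every input
    dimension in the direction that increases the loss."""
    return x + eps * np.sign(grad_x)

# Toy illustration: linear model with squared loss L = (w.x - y)^2,
# whose gradient w.r.t. the input is analytic: dL/dx = 2 (w.x - y) w.
w = np.array([1.0, -2.0])   # fixed "model" weights (illustrative)
x = np.array([0.5, 0.5])    # clean input
y = 0.0                     # target
grad = 2.0 * (w @ x - y) * w
x_adv = fgsm_perturb(x, grad, eps=0.1)
```

In a deep-learning setting the gradient would instead come from backpropagation through the trained network; the perturbation rule itself is unchanged.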

3 citations


Cites methods from "Unsupervised Abnormality Detection ..."

  • ...Clustering of IMU with image classification [30] 97....


Proceedings ArticleDOI
16 Nov 2020
TL;DR: In this article, a heterogeneous system is presented that estimates the degree of anomaly of an unmanned surveillance drone by inspecting IMU (Inertial Measurement Unit) sensor data and real-time images in an unsupervised approach.
Abstract: Due to the rise of autonomous vehicles like drones and cars, anomaly detection for better and more robust surveillance has become prominent for real-time recognition of normal and abnormal states. But the whole system fails if the unmanned device is unable to detect its own anomalies in real-time. Considering this scenario, we can make use of various data from autonomous vehicles, like images, video streams, and other digital or analog sensor data, to detect device anomalies. In this paper, we have demonstrated a heterogeneous system that estimates the degree of anomaly of an unmanned surveillance drone by inspecting IMU (Inertial Measurement Unit) sensor data and real-time images in an unsupervised approach. We've used AngleNet for detecting images taken in an abnormal state. On top of that, an autoencoder fed by the IMU data has been ensembled with AngleNet for evaluating the final degree of anomaly. The proposed method, developed for the IEEE SP Cup 2020, achieved 97.3 percent accuracy on the provided dataset. Besides, this approach has been evaluated on an in-house setup to substantiate its robustness.
References
Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations

Posted Content
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.
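The mean average precision (mAP) figures above follow the PASCAL VOC evaluation protocol, in which a detection counts as correct when its intersection-over-union (IoU) with a ground-truth box reaches 0.5. A minimal IoU helper for axis-aligned boxes (a standard computation, not code from the paper) might look like:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

Two identical boxes give an IoU of 1.0, and disjoint boxes give 0.0; the 0.5 threshold sits between partial and near-complete overlap.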

13,081 citations


"Unsupervised Abnormality Detection ..." refers background in this paper

  • ...Recent applications of related CNN architecture including semantic segmentation [42,45,46,47] depth prediction [40], KeyPoint prediction [46], edge detection [44] and determining optical flow in supervised manner [41]....


Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, the authors propose and compare two architectures, a generic one and one including a layer that correlates feature vectors at different image locations, and show that networks trained on unrealistic synthetic data (the Flying Chairs dataset) still generalize very well to existing datasets such as Sintel and KITTI.
Abstract: Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks CNNs succeeded at. In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.

3,833 citations

Proceedings Article
08 Dec 2014
TL;DR: In this article, two deep network stacks are employed: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally, achieving state-of-the-art results on both NYU Depth and KITTI.
Abstract: Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the overall scale. In this paper, we present a new method that addresses this task by employing two deep network stacks: one that makes a coarse global prediction based on the entire image, and another that refines this prediction locally. We also apply a scale-invariant error to help measure depth relations rather than scale. By leveraging the raw datasets as large sources of training data, our method achieves state-of-the-art results on both NYU Depth and KITTI, and matches detailed depth boundaries without the need for superpixelation.
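The scale-invariant error mentioned in this abstract compares depths in log space with the mean log-ratio subtracted, so a global rescaling of the prediction does not change the error. As commonly written for the formulation of Eigen et al. (with $\lambda = 1$ giving full scale invariance; their training loss uses $\lambda = 0.5$):

```latex
D(y, y^*) = \frac{1}{n}\sum_{i=1}^{n} d_i^2
            \;-\; \frac{\lambda}{n^2}\Bigl(\sum_{i=1}^{n} d_i\Bigr)^2,
\qquad d_i = \log y_i - \log y_i^*
```

Here $y_i$ is the predicted depth at pixel $i$ and $y_i^*$ the ground truth; the second term removes the contribution of a uniform log-scale offset.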

2,994 citations

Journal ArticleDOI
TL;DR: A method is proposed that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors encoding regions of multiple sizes centered on each pixel; it alleviates the need for engineered features and produces a powerful representation that captures texture, shape, and contextual information.
Abstract: Scene labeling consists of labeling each pixel in an image with the category of the object it belongs to. We propose a method that uses a multiscale convolutional network trained from raw pixels to extract dense feature vectors that encode regions of multiple sizes centered on each pixel. The method alleviates the need for engineered features, and produces a powerful representation that captures texture, shape, and contextual information. We report results using multiple postprocessing methods to produce the final labeling. Among those, we propose a technique to automatically retrieve, from a pool of segmentation components, an optimal set of components that best explain the scene; these components are arbitrary, for example, they can be taken from a segmentation tree or from any family of oversegmentations. The system yields record accuracies on the SIFT Flow dataset (33 classes) and the Barcelona dataset (170 classes) and near-record accuracy on Stanford background dataset (eight classes), while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than a second, including feature extraction.

2,791 citations


"Unsupervised Abnormality Detection ..." refers methods in this paper

  • ...Euclidean distance. Zbontar and LeCun [48] train a Siamese architecture CNN for predicting similarity in image patches. Recent applications of related CNN architecture including semantic segmentation [42,45,46,47] depth prediction [40], KeyPoint prediction [46], edge detection [44] and determining optical flow in supervised manner [41]. We have used a similar idea to estimate angles between two images. In this...
