Author

Manan Sharma

Bio: Manan Sharma is an academic researcher from the Indian Institutes of Information Technology. The author has contributed to research in the topics of Deep Learning and Cluster Analysis, has an h-index of 1, and has co-authored 3 publications receiving 6 citations.

Papers
Book ChapterDOI
01 Jan 2020
TL;DR: To detect violence through surveillance cameras, this architecture uses a pre-trained ResNet-50 model to extract features from video frames and feeds them into a ConvLSTM block; short-term differences of frames add robustness against occlusions and discrepancies.
Abstract: In order to detect violence through surveillance cameras, we present a neural architecture that can sense violence and serve as a measure to prevent chaos. This architecture uses a pre-trained ResNet-50 model to extract features from the video frames and then feeds them into a ConvLSTM block. We use short-term differences of video frames to add robustness against occlusions and discrepancies. Convolutional neural networks give us more concentrated spatio-temporal features in the frames, which suits the sequential nature of videos fed into LSTMs. The model incorporates a pre-trained convolutional neural network connected to a convolutional LSTM layer. It takes raw videos as input, converts them into frames, and outputs a binary violence or non-violence label. We pre-process the video frames using cropping, dark-edge removal, and other data augmentation techniques to strip the data of unnecessary details. To evaluate the performance of the proposed method, three standard public datasets were used, with accuracy as the evaluation metric.
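A rough sketch of the pipeline described above (not the authors' implementation): a frozen ResNet-50 frame encoder feeding a ConvLSTM block with a binary violence head. The clip length, frame size, and layer widths are assumptions, and the short-term frame differencing and pre-processing steps are omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W = 16, 224, 224  # assumed clip length and frame size

# Frozen ResNet-50 backbone applied to every frame via TimeDistributed.
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=(H, W, 3))
backbone.trainable = False

clip = layers.Input(shape=(NUM_FRAMES, H, W, 3))   # raw frames (the paper also feeds
                                                   # short-term frame differences)
feats = layers.TimeDistributed(backbone)(clip)     # (T, 7, 7, 2048) per-frame feature maps
x = layers.ConvLSTM2D(64, kernel_size=3, padding="same")(feats)
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(1, activation="sigmoid")(x)     # violence probability

model = models.Model(clip, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```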

17 citations

Proceedings ArticleDOI
04 Jan 2022
TL;DR: This paper proposes S-CPD, a novel transfer-learning-based CPD algorithm that uses similarities of the output probability distributions to generate a change point index for each sensor reading in the data stream.
Abstract: The task of Activity Recognition (AR) on ubiquitous sensor data is traditionally performed on annotated datasets in which manually identified change points, denoting the start and end of activities, are present. However, the majority of real-world smart home applications generate un-annotated data streams, where such change points are not known in advance. In this paper, we address this problem by proposing a real-time annotation framework for Activity Recognition based on Change Point Detection (CPD). First, we investigate the components of feature extraction, data augmentation, noise handling, and classification to propose an optimal AR framework for the chosen datasets. We then propose S-CPD, a novel transfer-learning-based CPD algorithm which uses similarities of the output probability distributions to generate a change point index (CPI) for each sensor reading in the data stream. Based on the calculated CPI, we segment the data stream, which allows us to perform enhanced annotations. To test the efficiency of the proposed annotation framework, we perform extensive experimentation on 4 real-world smart home datasets. Our proposed solutions outperform the existing state-of-the-art AR and annotation frameworks on these datasets by around 1.6% and 14% respectively, while providing performance comparable with that of the state-of-the-art CPD algorithm. In particular, we achieve an average AR accuracy of 96.64% and an average annotation accuracy of 94.15%, with an average sensor distance error of 1.1 across the 4 datasets.
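The S-CPD algorithm itself is transfer-learning based; as a minimal sketch of just the "similarity of output probability distributions" idea, the snippet below scores each sensor reading by the Jensen-Shannon distance between consecutive class-probability outputs and thresholds the score to segment the stream. The use of JS distance and the threshold value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def change_point_index(probs: np.ndarray) -> np.ndarray:
    """probs: (T, C) class-probability outputs for T sensor readings."""
    cpi = np.zeros(len(probs))
    for t in range(1, len(probs)):
        # 0 = identical distributions, larger = bigger shift between readings
        cpi[t] = jensenshannon(probs[t - 1], probs[t])
    return cpi

def segment(cpi: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Indices treated as change points (segment boundaries)."""
    return np.where(cpi > threshold)[0]
```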

2 citations

Proceedings ArticleDOI
01 Jan 2020
TL;DR: This work proposes a novel clustering technique for BA that finds hidden routines in ubiquitous data, captures the patterns within those routines, and works efficiently on high-dimensional data without performing any computationally expensive reduction operations.
Abstract: Behavioral analysis (BA) on ubiquitous sensor data is the task of finding the latent distribution of features for modeling user-specific characteristics. These characteristics, in turn, can be used for a number of tasks including resource management, power efficiency, and smart home applications. In recent years, the use of topic models for BA has been found to successfully extract the dynamics of the sensed data. Topic modeling is popularly performed on text data to mine inherent topics, and the task of finding the latent topics in textual data is done in an unsupervised manner. In this work we propose a novel clustering technique for BA which can find hidden routines in ubiquitous data and also captures the patterns within the routines. Our approach works efficiently on high-dimensional data for BA without performing any computationally expensive reduction operations. We evaluate three techniques, namely Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), and Probabilistic Latent Semantic Analysis (PLSA), for a comparative study. We analyze the efficiency of the methods using performance indices such as perplexity and silhouette on three real-world ubiquitous sensor datasets, namely the Intel Lab, Kyoto, and MERL datasets. Through rigorous experiments, we achieve silhouette scores of 0.7049 on the Intel Lab dataset, 0.6547 on the Kyoto dataset, and 0.8312 on the MERL dataset for clustering.
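A minimal sketch of the comparative setup, assuming a day-by-sensor-event count matrix and an assumed number of routines: LDA and NMF are fit (PLSA has no scikit-learn implementation and is omitted here), each day is assigned to its dominant topic, and the resulting clustering is scored with the silhouette index.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation, NMF
from sklearn.metrics import silhouette_score

# Toy day x sensor-event count matrix (stand-in for real ubiquitous sensor data).
X = np.random.randint(0, 5, size=(500, 120)).astype(float)
n_routines = 6  # assumed number of hidden routines

for name, model in [("LDA", LatentDirichletAllocation(n_components=n_routines, random_state=0)),
                    ("NMF", NMF(n_components=n_routines, init="nndsvda", random_state=0))]:
    W = model.fit_transform(X)       # day-routine weight matrix
    labels = W.argmax(axis=1)        # hard routine assignment per day
    print(name, "silhouette:", silhouette_score(X, labels))
```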

1 citation

Book ChapterDOI
01 Jan 2021
TL;DR: In this paper, various K-means-based clustering techniques are explored to generate pseudo-labels to facilitate the training of deep networks, and an auto-encoder (AE)-based dimensionality reduction method is employed.
Abstract: Recent developments in deep learning (DL) methodologies have shown promising results on various computer vision tasks, including the classification of hyperspectral data. However, these methodologies are expected to suffer when training data are scarce, owing to their complex network architectures and large numbers of parameters. In this paper, various K-means-based clustering techniques are explored to generate pseudo-labels that facilitate the training of deep networks. To tackle the curse of dimensionality, an auto-encoder (AE)-based dimensionality reduction method is employed. Finally, the classification is done using convolutional long short-term memory cells (ConvLSTM), which outperform the other deep neural networks used. In addition, an analysis of the effect of the proposed dimensionality reduction method on classification accuracy is presented. The efficacy of the proposed approach is demonstrated on two real-world hyperspectral image datasets, namely “University of Pavia” (UP) and “Salinas”.
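A minimal sketch of the pseudo-labelling stage described above, with assumed band, code, and cluster counts: a small auto-encoder compresses the spectra, and K-means on the codes produces pseudo-labels that the ConvLSTM classifier in the paper would then be trained on (the ConvLSTM itself is omitted here).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.cluster import KMeans

n_bands, code_dim, n_clusters = 103, 16, 9                 # assumed Pavia-like settings
pixels = np.random.rand(2000, n_bands).astype("float32")   # toy spectra

# Auto-encoder for dimensionality reduction.
inp = layers.Input(shape=(n_bands,))
code = layers.Dense(code_dim, activation="relu")(inp)      # bottleneck
rec = layers.Dense(n_bands, activation="linear")(code)
ae = models.Model(inp, rec)
ae.compile(optimizer="adam", loss="mse")
ae.fit(pixels, pixels, epochs=5, batch_size=64, verbose=0)

# K-means on the compressed codes yields pseudo-labels for supervised training.
encoder = models.Model(inp, code)
codes = encoder.predict(pixels, verbose=0)
pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(codes)
```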

Cited by
Journal ArticleDOI
TL;DR: A new deep learning architecture is presented that uses an adapted three-dimensional version of DenseNet, a multi-head self-attention layer, and a bidirectional convolutional long short-term memory (LSTM) module to encode relevant spatio-temporal features and determine whether a video is violent or not.
Abstract: Introducing efficient automatic violence detection into video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies, or those in charge of monitoring social network content. In this paper we present a new deep learning architecture, using an adapted version of DenseNet for three dimensions, a multi-head self-attention layer, and a bidirectional convolutional long short-term memory (LSTM) module, that encodes relevant spatio-temporal features to determine whether a video is violent or not. Furthermore, an ablation study of the input frames, comparing dense optical flow with adjacent-frame subtraction and examining the influence of the attention layer, is carried out, showing that the combination of optical flow and the attention mechanism improves results by up to 4.4%. Experiments were conducted using four of the most widely used datasets for this problem, in some cases matching or exceeding state-of-the-art results while reducing the number of network parameters needed (4.5 million) and increasing efficiency in test accuracy (from 95.6% on the most complex dataset to 100% on the simplest one) and inference time (less than 0.3 s for the longest clips). Finally, to check whether the generated model is able to generalize violence, a cross-dataset analysis is performed, which shows the complexity of this approach: using three datasets for training and testing on the remaining one, the accuracy drops to 70.08% in the worst case and 81.51% in the best case, which points to future work oriented towards anomaly detection on new datasets.
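A heavily simplified sketch of the described architecture, not the paper's network: a couple of 3-D convolutions stand in for the adapted DenseNet, multi-head self-attention is applied over per-frame descriptors, and a bidirectional ConvLSTM feeds the binary head. Clip size and layer widths are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W = 16, 112, 112                                   # assumed clip shape
clip = layers.Input(shape=(T, H, W, 3))

# Stand-in for the adapted three-dimensional DenseNet backbone.
x = layers.Conv3D(32, 3, strides=(1, 2, 2), padding="same", activation="relu")(clip)
x = layers.Conv3D(64, 3, strides=(1, 2, 2), padding="same", activation="relu")(x)

# Multi-head self-attention over per-frame descriptors (temporal axis).
frames = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)   # (T, 64)
att = layers.MultiHeadAttention(num_heads=4, key_dim=16)(frames, frames)

# Bidirectional ConvLSTM over the spatial feature maps, then the binary head.
blstm = layers.Bidirectional(layers.ConvLSTM2D(32, 3, padding="same"))(x)
merged = layers.Concatenate()([layers.GlobalAveragePooling2D()(blstm),
                               layers.GlobalAveragePooling1D()(att)])
out = layers.Dense(1, activation="sigmoid")(merged)
model = models.Model(clip, out)
```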

17 citations

Journal ArticleDOI
01 Jan 2022
TL;DR: In this paper, a skeleton-based method is proposed to identify violence and aggressive behavior in video sequences. It consists of two phases: feature extraction from image sequences to assess human posture, followed by activity classification with a neural network to identify whether the frames contain aggressive situations and violence.
Abstract: In this paper, we propose a skeleton-based method to identify violence and aggressive behavior. The approach does not require high-performance processing equipment and can be quickly implemented. It consists of two phases: feature extraction from image sequences to assess human posture, followed by activity classification using a neural network to identify whether the frames include aggressive situations and violence. A video violence dataset was generated, comprising 400 min of a single person's activities and 20 h of video data including physical violence and aggressive acts, with 13 classes for distinguishing aggressor and victim behavior. Finally, the proposed method was trained and tested using the collected dataset. The results indicate that an accuracy of 97% was achieved in identifying aggressive conduct in video sequences. Furthermore, the obtained results show that the proposed method can detect aggressive behavior and violence in a short period of time and is suitable for real-world applications.
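A minimal sketch of the classification phase, assuming skeleton keypoints (e.g., 17 COCO joints) have already been extracted by some pose estimator for a short window of frames; a small dense network then predicts aggressive vs. non-aggressive. The window length and the binary output are simplifying assumptions (the paper distinguishes 13 behavior classes).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

WINDOW, JOINTS = 30, 17                            # assumed window length and joint count
poses = layers.Input(shape=(WINDOW, JOINTS * 2))   # (x, y) per joint, per frame

x = layers.Flatten()(poses)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dense(64, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)     # aggressive / non-aggressive

model = models.Model(poses, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```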

15 citations

Journal ArticleDOI
01 Mar 2022-Sensors
TL;DR: A novel architecture for violence detection from video surveillance cameras is presented: a U-Net-like network with MobileNet V2 as the encoder extracts spatial features, followed by an LSTM for temporal feature extraction and classification.
Abstract: Intelligent video surveillance systems are rapidly being introduced in public places. The adoption of computer vision and machine learning techniques enables various applications for the collected video features; one of the major ones is safety monitoring. The efficacy of a violence detection system is measured by its efficiency and accuracy. In this paper, we present a novel architecture for violence detection from video surveillance cameras. Our proposed model extracts spatial features with a U-Net-like network that uses MobileNet V2 as an encoder, followed by an LSTM for temporal feature extraction and classification. The proposed model is computationally light and still achieves good results: experiments showed an average accuracy of 0.82 ± 2% and an average precision of 0.81 ± 3% on a complex real-world security-camera footage dataset based on RWF-2000.
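A simplified sketch of the described model, with assumed clip length and frame size: a frozen MobileNet V2 encodes each frame (the paper's U-Net-like decoder branch is omitted here) and an LSTM performs the temporal classification.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W = 16, 224, 224
encoder = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                            input_shape=(H, W, 3), pooling="avg")
encoder.trainable = False                          # lightweight frozen encoder

clip = layers.Input(shape=(T, H, W, 3))
feats = layers.TimeDistributed(encoder)(clip)      # (T, 1280) per-frame descriptors
x = layers.LSTM(64)(feats)                         # temporal feature extraction
out = layers.Dense(1, activation="sigmoid")(x)     # violence probability
model = models.Model(clip, out)
```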

11 citations

Proceedings ArticleDOI
28 Apr 2021
TL;DR: In this article, the extracted features from pre-trained models have been pooled and fed into a fully connected network in order to detect whether a violent action has occurred, and the results of both ResNet-50 and VGG16 are explored in the proposed approach.
Abstract: Violence detection aims at recognizing whether a violent action has happened. This field has gained widespread popularity because of the need for practical, automatic violence detection methods that explore visual data received from surveillance cameras installed in different areas. In this paper, we employ pre-trained deep neural networks to present a low-complexity method for violence detection. The features extracted from the pre-trained models are pooled and fed into a fully connected network to detect whether a violent action has occurred. As pre-trained models, both ResNet-50 and VGG16 are explored in the proposed approach. We evaluate the effectiveness of the method on four public datasets. The experimental results demonstrate the efficiency of the proposed low-complexity approach in comparison with approaches that use time-consuming networks such as recurrent ones.
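A minimal sketch of the low-complexity idea, with assumed clip length and frame size: frozen ResNet-50 features are pooled over the frames and fed to a small fully connected head, with no recurrent network involved (VGG16 could be swapped in the same way).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W = 16, 224, 224
backbone = tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=(H, W, 3), pooling="avg")
backbone.trainable = False

clip = layers.Input(shape=(T, H, W, 3))
feats = layers.TimeDistributed(backbone)(clip)     # (T, 2048) per-frame features
pooled = layers.GlobalAveragePooling1D()(feats)    # temporal pooling instead of an RNN
x = layers.Dense(128, activation="relu")(pooled)   # fully connected classifier head
out = layers.Dense(1, activation="sigmoid")(x)
model = models.Model(clip, out)
```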

10 citations

Journal ArticleDOI
TL;DR: This work surveys and analyzes the VD literature in a single platform that highlights the workflow of VD in terms of machine learning strategies, neural network (NN)-based pattern analysis, limitations of existing VD articles, and their source details.
Abstract: Recent advancements in intelligent surveillance systems for video analysis have been a topic of great interest in the research community due to the vast number of applications for monitoring human activities. The growing demand for these systems is moving towards automatic violence detection (VD) systems that enhance and comfort human lives through artificial neural networks (ANN) and machine intelligence. Extremely overcrowded places such as subways, public streets, banks, and industrial sites need such automatic VD systems to ensure safety and security in the smart city. For this purpose, researchers have published extensive VD literature in the form of surveys, proposals, and extensive reviews. Existing VD surveys are limited to a single domain of study, i.e., coverage of VD for non-surveillance or for person-to-person data only. To deeply examine and contribute to the VD arena, we survey and analyze the VD literature in a single platform that highlights the workflow of VD in terms of machine learning strategies, neural network (NN)-based pattern analysis, limitations of existing VD articles, and their source details. Further, we investigate VD in terms of surveillance datasets and VD applications and discuss the challenges faced by researchers using these datasets. We comprehensively discuss the evaluation strategies and metrics for VD methods. Finally, we provide recommendations and future research guidelines for VD with respect to trending research endeavors.

8 citations