Author
Sumeet Saurav
Other affiliations: Academy of Scientific and Innovative Research
Bio: Sumeet Saurav is an academic researcher from the Central Electronics Engineering Research Institute. The author has contributed to research in the topics of computer science and deep learning, has an h-index of 5, and has co-authored 32 publications receiving 98 citations. Previous affiliations of Sumeet Saurav include the Academy of Scientific and Innovative Research.
Papers
TL;DR: This paper presents an efficient dual integrated convolutional neural network (DICNN) model for real-time recognition of facial expressions in the wild on an embedded platform. The designed DICNN model is optimized using the TensorRT SDK and deployed on an Nvidia Xavier embedded platform.
Abstract: Automatic recognition of facial expressions in the wild is a challenging problem and has drawn a lot of attention from the computer vision and pattern recognition community. Since their emergence, deep learning techniques have proven their efficacy in facial expression recognition (FER) tasks. However, these techniques are parameter intensive and thus cannot be deployed on resource-constrained embedded platforms for real-world applications. To mitigate these limitations of deep-learning-based FER systems, this paper presents an efficient dual integrated convolutional neural network (DICNN) model for real-time recognition of facial expressions in the wild on an embedded platform. The designed DICNN model, with just 1.08M parameters and a 5.40 MB memory footprint, achieves optimal performance by maintaining a proper balance between recognition accuracy and computational efficiency. We evaluated the DICNN model on four FER benchmark datasets (FER2013, FERPlus, RAF-DB, and CKPlus) using different performance evaluation metrics, namely recognition accuracy, precision, recall, and F1-score. Finally, to provide a portable solution with high-throughput inference, we optimized the designed DICNN model using the TensorRT SDK and deployed it on an Nvidia Xavier embedded platform. Comparative analysis with other state-of-the-art methods revealed the effectiveness of the designed FER system, which achieved competitive accuracy with a multi-fold improvement in execution speed.
31 citations
TL;DR: This paper presents a computationally efficient approach for Yoga pose recognition in complex real-world environments using deep learning, and is among the first studies to utilize the inherent spatial–temporal relationship among Yoga poses for their recognition.
Abstract: Existing techniques for Yoga pose recognition build classifiers based on sophisticated handcrafted features computed from raw inputs captured in a controlled environment. These techniques often fail in complex real-world situations and thus limit the practical applicability of existing Yoga pose recognition systems. This paper presents an alternative, computationally efficient approach for Yoga pose recognition in complex real-world environments using deep learning. To this end, a Yoga pose dataset was created with the participation of 27 individuals (8 males and 19 females), consisting of ten Yoga poses, namely Malasana, Ananda Balasana, Janu Sirsasana, Anjaneyasana, Tadasana, Kumbhakasana, Hasta Uttanasana, Paschimottanasana, Uttanasana, and Dandasana. To capture the videos, we used smartphone cameras with 4K resolution and a 30 fps frame rate. For the recognition of Yoga poses in real time, a three-dimensional convolutional neural network (3D CNN) architecture was designed and implemented. The designed architecture is a modified version of the C3D architecture originally introduced for the recognition of human actions. In the proposed modified C3D architecture, the computationally intensive fully connected layers are pruned, and supplementary layers such as batch normalization and average pooling are introduced for computational efficiency. To the best of our knowledge, this is among the first studies to utilize the inherent spatial–temporal relationship among Yoga poses for their recognition. The designed 3D CNN architecture achieved a test recognition accuracy of 91.15% on the in-house Yoga pose dataset consisting of ten Yoga poses. Furthermore, on the publicly available dataset, the designed architecture achieved a competitive test recognition accuracy of 99.39%, along with a multi-fold improvement in execution speed compared to the existing state-of-the-art technique. To promote further study, we will make the in-house Yoga pose dataset publicly available to the research community.
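The parameter savings from pruning the fully connected layers, as the abstract describes, can be illustrated with a back-of-the-envelope count. The layer sizes below follow the original C3D design (an 8192-dimensional flattened conv output feeding two 4096-unit FC layers); the pruned head with global average pooling and a ten-class output is a sketch under assumed dimensions, not the paper's exact architecture.

```python
# Illustrative comparison: C3D's FC head vs. a pruned head that uses
# global average pooling before a single 10-way classifier.

def fc_params(fan_in, fan_out):
    """Weights plus biases of a dense layer."""
    return fan_in * fan_out + fan_out

# Original C3D head: flattened conv features (512 * 1 * 4 * 4 = 8192)
# feed two 4096-unit FC layers and a 10-class output layer.
flat = 512 * 1 * 4 * 4
original_head = (fc_params(flat, 4096)
                 + fc_params(4096, 4096)
                 + fc_params(4096, 10))

# Pruned head: global average pooling collapses each of the 512 channels
# to one value, followed only by the 10-way classifier.
pruned_head = fc_params(512, 10)

print(original_head, pruned_head)  # roughly 50.4M vs. about 5K parameters
```

Almost all of C3D's head parameters live in the dense layers, which is why pruning them in favor of pooling yields such a large efficiency gain.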
27 citations
TL;DR: In this article, a three-dimensional DenseNet self-attention neural network (DenseAttNet) is proposed to identify and evaluate student participation in modern and traditional educational programs.
Abstract: Today, due to the widespread outbreak of the deadly coronavirus, popularly known as COVID-19, traditional classroom education has shifted to computer-based learning. Students of various cognitive and psychological abilities participate in the learning process. However, most students are hesitant to provide regular and honest feedback on the comprehensiveness of the course, making it difficult for the instructor to ensure that all students are grasping the information at the same rate. Students' understanding of the course and their emotional engagement, as indicated via facial expressions, are intertwined. This paper presents a three-dimensional DenseNet self-attention neural network (DenseAttNet) to identify and evaluate student participation in modern and traditional educational programs. On the Dataset for Affective States in E-Environments (DAiSEE), the proposed DenseAttNet model outperformed all other existing methods, achieving baseline accuracies of 63.59% for engagement classification and 54.27% for boredom classification. Furthermore, DenseAttNet trained on all four labels, namely boredom, engagement, confusion, and frustration, registered accuracies of 81.17%, 94.85%, 90.96%, and 95.85%, respectively. In addition, we performed a regression experiment on DAiSEE and obtained the lowest mean square error (MSE) value of 0.0347. Finally, the proposed approach achieves a competitive MSE of 0.0877 when validated on the Emotion Recognition in the Wild Engagement Prediction (EmotiW-EP) dataset.
21 citations
TL;DR: This paper presents a novel deep integrated CNN model, named EmNet (Emotion Network), which consists of two structurally similar DCNN models and their integrated variant, trained with a joint-optimization technique.
Abstract: In the past decade, facial emotion recognition (FER) research saw tremendous progress, which led to the development of novel convolutional neural network (CNN) architectures for the automatic recognition of facial emotions in static images. Though these networks have achieved good recognition accuracy, they incur high computational costs and memory utilization. These issues restrict their deployment in real-world applications, which demand FER systems that run on resource-constrained embedded devices in real time. Thus, to alleviate these issues and to develop a robust and efficient method for the automatic recognition of facial emotions in the wild with real-time performance, this paper presents a novel deep integrated CNN model, named EmNet (Emotion Network). The EmNet model consists of two structurally similar DCNN models and their integrated variant, trained with a joint-optimization technique. For a given facial image, EmNet gives three predictions, which are fused using two fusion schemes, namely average fusion and weighted maximum fusion, to obtain the final decision. To test the efficiency of the proposed FER pipeline on a resource-constrained embedded platform, we optimized the EmNet model and the face detector using the TensorRT SDK and deployed the complete FER pipeline on the Nvidia Xavier device. Our proposed EmNet model, with 4.80M parameters and a 19.3 MB model size, attains a notable improvement over the current state of the art in terms of accuracy, with a multi-fold improvement in computational efficiency.
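The two fusion schemes named in the abstract can be sketched on the three softmax outputs EmNet produces per face. Average fusion is unambiguous; the weighting below for weighted maximum fusion is an illustrative assumption, not the paper's exact formulation, and the class probabilities are fabricated for the example.

```python
# Sketch of average fusion and (one plausible reading of) weighted
# maximum fusion over three per-model class-probability vectors.

def average_fusion(preds):
    """Element-wise mean of the class-probability vectors."""
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]

def weighted_max_fusion(preds, weights):
    """Scale each model's prediction by its weight, keep per-class maxima."""
    weighted = [[w * x for x in p] for w, p in zip(weights, preds)]
    return [max(col) for col in zip(*weighted)]

# Three hypothetical 3-class predictions (e.g. happy, sad, neutral).
preds = [[0.7, 0.2, 0.1],
         [0.6, 0.3, 0.1],
         [0.5, 0.4, 0.1]]
avg = average_fusion(preds)
wmax = weighted_max_fusion(preds, [0.5, 0.3, 0.2])

print(avg.index(max(avg)))   # final decision: class 0
print(wmax.index(max(wmax)))  # same winner under the assumed weights
```

Either fused vector is then reduced to a single label by taking the arg-max, which is the "final decision" step the abstract mentions.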
18 citations
28 Sep 2015
TL;DR: This paper analyzes the effect of different image scaling algorithms from the literature on the performance of the Viola and Jones face detection framework, and tries to identify the algorithm that yields the best performance.
Abstract: In today's world of automation, real-time face detection with high performance is becoming necessary for a wide range of computer vision and image processing applications. Existing software-based systems for face detection use the state-of-the-art Viola and Jones face detection framework. This detector uses an image scaling approach to detect faces of different dimensions, and thus the performance of the image scaler plays an important role in the accuracy of the detector. A low-quality image scaling algorithm results in a loss of features, which directly affects the performance of the detector. Therefore, in this paper we analyze the effect of different image scaling algorithms from the literature on the performance of the Viola and Jones face detection framework and try to identify the algorithm that yields the best performance. The analyzed algorithms are: Nearest Neighbor, Bilinear, Bicubic, Extended Linear, and Piece-wise Extended Linear. All of these algorithms have been integrated with the Viola and Jones face detection code available in the OpenCV library and tested on well-known databases of frontal faces.
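The simplest of the compared scalers, nearest neighbor, can be sketched in a few lines. In practice OpenCV's `cv2.resize` (with flags such as `INTER_NEAREST` or `INTER_LINEAR`) would be used; this pure-Python toy only illustrates why a crude scaler discards pixel detail that the detector's Haar-like features depend on.

```python
# Minimal nearest-neighbor resizer over a grayscale image represented as
# a list of rows. Each output pixel copies the nearest source pixel, so
# downscaling simply drops rows and columns (and the detail in them).

def nearest_neighbor_resize(img, new_h, new_w):
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]

# A 4x4 test image whose pixel value encodes its (row, col) position.
img = [[r * 10 + c for c in range(4)] for r in range(4)]
small = nearest_neighbor_resize(img, 2, 2)
print(small)  # [[0, 2], [20, 22]] -- rows 1 and 3 are discarded entirely
```

Bilinear and bicubic scalers instead interpolate between neighboring pixels, preserving more of the features the paper finds the detector's accuracy depends on, at a higher computational cost.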
17 citations
Cited by
TL;DR: A Deadline-Aware memory Scheduler for Heterogeneous systems (DASH), which overcomes the problems of prior memory schedulers using three key ideas, with the goal of meeting HWAs' deadlines while providing high CPU performance.
Abstract: Modern SoCs integrate multiple CPU cores and hardware accelerators (HWAs) that share the same main memory system, causing interference among memory requests from different agents. The result of this interference, if it is not controlled well, is missed deadlines for HWAs and low CPU performance. Few previous works have tackled this problem. State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a target frame rate for GPUs by prioritizing the GPU close to the time when it has to complete a frame. We observe two major problems when such an approach is adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because they are prioritized only when close to their deadlines. Second, such an approach does not consider the diverse memory access characteristics of different applications running on CPUs and HWAs, leading to low performance for latency-sensitive CPU applications and deadline misses for some HWAs, including GPUs. In this article, we propose a Deadline-Aware memory Scheduler for Heterogeneous systems (DASH), which overcomes these problems using three key ideas, with the goal of meeting HWAs' deadlines while providing high CPU performance. First, DASH prioritizes an HWA whenever it is not on track to meet its deadline during a deadline period, instead of prioritizing it only when close to a deadline. Second, DASH prioritizes HWAs over memory-intensive CPU applications, based on the observation that memory-intensive applications' performance is not sensitive to memory latency. Third, DASH treats short-deadline HWAs differently, as they are more likely to miss their deadlines, and schedules their requests based on worst-case memory access time estimates. Extensive evaluations across a wide variety of workloads and systems show that DASH achieves significantly better CPU performance than the best previous scheduler while always meeting the deadlines for all HWAs, including GPUs, thereby largely improving frame rates.
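DASH's first idea, prioritizing an accelerator whenever it falls behind schedule rather than only near its deadline, can be sketched as a progress check. The linear pace model and the parameter names below are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch of an "on track" test: an HWA earns elevated memory
# priority whenever its completed work lags the proportional pace needed
# to finish all of its work by the end of the deadline period.

def behind_schedule(work_done, work_total, elapsed, period):
    """True when progress lags the linear pace toward the deadline."""
    expected = work_total * elapsed / period
    return work_done < expected

# Halfway through its period this HWA has finished only 30% of its frame,
# so a DASH-style scheduler would boost its request priority now,
# long before the deadline actually arrives.
print(behind_schedule(work_done=30, work_total=100, elapsed=50, period=100))  # True

# An HWA ahead of pace keeps normal priority, letting latency-sensitive
# CPU applications be served first.
print(behind_schedule(work_done=60, work_total=100, elapsed=50, period=100))  # False
```

The contrast with the frame-rate mechanisms criticized in the abstract is that this check can fire at any point in the period, not just in a window before the deadline.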
117 citations
TL;DR: In this paper, a review of 99 Q1 articles covering explainable artificial intelligence (XAI) techniques is presented, including SHAP, LIME, GradCAM, LRP, Fuzzy classifier, EBM, CBR, and others.
Abstract: Artificial intelligence (AI) has branched out to various applications in healthcare, such as health services management, predictive medicine, clinical decision-making, and patient data and diagnostics. Although AI models have achieved human-like performance, their use is still limited because they are seen as a black box. This lack of trust remains the main reason for their low use in practice, especially in healthcare. Hence, explainable artificial intelligence (XAI) has been introduced as a technique that can provide confidence in a model's prediction by explaining how the prediction is derived, thereby encouraging the use of AI systems in healthcare. The primary goal of this review is to identify areas of healthcare that require more attention from the XAI research community. Multiple journal databases were thoroughly searched following the PRISMA 2020 guidelines. Studies that do not appear in highly credible Q1 journals were excluded. In this review, we surveyed 99 Q1 articles covering the following XAI techniques: SHAP, LIME, GradCAM, LRP, Fuzzy classifier, EBM, CBR, rule-based systems, and others. We discovered that detecting abnormalities in 1D biosignals and identifying key text in clinical notes are areas that require more attention from the XAI research community. We hope this review will encourage the development of a holistic cloud system for a smart city.
80 citations
76 citations
TL;DR: This paper presents a novel methodology based on k-fold cross-validation and the AdaBoost algorithm that improves the performance accuracy of a k-NN classifier-based fall detection system to the extent that it outperforms all similar works in this field.
Abstract: This paper makes four scientific contributions to the field of fall detection in the elderly to contribute to their assisted living in the future of Internet of Things (IoT)-based pervasive living environments, such as smart homes. First, it presents and discusses a comprehensive comparative study, in which 19 different machine learning methods were used to develop fall detection systems, to deduce the optimal machine learning method for the development of such systems. This study was conducted on two different datasets, and the results show that out of all the machine learning methods, the k-NN classifier is best suited for the development of fall detection systems in terms of performance accuracy. Second, it presents a framework that overcomes the limitations of binary classifier-based fall detection systems by being able to detect falls and fall-like motions. Third, to increase the trust in and reliance on fall detection systems, it introduces a novel methodology based on k-fold cross-validation and the AdaBoost algorithm that improves the performance accuracy of the k-NN classifier-based fall detection system to the extent that it outperforms all similar works in this field. This approach achieved performance accuracies of 99.87% and 99.66%, respectively, when evaluated on the two datasets. Finally, the proposed approach is also highly accurate in detecting the activity of standing up from a lying position, which allows it to infer whether a fall was followed by a long lie, a situation that can cause minor to major health-related concerns. The above contributions address multiple research challenges in the field of fall detection, which we identified after conducting a comprehensive review of related works, also presented in this paper.
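The k-NN classifier the study settles on can be sketched in miniature. In practice scikit-learn's `KNeighborsClassifier` (optionally boosted with `AdaBoostClassifier`, as the paper proposes) would be used; the two-dimensional accelerometer-style features and their values below are fabricated purely for illustration.

```python
# Toy k-NN classifier: majority vote among the k training samples
# closest (in squared Euclidean distance) to the query.

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda sample: dist(sample[0], query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Fabricated samples: (peak acceleration, post-event stillness) -> label.
train = [((9.0, 0.1), "walk"), ((9.2, 0.2), "walk"),
         ((25.0, 0.9), "fall"), ((23.5, 0.8), "fall"), ((24.0, 0.7), "fall")]

print(knn_predict(train, (24.5, 0.85)))  # "fall"
print(knn_predict(train, (9.1, 0.15)))   # "walk"
```

The paper's contribution layers k-fold cross-validation and AdaBoost on top of such a base classifier to squeeze out the reported accuracy gains; this sketch covers only the k-NN voting itself.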
59 citations
14 Mar 2017
TL;DR: The proposed CLAHE-based algorithm can effectively suppress noise interference and improve the quality of underwater images, providing more detail enhancement and higher values of colorfulness restoration than other existing image enhancement algorithms.
Abstract: In order to improve contrast and restore color for underwater images captured by camera sensors, without suffering from insufficient detail and color cast, a fusion algorithm for image enhancement in different color spaces based on contrast limited adaptive histogram equalization (CLAHE) is proposed in this article. The original color image is first converted from the RGB color space to two different color spaces: YIQ and HSI. The color space conversion from RGB to YIQ is a linear transformation, while the RGB to HSI conversion is nonlinear. Then, the algorithm separately applies CLAHE in the YIQ and HSI color spaces to obtain two different enhanced images. The luminance component (Y) in the YIQ color space and the intensity component (I) in the HSI color space are enhanced with the CLAHE algorithm. CLAHE has two key parameters, Block Size and Clip Limit, which mainly control the quality of the enhanced image. After that, the YIQ and HSI enhanced images are converted back to RGB. When the red, green, and blue components are not coherent in the YIQ-RGB or HSI-RGB images, the three components are harmonized with the CLAHE algorithm in RGB space. Finally, using a 4-direction Sobel edge detector within a bounded general logarithm ratio operation, a self-adaptive weight-selection nonlinear image enhancement is carried out to fuse the YIQ-RGB and HSI-RGB images into the final fused image. The enhancement fusion algorithm has two key factors, the average of the Sobel edge detector and the fusion coefficient, and these two factors determine the effect of the fusion. A series of evaluation metrics such as mean, contrast, entropy, colorfulness metric (CM), mean square error (MSE), and peak signal-to-noise ratio (PSNR) are used to assess the proposed enhancement algorithm. The experimental results showed that the proposed algorithm provides more detail enhancement and higher values of colorfulness restoration than other existing image enhancement algorithms, and that it can effectively suppress noise interference and improve the quality of underwater images.
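The Clip Limit parameter mentioned above is the step that distinguishes CLAHE from plain histogram equalization, and it can be sketched on a single tile's histogram. The tile handling, interpolation, and color-space fusion from the paper are omitted; the histogram values are fabricated for illustration.

```python
# Sketch of CLAHE's clip-limit step: histogram counts above the limit
# are clipped and the excess is redistributed uniformly across all bins,
# which bounds how much contrast any one intensity range can gain.

def clip_histogram(hist, clip_limit):
    excess = sum(max(0, count - clip_limit) for count in hist)
    clipped = [min(count, clip_limit) for count in hist]
    share = excess // len(hist)  # uniform redistribution (remainder dropped)
    return [count + share for count in clipped]

# One dominant bin (e.g. the murky blue-green cast of an underwater tile).
hist = [50, 3, 2, 1, 0, 0, 0, 0]
print(clip_histogram(hist, 10))  # [15, 8, 7, 6, 5, 5, 5, 5]
```

Equalizing with the clipped histogram amplifies the contrast of the underrepresented intensities without letting the dominant bin over-amplify noise, which is the behavior the abstract credits for the noise suppression.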
42 citations