SciSpace (formerly Typeset)

Showing papers on "Face detection" published in 2021


Journal Article · DOI
TL;DR: In this article, the authors proposed an approach that uses deep learning, TensorFlow, Keras, and OpenCV to detect face masks, with the Single Shot MultiBox Detector (SSD) as the face detector and the MobileNetV2 architecture as the framework for the classifier.

193 citations


Journal Article · DOI
TL;DR: This work presents the mask-to-face deformable model applied to generate masked face images; the model also permits the generation of further masked face images, notably with specific masks and their combinations, for general masked face detection (MaskedFace-Net).

156 citations


Journal Article · DOI
TL;DR: A comprehensive review of recently developed deep learning methods for small object detection can be found in this article, where the authors summarize challenges and solutions of small-object detection, and present major deep learning techniques, including fusing feature maps, adding context information, balancing foreground-background examples, and creating sufficient positive examples.
Abstract: In computer vision, significant advances have been made on object detection with the rapid development of deep convolutional neural networks (CNN). This paper provides a comprehensive review of recently developed deep learning methods for small object detection. We summarize challenges and solutions of small object detection, and present major deep learning techniques, including fusing feature maps, adding context information, balancing foreground-background examples, and creating sufficient positive examples. We discuss related techniques developed in four research areas, including generic object detection, face detection, object detection in aerial imagery, and segmentation. In addition, this paper compares the performances of several leading deep learning methods for small object detection, including YOLOv3, Faster R-CNN, and SSD, based on three large benchmark datasets of small objects. Our experimental results show that although the detection accuracy of these deep learning methods on small objects was low (less than 0.4), Faster R-CNN performed best, with YOLOv3 a close second.
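Of the techniques listed, fusing feature maps is the easiest to illustrate compactly. The following is a minimal NumPy sketch of generic FPN-style top-down fusion. It assumes both maps already share a channel count and differ by exactly 2x in resolution. It is not code from any of the reviewed detectors.

```python
import numpy as np

def upsample_nearest_2x(fmap):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

def fuse_top_down(fine, coarse):
    """Add an upsampled coarse (deep) map to a fine (shallow) map.

    Assumes equal channel counts and exactly a 2x resolution gap. Real FPNs
    first project the fine map with a 1x1 lateral convolution; omitted here.
    """
    up = upsample_nearest_2x(coarse)
    if up.shape != fine.shape:
        raise ValueError(f"shape mismatch: {up.shape} vs {fine.shape}")
    return fine + up

# A 2x2 deep map is spread onto a 4x4 shallow map, so each deep value
# reaches a 2x2 block of high-resolution positions (useful for small objects).
fine = np.zeros((1, 4, 4))
coarse = np.arange(4, dtype=float).reshape(1, 2, 2)
fused = fuse_top_down(fine, coarse)
```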

142 citations


Journal Article · DOI
TL;DR: This work proposes a face detector named YOLO-face, based on YOLOv3, to improve face detection performance; it uses anchor boxes better suited to face detection and a more precise regression loss function.
Abstract: Face detection is one of the important tasks of object detection. Typically, detection is the first stage of pattern recognition and identity authentication. In recent years, deep learning-based algorithms in object detection have grown rapidly. These algorithms can generally be divided into two categories: two-stage detectors such as Faster R-CNN and one-stage detectors such as YOLO. Although YOLO and its variants are less accurate than two-stage detectors, they outperform them by a large margin in speed. YOLO performs well on normal-size objects but is poor at detecting small objects, and its accuracy decreases notably for objects whose scale varies widely, such as faces. To address the detection problem of varying face scales, we propose a face detector named YOLO-face, based on YOLOv3, to improve face detection performance. The approach uses anchor boxes better suited to face detection and a more precise regression loss function. The improved detector significantly increases accuracy while maintaining fast detection speed. Experiments on the WIDER FACE and FDDB datasets show that our improved algorithm outperforms YOLO and its variants.
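The abstract does not say how YOLO-face derives its face-specific anchors. A common approach in the YOLO family, introduced with YOLOv2, is k-means clustering of ground-truth box widths and heights with 1 - IoU as the distance. The sketch below assumes that approach. It is illustrative only and is not the authors' implementation.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """k-means on box sizes with distance 1 - IoU; returns anchors sorted by area."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # nearest centroid = highest IoU, i.e. smallest 1 - IoU
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

For example, calling `kmeans_anchors(wh, k=9)` on the (width, height) pairs of a training set's face boxes would return nine anchors sorted from smallest to largest. YOLOv3 distributes nine anchors across its three detection scales, three per scale.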

128 citations


Journal Article · DOI
TL;DR: In this paper, the authors proposed a method for detecting face swapping and other identity manipulations in single images. It involves two networks: (i) a face identification network that considers the face region bounded by a tight semantic segmentation, and (ii) a context recognition network that considers the face context (e.g., hair, ears, neck).
Abstract: We propose a method for detecting face swapping and other identity manipulations in single images. Face swapping methods, such as DeepFake, manipulate the face region, aiming to adjust the face to the appearance of its context, while leaving the context unchanged. We show that this modus operandi produces discrepancies between the two regions (e.g., Fig. 1). These discrepancies offer exploitable telltale signs of manipulation. Our approach involves two networks: (i) a face identification network that considers the face region bounded by a tight semantic segmentation, and (ii) a context recognition network that considers the face context (e.g., hair, ears, neck). We describe a method which uses the recognition signals from our two networks to detect such discrepancies, providing a complementary detection signal that improves conventional real vs. fake classifiers commonly used for detecting fake images. Our method achieves state-of-the-art results on the FaceForensics++, Celeb-DF-v2, and DFDC benchmarks for face manipulation detection, and even generalizes to detect fakes produced by unseen methods.

80 citations


Proceedings Article · DOI
20 Jun 2021
TL;DR: Wang et al. proposed an attention-based data augmentation framework that guides the detector to refine and enlarge its attention, mining deeper into previously ignored regions to find more representative forgery traces.
Abstract: Although vanilla Convolutional Neural Network (CNN) based detectors can achieve satisfactory performance on fake face detection, we observe that they tend to seek forgeries in a limited region of the face, which reveals that the detectors lack an understanding of forgery. Therefore, we propose an attention-based data augmentation framework to guide the detector to refine and enlarge its attention. Specifically, our method tracks and occludes the Top-N sensitive facial regions, encouraging the detector to mine deeper into previously ignored regions for more representative forgery. Moreover, our method is simple to use and can be easily integrated with various CNN models. Extensive experiments show that a detector trained with our method can separately point out the representative forgery of fake faces generated by different manipulation techniques, and that our method enables a vanilla CNN-based detector to achieve state-of-the-art performance without structural modification. Our code is available at https://github.com/crywang/RFM.
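The occlusion step described above can be sketched as follows. The sketch assumes a precomputed (H, W) sensitivity map, for instance the gradient magnitude of the detector's output with respect to the input, and uses square occlusion blocks. The map, the block size, and the function name are illustrative assumptions, not details of the released RFM code.

```python
import numpy as np

def occlude_top_n(image, saliency, n=2, block=3):
    """Zero a block x block patch around each of the n most sensitive pixels.

    image: (H, W) or (H, W, C) array; saliency: (H, W) sensitivity map.
    Returns a copy, so the original training image is left untouched.
    """
    out = image.copy()
    h, w = saliency.shape
    half = block // 2
    # indices (in the flattened map) of the n largest sensitivity values
    flat_idx = np.argsort(saliency, axis=None)[::-1][:n]
    for y, x in zip(*np.unravel_index(flat_idx, saliency.shape)):
        y0, y1 = max(0, y - half), min(h, y + half + 1)  # clip to image bounds
        x0, x1 = max(0, x - half), min(w, x + half + 1)
        out[y0:y1, x0:x1] = 0
    return out
```

Because the occluded regions are the ones the detector currently relies on most, training on such images pushes it toward evidence elsewhere in the face, which is the intuition the abstract describes.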

79 citations


Journal Article · DOI
TL;DR: The Properly Wearing Masked Face Detection Dataset (PWMFD), which includes 9205 images of mask-wearing samples in three categories, is proposed, together with Squeeze and Excitation (SE)-YOLOv3, a mask detector with relatively balanced effectiveness and efficiency.
Abstract: The rapid outbreak of COVID-19 has caused serious harm and infected tens of millions of people worldwide. Since there is no specific treatment, wearing masks has become an effective method to prevent the transmission of COVID-19 and is required in most public areas, which has also led to a growing demand for automatic real-time mask detection services to replace manual reminders. However, few studies on face mask detection are being conducted, and there is an urgent need to improve the performance of mask detectors. In this paper, we proposed the Properly Wearing Masked Face Detection Dataset (PWMFD), which includes 9205 images of mask-wearing samples in three categories. Moreover, we proposed Squeeze and Excitation (SE)-YOLOv3, a mask detector with relatively balanced effectiveness and efficiency. We integrated the attention mechanism by introducing the SE block into Darknet53 to capture the relationships among channels, so that the network can focus more on important features. We adopted the GIoU loss, which better describes the spatial difference between predicted and ground-truth boxes, to improve the stability of bounding box regression. Focal loss was used to address the extreme foreground-background class imbalance. In addition, we applied task-specific image augmentation techniques to further improve the robustness of the model. Experimental results showed that SE-YOLOv3 outperformed YOLOv3 and other state-of-the-art detectors on PWMFD, achieving 8.6% higher mAP than YOLOv3 with comparable detection speed.
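The three components named above have standard forms, sketched below in NumPy for a single feature map, a single box pair, and a single prediction. The following are assumptions and not SE-YOLOv3's reported settings: the weight shapes, the reduction ratio they imply, and the defaults alpha = 0.25 and gamma = 2 (the values suggested in the original focal loss paper).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) map; w1 is (C//r, C), w2 is (C, C//r)."""
    z = x.mean(axis=(1, 2))                  # squeeze: global average pooling -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation: FC -> ReLU -> FC -> sigmoid
    return x * s[:, None, None]              # rescale each channel by its weight

def _area(box):
    return (box[2] - box[0]) * (box[3] - box[1])

def giou_loss(a, b):
    """1 - GIoU for two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = _area(a) + _area(b) - inter
    # area of the smallest axis-aligned box enclosing both a and b
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss; p is the predicted probability of the positive class, y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

Unlike plain IoU, GIoU still yields a useful loss for non-overlapping boxes. For example, two disjoint unit boxes one unit apart give a loss of 4/3 rather than a flat 1. The (1 - p_t)^gamma factor shrinks the loss of easy, well-classified examples, which are mostly background.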

74 citations


Journal Article · DOI
TL;DR: An edge computing-based mask identification framework (ECMask) is proposed to support public health precautions; it ensures real-time performance on the low-power camera devices of buses and has valuable applications in COVID-19 prevention.
Abstract: The outbreak of coronavirus disease 2019 (COVID-19), while posing various serious threats to the world, reminds us that we need to take precautions to control the transmission of the virus. The rise of the Internet of Medical Things (IoMT) has made related data collection and processing, including healthcare monitoring systems, more convenient; on the other hand, the requirements of public health prevention are also changing and becoming more challenging. One of the most effective nonpharmaceutical medical intervention measures is mask wearing. Therefore, there is an urgent need for an automatic real-time mask detection method to help prevent epidemic spread in public. In this article, we put forward an edge computing-based mask (ECMask) identification framework to support public health precautions, which can ensure real-time performance on the low-power camera devices of buses. ECMask consists of three main stages: 1) video restoration; 2) face detection; and 3) mask identification. The related models are trained and evaluated on our bus drive monitoring data set and a public data set. We conduct extensive experiments on real video data to validate performance in terms of detection accuracy and the execution-time efficiency of the whole video analysis, and the results indicate valuable applications in COVID-19 prevention.

68 citations


Proceedings Article · DOI
02 Apr 2021
TL;DR: In this paper, a two-stage CNN architecture is used to detect both masked and unmasked faces; it is compatible with CCTV cameras and will aid in tracking safety violations, promoting face mask use, and creating a safe working environment.
Abstract: Coronavirus Disease 2019 (COVID-19) broke out at the end of 2019, and it was still wreaking havoc on millions of people's lives and businesses in 2020. As the world recovers from the pandemic and plans to return to normality, there is growing unease among people who plan to resume their daily activities in person. According to several studies, wearing a face mask significantly reduces the risk of viral transmission and provides a sense of protection. However, manually tracking compliance with this policy is not feasible; technology is the key. We present a Convolutional Neural Network (CNN) based architecture for detecting instances of improper use of face masks. Our system uses a two-stage CNN architecture that can detect both masked and unmasked faces and is compatible with CCTV cameras. This will aid in tracking safety violations, promoting face mask use, and creating a safe working environment.
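The abstract gives no architectural details, so the following is only a generic sketch of a detect-then-classify pipeline. `detect_faces` and `classify_crop` are hypothetical placeholders for the two CNN stages. The 0.5 decision threshold is an assumption.

```python
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def detect_then_classify(frame: np.ndarray,
                         detect_faces: Callable[[np.ndarray], List[Box]],
                         classify_crop: Callable[[np.ndarray], float],
                         threshold: float = 0.5) -> List[Tuple[Box, bool]]:
    """Stage 1 finds faces; stage 2 labels each face crop as masked or not.

    detect_faces and classify_crop are hypothetical stand-ins for the two
    CNNs; classify_crop is assumed to return P(mask) for a face crop.
    """
    results = []
    h, w = frame.shape[:2]
    for x1, y1, x2, y2 in detect_faces(frame):
        # clip the box to the frame before cropping
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(w, x2), min(h, y2)
        if x2 <= x1 or y2 <= y1:
            continue  # box lies outside the frame
        crop = frame[y1:y2, x1:x2]
        results.append(((x1, y1, x2, y2), classify_crop(crop) >= threshold))
    return results
```

Splitting the task this way lets each stage be swapped independently, for example a faster face detector for CCTV streams, without retraining the mask classifier.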

64 citations


Journal Article · DOI
TL;DR: A two-stage approach for detecting mask wearing with hybrid machine learning techniques is proposed: the first stage is based on a transfer model of Faster_RCNN with an InceptionV2 structure, and the second stage verifies real facial masks using a broad learning system.
Abstract: In the era of Corona Virus Disease 2019 (COVID-19), wearing a mask can effectively protect people from infection risk and largely decrease the spread in public places, such as hospitals and airports. This creates a demand for monitoring instruments that can detect people who are wearing masks, which is not the objective of existing face detection algorithms. In this article, we propose a two-stage approach to detect mask wearing using hybrid machine learning techniques. The first stage, based on a transfer model of Faster_RCNN with an InceptionV2 structure, is designed to detect as many candidate mask-wearing regions as possible, while the second stage is designed to verify real facial masks using a broad learning system, implemented by training a two-class model. Moreover, this article proposes a data set for wearing mask detection (WMD) that includes 7804 realistic images. The data set contains 26403 mask-wearing instances, covers multiple scenes, and is available at https://github.com/BingshuCV/WMD. Experiments conducted on the data set demonstrate that the proposed approach achieves an overall accuracy of 97.32% on simple scenes and 91.13% on complex scenes, outperforming the compared methods.

59 citations


DOI
07 Oct 2021
TL;DR: In this article, a fast corner detection algorithm is used to detect landmarks on the face. The method assumes that a frontal image of the face is available, and skin areas are first detected using a color-based learning algorithm and six sigma techniques in the RGB, HSV, and NTSC color spaces.
Abstract: This research study applies facial recognition techniques using a corner detection algorithm. A fast corner detection algorithm is used, modified by applying a shielding technique to cope with heavy noise. This article describes twelve facial feature points, including the corner of the left eye, the corner of the right eye, the left eyebrow, the right eyebrow, the corner of the lip, and the nostril. The method consists of two parts. First, a filtering technique is applied to remove noise from the image. The proposed method assumes that a frontal image of the face is available. Skin areas are first detected using a color-based learning algorithm and six sigma techniques in the RGB, HSV, and NTSC color spaces. Further analysis involves morphological processing, boundary detection, and detection of the light-source reflection in the eye, commonly referred to as the eye point. In the second step, the fast corner detection algorithm is used to detect landmarks on the face. The fast corner detector works on the Corner Response Function (CRF), which is calculated as the minimum change in intensity over all possible directions. Finally, the proposed technique is compared with other filtering techniques. Experiments were performed on the IRIS Face Database, BioID, and the Cohn-Kanade database, and the proposed method obtains an appreciable recognition rate.
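The CRF described above resembles the fast corner detector of Trajkovic and Hedley. In that detector, the response at a pixel is the minimum, over the directions through it, of the squared intensity changes to the two opposite neighbours. Below is a NumPy sketch on a 3x3 neighbourhood, assuming a grayscale image. The authors' shielding modification is not reproduced.

```python
import numpy as np

# One direction per pair of opposite neighbours in a 3x3 window.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def corner_response(img):
    """CRF = min over directions d of (I(p+d) - I(p))^2 + (I(p-d) - I(p))^2."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")  # replicate borders

    def shifted(dy, dx):
        # value at (y + dy, x + dx) for every pixel (y, x)
        return padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    responses = [(shifted(dy, dx) - img) ** 2 + (shifted(-dy, -dx) - img) ** 2
                 for dy, dx in DIRECTIONS]
    return np.min(responses, axis=0)
```

Flat regions give zero change in every direction. Straight edges give zero change along the edge direction. Only corners keep a nonzero minimum, so local maxima of the CRF are candidate landmarks such as eye and lip corners.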

Journal Article · DOI
TL;DR: This article proposes an efficient technique for face detection in still images under occlusion and non-uniform illumination, using a combination of the YCbCr, HSV, and L*a*b* color models, which can be useful in surveillance and security applications.
Abstract: Face detection is an important part of a face recognition system, yet it is not taken very seriously: it is often taken for granted, and the primary focus is on face recognition. Moreover, the many challenges associated with face detection increase the value of TN (true negatives). A lot of work has been done in the field of face recognition, but not much has been done in face detection, especially regarding face occlusion and non-uniform illumination. This directly affects the efficiency of applications linked with face detection, such as face recognition and surveillance. These reasons motivated us to research face detection, especially under face occlusion and non-uniform illumination. The main objective of this article is to detect faces in still images. Experiments were conducted on images affected by face occlusion and non-uniform illumination, taken from the public AR face dataset and the Color FERET dataset. A manual dataset of images taken from the internet was also created for the experiments. The task involves making the machine intelligent enough to acquire human perception and knowledge, so that it can detect, localize, and recognize a face in an arbitrary image as easily as humans do. This article proposes an efficient technique for face detection in still images under occlusion and non-uniform illumination, using a combination of the YCbCr, HSV, and L*a*b* color models. The proposed technique improves results in terms of accuracy, detection rate, false detection rate, and precision, and can be useful in surveillance and security applications.
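The abstract does not give the color-model rules. As an illustration only, the sketch below applies a widely cited YCbCr skin range from Chai and Ngan (Cb in [77, 127], Cr in [133, 173]) after the full-range BT.601 (JPEG) conversion. These thresholds are an assumption. The HSV and L*a*b* parts of the authors' combination are not reproduced.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 (JPEG) conversion; rgb is (..., 3) with values in [0, 255]."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean skin mask from chroma thresholds only (luma Y is ignored)."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

Ignoring luma is the usual reason for working in YCbCr. It makes the rule somewhat less sensitive to non-uniform illumination than thresholds applied directly to RGB.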

Journal Article · DOI
TL;DR: A deep learning-based convolutional neural network architecture is proposed to learn features for classifying the types of facial expressions, and comparison with competing methods shows the superiority of the proposed system.
Abstract: A novel facial expression recognition system is proposed in this paper. Its objective is to recognize the types of expressions in the human face region. The implementation of the proposed system is divided into four components. In the first component, face detection is performed on the captured input image to obtain the region of interest. In the second component, to extract more distinctive and discriminant features, a deep learning-based convolutional neural network architecture is proposed to perform feature learning for classifying the types of expressions. In the third component, to enhance performance, novel data augmentation techniques are applied to the facial images to enrich the learning parameters of the proposed CNN model. In the fourth component, a trade-off between data augmentation and deep learning features is made to fine-tune the trained CNN model. Extensive experimental results are demonstrated on three benchmark databases: KDEF (seven expression classes), GENKI-4k (two expression classes), and CK+ (seven expression classes). The performance of the proposed system on each database is presented and described, and finally compared with existing state-of-the-art methods. The comparison with competing methods shows the superiority of the proposed system.

Journal Article · DOI
Jiaying Liu, Dejia Xu, Wenhan Yang, Minhao Fan, Haofeng Huang
TL;DR: In this paper, a large-scale low-light image dataset is proposed for evaluating low-light enhancement from both low-level vision and machine vision perspectives, the latter via the face detection task in low-light conditions.
Abstract: In this paper, we present a systematic review and evaluation of existing single-image low-light enhancement algorithms. Besides the commonly used low-level vision oriented evaluations, we additionally measure machine vision performance in the low-light condition via the face detection task, to explore the potential of jointly optimizing high-level and low-level vision enhancement. To this end, we first propose a large-scale low-light image dataset, called Vision Enhancement in the LOw-Light condition (VE-LOL), which serves both low- and high-level vision with diversified scenes and contents as well as complex degradation from real scenarios. Beyond paired low/normal-light images without annotations, we additionally include human-related analysis resources, i.e., face images in the low-light condition with annotated face bounding boxes. Then, we benchmark from the perspective of both human and machine vision. A rich variety of criteria is used for the low-level vision evaluation, including full-reference, no-reference, and semantic similarity metrics. We also measure the effects of low-light enhancement on face detection in the low-light condition, using state-of-the-art face detection methods. Furthermore, with the rich material of VE-LOL, we explore the novel problem of joint low-light enhancement and face detection. We develop an enhanced face detector that applies low-light enhancement and face detection jointly. The features extracted by the enhancement module are fed to the successive layer of the detection module with the same resolution, so that the two sets of features are intertwined to jointly learn useful information across the two phases, i.e., enhancement and detection. Experiments on VE-LOL provide a comparison of state-of-the-art low-light enhancement algorithms, point out their limitations, and suggest promising future directions.
Our dataset has supported the Track “Face Detection in Low Light Conditions” of CVPR UG2+ Challenge (2019–2020) ( http://cvpr2020.ug2challenge.org/ ).

Journal Article · DOI
TL;DR: In this article, the authors proposed an algorithm that combines oriented FAST and rotated BRIEF (ORB) features with Local Binary Patterns (LBP) features extracted from facial expression images.
Abstract: Emotion plays an important role in communication, and facial expression recognition has become an indispensable part of human-computer interaction. Recently, deep neural networks (DNNs) have been widely used in this field and overcome the limitations of conventional approaches. However, the application of DNNs is very limited by their demanding hardware requirements. Considering the low hardware specifications available in real-life conditions, and to achieve better results without DNNs, in this paper we propose an algorithm that combines oriented FAST and rotated BRIEF (ORB) features and Local Binary Patterns (LBP) features extracted from facial expressions. First, every image is passed through a face detection algorithm to extract more effective features. Second, to increase computational speed, the ORB and LBP features are extracted from the face region; specifically, region division is innovatively employed in the traditional ORB to avoid concentration of the features. The features are invariant to scale, grayscale, and rotation changes. Finally, the combined features are classified by a Support Vector Machine (SVM). The proposed method is evaluated on several challenging databases, namely the Cohn-Kanade database (CK+), the Japanese Female Facial Expressions database (JAFFE), and the MMI database; experimental results on seven emotional states (neutral, joy, sadness, surprise, anger, fear, and disgust) show that the proposed framework is effective and accurate.
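Only the LBP half of the ORB + LBP feature is sketched here. It is a NumPy illustration of the basic 8-neighbour operator with per-cell histograms. The 2x2 grid, the neighbour ordering, and the function names are assumptions. The ORB region division and the SVM stage are omitted.

```python
import numpy as np

# Neighbour offsets (dy, dx), clockwise from the top-left; offset i sets bit i.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img):
    """Basic 8-neighbour LBP code for every interior pixel of a grayscale image."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(OFFSETS):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # set this bit wherever the neighbour is at least as bright as the centre
        codes |= ((neighbour >= center) * (1 << bit)).astype(np.uint8)
    return codes

def lbp_histogram(img, grid=(2, 2)):
    """Concatenate normalised 256-bin LBP histograms over a grid of cells."""
    codes = lbp_codes(img)
    feats = []
    for band in np.array_split(codes, grid[0], axis=0):
        for cell in np.array_split(band, grid[1], axis=1):
            hist = np.bincount(cell.ravel(), minlength=256).astype(float)
            feats.append(hist / max(cell.size, 1))
    return np.concatenate(feats)
```

LBP codes depend only on the ordering of intensities, so they are unchanged by monotonic grayscale shifts. Per-cell histograms keep coarse spatial layout, and such a vector could then be concatenated with ORB descriptors before classification.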

Journal Article · DOI
TL;DR: RefineFace is a single-shot refinement face detector consisting of five modules: selective two-step regression, selective two-step classification, scale-aware margin loss, a feature supervision module, and receptive field enhancement.
Abstract: Face detection has achieved significant progress in recent years. However, high-performance face detection remains a very challenging problem, especially when there are many tiny faces. In this paper, we present a single-shot refinement face detector named RefineFace to achieve high performance. Specifically, it consists of five modules: selective two-step regression (STR), selective two-step classification (STC), scale-aware margin loss (SML), feature supervision module (FSM), and receptive field enhancement (RFE). To enhance the regression ability for high location accuracy, STR coarsely adjusts the locations and sizes of anchors from high-level detection layers to provide better initialization for the subsequent regressor. To improve the classification ability for high recall efficiency, STC first filters out most simple negatives from low-level detection layers to reduce the search space for the subsequent classifier; then SML is applied to better distinguish faces from background at various scales, and FSM is introduced to let the backbone learn more discriminative features for classification. In addition, RFE provides more diverse receptive fields to better capture faces in some extreme poses. Extensive experiments conducted on WIDER FACE, AFW, PASCAL Face, FDDB, and MAFA demonstrate that our method achieves state-of-the-art results and runs at 37.3 FPS with ResNet-18 on VGA-resolution images.
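STR's "coarse adjustment, then regression" can be illustrated by applying the usual (dx, dy, dw, dh) box parameterisation twice. STC's first step can be illustrated as a threshold on background probability. This is a simplified NumPy sketch under those assumptions. The 0.99 threshold is illustrative and not taken from the paper.

```python
import numpy as np

def decode(anchors, deltas):
    """Apply (dx, dy, dw, dh) offsets to (cx, cy, w, h) boxes, SSD/Faster R-CNN style."""
    cx = anchors[:, 0] + deltas[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + deltas[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(deltas[:, 2])
    h = anchors[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

def two_step_regression(anchors, deltas_step1, deltas_step2):
    refined = decode(anchors, deltas_step1)  # step 1: coarse anchor adjustment
    return decode(refined, deltas_step2)     # step 2: regress from refined anchors

def filter_easy_negatives(bg_prob_step1, theta=0.99):
    """Keep anchors whose first-step background probability is below theta."""
    return bg_prob_step1 < theta
```

The second regressor starts from boxes that are already close to the face, so it only has to learn small corrections. Dropping confidently-background anchors before the second classifier means it sees a less imbalanced set of examples.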

Journal Article · DOI
TL;DR: In this article, the authors implemented a Face Mask and Social Distancing Detection model as an embedded vision system and evaluated the system performance in terms of precision, recall, F1-score, support, sensitivity, specificity, and accuracy, which demonstrate its practical applicability.
Abstract: Since the infectious coronavirus disease (COVID-19) was first reported in Wuhan, it has become a public health problem in China and around the world. This pandemic is having devastating effects on societies and economies worldwide. The increase in the number of COVID-19 tests gives more information about the epidemic spread, which may make it possible to contain it and prevent further infections. However, wearing a face mask that prevents the transmission of droplets in the air, maintaining an appropriate physical distance between people, and reducing close contact with each other can still be beneficial in combating this pandemic. Therefore, this research paper focuses on implementing a Face Mask and Social Distancing Detection model as an embedded vision system. Pretrained models such as MobileNet, a ResNet classifier, and VGG are used. People violating social distancing or not wearing masks were detected. After implementing and deploying the models, the selected one achieved a confidence score of 100%. This paper also provides a comparative study of different face detection and face mask classification models. The system performance is evaluated in terms of precision, recall, F1-score, support, sensitivity, specificity, and accuracy, which demonstrate its practical applicability. The system performs with an F1-score of 99%, sensitivity of 99%, specificity of 99%, and accuracy of 100%. Hence, this solution tracks people with or without masks in real time and ensures social distancing by generating an alarm if there is a violation in the scene or in public places. It can be used with existing embedded camera infrastructure to enable these analytics, which can be applied to various verticals, as well as in office buildings or at airport terminals/gates.
[Abstract from author. Published in Scientific Programming (Hindawi Limited). The abstract may be abridged; refer to the original published version for the full abstract.]
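Most of the metrics reported above follow from a binary confusion matrix. The sketch below takes the four counts as input and assumes a single positive class, for example "no mask". In classification reports such as scikit-learn's, "support" is the number of true samples of a class.

```python
def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def classification_metrics(tp, fp, fn, tn):
    """Binary metrics from confusion-matrix counts for one assumed positive class."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # also called sensitivity
    specificity = _ratio(tn, tn + fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(tp + tn, tp + fp + fn + tn)
    support = tp + fn                      # number of true positive-class samples
    return {"precision": precision, "recall": recall, "sensitivity": recall,
            "specificity": specificity, "f1": f1, "accuracy": accuracy,
            "support": support}
```

Accuracy is a weighted combination of sensitivity and specificity, weighted by the class proportions. Reporting all of them therefore matters when mask and no-mask samples are imbalanced.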

Journal Article · DOI
01 Nov 2021
TL;DR: In this article, two face recognition techniques, Haar Cascade and Local Binary Pattern (LBP), were compared for classification. Haar Cascade was more accurate than LBP, but its execution time was longer.
Abstract: Facial recognition is a biometric technique used in face detection. Facial recognition techniques are used to validate or recognize a face in multimedia photographs. As society has advanced, face identification has become increasingly important, and the detection and identification of faces has grown worldwide. This is owing to the demand for security, such as authorization, national safety, and other vital circumstances. There are a number of algorithms for facial detection. This paper presents a comparison of two face recognition techniques used for classification: Haar Cascade and Local Binary Pattern. The results show that Haar Cascade is more accurate than Local Binary Pattern, but its execution time is longer.
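Haar cascades, as in the Viola-Jones detector, evaluate their features quickly by using an integral image (summed-area table), which turns any rectangle sum into four lookups. The NumPy sketch below shows that mechanism for one two-rectangle feature. It is illustrative and not the implementation compared in the paper.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.asarray(img, dtype=float).cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] using four lookups in the integral image."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_two_rect_horizontal(ii, y, x, h, w):
    """Left half minus right half of an h x w window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)
```

Each feature costs a constant number of lookups regardless of window size. A cascade evaluates many such features per window, so its running time depends on how many features and stages it checks. That is one plausible reason a Haar cascade can be slower than an LBP-based one.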

Proceedings Article · DOI
Vitor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner
01 Jun 2021
TL;DR: In this paper, a real-time method for six degrees of freedom (6DoF) 3D face pose estimation, based on Faster R-CNN, is proposed that requires neither face detection nor landmark localization.
Abstract: We propose real-time, six degrees of freedom (6DoF), 3D face pose estimation without face detection or landmark localization. We observe that estimating the 6DoF rigid transformation of a face is a simpler problem than facial landmark detection, which is often used for 3D face alignment. In addition, 6DoF offers more information than face bounding box labels. We leverage these observations to make multiple contributions: (a) We describe an easily trained, efficient, Faster R-CNN–based model which regresses 6DoF pose for all faces in the photo, without preliminary face detection. (b) We explain how pose is converted and kept consistent between the input photo and arbitrary crops created while training and evaluating our model. (c) Finally, we show how face poses can replace detection bounding box training labels. Tests on AFLW2000-3D and BIWI show that our method runs in real time and outperforms state-of-the-art (SotA) face pose estimators. Remarkably, our method also surpasses SotA models of comparable complexity on the WIDER FACE detection benchmark, despite not being optimized on bounding box labels.

Journal Article · DOI
TL;DR: This paper introduces face detection under occlusion, a preliminary step in face recognition, presents how existing face recognition methods cope with the occlusion problem, and classifies them into three categories.
Abstract: The limited capacity to recognize faces under occlusions is a long-standing problem that presents a unique challenge for face recognition systems and even for humans. The problem regarding occlusion is less covered by research when compared to other challenges such as pose variation, different expressions, etc. Nevertheless, occluded face recognition is imperative to exploit the full potential of face recognition for real-world applications. In this paper, we restrict the scope to occluded face recognition. First, we explore what the occlusion problem is and what inherent difficulties can arise. As a part of this review, we introduce face detection under occlusion, a preliminary step in face recognition. Second, we present how existing face recognition methods cope with the occlusion problem and classify them into three categories, which are 1) occlusion robust feature extraction approaches, 2) occlusion aware face recognition approaches, and 3) occlusion recovery based face recognition approaches. Furthermore, we analyze the motivations, innovations, pros and cons, and the performance of representative approaches for comparison. Finally, future challenges and method trends of occluded face recognition are thoroughly discussed.

Journal Article · DOI
TL;DR: Zhang et al. proposed an adaptive manipulation traces extraction network (AMTEN), which serves as a pre-processing module to suppress image content and highlight manipulation traces, and achieves an average accuracy of up to 98.52%.

Journal Article · DOI
TL;DR: Modifications are introduced to improve the efficiency of the improved K-nearest neighbor (MK-NN) algorithm for electronic medical care, and a comparative analysis is performed to determine the best fuse target detection algorithm based on robustness, accuracy, and computational time.
Abstract: Object detection plays a vital role in the fields of computer vision, machine learning, and artificial intelligence applications (such as FUSE-AI (E-healthcare MRI scan), face detection, people counting, and vehicle detection) to identify good and defective food products. In the field of artificial intelligence, target detection has been at its peak, but when it comes to detecting multiple targets in a single image or video file, there are indeed challenges. This article focuses on the improved K-nearest neighbor (MK-NN) algorithm for electronic medical care to realize intelligent medical services and applications. We introduced modifications to improve the efficiency of MK-NN, and a comparative analysis was performed to determine the best fuse target detection algorithm based on robustness, accuracy, and computational time. The comparative analysis is performed using four algorithms, namely, MK-NN, traditional K-NN, convolutional neural network, and backpropagation. Experimental results show that the improved K-NN algorithm is the best model in terms of robustness, accuracy, and computational time.
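The abstract does not describe the MK-NN modifications. The sketch below therefore shows only the traditional k-NN baseline it was compared against: Euclidean distance and a majority vote, in NumPy. The function name and the default k = 3 are assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Plain k-NN: Euclidean distance and majority vote (not the MK-NN variant)."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    d = np.linalg.norm(train_x - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]               # indices of the k closest samples
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]          # ties go to the smallest label
```

Every prediction scans the whole training set. That linear cost per query is the computational-time concern that modified k-NN variants typically try to reduce.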

Proceedings Article
05 Mar 2021
TL;DR: In this paper, the authors proposed a comprehensive and effective solution to perform person detection, social distancing violation detection, face detection and face mask classification using object detection, clustering and a Convolutional Neural Network (CNN) based binary classifier.
Abstract: At present, the fear and danger of the COVID-19 virus still loom large. Manual monitoring of social distancing norms is impractical with a large population moving about and with an insufficient workforce and resources to administer them. There is a need for a lightweight, robust, 24x7 video-monitoring system that automates this process. This paper proposes a comprehensive and effective solution to perform person detection, social distancing violation detection, face detection and face mask classification using object detection, clustering and a Convolutional Neural Network (CNN) based binary classifier. For this, YOLOv3, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Dual Shot Face Detector (DSFD) and a MobileNetV2-based binary classifier have been employed on surveillance video datasets. This paper also provides a comparative study of different face detection and face mask classification models. Finally, a video dataset labelling method is proposed, along with a labelled video dataset that compensates for the lack of datasets in the community and is used to evaluate the system. System performance is evaluated in terms of accuracy and F1 score, as well as prediction time, which has to be low for practical applicability. The system achieves an accuracy of 91.2% and an F1 score of 90.79% on the labelled video dataset and has an average prediction time of 7.12 seconds for 78 frames of a video.
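To make the clustering step concrete, here is a minimal pure-Python DBSCAN sketch that flags social-distancing violations. It assumes person positions (for example, bounding-box foot points from YOLOv3) have already been mapped to ground-plane coordinates in metres. With `min_pts=2` and `eps` set to the minimum safe distance, anyone within `eps` of another person ends up in a cluster, and noise points are compliant. The paper's actual `eps`, `min_pts`, and coordinate mapping are not given in the abstract; these values are assumptions.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        # Includes i itself, as in the standard DBSCAN definition.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


def social_distancing_violations(positions, min_distance):
    """Indices of people standing closer than min_distance to someone else."""
    labels = dbscan(positions, eps=min_distance, min_pts=2)
    return [i for i, lab in enumerate(labels) if lab != -1]
```

For example, with a 2 m threshold, two people 1 m apart are flagged while an isolated person is not.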

Journal Article
TL;DR: Experimental results show that the proposed edge-computing-based system for recognizing emotion from facial images is energy efficient, has fewer learnable parameters, and achieves good recognition accuracy.
Abstract: The growing use of the Internet of Things (IoT) has increased the volume of data to be processed manyfold. Edge computing can lessen the load of transmitting a massive volume of data to the cloud. It can also provide reduced latency and a real-time experience to users. This article proposes an emotion recognition system from facial images based on edge computing. A convolutional neural network (CNN) model is proposed to recognize emotion. The model is trained in the cloud during off time and downloaded to an edge server. During testing, an end device such as a smartphone captures a face image and performs some preprocessing, which includes face detection, face cropping, contrast enhancement, and image resizing. The preprocessed image is then sent to the edge server. The edge server runs the CNN model and infers the emotion. The decision is then transmitted back to the smartphone. Two data sets, JAFFE and extended Cohn–Kanade (CK+), are used for the evaluation. Experimental results show that the proposed system is energy efficient, has fewer learnable parameters, and achieves good recognition accuracy. The accuracies on the JAFFE and CK+ data sets are 93.5% and 96.6%, respectively.
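The on-device preprocessing chain (crop, contrast enhancement, resize) can be sketched in pure Python as follows. The sketch assumes a face box `(x, y, w, h)` has already been produced by some face detector, uses histogram equalization as one possible contrast-enhancement step, and resizes with nearest-neighbour sampling to a hypothetical 48x48 input. The paper's exact operations and input size are not stated in the abstract.

```python
def crop(img, box):
    """Crop a grayscale image (list of rows) to box = (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in img[y:y + h]]


def equalize_hist(img, levels=256):
    """Histogram equalization for integer intensities in [0, levels - 1]."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def preprocess_face(frame, face_box, size=(48, 48)):
    """Crop -> contrast enhancement -> resize, before sending to the edge server."""
    face = crop(frame, face_box)
    face = equalize_hist(face)
    return resize_nearest(face, *size)
```

Equalization spreads a low-contrast face over the full intensity range, which is why it is a common choice before a CNN; a real deployment would likely use an optimized library routine instead of Python loops.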

Proceedings Article
01 Jun 2021
TL;DR: Li et al. proposed a joint High-Low Adaptation (HLA) framework that reduces the burden of building new datasets for low-light conditions by making full use of existing normal-light data to adapt face detectors from normal light to low light.
Abstract: Face detection in low-light scenarios is challenging but vital to many practical applications, e.g., surveillance video and autonomous driving at night. Most existing face detectors rely heavily on extensive annotations, while collecting data is time-consuming and laborious. To reduce the burden of building new datasets for low-light conditions, we make full use of existing normal-light data and explore how to adapt face detectors from normal light to low light. The challenge of this task is that the gap between normal and low light is large and complex at both the pixel level and the object level. Therefore, most existing low-light enhancement and adaptation methods do not achieve desirable performance. To address the issue, we propose a joint High-Low Adaptation (HLA) framework. Through a bidirectional low-level adaptation and multi-task high-level adaptation scheme, our HLA-Face outperforms state-of-the-art methods even without using dark face labels for training. Our project is publicly available at: https://daooshee.github.io/HLA-Face-Website/.

Journal Article
01 Feb 2021
TL;DR: This paper describes how to design and develop a face recognition system through deep learning using OpenCV in Python, and provides experimental results that demonstrate the accuracy of the proposed face recognition system.
Abstract: The human face is a significant characteristic for identifying a person. Everyone has a unique face, even twins. Thus, face recognition and identification are required to distinguish individuals. A face recognition system is a verification system that establishes a person's identity through a biometric method. Face recognition has become popular in many applications, such as phone unlocking, criminal identification, and even home security systems. Such a system is more secure because it does not depend on tokens such as keys or cards; only a facial image is needed. Generally, a human recognition system involves two phases: face detection and face identification. This paper describes how to design and develop a face recognition system through deep learning using OpenCV in Python. Deep learning is an approach to face recognition and appears to be an adequate method for it due to its high accuracy. Experimental results are provided to demonstrate the accuracy of the proposed face recognition system.
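The second phase (identification) is often implemented by comparing a face embedding against enrolled ones. The sketch below assumes embeddings have already been produced by some deep model from the detected face crops and uses cosine similarity with a hypothetical acceptance threshold of 0.6; neither the embedding model nor the threshold comes from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(query_embedding, gallery, threshold=0.6):
    """Return (name, score) for the best gallery match, or (None, score) if too dissimilar.

    gallery: dict mapping an enrolled person's name to their reference embedding.
    threshold: hypothetical value; in practice it is tuned on validation data.
    """
    best_name, best_score = None, -1.0
    for name, emb in gallery.items():
        score = cosine_similarity(query_embedding, emb)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score  # unknown person
    return best_name, best_score
```

Rejecting matches below a threshold is what lets such a system report "unknown" instead of always returning the nearest enrolled identity.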

Journal Article
TL;DR: A simple yet efficient pure convolutional neural network face detection method, named dual-branch center face detector (DBCFace for short), which solves face detection via a dual-branch fully convolutional framework without extra anchor design or NMS.
Abstract: In modern deep learning methods, face detection generally requires prior boxes and an extra non-maximum suppression (NMS) post-processing step. However, anchor design and the anchor matching strategy significantly affect the performance of face detectors, so a lot of time must be spent designing anchors for different business scenarios. The other issue is that NMS cannot be easily parallelized and may become a bottleneck for detection speed. In this paper, we propose a simple yet efficient pure convolutional neural network face detection method, named dual-branch center face detector (DBCFace for short), which solves face detection via a dual-branch fully convolutional framework without extra anchor design or NMS. Extensive experiments on four popular face detection benchmarks, AFW, PASCAL face, FDDB, and WIDER FACE, demonstrate that our method is comparable with state-of-the-art methods while being faster.
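One common way anchor-free detectors avoid NMS (popularized by CenterNet) is to keep only heatmap cells that are local maxima within a 3x3 neighbourhood and to read box sizes from a regression map at those cells. The sketch below shows that decoding step in pure Python. DBCFace's exact branches and decoding are not described in the abstract, so the `stride`, threshold, map layout, and the omission of sub-cell offset regression are all assumptions.

```python
def decode_centers(heatmap, sizes, stride=4, score_thresh=0.5):
    """Turn a face-centre heatmap into boxes without NMS.

    heatmap[y][x]: centre score at output cell (x, y).
    sizes[y][x]:   predicted (w, h) in input-image pixels at that cell.
    A cell is kept if its score >= score_thresh and no 3x3 neighbour scores higher.
    Returns (x1, y1, x2, y2, score) tuples; sub-cell offset regression is omitted.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for y in range(rows):
        for x in range(cols):
            s = heatmap[y][x]
            if s < score_thresh:
                continue
            window = [heatmap[j][i]
                      for j in range(max(0, y - 1), min(rows, y + 2))
                      for i in range(max(0, x - 1), min(cols, x + 2))]
            if s < max(window):
                continue  # a stronger neighbouring peak already represents this face
            w, h = sizes[y][x]
            cx, cy = x * stride, y * stride  # map the cell back to image coordinates
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes
```

Because each cell's decision depends only on a fixed local window, this step parallelizes trivially (on a GPU it is a single max-pooling comparison), unlike sequential greedy NMS.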

Proceedings Article
27 Jan 2021
TL;DR: In this article, an approach similar to Eigenface is used for extracting facial features through facial vectors and the datasets are trained using Support Vector Machine (SVM) algorithm to perform face classification and detection.
Abstract: The current pandemic has transformed the way students are educated. Education is undertaken remotely through online platforms. Besides changing online course content and teaching, the pandemic has also changed the way assessments are conducted. In online education, monitoring student attendance is very important, as student presence is part of a good assessment of teaching and learning. Educational institutions have adopted online examination portals for student assessment. These portals use face recognition techniques to monitor students' activities and identify malpractice, by capturing the students' activities through a web camera and analyzing their gestures and postures. Image processing algorithms are widely used in the literature to perform face recognition. Despite the progress made in improving the performance of face detection systems, issues such as variations in human facial appearance (varying lighting conditions, noise in face images, scale, pose, etc.) still block progress toward human-level accuracy. The aim of this study is to increase the accuracy of existing face recognition systems by using the SVM and Eigenface algorithms. In this project, an approach similar to Eigenface is used to extract facial features through facial vectors, and the datasets are trained using the Support Vector Machine (SVM) algorithm to perform face classification and detection. This ensures that face recognition can be faster and can be used for online exam monitoring.
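A minimal NumPy sketch of an Eigenface + SVM pipeline follows: the principal axes of mean-centred, flattened face images are computed with SVD, faces are projected onto the top components, and a linear SVM is trained on the projections. Because the paper's SVM solver, kernel, and number of components are not given, this uses a Pegasos-style sub-gradient solver for a binary linear SVM (with the bias regularized for simplicity) as a stand-in; multi-class identification would need an extension such as one-vs-rest.

```python
import numpy as np


def fit_eigenfaces(X, n_components):
    """X: (n_samples, n_pixels) array of flattened faces. Returns (mean, eigenfaces)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the centred data, i.e. the "eigenfaces".
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, eigenfaces):
    """Coordinates of each face in the eigenface basis."""
    return (X - mean) @ eigenfaces.T


def _with_bias(Z):
    # Append a constant 1 so the bias is learned as an ordinary weight.
    return np.hstack([Z, np.ones((len(Z), 1))])


def train_linear_svm(Z, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style sub-gradient descent on the regularized hinge loss; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Zb = _with_bias(Z)
    w = np.zeros(Zb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Zb)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Zb[i] @ w) < 1  # margin check uses the pre-update weights
            w *= 1.0 - eta * lam  # shrinkage from the L2 regularizer
            if violates:
                w += eta * y[i] * Zb[i]
    return w


def predict(Z, w):
    """Class labels in {-1, +1} from the sign of the decision function."""
    return np.where(_with_bias(Z) @ w >= 0, 1, -1)
```

Projecting onto a handful of eigenfaces shrinks each image from thousands of pixels to a few coefficients, which is what keeps SVM training and per-frame classification cheap.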

Journal Article
TL;DR: The proposed method, named REGDet, is the first ‘detection-with-enhancement’ framework for low-light face detection and not only encourages rich interaction and feature fusion across different illumination levels, but also enables effective end-to-end learning of the REG component to be better tailored for face detection.
Abstract: Face detection from low-light images is challenging due to limited photons and inevitable noise, which, to make the task even harder, are often spatially unevenly distributed. A natural solution is to borrow the idea of multi-exposure, which captures multiple shots to obtain well-exposed images under challenging conditions. High-quality implementation or approximation of multi-exposure from a single image is, however, nontrivial. Fortunately, as shown in this paper, such high quality is not necessary, since our task is face detection rather than image enhancement. Specifically, we propose a novel Recurrent Exposure Generation (REG) module and couple it seamlessly with a Multi-Exposure Detection (MED) module, significantly improving face detection performance by effectively suppressing non-uniform illumination and noise. REG progressively and efficiently produces intermediate images corresponding to various exposure settings, and these pseudo-exposures are then fused by MED to detect faces across different lighting conditions. The proposed method, named REGDet, is the first ‘detection-with-enhancement’ framework for low-light face detection. It not only encourages rich interaction and feature fusion across different illumination levels, but also enables effective end-to-end learning of the REG component so that it is better tailored for face detection. Moreover, as our experiments clearly show, REG can be flexibly coupled with different face detectors without extra low/normal-light image pairs for training. We tested REGDet on the DARK FACE low-light face benchmark with a thorough ablation study, where REGDet outperforms previous state-of-the-art methods by a significant margin, with only negligible extra parameters.
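REG is a learned recurrent module, but the idea of progressively brighter pseudo-exposures can be illustrated with a fixed curve. The sketch below repeatedly applies the quadratic light-enhancement curve used in Zero-DCE, x + alpha * x * (1 - x), which keeps intensities in [0, 1] for 0 <= alpha <= 1. The curve, `alpha`, and the step count are stand-in assumptions, and the MED fusion and detection stage is not shown.

```python
def pseudo_exposures(img, steps=4, alpha=0.8):
    """Return [img, e1, ..., e_steps]: progressively brighter versions of a [0, 1] image.

    Each step applies the curve x + alpha * x * (1 - x), which is monotone and
    maps [0, 1] into [0, 1] for 0 <= alpha <= 1 (a fixed stand-in for a learned module).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for this sketch")
    exposures = [img]
    current = img
    for _ in range(steps):
        # Dark pixels gain the most relative brightness; 0 and 1 are fixed points.
        current = [[p + alpha * p * (1.0 - p) for p in row] for row in current]
        exposures.append(current)
    return exposures
```

A detector could then be run on (or fed features from) every element of the returned list, which is the role the abstract assigns to MED.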

Journal Article
TL;DR: Zhang et al. implemented task-driven semantic coding through reinforcement learning (RL)-based semantic bit allocation for video/image classification, detection, and segmentation.
Abstract: Task-driven semantic video/image coding, which focuses on preserving the semantic information of videos/images, has drawn considerable attention with the development of intelligent media applications such as license plate detection, face detection, and medical diagnosis. Deep neural network (DNN)-based codecs have been studied for this purpose due to their inherent end-to-end optimization mechanism. However, the traditional hybrid coding framework cannot be optimized end to end, so a task-driven semantic fidelity metric cannot be automatically integrated into its rate-distortion optimization process. It therefore remains attractive but challenging to implement task-driven semantic coding within the traditional hybrid coding framework, which is likely to remain widely used in industry for a long time. To address this challenge, we design semantic maps for different tasks to extract the pixelwise semantic fidelity of videos/images. Instead of directly integrating the semantic fidelity metric into the traditional hybrid coding framework, we realize task-driven semantic coding through semantic bit allocation based on reinforcement learning (RL). We formulate the semantic bit allocation problem as a Markov decision process (MDP) and use an RL agent to automatically determine the quantization parameters (QPs) for different coding units (CUs) according to the task-driven semantic fidelity metric. Extensive experiments on different tasks, such as classification, detection and segmentation, demonstrate the superior performance of our approach, with an average bitrate saving of 34.39% to 52.62% over the High Efficiency Video Coding (H.265/HEVC) anchor under equivalent task-related semantic fidelity.
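To show what formulating QP selection as an MDP can look like, the toy below runs tabular Q-learning over a fixed sequence of CUs (state = CU index, action = QP), with a reward that trades a rate proxy against semantically weighted distortion. The rate and distortion models, the importance weights, the QP set, and all hyperparameters are hypothetical; the paper's state, reward, and agent are not specified in the abstract. In this toy the action does not affect the next state, so the problem is close to a contextual bandit.

```python
import random

QPS = [22, 27, 32, 37]  # hypothetical candidate QPs


def rate(qp):
    # Toy proxy: bits roughly halve for every +6 QP (hypothetical model).
    return 2 ** ((40 - qp) / 6)


def distortion(qp):
    # Toy proxy: squared error roughly quadruples for every +6 QP (hypothetical).
    return 2 ** ((qp - 20) / 3)


def reward(importance, qp):
    # Higher semantic importance makes distortion in this CU more costly.
    return -(rate(qp) + importance * distortion(qp))


def learn_qp_policy(importances, episodes=2000, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over a fixed CU sequence; returns the greedy QP per CU."""
    rng = random.Random(seed)
    n, n_actions = len(importances), len(QPS)
    q = [[0.0] * n_actions for _ in range(n + 1)]  # row n: terminal state, stays 0
    for _ in range(episodes):
        for t in range(n):
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda k: q[t][k])
            target = reward(importances[t], QPS[a]) + gamma * max(q[t + 1])
            q[t][a] += alpha * (target - q[t][a])
    return [QPS[max(range(n_actions), key=lambda k: q[t][k])] for t in range(n)]
```

With a low weight for background CUs and a high weight for semantically important CUs (for example, face regions), the learned policy spends more bits (lower QP) on the important CUs, which mirrors the intent of semantic bit allocation.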