
Showing papers on "Face detection" published in 2019


Journal Article · DOI
TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.
Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates when complex ensembles are constructed that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development of deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, have been introduced to address the problems existing in traditional architectures. These models differ in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction to the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

3,097 citations


Journal Article · DOI
TL;DR: HyperFace as discussed by the authors combines face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNNs) and achieves significant improvement in performance by fusing intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features.
Abstract: We present an algorithm for simultaneous face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet, which builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace, which uses a high-recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms for each of these four tasks.

1,218 citations


Posted Content
TL;DR: This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019), and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Abstract: Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers of object detection in the light of its technical evolution, spanning over a quarter-century's time (from the 1990s to 2019). A number of topics have been covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc., and makes an in-depth analysis of their challenges as well as technical improvements in recent years.

802 citations


Journal Article · DOI
TL;DR: The authors propose a 3D Dense Face Alignment (3DDFA) framework, in which a dense 3D Morphable Model (3DMM) is fitted to the image via cascaded convolutional neural networks.
Abstract: Face alignment, which fits a face model to an image and extracts the semantic meanings of facial pixels, has been an important topic in the computer vision community. However, most algorithms are designed for faces in small to medium poses (yaw angle smaller than 45 degrees) and lack the ability to align faces in large poses up to 90 degrees. The challenges are three-fold. First, the commonly used landmark face model assumes that all the landmarks are visible and is therefore not suitable for large poses. Second, the face appearance varies more drastically across large poses, from the frontal view to the profile view. Third, labelling landmarks in large poses is extremely challenging since the invisible landmarks have to be guessed. In this paper, we propose to tackle these three challenges in a new alignment framework termed 3D Dense Face Alignment (3DDFA), in which a dense 3D Morphable Model (3DMM) is fitted to the image via Cascaded Convolutional Neural Networks. We also utilize 3D information to synthesize face images in profile views to provide abundant samples for training. Experiments on the challenging AFLW database show that the proposed approach achieves significant improvements over the state-of-the-art methods.

358 citations


Posted Content
TL;DR: A robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning.
Abstract: Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by 1.1% (achieving AP equal to 91.4%). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing lightweight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code have been made available at: this https URL.

357 citations
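
RetinaFace's abstract describes joint extra-supervised (five landmarks) and self-supervised (mesh decoder) multi-task learning. A minimal per-anchor sketch of how such terms can be combined follows, assuming a face/background cross-entropy plus Smooth-L1 regression terms that count only for positive anchors. The weights `w_box`, `w_pts`, `w_pixel` and all function names are illustrative assumptions, and the self-supervised mesh term is passed in as a precomputed number rather than rendered.

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss summed over all coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def face_ce(face_prob, is_face, eps=1e-12):
    """Binary cross-entropy for the face / background decision."""
    p = min(max(face_prob, eps), 1.0 - eps)
    return -math.log(p) if is_face else -math.log(1.0 - p)

def anchor_loss(face_prob, is_face, box_pred, box_gt, pts_pred, pts_gt,
                pixel_loss=0.0, w_box=0.25, w_pts=0.1, w_pixel=0.01):
    """Multi-task loss for one anchor; regression terms apply only to faces.

    box_*: 4 box offsets; pts_*: 10 values (x, y for five landmarks);
    pixel_loss: precomputed self-supervised mesh loss (assumed given).
    The default weights are placeholders chosen for illustration.
    """
    loss = face_ce(face_prob, is_face)
    if is_face:
        loss += w_box * smooth_l1(box_pred, box_gt)
        loss += w_pts * smooth_l1(pts_pred, pts_gt)
        loss += w_pixel * pixel_loss
    return loss
```

Gating the regression terms on `is_face` keeps background anchors from pulling box and landmark predictions toward meaningless targets.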


Proceedings Article · DOI
15 Jun 2019
TL;DR: Dual Shot face detector (DSFD) as discussed by the authors adopts the architecture of SSD and introduces a Feature Enhance Module (FEM) for transferring the original feature maps to extend the single shot detector to dual shot detector.
Abstract: Recently, Convolutional Neural Networks (CNNs) have achieved great success in face detection. However, face detection remains a challenging problem for current methods owing to the high degree of variability in scale, pose, occlusion, expression, appearance and illumination. In this paper, we propose a novel detection network named Dual Shot face Detector (DSFD), which inherits the architecture of SSD and introduces a Feature Enhance Module (FEM) for transferring the original feature maps to extend the single shot detector to a dual shot detector. Specifically, a progressive anchor loss (PAL) computed using two sets of anchors is adopted to effectively facilitate the features. Additionally, we propose an improved anchor matching (IAM) method by integrating novel data augmentation techniques and an anchor design strategy in our DSFD to provide better initialization for the regressor. Extensive experiments on popular benchmarks, WIDER FACE (easy: 0.966, medium: 0.957, hard: 0.904) and FDDB (discontinuous: 0.991, continuous: 0.862), demonstrate the superiority of DSFD over state-of-the-art face detection methods (e.g., PyramidBox and SRN). Code will be made available upon publication.

321 citations


Journal Article · DOI
TL;DR: This paper proposes a simple approach to implicitly select skin tissues based on their distinct pulsatility feature and shows that this method outperforms state of the art algorithms, without any critical face or skin detection.

253 citations


Journal Article · DOI
TL;DR: A comprehensive survey of various techniques explored for face detection in digital images is presented in this paper, where the practical aspects towards the development of a robust face detection system and several promising directions for future research are discussed.
Abstract: With the marvelous increase in video and image databases, there is an incredible need for automatic understanding and examination of information by intelligent systems, as manual analysis is becoming plainly impractical. The face plays a major role in social intercourse, conveying the identity and feelings of a person. Human beings do not have a tremendously greater ability than machines to identify different faces. So, an automatic face detection system plays an important role in face recognition, facial expression recognition, head-pose estimation, human–computer interaction, etc. Face detection is a computer technology that determines the location and size of a human face in a digital image. Face detection has been one of the standout topics in the computer vision literature. This paper presents a comprehensive survey of various techniques explored for face detection in digital images. Different challenges and applications of face detection are also presented in this paper. At the end, different standard databases for face detection are also given with their features. Furthermore, we organize special discussions on the practical aspects of developing a robust face detection system and conclude this paper with several promising directions for future research.

227 citations


Proceedings Article · DOI
15 Jun 2019
TL;DR: A data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, and a novel Spatio-Temporal Anti-Spoof Network (STASN) that can distinguish spoof faces by extracting features from a variety of regions to seek out subtle evidence.
Abstract: Face anti-spoofing is an important task in full-stack face applications including face detection, verification, and recognition. Previous approaches build models on datasets which do not simulate real-world data well (e.g., small scale, insignificant variance, etc.). Existing models may rely on auxiliary information, which prevents these anti-spoofing solutions from generalizing well in practice. In this paper, we present a data collection solution along with a data synthesis technique to simulate digital medium-based face spoofing attacks, which can easily help us obtain a large amount of training data that reflects real-world scenarios well. Through exploiting a novel Spatio-Temporal Anti-Spoof Network (STASN), we are able to push the performance on public face anti-spoofing datasets beyond state-of-the-art methods by a large margin. Since the proposed model can automatically attend to discriminative regions, it makes analyzing the behaviors of the network possible. We conduct extensive experiments and show that the proposed model can distinguish spoof faces by extracting features from a variety of regions to seek out subtle evidence such as borders, moire patterns, reflection artifacts, etc.

178 citations


Posted Content
Hao Dang, Feng Liu, Joel Stehouwer, Xiaoming Liu, Anil K. Jain
TL;DR: It is shown that the use of an attention mechanism improves facial forgery detection and manipulated region localization and also improves binary classification of genuine face v. fake face.
Abstract: Detecting manipulated facial images and videos is an increasingly important topic in digital media forensics. As advanced face synthesis and manipulation methods are made available, new types of fake face representations are being created which have raised significant concerns for their use in social media. Hence, it is crucial to detect manipulated face images and localize manipulated regions. Instead of simply using multi-task learning to simultaneously detect manipulated images and predict the manipulated mask (regions), we propose to utilize an attention mechanism to process and improve the feature maps for the classification task. The learned attention maps highlight the informative regions to further improve the binary classification (genuine face v. fake face), and also visualize the manipulated regions. To enable our study of manipulated face detection and localization, we collect a large-scale database that contains numerous types of facial forgeries. With this dataset, we perform a thorough analysis of data-driven fake face detection. We show that the use of an attention mechanism improves facial forgery detection and manipulated region localization.

172 citations
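
The abstract's key idea is that a learned attention map re-weights the feature maps before binary classification and doubles as a visualization of manipulated regions. One common way to apply such a map, assumed here rather than taken from the paper, is to squash it with a sigmoid and multiply it element-wise into every channel. In this pure-Python sketch the attention logits are given as input; in a real network they would be predicted from the features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def apply_attention(features, att_logits):
    """Scale every channel of a C x H x W feature map by one H x W attention map.

    features: nested lists indexed [channel][row][col]
    att_logits: nested lists indexed [row][col] (raw scores before the sigmoid)
    Returns the refined features and the attention map, whose values lie in (0, 1).
    """
    att = [[sigmoid(v) for v in row] for row in att_logits]
    refined = [[[f * a for f, a in zip(frow, arow)]
                for frow, arow in zip(channel, att)]
               for channel in features]
    return refined, att
```

Because the same map scales all channels, regions with high attention keep their activations while the rest are suppressed, and the map itself can be displayed as a heat map.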


Journal Article · DOI
TL;DR: A Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties that demonstrate the superiority of DR-GAN over the state of the art in both learning representations and rotating large-pose face images.
Abstract: The large pose discrepancy between two face images is one of the fundamental challenges in automatic face recognition. Conventional approaches to pose-invariant face recognition either perform face frontalization on, or learn a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes a Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one unified identity representation along with an arbitrary number of synthetic face images. Extensive quantitative and qualitative evaluation on a number of controlled and in-the-wild databases demonstrates the superiority of DR-GAN over the state of the art in both learning representations and rotating large-pose face images.

Journal Article · DOI
TL;DR: A deep convolutional neural network approach is presented that provides a fully automated pipeline for face detection, tracking, and recognition of wild chimpanzees from long-term video records, and generates co-occurrence matrices to trace changes in the social network structure of an aging population.
Abstract: Video recording is now ubiquitous in the study of animal behavior, but its analysis on a large scale is prohibited by the time and resources needed to manually process large volumes of data. We present a deep convolutional neural network (CNN) approach that provides a fully automated pipeline for face detection, tracking, and recognition of wild chimpanzees from long-term video records. In a 14-year dataset yielding 10 million face images from 23 individuals over 50 hours of footage, we obtained an overall accuracy of 92.5% for identity recognition and 96.2% for sex recognition. Using the identified faces, we generated co-occurrence matrices to trace changes in the social network structure of an aging population. The tools we developed enable easy processing and annotation of video datasets, including those from other species. Such automated analysis unveils the future potential of large-scale longitudinal video archives to address fundamental questions in behavior and conservation.
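
The co-occurrence matrices mentioned above can be built by counting how often each pair of identified individuals appears together. A minimal sketch, assuming the recognition pipeline yields, for each frame, the collection of individual IDs recognized in it; the abstract does not say whether co-occurrence is counted per frame, per video, or per time window, and the IDs below are hypothetical.

```python
from itertools import combinations

def cooccurrence(frames, individuals):
    """Symmetric count of frames in which each pair of individuals appears together.

    frames: iterable of collections of individual IDs recognized in one frame.
    individuals: list fixing the row/column order of the matrix.
    """
    index = {name: i for i, name in enumerate(individuals)}
    n = len(individuals)
    matrix = [[0] * n for _ in range(n)]
    for present in frames:
        # sorted(set(...)) ignores duplicate detections and fixes the pair order
        for a, b in combinations(sorted(set(present)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix
```

Computing one such matrix per year of footage would give a sequence of snapshots whose differences trace how the social network changes over time.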

Posted Content
TL;DR: BlazeFace's contributions include a lightweight feature extraction network inspired by, but distinct from, MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
Abstract: We present BlazeFace, a lightweight and well-performing face detector tailored for mobile GPU inference. It runs at a speed of 200-1000+ FPS on flagship devices. This super-realtime performance enables it to be applied to any augmented reality pipeline that requires an accurate facial region of interest as an input for task-specific models, such as 2D/3D facial keypoint or geometry estimation, facial features or expression classification, and face region segmentation. Our contributions include a lightweight feature extraction network inspired by, but distinct from, MobileNetV1/V2, a GPU-friendly anchor scheme modified from Single Shot MultiBox Detector (SSD), and an improved tie resolution strategy alternative to non-maximum suppression.
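
The abstract does not detail BlazeFace's alternative to non-maximum suppression. One tie-resolution strategy of this kind, sketched here as an assumption, keeps the highest-scoring box of each overlapping cluster but replaces its coordinates with the score-weighted mean of every box in the cluster, instead of simply discarding the overlapping boxes. The 0.3 IoU threshold is arbitrary, and scores are assumed to be positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def blend_detections(boxes, scores, iou_thresh=0.3):
    """Score-weighted blending of overlapping boxes instead of plain suppression.

    Returns a list of (blended_box, top_score) pairs, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    results = []
    while order:
        top = order[0]
        # The top box always belongs to its own cluster, even if degenerate.
        group = [top] + [i for i in order[1:] if iou(boxes[top], boxes[i]) >= iou_thresh]
        total = sum(scores[i] for i in group)
        blended = tuple(sum(scores[i] * boxes[i][k] for i in group) / total
                        for k in range(4))
        results.append((blended, scores[top]))
        order = [i for i in order if i not in group]
    return results
```

Averaging rather than discarding tends to reduce frame-to-frame jitter of the final box, which matters for video pipelines such as the augmented reality use case mentioned above.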

Journal Article · DOI
02 Apr 2019
TL;DR: A novel face detector, deep pyramid single shot face detector (DPSSD), which is fast and detects faces with large scale variations (especially tiny faces), and a new loss function, called crystal loss, for the tasks of face verification and identification.
Abstract: The availability of large annotated datasets and affordable computation power has led to impressive improvements in the performance of convolutional neural networks (CNNs) on various face analysis tasks. In this paper, we describe a deep learning pipeline for unconstrained face identification and verification which achieves state-of-the-art performance on several benchmark datasets. We provide the design details of the various modules involved in automatic face recognition: face detection, landmark localization and alignment, and face identification/verification. We propose a novel face detector, deep pyramid single shot face detector (DPSSD), which is fast and detects faces with large scale variations (especially tiny faces). Additionally, we propose a new loss function, called crystal loss, for the tasks of face verification and identification. Crystal loss restricts the feature descriptors to lie on a hypersphere of a fixed radius, thus minimizing the angular distance between positive subject pairs and maximizing the angular distance between negative subject pairs. We provide evaluation results of the proposed face detector on challenging unconstrained face detection datasets. Then, we present experimental results for end-to-end face verification and identification on IARPA Janus Benchmarks A, B, and C (IJB-A, IJB-B, IJB-C), and the Janus Challenge Set 5 (CS5).
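
Crystal loss, as described in the abstract, restricts feature descriptors to a hypersphere of fixed radius. A minimal sketch, assuming the descriptor is rescaled to radius `alpha` and then fed to a linear classifier with softmax cross-entropy; the default `alpha=16.0` and the function names are arbitrary illustrative choices, not values from the paper.

```python
import math

def l2_constrain(feature, alpha):
    """Rescale a feature vector so that its L2 norm equals alpha."""
    norm = math.sqrt(sum(v * v for v in feature))
    return [alpha * v / norm for v in feature]

def crystal_loss(feature, weights, biases, label, alpha=16.0):
    """Softmax cross-entropy computed on the radius-alpha version of the feature.

    weights: one weight row per identity class; biases: one bias per class.
    Returns the loss and the constrained feature.
    """
    f = l2_constrain(feature, alpha)
    logits = [sum(w * x for w, x in zip(row, f)) + b for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label], f
```

Since every constrained feature has the same norm, the classifier can only separate identities by direction, which is why the objective acts on angular distances between descriptors.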

Journal Article · DOI
17 Jul 2019
TL;DR: A novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously.
Abstract: High performance face detection remains a very challenging problem, especially when there exist many tiny faces. This paper presents a novel single-shot face detector, named Selective Refinement Network (SRN), which introduces novel two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. In particular, the SRN consists of two modules: the Selective Two-step Classification (STC) module and the Selective Two-step Regression (STR) module. The STC aims to filter out most simple negative anchors from low-level detection layers to reduce the search space for the subsequent classifier, while the STR is designed to coarsely adjust the locations and sizes of anchors from high-level detection layers to provide better initialization for the subsequent regressor. Moreover, we design a Receptive Field Enhancement (RFE) block to provide more diverse receptive fields, which helps to better capture faces in some extreme poses. As a consequence, the proposed SRN detector achieves state-of-the-art performance on all the widely used face detection benchmarks, including the AFW, PASCAL face, FDDB, and WIDER FACE datasets. Code will be released to facilitate further studies on the face detection problem.
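
The Selective Two-step Classification idea can be sketched as a filtering step: anchors on low-level detection layers that the first-step classifier already scores as near-certain background are discarded before the second-step classifier sees them. The anchor representation (dicts with a `"level"` key), the set of low levels `(0, 1, 2)`, and the 0.99 background threshold below are assumptions for illustration.

```python
def stc_filter(anchors, first_step_face_prob, low_levels=(0, 1, 2), neg_thresh=0.99):
    """Drop easy negatives on low-level layers before the second-step classifier.

    anchors: list of dicts with at least a "level" key (detection layer index).
    first_step_face_prob: face probability from the first-step classifier.
    Anchors on other (high-level) layers are passed through untouched.
    """
    kept = []
    for anchor, p_face in zip(anchors, first_step_face_prob):
        background_conf = 1.0 - p_face
        if anchor["level"] in low_levels and background_conf > neg_thresh:
            continue  # confidently background: removed from the search space
        kept.append(anchor)
    return kept
```

Restricting the filter to low-level layers reflects the abstract's point that those layers produce most of the (mostly easy negative) anchors, so they gain the most from a smaller search space.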

Proceedings Article · DOI
01 Oct 2019
TL;DR: A camera-based real-time face recognition system and algorithm are built using OpenCV, Haar Cascade, Eigenface, Fisher Face, LBPH, and Python.
Abstract: Face detection and picture or video recognition is a popular subject of research in biometrics. Face recognition in a real-time setting is an exciting area and a rapidly growing challenge. This work presents a framework for using face recognition for application authentication and proposes a PCA (Principal Component Analysis) facial recognition system. Principal component analysis is a statistical method under the broad heading of factor analysis. The aim of PCA is to reduce the large amount of data storage to the size of the feature space that is required to represent the data economically. For facial recognition, PCA transforms the large 1-D pixel vector built from a 2-D face image into the compact principal components of the feature space. This is called eigenspace projection. The eigenspace is determined by identifying the eigenvectors of the covariance matrix of a mean-centered collection of fingerprint images. I built a camera-based real-time face recognition system and an algorithm, programming with OpenCV, Haar Cascade, Eigenface, Fisher Face, LBPH, and Python.
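
The PCA (eigenface) pipeline described above can be sketched in pure Python: flatten each face image into a vector, subtract the mean face, find the leading eigenvectors of the covariance matrix, and compare images by their coordinates in that reduced space. This is a toy illustration on tiny vectors, with power iteration standing in for a proper eigen-solver; a practical system would use a numerical library instead.

```python
import math

def mean_vector(vectors):
    """Mean image: the per-pixel average over all training vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def covariance(vectors, mean):
    """Covariance matrix (d x d) of the mean-centered image vectors."""
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    n, d = len(centered), len(mean)
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]

def top_eigenvectors(cov, k, iters=200):
    """Leading k eigenvectors via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    basis = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining subspace
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        basis.append(v)
        # Deflate: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return basis

def project(vector, mean, basis):
    """Coordinates of an image in the reduced eigenspace ('face space')."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(c * e for c, e in zip(centered, axis)) for axis in basis]

def recognize(probe, gallery, mean, basis):
    """Label of the gallery image nearest to the probe in face space."""
    p = project(probe, mean, basis)

    def sq_dist(item):
        q = project(item[1], mean, basis)
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return min(gallery, key=sq_dist)[0]
```

Recognition then reduces to comparing a handful of projection coefficients instead of every pixel, which is the storage saving the abstract attributes to PCA.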

Journal Article · DOI
TL;DR: An algorithm for face detection and recognition based on convolutional neural networks (CNN), which outperforms traditional techniques, is proposed, and a smart classroom system for student attendance using face recognition is presented.
Abstract: Currently, data generated by smart devices connected through the Internet is increasing relentlessly. An effective and efficient paradigm is needed to deal with the bulk of data produced by the Internet of Things (IoT). Deep learning and edge computing are emerging technologies used for efficient processing of huge amounts of data with high accuracy. In this world of advanced information systems, one of the major issues is authentication. Several techniques have been employed to solve this problem. Face recognition is considered one of the most reliable solutions. Usually, the research community has used scale-invariant feature transform (SIFT) and speeded up robust features (SURF) for face recognition. This paper proposes an algorithm for face detection and recognition based on convolutional neural networks (CNN), which outperforms the traditional techniques. To validate the efficiency of the proposed algorithm, a smart classroom system for student attendance using face recognition has been proposed. The face recognition system is trained on the publicly available Labeled Faces in the Wild (LFW) dataset. From a single image of 40 students, the system detects approximately 35 faces and recognizes 30 of them. The proposed system achieved 97.9% accuracy on the testing data. Moreover, data generated by smart classrooms is computed and transmitted through an IoT-based architecture using edge computing. A comparative performance study shows that our architecture outperforms others in terms of data latency and real-time response.

Journal Article · DOI
TL;DR: A robust edge detection algorithm using multiple threshold approaches (B-Edge) is proposed to cover both the limitations encountered in edge detection: edge connectivity and edge thickness.
Abstract: Edge detection is important for its reliability and security, and it delivers a better understanding of object recognition in computer vision applications such as pedestrian detection, face detection, and video surveillance. This paper introduces two fundamental limitations encountered in edge detection, edge connectivity and edge thickness, which have been addressed by various developments in the state of the art. An optimal selection of the threshold for effective edge detection has always been a key challenge in computer vision. Therefore, a robust edge detection algorithm using multiple threshold approaches (B-Edge) is proposed to overcome both limitations. The widely used Canny edge operator relies on the selection of two thresholds and still leaves a few gaps before optimal results. To handle the loopholes of the Canny edge operator, our method selects simulated triple thresholds that target the prime issues of edge detection: image contrast, effective edge pixel selection, error handling, and similarity to the ground truth. The qualitative and quantitative experimental evaluations demonstrate that our edge detection method outperforms competing algorithms on the mentioned issues. The proposed approach offers an improvement for both grayscale and color images.
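
For reference, the two-threshold scheme of the Canny operator that B-Edge sets out to improve ends with hysteresis tracking: pixels above the high threshold are edges, and pixels above the low threshold become edges only if they connect (8-neighbourhood) to one. The sketch below shows that standard two-threshold step on a precomputed gradient-magnitude grid; it is not B-Edge's triple-threshold method, whose details the abstract does not give.

```python
from collections import deque

def hysteresis(magnitude, low, high):
    """Canny-style double-threshold edge tracking on a gradient-magnitude grid."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Strong pixels are edges unconditionally and seed the search.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow edges into connected weak pixels (between low and high).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols and not edges[nr][nc]
                        and magnitude[nr][nc] >= low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges
```

The choice of `low` and `high` directly trades edge connectivity against spurious edges, which is the threshold-selection problem the abstract targets.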

Proceedings Article · DOI
01 Oct 2019
TL;DR: A binary face classifier which can detect any face present in the frame irrespective of its alignment is designed, which has shown great results in recognizing non-frontal faces and multiple facial masks in a single frame.
Abstract: Face detection has evolved into a very popular problem in image processing and computer vision. Many new algorithms are being devised using convolutional architectures to make detection as accurate as possible. These convolutional architectures have made it possible to extract even pixel-level details. We aim to design a binary face classifier which can detect any face present in the frame irrespective of its alignment. We present a method to generate accurate face segmentation masks from an input image of arbitrary size. Beginning from the RGB image of any size, the method uses pretrained weights of the VGG-16 architecture for feature extraction. Training is performed through Fully Convolutional Networks (FCN) to semantically segment out the faces present in that image. Gradient descent is used for training, while binomial cross-entropy is used as the loss function. Further, the output image from the FCN is processed to remove unwanted noise, avoid false predictions if any, and draw bounding boxes around the faces. Furthermore, the proposed model has also shown great results in recognizing non-frontal faces. Along with this, it is also able to detect multiple facial masks in a single frame. Experiments were performed on the Multi Parsing Human Dataset, obtaining a mean pixel-level accuracy of 93.884% for the segmented face masks.
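
The training objective and the reported metric can both be written per pixel. A minimal sketch, assuming the network outputs a face probability for each pixel and the ground-truth mask uses 1 for face and 0 for background. Note that `pixel_accuracy` here is plain overall accuracy at a 0.5 threshold, which may differ from the paper's "mean pixel-level accuracy" if that figure is averaged per class or per image.

```python
import math

def pixel_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all pixels of a predicted face mask."""
    total, count = 0.0, 0
    for prow, lrow in zip(probs, labels):
        for p, y in zip(prow, lrow):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

def pixel_accuracy(probs, labels, threshold=0.5):
    """Fraction of pixels whose thresholded prediction matches the mask."""
    correct = total = 0
    for prow, lrow in zip(probs, labels):
        for p, y in zip(prow, lrow):
            correct += int((p >= threshold) == bool(y))
            total += 1
    return correct / total
```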

Journal Article · DOI
TL;DR: A different scales face detector (DSFD) based on Faster R-CNN is proposed that achieves promising performance on popular benchmarks including FDDB, AFW, PASCAL faces, and WIDER FACE.
Abstract: In recent years, deep learning based on deep convolutional neural networks has achieved great success in face detection. However, one of the remaining open challenges is the detection of small-scale faces. The depth of the convolutional network can cause the projected feature map for small faces to shrink quickly, and most scale-invariant detection approaches can hardly handle faces smaller than $15\times 15$ pixels. To solve this problem, we propose a different scales face detector (DSFD) based on Faster R-CNN. The new network can improve the precision of face detection while running in real time as Faster R-CNN does. First, an efficient multitask region proposal network (RPN), combined with boosting face detection, is developed to obtain the human face ROI. Setting the ROI as a constraint, anchors are inhomogeneously produced on the top feature map by the multitask RPN. A human face proposal is extracted through the anchor combined with facial landmarks. Then, a parallel-type Fast R-CNN network is proposed based on the proposal scale. According to the percentage of the image they cover, the proposals are assigned to three corresponding Fast R-CNN networks. The three networks are separated by proposal scale and differ from each other in the weights of feature map concatenation. A variety of strategies are introduced in our face detection network, including multitask learning, feature pyramid, and feature concatenation. Compared to state-of-the-art face detection methods such as UnitBox, HyperFace, and FastCNN, the proposed DSFD method achieves promising performance on popular benchmarks including FDDB, AFW, PASCAL faces, and WIDER FACE.
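
The abstract says proposals are assigned to one of three Fast R-CNN networks according to the percentage of the image they cover, but gives no cut-off values. A sketch of that routing step follows; the thresholds `small_max` and `medium_max` and the branch names are hypothetical placeholders.

```python
def route_proposal(box, image_w, image_h, small_max=0.01, medium_max=0.1):
    """Pick a branch from the fraction of the image a proposal (x1, y1, x2, y2) covers."""
    x1, y1, x2, y2 = box
    coverage = max(0.0, x2 - x1) * max(0.0, y2 - y1) / float(image_w * image_h)
    if coverage < small_max:
        return "small"
    if coverage < medium_max:
        return "medium"
    return "large"

def group_proposals(boxes, image_w, image_h):
    """Split proposals into the batches fed to the three scale-specific networks."""
    groups = {"small": [], "medium": [], "large": []}
    for box in boxes:
        groups[route_proposal(box, image_w, image_h)].append(box)
    return groups
```

Routing by relative coverage rather than absolute pixel size keeps the assignment consistent across input resolutions.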

Proceedings Article · DOI
01 Feb 2019
TL;DR: This paper surveys the various concepts of support vector machines, some of its real life applications and future aspects of SVM.
Abstract: The best way to acquire knowledge about an algorithm is to feed it data and check the result. In layman's terms, machine learning can be called an ideological child, or evolution, of the idea of understanding algorithms through data. Machine learning can be subdivided into two paradigms, supervised learning and unsupervised learning. Supervised learning classifies data using algorithms like support vector machines (SVM), linear regression, logistic regression, neural networks, nearest neighbor, etc. Supervised learning algorithms use the concepts of classification and regression. Linear classification was used earlier to form the decision plane, but the plane was bidimensional. However, a particular dataset might require a nonlinear decision plane. This gave rise to the support vector machine algorithm, which can generate a nonlinear decision boundary using a kernel function. SVM is a vast concept and can be applied to various real-world problems like face detection, handwriting detection, and many more. This paper surveys the various concepts of support vector machines, some of their real-life applications, and future aspects of SVM.
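
The kernel idea mentioned in the abstract can be shown in a few lines: a degree-2 polynomial kernel equals an ordinary dot product in an explicit three-dimensional feature space, so a linear boundary there is a curved one in the input space. The support vectors, coefficients, and bias in the usage below are hand-picked, not the output of SVM training.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def poly2_features(x):
    """Explicit 3-D feature map for 2-D inputs; dot(phi(x), phi(z)) == K(x, z)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def svm_decision(x, support_vectors, alphas, labels, bias, kernel=poly2_kernel):
    """Kernel SVM decision function: sum_i alpha_i * y_i * K(sv_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias
```

With support vectors (1, 0) and (0, 1), unit coefficients, labels +1 and bias -2, the decision function reduces to x1^2 + x2^2 - 2, so points outside the circle of radius √2 score positive and points inside score negative: a nonlinear boundary obtained without ever computing the feature map explicitly.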

Journal Article · DOI
TL;DR: This paper proposes the first, to the best of the authors' knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and elegantly bridges face detection and facial landmark localization tasks.
Abstract: The de facto algorithm for facial landmark estimation involves running a face detector with a subsequent deformable model fitting on the bounding box. This encompasses two basic problems: 1) the detection and deformable fitting steps are performed independently, while the detector might not provide the best-suited initialization for the fitting step, and 2) the face appearance varies hugely across different poses, which makes the deformable face fitting very challenging and thus distinct models have to be used (e.g., one for profile and one for frontal faces). In this paper, we propose the first, to the best of our knowledge, joint multi-view convolutional network to handle large pose variations across faces in-the-wild, and elegantly bridge face detection and facial landmark localization tasks. The existing joint face detection and landmark localization methods focus only on a very small set of landmarks. By contrast, our method can detect and align a large number of landmarks for semi-frontal (68 landmarks) and profile (39 landmarks) faces. We evaluate our model on a plethora of datasets including the standard static image datasets such as IBUG, 300W, COFW, and the latest Menpo Benchmark for both semi-frontal and profile faces. A significant improvement over the state-of-the-art methods on deformable face tracking is witnessed on the 300VW benchmark. We also demonstrate state-of-the-art results for face detection on FDDB and MALF datasets.

Journal ArticleDOI
TL;DR: The framework reliably performed object detection and classification on data comprising 21,600 video streams (175 GB) in 6.52 hours. The GPU-enabled deployment took 3 hours, making it at least twice as fast as the cloud deployment without GPUs.
Abstract: Object detection and classification are basic tasks in video analytics and the starting point for other complex applications. Traditional video analytics approaches are manual and time-consuming, and they are subjective because they depend on human judgment. We present a cloud-based video analytics framework for scalable and robust analysis of video streams. The framework empowers an operator by automating the object detection and classification process on recorded video streams. An operator only specifies the analysis criteria and the duration of the video streams to analyse. The streams are then fetched from cloud storage, decoded, and analysed in the cloud. The framework offloads the compute-intensive parts of the analysis to GPU-powered servers in the cloud. Vehicle and face detection are presented as two case studies for evaluating the framework, using one month of data and a 15-node cloud. The framework reliably performed object detection and classification on the data, comprising 21,600 video streams and 175 GB, in 6.52 hours. The GPU-enabled deployment of the framework took 3 hours to analyse the same number of video streams, making it at least twice as fast as the cloud deployment without GPUs.
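As a quick check of the reported figures, the speedup and data throughput can be computed as follows. This assumes, as the abstract implies, that the 6.52-hour run is the deployment without GPUs.

```latex
S = \frac{T_{\text{no GPU}}}{T_{\text{GPU}}} = \frac{6.52\,\text{h}}{3\,\text{h}} \approx 2.17,
\qquad
\frac{175\,\text{GB}}{6.52\,\text{h}} \approx 26.8\,\text{GB/h},
\qquad
\frac{175\,\text{GB}}{3\,\text{h}} \approx 58.3\,\text{GB/h}
```

A speedup of about 2.17 is consistent with the claim of "at least twice as fast."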

Proceedings ArticleDOI
01 Jan 2019
TL;DR: This paper proposes a model for an automated attendance management system for the students of a class. The model uses face recognition based on Eigenface values, Principal Component Analysis (PCA), and a Convolutional Neural Network (CNN), and the authors present it as a successful technique for managing the attendance and records of students.
Abstract: Managing attendance by hand can be a great burden on teachers. To resolve this problem, smart automatic attendance management systems are being used, but authentication is an important issue in such systems. Smart attendance systems are generally implemented with the help of biometrics, and face recognition is one biometric method that can improve them. As a prime form of biometric verification, facial recognition is used extensively in applications such as video monitoring and CCTV footage systems, human-computer interaction, indoor access systems, and network security. With this framework, the problems of proxies and of students being marked present when they are not physically present can easily be solved. The main implementation steps in this type of system are face detection and recognition of the detected face. This paper proposes a model for an automated attendance management system for the students of a class that uses face recognition based on Eigenface values, Principal Component Analysis (PCA), and a Convolutional Neural Network (CNN). Recognized faces are then matched by comparing them with a database containing the students' faces. This model will be a successful technique for managing the attendance and records of students.
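The final matching step can be sketched as a nearest-neighbor search with a rejection threshold. In this sketch, each detected face is represented by a feature vector, such as Eigenface/PCA coefficients or a CNN embedding. The vector is compared with each student's stored vector, and the match is accepted only if the closest student is near enough. The function names, the Euclidean distance, the 2-D vectors, and the 0.6 threshold are hypothetical choices, not details from the paper.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(face_vec, database, threshold):
    """Return the closest student ID, or None if no stored face is within threshold."""
    best_id, best_dist = None, float("inf")
    for student_id, ref_vec in database.items():
        d = euclidean(face_vec, ref_vec)
        if d < best_dist:
            best_id, best_dist = student_id, d
    return best_id if best_dist <= threshold else None

def mark_attendance(detected_vecs, database, threshold=0.6):
    """Map every enrolled student to True (recognized in class) or False."""
    present = set()
    for vec in detected_vecs:
        sid = identify(vec, database, threshold)
        if sid is not None:  # unknown faces (e.g. visitors) are ignored
            present.add(sid)
    return {sid: sid in present for sid in database}

# Hypothetical enrolled students and feature vectors from one class photo.
database = {"s01": [0.1, 0.9], "s02": [0.8, 0.2], "s03": [0.5, 0.5]}
detected = [[0.12, 0.88], [0.79, 0.22], [5.0, 5.0]]
attendance = mark_attendance(detected, database)
print(attendance)
```

The threshold is what counters proxies. A face that is not close to any enrolled student is rejected rather than being forced onto the nearest record.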

Journal ArticleDOI
TL;DR: The review identifies a clear difficulty in translating the high facial expression recognition (FER) accuracy achieved in controlled environments to uncontrolled and pose-variant environments, and argues that future efforts in the FER field should be put into multimodal systems that are robust enough to face the adversities of real-world scenarios.
Abstract: Emotion recognition has attracted major attention in numerous fields because of its relevant applications in the contemporary world; marketing, psychology, surveillance, and entertainment are some examples. An emotion can be recognized in several ways; this paper focuses on facial expressions and presents a systematic review of the topic. A total of 112 papers on this topic, published in ACM, IEEE, BASE, and Springer between January 2006 and April 2019, were extensively reviewed. The methods and algorithms they use most often, such as face detection, smoothing, Principal Component Analysis (PCA), Local Binary Patterns (LBP), Optical Flow (OF), and Gabor filters, are first introduced and summarized for a better understanding. This review identified a clear difficulty in translating the high facial expression recognition (FER) accuracy achieved in controlled environments to uncontrolled and pose-variant environments. Future efforts in the FER field should be put into multimodal systems that are robust enough to face the adversities of real-world scenarios. A thorough analysis of the research on FER in computer vision, based on the selected papers, is presented. This review aims not only to become a reference for future research on emotion recognition but also to provide potential readers with an overview of the work done on this topic.
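LBP is one of the frequently used descriptors in the reviewed papers, so a minimal sketch of the basic 3x3 operator may help. Each of the eight neighbors is compared with the center pixel, and the comparison bits form an 8-bit code. A 256-bin histogram of these codes over a region then serves as the feature vector. This sketch assumes one common convention: a clockwise neighbor order starting at the top-left, and a ">=" comparison. Variants such as uniform patterns or circular neighborhoods with interpolation differ.

```python
# Neighbor offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(img, r, c):
    """8-bit LBP code of pixel (r, c); bit k is set if neighbor k >= center."""
    center = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior (non-border) pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch, 1, 1))  # 241
```

Because LBP only uses the ordering of intensities around each pixel, it is invariant to monotonic brightness changes. This is part of its appeal for FER under varying illumination.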

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This work focuses on facial masks, and specifically on improving recognition accuracy for faces wearing different masks; the proposed approach begins by detecting the facial regions.
Abstract: Face recognition has become a popular and significant technology in recent years. Face alterations and the presence of different masks make it very challenging. In real-world settings where a person does not cooperate with the system, such as video surveillance, masking is even more common. Masks degrade the performance of current face recognition systems. Extensive research has addressed face recognition under different conditions, such as changing pose or illumination and degraded images, but the difficulties created by masks are usually disregarded. This work focuses on facial masks, and especially on improving recognition accuracy for faces wearing different masks. The proposed approach first detects the facial regions; this occluded face detection problem is handled with a Multi-Task Cascaded Convolutional Neural Network (MTCNN). Facial features are then extracted with the Google FaceNet embedding model. Finally, classification is performed by a Support Vector Machine (SVM). Experiments show that this approach gives remarkable performance on masked face recognition. Its performance was also evaluated on faces with excessive facial masks, with attractive outcomes. Finally, a correlative study is also presented for a better understanding.
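The final classification stage of this pipeline can be sketched as a one-vs-rest linear SVM over L2-normalized face embeddings. The abstract does not say how the SVM was trained. This sketch therefore swaps in the Pegasos stochastic subgradient method as a stand-in, without a bias term and with a deterministic pass over the data. The MTCNN detector and the FaceNet model are not reproduced. The 2-D vectors and identity names below are hypothetical placeholders for real, much higher-dimensional embeddings.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def train_pegasos(X, y, lam=0.01, epochs=200):
    """Binary linear SVM (no bias) trained with Pegasos steps; labels y in {-1, +1}."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # deterministic pass instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def train_one_vs_rest(X, labels, **kw):
    """One binary SVM per identity: that identity vs. everyone else."""
    classes = sorted(set(labels))
    return {c: train_pegasos(X, [1 if l == c else -1 for l in labels], **kw)
            for c in classes}

def predict(models, x):
    """Pick the identity whose SVM gives the highest score."""
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))

def unit(deg):
    """Hypothetical 2-D 'embedding' at the given angle."""
    return l2_normalize([math.cos(math.radians(deg)), math.sin(math.radians(deg))])

X = [unit(a) for a in (0, 10, 120, 130, 240, 250)]
labels = ["alice", "alice", "bob", "bob", "carol", "carol"]
models = train_one_vs_rest(X, labels)
print(predict(models, unit(5)))
```

Normalizing the embeddings places every face on the unit sphere, so the linear SVM effectively separates identities by direction. This suits embeddings that are trained so that faces of the same person cluster together.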

Journal ArticleDOI
Jingchun Cheng, Yali Li, Jilong Wang, Le Yu, Shengjin Wang
TL;DR: It is shown that the classification accuracy of the proposed majority voting procedure for gender classification is comparable with that of state-of-the-art systems, which validates the effectiveness of this proposed method.
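The TL;DR does not say what is voted on. Assuming that several independent gender predictions are combined (for example, from different classifiers or face regions), a majority vote can be sketched as follows. The tie-breaking rule, in which the label encountered first wins, is an assumption of this sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; on a tie, the label seen first wins."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common orders equal counts by first occurrence, so ties
    # resolve to the earliest label in the input.
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["female", "male", "female"]))  # female
```

With an odd number of binary voters, ties cannot occur, which is one reason voting ensembles often use an odd count.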

Proceedings ArticleDOI
02 Apr 2019
TL;DR: An automatic facial expression recognition system is presented that can recognize all eight basic facial expressions: normal, happy, angry, contempt, surprise, sad, fear, and disgust.
Abstract: Facial Expression Recognition (FER) has been an active research topic from the 1990s until now and, owing to its importance, has come to play an extremely important role in the image processing area. FER is typically performed in three stages: face detection, feature extraction, and classification. This paper presents an automatic facial expression recognition system that can recognize all eight basic facial expressions (normal, happy, angry, contempt, surprise, sad, fear, and disgust), whereas many proposed FER systems recognize only some facial expressions. The Extended Cohn-Kanade (CK+) dataset is used to validate the method. The presented method uses the Viola-Jones algorithm for face detection. Histogram of Oriented Gradients (HOG) is used as a descriptor for feature extraction from the images of expressive faces. Principal Component Analysis (PCA) is applied to reduce the dimensionality of the features and retain the most significant ones. Finally, the method uses three different classifiers, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Multilayer Perceptron Neural Network (MLPNN), to classify the facial expressions, and their results are compared. The experimental results show that the method achieves a recognition rate of 93.53% with the SVM classifier, 82.97% with the MLP classifier, and 79.97% with the KNN classifier, indicating that it performs best with SVM as the classifier.
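To illustrate the HOG stage, here is a simplified per-cell histogram. Gradients come from centered [-1, 0, 1] differences, and unsigned orientations in [0, 180) degrees are split into 9 bins. Each pixel adds its gradient magnitude to its bin. This is a sketch, not the paper's configuration. Full HOG implementations typically interpolate votes between neighboring bins and normalize histograms over blocks of cells, and the cell size and bin count here are assumptions.

```python
import math

def hog_cell_histogram(cell, n_bins=9):
    """Unsigned-orientation gradient histogram over the interior pixels of a cell.

    cell: 2-D list of grayscale intensities (rows of equal length).
    Each interior pixel adds its gradient magnitude to one orientation bin
    (hard assignment, no interpolation or block normalization).
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]  # centered horizontal difference
            gy = cell[y + 1][x] - cell[y - 1][x]  # centered vertical difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

# Intensity increasing left to right: every gradient points along 0 degrees.
horizontal_ramp = [[10 * x for x in range(4)] for _ in range(4)]
print(hog_cell_histogram(horizontal_ramp))
```

Concatenating such histograms across cells yields the long HOG vector. The paper then reduces this vector with PCA before classification.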

Posted Content
TL;DR: A Light and Fast Face Detector (LFFD) for edge devices is introduced, which is anchor-free and belongs to the one-stage category; the paper rethinks the importance of the receptive field (RF) and effective receptive field (ERF) in the context of face detection.
Abstract: Face detection, as a fundamental technology for various applications, is typically deployed on edge devices, which have limited memory storage and low computing power. This paper introduces a Light and Fast Face Detector (LFFD) for edge devices. The proposed method is anchor-free and belongs to the one-stage category. Specifically, we rethink the importance of the receptive field (RF) and effective receptive field (ERF) in the context of face detection. Essentially, the RFs of neurons in a certain layer are distributed regularly over the input image, and these RFs are natural "anchors". By combining RF "anchors" with appropriate RF strides, the proposed method can, in theory, detect a large range of continuous face scales with 100% coverage. Insight into the relation between ERF and face scales motivates an efficient backbone for one-stage detection. The backbone is characterized by eight detection branches and common layers, resulting in efficient computation. Extensive experiments are conducted on the popular WIDER FACE and FDDB benchmarks. A new evaluation schema is proposed for application-oriented scenarios. Under the new schema, the proposed method achieves superior accuracy (WIDER FACE Val/Test -- Easy: 0.910/0.896, Medium: 0.881/0.865, Hard: 0.780/0.770; FDDB -- discontinuous: 0.973, continuous: 0.724). Multiple hardware platforms are used to evaluate running efficiency. The proposed method achieves fast inference (NVIDIA TITAN Xp: 131.45 FPS at 640x480; NVIDIA TX2: 136.99 FPS at 160x120; Raspberry Pi 3 Model B+: 8.44 FPS at 160x120) with a model size of 9 MB.
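The claim that RFs are distributed regularly and act as natural anchors follows from the standard receptive-field recurrences. Consider each layer with kernel k, stride s, and padding p. The RF size grows by (k - 1) * jump, the center of the first RF shifts by ((k - 1) / 2 - p) * jump, and then the jump (the spacing between neighboring RF centers) is multiplied by s. The sketch below applies these recurrences to a hypothetical three-layer stack, not LFFD's actual backbone.

```python
def receptive_fields(layers):
    """Track RF size, jump (center spacing) and first-center offset per layer.

    layers: list of (kernel, stride, padding) tuples for conv/pool layers.
    Returns one (rf_size, jump, start) tuple per layer, in input-pixel units,
    where start is the coordinate of the first output neuron's RF center.
    """
    rf, jump, start = 1, 1, 0.0  # one input pixel, centered at coordinate 0
    out = []
    for k, s, p in layers:
        rf = rf + (k - 1) * jump               # uses the incoming jump
        start = start + ((k - 1) / 2 - p) * jump
        jump = jump * s                        # update jump last
        out.append((rf, jump, start))
    return out

def anchor_centres(start, jump, n):
    """RF centers of the first n neurons along one axis of a feature map."""
    return [start + i * jump for i in range(n)]

# Hypothetical stack: two stride-2 3x3 convs, then a stride-1 3x3 conv.
layers = [(3, 2, 1), (3, 2, 1), (3, 1, 1)]
stats = receptive_fields(layers)
print(stats)  # [(3, 2, 0.0), (7, 4, 0.0), (15, 4, 0.0)]
rf, jump, start = stats[-1]
print(anchor_centres(start, jump, 5))  # [0.0, 4.0, 8.0, 12.0, 16.0]
```

In this toy stack, neurons in the last layer see 15x15-pixel RFs centered every 4 pixels. Pairing each branch's RF size with a suitable stride in this spirit is how the abstract describes covering a continuous range of face scales.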

Journal ArticleDOI
TL;DR: Face-SSD is the first network to perform face analysis without relying on pre-processing such as face detection and registration in advance and achieves real-time performance even when detecting multiple faces and recognising multiple classes in a given image.