On RGB-D face recognition using Kinect
01 Sep 2013, pp. 1-6
TL;DR: The experimental results indicate that the RGB-D information obtained by Kinect can be used to achieve improved face recognition performance compared to existing 2D and 3D approaches.
Abstract: Face recognition algorithms generally use 2D images for feature extraction and matching. In order to achieve better performance, 3D faces captured via specialized acquisition methods have been used to develop improved algorithms. While such 3D images remain difficult to obtain due to several issues such as cost and accessibility, RGB-D images captured by low cost sensors (e.g. Kinect) are comparatively easier to acquire. This research introduces a novel face recognition algorithm for RGB-D images. The proposed algorithm computes a descriptor based on the entropy of RGB-D faces along with the saliency feature obtained from a 2D face. The probe RGB-D descriptor is used as input to a random decision forest classifier to establish the identity. This research also presents a novel RGB-D face database pertaining to 106 individuals. The experimental results indicate that the RGB-D information obtained by Kinect can be used to achieve improved face recognition performance compared to existing 2D and 3D approaches.
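The entropy feature described above can be sketched in a few lines. This is a minimal pure-Python illustration of a per-patch Shannon-entropy descriptor, not the paper's exact pipeline: the patch size is an illustrative assumption, and the saliency feature and random decision forest stages are omitted.

```python
# Sketch of a patch-entropy descriptor in the spirit of the RGB-D
# entropy features described in the abstract. The 4x4 patch size is an
# illustrative assumption; the paper's saliency and random-forest
# stages are not shown.
from collections import Counter
import math

def patch_entropy(values):
    """Shannon entropy (bits) of the intensity values in one patch."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_descriptor(image, patch=4):
    """Split a 2D intensity/depth map into non-overlapping patch x patch
    blocks and return the per-block entropies as a flat descriptor."""
    h, w = len(image), len(image[0])
    desc = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            values = [image[r + dr][c + dc]
                      for dr in range(patch) for dc in range(patch)]
            desc.append(patch_entropy(values))
    return desc
```

A flat (constant) patch yields entropy 0, while a patch of 16 distinct depth values yields the maximum log2(16) = 4 bits, so the descriptor responds to local texture in both the color and depth channels.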
Citations
TL;DR: Major deep learning concepts pertinent to face image analysis and face recognition are reviewed, and a concise overview of studies on specific face recognition problems is provided, such as handling variations in pose, age, illumination, expression, and heterogeneous face matching.
Abstract: Deep learning, in particular the deep convolutional neural networks, has received increasing interests in face recognition recently, and a number of deep learning methods have been proposed. This paper summarizes about 330 contributions in this area. It reviews major deep learning concepts pertinent to face image analysis and face recognition, and provides a concise overview of studies on specific face recognition problems, such as handling variations in pose, age, illumination, expression, and heterogeneous face matching. A summary of databases used for deep face recognition is given as well. Finally, some open challenges and directions are discussed for future research.
128 citations
TL;DR: The experimental results indicate that the proposed algorithm achieves high face recognition accuracy on RGB-D images obtained using Kinect compared with existing 2D and 3D approaches.
Abstract: Face recognition algorithms generally utilize 2D images for feature extraction and matching. To achieve higher resilience toward covariates, such as expression, illumination, and pose, 3D face recognition algorithms are developed. While it is challenging to use specialized 3D sensors due to high cost, RGB-D images can be captured by low-cost sensors such as Kinect. This research introduces a novel face recognition algorithm using RGB-D images. The proposed algorithm computes a descriptor based on the entropy of RGB-D faces along with the saliency feature obtained from a 2D face. Geometric facial attributes are also extracted from the depth image and face recognition is performed by fusing both the descriptor and attribute match scores. The experimental results indicate that the proposed algorithm achieves high face recognition accuracy on RGB-D images obtained using Kinect compared with existing 2D and 3D approaches.
72 citations
Cites methods from "On RGB-D face recognition using Kin..."
TL;DR: The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location.
Abstract: In this paper, we introduce a real-time face recognition (and announcement) system targeted at aiding the blind and low-vision people. The system uses a Microsoft Kinect sensor as a wearable device, performs face detection, and uses temporal coherence along with a simple biometric procedure to generate a sound associated with the identified person, virtualized at his/her estimated 3-D location. Our approach uses a variation of the K-nearest neighbors algorithm over histogram of oriented gradient descriptors dimensionally reduced by principal component analysis. The results show that our approach, on average, outperforms traditional face recognition methods while requiring much less computational resources (memory, processing power, and battery life) when compared with existing techniques in the literature, deeming it suitable for the wearable hardware constraints. We also show the performance of the system in the dark, using depth-only information acquired with Kinect's infrared camera. The validation uses a new dataset available for download, with 600 videos of 30 people, containing variation of illumination, background, and movement patterns. Experiments with existing datasets in the literature are also considered. Finally, we conducted user experience evaluations on both blindfolded and visually impaired users, showing encouraging results.
58 citations
Cites background from "On RGB-D face recognition using Kin..."
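The classification stage of the wearable system above, K-nearest neighbors over descriptors, can be sketched minimally. The toy 1-NN matcher below is an assumption-laden illustration: the descriptors are placeholder vectors, and the HOG extraction and PCA reduction steps described in the abstract are omitted.

```python
# Minimal 1-nearest-neighbour matcher over precomputed descriptors,
# sketching the classification stage of the wearable system. The
# gallery vectors here are toy data; HOG extraction and PCA reduction
# are omitted for brevity.
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_identify(probe, gallery):
    """Return the label of the gallery descriptor closest to `probe`.
    `gallery` is a list of (label, descriptor) pairs."""
    return min(gallery, key=lambda item: euclidean(probe, item[1]))[0]
```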
TL;DR: The experimental results demonstrate that NIRFaceNet has an overall advantage compared to other methods in the NIR face recognition domain when image blur and noise are present, suggesting that the proposed NIRFaceNet method may be more suitable for non-cooperative-user applications.
Abstract: Near-infrared (NIR) face recognition has attracted increasing attention because of its advantage of illumination invariance. However, traditional face recognition methods based on NIR are designed for and tested in cooperative-user applications. In this paper, we present a convolutional neural network (CNN) for NIR face recognition (specifically face identification) in non-cooperative-user applications. The proposed NIRFaceNet is modified from GoogLeNet, but has a more compact structure designed specifically for the Chinese Academy of Sciences Institute of Automation (CASIA) NIR database and can achieve higher identification rates with less training time and less processing time. The experimental results demonstrate that NIRFaceNet has an overall advantage compared to other methods in the NIR face recognition domain when image blur and noise are present. The performance suggests that the proposed NIRFaceNet method may be more suitable for non-cooperative-user applications.
52 citations
TL;DR: A survey of wearable/assistive devices and provides a critical presentation of each system, while emphasizing related strengths and limitations, to inform the research community and the VI people about the capabilities of existing systems, the progress in assistive technologies and provide a glimpse in the possible short/medium term axes of research that can improve existing devices.
Abstract: Recent statistics of the World Health Organization (WHO), published in October 2017, estimate that more than 253 million people worldwide suffer from visual impairment (VI), with 36 million blind and 217 million people with low vision. In the last decade, there was a tremendous amount of work in developing wearable assistive devices dedicated to visually impaired people, aiming at increasing the user's cognition when navigating in known/unknown, indoor/outdoor environments, and designed to improve the VI quality of life. This paper presents a survey of wearable/assistive devices and provides a critical presentation of each system, while emphasizing related strengths and limitations. The paper is designed to inform the research community and the VI people about the capabilities of existing systems and the progress in assistive technologies, and to provide a glimpse of the possible short/medium-term axes of research that can improve existing devices. The survey is based on various features and performance parameters, established with the help of the blind community, that allow systems classification using both qualitative and quantitative measures of evaluation. This makes it possible to rank the analyzed systems based on their potential impact on the VI people's life.
42 citations
References
TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Abstract: We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
28,803 citations
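The core of the HOG descriptor from the abstract above, quantizing gradient orientations into a magnitude-weighted histogram, can be sketched for a single cell. The 9 unsigned-orientation bins follow the paper's reported defaults; everything else (no block normalization, toy gradients) is a simplification.

```python
# Toy single-cell orientation histogram illustrating the core HOG idea:
# quantise each pixel's gradient orientation into a bin, weighting the
# vote by gradient magnitude. 9 bins over 0-180 degrees (unsigned
# orientation) follow the paper's defaults; block-level contrast
# normalisation is omitted.
import math

def cell_histogram(gx, gy, bins=9):
    """gx, gy: lists of per-pixel x/y gradients for one cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for dx, dy in zip(gx, gy):
        mag = math.hypot(dx, dy)
        angle = math.degrees(math.atan2(dy, dx)) % 180.0  # unsigned
        hist[min(int(angle / bin_width), bins - 1)] += mag
    return hist
```

A purely horizontal gradient votes into the first bin and a vertical one into the middle bin, so edge direction is preserved while the magnitude weighting emphasizes strong contours.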
TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Abstract: This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
17,417 citations
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
15,597 citations
"On RGB-D face recognition using Kin..." refers methods in this paper
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
9,639 citations
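The center-surround mechanism at the heart of the saliency model above can be illustrated with a crude box-filter version: conspicuity is the difference between a fine ("center") and a coarse ("surround") local average. The window radii are illustrative assumptions; the real model uses Gaussian pyramids over color, intensity, and orientation channels.

```python
# Toy center-surround operator capturing the core of the saliency
# model: a location is conspicuous where its fine-scale ("center")
# average differs from its coarse-scale ("surround") average. Box
# windows and the radii below stand in for the model's multiscale
# Gaussian pyramids.
def box_mean(img, r, c, radius):
    """Mean of the (2*radius+1)-square window around (r, c), clipped
    at the image border."""
    h, w = len(img), len(img[0])
    vals = [img[rr][cc]
            for rr in range(max(0, r - radius), min(h, r + radius + 1))
            for cc in range(max(0, c - radius), min(w, c + radius + 1))]
    return sum(vals) / len(vals)

def center_surround(img, center_r=0, surround_r=2):
    """Per-pixel |center mean - surround mean| as a crude saliency map."""
    h, w = len(img), len(img[0])
    return [[abs(box_mean(img, r, c, center_r)
                 - box_mean(img, r, c, surround_r))
             for c in range(w)] for r in range(h)]
```

A single bright pixel in a flat field produces the strongest center-surround response at its own location, which is the "pop-out" behavior the attention system exploits.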
TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.
Abstract: We describe an object detection system based on mixtures of multiscale deformable part models. Our system is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges. While deformable part models have become quite popular, their value had not been demonstrated on difficult benchmarks such as the PASCAL data sets. Our system relies on new methods for discriminative training with partially labeled data. We combine a margin-sensitive approach for data-mining hard negative examples with a formalism we call latent SVM. A latent SVM is a reformulation of MI-SVM in terms of latent variables. A latent SVM is semiconvex, and the training problem becomes convex once latent information is specified for the positive examples. This leads to an iterative training algorithm that alternates between fixing latent values for positive examples and optimizing the latent SVM objective function.
9,553 citations
"On RGB-D face recognition using Kin..." refers methods in this paper