Journal ArticleDOI

3D texture-based face recognition system using fine-tuned deep residual networks.

TL;DR: An end-to-end face recognition system based on 3D face texture is proposed, combining geometric invariants, histograms of oriented gradients, and fine-tuned residual neural networks, and it requires fewer training iterations than traditional methods.
Abstract: As 3D photography has developed rapidly in recent years, an enormous number of 3D images has been produced, and face recognition is one of the research directions built on them. Maintaining accuracy as the amount of data grows is crucial in 3D face recognition. Traditional machine learning methods can be used to recognize 3D faces, but their recognition rate declines rapidly as the number of 3D images increases, so classifying large amounts of 3D image data with them is time-consuming, expensive, and inefficient. Deep learning methods have therefore become the focus of 3D face recognition research. In our experiment, an end-to-end face recognition system based on 3D face texture is proposed, combining geometric invariants, histograms of oriented gradients, and fine-tuned residual neural networks. The results show that, when performance is evaluated on the FRGC-v2 dataset, increasing the depth of the fine-tuned ResNet raises the best Top-1 accuracy to 98.26% and the Top-2 accuracy to 99.40%. The proposed framework also requires fewer iterations than traditional methods. The analysis suggests that applying the proposed recognition framework to large amounts of 3D face data could significantly improve recognition decisions in realistic 3D face scenarios.
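The pipeline described above combines hand-crafted HOG descriptors with a fine-tuned ResNet. As a rough sketch of that kind of combination (not the paper's exact implementation), the snippet below computes a HOG map from a face texture image and fine-tunes a pretrained ResNet-50 whose classifier head has been replaced; the identity count and frozen-layer choice are illustrative assumptions.

# Minimal sketch, assuming 2D face texture images as input; not the paper's exact pipeline.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models
from skimage.feature import hog

NUM_IDENTITIES = 466  # illustrative: roughly the number of FRGC-v2 subjects; adjust to the actual split

def hog_map(gray_image: np.ndarray) -> np.ndarray:
    """Return the HOG visualization of a 2D texture image as a single-channel map."""
    _, hog_image = hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
                       cells_per_block=(2, 2), visualize=True)
    # The single-channel map can be repeated across three channels to match a pretrained ResNet's input.
    return hog_image.astype(np.float32)

# Fine-tune a pretrained ResNet: reuse ImageNet weights, replace the classifier head.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_IDENTITIES)

# Optionally freeze early layers and fine-tune only the deeper blocks.
for name, param in model.named_parameters():
    if not (name.startswith("layer4") or name.startswith("fc")):
        param.requires_grad = False

optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()),
                            lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()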
Citations
Journal ArticleDOI
01 Jan 2021-PeerJ
TL;DR: Zhang et al. employ a unified attention and context mapping (ACM) block within the convolutional layers of the network, without any additional computational resource overhead, to learn attention with reference to spatial locations in the context of the whole image.
Abstract: The discriminative parts of people's appearance play a significant role in their re-identification across non-overlapping camera views. However, focusing only on the discriminative or attention regions without considering the contextual information does not always help. It is more important to learn attention with reference to spatial locations in the context of the whole image. Current person re-identification (re-id) approaches use separate modules or classifiers to learn both of these, the attention and its context, resulting in highly expensive re-id solutions. In this work, instead of handling attention and context separately, we employ a unified attention and context mapping (ACM) block within the convolutional layers of the network, without any additional computational resource overhead. The ACM block captures the attention regions as well as the relevant contextual information in a stochastic manner and enriches the final person representations for robust re-identification. We evaluate the proposed method on four public person re-identification benchmarks, i.e., Market1501, DukeMTMC-Reid, CUHK03 and MSMT17, and find that the ACM block consistently improves re-identification performance over the baseline networks.
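For orientation only, the snippet below shows a generic lightweight spatial attention block of the kind that can be dropped into convolutional layers. It is a stand-in, not the ACM block itself; the stochastic attention-and-context mapping is specific to the paper.

# Illustrative sketch only: a low-parameter spatial attention module, NOT the paper's ACM block.
import torch
import torch.nn as nn

class SpatialAttentionSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # A single 7x7 conv over pooled channel statistics: a common low-cost design.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        avg_pool = x.mean(dim=1, keepdim=True)      # per-location average over channels
        max_pool, _ = x.max(dim=1, keepdim=True)    # per-location maximum over channels
        attention = torch.sigmoid(self.conv(torch.cat([avg_pool, max_pool], dim=1)))
        return x * attention                        # reweight features by spatial attention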

8 citations

Book ChapterDOI
07 Oct 2020
TL;DR: In this article, 3D face recognition is performed under different facial expressions and occlusions on data from 105 people, using the Bosphorus database and 3D point clouds.
Abstract: With developing technology and urbanization, smart city applications have increased, and this development has brought difficulties such as public security risk. Verifying people’s identities is a requirement in smart city, smart environment, and smart interaction applications, and face recognition has huge potential for such identification. The development of deep learning methods has made it possible to perform face recognition on larger databases and in more varied situations. 2D images are usually used for face recognition, but challenges such as pose change and illumination cause difficulties in 2D recognition. Laser scanning technology has enabled the production of 3D point clouds that contain the geometric information of faces, and when these point clouds are combined with deep learning techniques, 3D face recognition has great potential. In this study, 2D images were created for facial recognition from feature maps obtained from 3D point clouds. ResNet-18, ResNet-50 and ResNet-101, different versions of the ResNet architecture, were used for classification. The Bosphorus database was used, and 3D face recognition was performed with different facial expressions and occlusions based on data from 105 people. The overall accuracies obtained with the ResNet-18, ResNet-50, and ResNet-101 architectures were 77.36%, 77.03% and 81.54%, respectively.
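As a hedged sketch of the general pipeline (project a 3D point cloud to an image-like map, then classify it with a ResNet), the snippet below rasterizes a point cloud into a depth map and sets up a ResNet-18 with a 105-class head. The projection, resolution, and preprocessing are illustrative assumptions, not the chapter's exact feature maps.

# Minimal sketch under assumptions: point cloud -> depth image -> ResNet classifier.
import numpy as np
import torch.nn as nn
from torchvision import models

def depth_map_from_points(points: np.ndarray, size: int = 224) -> np.ndarray:
    """Rasterize an (N, 3) point cloud into a size x size depth image using x, y as pixel coordinates."""
    xy = points[:, :2]
    z = points[:, 2]
    xy = (xy - xy.min(axis=0)) / (np.ptp(xy, axis=0) + 1e-8)     # normalize coordinates to [0, 1]
    cols = np.clip((xy[:, 0] * (size - 1)).astype(int), 0, size - 1)
    rows = np.clip((xy[:, 1] * (size - 1)).astype(int), 0, size - 1)
    depth = np.zeros((size, size), dtype=np.float32)
    np.maximum.at(depth, (rows, cols), z)                         # keep the maximum z value per pixel
    return depth

# The single-channel depth map can be repeated across 3 channels before being fed to the network.
model = models.resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 105)  # 105 subjects in the Bosphorus setup described above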

5 citations

Book ChapterDOI
TL;DR: In this paper, the authors present an approach to customize the MobileNet architecture and automatically find a good architecture variant for the 3D face recognition task, based on splitting the input and lengthening the network by layer replication.
Abstract: Facial recognition is a problem that has received attention for a long time. In this paper, we consider a 3D face dataset and explore its facial recognition task with automatic architecture search. We present an approach to customize the MobileNet architecture and automatically find a good architecture variant for the 3D face recognition task. The main concept is based on splitting the input and lengthening the network by layer replication. The evaluation uses a dataset generated by a GAN model with style transfer to augment made-up faces. The results show that the modified model found by our automatic search is more cost-effective: its size increases by only 0.005% compared to the baseline 3D MobileNet and 0.01% compared to a plain MobileNet, while its accuracy is 12% higher than the 3D MobileNetV2 and 11% higher than the traditional MobileNetV2.
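The "layer replication" idea can be illustrated roughly as follows: deep-copy an inverted-residual block of MobileNetV2 whose input and output channel counts match and insert the copy back into the feature extractor. This is only an assumption-laden sketch; the chapter's input-splitting scheme and automatic search procedure are not reproduced here.

# Rough illustration of lengthening MobileNetV2 by replicating one compatible block.
import copy
import torch.nn as nn
from torchvision import models

base = models.mobilenet_v2(weights=None)
features = list(base.features)

# features[8] keeps 64 input and 64 output channels with stride 1, so a copy can be stacked directly.
replicated = copy.deepcopy(features[8])
features.insert(9, replicated)

base.features = nn.Sequential(*features)
base.classifier[1] = nn.Linear(base.last_channel, 100)  # hypothetical number of identities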

1 citation

DOI
01 Jan 2019
TL;DR: A Gabor wavelet feature extraction method based on three-dimensional human face contour lines is put forward, which achieves a higher recognition rate than several traditional schemes.
Abstract: After more than ten years of research, face recognition based on 2D face images is still challenged by large changes in illumination, pose and expression, and its recognition rate remains far from satisfactory under such changes. Gabor wavelets and support vector machines (SVM) are studied here. Gabor wavelets model the response of biological visual neurons well and adapt well to illumination changes, while SVMs provide good classification performance. Gabor wavelets and SVM are combined with PCA to improve the face recognition rate. In addition, we put forward a Gabor wavelet feature extraction method based on three-dimensional human face contour lines. Five different types of face recognition are used to verify the effectiveness of this algorithm. The experimental results show that the proposed scheme has a higher recognition rate than several traditional schemes.
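A minimal sketch of the classical building blocks mentioned above (Gabor responses, PCA, SVM) is given below. The 3D contour-line feature extraction that the paper actually proposes is not reproduced, and the filter bank and hyperparameters are illustrative assumptions.

# Minimal sketch, assuming 2D texture inputs: Gabor features -> PCA -> SVM.
import numpy as np
from skimage.filters import gabor
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

def gabor_features(image: np.ndarray) -> np.ndarray:
    """Concatenate real Gabor responses over a small bank of frequencies and orientations."""
    responses = []
    for frequency in (0.1, 0.2, 0.3):
        for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):
            real, _ = gabor(image, frequency=frequency, theta=theta)
            responses.append(real.ravel())
    return np.concatenate(responses)

# X: (n_samples, n_features) Gabor feature matrix, y: identity labels (assumed to exist elsewhere).
clf = make_pipeline(PCA(n_components=100), SVC(kernel="rbf", C=10.0))
# clf.fit(X_train, y_train); clf.predict(X_test)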

1 citation

Journal ArticleDOI
27 May 2021-PeerJ
TL;DR: In this article, the authors propose a distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution, achieved by defining a regularized density-decreasing direction that creates paths from the original pixels' colors to the distribution tails.
Abstract: In the last decade, deep learning has been applied to a wide range of problems with tremendous success. This success mainly comes from large data availability, increased computational power, and theoretical improvements in the training phase. As the dataset grows, the real world is better represented, making it possible to develop a model that can generalize. However, creating a labeled dataset is expensive, time-consuming, and sometimes infeasible in certain domains. Researchers have therefore proposed data augmentation methods that increase dataset size and variety by creating variations of the existing data. For image data, variations can be obtained by applying color or spatial transformations, either individually or in combination. Color transformations perform linear or nonlinear operations on the entire image or on patches to create variations of the original image. Current color-based augmentation methods are usually built on image processing operations such as equalizing, solarizing, and posterizing. Nevertheless, these color-based methods are not guaranteed to create plausible variations of the image. This paper proposes a novel distribution-preserving data augmentation method that creates plausible image variations by shifting pixel colors to another point in the image color distribution. We achieve this by defining a regularized density-decreasing direction to create paths from the original pixels' colors to the distribution tails. The proposed method provides superior performance compared to existing data augmentation methods, as shown in a transfer learning scenario on the UC Merced Land-use, Intel Image Classification, and Oxford-IIIT Pet datasets for classification and segmentation tasks.
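As a crude illustration only, not the paper's method, the snippet below nudges pixel colors away from the per-image mean color, a simple stand-in for the regularized density-decreasing direction described in the abstract.

# Stand-in sketch: shift pixel colors within the image's own color distribution.
import numpy as np

def color_shift_augment(image: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """image: float32 array in [0, 1] with shape (H, W, 3)."""
    mean_color = image.reshape(-1, 3).mean(axis=0)          # center of the color distribution
    direction = image - mean_color                           # per-pixel direction toward the tails
    norm = np.linalg.norm(direction, axis=-1, keepdims=True) + 1e-8
    shifted = image + strength * direction / norm            # small step along that direction
    return np.clip(shifted, 0.0, 1.0)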

1 citation

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40], but still with lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
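The core idea, a block that learns a residual function and adds it back to its input through an identity shortcut, can be sketched as follows; channel counts and layer choices are illustrative.

# Minimal sketch of a residual block with an identity shortcut (same-channel case).
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
        return self.relu(x + residual)   # identity shortcut: output = F(x) + x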

123,388 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Abstract: We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.
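The multi-scale idea behind an Inception module can be sketched as parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch whose outputs are concatenated; the channel counts below are illustrative, not GoogLeNet's.

# Simplified sketch of an Inception-style module with parallel multi-scale branches.
import torch
import torch.nn as nn

class InceptionSketch(nn.Module):
    def __init__(self, in_channels: int):
        super().__init__()
        self.branch1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3 = nn.Sequential(nn.Conv2d(in_channels, 16, kernel_size=1),
                                     nn.Conv2d(16, 24, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(nn.Conv2d(in_channels, 16, kernel_size=1),
                                     nn.Conv2d(16, 24, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                         nn.Conv2d(in_channels, 16, kernel_size=1))

    def forward(self, x):
        # Concatenate all branch outputs along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)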

40,257 citations

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
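The normalization step itself can be sketched in a few lines: normalize each feature by its mini-batch statistics, then rescale and shift with learnable parameters. In practice, frameworks provide this directly, e.g. torch.nn.BatchNorm1d / BatchNorm2d.

# Minimal sketch of the batch normalization computation for a mini-batch of activations.
import torch

def batch_norm_sketch(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """x: (batch, features); gamma, beta: (features,) learnable scale and shift."""
    mean = x.mean(dim=0)                     # per-feature mini-batch mean
    var = x.var(dim=0, unbiased=False)       # per-feature mini-batch variance
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta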

30,843 citations

Journal ArticleDOI
TL;DR: This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the architecture of the network, successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.
Abstract: The ability of learning networks to generalize can be greatly enhanced by providing constraints from the task domain. This paper demonstrates how such constraints can be integrated into a backpropagation network through the architecture of the network. This approach has been successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service. A single network learns the entire recognition operation, going from the normalized image of the character to the final classification.
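The architectural constraints in question, local receptive fields, shared weights, and subsampling, are what convolutional layers implement. A minimal LeNet-style sketch is shown below; the layer sizes are illustrative, assume a 28×28 grayscale input, and are not the original 1989 configuration.

# Illustrative sketch of a small convolutional network with weight sharing and subsampling.
import torch.nn as nn

lenet_like = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # shared weights, local connectivity
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # second conv + subsampling stage
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.Tanh(),                        # 4x4 spatial size for 28x28 input
    nn.Linear(120, 10),                                           # 10 digit classes
)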

9,775 citations


"3D texture-based face recognition s..." refers methods in this paper

  • ...Her CNN’s layer configuration is designed on the same principle, based on the LeCun model (LeCun et al., 1989)....

    [...]

Posted Content
TL;DR: This paper quantifies the generality versus specificity of neurons in each layer of a deep convolutional neural network and reports a few surprising results, including that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
Abstract: Many deep neural networks trained on natural images exhibit a curious phenomenon in common: on the first layer they learn features similar to Gabor filters and color blobs. Such first-layer features appear not to be specific to a particular dataset or task, but general in that they are applicable to many datasets and tasks. Features must eventually transition from general to specific by the last layer of the network, but this transition has not been studied extensively. In this paper we experimentally quantify the generality versus specificity of neurons in each layer of a deep convolutional neural network and report a few surprising results. Transferability is negatively affected by two distinct issues: (1) the specialization of higher layer neurons to their original task at the expense of performance on the target task, which was expected, and (2) optimization difficulties related to splitting networks between co-adapted neurons, which was not expected. In an example network trained on ImageNet, we demonstrate that either of these two issues may dominate, depending on whether features are transferred from the bottom, middle, or top of the network. We also document that the transferability of features decreases as the distance between the base task and target task increases, but that transferring features even from distant tasks can be better than using random features. A final surprising result is that initializing a network with transferred features from almost any number of layers can produce a boost to generalization that lingers even after fine-tuning to the target dataset.
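The transfer setup studied here, copying the first layers from a network trained on a base task and then freezing or fine-tuning them on the target task, can be sketched as follows; the cut point after layer2 and the class count are illustrative assumptions.

# Minimal sketch of transferring early layers and fine-tuning the rest on a target task.
import torch.nn as nn
from torchvision import models

pretrained = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
target_net = models.resnet18(weights=None)

# Transfer the early layers' weights from the base network.
for src, dst in zip([pretrained.conv1, pretrained.bn1, pretrained.layer1, pretrained.layer2],
                    [target_net.conv1, target_net.bn1, target_net.layer1, target_net.layer2]):
    dst.load_state_dict(src.state_dict())

# Either freeze the transferred layers or let them fine-tune along with the rest.
for module in [target_net.conv1, target_net.bn1, target_net.layer1, target_net.layer2]:
    for p in module.parameters():
        p.requires_grad = False   # set True to reproduce the "fine-tuned" variant

target_net.fc = nn.Linear(target_net.fc.in_features, 50)  # hypothetical number of target classes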

4,663 citations