
Proceedings ArticleDOI

Triplet Transform Learning for Automated Primate Face Recognition

01 Sep 2019, pp. 3462-3466

TL;DR: A novel Triplet Transform Learning (TTL) model for learning discriminative representations of primate faces is proposed; it outperforms existing approaches and attains state-of-the-art performance on the primate database.

Abstract: Automated primate face recognition has enormous potential in the effective conservation of species facing endangerment or extinction. The task is characterized by a lack of training data, low inter-class variations, and large intra-class differences. Owing to the challenging nature of the problem, limited research has been performed to automate the process of primate face recognition. In this research, we propose a novel Triplet Transform Learning (TTL) model for learning discriminative representations of primate faces. The proposed model reduces the intra-class variations and increases the inter-class variations to obtain robust sparse representations for the primate faces. It is utilized to present a novel framework for primate face recognition, which is evaluated on the primate dataset, comprising 80 identities including monkeys, gorillas, and chimpanzees. Experimental results demonstrate the efficacy of the proposed approach, which outperforms the existing approaches and attains state-of-the-art performance on the primate database.
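The triplet objective at the heart of the model, pulling same-identity representations together while pushing different identities apart by a margin, can be sketched as follows. This is a minimal NumPy illustration of a generic hinge-style triplet loss, not the paper's actual transform-learning formulation; the function name and margin value are assumptions:

```python
import numpy as np

def triplet_hinge(t_a, t_p, t_n, margin=1.0):
    """Hinge-style triplet objective on representation vectors:
    pull the anchor toward the positive (same identity) and push it
    away from the negative (different identity) by at least `margin`."""
    d_ap = np.sum((t_a - t_p) ** 2)  # intra-class (anchor-positive) distance
    d_an = np.sum((t_a - t_n) ** 2)  # inter-class (anchor-negative) distance
    return max(0.0, d_ap - d_an + margin)
```

When the negative already sits farther than the positive by more than the margin, the loss is zero and the triplet contributes no gradient; violating triplets yield a positive penalty.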



References
Proceedings ArticleDOI
07 Jun 2015
TL;DR: A system that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity, achieving state-of-the-art face recognition performance using only 128 bytes per face.
Abstract: Despite significant recent advances in the field of face recognition [10, 14, 15, 17], implementing face verification and recognition efficiently at scale presents serious challenges to current approaches. In this paper we present a system, called FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity. Once this space has been produced, tasks such as face recognition, verification and clustering can be easily implemented using standard techniques with FaceNet embeddings as feature vectors.
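The verification task described in the abstract reduces to thresholding distances in the learned embedding space. The sketch below is a hedged illustration of that idea, not FaceNet's actual pipeline; the function name and threshold value are assumptions:

```python
import numpy as np

def same_identity(emb1, emb2, threshold=1.1):
    """Verification in embedding space: two faces match when the
    Euclidean distance between their L2-normalized embeddings falls
    below a threshold tuned on a validation set."""
    e1 = emb1 / np.linalg.norm(emb1)
    e2 = emb2 / np.linalg.norm(emb2)
    return np.linalg.norm(e1 - e2) < threshold
```

Recognition and clustering follow the same pattern: nearest-neighbor search or standard clustering applied directly to the embedding vectors.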

8,289 citations


"Triplet Transform Learning for Auto..." refers to methods in this paper

  • ...In literature, the triplet loss framework [12] has been well explored as an effective technique to augment the quantity of training data, while introducing separability between different classes....


Proceedings ArticleDOI
01 Jan 2015
TL;DR: It is shown how a very large scale dataset can be assembled by a combination of automation and human in the loop, and the trade off between data purity and time is discussed.
Abstract: The goal of this paper is face recognition – from either a single photograph or from a set of faces tracked in a video. Recent progress in this area has been due to two factors: (i) end to end learning for the task using a convolutional neural network (CNN), and (ii) the availability of very large scale training datasets. We make two contributions: first, we show how a very large scale dataset (2.6M images, over 2.6K people) can be assembled by a combination of automation and human in the loop, and discuss the trade off between data purity and time; second, we traverse through the complexities of deep network training and face recognition to present methods and procedures to achieve comparable state of the art results on the standard LFW and YTF face benchmarks.

4,347 citations


"Triplet Transform Learning for Auto..." refers to methods in this paper

  • ...3, the proposed framework utilizes the TTL model with features extracted from the VGG-Face [15] and Light-CNN (LCNN) [16] architectures....


  • ...[9], where, results have been reported using Linear Discriminant Analysis (LDA) [18], VGG-Face [15], and Fine-tuned VGG-Face....


Journal ArticleDOI
TL;DR: The discriminatory power of various human facial features is studied, and an efficient projection-based feature extraction and classification scheme for Automatic Face Recognition (AFR) is proposed.
Abstract: In this paper the discriminatory power of various human facial features is studied and a new scheme for Automatic Face Recognition (AFR) is proposed. Using Linear Discriminant Analysis (LDA) of different aspects of human faces in spatial domain, we first evaluate the significance of visual information in different parts/features of the face for identifying the human subject. The LDA of faces also provides us with a small set of features that carry the most relevant information for classification purposes. The features are obtained through eigenvector analysis of scatter matrices with the objective of maximizing between-class and minimizing within-class variations. The result is an efficient projection-based feature extraction and classification scheme for AFR. Soft decisions made based on each of the projections are combined, using probabilistic or evidential approaches to multisource data analysis. For medium-sized databases of human faces, good classification accuracy is achieved using very low-dimensional feature vectors.
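The Fisher criterion underlying LDA, maximizing between-class scatter relative to within-class scatter, has a compact closed form in the two-class case. The sketch below is a generic NumPy illustration on synthetic data, not the paper's multi-class, multi-feature pipeline; the function name and toy distributions are assumptions:

```python
import numpy as np

def fisher_direction(X0, X1):
    """Two-class Fisher LDA: the projection direction maximizing
    between-class separation relative to within-class scatter is
    proportional to Sw^{-1} (mu1 - mu0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)  # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

# Synthetic stand-in for two subjects' feature vectors
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 5))
X1 = rng.normal(2.0, 1.0, size=(100, 5))
w = fisher_direction(X0, X1)
# Projections of the two classes separate cleanly along w
```

In the multi-class setting the same criterion is solved as a generalized eigenvector problem on the scatter matrices, yielding the low-dimensional feature vectors the abstract refers to.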

874 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces and achieves state-of-the-art results on various face benchmarks without fine-tuning.
Abstract: The volume of convolutional neural network (CNN) models proposed for face recognition has been continuously growing larger to better fit the large amount of training data. When training data are obtained from the Internet, the labels are likely to be ambiguous and inaccurate. This paper presents a Light CNN framework to learn a compact embedding on the large-scale face data with massive noisy labels. First, we introduce a variation of maxout activation, called max-feature-map (MFM), into each convolutional layer of CNN. Different from maxout activation that uses many feature maps to linearly approximate an arbitrary convex activation function, MFM does so via a competitive relationship. MFM can not only separate noisy and informative signals but also play the role of feature selection between two feature maps. Second, three networks are carefully designed to obtain better performance, meanwhile, reducing the number of parameters and computational costs. Finally, a semantic bootstrapping method is proposed to make the prediction of the networks more consistent with noisy labels. Experimental results show that the proposed framework can utilize large-scale noisy data to learn a Light model that is efficient in computational costs and storage spaces. The learned single network with a 256-D representation achieves state-of-the-art results on various face benchmarks without fine-tuning.
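The max-feature-map (MFM) operation described in the abstract is simple to state: split the feature maps into two halves and keep the element-wise maximum, so the two halves compete and the width is halved. A minimal NumPy sketch (the function name is an assumption; Light CNN applies this per convolutional layer, not to flat vectors):

```python
import numpy as np

def max_feature_map(x):
    """Max-Feature-Map (MFM) activation: split the channel dimension
    in half and take the element-wise maximum of the two halves,
    acting as a competitive feature selector (and halving the width)."""
    assert x.shape[-1] % 2 == 0, "channel dimension must be even"
    a, b = np.split(x, 2, axis=-1)
    return np.maximum(a, b)
```

Unlike maxout, which approximates convex activations with many feature maps, MFM uses exactly two, which is what keeps the resulting model light.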

617 citations


"Triplet Transform Learning for Auto..." refers to methods in this paper

  • ...In the proposed framework, two separate transforms are learned for the VGG-Face and LCNN features, respectively....


  • ...3, the proposed framework utilizes the TTL model with features extracted from the VGG-Face [15] and Light-CNN (LCNN) [16] architectures....


  • ...Analysis of the Proposed Framework: The effectiveness of the TTL model can be observed by comparing the performance of VGG-Face (LCNN)+Cosine distance with VGGFace (LCNN)+TTL+Cosine distance (Table 1)....


  • ...Further, the proposed framework demonstrates an improvement of around 6% as compared to independent VGGFace/LCNN+TTL+Cosine distance performance, thereby strengthening the inclusion of score-level fusion in the recognition pipeline....


  • ...VGG-Face and LCNN are recent CNN architectures which have shown to perform well for face recognition, having the penultimate layer dimension as 4096 and 256, respectively....

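The score-level fusion mentioned in the excerpts above can be sketched as combining per-stream cosine similarities from the VGG-Face and LCNN branches. The weighted-sum rule, weight, and function names below are assumptions for illustration, since the excerpts do not specify the fusion rule:

```python
import numpy as np

def cosine_distance(u, v):
    """Cosine distance between two feature vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def fused_score(vgg_probe, vgg_gallery, lcnn_probe, lcnn_gallery, w=0.5):
    """Score-level fusion: per-stream match scores (here, cosine
    similarities) are combined into one similarity via a weighted sum
    (hypothetical rule; the paper's exact scheme is not given here)."""
    s_vgg = 1.0 - cosine_distance(vgg_probe, vgg_gallery)
    s_lcnn = 1.0 - cosine_distance(lcnn_probe, lcnn_gallery)
    return w * s_vgg + (1 - w) * s_lcnn
```

Fusing at the score level lets the 4096-dimensional VGG-Face stream and the 256-dimensional LCNN stream contribute on equal footing without concatenating incompatible feature spaces.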