Author

Xi Yin

Bio: Xi Yin is an academic researcher. The author has contributed to research in topics: Deep learning & Neocognitron. The author has an h-index of 1 and has co-authored 1 publication receiving 17 citations.

Papers
Posted Content
15 Feb 2017
TL;DR: This article proposed a pose-directed multi-task CNN by grouping different poses to learn pose-specific identity features, simultaneously across all poses, and observed that the side tasks serve as regularizations to disentangle the variations from the learnt identity features.
Abstract: This paper explores multi-task learning (MTL) for face recognition. We answer the questions of how and why MTL can improve the face recognition performance. First, we propose a multi-task Convolutional Neural Network (CNN) for face recognition where identity recognition is the main task and pose, illumination, and expression estimations are the side tasks. Second, we develop a dynamic-weighting scheme to automatically assign the loss weight to each side task. Third, we propose a pose-directed multi-task CNN by grouping different poses to learn pose-specific identity features, simultaneously across all poses. We observe that the side tasks serve as regularizations to disentangle the variations from the learnt identity features. Extensive experiments on the entire Multi-PIE dataset demonstrate the effectiveness of the proposed approach. To the best of our knowledge, this is the first work using all data in Multi-PIE for face recognition. Our approach is also applicable to in-the-wild datasets for pose-invariant face recognition, and we achieve comparable or better performance than the state of the art on LFW, CFP, and IJB-A.
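
A minimal sketch of the setup described above may help make it concrete: identity recognition as the main task, with the side tasks weighted by a learnable, softmax-normalized scheme. The layer sizes, head dimensions, and the exact weighting rule below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: multi-task face CNN with dynamically weighted side tasks.
# Backbone sizes and the weighting rule are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskFaceCNN(nn.Module):
    def __init__(self, n_ids=337, n_poses=15, n_illums=20, n_exprs=6):
        super().__init__()
        self.backbone = nn.Sequential(               # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.id_head = nn.Linear(64, n_ids)          # main task: identity
        self.pose_head = nn.Linear(64, n_poses)      # side tasks
        self.illum_head = nn.Linear(64, n_illums)
        self.expr_head = nn.Linear(64, n_exprs)
        # one learnable logit per side task; softmax yields dynamic loss weights
        self.side_logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):
        f = self.backbone(x)
        return self.id_head(f), self.pose_head(f), self.illum_head(f), self.expr_head(f)

    def loss(self, outs, labels):
        id_out, pose_out, illum_out, expr_out = outs
        id_y, pose_y, illum_y, expr_y = labels
        w = torch.softmax(self.side_logits, dim=0)   # side weights sum to 1
        side = (w[0] * F.cross_entropy(pose_out, pose_y)
                + w[1] * F.cross_entropy(illum_out, illum_y)
                + w[2] * F.cross_entropy(expr_out, expr_y))
        return F.cross_entropy(id_out, id_y) + side
```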

17 citations


Cited by
Journal ArticleDOI
TL;DR: A Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties that demonstrate the superiority of DR-GAN over the state of the art in both learning representations and rotating large-pose face images.
Abstract: The large pose discrepancy between two face images is one of the fundamental challenges in automatic face recognition. Conventional approaches to pose-invariant face recognition either perform face frontalization on, or learn a pose-invariant representation from, a non-frontal face image. We argue that it is more desirable to perform both tasks jointly to allow them to leverage each other. To this end, this paper proposes a Disentangled Representation learning-Generative Adversarial Network (DR-GAN) with three distinct novelties. First, the encoder-decoder structure of the generator enables DR-GAN to learn a representation that is both generative and discriminative, which can be used for face image synthesis and pose-invariant face recognition. Second, this representation is explicitly disentangled from other face variations such as pose, through the pose code provided to the decoder and pose estimation in the discriminator. Third, DR-GAN can take one or multiple images as the input, and generate one unified identity representation along with an arbitrary number of synthetic face images. Extensive quantitative and qualitative evaluation on a number of controlled and in-the-wild databases demonstrate the superiority of DR-GAN over the state of the art in both learning representations and rotating large-pose face images.
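
The encoder-decoder generator is the core of the idea above; the hedged sketch below shows one way it could look, with the decoder conditioned on the identity feature, an explicit pose code, and noise. All dimensions and layer choices are assumptions, not the published architecture.

```python
# Hedged sketch of a DR-GAN-style generator: encoder distills an identity
# representation; decoder re-synthesizes a face at a target pose.
import torch
import torch.nn as nn

class DRGANGenerator(nn.Module):
    def __init__(self, feat_dim=320, n_poses=13, noise_dim=50):
        super().__init__()
        self.encoder = nn.Sequential(                # image -> identity feature
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.decoder = nn.Sequential(                # (feature, pose, noise) -> image
            nn.Linear(feat_dim + n_poses + noise_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, x, pose_code, noise):
        identity = self.encoder(x)                   # disentangled via training
        return self.decoder(torch.cat([identity, pose_code, noise], dim=1)), identity
```

For multi-image input, the encoder features of all images of a subject could be fused (e.g. averaged) into the single unified representation the abstract mentions before decoding.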

142 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper proposes Deep Residual Equivariant Mapping (DREAM), a block that adaptively adds residuals to the input deep representation to transform a profile face representation to a canonical pose.
Abstract: Face recognition achieves exceptional success thanks to the emergence of deep learning. However, many contemporary face recognition models still perform relatively poorly in processing profile faces compared to frontal faces. A key reason is that the numbers of frontal and profile training faces are highly imbalanced - there are extensively more frontal training samples than profile ones. In addition, it is intrinsically hard to learn a deep representation that is geometrically invariant to large pose variations. In this study, we hypothesize that there is an inherent mapping between frontal and profile faces, and consequently, their discrepancy in the deep representation space can be bridged by an equivariant mapping. To exploit this mapping, we formulate a novel Deep Residual EquivAriant Mapping (DREAM) block, which is capable of adaptively adding residuals to the input deep representation to transform a profile face representation to a canonical pose that simplifies recognition. The DREAM block consistently enhances the performance of profile face recognition for many strong deep networks, including ResNet models, without deliberately augmenting training data of profile faces. The block is easy to use, light-weight, and can be implemented with negligible computational overhead.
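
A hedged sketch of the residual-with-gating idea: a small branch produces a correction that is added to a fixed face embedding, scaled by how far the face is from frontal. The branch layout and the form of the yaw gate are simplified assumptions.

```python
# Hedged sketch of a DREAM-style block: residual correction on an embedding,
# gated by an estimated pose coefficient. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DREAMBlock(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.residual = nn.Sequential(          # small bottleneck branch
            nn.Linear(feat_dim, feat_dim // 4), nn.PReLU(),
            nn.Linear(feat_dim // 4, feat_dim),
        )

    def forward(self, feat, yaw_coeff):
        # yaw_coeff in [0, 1]: ~0 for frontal faces (feature passes through
        # almost unchanged), ~1 for full profile (full correction applied)
        return feat + yaw_coeff.unsqueeze(1) * self.residual(feat)

# usage with embeddings from any stock face network, e.g. a ResNet
feat = torch.randn(4, 256)                      # batch of face embeddings
yaw = torch.tensor([0.0, 0.3, 0.8, 1.0])        # estimated pose coefficients
corrected = DREAMBlock(256)(feat, yaw)
```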

138 citations

Journal ArticleDOI
TL;DR: A deep multitask relationship learning network (DMTRL) that first obtains expressive and robust cardiac representations with a deep convolutional neural network, then models the temporal dynamics of cardiac sequences effectively with two parallel recurrent neural network (RNN) modules, and estimates the cardiac phase with a softmax classifier.
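
Based only on this summary, one plausible shape of the pipeline is sketched below: a per-frame CNN, two parallel recurrent modules over the cardiac sequence, and a softmax classifier for the phase. The sizes and the choice of LSTMs are assumptions.

```python
# Hedged sketch of a DMTRL-style pipeline inferred from the summary alone.
import torch
import torch.nn as nn

class DMTRLSketch(nn.Module):
    def __init__(self, feat_dim=128, hidden=64, n_phases=2):
        super().__init__()
        self.cnn = nn.Sequential(                    # per-frame representation
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim),
        )
        self.rnn_a = nn.LSTM(feat_dim, hidden, batch_first=True)  # parallel RNNs
        self.rnn_b = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, n_phases)         # softmax phase

    def forward(self, frames):                       # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        f = self.cnn(frames.flatten(0, 1)).view(B, T, -1)
        a, _ = self.rnn_a(f)
        b, _ = self.rnn_b(f)
        return self.classifier(torch.cat([a, b], dim=-1))  # logits per frame
```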

133 citations

Book ChapterDOI
08 Sep 2018
TL;DR: A general modulation module is proposed, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks with minor parameters addition.
Abstract: Multi-task learning has been widely adopted in many computer vision tasks to improve overall computation efficiency or boost the performance of individual tasks, under the assumption that those tasks are correlated and complementary to each other. However, the relationships between the tasks are complicated in practice, especially when the number of involved tasks scales up. When two tasks are of weak relevance, they may compete or even distract each other during joint training of shared parameters, and as a consequence undermine the learning of all the tasks. This raises destructive interference, which decreases the learning efficiency of the shared parameters and leads to a low-quality local optimum of the loss w.r.t. the shared parameters. To address this problem, we propose a general modulation module, which can be inserted into any convolutional neural network architecture, to encourage the coupling and feature sharing of relevant tasks while disentangling the learning of irrelevant tasks, with only a minor addition of parameters. Equipped with this module, gradient directions from different tasks can be enforced to be consistent for those shared parameters, which benefits multi-task joint training. The module is end-to-end learnable without ad-hoc design for specific tasks, and can naturally handle many tasks at the same time. We apply our approach on two retrieval tasks, face retrieval on the CelebA dataset [12] and product retrieval on the UT-Zappos50K dataset [34, 35], and demonstrate its advantage over other multi-task learning methods in both accuracy and storage efficiency.
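
One plausible reading of such a modulation module is a learned, task-specific channel mask over shared feature maps, sketched below. The channel-mask form and sizes are assumptions about the module's shape, not the paper's exact design.

```python
# Hedged sketch: task-specific channel-wise modulation of a shared feature map.
import torch
import torch.nn as nn

class ModulationModule(nn.Module):
    def __init__(self, n_tasks, n_channels):
        super().__init__()
        # one learnable per-channel scale vector per task
        self.masks = nn.Parameter(torch.ones(n_tasks, n_channels))

    def forward(self, shared_feat, task_id):
        # shared_feat: (B, C, H, W) output of a shared conv layer
        mask = self.masks[task_id].view(1, -1, 1, 1)
        return shared_feat * mask

# insert after any shared conv layer; added parameters are only n_tasks * C
mod = ModulationModule(n_tasks=2, n_channels=64)
x = torch.randn(8, 64, 16, 16)
face_branch = mod(x, task_id=0)                 # relevant tasks share channels
shoe_branch = mod(x, task_id=1)                 # irrelevant tasks are decoupled
```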

113 citations

Journal ArticleDOI
TL;DR: A Dual-Agent Generative Adversarial Network (DA-GAN) model is proposed, which can improve the realism of a face simulator's output using unlabeled real faces while preserving the identity information during the realism refinement, and is a promising new approach for solving generic transfer learning problems more effectively.
Abstract: Synthesizing realistic profile faces is beneficial for more efficiently training deep pose-invariant models for large-scale unconstrained face recognition, by augmenting the number of samples with extreme poses and avoiding costly annotation work. However, learning from synthetic faces may not achieve the desired performance due to the discrepancy between the distributions of the synthetic and real face images. To narrow this gap, we propose a Dual-Agent Generative Adversarial Network (DA-GAN) model, which can improve the realism of a face simulator's output using unlabeled real faces while preserving the identity information during the realism refinement. The dual agents are specially designed for distinguishing real versus fake and identities simultaneously. In particular, we employ an off-the-shelf 3D face model as a simulator to generate profile face images with varying poses. DA-GAN leverages a fully convolutional network as the generator to generate high-resolution images and an auto-encoder as the discriminator with the dual agents. Besides the novel architecture, we make several key modifications to the standard GAN to preserve pose, texture, and identity, and to stabilize the training process: (i) a pose perception loss; (ii) an identity perception loss; (iii) an adversarial loss with a boundary equilibrium regularization term. Experimental results show that DA-GAN not only achieves outstanding perceptual results but also significantly outperforms the state of the art on the large-scale and challenging NIST IJB-A and CFP unconstrained face recognition benchmarks. In addition, the proposed DA-GAN is a promising new approach for solving generic transfer learning problems more effectively. DA-GAN is the foundation of our winning entry to the NIST IJB-A face recognition competition, in which we secured 1st place on both the verification and identification tracks.
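
The three loss modifications listed above can be illustrated with a hedged sketch of a generator objective; the exact loss forms, the balance update, and the weights are assumptions rather than the published formulation.

```python
# Hedged sketch of a DA-GAN-style generator objective: adversarial term from
# an auto-encoder discriminator, plus pose- and identity-perception terms.
import torch
import torch.nn.functional as F

def generator_loss(d_recon_fake, fake, pose_pred, pose_target,
                   id_feat_fake, id_feat_src, w_pose=1.0, w_id=1.0):
    # adversarial: the auto-encoder discriminator scores by reconstruction error
    adv = F.l1_loss(d_recon_fake, fake)
    # pose perception: the refined image must keep the simulator's pose
    pose = F.cross_entropy(pose_pred, pose_target)
    # identity perception: match identity features of the source subject
    ident = F.l1_loss(id_feat_fake, id_feat_src)
    return adv + w_pose * pose + w_id * ident

def update_balance(k, d_loss_real, d_loss_fake, gamma=0.5, lam=1e-3):
    # BEGAN-style boundary equilibrium term: drive the ratio of fake-to-real
    # discriminator losses toward gamma, keeping training stable
    k = k + lam * (gamma * float(d_loss_real) - float(d_loss_fake))
    return min(max(k, 0.0), 1.0)
```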

103 citations