Journal ArticleDOI

Facial Age Estimation by Learning from Label Distributions

01 Oct 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 35, Iss: 10, pp 2401-2412
TL;DR: A label distribution approach for facial age estimation is proposed, in which each face image is associated with a label distribution covering a certain number of class labels and representing the degree to which each label describes the instance, and two algorithms, named IIS-LLD and CPNN, are proposed to learn from such label distributions.
Abstract: One of the main difficulties in facial age estimation is that the learning algorithms cannot expect sufficient and complete training data. Fortunately, faces at close ages look quite similar, since aging is a slow and smooth process. Inspired by this observation, instead of considering each face image as an instance with one label (age), this paper regards each face image as an instance associated with a label distribution. The label distribution covers a certain number of class labels, representing the degree to which each label describes the instance. In this way, one face image can contribute not only to the learning of its chronological age, but also to the learning of its adjacent ages. Two algorithms, named IIS-LLD and CPNN, are proposed to learn from such label distributions. Experimental results on two aging face databases show remarkable advantages of the proposed label distribution learning algorithms over the compared single-label learning algorithms, whether specially designed for age estimation or intended for general purposes.
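To make the idea concrete, here is a minimal sketch (not the authors' code) of how a single chronological age could be turned into a label distribution over neighbouring ages; the Gaussian shape, the age range 0-69, and the width sigma=2 are illustrative assumptions.

```python
import numpy as np

def age_label_distribution(true_age, ages=np.arange(0, 70), sigma=2.0):
    """Convert one chronological age into a discrete label distribution.

    Each candidate age receives a description degree from a Gaussian centred
    at the true age, normalised to sum to one (an assumed form for this sketch).
    """
    weights = np.exp(-((ages - true_age) ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()

# Example: the image of a 25-year-old also contributes to ages 23..27.
dist = age_label_distribution(25)
for age in range(23, 28):
    print(age, round(dist[age], 3))
```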
Citations
Journal ArticleDOI
TL;DR: A coattention mechanism is proposed, using a deep neural network (DNN) architecture to jointly learn the attentions for both the image and the question, which effectively reduces irrelevant features and obtains more discriminative features for the image and question representations.
Abstract: Visual question answering (VQA) is challenging, because it requires a simultaneous understanding of both visual content of images and textual content of questions. To support the VQA task, we need to find good solutions for the following three issues: 1) fine-grained feature representations for both the image and the question; 2) multimodal feature fusion that is able to capture the complex interactions between multimodal features; and 3) automatic answer prediction that is able to consider the complex correlations between multiple diverse answers for the same question. For fine-grained image and question representations, a “coattention” mechanism is developed using a deep neural network (DNN) architecture to jointly learn the attentions for both the image and the question, which can allow us to reduce the irrelevant features effectively and obtain more discriminative features for image and question representations. For multimodal feature fusion, a generalized multimodal factorized high-order pooling approach (MFH) is developed to achieve more effective fusion of multimodal features by exploiting their correlations sufficiently, which can further result in superior VQA performance as compared with the state-of-the-art approaches. For answer prediction, the Kullback–Leibler divergence is used as the loss function to achieve precise characterization of the complex correlations between multiple diverse answers with the same or similar meaning, which can allow us to achieve a faster convergence rate and obtain slightly better accuracy on answer prediction. A DNN architecture is designed to integrate all these aforementioned modules into a unified model for achieving superior VQA performance. With an ensemble of our MFH models, we achieve state-of-the-art performance on the large-scale VQA data sets and obtain the runner-up position in the VQA Challenge 2017.
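As a rough illustration of the answer-prediction loss described above, the following sketch computes a Kullback–Leibler divergence between a soft ground-truth answer distribution and the model's softmax output; the assumption that targets come from annotator answer frequencies, and all names here, are illustrative rather than the authors' implementation.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_answer_loss(logits, target_dist, eps=1e-12):
    """KL(target || predicted) averaged over a batch.

    `target_dist` is a soft distribution over candidate answers, e.g. the
    fraction of annotators giving each answer (an assumption for this sketch).
    """
    pred = softmax(logits)
    per_example = np.sum(target_dist * (np.log(target_dist + eps) - np.log(pred + eps)), axis=-1)
    return np.mean(per_example)

# Two annotator-derived answer distributions and model logits for 4 answers.
target = np.array([[0.6, 0.3, 0.1, 0.0],
                   [0.0, 0.0, 0.5, 0.5]])
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [-1.0, 0.0, 1.5, 1.4]])
print(kl_answer_loss(logits, target))
```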

437 citations

Journal ArticleDOI
Xin Geng1
TL;DR: This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design, and results show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
Abstract: Although multi-label learning can deal with many problems with label ambiguity, it does not fit some real applications well where the overall distribution of the importance of the labels matters. This paper proposes a novel learning paradigm named label distribution learning (LDL) for such kind of applications. The label distribution covers a certain number of labels, representing the degree to which each label describes the instance. LDL is a more general learning framework which includes both single-label and multi-label learning as its special cases. This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design. In order to compare the performance of the LDL algorithms, six representative and diverse evaluation measures are selected via a clustering analysis, and the first batch of label distribution datasets are collected and made publicly available. Experimental results on one artificial and 15 real-world datasets show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
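One of the three routes mentioned above, problem transformation, can be illustrated with a small sketch: each label-distribution example is expanded into weighted single-label examples so that an ordinary cost-sensitive classifier can be trained. The function below is a hypothetical helper for illustration, not one of the paper's six algorithms.

```python
import numpy as np

def transform_to_weighted_single_label(X, D):
    """Turn label-distribution examples into weighted single-label examples.

    X: (n, d) feature matrix; D: (n, c) label distributions (rows sum to 1).
    Each instance is duplicated once per label and weighted by the degree to
    which that label describes it, so any weighted learner can be applied.
    """
    n, c = D.shape
    X_rep = np.repeat(X, c, axis=0)          # each row repeated c times
    y_rep = np.tile(np.arange(c), n)         # candidate label for each copy
    w_rep = D.reshape(-1)                    # description degree as weight
    keep = w_rep > 0
    return X_rep[keep], y_rep[keep], w_rep[keep]

X = np.array([[0.1, 0.2], [0.5, 0.9]])
D = np.array([[0.7, 0.3, 0.0], [0.0, 0.4, 0.6]])
Xw, yw, ww = transform_to_weighted_single_label(X, D)
print(Xw.shape, yw, ww)
```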

428 citations

Journal ArticleDOI
TL;DR: A generic framework for automatic demographic (age, gender and race) estimation is presented, and crowdsourcing is used to study how well humans can estimate demographics from face images.
Abstract: Demographic estimation entails automatic estimation of age, gender and race of a person from a face image, which has many potential applications ranging from forensics to social media. Automatic demographic estimation, particularly age estimation, remains a challenging problem because persons belonging to the same demographic group can be vastly different in their facial appearances due to intrinsic and extrinsic factors. In this paper, we present a generic framework for automatic demographic (age, gender and race) estimation. Given a face image, we first extract demographically informative features via a boosting algorithm, and then employ a hierarchical approach consisting of between-group classification and within-group regression. A quality assessment stage is also developed to identify low-quality face images from which reliable demographic estimates are difficult to obtain. Experimental results on a diverse set of face image databases, FG-NET (1K images), FERET (3K images), MORPH II (75K images), PCSO (100K images), and a subset of LFW (4K images), show that the proposed approach has superior performance compared to the state of the art. Finally, we use crowdsourcing to study the human ability to estimate demographics from face images. A side-by-side comparison of the demographic estimates from crowdsourced data and the proposed algorithm provides a number of insights into this challenging problem.
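The between-group classification followed by within-group regression can be sketched roughly as below; the nearest-centroid grouping and least-squares regressors are stand-ins chosen for brevity, not the boosted features and stronger models used in the paper, and all data in the example is synthetic.

```python
import numpy as np

class HierarchicalAgeEstimator:
    """Classify a face into a coarse group, then regress age within that group."""

    def fit(self, X, ages, groups):
        self.centroids_ = {g: X[groups == g].mean(axis=0) for g in np.unique(groups)}
        self.regressors_ = {}
        for g in np.unique(groups):
            # Per-group linear least squares with a bias column.
            Xg = np.hstack([X[groups == g], np.ones((np.sum(groups == g), 1))])
            self.regressors_[g], *_ = np.linalg.lstsq(Xg, ages[groups == g], rcond=None)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Between-group step: nearest centroid; within-group step: regression.
            g = min(self.centroids_, key=lambda k: np.linalg.norm(x - self.centroids_[k]))
            preds.append(float(np.append(x, 1.0) @ self.regressors_[g]))
        return np.array(preds)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
groups = (X[:, 0] > 0).astype(int)                     # toy "age group" label
ages = 20 + 15 * groups + X[:, 1] + rng.normal(scale=0.5, size=100)
model = HierarchicalAgeEstimator().fit(X, ages, groups)
print(model.predict(X[:3]).round(1))
```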

316 citations

Journal ArticleDOI
TL;DR: The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which helps prevent the network from overfitting even when the training set is small.
Abstract: Convolutional neural networks (ConvNets) have achieved excellent recognition performance in various visual recognition tasks. A large labeled training set is one of the most important factors for its success. However, it is difficult to collect sufficient training images with precise labels in some domains, such as apparent age estimation, head pose estimation, multilabel classification, and semantic segmentation. Fortunately, there is ambiguous information among labels, which makes these tasks different from traditional classification. Based on this observation, we convert the label of each image into a discrete label distribution, and learn the label distribution by minimizing a Kullback–Leibler divergence between the predicted and ground-truth label distributions using deep ConvNets. The proposed deep label distribution learning (DLDL) method effectively utilizes the label ambiguity in both feature learning and classifier learning, which help prevent the network from overfitting even when the training set is small. Experimental results show that the proposed approach produces significantly better results than the state-of-the-art methods for age estimation and head pose estimation. At the same time, it also improves recognition performance for multi-label classification and semantic segmentation tasks.
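A possible reading of the DLDL recipe, sketched under the assumption of a softmax output layer: soften the scalar label into a discrete (here Gaussian) distribution and minimize the KL divergence, whose gradient with respect to the logits reduces to the predicted distribution minus the target. This is an illustration, not the authors' network code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def dldl_loss_and_grad(logits, target_dist, eps=1e-12):
    """KL(target || softmax(logits)) and its gradient with respect to the logits.

    With a softmax output the gradient is simply (prediction - target), so the
    soft target spreads the supervisory signal to neighbouring labels instead
    of a single one-hot age.
    """
    p = softmax(logits)
    loss = np.sum(target_dist * (np.log(target_dist + eps) - np.log(p + eps)))
    grad = p - target_dist
    return loss, grad

# Gaussian-softened target around age 30 over the range 0..69 (assumed setup).
ages = np.arange(0, 70)
target = np.exp(-((ages - 30) ** 2) / (2 * 2.0 ** 2))
target /= target.sum()
loss, grad = dldl_loss_and_grad(np.zeros_like(ages, dtype=float), target)
print(round(loss, 3), grad[28:33].round(3))
```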

290 citations

Journal ArticleDOI
TL;DR: This paper reviews the state of the art of binary relevance from three perspectives and introduces some recent studies on binary relevance aimed at issues other than label correlation exploitation.
Abstract: Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevance with label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. As a conclusion, we provide suggestions on future research directions.
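The basic decomposition is easy to sketch: one independent binary classifier per label, with the 0/1 label matrix sliced column by column. The tiny gradient-descent logistic-regression base learner below is only a placeholder for whichever binary learner one would actually plug in.

```python
import numpy as np

class BinaryRelevance:
    """Train one independent binary classifier per label (the basic strategy)."""

    def __init__(self, n_iter=200, lr=0.1):
        self.n_iter, self.lr = n_iter, lr

    def _fit_one(self, X, y):
        # Minimal logistic regression by batch gradient descent.
        w = np.zeros(X.shape[1])
        for _ in range(self.n_iter):
            p = 1.0 / (1.0 + np.exp(-X @ w))
            w -= self.lr * X.T @ (p - y) / len(y)
        return w

    def fit(self, X, Y):
        # Y is an (n, q) 0/1 label matrix; one weight vector per label column.
        self.W_ = np.stack([self._fit_one(X, Y[:, j]) for j in range(Y.shape[1])], axis=1)
        return self

    def predict(self, X):
        return (1.0 / (1.0 + np.exp(-X @ self.W_)) > 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
Y = np.stack([(X[:, 0] > 0), (X[:, 1] + X[:, 2] > 0)], axis=1).astype(int)
print(BinaryRelevance().fit(X, Y).predict(X[:5]))
```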

257 citations

References
Book
01 Jan 1973

20,541 citations

Journal ArticleDOI
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
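For a rough modern equivalent of this procedure (not the authors' Cluster/TreeView software), correlation distance plus average-linkage hierarchical clustering in SciPy reproduces the basic idea; the toy expression matrix below is synthetic and purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy expression matrix: rows are genes, columns are conditions.
rng = np.random.default_rng(0)
profile_a = np.sin(np.linspace(0, 3, 8))
profile_b = np.cos(np.linspace(0, 3, 8))
genes = np.vstack([profile_a + 0.1 * rng.normal(size=8) for _ in range(5)] +
                  [profile_b + 0.1 * rng.normal(size=8) for _ in range(5)])

# 1 - Pearson correlation as the pairwise distance, average-linkage tree.
dist = pdist(genes, metric="correlation")
tree = linkage(dist, method="average")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)   # genes with similar expression patterns share a cluster id
```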

16,371 citations

Journal ArticleDOI
01 May 1993
TL;DR: The architecture and learning procedure underlying ANFIS (adaptive-network-based fuzzy inference system), a fuzzy inference system implemented in the framework of adaptive networks, are presented.
Abstract: The architecture and learning procedure underlying ANFIS (adaptive-network-based fuzzy inference system) is presented, which is a fuzzy inference system implemented in the framework of adaptive networks. By using a hybrid learning procedure, the proposed ANFIS can construct an input-output mapping based on both human knowledge (in the form of fuzzy if-then rules) and stipulated input-output data pairs. In the simulation, the ANFIS architecture is employed to model nonlinear functions, identify nonlinear components on-line in a control system, and predict a chaotic time series, all yielding remarkable results. Comparisons with artificial neural networks and earlier work on fuzzy modeling are listed and discussed. Other extensions of the proposed ANFIS and promising applications to automatic control and signal processing are also suggested.
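The inference half of such a system can be sketched as a tiny first-order Sugeno model: Gaussian membership functions, rule firing strengths, normalization, and linear consequents. The hybrid learning procedure that makes it ANFIS is deliberately omitted here, and the rule parameters below are made up for illustration.

```python
import numpy as np

def gauss_mf(x, c, s):
    """Gaussian membership function used in the premise (IF) part."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def sugeno_forward(x1, x2, premise, consequent):
    """Forward pass of a tiny first-order Sugeno fuzzy system (ANFIS-style).

    Rules have the form "IF x1 is A_i AND x2 is B_i THEN y = p_i*x1 + q_i*x2 + r_i".
    ANFIS would additionally tune `premise` and `consequent` with its hybrid
    learning rule; only the inference step is sketched here.
    """
    w = np.array([gauss_mf(x1, *premise[i][0]) * gauss_mf(x2, *premise[i][1])
                  for i in range(len(premise))])          # rule firing strengths
    w_norm = w / w.sum()                                  # normalisation layer
    y_rules = np.array([p * x1 + q * x2 + r for p, q, r in consequent])
    return float(w_norm @ y_rules)                        # weighted-sum output

premise = [[(0.0, 1.0), (0.0, 1.0)],      # rule 1: x1 near 0, x2 near 0
           [(2.0, 1.0), (2.0, 1.0)]]      # rule 2: x1 near 2, x2 near 2
consequent = [(1.0, 0.5, 0.0), (-0.5, 1.0, 1.0)]
print(round(sugeno_forward(1.0, 1.5, premise, consequent), 3))
```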

15,085 citations

Proceedings ArticleDOI
28 Mar 1993
TL;DR: A learning algorithm for multilayer feedforward networks, RPROP (resilient propagation), is proposed that performs a local adaptation of the weight-updates according to the behavior of the error function to overcome the inherent disadvantages of pure gradient-descent.
Abstract: A learning algorithm for multilayer feedforward networks, RPROP (resilient propagation), is proposed. To overcome the inherent disadvantages of pure gradient-descent, RPROP performs a local adaptation of the weight-updates according to the behavior of the error function. Contrary to other adaptive techniques, the effect of the RPROP adaptation process is not blurred by the unforeseeable influence of the size of the derivative, but only dependent on the temporal behavior of its sign. This leads to an efficient and transparent adaptation process. The capabilities of RPROP are shown in comparison to other adaptive techniques.
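A sketch of the update rule (a simplified RPROP variant without weight backtracking): the per-weight step size grows while the gradient sign is stable and shrinks when it flips, and only the sign of the gradient sets the update direction. The hyperparameter values follow the commonly cited defaults but are assumptions in this sketch.

```python
import numpy as np

def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One per-weight RPROP update step."""
    sign_change = grad * prev_grad
    # Grow the step when the sign is stable, shrink it when the sign flips.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # pause the update after a flip
    delta_w = -np.sign(grad) * step               # direction from the sign only
    return delta_w, grad, step

# Minimise f(w) = sum(w^2) with RPROP; the gradient is 2w.
w = np.array([3.0, -2.0])
prev_grad, step = np.zeros_like(w), np.full_like(w, 0.1)
for _ in range(50):
    grad = 2 * w
    delta_w, prev_grad, step = rprop_step(grad, prev_grad, step)
    w += delta_w
print(w.round(4))   # close to the minimum at the origin
```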

4,319 citations

Journal ArticleDOI
TL;DR: A maximum-likelihood approach for automatically constructing maximum entropy models is presented and how to implement this approach efficiently is described, using as examples several problems in natural language processing.
Abstract: The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
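As a small illustration of a conditional maximum-entropy model (trained here by plain gradient ascent on the log-likelihood rather than the iterative scaling procedure the paper describes), the sketch below fits one weight per (input feature, class) pair so that expected feature counts match observed ones; the data and names are hypothetical.

```python
import numpy as np

def maxent_train(F, y, n_classes, n_iter=300, lr=0.5):
    """Fit a conditional maximum-entropy model p(y|x) proportional to exp(sum_i lambda_i f_i(x, y)).

    With one indicator feature per (input feature, class) pair this is
    multinomial logistic regression; training maximises the log-likelihood by
    gradient ascent instead of improved iterative scaling.
    """
    n, d = F.shape
    lam = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]                       # one-hot targets
    for _ in range(n_iter):
        scores = F @ lam
        p = np.exp(scores - scores.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # Gradient: observed feature counts minus expected counts under the model.
        lam += lr * F.T @ (Y - p) / n
    return lam

# Toy task: classify examples by which binary indicator features are active.
F = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
lam = maxent_train(F, y, n_classes=2)
print((F @ lam).argmax(axis=1))   # predicted classes on the training data
```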

3,392 citations