scispace - formally typeset
Search or ask a question
Author

Wanhua Li

Bio: Wanhua Li is an academic researcher from Tsinghua University. The author has contributed to research in topics: Computer science & Graph (abstract data type). The author has an hindex of 4, co-authored 17 publications receiving 109 citations.

Papers
More filters
Proceedings ArticleDOI
Wanhua Li1, Jiwen Lu1, Jianjiang Feng1, Chunjing Xu2, Jie Zhou1, Qi Tian2 
15 Jun 2019
TL;DR: Wang et al. as discussed by the authors proposed a bridge-tree structure to enforce the similarity between neighbor nodes, where local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data and gating networks learn continuity-aware weights.
Abstract: Age estimation is an important yet very challenging problem in computer vision. Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process. However, the facial aging process is also a continuous process, and the continuity relationship between different components has not been effectively exploited. In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. The proposed BridgeNet consists of local regressors and gating networks. Local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data and gating networks learn continuity aware weights for the results of local regressors by employing the proposed bridge-tree structure, which introduces bridge connections into tree models to enforce the similarity between neighbor nodes. Moreover, these two components of BridgeNet can be jointly learned in an end-to-end way. We show experimental results on the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet outperforms the state-of-the-art methods.

86 citations

Posted Content
Wanhua Li1, Jiwen Lu1, Jianjiang Feng1, Chunjing Xu1, Jie Zhou2, Qi Tian2 
TL;DR: The proposed BridgeNet, which aims to mine the continuous relation between age labels effectively, outperforms the state-of-the-art methods and can be jointly learned in an end-to-end way.
Abstract: Age estimation is an important yet very challenging problem in computer vision. Existing methods for age estimation usually apply a divide-and-conquer strategy to deal with heterogeneous data caused by the non-stationary aging process. However, the facial aging process is also a continuous process, and the continuity relationship between different components has not been effectively exploited. In this paper, we propose BridgeNet for age estimation, which aims to mine the continuous relation between age labels effectively. The proposed BridgeNet consists of local regressors and gating networks. Local regressors partition the data space into multiple overlapping subspaces to tackle heterogeneous data and gating networks learn continuity aware weights for the results of local regressors by employing the proposed bridge-tree structure, which introduces bridge connections into tree models to enforce the similarity between neighbor nodes. Moreover, these two components of BridgeNet can be jointly learned in an end-to-end way. We show experimental results on the MORPH II, FG-NET and Chalearn LAP 2015 datasets and find that BridgeNet outperforms the state-of-the-art methods.

29 citations

Proceedings ArticleDOI
Wanhua Li1, Xiaoke Huang1, Jiwen Lu1, Jianjiang Feng1, Jie Zhou1 
20 Jun 2021
TL;DR: Li et al. as discussed by the authors proposed to learn probabilistic ordinal embeddings which represent each data as a multivariate Gaussian distribution rather than a deterministic point in the latent space.
Abstract: Uncertainty is the only certainty there is. Modeling data uncertainty is essential for regression, especially in unconstrained settings. Traditionally the direct regression formulation is considered and the uncertainty is modeled by modifying the output space to a certain family of probabilistic distributions. On the other hand, classification based regression and ranking based solutions are more popular in practice while the direct regression methods suffer from the limited performance. How to model the uncertainty within the present-day technologies for regression remains an open issue. In this paper, we propose to learn probabilistic ordinal embeddings which represent each data as a multivariate Gaussian distribution rather than a deterministic point in the latent space. An ordinal distribution constraint is proposed to exploit the ordinal nature of regression. Our probabilistic ordinal embeddings can be integrated into popular regression approaches and empower them with the ability of uncertainty estimation. Experimental results show that our approach achieves competitive performance. Code is available at https://github.com/Li-Wanhua/POEs.

26 citations

Proceedings ArticleDOI
Shuai Shen1, Wanhua Li1, Zheng Zhu1, Guan Huang, Dalong Du, Jiwen Lu1, Jie Zhou1 
01 Jun 2021
TL;DR: Zhang et al. as mentioned in this paper designed a structure-preserved subgraph sampling strategy to explore the power of large-scale training data, which can increase the training data scale from 105 to 107.
Abstract: Face clustering is a promising method for annotating un-labeled face images. Recent supervised approaches have boosted the face clustering accuracy greatly, however their performance is still far from satisfactory. These methods can be roughly divided into global-based and local-based ones. Global-based methods suffer from the limitation of training data scale, while local-based ones are difficult to grasp the whole graph structure information and usually take a long time for inference. Previous approaches fail to tackle these two challenges simultaneously. To address the dilemma of large-scale training and efficient inference, we propose the STructure-AwaRe Face Clustering (STAR-FC) method. Specifically, we design a structure-preserved subgraph sampling strategy to explore the power of large-scale training data, which can increase the training data scale from 105 to 107. During inference, the STAR-FC performs efficient full-graph clustering with two steps: graph parsing and graph refinement. And the concept of node intimacy is introduced in the second step to mine the local structural information. The STAR-FC gets 91.97 pairwise F-score on partial MS1M within 310s which surpasses the state-of-the-arts. Furthermore, we are the first to train on very large-scale graph with 20M nodes, and achieve superior inference results on 12M testing data. Overall, as a simple and effective method, the proposed STAR-FC provides a strong baseline for large-scale face clustering. Code is available at https://sstzal.github.io/STAR-FC/.

24 citations

Proceedings ArticleDOI
22 Apr 2020
TL;DR: A graph-based kinship reasoning network for kinship verification, which aims to effectively perform relational reasoning on the extracted features of an image pair, which outperforms the state-of-the-art methods.
Abstract: In this paper, we propose a graph-based kinship reasoning (GKR) network for kinship verification, which aims to effectively perform relational reasoning on the extracted features of an image pair. Unlike most existing methods which mainly focus on how to learn discriminative features, our method considers how to compare and fuse the extracted feature pair to reason about the kin relations. The proposed GKR constructs a star graph called kinship relational graph where each peripheral node represents the information comparison in one feature dimension and the central node is used as a bridge for information communication among peripheral nodes. Then the GKR performs relational reasoning on this graph with recursive message passing. Extensive experimental results on the KinFaceW-I and KinFaceW-II datasets show that the proposed GKR outperforms the state-of-the-art methods.

24 citations


Cited by
More filters
Posted Content
TL;DR: In this paper, the authors propose to represent videos as space-time region graphs which capture temporal shape dynamics and functional relationships between humans and objects, and perform reasoning on this graph representation via Graph Convolutional Networks.
Abstract: How do humans recognize the action "opening a book" ? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs which capture these two important cues. Our graph nodes are defined by the object region proposals from different frames in a long range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.

278 citations

01 Jan 2006
TL;DR: It is concluded that the problem of age-progression on face recognition (FR) is not unique to the algorithm used in this work, and the efficacy of this algorithm is evaluated against the variables of gender and racial origin.
Abstract: This paper details MORPH a longitudinal face database developed for researchers investigating all facets of adult age-progression, e.g. face modeling, photo-realistic animation, face recognition, etc. This database contributes to several active research areas, most notably face recognition, by providing: the largest set of publicly available longitudinal images; longitudinal spans from a few months to over twenty years; and, the inclusion of key physical parameters that affect aging appearance. The direct contribution of this data corpus for face recognition is highlighted in the evaluation of a standard face recognition algorithm, which illustrates the impact that age-progression, has on recognition rates. Assessment of the efficacy of this algorithm is evaluated against the variables of gender and racial origin. This work further concludes that the problem of age-progression on face recognition (FR) is not unique to the algorithm used in this work.

139 citations

Book ChapterDOI
23 Aug 2020
TL;DR: An adaptive variance based distribution learning (AVDL) method for facial age estimation that introduces the data-driven optimization framework, meta-learning, to achieve this and can approximate the true age probability distribution more effectively.
Abstract: Estimating age from a single facial image is a classic and challenging topic in computer vision. One of its most intractable issues is label ambiguity, i.e., face images from adjacent age of the same person are often indistinguishable. Some existing methods adopt distribution learning to tackle this issue by exploiting the semantic correlation between age labels. Actually, most of them set a fixed value to the variance of Gaussian label distribution for all the images. However, the variance is closely related to the correlation between adjacent ages and should vary across ages and identities. To model a sample-specific variance, in this paper, we propose an adaptive variance based distribution learning (AVDL) method for facial age estimation. AVDL introduces the data-driven optimization framework, meta-learning, to achieve this. Specifically, AVDL performs a meta gradient descent step on the variable (i.e. variance) to minimize the loss on a clean unbiased validation set. By adaptively learning proper variance for each sample, our method can approximate the true age probability distribution more effectively. Extensive experiments on FG-NET and MORPH II datasets show the superiority of our proposed approach to the existing state-of-the-art methods.

42 citations

Journal ArticleDOI
TL;DR: An overview of the cutting-edge approaches that perform facial cue analysis in the healthcare area is given and a research taxonomy is introduced by dividing the face in its main features: eyes, mouth, muscles, skin, and shape.
Abstract: This paper gives an overview of the cutting-edge approaches that perform facial cue analysis in the healthcare area. The document is not limited to global face analysis but it also concentrates on methods related to local cues (e.g., the eyes). A research taxonomy is introduced by dividing the face in its main features: eyes, mouth, muscles, skin, and shape. For each facial feature, the computer vision-based tasks aiming at analyzing it and the related healthcare goals that could be pursued are detailed.

41 citations

Proceedings ArticleDOI
13 Mar 2022
TL;DR: A self-supervised face-depth learning method to automatically recover dense 3D facial geometry (i.e. depth) from the face videos without the requirement of any expensive 3D annotation data is introduced.
Abstract: Talking head video generation aims to produce a synthetic human face video that contains the identity and pose information respectively from a given source image and a driving video. Existing works for this task heavily rely on 2D representations (e.g. appearance and motion) learned from the input images. However, dense 3D facial geometry (e.g. pixel-wise depth) is extremely important for this task as it is particularly beneficial for us to essentially generate accurate 3D face structures and distinguish noisy information from the possibly cluttered background. Nevertheless, dense 3D geometry annotations are prohibitively costly for videos and are typically not available for this video generation task. In this paper, we introduce a self-supervised face-depth learning method to automatically recover dense 3D facial geometry (i.e. depth) from the face videos without the requirement of any expensive 3D annotation data. Based on the learned dense depth maps, we further propose to leverage them to estimate sparse facial keypoints that capture the critical movement of the human head. In a more dense way, the depth is also utilized to learn 3D-aware cross-modal (i.e. appearance and depth) attention to guide the generation of motion fields for warping source image representations. All these contributions compose a novel depth-aware generative adversarial network (DaGAN) for talking head generation. Extensive experiments conducted demonstrate that our proposed method can generate highly realistic faces, and achieve significant results on the unseen human faces. 11https://github.com/harlanhong/CVPR2022-DaGAN

36 citations