Author

Jie Hu

Bio: Jie Hu is an academic researcher from Xiamen University. The author has contributed to research in topics: Image translation & Feature (computer vision). The author has an h-index of 6 and has co-authored 12 publications receiving 102 citations.

Papers
Proceedings ArticleDOI
02 Mar 2021
TL;DR: Li et al. propose Hierarchical Style Disentanglement (HiSD), which organizes labels into a hierarchy of independent tags, exclusive attributes, and disentangled styles from top to bottom for controllable image-to-image translation.
Abstract: Recently, image-to-image translation has made significant progress in achieving both multi-label (i.e., translation conditioned on different labels) and multi-style (i.e., generation with diverse styles) tasks. However, because the independence and exclusiveness among the labels are left unexplored, existing methods suffer from uncontrolled manipulations of the translation results. In this paper, we propose Hierarchical Style Disentanglement (HiSD) to address this issue. Specifically, we organize the labels into a hierarchical tree structure, in which independent tags, exclusive attributes, and disentangled styles are allocated from top to bottom. Correspondingly, a new translation process is designed to adapt to this structure, in which the styles are identified for controllable translations. Both qualitative and quantitative results on the CelebA-HQ dataset verify the ability of the proposed HiSD. The code has been released at https://github.com/imlixinyang/HiSD.
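To make the tag/attribute/style hierarchy concrete, below is a minimal, hypothetical PyTorch sketch of how labels could be organized and how a sampled style code might be injected during one translation step. The tag names, module sizes, and injection scheme are assumptions made for illustration; this is not the released HiSD code.

```python
# Hypothetical sketch of HiSD's label hierarchy and a style-conditioned translation step.
# Tag/attribute names and module sizes are illustrative assumptions, not the released code.
import torch
import torch.nn as nn

# Independent tags, each with mutually exclusive attributes (top two levels of the tree).
HIERARCHY = {
    "bangs": ["with", "without"],
    "glasses": ["with", "without"],
    "hair_color": ["black", "blond", "brown"],
}

class Mapper(nn.Module):
    """Maps random noise to a disentangled style code for one (tag, attribute) pair."""
    def __init__(self, noise_dim=32, style_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(),
                                 nn.Linear(128, style_dim))

    def forward(self, z):
        return self.net(z)

class Translator(nn.Module):
    """Injects the style code of the selected tag into image features (bottom level)."""
    def __init__(self, feat_ch=64, style_dim=64):
        super().__init__()
        self.to_scale = nn.Linear(style_dim, feat_ch)
        self.conv = nn.Conv2d(feat_ch, feat_ch, 3, padding=1)

    def forward(self, feat, style):
        scale = self.to_scale(style).unsqueeze(-1).unsqueeze(-1)  # broadcast over H, W
        return self.conv(feat * (1 + scale))

# One translation step: pick a tag, sample a style for one of its attributes, edit features.
# (A full model would keep one mapper/translator per tag; a single pair is shown here.)
feat = torch.randn(1, 64, 32, 32)      # encoder features of an input face
mapper, translator = Mapper(), Translator()
style = mapper(torch.randn(1, 32))     # latent-guided style for, e.g., ("bangs", "with")
edited = translator(feat, style)
print(edited.shape)                    # torch.Size([1, 64, 32, 32])
```

In this reading, choosing a tag selects which part of the model is used, choosing an attribute selects which style distribution is sampled, and the sampled style determines the concrete appearance of the edit.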

67 citations

Journal ArticleDOI
TL;DR: This paper presents a novel face sketch synthesis method by multidomain adversarial learning (termed MDAL), which overcomes the defects of blurs and deformations toward high-quality synthesis.
Abstract: Given a training set of face photo-sketch pairs, face sketch synthesis aims at learning a mapping from the photo domain to the sketch domain. Despite the exciting progress made in the literature, it remains an open problem to synthesize high-quality sketches free of blurs and deformations. Recent advances in generative adversarial training provide a new insight into face sketch synthesis, from which perspective the existing synthesis pipelines can be fundamentally revisited. In this paper, we present a novel face sketch synthesis method based on multidomain adversarial learning (termed MDAL), which overcomes the defects of blurs and deformations toward high-quality synthesis. The principle of our scheme relies on the concept of "interpretation through synthesis." In particular, we first interpret face photographs in the photo domain and face sketches in the sketch domain by reconstructing each via adversarial learning. We define the intermediate products of the reconstruction process as latent variables, which form a latent domain. Second, via adversarial learning, we make the distributions of the latent variables indistinguishable between the reconstruction of a face photograph and that of a face sketch. Finally, given an input face photograph, the latent variable obtained by reconstructing it is used to synthesize the corresponding sketch. Quantitative comparisons to state-of-the-art methods demonstrate the superiority of the proposed MDAL method.
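As a rough mental model of the latent-domain alignment described above, the following hedged sketch uses two tiny MLP auto-encoders and a latent discriminator; the flattened-image representation and all layer sizes are assumptions made for brevity, not the authors' architecture.

```python
# Minimal sketch of MDAL-style latent-domain alignment, assuming simple MLP auto-encoders;
# layer sizes are illustrative and this is not the authors' implementation.
import torch
import torch.nn as nn

def mlp(din, dout):
    return nn.Sequential(nn.Linear(din, 256), nn.ReLU(), nn.Linear(256, dout))

enc_photo, dec_photo = mlp(1024, 64), mlp(64, 1024)    # photo-domain auto-encoder
enc_sketch, dec_sketch = mlp(1024, 64), mlp(64, 1024)  # sketch-domain auto-encoder
latent_disc = mlp(64, 1)   # discriminator that tells photo latents from sketch latents

photo, sketch = torch.randn(8, 1024), torch.randn(8, 1024)  # flattened toy images

# 1) Interpret each domain by reconstructing itself (reconstruction losses).
z_p, z_s = enc_photo(photo), enc_sketch(sketch)
rec_loss = ((dec_photo(z_p) - photo) ** 2).mean() + ((dec_sketch(z_s) - sketch) ** 2).mean()

# 2) Adversarially make the two latent distributions indistinguishable (discriminator side).
bce = nn.BCEWithLogitsLoss()
adv_loss = bce(latent_disc(z_p), torch.ones(8, 1)) + bce(latent_disc(z_s), torch.zeros(8, 1))

# 3) Inference: the latent of a photo is decoded by the sketch decoder to synthesize a sketch.
with torch.no_grad():
    synthesized_sketch = dec_sketch(enc_photo(photo))
print(rec_loss.item(), adv_loss.item(), synthesized_sketch.shape)
```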

59 citations

Proceedings ArticleDOI
13 Jul 2018
TL;DR: A novel generative adversarial network termed pGAN is proposed, which can generate face sketches efficiently using training data under fixed conditions and handle the aforementioned uncontrolled conditions.
Abstract: Despite the extensive progress in face sketch synthesis, existing methods are mostly workable only under constrained conditions, such as fixed illumination, pose, background, and ethnic origin, which are hard to control in real-world scenarios. The key issue lies in the difficulty of using data collected under fixed conditions to train a model that is robust to imaging variations. In this paper, we propose a novel generative adversarial network termed pGAN, which can generate face sketches efficiently using training data captured under fixed conditions while handling the aforementioned uncontrolled conditions. In pGAN, we embed key photo priors into the synthesis process and design a parametric sigmoid activation function to compensate for illumination variations. Compared to existing methods, we quantitatively demonstrate that the proposed method works well on face photos in the wild.
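The parametric sigmoid mentioned above can be pictured as a sigmoid whose slope and shift are learnable. The sketch below is an assumed parameterization for illustration only, since the abstract does not spell out the exact form used in pGAN.

```python
# A hedged sketch of a "parametric sigmoid" activation with learnable slope and shift;
# the exact parameterization used in pGAN is not given here, so this form is an assumption.
import torch
import torch.nn as nn

class ParametricSigmoid(nn.Module):
    def __init__(self):
        super().__init__()
        self.slope = nn.Parameter(torch.ones(1))   # controls contrast of the tone mapping
        self.shift = nn.Parameter(torch.zeros(1))  # controls the brightness offset

    def forward(self, x):
        # A learned slope/shift remaps intensities, compensating illumination variation.
        return torch.sigmoid(self.slope * (x - self.shift))

x = torch.randn(1, 3, 64, 64)        # a toy face-photo tensor
print(ParametricSigmoid()(x).shape)  # torch.Size([1, 3, 64, 64])
```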

33 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: This paper proposes a Hybrid Auto-Encoder (HAE) to translate visual features, which learns a mapping by minimizing the translation and reconstruction errors, and an Undirected Affinity Measurement (UAM) is further designed to quantify the affinity among different types of visual features.
Abstract: Most existing visual search systems are deployed based upon fixed kinds of visual features, which prohibits feature reuse across different systems or when upgrading a system with a new type of feature. Such a setting is obviously inflexible and time/memory consuming, which could be remedied if visual features could be "translated" across systems. In this paper, we make the first attempt at visual feature translation to break through the barrier of using features across different visual search systems. To this end, we propose a Hybrid Auto-Encoder (HAE) to translate visual features, which learns a mapping by minimizing the translation and reconstruction errors. Based upon HAE, an Undirected Affinity Measurement (UAM) is further designed to quantify the affinity among different types of visual features. Extensive experiments have been conducted on several public datasets with sixteen different types of widely used features in visual search systems. Quantitative results show the encouraging possibilities of feature translation. For the first time, the affinity among widely used features like SIFT and DELF is reported.
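A minimal sketch of the translation-plus-reconstruction objective described above, assuming simple fully connected layers and arbitrary feature dimensions; both are illustrative choices, not the paper's configuration.

```python
# Illustrative hybrid auto-encoder that translates source features into target features
# while also reconstructing the source; dimensions and layers are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridAutoEncoder(nn.Module):
    def __init__(self, src_dim=128, tgt_dim=256, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(src_dim, hidden), nn.ReLU())
        self.dec_translate = nn.Linear(hidden, tgt_dim)    # source -> target feature space
        self.dec_reconstruct = nn.Linear(hidden, src_dim)  # source -> source reconstruction

    def forward(self, src):
        h = self.encoder(src)
        return self.dec_translate(h), self.dec_reconstruct(h)

hae = HybridAutoEncoder()
src_feat = torch.randn(32, 128)   # one feature type extracted from a batch of images
tgt_feat = torch.randn(32, 256)   # the paired feature of another type for the same images

translated, reconstructed = hae(src_feat)
loss = F.mse_loss(translated, tgt_feat) + F.mse_loss(reconstructed, src_feat)
loss.backward()
print(loss.item())
```

Under this reading, the residual translation error between feature types is also what an affinity measure such as UAM could be built on, since feature pairs that translate with low error are more closely related.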

18 citations

Posted Content
TL;DR: An Attribute Guided UIT model termed AGUIT is proposed to tackle the multi-modal and multi-domain tasks of UIT jointly under a novel semi-supervised setting, which also benefits representation disentanglement and fine control of outputs.
Abstract: Unpaired Image-to-Image Translation (UIT) focuses on translating images among different domains using unpaired data, and has received increasing research attention due to its practical usage. However, existing UIT schemes suffer from the need for supervised training as well as the lack of encoded domain information. In this paper, we propose an Attribute Guided UIT model termed AGUIT to tackle these two challenges. AGUIT handles the multi-modal and multi-domain tasks of UIT jointly under a novel semi-supervised setting, which also benefits representation disentanglement and fine control of outputs. In particular, AGUIT benefits in two ways: (1) It adopts a novel semi-supervised learning process by translating attributes of labeled data to unlabeled data, and then reconstructing the unlabeled data through a cycle-consistency operation. (2) It decomposes the image representation into a domain-invariant content code and a domain-specific style code. The redesigned style code embeds image style into two variables, drawn from a standard Gaussian distribution and from the distribution of domain labels, which facilitates fine control of the translation because both variables are continuous. Finally, we introduce a new challenge for UIT models, i.e., disentangled transfer, which uses the disentangled representation to translate data less related to the training set. Extensive experiments demonstrate the capacity of AGUIT over existing state-of-the-art models.
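The content/style split described above can be illustrated with a toy example; the encoders, shapes, and the way the Gaussian-like part and the domain-label part are concatenated below are assumptions for illustration only, not the authors' exact design.

```python
# A rough sketch of AGUIT's representation split: a domain-invariant content code plus a
# style code built from a standard-Gaussian part and a domain-label part (both assumed).
import torch
import torch.nn as nn

content_enc = nn.Sequential(nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU())
style_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 8))  # Gaussian-like part

img = torch.rand(1, 3, 64, 64)
domain_label = torch.tensor([[1.0, 0.0, 0.0]])   # e.g. a one-hot label for one domain

content = content_enc(img)                                 # domain-invariant content code
style = torch.cat([style_enc(img), domain_label], dim=1)   # style = [Gaussian part | label part]

# Fine control: because both parts are continuous, either can be edited or interpolated
# before decoding (a real model would inject `style_edit` into a decoder, e.g. via AdaIN).
style_edit = style.clone()
style_edit[:, -3:] = torch.tensor([0.5, 0.5, 0.0])         # blend two domains
print(content.shape, style_edit.shape)
```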

16 citations


Cited by
Journal ArticleDOI
TL;DR: The recent research on GANs in the field of image processing, including image synthesis, image generation, image semantic editing, image-to-image translation, image super-resolution, image inpainting, and cartoon generation is introduced.
Abstract: Generative Adversarial Networks (GANs) have achieved impressive results in various image synthesis tasks and have become a hot topic in computer vision research. In this paper, we introduce recent research on GANs in the field of image processing, including image synthesis, image generation, image semantic editing, image-to-image translation, image super-resolution, image inpainting, and cartoon generation. We analyze and summarize the methods these applications use to improve the generated results. Then, we discuss the challenges faced by GANs and introduce some methods to deal with these problems. We also preview likely future research directions in the field of GANs, such as video generation, facial animation synthesis, and 3D face reconstruction. The purpose of this review is to provide insights into the research on GANs and to present the various applications based on GANs in different scenarios.

83 citations

Journal ArticleDOI
TL;DR: This survey provides a comprehensive review of adversarial models for image synthesis, summarizes synthetic image generation methods, and discusses categories including image-to-image translation, fusion image generation, label-to-image mapping, and text-to-image translation.

80 citations

Journal ArticleDOI
TL;DR: Zhang et al. propose a novel composition-aided generative adversarial network (CA-GAN) for face photo-sketch synthesis, which utilizes paired inputs, including a face photo/sketch and the corresponding pixelwise face labels, for generating a sketch/photo.
Abstract: Face photo–sketch synthesis aims at generating a facial sketch/photo conditioned on a given photo/sketch. It has wide applications including digital entertainment and law enforcement. Precisely depicting face photos/sketches remains challenging due to the restrictions of structural realism and textural consistency. While existing methods achieve compelling results, they mostly yield blurred effects and great deformation over various facial components, giving the synthesized images an unrealistic feeling. To tackle this challenge, in this article we propose using facial composition information to aid the synthesis of face sketches/photos. Specifically, we propose a novel composition-aided generative adversarial network (CA-GAN) for face photo–sketch synthesis. In CA-GAN, we utilize paired inputs, including a face photo/sketch and the corresponding pixelwise face labels, for generating a sketch/photo. Next, to focus training on hard-to-generate components and delicate facial structures, we propose a compositional reconstruction loss. In addition, we employ a perceptual loss function to encourage the synthesized image and the real image to be perceptually similar. Finally, we use stacked CA-GANs (SCA-GANs) to further rectify defects and add compelling details. The experimental results show that our method is capable of generating both visually comfortable and identity-preserving face sketches/photos over a wide range of challenging data. In addition, our method significantly decreases the best previous Fréchet inception distance (FID) from 36.2 to 26.2 for sketch synthesis, and from 60.9 to 30.5 for photo synthesis. Furthermore, we demonstrate that the proposed method has considerable generalization ability.
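One way to read the compositional reconstruction loss is as a pixelwise-weighted reconstruction error whose weights come from the face-label map; the specific weighting below is a hypothetical illustration rather than the loss actually used in CA-GAN.

```python
# Hedged sketch of a composition-weighted reconstruction loss: pixelwise face labels
# up-weight hard-to-generate components. The weighting scheme here is an assumption.
import torch

def compositional_reconstruction_loss(fake, real, face_labels, component_weights):
    """L1 loss where each pixel is weighted according to its facial-component class."""
    # face_labels: (B, H, W) integer map of components; component_weights: (C,) tensor.
    pixel_w = component_weights[face_labels]                   # (B, H, W)
    return (pixel_w.unsqueeze(1) * (fake - real).abs()).mean()

fake = torch.rand(2, 3, 64, 64)
real = torch.rand(2, 3, 64, 64)
labels = torch.randint(0, 4, (2, 64, 64))        # 4 hypothetical component classes
weights = torch.tensor([1.0, 2.0, 2.0, 0.5])     # emphasize delicate facial structures
print(compositional_reconstruction_loss(fake, real, labels, weights).item())
```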

80 citations

Journal ArticleDOI
TL;DR: A novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK) is proposed in this paper; it achieves state-of-the-art results on the weighted average and on most of the specific emotion categories.
Abstract: Multimodal emotion recognition in conversational videos (ERC) has developed rapidly in recent years. To fully extract the relative context from video clips, most studies build their models on entire dialogues, which deprives them of real-time ERC ability. Different from related research, this paper proposes a novel multimodal emotion recognition model for conversational videos based on reinforcement learning and domain knowledge (ERLDK). In ERLDK, a reinforcement learning algorithm is introduced to conduct real-time ERC as conversations occur. The history utterances are collected into an emotion-pair, which represents the multimodal context of the next utterance to be recognized. A dueling deep Q-network (DDQN) built on gated recurrent unit (GRU) layers is designed to learn the correct action from the alternative emotion categories. Domain knowledge is extracted from a public dataset based on the prior information of emotion-pairs. The extracted domain knowledge is used to revise the results from the RL module and is transferred to another dataset to examine its rationality. The experimental results show that ERLDK achieves state-of-the-art results on the weighted average and on most of the specific emotion categories.
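To illustrate the DDQN-over-GRU component described above, here is a hedged sketch of a dueling Q-network whose state is a GRU encoding of the emotion-pair context; the feature dimension and number of emotion categories are assumed values, and this is not the authors' implementation.

```python
# Illustrative dueling DQN head over GRU-encoded utterance context, loosely mirroring the
# ERLDK description; feature sizes and the number of emotion categories are assumptions.
import torch
import torch.nn as nn

class DuelingGRUQNet(nn.Module):
    def __init__(self, feat_dim=100, hidden=64, n_emotions=6):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.value = nn.Linear(hidden, 1)               # state-value stream
        self.advantage = nn.Linear(hidden, n_emotions)  # per-emotion advantage stream

    def forward(self, context):
        # context: (batch, timesteps, feat_dim) multimodal features of the emotion-pair history
        _, h = self.gru(context)
        h = h.squeeze(0)
        adv = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return self.value(h) + adv - adv.mean(dim=1, keepdim=True)

q_net = DuelingGRUQNet()
context = torch.randn(4, 2, 100)       # an emotion-pair (two preceding utterances) per sample
action = q_net(context).argmax(dim=1)  # the emotion category chosen for the next utterance
print(action)
```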

71 citations