Home
/
Authors
/
Xin Yu

Author

Xin Yu

Other affiliations: Australian National University, University of Electronic Science and Technology of China, Chinese Academy of Sciences ...read more

Bio: Xin Yu is an academic researcher from University of Technology, Sydney. The author has contributed to research in topics: Face hallucination & Feature (computer vision). The author has an hindex of 22, co-authored 122 publications receiving 2058 citations. Previous affiliations of Xin Yu include Australian National University & University of Electronic Science and Technology of China.

Topics: Face hallucination, Feature (computer vision), Wavefront, Computer science, Pose ...read more

Papers published on a yearly basis

2022
2021
2020
2019
2018
2017
2016
2015
2014
2012
2011

Papers

PDF

Open Access

More filters

Book Chapter•DOI•

Ultra-Resolving Face Images by Discriminative Generative Networks

[...]

Xin Yu¹, Fatih Porikli¹•Institutions (1)

Australian National University¹

08 Oct 2016

TL;DR: This work presents a discriminative generative network that can ultra-resolve a very low resolution face image of size \(16 \times 16\) pixels to its \(8\times \) larger version by reconstructing 64 pixels from a single pixel.

...read moreread less

Abstract: Conventional face super-resolution methods, also known as face hallucination, are limited up to \(2 \! \sim \! 4\times \) scaling factors where \(4 \sim 16\) additional pixels are estimated for each given pixel. Besides, they become very fragile when the input low-resolution image size is too small that only little information is available in the input image. To address these shortcomings, we present a discriminative generative network that can ultra-resolve a very low resolution face image of size \(16 \times 16\) pixels to its \(8\times \) larger version by reconstructing 64 pixels from a single pixel. We introduce a pixel-wise \(\ell _2\) regularization term to the generative model and exploit the feedback of the discriminative network to make the upsampled face images more similar to real ones. In our framework, the discriminative network learns the essential constituent parts of the faces and the generative network blends these parts in the most accurate fashion to the input image. Since only frontal and ordinary aligned images are used in training, our method can ultra-resolve a wide range of very low-resolution images directly regardless of pose and facial expression variations. Our extensive experimental evaluations demonstrate that the presented ultra-resolution by discriminative generative networks (UR-DGN) achieves more appealing results than the state-of-the-art.

...read moreread less

311 citations

Proceedings Article•DOI•

Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison

[...]

Dongxu Li¹, Cristian Rodriguez Opazo¹, Xin Yu¹, Hongdong Li¹•Institutions (1)

Australian National University¹

01 Mar 2020

TL;DR: Wang et al. as mentioned in this paper introduced a large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers.

...read moreread less

Abstract: Vision-based sign language recognition aims at helping the deaf people to communicate with others. However, most existing sign language datasets are limited to a small number of words. Due to the limited vocabulary size, models learned from those datasets cannot be applied in practice. In this paper, we introduce a new large-scale Word-Level American Sign Language (WLASL) video dataset, containing more than 2000 words performed by over 100 signers. This dataset will be made publicly available to the research community. To our knowledge,it is by far the largest public ASL dataset to facilitate word-level sign recognition research.Based on this new large-scale dataset, we are able to experiment with several deep learning methods for word-level sign recognition and evaluate their performances in large scale scenarios. Specifically we implement and compare two different models,i.e., (i) holistic visual appearance based approach, and (ii) 2D human pose based approach. Both models are valuable baselines that will benefit the community for method benchmarking. Moreover, we also propose a novel pose-based temporal graph convolution networks (Pose-TGCN) that model spatial and temporal dependencies in human pose trajectories simultaneously, which has further boosted the performance of the pose-based method. Our results show that pose-based and appearance-based models achieve comparable performances up to 62.63% at top-10 accuracy on 2,000 words/glosses, demonstrating the validity and challenges of our dataset. Our dataset and baseline deep models are available at https://dxli94.github.io/WLASL/.

...read moreread less

263 citations

Proceedings Article•DOI•

Weakly-Supervised Salient Object Detection via Scribble Annotations

[...]

Jing Zhang¹, Xin Yu², Aixuan Li³, Peipei Song¹, Bowen Liu³, Yuchao Dai³ - Show less +2 more•Institutions (3)

Australian National University¹, University of Technology, Sydney², Northwestern Polytechnical University³

14 Jun 2020

TL;DR: This paper proposes a weakly-supervised salient object detection model to learn saliency from scribble annotations, and presents a new metric, termed saliency structure measure, as a complementary metric to evaluate sharpness of the prediction.

...read moreread less

Abstract: Compared with laborious pixel-wise dense labeling, it is much easier to label data by scribbles, which only costs 1~2 seconds to label one image. However, using scribble labels to learn salient object detection has not been explored. In this paper, we propose a weakly-supervised salient object detection model to learn saliency from such annotations. In doing so, we first relabel an existing large-scale salient object detection dataset with scribbles, namely S-DUTS dataset. Since object structure and detail information is not identified by scribbles, directly training with scribble labels will lead to saliency maps of poor boundary localization. To mitigate this problem, we propose an auxiliary edge detection task to localize object edges explicitly, and a gated structure-aware loss to place constraints on the scope of structure to be recovered. Moreover, we design a scribble boosting scheme to iteratively consolidate our scribble annotations, which are then employed as supervision to learn high-quality saliency maps. As existing saliency evaluation metrics neglect to measure structure alignment of the predictions, the saliency map ranking may not comply with human perception. We present a new metric, termed saliency structure measure, as a complementary metric to evaluate sharpness of the prediction. Extensive experiments on six benchmark datasets demonstrate that our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models (Our code and data is publicly available at: https://github.com/JingZhang617/Scribble_Saliency).

...read moreread less

193 citations

Proceedings Article•DOI•

Super-Resolving Very Low-Resolution Face Images with Supplementary Attributes

[...]

Xin Yu¹, Basura Fernando¹, Richard Hartley¹, Fatih Porikli¹•Institutions (1)

Australian National University¹

18 Jun 2018

TL;DR: An attribute-embedded upsampling network that can super-resolve tiny (16Ã—16 pixels) unaligned face images with a large upscaling factor of 8Ã— while reducing the uncertainty of one-to-many mappings remarkably is developed.

...read moreread less

Abstract: Given a tiny face image, existing face hallucination methods aim at super-resolving its high-resolution (HR) counterpart by learning a mapping from an exemplar dataset. Since a low-resolution (LR) input patch may correspond to many HR candidate patches, this ambiguity may lead to distorted HR facial details and wrong attributes such as gender reversal. An LR input contains low-frequency facial components of its HR version while its residual face image, defined as the difference between the HR ground-truth and interpolated LR images, contains the missing high-frequency facial details. We demonstrate that supplementing residual images or feature maps with additional facial attribute information can significantly reduce the ambiguity in face super-resolution. To explore this idea, we develop an attribute-embedded upsampling network, which consists of an upsampling network and a discriminative network. The upsampling network is composed of an autoencoder with skip-connections, which incorporates facial attribute vectors into the residual features of LR inputs at the bottleneck of the autoencoder and deconvolutional layers used for upsampling. The discriminative network is designed to examine whether super-resolved faces contain the desired attributes or not and then its loss is used for updating the upsampling network. In this manner, we can super-resolve tiny (16A—16 pixels) unaligned face images with a large upscaling factor of 8A— while reducing the uncertainty of one-to-many mappings remarkably. By conducting extensive evaluations on a large-scale dataset, we demonstrate that our method achieves superior face hallucination results and outperforms the state-of-the-art.

...read moreread less

177 citations

Book Chapter•DOI•

Face Super-resolution Guided by Facial Component Heatmaps

[...]

Xin Yu¹, Basura Fernando¹, Bernard Ghanem², Fatih Porikli¹, Richard Hartley¹ - Show less +1 more•Institutions (2)

Australian National University¹, King Abdullah University of Science and Technology²

08 Sep 2018

TL;DR: This paper proposes a method that explicitly incorporates structural information of faces into the face super-resolution process by using a multi-task convolutional neural network (CNN) and achieves superior face hallucination results and outperforms the state-of-the-art.

...read moreread less

Abstract: State-of-the-art face super-resolution methods leverage deep convolutional neural networks to learn a mapping between low-resolution (LR) facial patterns and their corresponding high-resolution (HR) counterparts by exploring local appearance information. However, most of these methods do not account for facial structure and suffer from degradations due to large pose variations and misalignments. In this paper, we propose a method that explicitly incorporates structural information of faces into the face super-resolution process by using a multi-task convolutional neural network (CNN). Our CNN has two branches: one for super-resolving face images and the other branch for predicting salient regions of a face coined facial component heatmaps. These heatmaps encourage the upsampling stream to generate super-resolved faces with higher-quality details. Our method not only uses low-level information (i.e., intensity similarity), but also middle-level information (i.e., face structure) to further explore spatial constraints of facial components from LR inputs images. Therefore, we are able to super-resolve very small unaligned face images \((16\,\times \,16\hbox { pixels})\) with a large upscaling factor of 8\(\times \), while preserving face structure. Extensive experiments demonstrate that our network achieves superior face hallucination results and outperforms the state-of-the-art.

...read moreread less

174 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

[...]

Christian Ledig¹, Lucas Theis¹, Ferenc Huszar², Jose Caballero³, Andrew Cunningham, Alejandro Acosta², Andrew Peter Aitken², Alykhan Tejani², Johannes Totz², Zehan Wang², Wenzhe Shi² - Show less +7 more•Institutions (3)

Fırat University¹, Twitter², Imperial College London³

21 Jul 2017

TL;DR: SRGAN as mentioned in this paper proposes a perceptual loss function which consists of an adversarial loss and a content loss, which pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.

...read moreread less

Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

...read moreread less

6,884 citations

Posted Content•

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

[...]

Fırat University¹, Twitter², Imperial College London³

15 Sep 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented, to its knowledge, the first framework capable of inferring photo-realistic natural images for 4x upscaling factors and a perceptual loss function which consists of an adversarial loss and a content loss.

...read moreread less

4,404 citations

Journal Article•

“Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告

[...]

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Reference Entry•DOI•

IEEE Transactions on Pattern Analysis and Machine Intelligence

[...]

King-Sun Fu

15 Oct 2004

2,118 citations

Journal Article•DOI•

Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources

[...]

Xiao Xiang Zhu¹, Devis Tuia², Lichao Mou¹, Gui-Song Xia³, Liangpei Zhang³, Feng Xu⁴, Friedrich Fraundorfer⁵ - Show less +3 more•Institutions (5)

Technische Universität München¹, Wageningen University and Research Centre², Wuhan University³, Fudan University⁴, Graz University of Technology⁵

01 Dec 2017-IEEE Geoscience and Remote Sensing Magazine

TL;DR: The challenges of using deep learning for remote-sensing data analysis are analyzed, recent advances are reviewed, and resources are provided that hope will make deep learning in remote sensing seem ridiculously simple.

...read moreread less

Abstract: Central to the looming paradigm shift toward data-intensive science, machine-learning techniques are becoming increasingly important. In particular, deep learning has proven to be both a major breakthrough and an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a black-box solution? These are controversial issues within the remote-sensing community. In this article, we analyze the challenges of using deep learning for remote-sensing data analysis, review recent advances, and provide resources we hope will make deep learning in remote sensing seem ridiculously simple. More importantly, we encourage remote-sensing scientists to bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges, such as climate change and urbanization.

...read moreread less

2,095 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse