Author

Chen Change Loy

Other affiliations: Harbin Institute of Technology, University of Sydney, Chinese Academy of Sciences

Bio: Chen Change Loy is an academic researcher from Nanyang Technological University. The author has contributed to research in topics: Feature (computer vision) & Image restoration. The author has an h-index of 15 and has co-authored 111 publications receiving 782 citations. Previous affiliations of Chen Change Loy include Harbin Institute of Technology & University of Sydney.

Papers published on a yearly basis (chart not reproduced; years shown: 2021, 2020, 2019, 2018, 2015, 2014, 2012)

Papers

Posted Content

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

Xingang Pan¹, Xiaohang Zhan¹, Bo Dai¹, Dahua Lin¹, Chen Change Loy², Ping Luo³

The Chinese University of Hong Kong¹, Nanyang Technological University², University of Hong Kong³

30 Mar 2020 - arXiv: Image and Video Processing

TL;DR: This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images by allowing the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN.

Abstract: Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN. We show that these easy-to-implement and practical changes help keep the reconstruction within the manifold of natural images, and thus lead to more precise and faithful reconstruction for real images. Code is available at this https URL.

214 citations

Posted Content

BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

Kelvin C.K. Chan¹, Xintao Wang², Ke Yu³, Chao Dong⁴, Chen Change Loy¹

Nanyang Technological University¹, Tencent², The Chinese University of Hong Kong³, Chinese Academy of Sciences⁴

03 Dec 2020 - arXiv: Computer Vision and Pattern Recognition

TL;DR: A succinct pipeline is shown that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms and can serve as strong baselines for future VSR approaches.

Abstract: Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some of the most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.

167 citations

Proceedings Article

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

Hang Zhou¹, Yasheng Sun², Wayne Wu², Chen Change Loy³, Xiaogang Wang¹, Ziwei Liu³

The Chinese University of Hong Kong¹, SenseTime², Nanyang Technological University³

22 Apr 2021

TL;DR: In this article, a pose code is learned in a modulated convolution-based reconstruction framework to generate pose-controllable talking faces with audio-visual modality modularization.

Abstract: While accurate lip synchronization has been achieved for arbitrary-subject audio-driven talking face generation, the problem of how to efficiently drive the head pose remains. Previous methods rely on pre-estimated structural information such as landmarks and 3D parameters, aiming to generate personalized rhythmic movements. However, the inaccuracy of such estimated information under extreme conditions would lead to degradation problems. In this paper, we propose a clean yet effective framework to generate pose-controllable talking faces. We operate on non-aligned raw face images, using only a single photo as an identity reference. The key is to modularize audio-visual representations by devising an implicit low-dimension pose code. Substantially, both speech content and head pose information lie in a joint non-identity embedding space. While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework. Extensive experiments show that our method generates accurately lip-synced talking faces whose poses are controllable by other videos. Moreover, our model has multiple advanced capabilities including extreme view robustness and talking face frontalization.

158 citations

Proceedings Article

BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

Kelvin C.K. Chan¹, Xintao Wang², Ke Yu³, Chao Dong⁴, Chen Change Loy¹

Nanyang Technological University¹, Tencent², The Chinese University of Hong Kong³, Chinese Academy of Sciences⁴

20 Jun 2021

TL;DR: In this article, the authors propose a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms.

113 citations

Proceedings Article

Audio-Driven Emotional Video Portraits

Xinya Ji¹, Hang Zhou², Kaisiyuan Wang³, Wayne Wu⁴, Chen Change Loy⁵, Xun Cao¹, Feng Xu⁶

Nanjing University¹, The Chinese University of Hong Kong², University of Sydney³, SenseTime⁴, Nanyang Technological University⁵, Tsinghua University⁶

01 Jun 2021

TL;DR: In this paper, a cross-reconstructed emotion disentanglement technique is proposed to decompose speech into two decoupled spaces, i.e., a duration-independent emotion space and a duration-dependent content space.

Abstract: Despite previous success in generating audio-driven talking heads, most of the previous studies focus on the correlation between speech content and the mouth shape. Facial emotion, which is one of the most important features on natural human faces, is always neglected in their methods. In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio. Specifically, we propose the Cross-Reconstructed Emotion Disentanglement technique to decompose speech into two decoupled spaces, i.e., a duration-independent emotion space and a duration-dependent content space. With the disentangled features, dynamic 2D emotional facial landmarks can be deduced. Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits, by bridging the gap between the deduced landmarks and the natural head poses of target videos. Extensive experiments demonstrate the effectiveness of our method both qualitatively and quantitatively.

106 citations

Cited by

Journal Article

Study Report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"

杉山拓海

12 Sep 2017 - Computers & Graphics

3,940 citations

Proceedings Article

A morphable model for the synthesis of 3D faces

Matthew Turk

01 Jan 1999

2,010 citations

IEEE Transactions on Pattern Analysis and Machine Intelligence

IEEE Xplore

01 Jan 1979

TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis, and at addressing interesting real-world computer vision and multimedia applications.

Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain lots of training data while many classes contain only a small amount. Therefore, how to use frequent classes to help learn rare classes, for which it is harder to collect training data, is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different levels of components that can be shared during the concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged.
Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approach for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Posted Content

Learning without Forgetting

Zhizhong Li¹, Derek Hoiem¹

University of Illinois at Urbana–Champaign¹

29 Jun 2016 - arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes the Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities, and performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques.

Abstract: When building a unified vision system or gradually adding new capabilities to a system, the usual assumption is that training data for all tasks is always available. However, as the number of tasks grows, storing and retraining on such data becomes infeasible. A new problem arises where we add new capabilities to a Convolutional Neural Network (CNN), but the training data for its existing capabilities are unavailable. We propose our Learning without Forgetting method, which uses only new task data to train the network while preserving the original capabilities. Our method performs favorably compared to commonly used feature extraction and fine-tuning adaption techniques and performs similarly to multitask learning that uses original task data we assume unavailable. A more surprising observation is that Learning without Forgetting may be able to replace fine-tuning with similar old and new task datasets for improved new task performance.

1,037 citations

Journal Article

Knowledge Distillation: A Survey

Jianping Gou¹˒², Baosheng Yu¹, Stephen J. Maybank³, Dacheng Tao¹

University of Sydney¹, Jiangsu University², Birkbeck, University of London³

09 Jun 2020 - arXiv: Learning

TL;DR: A comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications can be found in this paper.

Abstract: In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed.

1,027 citations
