Pong C. Yuen

Other affiliations: Southwest Baptist University, Chinese Academy of Sciences, Sun Yat-sen University

Bio: Pong C. Yuen is an academic researcher from Hong Kong Baptist University. The author has contributed to research on topics including facial recognition systems and feature extraction. The author has an h-index of 52 and has co-authored 257 publications receiving 9,001 citations. Previous affiliations of Pong C. Yuen include Southwest Baptist University and the Chinese Academy of Sciences.

Papers published on a yearly basis (chart covering 1993 to 2022)

Papers

Book Chapter

The Visual Object Tracking VOT2016 Challenge Results

Matej Kristan¹, Ales Leonardis², Jiří Matas³, Michael Felsberg⁴, Roman Pflugfelder⁵, Luka Cehovin¹, Tomas Vojir³, Gustav Häger⁴, Alan Lukežič¹, Gustavo Fernandez⁵, Abhinav Gupta⁶, Alfredo Petrosino⁷, Alireza Memarmoghadam⁸, Alvaro Garcia-Martin⁹, Andres Solis Montero¹⁰, Andrea Vedaldi¹¹, Andreas Robinson⁴, Andy J. Ma¹², Anton Varfolomieiev¹³, A. Aydin Alatan¹⁴, Aykut Erdem¹⁵, Bernard Ghanem¹⁶, Bin Liu, Bohyung Han¹⁷, Brais Martinez¹⁸, Chang-Ming Chang¹⁹, Changsheng Xu²⁰, Chong Sun²¹, Daijin Kim¹⁷, Dapeng Chen²², Dawei Du²⁰, Deepak Mishra²³, Dit-Yan Yeung²⁴, Erhan Gundogdu²⁵, Erkut Erdem¹⁵, Fahad Shahbaz Khan⁴, Fatih Porikli²⁶, Fatih Porikli²⁷, Fei Zhao²⁰, Filiz Bunyak²⁸, Francesco Battistone⁷, Gao Zhu²⁷, Giorgio Roffo²⁹, Gorthi R. K. Sai Subrahmanyam²³, Guilherme Sousa Bastos³⁰, Guna Seetharaman³¹, Henry Medeiros³², Hongdong Li²⁷, Honggang Qi²⁰, Horst Bischof³³, Horst Possegger³³, Huchuan Lu²¹, Hyemin Lee¹⁷, Hyeonseob Nam³⁴, Hyung Jin Chang³⁵, Isabela Drummond³⁰, Jack Valmadre¹¹, Jae-chan Jeong³⁶, Jaeil Cho³⁶, Jae-Yeong Lee³⁶, Jianke Zhu³⁷, Jiayi Feng²⁰, Jin Gao²⁰, Jin-Young Choi, Jingjing Xiao², Ji-Wan Kim³⁶, Jiyeoup Jeong, João F. Henriques¹¹, Jochen Lang¹⁰, Jongwon Choi, José M. Martínez⁹, Junliang Xing²⁰, Junyu Gao²⁰, Kannappan Palaniappan²⁸, Karel Lebeda³⁸, Ke Gao²⁸, Krystian Mikolajczyk³⁵, Lei Qin²⁰, Lijun Wang²¹, Longyin Wen¹⁹, Luca Bertinetto¹¹, Madan Kumar Rapuru²³, Mahdieh Poostchi²⁸, Mario Edoardo Maresca⁷, Martin Danelljan⁴, Matthias Mueller¹⁶, Mengdan Zhang²⁰, Michael Arens, Michel Valstar¹⁸, Ming Tang²⁰, Mooyeol Baek¹⁷, Muhammad Haris Khan¹⁸, Naiyan Wang²⁴, Nana Fan³⁹, Noor M. Al-Shakarji²⁸, Ondrej Miksik¹¹, Osman Akin¹⁵, Payman Moallem⁸, Pedro Senna³⁰, Philip H. S. Torr¹¹, Pong C. Yuen¹², Qingming Huang³⁹, Qingming Huang²⁰, Rafael Martin-Nieto⁹, Rengarajan Pelapur²⁸, Richard Bowden³⁸, Robert Laganiere¹⁰, Rustam Stolkin², Ryan Walsh³², Sebastian B. Krah, Shengkun Li¹⁹, Shengping Zhang³⁹, Shizeng Yao²⁸, Simon Hadfield³⁸, Simone Melzi²⁹, Siwei Lyu¹⁹, Siyi Li²⁴, Stefan Becker, Stuart Golodetz¹¹, Sumithra Kakanuru²³, Sunglok Choi³⁶, Tao Hu²⁰, Thomas Mauthner³³, Tianzhu Zhang²⁰, Tony P. Pridmore¹⁸, Vincenzo Santopietro⁷, Weiming Hu²⁰, Wenbo Li⁴⁰, Wolfgang Hübner, Xiangyuan Lan¹², Xiaomeng Wang¹⁸, Xin Li³⁹, Yang Li³⁷, Yiannis Demiris³⁵, Yifan Wang²¹, Yuankai Qi³⁹, Zejian Yuan²², Zexiong Cai¹², Zhan Xu³⁷, Zhenyu He³⁹, Zhizhen Chi²¹

University of Ljubljana¹, University of Birmingham², Czech Technical University in Prague³, Linköping University⁴, Austrian Institute of Technology⁵, Carnegie Mellon University⁶, Parthenope University of Naples⁷, University of Isfahan⁸, Autonomous University of Madrid⁹, University of Ottawa¹⁰, University of Oxford¹¹, Hong Kong Baptist University¹², Kyiv Polytechnic Institute¹³, Middle East Technical University¹⁴, Hacettepe University¹⁵, King Abdullah University of Science and Technology¹⁶, Pohang University of Science and Technology¹⁷, University of Nottingham¹⁸, University at Albany, SUNY¹⁹, Chinese Academy of Sciences²⁰, Dalian University of Technology²¹, Xi'an Jiaotong University²², Indian Institute of Space Science and Technology²³, Hong Kong University of Science and Technology²⁴, ASELSAN²⁵, Commonwealth Scientific and Industrial Research Organisation²⁶, Australian National University²⁷, University of Missouri²⁸, University of Verona²⁹, Universidade Federal de Itajubá³⁰, United States Naval Research Laboratory³¹, Marquette University³², Graz University of Technology³³, Naver Corporation³⁴, Imperial College London³⁵, Electronics and Telecommunications Research Institute³⁶, Zhejiang University³⁷, University of Surrey³⁸, Harbin Institute of Technology³⁹, Lehigh University⁴⁰

08 Oct 2016

TL;DR: The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment.

Abstract: The Visual Object Tracking challenge VOT2016 aims at comparing short-term single-object visual trackers that do not apply pre-learned models of object appearance. Results of 70 trackers are presented, a large number of which have been published at major computer vision conferences and in journals in recent years. The number of tested state-of-the-art trackers makes VOT2016 the largest and most challenging benchmark on short-term tracking to date. For each participating tracker, a short description is provided in the Appendix. VOT2016 goes beyond its predecessors by (i) introducing a new semi-automatic ground truth bounding box annotation methodology and (ii) extending the evaluation system with the no-reset experiment. The dataset, the evaluation kit, and the results are publicly available at the challenge website (http://votchallenge.net).
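VOT-style accuracy rests on the overlap (intersection over union) between predicted and ground-truth bounding boxes. The following is a minimal sketch of that measure for a run without resets, assuming axis-aligned boxes given as (x, y, width, height); VOT annotations may also be rotated boxes, and the helper names `iou` and `average_overlap` are hypothetical, not part of the VOT evaluation kit. Because the tracker is never re-initialized, frames where it has lost the target simply contribute zero overlap.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x, y, width, height); an assumption, since VOT
# ground truth can also be given as rotated rectangles.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the intersection rectangle (zero when disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predictions: Sequence[Box], ground_truth: Sequence[Box]) -> float:
    """Mean per-frame IoU for a run in which the tracker is never re-initialized.

    Frames where the tracker has drifted off the target are not reset;
    they simply contribute a low (possibly zero) overlap.
    """
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need exactly one prediction per ground-truth frame")
    total = sum(iou(p, g) for p, g in zip(predictions, ground_truth))
    return total / len(predictions)


gt = [(0, 0, 10, 10), (1, 0, 10, 10), (2, 0, 10, 10)]
pred = [(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 10, 10)]  # target lost in frame 3
print(round(average_overlap(pred, gt), 3))  # 0.667
```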

744 citations

Journal Article

Very Low Resolution Face Recognition Problem

W. W. W. Zou¹, Pong C. Yuen¹

Hong Kong Baptist University¹

01 Jan 2012, IEEE Transactions on Image Processing

TL;DR: Experimental results show that the proposed method outperforms existing methods, including face super-resolution methods, in terms of both image quality and recognition accuracy.

Abstract: This paper addresses the very low resolution (VLR) problem in face recognition, in which the resolution of the face image to be recognized is lower than 16 × 16. With the increasing demand for surveillance camera-based applications, the VLR problem arises in many face application systems. Existing face recognition algorithms are not able to give satisfactory performance on VLR face images. While face super-resolution (SR) methods can be employed to enhance the resolution of the images, existing learning-based face SR methods do not perform well on such VLR face images. To overcome this problem, this paper proposes a novel approach to learn the relationship between the high-resolution image space and the VLR image space for face SR. Based on this new approach, two constraints, namely a new data constraint and a discriminative constraint, are designed for good visual quality and for face recognition applications under the VLR problem, respectively. Experimental results show that the proposed SR algorithm based on relationship learning outperforms existing algorithms on public face databases.

467 citations

Proceedings Article

Unsupervised Embedding Learning via Invariant and Spreading Instance Feature

Mang Ye¹, Xu Zhang², Pong C. Yuen³, Shih-Fu Chang²

Hong Kong Baptist University¹, Columbia University², Southwest Baptist University³

15 Jun 2019

TL;DR: A novel instance-based softmax embedding method is proposed that directly optimizes the 'real' instance features on top of the softmax function and achieves significantly faster learning speed and higher accuracy than all existing methods.

Abstract: This paper studies the unsupervised embedding learning problem, which requires an effective similarity measurement between samples in a low-dimensional embedding space. Motivated by the positive-concentrated and negative-separated properties observed in category-wise supervised learning, we propose to utilize instance-wise supervision to approximate these properties, with the aim of learning data-augmentation-invariant and instance-spread-out features. To achieve this goal, we propose a novel instance-based softmax embedding method, which directly optimizes the 'real' instance features on top of the softmax function. It achieves significantly faster learning speed and higher accuracy than all existing methods. The proposed method performs well for both seen and unseen testing categories with cosine similarity. It also achieves competitive performance even without a pre-trained network on samples from fine-grained categories.
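The abstract does not spell out the loss, so the following is a simplified sketch under stated assumptions: embeddings are L2-normalized, each instance in a mini-batch is treated as its own class, and a temperature `tau` (assumed value 0.1) scales the cosine similarities. The first term rewards an augmented copy for being classified as its source instance (augmentation invariance); the second penalizes one instance for being classified as another (spread-out). The function name `instance_softmax_loss` is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def instance_softmax_loss(feats: List[Vector], aug_feats: List[Vector], tau: float = 0.1) -> float:
    """Simplified instance-wise softmax embedding loss (hypothetical helper).

    feats[i] embeds instance i; aug_feats[i] embeds an augmented copy of it.
    Every instance in the batch is treated as its own class.
    """
    f = [normalize(v) for v in feats]
    g = [normalize(v) for v in aug_feats]
    m = len(f)
    loss = 0.0
    # Invariance term: the augmented copy of i should be classified as instance i.
    for i in range(m):
        logits = [dot(f[k], g[i]) / tau for k in range(m)]
        denom = sum(math.exp(z) for z in logits)
        loss -= math.log(math.exp(logits[i]) / denom)
    # Spread-out term: instance j should not be classified as any other instance i.
    for j in range(m):
        logits = [dot(f[k], f[j]) / tau for k in range(m)]
        denom = sum(math.exp(z) for z in logits)
        for i in range(m):
            if i != j:
                loss -= math.log(1.0 - math.exp(logits[i]) / denom)
    return loss


feats = [[1.0, 0.0], [0.0, 1.0]]
good_aug = [[1.0, 0.1], [0.1, 1.0]]  # augmentations stay near their source
bad_aug = [[0.1, 1.0], [1.0, 0.1]]   # augmentations drift to the other instance
print(instance_softmax_loss(feats, good_aug) < instance_softmax_loss(feats, bad_aug))  # True
```

The spread-out term depends only on the un-augmented features, so in this toy comparison the whole difference comes from the invariance term.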

341 citations

Proceedings Article

Hierarchical Discriminative Learning for Visible Thermal Person Re-Identification

Mang Ye¹, Xiangyuan Lan¹, Jiawei Li¹, Pong C. Yuen¹

Hong Kong Baptist University¹

27 Apr 2018

TL;DR: An improved two-stream CNN is presented to learn multi-modality sharable feature representations, and identity loss and contrastive loss are integrated to enhance discriminability and modality invariance with partially shared layer parameters.

Abstract: Person re-identification is widely studied in the visible spectrum, where all person images are captured by visible cameras. However, visible cameras may not capture valid appearance information under poor illumination conditions, e.g., at night. In this case, a thermal camera is preferable, since it is less dependent on lighting because it uses infrared light to capture the human body. To this end, this paper investigates a cross-modal re-identification problem, namely visible-thermal person re-identification (VT-REID). Existing cross-modal matching methods mainly focus on modeling the cross-modality discrepancy, while VT-REID also suffers from cross-view variations caused by different camera views. Therefore, we propose a hierarchical cross-modality matching model that jointly optimizes modality-specific and modality-shared metrics. The modality-specific metrics transform the two heterogeneous modalities into a consistent space in which the modality-shared metric can subsequently be learnt. Meanwhile, the modality-specific metric compacts features of the same person within each modality to handle large intra-modality, intra-person variations (e.g., viewpoint, pose). Additionally, an improved two-stream CNN is presented to learn multi-modality sharable feature representations. Identity loss and contrastive loss are integrated to enhance discriminability and modality invariance with partially shared layer parameters. Extensive experiments illustrate the effectiveness and robustness of the proposed method.
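The abstract combines an identity loss with a contrastive loss. A minimal sketch of that combination follows, assuming the classic margin-based contrastive loss on cross-modality feature pairs and softmax cross-entropy over identity classes; the margin, the weight `lam`, and all function names are illustrative assumptions, and the two-stream network with partially shared layers that would produce the features and logits is not shown.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(visible: Sequence[float], thermal: Sequence[float],
                     same_identity: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss on one visible/thermal feature pair."""
    d = euclidean(visible, thermal)
    if same_identity:
        return 0.5 * d ** 2                 # pull matching pairs together
    return 0.5 * max(0.0, margin - d) ** 2  # push non-matching pairs past the margin


def identity_loss(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy over identity classes."""
    m = max(logits)  # subtract the max before exp for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def combined_loss(vis_feat, thr_feat, vis_logits, thr_logits,
                  vis_label: int, thr_label: int, lam: float = 1.0) -> float:
    """Identity loss on both modalities plus a weighted cross-modality contrastive term."""
    same = vis_label == thr_label
    return (identity_loss(vis_logits, vis_label)
            + identity_loss(thr_logits, thr_label)
            + lam * contrastive_loss(vis_feat, thr_feat, same))


print(round(contrastive_loss([0.0, 0.0], [0.5, 0.0], same_identity=False), 3))  # 0.125
```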

281 citations

Proceedings Article

Visible thermal person re-identification via dual-constrained top-ranking

Mang Ye¹, Zheng Wang², Xiangyuan Lan¹, Pong C. Yuen¹

Hong Kong Baptist University¹, National Institute of Informatics²

01 Jul 2018

TL;DR: A dual-path network with a novel bi-directional dual-constrained top-ranking loss is proposed to learn discriminative feature representations, and an identity loss is further incorporated to model identity-specific information and handle large intra-class variations.

Abstract: Cross-modality person re-identification between the thermal and visible domains is extremely important for night-time surveillance applications. Existing works in this field mainly focus on learning sharable feature representations to handle the cross-modality discrepancies. However, besides the cross-modality discrepancy caused by different camera spectra, visible thermal person re-identification also suffers from large cross-modality and intra-modality variations caused by different camera views and human poses. In this paper, we propose a dual-path network with a novel bi-directional dual-constrained top-ranking loss to learn discriminative feature representations. It is advantageous in two respects: 1) end-to-end feature learning directly from the data without extra metric learning steps, and 2) it simultaneously handles the cross-modality and intra-modality variations to ensure the discriminability of the learnt representations. Meanwhile, an identity loss is further incorporated to model identity-specific information and handle large intra-class variations. Extensive experiments on two datasets demonstrate superior performance compared to state-of-the-art methods.
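The abstract does not give the loss formula. The sketch below is one hedged reading of a bi-directional, dual-constrained ranking loss: a hardest-positive/hardest-negative hinge applied from visible to thermal and from thermal to visible (cross-modality), plus the same hinge within each modality (intra-modality). The margin value, hardest-sample mining, equal weighting of the terms, and the name `dual_ranking_loss` are assumptions, not the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, identity label)


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ranking_term(anchors: List[Sample], gallery: List[Sample], margin: float) -> float:
    """Hardest-sample hinge: each anchor's farthest positive in `gallery`
    must be closer than its nearest negative by at least `margin`."""
    total = 0.0
    for feat, label in anchors:
        pos = [dist(feat, g) for g, lbl in gallery if lbl == label]
        neg = [dist(feat, g) for g, lbl in gallery if lbl != label]
        if pos and neg:
            total += max(0.0, margin + max(pos) - min(neg))
    return total / len(anchors)


def dual_ranking_loss(visible: List[Sample], thermal: List[Sample], margin: float = 0.5) -> float:
    """Hypothetical bi-directional ranking loss with cross- and intra-modality terms."""
    cross = ranking_term(visible, thermal, margin) + ranking_term(thermal, visible, margin)
    intra = ranking_term(visible, visible, margin) + ranking_term(thermal, thermal, margin)
    return cross + intra


visible = [([0.0, 0.0], 0), ([10.0, 0.0], 1)]
thermal = [([0.0, 0.1], 0), ([10.0, 0.1], 1)]
print(dual_ranking_loss(visible, thermal))  # 0.0 (identities already well separated)
```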

269 citations

Cited by

Posted Content

A Simple Framework for Contrastive Learning of Visual Representations

Ting Chen¹, Simon Kornblith¹, Mohammad Norouzi¹, Geoffrey E. Hinton¹

Google¹

13 Feb 2020, arXiv: Learning

TL;DR: It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.

Abstract: This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
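The abstract names the ingredients (two augmented views per image, a learnable projection head, a contrastive loss) without the formula. The sketch below implements the normalized temperature-scaled cross-entropy (NT-Xent) loss used by SimCLR, assuming the inputs are already the projection-head outputs for two views of each of N images; the temperature value and the names `cosine` and `nt_xent` are assumptions for illustration, and a practical version would be batched on an accelerator.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nt_xent(z1: List[Sequence[float]], z2: List[Sequence[float]], tau: float = 0.5) -> float:
    """NT-Xent loss; z1[k] and z2[k] are projections of two augmented views of image k."""
    z = list(z1) + list(z2)  # 2N views in one list
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the other view of the same image (the positive)
        sims = [cosine(z[i], z[k]) / tau for k in range(2 * n)]
        # Denominator runs over every view except the anchor itself.
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total -= math.log(math.exp(sims[j]) / denom)
    return total / (2 * n)  # average over all 2N anchors


z1 = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # second views agree with the first
swapped = [[0.0, 1.0], [1.0, 0.0]]  # second views match the wrong image
print(nt_xent(z1, aligned) < nt_xent(z1, swapped))  # True
```

Every other view in the batch acts as a negative, which is one intuition for why the abstract reports gains from larger batch sizes.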

7,951 citations

Proceedings Article

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Christian Ledig¹, Lucas Theis¹, Ferenc Huszar², Jose Caballero³, Andrew Cunningham, Alejandro Acosta², Andrew Peter Aitken², Alykhan Tejani², Johannes Totz², Zehan Wang², Wenzhe Shi²

Fırat University¹, Twitter², Imperial College London³

21 Jul 2017

TL;DR: SRGAN uses a perceptual loss function consisting of an adversarial loss and a content loss; the adversarial loss pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between super-resolved images and original photo-realistic images.

Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
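The perceptual loss described above can be sketched as a content term plus a down-weighted adversarial term. The sketch assumes the SRGAN weighting of 10^-3 on the adversarial term and the generator loss -log D(G(I_LR)); the features that would come from a pretrained VGG network are passed in as plain lists, and terms are averaged over the batch rather than summed. The function names are hypothetical.

```python
import math
from typing import Sequence


def content_loss(feat_sr: Sequence[float], feat_hr: Sequence[float]) -> float:
    """Mean squared error between (flattened) feature maps of the SR and HR images.

    SRGAN takes these features from a pretrained VGG network; here they are
    simply passed in as plain lists.
    """
    return sum((a - b) ** 2 for a, b in zip(feat_sr, feat_hr)) / len(feat_sr)


def adversarial_loss(d_probs: Sequence[float]) -> float:
    """Generator-side loss -log D(G(I_LR)), averaged over discriminator outputs."""
    eps = 1e-12  # guard against log(0)
    return sum(-math.log(max(p, eps)) for p in d_probs) / len(d_probs)


def perceptual_loss(feat_sr: Sequence[float], feat_hr: Sequence[float],
                    d_probs: Sequence[float], adv_weight: float = 1e-3) -> float:
    """Content loss plus a down-weighted adversarial loss."""
    return content_loss(feat_sr, feat_hr) + adv_weight * adversarial_loss(d_probs)


print(round(perceptual_loss([0.0, 0.0], [1.0, 1.0], [math.exp(-1)]), 6))  # 1.001
```

The small adversarial weight keeps the content term dominant while still nudging outputs toward what the discriminator judges to be natural images.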

6,884 citations

Journal Article

Content-based image retrieval at the end of the early years

Arnold W. M. Smeulders¹, Marcel Worring¹, Simone Santini², Amarnath Gupta², Ramesh Jain

University of Amsterdam¹, University of California, San Diego²

01 Dec 2000, IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap are discussed, as well as aspects of system engineering: databases, system architecture, and evaluation.

Abstract: Presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.
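As one classic, concrete instance of the color features and similarity measures the review surveys, the sketch below computes a normalized joint RGB histogram and ranks database images by histogram intersection. This particular choice (bin count, intersection as the similarity) is an illustrative assumption, not a method prescribed by the review; images are represented as plain lists of 8-bit RGB tuples, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B)


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram with bins_per_channel ** 3 bins."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Clamp so that 255 still lands in the last bin when 256 is not divisible.
        ri, gi, bi = (min(c // step, bins_per_channel - 1) for c in (r, g, b))
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    return [h / len(pixels) for h in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] for normalized histograms; 1 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rank_by_color(query: Sequence[Pixel], database: Dict[str, Sequence[Pixel]]) -> List[str]:
    """Return database image names ordered from most to least similar color content."""
    q = color_histogram(query)
    scored = [(histogram_intersection(q, color_histogram(img)), name)
              for name, img in database.items()]
    return [name for _, name in sorted(scored, reverse=True)]


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
mostly_red = [(255, 0, 0)] * 3 + [(0, 0, 255)]
print(rank_by_color(red, {"blue": blue, "mostly red": mostly_red, "red": red}))
# ['red', 'mostly red', 'blue']
```

A global histogram like this ignores spatial layout, which is one reason the review also covers local, salient-point, and shape features.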

6,447 citations

Book Chapter

Domain-adversarial training of neural networks

Yaroslav Ganin¹, Evgeniya Ustinova¹, Hana Ajakan², Pascal Germain², Hugo Larochelle³, François Laviolette², Mario Marchand², Victor Lempitsky¹

Skolkovo Institute of Science and Technology¹, Laval University², Université de Sherbrooke³

01 Jan 2016, Journal of Machine Learning Research

TL;DR: In this article, a new representation learning approach is proposed for domain adaptation, in which data at training and test time come from similar but different distributions; the approach promotes the emergence of features that are discriminative for the main learning task on the source domain but cannot discriminate between the training (source) and test (target) domains.

Abstract: We introduce a new representation learning approach for domain adaptation, in which data at training and test time come from similar but different distributions. Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains. The approach implements this idea in the context of neural network architectures that are trained on labeled data from the source domain and unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of features that are (i) discriminative for the main learning task on the source domain and (ii) indiscriminate with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation and stochastic gradient descent, and can thus be implemented with little effort using any of the deep learning packages. We demonstrate the success of our approach for two distinct classification problems (document sentiment analysis and image classification), where state-of-the-art domain adaptation performance on standard benchmarks is achieved. We also validate the approach for a descriptor learning task in the context of a person re-identification application.
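The gradient reversal layer is the key ingredient: it is the identity on the forward pass and multiplies gradients by a negative factor on the backward pass, so the feature extractor is pushed to make the domains indistinguishable while the domain classifier still learns to separate them. The following toy scalar sketch shows that sign flip by hand; the linear domain head, the squared-error stand-in for the domain classification loss, and the names `GradientReversal` and `dann_step` are assumptions for illustration, and a real implementation would live inside an autodiff framework.

```python
from typing import List, Tuple


class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: List[float]) -> List[float]:
        return list(x)

    def backward(self, grad_output: List[float]) -> List[float]:
        return [-self.lam * g for g in grad_output]


def dann_step(w: float, v: float, x: float, domain_label: float,
              lam: float = 1.0, lr: float = 0.1) -> Tuple[float, float]:
    """One gradient step on a toy scalar model (hypothetical).

    Feature extractor: f = w * x. Domain head: d = v * f, trained with the
    stand-in loss 0.5 * (d - domain_label) ** 2. The label predictor is omitted.
    """
    grl = GradientReversal(lam)
    f = grl.forward([w * x])[0]               # the GRL leaves the forward value unchanged
    d = v * f
    dloss_dd = d - domain_label
    grad_v = dloss_dd * f                     # domain head descends the domain loss
    grad_f = grl.backward([dloss_dd * v])[0]  # sign flipped on the way back
    grad_w = grad_f * x                       # so the extractor ascends the domain loss
    return w - lr * grad_w, v - lr * grad_v


print(dann_step(1.0, 1.0, 1.0, domain_label=0.0))  # (1.1, 0.9)
```

With v held fixed, the updated w raises the toy domain loss from 0.5 to 0.605, while the domain head's own update lowers it; that tug-of-war is what the reversal layer sets up.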

4,862 citations

Posted Content

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Fırat University¹, Twitter², Imperial College London³

15 Sep 2016, arXiv: Computer Vision and Pattern Recognition

TL;DR: SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented; to the authors' knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors, and it uses a perceptual loss function consisting of an adversarial loss and a content loss.

4,404 citations
