Author

Xiaohang Zhan

Other affiliations: University of California, Berkeley, SenseTime

Bio: Xiaohang Zhan is an academic researcher from The Chinese University of Hong Kong. The author has contributed to research on feature learning and computer science. The author has an h-index of 15 and has co-authored 32 publications that have received 937 citations. Previous affiliations of Xiaohang Zhan include University of California, Berkeley and SenseTime.

Papers

Proceedings Article•DOI•

Large-Scale Long-Tailed Recognition in an Open World

Ziwei Liu¹, Zhongqi Miao², Xiaohang Zhan¹, Jiayun Wang², Boqing Gong³, Stella X. Yu²

The Chinese University of Hong Kong¹, University of California, Berkeley², Google³

15 Jun 2019

TL;DR: An integrated OLTR algorithm is developed that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world.

Abstract: Real-world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon a never-seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus only on one aspect and perform poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state-of-the-art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html.

780 citations

Posted Content•

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

Xingang Pan¹, Xiaohang Zhan¹, Bo Dai¹, Dahua Lin¹, Chen Change Loy², Ping Luo³

The Chinese University of Hong Kong¹, Nanyang Technological University², University of Hong Kong³

30 Mar 2020-arXiv: Image and Video Processing

TL;DR: This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images by allowing the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN.

Abstract: Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig.1, the deep generative prior (DGP) provides compelling results to restore missing semantics, e.g., color, patch, resolution, of various degraded images. It also enables diverse image manipulation including random jittering, image morphing, and category transfer. Such highly flexible restoration and manipulation are made possible through relaxing the assumption of existing GAN-inversion methods, which tend to fix the generator. Notably, we allow the generator to be fine-tuned on-the-fly in a progressive manner regularized by feature distance obtained by the discriminator in GAN. We show that these easy-to-implement and practical changes help keep the reconstruction within the manifold of natural images, and thus lead to more precise and faithful reconstruction for real images. Code is available at this https URL.

214 citations

Posted Content•

Large-Scale Long-Tailed Recognition in an Open World

Ziwei Liu¹, Zhongqi Miao¹, Xiaohang Zhan², Jiayun Wang², Boqing Gong², Stella X. Yu²

The Chinese University of Hong Kong¹, University of California, Berkeley²

10 Apr 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: The Open Long-Tailed Recognition (OLTR) algorithm proposed in this paper maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world.

138 citations

Proceedings Article•DOI•

Learning to Cluster Faces on an Affinity Graph

Lei Yang¹, Xiaohang Zhan¹, Dapeng Chen², Junjie Yan², Chen Change Loy³, Dahua Lin¹

The Chinese University of Hong Kong¹, SenseTime², Nanyang Technological University³

01 Jun 2019

TL;DR: This work explores a novel approach, namely, learning to cluster instead of relying on hand-crafted criteria, and proposes a framework based on graph convolutional network, which combines a detection and a segmentation module to pinpoint face clusters.

Abstract: Face recognition has seen remarkable progress in recent years, and its performance has reached a very high level. Taking it to the next level requires substantially larger data, which would involve prohibitive annotation cost. Hence, exploiting unlabeled data becomes an appealing alternative. Recent works have shown that clustering unlabeled faces is a promising approach, often leading to notable performance gains. Yet, how to effectively cluster, especially on a large-scale (i.e. million-level or above) dataset, remains an open question. A key challenge lies in the complex variations of cluster patterns, which make it difficult for conventional clustering methods to meet the needed accuracy. This work explores a novel approach, namely, learning to cluster instead of relying on hand-crafted criteria. Specifically, we propose a framework based on graph convolutional network, which combines a detection and a segmentation module to pinpoint face clusters. Experiments show that our method yields significantly more accurate face clusters, which, as a result, also lead to further performance gains in face recognition.

109 citations

Proceedings Article•DOI•

Online Deep Clustering for Unsupervised Representation Learning

Xiaohang Zhan¹, Jiahao Xie², Ziwei Liu¹, Yew-Soon Ong², Chen Change Loy²

The Chinese University of Hong Kong¹, Nanyang Technological University²

14 Jun 2020

TL;DR: Online Deep Clustering (ODC) performs clustering and network updates simultaneously rather than alternately, letting the cluster centroids evolve steadily so that the classifier is updated stably.

Abstract: Joint clustering and feature learning methods have shown remarkable performance in unsupervised representation learning. However, the training schedule alternating between feature clustering and network parameters update leads to unstable learning of visual representations. To overcome this challenge, we propose Online Deep Clustering (ODC) that performs clustering and network update simultaneously rather than alternatingly. Our key insight is that the cluster centroids should evolve steadily in keeping the classifier stably updated. Specifically, we design and maintain two dynamic memory modules, i.e., samples memory to store samples' labels and features, and centroids memory for centroids evolution. We break down the abrupt global clustering into steady memory update and batch-wise label re-assignment. The process is integrated into network update iterations. In this way, labels and the network evolve shoulder-to-shoulder rather than alternatingly. Extensive experiments demonstrate that ODC stabilizes the training process and boosts the performance effectively.

106 citations
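
The ODC abstract above describes a samples memory (labels and features), a centroids memory, a steady memory update, and batch-wise label re-assignment. A minimal sketch of that bookkeeping might look like the following. It is not the authors' implementation: the class name `OnlineClusterMemory`, the momentum value, the nearest-centroid rule by dot product, and the point at which centroids are refreshed are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


class OnlineClusterMemory:
    """Hypothetical samples memory (features, labels) plus centroids memory."""

    def __init__(self, features, labels, num_clusters, momentum=0.5):
        self.features = [l2_normalize(f) for f in features]
        self.labels = list(labels)
        self.momentum = momentum  # assumed value, not taken from the paper
        self.centroids = [[0.0] * len(features[0]) for _ in range(num_clusters)]
        self.refresh_centroids()

    def refresh_centroids(self):
        """Recompute each centroid as the mean of its members' stored features."""
        for k in range(len(self.centroids)):
            members = [f for f, y in zip(self.features, self.labels) if y == k]
            if members:  # an empty cluster keeps its previous centroid
                self.centroids[k] = [sum(col) / len(members) for col in zip(*members)]

    def update(self, indices, new_features):
        """Per-batch step: momentum feature update, then label re-assignment."""
        for i, new in zip(indices, new_features):
            new = l2_normalize(new)
            mixed = [self.momentum * old + (1 - self.momentum) * n
                     for old, n in zip(self.features[i], new)]
            self.features[i] = l2_normalize(mixed)
            # Assumed rule: assign the sample to the most similar centroid.
            scores = [dot(self.features[i], c) for c in self.centroids]
            self.labels[i] = scores.index(max(scores))
```

In a training loop, `update` would be called with the indices and fresh features of each mini-batch, and `refresh_centroids` at whatever interval the schedule chooses, so labels change a few samples at a time instead of through one global re-clustering.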

Cited by

IEEE transactions on pattern analysis and machine intelligence

IEEE Xplore

01 Jan 1979

TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis, and especially encourages papers addressing interesting real-world computer vision and multimedia applications.

Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain lots of training data while many classes contain only a small amount. Therefore, how to use frequent classes to help learn rare classes, for which it is harder to collect training data, is an open question. Learning with Shared Information is an emerging topic in machine learning, computer vision and multimedia analysis. There are different levels of components that can be shared during concept modeling and machine learning stages, such as sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples. Regarding the specific methods, multi-task learning, transfer learning and deep learning can be seen as using different strategies to share information. These learning with shared information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art works and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged.
Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approach for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, sharing attributes, sharing transformations, sharing regularization parameters and sharing training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.

1,758 citations

Proceedings Article•DOI•

Multi-Stage Progressive Image Restoration

Syed Waqas Zamir, Aditya Arora, Salman Khan¹, Munawar Hayat², Fahad Shahbaz Khan¹, Ming-Hsuan Yang³, Ling Shao

Zayed University¹, Monash University², University of California, Merced³

04 Feb 2021

TL;DR: MPRNet is a multi-stage architecture that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps, and introduces a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features.

Abstract: Image restoration tasks demand a complex balance between spatial details and high-level contextualized information while recovering images. In this paper, we propose a novel synergistic design that can optimally balance these competing goals. Our main proposal is a multi-stage architecture, that progressively learns restoration functions for the degraded inputs, thereby breaking down the overall recovery process into more manageable steps. Specifically, our model first learns the contextualized features using encoder-decoder architectures and later combines them with a high-resolution branch that retains local information. At each stage, we introduce a novel per-pixel adaptive design that leverages in-situ supervised attention to reweight the local features. A key ingredient in such a multi-stage architecture is the information exchange between different stages. To this end, we propose a two-faceted approach where the information is not only exchanged sequentially from early to late stages, but lateral connections between feature processing blocks also exist to avoid any loss of information. The resulting tightly interlinked multi-stage architecture, named as MPRNet, delivers strong performance gains on ten datasets across a range of tasks including image deraining, deblurring, and denoising. The source code and pre-trained models are available at https://github.com/swz30/MPRNet.

716 citations

Proceedings Article•

Decoupling Representation and Classifier for Long-Tailed Recognition

Bingyi Kang¹, Saining Xie², Marcus Rohrbach², Zhicheng Yan³, Albert Gordo², Jiashi Feng¹, Yannis Kalantidis²

National University of Singapore¹, Facebook², University of Illinois at Urbana–Champaign³

30 Apr 2020

TL;DR: It is shown that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification.

Abstract: The long-tail distribution of the visual world poses great challenges for deep learning based classification models on how to handle the class imbalance problem. Existing solutions usually involve class-balancing strategies, e.g., by loss re-weighting, data re-sampling, or transfer learning from head- to tail-classes, but all of them adhere to the scheme of jointly learning representations and classifiers. In this work, we decouple the learning procedure into representation learning and classification, and systematically explore how different balancing strategies affect them for long-tailed recognition. The findings are surprising: (1) data imbalance might not be an issue in learning high-quality representations; (2) with representations learned with the simplest instance-balanced (natural) sampling, it is also possible to achieve strong long-tailed recognition ability at little to no cost by adjusting only the classifier. We conduct extensive experiments and set new state-of-the-art performance on common long-tailed benchmarks like ImageNet-LT, Places-LT and iNaturalist, showing that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification.

631 citations
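
The abstract above contrasts instance-balanced (natural) sampling with class-balancing strategies but does not give a formula. One common way to write both as a single family is to sample class j with probability proportional to n_j^q, where n_j is the class size. The sketch below assumes that parameterisation; the function name and the exponent `q` are illustrative and are not taken from the abstract.

```python
def sampling_probabilities(class_counts, q):
    """Probability of drawing each class when weights are n_j ** q.

    q = 1 reproduces instance-balanced (natural) sampling,
    q = 0 gives class-balanced sampling, and values in between interpolate.
    """
    weights = [n ** q for n in class_counts]
    total = sum(weights)
    return [w / total for w in weights]


# A head class with 90 images and a tail class with 10 images.
instance_balanced = sampling_probabilities([90, 10], q=1)
class_balanced = sampling_probabilities([90, 10], q=0)
```

Under this reading, a decoupled scheme would learn the representation with `q=1` and then adjust only the classifier under a more balanced setting.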

Posted Content•

Prototypical Contrastive Learning of Unsupervised Representations

Junnan Li¹, Pan Zhou¹, Caiming Xiong¹, Richard Socher¹, Steven C. H. Hoi¹

Salesforce.com¹

11 May 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper introduces prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework and proposes ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes.

Abstract: This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes semantic structures of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in an Expectation-Maximization framework. We iteratively perform E-step as finding the distribution of prototypes via clustering and M-step as optimizing the network via contrastive learning. We propose ProtoNCE loss, a generalized version of the InfoNCE loss for contrastive learning, which encourages representations to be closer to their assigned prototypes. PCL achieves state-of-the-art results on multiple unsupervised representation learning benchmarks, with >10% accuracy improvement in low-resource transfer tasks. Code is available at this https URL.

493 citations
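
The abstract above refers to the InfoNCE loss, which ProtoNCE generalizes. As a rough illustration, the standard InfoNCE form is a cross-entropy over similarity scores between a query embedding and a set of candidates, one of which is the positive. The sketch below implements only that standard form. The function name, the temperature default, and the suggestion of passing cluster prototypes as candidates are assumptions, and this is not the paper's ProtoNCE.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, candidates, positive_index, temperature=0.1):
    """-log softmax of the positive's similarity among all candidates."""
    logits = [dot(query, c) / temperature for c in candidates]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[positive_index]


# The loss is small when the query aligns with its positive candidate.
aligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=0)
misaligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_index=1)
```

Passing cluster prototypes as `candidates`, with the assigned prototype as the positive, gives a simplified prototype-style term that pulls a representation toward its assigned prototype.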

Journal Article•DOI•

Recent Advances in Open Set Recognition: A Survey

Chuanxing Geng¹, Sheng-Jun Huang¹, Songcan Chen¹

Nanjing University of Aeronautics and Astronautics¹

01 Oct 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, evaluation criteria, and algorithm comparisons to highlight the limitations of existing approaches and point out some promising subsequent research directions.

Abstract: In real-world recognition/classification tasks, limited by various objective factors, it is usually difficult to collect training samples to exhaust all classes when training a recognizer or classifier. A more realistic scenario is open set recognition (OSR), where incomplete knowledge of the world exists at training time, and unknown classes can be submitted to an algorithm during testing, requiring the classifiers to not only accurately classify the seen classes, but also effectively deal with unseen ones. This paper provides a comprehensive survey of existing open set recognition techniques covering various aspects ranging from related definitions, representations of models, datasets, evaluation criteria, and algorithm comparisons. Furthermore, we briefly analyze the relationships between OSR and its related tasks including zero-shot, one-shot (few-shot) recognition/learning techniques, classification with reject option, and so forth. Additionally, we also review the open world recognition which can be seen as a natural extension of OSR. Importantly, we highlight the limitations of existing approaches and point out some promising subsequent research directions in this field.

492 citations
