Author

Xin Jin

University of Science and Technology of China

Other affiliations: Microsoft

Bio: Xin Jin is an academic researcher at the University of Science and Technology of China. The author has contributed to research on the topics of computer science and feature (computer vision). The author has an h-index of 12 and has co-authored 46 publications receiving 584 citations. Previous affiliations of Xin Jin include Microsoft.

Papers published on a yearly basis (chart omitted; years covered: 2017 to 2023)

Papers

Proceedings Article

Relation-Aware Global Attention for Person Re-Identification

Zhizheng Zhang¹, Cuiling Lan², Wenjun Zeng², Xin Jin¹, Zhibo Chen¹

University of Science and Technology of China¹, Microsoft²

14 Jun 2020

TL;DR: This work proposes a Relation-Aware Global Attention (RGA) module that captures global structural information for better attention learning: for each feature position, it stacks the relations (pairwise correlations/affinities with all feature positions) together with the feature itself, and learns the attention with a shallow convolutional model.

Abstract: For person re-identification (re-id), attention mechanisms have become attractive as they aim at strengthening discriminative features and suppressing irrelevant ones, which matches well the key of re-id, i.e., discriminative feature learning. Previous approaches typically learn attention using local convolutions, ignoring the mining of knowledge from global structure patterns. Intuitively, the affinities among spatial positions/nodes in the feature map provide clustering-like information and are helpful for inferring semantics and thus attention, especially for person images where the feasible human poses are constrained. In this work, we propose an effective Relation-Aware Global Attention (RGA) module which captures the global structural information for better attention learning. Specifically, for each feature position, in order to compactly grasp the structural information of global scope and local appearance information, we propose to stack the relations, i.e., its pairwise correlations/affinities with all the feature positions (e.g., in raster scan order), and the feature itself together to learn the attention with a shallow convolutional model. Extensive ablation studies demonstrate that our RGA can significantly enhance the feature representation power and help achieve the state-of-the-art performance on several popular benchmarks. The source code is available at https://github.com/microsoft/Relation-Aware-Global-Attention-Networks.
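The stacking step described in the abstract can be illustrated with a minimal sketch. It assumes a plain dot product as the pairwise affinity and flat Python lists as features; the function names `dot` and `stack_relations` are hypothetical. The learned embeddings and the shallow convolutional model that turns the stacked vector into an attention value are not shown, so this is not the authors' implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (assumed affinity measure)."""
    return sum(a * b for a, b in zip(u, v))


def stack_relations(features):
    """For each position, concatenate its own feature with its affinities
    to every position, visited in a fixed (raster-scan) order."""
    stacked = []
    for x_i in features:
        relations = [dot(x_i, x_j) for x_j in features]
        stacked.append(list(x_i) + relations)
    return stacked


# Three feature positions with two channels each (toy values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stacked = stack_relations(features)
```

Each stacked vector has length C + N (channels plus number of positions), so its relation part grows with the spatial size of the feature map.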

354 citations

Proceedings Article

Style Normalization and Restitution for Generalizable Person Re-Identification

Xin Jin¹, Cuiling Lan², Wenjun Zeng², Zhibo Chen¹, Li Zhang³

University of Science and Technology of China¹, Microsoft², University of Oxford³

14 Jun 2020

TL;DR: This paper designs a generalizable person ReID framework that trains a model on source domains yet generalizes well to target domains, using a Style Normalization and Restitution (SNR) module with a dual causal loss constraint that encourages the separation of identity-relevant and identity-irrelevant features.

Abstract: Existing fully-supervised person re-identification (ReID) methods usually suffer from poor generalization capability caused by domain gaps. The key to solving this problem lies in filtering out identity-irrelevant interference and learning domain-invariant person representations. In this paper, we aim to design a generalizable person ReID framework which trains a model on source domains yet is able to generalize/perform well on target domains. To achieve this goal, we propose a simple yet effective Style Normalization and Restitution (SNR) module. Specifically, we filter out style variations (e.g., illumination, color contrast) by Instance Normalization (IN). However, such a process inevitably removes discriminative information. We propose to distill identity-relevant feature from the removed information and restitute it to the network to ensure high discrimination. For better disentanglement, we enforce a dual causal loss constraint in SNR to encourage the separation of identity-relevant features and identity-irrelevant features. Extensive experiments demonstrate the strong generalization capability of our framework. Our models empowered by the SNR modules significantly outperform the state-of-the-art domain generalization approaches on multiple widely-used person ReID benchmarks, and also show superiority on unsupervised domain adaptation.
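As a rough illustration of the normalization step, the sketch below applies Instance Normalization to each channel of a single sample and keeps the difference between the input and the normalized output. Treating that difference as "the removed information" is an assumption made here, as are the names `instance_norm` and `normalize_and_residual`. The learned distillation of identity-relevant features from the residual and the dual causal loss are not shown.

```python
import math


def instance_norm(channel, eps=1e-5):
    """Normalize one channel of one sample over its spatial positions."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return [(v - mean) / math.sqrt(var + eps) for v in channel]


def normalize_and_residual(feature_map):
    """Return the style-normalized feature and what normalization removed."""
    normalized = [instance_norm(ch) for ch in feature_map]
    residual = [
        [f - n for f, n in zip(ch, norm_ch)]
        for ch, norm_ch in zip(feature_map, normalized)
    ]
    return normalized, residual


# Two channels, four spatial positions each (toy values).
feature_map = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
normalized, residual = normalize_and_residual(feature_map)
```

Adding `normalized` and `residual` element-wise recovers the input, which is why a restitution step can, in principle, put back the useful part of what was removed.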

276 citations

Journal Article

Region Normalization for Image Inpainting

Tao Yu¹, Zongyu Guo¹, Xin Jin¹, Shilin Wu¹, Zhibo Chen¹, Weiping Li¹, Zhizheng Zhang¹, Sen Liu¹

University of Science and Technology of China¹

03 Apr 2020

TL;DR: It is shown that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and a spatial region-wise normalization named Region Normalization (RN) is proposed to overcome the limitation.

Abstract: Feature Normalization (FN) is an important technique to help neural network training, which typically normalizes features across spatial dimensions. Most previous image inpainting methods apply FN in their networks without considering the impact of the corrupted regions of the input image on normalization, e.g. mean and variance shifts. In this work, we show that the mean and variance shifts caused by full-spatial FN limit the image inpainting network training and we propose a spatial region-wise normalization named Region Normalization (RN) to overcome the limitation. RN divides spatial pixels into different regions according to the input mask, and computes the mean and variance in each region for normalization. We develop two kinds of RN for our image inpainting network: (1) Basic RN (RN-B), which normalizes pixels from the corrupted and uncorrupted regions separately based on the original inpainting mask to solve the mean and variance shift problem; (2) Learnable RN (RN-L), which automatically detects potentially corrupted and uncorrupted regions for separate normalization, and performs global affine transformation to enhance their fusion. We apply RN-B in the early layers and RN-L in the latter layers of the network respectively. Experiments show that our method outperforms current state-of-the-art methods quantitatively and qualitatively. We further generalize RN to other inpainting networks and achieve consistent performance improvements.
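The basic variant (RN-B) can be sketched as follows: pixels are split into two regions by the input mask, and each region is normalized with its own mean and variance. The mask convention (1 for corrupted, 0 for uncorrupted), the flat-list layout, and the function name `region_normalize` are assumptions for illustration. Learnable affine parameters and the RN-L variant are omitted.

```python
import math


def region_normalize(values, mask, eps=1e-5):
    """Normalize corrupted (mask == 1) and uncorrupted (mask == 0)
    pixels separately, each with its own mean and variance."""
    out = [0.0] * len(values)
    for region in (0, 1):
        idx = [i for i, m in enumerate(mask) if m == region]
        if not idx:
            continue  # this region is empty for the given mask
        mean = sum(values[i] for i in idx) / len(idx)
        var = sum((values[i] - mean) ** 2 for i in idx) / len(idx)
        for i in idx:
            out[i] = (values[i] - mean) / math.sqrt(var + eps)
    return out


# One channel with six pixels; the zeros are masked-out (corrupted) pixels.
values = [0.0, 0.0, 5.0, 7.0, 9.0, 0.0]
mask = [1, 1, 0, 0, 0, 1]
normalized = region_normalize(values, mask)
```

With full-spatial normalization the three zeroed pixels would pull the mean toward zero and inflate the variance; normalizing per region keeps the statistics of the uncorrupted pixels unaffected by the hole.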

123 citations

Journal Article

Learning for Video Compression

Zhibo Chen¹, Tianyu He¹, Xin Jin¹, Feng Wu¹

University of Science and Technology of China¹

01 Feb 2020-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: The proposed PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks can model spatiotemporal coherence to effectively perform predictive coding inside the learning network and provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

Abstract: One key challenge to learning-based video compression is that motion predictive coding, a very effective tool for video compression, can hardly be trained into a neural network. In this paper, we propose the concept of PixelMotionCNN (PMCNN) which includes motion extension and hybrid prediction networks. PMCNN can model spatiotemporal coherence to effectively perform predictive coding inside the learning network. On the basis of PMCNN, we further explore a learning-based framework for video compression with additional components of iterative analysis/synthesis and binarization. The experimental results demonstrate the effectiveness of the proposed scheme. Although entropy coding and complex configurations are not employed in this paper, we still demonstrate superior performance compared with MPEG-2 and achieve comparable results with H.264 codec. The proposed learning-based scheme provides a possible new direction to further improve compression efficiency and functionalities of future video coding.

116 citations

Journal Article

Semantics-Aligned Representation Learning for Person Re-Identification

Xin Jin¹, Cuiling Lan², Wenjun Zeng², Guoqiang Wei¹, Zhibo Chen¹

University of Science and Technology of China¹, Microsoft²

03 Apr 2020

TL;DR: A framework that drives the reID network to learn semantics-aligned feature representation through delicate supervision designs is proposed and achieves the state-of-the-art performances on the benchmark datasets CUHK03, Market1501, MSMT17, and the partial person reID dataset Partial REID.

Abstract: Person re-identification (reID) aims to match person images to retrieve the ones with the same identity. This is a challenging task, as the images to be matched are generally semantically misaligned due to the diversity of human poses and capture viewpoints, incompleteness of the visible bodies (due to occlusion), etc. In this paper, we propose a framework that drives the reID network to learn semantics-aligned feature representation through delicate supervision designs. Specifically, we build a Semantics Aligning Network (SAN) which consists of a base network as encoder (SA-Enc) for re-ID, and a decoder (SA-Dec) for reconstructing/regressing the densely semantics aligned full texture image. We jointly train the SAN under the supervisions of person re-identification and aligned texture generation. Moreover, at the decoder, besides the reconstruction loss, we add Triplet ReID constraints over the feature maps as the perceptual losses. The decoder is discarded in the inference and thus our scheme is computationally efficient. Ablation studies demonstrate the effectiveness of our design. We achieve the state-of-the-art performances on the benchmark datasets CUHK03, Market1501, MSMT17, and the partial person reID dataset Partial REID.

85 citations

Cited by

Journal Article

Study Report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"

杉山拓海

12 Sep 2017-Computers & Graphics

3,940 citations

Reference Entry

IEEE Transactions on Pattern Analysis and Machine Intelligence

King-Sun Fu

15 Oct 2004

2,118 citations

Journal Article

A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges

Moloud Abdar¹, Farhad Pourpanah², Sadiq Hussain³, Dana Rezazadegan⁴, Li Liu⁵, Mohammad Ghavamzadeh⁶, Paul Fieguth⁷, Xiaochun Cao⁸, Abbas Khosravi¹, U. Rajendra Acharya⁹,¹⁰,¹¹, Vladimir Makarenkov¹², Saeid Nahavandi¹

Deakin University¹, Shenzhen University², Dibrugarh University³, Swinburne University of Technology⁴, University of Oulu⁵, Google⁶, University of Waterloo⁷, Chinese Academy of Sciences⁸, National University of Singapore⁹, Asia University (Taiwan)¹⁰, Ngee Ann Polytechnic¹¹, Université du Québec¹²

12 Nov 2020-arXiv: Learning

TL;DR: This study reviews recent advances in UQ methods used in deep learning, investigates the application of these methods in reinforcement learning (RL), and outlines a few important applications of UQ methods.

Abstract: Uncertainty quantification (UQ) plays a pivotal role in reducing uncertainties during both optimization and decision-making processes. It can be applied to solve a variety of real-world problems in science and engineering. Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature. In this regard, researchers have proposed different UQ methods and examined their performance in a variety of applications such as computer vision (e.g., self-driving cars and object detection), image processing (e.g., image restoration), medical image analysis (e.g., medical image classification and segmentation), natural language processing (e.g., text classification, social media texts and recidivism risk-scoring), bioinformatics, etc. This study reviews recent advances in UQ methods used in deep learning. Moreover, we also investigate the application of these methods in reinforcement learning (RL). Then, we outline a few important applications of UQ methods. Finally, we briefly highlight the fundamental research challenges faced by UQ methods and discuss the future research directions in this field.

809 citations

Posted Content

Deep Learning for Person Re-identification: A Survey and Outlook

Mang Ye, Jianbing Shen¹, Gaojie Lin², Tao Xiang³, Ling Shao⁴, Steven C. H. Hoi⁵

University of California, Los Angeles¹, Beijing Institute of Technology², University of Surrey³, University of East Anglia⁴, Singapore Management University⁵

13 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A powerful AGW baseline is designed, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks, and a new evaluation metric (mINP) is introduced, indicating the cost of finding all the correct matches, which provides an additional criterion for evaluating Re-ID systems in real applications.

Abstract: Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and increasing demand of intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the involved components in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With the performance saturation under closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, facing more challenging issues. This setting is closer to practical applications under specific scenarios. We summarize the open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost for finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
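A small sketch may help make the mINP idea concrete. The abstract only states that the metric indicates the cost of finding all correct matches; the formula used below (number of correct matches divided by the rank of the hardest, i.e. last-retrieved, correct match, averaged over queries) is an assumption about the definition and should be checked against the survey itself. The function names are hypothetical.

```python
def inverse_negative_penalty(ranked_is_match):
    """INP for one query. `ranked_is_match` lists, best match first,
    whether each gallery item shares the query's identity."""
    num_matches = sum(ranked_is_match)
    if num_matches == 0:
        raise ValueError("query has no correct match in the gallery")
    # 1-based rank of the last correct match in the ranking.
    hardest_rank = max(
        rank for rank, hit in enumerate(ranked_is_match, start=1) if hit
    )
    return num_matches / hardest_rank


def mean_inp(all_rankings):
    """Average INP over all queries (assumed mINP definition)."""
    return sum(inverse_negative_penalty(r) for r in all_rankings) / len(all_rankings)


rankings = [
    [True, True, False, False],  # both matches ranked first: INP = 2/2
    [True, False, False, True],  # hardest match at rank 4: INP = 2/4
]
score = mean_inp(rankings)
```

Under this reading, a query scores 1.0 only when all of its correct matches are ranked ahead of every incorrect one, so a single badly ranked match lowers the score even if rank-1 accuracy is perfect.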

737 citations

Journal Article

EnlightenGAN: Deep Light Enhancement Without Paired Supervision

Yifan Jiang¹, Xinyu Gong¹, Ding Liu, Yu Cheng², Chen Fang, Xiaohui Shen, Jianchao Yang, Pan Zhou³, Zhangyang Wang¹

University of Texas at Austin¹, Microsoft², Huazhong University of Science and Technology³

22 Jan 2021-IEEE Transactions on Image Processing

TL;DR: This paper proposes EnlightenGAN, a highly effective unsupervised generative adversarial network that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images.

Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and the attention mechanism. Through extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. Our codes and pre-trained models are available at: https://github.com/VITA-Group/EnlightenGAN.

537 citations
