Author

Xiaodong Xie

Bio: Xiaodong Xie is an academic researcher from Peking University. The author has contributed to research in topics: Encoder & Motion estimation. The author has an h-index of 13 and has co-authored 85 publications receiving 864 citations.


Papers
Papers
Posted Content
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
TL;DR: The proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset.
Abstract: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines Channel Attention with a Pixel Attention mechanism, motivated by the observations that different channel-wise features carry very different weighted information and that haze is distributed unevenly across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of Local Residual Learning and Feature Attention; Local Residual Learning allows less important information, such as thin-haze regions or low-frequency components, to be bypassed through multiple local residual connections, letting the main network focus on more effective information. 3) An attention-based feature fusion (FFA) structure over different levels, in which feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure also retains the information of shallow layers and passes it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available on GitHub.
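To make the Feature Attention module described above concrete, here is a minimal PyTorch sketch of the channel-attention-then-pixel-attention idea; the kernel sizes and the reduction ratio are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: global average pool -> 1x1 convs -> sigmoid gate,
        # so different channels are weighted unequally.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Pixel attention: a per-location gate, since haze density is uneven
        # across the image.
        self.pixel_att = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_att(x)   # reweight channels
        return x * self.pixel_att(x)  # reweight spatial positions

feat = torch.randn(2, 64, 32, 32)
print(FeatureAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])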

406 citations

Journal ArticleDOI
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
03 Apr 2020
TL;DR: Qin et al. propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image; its Feature Attention (FA) module combines Channel Attention with a Pixel Attention mechanism, since different channel-wise features carry different weighted information and haze is distributed unevenly across image pixels.

382 citations

Journal ArticleDOI
TL;DR: A novel 3D self-attention convolutional neural network for the LDCT denoising problem is proposed, together with a self-supervised learning scheme that trains a domain-specific autoencoder as the perceptual loss function.
Abstract: Computed tomography (CT) is a widely used screening and diagnostic tool that allows clinicians to obtain a high-resolution, volumetric image of internal structures in a non-invasive manner. Increasingly, efforts have been made to improve the image quality of low-dose CT (LDCT) to reduce the cumulative radiation exposure of patients undergoing routine screening exams. The resurgence of deep learning has yielded a new approach for noise reduction by training a deep multi-layer convolutional neural network (CNN) to map low-dose to normal-dose CT images. However, CNN-based methods heavily rely on convolutional kernels, which use fixed-size filters to process one local neighborhood within the receptive field at a time. As a result, they are not efficient at retrieving structural information across large regions. In this paper, we propose a novel 3D self-attention convolutional neural network for the LDCT denoising problem. Our 3D self-attention module leverages the 3D volume of CT images to capture a wide range of spatial information both within and between CT slices. With the help of the 3D self-attention module, CNNs are able to leverage pixels with stronger relationships regardless of their distance and achieve better denoising results. In addition, we propose a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function. We combine these two methods and demonstrate their effectiveness on both CNN-based and WGAN-based neural networks with comprehensive experiments. Tested on the AAPM-Mayo Clinic Low Dose CT Grand Challenge data set, our experiments demonstrate that the self-attention (SA) module and the autoencoder (AE) perceptual loss function can efficiently enhance traditional CNNs and achieve results comparable to or better than state-of-the-art methods.
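As a rough illustration of the module described above, the following is a minimal non-local-style 3D self-attention block over a CT volume in PyTorch. The inner channel width and the zero-initialized residual weight are assumptions, and attention over all voxel pairs is memory-heavy, so real inputs would need to be cropped or tiled.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        inner = channels // 2
        self.query = nn.Conv3d(channels, inner, kernel_size=1)
        self.key = nn.Conv3d(channels, inner, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        n = d * h * w                                # all voxel positions
        q = self.query(x).reshape(b, -1, n)          # (b, c', n)
        k = self.key(x).reshape(b, -1, n)            # (b, c', n)
        v = self.value(x).reshape(b, c, n)           # (b, c, n)
        # Affinities between every pair of voxels, within and between slices.
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)        # (b, n, n)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, d, h, w)
        return x + self.gamma * out                  # residual connection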

166 citations

Proceedings ArticleDOI
Li Tao, Chuang Zhu, Guoqing Xiang, Yuan Li, Huizhu Jia, Xiaodong Xie
01 Dec 2017
TL;DR: A CNN-based method for low-light image enhancement with a special module that utilizes multiscale feature maps, which also helps avoid the vanishing-gradient problem; results demonstrate that this method outperforms other contrast enhancement methods.
Abstract: In this paper, we propose a CNN-based method to perform low-light image enhancement. We design a special module to utilize multiscale feature maps, which also helps avoid the vanishing-gradient problem. In order to preserve image textures as much as possible, we use an SSIM loss to train our model. The contrast of low-light images can be adaptively enhanced using our method. Results demonstrate that our CNN-based method outperforms other contrast enhancement methods.
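Since the abstract's key training choice is an SSIM loss, here is a simplified PyTorch sketch of one; a uniform window via avg_pool2d stands in for the usual Gaussian window, an assumption made for brevity.

import torch
import torch.nn.functional as F

def ssim_loss(x: torch.Tensor, y: torch.Tensor, window: int = 11,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """1 - mean SSIM between images x and y, both scaled to [0, 1]."""
    pad = window // 2
    # Local means, variances, and covariance via a uniform sliding window.
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return 1.0 - ssim.mean()  # minimizing this maximizes structural similarity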

151 citations

Journal ArticleDOI
TL;DR: A novel attention-driven multi-branch network is proposed that simultaneously learns robust and discriminative human representations from global whole-body images and local body-part images.
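The TL;DR gives no architectural detail beyond a global branch plus local part branches, so the following PyTorch skeleton is only a generic sketch of that two-stream idea; the shared backbone and the horizontal part split are assumptions.

import torch
import torch.nn as nn

class MultiBranchReID(nn.Module):
    def __init__(self, backbone: nn.Module, num_parts: int = 3):
        super().__init__()
        self.backbone = backbone                      # shared feature extractor
        self.global_pool = nn.AdaptiveAvgPool2d(1)    # whole-body descriptor
        # Horizontal stripes as a stand-in for body-part features.
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(x)                       # (b, c, h, w)
        g = self.global_pool(fmap).flatten(1)         # (b, c) global feature
        p = self.part_pool(fmap).flatten(1)           # (b, c * num_parts) parts
        return torch.cat([g, p], dim=1)               # joint representation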

130 citations


Cited by
Cited by
Proceedings ArticleDOI
15 Jun 2019
TL;DR: Liu et al. propose to search the network-level structure in addition to the cell-level structure, forming a hierarchical architecture search space; the resulting Auto-DeepLab attains state-of-the-art semantic segmentation performance without any ImageNet pretraining.
Abstract: Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining.
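The gradient-based search mentioned above rests on a continuous relaxation of the discrete choice among candidate operations. The sketch below shows that relaxation in its generic DARTS-style form, not Auto-DeepLab's actual search space; the candidate operations are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate ops, so the architecture
    parameters alpha can be optimized with ordinary gradients."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv candidate
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 conv candidate
            nn.Identity(),                                # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)  # relaxed architecture choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

After search, the op with the largest alpha is kept and the rest are pruned, turning the continuous mixture back into a discrete architecture.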

863 citations

Posted Content
TL;DR: A powerful AGW baseline is designed, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks, and a new evaluation metric (mINP) is introduced, indicating the cost for finding all the correct matches, which provides an additional criteria to evaluate the Re- ID system for real applications.
Abstract: Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues and is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
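The mINP metric described above can be made concrete: for each query, the inverse negative penalty is the number of correct matches divided by the rank at which the hardest (last) correct match appears, averaged over all queries. A small sketch, assuming the ranking list is given as a boolean match array per query:

import numpy as np

def mean_inp(ranked_matches: list) -> float:
    """ranked_matches[i] is a boolean array over the gallery for query i,
    ordered by descending similarity; True marks a correct match."""
    inps = []
    for matches in ranked_matches:
        num_correct = int(matches.sum())
        if num_correct == 0:
            continue                # no ground-truth match for this query
        hardest_rank = int(np.where(matches)[0].max()) + 1  # 1-indexed rank
        inps.append(num_correct / hardest_rank)
    return float(np.mean(inps))

# Example: 3 correct matches, hardest retrieved at rank 5 -> INP = 3/5 = 0.6
print(mean_inp([np.array([1, 0, 1, 0, 1, 0], dtype=bool)]))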

737 citations

Journal ArticleDOI
TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Abstract: Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/.
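The survey's "what/when/how to fuse" questions can be illustrated with a toy middle-fusion module in which camera and LiDAR feature maps are concatenated and mixed by a 1x1 convolution; the modalities, channel sizes, and fusion depth here are assumptions for illustration, not recommendations from the survey.

import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, cam_ch: int, lidar_ch: int, out_ch: int):
        super().__init__()
        # "How to fuse": concatenate along channels, then mix with a 1x1 conv.
        self.mix = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_feat: torch.Tensor,
                lidar_feat: torch.Tensor) -> torch.Tensor:
        # Assumes both feature maps are already spatially aligned; the
        # projection between sensor frames and the fusion depth ("when to
        # fuse") are separate design choices the survey discusses.
        return self.mix(torch.cat([cam_feat, lidar_feat], dim=1))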

674 citations

Journal ArticleDOI
TL;DR: A comparative study of deep techniques in image denoising, classifying deep convolutional neural networks (CNNs) for additive white noisy images, deep CNNs for real noisy images, deep CNNs for blind denoising, and deep networks for hybrid noisy images.
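The first category in this taxonomy, deep CNNs for additive white noise, is typified by residual denoisers that predict the noise and subtract it. The DnCNN-style skeleton below is a generic sketch rather than a specific model from the survey; its depth and width are illustrative.

import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, channels: int = 1, width: int = 64, depth: int = 8):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(width, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)  # predicts the noise map

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        return noisy - self.body(noisy)     # subtract the predicted noise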

518 citations