Author

Xiaodong Xie

Bio: Xiaodong Xie is an academic researcher from Peking University. The author has contributed to research in topics: Encoder & Motion estimation. The author has an h-index of 13 and has co-authored 85 publications receiving 864 citations.


Papers
Papers
Posted Content
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
TL;DR: The proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset.
Abstract: In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines Channel Attention with a Pixel Attention mechanism, motivated by the observations that different channel-wise features carry very different weighted information and that haze is distributed unevenly across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of Local Residual Learning and Feature Attention; Local Residual Learning allows less important information, such as thin-haze regions or low-frequency components, to be bypassed through multiple local residual connections, letting the main network focus on more effective information. 3) An attention-based feature fusion (FFA) structure over different levels, in which feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure also retains the information of shallow layers and passes it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR metric from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available on GitHub.
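To make the Feature Attention module described above concrete, here is a minimal PyTorch sketch of the channel-attention-then-pixel-attention idea; the kernel sizes and the reduction ratio are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: global average pool -> 1x1 convs -> sigmoid gate,
        # so different channels are weighted unequally.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Pixel attention: a per-location gate, since haze density is uneven
        # across the image.
        self.pixel_att = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_att(x)   # reweight channels
        return x * self.pixel_att(x)  # reweight spatial positions

feat = torch.randn(2, 64, 32, 32)
print(FeatureAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])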

406 citations

Journal ArticleDOI
Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
03 Apr 2020
TL;DR: Qin et al. propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image; its Feature Attention (FA) module combines Channel Attention with a Pixel Attention mechanism, since different channel-wise features carry different weighted information and haze is distributed unevenly across image pixels.

382 citations

Journal ArticleDOI
TL;DR: A novel 3D self-attention convolutional neural network for the LDCT denoising problem is proposed, together with a self-supervised learning scheme that trains a domain-specific autoencoder as the perceptual loss function.
Abstract: Computed tomography (CT) is a widely used screening and diagnostic tool that allows clinicians to obtain a high-resolution, volumetric image of internal structures in a non-invasive manner. Increasingly, efforts have been made to improve the image quality of low-dose CT (LDCT) to reduce the cumulative radiation exposure of patients undergoing routine screening exams. The resurgence of deep learning has yielded a new approach for noise reduction by training a deep multi-layer convolutional neural network (CNN) to map low-dose to normal-dose CT images. However, CNN-based methods heavily rely on convolutional kernels, which use fixed-size filters to process one local neighborhood within the receptive field at a time. As a result, they are not efficient at retrieving structural information across large regions. In this paper, we propose a novel 3D self-attention convolutional neural network for the LDCT denoising problem. Our 3D self-attention module leverages the 3D volume of CT images to capture a wide range of spatial information both within and between CT slices. With the help of the 3D self-attention module, CNNs are able to leverage pixels with stronger relationships regardless of their distance and achieve better denoising results. In addition, we propose a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function. We combine these two methods and demonstrate their effectiveness on both CNN-based and WGAN-based neural networks with comprehensive experiments. Tested on the AAPM-Mayo Clinic Low Dose CT Grand Challenge data set, our experiments demonstrate that the self-attention (SA) module and the autoencoder (AE) perceptual loss function can efficiently enhance traditional CNNs and achieve results comparable to or better than state-of-the-art methods.
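As a rough illustration of the module described above, the following is a minimal non-local-style 3D self-attention block over a CT volume in PyTorch. The inner channel width and the zero-initialized residual weight are assumptions, and attention over all voxel pairs is memory-heavy, so real inputs would need to be cropped or tiled.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        inner = channels // 2
        self.query = nn.Conv3d(channels, inner, kernel_size=1)
        self.key = nn.Conv3d(channels, inner, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x.shape
        n = d * h * w                                # all voxel positions
        q = self.query(x).reshape(b, -1, n)          # (b, c', n)
        k = self.key(x).reshape(b, -1, n)            # (b, c', n)
        v = self.value(x).reshape(b, c, n)           # (b, c, n)
        # Affinities between every pair of voxels, within and between slices.
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)        # (b, n, n)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, d, h, w)
        return x + self.gamma * out                  # residual connection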

166 citations

Proceedings ArticleDOI
Li Tao, Chuang Zhu, Guoqing Xiang, Yuan Li, Huizhu Jia, Xiaodong Xie
01 Dec 2017
TL;DR: A CNN-based method for low-light image enhancement with a special module that utilizes multiscale feature maps, which also helps avoid the vanishing-gradient problem; results demonstrate that this method outperforms other contrast enhancement methods.
Abstract: In this paper, we propose a CNN-based method to perform low-light image enhancement. We design a special module to utilize multiscale feature maps, which also helps avoid the vanishing-gradient problem. In order to preserve image textures as much as possible, we use an SSIM loss to train our model. The contrast of low-light images can be adaptively enhanced using our method. Results demonstrate that our CNN-based method outperforms other contrast enhancement methods.
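Since the abstract's key training choice is an SSIM loss, here is a simplified PyTorch sketch of one; a uniform window via avg_pool2d stands in for the usual Gaussian window, an assumption made for brevity.

import torch
import torch.nn.functional as F

def ssim_loss(x: torch.Tensor, y: torch.Tensor, window: int = 11,
              c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """1 - mean SSIM between images x and y, both scaled to [0, 1]."""
    pad = window // 2
    # Local means, variances, and covariance via a uniform sliding window.
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return 1.0 - ssim.mean()  # minimizing this maximizes structural similarity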

151 citations

Journal ArticleDOI
TL;DR: A novel attention-driven multi-branch network is proposed that simultaneously learns robust and discriminative human representations from global whole-body images and local body-part images.
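The TL;DR gives no architectural detail beyond a global branch plus local part branches, so the following PyTorch skeleton is only a generic sketch of that two-stream idea; the shared backbone and the horizontal part split are assumptions.

import torch
import torch.nn as nn

class MultiBranchReID(nn.Module):
    def __init__(self, backbone: nn.Module, num_parts: int = 3):
        super().__init__()
        self.backbone = backbone                      # shared feature extractor
        self.global_pool = nn.AdaptiveAvgPool2d(1)    # whole-body descriptor
        # Horizontal stripes as a stand-in for body-part features.
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fmap = self.backbone(x)                       # (b, c, h, w)
        g = self.global_pool(fmap).flatten(1)         # (b, c) global feature
        p = self.part_pool(fmap).flatten(1)           # (b, c * num_parts) parts
        return torch.cat([g, p], dim=1)               # joint representation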

130 citations


Cited by
Cited by
Proceedings ArticleDOI
15 Jun 2019
TL;DR: Liu et al. propose to search the network-level structure in addition to the cell-level structure, forming a hierarchical architecture search space; the resulting Auto-DeepLab attains state-of-the-art semantic segmentation performance without any ImageNet pretraining.
Abstract: Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification. In this paper, we study NAS for semantic image segmentation. Existing works often focus on searching the repeatable cell structure, while hand-designing the outer network structure that controls the spatial resolution changes. This choice simplifies the search space, but becomes increasingly problematic for dense image prediction which exhibits a lot more network level architectural variations. Therefore, we propose to search the network level structure in addition to the cell level structure, which forms a hierarchical architecture search space. We present a network level search space that includes many popular designs, and develop a formulation that allows efficient gradient-based architecture search (3 P100 GPU days on Cityscapes images). We demonstrate the effectiveness of the proposed method on the challenging Cityscapes, PASCAL VOC 2012, and ADE20K datasets. Auto-DeepLab, our architecture searched specifically for semantic image segmentation, attains state-of-the-art performance without any ImageNet pretraining.
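The gradient-based search mentioned above rests on a continuous relaxation of the discrete choice among candidate operations. The sketch below shows that relaxation in its generic DARTS-style form, not Auto-DeepLab's actual search space; the candidate operations are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate ops, so the architecture
    parameters alpha can be optimized with ordinary gradients."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv candidate
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 conv candidate
            nn.Identity(),                                # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)  # relaxed architecture choice
        return sum(w * op(x) for w, op in zip(weights, self.ops))

After search, the op with the largest alpha is kept and the rest are pruned, turning the continuous mixture back into a discrete architecture.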

863 citations

Posted Content
TL;DR: A powerful AGW baseline is designed, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks, and a new evaluation metric (mINP) is introduced, indicating the cost for finding all the correct matches, which provides an additional criteria to evaluate the Re- ID system for real applications.
Abstract: Person re-identification (Re-ID) aims at retrieving a person of interest across multiple non-overlapping cameras. With the advancement of deep neural networks and the increasing demand for intelligent video surveillance, it has gained significantly increased interest in the computer vision community. By dissecting the components involved in developing a person Re-ID system, we categorize it into the closed-world and open-world settings. The widely studied closed-world setting is usually applied under various research-oriented assumptions, and has achieved inspiring success using deep learning techniques on a number of datasets. We first conduct a comprehensive overview with in-depth analysis for closed-world person Re-ID from three different perspectives, including deep feature representation learning, deep metric learning and ranking optimization. With performance saturating under the closed-world setting, the research focus for person Re-ID has recently shifted to the open-world setting, which faces more challenging issues and is closer to practical applications under specific scenarios. We summarize open-world Re-ID in terms of five different aspects. By analyzing the advantages of existing methods, we design a powerful AGW baseline, achieving state-of-the-art or at least comparable performance on twelve datasets for four different Re-ID tasks. Meanwhile, we introduce a new evaluation metric (mINP) for person Re-ID, indicating the cost of finding all the correct matches, which provides an additional criterion to evaluate the Re-ID system for real applications. Finally, some important yet under-investigated open issues are discussed.
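The mINP metric described above can be made concrete: for each query, the inverse negative penalty is the number of correct matches divided by the rank at which the hardest (last) correct match appears, averaged over all queries. A small sketch, assuming the ranking list is given as a boolean match array per query:

import numpy as np

def mean_inp(ranked_matches: list) -> float:
    """ranked_matches[i] is a boolean array over the gallery for query i,
    ordered by descending similarity; True marks a correct match."""
    inps = []
    for matches in ranked_matches:
        num_correct = int(matches.sum())
        if num_correct == 0:
            continue                # no ground-truth match for this query
        hardest_rank = int(np.where(matches)[0].max()) + 1  # 1-indexed rank
        inps.append(num_correct / hardest_rank)
    return float(np.mean(inps))

# Example: 3 correct matches, hardest retrieved at rank 5 -> INP = 3/5 = 0.6
print(mean_inp([np.array([1, 0, 1, 0, 1, 0], dtype=bool)]))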

737 citations

Journal ArticleDOI
TL;DR: In this article, the authors systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving and provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection.
Abstract: Recent advancements in perception for autonomous driving are driven by deep learning. In order to achieve robust and accurate scene understanding, autonomous vehicles are usually equipped with different sensors (e.g. cameras, LiDARs, Radars), and multiple sensing modalities can be fused to exploit their complementary properties. In this context, many methods have been proposed for deep multi-modal perception problems. However, there is no general guideline for network architecture design, and questions of “what to fuse”, “when to fuse”, and “how to fuse” remain open. This review paper attempts to systematically summarize methodologies and discuss challenges for deep multi-modal object detection and semantic segmentation in autonomous driving. To this end, we first provide an overview of on-board sensors on test vehicles, open datasets, and background information for object detection and semantic segmentation in autonomous driving research. We then summarize the fusion methodologies and discuss challenges and open questions. In the appendix, we provide tables that summarize topics and methods. We also provide an interactive online platform to navigate each reference: https://boschresearch.github.io/multimodalperception/.
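The survey's "what/when/how to fuse" questions can be illustrated with a toy middle-fusion module in which camera and LiDAR feature maps are concatenated and mixed by a 1x1 convolution; the modalities, channel sizes, and fusion depth here are assumptions for illustration, not recommendations from the survey.

import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    def __init__(self, cam_ch: int, lidar_ch: int, out_ch: int):
        super().__init__()
        # "How to fuse": concatenate along channels, then mix with a 1x1 conv.
        self.mix = nn.Sequential(
            nn.Conv2d(cam_ch + lidar_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, cam_feat: torch.Tensor,
                lidar_feat: torch.Tensor) -> torch.Tensor:
        # Assumes both feature maps are already spatially aligned; the
        # projection between sensor frames and the fusion depth ("when to
        # fuse") are separate design choices the survey discusses.
        return self.mix(torch.cat([cam_feat, lidar_feat], dim=1))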

674 citations

Journal ArticleDOI
TL;DR: A comparative study of deep techniques in image denoising, classifying deep convolutional neural networks (CNNs) for additive white noisy images, deep CNNs for real noisy images, deep CNNs for blind denoising, and deep networks for hybrid noisy images.
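The first category in this taxonomy, deep CNNs for additive white noise, is typified by residual denoisers that predict the noise and subtract it. The DnCNN-style skeleton below is a generic sketch rather than a specific model from the survey; its depth and width are illustrative.

import torch
import torch.nn as nn

class ResidualDenoiser(nn.Module):
    def __init__(self, channels: int = 1, width: int = 64, depth: int = 8):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1),
                  nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(width, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)  # predicts the noise map

    def forward(self, noisy: torch.Tensor) -> torch.Tensor:
        return noisy - self.body(noisy)     # subtract the predicted noise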

518 citations