Showing papers on "Upsampling published in 2020"

PDF

Open Access

Proceedings Article•DOI•

3DSSD: Point-Based 3D Single Stage Object Detector

[...]

Zetong Yang¹, Yanan Sun², Shu Liu, Jiaya Jia¹•Institutions (2)

The Chinese University of Hong Kong¹, Hong Kong University of Science and Technology²

14 Jun 2020

TL;DR: Wang et al. as discussed by the authors proposed a lightweight point-based 3D single-stage object detector 3DSSD to achieve decent balance of accuracy and efficiency, and proposed a fusion sampling strategy in downsampling process to make detection on less representative points feasible.

...read moreread less

Abstract: Prevalence of voxel-based 3D single-stage detectors contrast with underexplored point-based methods. In this paper, we present a lightweight point-based 3D single stage object detector 3DSSD to achieve decent balance of accuracy and efficiency. In this paradigm, all upsampling layers and the refinement stage, which are indispensable in all existing point-based methods, are abandoned. We instead propose a fusion sampling strategy in downsampling process to make detection on less representative points feasible. A delicate box prediction network, including a candidate generation layer and an anchor-free regression head with a 3D center-ness assignment strategy, is developed to meet the demand of high accuracy and speed. Our 3DSSD paradigm is an elegant single-stage anchor-free one. We evaluate it on widely used KITTI dataset and more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single-stage methods by a large margin, and even yields comparable performance with two-stage point-based methods, with amazing inference speed of 25+ FPS, 2x faster than former state-of-the-art point-based methods.

...read moreread less

349 citations

Posted Content•

3DSSD: Point-based 3D Single Stage Object Detector

[...]

Zetong Yang¹, Yanan Sun², Shu Liu, Jiaya Jia¹•Institutions (2)

The Chinese University of Hong Kong¹, Hong Kong University of Science and Technology²

24 Feb 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper presents a lightweight point-based 3D single stage object detector 3DSSD to achieve decent balance of accuracy and efficiency, and outperforms all state-of-the-art voxel-based single-stage methods by a large margin.

...read moreread less

Abstract: Currently, there have been many kinds of voxel-based 3D single stage detectors, while point-based single stage methods are still underexplored. In this paper, we first present a lightweight and effective point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We novelly propose a fusion sampling strategy in downsampling process to make detection on less representative points feasible. A delicate box prediction network including a candidate generation layer, an anchor-free regression head with a 3D center-ness assignment strategy is designed to meet with our demand of accuracy and speed. Our paradigm is an elegant single stage anchor-free framework, showing great superiority to other existing methods. We evaluate 3DSSD on widely used KITTI dataset and more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well, with inference speed more than 25 FPS, 2x faster than former state-of-the-art point-based methods.

...read moreread less

303 citations

Proceedings Article•DOI•

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

[...]

Sachit Menon¹, Alexandru Damian¹, Shijia Hu¹, Nikhil Ravi¹, Cynthia Rudin¹ - Show less +1 more•Institutions (1)

Duke University¹

14 Jun 2020

TL;DR: PULSE as mentioned in this paper proposes a self-supervised approach to generate realistic SR images that downscale to the original LR image by leveraging properties of high-dimensional Gaussians, which guides exploration through the latent space of a generative model.

...read moreread less

Abstract: The primary aim of single-image super-resolution is to construct a high-resolution (HR) image from a corresponding low-resolution (LR) input. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance (detailed) regions. We propose an alternative formulation of the super-resolution problem based on creating realistic SR images that downscale correctly. We present a novel super-resolution algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. It accomplishes this in an entirely self-supervised fashion and is not confined to a specific degradation operator used during training, unlike previous methods (which require training on databases of LR-HR image pairs for supervised learning). Instead of starting with the LR image and slowly adding detail, PULSE traverses the high-resolution natural image manifold, searching for images that downscale to the original LR image. This is formalized through the “downscaling loss,” which guides exploration through the latent space of a generative model. By leveraging properties of high-dimensional Gaussians, we restrict the search space to guarantee that our outputs are realistic. PULSE thereby generates super-resolved images that both are realistic and downscale correctly. We show extensive experimental results demonstrating the efficacy of our approach in the domain of face super-resolution (also known as face hallucination). Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.

...read moreread less

226 citations

Proceedings Article•DOI•

Learning in the Frequency Domain

[...]

Kai Xu¹, Minghai Qin², Fei Sun², Yuhao Wang², Yen-Kuang Chen², Fengbo Ren¹ - Show less +2 more•Institutions (2)

Arizona State University¹, Alibaba Group²

14 Jun 2020

TL;DR: Inspired by digital signal processing theories, the spectral bias from the frequency perspective is analyzed and a learning-based frequency selection method is proposed to identify the trivial frequency components which can be removed without accuracy loss.

...read moreread less

Abstract: Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and have to be downsampled to the predetermined input size of neural networks. Even though the downsampling operations reduce computation and the required communication bandwidth, it removes both redundant and salient information obliviously, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components which can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages identical structures of the well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting the frequency-domain information as the input. Experiment results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach and meanwhile further reduce the input data size. Specifically for ImageNet classification with the same input size, the proposed method achieves 1.60% and 0.63% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1.42%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.

...read moreread less

208 citations

Posted Content•

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

[...]

Sachit Menon¹, Alexandru Damian¹, Shijia Hu¹, Nikhil Ravi¹, Cynthia Rudin¹ - Show less +1 more•Institutions (1)

Duke University¹

08 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a novel super-resolution algorithm, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature, and outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.

...read moreread less

Abstract: The primary aim of single-image super-resolution is to construct high-resolution (HR) images from corresponding low-resolution (LR) inputs. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance (detailed) regions. We propose an alternative formulation of the super-resolution problem based on creating realistic SR images that downscale correctly. We present an algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. It accomplishes this in an entirely self-supervised fashion and is not confined to a specific degradation operator used during training, unlike previous methods (which require supervised training on databases of LR-HR image pairs). Instead of starting with the LR image and slowly adding detail, PULSE traverses the high-resolution natural image manifold, searching for images that downscale to the original LR image. This is formalized through the "downscaling loss," which guides exploration through the latent space of a generative model. By leveraging properties of high-dimensional Gaussians, we restrict the search space to guarantee realistic outputs. PULSE thereby generates super-resolved images that both are realistic and downscale correctly. We show proof of concept of our approach in the domain of face super-resolution (i.e., face hallucination). We also present a discussion of the limitations and biases of the method as currently implemented with an accompanying model card with relevant metrics. Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.

...read moreread less

183 citations

Posted Content•

Leveraging Frequency Analysis for Deep Fake Image Recognition

[...]

Joel Frank¹, Thorsten Eisenhofer, Lea Schönherr, Asja Fischer, Dorothea Kolossa, Thorsten Holz - Show less +2 more•Institutions (1)

Ruhr University Bochum¹

19 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: It is demonstrated how the frequency representation can be used to identify deep fake images in an automated way, surpassing state-of-the-art methods.

...read moreread less

Abstract: Deep neural networks can generate images that are astonishingly realistic, so much so that it is often hard for humans to distinguish them from actual photos. These achievements have been largely made possible by Generative Adversarial Networks (GANs). While deep fake images have been thoroughly investigated in the image domain - a classical approach from the area of image forensics - an analysis in the frequency domain has been missing so far. In this paper, we address this shortcoming and our results reveal that in frequency space, GAN-generated images exhibit severe artifacts that can be easily identified. We perform a comprehensive analysis, showing that these artifacts are consistent across different neural network architectures, data sets, and resolutions. In a further investigation, we demonstrate that these artifacts are caused by upsampling operations found in all current GAN architectures, indicating a structural and fundamental problem in the way images are generated via GANs. Based on this analysis, we demonstrate how the frequency representation can be used to identify deep fake images in an automated way, surpassing state-of-the-art methods.

...read moreread less

175 citations

Journal Article•DOI•

Learning Spatial-Spectral Prior for Super-Resolution of Hyperspectral Imagery

[...]

Junjun Jiang¹, He Sun¹, Xianming Liu¹, Jiayi Ma²•Institutions (2)

Harbin Institute of Technology¹, Wuhan University²

20 May 2020-IEEE Transactions on Computational Imaging

TL;DR: A spatial-spectral prior network (SSPN) is introduced to fully exploit the spatial information and the correlation between the spectra of the hyperspectral data, and a group convolution (with shared network parameters) and progressive upsampling framework is proposed to make the training process more stable.

...read moreread less

Abstract: Recently, single gray/RGB image super-resolution reconstruction task has been extensively studied and made significant progress by leveraging the advanced machine learning techniques based on deep convolutional neural networks (DCNNs). However, there has been limited technical development focusing on single hyperspectral image super-resolution due to the high-dimensional and complex spectral patterns in hyperspectral image. In this article, we make a step forward by investigating how to adapt state-of-the-art deep learning based single gray/RGB image super-resolution approaches for computationally efficient single hyperspectral image super-resolution, referred as SSPSR. Specifically, we introduce a spatial-spectral prior network (SSPN) to fully exploit the spatial information and the correlation between the spectra of the hyperspectral data. Considering that the hyperspectral training samples are scarce and the spectral dimension of hyperspectral image data is very high, it is nontrivial to train a stable and effective deep network. Therefore, a group convolution (with shared network parameters) and progressive upsampling framework is proposed. This will not only alleviate the difficulty in feature extraction due to high dimension of the hyperspectral data, but also make the training process more stable. To exploit the spatial and spectral prior, we design a spatial-spectral block (SSB), which consists of a spatial residual module and a spectral attention residual module. Experimental results on some hyperspectral images demonstrate that the proposed SSPSR method enhances the details of the recovered high-resolution hyperspectral images, and outperforms state-of-the-arts. The source code is available at [Online]. Available: https://github.com/junjun-jiang/SSPSR .

...read moreread less

139 citations

Book Chapter•DOI•

What Matters in Unsupervised Optical Flow

[...]

Rico Jonschkowski¹, Austin Stone¹, Jonathan T. Barron¹, Ariel Gordon¹, Kurt Konolige¹, Anelia Angelova¹ - Show less +2 more•Institutions (1)

Google¹

23 Aug 2020

TL;DR: A new unsupervised flow technique is presented that significantly outperforms the previous unsuper supervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches.

...read moreread less

Abstract: We systematically compare and analyze a set of key components in unsupervised optical flow to identify which photometric loss, occlusion handling, and smoothness regularization is most effective. Alongside this investigation we construct a number of novel improvements to unsupervised flow models, such as cost volume normalization, stopping the gradient at the occlusion mask, encouraging smoothness before upsampling the flow field, and continual self-supervision with image resizing. By combining the results of our investigation with our improved model components, we are able to present a new unsupervised flow technique that significantly outperforms the previous unsupervised state-of-the-art and performs on par with supervised FlowNet2 on the KITTI 2015 dataset, while also being significantly simpler than related approaches.

...read moreread less

133 citations

Posted Content•

Efficient Image Super-Resolution Using Pixel Attention

[...]

Hengyuan Zhao¹, Xiangtao Kong¹, Jingwen He¹, Yu Qiao¹, Chao Dong¹ - Show less +1 more•Institutions (1)

Chinese Academy of Sciences¹

02 Oct 2020-arXiv: Image and Video Processing

TL;DR: This work designs a lightweight convolutional neural network for image super resolution with a newly proposed pixel attention scheme that could achieve similar performance as the lightweight networks - SRResNet and CARN, but with only 272K parameters.

...read moreread less

Abstract: This work aims at designing a lightweight convolutional neural network for image super resolution (SR). With simplicity bare in mind, we construct a pretty concise and effective network with a newly proposed pixel attention scheme. Pixel attention (PA) is similar as channel attention and spatial attention in formulation. The difference is that PA produces 3D attention maps instead of a 1D attention vector or a 2D map. This attention scheme introduces fewer additional parameters but generates better SR results. On the basis of PA, we propose two building blocks for the main branch and the reconstruction branch, respectively. The first one - SC-PA block has the same structure as the Self-Calibrated convolution but with our PA layer. This block is much more efficient than conventional residual/dense blocks, for its twobranch architecture and attention scheme. While the second one - UPA block combines the nearest-neighbor upsampling, convolution and PA layers. It improves the final reconstruction quality with little parameter cost. Our final model- PAN could achieve similar performance as the lightweight networks - SRResNet and CARN, but with only 272K parameters (17.92% of SRResNet and 17.09% of CARN). The effectiveness of each proposed component is also validated by ablation study. The code is available at this https URL.

...read moreread less

128 citations

Proceedings Article•DOI•

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

[...]

Ho Kei Cheng¹, Jihoon Chung¹, Yu-Wing Tai², Chi-Keung Tang¹•Institutions (2)

Hong Kong University of Science and Technology¹, Tencent²

14 Jun 2020

TL;DR: This paper presents quantitative and qualitative studies on different datasets to show that CascadePSP can reveal pixel-accurate segmentation boundaries using the novel refinement module without any finetuning.

...read moreread less

Abstract: State-of-the-art semantic segmentation methods were almost exclusively trained on images within a fixed resolution range. These segmentations are inaccurate for very high-resolution images since using bicubic upsampling of low-resolution segmentation does not adequately capture high-resolution details along object boundaries. In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data. The key insight is our CascadePSP network which refines and corrects local boundaries whenever possible. Although our network is trained with low-resolution segmentation data, our method is applicable to any resolution even for very high-resolution images larger than 4K. We present quantitative and qualitative studies on different datasets to show that CascadePSP can reveal pixel-accurate segmentation boundaries using our novel refinement module without any finetuning. Thus, our method can be regarded as class-agnostic. Finally, we demonstrate the application of our model to scene parsing in multi-class segmentation.

...read moreread less

126 citations

Book Chapter•DOI•

High-Resolution Image Inpainting with Iterative Confidence Feedback and Guided Upsampling

[...]

Yu Zeng¹, Zhe Lin², Jimei Yang², Jianming Zhang², Eli Shechtman², Huchuan Lu¹ - Show less +2 more•Institutions (2)

Dalian University of Technology¹, Adobe Systems²

23 Aug 2020

TL;DR: A deep generative model which not only outputs an inpainting result but also a corresponding confidence map is introduced, which progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration.

...read moreread less

Abstract: Existing image inpainting methods often produce artifacts when dealing with large holes in real applications. To address this challenge, we propose an iterative inpainting method with a feedback mechanism. Specifically, we introduce a deep generative model which not only outputs an inpainting result but also a corresponding confidence map. Using this map as feedback, it progressively fills the hole by trusting only high-confidence pixels inside the hole at each iteration and focuses on the remaining pixels in the next iteration. As it reuses partial predictions from the previous iterations as known pixels, this process gradually improves the result. In addition, we propose a guided upsampling network to enable generation of high-resolution inpainting results. We achieve this by extending the Contextual Attention module [39] to borrow high-resolution feature patches in the input image. Furthermore, to mimic real object removal scenarios, we collect a large object mask dataset and synthesize more realistic training data that better simulates user inputs. Experiments show that our method significantly outperforms existing methods in both quantitative and qualitative evaluations. More results and Web APP are available at https://zengxianyu.github.io/iic.

...read moreread less

Proceedings Article•DOI•

Superpixel Segmentation With Fully Convolutional Networks

[...]

Fengting Yang¹, Qian Sun¹, Hailin Jin², Zihan Zhou¹•Institutions (2)

Pennsylvania State University¹, Adobe Systems²

14 Jun 2020

TL;DR: Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, this paper presented a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid.

...read moreread less

Abstract: In computer vision, superpixels have been widely used as an effective way to reduce the number of image primitives for subsequent processing. But only a few attempts have been made to incorporate them into deep neural networks. One main reason is that the standard convolution operation is defined on regular grids and becomes inefficient when applied to superpixels. Inspired by an initialization strategy commonly adopted by traditional superpixel algorithms, we present a novel method that employs a simple fully convolutional network to predict superpixels on a regular image grid. Experimental results on benchmark datasets show that our method achieves state-of-the-art superpixel segmentation performance while running at about 50fps. Based on the predicted superpixels, we further develop a downsampling/upsampling scheme for deep networks with the goal of generating high-resolution outputs for dense prediction tasks. Specifically, we modify a popular network architecture for stereo matching to simultaneously predict superpixels and disparities. We show that improved disparity estimation accuracy can be obtained on public datasets.

...read moreread less

Journal Article•DOI•

Hierarchical dense recursive network for image super-resolution

[...]

Kui Jiang¹, Zhongyuan Wang¹, Peng Yi¹, Junjun Jiang²•Institutions (2)

Wuhan University¹, Harbin Institute of Technology²

01 Nov 2020-Pattern Recognition

TL;DR: A novel hierarchical dense connection network (HDN) is advocated for image SR that outperforms the state-of-the-art methods in terms of quantitative indicators and realistic visual effects, as well as enjoys a fast and accurate reconstruction.

...read moreread less

Journal Article•DOI•

Efficient Residual Dense Block Search for Image Super-Resolution

[...]

Dehua Song¹, Chang Xu², Xu Jia¹, Yiyi Chen¹, Chunjing Xu¹, Yunhe Wang¹ - Show less +2 more•Institutions (2)

Huawei¹, University of Sydney²

03 Apr 2020

TL;DR: An efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight and accurate networks for image super-resolution models achieves better performance than the state-of-the-art methods with limited number of parameters and FLOPs.

...read moreread less

Abstract: Although remarkable progress has been made on single image super-resolution due to the revival of deep convolutional neural networks, deep learning methods are confronted with the challenges of computation and memory consumption in practice, especially for mobile devices. Focusing on this issue, we propose an efficient residual dense block search algorithm with multiple objectives to hunt for fast, lightweight and accurate networks for image super-resolution. Firstly, to accelerate super-resolution network, we exploit the variation of feature scale adequately with the proposed efficient residual dense blocks. In the proposed evolutionary algorithm, the locations of pooling and upsampling operator are searched automatically. Secondly, network architecture is evolved with the guidance of block credits to acquire accurate super-resolution network. The block credit reflects the effect of current block and is earned during model evaluation process. It guides the evolution by weighing the sampling probability of mutation to favor admirable blocks. Extensive experimental results demonstrate the effectiveness of the proposed searching method and the found efficient super-resolution models achieve better performance than the state-of-the-art methods with limited number of parameters and FLOPs.

...read moreread less

Proceedings Article•DOI•

Densely Connected Neural Network with Dilated Convolutions for Real-Time Speech Enhancement in The Time Domain

[...]

Ashutosh Pandey¹, DeLiang Wang¹•Institutions (1)

Ohio State University¹

04 May 2020

TL;DR: Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.

...read moreread less

Abstract: In this work, we propose a fully convolutional neural network for real-time speech enhancement in the time domain. The proposed network is an encoder-decoder based architecture with skip connections. The layers in the encoder and the decoder are followed by densely connected blocks comprising of dilated and causal convolutions. The dilated convolutions help in context aggregation at different resolutions. The causal convolutions are used to avoid information flow from future frames, hence making the network suitable for real-time applications. We also propose to use sub-pixel convolutional layers in the decoder for upsampling. Further, the model is trained using a loss function with two components; a time-domain loss and a frequency-domain loss. The proposed loss function outperforms the time-domain loss. Experimental results show that the proposed model significantly outperforms other real-time state-of-the-art models in terms of objective intelligibility and quality scores.

...read moreread less

Book Chapter•DOI•

Efficient Image Super-Resolution Using Pixel Attention

[...]

Hengyuan Zhao¹, Xiangtao Kong¹, Jingwen He¹, Yu Qiao¹, Chao Dong¹ - Show less +1 more•Institutions (1)

Chinese Academy of Sciences¹

23 Aug 2020

TL;DR: Zhao et al. as discussed by the authors designed a lightweight convolutional neural network with a pixel attention scheme, which produces 3D attention maps instead of a 1D attention vector or a 2D map.

...read moreread less

Abstract: This work aims at designing a lightweight convolutional neural network for image super resolution (SR). With simplicity bare in mind, we construct a pretty concise and effective network with a newly proposed pixel attention scheme. Pixel attention (PA) is similar as channel attention and spatial attention in formulation. The difference is that PA produces 3D attention maps instead of a 1D attention vector or a 2D map. This attention scheme introduces fewer additional parameters but generates better SR results. On the basis of PA, we propose two building blocks for the main branch and the reconstruction branch, respectively. The first one—SC-PA block has the same structure as the Self-Calibrated convolution but with our PA layer. This block is much more efficient than conventional residual/dense blocks, for its two-branch architecture and attention scheme. While the second one—U-PA block combines the nearest-neighbor upsampling, convolution and PA layers. It improves the final reconstruction quality with little parameter cost. Our final model—PAN could achieve similar performance as the lightweight networks—SRResNet and CARN, but with only 272K parameters (17.92% of SRResNet and 17.09% of CARN). The effectiveness of each proposed component is also validated by ablation study. The code is available at https://github.com/zhaohengyuan1/PAN.

...read moreread less

Proceedings Article•DOI•

LiDAR Panoptic Segmentation for Autonomous Driving

[...]

Andres Milioto¹, Jens Behley¹, Chris McCool¹, Cyrill Stachniss¹•Institutions (1)

University of Bonn¹

24 Oct 2020

TL;DR: LiDAR-Bonnetal as discussed by the authors leverages the geometric information of the LiDAR scan to perform a novel, distance-aware tri-linear upsampling, which allows the approach to use larger output strides than using transpose convolutions leading to substantial savings in computation time.

...read moreread less

Abstract: Truly autonomous driving without the need for human intervention can only be attained when self-driving cars fully understand their surroundings. Most of these vehicles rely on a suite of active and passive sensors. LiDAR sensors are a cornerstone in most of these hardware stacks, and leveraging them as a complement to other passive sensors such as RGB cameras is an enticing goal. Understanding the semantic class of each point in a LiDAR sweep is important, as well as knowing to which instance of that class it belongs to. To this end, we present a novel, single-stage, and real-time capable panoptic segmentation approach using a shared encoder with a semantic and instance decoder. We leverage the geometric information of the LiDAR scan to perform a novel, distance- aware tri-linear upsampling, which allows our approach to use larger output strides than using transpose convolutions leading to substantial savings in computation time. Our experimental evaluation and ablation studies for each module show that combining our geometric and semantic embeddings with our learned, variable instance thresholds, a category-specific loss, and the novel trilinear upsampling module leads to higher panoptic quality. We will release the code of our approach in our LiDAR processing library LiDAR-Bonnetal [27].

...read moreread less

Proceedings Article•DOI•

Perceptual Extreme Super-Resolution Network With Receptive Field Block

[...]

Taizhang Shang, Qiuju Dai, Shengchen Zhu, Tong Yang, Yandong Guo - Show less +1 more

01 Jun 2020

TL;DR: A super resolution network with receptive field block based on Enhanced SRGAN based on RFB for the purpose of extracting multi-scale information and enhance the feature discriminability and the ensemble of 10 models of different iteration is used to improve the robustness of model and reduce the noise introduced by each individual model.

...read moreread less

Abstract: Perceptual Extreme Super-Resolution for single image is extremely difficult, because the texture details of different images vary greatly. To tackle this difficulty, we develop a super resolution network with receptive field block based on Enhanced SRGAN. We call our network RFB-ESRGAN. The key contributions are listed as follows. First, for the purpose of extracting multi-scale information and enhance the feature discriminability, we applied receptive field block (RFB) to super resolution. RFB has achieved competitive results in object detection and classification. Second, instead of using large convolution kernels in multi-scale receptive field block, several small kernels are used in RFB, which makes us be able to extract detailed features and reduce the computation complexity. Third, we alternately use different upsampling methods in the upsampling stage to reduce the high computation complexity and still remain satisfactory performance. Fourth, we use the ensemble of 10 models of different iteration to improve the robustness of model and reduce the noise introduced by each individual model. Our experimental results show the superior performance of RFB-ESRGAN. According to the preliminary results of NTIRE 2020 Perceptual Extreme Super-Resolution Challenge, our solution ranks first among all the participants.

...read moreread less

Journal Article•DOI•

AUNet: attention-guided dense-upsampling networks for breast mass segmentation in whole mammograms.

[...]

Hui Sun¹, Cheng Li¹, Boqiang Liu², Zaiyi Liu³, Meiyun Wang, Hairong Zheng¹, David Dagan Feng⁴, Shanshan Wang¹ - Show less +4 more•Institutions (4)

Chinese Academy of Sciences¹, Shandong University², Guangdong General Hospital³, University of Sydney⁴

28 Feb 2020-Physics in Medicine and Biology

TL;DR: A novel attention-guided dense-upsampling network (AUNet) is proposed for accurate breast mass segmentation in whole mammograms directly and compared to three state-of-the-art fully convolutional networks, AUNet achieved the best performances.

...read moreread less

Abstract: Mammography is one of the most commonly applied tools for early breast cancer screening. Automatic segmentation of breast masses in mammograms is essential but challenging due to the low signal-to-noise ratio and the wide variety of mass shapes and sizes. Existing methods deal with these challenges mainly by extracting mass-centered image patches manually or automatically. However, manual patch extraction is time-consuming and automatic patch extraction brings errors that could not be compensated in the following segmentation step. In this study, we propose a novel attention-guided dense-upsampling network (AUNet) for accurate breast mass segmentation in whole mammograms directly. In AUNet, we employ an asymmetrical encoder-decoder structure and propose an effective upsampling block, attention-guided dense-upsampling block (AU block). Especially, the AU block is designed to have three merits. Firstly, it compensates the information loss of bilinear upsampling by dense upsampling. Secondly, it designs a more effective method to fuse high- and low-level features. Thirdly, it includes a channel-attention function to highlight rich-information channels. We evaluated the proposed method on two publicly available datasets, CBIS-DDSM and INbreast. Compared to three state-of-the-art fully convolutional networks, AUNet achieved the best performances with an average Dice similarity coefficient of 81.8% for CBIS-DDSM and 79.1% for INbreast.

...read moreread less

Book Chapter•DOI•

PUGeo-Net: A Geometry-Centric Network for 3D Point Cloud Upsampling

[...]

Yue Qian¹, Junhui Hou¹, Sam Kwong¹, Ying He²•Institutions (2)

City University of Hong Kong¹, Nanyang Technological University²

24 Feb 2020

TL;DR: Zhang et al. as mentioned in this paper proposed a novel deep neural network based method, called PUGeo-Net, which learns a linear transformation matrix for each input point and projects the samples to the curved surface by computing a displacement along the normal of the tangent plane.

...read moreread less

Abstract: This paper addresses the problem of generating uniform dense point clouds to describe the underlying geometric structures from given sparse point clouds. Due to the irregular and unordered nature, point cloud densification as a generative task is challenging. To tackle the challenge, we propose a novel deep neural network based method, called PUGeo-Net, that learns a $3\times 3$ linear transformation matrix $\bf T$ for each input point. Matrix $\mathbf T$ approximates the augmented Jacobian matrix of a local parameterization and builds a one-to-one correspondence between the 2D parametric domain and the 3D tangent plane so that we can lift the adaptively distributed 2D samples (which are also learned from data) to 3D space. After that, we project the samples to the curved surface by computing a displacement along the normal of the tangent plane. PUGeo-Net is fundamentally different from the existing deep learning methods that are largely motivated by the image super-resolution techniques and generate new points in the abstract feature space. Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details. Moreover, PUGeo-Net can compute the normal for the original and generated points, which is highly desired by the surface reconstruction algorithms. Computational results show that PUGeo-Net, the first neural network that can jointly generate vertex coordinates and normals, consistently outperforms the state-of-the-art in terms of accuracy and efficiency for upsampling factor $4\sim 16$.

...read moreread less

Posted Content•

Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3.

[...]

Petr Hurtik, Vojtech Molek, Jan Hula, Marek Vajgl, Pavel Vlašánek, Tomas Nejezchleba - Show less +2 more

27 May 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: A new version of Y OLO with better performance and extended with instance segmentation called Poly-YOLO, which has the same precision as YOLOv3, but it is three times smaller and twice as fast, thus suitable for embedded devices.

...read moreread less

Abstract: We present a new version of YOLO with better performance and extended with instance segmentation called Poly-YOLO. Poly-YOLO builds on the original ideas of YOLOv3 and removes two of its weaknesses: a large amount of rewritten labels and inefficient distribution of anchors. Poly-YOLO reduces the issues by aggregating features from a light SE-Darknet-53 backbone with a hypercolumn technique, using stairstep upsampling, and produces a single scale output with high resolution. In comparison with YOLOv3, Poly-YOLO has only 60% of its trainable parameters but improves mAP by a relative 40%. We also present Poly-YOLO lite with fewer parameters and a lower output resolution. It has the same precision as YOLOv3, but it is three times smaller and twice as fast, thus suitable for embedded devices. Finally, Poly-YOLO performs instance segmentation using bounding polygons. The network is trained to detect size-independent polygons defined on a polar grid. Vertices of each polygon are being predicted with their confidence, and therefore Poly-YOLO produces polygons with a varying number of vertices.

...read moreread less

Posted Content•

HITNet: Hierarchical Iterative Tile Refinement Network for Real-time Stereo Matching

[...]

Vladimir Tankovich, Christian Häne, Sean Fanello, Yinda Zhang, Shahram Izadi, Sofien Bouaziz¹ - Show less +2 more•Institutions (1)

Google¹

23 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: HITNet is a novel neural network architecture for real-time stereo matching that not only geometrically reasons about disparities but also infers slanted plane hypotheses allowing to more accurately perform geometric warping and upsampling operations.

...read moreread less

Abstract: This paper presents HITNet, a novel neural network architecture for real-time stereo matching. Contrary to many recent neural network approaches that operate on a full cost volume and rely on 3D convolutions, our approach does not explicitly build a volume and instead relies on a fast multi-resolution initialization step, differentiable 2D geometric propagation and warping mechanisms to infer disparity hypotheses. To achieve a high level of accuracy, our network not only geometrically reasons about disparities but also infers slanted plane hypotheses allowing to more accurately perform geometric warping and upsampling operations. Our architecture is inherently multi-resolution allowing the propagation of information across different levels. Multiple experiments prove the effectiveness of the proposed approach at a fraction of the computation required by state-of-the-art methods. At the time of writing, HITNet ranks 1st-3rd on all the metrics published on the ETH3D website for two view stereo, ranks 1st on most of the metrics among all the end-to-end learning approaches on Middlebury-v3, ranks 1st on the popular KITTI 2012 and 2015 benchmarks among the published methods faster than 100ms.

...read moreread less

Journal Article•DOI•

Neural supersampling for real-time rendering

[...]

Lei Xiao¹, Salah Nouri¹, Matt Chapman¹, Alexander Jobe Fix¹, Douglas Robert Lanman¹, Anton S. Kaplanyan¹ - Show less +2 more•Institutions (1)

Facebook¹

08 Jul 2020-ACM Transactions on Graphics

TL;DR: This work demonstrates high fidelity and temporally stable results in real-time, even in the highly challenging 4 × 4 upsampling scenario, significantly outperforming existing superresolution and temporal antialiasing work.

...read moreread less

Abstract: Due to higher resolutions and refresh rates, as well as more photorealistic effects, real-time rendering has become increasingly challenging for video games and emerging virtual reality headsets. To meet this demand, modern graphics hardware and game engines often reduce the computational cost by rendering at a lower resolution and then upsampling to the native resolution. Following the recent advances in image and video superresolution in computer vision, we propose a machine learning approach that is specifically tailored for high-quality upsampling of rendered content in real-time applications. The main insight of our work is that in rendered content, the image pixels are point-sampled, but precise temporal dynamics are available. Our method combines this specific information that is typically available in modern renderers (i.e., depth and dense motion vectors) with a novel temporal network design that takes into account such specifics and is aimed at maximizing video quality while delivering real-time performance. By training on a large synthetic dataset rendered from multiple 3D scenes with recorded camera motion, we demonstrate high fidelity and temporally stable results in real-time, even in the highly challenging 4 × 4 upsampling scenario, significantly outperforming existing superresolution and temporal antialiasing work.

...read moreread less

Journal Article•DOI•

ECG: Edge-aware Point Cloud Completion with Graph Convolution

[...]

Liang Pan¹•Institutions (1)

National University of Singapore¹

14 May 2020

TL;DR: The proposed ECG - an Edge-aware point cloud Completion network with Graph convolution, which facilitates fine-grained 3D point cloud shape generation with multi-scale edge features, significantly outperforms previous state-of-the-art (SOTA) methods for point cloud completion.

...read moreread less

Abstract: Scanned 3D point clouds for real-world scenes often suffer from noise and incompletion. Observing that prior point cloud shape completion networks overlook local geometric features, we propose our ECG - an Edge-aware point cloud Completion network with Graph convolution, which facilitates fine-grained 3D point cloud shape generation with multi-scale edge features. Our ECG consists of two consecutive stages: 1) skeleton generation and 2) details refinement. Each stage is a generation sub-network conditioned on the input incomplete point cloud. The first stage generates coarse skeletons to facilitate capturing useful edge features against noisy measurements. Subsequently, we design a deep hierarchical encoder with graph convolution to propagate multi-scale edge features for local geometric details refinement. To preserve local geometrical details while upsampling, we propose the Edge-aware Feature Expansion (EFE) module to smoothly expand/upsample point features by emphasizing their local edges. Extensive experiments show that our ECG significantly outperforms previous state-of-the-art (SOTA) methods for point cloud completion.

...read moreread less

Proceedings Article•

LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond

[...]

Wenbo Li¹, Kun Zhou¹, Lu Qi¹, Nianjuan Jiang², Jiangbo Lu³, Jiaya Jia¹ - Show less +2 more•Institutions (3)

The Chinese University of Hong Kong¹, Agency for Science, Technology and Research², Tsinghua University³

01 Jan 2020

TL;DR: A linearly-assembled pixel-adaptive regression network (LAPAR) is proposed, which casts the direct LR to HR mapping learning into a linear coefficient regression task over a dictionary of multiple predefined filter bases, which renders the model highly lightweight and easy to optimize while achieving state-of-the-art results on SISR benchmarks.

...read moreread less

Abstract: Single image super-resolution (SISR) deals with a fundamental problem of upsampling a low-resolution (LR) image to its high-resolution (HR) version. Last few years have witnessed impressive progress propelled by deep learning methods. However, one critical challenge faced by existing methods is to strike a sweet spot of deep model complexity and resulting SISR quality. This paper addresses this pain point by proposing a linearly-assembled pixel-adaptive regression network (LAPAR), which casts the direct LR to HR mapping learning into a linear coefficient regression task over a dictionary of multiple predefined filter bases. Such a parametric representation renders our model highly lightweight and easy to optimize while achieving state-of-the-art results on SISR benchmarks. Moreover, based on the same idea, LAPAR is extended to tackle other restoration tasks, e.g., image denoising and JPEG image deblocking, and again, yields strong performance. The code is available at this https URL.

...read moreread less

Journal Article•DOI•

Anchor-free Convolutional Network with Dense Attention Feature Aggregation for Ship Detection in SAR Images

[...]

Fei Gao¹, Yishan He¹, Jun Wang¹, Amir Hussain², Huiyu Zhou³ - Show less +1 more•Institutions (3)

Beihang University¹, Edinburgh Napier University², University of Leicester³

13 Aug 2020-Remote Sensing

TL;DR: An anchor-free convolutional network with dense attention feature aggregation, which is a center-point-based ship predictor (CSP), and a novel feature aggregation scheme called DAFA is proposed to obtain a high-resolution feature map with multiscale information.

...read moreread less

Abstract: In recent years, with the improvement of synthetic aperture radar (SAR) imaging resolution, it is urgent to develop methods with higher accuracy and faster speed for ship detection in high-resolution SAR images. Among all kinds of methods, deep-learning-based algorithms bring promising performance due to end-to-end detection and automated feature extraction. However, several challenges still exist: (1) standard deep learning detectors based on anchors have certain unsolved problems, such as tuning of anchor-related parameters, scale-variation and high computational costs. (2) SAR data is huge but the labeled data is relatively small, which may lead to overfitting in training. (3) To improve detection speed, deep learning detectors generally detect targets based on low-resolution features, which may cause missed detections for small targets. In order to address the above problems, an anchor-free convolutional network with dense attention feature aggregation is proposed in this paper. Firstly, we use a lightweight feature extractor to extract multiscale ship features. The inverted residual blocks with depth-wise separable convolution reduce the network parameters and improve the detection speed. Secondly, a novel feature aggregation scheme called dense attention feature aggregation (DAFA) is proposed to obtain a high-resolution feature map with multiscale information. By combining the multiscale features through dense connections and iterative fusions, DAFA improves the generalization performance of the network. In addition, an attention block, namely spatial and channel squeeze and excitation (SCSE) block is embedded in the upsampling process of DAFA to enhance the salient features of the target and suppress the background clutters. Third, an anchor-free detector, which is a center-point-based ship predictor (CSP), is adopted in this paper. CSP regresses the ship centers and ship sizes simultaneously on the high-resolution feature map to implement anchor-free and nonmaximum suppression (NMS)-free ship detection. The experiments on the AirSARShip-1.0 dataset demonstrate the effectiveness of our method. The results show that the proposed method outperforms several mainstream detection algorithms in both accuracy and speed.

...read moreread less

Journal Article•DOI•

Super-resolution of Sentinel-2 imagery using generative adversarial networks

[...]

Luis Salgueiro Romero, Javier Marcello, Verónica Vilaplana

28 Jul 2020-Remote Sensing

TL;DR: This research has been supported by the ARTEMISAT-2 and MALEGRA projects, funded by the Spanish Agencia Estatal de Investigacion, by the Fondo Europeo de Desarrollo Regional and the Spanish Ministerio de Economia y Competitividad, respectively.

...read moreread less

Abstract: Sentinel-2 satellites provide multi-spectral optical remote sensing images with four bands at 10 m of spatial resolution. These images, due to the open data distribution policy, are becoming an important resource for several applications. However, for small scale studies, the spatial detail of these images might not be sufficient. On the other hand, WorldView commercial satellites offer multi-spectral images with a very high spatial resolution, typically less than 2 m, but their use can be impractical for large areas or multi-temporal analysis due to their high cost. To exploit the free availability of Sentinel imagery, it is worth considering deep learning techniques for single-image super-resolution tasks, allowing the spatial enhancement of low-resolution (LR) images by recovering high-frequency details to produce high-resolution (HR) super-resolved images. In this work, we implement and train a model based on the Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) with pairs of WorldView-Sentinel images to generate a super-resolved multispectral Sentinel-2 output with a scaling factor of 5. Our model, named RS-ESRGAN, removes the upsampling layers of the network to make it feasible to train with co-registered remote sensing images. Results obtained outperform state-of-the-art models using standard metrics like PSNR, SSIM, ERGAS, SAM and CC. Moreover, qualitative visual analysis shows spatial improvements as well as the preservation of the spectral information, allowing the super-resolved Sentinel-2 imagery to be used in studies requiring very high spatial resolution.

...read moreread less

Proceedings Article•DOI•

Investigating Loss Functions for Extreme Super-Resolution

[...]

Younghyun Jo¹, Sejong Yang¹, Seon Joo Kim¹•Institutions (1)

Yonsei University¹

14 Jun 2020

TL;DR: Experimental results show that the method outperforms the conventional perceptual loss, and the method achieved second and first place in the LPIPS and PI measures respectively for NTIRE 2020 perceptual extreme SR challenge.

...read moreread less

Abstract: The performance of image super-resolution (SR) has been greatly improved by using convolutional neural networks. Most of the previous SR methods have been studied up to ×4 upsampling, and few were studied for ×16 upsampling. The general approach for perceptual ×4 SR is using GAN with VGG based perceptual loss, however, we found that it creates inconsistent details for perceptual ×16 SR. To this end, we have investigated loss functions and we propose to use GAN with LPIPS [23] loss for perceptual extreme SR. In addition, we use U-net structure discriminator [14] together to consider both the global and local context of an input image. Experimental results show that our method outperforms the conventional perceptual loss, and we achieved second and first place in the LPIPS and PI measures respectively for NTIRE 2020 perceptual extreme SR challenge.

...read moreread less

Journal Article•DOI•

RSCNN: A CNN-Based Method to Enhance Low-Light Remote-Sensing Images

[...]

Linshu Hu, Mengjiao Qin, Feng Zhang, Zhenhong Du, Renyi Liu - Show less +1 more

26 Dec 2020-Remote Sensing

TL;DR: In this article, the authors adopt CNN to propose a new neural network architecture with end-to-end strategy for low-light remote-sensing image enhancement, named Remote-Sensing CNN (RSCNN).

...read moreread less

Abstract: Image enhancement (IE) technology can help enhance the brightness of remote-sensing images to obtain better interpretation and visualization effects. Convolutional neural networks (CNN), such as the Low-light CNN (LLCNN) and Super-resolution CNN (SRCNN), have achieved great success in image enhancement, image super resolution, and other image-processing applications. Therefore, we adopt CNN to propose a new neural network architecture with end-to-end strategy for low-light remote-sensing IE, named remote-sensing CNN (RSCNN). In RSCNN, an upsampling operator is adopted to help learn more multi-scaled features. With respect to the lack of labeled training data in remote-sensing image datasets for IE, we use real natural image patches to train firstly and then perform fine-tuning operations with simulated remote-sensing image pairs. Reasonably designed experiments are carried out, and the results quantitatively show the superiority of RSCNN in terms of structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) over conventional techniques for low-light remote-sensing IE. Furthermore, the results of our method have obvious qualitative advantages in denoising and maintaining the authenticity of colors and textures.

...read moreread less

Journal Article•DOI•

Hierarchical Neural Architecture Search for Single Image Super-Resolution

[...]

Yong Guo¹, Yongsheng Luo¹, Zhenhao He¹, Jin Huang², Jian Chen¹ - Show less +1 more•Institutions (2)

South China University of Technology¹, South China Normal University²

10 Mar 2020-IEEE Signal Processing Letters

TL;DR: This work designs a hierarchical SR search space and proposes a hierarchical controller for architecture search that is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers.

...read moreread less

Abstract: Deep neural networks have exhibited promising performance in image super-resolution (SR). Most SR models follow a hierarchical architecture that contains both the cell-level design of computational blocks and the network-level design of the positions of upsampling blocks. However, designing SR models heavily relies on human expertise and is very labor-intensive. More critically, these SR models often contain a huge number of parameters and may not meet the requirements of computation resources in real-world applications. To address the above issues, we propose a Hierarchical Neural Architecture Search (HNAS) method to automatically design promising architectures with different requirements of computation cost. To this end, we design a hierarchical SR search space and propose a hierarchical controller for architecture search. Such a hierarchical controller is able to simultaneously find promising cell-level blocks and network-level positions of upsampling layers. Moreover, to design compact architectures with promising performance, we build a joint reward by considering both the performance and computation cost to guide the search process. Extensive experiments on five benchmark datasets demonstrate the superiority of our method over existing methods.

...read moreread less

Collapse