
Showing papers on "Kernel (image processing)" published in 2019


Proceedings ArticleDOI
18 Apr 2019
TL;DR: KPConv is a new design of point convolution that operates directly on point clouds without any intermediate representation and outperforms state-of-the-art classification and segmentation approaches on several datasets.
Abstract: We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. one that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
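
For intuition, the rigid form of KPConv fits in a few lines: each kernel point c_k carries a weight matrix W_k, and a neighbor contributes in proportion to its linear correlation with each kernel point. The NumPy sketch below follows that formulation with illustrative names and omits the deformable offsets and the subsampling machinery.

```python
# A minimal NumPy sketch of rigid KPConv at a single query point, following
# the paper's linear-correlation formulation; variable names are illustrative.
import numpy as np

def kpconv_point(neighbors, feats, kernel_pts, weights, sigma):
    """neighbors: (N, 3) offsets x_i - x of input points around the query point
    feats:       (N, C_in) features of those points
    kernel_pts:  (K, 3) kernel point positions c_k
    weights:     (K, C_in, C_out) one weight matrix W_k per kernel point
    sigma:       influence radius of each kernel point
    """
    # Linear correlation h(y_i, c_k) = max(0, 1 - ||y_i - c_k|| / sigma)
    dists = np.linalg.norm(neighbors[:, None, :] - kernel_pts[None, :, :], axis=-1)  # (N, K)
    h = np.clip(1.0 - dists / sigma, 0.0, None)
    # f(x) = sum_i sum_k h(y_i, c_k) * f_i @ W_k
    return np.einsum('nk,nc,kcd->d', h, feats, weights)  # (C_out,)
```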

1,742 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: SKNet, as discussed by the authors, proposes a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information, enabling neurons to capture target objects at different scales.
Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
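
As a rough illustration, an SK unit with two branches can be sketched in PyTorch as follows; the paper implements the 5x5 path as a dilated grouped 3x3 convolution and adds batch normalization, which this condensed version omits.

```python
# A condensed PyTorch sketch of a two-branch Selective Kernel unit: branch
# outputs are fused, squeezed by global average pooling, and re-weighted with
# a softmax computed across branches. Layer sizes are illustrative.
import torch
import torch.nn as nn

class SKUnit(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        d = max(channels // reduction, 32)
        self.fc = nn.Linear(channels, d)          # squeeze
        self.attn = nn.Linear(d, channels * 2)    # one logit per branch and channel

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        u = u3 + u5                                   # fuse
        s = u.mean(dim=(2, 3))                        # global average pooling
        z = torch.relu(self.fc(s))
        logits = self.attn(z).view(-1, 2, u.size(1))  # (B, branches, C)
        a = torch.softmax(logits, dim=1)              # softmax across branches
        return a[:, 0, :, None, None] * u3 + a[:, 1, :, None, None] * u5
```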

1,401 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The dynamic filter is extended to a new convolution operation, named PointConv, which can be applied on point clouds to build deep convolutional networks and achieves state-of-the-art performance on challenging semantic segmentation benchmarks for 3D point clouds.
Abstract: Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
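
A simplified sketch of the naive form of this idea appears below: the kernel is a nonlinear function of local 3D offsets (an MLP), rescaled by inverse density from kernel density estimation. The paper's efficient reformulation (a shared intermediate feature followed by a 1x1 convolution) is omitted here, and names are illustrative.

```python
# A naive PointConv-style operation at one query point.
import torch
import torch.nn as nn

class PointConvNaive(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        # Weight function: maps a 3D offset to a (c_in * c_out) weight vector
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, offsets, feats, density):
        # offsets: (N, 3) neighbor coords relative to the query point
        # feats:   (N, c_in); density: (N,) KDE estimate at each neighbor
        w = self.weight_mlp(offsets).view(-1, self.c_in, self.c_out)  # (N, c_in, c_out)
        scaled = feats / density[:, None]            # inverse-density reweighting
        return torch.einsum('nc,nco->o', scaled, w)  # sum over neighbors
```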

1,321 citations


Proceedings ArticleDOI
Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, Jie Shan
15 Jun 2019
TL;DR: A novel graph attention convolution (GAC) whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object; GAC captures the structured features of point clouds for fine-grained segmentation and avoids feature contamination between objects.
Abstract: Standard convolution is inherently limited for semantic segmentation of point clouds due to its isotropy about features. It neglects the structure of an object, resulting in poor object delineation and small spurious regions in the segmentation result. This paper proposes a novel graph attention convolution (GAC), whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object. Specifically, by assigning proper attentional weights to different neighboring points, GAC is designed to selectively focus on the most relevant part of them according to their dynamically learned features. The shape of the convolution kernel is then determined by the learned distribution of the attentional weights. Though simple, GAC can capture the structured features of point clouds for fine-grained segmentation and avoid feature contamination between objects. Theoretically, we provide a thorough analysis of the expressive capabilities of GAC to show how it can learn about the features of point clouds. Empirically, we evaluate the proposed GAC on challenging indoor and outdoor datasets and achieve state-of-the-art results in both scenarios.
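
The following PyTorch sketch conveys the idea: attentional weights are computed from spatial offsets and feature differences of neighbors, then normalized with a softmax so the effective kernel shape adapts to the object structure. Names and layer sizes are illustrative, not the exact GAC formulation.

```python
# A minimal graph-attention convolution over one point's neighborhood.
import torch
import torch.nn as nn

class GraphAttnConv(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        self.lin = nn.Linear(c_in, c_out)
        # Attention MLP over [3D offset, feature difference]
        self.attn = nn.Sequential(
            nn.Linear(3 + c_out, hidden), nn.ReLU(), nn.Linear(hidden, c_out))

    def forward(self, offsets, feats_center, feats_nbr):
        # offsets: (N, 3); feats_center: (c_in,); feats_nbr: (N, c_in)
        h_nbr = self.lin(feats_nbr)                       # (N, c_out)
        h_ctr = self.lin(feats_center)                    # (c_out,)
        a = self.attn(torch.cat([offsets, h_nbr - h_ctr], dim=-1))  # (N, c_out)
        a = torch.softmax(a, dim=0)                       # normalize over neighbors
        return (a * h_nbr).sum(dim=0)                     # attention-weighted sum
```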

558 citations


Proceedings ArticleDOI
10 Apr 2019
TL;DR: OctConv, as discussed by the authors, factorizes the mixed feature maps by their frequencies and designs a novel Octave Convolution operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution.
Abstract: In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.
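
A compact sketch of the operation is given below, assuming the high-frequency map is at full resolution and the low-frequency map at half resolution: four convolution paths (H→H, H→L, L→H, L→L) exchange information via average pooling and nearest upsampling. The channel split ratio alpha and pooling/upsampling choices follow the paper's description; other details are simplified.

```python
# A compact PyTorch sketch of Octave Convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    def __init__(self, c_in, c_out, alpha=0.5, k=3):
        super().__init__()
        c_in_l, c_out_l = int(alpha * c_in), int(alpha * c_out)
        c_in_h, c_out_h = c_in - c_in_l, c_out - c_out_l
        p = k // 2
        self.hh = nn.Conv2d(c_in_h, c_out_h, k, padding=p)  # high -> high
        self.hl = nn.Conv2d(c_in_h, c_out_l, k, padding=p)  # high -> low
        self.lh = nn.Conv2d(c_in_l, c_out_h, k, padding=p)  # low -> high
        self.ll = nn.Conv2d(c_in_l, c_out_l, k, padding=p)  # low -> low

    def forward(self, x_h, x_l):
        # x_h: (B, c_in_h, H, W); x_l: (B, c_in_l, H/2, W/2), H and W even
        y_h = self.hh(x_h) + F.interpolate(self.lh(x_l), scale_factor=2, mode='nearest')
        y_l = self.ll(x_l) + self.hl(F.avg_pool2d(x_h, 2))
        return y_h, y_l
```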

374 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: An iterative kernel correction (IKC) method is proposed for blur kernel estimation in the blind super-resolution (SR) problem, where the blur kernels are unknown; the observation that kernel mismatch brings regular artifacts (either over-sharpening or over-smoothing) is exploited to iteratively correct inaccurate blur kernels.
Abstract: Deep learning based methods have dominated the super-resolution (SR) field due to their remarkable performance in terms of effectiveness and efficiency. Most of these methods assume that the blur kernel during downsampling is predefined/known (e.g., bicubic). However, the blur kernels involved in real applications are complicated and unknown, resulting in a severe performance drop for the advanced SR methods. In this paper, we propose an Iterative Kernel Correction (IKC) method for blur kernel estimation in the blind SR problem, where the blur kernels are unknown. We make the observation that kernel mismatch brings regular artifacts (either over-sharpening or over-smoothing), which can be exploited to correct inaccurate blur kernels. Thus we introduce an iterative correction scheme -- IKC -- that achieves better results than direct kernel estimation. We further propose an effective SR network architecture using spatial feature transform (SFT) layers to handle multiple blur kernels, named SFTMD. Extensive experiments on synthetic and real-world images show that the proposed IKC method with SFTMD can provide visually favorable SR results and state-of-the-art performance in the blind SR problem.
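
Schematically, the loop looks like the sketch below: a predictor gives an initial kernel estimate, the kernel-conditioned SR network produces an SR image, and a corrector refines the kernel from the artifacts it observes. Here `predictor`, `sr_network`, and `corrector` are placeholder callables, not the authors' released API.

```python
# A schematic Python sketch of the IKC correction loop.
def iterative_kernel_correction(lr_image, predictor, sr_network, corrector, steps=5):
    kernel = predictor(lr_image)                       # initial blur-kernel estimate
    for _ in range(steps):
        sr_image = sr_network(lr_image, kernel)        # SR conditioned on the kernel
        kernel = kernel + corrector(sr_image, kernel)  # refine kernel from artifacts
    return sr_network(lr_image, kernel), kernel
```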

357 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Li et al., as mentioned in this paper, proposed a Laplacian pyramid based kernel prediction network (LP-KPN) that efficiently learns per-pixel kernels to recover the HR image, achieving better visual quality with sharper edges and finer textures on real-world scenes.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.
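
The core per-pixel kernel step can be sketched as follows: a network predicts one k x k kernel per pixel, which is applied to the image with `unfold`. LP-KPN does this on a Laplacian pyramid so the kernels stay small; this single-scale version only illustrates the kernel application, with illustrative names.

```python
# Applying predicted per-pixel kernels to an image in PyTorch.
import torch
import torch.nn.functional as F

def apply_per_pixel_kernels(img, kernels, k=5):
    # img: (B, C, H, W); kernels: (B, k*k, H, W), one kernel per pixel,
    # typically normalized by the predicting network.
    B, C, H, W = img.shape
    patches = F.unfold(img, k, padding=k // 2)   # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H, W)
    weights = kernels.view(B, 1, k * k, H, W)    # shared across color channels
    return (patches * weights).sum(dim=2)        # (B, C, H, W)
```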

318 citations




Posted Content
Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu
TL;DR: By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs, and a 2.9 AP gain is achieved on COCO keypoint detection.
Abstract: Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
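
A minimal sketch of the aggregation is shown below: K parallel kernels are combined with input-dependent attention before a single convolution is applied. The squeeze-style attention head and sizes are illustrative, not the paper's exact configuration.

```python
# A minimal PyTorch sketch of dynamic convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, c_out, c_in, k, k) * 0.02)
        self.attn = nn.Linear(c_in, num_kernels)
        self.k = k

    def forward(self, x):
        # Attention over kernels from the globally pooled input
        pi = torch.softmax(self.attn(x.mean(dim=(2, 3))), dim=1)   # (B, K)
        w = torch.einsum('bk,koihw->boihw', pi, self.weight)       # per-sample kernel
        B, _, H, W = x.shape
        # Grouped-conv trick: fold the batch into groups to apply per-sample kernels
        out = F.conv2d(x.reshape(1, -1, H, W),
                       w.reshape(-1, w.size(2), self.k, self.k),
                       padding=self.k // 2, groups=B)
        return out.view(B, -1, H, W)
```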

303 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Asymmetric Convolution Block (ACB), as mentioned in this paper, uses 1D asymmetric convolutions to strengthen the square convolution kernels, allowing networks built from ACBs to be trained to a higher level of accuracy.
Abstract: As designing an appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human work or numerous GPU hours, the research community is soliciting architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on real-world applications. We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computation anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels.
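
The inference-time fusion is simple enough to show directly: after training, the 1x3 and 3x1 branch kernels are added into the central row and column of the 3x3 kernel, so the deployed network has exactly the original architecture and cost. BatchNorm folding, which ACNet also performs before this step, is omitted in this sketch.

```python
# Fusing an ACB's three branches into a single equivalent 3x3 kernel.
import torch

def fuse_acb(w_square, w_hor, w_ver):
    # w_square: (C_out, C_in, 3, 3); w_hor: (C_out, C_in, 1, 3); w_ver: (C_out, C_in, 3, 1)
    fused = w_square.clone()
    fused[:, :, 1:2, :] += w_hor   # add horizontal kernel to the middle row
    fused[:, :, :, 1:2] += w_ver   # add vertical kernel to the middle column
    return fused
```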

294 citations


Proceedings Article
01 Jan 2019
TL;DR: In this article, the authors give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs and show that the resulting kernel constraints for arbitrary group representations can be reduced to constraints under irreducible representations.
Abstract: The big empirical success of group equivariant networks has led in recent years to the sprouting of a great variety of equivariant network architectures. A particular focus has thereby been on rotation and reflection equivariant CNNs for planar images. Here we give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs. The theory of Steerable CNNs thereby yields constraints on the convolution kernels which depend on group representations describing the transformation laws of feature spaces. We show that these constraints for arbitrary group representations can be reduced to constraints under irreducible representations. A general solution of the kernel space constraint is given for arbitrary representations of the Euclidean group E(2) and its subgroups. We implement a wide range of previously proposed and entirely new equivariant network architectures and extensively compare their performances. E(2)-steerable convolutions are further shown to yield remarkable gains on CIFAR-10, CIFAR-100 and STL-10 when used as drop-in replacement for non-equivariant convolutions.
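
For reference, the kernel constraint at the heart of this framework, in the standard steerable-CNN notation for a group G of rotations/reflections acting on the plane, reads:

```latex
% Steerable kernel constraint: the kernel must intertwine the input and output
% field representations \rho_in and \rho_out for every group element g.
\kappa(g \cdot x) \;=\; \rho_{\mathrm{out}}(g)\, \kappa(x)\, \rho_{\mathrm{in}}(g)^{-1},
\qquad \forall\, g \in G,\ x \in \mathbb{R}^2 .
```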

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A novel Interpolated Convolution operation, InterpConv, is proposed to tackle point cloud feature learning and understanding; it utilizes a set of discrete kernel weights and interpolates point features to neighboring kernel-weight coordinates via an interpolation function for convolution.
Abstract: Point cloud is an important type of 3D representation. However, directly applying convolutions on point clouds is challenging due to the sparse, irregular and unordered data structure. In this paper, we propose a novel Interpolated Convolution operation, InterpConv, to tackle the point cloud feature learning and understanding problem. The key idea is to utilize a set of discrete kernel weights and interpolate point features to neighboring kernel-weight coordinates by an interpolation function for convolution. A normalization term is introduced to handle neighborhoods of different sparsity levels. Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs. We further design Interpolated Convolutional Neural Networks (InterpCNNs) based on InterpConv layers to handle point cloud recognition tasks including shape classification, object part segmentation and indoor scene semantic parsing. Experiments show that the networks can capture both fine-grained local structures and global shape context information effectively. The proposed approach achieves state-of-the-art performance on public benchmarks including ModelNet40, ShapeNet Parts and S3DIS.
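
A simplified sketch of the idea follows: each discrete kernel weight sits at a 3D coordinate, nearby point features are interpolated onto it (a Gaussian interpolation function is assumed here), normalized by the summed interpolation weights to handle varying sparsity, and then convolved. This is illustrative only, not the exact InterpConv formulation.

```python
# Interpolating point features onto discrete kernel-weight coordinates.
import torch

def interp_conv_point(offsets, feats, kernel_coords, weights, sigma=0.1, eps=1e-8):
    # offsets: (N, 3) neighbors relative to the query; feats: (N, C_in)
    # kernel_coords: (K, 3); weights: (K, C_in, C_out)
    d2 = ((offsets[:, None, :] - kernel_coords[None, :, :]) ** 2).sum(-1)  # (N, K)
    t = torch.exp(-d2 / (2 * sigma ** 2))           # interpolation weights
    t = t / (t.sum(dim=0, keepdim=True) + eps)      # normalization per kernel point
    gathered = torch.einsum('nk,nc->kc', t, feats)  # features at kernel coords
    return torch.einsum('kc,kco->o', gathered, weights)
```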


Proceedings ArticleDOI
10 Apr 2019
TL;DR: In this article, a pixel-adaptive convolution (PAC) operation is proposed, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features.
Abstract: Convolutions are the fundamental building blocks of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it is also a major limitation, as it makes convolutions content-agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively compared to Full-CRF, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements.
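
A minimal sketch of the operation is given below: spatially shared weights W are modulated by a kernel K(f_i, f_j) computed from per-pixel guidance features; a Gaussian on feature differences is used here, one of the forms discussed in the paper. Names are illustrative.

```python
# Pixel-adaptive convolution with a Gaussian feature kernel, in PyTorch.
import torch
import torch.nn.functional as F

def pac(x, guide, weight):
    # x: (B, C, H, W); guide: (B, D, H, W) per-pixel features; weight: (C_out, C, k, k)
    C_out, C_in, k, _ = weight.shape
    B, C, H, W = x.shape
    xp = F.unfold(x, k, padding=k // 2).view(B, C, k * k, H, W)
    gp = F.unfold(guide, k, padding=k // 2).view(B, guide.size(1), k * k, H, W)
    # Gaussian adaptation kernel on feature differences to the center pixel
    diff = gp - guide.unsqueeze(2)
    kernel = torch.exp(-0.5 * (diff ** 2).sum(dim=1, keepdim=True))  # (B,1,k*k,H,W)
    adapted = (xp * kernel).view(B, C_in * k * k, H, W)
    w = weight.view(C_out, C_in * k * k)
    return torch.einsum('oc,bchw->bohw', w, adapted)
```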

Journal ArticleDOI
TL;DR: Quantification results show that the proposed iterative neural network method can outperform the neural network denoising and conventional penalized maximum likelihood methods.
Abstract: PET image reconstruction is challenging due to the ill-posedness of the inverse problem and the limited number of detected photons. Recently, deep neural networks have been widely and successfully used in computer vision tasks and have attracted growing interest in medical imaging. In this paper, we trained a deep residual convolutional neural network to improve PET image quality by using the existing inter-patient information. An innovative feature of the proposed method is that we embed the neural network in the iterative reconstruction framework for image representation, rather than using it as a post-processing tool. We formulate the objective function as a constrained optimization problem and solve it using the alternating direction method of multipliers algorithm. Both simulation data and hybrid real data are used to evaluate the proposed method. Quantification results show that our proposed iterative neural network method can outperform the neural network denoising and conventional penalized maximum likelihood methods.

Journal ArticleDOI
TL;DR: This work proposes a deep learning model (InnoHAR) based on the combination of an inception neural network and a recurrent neural network, which shows consistently superior performance and good generalization when compared with the state-of-the-art.
Abstract: Human activity recognition (HAR) based on sensor networks is an important research direction in the fields of pervasive computing and body area networks. Existing research often uses statistical machine learning methods to manually extract and construct features of different motions. However, in the face of extremely fast-growing waveform data with no obvious patterns, traditional feature engineering methods are becoming increasingly inadequate. With the development of deep learning technology, we no longer need to manually extract features and can improve performance in complex human activity recognition problems. By transferring deep neural network experience from image recognition, we propose a deep learning model (InnoHAR) based on the combination of an inception neural network and a recurrent neural network. The model inputs the waveform data of multi-channel sensors end-to-end. Multi-dimensional features are extracted by inception-like modules using various kernel-based convolution layers. Combined with a GRU, modeling of time-series features is realized, making full use of data characteristics to complete classification tasks. Through experimental verification on the three most widely used public HAR datasets, our proposed method shows consistently superior performance and good generalization when compared with the state-of-the-art.
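
A rough sketch of this combination follows: an inception-like module applies parallel 1D convolutions of different kernel sizes over multi-channel sensor waveforms, and a GRU models the temporal dimension. Layer sizes and names are illustrative, not the paper's exact configuration.

```python
# An inception-style 1D block followed by a GRU classifier, in PyTorch.
import torch
import torch.nn as nn

class InceptionHARBlock(nn.Module):
    def __init__(self, c_in, c_branch=32):
        super().__init__()
        self.b1 = nn.Conv1d(c_in, c_branch, 1)
        self.b3 = nn.Conv1d(c_in, c_branch, 3, padding=1)
        self.b5 = nn.Conv1d(c_in, c_branch, 5, padding=2)

    def forward(self, x):  # x: (B, sensor_channels, T)
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

class InnoHARSketch(nn.Module):
    def __init__(self, c_in, num_classes, hidden=64):
        super().__init__()
        self.block = InceptionHARBlock(c_in)      # 3 branches * 32 channels = 96
        self.gru = nn.GRU(96, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                   # x: (B, C, T)
        h = self.block(x).transpose(1, 2)   # (B, T, 96)
        out, _ = self.gru(h)
        return self.head(out[:, -1])        # classify from the last time step
```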

Journal ArticleDOI
TL;DR: Quantification results based on simulation and real data show that the proposed reconstruction framework can outperform Gaussian post-smoothing and anatomically guided reconstructions using the kernel method or the neural-network penalty.
Abstract: Recently, deep neural networks have been widely and successfully applied in computer vision tasks and have attracted growing interest in medical imaging. One barrier for the application of deep neural networks to medical imaging is the need for large amounts of prior training pairs, which is not always feasible in clinical practice. This is especially true for medical image reconstruction problems, where raw data are needed. Inspired by the deep image prior framework, in this paper, we proposed a personalized network training method where no prior training pairs are needed, but only the patient’s own prior information. The network is updated during the iterative reconstruction process using the patient-specific prior information and measured data. We formulated the maximum-likelihood estimation as a constrained optimization problem and solved it using the alternating direction method of multipliers algorithm. Magnetic resonance imaging guided positron emission tomography reconstruction was employed as an example to demonstrate the effectiveness of the proposed framework. Quantification results based on simulation and real data show that the proposed reconstruction framework can outperform Gaussian post-smoothing and anatomically guided reconstructions using the kernel method or the neural-network penalty.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work presents a novel algorithm for point cloud segmentation that transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel.
Abstract: We present a novel algorithm for point cloud segmentation. Our approach transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel. Traditionally, the voxel representation only comprises Boolean occupancy information, which fails to capture the sparsely distributed points within voxels in a compact manner. In order to handle sparse distributions of points, we further employ radial basis functions (RBF) to compute a local, continuous representation within each voxel. Our approach results in a good volumetric representation that effectively tackles noisy point cloud datasets and is more robust for learning. Moreover, we further introduce group equivariant CNN to 3D, by defining the convolution operator on a symmetry group acting on $\mathbb{Z}^3$ and its isomorphic sets. This improves the expressive capacity without increasing parameters, leading to more robust segmentation results. We highlight the performance on standard benchmarks and show that our approach outperforms state-of-the-art segmentation algorithms on the ShapeNet and S3DIS datasets.

Book ChapterDOI
13 Oct 2019
TL;DR: A novel Dual Encoding U-Net (DEU-Net) with two encoders: a spatial path with a large kernel to preserve the spatial information and a context path with a multiscale convolution block to capture more semantic information, plus a feature fusion module to combine the different levels of feature representation.
Abstract: Retinal vessel segmentation is an essential step for the early diagnosis of eye-related diseases, such as diabetes and hypertension. Segmentation of blood vessels requires both a sizeable receptive field and rich spatial information. In this paper, we propose a novel Dual Encoding U-Net (DEU-Net), which has two encoders: a spatial path with a large kernel to preserve the spatial information and a context path with a multiscale convolution block to capture more semantic information. On top of the two paths, we introduce a feature fusion module to combine the different levels of feature representation. Besides, we apply channel attention to select useful feature maps in the skip connections. Furthermore, low-level and high-level predictions are combined in a multiscale prediction module for better accuracy. We evaluated this model on the digital retinal images for vessel extraction (DRIVE) dataset and the child heart and health study (CHASEDB1) dataset. Results show that the proposed DEU-Net model achieves state-of-the-art retinal vessel segmentation accuracy on both datasets.

Proceedings Article
Mingxing Tan, Quoc V. Le
22 Jul 2019
TL;DR: This paper proposes a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution, and improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection.
Abstract: Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into the AutoML search space and develop a new family of models, named MixNets, which outperform previous mobile models including MobileNetV2 [20] (ImageNet top-1 accuracy +4.2%), ShuffleNetV2 [16] (+3.5%), MnasNet [26] (+1.3%), ProxylessNAS [2] (+2.2%), and FBNet [27] (+2.0%). In particular, our MixNet-L achieves a new state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS). Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet
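
The operation itself is compact: channels are split into groups, each group gets a depthwise convolution with a different kernel size, and the groups are concatenated back. The following PyTorch sketch uses illustrative kernel sizes and an even channel split.

```python
# A compact PyTorch sketch of mixed depthwise convolution.
import torch
import torch.nn as nn

class MixConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)       # absorb the remainder
        self.splits = splits
        self.convs = nn.ModuleList([
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)  # depthwise per group
            for c, k in zip(splits, kernel_sizes)])

    def forward(self, x):
        xs = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(xi) for conv, xi in zip(self.convs, xs)], dim=1)
```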

Proceedings ArticleDOI
20 Sep 2019
TL;DR: This work learns to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data; the resulting super-resolution network can be trained with direct pixel-wise supervision in the high-resolution domain while robustly generalizing to real input.
Abstract: Most current super-resolution methods rely on low and high resolution image pairs to train a network in a fully supervised manner. However, such image pairs are not available in real-world applications. Instead of directly addressing this problem, most works employ the popular bicubic downsampling strategy to artificially generate a corresponding low resolution image. Unfortunately, this strategy introduces significant artifacts, removing natural sensor noise and other real-world characteristics. Super-resolution networks trained on such bicubic images therefore struggle to generalize to natural images. In this work, we propose an unsupervised approach for image super-resolution. Given only unpaired data, we learn to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data. This allows us to generate realistic image pairs, faithfully reflecting the distribution of real-world images. Our super-resolution network can therefore be trained with direct pixel-wise supervision in the high resolution domain, while robustly generalizing to real input. We demonstrate the effectiveness of our approach in quantitative and qualitative experiments.

Journal ArticleDOI
TL;DR: A scale-free CNN (SF-CNN) is introduced for remote sensing scene classification that not only allows the input images to be of arbitrary sizes but also retains the ability to extract discriminative features using a traditional sliding-window-based strategy.
Abstract: Fine-tuning of pretrained convolutional neural networks (CNNs) has been proven to be an effective strategy for remote sensing image scene classification, particularly when a limited number of labeled data sets are available for training purposes. However, such a fine-tuning process often requires that the input images be resized to a fixed size to generate input vectors of the size required by the fully connected layers (FCLs) in the pretrained CNN model. Such a resizing process often discards key information in the scenes and thus deteriorates the classification performance. To address this issue, in this paper, we introduce a scale-free CNN (SF-CNN) for remote sensing scene classification. Specifically, the FCLs in the CNN model are first converted into convolutional layers, which not only allow the input images to be of arbitrary sizes but also retain the ability to extract discriminative features using a traditional sliding-window-based strategy. Then, a global average pooling (GAP) layer is added after the final convolutional layer so that input images of arbitrary size can be mapped to feature maps of uniform size. Finally, we utilize the resulting feature maps to create a new FCL that is fed to a softmax layer for final classification. Our experimental results conducted using several real data sets demonstrate the superiority of the proposed SF-CNN method over several well-known classification methods, including pretrained CNN-based ones.
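
A sketch of the scale-free conversion described above: the pretrained classifier's FC layers are re-expressed as 1x1 convolutions, global average pooling maps arbitrary-sized feature maps to a fixed-size vector, and a new FC layer produces the logits. Layer names and sizes are illustrative.

```python
# A scale-free classification head in PyTorch.
import torch
import torch.nn as nn

class ScaleFreeHead(nn.Module):
    def __init__(self, feat_channels, num_classes):
        super().__init__()
        # Former FC layer re-expressed as a 1x1 convolution (sliding-window features)
        self.conv_fc = nn.Conv2d(feat_channels, feat_channels, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)        # arbitrary H, W -> 1 x 1
        self.fc = nn.Linear(feat_channels, num_classes)

    def forward(self, feat_map):
        x = torch.relu(self.conv_fc(feat_map))    # (B, C, H, W), any H, W
        x = self.gap(x).flatten(1)                # (B, C)
        return self.fc(x)                         # logits for the softmax layer
```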

Proceedings Article
01 Jan 2019
TL;DR: In this paper, an efficient convolution kernel for convolutional neural networks (CNNs) on unstructured grids is proposed, using parameterized differential operators and focusing on spherical signals such as panorama images or planetary signals.
Abstract: We present an efficient convolution kernel for Convolutional Neural Networks (CNNs) on unstructured grids using parameterized differential operators while focusing on spherical signals such as panorama images or planetary signals. To this end, we replace conventional convolution kernels with linear combinations of differential operators that are weighted by learnable parameters. Differential operators can be efficiently estimated on unstructured grids using one-ring neighbors, and learnable parameters can be optimized through standard back-propagation. As a result, we obtain extremely efficient neural networks that match or outperform state-of-the-art network architectures in terms of performance but with a significantly lower number of network parameters. We evaluate our algorithm in an extensive series of experiments on a variety of computer vision and climate science tasks, including shape classification, climate pattern segmentation, and omnidirectional image semantic segmentation. Overall, we present (1) a novel CNN approach on unstructured grids using parameterized differential operators for spherical signals, and (2) we show that our unique kernel parameterization allows our model to achieve the same or higher accuracy with significantly fewer network parameters.
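
To illustrate the parameterization, the sketch below shows a convolution built as a learned linear combination of differential operators (identity, first derivatives, Laplacian), on a regular grid for simplicity; the paper estimates the same operators on unstructured spherical grids via one-ring neighbors. Names and stencils here are illustrative.

```python
# A convolution parameterized by differential operators, in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffOpConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Learnable mixing of the four operator responses per channel pair
        self.mix = nn.Conv2d(4 * c_in, c_out, kernel_size=1)
        dx = torch.tensor([[[-0.5, 0.0, 0.5]]])                 # central difference
        lap = torch.tensor([[[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]])
        self.register_buffer('dx', dx.view(1, 1, 1, 3))
        self.register_buffer('dy', dx.view(1, 1, 3, 1))
        self.register_buffer('lap', lap.view(1, 1, 3, 3))

    def forward(self, x):
        B, C, H, W = x.shape
        flat = x.reshape(B * C, 1, H, W)
        gx = F.conv2d(flat, self.dx, padding=(0, 1)).view(B, C, H, W)
        gy = F.conv2d(flat, self.dy, padding=(1, 0)).view(B, C, H, W)
        lp = F.conv2d(flat, self.lap, padding=1).view(B, C, H, W)
        return self.mix(torch.cat([x, gx, gy, lp], dim=1))
```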

Posted Content
TL;DR: KernelGAN is introduced, an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms.
Abstract: Super resolution (SR) methods typically assume that the low-resolution (LR) image was downscaled from the unknown high-resolution (HR) image by a fixed 'ideal' downscaling kernel (e.g. Bicubic downscaling). However, this is rarely the case in real LR images, in contrast to synthetically generated SR datasets. When the assumed downscaling kernel deviates from the true one, the performance of SR methods significantly deteriorates. This gave rise to Blind-SR - namely, SR when the downscaling kernel ("SR-kernel") is unknown. It was further shown that the true SR-kernel is the one that maximizes the recurrence of patches across scales of the LR image. In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning. We introduce "KernelGAN", an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches. Its Generator is trained to produce a downscaled version of the LR test image, such that its Discriminator cannot distinguish between the patch distribution of the downscaled image, and the patch distribution of the original LR image. The Generator, once trained, constitutes the downscaling operation with the correct image-specific SR-kernel. KernelGAN is fully unsupervised, requires no training data other than the input image itself, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms.


Proceedings ArticleDOI
15 Jun 2019
TL;DR: A blind deblurring method based on the Local Maximum Gradient (LMG) prior, inspired by the simple and intuitive observation that the maximum value of a local patch's gradient diminishes after the blur process, which is proved to be true both mathematically and empirically.
Abstract: Blind image deblurring aims to recover a sharp image from a blurred one while the blur kernel is unknown. To solve this ill-posed problem, a great number of image priors have been explored and employed in this area. In this paper, we present a blind deblurring method based on the Local Maximum Gradient (LMG) prior. Our work is inspired by the simple and intuitive observation that the maximum value of a local patch's gradient will diminish after the blur process, which is proved to be true both mathematically and empirically. This inherent property of the blur process helps us to establish a new energy function. By introducing a linear operator to compute the Local Maximum Gradient, together with an effective optimization scheme, our method can handle various specific scenarios. Extensive experimental results illustrate that our method is able to achieve favorable performance against state-of-the-art algorithms on both synthetic and real-world images.
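
The quantity behind the prior is straightforward to compute, as in the sketch below: the gradient magnitude is max-pooled over local patches, and the prior asserts that this map shrinks under blur. Patch size and the finite-difference stencil are illustrative.

```python
# Computing a Local Maximum Gradient map in PyTorch.
import torch
import torch.nn.functional as F

def local_max_gradient(img, patch=15):
    # img: (B, 1, H, W) grayscale; finite-difference gradient magnitude
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    gx = F.pad(gx, (0, 1, 0, 0))
    gy = F.pad(gy, (0, 0, 0, 1))
    grad_mag = torch.sqrt(gx ** 2 + gy ** 2)
    # Maximum gradient within each local patch (odd patch size keeps H, W)
    return F.max_pool2d(grad_mag, patch, stride=1, padding=patch // 2)
```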

Journal ArticleDOI
Jeffrey Pennington, Pratik Worah
TL;DR: This work demonstrates that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method, and identifies an intriguing new class of activation functions with favorable properties.
Abstract: Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix $Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.

Journal ArticleDOI
TL;DR: A novel small target detection algorithm derived from the facet kernel and the random walker (RW) algorithm, comprising four main stages, including a novel local contrast descriptor (NLCD) based on the RW algorithm that achieves clutter suppression and target enhancement.
Abstract: Efficient detection of targets immersed in a complex background with a low signal-to-clutter ratio (SCR) is very important in infrared search and tracking (IRST) applications. In this paper, we address the target detection problem in terms of local image segmentation and propose a novel small target detection algorithm derived from facet kernel and random walker (RW) algorithm which includes four main stages. First, since the RW algorithm is suitable for images with less noises, local order-statistic and mean filtering are applied to remove the pixel-sized noises with high brightness (PNHB) and smooth the infrared images. Second, the infrared image is filtered by the facet kernel to enhance the target pixels and candidate target pixels are extracted by an adaptive threshold operation. Third, inspired by the properties of infrared targets, a novel local contrast descriptor (NLCD) based on the RW algorithm is proposed to achieve clutter suppression and target enhancement. Then, the candidate target pixels are selected as central pixels to construct the local regions and the NLCD map of all local regions is computed. The obtained NLCD map is weighted by the filtered map of facet kernel to further enhance target. Finally, the target is detected by a thresholding operation on the weighted map. Experimental results on three data sets show that the proposed method outperforms conventional baseline methods in terms of target detection accuracy.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The proposed KMSR incorporates blur-kernel modeling in training and consists of two stages: a pool of realistic blur kernels is first built with a generative adversarial network (GAN), and a super-resolution network is then trained on HR images and corresponding LR images constructed with the generated kernels.
Abstract: Deep convolutional neural networks (CNNs), trained on corresponding pairs of high- and low-resolution images, achieve state-of-the-art performance in single-image super-resolution and surpass previous signal-processing based approaches. However, their performance is limited when applied to real photographs. The reason lies in their training data: low-resolution (LR) images are obtained by bicubic interpolation of the corresponding high-resolution (HR) images. The applied convolution kernel significantly differs from real-world camera-blur. Consequently, while current CNNs well super-resolve bicubic-downsampled LR images, they often fail on camera-captured LR images. To improve generalization and robustness of deep super-resolution CNNs on real photographs, we present a kernel modeling super-resolution network (KMSR) that incorporates blur-kernel modeling in the training. Our proposed KMSR consists of two stages: we first build a pool of realistic blur-kernels with a generative adversarial network (GAN) and then we train a super-resolution network with HR and corresponding LR images constructed with the generated kernels. Our extensive experimental validations demonstrate the effectiveness of our single-image super-resolution approach on photographs with unknown blur-kernels.