
Showing papers on "Kernel (image processing)" published in 2019


Proceedings ArticleDOI
18 Apr 2019
TL;DR: KPConv is a new design of point convolution that operates directly on point clouds without any intermediate representation and outperforms state-of-the-art classification and segmentation approaches on several datasets.
Abstract: We present Kernel Point Convolution (KPConv), a new design of point convolution, i.e. one that operates on point clouds without any intermediate representation. The convolution weights of KPConv are located in Euclidean space by kernel points, and applied to the input points close to them. Its capacity to use any number of kernel points gives KPConv more flexibility than fixed grid convolutions. Furthermore, these locations are continuous in space and can be learned by the network. Therefore, KPConv can be extended to deformable convolutions that learn to adapt kernel points to local geometry. Thanks to a regular subsampling strategy, KPConv is also efficient and robust to varying densities. Whether they use deformable KPConv for complex tasks, or rigid KPConv for simpler tasks, our networks outperform state-of-the-art classification and segmentation approaches on several datasets. We also offer ablation studies and visualizations to provide understanding of what has been learned by KPConv and to validate the descriptive power of deformable KPConv.
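
For intuition, the rigid form of KPConv fits in a few lines: each kernel point c_k carries a weight matrix W_k, and a neighbor contributes in proportion to its linear correlation with each kernel point. The NumPy sketch below follows that formulation with illustrative names and omits the deformable offsets and the subsampling machinery.

```python
# A minimal NumPy sketch of rigid KPConv at a single query point, following
# the paper's linear-correlation formulation; variable names are illustrative.
import numpy as np

def kpconv_point(neighbors, feats, kernel_pts, weights, sigma):
    """neighbors: (N, 3) offsets x_i - x of input points around the query point
    feats:       (N, C_in) features of those points
    kernel_pts:  (K, 3) kernel point positions c_k
    weights:     (K, C_in, C_out) one weight matrix W_k per kernel point
    sigma:       influence radius of each kernel point
    """
    # Linear correlation h(y_i, c_k) = max(0, 1 - ||y_i - c_k|| / sigma)
    dists = np.linalg.norm(neighbors[:, None, :] - kernel_pts[None, :, :], axis=-1)  # (N, K)
    h = np.clip(1.0 - dists / sigma, 0.0, None)
    # f(x) = sum_i sum_k h(y_i, c_k) * f_i @ W_k
    return np.einsum('nk,nc,kcd->d', h, feats, weights)  # (C_out,)
```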

1,742 citations


Proceedings ArticleDOI
01 Jun 2019
TL;DR: SKNet, as discussed by the authors, proposes a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information, enabling neurons to capture target objects at different scales.
Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has rarely been considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
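
As a rough illustration, an SK unit with two branches can be sketched in PyTorch as follows; the paper implements the 5x5 path as a dilated grouped 3x3 convolution and adds batch normalization, which this condensed version omits.

```python
# A condensed PyTorch sketch of a two-branch Selective Kernel unit: branch
# outputs are fused, squeezed by global average pooling, and re-weighted with
# a softmax computed across branches. Layer sizes are illustrative.
import torch
import torch.nn as nn

class SKUnit(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        d = max(channels // reduction, 32)
        self.fc = nn.Linear(channels, d)          # squeeze
        self.attn = nn.Linear(d, channels * 2)    # one logit per branch and channel

    def forward(self, x):
        u3, u5 = self.branch3(x), self.branch5(x)
        u = u3 + u5                                   # fuse
        s = u.mean(dim=(2, 3))                        # global average pooling
        z = torch.relu(self.fc(s))
        logits = self.attn(z).view(-1, 2, u.size(1))  # (B, branches, C)
        a = torch.softmax(logits, dim=1)              # softmax across branches
        return a[:, 0, :, None, None] * u3 + a[:, 1, :, None, None] * u5
```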

1,401 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The dynamic filter is extended to a new convolution operation, named PointConv, which can be applied on point clouds to build deep convolutional networks and achieves state-of-the-art performance on challenging semantic segmentation benchmarks for 3D point clouds.
Abstract: Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.
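
A simplified sketch of the naive form of this idea appears below: the kernel is a nonlinear function of local 3D offsets (an MLP), rescaled by inverse density from kernel density estimation. The paper's efficient reformulation (a shared intermediate feature followed by a 1x1 convolution) is omitted here, and names are illustrative.

```python
# A naive PointConv-style operation at one query point.
import torch
import torch.nn as nn

class PointConvNaive(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        # Weight function: maps a 3D offset to a (c_in * c_out) weight vector
        self.weight_mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, c_in * c_out))
        self.c_in, self.c_out = c_in, c_out

    def forward(self, offsets, feats, density):
        # offsets: (N, 3) neighbor coords relative to the query point
        # feats:   (N, c_in); density: (N,) KDE estimate at each neighbor
        w = self.weight_mlp(offsets).view(-1, self.c_in, self.c_out)  # (N, c_in, c_out)
        scaled = feats / density[:, None]            # inverse-density reweighting
        return torch.einsum('nc,nco->o', scaled, w)  # sum over neighbors
```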

1,321 citations


Proceedings ArticleDOI
Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, Jie Shan
15 Jun 2019
TL;DR: A novel graph attention convolution (GAC) whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object; GAC captures the structured features of point clouds for fine-grained segmentation and avoids feature contamination between objects.
Abstract: Standard convolution is inherently limited for semantic segmentation of point clouds due to its isotropy about features. It neglects the structure of an object, resulting in poor object delineation and small spurious regions in the segmentation result. This paper proposes a novel graph attention convolution (GAC), whose kernels can be dynamically carved into specific shapes to adapt to the structure of an object. Specifically, by assigning proper attentional weights to different neighboring points, GAC is designed to selectively focus on the most relevant part of them according to their dynamically learned features. The shape of the convolution kernel is then determined by the learned distribution of the attentional weights. Though simple, GAC can capture the structured features of point clouds for fine-grained segmentation and avoid feature contamination between objects. Theoretically, we provide a thorough analysis of the expressive capabilities of GAC to show how it can learn about the features of point clouds. Empirically, we evaluate the proposed GAC on challenging indoor and outdoor datasets and achieve state-of-the-art results in both scenarios.
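
The following PyTorch sketch conveys the idea: attentional weights are computed from spatial offsets and feature differences of neighbors, then normalized with a softmax so the effective kernel shape adapts to the object structure. Names and layer sizes are illustrative, not the exact GAC formulation.

```python
# A minimal graph-attention convolution over one point's neighborhood.
import torch
import torch.nn as nn

class GraphAttnConv(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        self.lin = nn.Linear(c_in, c_out)
        # Attention MLP over [3D offset, feature difference]
        self.attn = nn.Sequential(
            nn.Linear(3 + c_out, hidden), nn.ReLU(), nn.Linear(hidden, c_out))

    def forward(self, offsets, feats_center, feats_nbr):
        # offsets: (N, 3); feats_center: (c_in,); feats_nbr: (N, c_in)
        h_nbr = self.lin(feats_nbr)                       # (N, c_out)
        h_ctr = self.lin(feats_center)                    # (c_out,)
        a = self.attn(torch.cat([offsets, h_nbr - h_ctr], dim=-1))  # (N, c_out)
        a = torch.softmax(a, dim=0)                       # normalize over neighbors
        return (a * h_nbr).sum(dim=0)                     # attention-weighted sum
```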

558 citations


Proceedings ArticleDOI
10 Apr 2019
TL;DR: OctConv, as discussed by the authors, factorizes the mixed feature maps by their frequencies and designs a novel Octave Convolution operation to store and process feature maps that vary spatially "slower" at a lower spatial resolution.
Abstract: In natural images, information is conveyed at different frequencies where higher frequencies are usually encoded with fine details and lower frequencies are usually encoded with global structures. Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies. In this work, we propose to factorize the mixed feature maps by their frequencies, and design a novel Octave Convolution (OctConv) operation to store and process feature maps that vary spatially “slower” at a lower spatial resolution, reducing both memory and computation cost. Unlike existing multi-scale methods, OctConv is formulated as a single, generic, plug-and-play convolutional unit that can be used as a direct replacement of (vanilla) convolutions without any adjustments in the network architecture. It is also orthogonal and complementary to methods that suggest better topologies or reduce channel-wise redundancy like group or depth-wise convolutions. We experimentally show that by simply replacing convolutions with OctConv, we can consistently boost accuracy for both image and video recognition tasks, while reducing memory and computational cost. An OctConv-equipped ResNet-152 can achieve 82.9% top-1 classification accuracy on ImageNet with merely 22.2 GFLOPs.
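
A compact sketch of the operation is given below, assuming the high-frequency map is at full resolution and the low-frequency map at half resolution: four convolution paths (H→H, H→L, L→H, L→L) exchange information via average pooling and nearest upsampling. The channel split ratio alpha and pooling/upsampling choices follow the paper's description; other details are simplified.

```python
# A compact PyTorch sketch of Octave Convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OctConv(nn.Module):
    def __init__(self, c_in, c_out, alpha=0.5, k=3):
        super().__init__()
        c_in_l, c_out_l = int(alpha * c_in), int(alpha * c_out)
        c_in_h, c_out_h = c_in - c_in_l, c_out - c_out_l
        p = k // 2
        self.hh = nn.Conv2d(c_in_h, c_out_h, k, padding=p)  # high -> high
        self.hl = nn.Conv2d(c_in_h, c_out_l, k, padding=p)  # high -> low
        self.lh = nn.Conv2d(c_in_l, c_out_h, k, padding=p)  # low -> high
        self.ll = nn.Conv2d(c_in_l, c_out_l, k, padding=p)  # low -> low

    def forward(self, x_h, x_l):
        # x_h: (B, c_in_h, H, W); x_l: (B, c_in_l, H/2, W/2), H and W even
        y_h = self.hh(x_h) + F.interpolate(self.lh(x_l), scale_factor=2, mode='nearest')
        y_l = self.ll(x_l) + self.hl(F.avg_pool2d(x_h, 2))
        return y_h, y_l
```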

374 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: An iterative kernel correction (IKC) method is proposed for blur kernel estimation in the blind super-resolution (SR) problem, where the blur kernels are unknown; the observation that kernel mismatch brings regular artifacts (either over-sharpening or over-smoothing) is exploited to iteratively correct inaccurate blur kernels.
Abstract: Deep learning based methods have dominated the super-resolution (SR) field due to their remarkable performance in terms of effectiveness and efficiency. Most of these methods assume that the blur kernel during downsampling is predefined/known (e.g., bicubic). However, the blur kernels involved in real applications are complicated and unknown, resulting in a severe performance drop for the advanced SR methods. In this paper, we propose an Iterative Kernel Correction (IKC) method for blur kernel estimation in the blind SR problem, where the blur kernels are unknown. We make the observation that kernel mismatch brings regular artifacts (either over-sharpening or over-smoothing), which can be exploited to correct inaccurate blur kernels. Thus we introduce an iterative correction scheme -- IKC -- that achieves better results than direct kernel estimation. We further propose an effective SR network architecture using spatial feature transform (SFT) layers to handle multiple blur kernels, named SFTMD. Extensive experiments on synthetic and real-world images show that the proposed IKC method with SFTMD can provide visually favorable SR results and state-of-the-art performance in the blind SR problem.
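
Schematically, the loop looks like the sketch below: a predictor gives an initial kernel estimate, the kernel-conditioned SR network produces an SR image, and a corrector refines the kernel from the artifacts it observes. Here `predictor`, `sr_network`, and `corrector` are placeholder callables, not the authors' released API.

```python
# A schematic Python sketch of the IKC correction loop.
def iterative_kernel_correction(lr_image, predictor, sr_network, corrector, steps=5):
    kernel = predictor(lr_image)                       # initial blur-kernel estimate
    for _ in range(steps):
        sr_image = sr_network(lr_image, kernel)        # SR conditioned on the kernel
        kernel = kernel + corrector(sr_image, kernel)  # refine kernel from artifacts
    return sr_network(lr_image, kernel), kernel
```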

357 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Li et al., as mentioned in this paper, proposed a Laplacian pyramid based kernel prediction network (LP-KPN) that efficiently learns per-pixel kernels to recover the HR image, achieving better visual quality with sharper edges and finer textures on real-world scenes.
Abstract: Most of the existing learning-based single image super-resolution (SISR) methods are trained and evaluated on simulated datasets, where the low-resolution (LR) images are generated by applying a simple and uniform degradation (i.e., bicubic downsampling) to their high-resolution (HR) counterparts. However, the degradations in real-world LR images are far more complicated. As a consequence, the SISR models trained on simulated data become less effective when applied to practical scenarios. In this paper, we build a real-world super-resolution (RealSR) dataset where paired LR-HR images on the same scene are captured by adjusting the focal length of a digital camera. An image registration algorithm is developed to progressively align the image pairs at different resolutions. Considering that the degradation kernels are naturally non-uniform in our dataset, we present a Laplacian pyramid based kernel prediction network (LP-KPN), which efficiently learns per-pixel kernels to recover the HR image. Our extensive experiments demonstrate that SISR models trained on our RealSR dataset deliver better visual quality with sharper edges and finer textures on real-world scenes than those trained on simulated datasets. Though our RealSR dataset is built by using only two cameras (Canon 5D3 and Nikon D810), the trained model generalizes well to other camera devices such as Sony a7II and mobile phones.
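
The core per-pixel kernel step can be sketched as follows: a network predicts one k x k kernel per pixel, which is applied to the image with `unfold`. LP-KPN does this on a Laplacian pyramid so the kernels stay small; this single-scale version only illustrates the kernel application, with illustrative names.

```python
# Applying predicted per-pixel kernels to an image in PyTorch.
import torch
import torch.nn.functional as F

def apply_per_pixel_kernels(img, kernels, k=5):
    # img: (B, C, H, W); kernels: (B, k*k, H, W), one kernel per pixel,
    # typically normalized by the predicting network.
    B, C, H, W = img.shape
    patches = F.unfold(img, k, padding=k // 2)   # (B, C*k*k, H*W)
    patches = patches.view(B, C, k * k, H, W)
    weights = kernels.view(B, 1, k * k, H, W)    # shared across color channels
    return (patches * weights).sum(dim=2)        # (B, C, H, W)
```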

318 citations




Posted Content
Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Lu Yuan, Zicheng Liu
TL;DR: By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs, and a 2.9 AP gain is achieved on COCO keypoint detection.
Abstract: Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
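
A minimal sketch of the aggregation is shown below: K parallel kernels are combined with input-dependent attention before a single convolution is applied. The squeeze-style attention head and sizes are illustrative, not the paper's exact configuration.

```python
# A minimal PyTorch sketch of dynamic convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, c_out, c_in, k, k) * 0.02)
        self.attn = nn.Linear(c_in, num_kernels)
        self.k = k

    def forward(self, x):
        # Attention over kernels from the globally pooled input
        pi = torch.softmax(self.attn(x.mean(dim=(2, 3))), dim=1)   # (B, K)
        w = torch.einsum('bk,koihw->boihw', pi, self.weight)       # per-sample kernel
        B, _, H, W = x.shape
        # Grouped-conv trick: fold the batch into groups to apply per-sample kernels
        out = F.conv2d(x.reshape(1, -1, H, W),
                       w.reshape(-1, w.size(2), self.k, self.k),
                       padding=self.k // 2, groups=B)
        return out.view(B, -1, H, W)
```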

303 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: Asymmetric Convolution Block (ACB), as mentioned in this paper, uses 1D asymmetric convolutions to strengthen the square convolution kernels, allowing networks built from ACBs to be trained to a higher level of accuracy.
Abstract: As designing an appropriate Convolutional Neural Network (CNN) architecture in the context of a given application usually involves heavy human work or numerous GPU hours, the research community is soliciting architecture-neutral CNN structures, which can be easily plugged into multiple mature architectures to improve the performance on real-world applications. We propose Asymmetric Convolution Block (ACB), an architecture-neutral structure as a CNN building block, which uses 1D asymmetric convolutions to strengthen the square convolution kernels. For an off-the-shelf architecture, we replace the standard square-kernel convolutional layers with ACBs to construct an Asymmetric Convolutional Network (ACNet), which can be trained to reach a higher level of accuracy. After training, we equivalently convert the ACNet into the same original architecture, thus requiring no extra computation anymore. We have observed that ACNet can improve the performance of various models on CIFAR and ImageNet by a clear margin. Through further experiments, we attribute the effectiveness of ACB to its capability of enhancing the model's robustness to rotational distortions and strengthening the central skeleton parts of square convolution kernels.
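
The inference-time fusion is simple enough to show directly: after training, the 1x3 and 3x1 branch kernels are added into the central row and column of the 3x3 kernel, so the deployed network has exactly the original architecture and cost. BatchNorm folding, which ACNet also performs before this step, is omitted in this sketch.

```python
# Fusing an ACB's three branches into a single equivalent 3x3 kernel.
import torch

def fuse_acb(w_square, w_hor, w_ver):
    # w_square: (C_out, C_in, 3, 3); w_hor: (C_out, C_in, 1, 3); w_ver: (C_out, C_in, 3, 1)
    fused = w_square.clone()
    fused[:, :, 1:2, :] += w_hor   # add horizontal kernel to the middle row
    fused[:, :, :, 1:2] += w_ver   # add vertical kernel to the middle column
    return fused
```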

294 citations


Proceedings Article
01 Jan 2019
TL;DR: In this article, the authors give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs and show that the resulting kernel constraints for arbitrary group representations can be reduced to constraints under irreducible representations.
Abstract: The big empirical success of group equivariant networks has led in recent years to the sprouting of a great variety of equivariant network architectures. A particular focus has thereby been on rotation and reflection equivariant CNNs for planar images. Here we give a general description of E(2)-equivariant convolutions in the framework of Steerable CNNs. The theory of Steerable CNNs thereby yields constraints on the convolution kernels which depend on group representations describing the transformation laws of feature spaces. We show that these constraints for arbitrary group representations can be reduced to constraints under irreducible representations. A general solution of the kernel space constraint is given for arbitrary representations of the Euclidean group E(2) and its subgroups. We implement a wide range of previously proposed and entirely new equivariant network architectures and extensively compare their performances. E(2)-steerable convolutions are further shown to yield remarkable gains on CIFAR-10, CIFAR-100 and STL-10 when used as drop-in replacement for non-equivariant convolutions.
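
For reference, the kernel constraint at the heart of this framework, in the standard steerable-CNN notation for a group G of rotations/reflections acting on the plane, reads:

```latex
% Steerable kernel constraint: the kernel must intertwine the input and output
% field representations \rho_in and \rho_out for every group element g.
\kappa(g \cdot x) \;=\; \rho_{\mathrm{out}}(g)\, \kappa(x)\, \rho_{\mathrm{in}}(g)^{-1},
\qquad \forall\, g \in G,\ x \in \mathbb{R}^2 .
```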

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A novel Interpolated Convolution operation, InterpConv, is proposed to tackle point cloud feature learning and understanding; it utilizes a set of discrete kernel weights and interpolates point features to neighboring kernel-weight coordinates via an interpolation function for convolution.
Abstract: Point cloud is an important type of 3D representation. However, directly applying convolutions on point clouds is challenging due to the sparse, irregular and unordered data structure. In this paper, we propose a novel Interpolated Convolution operation, InterpConv, to tackle the point cloud feature learning and understanding problem. The key idea is to utilize a set of discrete kernel weights and interpolate point features to neighboring kernel-weight coordinates by an interpolation function for convolution. A normalization term is introduced to handle neighborhoods of different sparsity levels. Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs. We further design Interpolated Convolutional Neural Networks (InterpCNNs) based on InterpConv layers to handle point cloud recognition tasks including shape classification, object part segmentation and indoor scene semantic parsing. Experiments show that the networks can capture both fine-grained local structures and global shape context information effectively. The proposed approach achieves state-of-the-art performance on public benchmarks including ModelNet40, ShapeNet Parts and S3DIS.
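
A simplified sketch of the idea follows: each discrete kernel weight sits at a 3D coordinate, nearby point features are interpolated onto it (a Gaussian interpolation function is assumed here), normalized by the summed interpolation weights to handle varying sparsity, and then convolved. This is illustrative only, not the exact InterpConv formulation.

```python
# Interpolating point features onto discrete kernel-weight coordinates.
import torch

def interp_conv_point(offsets, feats, kernel_coords, weights, sigma=0.1, eps=1e-8):
    # offsets: (N, 3) neighbors relative to the query; feats: (N, C_in)
    # kernel_coords: (K, 3); weights: (K, C_in, C_out)
    d2 = ((offsets[:, None, :] - kernel_coords[None, :, :]) ** 2).sum(-1)  # (N, K)
    t = torch.exp(-d2 / (2 * sigma ** 2))           # interpolation weights
    t = t / (t.sum(dim=0, keepdim=True) + eps)      # normalization per kernel point
    gathered = torch.einsum('nk,nc->kc', t, feats)  # features at kernel coords
    return torch.einsum('kc,kco->o', gathered, weights)
```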


Proceedings ArticleDOI
10 Apr 2019
TL;DR: In this article, a pixel-adaptive convolution (PAC) operation is proposed, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features.
Abstract: Convolutions are the fundamental building blocks of CNNs. The fact that their weights are spatially shared is one of the main reasons for their widespread use, but it is also a major limitation, as it makes convolutions content-agnostic. We propose a pixel-adaptive convolution (PAC) operation, a simple yet effective modification of standard convolutions, in which the filter weights are multiplied with a spatially varying kernel that depends on learnable, local pixel features. PAC is a generalization of several popular filtering techniques and thus can be used for a wide range of use cases. Specifically, we demonstrate state-of-the-art performance when PAC is used for deep joint image upsampling. PAC also offers an effective alternative to fully-connected CRF (Full-CRF), called PAC-CRF, which performs competitively compared to Full-CRF, while being considerably faster. In addition, we also demonstrate that PAC can be used as a drop-in replacement for convolution layers in pre-trained networks, resulting in consistent performance improvements.
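
A minimal sketch of the operation is given below: spatially shared weights W are modulated by a kernel K(f_i, f_j) computed from per-pixel guidance features; a Gaussian on feature differences is used here, one of the forms discussed in the paper. Names are illustrative.

```python
# Pixel-adaptive convolution with a Gaussian feature kernel, in PyTorch.
import torch
import torch.nn.functional as F

def pac(x, guide, weight):
    # x: (B, C, H, W); guide: (B, D, H, W) per-pixel features; weight: (C_out, C, k, k)
    C_out, C_in, k, _ = weight.shape
    B, C, H, W = x.shape
    xp = F.unfold(x, k, padding=k // 2).view(B, C, k * k, H, W)
    gp = F.unfold(guide, k, padding=k // 2).view(B, guide.size(1), k * k, H, W)
    # Gaussian adaptation kernel on feature differences to the center pixel
    diff = gp - guide.unsqueeze(2)
    kernel = torch.exp(-0.5 * (diff ** 2).sum(dim=1, keepdim=True))  # (B,1,k*k,H,W)
    adapted = (xp * kernel).view(B, C_in * k * k, H, W)
    w = weight.view(C_out, C_in * k * k)
    return torch.einsum('oc,bchw->bohw', w, adapted)
```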

Journal ArticleDOI
TL;DR: Quantification results show that the proposed iterative neural network method can outperform the neural network denoising and conventional penalized maximum likelihood methods.
Abstract: PET image reconstruction is challenging due to the ill-posedness of the inverse problem and the limited number of detected photons. Recently, deep neural networks have been widely and successfully used in computer vision tasks and have attracted growing interest in medical imaging. In this paper, we trained a deep residual convolutional neural network to improve PET image quality by using the existing inter-patient information. An innovative feature of the proposed method is that we embed the neural network in the iterative reconstruction framework for image representation, rather than using it as a post-processing tool. We formulate the objective function as a constrained optimization problem and solve it using the alternating direction method of multipliers algorithm. Both simulation data and hybrid real data are used to evaluate the proposed method. Quantification results show that our proposed iterative neural network method can outperform the neural network denoising and conventional penalized maximum likelihood methods.

Journal ArticleDOI
TL;DR: This work proposes a deep learning model (InnoHAR) based on the combination of an inception neural network and a recurrent neural network, which shows consistently superior performance and good generalization when compared with the state-of-the-art.
Abstract: Human activity recognition (HAR) based on sensor networks is an important research direction in the fields of pervasive computing and body area networks. Existing research often uses statistical machine learning methods to manually extract and construct features of different motions. However, in the face of extremely fast-growing waveform data with no obvious patterns, traditional feature engineering methods are becoming increasingly inadequate. With the development of deep learning technology, we no longer need to manually extract features and can improve performance in complex human activity recognition problems. By transferring deep neural network experience from image recognition, we propose a deep learning model (InnoHAR) based on the combination of an inception neural network and a recurrent neural network. The model inputs the waveform data of multi-channel sensors end-to-end. Multi-dimensional features are extracted by inception-like modules using various kernel-based convolution layers. Combined with a GRU, modeling of time-series features is realized, making full use of data characteristics to complete classification tasks. Through experimental verification on the three most widely used public HAR datasets, our proposed method shows consistently superior performance and good generalization when compared with the state-of-the-art.
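
A rough sketch of this combination follows: an inception-like module applies parallel 1D convolutions of different kernel sizes over multi-channel sensor waveforms, and a GRU models the temporal dimension. Layer sizes and names are illustrative, not the paper's exact configuration.

```python
# An inception-style 1D block followed by a GRU classifier, in PyTorch.
import torch
import torch.nn as nn

class InceptionHARBlock(nn.Module):
    def __init__(self, c_in, c_branch=32):
        super().__init__()
        self.b1 = nn.Conv1d(c_in, c_branch, 1)
        self.b3 = nn.Conv1d(c_in, c_branch, 3, padding=1)
        self.b5 = nn.Conv1d(c_in, c_branch, 5, padding=2)

    def forward(self, x):  # x: (B, sensor_channels, T)
        return torch.relu(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))

class InnoHARSketch(nn.Module):
    def __init__(self, c_in, num_classes, hidden=64):
        super().__init__()
        self.block = InceptionHARBlock(c_in)      # 3 branches * 32 channels = 96
        self.gru = nn.GRU(96, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                   # x: (B, C, T)
        h = self.block(x).transpose(1, 2)   # (B, T, 96)
        out, _ = self.gru(h)
        return self.head(out[:, -1])        # classify from the last time step
```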

Journal ArticleDOI
TL;DR: Quantification results based on simulation and real data show that the proposed reconstruction framework can outperform Gaussian post-smoothing and anatomically guided reconstructions using the kernel method or the neural-network penalty.
Abstract: Recently, deep neural networks have been widely and successfully applied in computer vision tasks and have attracted growing interest in medical imaging. One barrier for the application of deep neural networks to medical imaging is the need for large amounts of prior training pairs, which is not always feasible in clinical practice. This is especially true for medical image reconstruction problems, where raw data are needed. Inspired by the deep image prior framework, in this paper, we proposed a personalized network training method where no prior training pairs are needed, but only the patient’s own prior information. The network is updated during the iterative reconstruction process using the patient-specific prior information and measured data. We formulated the maximum-likelihood estimation as a constrained optimization problem and solved it using the alternating direction method of multipliers algorithm. Magnetic resonance imaging guided positron emission tomography reconstruction was employed as an example to demonstrate the effectiveness of the proposed framework. Quantification results based on simulation and real data show that the proposed reconstruction framework can outperform Gaussian post-smoothing and anatomically guided reconstructions using the kernel method or the neural-network penalty.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work presents a novel algorithm for point cloud segmentation that transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel.
Abstract: We present a novel algorithm for point cloud segmentation. Our approach transforms unstructured point clouds into regular voxel grids, and further uses a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel. Traditionally, the voxel representation only comprises Boolean occupancy information, which fails to capture the sparsely distributed points within voxels in a compact manner. In order to handle sparse distributions of points, we further employ radial basis functions (RBF) to compute a local, continuous representation within each voxel. Our approach results in a good volumetric representation that effectively tackles noisy point cloud datasets and is more robust for learning. Moreover, we further introduce group equivariant CNN to 3D, by defining the convolution operator on a symmetry group acting on $\mathbb{Z}^3$ and its isomorphic sets. This improves the expressive capacity without increasing parameters, leading to more robust segmentation results. We highlight the performance on standard benchmarks and show that our approach outperforms state-of-the-art segmentation algorithms on the ShapeNet and S3DIS datasets.

Book ChapterDOI
13 Oct 2019
TL;DR: A novel Dual Encoding U-Net (DEU-Net) with two encoders: a spatial path with a large kernel to preserve the spatial information and a context path with a multiscale convolution block to capture more semantic information, plus a feature fusion module to combine the different levels of feature representation.
Abstract: Retinal vessel segmentation is an essential step for the early diagnosis of eye-related diseases, such as diabetes and hypertension. Segmentation of blood vessels requires both a sizeable receptive field and rich spatial information. In this paper, we propose a novel Dual Encoding U-Net (DEU-Net), which has two encoders: a spatial path with a large kernel to preserve the spatial information and a context path with a multiscale convolution block to capture more semantic information. On top of the two paths, we introduce a feature fusion module to combine the different levels of feature representation. Besides, we apply channel attention to select useful feature maps in the skip connections. Furthermore, low-level and high-level predictions are combined in a multiscale prediction module for better accuracy. We evaluated this model on the digital retinal images for vessel extraction (DRIVE) dataset and the child heart and health study (CHASEDB1) dataset. Results show that the proposed DEU-Net model achieves state-of-the-art retinal vessel segmentation accuracy on both datasets.

Proceedings Article
Mingxing Tan, Quoc V. Le
22 Jul 2019
TL;DR: This paper proposes a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution, and improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection.
Abstract: Depthwise convolution is becoming increasingly popular in modern efficient ConvNets, but its kernel size is often overlooked. In this paper, we systematically study the impact of different kernel sizes, and observe that combining the benefits of multiple kernel sizes can lead to better accuracy and efficiency. Based on this observation, we propose a new mixed depthwise convolution (MixConv), which naturally mixes up multiple kernel sizes in a single convolution. As a simple drop-in replacement of vanilla depthwise convolution, our MixConv improves the accuracy and efficiency for existing MobileNets on both ImageNet classification and COCO object detection. To demonstrate the effectiveness of MixConv, we integrate it into the AutoML search space and develop a new family of models, named MixNets, which outperform previous mobile models including MobileNetV2 [20] (ImageNet top-1 accuracy +4.2%), ShuffleNetV2 [16] (+3.5%), MnasNet [26] (+1.3%), ProxylessNAS [2] (+2.2%), and FBNet [27] (+2.0%). In particular, our MixNet-L achieves a new state-of-the-art 78.9% ImageNet top-1 accuracy under typical mobile settings (<600M FLOPS). Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet/mixnet
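
The operation itself is compact: channels are split into groups, each group gets a depthwise convolution with a different kernel size, and the groups are concatenated back. The following PyTorch sketch uses illustrative kernel sizes and an even channel split.

```python
# A compact PyTorch sketch of mixed depthwise convolution.
import torch
import torch.nn as nn

class MixConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)       # absorb the remainder
        self.splits = splits
        self.convs = nn.ModuleList([
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)  # depthwise per group
            for c, k in zip(splits, kernel_sizes)])

    def forward(self, x):
        xs = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(xi) for conv, xi in zip(self.convs, xs)], dim=1)
```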

Proceedings ArticleDOI
20 Sep 2019
TL;DR: This work learns to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data; the resulting super-resolution network can be trained with direct pixel-wise supervision in the high-resolution domain while robustly generalizing to real input.
Abstract: Most current super-resolution methods rely on low and high resolution image pairs to train a network in a fully supervised manner. However, such image pairs are not available in real-world applications. Instead of directly addressing this problem, most works employ the popular bicubic downsampling strategy to artificially generate a corresponding low resolution image. Unfortunately, this strategy introduces significant artifacts, removing natural sensor noise and other real-world characteristics. Super-resolution networks trained on such bicubic images therefore struggle to generalize to natural images. In this work, we propose an unsupervised approach for image super-resolution. Given only unpaired data, we learn to invert the effects of bicubic downsampling in order to restore the natural image characteristics present in the data. This allows us to generate realistic image pairs, faithfully reflecting the distribution of real-world images. Our super-resolution network can therefore be trained with direct pixel-wise supervision in the high resolution domain, while robustly generalizing to real input. We demonstrate the effectiveness of our approach in quantitative and qualitative experiments.

Journal ArticleDOI
TL;DR: A scale-free CNN (SF-CNN) is introduced for remote sensing scene classification that not only allows the input images to be of arbitrary sizes but also retains the ability to extract discriminative features using a traditional sliding-window-based strategy.
Abstract: Fine-tuning of pretrained convolutional neural networks (CNNs) has been proven to be an effective strategy for remote sensing image scene classification, particularly when a limited number of labeled data sets are available for training purposes. However, such a fine-tuning process often requires that the input images be resized to a fixed size to generate input vectors of the size required by the fully connected layers (FCLs) in the pretrained CNN model. Such a resizing process often discards key information in the scenes and thus deteriorates the classification performance. To address this issue, in this paper, we introduce a scale-free CNN (SF-CNN) for remote sensing scene classification. Specifically, the FCLs in the CNN model are first converted into convolutional layers, which not only allow the input images to be of arbitrary sizes but also retain the ability to extract discriminative features using a traditional sliding-window-based strategy. Then, a global average pooling (GAP) layer is added after the final convolutional layer so that input images of arbitrary size can be mapped to feature maps of uniform size. Finally, we utilize the resulting feature maps to create a new FCL that is fed to a softmax layer for final classification. Our experimental results conducted using several real data sets demonstrate the superiority of the proposed SF-CNN method over several well-known classification methods, including pretrained CNN-based ones.
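
A sketch of the scale-free conversion described above: the pretrained classifier's FC layers are re-expressed as 1x1 convolutions, global average pooling maps arbitrary-sized feature maps to a fixed-size vector, and a new FC layer produces the logits. Layer names and sizes are illustrative.

```python
# A scale-free classification head in PyTorch.
import torch
import torch.nn as nn

class ScaleFreeHead(nn.Module):
    def __init__(self, feat_channels, num_classes):
        super().__init__()
        # Former FC layer re-expressed as a 1x1 convolution (sliding-window features)
        self.conv_fc = nn.Conv2d(feat_channels, feat_channels, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)        # arbitrary H, W -> 1 x 1
        self.fc = nn.Linear(feat_channels, num_classes)

    def forward(self, feat_map):
        x = torch.relu(self.conv_fc(feat_map))    # (B, C, H, W), any H, W
        x = self.gap(x).flatten(1)                # (B, C)
        return self.fc(x)                         # logits for the softmax layer
```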

Proceedings Article
01 Jan 2019
TL;DR: In this paper, an efficient convolution kernel for convolutional neural networks (CNNs) on unstructured grids is proposed, using parameterized differential operators and focusing on spherical signals such as panorama images or planetary signals.
Abstract: We present an efficient convolution kernel for Convolutional Neural Networks (CNNs) on unstructured grids using parameterized differential operators while focusing on spherical signals such as panorama images or planetary signals. To this end, we replace conventional convolution kernels with linear combinations of differential operators that are weighted by learnable parameters. Differential operators can be efficiently estimated on unstructured grids using one-ring neighbors, and learnable parameters can be optimized through standard back-propagation. As a result, we obtain extremely efficient neural networks that match or outperform state-of-the-art network architectures in terms of performance but with a significantly lower number of network parameters. We evaluate our algorithm in an extensive series of experiments on a variety of computer vision and climate science tasks, including shape classification, climate pattern segmentation, and omnidirectional image semantic segmentation. Overall, we present (1) a novel CNN approach on unstructured grids using parameterized differential operators for spherical signals, and (2) we show that our unique kernel parameterization allows our model to achieve the same or higher accuracy with significantly fewer network parameters.
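
To illustrate the parameterization, the sketch below shows a convolution built as a learned linear combination of differential operators (identity, first derivatives, Laplacian), on a regular grid for simplicity; the paper estimates the same operators on unstructured spherical grids via one-ring neighbors. Names and stencils here are illustrative.

```python
# A convolution parameterized by differential operators, in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffOpConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # Learnable mixing of the four operator responses per channel pair
        self.mix = nn.Conv2d(4 * c_in, c_out, kernel_size=1)
        dx = torch.tensor([[[-0.5, 0.0, 0.5]]])                 # central difference
        lap = torch.tensor([[[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]])
        self.register_buffer('dx', dx.view(1, 1, 1, 3))
        self.register_buffer('dy', dx.view(1, 1, 3, 1))
        self.register_buffer('lap', lap.view(1, 1, 3, 3))

    def forward(self, x):
        B, C, H, W = x.shape
        flat = x.reshape(B * C, 1, H, W)
        gx = F.conv2d(flat, self.dx, padding=(0, 1)).view(B, C, H, W)
        gy = F.conv2d(flat, self.dy, padding=(1, 0)).view(B, C, H, W)
        lp = F.conv2d(flat, self.lap, padding=1).view(B, C, H, W)
        return self.mix(torch.cat([x, gx, gy, lp], dim=1))
```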

Posted Content
TL;DR: KernelGAN is introduced, an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms.
Abstract: Super resolution (SR) methods typically assume that the low-resolution (LR) image was downscaled from the unknown high-resolution (HR) image by a fixed 'ideal' downscaling kernel (e.g. Bicubic downscaling). However, this is rarely the case in real LR images, in contrast to synthetically generated SR datasets. When the assumed downscaling kernel deviates from the true one, the performance of SR methods significantly deteriorates. This gave rise to Blind-SR - namely, SR when the downscaling kernel ("SR-kernel") is unknown. It was further shown that the true SR-kernel is the one that maximizes the recurrence of patches across scales of the LR image. In this paper we show how this powerful cross-scale recurrence property can be realized using Deep Internal Learning. We introduce "KernelGAN", an image-specific Internal-GAN, which trains solely on the LR test image at test time, and learns its internal distribution of patches. Its Generator is trained to produce a downscaled version of the LR test image, such that its Discriminator cannot distinguish between the patch distribution of the downscaled image, and the patch distribution of the original LR image. The Generator, once trained, constitutes the downscaling operation with the correct image-specific SR-kernel. KernelGAN is fully unsupervised, requires no training data other than the input image itself, and leads to state-of-the-art results in Blind-SR when plugged into existing SR algorithms.


Proceedings ArticleDOI
15 Jun 2019
TL;DR: A blind deblurring method based on the Local Maximum Gradient (LMG) prior, inspired by the simple and intuitive observation that the maximum value of a local patch's gradient diminishes after the blur process, which is proved to be true both mathematically and empirically.
Abstract: Blind image deblurring aims to recover a sharp image from a blurred one while the blur kernel is unknown. To solve this ill-posed problem, a great number of image priors have been explored and employed in this area. In this paper, we present a blind deblurring method based on the Local Maximum Gradient (LMG) prior. Our work is inspired by the simple and intuitive observation that the maximum value of a local patch's gradient will diminish after the blur process, which is proved to be true both mathematically and empirically. This inherent property of the blur process helps us to establish a new energy function. By introducing a linear operator to compute the Local Maximum Gradient, together with an effective optimization scheme, our method can handle various specific scenarios. Extensive experimental results illustrate that our method is able to achieve favorable performance against state-of-the-art algorithms on both synthetic and real-world images.
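
The quantity behind the prior is straightforward to compute, as in the sketch below: the gradient magnitude is max-pooled over local patches, and the prior asserts that this map shrinks under blur. Patch size and the finite-difference stencil are illustrative.

```python
# Computing a Local Maximum Gradient map in PyTorch.
import torch
import torch.nn.functional as F

def local_max_gradient(img, patch=15):
    # img: (B, 1, H, W) grayscale; finite-difference gradient magnitude
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    gx = F.pad(gx, (0, 1, 0, 0))
    gy = F.pad(gy, (0, 0, 0, 1))
    grad_mag = torch.sqrt(gx ** 2 + gy ** 2)
    # Maximum gradient within each local patch (odd patch size keeps H, W)
    return F.max_pool2d(grad_mag, patch, stride=1, padding=patch // 2)
```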

Journal ArticleDOI
Jeffrey Pennington, Pratik Worah
TL;DR: This work demonstrates that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method, and identifies an intriguing new class of activation functions with favorable properties.
Abstract: Neural network configurations with random weights play an important role in the analysis of deep learning. They define the initial loss landscape and are closely related to kernel and random feature methods. Despite the fact that these networks are built out of random matrices, the vast and powerful machinery of random matrix theory has so far found limited success in studying them. A main obstacle in this direction is that neural networks are nonlinear, which prevents the straightforward utilization of many of the existing mathematical results. In this work, we open the door for direct applications of random matrix theory to deep learning by demonstrating that the pointwise nonlinearities typically applied in neural networks can be incorporated into a standard method of proof in random matrix theory known as the moments method. The test case for our study is the Gram matrix $Y^TY$, $Y=f(WX)$, where $W$ is a random weight matrix, $X$ is a random data matrix, and $f$ is a pointwise nonlinear activation function. We derive an explicit representation for the trace of the resolvent of this matrix, which defines its limiting spectral distribution. We apply these results to the computation of the asymptotic performance of single-layer random feature methods on a memorization task and to the analysis of the eigenvalues of the data covariance matrix as it propagates through a neural network. As a byproduct of our analysis, we identify an intriguing new class of activation functions with favorable properties.

Journal ArticleDOI
TL;DR: A novel small target detection algorithm derived from the facet kernel and the random walker (RW) algorithm, comprising four main stages, including a novel local contrast descriptor (NLCD) based on the RW algorithm that achieves clutter suppression and target enhancement.
Abstract: Efficient detection of targets immersed in a complex background with a low signal-to-clutter ratio (SCR) is very important in infrared search and tracking (IRST) applications. In this paper, we address the target detection problem in terms of local image segmentation and propose a novel small target detection algorithm derived from facet kernel and random walker (RW) algorithm which includes four main stages. First, since the RW algorithm is suitable for images with less noises, local order-statistic and mean filtering are applied to remove the pixel-sized noises with high brightness (PNHB) and smooth the infrared images. Second, the infrared image is filtered by the facet kernel to enhance the target pixels and candidate target pixels are extracted by an adaptive threshold operation. Third, inspired by the properties of infrared targets, a novel local contrast descriptor (NLCD) based on the RW algorithm is proposed to achieve clutter suppression and target enhancement. Then, the candidate target pixels are selected as central pixels to construct the local regions and the NLCD map of all local regions is computed. The obtained NLCD map is weighted by the filtered map of facet kernel to further enhance target. Finally, the target is detected by a thresholding operation on the weighted map. Experimental results on three data sets show that the proposed method outperforms conventional baseline methods in terms of target detection accuracy.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: The proposed KMSR incorporates blur-kernel modeling in training and consists of two stages: a pool of realistic blur kernels is first built with a generative adversarial network (GAN), and a super-resolution network is then trained on HR images and corresponding LR images constructed with the generated kernels.
Abstract: Deep convolutional neural networks (CNNs), trained on corresponding pairs of high- and low-resolution images, achieve state-of-the-art performance in single-image super-resolution and surpass previous signal-processing based approaches. However, their performance is limited when applied to real photographs. The reason lies in their training data: low-resolution (LR) images are obtained by bicubic interpolation of the corresponding high-resolution (HR) images. The applied convolution kernel significantly differs from real-world camera-blur. Consequently, while current CNNs well super-resolve bicubic-downsampled LR images, they often fail on camera-captured LR images. To improve generalization and robustness of deep super-resolution CNNs on real photographs, we present a kernel modeling super-resolution network (KMSR) that incorporates blur-kernel modeling in the training. Our proposed KMSR consists of two stages: we first build a pool of realistic blur-kernels with a generative adversarial network (GAN) and then we train a super-resolution network with HR and corresponding LR images constructed with the generated kernels. Our extensive experimental validations demonstrate the effectiveness of our single-image super-resolution approach on photographs with unknown blur-kernels.