Showing papers on "Kernel (image processing)" published in 2017


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a multi-scale convolutional neural network that restores sharp images in an end-to-end manner when blur is caused by various sources, and presents a new large-scale dataset of pairs of realistic blurry images and corresponding ground-truth sharp images obtained with a high-speed camera.
Abstract: Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy optimization based methods rely on simple assumptions, such as the blur kernel being partially uniform or locally linear. Moreover, recent machine learning based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs whose kernels are difficult to approximate or parameterize (e.g. at object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner when blur is caused by various sources. In addition, we present a multi-scale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry images and the corresponding ground-truth sharp images obtained with a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves state-of-the-art performance in dynamic scene deblurring, not only qualitatively but also quantitatively.
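To make the coarse-to-fine idea concrete, here is a minimal PyTorch sketch of a multi-scale loss of the kind the abstract describes; the function name, pyramid depth, and use of plain MSE are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(predictions, sharp, num_scales=3):
    """MSE between predicted and ground-truth sharp images at each scale.

    predictions: list of network outputs, coarsest to finest.
    sharp: full-resolution ground-truth sharp image (B, C, H, W).
    """
    loss = 0.0
    for k, pred in enumerate(predictions):
        scale = 2 ** (num_scales - 1 - k)                 # e.g. 4, 2, 1
        target = sharp if scale == 1 else F.interpolate(
            sharp, scale_factor=1.0 / scale,
            mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(pred, target)
    return loss / len(predictions)
```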

1,560 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results, and gauges the state of the art in single image super-resolution.
Abstract: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable from low- and high-resolution training images. Each competition had ∼100 registered participants, and 20 teams competed in the final testing phase. Together, the results gauge the state of the art in single image super-resolution.

1,243 citations


Proceedings ArticleDOI
Chao Peng1, Xiangyu Zhang, Gang Yu, Guiming Luo1, Jian Sun 
21 Jul 2017
TL;DR: This work proposes a Global Convolutional Network to address both the classification and localization issues in semantic segmentation, and suggests a residual-based boundary refinement to further refine the object boundaries.
Abstract: One of the recent trends [31, 32, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) throughout the entire network, because stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues in semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-the-art performance on two public benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.
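The large-kernel idea can be sketched as follows in PyTorch: a k x k kernel is approximated by two parallel stacks of 1D convolutions, keeping a large effective receptive field at O(k) cost. Class and parameter names are illustrative; the paper's full network adds boundary refinement and a decoder on top of such blocks.

```python
import torch
import torch.nn as nn

class GCN(nn.Module):
    """Global Convolutional Network block: a large k x k kernel is
    approximated by two parallel stacks of 1D convolutions."""
    def __init__(self, in_ch, out_ch, k=15):
        super().__init__()
        p = k // 2
        self.left = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.right = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.left(x) + self.right(x)
```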

1,047 citations


Posted Content
Chao Peng1, Xiangyu Zhang, Gang Yu, Guiming Luo1, Jian Sun 
TL;DR: In this paper, a Global Convolutional Network (GCN) is proposed to address both the classification and localization issues in semantic segmentation, achieving state-of-the-art performance on two public benchmarks.
Abstract: One of the recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) throughout the entire network, because stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues in semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-the-art performance on two public benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.

935 citations


Journal ArticleDOI
TL;DR: In this article, the author presents an updated summary of the penalized pixel-fitting (pPXF) method, which is used to extract the stellar and gas kinematics, as well as the stellar population of galaxies, via full spectrum fitting.
Abstract: I start by providing an updated summary of the penalized pixel-fitting (pPXF) method, which is used to extract the stellar and gas kinematics, as well as the stellar population of galaxies, via full spectrum fitting. I then focus on the problem of extracting the kinematics when the velocity dispersion $\sigma$ is smaller than the velocity sampling $\Delta V$, which is generally, by design, close to the instrumental dispersion $\sigma_{\rm inst}$. The standard approach consists of convolving templates with a discretized kernel while fitting for its parameters. This is obviously very inaccurate when $\sigma<\Delta V/2$, due to undersampling. Oversampling can prevent this, but it has drawbacks. Here I present a more accurate and efficient alternative. It avoids the evaluation of the under-sampled kernel and instead directly computes its well-sampled analytic Fourier transform, for use with the convolution theorem. A simple analytic transform exists when the kernel is described by the popular Gauss-Hermite parametrization (which includes the Gaussian as a special case) for the line-of-sight velocity distribution. I describe how this idea was implemented in a significant upgrade to the publicly available pPXF software. The key advantage of the new approach is that it provides accurate velocities regardless of $\sigma$. This is important, e.g., for spectroscopic surveys targeting galaxies with $\sigma\ll\sigma_{\rm inst}$, for galaxy redshift determinations, or for measuring line-of-sight velocities of individual stars. The proposed method could also be used to fix Gaussian convolution algorithms used in today's popular software packages.
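The core trick is straightforward to sketch in numpy for the purely Gaussian case: rather than discretizing the kernel, evaluate its analytic Fourier transform and apply the convolution theorem. Function and variable names below are illustrative and do not mirror pPXF's actual API.

```python
import numpy as np

def gaussian_broaden(template, v_pix, sigma_pix):
    """Shift a spectrum by v_pix and broaden it by sigma_pix
    (both in pixel units) via the analytic FT of the Gaussian."""
    n = template.size
    ft = np.fft.rfft(template)
    f = np.fft.rfftfreq(n)                      # cycles per pixel
    # FT of a Gaussian of mean v_pix and dispersion sigma_pix:
    # exp(-2*pi*i*f*V) * exp(-2*pi^2*f^2*sigma^2) -- well sampled
    # in frequency even when sigma_pix < 1.
    kernel_ft = np.exp(-2j * np.pi * f * v_pix
                       - 2 * (np.pi * f * sigma_pix) ** 2)
    return np.fft.irfft(ft * kernel_ft, n)
```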

866 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, a deep fully convolutional neural network is proposed to estimate pairs of 1D kernels for all pixels simultaneously, which allows for the incorporation of perceptual loss to train the network to produce visually pleasing frames.
Abstract: Standard video frame interpolation methods first estimate optical flow between input frames and then synthesize an intermediate frame guided by motion. Recent approaches merge these two steps into a single convolution process by convolving input frames with spatially adaptive kernels that account for motion and re-sampling simultaneously. These methods require large kernels to handle large motion, which limits the number of pixels whose kernels can be estimated at once due to the large memory demand. To address this problem, this paper formulates frame interpolation as local separable convolution over input frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D kernels require significantly fewer parameters to be estimated. We develop a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously. Since our method is able to estimate kernels and synthesize the whole video frame at once, it allows for the incorporation of perceptual loss to train the neural network to produce visually pleasing frames. This deep neural network is trained end-to-end using widely available video data without any human annotation. Both qualitative and quantitative experiments show that our method provides a practical solution to high-quality video frame interpolation.
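A toy numpy sketch of the separable formulation for a single output pixel: each pixel's 2D kernel is the outer product of a predicted vertical and horizontal 1D kernel, one pair per input frame. Names and shapes are illustrative, not the paper's code.

```python
import numpy as np

def synthesize_pixel(patch1, patch2, kv1, kh1, kv2, kh2):
    """patchN: (k, k) neighborhoods around the pixel in frames 1 and 2.
    kvN, khN: length-k vertical/horizontal kernels predicted per pixel."""
    k1 = np.outer(kv1, kh1)     # (k, k) kernel from two 1D kernels
    k2 = np.outer(kv2, kh2)     # 2k parameters instead of k*k
    return np.sum(k1 * patch1) + np.sum(k2 * patch2)
```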

616 citations


Proceedings ArticleDOI
07 Aug 2017
TL;DR: K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
Abstract: This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over the prior feature-based and neural state of the art, and explain the source of K-NRM's advantage: its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
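The kernel-pooling step is compact enough to sketch directly in numpy, following the design the abstract describes; the kernel means and widths passed in are illustrative choices.

```python
import numpy as np

def kernel_pooling(M, mus, sigmas):
    """M: translation matrix, M[i, j] = cosine(query_word_i, doc_word_j).
    Returns one soft-match feature per RBF kernel."""
    feats = []
    for mu, sig in zip(mus, sigmas):
        # K_k(M_i) = sum_j exp(-(M_ij - mu_k)^2 / (2 sigma_k^2))
        K = np.exp(-(M - mu) ** 2 / (2 * sig ** 2)).sum(axis=1)
        # log-sum over query words gives the pooled feature phi_k
        feats.append(np.log(np.clip(K, 1e-10, None)).sum())
    return np.array(feats)

# A learning-to-rank layer then scores: f(q, d) = tanh(w . phi + b)
```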

572 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work directly estimates the motion flow from the blurred image through a fully convolutional deep neural network (FCN) and recovers the unblurred image from the estimated motion flow; it is the first universal end-to-end mapping from the blurred image to the dense motion flow.
Abstract: Removing pixel-wise heterogeneous motion blur is challenging due to the ill-posed nature of the problem. The predominant solution is to estimate the blur kernel by adding a prior, but extensive literature on the subject indicates the difficulty in identifying a prior which is suitably informative and general. Rather than imposing a prior based on theory, we propose instead to learn one from the data. Learning a prior over the latent image would require modeling all possible image content. The critical observation underpinning our approach, however, is that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content. This is a much easier learning task, but it also avoids the iterative process through which latent image priors are typically applied. Our approach directly estimates the motion flow from the blurred image through a fully-convolutional deep neural network (FCN) and recovers the unblurred image from the estimated motion flow. Our FCN is the first universal end-to-end mapping from the blurred image to the dense motion flow. To train the FCN, we simulate motion flows to generate synthetic blurred-image/motion-flow pairs, thus avoiding the need for human labeling. Extensive experiments on challenging realistic blurred images demonstrate that the proposed method outperforms the state-of-the-art.

354 citations


Proceedings ArticleDOI
Yin Cui1, Feng Zhou2, Jiang Wang3, Xiao Liu2, Yuanqing Lin2, Serge Belongie1 
21 Jul 2017
TL;DR: This work demonstrates how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner and proposes a general pooling framework that captures higher order interactions of features in the form of kernels.
Abstract: Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling of pairwise (2nd order) feature interactions. In this work, we propose a general pooling framework that captures higher order interactions of features in the form of kernels. We demonstrate how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner. Combined with CNNs, the composition of the kernel can be learned from data in an end-to-end fashion via error back-propagation. The proposed kernel pooling scheme is evaluated in terms of both kernel approximation error and visual recognition accuracy. Experimental evaluations demonstrate state-of-the-art performance on commonly used fine-grained recognition datasets.

344 citations


Proceedings ArticleDOI
27 Mar 2017
TL;DR: In this article, a StyleBank is proposed for neural image style transfer, composed of multiple convolution filter banks, each of which explicitly represents one style; the auto-encoder does not encode any style information thanks to the flexibility introduced by the explicit style representation.
Abstract: We propose StyleBank, which is composed of multiple convolution filter banks, each of which explicitly represents one style, for neural image style transfer. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate feature embedding produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learnt, where the learning is conducted in such a way that the auto-encoder does not encode any style information, thanks to the flexibility introduced by the explicit filter bank representation. This also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed. The explicit style representation along with the flexible network design enables us to fuse styles at not only the image level, but also the region level. Our method is the first style transfer network that links back to traditional texton mapping methods, and hence provides new understanding of neural style transfer. Our method is easy to train, runs in real-time, and produces results that are qualitatively better than or at least comparable to existing methods.

307 citations


Posted Content
TL;DR: This paper presents a robust video frame interpolation method that considers pixel synthesis for the interpolated frame as local convolution over two input frames and employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel.
Abstract: Video frame interpolation typically involves two steps: motion estimation and pixel synthesis. Such a two-step approach heavily depends on the quality of motion estimation. This paper presents a robust video frame interpolation method that combines these two steps into a single process. Specifically, our method considers pixel synthesis for the interpolated frame as local convolution over two input frames. The convolution kernel captures both the local motion between the input frames and the coefficients for pixel synthesis. Our method employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel. This deep neural network can be directly trained end to end using widely available video data without any difficult-to-obtain ground-truth data like optical flow. Our experiments show that the formulation of video interpolation as a single convolution process allows our method to gracefully handle challenges like occlusion, blur, and abrupt brightness change and enables high-quality video frame interpolation.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, a deep fully convolutional neural network is proposed to estimate a spatially-adaptive convolution kernel for each pixel, which captures both the local motion between the input frames and the coefficients for pixel synthesis.
Abstract: Video frame interpolation typically involves two steps: motion estimation and pixel synthesis. Such a two-step approach heavily depends on the quality of motion estimation. This paper presents a robust video frame interpolation method that combines these two steps into a single process. Specifically, our method considers pixel synthesis for the interpolated frame as local convolution over two input frames. The convolution kernel captures both the local motion between the input frames and the coefficients for pixel synthesis. Our method employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel. This deep neural network can be directly trained end to end using widely available video data without any difficult-to-obtain ground-truth data like optical flow. Our experiments show that the formulation of video interpolation as a single convolution process allows our method to gracefully handle challenges like occlusion, blur, and abrupt brightness change and enables high-quality video frame interpolation.
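For contrast with the separable variant above, here is a toy numpy sketch of this spatially-adaptive convolution for one output pixel, under the assumption that the predicted kernel covers the stacked neighborhoods from both frames (names and shapes are illustrative):

```python
import numpy as np

def interpolate_pixel(patch1, patch2, kernel):
    """patchN: (k, k) neighborhoods in frames 1 and 2;
    kernel: (2k, k) predicted kernel spanning both patches,
    capturing motion and re-sampling coefficients jointly."""
    k = patch1.shape[0]
    return np.sum(kernel[:k] * patch1) + np.sum(kernel[k:] * patch2)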

Journal ArticleDOI
TL;DR: A novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture and introduces a novel, kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors.
Abstract: Regression-based algorithms have been shown to be good at denoising Monte Carlo (MC) renderings by leveraging their inexpensive by-products (e.g., feature buffers). However, when using higher-order models to handle complex cases, these techniques often overfit to noise in the input. For this reason, supervised learning methods have been proposed that train on a large collection of reference examples, but they use explicit filters that limit their denoising ability. To address these problems, we propose a novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture. In one embodiment of our framework, the CNN directly predicts the final denoised pixel value as a highly non-linear combination of the input features. In a second approach, we introduce a novel, kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors. We train and evaluate our networks on production data and observe improvements over state-of-the-art MC denoisers, showing that our methods generalize well to a variety of scenes. We conclude by analyzing various components of our architecture and identify areas of further research in deep learning for MC denoising.
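The kernel-prediction idea can be sketched in a few lines of PyTorch: the network emits k·k weights per pixel, which are normalized and applied to each pixel's neighborhood. The shapes, the softmax normalization, and the kernel size are assumptions for illustration, not the paper's exact head.

```python
import torch
import torch.nn.functional as F

def apply_predicted_kernels(noisy, weights, k=21):
    """noisy: (B, 3, H, W) render; weights: (B, k*k, H, W) from the CNN."""
    w = F.softmax(weights, dim=1)                    # normalize per pixel
    patches = F.unfold(noisy, k, padding=k // 2)     # (B, 3*k*k, H*W)
    B, _, H, W = noisy.shape
    patches = patches.view(B, 3, k * k, H, W)
    # Each denoised pixel is a weighted sum of its k x k neighborhood.
    return (patches * w.unsqueeze(1)).sum(dim=2)     # (B, 3, H, W)
```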

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper presents a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets), and discusses one representative advantage: it is wider than a regular convolution with the number of parameters and the computational complexity preserved.
Abstract: In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in primary group convolution is a spatial convolution, while on each partition in secondary group convolution, the convolution is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution with the number of parameters and the computational complexity preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results over standard benchmarks, CIFAR-10, CIFAR-100, SVHN and ImageNet, demonstrate that our networks use parameters and computation more efficiently, with similar or higher accuracy.
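A minimal PyTorch sketch of one interleaved group convolution block, assuming equal-sized partitions: L primary partitions of M channels each, with a channel permutation between the two group convolutions so that each secondary partition draws one channel from every primary partition. Class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class IGCBlock(nn.Module):
    def __init__(self, channels, primary_groups=4):
        super().__init__()
        self.L = primary_groups
        self.M = channels // primary_groups   # channels per primary partition
        self.primary = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=self.L)   # spatial, per partition
        self.secondary = nn.Conv2d(channels, channels, 1,
                                   groups=self.M)  # point-wise, per partition

    def forward(self, x):
        x = self.primary(x)
        B, C, H, W = x.shape
        # Interleave: channels of the same secondary partition come
        # from different primary partitions.
        x = x.view(B, self.L, self.M, H, W).transpose(1, 2).reshape(B, C, H, W)
        return self.secondary(x)
```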

Proceedings ArticleDOI
04 Dec 2017
TL;DR: The proposed method uses a Convolutional Neural Network with a custom pooling layer to optimize the feature-extraction scheme of the current best-performing algorithms, and outperforms state-of-the-art methods for both local and full-image classification.
Abstract: This paper presents a deep-learning method for distinguishing computer-generated graphics from real photographic images. The proposed method uses a Convolutional Neural Network (CNN) with a custom pooling layer to optimize the feature-extraction scheme of the current best-performing algorithms. Local estimates of class probabilities are computed and aggregated to predict the label of the whole picture. We evaluate our work on recent photo-realistic computer graphics and show that it outperforms state-of-the-art methods for both local and full-image classification.

Journal ArticleDOI
Fengbin Tu1, Shouyi Yin1, Peng Ouyang1, Shibin Tang1, Leibo Liu1, Shaojun Wei1 
TL;DR: A DCNN acceleration architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different models, which outperforms state-of-the-art designs by one to two orders of magnitude.
Abstract: Deep convolutional neural networks (DCNNs) have been successfully used in many computer vision tasks. Previous works on DCNN acceleration usually use a fixed computation pattern for diverse DCNN models, leading to an imbalance between power efficiency and performance. We solve this problem by designing a DCNN acceleration architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different models. The computation pattern comprises a data reuse pattern and a convolution mapping method. For massive and different layer sizes, DNA reconfigures its data paths to support a hybrid data reuse pattern, which reduces total energy consumption by 5.9 to 8.4 times over conventional methods. For various convolution parameters, DNA reconfigures its computing resources to support a highly scalable convolution mapping method, which obtains 93% computing resource utilization on modern DCNNs. Finally, a layer-based scheduling framework is proposed to balance DNA's power efficiency and performance for different DCNNs. DNA occupies 16 mm² in a 65-nm process. On the benchmarks, it achieves 194.4 GOPS at 200 MHz and consumes only 479 mW. The system-level power efficiency is 152.9 GOPS/W (considering DRAM access power), which outperforms state-of-the-art designs by one to two orders of magnitude.

Journal ArticleDOI
TL;DR: The proposed image prior is based on distinctive properties of text images, with which an efficient optimization algorithm is developed to generate reliable intermediate results for kernel estimation and an effective method to remove artifacts for better deblurred results is presented.
Abstract: We propose a simple yet effective $L_0$ -regularized prior based on intensity and gradient for text image deblurring. The proposed image prior is based on distinctive properties of text images, with which we develop an efficient optimization algorithm to generate reliable intermediate results for kernel estimation. The proposed algorithm does not require any heuristic edge selection methods, which are critical to the state-of-the-art edge-based deblurring methods. We discuss the relationship with other edge-based deblurring methods and present how to select salient edges in a more principled manner. For the final latent image restoration step, we present an effective method to remove artifacts for better deblurred results. We show that the proposed algorithm can be extended to deblur natural images with complex scenes and low illumination, as well as to non-uniform deblurring. Experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art image deblurring methods.
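Based on the abstract, the prior places $L_0$ penalties on both image intensities and gradients. A common way to write such an objective (assumed here for illustration, not quoted from the paper) is

$$\min_{x,k}\ \|x \otimes k - y\|_2^2 + \gamma\,\|k\|_2^2 + \lambda\left(\sigma\,\|x\|_0 + \|\nabla x\|_0\right),$$

where $y$ is the blurred input, $x$ the latent text image, $k$ the blur kernel, and $\otimes$ denotes convolution; the $\|x\|_0$ term exploits the fact that clean text images are dominated by near-uniform regions.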

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A novel and general network structure for accelerating the inference of convolutional neural networks, which is more complicated in structure yet has lower inference complexity.
Abstract: In this paper, we present a novel and general network structure towards accelerating the inference process of convolutional neural networks, which is more complicated in network structure yet with less inference complexity. The core idea is to equip each original convolutional layer with another low-cost collaborative layer (LCCL), and the element-wise multiplication of the ReLU outputs of these two parallel layers produces the layer-wise output. The combined layer is potentially more discriminative than the original convolutional layer, and its inference is faster for two reasons: 1) the zero cells of the LCCL feature maps will remain zero after element-wise multiplication, and thus it is safe to skip the calculation of the corresponding high-cost convolution in the original convolutional layer; 2) LCCL is very fast if it is implemented as a 1×1 convolution or only a single filter shared by all channels. Extensive experiments on the CIFAR-10, CIFAR-100 and ILSVRC-2012 benchmarks show that our proposed network structure can accelerate the inference process by 32% on average with negligible performance drop.
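A minimal PyTorch sketch of a convolution equipped with a low-cost collaborative layer; in this naive version both branches are always computed, whereas the paper's point is that an optimized implementation can skip the expensive convolution wherever the cheap branch's ReLU output is zero. Names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LCCLConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.main = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.lccl = nn.Conv2d(in_ch, out_ch, 1)   # low-cost 1x1 branch

    def forward(self, x):
        mask = F.relu(self.lccl(x))
        # A real implementation would evaluate self.main only where
        # mask > 0; zeros in mask zero out the product anyway.
        return F.relu(self.main(x)) * mask
```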

Journal ArticleDOI
TL;DR: This paper analyzes and evaluates different MKL algorithms and their respective characteristics in different HSI classification cases, and discusses future directions and trends of research in this area.
Abstract: With the rapid development of spectral imaging techniques, classification of hyperspectral images (HSIs) has attracted great attention in various applications such as land survey and resource monitoring in the field of remote sensing. A key challenge in HSI classification is how to explore effective approaches to fully use the spatial–spectral information provided by the data cube. Multiple kernel learning (MKL) has been successfully applied to HSI classification due to its capacity to handle heterogeneous fusion of both spectral and spatial features. This approach can generate an adaptive kernel as an optimally weighted sum of a few fixed kernels to model a nonlinear data structure. In this way, the difficulty of kernel selection and the limitation of a fixed kernel can be alleviated. Various MKL algorithms have been developed in recent years, such as the general MKL, the subspace MKL, the nonlinear MKL, the sparse MKL, and the ensemble MKL. The goal of this paper is to provide a systematic review of MKL methods, which have been applied to HSI classification. We also analyze and evaluate different MKL algorithms and their respective characteristics in different HSI classification cases. Finally, we discuss future directions and trends of research in this area.
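The common core of these methods fits in a few lines of numpy: the adaptive kernel is a weighted sum of fixed base kernels, here RBF kernels at several bandwidths. The weights d are what the surveyed algorithms learn in different ways; values and names below are illustrative.

```python
import numpy as np

def rbf(X, Y, gamma):
    """RBF base kernel between rows of X (n, d) and Y (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(X, Y, gammas, d):
    """Adaptive kernel K = sum_m d_m K_m, with d_m >= 0, sum(d) == 1."""
    return sum(w * rbf(X, Y, g) for w, g in zip(d, gammas))
```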

Proceedings ArticleDOI
18 Jun 2017
TL;DR: This paper proposes a fusion architecture that can fuse multiple layers naturally in CNNs, reusing the intermediate data, and designs an optimal algorithm to determine the fusion and algorithm strategy for each layer.
Abstract: Convolutional neural network (CNN) finds applications in a variety of computer vision tasks ranging from object recognition and detection to scene understanding, owing to its exceptional accuracy. There exist different algorithms for CNN computation. In this paper, we compare the conventional convolution algorithm with a faster algorithm based on Winograd's minimal filtering theory for efficient FPGA implementation. Distinct from the conventional convolution algorithm, the Winograd algorithm uses fewer computing resources but puts more pressure on the memory bandwidth. We first propose a fusion architecture that can fuse multiple layers naturally in CNNs, reusing the intermediate data. Based on this fusion architecture, we explore heterogeneous algorithms to maximize the throughput of a CNN. We design an optimal algorithm to determine the fusion and algorithm strategy for each layer. We also develop an automated toolchain to ease the mapping from Caffe model to FPGA bitstream using Vivado HLS. Experiments using the widely used VGG and AlexNet demonstrate that our design achieves up to 1.99× performance speedup compared to the prior fusion-based FPGA accelerator for CNNs.
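Winograd's minimal filtering is easy to sketch in numpy for the standard F(2,3) case: two outputs of a 3-tap filter from 4 multiplications instead of 6, using the usual transform matrices. This illustrates the trade the paper exploits (fewer multiplications, more data movement); it is not the authors' FPGA code.

```python
import numpy as np

B_T = np.array([[1, 0, -1,  0],
                [0, 1,  1,  0],
                [0, -1, 1,  0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=float)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 filter outputs,
    using 4 elementwise multiplications."""
    return A_T @ ((G @ g) * (B_T @ d))

d = np.arange(4.0)
g = np.array([1.0, 2.0, 3.0])
assert np.allclose(winograd_f23(d, g), np.correlate(d, g, mode='valid'))
```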

Posted Content
TL;DR: By finetuning this network, the proposed video convolutional network T3D outperforms generic and recent 3D CNN methods that were trained on large video datasets and finetuned on the target datasets, e.g. HMDB51/UCF101.
Abstract: The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. Another issue in training 3D ConvNets is that they must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D ConvNets is completely ignored. A further contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by finetuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101. The T3D code will be released.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, the authors propose to factorize the convolutional layer to reduce its computation, which can effectively preserve the spatial information and maintain the accuracy with significantly less computation.
Abstract: In this paper, we propose to factorize the convolutional layer to reduce its computation. The 3D convolution operation in a convolutional layer can be considered as performing spatial convolution in each channel and linear projection across channels simultaneously. By unravelling them and arranging the spatial convolutions sequentially, the proposed layer is composed of a low-cost single intra-channel convolution and a linear channel projection. When combined with a residual connection, it can effectively preserve the spatial information and maintain the accuracy with significantly less computation. We also introduce a topological subdivisioning to reduce the connection between the input and output channels. Our experiments demonstrate that the proposed layers outperform the standard convolutional layers on performance/complexity ratio. Our models achieve similar performance to VGG-16, ResNet-34, ResNet-50, and ResNet-101 while requiring 42×, 7.32×, 4.38×, and 5.85× less computation, respectively.
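A minimal PyTorch sketch of the basic factorization, before the paper's topological subdivisioning: a depthwise (single intra-channel) spatial convolution followed by a 1x1 linear channel projection, wrapped in a residual connection. Names are illustrative.

```python
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # one 3x3 spatial filter per channel (intra-channel convolution)
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=channels)
        # linear projection across channels
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x + self.project(self.spatial(x))   # residual connection
```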

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A 1D convolution neural network (CNN) based method is proposed to classify ECG signals, which achieves a promising classification accuracy of 97.5%, significantly outperforming several typical ECG classification methods.
Abstract: With the marked increase in cardiovascular disease, automatic classification of electrocardiogram (ECG) signals plays an increasingly important part in clinical diagnosis. In this paper, a 1D convolutional neural network (CNN) based method is proposed to classify ECG signals. The proposed CNN model consists of five layers in addition to the input layer and the output layer: two convolution layers, two down-sampling layers, and one fully connected layer, which extract effective features from the raw data and classify them automatically. The model classifies five typical kinds of ECG signals: normal, left bundle branch block, right bundle branch block, atrial premature contraction, and ventricular premature contraction. Experimental results on the public MIT-BIH arrhythmia database show that the proposed method achieves a promising classification accuracy of 97.5%, significantly outperforming several typical ECG classification methods.
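A PyTorch sketch matching the stated layer counts (two convolution, two down-sampling, one fully connected layer); the kernel sizes, channel widths, and input length are assumptions, since the abstract does not give them.

```python
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    def __init__(self, input_len=300, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),                       # down-sampling layer 1
            nn.Conv1d(8, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2))                       # down-sampling layer 2
        self.classify = nn.Linear(16 * (input_len // 4), num_classes)

    def forward(self, x):            # x: (batch, 1, input_len) heartbeat
        z = self.features(x).flatten(1)
        return self.classify(z)      # logits for the 5 beat classes
```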

Posted Content
TL;DR: In this paper, a kernelized ridge regression model was proposed for robust visual tracking, where the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples.
Abstract: In this paper, we analyze the spatial information of deep features, and propose two complementary regressions for robust visual tracking. First, we propose a kernelized ridge regression model wherein the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples. We show that this model can be formulated as a neural network and thus can be efficiently solved. Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target. Distance transform pooling is further exploited to determine the effectiveness of each output channel of the convolution layer. The outputs from the kernelized ridge regression model and the fully convolutional neural network are combined to obtain the ultimate response. Experimental results on two benchmark datasets validate the effectiveness of the proposed method.

Posted Content
TL;DR: An initialization method for sub-pixel convolution, known as convolution NN resize, which is free from checkerboard artifacts immediately after initialization and, compared to resize convolution, has more modelling power and converges to solutions with smaller test errors.
Abstract: The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post-processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this note, we propose an initialization method for sub-pixel convolution known as convolution NN resize. Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization. Compared to resize convolution, at the same computational complexity, it has more modelling power and converges to solutions with smaller test errors.
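A sketch of the initialization in PyTorch, under the standard reading of the note: initialize one sub-kernel per group of r² output channels and repeat it, so that sub-pixel convolution followed by PixelShuffle initially reproduces nearest-neighbor resize followed by convolution. The helper name is ours, not from the paper.

```python
import torch
import torch.nn as nn

def icnr_(weight, upscale=2, init=nn.init.kaiming_normal_):
    """In-place 'convolution NN resize' init for a sub-pixel conv weight
    of shape (out_ch, in_ch, k, k), where out_ch is divisible by r^2."""
    out_ch = weight.shape[0]
    sub = torch.zeros(out_ch // upscale ** 2, *weight.shape[1:])
    init(sub)
    # Repeat each sub-kernel r^2 times so all channels feeding one
    # output pixel group start identical -> no checkerboard at init.
    weight.data.copy_(sub.repeat_interleave(upscale ** 2, dim=0))

# Usage: a conv feeding nn.PixelShuffle(2), producing 4 * 3 channels.
conv = nn.Conv2d(64, 4 * 3, kernel_size=3, padding=1)
icnr_(conv.weight, upscale=2)
```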

Proceedings ArticleDOI
Yunho Jeon1, Junmo Kim1
25 Jul 2017
TL;DR: The proposed ACU is a generalization of convolution: it can define not only all conventional convolutions but also convolutions with fractional pixel coordinates, providing greater freedom to form CNN structures.
Abstract: In recent years, deep learning has achieved great success in many computer vision applications. Convolutional neural networks (CNNs) have lately emerged as a major approach to image classification. Most research on CNNs thus far has focused on developing architectures such as the Inception and residual networks. The convolution layer is the core of the CNN, but few studies have addressed the convolution unit itself. In this paper, we introduce a convolution unit called the active convolution unit (ACU). The new convolution has no fixed shape, so we can define any form of convolution. Its shape can be learned through backpropagation during training. Our proposed unit has a few advantages. First, the ACU is a generalization of convolution: it can define not only all conventional convolutions, but also convolutions with fractional pixel coordinates. We can freely change the shape of the convolution, which provides greater freedom to form CNN structures. Second, the shape of the convolution is learned while training and there is no need to tune it by hand. Third, the ACU can learn better than a conventional unit; we obtained improvements simply by changing the conventional convolution to an ACU. We tested our proposed method on plain and residual networks, and the results showed significant improvement using our method on various datasets and architectures in comparison with the baseline. Code is available at https://github.com/jyh2986/Active-Convolution.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this article, a fully convolutional network (FCN) is proposed to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step.
Abstract: In this paper, we propose a fully convolutional network for iterative non-blind deconvolution. We decompose the non-blind deconvolution problem into image denoising and image deconvolution. We train an FCNN to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step. In contrast to the existing deep neural network based methods, we iteratively deconvolve the blurred images in a multi-stage framework. The proposed method is able to learn an adaptive image prior, which keeps both local (details) and global (structures) information. Both quantitative and qualitative evaluations on the benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms in terms of quality and speed.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: WireGuard is a secure network tunnel, operating at layer 3, implemented as a kernel virtual network interface for Linux, which aims to replace both IPsec for most use cases, as well as popular user space and/or TLS-based solutions like OpenVPN, while being more secure, more performant, and easier to use.
Abstract: WireGuard is a secure network tunnel, operating at layer 3, implemented as a kernel virtual network interface for Linux, which aims to replace both IPsec for most use cases, as well as popular user space and/or TLS-based solutions like OpenVPN, while being more secure, more performant, and easier to use. The virtual tunnel interface is based on a proposed fundamental principle of secure tunnels: an association between a peer public key and a tunnel source IP address. It uses a single round trip key exchange, based on NoiseIK, and handles all session creation transparently to the user using a novel timer state machine mechanism. Short pre-shared static keys—Curve25519 points—are used for mutual authentication in the style of OpenSSH. The protocol provides strong perfect forward secrecy in addition to a high degree of identity hiding. Transport speed is accomplished using ChaCha20Poly1305 authenticated-encryption for encapsulation of packets in UDP. An improved take on IP-binding cookies is used for mitigating denial of service attacks, improving greatly on IKEv2 and DTLS’s cookie mechanisms to add encryption and authentication. The overall design allows for allocating no resources in response to received packets, and from a systems perspective, there are multiple interesting Linux implementation techniques for queues and parallelism. Finally, WireGuard can be simply implemented for Linux in less than 4,000 lines of code, making it easily audited and verified.

Proceedings ArticleDOI
10 Jul 2017
TL;DR: In this article, the authors proposed a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col, which eliminates the need for data replication on the input.
Abstract: Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally-intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col. Our algorithm eliminates the need for data replication on the input thereby enabling us to apply the convolution kernels on the input images directly. We have implemented several variants of our algorithm on a CPU processor and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
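For reference, a numpy sketch of the im2col step that this paper eliminates: every k x k patch is copied into a column, inflating memory by roughly a factor of k·k, after which the convolution reduces to a single GEMM against the flattened kernels. Names are illustrative.

```python
import numpy as np

def im2col(x, k):
    """x: (C, H, W) -> (C*k*k, H_out*W_out) column matrix (no padding).
    Each column holds one k x k multi-channel patch."""
    C, H, W = x.shape
    Ho, Wo = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, Ho * Wo))
    row = 0
    for c in range(C):
        for i in range(k):
            for j in range(k):
                cols[row] = x[c, i:i + Ho, j:j + Wo].ravel()
                row += 1
    return cols

# MCMK convolution as one GEMM, for M kernels of shape (C, k, k):
# out = kernels.reshape(M, -1) @ im2col(x, k)   # (M, H_out*W_out)
```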

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a novel deep filter based on a Generative Adversarial Network architecture, integrated with a global skip connection and dense architecture, which outperforms state-of-the-art blind deblurring algorithms both quantitatively and qualitatively.
Abstract: Removing blur caused by camera shake in images has always been a challenging problem in the computer vision literature due to its ill-posed nature. Motion blur caused by the relative motion between the camera and the object in 3D space induces a spatially varying blurring effect over the entire image. In this paper, we propose a novel deep filter based on a Generative Adversarial Network (GAN) architecture integrated with a global skip connection and dense architecture in order to tackle this problem. Our model, while bypassing the process of blur kernel estimation, significantly reduces the test time, which is necessary for practical applications. The experiments on the benchmark datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art blind deblurring algorithms both quantitatively and qualitatively.