Showing papers on "Kernel (image processing)" published in 2017


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes a multi-scale convolutional neural network that restores sharp images in an end-to-end manner when blur is caused by various sources, and presents a new large-scale dataset of pairs of realistic blurry images and corresponding ground-truth sharp images obtained with a high-speed camera.
Abstract: Non-uniform blind deblurring for general dynamic scenes is a challenging computer vision problem, as blurs arise not only from multiple object motions but also from camera shake and scene depth variation. To remove these complicated motion blurs, conventional energy optimization based methods rely on simple assumptions, such as the blur kernel being partially uniform or locally linear. Moreover, recent machine learning based methods also depend on synthetic blur datasets generated under these assumptions. This makes conventional deblurring methods fail to remove blurs whose kernels are difficult to approximate or parameterize (e.g. at object motion boundaries). In this work, we propose a multi-scale convolutional neural network that restores sharp images in an end-to-end manner when blur is caused by various sources. In addition, we present a multi-scale loss function that mimics conventional coarse-to-fine approaches. Furthermore, we propose a new large-scale dataset that provides pairs of realistic blurry images and the corresponding ground-truth sharp images obtained with a high-speed camera. With the proposed model trained on this dataset, we demonstrate empirically that our method achieves state-of-the-art performance in dynamic scene deblurring, not only qualitatively but also quantitatively.
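To make the coarse-to-fine idea concrete, here is a minimal PyTorch sketch of a multi-scale loss of the kind the abstract describes; the function name, pyramid depth, and use of plain MSE are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_scale_loss(predictions, sharp, num_scales=3):
    """MSE between predicted and ground-truth sharp images at each scale.

    predictions: list of network outputs, coarsest to finest.
    sharp: full-resolution ground-truth sharp image (B, C, H, W).
    """
    loss = 0.0
    for k, pred in enumerate(predictions):
        scale = 2 ** (num_scales - 1 - k)                 # e.g. 4, 2, 1
        target = sharp if scale == 1 else F.interpolate(
            sharp, scale_factor=1.0 / scale,
            mode='bilinear', align_corners=False)
        loss = loss + F.mse_loss(pred, target)
    return loss / len(predictions)
```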

1,560 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results, and gauges the state of the art in single image super-resolution.
Abstract: This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution image dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable from low- and high-resolution training images. Each competition had ∼100 registered participants, and 20 teams competed in the final testing phase. Together, the results gauge the state of the art in single image super-resolution.

1,243 citations


Proceedings ArticleDOI
Chao Peng1, Xiangyu Zhang, Gang Yu, Guiming Luo1, Jian Sun 
21 Jul 2017
TL;DR: This work proposes a Global Convolutional Network to address both the classification and localization issues in semantic segmentation, and suggests a residual-based boundary refinement to further refine the object boundaries.
Abstract: One of the recent trends [31, 32, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) throughout the entire network, because stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues in semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-the-art performance on two public benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.
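The large-kernel idea can be sketched as follows in PyTorch: a k x k kernel is approximated by two parallel stacks of 1D convolutions, keeping a large effective receptive field at O(k) cost. Class and parameter names are illustrative; the paper's full network adds boundary refinement and a decoder on top of such blocks.

```python
import torch
import torch.nn as nn

class GCN(nn.Module):
    """Global Convolutional Network block: a large k x k kernel is
    approximated by two parallel stacks of 1D convolutions."""
    def __init__(self, in_ch, out_ch, k=15):
        super().__init__()
        p = k // 2
        self.left = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.right = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.left(x) + self.right(x)
```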

1,047 citations


Posted Content
Chao Peng1, Xiangyu Zhang, Gang Yu, Guiming Luo1, Jian Sun 
TL;DR: In this paper, a Global Convolutional Network (GCN) is proposed to address both the classification and localization issues in semantic segmentation, achieving state-of-the-art performance on two public benchmarks.
Abstract: One of the recent trends [30, 31, 14] in network architecture design is stacking small filters (e.g., 1x1 or 3x3) throughout the entire network, because stacked small filters are more efficient than a large kernel, given the same computational complexity. However, in the field of semantic segmentation, where we need to perform dense per-pixel prediction, we find that the large kernel (and effective receptive field) plays an important role when we have to perform the classification and localization tasks simultaneously. Following our design principle, we propose a Global Convolutional Network to address both the classification and localization issues in semantic segmentation. We also suggest a residual-based boundary refinement to further refine the object boundaries. Our approach achieves state-of-the-art performance on two public benchmarks and significantly outperforms previous results: 82.2% (vs 80.2%) on the PASCAL VOC 2012 dataset and 76.9% (vs 71.8%) on the Cityscapes dataset.

935 citations


Journal ArticleDOI
TL;DR: In this article, the author presents an updated summary of the penalized pixel-fitting (pPXF) method, which is used to extract the stellar and gas kinematics, as well as the stellar population of galaxies, via full spectrum fitting.
Abstract: I start by providing an updated summary of the penalized pixel-fitting (pPXF) method, which is used to extract the stellar and gas kinematics, as well as the stellar population of galaxies, via full spectrum fitting. I then focus on the problem of extracting the kinematics when the velocity dispersion $\sigma$ is smaller than the velocity sampling $\Delta V$, which is generally, by design, close to the instrumental dispersion $\sigma_{\rm inst}$. The standard approach consists of convolving templates with a discretized kernel while fitting for its parameters. This is obviously very inaccurate when $\sigma<\Delta V/2$, due to undersampling. Oversampling can prevent this, but it has drawbacks. Here I present a more accurate and efficient alternative. It avoids the evaluation of the under-sampled kernel and instead directly computes its well-sampled analytic Fourier transform, for use with the convolution theorem. A simple analytic transform exists when the kernel is described by the popular Gauss-Hermite parametrization (which includes the Gaussian as a special case) for the line-of-sight velocity distribution. I describe how this idea was implemented in a significant upgrade to the publicly available pPXF software. The key advantage of the new approach is that it provides accurate velocities regardless of $\sigma$. This is important, e.g., for spectroscopic surveys targeting galaxies with $\sigma\ll\sigma_{\rm inst}$, for galaxy redshift determinations, or for measuring line-of-sight velocities of individual stars. The proposed method could also be used to fix Gaussian convolution algorithms used in today's popular software packages.
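The core trick is straightforward to sketch in numpy for the purely Gaussian case: rather than discretizing the kernel, evaluate its analytic Fourier transform and apply the convolution theorem. Function and variable names below are illustrative and do not mirror pPXF's actual API.

```python
import numpy as np

def gaussian_broaden(template, v_pix, sigma_pix):
    """Shift a spectrum by v_pix and broaden it by sigma_pix
    (both in pixel units) via the analytic FT of the Gaussian."""
    n = template.size
    ft = np.fft.rfft(template)
    f = np.fft.rfftfreq(n)                      # cycles per pixel
    # FT of a Gaussian of mean v_pix and dispersion sigma_pix:
    # exp(-2*pi*i*f*V) * exp(-2*pi^2*f^2*sigma^2) -- well sampled
    # in frequency even when sigma_pix < 1.
    kernel_ft = np.exp(-2j * np.pi * f * v_pix
                       - 2 * (np.pi * f * sigma_pix) ** 2)
    return np.fft.irfft(ft * kernel_ft, n)
```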

866 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, a deep fully convolutional neural network is proposed to estimate pairs of 1D kernels for all pixels simultaneously, which allows for the incorporation of perceptual loss to train the network to produce visually pleasing frames.
Abstract: Standard video frame interpolation methods first estimate optical flow between input frames and then synthesize an intermediate frame guided by motion. Recent approaches merge these two steps into a single convolution process by convolving input frames with spatially adaptive kernels that account for motion and re-sampling simultaneously. These methods require large kernels to handle large motion, which limits the number of pixels whose kernels can be estimated at once due to the large memory demand. To address this problem, this paper formulates frame interpolation as local separable convolution over input frames using pairs of 1D kernels. Compared to regular 2D kernels, the 1D kernels require significantly fewer parameters to be estimated. We develop a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously. Since our method is able to estimate kernels and synthesize the whole video frame at once, it allows for the incorporation of perceptual loss to train the neural network to produce visually pleasing frames. This deep neural network is trained end-to-end using widely available video data without any human annotation. Both qualitative and quantitative experiments show that our method provides a practical solution to high-quality video frame interpolation.
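A toy numpy sketch of the separable formulation for a single output pixel: each pixel's 2D kernel is the outer product of a predicted vertical and horizontal 1D kernel, one pair per input frame. Names and shapes are illustrative, not the paper's code.

```python
import numpy as np

def synthesize_pixel(patch1, patch2, kv1, kh1, kv2, kh2):
    """patchN: (k, k) neighborhoods around the pixel in frames 1 and 2.
    kvN, khN: length-k vertical/horizontal kernels predicted per pixel."""
    k1 = np.outer(kv1, kh1)     # (k, k) kernel from two 1D kernels
    k2 = np.outer(kv2, kh2)     # 2k parameters instead of k*k
    return np.sum(k1 * patch1) + np.sum(k2 * patch2)
```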

616 citations


Proceedings ArticleDOI
07 Aug 2017
TL;DR: K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.
Abstract: This paper proposes K-NRM, a kernel based neural model for document ranking. Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score. The whole model is trained end-to-end. The ranking layer learns desired feature patterns from the pairwise ranking loss. The kernels transfer the feature patterns into soft-match targets at each similarity level and enforce them on the translation matrix. The word embeddings are tuned accordingly so that they can produce the desired soft matches. Experiments on a commercial search engine's query log demonstrate the improvements of K-NRM over the prior feature-based and neural state of the art, and explain the source of K-NRM's advantage: its kernel-guided embedding encodes a similarity metric tailored for matching query words to document words, and provides effective multi-level soft matches.
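The kernel-pooling step is compact enough to sketch directly in numpy, following the design the abstract describes; the kernel means and widths passed in are illustrative choices.

```python
import numpy as np

def kernel_pooling(M, mus, sigmas):
    """M: translation matrix, M[i, j] = cosine(query_word_i, doc_word_j).
    Returns one soft-match feature per RBF kernel."""
    feats = []
    for mu, sig in zip(mus, sigmas):
        # K_k(M_i) = sum_j exp(-(M_ij - mu_k)^2 / (2 sigma_k^2))
        K = np.exp(-(M - mu) ** 2 / (2 * sig ** 2)).sum(axis=1)
        # log-sum over query words gives the pooled feature phi_k
        feats.append(np.log(np.clip(K, 1e-10, None)).sum())
    return np.array(feats)

# A learning-to-rank layer then scores: f(q, d) = tanh(w . phi + b)
```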

572 citations


Proceedings ArticleDOI
21 Jul 2017
TL;DR: This work directly estimates the motion flow from the blurred image through a fully convolutional deep neural network (FCN) and recovers the unblurred image from the estimated motion flow; it is the first universal end-to-end mapping from the blurred image to the dense motion flow.
Abstract: Removing pixel-wise heterogeneous motion blur is challenging due to the ill-posed nature of the problem. The predominant solution is to estimate the blur kernel by adding a prior, but extensive literature on the subject indicates the difficulty in identifying a prior which is suitably informative and general. Rather than imposing a prior based on theory, we propose instead to learn one from the data. Learning a prior over the latent image would require modeling all possible image content. The critical observation underpinning our approach, however, is that learning the motion flow instead allows the model to focus on the cause of the blur, irrespective of the image content. This is a much easier learning task, but it also avoids the iterative process through which latent image priors are typically applied. Our approach directly estimates the motion flow from the blurred image through a fully-convolutional deep neural network (FCN) and recovers the unblurred image from the estimated motion flow. Our FCN is the first universal end-to-end mapping from the blurred image to the dense motion flow. To train the FCN, we simulate motion flows to generate synthetic blurred-image/motion-flow pairs, thus avoiding the need for human labeling. Extensive experiments on challenging realistic blurred images demonstrate that the proposed method outperforms the state-of-the-art.

354 citations


Proceedings ArticleDOI
Yin Cui1, Feng Zhou2, Jiang Wang3, Xiao Liu2, Yuanqing Lin2, Serge Belongie1 
21 Jul 2017
TL;DR: This work demonstrates how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner and proposes a general pooling framework that captures higher order interactions of features in the form of kernels.
Abstract: Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling of pairwise (2nd order) feature interactions. In this work, we propose a general pooling framework that captures higher order interactions of features in the form of kernels. We demonstrate how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner. Combined with CNNs, the composition of the kernel can be learned from data in an end-to-end fashion via error back-propagation. The proposed kernel pooling scheme is evaluated in terms of both kernel approximation error and visual recognition accuracy. Experimental evaluations demonstrate state-of-the-art performance on commonly used fine-grained recognition datasets.

344 citations


Proceedings ArticleDOI
27 Mar 2017
TL;DR: In this article, a StyleBank is proposed for neural image style transfer, composed of multiple convolution filter banks, each of which explicitly represents one style; the auto-encoder does not encode any style information thanks to the flexibility introduced by the explicit style representation.
Abstract: We propose StyleBank, which is composed of multiple convolution filter banks, each of which explicitly represents one style, for neural image style transfer. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate feature embedding produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learnt, where the learning is conducted in such a way that the auto-encoder does not encode any style information, thanks to the flexibility introduced by the explicit filter bank representation. This also enables us to conduct incremental learning to add a new image style by learning a new filter bank while holding the auto-encoder fixed. The explicit style representation along with the flexible network design enables us to fuse styles at not only the image level, but also the region level. Our method is the first style transfer network that links back to traditional texton mapping methods, and hence provides new understanding of neural style transfer. Our method is easy to train, runs in real-time, and produces results that are qualitatively better than or at least comparable to existing methods.

307 citations


Posted Content
TL;DR: This paper presents a robust video frame interpolation method that considers pixel synthesis for the interpolated frame as local convolution over two input frames and employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel.
Abstract: Video frame interpolation typically involves two steps: motion estimation and pixel synthesis. Such a two-step approach heavily depends on the quality of motion estimation. This paper presents a robust video frame interpolation method that combines these two steps into a single process. Specifically, our method considers pixel synthesis for the interpolated frame as local convolution over two input frames. The convolution kernel captures both the local motion between the input frames and the coefficients for pixel synthesis. Our method employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel. This deep neural network can be directly trained end to end using widely available video data without any difficult-to-obtain ground-truth data like optical flow. Our experiments show that the formulation of video interpolation as a single convolution process allows our method to gracefully handle challenges like occlusion, blur, and abrupt brightness change and enables high-quality video frame interpolation.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this paper, a deep fully convolutional neural network is proposed to estimate a spatially-adaptive convolution kernel for each pixel, which captures both the local motion between the input frames and the coefficients for pixel synthesis.
Abstract: Video frame interpolation typically involves two steps: motion estimation and pixel synthesis. Such a two-step approach heavily depends on the quality of motion estimation. This paper presents a robust video frame interpolation method that combines these two steps into a single process. Specifically, our method considers pixel synthesis for the interpolated frame as local convolution over two input frames. The convolution kernel captures both the local motion between the input frames and the coefficients for pixel synthesis. Our method employs a deep fully convolutional neural network to estimate a spatially-adaptive convolution kernel for each pixel. This deep neural network can be directly trained end to end using widely available video data without any difficult-to-obtain ground-truth data like optical flow. Our experiments show that the formulation of video interpolation as a single convolution process allows our method to gracefully handle challenges like occlusion, blur, and abrupt brightness change and enables high-quality video frame interpolation.
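For contrast with the separable variant above, here is a toy numpy sketch of this spatially-adaptive convolution for one output pixel, under the assumption that the predicted kernel covers the stacked neighborhoods from both frames (names and shapes are illustrative):

```python
import numpy as np

def interpolate_pixel(patch1, patch2, kernel):
    """patchN: (k, k) neighborhoods in frames 1 and 2;
    kernel: (2k, k) predicted kernel spanning both patches,
    capturing motion and re-sampling coefficients jointly."""
    k = patch1.shape[0]
    return np.sum(kernel[:k] * patch1) + np.sum(kernel[k:] * patch2)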

Journal ArticleDOI
TL;DR: A novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture and introduces a novel, kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors.
Abstract: Regression-based algorithms have been shown to be good at denoising Monte Carlo (MC) renderings by leveraging their inexpensive by-products (e.g., feature buffers). However, when using higher-order models to handle complex cases, these techniques often overfit to noise in the input. For this reason, supervised learning methods have been proposed that train on a large collection of reference examples, but they use explicit filters that limit their denoising ability. To address these problems, we propose a novel, supervised learning approach that allows the filtering kernel to be more complex and general by leveraging a deep convolutional neural network (CNN) architecture. In one embodiment of our framework, the CNN directly predicts the final denoised pixel value as a highly non-linear combination of the input features. In a second approach, we introduce a novel, kernel-prediction network which uses the CNN to estimate the local weighting kernels used to compute each denoised pixel from its neighbors. We train and evaluate our networks on production data and observe improvements over state-of-the-art MC denoisers, showing that our methods generalize well to a variety of scenes. We conclude by analyzing various components of our architecture and identify areas of further research in deep learning for MC denoising.
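The kernel-prediction idea can be sketched in a few lines of PyTorch: the network emits k·k weights per pixel, which are normalized and applied to each pixel's neighborhood. The shapes, the softmax normalization, and the kernel size are assumptions for illustration, not the paper's exact head.

```python
import torch
import torch.nn.functional as F

def apply_predicted_kernels(noisy, weights, k=21):
    """noisy: (B, 3, H, W) render; weights: (B, k*k, H, W) from the CNN."""
    w = F.softmax(weights, dim=1)                    # normalize per pixel
    patches = F.unfold(noisy, k, padding=k // 2)     # (B, 3*k*k, H*W)
    B, _, H, W = noisy.shape
    patches = patches.view(B, 3, k * k, H, W)
    # Each denoised pixel is a weighted sum of its k x k neighborhood.
    return (patches * w.unsqueeze(1)).sum(dim=2)     # (B, 3, H, W)
```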

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper presents a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets), and discusses one representative advantage: it is wider than a regular convolution with the number of parameters and the computational complexity preserved.
Abstract: In this paper, we present a simple and modularized neural network architecture, named interleaved group convolutional neural networks (IGCNets). The main point lies in a novel building block, a pair of two successive interleaved group convolutions: primary group convolution and secondary group convolution. The two group convolutions are complementary: (i) the convolution on each partition in primary group convolution is a spatial convolution, while on each partition in secondary group convolution, the convolution is a point-wise convolution; (ii) the channels in the same secondary partition come from different primary partitions. We discuss one representative advantage: the block is wider than a regular convolution with the number of parameters and the computational complexity preserved. We also show that regular convolutions, group convolution with summation fusion, and the Xception block are special cases of interleaved group convolutions. Empirical results over standard benchmarks, CIFAR-10, CIFAR-100, SVHN and ImageNet, demonstrate that our networks use parameters and computation more efficiently, with similar or higher accuracy.
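A minimal PyTorch sketch of one interleaved group convolution block, assuming equal-sized partitions: L primary partitions of M channels each, with a channel permutation between the two group convolutions so that each secondary partition draws one channel from every primary partition. Class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class IGCBlock(nn.Module):
    def __init__(self, channels, primary_groups=4):
        super().__init__()
        self.L = primary_groups
        self.M = channels // primary_groups   # channels per primary partition
        self.primary = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=self.L)   # spatial, per partition
        self.secondary = nn.Conv2d(channels, channels, 1,
                                   groups=self.M)  # point-wise, per partition

    def forward(self, x):
        x = self.primary(x)
        B, C, H, W = x.shape
        # Interleave: channels of the same secondary partition come
        # from different primary partitions.
        x = x.view(B, self.L, self.M, H, W).transpose(1, 2).reshape(B, C, H, W)
        return self.secondary(x)
```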

Proceedings ArticleDOI
04 Dec 2017
TL;DR: The proposed method uses a Convolutional Neural Network with a custom pooling layer to optimize the feature-extraction scheme of the current best-performing algorithms, and outperforms state-of-the-art methods for both local and full-image classification.
Abstract: This paper presents a deep-learning method for distinguishing computer-generated graphics from real photographic images. The proposed method uses a Convolutional Neural Network (CNN) with a custom pooling layer to optimize the feature-extraction scheme of the current best-performing algorithms. Local estimates of class probabilities are computed and aggregated to predict the label of the whole picture. We evaluate our work on recent photo-realistic computer graphics and show that it outperforms state-of-the-art methods for both local and full-image classification.

Journal ArticleDOI
Fengbin Tu1, Shouyi Yin1, Peng Ouyang1, Shibin Tang1, Leibo Liu1, Shaojun Wei1 
TL;DR: A DCNN acceleration architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different models, which outperforms state-of-the-art designs by one to two orders of magnitude.
Abstract: Deep convolutional neural networks (DCNNs) have been successfully used in many computer vision tasks. Previous works on DCNN acceleration usually use a fixed computation pattern for diverse DCNN models, leading to an imbalance between power efficiency and performance. We solve this problem by designing a DCNN acceleration architecture called deep neural architecture (DNA), with reconfigurable computation patterns for different models. The computation pattern comprises a data reuse pattern and a convolution mapping method. For massive and different layer sizes, DNA reconfigures its data paths to support a hybrid data reuse pattern, which reduces total energy consumption by 5.9 to 8.4 times over conventional methods. For various convolution parameters, DNA reconfigures its computing resources to support a highly scalable convolution mapping method, which obtains 93% computing resource utilization on modern DCNNs. Finally, a layer-based scheduling framework is proposed to balance DNA's power efficiency and performance for different DCNNs. DNA occupies 16 mm² in a 65-nm process. On the benchmarks, it achieves 194.4 GOPS at 200 MHz and consumes only 479 mW. The system-level power efficiency is 152.9 GOPS/W (considering DRAM access power), which outperforms state-of-the-art designs by one to two orders of magnitude.

Journal ArticleDOI
TL;DR: The proposed image prior is based on distinctive properties of text images, with which an efficient optimization algorithm is developed to generate reliable intermediate results for kernel estimation and an effective method to remove artifacts for better deblurred results is presented.
Abstract: We propose a simple yet effective $L_0$ -regularized prior based on intensity and gradient for text image deblurring. The proposed image prior is based on distinctive properties of text images, with which we develop an efficient optimization algorithm to generate reliable intermediate results for kernel estimation. The proposed algorithm does not require any heuristic edge selection methods, which are critical to the state-of-the-art edge-based deblurring methods. We discuss the relationship with other edge-based deblurring methods and present how to select salient edges in a more principled manner. For the final latent image restoration step, we present an effective method to remove artifacts for better deblurred results. We show that the proposed algorithm can be extended to deblur natural images with complex scenes and low illumination, as well as to non-uniform deblurring. Experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art image deblurring methods.
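Based on the abstract, the prior places $L_0$ penalties on both image intensities and gradients. A common way to write such an objective (assumed here for illustration, not quoted from the paper) is

$$\min_{x,k}\ \|x \otimes k - y\|_2^2 + \gamma\,\|k\|_2^2 + \lambda\left(\sigma\,\|x\|_0 + \|\nabla x\|_0\right),$$

where $y$ is the blurred input, $x$ the latent text image, $k$ the blur kernel, and $\otimes$ denotes convolution; the $\|x\|_0$ term exploits the fact that clean text images are dominated by near-uniform regions.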

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A novel and general network structure for accelerating the inference of convolutional neural networks, which is more complicated in structure yet has lower inference complexity.
Abstract: In this paper, we present a novel and general network structure towards accelerating the inference process of convolutional neural networks, which is more complicated in network structure yet with less inference complexity. The core idea is to equip each original convolutional layer with another low-cost collaborative layer (LCCL), and the element-wise multiplication of the ReLU outputs of these two parallel layers produces the layer-wise output. The combined layer is potentially more discriminative than the original convolutional layer, and its inference is faster for two reasons: 1) the zero cells of the LCCL feature maps will remain zero after element-wise multiplication, and thus it is safe to skip the calculation of the corresponding high-cost convolution in the original convolutional layer; 2) LCCL is very fast if it is implemented as a 1×1 convolution or only a single filter shared by all channels. Extensive experiments on the CIFAR-10, CIFAR-100 and ILSVRC-2012 benchmarks show that our proposed network structure can accelerate the inference process by 32% on average with negligible performance drop.
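A minimal PyTorch sketch of a convolution equipped with a low-cost collaborative layer; in this naive version both branches are always computed, whereas the paper's point is that an optimized implementation can skip the expensive convolution wherever the cheap branch's ReLU output is zero. Names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LCCLConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.main = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.lccl = nn.Conv2d(in_ch, out_ch, 1)   # low-cost 1x1 branch

    def forward(self, x):
        mask = F.relu(self.lccl(x))
        # A real implementation would evaluate self.main only where
        # mask > 0; zeros in mask zero out the product anyway.
        return F.relu(self.main(x)) * mask
```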

Journal ArticleDOI
TL;DR: This paper analyzes and evaluates different MKL algorithms and their respective characteristics in different HSI classification cases, and discusses future directions and trends of research in this area.
Abstract: With the rapid development of spectral imaging techniques, classification of hyperspectral images (HSIs) has attracted great attention in various applications such as land survey and resource monitoring in the field of remote sensing. A key challenge in HSI classification is how to explore effective approaches to fully use the spatial–spectral information provided by the data cube. Multiple kernel learning (MKL) has been successfully applied to HSI classification due to its capacity to handle heterogeneous fusion of both spectral and spatial features. This approach can generate an adaptive kernel as an optimally weighted sum of a few fixed kernels to model a nonlinear data structure. In this way, the difficulty of kernel selection and the limitation of a fixed kernel can be alleviated. Various MKL algorithms have been developed in recent years, such as the general MKL, the subspace MKL, the nonlinear MKL, the sparse MKL, and the ensemble MKL. The goal of this paper is to provide a systematic review of MKL methods, which have been applied to HSI classification. We also analyze and evaluate different MKL algorithms and their respective characteristics in different HSI classification cases. Finally, we discuss future directions and trends of research in this area.
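The common core of these methods fits in a few lines of numpy: the adaptive kernel is a weighted sum of fixed base kernels, here RBF kernels at several bandwidths. The weights d are what the surveyed algorithms learn in different ways; values and names below are illustrative.

```python
import numpy as np

def rbf(X, Y, gamma):
    """RBF base kernel between rows of X (n, d) and Y (m, d)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(X, Y, gammas, d):
    """Adaptive kernel K = sum_m d_m K_m, with d_m >= 0, sum(d) == 1."""
    return sum(w * rbf(X, Y, g) for w, g in zip(d, gammas))
```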

Proceedings ArticleDOI
18 Jun 2017
TL;DR: This paper proposes a fusion architecture that can fuse multiple layers naturally in CNNs, reusing the intermediate data, and designs an optimal algorithm to determine the fusion and algorithm strategy for each layer.
Abstract: Convolutional neural network (CNN) finds applications in a variety of computer vision tasks ranging from object recognition and detection to scene understanding, owing to its exceptional accuracy. There exist different algorithms for CNN computation. In this paper, we compare the conventional convolution algorithm with a faster algorithm based on Winograd's minimal filtering theory for efficient FPGA implementation. Distinct from the conventional convolution algorithm, the Winograd algorithm uses fewer computing resources but puts more pressure on the memory bandwidth. We first propose a fusion architecture that can fuse multiple layers naturally in CNNs, reusing the intermediate data. Based on this fusion architecture, we explore heterogeneous algorithms to maximize the throughput of a CNN. We design an optimal algorithm to determine the fusion and algorithm strategy for each layer. We also develop an automated toolchain to ease the mapping from Caffe model to FPGA bitstream using Vivado HLS. Experiments using the widely used VGG and AlexNet demonstrate that our design achieves up to 1.99× performance speedup compared to the prior fusion-based FPGA accelerator for CNNs.
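Winograd's minimal filtering is easy to sketch in numpy for the standard F(2,3) case: two outputs of a 3-tap filter from 4 multiplications instead of 6, using the usual transform matrices. This illustrates the trade the paper exploits (fewer multiplications, more data movement); it is not the authors' FPGA code.

```python
import numpy as np

B_T = np.array([[1, 0, -1,  0],
                [0, 1,  1,  0],
                [0, -1, 1,  0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1,    0,   0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0,    0,   1]], dtype=float)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 filter outputs,
    using 4 elementwise multiplications."""
    return A_T @ ((G @ g) * (B_T @ d))

d = np.arange(4.0)
g = np.array([1.0, 2.0, 3.0])
assert np.allclose(winograd_f23(d, g), np.correlate(d, g, mode='valid'))
```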

Posted Content
TL;DR: By finetuning this network, the proposed video convolutional network T3D outperforms generic and recent 3D CNN methods that were trained on large video datasets and finetuned on the target datasets, e.g. HMDB51/UCF101.
Abstract: The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches with fixed temporal convolution kernel depths. We introduce a new temporal layer that models variable temporal convolution kernel depths. We embed this new temporal layer in our proposed 3D CNN. We extend the DenseNet architecture - which normally is 2D - with 3D filters and pooling kernels. We name our proposed video convolutional network 'Temporal 3D ConvNet' (T3D) and its new temporal layer 'Temporal Transition Layer' (TTL). Our experiments show that T3D outperforms the current state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. Another issue in training 3D ConvNets is that they must be trained from scratch on a huge labeled dataset to reach reasonable performance, so the knowledge learned in 2D ConvNets is completely ignored. A further contribution of this work is a simple and effective technique to transfer knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for a stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. Thus, by finetuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101. The T3D code will be released.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, the authors propose to factorize the convolutional layer to reduce its computation, which can effectively preserve the spatial information and maintain the accuracy with significantly less computation.
Abstract: In this paper, we propose to factorize the convolutional layer to reduce its computation. The 3D convolution operation in a convolutional layer can be considered as performing spatial convolution in each channel and linear projection across channels simultaneously. By unravelling them and arranging the spatial convolutions sequentially, the proposed layer is composed of a low-cost single intra-channel convolution and a linear channel projection. When combined with a residual connection, it can effectively preserve the spatial information and maintain the accuracy with significantly less computation. We also introduce a topological subdivisioning to reduce the connection between the input and output channels. Our experiments demonstrate that the proposed layers outperform the standard convolutional layers on performance/complexity ratio. Our models achieve similar performance to VGG-16, ResNet-34, ResNet-50, and ResNet-101 while requiring 42×, 7.32×, 4.38×, and 5.85× less computation, respectively.
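A minimal PyTorch sketch of the basic factorization, before the paper's topological subdivisioning: a depthwise (single intra-channel) spatial convolution followed by a 1x1 linear channel projection, wrapped in a residual connection. Names are illustrative.

```python
import torch
import torch.nn as nn

class FactorizedConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # one 3x3 spatial filter per channel (intra-channel convolution)
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1,
                                 groups=channels)
        # linear projection across channels
        self.project = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return x + self.project(self.spatial(x))   # residual connection
```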

Proceedings ArticleDOI
01 Oct 2017
TL;DR: A 1D convolution neural network (CNN) based method is proposed to classify ECG signals, which achieves a promising classification accuracy of 97.5%, significantly outperforming several typical ECG classification methods.
Abstract: With the marked increase in cardiovascular disease, automatic classification of electrocardiogram (ECG) signals plays an increasingly important part in clinical diagnosis. In this paper, a 1D convolutional neural network (CNN) based method is proposed to classify ECG signals. The proposed CNN model consists of five layers in addition to the input layer and the output layer: two convolution layers, two down-sampling layers, and one fully connected layer, which extract effective features from the raw data and classify them automatically. The model classifies five typical kinds of ECG signals: normal, left bundle branch block, right bundle branch block, atrial premature contraction, and ventricular premature contraction. Experimental results on the public MIT-BIH arrhythmia database show that the proposed method achieves a promising classification accuracy of 97.5%, significantly outperforming several typical ECG classification methods.
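A PyTorch sketch matching the stated layer counts (two convolution, two down-sampling, one fully connected layer); the kernel sizes, channel widths, and input length are assumptions, since the abstract does not give them.

```python
import torch
import torch.nn as nn

class ECGNet(nn.Module):
    def __init__(self, input_len=300, num_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),                       # down-sampling layer 1
            nn.Conv1d(8, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2))                       # down-sampling layer 2
        self.classify = nn.Linear(16 * (input_len // 4), num_classes)

    def forward(self, x):            # x: (batch, 1, input_len) heartbeat
        z = self.features(x).flatten(1)
        return self.classify(z)      # logits for the 5 beat classes
```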

Posted Content
TL;DR: In this paper, a kernelized ridge regression model was proposed for robust visual tracking, where the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples.
Abstract: In this paper, we analyze the spatial information of deep features, and propose two complementary regressions for robust visual tracking. First, we propose a kernelized ridge regression model wherein the kernel value is defined as the weighted sum of similarity scores of all pairs of patches between two samples. We show that this model can be formulated as a neural network and thus can be efficiently solved. Second, we propose a fully convolutional neural network with spatially regularized kernels, through which the filter kernel corresponding to each output channel is forced to focus on a specific region of the target. Distance transform pooling is further exploited to determine the effectiveness of each output channel of the convolution layer. The outputs from the kernelized ridge regression model and the fully convolutional neural network are combined to obtain the ultimate response. Experimental results on two benchmark datasets validate the effectiveness of the proposed method.

Posted Content
TL;DR: An initialization method for sub-pixel convolution, known as convolution NN resize, which is free from checkerboard artifacts immediately after initialization and, compared to resize convolution, has more modelling power and converges to solutions with smaller test errors.
Abstract: The most prominent problem associated with the deconvolution layer is the presence of checkerboard artifacts in output images and dense labels. To combat this problem, smoothness constraints, post-processing and different architecture designs have been proposed. Odena et al. highlight three sources of checkerboard artifacts: deconvolution overlap, random initialization and loss functions. In this note, we propose an initialization method for sub-pixel convolution known as convolution NN resize. Compared to sub-pixel convolution initialized with schemes designed for standard convolution kernels, it is free from checkerboard artifacts immediately after initialization. Compared to resize convolution, at the same computational complexity, it has more modelling power and converges to solutions with smaller test errors.
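A sketch of the initialization in PyTorch, under the standard reading of the note: initialize one sub-kernel per group of r² output channels and repeat it, so that sub-pixel convolution followed by PixelShuffle initially reproduces nearest-neighbor resize followed by convolution. The helper name is ours, not from the paper.

```python
import torch
import torch.nn as nn

def icnr_(weight, upscale=2, init=nn.init.kaiming_normal_):
    """In-place 'convolution NN resize' init for a sub-pixel conv weight
    of shape (out_ch, in_ch, k, k), where out_ch is divisible by r^2."""
    out_ch = weight.shape[0]
    sub = torch.zeros(out_ch // upscale ** 2, *weight.shape[1:])
    init(sub)
    # Repeat each sub-kernel r^2 times so all channels feeding one
    # output pixel group start identical -> no checkerboard at init.
    weight.data.copy_(sub.repeat_interleave(upscale ** 2, dim=0))

# Usage: a conv feeding nn.PixelShuffle(2), producing 4 * 3 channels.
conv = nn.Conv2d(64, 4 * 3, kernel_size=3, padding=1)
icnr_(conv.weight, upscale=2)
```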

Proceedings ArticleDOI
Yunho Jeon1, Junmo Kim1
25 Jul 2017
TL;DR: The proposed ACU is a generalization of convolution: it can define not only all conventional convolutions but also convolutions with fractional pixel coordinates, providing greater freedom to form CNN structures.
Abstract: In recent years, deep learning has achieved great success in many computer vision applications. Convolutional neural networks (CNNs) have lately emerged as a major approach to image classification. Most research on CNNs thus far has focused on developing architectures such as the Inception and residual networks. The convolution layer is the core of the CNN, but few studies have addressed the convolution unit itself. In this paper, we introduce a convolution unit called the active convolution unit (ACU). The new convolution has no fixed shape, so we can define any form of convolution. Its shape can be learned through backpropagation during training. Our proposed unit has a few advantages. First, the ACU is a generalization of convolution: it can define not only all conventional convolutions, but also convolutions with fractional pixel coordinates. We can freely change the shape of the convolution, which provides greater freedom to form CNN structures. Second, the shape of the convolution is learned while training and there is no need to tune it by hand. Third, the ACU can learn better than a conventional unit; we obtained improvements simply by changing the conventional convolution to an ACU. We tested our proposed method on plain and residual networks, and the results showed significant improvement using our method on various datasets and architectures in comparison with the baseline. Code is available at https://github.com/jyh2986/Active-Convolution.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: In this article, a fully convolutional network (FCN) is proposed to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step.
Abstract: In this paper, we propose a fully convolutional network for iterative non-blind deconvolution. We decompose the non-blind deconvolution problem into image denoising and image deconvolution. We train an FCNN to remove noise in the gradient domain and use the learned gradients to guide the image deconvolution step. In contrast to the existing deep neural network based methods, we iteratively deconvolve the blurred images in a multi-stage framework. The proposed method is able to learn an adaptive image prior, which keeps both local (details) and global (structures) information. Both quantitative and qualitative evaluations on the benchmark datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms in terms of quality and speed.

Proceedings ArticleDOI
01 Jan 2017
TL;DR: WireGuard is a secure network tunnel, operating at layer 3, implemented as a kernel virtual network interface for Linux, which aims to replace both IPsec for most use cases, as well as popular user space and/or TLS-based solutions like OpenVPN, while being more secure, more performant, and easier to use.
Abstract: WireGuard is a secure network tunnel, operating at layer 3, implemented as a kernel virtual network interface for Linux, which aims to replace both IPsec for most use cases, as well as popular user space and/or TLS-based solutions like OpenVPN, while being more secure, more performant, and easier to use. The virtual tunnel interface is based on a proposed fundamental principle of secure tunnels: an association between a peer public key and a tunnel source IP address. It uses a single round trip key exchange, based on NoiseIK, and handles all session creation transparently to the user using a novel timer state machine mechanism. Short pre-shared static keys—Curve25519 points—are used for mutual authentication in the style of OpenSSH. The protocol provides strong perfect forward secrecy in addition to a high degree of identity hiding. Transport speed is accomplished using ChaCha20Poly1305 authenticated-encryption for encapsulation of packets in UDP. An improved take on IP-binding cookies is used for mitigating denial of service attacks, improving greatly on IKEv2 and DTLS’s cookie mechanisms to add encryption and authentication. The overall design allows for allocating no resources in response to received packets, and from a systems perspective, there are multiple interesting Linux implementation techniques for queues and parallelism. Finally, WireGuard can be simply implemented for Linux in less than 4,000 lines of code, making it easily audited and verified.

Proceedings ArticleDOI
10 Jul 2017
TL;DR: In this article, the authors proposed a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col, which eliminates the need for data replication on the input.
Abstract: Convolutional neural networks (CNNs) have emerged as one of the most successful machine learning technologies for image and video processing. The most computationally-intensive parts of CNNs are the convolutional layers, which convolve multi-channel images with multiple kernels. A common approach to implementing convolutional layers is to expand the image into a column matrix (im2col) and perform Multiple Channel Multiple Kernel (MCMK) convolution using an existing parallel General Matrix Multiplication (GEMM) library. This im2col conversion greatly increases the memory footprint of the input matrix and reduces data locality. In this paper we propose a new approach to MCMK convolution that is based on General Matrix Multiplication (GEMM), but not on im2col. Our algorithm eliminates the need for data replication on the input thereby enabling us to apply the convolution kernels on the input images directly. We have implemented several variants of our algorithm on a CPU processor and an embedded ARM processor. On the CPU, our algorithm is faster than im2col in most cases.
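For reference, a numpy sketch of the im2col step that this paper eliminates: every k x k patch is copied into a column, inflating memory by roughly a factor of k·k, after which the convolution reduces to a single GEMM against the flattened kernels. Names are illustrative.

```python
import numpy as np

def im2col(x, k):
    """x: (C, H, W) -> (C*k*k, H_out*W_out) column matrix (no padding).
    Each column holds one k x k multi-channel patch."""
    C, H, W = x.shape
    Ho, Wo = H - k + 1, W - k + 1
    cols = np.empty((C * k * k, Ho * Wo))
    row = 0
    for c in range(C):
        for i in range(k):
            for j in range(k):
                cols[row] = x[c, i:i + Ho, j:j + Wo].ravel()
                row += 1
    return cols

# MCMK convolution as one GEMM, for M kernels of shape (C, k, k):
# out = kernels.reshape(M, -1) @ im2col(x, k)   # (M, H_out*W_out)
```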

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a novel deep filter based on a Generative Adversarial Network architecture, integrated with a global skip connection and dense architecture, which outperforms state-of-the-art blind deblurring algorithms both quantitatively and qualitatively.
Abstract: Removing blur caused by camera shake in images has always been a challenging problem in the computer vision literature due to its ill-posed nature. Motion blur caused by the relative motion between the camera and the object in 3D space induces a spatially varying blurring effect over the entire image. In this paper, we propose a novel deep filter based on a Generative Adversarial Network (GAN) architecture integrated with a global skip connection and dense architecture in order to tackle this problem. Our model, while bypassing the process of blur kernel estimation, significantly reduces the test time, which is necessary for practical applications. The experiments on the benchmark datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art blind deblurring algorithms both quantitatively and qualitatively.