Showing papers on "Kernel (image processing)" published in 2020

Proceedings Article

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

Qilong Wang¹, Banggu Wu¹, Pengfei Zhu¹, Peihua Li², Wangmeng Zuo³, Qinghua Hu¹•Institutions (3)

Tianjin University¹, Dalian University of Technology², Harbin Institute of Technology³

14 Jun 2020

TL;DR: The Efficient Channel Attention (ECA) module uses a local cross-channel interaction strategy without dimensionality reduction, implemented efficiently via 1D convolution; it involves only a handful of parameters while bringing a clear performance gain.

Abstract: Recently, channel attention mechanism has demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods dedicate to developing more sophisticated attention modules for achieving better performance, which inevitably increase model complexity. To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. By dissecting the channel attention module in SENet, we empirically show avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction. The proposed ECA module is both efficient and effective, e.g., the parameters and computations of our modules against backbone of ResNet50 are 80 vs. 24.37M and 4.7e-4 GFlops vs. 3.86 GFlops, respectively, and the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with backbones of ResNets and MobileNetV2. The experimental results show our module is more efficient while performing favorably against its counterparts.
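
A minimal pure-Python sketch of the ECA idea described above: global average pooling gives one descriptor per channel, a single shared 1D kernel slides across the channel axis (no dimensionality reduction), and a sigmoid turns the result into per-channel weights. The kernel-size rule in `eca_kernel_size` (with γ = 2 and b = 1) is our understanding of the adaptive mapping used in the full paper and its reference code; treat those constants, and every function name here, as assumptions rather than the authors' implementation.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # [channels][height][width]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size: grows with log2(channels) and is forced to be odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def eca_forward(x: Tensor3D, weights: List[float]) -> Tensor3D:
    """Rescale each channel of x by an attention weight from one shared 1D kernel."""
    k = len(weights)
    pad = k // 2
    num_channels = len(x)
    # 1) Global average pooling: one scalar descriptor per channel.
    y = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # 2) 1D cross-correlation over neighbouring channels with zero padding,
    #    so each channel interacts only with its k nearest neighbours.
    attn = []
    for c in range(num_channels):
        s = 0.0
        for j, w in enumerate(weights):
            idx = c + j - pad
            if 0 <= idx < num_channels:
                s += w * y[idx]
        attn.append(sigmoid(s))
    # 3) Scale every activation of channel c by attn[c].
    return [[[v * attn[c] for v in row] for row in x[c]] for c in range(num_channels)]


if __name__ == "__main__":
    print([eca_kernel_size(c) for c in (64, 256, 512, 2048)])  # [3, 5, 5, 7]
    x = [[[1.0, 3.0]], [[2.0, 2.0]], [[0.0, 4.0]]]  # 3 channels, each 1 x 2
    print(eca_forward(x, [0.0, 0.0, 0.0]))  # zero kernel: every channel weight is 0.5
```

The only learnable parameters are the k kernel weights shared by all channels, which is why such a module adds so few parameters to a backbone.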

1,378 citations

Posted Content

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Matthew Tancik¹, Pratul P. Srinivasan¹, Ben Mildenhall¹, Sara Fridovich-Keil¹, Nithin Raghavan, Utkarsh Singhal¹, Ravi Ramamoorthi², Jonathan T. Barron³, Ren Ng¹•Institutions (3)

University of California, Berkeley¹, University of California, San Diego², Google³

18 Jun 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The authors suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs on low-dimensional regression tasks relevant to the computer vision and graphics communities.

Abstract: We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
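
A small sketch of a Fourier feature mapping, γ(v) = [cos(2πBv), sin(2πBv)]. Drawing the entries of B from a Gaussian with scale σ (which plays the role of the tunable bandwidth) is our understanding of one of the mappings in the full paper; treat it, and the function names, as assumptions. The sketch does not compute an NTK; it only checks the property behind the stationarity claim, namely that the inner product of two embeddings, Σⱼ cos(2π bⱼ·(v₁ − v₂)), depends on v₁ − v₂ alone.

```python
import math
import random
from typing import List, Sequence


def sample_gaussian_B(num_features: int, dim: int, sigma: float, seed: int = 0) -> List[List[float]]:
    """Draw an (m x d) frequency matrix with i.i.d. N(0, sigma^2) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(num_features)]


def fourier_features(v: Sequence[float], B: List[List[float]]) -> List[float]:
    """gamma(v) = [cos(2*pi*Bv), sin(2*pi*Bv)], a 2m-dimensional embedding of v."""
    proj = [2.0 * math.pi * sum(b_i * v_i for b_i, v_i in zip(row, v)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]


def feature_kernel(v1: Sequence[float], v2: Sequence[float], B: List[List[float]]) -> float:
    """Inner product of two embeddings; equals sum_j cos(2*pi*b_j . (v1 - v2))."""
    return sum(a * b for a, b in zip(fourier_features(v1, B), fourier_features(v2, B)))


if __name__ == "__main__":
    B = sample_gaussian_B(num_features=16, dim=2, sigma=10.0)
    print(len(fourier_features([0.1, 0.2], B)))  # 32
    # Shifting both inputs by the same offset (0.5, 0.5) leaves the kernel unchanged.
    print(feature_kernel([0.1, 0.2], [0.3, 0.1], B), feature_kernel([0.6, 0.7], [0.8, 0.6], B))
```

A larger σ spreads the sampled frequencies out, so the induced kernel becomes narrower and the downstream MLP can fit finer detail.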

787 citations

Posted Content

Fourier Neural Operator for Parametric Partial Differential Equations

Zongyi Li¹, Nikola B. Kovachki¹, Kamyar Azizzadenesheli², Burigede Liu¹, Kaushik Bhattacharya¹, Andrew M. Stuart¹, Animashree Anandkumar¹•Institutions (2)

California Institute of Technology¹, Purdue University²

18 Oct 2020-arXiv: Learning

TL;DR: This work formulates a new neural operator by parameterizing the integral kernel directly in Fourier space, yielding an expressive and efficient architecture that shows state-of-the-art performance compared to existing neural network methodologies.

Abstract: The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an entire family of PDEs, in contrast to classical methods which solve one instance of the equation. In this work, we formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. We perform experiments on Burgers' equation, Darcy flow, and Navier-Stokes equation. The Fourier neural operator is the first ML-based method to successfully model turbulent flows with zero-shot super-resolution. It is up to three orders of magnitude faster compared to traditional PDE solvers. Additionally, it achieves superior accuracy compared to previous learning-based solvers under fixed resolution.
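
A single-channel 1D sketch of a spectral convolution, the operation obtained by parameterizing the integral kernel in Fourier space: transform the input, keep only the lowest `modes` frequencies, multiply them by learned complex weights, and transform back. A real Fourier neural operator layer uses FFTs, many channels (a weight matrix per mode), a pointwise linear path and a nonlinearity; the naive O(n²) DFT and scalar weights here are simplifications, and the function names are illustrative.

```python
import cmath
import math
from typing import List


def dft(x: List[complex]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X: List[complex]) -> List[complex]:
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def spectral_conv_1d(u: List[float], weights: List[complex], modes: int) -> List[float]:
    """Multiply the lowest `modes` Fourier coefficients by learned weights; drop the rest."""
    n = len(u)
    assert len(weights) == modes and 1 <= modes <= n // 2
    U = dft([complex(v) for v in u])
    out = [0j] * n
    for k in range(modes):
        out[k] = weights[k] * U[k]
        if k > 0:
            # Treat frequency -k (index n - k) with the conjugate weight, so a real
            # input gives a real output (weights[0] should be real for the same reason).
            out[n - k] = weights[k].conjugate() * U[n - k]
    return [z.real for z in idft(out)]


if __name__ == "__main__":
    n = 8
    u = [math.cos(2 * math.pi * t / n) + math.cos(2 * math.pi * 3 * t / n) for t in range(n)]
    # Keeping two modes passes frequency 1 (doubled by the weight) and removes frequency 3.
    print([round(v, 6) for v in spectral_conv_1d(u, [1 + 0j, 2 + 0j], modes=2)])
```

Because the learned weights are indexed by frequency rather than by grid point, the same weights can be applied to inputs sampled on a finer grid, which is one way to read the zero-shot super-resolution claim.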

762 citations

Journal Article

Meltdown: reading kernel memory from user space

Moritz Lipp¹, Michael Schwarz¹, Daniel Gruss¹, Thomas Prescher, Werner Haas, Jann Horn², Stefan Mangard¹, Paul C. Kocher, Daniel Genkin³, Yuval Yarom⁴, Mike Hamburg⁵, Raoul Strackx•Institutions (5)

Graz University of Technology¹, Google², University of Michigan³, University of Adelaide⁴, Cryptography Research⁵

21 May 2020-Communications of The ACM

TL;DR: Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations, including personal data and passwords, and does not rely on any software vulnerabilities.

Abstract: The security of computer systems fundamentally relies on memory isolation, e.g., kernel address ranges are marked as non-accessible and are protected from user access. In this paper, we present Meltdown. Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations including personal data and passwords. Out-of-order execution is an indispensable performance feature and present in a wide range of modern processors. The attack is independent of the operating system, and it does not rely on any software vulnerabilities. Meltdown breaks all security guarantees provided by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation. On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer. We show that the KAISER defense mechanism for KASLR has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.

497 citations

Proceedings Article

Dynamic Convolution: Attention Over Convolution Kernels

Yinpeng Chen¹, Xiyang Dai¹, Mengchen Liu¹, Dongdong Chen¹, Lu Yuan¹, Zicheng Liu¹•Institutions (1)

Microsoft¹

14 Jun 2020

TL;DR: Dynamic convolution aggregates multiple parallel convolution kernels dynamically, weighting them by input-dependent attentions, to increase model complexity without increasing the network depth or width.

Abstract: Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
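
A 1D, single-channel sketch of attention over K kernels: the input is pooled, mapped to K logits, normalized with a softmax, and the K kernels are mixed into one kernel before a single convolution. The one-layer attention branch, its parameter names (`attn_weights`, `attn_bias`) and the temperature argument are hypothetical simplifications; the paper's attention branch and 2D kernels are more elaborate.

```python
import math
from typing import List, Tuple


def softmax(z: List[float], temperature: float = 1.0) -> List[float]:
    m = max(z)
    e = [math.exp((v - m) / temperature) for v in z]
    s = sum(e)
    return [v / s for v in e]


def conv1d_valid(x: List[float], w: List[float]) -> List[float]:
    """'Valid' 1D cross-correlation, as computed by most deep-learning frameworks."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def dynamic_conv1d(x: List[float], kernels: List[List[float]], attn_weights: List[float],
                   attn_bias: List[float], temperature: float = 1.0) -> Tuple[List[float], List[float]]:
    """Aggregate K kernels with input-dependent attention, then convolve once."""
    # Attention branch (hypothetical): global average -> per-kernel affine map -> softmax.
    pooled = sum(x) / len(x)
    logits = [a * pooled + b for a, b in zip(attn_weights, attn_bias)]
    pi = softmax(logits, temperature)
    # Aggregated kernel W~ = sum_k pi_k * W_k; mixing K small kernels is cheap.
    size = len(kernels[0])
    agg = [sum(p * kern[j] for p, kern in zip(pi, kernels)) for j in range(size)]
    return conv1d_valid(x, agg), pi


if __name__ == "__main__":
    kernels = [[1.0, 0.0, -1.0], [0.2, 0.5, 0.2], [0.0, 1.0, 0.0]]
    y, pi = dynamic_conv1d([1.0, 2.0, 0.5, -1.0, 3.0], kernels, [0.5, -0.3, 1.0], [0.0, 0.1, -0.2])
    print(pi, y)
```

For a fixed input, convolving with the mixed kernel equals the attention-weighted sum of the K separate convolutions, so only one convolution is paid for; the mapping is still nonlinear overall because the attention weights depend on the input.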

291 citations

Journal Article

Multiscale Dynamic Graph Convolutional Network for Hyperspectral Image Classification

Sheng Wan¹, Chen Gong¹, Ping Zhong², Bo Du³, Lefei Zhang³, Jian Yang¹•Institutions (3)

Nanjing University of Science and Technology¹, National University of Defense Technology², Wuhan University³

01 May 2020-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: The proposed multiscale dynamic GCN (MDGCN) updates the graph dynamically along with the graph convolution process, so that the two steps benefit from each other and gradually produce discriminative embedded features as well as a refined graph.

Abstract: Convolutional neural network (CNN) has demonstrated impressive ability to represent hyperspectral images and to achieve promising results in hyperspectral image classification. However, traditional CNN models can only operate convolution on regular square image regions with fixed size and weights, and thus, they cannot universally adapt to the distinct local regions with various object distributions and geometric appearances. Therefore, their classification performances are still to be improved, especially in class boundaries. To alleviate this shortcoming, we consider employing the recently proposed graph convolutional network (GCN) for hyperspectral image classification, as it can conduct the convolution on arbitrarily structured non-Euclidean data and is applicable to the irregular image regions represented by graph topological information. Different from the commonly used GCN models that work on a fixed graph, we enable the graph to be dynamically updated along with the graph convolution process so that these two steps can be benefited from each other to gradually produce the discriminative embedded features as well as a refined graph. Moreover, to comprehensively deploy the multiscale information inherited by hyperspectral images, we establish multiple input graphs with different neighborhood scales to extensively exploit the diversified spectral–spatial correlations at multiple scales. Therefore, our method is termed multiscale dynamic GCN (MDGCN). The experimental results on three typical benchmark data sets firmly demonstrate the superiority of the proposed MDGCN to other state-of-the-art methods in both qualitative and quantitative aspects.
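
A minimal sketch of alternating graph convolution with graph refinement. `gcn_layer` uses the common normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). `rebuild_graph`, which reweights the edges of a fixed neighbourhood mask by a Gaussian similarity of the current embeddings, is a hypothetical stand-in for MDGCN's dynamic update, not the paper's formula; running the loop with several masks of different radii would loosely mimic the multiscale part.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalized_adjacency(A: Matrix) -> Matrix:
    """D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def gcn_layer(A: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    Z = matmul(matmul(normalized_adjacency(A), H), W)
    return [[max(0.0, v) for v in row] for row in Z]


def rebuild_graph(H: Matrix, neighbors: Matrix, sigma: float = 1.0) -> Matrix:
    """Hypothetical dynamic update: Gaussian similarity on edges allowed by `neighbors`."""
    n = len(H)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and neighbors[i][j]:
                dist2 = sum((a - b) ** 2 for a, b in zip(H[i], H[j]))
                A[i][j] = math.exp(-dist2 / sigma ** 2)
    return A


if __name__ == "__main__":
    neighbors = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # a 3-node path graph
    A = [[float(v) for v in row] for row in neighbors]
    H = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    W = [[1.0, 0.5], [-0.5, 1.0]]
    for _ in range(2):  # alternate: convolve on the current graph, then refine the graph
        H = gcn_layer(A, H, W)
        A = rebuild_graph(H, neighbors)
    print(A)
```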

270 citations

Posted Content

SOLOv2: Dynamic and Fast Instance Segmentation

Xinlong Wang¹, Rufeng Zhang², Tao Kong³, Lei Li, Chunhua Shen¹•Institutions (3)

University of Adelaide¹, Tongji University², Tsinghua University³

23 Mar 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: SOLOv2's state-of-the-art results in object detection (obtained as a byproduct of its masks) and panoptic segmentation show its potential as a strong new baseline for many instance-level recognition tasks beyond instance segmentation.

Abstract: In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: this https URL
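
A plain-Python sketch of the Matrix NMS idea: instead of suppressing masks one at a time, compute all pairwise mask IoUs at once and decay each mask's score by its overlap with higher-scoring masks, compensated by how strongly those masks are themselves overlapped. The Gaussian decay exp(-σ·IoU²) with σ = 2.0 reflects our understanding of the full paper and its reference code, not something stated in the abstract; the names are illustrative, and a real implementation replaces the loops with batched matrix operations.

```python
import math
from typing import List, Tuple


def mask_iou_upper(masks: List[List[int]]) -> List[List[float]]:
    """iou[i][j] for i < j (masks sorted by descending score); 0 elsewhere."""
    n = len(masks)
    areas = [sum(m) for m in masks]
    iou = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            inter = sum(a & b for a, b in zip(masks[i], masks[j]))
            union = areas[i] + areas[j] - inter
            iou[i][j] = inter / union if union else 0.0
    return iou


def matrix_nms(masks: List[List[int]], scores: List[float], sigma: float = 2.0) -> Tuple[List[int], List[float]]:
    """Return (sorted order, decayed scores) for flat 0/1 masks, using Gaussian decay."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    masks = [masks[i] for i in order]
    scores = [scores[i] for i in order]
    n = len(masks)
    iou = mask_iou_upper(masks)
    # comp[i]: largest IoU of mask i with any higher-scoring mask (how suppressed i is).
    comp = [max((iou[k][i] for k in range(i)), default=0.0) for i in range(n)]
    decayed = []
    for j in range(n):
        # exp(-s*iou_ij^2) / exp(-s*comp_i^2), minimised over higher-scoring masks i.
        decay = min((math.exp(-sigma * (iou[i][j] ** 2 - comp[i] ** 2)) for i in range(j)), default=1.0)
        decayed.append(scores[j] * decay)
    return order, decayed


if __name__ == "__main__":
    a, b, c = [1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]  # b duplicates a; c is disjoint
    print(matrix_nms([c, a, b], [0.7, 0.9, 0.8]))
```

The duplicate's score is decayed to 0.8·e⁻², while the disjoint mask keeps its score of 0.7; every decay factor comes from the same IoU matrix, which is what lets the real method run in one shot.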

261 citations

Proceedings Article

Improving Convolutional Networks With Self-Calibrated Convolutions

Jiang-Jiang Liu¹, Qibin Hou², Ming-Ming Cheng¹, Changhu Wang, Jiashi Feng²•Institutions (2)

Nankai University¹, National University of Singapore²

14 Jun 2020

TL;DR: A novel self-calibrated convolution explicitly expands the fields of view of each convolutional layer through internal communications, enriching the output features and helping CNNs generate more discriminative representations by incorporating richer information.

Abstract: Recent advances on CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present a novel self-calibrated convolutions that explicitly expand fields-of-view of each convolutional layers through internal communications and hence enrich the output features. In particular, unlike the standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3x3), self-calibrated convolutions adaptively build long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. Thus, it can help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when applying self-calibrated convolutions into different backbones, our networks can significantly improve the baseline models in a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change the network architectures. We hope this work could provide a promising way for future research in designing novel convolutional feature transformations for improving convolutional networks. Code is available on the project page.

239 citations

Proceedings Article

Deep Snake for Real-Time Instance Segmentation

Sida Peng¹, Wen Jiang¹, Huaijin Pi¹, Xiuli Li, Hujun Bao¹, Xiaowei Zhou¹•Institutions (1)

Zhejiang University¹

14 Jun 2020

TL;DR: The paper develops a two-stage pipeline for instance segmentation (initial contour proposal followed by contour deformation) that can handle errors in object localization, and uses circular convolution in deep snake for structured feature learning on the contour.

Abstract: This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization. Experiments show that the proposed approach achieves competitive performances on the Cityscapes, KINS, SBD and COCO datasets while being efficient for real-time applications with a speed of 32.3 fps for 512 x 512 images on a 1080Ti GPU. The code is available at https://github.com/zju3dv/snake/.
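
On a closed contour the vertex features form a periodic 1D signal, so a circular convolution wraps around at the ends instead of zero-padding. A minimal sketch of such a layer; the kernel layout `kernel[t][o][c]` and the function name are our own conventions, and deep snake itself stacks such layers with other components to predict per-vertex offsets.

```python
from typing import List

Features = List[List[float]]  # [num_vertices][in_channels]


def circular_conv(features: Features, kernel: List[List[List[float]]]) -> Features:
    """Circular convolution over the vertices of a closed contour.

    kernel[t][o][c] weights input channel c of the neighbour at offset t - r
    for output channel o (kernel size 2r + 1). Vertex indices wrap around.
    """
    n = len(features)
    size = len(kernel)
    r = size // 2
    out_ch = len(kernel[0])
    out = []
    for i in range(n):
        acc = [0.0] * out_ch
        for t in range(size):
            f = features[(i + t - r) % n]  # wrap-around neighbour on the cycle
            for o in range(out_ch):
                acc[o] += sum(w * v for w, v in zip(kernel[t][o], f))
        out.append(acc)
    return out


if __name__ == "__main__":
    feats = [[0.0], [0.0], [0.0], [1.0]]
    # A kernel that copies the previous vertex: vertex 0 reads vertex 3.
    print(circular_conv(feats, [[[1.0]], [[0.0]], [[0.0]]]))  # [[1.0], [0.0], [0.0], [0.0]]
```

Because of the wrap-around, choosing a different starting vertex only shifts the output cyclically, which suits contours that have no natural first point.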

225 citations

Proceedings Article

Meta-Transfer Learning for Zero-Shot Super-Resolution

Jae Woong Soh¹, Sunwoo Cho¹, Nam Ik Cho¹•Institutions (1)

Seoul National University¹

14 Jun 2020

TL;DR: The paper presents Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which builds on ZSSR and exploits both external and internal information, so that a single gradient update can yield considerable results.

Abstract: Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.

223 citations

Proceedings Article

Neural Blind Deconvolution Using Deep Priors

Dongwei Ren¹, Kai Zhang², Qilong Wang¹, Qinghua Hu¹, Wangmeng Zuo²•Institutions (2)

Tianjin University¹, Harbin Institute of Technology²

14 Jun 2020

TL;DR: Experimental results show that the proposed SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images.

Abstract: Blind deconvolution is a classical yet challenging low-level vision problem with many real-world applications. Traditional maximum a posterior (MAP) based methods rely heavily on fixed and handcrafted priors that certainly are insufficient in characterizing clean images and blur kernels, and usually adopt specially designed alternating minimization to avoid trivial solution. In contrast, existing deep motion deblurring networks learn from massive training images the mapping to clean image or blur kernel, but are limited in handling various complex and large size blur kernels. To connect MAP and deep models, we in this paper present two generative networks for respectively modeling the deep priors of clean image and blur kernel, and propose an unconstrained neural optimization solution to blind deconvolution. In particular, we adopt an asymmetric Autoencoder with skip connections for generating latent clean image, and a fully-connected network (FCN) for generating blur kernel. Moreover, the SoftMax nonlinearity is applied to the output layer of FCN to meet the non-negative and equality constraints. The process of neural optimization can be explained as a kind of "zero-shot" self-supervised learning of the generative networks, and thus our proposed method is dubbed SelfDeblur. Experimental results show that our SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images. The source code is publicly available at https://github.com/csdwren/SelfDeblur
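
The abstract notes that a SoftMax on the FCN's output layer enforces the non-negativity and sum-to-one (equality) constraints on the blur kernel. A sketch of that constraint and of the blur model y = k ⊗ x it feeds into; the generator networks are omitted, and the logits in the usage below are arbitrary stand-ins for the FCN output.

```python
import math
from typing import List

Image = List[List[float]]


def softmax_kernel(logits: Image) -> Image:
    """Map raw network outputs to a blur kernel that is non-negative and sums to 1."""
    m = max(v for row in logits for v in row)  # subtract the max for numerical stability
    e = [[math.exp(v - m) for v in row] for row in logits]
    s = sum(v for row in e for v in row)
    return [[v / s for v in row] for row in e]


def blur(image: Image, kernel: Image) -> Image:
    """'Valid' 2D convolution y = k * x (the kernel is flipped, as in true convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(image), len(image[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(kernel[kh - 1 - a][kw - 1 - b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


if __name__ == "__main__":
    k = softmax_kernel([[0.3, -1.2, 2.0], [0.0, 0.5, -0.7], [1.1, 0.2, -2.5]])
    print(sum(v for row in k for v in row))     # 1.0 up to rounding
    print(blur([[4.0] * 5 for _ in range(5)], k))  # a flat image stays flat (all 4.0)
```

Because the kernel sums to one, blurring preserves mean intensity, which is the behaviour a physical blur kernel should have.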

Proceedings Article

Dense Extreme Inception Network: Towards a Robust CNN Model for Edge Detection

Xavier Gimeno Soria¹, Edgar Riba¹, Angel D. Sappa¹•Institutions (1)

Autonomous University of Barcelona¹

01 Mar 2020

TL;DR: The paper proposes a deep-learning-based edge detector inspired by both HED (Holistically-Nested Edge Detection) and Xception networks; it generates thin edge maps that are plausible to human eyes and can be used in any edge detection task without prior training or fine-tuning.

Abstract: This paper proposes a Deep Learning based edge detector, which is inspired on both HED (Holistically-Nested Edge Detection) and Xception networks. The proposed approach generates thin edge-maps that are plausible for human eyes; it can be used in any edge detection task without previous training or fine tuning process. As a second contribution, a large dataset with carefully annotated edges, has been generated. This dataset has been used for training the proposed approach as well the state-of-the-art algorithms for comparisons. Quantitative and qualitative evaluations have been performed on different benchmarks showing improvements with the proposed method when F-measure of ODS and OIS are considered.

Journal Article

Using CNN for facial expression recognition: a study of the effects of kernel size and number of filters on accuracy

Abhinav Agrawal¹, Namita Mittal¹•Institutions (1)

Malaviya National Institute of Technology, Jaipur¹

01 Feb 2020-The Visual Computer

TL;DR: This work proposes two novel CNN architectures which achieve a human-like accuracy of 65% and can serve as a basis for standardization of the base model for the much inquired FER-2013 dataset.

Abstract: Facial expression recognition is a challenging problem in image classification. Recently, the use of deep learning is gaining importance in image classification. This has led to increased efforts in solving the problem of facial expression recognition using convolutional neural networks (CNNs). A significant challenge in deep learning is to design a network architecture that is simple and effective. A simple architecture is fast to train and easy to implement. An effective architecture achieves good accuracy on the test data. CNN architectures are black boxes to us. VGGNet, AlexNet and Inception are well-known CNN architectures. These architectures have strongly influenced CNN model designs for new datasets. Almost all CNN models known to achieve high accuracy on facial expression recognition problem are influenced by these architectures. This work tries to overcome this limitation by using FER-2013 dataset as starting point to design new CNN models. In this work, the effect of CNN parameters namely kernel size and number of filters on the classification accuracy is investigated using FER-2013 dataset. Our major contribution is a thorough evaluation of different kernel sizes and number of filters to propose two novel CNN architectures which achieve a human-like accuracy of 65% (Goodfellow et al. in: Neural information processing, Springer, Berlin, pp 117–124, 2013) on FER-2013 dataset. These architectures can serve as a basis for standardization of the base model for the much inquired FER-2013 dataset.
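
The two knobs this study varies, kernel size and number of filters, directly set a convolution layer's parameter count: a k × k layer with C_in input channels and F filters has (k²·C_in + 1)·F weights, including one bias per filter. A small helper for comparing configurations; the three-layer stack [32, 64, 128] on a single-channel (grayscale) input is a hypothetical example, not one of the paper's two architectures.

```python
from typing import Sequence


def conv2d_params(kernel_size: int, in_channels: int, filters: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution layer with `filters` output channels."""
    return (kernel_size * kernel_size * in_channels + (1 if bias else 0)) * filters


def conv_stack_params(kernel_size: int, filters_per_layer: Sequence[int], in_channels: int = 1) -> int:
    """Total parameters of a plain stack of conv layers (grayscale input by default)."""
    total, c_in = 0, in_channels
    for f in filters_per_layer:
        total += conv2d_params(kernel_size, c_in, f)
        c_in = f  # the next layer sees this layer's filters as its channels
    return total


if __name__ == "__main__":
    for k in (3, 5, 7):
        print(k, conv_stack_params(k, [32, 64, 128]))
    # 3 92672
    # 5 257024
    # 7 503552
```

Parameters grow quadratically with kernel size and roughly with the product of consecutive filter counts, which is why these two choices dominate a small CNN's size and training cost.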

Journal Article

Analysis of fractal fractional differential equations

Abdon Atangana, Ali Akgül¹, Kolade M. Owolabi²•Institutions (2)

Siirt University¹, Ton Duc Thang University²

01 Jun 2020-Alexandria Engineering Journal

TL;DR: The authors consider an advection-dispersion model with velocity set to 1 and analyze three cases, in which the kernel is a power law, an exponential decay law, or the generalized Mittag-Leffler function.

Abstract: Nonlocal differential and integral operators with fractional order and fractal dimension have been recently introduced and appear to be powerful mathematical tools to model complex real world problems that could not be modeled with classical and nonlocal differential and integral operators with single order. To stress further possible application of such operators, we consider in this work an advection-dispersion model, where the velocity is considered to be 1. We consider three cases of the models, when the kernels are power law, exponential decay law and the generalized Mittag-Leffler kernel. For each case, we present a detailed analysis including, numerical solution, stability analysis and error analysis. We present some numerical simulation.
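
The three kernels named above are, in the usual fractional-calculus conventions (our assumption; the abstract gives no formulas), the power-law kernel of the Caputo derivative, the exponential kernel of the Caputo-Fabrizio derivative, and the Mittag-Leffler kernel of the Atangana-Baleanu derivative. A sketch that evaluates their unnormalized shapes, with the one-parameter Mittag-Leffler function computed from its truncated power series; normalization constants are omitted and the function names are illustrative.

```python
import math
from typing import Tuple


def mittag_leffler(alpha: float, z: float, terms: int = 100) -> float:
    """E_alpha(z) = sum_k z^k / Gamma(alpha*k + 1), truncated after `terms` terms.

    A plain series: adequate for moderate |z|, not a robust general-purpose evaluator.
    """
    assert alpha > 0 and alpha * (terms - 1) + 1 < 171, "math.gamma overflows past ~171"
    return sum(z ** k / math.gamma(alpha * k + 1) for k in range(terms))


def kernel_shapes(t: float, alpha: float) -> Tuple[float, float, float]:
    """Unnormalized kernels at lag t > 0 for fractional order 0 < alpha < 1."""
    power_law = t ** (-alpha) / math.gamma(1 - alpha)                      # Caputo-type
    exponential = math.exp(-alpha * t / (1 - alpha))                        # Caputo-Fabrizio-type
    mittag = mittag_leffler(alpha, -alpha * t ** alpha / (1 - alpha))      # Atangana-Baleanu-type
    return power_law, exponential, mittag


if __name__ == "__main__":
    print(mittag_leffler(1.0, 1.0), math.e)  # E_1(z) = exp(z)
    for t in (0.1, 1.0, 5.0):
        print(t, kernel_shapes(t, 0.5))
```

The power-law kernel is singular at t = 0, the exponential kernel is not and decays fastest, and the Mittag-Leffler kernel sits in between, behaving like a stretched exponential at short lags and a power law at long ones, which is why the three choices lead to different models.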

Journal Article

Image recognition of four rice leaf diseases based on deep learning and support vector machine

Feng Jiang, Yang Lu, Yu Chen, Di Cai, Gongfa Li¹•Institutions (1)

Wuhan University of Science and Technology¹

01 Dec 2020-Computers and Electronics in Agriculture

TL;DR: The paper proposes a new method for diagnosing crop diseases that combines deep learning with a support vector machine (SVM) and achieves higher accuracy than traditional back-propagation neural network models.

Proceedings Article

Orthogonal Convolutional Neural Networks

Jiayun Wang¹, Yubei Chen¹, Rudrasis Chakraborty¹, Stella X. Yu¹•Institutions (1)

University of California, Berkeley¹

14 Jun 2020

TL;DR: The proposed orthogonal convolution requires no additional parameters and little computational overhead and consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings.

Abstract: Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, instead of the common kernel orthogonality approach, which we show is only necessary but not sufficient for ensuring orthogonal convolutions. Our proposed orthogonal convolution requires no additional parameters and little computational overhead. It consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings. It learns more diverse and expressive features with better training stability, robustness, and generalization. Our code is publicly available.
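
The gap between kernel orthogonality and an orthogonal convolution already shows up with one channel in one dimension. Writing the convolution as a Toeplitz matrix K (the 1D analogue of the doubly block-Toeplitz matrix in the abstract), the convolution is orthogonal when KᵀK = I, that is, when the kernel's autocorrelation is 1 at zero shift and 0 at every other shift. With a single filter, kernel orthogonality reduces to the kernel having unit norm, so the sketch below (illustrative names, "full" zero-padded convolution assumed) shows a unit-norm kernel whose convolution is still not orthogonal.

```python
from typing import List

Matrix = List[List[float]]


def conv_matrix_1d(kernel: List[float], n: int) -> Matrix:
    """Toeplitz matrix K of shape (n + k - 1) x n such that K @ x is the 'full' 1D convolution."""
    k = len(kernel)
    return [[kernel[r - c] if 0 <= r - c < k else 0.0 for c in range(n)] for r in range(n + k - 1)]


def gram(K: Matrix) -> Matrix:
    """K^T K; entry (i, j) is the kernel's autocorrelation at shift |i - j|."""
    cols = list(zip(*K))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


if __name__ == "__main__":
    G = gram(conv_matrix_1d([0.6, 0.8], 4))
    for row in G:
        print([round(v, 2) for v in row])
    # Diagonal is 1.0 (unit-norm kernel), but neighbours overlap with 0.48,
    # so K^T K != I and the convolution is not orthogonal.
```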

Journal Article

Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis

Ru Zhang¹, Feng Zhu¹, Jianyi Liu¹, Gongshen Liu²•Institutions (2)

Beijing University of Posts and Telecommunications¹, Shanghai Jiao Tong University²

01 Jan 2020-IEEE Transactions on Information Forensics and Security

TL;DR: The experimental results show that the proposed CNN structure significantly outperforms five other methods when detecting three spatial-domain steganographic algorithms (WOW, S-UNIWARD and HILL) across a wide variety of datasets and payloads.

Abstract: For steganalysis, many studies showed that convolutional neural network (CNN) has better performances than the two-part structure of traditional machine learning methods. Existing CNN architectures use various tricks to improve the performance of steganalysis, such as fixed convolutional kernels, the absolute value layer, data augmentation and the domain knowledge. However, some designing of the network structure were not extensively studied so far, such as different convolutions (inception, xception, etc.) and variety ways of pooling(spatial pyramid pooling, etc.). In this paper, we focus on designing a new CNN network structure to improve detection accuracy of spatial-domain steganography. First, we use $3\times 3$ kernels instead of the traditional $5\times 5$ kernels and optimize convolution kernels in the preprocessing layer. The smaller convolution kernels are used to reduce the number of parameters and model the features in a small local region. Next, we use separable convolutions to utilize channel correlation of the residuals, compress the image content and increase the signal-to-noise ratio (between the stego signal and the image signal). Then, we use spatial pyramid pooling (SPP) to aggregate the local features and enhance the representation ability of features by multi-level pooling. Finally, data augmentation is adopted to further improve network performance. The experimental results show that the proposed CNN structure is significantly better than other five methods such as SRM, Ye-Net, Xu-Net, Yedroudj-Net and SRNet, when it is used to detect three spatial algorithms such as WOW, S-UNIWARD and HILL with a wide variety of datasets and payloads.
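
Two of the design choices above are easy to quantify. A depthwise separable convolution replaces a full k × k × C_in × C_out filter bank with one k × k depthwise filter per input channel plus a 1 × 1 pointwise mix, and spatial pyramid pooling (SPP) max-pools a feature map over grids of several sizes, so the output length depends only on the pyramid and not on the input size. A sketch of both; the pyramid levels (1, 2, 4) and the channel counts are illustrative choices, not the paper's configuration, and biases are omitted.

```python
from typing import List, Sequence


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    # depthwise: one k x k filter per input channel; pointwise: 1 x 1 channel mixing
    return k * k * c_in + c_in * c_out


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def spp(feature: List[List[float]], levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Spatial pyramid max pooling of one H x W channel into a fixed-length vector."""
    H, W = len(feature), len(feature[0])
    out = []
    for n in levels:  # an n x n grid of (possibly overlapping) bins
        for i in range(n):
            r0, r1 = (i * H) // n, ceil_div((i + 1) * H, n)
            for j in range(n):
                c0, c1 = (j * W) // n, ceil_div((j + 1) * W, n)
                out.append(max(feature[r][c] for r in range(r0, r1) for c in range(c0, c1)))
    return out


if __name__ == "__main__":
    print(standard_conv_params(3, 32, 32), separable_conv_params(3, 32, 32))  # 9216 1312
    small = [[float(r * 7 + c) for c in range(7)] for r in range(5)]
    large = [[float((r * 3 + c) % 11) for c in range(16)] for r in range(16)]
    print(len(spp(small)), len(spp(large)))  # 21 21, i.e. 1 + 4 + 16 bins per channel
```

With 32 input and output channels, the separable 3 × 3 layer needs 1,312 weights against 9,216 for a standard one, and SPP yields the same 21 values per channel for a 5 × 7 map as for a 16 × 16 map.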

Proceedings Article

Convolution in the Cloud: Learning Deformable Kernels in 3D Graph Convolution Networks for Point Cloud Analysis

Zhi-Hao Lin¹, Sheng Yu Huang¹, Yu-Chiang Frank Wang¹•Institutions (1)

National Taiwan University¹

14 Jun 2020

TL;DR: The paper proposes 3D Graph Convolution Networks (3D-GCN), designed to extract local 3D features from point clouds across scales while introducing shift and scale invariance.

Abstract: Point clouds are among the popular geometry representations for 3D vision applications. However, without regular structures like 2D images, processing and summarizing information over these unordered data points are very challenging. Although a number of previous works attempt to analyze point clouds and achieve promising performances, their performances would degrade significantly when data variations like shift and scale changes are presented. In this paper, we propose 3D Graph Convolution Networks (3D-GCN), which is designed to extract local 3D features from point clouds across scales, while shift and scale-invariance properties are introduced. The novelty of our 3D-GCN lies in the definition of learnable kernels with a graph max-pooling mechanism. We show that 3D-GCN can be applied to 3D classification and segmentation tasks, with ablation studies and visualizations verifying the design of 3D-GCN.

Proceedings Article•DOI•

Squeeze-and-Attention Networks for Semantic Segmentation

Zilong Zhong¹, Zhong Qiu Lin², Rene Bidart², Xiaodan Hu², Ibrahim Ben Daya², Zhifeng Li, Wei-Shi Zheng¹, Jonathan Li², Alexander Wong²•Institutions (2)

Chinese Ministry of Education¹, University of Waterloo²

14 Jun 2020

TL;DR: A novel squeeze-and-attention network (SANet) architecture is proposed that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction.

Abstract: The recent integration of attention mechanisms into segmentation networks improves their representational capabilities by placing greater emphasis on more informative features. However, these attention mechanisms ignore an implicit sub-task of semantic segmentation and are constrained by the grid structure of convolution kernels. In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction. Specifically, the proposed SA modules impose pixel-group attention on conventional convolution by introducing an 'attention' convolutional channel, thus taking spatial-channel inter-dependencies into account in an efficient manner. The final segmentation results are produced by merging the outputs of four hierarchical stages of a SANet to integrate multi-scale contexts and obtain an enhanced pixel-wise prediction. Empirical experiments on two challenging public datasets validate the effectiveness of the proposed SANets, which achieve 83.2% mIoU (without COCO pre-training) on PASCAL VOC and a state-of-the-art mIoU of 54.4% on PASCAL Context.

Proceedings Article•DOI•

Learning Depth-Guided Convolutions for Monocular 3D Object Detection

Mingyu Ding¹, Yuqi Huo², Hongwei Yi³, Zhe Wang⁴, Jianping Shi⁴, Zhiwu Lu², Ping Luo¹•Institutions (4)

University of Hong Kong¹, Renmin University of China², Peking University³, SenseTime⁴

14 Jun 2020

TL;DR: D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation, where the filters and their receptive fields can be automatically learned from image-based depth maps.

Abstract: 3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture local object and scale information, which is vital for 3D object detection. To better represent 3D structure, prior works typically transform depth maps estimated from 2D images into a pseudo-LiDAR representation and then apply existing 3D point-cloud-based object detectors. However, their results depend heavily on the accuracy of the estimated depth maps, resulting in suboptimal performance. In this work, instead of using a pseudo-LiDAR representation, we improve on the fundamental 2D fully convolutional design by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D4LCN), in which the filters and their receptive fields can be automatically learned from image-based depth maps, so that different pixels of different images have different filters. D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation. Extensive experiments show that D4LCN outperforms existing works by large margins. For example, the relative improvement of D4LCN over the state of the art on KITTI is 9.1% in the moderate setting. D4LCN ranked 1st on the KITTI monocular 3D object detection benchmark at the time of submission (car, December 2019). The code is available at https://github.com/dingmyu/D4LCN

Journal Article•DOI•

A Simplified 2D-3D CNN Architecture for Hyperspectral Image Classification Based on Spatial–Spectral Fusion

Chunyan Yu¹, Rui Han¹, Meiping Song¹, Caiyu Liu¹, Chein-I Chang²•Institutions (2)

Dalian Maritime University¹, National Yunlin University of Science and Technology²

27 Apr 2020-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

TL;DR: The results and analysis of extensive real HSIC experiments demonstrate that the proposed light-weighted 2D-3D CNN network can effectively extract refined features and improve the classification accuracy.

Abstract: Convolutional neural networks (CNNs) have led to a breakthrough in hyperspectral image classification (HSIC). Because of the intrinsic spatial-spectral characteristics of a hyperspectral cube, feature extraction with 3-D convolution is a straightforward approach to HSIC. However, the overwhelming number of features produced by the original 3-D CNN leads to overfitting and high training cost. To address this issue, in this article, a novel HSIC framework based on a simplified 2D-3D CNN is implemented through the cooperation of a 2-D CNN and a 3-D convolution layer. First, the 2-D convolution block aims to extract spatial features, using the abundant spectral information as training channels. Then, the 3-D CNN primarily exploits band correlation by using a reduced kernel. The proposed architecture captures spatial and spectral features simultaneously in a joint 2D-3D pattern to obtain superior fused features for the subsequent classification. Furthermore, a deconvolution layer intended to enhance the robustness of the deep features is used in the proposed CNN. The results and analysis of extensive real HSIC experiments demonstrate that the proposed lightweight 2D-3D CNN can effectively extract refined features and improve classification accuracy.

Journal Article•DOI•

Free vibration and buckling analyses of CNT reinforced laminated non-rectangular plates by discrete singular convolution method

Ömer Civalek¹, Mehmet Avcar²•Institutions (2)

China Medical University (Taiwan)¹, Süleyman Demirel University²

11 Sep 2020-Engineering With Computers

TL;DR: This paper presents free vibration and buckling analyses of functionally graded carbon nanotube-reinforced (FG-CNTR) laminated non-rectangular plates, i.e., quadrilateral and skew plates, using a four-noded straight-sided transformation method.

Abstract: This paper presents the free vibration and buckling analyses of functionally graded carbon nanotube-reinforced (FG-CNTR) laminated non-rectangular plates, i.e., quadrilateral and skew plates, using a four-noded straight-sided transformation method. First, the equations of motion and buckling of the quadrilateral plate are given; these equations are then transformed from the irregular physical domain into a square computational domain using the geometric transformation formulation via discrete singular convolution (DSC). The equations are discretized with two different regularized kernels, i.e., the regularized Shannon's delta (RSD) and Lagrange-delta sequence (LDS) kernels, in conjunction with the discrete singular convolution numerical integration. The convergence and accuracy of the present DSC transformation are verified against existing literature results for different cases. Detailed numerical solutions are performed, and the resulting parametric results show the effects of carbon nanotube (CNT) volume fraction, CNT distribution pattern, skew and quadrilateral plate geometry, lamination layup, skew and corner angles, and thickness-to-length ratio on the vibration and buckling of FG-CNTR laminated composite non-rectangular plates with different boundary conditions. Detailed results for the critical buckling loads and frequencies of FG-CNTR non-rectangular plates are reported, which can serve as benchmark solutions for future investigations.
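For readers unfamiliar with DSC, the regularized Shannon's delta kernel has a standard closed form: a sinc function damped by a Gaussian, delta(x - x_k) = sin(pi (x - x_k)/D) / (pi (x - x_k)/D) * exp(-(x - x_k)^2 / (2 sigma^2)), where D is the grid spacing. The sketch below uses it only for the zeroth-order DSC approximation f(x) ≈ sum_k delta(x - x_k) f(x_k), i.e. interpolation from grid values; the plate analysis additionally needs kernel derivatives and the geometric transformation, which are not shown. The choices sigma = 3D and a half-bandwidth of 15 grid points are assumptions for illustration, and the function names are hypothetical.

```python
import math


def rsd_kernel(x, delta, sigma):
    """Regularized Shannon's delta kernel evaluated at the offset x = x_eval - x_k."""
    if x == 0.0:
        return 1.0  # limit of sin(t)/t as t -> 0, times exp(0)
    t = math.pi * x / delta
    return math.sin(t) / t * math.exp(-x * x / (2.0 * sigma * sigma))


def dsc_interpolate(f, x, delta, half_width=15, r=3.0):
    """Approximate f(x) from samples f(k*delta) at the 2*half_width+1 grid points nearest x."""
    sigma = r * delta  # Gaussian width tied to the grid spacing (assumed ratio r)
    k0 = round(x / delta)
    total = 0.0
    for k in range(k0 - half_width, k0 + half_width + 1):
        xk = k * delta
        total += rsd_kernel(x - xk, delta, sigma) * f(xk)
    return total
```

At a grid point the kernel reduces to 1 at its own node and (numerically) 0 at all others, so the sample is reproduced; between nodes the Gaussian damping keeps the sum local, which is what makes the kernel practical for discretizing differential equations.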

Posted Content•

SOLOv2: Dynamic, Faster and Stronger.

Xinlong Wang, Rufeng Zhang, Tao Kong, Lei Li, Chunhua Shen

23 Mar 2020

Journal Article•DOI•

HyperLi-Net: A hyper-light deep learning network for high-accurate and high-speed ship detection from synthetic aperture radar imagery

Tianwen Zhang¹, Xiaoling Zhang¹, Jun Shi¹, Shunjun Wei¹•Institutions (1)

University of Electronic Science and Technology of China¹

01 Sep 2020-ISPRS Journal of Photogrammetry and Remote Sensing

TL;DR: Experimental results on the SAR Ship Detection Dataset (SSDD), Gaofen-SSDD and Sentinel-SSDD show that HyperLi-Net's accuracy and speed are both superior to those of nine other state-of-the-art methods.

Abstract: Ship detection from Synthetic Aperture Radar (SAR) imagery is attracting increasing attention due to its great value in ocean applications. However, most existing studies improve detection accuracy at the expense of detection speed. To solve this problem, this paper proposes HyperLi-Net for high-accuracy and high-speed SAR ship detection. We propose five external modules to achieve high accuracy, i.e., Multi-Receptive-Field Module (MRF-Module), Dilated Convolution Module (DC-Module), Channel and Spatial Attention Module (CSA-Module), Feature Fusion Module (FF-Module) and Feature Pyramid Module (FP-Module). We also adopt five internal mechanisms to achieve high speed, i.e., Region-Free Model (RF-Model), Small Kernel (S-Kernel), Narrow Channel (N-Channel), Separable Convolution (Separa-Conv) and Batch Normalization Fusion (BN-Fusion). Experimental results on the SAR Ship Detection Dataset (SSDD), Gaofen-SSDD and Sentinel-SSDD show that HyperLi-Net's accuracy and speed are both superior to those of nine other state-of-the-art methods. Moreover, the satisfactory detection results on two Sentinel-1 SAR images demonstrate HyperLi-Net's good migration capability. HyperLi-Net is built from scratch with fewer parameters, lower computation costs and a lighter model; it can be trained efficiently on CPUs and is helpful for future hardware transplantation, e.g., FPGAs, DSPs, etc.
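HyperLi-Net lists BN-Fusion among its speed mechanisms but the abstract gives no details, so the sketch below shows the standard technique of folding an inference-mode batch normalization layer into the preceding convolution. It assumes BN directly follows the convolution and uses its stored running mean and variance; the list layout (one flattened kernel per output channel) and the function name are hypothetical, and this is not HyperLi-Net's code. Because BN at inference is a per-channel affine map, the fused layer computes exactly conv followed by BN: with scale = gamma / sqrt(var + eps), the new weights are w * scale and the new bias is (b - mean) * scale + beta.

```python
import math


def fuse_conv_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-mode BatchNorm into the preceding convolution.

    weights[o] is the flattened kernel of output channel o; bias, gamma, beta,
    mean and var are per-output-channel lists. Returns (fused_weights, fused_bias).
    """
    fused_w, fused_b = [], []
    for o in range(len(weights)):
        scale = gamma[o] / math.sqrt(var[o] + eps)  # BN's per-channel multiplier
        fused_w.append([w * scale for w in weights[o]])
        fused_b.append((bias[o] - mean[o]) * scale + beta[o])
    return fused_w, fused_b
```

After fusion the BN layer can be dropped entirely, removing one memory pass per layer at inference time without changing the outputs.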

Posted Content•

Unfolding the Alternating Optimization for Blind Super Resolution

Zhengxiong Luo¹, Yan Huang¹, Shang Li¹, Liang Wang¹, Tieniu Tan²•Institutions (2)

Chinese Academy of Sciences¹, Center for Excellence in Education²

06 Oct 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: An alternating optimization algorithm is adopted that can estimate the blur kernel and restore the SR image in a single model; the Restorer is trained with the kernel estimated by the Estimator instead of the ground-truth kernel, so the model can be more tolerant to the Estimator's estimation error.

Abstract: Previous methods decompose the blind super-resolution (SR) problem into two sequential steps: i) estimating the blur kernel from the given low-resolution (LR) image and ii) restoring the SR image based on the estimated kernel. This two-step solution involves two independently trained models, which may not be well compatible with each other: a small estimation error in the first step can cause a severe performance drop in the second. On the other hand, the first step can only utilize limited information from the LR image, which makes it difficult to predict a highly accurate blur kernel. To address these issues, instead of considering the two steps separately, we adopt an alternating optimization algorithm that can estimate the blur kernel and restore the SR image in a single model. Specifically, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores the SR image based on the predicted kernel, and Estimator estimates the blur kernel with the help of the restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, Estimator utilizes information from both the LR and SR images, which makes blur kernel estimation easier. More importantly, Restorer is trained with the kernel estimated by Estimator instead of the ground-truth kernel, so Restorer can be more tolerant to Estimator's estimation error. Extensive experiments on synthetic datasets and real-world images show that our model can largely outperform state-of-the-art methods and produce more visually favorable results at much higher speed. The source code is available at this https URL.

Proceedings Article•

SOLOv2: Dynamic and Fast Instance Segmentation

Xinlong Wang¹, Rufeng Zhang², Tao Kong³, Lei Li, Chunhua Shen¹•Institutions (3)

University of Adelaide¹, Tongji University², Tsinghua University³

01 Jan 2020

TL;DR: SOLOv2 decouples the mask branch into a mask kernel branch and a mask feature branch, which are responsible for learning the convolution kernel and the convolved features, respectively.
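The decoupled mask branch amounts to a dynamic convolution: one branch predicts kernel weights per candidate instance, and those weights are convolved with a shared feature map from the other branch. The sketch below assumes a 1×1 dynamic kernel (so the convolution reduces to a per-pixel dot product over channels) followed by a sigmoid; the list-based layout and function names are hypothetical, and the real model predicts kernels from a grid over the image and runs on tensors.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dynamic_mask(features, kernel):
    """Apply one predicted 1x1 kernel to an E x H x W feature map; return an H x W soft mask.

    features[e][h][w] stands in for the mask-feature branch output;
    kernel[e] stands in for the weights the mask-kernel branch predicts for one instance.
    """
    E, H, W = len(features), len(features[0]), len(features[0][0])
    return [[_sigmoid(sum(kernel[e] * features[e][h][w] for e in range(E)))
             for w in range(W)]
            for h in range(H)]


def dynamic_masks(features, kernels):
    """One mask per predicted kernel; the feature map is shared across all instances."""
    return [dynamic_mask(features, k) for k in kernels]
```

The design choice this illustrates: the expensive feature map is computed once, and each instance costs only a cheap kernel prediction plus one 1×1 convolution.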

Proceedings Article•DOI•

Scale-Equalizing Pyramid Convolution for Object Detection

Xinjiang Wang¹, Shilong Zhang¹, Zhuoran Yu¹, Litong Feng¹, Wayne Zhang¹•Institutions (1)

SenseTime¹

14 Jun 2020

TL;DR: The authors propose a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps to extract scale-invariant features.

Abstract: The feature pyramid has been an efficient method for extracting features at different scales. Developments of this method mainly focus on aggregating contextual information at different levels while seldom touching the inter-level correlation in the feature pyramid. Early computer vision methods extracted scale-invariant features by locating feature extrema in both the spatial and scale dimensions. Inspired by this, this study proposes a convolution across pyramid levels, termed pyramid convolution, which is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. From the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties can hardly be satisfied by a feature pyramid. To alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. Being computationally efficient and compatible with the head design of most single-stage object detectors, the SEPC module brings significant performance improvements (>4 AP on the MS-COCO 2017 dataset) to state-of-the-art one-stage object detectors, and a light version of SEPC gains ~3.5 AP with only around a 7% increase in inference time. The pyramid convolution also functions well as a stand-alone module in two-stage object detectors, improving performance by ~2 AP. The source code can be found at https://github.com/jshilong/SEPC.
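The naive pyramid convolution can be pictured as a convolution with a kernel of size 3 along the scale axis: each output level mixes a stride-2 convolution of the next finer level, a convolution of the same level, and an upsampled convolution of the next coarser level, with the three kernels shared across levels. The 1D sketch below is a simplified illustration under assumptions of its own (1D signals, zero "same" padding, nearest-neighbour upsampling, boundary levels simply dropping the missing neighbour); it does not implement SEPC's scale-equalizing modification or the integrated batch normalization, and the function names are hypothetical.

```python
def conv1d(x, w, stride=1):
    """'Same'-padded 1D correlation with an odd-length kernel and optional stride."""
    pad = len(w) // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w[j] * xp[i + j] for j in range(len(w)))
            for i in range(0, len(x), stride)]


def upsample2(x):
    """Nearest-neighbour upsampling by a factor of 2."""
    return [v for v in x for _ in range(2)]


def pyramid_conv(levels, w_fine, w_same, w_coarse):
    """levels[0] is the finest map and each following level is half as long.

    The same three kernels are shared by every level (one kernel per scale offset).
    """
    out = []
    for l, x in enumerate(levels):
        y = conv1d(x, w_same)
        if l > 0:  # finer neighbour: stride-2 conv brings it to this resolution
            y = [a + b for a, b in zip(y, conv1d(levels[l - 1], w_fine, stride=2))]
        if l + 1 < len(levels):  # coarser neighbour: conv, then upsample
            y = [a + b for a, b in zip(y, upsample2(conv1d(levels[l + 1], w_coarse)))]
        out.append(y)
    return out
```

With identity kernels, each output level is just the sum of its own level and its resized neighbours, which makes the inter-level mixing easy to check by hand.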

Journal Article•DOI•

Low Rank Component Induced Spatial-Spectral Kernel Method for Hyperspectral Image Classification

Le Sun¹, Chenyang Ma, Yunjie Chen, Yuhui Zheng, Hiuk Jae Shim, Zebin Wu², Byeungwoo Jeon³•Institutions (3)

Nanjing University of Information Science and Technology¹, Nanjing University of Science and Technology², Sungkyunkwan University³

01 Oct 2020-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: Experimental results validate that the proposed LRCISSK method can effectively explore the spatial-spectral information and deliver superior performance with at least 1.30% higher OA and 1.03% higher AA on average when compared to other state-of-the-art classifiers.

Abstract: Kernel methods, e.g., composite kernels (CKs) and spatial-spectral kernels (SSKs), have been demonstrated to be an effective way to exploit spatial-spectral information nonlinearly and improve the classification performance of hyperspectral images (HSIs). However, these methods are typically conducted with square-shaped windows or superpixel techniques. Both techniques are likely to misclassify pixels that lie at class boundaries, so small targets are easily smoothed away. To alleviate these problems, in this paper, we propose a novel patch-based low rank component induced spatial-spectral kernel method, termed LRCISSK, for HSI classification. First, the latent low-rank features of the spectra in each cubic patch of the HSI are reconstructed by a low rank matrix recovery (LRMR) technique; then, to extract more accurate spatial information, they are used to adaptively identify a homogeneous neighborhood for the target pixel (i.e., the centroid pixel). Finally, the adaptively identified homogeneous neighborhood, which consists of the latent low-rank spectra, is embedded into the spatial-spectral kernel framework. This maps the spectra into nonlinearly complex manifolds and enables a classifier (e.g., a support vector machine, SVM) to distinguish them effectively. Experimental results on three real HSI datasets validate that the proposed LRCISSK method can effectively explore the spatial-spectral information and deliver superior performance, with at least 1.30% higher OA and 1.03% higher AA on average compared to other state-of-the-art classifiers.
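One common way to embed a pixel neighborhood into a kernel is to average a base kernel over all pairs of spectra from the two neighborhoods, which keeps the result a valid (positive semi-definite) kernel. The sketch below assumes that neighborhood-averaged form with an RBF base kernel and an arbitrary gamma; the exact kernel used in LRCISSK may differ, the neighborhoods here are plain lists of spectra rather than adaptively identified low-rank ones, and the function names are hypothetical.

```python
import math


def rbf(a, b, gamma):
    """Gaussian RBF kernel between two spectra."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def spatial_spectral_kernel(neigh_i, neigh_j, gamma=0.5):
    """Average RBF similarity over all pairs of spectra from two pixel neighborhoods."""
    total = sum(rbf(p, q, gamma) for p in neigh_i for q in neigh_j)
    return total / (len(neigh_i) * len(neigh_j))
```

Evaluating this function for every pair of training neighborhoods yields a Gram matrix that can be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`).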

Journal Article•DOI•

An Automatic Identification Framework for Complex Power Quality Disturbances Based on Multifusion Convolutional Neural Network

Wei Qiu¹, Qiu Tang¹, Jie Liu¹, Wenxuan Yao²•Institutions (2)

Hunan University¹, Oak Ridge National Laboratory²

18 Feb 2020-IEEE Transactions on Industrial Informatics

TL;DR: This paper proposes a novel detection framework for complex PQ disturbances based on multifusion convolutional neural network (MFCNN), and focuses on automatic extraction and fusion of features from multiple sources.

Abstract: Intelligent identification of multiple power quality (PQ) disturbances is very useful for pollution control in power systems. In this paper, we propose a novel detection framework for complex PQ disturbances based on a multifusion convolutional neural network (MFCNN). Our contributions focus on the automatic extraction and fusion of features from multiple sources. First, an information fusion structure is introduced in which the time-domain and frequency-domain information of the PQ disturbance signal are used as inputs. Additionally, a one-dimensional composite convolution, built from standard and dilated convolutions, is proposed to improve the diversity of network features. Then, to speed up training and prevent overfitting, batch normalization is used to adjust the distribution of features. Second, we use several visualization methods to interpret the internal behavior of MFCNN and demonstrate the working mechanism of the proposed method. Finally, we conduct various experiments to verify the effectiveness of MFCNN. Compared with handcrafted feature design methods and general convolutional neural network models, simulations under different noise levels and experiments on a hardware platform verify the method's noise immunity, higher training speed and better accuracy.
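The abstract does not specify how the standard and dilated branches of the composite convolution are combined. The sketch below assumes the simplest reading: both branches run in parallel on the same 1D signal with their own kernels, and their outputs are stacked as two feature channels. The function names and the dilation rate of 2 are hypothetical. It shows why the combination adds diversity: with the same three weights, the dilated branch sees a window of five samples instead of three.

```python
def conv1d_dilated(x, w, dilation=1):
    """'Same'-padded 1D correlation with an odd-length kernel and the given dilation."""
    center = len(w) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - center) * dilation  # taps spaced `dilation` samples apart
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += wj * x[idx]
        out.append(acc)
    return out


def composite_conv1d(x, w_std, w_dil, dilation=2):
    """Two feature channels: a standard conv and a dilated conv of the same signal."""
    return [conv1d_dilated(x, w_std, 1), conv1d_dilated(x, w_dil, dilation)]
```

Feeding an impulse through both branches makes the difference in receptive field visible: the standard branch spreads the kernel over three adjacent samples, the dilated one over every other sample across five.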

Journal Article•DOI•

Fractional discrete Tchebyshev moments and their applications in image encryption and watermarking

Bin Xiao¹, Jiangxia Luo¹, Xiuli Bi¹, Weisheng Li¹, Beijing Chen²•Institutions (2)

Chongqing University of Posts and Telecommunications¹, Nanjing University of Information Science and Technology²

01 Apr 2020-Information Sciences

TL;DR: A novel framework for deriving fractional-order discrete Tchebyshev moments (FrDTMs) by the eigen-decomposition of kernel matrices is proposed in this paper, and some properties of the proposed FrDTMs are analyzed.
