
Showing papers on "Kernel (image processing) published in 2021"


Proceedings ArticleDOI
01 Jun 2021
TL;DR: PAConv as mentioned in this paper constructs the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weights are self-adaptively learned from point positions through ScoreNet.
Abstract: We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key idea of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in a Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brute-force predicting kernels from point positions. Furthermore, different from existing point convolution operators whose network architectures are often heavily engineered, we integrate PAConv into classical MLP-based point cloud pipelines without changing the network configurations. Even built on simple networks, our method still approaches or even surpasses state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, with decent efficiency. Thorough ablation studies and visualizations are provided to understand PAConv. Code is released at https://github.com/CVMI-Lab/PAConv.
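
To make the assembly concrete, here is a minimal PyTorch sketch of the weight-bank idea described above; all class names, layer sizes, and the max-pool aggregation are illustrative assumptions, not the authors' released implementation (see their repository for that).

```python
# Minimal sketch of PAConv-style kernel assembly (hypothetical shapes/names).
import torch
import torch.nn as nn

class PAConvSketch(nn.Module):
    def __init__(self, c_in, c_out, num_matrices=8):
        super().__init__()
        # Weight Bank: M basic weight matrices, each mapping c_in -> c_out.
        self.bank = nn.Parameter(torch.randn(num_matrices, c_in, c_out) * 0.02)
        # ScoreNet: relative xyz (3) -> soft coefficients over the M matrices.
        self.score_net = nn.Sequential(
            nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, num_matrices),
            nn.Softmax(dim=-1))

    def forward(self, feats, rel_pos):
        # feats: (B, N, K, c_in) neighbor features; rel_pos: (B, N, K, 3).
        scores = self.score_net(rel_pos)                     # (B, N, K, M)
        # Assemble a position-adaptive kernel per neighbor, then apply it.
        kernels = torch.einsum('bnkm,mio->bnkio', scores, self.bank)
        out = torch.einsum('bnki,bnkio->bnko', feats, kernels)
        return out.max(dim=2).values                         # pool over K neighbors
```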

229 citations


Journal ArticleDOI
TL;DR: This article presents an attention-based adaptive spectral-spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral-spatial features for HSI classification in an end-to-end training fashion.
Abstract: Hyperspectral images (HSIs) provide rich spectral–spatial information by stacking hundreds of contiguous narrow bands. Due to the existence of noise and band correlation, the selection of informative spectral–spatial kernel features poses a challenge. This is often addressed by using convolutional neural networks (CNNs) with fixed receptive field (RF) sizes. However, such solutions cannot enable neurons to effectively adjust RF sizes and cross-channel dependencies when forward and backward propagations are used to optimize the network. In this article, we present an attention-based adaptive spectral–spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral–spatial features for HSI classification in an end-to-end training fashion. In particular, the proposed network learns selective 3-D convolutional kernels to jointly extract spectral–spatial features using improved 3-D ResBlocks and adopts an efficient feature recalibration (EFR) mechanism to boost classification performance. Extensive experiments are performed on three well-known hyperspectral data sets, i.e., IP, KSC, and UP, and the proposed A²S²K-ResNet provides better classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa compared with the existing methods investigated. The source code will be made available at https://github.com/suvojit-0x55aa/A2S2K-ResNet.
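
The abstract does not spell out the EFR design, so the following is only a squeeze-and-excitation style recalibration sketch in PyTorch that captures the spirit of gating channel responses; names and sizes are assumptions, not the paper's exact module.

```python
# Minimal channel recalibration sketch (SE-style), assumed design.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C, H, W) spectral-spatial maps
        w = x.mean(dim=(2, 3))            # squeeze: global average per channel
        w = self.fc(w)                    # excitation: per-channel gates in (0, 1)
        return x * w[:, :, None, None]    # recalibrate channel responses
```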

185 citations


Journal ArticleDOI
TL;DR: Inspired by guided image filtering, a novel guided network is designed to predict kernel weights from the guidance image; these predicted kernels are then applied to extract the depth image features.
Abstract: Dense depth perception is critical for autonomous driving and other robotics applications. However, modern LiDAR sensors provide only sparse depth measurements. It is thus necessary to complete the sparse LiDAR data, and a synchronized guidance RGB image is often used to facilitate this completion. Many neural networks have been designed for this task. However, they often naively fuse the LiDAR data and RGB image information by performing feature concatenation or element-wise addition. Inspired by guided image filtering, we design a novel guided network to predict kernel weights from the guidance image. These predicted kernels are then applied to extract the depth image features. In this way, our network generates content-dependent and spatially-variant kernels for multi-modal feature fusion. Dynamically generated spatially-variant kernels could lead to prohibitive GPU memory consumption and computation overhead, so we further design a convolution factorization to reduce computation and memory consumption. The GPU memory reduction makes it possible for feature fusion to work in a multi-stage scheme. We conduct comprehensive experiments to verify our method on real-world outdoor, indoor, and synthetic datasets. Our method produces strong results: it outperforms state-of-the-art methods on the NYUv2 dataset and ranked 1st on the KITTI depth completion benchmark at the time of submission. It also presents strong generalization capability under different 3D point densities, various lighting and weather conditions, and cross-dataset evaluations. The code will be released for reproduction.
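
A minimal sketch of the core idea follows: predict one kernel per pixel from the guidance features and apply it to the depth features via unfold. The paper's convolution factorization and multi-stage fusion are omitted, and all names and shapes here are illustrative.

```python
# Sketch of guidance-predicted, spatially-variant filtering (simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDynamicFilter(nn.Module):
    def __init__(self, guide_ch, k=3):
        super().__init__()
        self.k = k
        # Predict one k*k kernel per pixel from the RGB guidance features.
        self.kernel_head = nn.Conv2d(guide_ch, k * k, 3, padding=1)

    def forward(self, guide_feat, depth_feat):
        B, C, H, W = depth_feat.shape
        kernels = torch.softmax(self.kernel_head(guide_feat), dim=1)  # (B, k*k, H, W)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)   # (B, C*k*k, H*W)
        patches = patches.view(B, C, self.k * self.k, H * W)
        out = (patches * kernels.view(B, 1, self.k * self.k, H * W)).sum(2)
        return out.view(B, C, H, W)   # same kernel shared across channels
```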

165 citations


Book
24 Aug 2021
TL;DR: In this article, the authors overview graph spectral techniques in graph signal processing (GSP) specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2-D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this paper, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
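
As a toy illustration of interpreting an image patch as a graph signal, the snippet below builds a 4-connected pixel graph with intensity-similarity weights, computes the graph Fourier basis from the Laplacian, and applies an ideal spectral low-pass filter; the weighting scheme and cutoff are illustrative choices, not taken from the book.

```python
# Toy graph spectral filtering of an 8x8 grayscale patch.
import numpy as np

def grid_graph_laplacian(h, w, img, sigma=0.1):
    n = h * w
    W = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in [(0, 1), (1, 0)]:          # 4-connectivity
                if y + dy < h and x + dx < w:
                    j = (y + dy) * w + (x + dx)
                    # Edge weight reflects intensity similarity.
                    W[i, j] = W[j, i] = np.exp(
                        -(img[y, x] - img[y + dy, x + dx]) ** 2 / sigma ** 2)
    return np.diag(W.sum(1)) - W                     # combinatorial Laplacian L = D - W

patch = np.random.rand(8, 8)
L = grid_graph_laplacian(8, 8, patch)
lam, U = np.linalg.eigh(L)                           # graph Fourier basis
g_hat = U.T @ patch.ravel()                          # graph Fourier transform
g_hat[lam > 1.0] = 0.0                               # ideal low-pass in spectral domain
smoothed = (U @ g_hat).reshape(8, 8)                 # inverse GFT
```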

126 citations


Journal ArticleDOI
TL;DR: A spherical kernel for efficient graph convolution of 3D point clouds is proposed; it is applied to graph neural networks without edge-dependent filter generation, making it computationally attractive for large point clouds.
Abstract: We propose a spherical kernel for efficient graph convolution of 3D point clouds. Our metric-based kernels systematically quantize the local 3D space to identify distinctive geometric relationships in the data. Similar to the regular grid CNN kernels, the spherical kernel maintains translation-invariance and asymmetry properties, where the former guarantees weight sharing among similar local structures in the data and the latter facilitates fine geometric learning. The proposed kernel is applied to graph neural networks without edge-dependent filter generation, making it computationally attractive for large point clouds. In our graph networks, each vertex is associated with a single point location and edges connect the neighborhood points within a defined range. The graph gets coarsened in the network with farthest point sampling. Analogous to the standard CNNs, we define pooling and unpooling operations for our network. We demonstrate the effectiveness of the proposed spherical kernel with graph neural networks for point cloud classification and semantic segmentation using ModelNet, ShapeNet, RueMonge2014, ScanNet and S3DIS datasets. The source code and the trained models can be downloaded from https://github.com/hlei-ziyan/SPH3D-GCN .
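
The following sketch shows one plausible reading of the metric-based quantization: each neighbor's relative position is binned by azimuth, elevation, and radial distance, and the resulting bin index would select a per-bin weight matrix. Bin counts and ranges are illustrative, not the paper's settings.

```python
# Sketch of spherical-bin quantization for a metric-based kernel.
import math
import torch

def spherical_bin_index(rel_xyz, n_azim=8, n_elev=2, n_rad=2, radius=1.0):
    # rel_xyz: (..., 3) neighbor positions relative to the center point.
    x, y, z = rel_xyz.unbind(-1)
    r = rel_xyz.norm(dim=-1).clamp(min=1e-8)
    azim = (torch.atan2(y, x) + math.pi) / (2 * math.pi)   # in [0, 1]
    elev = (z / r + 1.0) / 2.0                             # in [0, 1]
    a = (azim * n_azim).long().clamp(max=n_azim - 1)
    e = (elev * n_elev).long().clamp(max=n_elev - 1)
    d = (r / radius * n_rad).long().clamp(max=n_rad - 1)
    return (a * n_elev + e) * n_rad + d   # flat bin id; indexes a weight bank
```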

91 citations


Journal ArticleDOI
TL;DR: The theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components, one of which can be bounded under a reasonable assumption named local patch similarity, while the other is lower than the corresponding error of matrix completion.
Abstract: In this paper, we propose a novel nonlocal patch tensor-based visual data completion algorithm and analyze its potential problems. Our algorithm consists of two steps: the first step initializes the image with triangulation-based linear interpolation, and the second step groups similar nonlocal patches into a tensor and then applies the proposed tensor completion technique. Specifically, treating a group of patch matrices as a tensor, we impose the low-rank constraint on the tensor through the recently proposed tensor nuclear norm. Moreover, we observe that after the first interpolation step the image gets blurred, so the similar patches we have found may not exactly match the reference. We name this problem "Patch Mismatch," and to avoid the error it causes, we further decompose the patch tensor into a low-rank tensor and a sparse tensor, where the sparse tensor captures the accepted horizontal strips in mismatched patches. Furthermore, our theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components, one of which can be bounded under a reasonable assumption named local patch similarity, while the other is lower than the corresponding error of matrix completion. Extensive experimental results on real-world datasets verify our method's superiority to the state-of-the-art tensor-based image inpainting methods.
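
As a toy illustration of the low-rank-plus-sparse split, the snippet below alternates singular value thresholding and soft thresholding on a matricized patch group; note the paper imposes low-rankness through the tensor nuclear norm (t-SVD) rather than the plain matrix SVD used here, and all sizes and thresholds are illustrative.

```python
# Toy low-rank + sparse decomposition of a (patches x pixels) matricization.
import numpy as np

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt      # singular value thresholding

def soft(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)   # entrywise shrinkage

T = np.random.rand(50, 64)        # 50 similar patches, 64 pixels each
L_lr, S = np.zeros_like(T), np.zeros_like(T)
for _ in range(30):               # naive alternating minimization
    L_lr = svt(T - S, 0.5)        # low-rank part: the matched patch content
    S = soft(T - L_lr, 0.1)       # sparse part: mismatched strips / outliers
```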

86 citations


Journal ArticleDOI
TL;DR: In this paper, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system, where the classical Euler-Lagrange equations are constructed by using the classical Lagrangian approach.
Abstract: In this study, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system. Initially, for both displacement and electrical charge, the classical Euler–Lagrange equations are constructed by using the classical Lagrangian approach. Expanding this classical scheme in a general fractional framework provides the new fractional Euler–Lagrange equations in which non-integer order derivatives involve a general function as their kernel. Applying an appropriate matrix approximation technique changes the latter fractional formulation into a nonlinear algebraic system. Finally, the derived system is solved numerically with a discussion on its dynamical behaviors. According to the obtained results, various features of the capacitor microphone under study are discovered due to the flexibility in choosing the kernel, unlike the previous mathematical formalism.
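
For orientation, one standard kernel-general Caputo-type derivative of the kind the abstract alludes to is shown below; choosing the power-law kernel recovers the classical Caputo derivative. The paper's exact kernel class may be broader than this illustration.

```latex
% Kernel-general Caputo-type derivative (illustrative). The power-law
% kernel recovers the classical Caputo derivative of order 0 < alpha < 1.
\[
  \big(\mathbb{D}^{(k)} f\big)(t) = \int_0^{t} k(t-s)\, f'(s)\,\mathrm{d}s,
  \qquad
  k(t) = \frac{t^{-\alpha}}{\Gamma(1-\alpha)}
  \;\Rightarrow\;
  \big(\mathbb{D}^{(k)} f\big)(t) = {}^{C}\!D^{\alpha} f(t).
\]
```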

86 citations


Journal ArticleDOI
TL;DR: This work proposes a mixed CNN with covariance pooling for HSI classification that starts with spectral-spatial 3-D convolutions followed by a spatial 2-D convolution, and achieves better accuracy than other state-of-the-art methods.
Abstract: Recently, convolutional neural network (CNN)-based hyperspectral image (HSI) classification has enjoyed high popularity due to its appealing performance. However, using 2-D or 3-D convolution in a standalone mode may be suboptimal in real applications. On the one hand, 2-D convolution overlooks spectral information when extracting feature maps. On the other hand, 3-D convolution suffers from heavy computation in practice and tends to perform poorly in scenarios with analogous textures along consecutive spectral bands. To solve these problems, we propose a mixed CNN with covariance pooling for HSI classification. Specifically, our network architecture starts with spectral–spatial 3-D convolutions followed by a spatial 2-D convolution. Through this mixture operation, we fuse the feature maps generated by the 3-D convolutions along the spectral bands to provide complementary information and reduce the channel dimension. In addition, the covariance pooling technique is adopted to fully extract second-order information from the spectral–spatial feature maps. Motivated by the channel-wise attention mechanism, we further propose two principal component analysis (PCA)-involved strategies, channel-wise shift and channel-wise weighting, to highlight the importance of different spectral bands and recalibrate channel-wise feature responses, which effectively improves classification accuracy and stability, especially with limited sample sizes. To verify the effectiveness of the proposed model, we conduct classification experiments on three well-known HSI data sets: Indian Pines, University of Pavia, and Salinas Scene. The experimental results show that our proposal, despite having fewer parameters, achieves better accuracy than other state-of-the-art methods.
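
Covariance pooling itself is compact enough to sketch: the feature maps are flattened spatially and summarized by their channel covariance, yielding second-order statistics. This is a generic sketch, not the paper's exact module.

```python
# Minimal covariance pooling sketch over spectral-spatial feature maps.
import torch

def covariance_pooling(x):
    # x: (B, C, H, W) -> (B, C, C) covariance of channels over spatial positions
    B, C, H, W = x.shape
    f = x.flatten(2)                          # (B, C, N) with N = H*W
    f = f - f.mean(dim=2, keepdim=True)       # center each channel
    return f @ f.transpose(1, 2) / (H * W - 1)
```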

81 citations



Journal ArticleDOI
TL;DR: A dual-domain residual-based optimization (DRONE) network is proposed, consisting of three modules for embedding, refinement, and awareness; the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module.
Abstract: Deep learning has attracted rapidly increasing attention in the field of tomographic image reconstruction, especially for CT, MRI, PET/SPECT, ultrasound and optical imaging. Among various topics, sparse-view CT remains challenging: it targets a decent image reconstruction from very few projections. To address this challenge, in this article we propose a Dual-domain Residual-based Optimization NEtwork (DRONE). DRONE consists of three modules for embedding, refinement, and awareness, respectively. In the embedding module, a sparse sinogram is first extended; then, sparse-view artifacts are effectively suppressed in the image domain. After that, the refinement module recovers image details in the residual data and image domains synergistically. Finally, the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module, which ensures consistency between measurements and images with the kernel awareness of compressed sensing. The DRONE network is trained, validated, and tested on preclinical and clinical datasets, demonstrating its merits in edge preservation, feature recovery, and reconstruction accuracy.
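
The "awareness" idea of keeping reconstructions consistent with measurements can be illustrated by a generic data-consistency gradient step; `A` and `At` stand for a forward projector and its adjoint and are assumptions here, not DRONE's actual modules.

```python
# Generic compressed-sensing style data-consistency step (illustrative).
def data_consistency_step(x, y, A, At, step=0.1):
    # x: current image estimate, y: measured projections,
    # A/At: forward projection and its adjoint (back-projection).
    residual = A(x) - y                 # mismatch in the data (sinogram) domain
    return x - step * At(residual)      # gradient step on 0.5 * ||A(x) - y||^2
```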

77 citations


Journal ArticleDOI
TL;DR: In this article, a 3-D fast learning block (a depthwise separable convolution block and a fast convolution block) followed by a 2-D convolutional neural network is introduced to extract spectral-spatial features.
Abstract: Owing to its unique ability to jointly model spectral and spatial structure, the three-dimensional convolutional neural network is widely used in hyperspectral image classification. However, classification is challenged by noise, a shortage of labeled samples, a tendency to overfit, and insufficient extraction of spectral and spatial features. Among these, the shortage of training samples is the main problem that recent methods have sought to address. Convolutional neural network-based algorithms have become a popular option for hyperspectral image analysis owing to their ability to extract useful features and their high performance. Traditional CNN-based methods mainly use 2-D CNNs for feature extraction, which leaves the interband correlations of HSIs underutilized. The 3-D CNN extracts a joint spectral–spatial information representation, but it depends on a more complex model. To address these issues, this work introduces a 3-D fast learning block (a depthwise separable convolution block and a fast convolution block) followed by a 2-D convolutional neural network to extract spectral–spatial features. Using a hybrid CNN reduces model complexity compared with using a 3-D CNN alone and also performs well under noise and a limited number of training samples. In addition, a series of optimization methods including batch normalization, dropout, exponential learning rate decay, and L2 regularization are adopted to alleviate overfitting and improve the classification results. To test the performance of this hybrid method, it is evaluated on the Salinas, University of Pavia, and Indian Pines datasets, and the results are compared with 2-D-CNN and 3-D-CNN deep learning models with the same number of layers.
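
A 3-D depthwise separable convolution block of the kind the hybrid model builds on can be sketched in a few lines of PyTorch; the layer sizes and normalization choices below are illustrative.

```python
# Sketch of a 3-D depthwise separable convolution block: per-channel 3-D
# spatial-spectral filtering followed by a 1x1x1 pointwise channel mix.
import torch.nn as nn

def ds_conv3d(c_in, c_out, k=3):
    return nn.Sequential(
        nn.Conv3d(c_in, c_in, k, padding=k // 2, groups=c_in),  # depthwise
        nn.Conv3d(c_in, c_out, 1),                               # pointwise
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True))
```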

Journal ArticleDOI
TL;DR: This paper proposes a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel, and shows that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
Abstract: Joint image filters are used to transfer structural details from a guidance image, used as a prior, to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
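
A simplified sketch of the DKN idea follows: for each pixel, sample a few neighbors at learned offsets with grid_sample and combine them with learned weights. In the paper both offsets and weights are predicted by the network; here they are simply function arguments, and all shapes are assumptions.

```python
# Simplified deformable, spatially-variant weighted averaging.
import torch
import torch.nn.functional as F

def deformable_average(x, offsets, weights):
    # x: (B, 1, H, W); offsets: (B, k, 2, H, W) in pixels; weights: (B, k, H, W)
    B, _, H, W = x.shape
    k = weights.shape[1]
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), 0).float().to(x.device)        # (2, H, W)
    out = 0.0
    for i in range(k):
        pos = base + offsets[:, i]                               # (B, 2, H, W)
        # Normalize to [-1, 1] for grid_sample (x first, then y).
        grid = torch.stack((2 * pos[:, 0] / (W - 1) - 1,
                            2 * pos[:, 1] / (H - 1) - 1), dim=-1)  # (B, H, W, 2)
        sampled = F.grid_sample(x, grid, align_corners=True)     # (B, 1, H, W)
        out = out + weights[:, i:i + 1] * sampled
    return out
```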

Journal ArticleDOI
TL;DR: A novel multiscale residual network (MSRN) is proposed for HSI classification and experimental results demonstrate the superiority of the proposed MSRN method over several state-of-the-art methods.
Abstract: Convolutional neural networks (CNNs) are becoming increasingly popular in modern remote sensing image processing tasks and exhibit outstanding capability for hyperspectral image (HSI) classification. However, most existing CNN-based HSI classification methods consider only single-scale feature extraction, which may neglect important fine information and cannot guarantee capturing optimal spatial features. Moreover, many state-of-the-art methods have a huge number of network parameters that need to be tuned, which causes high computational cost. To address these two issues, a novel multiscale residual network (MSRN) is proposed for HSI classification. Specifically, the proposed MSRN introduces depthwise separable convolution (DSC) and replaces the ordinary depthwise convolution in DSC with mixed depthwise convolution (MDConv), which mixes multiple kernel sizes in a single depthwise convolution operation. The DSC with mixed depthwise convolution (MDSConv) can not only explore features at different scales from each feature map but also greatly reduce the learnable parameters in the network. In addition, a multiscale residual block (MRB) is designed by replacing the convolutional layer in an ordinary residual block with the MDSConv layer. The MRB is used as the major unit of the proposed MSRN. Furthermore, to further enhance the feature representation ability, the network adds a high-level shortcut connection (HSC) on the cascaded two MRBs to aggregate lower-level and higher-level features. Experimental results on three benchmark HSIs demonstrate the superiority of the proposed MSRN method over several state-of-the-art methods.
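
Mixed depthwise convolution is easy to sketch: channels are split into groups, and each group is filtered depthwise with a different kernel size. The kernel sizes below are illustrative choices.

```python
# Sketch of mixed depthwise convolution (MDConv).
import torch
import torch.nn as nn

class MDConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)          # absorb the remainder
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)
            for c, k in zip(splits, kernel_sizes))

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```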

Journal ArticleDOI
TL;DR: A principled algorithm within the maximum a posteriori framework is proposed to tackle image restoration with a partially known or inaccurate degradation model; experiments demonstrate its effectiveness for image deconvolution with inaccurate blur kernels, deconvolution with multiple degradations, and rain streak removal.
Abstract: Most existing non-blind restoration methods are based on the assumption that a precise degradation model is known. When the degradation process is only partially known or inaccurately modeled, images may not be well restored. Rain streak removal and image deconvolution with inaccurate blur kernels are two representative examples of such tasks. For rain streak removal, although an input image can be decomposed into a scene layer and a rain streak layer, there exists no explicit formulation for modeling rain streaks and their composition with the scene layer. For blind deconvolution, since estimation error of the blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well. In this paper, we propose a principled algorithm within the maximum a posteriori framework to tackle image restoration with a partially known or inaccurate degradation model. Specifically, the residual caused by a partially known or inaccurate degradation model is spatially dependent and complexly distributed. With a training set of degraded and ground-truth image pairs, we parameterize and learn the fidelity term for a degradation model in a task-driven manner. Furthermore, the regularization term can also be learned along with the fidelity term, thereby forming a simultaneous fidelity and regularization learning model. Extensive experimental results demonstrate the effectiveness of the proposed model for image deconvolution with inaccurate blur kernels, deconvolution with multiple degradations, and rain streak removal.
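
In generic MAP notation (ours, for illustration), the simultaneous fidelity and regularization learning described above instantiates an objective of the form:

```latex
% Generic MAP restoration objective with a learned fidelity term \Phi_w on the
% model residual and a learned regularizer \Omega_{w'} on the latent image.
\[
  \hat{x} \;=\; \arg\min_{x}\; \Phi_{w}\!\big(y - \mathcal{D}(x)\big)
  \;+\; \lambda\, \Omega_{w'}(x),
\]
% where \mathcal{D} is the (partially known or inaccurate) degradation model,
% and both \Phi_w and \Omega_{w'} are learned from degraded/ground-truth pairs.
```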

Journal ArticleDOI
TL;DR: A novel convolutional neural network (CNN) based on bandwise-independent convolution and hard thresholding (BHCNN) is proposed, which combines band selection, feature extraction, and classification into an end-to-end trainable network.
Abstract: Band selection has been widely utilized in hyperspectral image (HSI) classification to reduce the dimensionality of HSIs. Recently, deep-learning-based band selection has become of great interest. However, existing deep-learning-based methods usually implement band selection and classification in isolation, or evaluate selected spectral bands by training the deep network repeatedly, which may lead to the loss of discriminative bands and increased computational cost. In this article, a novel convolutional neural network (CNN) based on bandwise-independent convolution and hard thresholding (BHCNN) is proposed, which combines band selection, feature extraction, and classification into an end-to-end trainable network. In BHCNN, a band selection layer is constructed by designing bandwise 1×1 convolutions, which operate on each spectral band of the input HSI independently. Then, hard thresholding is utilized to constrain the weights of convolution kernels for unselected spectral bands to zero. In this case, these weights are difficult to update. To optimize them, the straight-through estimator (STE) is devised by approximating the gradient. Furthermore, a novel coarse-to-fine loss calculated from full and selected spectral bands is defined to improve the interpretability of the STE. In the subsequent layers of BHCNN, multiscale 3-D dilated convolutions are constructed to extract joint spatial–spectral features from HSIs with the selected spectral bands. The experimental results on several HSI datasets demonstrate that the proposed method uses selected spectral bands to achieve more encouraging classification performance than current state-of-the-art band selection methods.
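
Hard thresholding with a straight-through estimator can be sketched directly in PyTorch: the forward pass zeroes sub-threshold band weights, while the backward pass lets gradients through unchanged so those weights can still be updated. This is a generic STE sketch, not the paper's exact formulation.

```python
# Generic hard-thresholding STE sketch.
import torch

class HardThresholdSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, tau):
        # Zero out weights whose magnitude falls below the threshold tau.
        return torch.where(w.abs() >= tau, w, torch.zeros_like(w))

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # straight-through: identity gradient w.r.t. w
```

A call such as `HardThresholdSTE.apply(band_weights, 0.05)` (threshold value illustrative) would then sit inside the band selection layer.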

Journal ArticleDOI
TL;DR: The proposed SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics and utilizes the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios.
Abstract: Feature distortions of data are a typical problem in remote sensing image classification, especially in the area of transfer learning. In addition, many transfer learning-based methods only focus on spectral information and fail to utilize spatial information of remote sensing images. To tackle these problems, we propose spectral–spatial weighted kernel manifold embedded distribution alignment (SSWK-MEDA) for remote sensing image classification. The proposed method applies a novel spatial information filter to effectively use similarity between nearby sample pixels and avoid the influence of nonsample pixels. Then, a complex kernel combining spatial kernel and spectral kernel with different weights is constructed to adaptively balance the relative importance of spectral and spatial information of the remote sensing image. Finally, we utilize the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios. SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics. Extensive experiments have demonstrated that the proposed method is more effective than several state-of-the-art methods.
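
A weighted composite kernel of the kind described can be sketched with two RBF kernels, one on the raw spectra and one on spatially filtered features; `mu` plays the role of the adaptive balance, though here it is a fixed assumption rather than learned.

```python
# Sketch of a weighted spectral + spatial composite kernel.
import numpy as np

def rbf(Xa, Xb, gamma):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(X_spec, X_spat, mu=0.5, gamma=1.0):
    # X_spec: per-pixel spectra; X_spat: spatially filtered features of the
    # same pixels (e.g., a neighborhood mean, mimicking the spatial filter).
    return mu * rbf(X_spec, X_spec, gamma) + (1 - mu) * rbf(X_spat, X_spat, gamma)
```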

Journal ArticleDOI
TL;DR: A simulated memristor implementation of a convolutional neural network (CNN) is presented: the CNN is trained ex situ in TensorFlow, and the trained parameters are downloaded to a Simulink system by compiling them into memristor conductance values to test the proposed simulation model.
Abstract: This article presents a new convolution algorithm, convolution kernel first operated (CKFO), which addresses the problem that pruning the weights of a convolutional neural network does not reduce the actual computation. Based on this algorithm, the article proposes a simulated memristor implementation of a convolutional neural network (CNN). The CNN is trained ex situ in TensorFlow, and the trained parameters are then downloaded to the Simulink system by compiling them into memristor conductance values, in order to test the proposed simulation model. Finally, the effectiveness of the proposed model is verified. In addition, we prune the weights of the CNN, retrain it, and adjust the simulation model according to the pruned parameters. Notably, the convolution layer designed according to the new convolution algorithm can apply the pruned weights without any modification to the circuit, which is very cumbersome in other memristor-based CNNs because the distribution of the pruned weights is irregular. The parameters are reduced by 75.24% and the number of multiplication operations in the convolution layer by 30.1%, while the accuracy is reduced by only 0.06%.

Journal ArticleDOI
TL;DR: In this paper, a new multibranch CNN is introduced that utilizes a selective kernel mechanism for HAR; to the best of our knowledge, this is the first time an attention idea has been adopted to perform kernel selection among multiple branches with different receptive fields in the HAR scenario.
Abstract: Recently, state-of-the-art performance in various sensor-based human activity recognition (HAR) tasks has been achieved by deep learning, which can extract features automatically from raw data. In standard convolutional neural networks (CNNs), artificial neurons within each feature layer usually share the same receptive field (RF) size. It is well known that the RF size of biological neurons changes adaptively according to the stimulus, yet this property has rarely been exploited in HAR. In this article, a new multibranch CNN is introduced that utilizes a selective kernel mechanism for HAR. To the best of our knowledge, this is the first time an attention idea has been adopted to perform kernel selection among multiple branches with different RFs in the HAR scenario. We perform extensive experiments on several benchmark HAR datasets, namely UCI-HAR, UniMiB SHAR, WISDM, PAMAP2, and OPPORTUNITY, as well as on weakly labeled datasets. Ablation experiments show that the selective kernel convolution can adaptively choose an appropriate RF size among multiple branches for classifying numerous human activities. As a result, it achieves higher recognition accuracy under a similar computing budget.
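
A minimal selective kernel block for 1-D sensor streams might look as follows: two branches with different kernel sizes are fused by softmax attention over the branches. The branch count, kernel sizes, and squeeze step are illustrative assumptions.

```python
# Sketch of a selective kernel block for 1-D activity signals.
import torch
import torch.nn as nn

class SelectiveKernel1d(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv1d(channels, channels, 3, padding=1)
        self.branch7 = nn.Conv1d(channels, channels, 7, padding=3)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, 2 * channels))

    def forward(self, x):                       # x: (B, C, T)
        u3, u7 = self.branch3(x), self.branch7(x)
        s = (u3 + u7).mean(dim=2)                # fuse, then squeeze over time
        a = self.fc(s).view(-1, 2, u3.shape[1])  # (B, 2, C) branch logits
        a = torch.softmax(a, dim=1)              # select between the two RFs
        return a[:, 0, :, None] * u3 + a[:, 1, :, None] * u7
```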

Journal ArticleDOI
Wenbo Jiang, Min Liu, Yunuo Peng, Lehui Wu, Yaonan Wang
TL;DR: The experimental results demonstrate that the proposed HDCB-Net is generic and able to improve the detection accuracy of blurred cracks, and that the two-stage strategy is efficient for fast crack detection.
Abstract: Crack detection on concrete bridges is a critical task for ensuring bridge safety. However, many cracks on concrete bridges show low contrast and blurry edges in practice, which brings challenges to image-based crack detection. In this article, to improve the detection accuracy of blurred cracks, we propose HDCB-Net, a deep learning-based network with a hybrid dilated convolutional block (HDCB) for pixel-level crack detection. Specifically, HDCB is employed to expand the receptive field of the convolution kernel without increasing the computational complexity, while avoiding the gridding effect generated by dilated convolution. Meanwhile, to achieve a reasonable efficiency/accuracy tradeoff, HDCB-Net contains only a few downsampling stages, which avoids the loss of blurred crack pixels caused by excessive downsampling. Furthermore, a two-stage strategy is proposed to realize fast crack detection on a massive number of images (more than 100 000) with high resolution (5120 × 5120 pixels). At the first stage, YOLOv4 is employed to filter out images without cracks and generate coarse region proposals. At the second stage, to achieve refined damage analysis, HDCB-Net detects pixel-level cracks within the coarse region proposals. The experimental results demonstrate that the proposed HDCB-Net is generic and able to improve the detection accuracy of blurred cracks, and that our two-stage strategy is efficient for fast crack detection. The whole detection process takes only 0.64 s per image. Additionally, we have established a public dataset of 150 632 high-resolution images dedicated to crack detection research, released along with this article.
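
The hybrid dilated convolution principle is illustrated below: stacking 3×3 convolutions with co-prime dilation rates such as (1, 2, 5) enlarges the receptive field while avoiding the gridding effect of repeated identical dilations. The rates shown are a common choice, assumed rather than taken from the paper.

```python
# Sketch of a hybrid dilated convolution block.
import torch.nn as nn

def hdc_block(channels, rates=(1, 2, 5)):
    layers = []
    for r in rates:
        layers += [nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                   nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)
```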

Journal ArticleDOI
TL;DR: The Sharp U-Net as discussed by the authors proposes a depthwise convolution of the encoder feature map with a sharpening kernel filter, which produces a sharpened intermediate feature map of the same size as the original encoder map.
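
The stated idea admits a very small sketch: depthwise-convolve the encoder features with a fixed 3×3 sharpening kernel so every channel keeps its spatial size but gains edge emphasis. The classic sharpening kernel is used here as an assumption; the paper may use a different filter.

```python
# Sketch of depthwise sharpening of encoder feature maps.
import torch
import torch.nn.functional as F

def sharpen_depthwise(x):
    # x: (B, C, H, W); one copy of the classic sharpening kernel per channel.
    k = torch.tensor([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]],
                     device=x.device)
    weight = k.expand(x.shape[1], 1, 3, 3)          # (C, 1, 3, 3) depthwise
    return F.conv2d(x, weight, padding=1, groups=x.shape[1])
```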

Journal ArticleDOI
TL;DR: The method is built on a surface-aware strategy arising from intrinsic geometrical considerations and facilitates blur kernel estimation through the sharp edges preserved in the intermediate latent image.
Abstract: Blind image deblurring is a conundrum because there are infinitely many pairs of latent image and blur kernel. To obtain a stable and reasonable deblurred image, proper prior knowledge of the latent image and the blur kernel is required. Different from recent works built on statistical observations of the difference between the blurred image and the clean one, our method is built on a surface-aware strategy arising from intrinsic geometrical considerations. This approach facilitates blur kernel estimation due to the sharp edges preserved in the intermediate latent image. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on deblurring text and natural images. Moreover, our method achieves attractive results in some challenging cases, such as low-illumination images with large saturated regions and impulse noise. A direct extension of our method to the non-uniform deblurring problem also validates the effectiveness of the surface-aware prior.
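
In standard notation (ours, for illustration), the blind deblurring problem the abstract describes is the joint estimation:

```latex
% Standard blind deblurring formulation: jointly estimate the latent image l
% and kernel k from the blurred input b, with the surface-aware term playing
% the role of the image prior p(l).
\[
  \min_{l,\,k}\;\tfrac{1}{2}\,\lVert b - k \otimes l \rVert_2^2
  + \lambda\, p(l) + \gamma\, \lVert k \rVert_2^2 ,
\]
% where \otimes denotes 2-D convolution.
```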

Journal ArticleDOI
TL;DR: A novel Fully Attention-based Network (FANet) is proposed that uses attention mechanisms to adaptively learn rich feature representations and aggregate multi-scale information; experiments demonstrate that the model can effectively identify irregular, noisy, and multi-scale retinal vessels.
Abstract: Automatic retinal vessel segmentation is important for the diagnosis and prevention of ophthalmic diseases. Existing deep learning retinal vessel segmentation models treat every pixel equally. However, the multi-scale vessel structure is a vital factor affecting the segmentation results, especially for thin vessels. To address this crucial gap, we propose a novel Fully Attention-based Network (FANet) based on attention mechanisms to adaptively learn rich feature representations and aggregate multi-scale information. Specifically, the framework consists of an image pre-processing procedure and semantic segmentation networks. Green channel extraction (GE) and contrast limited adaptive histogram equalization (CLAHE) are employed as pre-processing to enhance the texture and contrast of retinal images. Besides, the network combines two types of attention modules with the U-Net. We propose a lightweight dual-direction attention block to model global dependencies and reduce intra-class inconsistencies, in which the weights of feature maps are updated based on the semantic correlation between pixels. The dual-direction attention block utilizes horizontal and vertical pooling operations to produce the attention map. In this way, the network aggregates global contextual information from semantically closer regions or series of pixels belonging to the same object category. Meanwhile, we adopt the selective kernel (SK) unit to replace the standard convolution, obtaining multi-scale features of different receptive field sizes generated by soft attention. Furthermore, we demonstrate that the proposed model can effectively identify irregular, noisy, and multi-scale retinal vessels. Extensive experiments on the DRIVE, STARE, and CHASE_DB1 datasets show that our method achieves state-of-the-art performance.
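
A simplified reading of the dual-direction attention block is sketched below: horizontal and vertical average pooling produce directional context that is fused into per-pixel gates. The projection and fusion details are assumptions.

```python
# Sketch of a dual-direction (horizontal + vertical pooling) attention block.
import torch
import torch.nn as nn

class DualDirectionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        h_ctx = x.mean(dim=3, keepdim=True)     # pool along width  -> (B, C, H, 1)
        v_ctx = x.mean(dim=2, keepdim=True)     # pool along height -> (B, C, 1, W)
        attn = torch.sigmoid(self.proj(h_ctx + v_ctx))  # broadcasts to (B, C, H, W)
        return x * attn
```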

Journal ArticleDOI
TL;DR: A multiview subspace clustering (MSC) algorithm is proposed that groups samples and removes data redundancy concurrently; eigendecomposition is employed to obtain a robust, low-redundancy data representation for later clustering.
Abstract: Under the assumption that data samples can be reconstructed from a dictionary formed by the samples themselves, recent multiview subspace clustering (MSC) algorithms aim to find a consensus reconstruction matrix by exploring complementary information across multiple views. Most of them operate directly on the original data observations without preprocessing, while others operate on the corresponding kernel matrices. However, both ignore that the collected features may be designed arbitrarily and can hardly be guaranteed to be independent and nonoverlapping. As a result, the original data observations and kernel matrices may contain a large amount of redundant detail. To address this issue, we propose an MSC algorithm that groups samples and removes data redundancy concurrently. Specifically, eigendecomposition is employed to obtain a robust, low-redundancy data representation for later clustering. By unifying the two processes in a single model, the clustering results guide the eigendecomposition to generate a more discriminative data representation, which, as feedback, helps obtain better clustering results. In addition, an alternating and convergent algorithm is designed to solve the optimization problem. Extensive experiments are conducted on eight benchmarks, and the proposed algorithm outperforms comparative ones in the recent literature by a large margin, verifying its superiority. At the same time, its effectiveness, computational efficiency, and robustness to noise are validated experimentally.
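
The redundancy-removal step can be illustrated generically: eigendecompose a per-view kernel (similarity) matrix and keep the leading eigenvectors as a compact sample embedding. The scaling by eigenvalue square roots is an illustrative choice.

```python
# Sketch: low-redundancy representation from a kernel matrix eigendecomposition.
import numpy as np

def low_redundancy_representation(K, dim):
    # K: (n, n) symmetric kernel matrix of one view; dim: kept components.
    lam, U = np.linalg.eigh(K)                   # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:dim]            # top-dim spectrum
    return U[:, idx] * np.sqrt(np.maximum(lam[idx], 0))   # (n, dim) embedding
```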

Journal ArticleDOI
TL;DR: This article proposes a novel rain streaks removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture.
Abstract: Recently emerged deep learning methods have achieved great success in single-image rain streak removal. However, existing methods ignore an essential factor in the rain streak generation mechanism, i.e., the motion blur that leads to the line-pattern appearance, and thus generally produce over-deraining or under-deraining results. In this article, inspired by this generation mechanism, we propose a novel rain streak removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture. More precisely, our framework consists of three steps. First, we learn the motion blur kernel with a plain neural network, termed the parameter network, from the detail layer of a rainy patch. Then, we stretch the learned motion blur kernel into a degradation map with the same spatial size as the rainy patch. Finally, we use the stretched degradation map together with the detail patches to train a deraining network with a typical ResNet architecture, which produces the rain streaks under the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed KGCNN in terms of rain streak removal and image detail preservation.
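
The "stretching" step has a direct tensor interpretation: flatten the learned k × k kernel and broadcast it to every spatial position so it can be concatenated with the detail patch. A minimal sketch (shapes assumed):

```python
# Sketch: stretch a learned blur kernel into a spatial degradation map.
import torch

def stretch_kernel(kernel, h, w):
    # kernel: (B, k, k) learned blur kernel -> (B, k*k, h, w) degradation map
    B = kernel.shape[0]
    flat = kernel.reshape(B, -1, 1, 1)
    return flat.expand(-1, -1, h, w)
```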

Journal ArticleDOI
TL;DR: An end-to-end text spotting framework, termed PAN++, is proposed, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
Abstract: Scene text detection and recognition have been well explored in the past few years. Despite the progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications. Taking the advantages of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the above designs, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method.

Journal ArticleDOI
TL;DR: Experimental results on the open SAR ship detection data set (SSDD) reveal that the accuracy and speed of ShipDeNet-20 are both superior to the other nine state-of-the-art object detectors.
Abstract: Most existing deep learning-based synthetic aperture radar (SAR) ship detectors have a huge network scale and a big model size. To address these defects, we propose a lightweight SAR ship detector, ShipDeNet-20, with 20 convolution layers and a model size under 1 MB (0.82 MB). We use fewer layers and kernels, together with depthwise separable convolution (DS-Conv), to ensure ShipDeNet-20's lightweight attribute. Moreover, we also propose a feature fusion module (FF-Module), a feature enhancement module (FE-Module), and a scale share feature pyramid module (SSFP-Module) to compensate for the raw ShipDeNet-20's accuracy loss. Experimental results on the open SAR ship detection dataset (SSDD) reveal that the accuracy and speed of ShipDeNet-20 are both superior to those of nine other state-of-the-art object detectors. Finally, detection results on two additional wide-region SAR images show ShipDeNet-20's strong migration ability. ShipDeNet-20 is a novel SAR ship detector, built from scratch, lighter than others by tens or even hundreds of times, and helpful for real-time SAR applications and future hardware deployment.

Journal ArticleDOI
TL;DR: A Mask-Aware Dynamic Filtering (MADF) module is proposed to effectively learn multi-scale features for missing regions in the encoding phase, where the filters for each convolution window are generated from the features of the corresponding region of the mask.
Abstract: Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial. Though U-shaped encoder-decoder frameworks have proven successful, most of them share a common drawback of mask unawareness in feature extraction, because all convolution windows (or regions), including those with various shapes of missing pixels, are treated equally and filtered with fixed learned kernels. To this end, we propose a novel mask-aware inpainting solution. First, a Mask-Aware Dynamic Filtering (MADF) module is designed to effectively learn multi-scale features for missing regions in the encoding phase. Specifically, the filters for each convolution window are generated from the features of the corresponding region of the mask. The second aspect of mask awareness is achieved by adopting Point-wise Normalization (PN) in the decoding phase, considering that the statistical natures of features at masked points differ from those at unmasked points. The proposed PN tackles this issue by dynamically assigning a point-wise scaling factor and bias. Lastly, our model is designed as an end-to-end cascaded refinement network. Supervision information such as reconstruction loss, perceptual loss, and total variation loss is incrementally leveraged to boost the inpainting results from coarse to fine. The effectiveness of the proposed framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets: Places2, CelebA, and Paris StreetView.
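
Point-wise normalization can be sketched as a normalization whose scale and bias are predicted per pixel from the mask, so masked and unmasked points are treated differently; the head design below is a simplifying assumption.

```python
# Sketch of mask-conditioned point-wise normalization.
import torch
import torch.nn as nn

class PointwiseNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.scale = nn.Conv2d(1, channels, 3, padding=1)
        self.bias = nn.Conv2d(1, channels, 3, padding=1)

    def forward(self, x, mask):               # mask: (B, 1, H, W), 1 = missing
        return self.norm(x) * (1 + self.scale(mask)) + self.bias(mask)
```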

Journal ArticleDOI
TL;DR: A label-constrained convolutional factor analysis (LCCFA) model is developed that unifies factor analysis, the convolution operation, and supervised learning; it outperforms other related models on small-sample classification.

Journal ArticleDOI
Yinjun Wang, Xiaoxi Ding, Qiang Zeng, Liming Wang, Yimin Shao
TL;DR: A one-dimensional vision ConvNet (VCN) is proposed whose architecture places a multilayer small-kernel network and a single-layer large-kernel network side by side; it improves recognition accuracy with a more stable training process for rolling bearing fault classification.
Abstract: Feature extraction from a time-sequence signal without manual intervention is an important part of intelligent bearing diagnosis. With its merits in mining both signal information and feature structure information, the deep ConvNet is widely used in bearing fault diagnosis and analysis under complex working conditions. However, due to the complexity of the bearing operating environment in actual operation, the sensitive features exhibit distribution characteristics at different scales. Meanwhile, the convolution kernels of a ConvNet are usually small, so the network mainly focuses on small-scale details of the state distribution characteristics while ignoring the overall trend of the characteristic distribution. Considering that the size of a convolution kernel determines the scale of the information it can sense, this paper designs a one-dimensional vision ConvNet (VCN) whose architecture places a multilayer small-kernel network and a single-layer large-kernel network side by side. The multi-kernel structure improves the network's ability to detect fault characteristic frequency bands. Through analysis of artificially generated data and experimental data, the method for setting the large convolution kernel size and stride is discussed. Compared with a traditional CNN, wide first-layer kernels (WDCNN), and a multiscale kernel-based ResCNN (MK-ResCNN), this network improves recognition accuracy with a more stable training process for rolling bearing fault classification.
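
The side-by-side design is easy to sketch in PyTorch: a stack of small-kernel 1-D convolutions runs in parallel with a single large-kernel convolution, and the outputs are concatenated. Channel counts and the 65-tap kernel are illustrative assumptions.

```python
# Sketch of a side-by-side small-kernel / large-kernel 1-D ConvNet front end.
import torch
import torch.nn as nn

class VCNSketch(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.small = nn.Sequential(                # multilayer small kernels
            nn.Conv1d(1, c, 3, padding=1), nn.ReLU(),
            nn.Conv1d(c, c, 3, padding=1), nn.ReLU())
        self.large = nn.Sequential(                # single-layer large kernel
            nn.Conv1d(1, c, 65, padding=32), nn.ReLU())

    def forward(self, x):                          # x: (B, 1, T) vibration signal
        return torch.cat([self.small(x), self.large(x)], dim=1)  # (B, 2c, T)
```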

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network is proposed to learn a 3D representation directly from this range image view.
Abstract: 3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel following the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also efficient: our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
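
As one example of swapping out the inner-product kernel, the sketch below applies an EdgeConv-style graph kernel over each pixel's 3×3 range-image neighborhood: every neighbor contributes an MLP of its features and its relative 3-D offset, max-pooled per pixel. The shapes and single-layer MLP are assumptions, not the paper's exact design.

```python
# Sketch of an EdgeConv-style kernel on a range image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeConvKernel2d(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = nn.Conv2d(c_in + 3, c_out, 1)   # shared MLP as a 1x1 conv

    def forward(self, feats, xyz):                 # feats: (B,C,H,W); xyz: (B,3,H,W)
        B, C, H, W = feats.shape
        f = F.unfold(feats, 3, padding=1).view(B, C, 9, H, W)
        p = F.unfold(xyz, 3, padding=1).view(B, 3, 9, H, W)
        rel = p - xyz.unsqueeze(2)                 # 3-D offset of each neighbor
        msgs = torch.cat([f, rel], dim=1)          # (B, C+3, 9, H, W)
        msgs = msgs.permute(0, 2, 1, 3, 4).reshape(B * 9, C + 3, H, W)
        out = self.mlp(msgs).view(B, 9, -1, H, W)
        return out.max(dim=1).values               # max over the 9 neighbors
```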