
Showing papers on "Kernel (image processing) published in 2021"


Proceedings ArticleDOI
01 Jun 2021
TL;DR: PAConv as mentioned in this paper constructs the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weights are self-adaptively learned from point positions through ScoreNet.
Abstract: We introduce Position Adaptive Convolution (PAConv), a generic convolution operation for 3D point cloud processing. The key idea of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in a Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet. In this way, the kernel is built in a data-driven manner, endowing PAConv with more flexibility than 2D convolutions to better handle irregular and unordered point cloud data. Besides, the complexity of the learning process is reduced by combining weight matrices instead of brute-force predicting kernels from point positions. Furthermore, different from existing point convolution operators whose network architectures are often heavily engineered, we integrate PAConv into classical MLP-based point cloud pipelines without changing the network configurations. Even built on simple networks, our method still approaches or even surpasses state-of-the-art models, and significantly improves baseline performance on both classification and segmentation tasks, with decent efficiency. Thorough ablation studies and visualizations are provided to understand PAConv. Code is released at https://github.com/CVMI-Lab/PAConv.
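
To make the assembly concrete, here is a minimal PyTorch sketch of the weight-bank idea described above; all class names, layer sizes, and the max-pool aggregation are illustrative assumptions, not the authors' released implementation (see their repository for that).

```python
# Minimal sketch of PAConv-style kernel assembly (hypothetical shapes/names).
import torch
import torch.nn as nn

class PAConvSketch(nn.Module):
    def __init__(self, c_in, c_out, num_matrices=8):
        super().__init__()
        # Weight Bank: M basic weight matrices, each mapping c_in -> c_out.
        self.bank = nn.Parameter(torch.randn(num_matrices, c_in, c_out) * 0.02)
        # ScoreNet: relative xyz (3) -> soft coefficients over the M matrices.
        self.score_net = nn.Sequential(
            nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, num_matrices),
            nn.Softmax(dim=-1))

    def forward(self, feats, rel_pos):
        # feats: (B, N, K, c_in) neighbor features; rel_pos: (B, N, K, 3).
        scores = self.score_net(rel_pos)                     # (B, N, K, M)
        # Assemble a position-adaptive kernel per neighbor, then apply it.
        kernels = torch.einsum('bnkm,mio->bnkio', scores, self.bank)
        out = torch.einsum('bnki,bnkio->bnko', feats, kernels)
        return out.max(dim=2).values                         # pool over K neighbors
```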

229 citations


Journal ArticleDOI
TL;DR: This article presents an attention-based adaptive spectral-spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral-spatial features for HSI classification in an end-to-end training fashion.
Abstract: Hyperspectral images (HSIs) provide rich spectral–spatial information by stacking hundreds of contiguous narrow bands. Due to the existence of noise and band correlation, the selection of informative spectral–spatial kernel features poses a challenge. This is often addressed by using convolutional neural networks (CNNs) with fixed receptive field (RF) sizes. However, such solutions cannot enable neurons to effectively adjust RF sizes and cross-channel dependencies when forward and backward propagations are used to optimize the network. In this article, we present an attention-based adaptive spectral–spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral–spatial features for HSI classification in an end-to-end training fashion. In particular, the proposed network learns selective 3-D convolutional kernels to jointly extract spectral–spatial features using improved 3-D ResBlocks and adopts an efficient feature recalibration (EFR) mechanism to boost classification performance. Extensive experiments are performed on three well-known hyperspectral data sets, i.e., IP, KSC, and UP, and the proposed A²S²K-ResNet provides better classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa compared with the existing methods investigated. The source code will be made available at https://github.com/suvojit-0x55aa/A2S2K-ResNet.
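
The abstract does not spell out the EFR design, so the following is only a squeeze-and-excitation style recalibration sketch in PyTorch that captures the spirit of gating channel responses; names and sizes are assumptions, not the paper's exact module.

```python
# Minimal channel recalibration sketch (SE-style), assumed design.
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C, H, W) spectral-spatial maps
        w = x.mean(dim=(2, 3))            # squeeze: global average per channel
        w = self.fc(w)                    # excitation: per-channel gates in (0, 1)
        return x * w[:, :, None, None]    # recalibrate channel responses
```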

185 citations


Journal ArticleDOI
TL;DR: Inspired by guided image filtering, a novel guided network is designed to predict kernel weights from the guidance image; these predicted kernels are then applied to extract the depth image features.
Abstract: Dense depth perception is critical for autonomous driving and other robotics applications. However, modern LiDAR sensors provide only sparse depth measurements. It is thus necessary to complete the sparse LiDAR data, and a synchronized guidance RGB image is often used to facilitate this completion. Many neural networks have been designed for this task. However, they often naively fuse the LiDAR data and RGB image information by performing feature concatenation or element-wise addition. Inspired by guided image filtering, we design a novel guided network to predict kernel weights from the guidance image. These predicted kernels are then applied to extract the depth image features. In this way, our network generates content-dependent and spatially-variant kernels for multi-modal feature fusion. Dynamically generated spatially-variant kernels could lead to prohibitive GPU memory consumption and computation overhead, so we further design a convolution factorization to reduce computation and memory consumption. The GPU memory reduction makes it possible for feature fusion to work in a multi-stage scheme. We conduct comprehensive experiments to verify our method on real-world outdoor, indoor, and synthetic datasets. Our method produces strong results: it outperforms state-of-the-art methods on the NYUv2 dataset and ranked 1st on the KITTI depth completion benchmark at the time of submission. It also presents strong generalization capability under different 3D point densities, various lighting and weather conditions, and cross-dataset evaluations. The code will be released for reproduction.
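
A minimal sketch of the core idea follows: predict one kernel per pixel from the guidance features and apply it to the depth features via unfold. The paper's convolution factorization and multi-stage fusion are omitted, and all names and shapes here are illustrative.

```python
# Sketch of guidance-predicted, spatially-variant filtering (simplified).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDynamicFilter(nn.Module):
    def __init__(self, guide_ch, k=3):
        super().__init__()
        self.k = k
        # Predict one k*k kernel per pixel from the RGB guidance features.
        self.kernel_head = nn.Conv2d(guide_ch, k * k, 3, padding=1)

    def forward(self, guide_feat, depth_feat):
        B, C, H, W = depth_feat.shape
        kernels = torch.softmax(self.kernel_head(guide_feat), dim=1)  # (B, k*k, H, W)
        patches = F.unfold(depth_feat, self.k, padding=self.k // 2)   # (B, C*k*k, H*W)
        patches = patches.view(B, C, self.k * self.k, H * W)
        out = (patches * kernels.view(B, 1, self.k * self.k, H * W)).sum(2)
        return out.view(B, C, H, W)   # same kernel shared across channels
```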

165 citations


Book
24 Aug 2021
TL;DR: In this article, the authors overview graph spectral techniques in graph signal processing (GSP) specifically for image/video processing, including image compression, image restoration, image filtering, and image segmentation.
Abstract: The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2-D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this paper, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
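
As a toy illustration of interpreting an image patch as a graph signal, the snippet below builds a 4-connected pixel graph with intensity-similarity weights, computes the graph Fourier basis from the Laplacian, and applies an ideal spectral low-pass filter; the weighting scheme and cutoff are illustrative choices, not taken from the book.

```python
# Toy graph spectral filtering of an 8x8 grayscale patch.
import numpy as np

def grid_graph_laplacian(h, w, img, sigma=0.1):
    n = h * w
    W = np.zeros((n, n))
    for y in range(h):
        for x in range(w):
            i = y * w + x
            for dy, dx in [(0, 1), (1, 0)]:          # 4-connectivity
                if y + dy < h and x + dx < w:
                    j = (y + dy) * w + (x + dx)
                    # Edge weight reflects intensity similarity.
                    W[i, j] = W[j, i] = np.exp(
                        -(img[y, x] - img[y + dy, x + dx]) ** 2 / sigma ** 2)
    return np.diag(W.sum(1)) - W                     # combinatorial Laplacian L = D - W

patch = np.random.rand(8, 8)
L = grid_graph_laplacian(8, 8, patch)
lam, U = np.linalg.eigh(L)                           # graph Fourier basis
g_hat = U.T @ patch.ravel()                          # graph Fourier transform
g_hat[lam > 1.0] = 0.0                               # ideal low-pass in spectral domain
smoothed = (U @ g_hat).reshape(8, 8)                 # inverse GFT
```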

126 citations


Journal ArticleDOI
TL;DR: A spherical kernel for efficient graph convolution of 3D point clouds is proposed; it is applied to graph neural networks without edge-dependent filter generation, making it computationally attractive for large point clouds.
Abstract: We propose a spherical kernel for efficient graph convolution of 3D point clouds. Our metric-based kernels systematically quantize the local 3D space to identify distinctive geometric relationships in the data. Similar to the regular grid CNN kernels, the spherical kernel maintains translation-invariance and asymmetry properties, where the former guarantees weight sharing among similar local structures in the data and the latter facilitates fine geometric learning. The proposed kernel is applied to graph neural networks without edge-dependent filter generation, making it computationally attractive for large point clouds. In our graph networks, each vertex is associated with a single point location and edges connect the neighborhood points within a defined range. The graph gets coarsened in the network with farthest point sampling. Analogous to the standard CNNs, we define pooling and unpooling operations for our network. We demonstrate the effectiveness of the proposed spherical kernel with graph neural networks for point cloud classification and semantic segmentation using ModelNet, ShapeNet, RueMonge2014, ScanNet and S3DIS datasets. The source code and the trained models can be downloaded from https://github.com/hlei-ziyan/SPH3D-GCN .
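
The following sketch shows one plausible reading of the metric-based quantization: each neighbor's relative position is binned by azimuth, elevation, and radial distance, and the resulting bin index would select a per-bin weight matrix. Bin counts and ranges are illustrative, not the paper's settings.

```python
# Sketch of spherical-bin quantization for a metric-based kernel.
import math
import torch

def spherical_bin_index(rel_xyz, n_azim=8, n_elev=2, n_rad=2, radius=1.0):
    # rel_xyz: (..., 3) neighbor positions relative to the center point.
    x, y, z = rel_xyz.unbind(-1)
    r = rel_xyz.norm(dim=-1).clamp(min=1e-8)
    azim = (torch.atan2(y, x) + math.pi) / (2 * math.pi)   # in [0, 1]
    elev = (z / r + 1.0) / 2.0                             # in [0, 1]
    a = (azim * n_azim).long().clamp(max=n_azim - 1)
    e = (elev * n_elev).long().clamp(max=n_elev - 1)
    d = (r / radius * n_rad).long().clamp(max=n_rad - 1)
    return (a * n_elev + e) * n_rad + d   # flat bin id; indexes a weight bank
```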

91 citations


Journal ArticleDOI
TL;DR: The theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components, one of which can be bounded under a reasonable assumption named local patch similarity, while the other is lower than the corresponding error of matrix completion.
Abstract: In this paper, we propose a novel nonlocal patch tensor-based visual data completion algorithm and analyze its potential problems. Our algorithm consists of two steps: the first step initializes the image with triangulation-based linear interpolation, and the second step groups similar nonlocal patches into a tensor and then applies the proposed tensor completion technique. Specifically, treating a group of patch matrices as a tensor, we impose the low-rank constraint on the tensor through the recently proposed tensor nuclear norm. Moreover, we observe that after the first interpolation step the image gets blurred, so the similar patches we have found may not exactly match the reference. We name this problem "Patch Mismatch," and to avoid the error it causes, we further decompose the patch tensor into a low-rank tensor and a sparse tensor, where the sparse tensor captures the accepted horizontal strips in mismatched patches. Furthermore, our theoretical analysis shows that the error caused by Patch Mismatch can be decomposed into two components, one of which can be bounded under a reasonable assumption named local patch similarity, while the other is lower than the corresponding error of matrix completion. Extensive experimental results on real-world datasets verify our method's superiority to the state-of-the-art tensor-based image inpainting methods.
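
As a toy illustration of the low-rank-plus-sparse split, the snippet below alternates singular value thresholding and soft thresholding on a matricized patch group; note the paper imposes low-rankness through the tensor nuclear norm (t-SVD) rather than the plain matrix SVD used here, and all sizes and thresholds are illustrative.

```python
# Toy low-rank + sparse decomposition of a (patches x pixels) matricization.
import numpy as np

def svt(M, tau):
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt      # singular value thresholding

def soft(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)   # entrywise shrinkage

T = np.random.rand(50, 64)        # 50 similar patches, 64 pixels each
L_lr, S = np.zeros_like(T), np.zeros_like(T)
for _ in range(30):               # naive alternating minimization
    L_lr = svt(T - S, 0.5)        # low-rank part: the matched patch content
    S = soft(T - L_lr, 0.1)       # sparse part: mismatched strips / outliers
```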

86 citations


Journal ArticleDOI
TL;DR: In this paper, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system, where the classical Euler-Lagrange equations are constructed by using the classical Lagrangian approach.
Abstract: In this study, a new and general fractional formulation is presented to investigate the complex behaviors of a capacitor microphone dynamical system. Initially, for both displacement and electrical charge, the classical Euler–Lagrange equations are constructed by using the classical Lagrangian approach. Expanding this classical scheme in a general fractional framework provides the new fractional Euler–Lagrange equations in which non-integer order derivatives involve a general function as their kernel. Applying an appropriate matrix approximation technique changes the latter fractional formulation into a nonlinear algebraic system. Finally, the derived system is solved numerically with a discussion on its dynamical behaviors. According to the obtained results, various features of the capacitor microphone under study are discovered due to the flexibility in choosing the kernel, unlike the previous mathematical formalism.
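
For orientation, one standard kernel-general Caputo-type derivative of the kind the abstract alludes to is shown below; choosing the power-law kernel recovers the classical Caputo derivative. The paper's exact kernel class may be broader than this illustration.

```latex
% Kernel-general Caputo-type derivative (illustrative). The power-law
% kernel recovers the classical Caputo derivative of order 0 < alpha < 1.
\[
  \big(\mathbb{D}^{(k)} f\big)(t) = \int_0^{t} k(t-s)\, f'(s)\,\mathrm{d}s,
  \qquad
  k(t) = \frac{t^{-\alpha}}{\Gamma(1-\alpha)}
  \;\Rightarrow\;
  \big(\mathbb{D}^{(k)} f\big)(t) = {}^{C}\!D^{\alpha} f(t).
\]
```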

86 citations


Journal ArticleDOI
TL;DR: This work proposes a mixed CNN with covariance pooling for HSI classification that starts with spectral-spatial 3-D convolutions followed by a spatial 2-D convolution, and achieves better accuracy than other state-of-the-art methods.
Abstract: Recently, convolutional neural network (CNN)-based hyperspectral image (HSI) classification has enjoyed high popularity due to its appealing performance. However, using 2-D or 3-D convolution in a standalone mode may be suboptimal in real applications. On the one hand, 2-D convolution overlooks spectral information when extracting feature maps. On the other hand, 3-D convolution suffers from heavy computation in practice and tends to perform poorly in scenarios with analogous textures along consecutive spectral bands. To solve these problems, we propose a mixed CNN with covariance pooling for HSI classification. Specifically, our network architecture starts with spectral–spatial 3-D convolutions followed by a spatial 2-D convolution. Through this mixture operation, we fuse the feature maps generated by the 3-D convolutions along the spectral bands to provide complementary information and reduce the channel dimension. In addition, the covariance pooling technique is adopted to fully extract second-order information from the spectral–spatial feature maps. Motivated by the channel-wise attention mechanism, we further propose two principal component analysis (PCA)-involved strategies, channel-wise shift and channel-wise weighting, to highlight the importance of different spectral bands and recalibrate channel-wise feature responses, which effectively improves classification accuracy and stability, especially with limited sample sizes. To verify the effectiveness of the proposed model, we conduct classification experiments on three well-known HSI data sets: Indian Pines, University of Pavia, and Salinas Scene. The experimental results show that our proposal, despite having fewer parameters, achieves better accuracy than other state-of-the-art methods.
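
Covariance pooling itself is compact enough to sketch: the feature maps are flattened spatially and summarized by their channel covariance, yielding second-order statistics. This is a generic sketch, not the paper's exact module.

```python
# Minimal covariance pooling sketch over spectral-spatial feature maps.
import torch

def covariance_pooling(x):
    # x: (B, C, H, W) -> (B, C, C) covariance of channels over spatial positions
    B, C, H, W = x.shape
    f = x.flatten(2)                          # (B, C, N) with N = H*W
    f = f - f.mean(dim=2, keepdim=True)       # center each channel
    return f @ f.transpose(1, 2) / (H * W - 1)
```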

81 citations



Journal ArticleDOI
TL;DR: A dual-domain residual-based optimization (DRONE) network is proposed, consisting of three modules for embedding, refinement, and awareness; the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module.
Abstract: Deep learning has attracted rapidly increasing attention in the field of tomographic image reconstruction, especially for CT, MRI, PET/SPECT, ultrasound and optical imaging. Among various topics, sparse-view CT remains challenging: it targets a decent image reconstruction from very few projections. To address this challenge, in this article we propose a Dual-domain Residual-based Optimization NEtwork (DRONE). DRONE consists of three modules for embedding, refinement, and awareness, respectively. In the embedding module, a sparse sinogram is first extended; then, sparse-view artifacts are effectively suppressed in the image domain. After that, the refinement module recovers image details in the residual data and image domains synergistically. Finally, the results from the embedding and refinement modules in the data and image domains are regularized for optimized image quality in the awareness module, which ensures consistency between measurements and images with the kernel awareness of compressed sensing. The DRONE network is trained, validated, and tested on preclinical and clinical datasets, demonstrating its merits in edge preservation, feature recovery, and reconstruction accuracy.
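
The "awareness" idea of keeping reconstructions consistent with measurements can be illustrated by a generic data-consistency gradient step; `A` and `At` stand for a forward projector and its adjoint and are assumptions here, not DRONE's actual modules.

```python
# Generic compressed-sensing style data-consistency step (illustrative).
def data_consistency_step(x, y, A, At, step=0.1):
    # x: current image estimate, y: measured projections,
    # A/At: forward projection and its adjoint (back-projection).
    residual = A(x) - y                 # mismatch in the data (sinogram) domain
    return x - step * At(residual)      # gradient step on 0.5 * ||A(x) - y||^2
```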

77 citations


Journal ArticleDOI
TL;DR: In this article, a 3-D fast learning block (a depthwise separable convolution block and a fast convolution block) followed by a 2-D convolutional neural network is introduced to extract spectral-spatial features.
Abstract: Owing to its unique ability to jointly model spectral and spatial structure, the three-dimensional convolutional neural network is widely used in hyperspectral image classification. However, classification is challenged by noise, a shortage of labeled samples, a tendency to overfit, and insufficient extraction of spectral and spatial features. Among these, the shortage of training samples is the main problem that recent methods have sought to address. Convolutional neural network-based algorithms have become a popular option for hyperspectral image analysis owing to their ability to extract useful features and their high performance. Traditional CNN-based methods mainly use 2-D CNNs for feature extraction, which leaves the interband correlations of HSIs underutilized. The 3-D CNN extracts a joint spectral–spatial information representation, but it depends on a more complex model. To address these issues, this work introduces a 3-D fast learning block (a depthwise separable convolution block and a fast convolution block) followed by a 2-D convolutional neural network to extract spectral–spatial features. Using a hybrid CNN reduces model complexity compared with using a 3-D CNN alone and also performs well under noise and a limited number of training samples. In addition, a series of optimization methods including batch normalization, dropout, exponential learning rate decay, and L2 regularization are adopted to alleviate overfitting and improve the classification results. To test the performance of this hybrid method, it is evaluated on the Salinas, University of Pavia, and Indian Pines datasets, and the results are compared with 2-D-CNN and 3-D-CNN deep learning models with the same number of layers.
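
A 3-D depthwise separable convolution block of the kind the hybrid model builds on can be sketched in a few lines of PyTorch; the layer sizes and normalization choices below are illustrative.

```python
# Sketch of a 3-D depthwise separable convolution block: per-channel 3-D
# spatial-spectral filtering followed by a 1x1x1 pointwise channel mix.
import torch.nn as nn

def ds_conv3d(c_in, c_out, k=3):
    return nn.Sequential(
        nn.Conv3d(c_in, c_in, k, padding=k // 2, groups=c_in),  # depthwise
        nn.Conv3d(c_in, c_out, 1),                               # pointwise
        nn.BatchNorm3d(c_out),
        nn.ReLU(inplace=True))
```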

Journal ArticleDOI
TL;DR: This paper proposes a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel, and shows that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
Abstract: Joint image filters are used to transfer structural details from a guidance image, used as a prior, to a target image, in tasks such as enhancing spatial resolution and suppressing noise. Previous methods based on convolutional neural networks (CNNs) combine nonlinear activations of spatially-invariant kernels to estimate structural details and regress the filtering result. In this paper, we instead learn explicitly sparse and spatially-variant kernels. We propose a CNN architecture and its efficient implementation, called the deformable kernel network (DKN), that outputs sets of neighbors and the corresponding weights adaptively for each pixel. The filtering result is then computed as a weighted average. We also propose a fast version of DKN that runs about seventeen times faster for an image of size 640 × 480. We demonstrate the effectiveness and flexibility of our models on the tasks of depth map upsampling, saliency map upsampling, cross-modality image restoration, texture removal, and semantic segmentation. In particular, we show that the weighted averaging process with sparsely sampled 3 × 3 kernels outperforms the state of the art by a significant margin in all cases.
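
A simplified sketch of the DKN idea follows: for each pixel, sample a few neighbors at learned offsets with grid_sample and combine them with learned weights. In the paper both offsets and weights are predicted by the network; here they are simply function arguments, and all shapes are assumptions.

```python
# Simplified deformable, spatially-variant weighted averaging.
import torch
import torch.nn.functional as F

def deformable_average(x, offsets, weights):
    # x: (B, 1, H, W); offsets: (B, k, 2, H, W) in pixels; weights: (B, k, H, W)
    B, _, H, W = x.shape
    k = weights.shape[1]
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    base = torch.stack((xs, ys), 0).float().to(x.device)        # (2, H, W)
    out = 0.0
    for i in range(k):
        pos = base + offsets[:, i]                               # (B, 2, H, W)
        # Normalize to [-1, 1] for grid_sample (x first, then y).
        grid = torch.stack((2 * pos[:, 0] / (W - 1) - 1,
                            2 * pos[:, 1] / (H - 1) - 1), dim=-1)  # (B, H, W, 2)
        sampled = F.grid_sample(x, grid, align_corners=True)     # (B, 1, H, W)
        out = out + weights[:, i:i + 1] * sampled
    return out
```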

Journal ArticleDOI
TL;DR: A novel multiscale residual network (MSRN) is proposed for HSI classification and experimental results demonstrate the superiority of the proposed MSRN method over several state-of-the-art methods.
Abstract: Convolutional neural networks (CNNs) are becoming increasingly popular in modern remote sensing image processing tasks and exhibit outstanding capability for hyperspectral image (HSI) classification. However, most existing CNN-based HSI classification methods consider only single-scale feature extraction, which may neglect important fine information and cannot guarantee capturing optimal spatial features. Moreover, many state-of-the-art methods have a huge number of network parameters that need to be tuned, which causes high computational cost. To address these two issues, a novel multiscale residual network (MSRN) is proposed for HSI classification. Specifically, the proposed MSRN introduces depthwise separable convolution (DSC) and replaces the ordinary depthwise convolution in DSC with mixed depthwise convolution (MDConv), which mixes multiple kernel sizes in a single depthwise convolution operation. The DSC with mixed depthwise convolution (MDSConv) can not only explore features at different scales from each feature map but also greatly reduce the learnable parameters in the network. In addition, a multiscale residual block (MRB) is designed by replacing the convolutional layer in an ordinary residual block with the MDSConv layer. The MRB is used as the major unit of the proposed MSRN. Furthermore, to further enhance the feature representation ability, the network adds a high-level shortcut connection (HSC) on the cascaded two MRBs to aggregate lower-level and higher-level features. Experimental results on three benchmark HSIs demonstrate the superiority of the proposed MSRN method over several state-of-the-art methods.
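
Mixed depthwise convolution is easy to sketch: channels are split into groups, and each group is filtered depthwise with a different kernel size. The kernel sizes below are illustrative choices.

```python
# Sketch of mixed depthwise convolution (MDConv).
import torch
import torch.nn as nn

class MDConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        splits = [channels // len(kernel_sizes)] * len(kernel_sizes)
        splits[0] += channels - sum(splits)          # absorb the remainder
        self.splits = splits
        self.convs = nn.ModuleList(
            nn.Conv2d(c, c, k, padding=k // 2, groups=c)
            for c, k in zip(splits, kernel_sizes))

    def forward(self, x):
        chunks = torch.split(x, self.splits, dim=1)
        return torch.cat([conv(c) for conv, c in zip(self.convs, chunks)], dim=1)
```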

Journal ArticleDOI
TL;DR: A principled algorithm within the maximum a posteriori framework is proposed to tackle image restoration with a partially known or inaccurate degradation model; experiments demonstrate its effectiveness for image deconvolution with inaccurate blur kernels, deconvolution with multiple degradations, and rain streak removal.
Abstract: Most existing non-blind restoration methods are based on the assumption that a precise degradation model is known. When the degradation process is only partially known or inaccurately modeled, images may not be well restored. Rain streak removal and image deconvolution with inaccurate blur kernels are two representative examples of such tasks. For rain streak removal, although an input image can be decomposed into a scene layer and a rain streak layer, there exists no explicit formulation for modeling rain streaks and their composition with the scene layer. For blind deconvolution, since estimation error of the blur kernel is usually introduced, the subsequent non-blind deconvolution process does not restore the latent image well. In this paper, we propose a principled algorithm within the maximum a posteriori framework to tackle image restoration with a partially known or inaccurate degradation model. Specifically, the residual caused by a partially known or inaccurate degradation model is spatially dependent and complexly distributed. With a training set of degraded and ground-truth image pairs, we parameterize and learn the fidelity term for a degradation model in a task-driven manner. Furthermore, the regularization term can also be learned along with the fidelity term, thereby forming a simultaneous fidelity and regularization learning model. Extensive experimental results demonstrate the effectiveness of the proposed model for image deconvolution with inaccurate blur kernels, deconvolution with multiple degradations, and rain streak removal.
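
In generic MAP notation (ours, for illustration), the simultaneous fidelity and regularization learning described above instantiates an objective of the form:

```latex
% Generic MAP restoration objective with a learned fidelity term \Phi_w on the
% model residual and a learned regularizer \Omega_{w'} on the latent image.
\[
  \hat{x} \;=\; \arg\min_{x}\; \Phi_{w}\!\big(y - \mathcal{D}(x)\big)
  \;+\; \lambda\, \Omega_{w'}(x),
\]
% where \mathcal{D} is the (partially known or inaccurate) degradation model,
% and both \Phi_w and \Omega_{w'} are learned from degraded/ground-truth pairs.
```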

Journal ArticleDOI
TL;DR: A novel convolutional neural network (CNN) based on bandwise-independent convolution and hard thresholding (BHCNN) is proposed, which combines band selection, feature extraction, and classification into an end-to-end trainable network.
Abstract: Band selection has been widely utilized in hyperspectral image (HSI) classification to reduce the dimensionality of HSIs. Recently, deep-learning-based band selection has become of great interest. However, existing deep-learning-based methods usually implement band selection and classification in isolation, or evaluate selected spectral bands by training the deep network repeatedly, which may lead to the loss of discriminative bands and increased computational cost. In this article, a novel convolutional neural network (CNN) based on bandwise-independent convolution and hard thresholding (BHCNN) is proposed, which combines band selection, feature extraction, and classification into an end-to-end trainable network. In BHCNN, a band selection layer is constructed by designing bandwise 1×1 convolutions, which operate on each spectral band of the input HSI independently. Then, hard thresholding is utilized to constrain the weights of convolution kernels for unselected spectral bands to zero. In this case, these weights are difficult to update. To optimize them, the straight-through estimator (STE) is devised by approximating the gradient. Furthermore, a novel coarse-to-fine loss calculated from full and selected spectral bands is defined to improve the interpretability of the STE. In the subsequent layers of BHCNN, multiscale 3-D dilated convolutions are constructed to extract joint spatial–spectral features from HSIs with the selected spectral bands. The experimental results on several HSI datasets demonstrate that the proposed method uses selected spectral bands to achieve more encouraging classification performance than current state-of-the-art band selection methods.
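
Hard thresholding with a straight-through estimator can be sketched directly in PyTorch: the forward pass zeroes sub-threshold band weights, while the backward pass lets gradients through unchanged so those weights can still be updated. This is a generic STE sketch, not the paper's exact formulation.

```python
# Generic hard-thresholding STE sketch.
import torch

class HardThresholdSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, tau):
        # Zero out weights whose magnitude falls below the threshold tau.
        return torch.where(w.abs() >= tau, w, torch.zeros_like(w))

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None      # straight-through: identity gradient w.r.t. w
```

A call such as `HardThresholdSTE.apply(band_weights, 0.05)` (threshold value illustrative) would then sit inside the band selection layer.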

Journal ArticleDOI
TL;DR: The proposed SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics and utilizes the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios.
Abstract: Feature distortions of data are a typical problem in remote sensing image classification, especially in the area of transfer learning. In addition, many transfer learning-based methods only focus on spectral information and fail to utilize spatial information of remote sensing images. To tackle these problems, we propose spectral–spatial weighted kernel manifold embedded distribution alignment (SSWK-MEDA) for remote sensing image classification. The proposed method applies a novel spatial information filter to effectively use similarity between nearby sample pixels and avoid the influence of nonsample pixels. Then, a complex kernel combining spatial kernel and spectral kernel with different weights is constructed to adaptively balance the relative importance of spectral and spatial information of the remote sensing image. Finally, we utilize the geometric structure of features in manifold space to solve the problem of feature distortions of remote sensing data in transfer learning scenarios. SSWK-MEDA provides a novel approach for the combination of transfer learning and remote sensing image characteristics. Extensive experiments have demonstrated that the proposed method is more effective than several state-of-the-art methods.
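
A weighted composite kernel of the kind described can be sketched with two RBF kernels, one on the raw spectra and one on spatially filtered features; `mu` plays the role of the adaptive balance, though here it is a fixed assumption rather than learned.

```python
# Sketch of a weighted spectral + spatial composite kernel.
import numpy as np

def rbf(Xa, Xb, gamma):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def composite_kernel(X_spec, X_spat, mu=0.5, gamma=1.0):
    # X_spec: per-pixel spectra; X_spat: spatially filtered features of the
    # same pixels (e.g., a neighborhood mean, mimicking the spatial filter).
    return mu * rbf(X_spec, X_spec, gamma) + (1 - mu) * rbf(X_spat, X_spat, gamma)
```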

Journal ArticleDOI
TL;DR: A simulated memristor implementation of a convolutional neural network (CNN) is presented: the CNN is trained ex situ in TensorFlow, and the trained parameters are downloaded to a Simulink system by compiling them into memristor conductance values to test the proposed simulation model.
Abstract: This article presents a new convolution algorithm, convolution kernel first operated (CKFO), which addresses the problem that pruning the weights of a convolutional neural network does not reduce the actual computation. Based on this algorithm, the article proposes a simulated memristor implementation of a convolutional neural network (CNN). The CNN is trained ex situ in TensorFlow, and the trained parameters are then downloaded to the Simulink system by compiling them into memristor conductance values, in order to test the proposed simulation model. Finally, the effectiveness of the proposed model is verified. In addition, we prune the weights of the CNN, retrain it, and adjust the simulation model according to the pruned parameters. Notably, the convolution layer designed according to the new convolution algorithm can apply the pruned weights without any modification to the circuit, which is very cumbersome in other memristor-based CNNs because the distribution of the pruned weights is irregular. The parameters are reduced by 75.24% and the number of multiplication operations in the convolution layer by 30.1%, while the accuracy is reduced by only 0.06%.

Journal ArticleDOI
TL;DR: In this paper, a new multibranch CNN is introduced that utilizes a selective kernel mechanism for HAR; to the best of our knowledge, this is the first time an attention idea has been adopted to perform kernel selection among multiple branches with different receptive fields in the HAR scenario.
Abstract: Recently, state-of-the-art performance in various sensor-based human activity recognition (HAR) tasks has been achieved by deep learning, which can extract features automatically from raw data. In standard convolutional neural networks (CNNs), artificial neurons within each feature layer usually share the same receptive field (RF) size. It is well known that the RF size of biological neurons changes adaptively according to the stimulus, yet this property has rarely been exploited in HAR. In this article, a new multibranch CNN is introduced that utilizes a selective kernel mechanism for HAR. To the best of our knowledge, this is the first time an attention idea has been adopted to perform kernel selection among multiple branches with different RFs in the HAR scenario. We perform extensive experiments on several benchmark HAR datasets, namely UCI-HAR, UniMiB SHAR, WISDM, PAMAP2, and OPPORTUNITY, as well as on weakly labeled datasets. Ablation experiments show that the selective kernel convolution can adaptively choose an appropriate RF size among multiple branches for classifying numerous human activities. As a result, it achieves higher recognition accuracy under a similar computing budget.
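
A minimal selective kernel block for 1-D sensor streams might look as follows: two branches with different kernel sizes are fused by softmax attention over the branches. The branch count, kernel sizes, and squeeze step are illustrative assumptions.

```python
# Sketch of a selective kernel block for 1-D activity signals.
import torch
import torch.nn as nn

class SelectiveKernel1d(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.branch3 = nn.Conv1d(channels, channels, 3, padding=1)
        self.branch7 = nn.Conv1d(channels, channels, 7, padding=3)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, 2 * channels))

    def forward(self, x):                       # x: (B, C, T)
        u3, u7 = self.branch3(x), self.branch7(x)
        s = (u3 + u7).mean(dim=2)                # fuse, then squeeze over time
        a = self.fc(s).view(-1, 2, u3.shape[1])  # (B, 2, C) branch logits
        a = torch.softmax(a, dim=1)              # select between the two RFs
        return a[:, 0, :, None] * u3 + a[:, 1, :, None] * u7
```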

Journal ArticleDOI
Wenbo Jiang, Min Liu, Yunuo Peng, Lehui Wu, Yaonan Wang
TL;DR: The experimental results demonstrate that the proposed HDCB-Net is generic and able to improve the detection accuracy of blurred cracks, and that the two-stage strategy is efficient for fast crack detection.
Abstract: Crack detection on concrete bridges is a critical task for ensuring bridge safety. However, many cracks on concrete bridges show low contrast and blurry edges in practice, which brings challenges to image-based crack detection. In this article, to improve the detection accuracy of blurred cracks, we propose HDCB-Net, a deep learning-based network with a hybrid dilated convolutional block (HDCB) for pixel-level crack detection. Specifically, HDCB is employed to expand the receptive field of the convolution kernel without increasing the computational complexity, while avoiding the gridding effect generated by dilated convolution. Meanwhile, to achieve a reasonable efficiency/accuracy tradeoff, HDCB-Net contains only a few downsampling stages, which avoids the loss of blurred crack pixels caused by excessive downsampling. Furthermore, a two-stage strategy is proposed to realize fast crack detection on a massive number of images (more than 100 000) with high resolution (5120 × 5120 pixels). At the first stage, YOLOv4 is employed to filter out images without cracks and generate coarse region proposals. At the second stage, to achieve refined damage analysis, HDCB-Net detects pixel-level cracks within the coarse region proposals. The experimental results demonstrate that the proposed HDCB-Net is generic and able to improve the detection accuracy of blurred cracks, and that our two-stage strategy is efficient for fast crack detection. The whole detection process takes only 0.64 s per image. Additionally, we have established a public dataset of 150 632 high-resolution images dedicated to crack detection research, released along with this article.
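
The hybrid dilated convolution principle is illustrated below: stacking 3×3 convolutions with co-prime dilation rates such as (1, 2, 5) enlarges the receptive field while avoiding the gridding effect of repeated identical dilations. The rates shown are a common choice, assumed rather than taken from the paper.

```python
# Sketch of a hybrid dilated convolution block.
import torch.nn as nn

def hdc_block(channels, rates=(1, 2, 5)):
    layers = []
    for r in rates:
        layers += [nn.Conv2d(channels, channels, 3, padding=r, dilation=r),
                   nn.BatchNorm2d(channels), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)
```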

Journal ArticleDOI
TL;DR: The Sharp U-Net as discussed by the authors proposes a depthwise convolution of the encoder feature map with a sharpening kernel filter, which produces a sharpened intermediate feature map of the same size as the original encoder map.
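
The stated idea admits a very small sketch: depthwise-convolve the encoder features with a fixed 3×3 sharpening kernel so every channel keeps its spatial size but gains edge emphasis. The classic sharpening kernel is used here as an assumption; the paper may use a different filter.

```python
# Sketch of depthwise sharpening of encoder feature maps.
import torch
import torch.nn.functional as F

def sharpen_depthwise(x):
    # x: (B, C, H, W); one copy of the classic sharpening kernel per channel.
    k = torch.tensor([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]],
                     device=x.device)
    weight = k.expand(x.shape[1], 1, 3, 3)          # (C, 1, 3, 3) depthwise
    return F.conv2d(x, weight, padding=1, groups=x.shape[1])
```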

Journal ArticleDOI
TL;DR: The method is built on a surface-aware strategy arising from intrinsic geometrical considerations and facilitates blur kernel estimation through the sharp edges preserved in the intermediate latent image.
Abstract: Blind image deblurring is a conundrum because there are infinitely many pairs of latent image and blur kernel. To obtain a stable and reasonable deblurred image, proper prior knowledge of the latent image and the blur kernel is required. Different from recent works built on statistical observations of the difference between the blurred image and the clean one, our method is built on a surface-aware strategy arising from intrinsic geometrical considerations. This approach facilitates blur kernel estimation due to the sharp edges preserved in the intermediate latent image. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on deblurring text and natural images. Moreover, our method achieves attractive results in some challenging cases, such as low-illumination images with large saturated regions and impulse noise. A direct extension of our method to the non-uniform deblurring problem also validates the effectiveness of the surface-aware prior.
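
In standard notation (ours, for illustration), the blind deblurring problem the abstract describes is the joint estimation:

```latex
% Standard blind deblurring formulation: jointly estimate the latent image l
% and kernel k from the blurred input b, with the surface-aware term playing
% the role of the image prior p(l).
\[
  \min_{l,\,k}\;\tfrac{1}{2}\,\lVert b - k \otimes l \rVert_2^2
  + \lambda\, p(l) + \gamma\, \lVert k \rVert_2^2 ,
\]
% where \otimes denotes 2-D convolution.
```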

Journal ArticleDOI
TL;DR: A novel Fully Attention-based Network (FANet) is proposed that uses attention mechanisms to adaptively learn rich feature representations and aggregate multi-scale information; experiments demonstrate that the model can effectively identify irregular, noisy, and multi-scale retinal vessels.
Abstract: Automatic retinal vessel segmentation is important for the diagnosis and prevention of ophthalmic diseases. Existing deep learning retinal vessel segmentation models treat every pixel equally. However, the multi-scale vessel structure is a vital factor affecting the segmentation results, especially for thin vessels. To address this crucial gap, we propose a novel Fully Attention-based Network (FANet) based on attention mechanisms to adaptively learn rich feature representations and aggregate multi-scale information. Specifically, the framework consists of an image pre-processing procedure and semantic segmentation networks. Green channel extraction (GE) and contrast limited adaptive histogram equalization (CLAHE) are employed as pre-processing to enhance the texture and contrast of retinal images. Besides, the network combines two types of attention modules with the U-Net. We propose a lightweight dual-direction attention block to model global dependencies and reduce intra-class inconsistencies, in which the weights of feature maps are updated based on the semantic correlation between pixels. The dual-direction attention block utilizes horizontal and vertical pooling operations to produce the attention map. In this way, the network aggregates global contextual information from semantically closer regions or series of pixels belonging to the same object category. Meanwhile, we adopt the selective kernel (SK) unit to replace the standard convolution, obtaining multi-scale features of different receptive field sizes generated by soft attention. Furthermore, we demonstrate that the proposed model can effectively identify irregular, noisy, and multi-scale retinal vessels. Extensive experiments on the DRIVE, STARE, and CHASE_DB1 datasets show that our method achieves state-of-the-art performance.
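
A simplified reading of the dual-direction attention block is sketched below: horizontal and vertical average pooling produce directional context that is fused into per-pixel gates. The projection and fusion details are assumptions.

```python
# Sketch of a dual-direction (horizontal + vertical pooling) attention block.
import torch
import torch.nn as nn

class DualDirectionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                       # x: (B, C, H, W)
        h_ctx = x.mean(dim=3, keepdim=True)     # pool along width  -> (B, C, H, 1)
        v_ctx = x.mean(dim=2, keepdim=True)     # pool along height -> (B, C, 1, W)
        attn = torch.sigmoid(self.proj(h_ctx + v_ctx))  # broadcasts to (B, C, H, W)
        return x * attn
```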

Journal ArticleDOI
TL;DR: A multiview subspace clustering (MSC) algorithm is proposed that groups samples and removes data redundancy concurrently; eigendecomposition is employed to obtain a robust, low-redundancy data representation for later clustering.
Abstract: Under the assumption that data samples can be reconstructed from a dictionary formed by the samples themselves, recent multiview subspace clustering (MSC) algorithms aim to find a consensus reconstruction matrix by exploring complementary information across multiple views. Most of them operate directly on the original data observations without preprocessing, while others operate on the corresponding kernel matrices. However, both ignore that the collected features may be designed arbitrarily and can hardly be guaranteed to be independent and nonoverlapping. As a result, the original data observations and kernel matrices may contain a large amount of redundant detail. To address this issue, we propose an MSC algorithm that groups samples and removes data redundancy concurrently. Specifically, eigendecomposition is employed to obtain a robust, low-redundancy data representation for later clustering. By unifying the two processes in a single model, the clustering results guide the eigendecomposition to generate a more discriminative data representation, which, as feedback, helps obtain better clustering results. In addition, an alternating and convergent algorithm is designed to solve the optimization problem. Extensive experiments are conducted on eight benchmarks, and the proposed algorithm outperforms comparative ones in the recent literature by a large margin, verifying its superiority. At the same time, its effectiveness, computational efficiency, and robustness to noise are validated experimentally.
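
The redundancy-removal step can be illustrated generically: eigendecompose a per-view kernel (similarity) matrix and keep the leading eigenvectors as a compact sample embedding. The scaling by eigenvalue square roots is an illustrative choice.

```python
# Sketch: low-redundancy representation from a kernel matrix eigendecomposition.
import numpy as np

def low_redundancy_representation(K, dim):
    # K: (n, n) symmetric kernel matrix of one view; dim: kept components.
    lam, U = np.linalg.eigh(K)                   # ascending eigenvalues
    idx = np.argsort(lam)[::-1][:dim]            # top-dim spectrum
    return U[:, idx] * np.sqrt(np.maximum(lam[idx], 0))   # (n, dim) embedding
```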

Journal ArticleDOI
TL;DR: This article proposes a novel rain streaks removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture.
Abstract: Recently emerged deep learning methods have achieved great success in single-image rain streak removal. However, existing methods ignore an essential factor in the rain streak generation mechanism, i.e., the motion blur that leads to the line-pattern appearance, and thus generally produce over-deraining or under-deraining results. In this article, inspired by this generation mechanism, we propose a novel rain streak removal framework using a kernel-guided convolutional neural network (KGCNN), achieving state-of-the-art performance with a simple network architecture. More precisely, our framework consists of three steps. First, we learn the motion blur kernel with a plain neural network, termed the parameter network, from the detail layer of a rainy patch. Then, we stretch the learned motion blur kernel into a degradation map with the same spatial size as the rainy patch. Finally, we use the stretched degradation map together with the detail patches to train a deraining network with a typical ResNet architecture, which produces the rain streaks under the guidance of the learned motion blur kernel. Experiments conducted on extensive synthetic and real data demonstrate the effectiveness of the proposed KGCNN in terms of rain streak removal and image detail preservation.
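
The "stretching" step has a direct tensor interpretation: flatten the learned k × k kernel and broadcast it to every spatial position so it can be concatenated with the detail patch. A minimal sketch (shapes assumed):

```python
# Sketch: stretch a learned blur kernel into a spatial degradation map.
import torch

def stretch_kernel(kernel, h, w):
    # kernel: (B, k, k) learned blur kernel -> (B, k*k, h, w) degradation map
    B = kernel.shape[0]
    flat = kernel.reshape(B, -1, 1, 1)
    return flat.expand(-1, -1, h, w)
```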

Journal ArticleDOI
TL;DR: An end-to-end text spotting framework, termed PAN++, is proposed, which can efficiently detect and recognize text of arbitrary shapes in natural scenes.
Abstract: Scene text detection and recognition have been well explored in the past few years. Despite the progress, efficient and accurate end-to-end spotting of arbitrarily-shaped text remains challenging. In this work, we propose an end-to-end text spotting framework, termed PAN++, which can efficiently detect and recognize text of arbitrary shapes in natural scenes. PAN++ is based on the kernel representation that reformulates a text line as a text kernel (central region) surrounded by peripheral pixels. By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text. Moreover, as a pixel-based representation, the kernel representation can be predicted by a single fully convolutional network, which is very friendly to real-time applications. Taking the advantages of the kernel representation, we design a series of components as follows: 1) a computationally efficient feature enhancement network composed of stacked Feature Pyramid Enhancement Modules (FPEMs); 2) a lightweight detection head cooperating with Pixel Aggregation (PA); and 3) an efficient attention-based recognition head with Masked RoI. Benefiting from the above designs, our method achieves high inference speed while maintaining competitive accuracy. Extensive experiments show the superiority of our method.

Journal ArticleDOI
TL;DR: Experimental results on the open SAR ship detection data set (SSDD) reveal that the accuracy and speed of ShipDeNet-20 are both superior to the other nine state-of-the-art object detectors.
Abstract: Most existing deep learning-based synthetic aperture radar (SAR) ship detectors have a huge network scale and a big model size. To address these defects, we propose a lightweight SAR ship detector, ShipDeNet-20, with 20 convolution layers and a model size under 1 MB (0.82 MB). We use fewer layers and kernels, together with depthwise separable convolution (DS-Conv), to ensure ShipDeNet-20's lightweight attribute. Moreover, we also propose a feature fusion module (FF-Module), a feature enhancement module (FE-Module), and a scale share feature pyramid module (SSFP-Module) to compensate for the raw ShipDeNet-20's accuracy loss. Experimental results on the open SAR ship detection dataset (SSDD) reveal that the accuracy and speed of ShipDeNet-20 are both superior to those of nine other state-of-the-art object detectors. Finally, detection results on two additional wide-region SAR images show ShipDeNet-20's strong migration ability. ShipDeNet-20 is a novel SAR ship detector, built from scratch, lighter than others by tens or even hundreds of times, and helpful for real-time SAR applications and future hardware deployment.

Journal ArticleDOI
TL;DR: A Mask-Aware Dynamic Filtering (MADF) module is proposed to effectively learn multi-scale features for missing regions in the encoding phase, where the filters for each convolution window are generated from the features of the corresponding region of the mask.
Abstract: Inpainting arbitrary missing regions is challenging because learning valid features for various masked regions is nontrivial. Though U-shaped encoder-decoder frameworks have proven successful, most of them share a common drawback of mask unawareness in feature extraction, because all convolution windows (or regions), including those with various shapes of missing pixels, are treated equally and filtered with fixed learned kernels. To this end, we propose a novel mask-aware inpainting solution. First, a Mask-Aware Dynamic Filtering (MADF) module is designed to effectively learn multi-scale features for missing regions in the encoding phase. Specifically, the filters for each convolution window are generated from the features of the corresponding region of the mask. The second aspect of mask awareness is achieved by adopting Point-wise Normalization (PN) in the decoding phase, considering that the statistical natures of features at masked points differ from those at unmasked points. The proposed PN tackles this issue by dynamically assigning a point-wise scaling factor and bias. Lastly, our model is designed as an end-to-end cascaded refinement network. Supervision information such as reconstruction loss, perceptual loss, and total variation loss is incrementally leveraged to boost the inpainting results from coarse to fine. The effectiveness of the proposed framework is validated both quantitatively and qualitatively via extensive experiments on three public datasets: Places2, CelebA, and Paris StreetView.
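
Point-wise normalization can be sketched as a normalization whose scale and bias are predicted per pixel from the mask, so masked and unmasked points are treated differently; the head design below is a simplifying assumption.

```python
# Sketch of mask-conditioned point-wise normalization.
import torch
import torch.nn as nn

class PointwiseNorm(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.scale = nn.Conv2d(1, channels, 3, padding=1)
        self.bias = nn.Conv2d(1, channels, 3, padding=1)

    def forward(self, x, mask):               # mask: (B, 1, H, W), 1 = missing
        return self.norm(x) * (1 + self.scale(mask)) + self.bias(mask)
```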

Journal ArticleDOI
TL;DR: A label-constrained convolutional factor analysis (LCCFA) model is developed that unifies factor analysis, the convolution operation, and supervised learning; it outperforms other related models on small-sample classification.

Journal ArticleDOI
Yinjun Wang, Xiaoxi Ding, Qiang Zeng, Liming Wang, Yimin Shao
TL;DR: A one-dimensional vision ConvNet (VCN) is proposed whose architecture places a multilayer small-kernel network and a single-layer large-kernel network side by side; it improves recognition accuracy with a more stable training process for rolling bearing fault classification.
Abstract: Feature extraction from a time-sequence signal without manual intervention is an important part of intelligent bearing diagnosis. With its merits in mining both signal information and feature structure information, the deep ConvNet is widely used in bearing fault diagnosis and analysis under complex working conditions. However, due to the complexity of the bearing operating environment in actual operation, the sensitive features exhibit distribution characteristics at different scales. Meanwhile, the convolution kernels of a ConvNet are usually small, so the network mainly focuses on small-scale details of the state distribution characteristics while ignoring the overall trend of the characteristic distribution. Considering that the size of a convolution kernel determines the scale of the information it can sense, this paper designs a one-dimensional vision ConvNet (VCN) whose architecture places a multilayer small-kernel network and a single-layer large-kernel network side by side. The multi-kernel structure improves the network's ability to detect fault characteristic frequency bands. Through analysis of artificially generated data and experimental data, the method for setting the large convolution kernel size and stride is discussed. Compared with a traditional CNN, wide first-layer kernels (WDCNN), and a multiscale kernel-based ResCNN (MK-ResCNN), this network improves recognition accuracy with a more stable training process for rolling bearing fault classification.
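
The side-by-side design is easy to sketch in PyTorch: a stack of small-kernel 1-D convolutions runs in parallel with a single large-kernel convolution, and the outputs are concatenated. Channel counts and the 65-tap kernel are illustrative assumptions.

```python
# Sketch of a side-by-side small-kernel / large-kernel 1-D ConvNet front end.
import torch
import torch.nn as nn

class VCNSketch(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.small = nn.Sequential(                # multilayer small kernels
            nn.Conv1d(1, c, 3, padding=1), nn.ReLU(),
            nn.Conv1d(c, c, 3, padding=1), nn.ReLU())
        self.large = nn.Sequential(                # single-layer large kernel
            nn.Conv1d(1, c, 65, padding=32), nn.ReLU())

    def forward(self, x):                          # x: (B, 1, T) vibration signal
        return torch.cat([self.small(x), self.large(x)], dim=1)  # (B, 2c, T)
```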

Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this paper, a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network is proposed to learn a 3D representation directly from this range image view.
Abstract: 3D object detection is vital for many robotics applications. For tasks where a 2D perspective range image exists, we propose to learn a 3D representation directly from this range image view. To this end, we design a 2D convolutional network architecture that carries the 3D spherical coordinates of each pixel throughout the network. Its layers can consume any arbitrary convolution kernel in place of the default inner product kernel and exploit the underlying local geometry around each pixel. We outline four such kernels: a dense kernel following the bag-of-words paradigm, and three graph kernels inspired by recent graph neural network advances: the Transformer, the PointNet, and the Edge Convolution. We also explore cross-modality fusion with the camera image, facilitated by operating in the perspective range image view. Our method performs competitively on the Waymo Open Dataset and improves the state-of-the-art AP for pedestrian detection from 69.7% to 75.5%. It is also efficient: our smallest model, which still outperforms the popular PointPillars in quality, requires 180 times fewer FLOPs and model parameters.
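
As one example of swapping out the inner-product kernel, the sketch below applies an EdgeConv-style graph kernel over each pixel's 3×3 range-image neighborhood: every neighbor contributes an MLP of its features and its relative 3-D offset, max-pooled per pixel. The shapes and single-layer MLP are assumptions, not the paper's exact design.

```python
# Sketch of an EdgeConv-style kernel on a range image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeConvKernel2d(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.mlp = nn.Conv2d(c_in + 3, c_out, 1)   # shared MLP as a 1x1 conv

    def forward(self, feats, xyz):                 # feats: (B,C,H,W); xyz: (B,3,H,W)
        B, C, H, W = feats.shape
        f = F.unfold(feats, 3, padding=1).view(B, C, 9, H, W)
        p = F.unfold(xyz, 3, padding=1).view(B, 3, 9, H, W)
        rel = p - xyz.unsqueeze(2)                 # 3-D offset of each neighbor
        msgs = torch.cat([f, rel], dim=1)          # (B, C+3, 9, H, W)
        msgs = msgs.permute(0, 2, 1, 3, 4).reshape(B * 9, C + 3, H, W)
        out = self.mlp(msgs).view(B, 9, -1, H, W)
        return out.max(dim=1).values               # max over the 9 neighbors
```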