Showing papers on "Kernel (image processing)" published in 2020


Proceedings ArticleDOI
14 Jun 2020
TL;DR: The Efficient Channel Attention (ECA) module uses a local cross-channel interaction strategy without dimensionality reduction, implemented efficiently via 1D convolution; it involves only a handful of parameters while bringing a clear performance gain.
Abstract: Recently, channel attention mechanisms have been shown to offer great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules to achieve better performance, which inevitably increases model complexity. To overcome the paradox of the performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select the kernel size of the 1D convolution, which determines the coverage of local cross-channel interaction. The proposed ECA module is both efficient and effective, e.g., the parameters and computations of our modules against the ResNet50 backbone are 80 vs. 24.37M and 4.7e-4 GFlops vs. 3.86 GFlops, respectively, and the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with backbones of ResNets and MobileNetV2. The experimental results show our module is more efficient while performing favorably against its counterparts.
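
As a rough illustration of the local cross-channel interaction described above, the sketch below pools each channel to a single value, mixes neighbouring channels with a shared 1D kernel, and gates each channel with a sigmoid. It is plain Python under stated assumptions: the names `eca_weights` and `apply_eca`, the zero padding at the channel borders, and the example kernel values are illustrative choices rather than the authors' code, and the adaptive kernel-size rule mentioned in the abstract is not reproduced.

```python
import math

def eca_weights(channel_means, kernel):
    """Return one attention weight per channel.

    channel_means: globally average-pooled value of each channel.
    kernel: odd-length list of 1D weights shared by all channels
            (hypothetical values; in a network these would be learned).
    """
    k = len(kernel)
    assert k % 2 == 1, "kernel size must be odd so it can be centered"
    pad = k // 2
    # Zero-pad so every channel sees k neighbours (an assumed border rule).
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    weights = []
    for c in range(len(channel_means)):
        # Local cross-channel interaction: mix channel c with its neighbours.
        z = sum(kernel[j] * padded[c + j] for j in range(k))
        weights.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid gate
    return weights

def apply_eca(feature_map, kernel):
    """Scale every channel of feature_map by its attention weight.

    feature_map: list of channels, each a flat list of spatial values.
    """
    means = [sum(ch) / len(ch) for ch in feature_map]
    w = eca_weights(means, kernel)
    return [[w[c] * v for v in ch] for c, ch in enumerate(feature_map)]
```

In this sketch the only learnable values are the entries of `kernel`, so the parameter count does not grow with the number of channels, and no channel descriptor is projected to a lower dimension.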

1,378 citations


Posted Content
TL;DR: An approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities is suggested.
Abstract: We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.
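
A minimal sketch of such a mapping may help. It assumes the common form in which a point v is sent to [cos(2πBv), sin(2πBv)] for a random frequency matrix B. The function names, the Gaussian sampling of B, and the `scale` parameter (standing in for the tunable bandwidth) are assumptions made for illustration, not details taken from the abstract.

```python
import math
import random

def make_frequencies(num_features, input_dim, scale, seed=0):
    """Sample a frequency matrix B with num_features rows.

    Entries are zero-mean Gaussian with standard deviation `scale`;
    a larger scale yields higher-frequency features (assumed setup).
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(input_dim)]
            for _ in range(num_features)]

def fourier_features(v, B):
    """Map a point v to [cos(2*pi*B v), sin(2*pi*B v)]."""
    proj = [2.0 * math.pi * sum(b_i * v_i for b_i, v_i in zip(row, v))
            for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

def feature_kernel(v1, v2, B):
    """Dot product of two feature vectors, averaged over the frequencies."""
    f1 = fourier_features(v1, B)
    f2 = fourier_features(v2, B)
    return sum(a * b for a, b in zip(f1, f2)) / len(B)
```

Because cos(a)cos(b) + sin(a)sin(b) = cos(a - b), `feature_kernel` depends only on the difference v1 - v2, which is the sense in which the induced kernel is stationary.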

787 citations


Posted Content
TL;DR: This work forms a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture and shows state-of-the-art performance compared to existing neural network methodologies.
Abstract: The classical development of neural networks has primarily focused on learning mappings between finite-dimensional Euclidean spaces. Recently, this has been generalized to neural operators that learn mappings between function spaces. For partial differential equations (PDEs), neural operators directly learn the mapping from any functional parametric dependence to the solution. Thus, they learn an entire family of PDEs, in contrast to classical methods which solve one instance of the equation. In this work, we formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture. We perform experiments on Burgers' equation, Darcy flow, and Navier-Stokes equation. The Fourier neural operator is the first ML-based method to successfully model turbulent flows with zero-shot super-resolution. It is up to three orders of magnitude faster compared to traditional PDE solvers. Additionally, it achieves superior accuracy compared to previous learning-based solvers under fixed resolution.
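
To make "parameterizing the integral kernel directly in Fourier space" more concrete, here is a small 1D sketch: transform the input, multiply the lowest modes by complex weights, drop the remaining modes, and transform back. It is an illustration under assumptions, not the paper's architecture: the function names are hypothetical, a naive DFT replaces a fast transform to keep the example dependency-free, and the weights would be learned in practice.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2), fine for tiny inputs)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_conv_1d(signal, weights):
    """Scale the lowest Fourier modes of a real signal and drop the rest.

    weights[k] multiplies mode k for k = 0 .. len(weights) - 1. The matching
    negative-frequency mode gets the conjugate weight so the result stays
    real; weights[0] is expected to be real for the same reason.
    """
    n = len(signal)
    modes = len(weights)
    assert 2 * (modes - 1) < n, "too many modes for this signal length"
    X = dft(signal)
    Y = [0j] * n
    for k in range(modes):
        Y[k] = weights[k] * X[k]
        if k > 0:
            Y[n - k] = weights[k].conjugate() * X[n - k]
    return [y.real for y in idft(Y)]
```

The number of weights is set by `modes` rather than by the signal length, so the same weights can be applied to a finer sampling of the input function, which is one way to read the resolution-independence claimed in the abstract.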

762 citations


Journal ArticleDOI
TL;DR: Meltdown as mentioned in this paper exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations including personal data and passwords, and it does not rely on any software vulnerabilities.
Abstract: The security of computer systems fundamentally relies on memory isolation, e.g., kernel address ranges are marked as non-accessible and are protected from user access. In this paper, we present Meltdown. Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations including personal data and passwords. Out-of-order execution is an indispensable performance feature and present in a wide range of modern processors. The attack is independent of the operating system, and it does not rely on any software vulnerabilities. Meltdown breaks all security guarantees provided by address space isolation as well as paravirtualized environments and, thus, every security mechanism building upon this foundation. On affected systems, Meltdown enables an adversary to read memory of other processes or virtual machines in the cloud without any permissions or privileges, affecting millions of customers and virtually every user of a personal computer. We show that the KAISER defense mechanism for KASLR has the important (but inadvertent) side effect of impeding Meltdown. We stress that KAISER must be deployed immediately to prevent large-scale exploitation of this severe information leakage.

497 citations


Proceedings ArticleDOI
Yinpeng Chen1, Xiyang Dai1, Mengchen Liu1, Dongdong Chen1, Lu Yuan1, Zicheng Liu1 
14 Jun 2020
TL;DR: Dynamic convolution as mentioned in this paper aggregates multiple parallel convolution kernels dynamically based on their attentions, which are input dependent, to increase model complexity without increasing the network depth or width.
Abstract: Light-weight convolutional neural networks (CNNs) suffer performance degradation as their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of CNNs, resulting in limited representation capability. To address this issue, we present Dynamic Convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based upon their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also has more representation power since these kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution for the state-of-the-art architecture MobileNetV3-Small, the top-1 accuracy of ImageNet classification is boosted by 2.9% with only 4% additional FLOPs and 2.9 AP gain is achieved on COCO keypoint detection.
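
The aggregation step can be sketched in a few lines. The example below is a simplified 1D version under assumptions: the attention is a softmax over a single linear function of the globally pooled input, and all names (`dynamic_conv_1d`, `attn_weights`, `attn_bias`) are hypothetical. The paper's attention branch and its 2D kernels are not reproduced.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_kernels(kernels, attention):
    """Attention-weighted sum of K same-sized 1D kernels."""
    size = len(kernels[0])
    return [sum(a * k[i] for a, k in zip(attention, kernels))
            for i in range(size)]

def dynamic_conv_1d(signal, kernels, attn_weights, attn_bias):
    """Filter `signal` with a kernel assembled for this particular input."""
    pooled = sum(signal) / len(signal)            # global average pooling
    logits = [w * pooled + b for w, b in zip(attn_weights, attn_bias)]
    attention = softmax(logits)                   # input dependent, sums to 1
    kernel = aggregate_kernels(kernels, attention)
    k = len(kernel)
    # Sliding-window correlation without padding ("valid" output).
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

Only one filtering pass is run per input. The extra cost is the attention computation and the weighted sum of kernels, which is small when the kernels are small.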

291 citations


Journal ArticleDOI
TL;DR: The proposed multiscale dynamic GCN (MDGCN) enables the graph to be dynamically updated along with the graph convolution process so that these two steps can benefit from each other to gradually produce discriminative embedded features as well as a refined graph.
Abstract: Convolutional neural networks (CNNs) have demonstrated an impressive ability to represent hyperspectral images and to achieve promising results in hyperspectral image classification. However, traditional CNN models can only apply convolution to regular square image regions with fixed size and weights, and thus, they cannot universally adapt to distinct local regions with various object distributions and geometric appearances. Therefore, their classification performance still needs to be improved, especially at class boundaries. To alleviate this shortcoming, we consider employing the recently proposed graph convolutional network (GCN) for hyperspectral image classification, as it can conduct convolution on arbitrarily structured non-Euclidean data and is applicable to irregular image regions represented by graph topological information. Different from the commonly used GCN models that work on a fixed graph, we enable the graph to be dynamically updated along with the graph convolution process so that these two steps can benefit from each other to gradually produce discriminative embedded features as well as a refined graph. Moreover, to comprehensively exploit the multiscale information inherent in hyperspectral images, we establish multiple input graphs with different neighborhood scales to extensively exploit the diversified spectral–spatial correlations at multiple scales. Therefore, our method is termed multiscale dynamic GCN (MDGCN). The experimental results on three typical benchmark data sets firmly demonstrate the superiority of the proposed MDGCN to other state-of-the-art methods in both qualitative and quantitative aspects.

270 citations


Posted Content
TL;DR: State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.
Abstract: In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: this https URL
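
The decoupling of the mask branch can be pictured as follows: one branch yields a shared feature map, the other yields a small kernel per location, and a mask is obtained by applying that kernel to the features. The sketch below assumes a 1x1 kernel followed by a sigmoid. Both choices, and the name `dynamic_mask`, are illustrative assumptions rather than details stated in the abstract.

```python
import math

def dynamic_mask(features, kernel):
    """Apply a predicted 1x1 kernel to shared mask features.

    features: C channels, each a flat list of H*W values
              (output of the mask feature branch).
    kernel:   C weights predicted for one location
              (output of the mask kernel branch).
    Returns per-pixel mask probabilities.
    """
    num_pixels = len(features[0])
    logits = [sum(kernel[c] * features[c][p] for c in range(len(kernel)))
              for p in range(num_pixels)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

Two different kernels applied to the same `features` give two different masks, which is how one shared feature map can serve many object locations.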

261 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel self-calibrated convolution explicitly expands the field-of-view of each convolutional layer through internal communications and hence enriches the output features, helping CNNs generate more discriminative representations by explicitly incorporating richer information.
Abstract: Recent advances in CNNs are mostly devoted to designing more complex architectures to enhance their representation learning capacity. In this paper, we consider how to improve the basic convolutional feature transformation process of CNNs without tuning the model architectures. To this end, we present novel self-calibrated convolutions that explicitly expand the field-of-view of each convolutional layer through internal communications and hence enrich the output features. In particular, unlike standard convolutions that fuse spatial and channel-wise information using small kernels (e.g., 3x3), self-calibrated convolutions adaptively build long-range spatial and inter-channel dependencies around each spatial location through a novel self-calibration operation. Thus, they can help CNNs generate more discriminative representations by explicitly incorporating richer information. Our self-calibrated convolution design is simple and generic, and can be easily applied to augment standard convolutional layers without introducing extra parameters and complexity. Extensive experiments demonstrate that when self-calibrated convolutions are applied to different backbones, our networks significantly improve on the baseline models in a variety of vision tasks, including image recognition, object detection, instance segmentation, and keypoint detection, with no need to change the network architectures. We hope this work could provide a promising way for future research in designing novel convolutional feature transformations for improving convolutional networks. Code is available on the project page.

239 citations


Proceedings ArticleDOI
Sida Peng1, Wen Jiang1, Huaijin Pi1, Xiuli Li, Hujun Bao1, Xiaowei Zhou1 
14 Jun 2020
TL;DR: This paper develops a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization and proposes to use circular convolution in deep snake for structured feature learning on the contour.
Abstract: This paper introduces a novel contour-based approach named deep snake for real-time instance segmentation. Unlike some recent methods that directly regress the coordinates of the object boundary points from an image, deep snake uses a neural network to iteratively deform an initial contour to match the object boundary, which implements the classic idea of snake algorithms with a learning-based approach. For structured feature learning on the contour, we propose to use circular convolution in deep snake, which better exploits the cycle-graph structure of a contour compared against generic graph convolution. Based on deep snake, we develop a two-stage pipeline for instance segmentation: initial contour proposal and contour deformation, which can handle errors in object localization. Experiments show that the proposed approach achieves competitive performances on the Cityscapes, KINS, SBD and COCO datasets while being efficient for real-time applications with a speed of 32.3 fps for 512 x 512 images on a 1080Ti GPU. The code is available at https://github.com/zju3dv/snake/.
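
Circular convolution on a closed contour can be sketched as an ordinary 1D filter whose indices wrap around, so the first and last contour points are treated as neighbours. The function name and the scalar per-point features are simplifications for illustration. The paper's network operates on learned multi-channel features.

```python
def circular_conv(contour_features, kernel):
    """Filter features along a closed contour with wrap-around indexing.

    contour_features: one value per contour point, in order around the contour.
    kernel: odd-length list of weights centered on the current point.
    """
    n = len(contour_features)
    k = len(kernel)
    assert k % 2 == 1, "kernel size must be odd so it can be centered"
    half = k // 2
    return [sum(kernel[j] * contour_features[(i + j - half) % n]
                for j in range(k))
            for i in range(n)]
```

Because of the modulo indexing, changing the starting point of the contour only rotates the output by the same amount, which matches the cycle-graph structure mentioned in the abstract.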

225 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR) is presented, which leverages ZSSR and can exploit both external and internal information, where one single gradient update can yield quite considerable results.
Abstract: Convolutional neural networks (CNNs) have shown dramatic improvements in single image super-resolution (SISR) by using large-scale external samples. Despite their remarkable performance based on the external dataset, they cannot exploit internal information within a specific image. Another problem is that they are applicable only to the specific condition of data that they are supervised. For instance, the low-resolution (LR) image should be a "bicubic" downsampled noise-free image from a high-resolution (HR) one. To address both issues, zero-shot super-resolution (ZSSR) has been proposed for flexible internal learning. However, they require thousands of gradient updates, i.e., long inference time. In this paper, we present Meta-Transfer Learning for Zero-Shot Super-Resolution (MZSR), which leverages ZSSR. Precisely, it is based on finding a generic initial parameter that is suitable for internal learning. Thus, we can exploit both external and internal information, where one single gradient update can yield quite considerable results. With our method, the network can quickly adapt to a given image condition. In this respect, our method can be applied to a large spectrum of image conditions within a fast adaptation process.

223 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Experimental results show that the proposed SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images.
Abstract: Blind deconvolution is a classical yet challenging low-level vision problem with many real-world applications. Traditional maximum a posteriori (MAP) based methods rely heavily on fixed and handcrafted priors that are insufficient for characterizing clean images and blur kernels, and usually adopt specially designed alternating minimization to avoid trivial solutions. In contrast, existing deep motion deblurring networks learn from massive training images the mapping to the clean image or blur kernel, but are limited in handling various complex and large blur kernels. To connect MAP and deep models, in this paper we present two generative networks for modeling the deep priors of the clean image and the blur kernel, respectively, and propose an unconstrained neural optimization solution to blind deconvolution. In particular, we adopt an asymmetric Autoencoder with skip connections for generating the latent clean image, and a fully-connected network (FCN) for generating the blur kernel. Moreover, the SoftMax nonlinearity is applied to the output layer of the FCN to meet the non-negative and equality constraints. The process of neural optimization can be explained as a kind of "zero-shot" self-supervised learning of the generative networks, and thus our proposed method is dubbed SelfDeblur. Experimental results show that our SelfDeblur can achieve notable quantitative gains as well as more visually plausible deblurring results in comparison to state-of-the-art blind deconvolution methods on benchmark datasets and real-world blurry images. The source code is publicly available at https://github.com/csdwren/SelfDeblur
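
The role of the SoftMax output layer can be shown in isolation: whatever raw values the kernel generator produces, the softmax turns them into entries that are non-negative and sum to one. The sketch below assumes the raw outputs arrive as a flat list that is reshaped to the kernel size. The function name and the reshaping convention are hypothetical.

```python
import math

def softmax_blur_kernel(raw, height, width):
    """Turn unconstrained outputs into a valid blur kernel.

    raw: flat list of height * width unconstrained values.
    Returns a height x width kernel with positive entries summing to 1.
    """
    assert len(raw) == height * width
    m = max(raw)                              # subtract max for stability
    exps = [math.exp(v - m) for v in raw]
    total = sum(exps)
    flat = [e / total for e in exps]
    return [flat[r * width:(r + 1) * width] for r in range(height)]
```

With the constraints built into the output layer, the optimization over the generator's weights can proceed without a separate projection step, which is consistent with the abstract's description of the solution as unconstrained.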

Proceedings ArticleDOI
01 Mar 2020
TL;DR: In this paper, a deep learning based edge detector is proposed, which is inspired by both HED (Holistically-Nested Edge Detection) and Xception networks; the proposed approach generates thin edge-maps that are plausible for human eyes, and it can be used in any edge detection task without previous training or fine-tuning.
Abstract: This paper proposes a deep learning based edge detector, which is inspired by both HED (Holistically-Nested Edge Detection) and Xception networks. The proposed approach generates thin edge-maps that are plausible for human eyes; it can be used in any edge detection task without previous training or fine-tuning. As a second contribution, a large dataset with carefully annotated edges has been generated. This dataset has been used for training the proposed approach as well as the state-of-the-art algorithms for comparison. Quantitative and qualitative evaluations have been performed on different benchmarks, showing improvements with the proposed method when the F-measures of ODS and OIS are considered.

Journal ArticleDOI
TL;DR: This work proposes two novel CNN architectures which achieve a human-like accuracy of 65% and can serve as a basis for standardization of the base model for the much inquired FER-2013 dataset.
Abstract: Facial expression recognition is a challenging problem in image classification. Recently, the use of deep learning is gaining importance in image classification. This has led to increased efforts in solving the problem of facial expression recognition using convolutional neural networks (CNNs). A significant challenge in deep learning is to design a network architecture that is simple and effective. A simple architecture is fast to train and easy to implement. An effective architecture achieves good accuracy on the test data. CNN architectures are black boxes to us. VGGNet, AlexNet and Inception are well-known CNN architectures. These architectures have strongly influenced CNN model designs for new datasets. Almost all CNN models known to achieve high accuracy on facial expression recognition problem are influenced by these architectures. This work tries to overcome this limitation by using FER-2013 dataset as starting point to design new CNN models. In this work, the effect of CNN parameters namely kernel size and number of filters on the classification accuracy is investigated using FER-2013 dataset. Our major contribution is a thorough evaluation of different kernel sizes and number of filters to propose two novel CNN architectures which achieve a human-like accuracy of 65% (Goodfellow et al. in: Neural information processing, Springer, Berlin, pp 117–124, 2013) on FER-2013 dataset. These architectures can serve as a basis for standardization of the base model for the much inquired FER-2013 dataset.

Journal ArticleDOI
TL;DR: In this paper, the authors consider an advection-dispersion model, where the velocity is considered to be 1 and the kernels are power law, exponential decay law and the generalized Mittag-Leffler kernel.
Abstract: Nonlocal differential and integral operators with fractional order and fractal dimension have recently been introduced and appear to be powerful mathematical tools for modeling complex real-world problems that could not be modeled with classical and nonlocal differential and integral operators of single order. To further highlight possible applications of such operators, we consider in this work an advection-dispersion model in which the velocity is taken to be 1. We consider three cases of the model, in which the kernel is a power law, an exponential decay law, or the generalized Mittag-Leffler kernel. For each case, we present a detailed analysis including a numerical solution, a stability analysis and an error analysis. We also present some numerical simulations.

Journal ArticleDOI
TL;DR: A new method for the further research of crop diseases diagnosis by using deep learning and SVM, which has higher accuracy than the traditional back propagation neural networks models.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: The proposed orthogonal convolution requires no additional parameters and little computational overhead and consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings.
Abstract: Deep convolutional neural networks are hindered by training instability and feature redundancy towards further performance improvement. A promising solution is to impose orthogonality on convolutional filters. We develop an efficient approach to impose filter orthogonality on a convolutional layer based on the doubly block-Toeplitz matrix representation of the convolutional kernel, instead of the common kernel orthogonality approach, which we show is only necessary but not sufficient for ensuring orthogonal convolutions. Our proposed orthogonal convolution requires no additional parameters and little computational overhead. It consistently outperforms the kernel orthogonality alternative on a wide range of tasks such as image classification and inpainting under supervised, semi-supervised and unsupervised settings. It learns more diverse and expressive features with better training stability, robustness, and generalization. Our code is publicly available.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed CNN structure is significantly better than five other methods when it is used to detect three spatial algorithms, namely WOW, S-UNIWARD and HILL, across a wide variety of datasets and payloads.
Abstract: For steganalysis, many studies have shown that convolutional neural networks (CNNs) perform better than the two-part structure of traditional machine learning methods. Existing CNN architectures use various tricks to improve the performance of steganalysis, such as fixed convolutional kernels, the absolute value layer, data augmentation and domain knowledge. However, some aspects of network structure design have not been extensively studied so far, such as different convolutions (inception, xception, etc.) and various ways of pooling (spatial pyramid pooling, etc.). In this paper, we focus on designing a new CNN structure to improve the detection accuracy of spatial-domain steganography. First, we use 3×3 kernels instead of the traditional 5×5 kernels and optimize the convolution kernels in the preprocessing layer. The smaller convolution kernels are used to reduce the number of parameters and model the features in a small local region. Next, we use separable convolutions to utilize the channel correlation of the residuals, compress the image content and increase the signal-to-noise ratio (between the stego signal and the image signal). Then, we use spatial pyramid pooling (SPP) to aggregate the local features and enhance the representation ability of features by multi-level pooling. Finally, data augmentation is adopted to further improve network performance. The experimental results show that the proposed CNN structure is significantly better than five other methods, namely SRM, Ye-Net, Xu-Net, Yedroudj-Net and SRNet, when it is used to detect three spatial algorithms, namely WOW, S-UNIWARD and HILL, across a wide variety of datasets and payloads.
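
The parameter savings from smaller kernels and separable convolutions are easy to check with a little arithmetic. The helpers below count weights only (biases are ignored). The separable variant is assumed to be a depthwise convolution followed by a 1x1 pointwise convolution, and the channel counts in the usage note are hypothetical values chosen for illustration.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution with a square kernel."""
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise
```

For a layer with 16 input and 16 output channels, `conv_params(5, 16, 16)` gives 6400 weights, `conv_params(3, 16, 16)` gives 2304, and `separable_conv_params(3, 16, 16)` gives 400.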

Proceedings ArticleDOI
14 Jun 2020
TL;DR: 3D Graph Convolution Networks (3D-GCN) are proposed to extract local 3D features from point clouds across scales while introducing shift- and scale-invariance properties.
Abstract: Point clouds are among the popular geometry representations for 3D vision applications. However, without regular structures like 2D images, processing and summarizing information over these unordered data points is very challenging. Although a number of previous works attempt to analyze point clouds and achieve promising performance, their performance degrades significantly when data variations such as shift and scale changes are present. In this paper, we propose 3D Graph Convolution Networks (3D-GCN), which are designed to extract local 3D features from point clouds across scales while introducing shift- and scale-invariance properties. The novelty of our 3D-GCN lies in the definition of learnable kernels with a graph max-pooling mechanism. We show that 3D-GCN can be applied to 3D classification and segmentation tasks, with ablation studies and visualizations verifying the design of 3D-GCN.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: A novel squeeze-and-attention network (SANet) architecture is proposed that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction.
Abstract: The recent integration of attention mechanisms into segmentation networks improves their representational capabilities through a greater emphasis on more informative features. However, these attention mechanisms ignore an implicit sub-task of semantic segmentation and are constrained by the grid structure of convolution kernels. In this paper, we propose a novel squeeze-and-attention network (SANet) architecture that leverages an effective squeeze-and-attention (SA) module to account for two distinctive characteristics of segmentation: i) pixel-group attention, and ii) pixel-wise prediction. Specifically, the proposed SA modules impose pixel-group attention on conventional convolution by introducing an 'attention' convolutional channel, thus taking into account spatial-channel inter-dependencies in an efficient manner. The final segmentation results are produced by merging outputs from four hierarchical stages of a SANet to integrate multi-scale contexts for obtaining an enhanced pixel-wise prediction. Empirical experiments on two challenging public datasets validate the effectiveness of the proposed SANets, which achieve 83.2% mIoU (without COCO pre-training) on PASCAL VOC and a state-of-the-art mIoU of 54.4% on PASCAL Context.

Proceedings ArticleDOI
14 Jun 2020
TL;DR: D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation, where the filters and their receptive fields can be automatically learned from image-based depth maps.
Abstract: 3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture local objects and their scale information, which are vital for 3D object detection. To better represent 3D structure, prior work typically transforms depth maps estimated from 2D images into a pseudo-LiDAR representation, and then applies existing 3D point-cloud based object detectors. However, the results depend heavily on the accuracy of the estimated depth maps, resulting in suboptimal performance. In this work, instead of using a pseudo-LiDAR representation, we improve the fundamental 2D convolutions by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D4LCN), where the filters and their receptive fields can be automatically learned from image-based depth maps, so that different pixels of different images have different filters. D4LCN overcomes the limitation of conventional 2D convolutions and narrows the gap between image representation and 3D point cloud representation. Extensive experiments show that D4LCN outperforms existing works by large margins. For example, the relative improvement of D4LCN against the state-of-the-art on KITTI is 9.1% in the moderate setting. D4LCN ranks 1st on the KITTI monocular 3D object detection benchmark at the time of submission (car, December 2019). The code is available at https://github.com/dingmyu/D4LCN

Journal ArticleDOI
TL;DR: The results and analysis of extensive real HSIC experiments demonstrate that the proposed lightweight 2D-3D CNN can effectively extract refined features and improve the classification accuracy.
Abstract: Convolutional neural networks (CNNs) have led to a breakthrough in hyperspectral image classification (HSIC). Because of the intrinsic spatial-spectral specificities of a hyperspectral cube, feature extraction with 3-D convolution is a straightforward approach to HSIC. However, the overwhelming number of features obtained from the original 3-D CNN leads to overfitting and a higher training cost. To address this issue, this article implements a novel HSIC framework based on a simplified 2D-3D CNN, built through the cooperation of a 2-D CNN and a 3-D convolution layer. First, the 2-D convolution block extracts spatial features, which carry abundant spectral information, as a training channel. Then, the 3-D CNN approach concentrates primarily on exploiting band correlation by using a reduced kernel. The proposed architecture captures spatial and spectral features simultaneously in a joint 2D-3D pattern to obtain superior fused features for the subsequent classification. Furthermore, a deconvolution layer intended to enhance the robustness of the deep features is used in the proposed CNN. The results and analysis of extensive real HSIC experiments demonstrate that the proposed lightweight 2D-3D CNN can effectively extract refined features and improve the classification accuracy.

Journal ArticleDOI
TL;DR: In this paper, free vibration and buckling analyses of functionally graded carbon nanotube-reinforced (FG-CNTR) laminated non-rectangular plates, i.e., quadrilateral and skew plates, are presented using a four-noded straight-sided transformation method.
Abstract: This paper presents the free vibration and buckling analyses of functionally graded carbon nanotube-reinforced (FG-CNTR) laminated non-rectangular plates, i.e., quadrilateral and skew plates, using a four-noded straight-sided transformation method. First, the governing equations of motion and buckling of the quadrilateral plate are given, and these equations are then transformed from the irregular physical domain into a square computational domain using the geometric transformation formulation via discrete singular convolution (DSC). The discretization of these equations is obtained via two different regularized kernels, i.e., the regularized Shannon’s delta (RSD) and Lagrange-delta sequence (LDS) kernels, in conjunction with the discrete singular convolution numerical integration. The convergence and accuracy of the present DSC transformation are verified against existing literature results for different cases. Detailed numerical solutions are performed, and the parametric results obtained are presented to show the effects of carbon nanotube (CNT) volume fraction, CNT distribution pattern, geometry of the skew and quadrilateral plate, lamination layup, skew and corner angle, and thickness-to-length ratio on the vibration and buckling analyses of FG-CNTR laminated composite non-rectangular plates with different boundary conditions. Some detailed results related to the critical buckling and frequency of FG-CNTR non-rectangular plates are reported, which can serve as benchmark solutions for future investigations.

Posted Content
23 Mar 2020
TL;DR: State-of-the-art results in object detection (from the authors' mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation.
Abstract: In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: this https URL
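
The one-shot idea behind Matrix NMS can be sketched as follows. The abstract does not state the decay rule, so the Gaussian decay, the `sigma` value, and the helper names `mask_iou` and `matrix_nms` are assumptions for illustration. Plain lists are used so the sketch runs without dependencies; a practical version would express the same steps as matrix operations on tensors.

```python
import math

def mask_iou(a, b):
    """IoU of two flat binary masks given as lists of 0/1."""
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(a) + sum(b) - inter
    return inter / union if union else 0.0

def matrix_nms(scores, masks, sigma=0.5):
    """Return decayed scores; inputs must be sorted by descending score."""
    n = len(scores)
    # Upper-triangular IoU matrix: iou[i][j] is nonzero only for i < j.
    iou = [[mask_iou(masks[i], masks[j]) if i < j else 0.0 for j in range(n)]
           for i in range(n)]
    # cmax[i]: largest overlap of mask i with any higher-scoring mask.
    cmax = [max(iou[k][i] for k in range(n)) for i in range(n)]
    decayed = []
    for j in range(n):
        # Gaussian decay from each higher-scoring mask i, compensated by how
        # strongly mask i is itself overlapped; keep the strongest decay.
        factor = min(
            (math.exp(-(iou[i][j] ** 2 - cmax[i] ** 2) / sigma) for i in range(j)),
            default=1.0,
        )
        decayed.append(scores[j] * factor)
    return decayed

masks = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
print(matrix_nms([0.9, 0.8, 0.7], masks))
```

In this toy input the second mask duplicates the first, so only its score is decayed; the disjoint third mask keeps its score. No mask is removed sequentially, which is what allows the computation to be done in a single pass.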

Journal ArticleDOI
TL;DR: Experimental results on the SAR Ship Detection Dataset (SSDD), Gaofen-SSDD and Sentinel-SSDD show that HyperLi-Net’s accuracy and speed are both superior to the other nine state-of-the-art methods.
Abstract: Ship detection from Synthetic Aperture Radar (SAR) imagery is attracting increasing attention due to its great value in ocean applications. However, most existing studies improve detection accuracy at the expense of detection speed. To solve this problem, this paper proposes HyperLi-Net for highly accurate and high-speed SAR ship detection. We propose five external modules to achieve high accuracy, i.e., Multi-Receptive-Field Module (MRF-Module), Dilated Convolution Module (DC-Module), Channel and Spatial Attention Module (CSA-Module), Feature Fusion Module (FF-Module) and Feature Pyramid Module (FP-Module). We also adopt five internal mechanisms to achieve high speed, i.e., Region-Free Model (RF-Model), Small Kernel (S-Kernel), Narrow Channel (N-Channel), Separable Convolution (Separa-Conv) and Batch Normalization Fusion (BN-Fusion). Experimental results on the SAR Ship Detection Dataset (SSDD), Gaofen-SSDD and Sentinel-SSDD show that HyperLi-Net’s accuracy and speed are both superior to those of the other nine state-of-the-art methods. Moreover, the satisfactory detection results on two Sentinel-1 SAR images reveal HyperLi-Net’s good migration capability. HyperLi-Net is built from scratch with fewer parameters, lower computation cost and a lighter model; it can be efficiently trained on CPUs and is helpful for future hardware transplantation, e.g., to FPGAs and DSPs.
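
Of the speed mechanisms listed above, batch normalization fusion is the easiest to show in a few lines. The sketch below uses the standard folding identity for a single channel with scalar weight and bias; the function names and numeric values are hypothetical, and the abstract does not describe how HyperLi-Net implements this step.

```python
import math

def fuse_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold y = gamma * (w*x + b - mean) / sqrt(var + eps) + beta into (w', b')."""
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta

def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Reference path: a scalar 'convolution' followed by batch normalization."""
    y = weight * x + bias
    return gamma * (y - mean) / math.sqrt(var + eps) + beta

# Hypothetical per-channel parameters, chosen only for the demo.
params = dict(weight=0.5, bias=0.1, gamma=1.2, beta=-0.3, mean=0.05, var=0.8)
w_fused, b_fused = fuse_bn(**params)
x = 2.0
print(conv_then_bn(x, **params), w_fused * x + b_fused)
```

Both printed values agree up to floating-point rounding, so at inference time the normalization layer can be dropped and its effect absorbed into the convolution parameters. In a multi-channel convolution the same scaling would be applied to each output channel's filter separately.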

Posted Content
TL;DR: An alternating optimization algorithm that estimates the blur kernel and restores the SR image in a single model; the Restorer is trained with the kernel estimated by the Estimator instead of the ground-truth kernel, making it more tolerant to the Estimator's estimation error.
Abstract: Previous methods decompose the blind super-resolution (SR) problem into two sequential steps: i) estimating the blur kernel from a given low-resolution (LR) image and ii) restoring the SR image based on the estimated kernel. This two-step solution involves two independently trained models, which may not be well compatible with each other. A small estimation error in the first step can cause a severe performance drop in the second. On the other hand, the first step can only use limited information from the LR image, which makes it difficult to predict a highly accurate blur kernel. To address these issues, instead of considering the two steps separately, we adopt an alternating optimization algorithm, which can estimate the blur kernel and restore the SR image in a single model. Specifically, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores the SR image based on the predicted kernel, and Estimator estimates the blur kernel with the help of the restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, Estimator uses information from both the LR and SR images, which makes the estimation of the blur kernel easier. More importantly, Restorer is trained with the kernel estimated by Estimator instead of the ground-truth kernel, so Restorer can be more tolerant to the estimation error of Estimator. Extensive experiments on synthetic datasets and real-world images show that our model can largely outperform state-of-the-art methods and produce more visually favorable results at much higher speed. The source code is available at this https URL.

Proceedings Article
01 Jan 2020
TL;DR: SOLOv2 decouples the mask branch into a mask kernel branch and a mask feature branch, which are responsible for learning the convolution kernel and the convolved features, respectively.
Abstract: In this work, we aim at building a simple, direct, and fast instance segmentation framework with strong performance. We follow the principle of the SOLO method of Wang et al. "SOLO: segmenting objects by locations". Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location. Specifically, the mask branch is decoupled into a mask kernel branch and mask feature branch, which are responsible for learning the convolution kernel and the convolved features respectively. Moreover, we propose Matrix NMS (non maximum suppression) to significantly reduce the inference time overhead due to NMS of masks. Our Matrix NMS performs NMS with parallel matrix operations in one shot, and yields better results. We demonstrate a simple direct instance segmentation system, outperforming a few state-of-the-art methods in both speed and accuracy. A light-weight version of SOLOv2 executes at 31.3 FPS and yields 37.1% AP. Moreover, our state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation show the potential to serve as a new strong baseline for many instance-level recognition tasks besides instance segmentation. Code is available at: this https URL

Proceedings ArticleDOI
Xinjiang Wang, Shilong Zhang, Zhuoran Yu, Litong Feng, Wayne Zhang
14 Jun 2020
TL;DR: This paper proposes a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps to extract scale-invariant features.
Abstract: The feature pyramid has been an efficient method to extract features at different scales. Development of this method has mainly focused on aggregating contextual information at different levels while seldom touching the inter-level correlation in the feature pyramid. Early computer vision methods extracted scale-invariant features by locating the feature extrema in both the spatial and scale dimensions. Inspired by this, a convolution across the pyramid level is proposed in this study, which is termed pyramid convolution and is a modified 3-D convolution. Stacked pyramid convolutions directly extract 3-D (scale and spatial) features and outperform other meticulously designed feature fusion modules. Based on the viewpoint of 3-D convolution, an integrated batch normalization that collects statistics from the whole feature pyramid is naturally inserted after the pyramid convolution. Furthermore, we also show that the naive pyramid convolution, together with the design of the RetinaNet head, is actually best suited to extracting features from a Gaussian pyramid, whose properties can hardly be satisfied by a feature pyramid. To alleviate this discrepancy, we build a scale-equalizing pyramid convolution (SEPC) that aligns the shared pyramid convolution kernel only at high-level feature maps. Being computationally efficient and compatible with the head design of most single-stage object detectors, the SEPC module brings significant performance improvement (>4 AP increase on the MS-COCO2017 dataset) in state-of-the-art one-stage object detectors, and a light version of SEPC also has a ~3.5 AP gain with only around 7% inference time increase. The pyramid convolution also functions well as a stand-alone module in two-stage object detectors and is able to improve the performance by ~2 AP. The source code can be found at https://github.com/jshilong/SEPC.

Journal ArticleDOI
TL;DR: Experimental results validate that the proposed LRCISSK method can effectively explore the spatial-spectral information and deliver superior performance with at least 1.30% higher OA and 1.03% higher AA on average when compared to other state-of-the-art classifiers.
Abstract: Kernel methods, e.g., composite kernels (CKs) and spatial-spectral kernels (SSKs), have been demonstrated to be an effective way to exploit spatial-spectral information nonlinearly for improving the classification performance of hyperspectral images (HSIs). However, these methods are always conducted with square-shaped windows or superpixel techniques. Both techniques are likely to misclassify pixels that lie at class boundaries, and thus a small target is always smoothed away. To alleviate these problems, in this paper, we propose a novel patch-based low rank component induced spatial-spectral kernel method, termed LRCISSK, for HSI classification. First, the latent low-rank features of the spectra in each cubic patch of the HSI are reconstructed by a low rank matrix recovery (LRMR) technique, and then, to explore more accurate spatial information, they are used to adaptively identify a homogeneous neighborhood for the target pixel (i.e., the centroid pixel). Finally, the adaptively identified homogeneous neighborhood, which consists of the latent low-rank spectra, is embedded into the spatial-spectral kernel framework. It can easily map the spectra into nonlinearly complex manifolds and enable a classifier (e.g., a support vector machine, SVM) to distinguish them effectively. Experimental results on three real HSI datasets validate that the proposed LRCISSK method can effectively explore the spatial-spectral information and deliver superior performance with at least 1.30% higher OA and 1.03% higher AA on average when compared to other state-of-the-art classifiers.

Journal ArticleDOI
TL;DR: This paper proposes a novel detection framework for complex PQ disturbances based on multifusion convolutional neural network (MFCNN), and focuses on automatic extraction and fusion of features from multiple sources.
Abstract: Intelligent identification of multiple power quality (PQ) disturbances is very useful for pollution control of power systems. In this paper, we propose a novel detection framework for complex PQ disturbances based on a multifusion convolutional neural network (MFCNN). Our contributions focus on automatic extraction and fusion of features from multiple sources. First, an information fusion structure is introduced in which the time domain and frequency domain information of the PQ disturbance signal are used as inputs. Additionally, a one-dimensional composite convolution, based on the standard convolution and the dilated convolution, is proposed to improve the diversity of network features. Then, to speed up training and prevent overfitting, batch normalization is used to adjust the distribution of features. Second, we use several visualization methods to resolve the internal mode of the MFCNN and demonstrate the working mechanism of the proposed method. Finally, we conduct various experiments to verify the effectiveness of the MFCNN. Compared with handcrafted feature design methods and general convolutional neural network models, simulations under different noise levels and hardware platform-based experiments verify the method's noise immunity, higher training speed, and better accuracy.
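
To make the pairing of standard and dilated convolution concrete, here is a minimal sketch of a one-dimensional convolution with an optional dilation, run in two parallel branches. The abstract does not say how the MFCNN combines the branches, so keeping both outputs side by side in `composite_conv1d` is an assumption, as are the function names and the toy kernels.

```python
def conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D cross-correlation with optional dilation."""
    span = (len(kernel) - 1) * dilation  # distance covered by the kernel taps
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]

def composite_conv1d(signal, std_kernel, dil_kernel, dilation=2):
    """Hypothetical composite: run both branches and keep both feature rows."""
    return [conv1d(signal, std_kernel), conv1d(signal, dil_kernel, dilation)]

features = composite_conv1d([1, 2, 3, 4, 5], [1, 1], [1, 1])
print(features)  # [[3, 5, 7, 9], [4, 6, 8]]
```

With the same two-tap kernel, the dilated branch sums samples two steps apart, so it covers a wider span of the signal without adding parameters.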

Journal ArticleDOI
TL;DR: A novel framework for deriving fractional order DTMs (FrDTMs) by the eigen-decomposition of kernel matrices is proposed in this paper, and some properties of the proposed FrDTMs are analyzed.