scispace - formally typeset
Search or ask a question

Showing papers on "Convolutional neural network published in 2021"


Journal ArticleDOI
TL;DR: Res2Net as mentioned in this paper constructs hierarchical residual-like connections within one single residual block to represent multi-scale features at a granular level and increases the range of receptive fields for each network layer.
Abstract: Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into the state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available on https://mmcheng.net/res2net/ .

1,553 citations


Proceedings ArticleDOI
23 Aug 2021
TL;DR: Wang et al. as discussed by the authors proposed a strong baseline model SwinIR for image restoration based on the Swin Transformer, which consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction.
Abstract: Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by $\textbf{up to 0.14$\sim$0.45dB}$, while the total number of parameters can be reduced by $\textbf{up to 67%}$.

1,064 citations


Journal ArticleDOI
TL;DR: An extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos as a subset of unsupervised learning methods to learn general image and video features from large-scale unlabeled data without using any human-annotated labels is provided.
Abstract: Large-scale labeled data are generally required to train deep neural networks in order to obtain better performance in visual feature learning from images or videos for computer vision applications. To avoid extensive cost of collecting and annotating large-scale datasets, as a subset of unsupervised learning methods, self-supervised learning methods are proposed to learn general image and video features from large-scale unlabeled data without using any human-annotated labels. This paper provides an extensive review of deep learning-based self-supervised general visual feature learning methods from images or videos. First, the motivation, general pipeline, and terminologies of this field are described. Then the common deep neural network architectures that used for self-supervised learning are summarized. Next, the schema and evaluation metrics of self-supervised learning methods are reviewed followed by the commonly used datasets for images, videos, audios, and 3D data, as well as the existing self-supervised visual feature learning methods. Finally, quantitative performance comparisons of the reviewed methods on benchmark datasets are summarized and discussed for both image and video feature learning. At last, this paper is concluded and lists a set of promising future directions for self-supervised visual feature learning.

876 citations


Journal ArticleDOI
TL;DR: A comprehensive review of deep learning-based image segmentation can be found in this article, where the authors investigate the relationships, strengths, and challenges of these DL-based models, examine the widely used datasets, compare performances, and discuss promising research directions.
Abstract: Image segmentation is a key task in computer vision and image processing with important applications such as scene understanding, medical image analysis, robotic perception, video surveillance, augmented reality, and image compression, among others, and numerous segmentation algorithms are found in the literature. Against this backdrop, the broad success of Deep Learning (DL) has prompted the development of new image segmentation approaches leveraging DL models. We provide a comprehensive review of this recent literature, covering the spectrum of pioneering efforts in semantic and instance segmentation, including convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the relationships, strengths, and challenges of these DL-based segmentation models, examine the widely used datasets, compare performances, and discuss promising research directions.

827 citations


Journal ArticleDOI
TL;DR: In this paper, five pre-trained convolutional neural network-based models were proposed for the detection of coronavirus pneumonia-infected patient using chest X-ray radiographs.
Abstract: The 2019 novel coronavirus disease (COVID-19), with a starting point in China, has spread rapidly among people living in other countries and is approaching approximately 101,917,147 cases worldwide according to the statistics of World Health Organization. There are a limited number of COVID-19 test kits available in hospitals due to the increasing cases daily. Therefore, it is necessary to implement an automatic detection system as a quick alternative diagnosis option to prevent COVID-19 spreading among people. In this study, five pre-trained convolutional neural network-based models (ResNet50, ResNet101, ResNet152, InceptionV3 and Inception-ResNetV2) have been proposed for the detection of coronavirus pneumonia-infected patient using chest X-ray radiographs. We have implemented three different binary classifications with four classes (COVID-19, normal (healthy), viral pneumonia and bacterial pneumonia) by using five-fold cross-validation. Considering the performance results obtained, it has been seen that the pre-trained ResNet50 model provides the highest classification performance (96.1% accuracy for Dataset-1, 99.5% accuracy for Dataset-2 and 99.7% accuracy for Dataset-3) among other four used models.

769 citations


Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of the general architecture and principals of 1D CNNs along with their major engineering applications, especially focused on the recent progress in this field.

659 citations


Journal ArticleDOI
TL;DR: In this article, a deep CNN, called Decompose, Transfer, and Compose (DeTraC), was used for the classification of COVID-19 chest X-ray images.
Abstract: Chest X-ray is the first imaging technique that plays an important role in the diagnosis of COVID-19 disease. Due to the high availability of large-scale annotated image datasets, great success has been achieved using convolutional neural networks (CNN s) for image recognition and classification. However, due to the limited availability of annotated medical images, the classification of medical images remains the biggest challenge in medical diagnosis. Thanks to transfer learning, an effective mechanism that can provide a promising solution by transferring knowledge from generic object recognition tasks to domain-specific tasks. In this paper, we validate and a deep CNN, called Decompose, Transfer, and Compose (DeTraC), for the classification of COVID-19 chest X-ray images. DeTraC can deal with any irregularities in the image dataset by investigating its class boundaries using a class decomposition mechanism. The experimental results showed the capability of DeTraC in the detection of COVID-19 cases from a comprehensive image dataset collected from several hospitals around the world. High accuracy of 93.1% (with a sensitivity of 100%) was achieved by DeTraC in the detection of COVID-19 X-ray images from normal, and severe acute respiratory syndrome cases.

644 citations


Journal ArticleDOI
TL;DR: A new minibatch GCN is developed that is capable of inferring out-of-sample data without retraining networks and improving classification performance, and three fusion strategies are explored: additive fusion, elementwise multiplicative fusion, and concatenation fusion to measure the obtained performance gain.
Abstract: Convolutional neural networks (CNNs) have been attracting increasing attention in hyperspectral (HS) image classification due to their ability to capture spatial–spectral feature representations. Nevertheless, their ability in modeling relations between the samples remains limited. Beyond the limitations of grid sampling, graph convolutional networks (GCNs) have been recently proposed and successfully applied in irregular (or nongrid) data representation and analysis. In this article, we thoroughly investigate CNNs and GCNs (qualitatively and quantitatively) in terms of HS image classification. Due to the construction of the adjacency matrix on all the data, traditional GCNs usually suffer from a huge computational cost, particularly in large-scale remote sensing (RS) problems. To this end, we develop a new minibatch GCN (called miniGCN hereinafter), which allows to train large-scale GCNs in a minibatch fashion. More significantly, our miniGCN is capable of inferring out-of-sample data without retraining networks and improving classification performance. Furthermore, as CNNs and GCNs can extract different types of HS features, an intuitive solution to break the performance bottleneck of a single model is to fuse them. Since miniGCNs can perform batchwise network training (enabling the combination of CNNs and GCNs), we explore three fusion strategies: additive fusion, elementwise multiplicative fusion, and concatenation fusion to measure the obtained performance gain. Extensive experiments, conducted on three HS data sets, demonstrate the advantages of miniGCNs over GCNs and the superiority of the tested fusion strategies with regard to the single CNN or GCN models. The codes of this work will be available at https://github.com/danfenghong/IEEE_TGRS_GCN for the sake of reproducibility.

560 citations


Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a residual dense block (RDB) to extract abundant local features via densely connected convolutional layers, which further allows direct connections from the state of preceding RDB to all the layers of current RDB, leading to a contiguous memory mechanism.
Abstract: Recently, deep convolutional neural network (CNN) has achieved great success for image restoration (IR) and provided hierarchical features at the same time. However, most deep CNN based IR models do not make full use of the hierarchical features from the original low-quality images; thereby, resulting in relatively-low performance. In this work, we propose a novel and efficient residual dense network (RDN) to address this problem in IR, by making a better tradeoff between efficiency and effectiveness in exploiting the hierarchical features from all the convolutional layers. Specifically, we propose residual dense block (RDB) to extract abundant local features via densely connected convolutional layers. RDB further allows direct connections from the state of preceding RDB to all the layers of current RDB, leading to a contiguous memory mechanism. To adaptively learn more effective features from preceding and current local features and stabilize the training of wider network, we proposed local feature fusion in RDB. After fully obtaining dense local features, we use global feature fusion to jointly and adaptively learn global hierarchical features in a holistic way. We demonstrate the effectiveness of RDN with several representative IR applications, single image super-resolution, Gaussian image denoising, image compression artifact reduction, and image deblurring. Experiments on benchmark and real-world datasets show that our RDN achieves favorable performance against state-of-the-art methods for each IR task quantitatively and visually.

519 citations


Journal ArticleDOI
TL;DR: This paper extends the preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network, which outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D objects detection dataset by utilizing only the LiDAR point cloud data.
Abstract: 3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications. In this paper, we extend our preliminary work PointRCNN to a novel and strong point-cloud-based 3D object detection framework, the part-aware and aggregation neural network (Part- $A^2$ A 2 net). The whole framework consists of the part-aware stage and the part-aggregation stage. First, the part-aware stage for the first time fully utilizes free-of-charge part supervisions derived from 3D ground-truth boxes to simultaneously predict high quality 3D proposals and accurate intra-object part locations. The predicted intra-object part locations within the same proposal are grouped by our new-designed RoI-aware point cloud pooling module, which results in an effective representation to encode the geometry-specific features of each 3D proposal. Then the part-aggregation stage learns to re-score the box and refine the box location by exploring the spatial relationship of the pooled intra-object part locations. Extensive experiments are conducted to demonstrate the performance improvements from each component of our proposed framework. Our Part- $A^2$ A 2 net outperforms all existing 3D detection methods and achieves new state-of-the-art on KITTI 3D object detection dataset by utilizing only the LiDAR point cloud data.

500 citations


Book ChapterDOI
27 Sep 2021
TL;DR: Jeon et al. as discussed by the authors proposed a gated axial-attention model which extends the existing transformer-based architectures by introducing an additional control mechanism in the selfattention module.
Abstract: Over the past decade, deep convolutional neural networks have been widely adopted for medical image segmentation and shown to achieve adequate performance. However, due to inherent inductive biases present in convolutional architectures, they lack understanding of long-range dependencies in the image. Recently proposed transformer-based architectures that leverage self-attention mechanism encode long-range dependencies and learn representations that are highly expressive. This motivates us to explore transformer-based solutions and study the feasibility of using transformer-based network architectures for medical image segmentation tasks. Majority of existing transformer-based network architectures proposed for vision applications require large-scale datasets to train properly. However, compared to the datasets for vision applications, in medical imaging the number of data samples is relatively low, making it difficult to efficiently train transformers for medical imaging applications. To this end, we propose a gated axial-attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module. Furthermore, to train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance. Specifically, we operate on the whole image and patches to learn global and local features, respectively. The proposed Medical Transformer (MedT) is evaluated on three different medical image segmentation datasets and it is shown that it achieves better performance than the convolutional and other related transformer-based architectures. Code: https://github.com/jeya-maria-jose/Medical-Transformer

Posted Content
TL;DR: A simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolution and ReLU, while the training-time model has a multi-branch topology.
Abstract: We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy and show favorable accuracy-speed trade-off compared to the state-of-the-art models like EfficientNet and RegNet. The code and trained models are available at this https URL.

Journal ArticleDOI
TL;DR: The structural principle, the characteristics, and some kinds of classic models of deep learning, such as stacked auto encoder, deep belief network, deep Boltzmann machine, and convolutional neural network are described.

Journal ArticleDOI
TL;DR: An Attention-based Bidirectional CNN-RNN Deep Model (ABCDM) is proposed that achieves state-of-the-art results on both long review and short tweet polarity classification and is evaluated on sentiment polarity detection.

Journal ArticleDOI
06 Jan 2021-Nature
TL;DR: In this paper, a universal optical vector convolutional accelerator operating at more than ten TOPS (trillions (1012) of operations per second, or tera-ops per second), generating convolutions of images with 250,000 pixels was used for facial image recognition.
Abstract: Convolutional neural networks, inspired by biological visual cortex systems, are a powerful category of artificial neural networks that can extract the hierarchical features of raw data to provide greatly reduced parametric complexity and to enhance the accuracy of prediction. They are of great interest for machine learning tasks such as computer vision, speech recognition, playing board games and medical diagnosis1–7. Optical neural networks offer the promise of dramatically accelerating computing speed using the broad optical bandwidths available. Here we demonstrate a universal optical vector convolutional accelerator operating at more than ten TOPS (trillions (1012) of operations per second, or tera-ops per second), generating convolutions of images with 250,000 pixels—sufficiently large for facial image recognition. We use the same hardware to sequentially form an optical convolutional neural network with ten output neurons, achieving successful recognition of handwritten digit images at 88 per cent accuracy. Our results are based on simultaneously interleaving temporal, wavelength and spatial dimensions enabled by an integrated microcomb source. This approach is scalable and trainable to much more complex networks for demanding applications such as autonomous vehicles and real-time video recognition. An optical vector convolutional accelerator operating at more than ten trillion operations per second is used to create an optical convolutional neural network that can successfully recognize handwritten digit images with 88 per cent accuracy.

Journal ArticleDOI
Zewen Li1, Fan Liu1, Wenjie Yang1, Shouheng Peng1, Jun Zhou2 
TL;DR: In this article, the authors provide an overview of various convolutional neural network (CNN) models and provide several rules of thumb for functions and hyperparameter selection, as well as open issues and promising directions for future work.
Abstract: A convolutional neural network (CNN) is one of the most significant networks in the deep learning field. Since CNN made impressive achievements in many areas, including but not limited to computer vision and natural language processing, it attracted much attention from both industry and academia in the past few years. The existing reviews mainly focus on CNN's applications in different scenarios without considering CNN from a general perspective, and some novel ideas proposed recently are not covered. In this review, we aim to provide some novel ideas and prospects in this fast-growing field. Besides, not only 2-D convolution but also 1-D and multidimensional ones are involved. First, this review introduces the history of CNN. Second, we provide an overview of various convolutions. Third, some classic and advanced CNN models are introduced; especially those key points making them reach state-of-the-art results. Fourth, through experimental analysis, we draw some conclusions and provide several rules of thumb for functions and hyperparameter selection. Fifth, the applications of 1-D, 2-D, and multidimensional convolution are covered. Finally, some open issues and promising directions for CNN are discussed as guidelines for future work.

Proceedings Article
18 Jul 2021
TL;DR: GPSA is introduced, a form of positional self-attention which can be equipped with a "soft" convolutional inductive bias and outperforms the DeiT on ImageNet, while offering a much improved sample efficiency.
Abstract: Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external datasets or distillation from pre-trained convolutional networks. In this paper, we ask the following question: is it possible to combine the strengths of these two architectures while avoiding their respective limitations? To this end, we introduce gated positional self-attention (GPSA), a form of positional self-attention which can be equipped with a ``soft" convolutional inductive bias. We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information. The resulting convolutional-like ViT architecture, ConViT, outperforms the DeiT on ImageNet, while offering a much improved sample efficiency. We further investigate the role of locality in learning by first quantifying how it is encouraged in vanilla self-attention layers, then analysing how it is escaped in GPSA layers. We conclude by presenting various ablations to better understand the success of the ConViT. Our code and models are released publicly at this https URL.

Proceedings ArticleDOI
11 Jan 2021
TL;DR: RepVGG as mentioned in this paper decouples the training-time and inference-time architecture by a structural re-parameterization technique and achieves state-of-the-art accuracy on ImageNet.
Abstract: We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolution and ReLU, while the training-time model has a multi-branch topology. Such decoupling of the training-time and inference-time architecture is realized by a structural re-parameterization technique so that the model is named RepVGG. On ImageNet, RepVGG reaches over 80% top-1 accuracy, which is the first time for a plain model, to the best of our knowledge. On NVIDIA 1080Ti GPU, RepVGG models run 83% faster than ResNet-50 or 101% faster than ResNet-101 with higher accuracy and show favorable accuracy-speed trade-off compared to the state-of-the-art models like EfficientNet and RegNet. The code and trained models are available at https://github.com/megvii-model/RepVGG.

Posted Content
Chun-Fu Chen1, Quanfu Fan1, Rameswar Panda1
TL;DR: Zhang et al. as mentioned in this paper proposed a dual-branch transformer to combine image patches (i.e., tokens in a transformer) of different sizes to produce stronger image features, which achieved promising results on image classification compared to convolutional neural networks.
Abstract: The recently developed vision transformer (ViT) has achieved promising results on image classification compared to convolutional neural networks. Inspired by this, in this paper, we study how to learn multi-scale feature representations in transformer models for image classification. To this end, we propose a dual-branch transformer to combine image patches (i.e., tokens in a transformer) of different sizes to produce stronger image features. Our approach processes small-patch and large-patch tokens with two separate branches of different computational complexity and these tokens are then fused purely by attention multiple times to complement each other. Furthermore, to reduce computation, we develop a simple yet effective token fusion module based on cross attention, which uses a single token for each branch as a query to exchange information with other branches. Our proposed cross-attention only requires linear time for both computational and memory complexity instead of quadratic time otherwise. Extensive experiments demonstrate that our approach performs better than or on par with several concurrent works on vision transformer, in addition to efficient CNN models. For example, on the ImageNet1K dataset, with some architectural changes, our approach outperforms the recent DeiT by a large margin of 2\% with a small to moderate increase in FLOPs and model parameters. Our source codes and models are available at \url{this https URL}.

Journal ArticleDOI
TL;DR: Compared to other state-of-the-art segmentation networks, this model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation, which demonstrates the efficiency of the approach to generate precise and reliable automatic segmentations of medical images.
Abstract: Even though convolutional neural networks (CNNs) are driving progress in medical image segmentation, standard models still have some drawbacks. First, the use of multi-scale approaches, i.e., encoder-decoder architectures, leads to a redundant use of information, where similar low-level features are extracted multiple times at multiple scales. Second, long-range feature dependencies are not efficiently modeled, resulting in non-optimal discriminative feature representations associated with each semantic class. In this paper we attempt to overcome these limitations with the proposed architecture, by capturing richer contextual dependencies based on the use of guided self-attention mechanisms. This approach is able to integrate local features with their corresponding global dependencies, as well as highlight interdependent channel maps in an adaptive manner. Further, the additional loss between different modules guides the attention mechanisms to neglect irrelevant information and focus on more discriminant regions of the image by emphasizing relevant feature associations. We evaluate the proposed model in the context of semantic segmentation on three different datasets: abdominal organs, cardiovascular structures and brain tumors. A series of ablation experiments support the importance of these attention modules in the proposed architecture. In addition, compared to other state-of-the-art segmentation networks our model yields better segmentation performance, increasing the accuracy of the predictions while reducing the standard deviation. This demonstrates the efficiency of our approach to generate precise and reliable automatic segmentations of medical images. Our code is made publicly available at: https://github.com/sinAshish/Multi-Scale-Attention .

Journal ArticleDOI
TL;DR: A single-shot alignment network (S2A-Net) consisting of two modules: a feature alignment module (FAM) and an oriented detection module (ODM) that can achieve the state-of-the-art performance on two commonly used aerial objects’ data sets while keeping high efficiency.
Abstract: The past decade has witnessed significant progress on detecting objects in aerial images that are often distributed with large-scale variations and arbitrary orientations. However, most of existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and usually suffer from severe misalignment between anchor boxes (ABs) and axis-aligned convolutional features, which lead to the common inconsistency between the classification score and localization accuracy. To address this issue, we propose a single-shot alignment network (S²A-Net) consisting of two modules: a feature alignment module (FAM) and an oriented detection module (ODM). The FAM can generate high-quality anchors with an anchor refinement network and adaptively align the convolutional features according to the ABs with a novel alignment convolution. The ODM first adopts active rotating filters to encode the orientation information and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy. Besides, we further explore the approach to detect objects in large-size images, which leads to a better trade-off between speed and accuracy. Extensive experiments demonstrate that our method can achieve the state-of-the-art performance on two commonly used aerial objects' data sets (i.e., DOTA and HRSC2016) while keeping high efficiency.

Journal ArticleDOI
TL;DR: A comprehensive survey of the recent developments in 3D reconstruction using convolutional neural networks, focusing on the works which use deep learning techniques to estimate the 3D shape of generic objects either from a single or multiple RGB images.
Abstract: 3D reconstruction is a longstanding ill-posed problem, which has been explored for decades by the computer vision, computer graphics, and machine learning communities. Since 2015, image-based 3D reconstruction using convolutional neural networks (CNN) has attracted increasing interest and demonstrated an impressive performance. Given this new era of rapid evolution, this article provides a comprehensive survey of the recent developments in this field. We focus on the works which use deep learning techniques to estimate the 3D shape of generic objects either from a single or multiple RGB images. We organize the literature based on the shape representations, the network architectures, and the training mechanisms they use. While this survey is intended for methods which reconstruct generic objects, we also review some of the recent works which focus on specific object classes such as human body shapes and faces. We provide an analysis and comparison of the performance of some key papers, summarize some of the open problems in this field, and discuss promising directions for future research.

Journal ArticleDOI
TL;DR: This paper proposes a DL assisted automated method using X-ray images for early diagnosis of COVID-19 infection and evaluates the effectiveness of eight pre-trained Convolutional Neural Network models such as AlexNet, VGG-16, GoogleNet, MobileNet-V2, SqueezeNet, ResNet-34, Res net-50 and Inception-V3 for classification of CO VID-19 from normal cases.

Journal ArticleDOI
TL;DR: A novel CNN model called CoroDet for automatic detection of COVID-19 by using raw chest X-ray and CT scan images have been proposed and the experimental results indicate the superiority of Corodet over the existing state-of-the-art-methods.
Abstract: Background and Objective The Coronavirus 2019, or shortly COVID-19, is a viral disease that causes serious pneumonia and impacts our different body parts from mild to severe depending on patient’s immune system. This infection was first reported in Wuhan city of China in December 2019, and afterward, it became a global pandemic spreading rapidly around the world. As the virus spreads through human to human contact, it has affected our lives in a devastating way, including the vigorous pressure on the public health system, the world economy, education sector, workplaces, and shopping malls. Preventing viral spreading requires early detection of positive cases and to treat infected patients as quickly as possible. The need for COVID-19 testing kits has increased, and many of the developing countries in the world are facing a shortage of testing kits as new cases are increasing day by day. In this situation, the recent research using radiology imaging (such as X-ray and CT scan) techniques can be proven helpful to detect COVID-19 as X-ray and CT scan images provide important information about the disease caused by COVID-19 virus. The latest data mining and machine learning techniques such as Convolutional Neural Network (CNN) can be applied along with X-ray and CT scan images of the lungs for the accurate and rapid detection of the disease, assisting in mitigating the problem of scarcity of testing kits. Methods Hence a novel CNN model called CoroDet for automatic detection of COVID-19 by using raw chest X-ray and CT scan images have been proposed in this study. CoroDet is developed to serve as an accurate diagnostics for 2 class classification (COVID and Normal), 3 class classification (COVID, Normal, and non-COVID pneumonia), and 4 class classification (COVID, Normal, non-COVID viral pneumonia, and non-COVID bacterial pneumonia). Results The performance of our proposed model was compared with ten existing techniques for COVID detection in terms of accuracy. A classification accuracy of 99.1% for 2 class classification, 94.2% for 3 class classification, and 91.2% for 4 class classification was produced by our proposed model, which is obviously better than the state-of-the-art-methods used for COVID-19 detection to the best of our knowledge. Moreover, the dataset with x-ray images that we prepared for the evaluation of our method is the largest datasets for COVID detection as far as our knowledge goes. Conclusion The experimental results of our proposed method CoroDet indicate the superiority of CoroDet over the existing state-of-the-art-methods. CoroDet may assist clinicians in making appropriate decisions for COVID-19 detection and may also mitigate the problem of scarcity of testing kits.

Journal ArticleDOI
18 Apr 2021-Sensors
TL;DR: In this article, the authors proposed a computerized process of classifying skin disease through deep learning based MobileNet V2 and Long Short Term Memory (LSTM), which proved to be efficient with better accuracy that can work on lightweight computational devices.
Abstract: Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposed a computerized process of classifying skin disease through deep learning based MobileNet V2 and Long Short Term Memory (LSTM). The MobileNet V2 model proved to be efficient with a better accuracy that can work on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by Visual Geometry Group (VGG), and convolutional neural network architecture that expanded with few changes. The HAM10000 dataset is used and the proposed method has outperformed other methods with more than 85% accuracy. Its robustness in recognizing the affected region much faster with almost 2× lesser computations than the conventional MobileNet model results in minimal computational efforts. Furthermore, a mobile application is designed for instant and proper action. It helps the patient and dermatologists identify the type of disease from the affected region's image at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.

Journal ArticleDOI
TL;DR: The proposed FGCNet model can assist radiologists to rapidly detect COVID-19 from chest CT images and gives better performance than all 15 state-of-the-art methods.

Journal ArticleDOI
Minghao Zhu1, Licheng Jiao1, Fang Liu1, Shuyuan Yang1, Jianing Wang1 
TL;DR: Zhang et al. as discussed by the authors proposed an end-to-end residual spectral-spatial attention network (RSSAN) for hyperspectral image classification, which takes raw 3D cubes as input data without additional feature engineering.
Abstract: In the last five years, deep learning has been introduced to tackle the hyperspectral image (HSI) classification and demonstrated good performance. In particular, the convolutional neural network (CNN)-based methods for HSI classification have made great progress. However, due to the high dimensionality of HSI and equal treatment of all bands, the performance of these methods is hampered by learning features from useless bands for classification. Moreover, for patchwise-based CNN models, equal treatment of spatial information from the pixel-centered neighborhood also hinders the performance of these methods. In this article, we propose an end-to-end residual spectral–spatial attention network (RSSAN) for HSI classification. The RSSAN takes raw 3-D cubes as input data without additional feature engineering. First, a spectral attention module is designed for spectral band selection from raw input data by emphasizing useful bands for classification and suppressing useless bands. Then, a spatial attention module is designed for the adaptive selection of spatial information by emphasizing pixels from the same class as the center pixel or those are useful for classification in the pixel-centered neighborhood and suppressing those from a different class or useless. Second, two attention modules are also used in the following CNN for adaptive feature refinement in spectral–spatial feature learning. Third, a sequential spectral–spatial attention module is embedded into a residual block to avoid overfitting and accelerate the training of the proposed model. Experimental studies demonstrate that the RSSAN achieved superior classification accuracy compared with the state of the art on three HSI data sets: Indian Pines (IN), University of Pavia (UP), and Kennedy Space Center (KSC).

Journal ArticleDOI
TL;DR: This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight.
Abstract: Convolutional neural networks (CNNs) were inspired by early findings in the study of biological vision. They have since become successful tools in computer vision and state-of-the-art models of both neural activity and behavior on visual tasks. This review highlights what, in the context of CNNs, it means to be a good model in computational neuroscience and the various ways models can provide insight. Specifically, it covers the origins of CNNs and the methods by which we validate them as models of biological vision. It then goes on to elaborate on what we can learn about biological vision by understanding and experimenting on CNNs and discusses emerging opportunities for the use of CNNs in vision research beyond basic object recognition.

Proceedings ArticleDOI
06 Jun 2021
TL;DR: Zhang et al. as discussed by the authors proposed an efficient shuffle attention (SA) module, which adopts Shuffle Units to combine two types of attention mechanisms effectively, i.e., spatial attention and channel attention.
Abstract: Attention mechanisms, which enable a neural network to accurately focus on all the relevant elements of the input, have become an essential component to improve the performance of deep neural networks. There are mainly two attention mechanisms widely used in computer vision studies, spatial attention and channel attention, which aim to capture the pixel-level pairwise relationship and channel dependency, respectively. Although fusing them together may achieve better performance than their individual implementations, it will inevitably increase the computational overhead. In this paper, we propose an efficient Shuffle Attention (SA) module to address this issue, which adopts Shuffle Units to combine two types of attention mechanisms effectively. Specifically, SA first groups channel dimensions into multiple sub-features before processing them in parallel. Then, for each sub-feature, SA utilizes a Shuffle Unit to depict feature dependencies in both spatial and channel dimensions. After that, all sub-features are aggregated and a "channel shuffle" operator is adopted to enable information communication between different sub-features. The proposed SA module is efficient yet effective, e.g., the parameters and computations of SA against the backbone ResNet50 are 300 vs. 25.56M and 2.76e-3 GFLOPs vs. 4.12 GFLOPs, respectively, and the performance boost is more than 1.34% in terms of Top-1 accuracy. Extensive experimental results on common-used benchmarks, including ImageNet-1k for classification, MS COCO for object detection, and instance segmentation, demonstrate that the proposed SA outperforms the current SOTA methods significantly by achieving higher accuracy while having lower model complexity.

Journal ArticleDOI
TL;DR: The experimental results in this paper show that traditional machine learning has a better solution effect on small sample data sets, and deep learning framework has higher recognition accuracy on large sample data set.