
Showing papers on "Convolutional neural network published in 2022"


Journal ArticleDOI
01 Dec 2022
TL;DR: In this article, the authors provide an overview of various convolutional neural network (CNN) models, offer several rules of thumb for function and hyperparameter selection, and discuss open issues and promising directions for future work.
Abstract: A convolutional neural network (CNN) is one of the most significant networks in the deep learning field. Because CNNs have made impressive achievements in many areas, including but not limited to computer vision and natural language processing, they have attracted much attention from both industry and academia in the past few years. Existing reviews mainly focus on CNN applications in different scenarios without considering CNNs from a general perspective, and some recently proposed novel ideas are not covered. In this review, we aim to provide novel ideas and prospects in this fast-growing field. Moreover, we cover not only 2-D convolution but also 1-D and multidimensional convolutions. First, this review introduces the history of CNNs. Second, we provide an overview of various convolutions. Third, some classic and advanced CNN models are introduced, with emphasis on the key points that enable them to reach state-of-the-art results. Fourth, through experimental analysis, we draw some conclusions and provide several rules of thumb for function and hyperparameter selection. Fifth, the applications of 1-D, 2-D, and multidimensional convolution are covered. Finally, some open issues and promising directions for CNNs are discussed as guidelines for future work.

308 citations
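
To make the review's 1-D/2-D/multidimensional distinction concrete, here is a minimal PyTorch sketch (ours, not from the paper) showing the same convolution idea applied to sequences, images, and volumes; all shapes and channel counts are illustrative.

```python
import torch
import torch.nn as nn

x1 = torch.randn(8, 16, 100)        # (batch, channels, length), e.g. a 1-D signal
x2 = torch.randn(8, 3, 224, 224)    # (batch, channels, H, W), e.g. an image
x3 = torch.randn(8, 1, 16, 64, 64)  # (batch, channels, D, H, W), e.g. a volume

conv1d = nn.Conv1d(16, 32, kernel_size=3, padding=1)
conv2d = nn.Conv2d(3, 32, kernel_size=3, padding=1)
conv3d = nn.Conv3d(1, 32, kernel_size=3, padding=1)

print(conv1d(x1).shape)  # torch.Size([8, 32, 100])
print(conv2d(x2).shape)  # torch.Size([8, 32, 224, 224])
print(conv3d(x3).shape)  # torch.Size([8, 32, 16, 64, 64])
```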


Proceedings ArticleDOI
01 Jan 2022
TL;DR: UNETR as discussed by the authors utilizes a transformer encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful U-shaped network design for the encoder and decoder.
Abstract: Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have shown prominence in the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations, which can be utilized for semantic output prediction by the decoder. Despite their success, the locality of convolutional layers in FCNNs limits their capability to learn long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard.

219 citations
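
A rough sketch of the core UNETR idea described above: a 3-D volume is cut into patches, embedded as a token sequence, and run through a transformer encoder whose intermediate states can feed a U-shaped decoder via skip connections. This is our simplified illustration, not the paper's implementation; patch size, depth, and dimensions are assumptions, and the decoder (omitted here) would reshape the skip tokens back into volumetric feature maps.

```python
import torch
import torch.nn as nn

class TinyVolumeTransformerEncoder(nn.Module):
    def __init__(self, in_ch=1, patch=16, dim=256, depth=4, heads=8, vol=96):
        super().__init__()
        # Non-overlapping 3-D patches become the token sequence
        self.patch_embed = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)
        n_tokens = (vol // patch) ** 3
        self.pos = nn.Parameter(torch.zeros(1, n_tokens, dim))
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
            for _ in range(depth))

    def forward(self, x):                        # x: (B, C, D, H, W)
        t = self.patch_embed(x).flatten(2).transpose(1, 2) + self.pos  # (B, N, dim)
        skips = []                               # states a U-shaped decoder would
        for blk in self.blocks:                  # consume via skip connections
            t = blk(t)
            skips.append(t)
        return t, skips

enc = TinyVolumeTransformerEncoder()
tokens, skips = enc(torch.randn(1, 1, 96, 96, 96))
print(tokens.shape, len(skips))                  # torch.Size([1, 216, 256]) 4
```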


Journal ArticleDOI
TL;DR: SpectralFormer as discussed by the authors proposes a backbone network for hyperspectral image classification with transformers, which is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding groupwise spectral embeddings.
Abstract: Hyperspectral (HS) images are characterized by approximately contiguous spectral information, enabling the fine identification of materials by capturing subtle spectral discrepancies. Owing to their excellent locally contextual modeling ability, convolutional neural networks (CNNs) have proven to be a powerful feature extractor in HS image classification. However, CNNs fail to mine and represent the sequence attributes of spectral signatures well due to the limitations of their inherent network backbone. To solve this issue, we rethink HS image classification from a sequential perspective with transformers and propose a novel backbone network called SpectralFormer. Beyond bandwise representations in classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding groupwise spectral embeddings. More significantly, to reduce the possibility of losing valuable information in the layerwise propagation process, we devise a cross-layer skip connection to convey memory-like components from shallow to deep layers by adaptively learning to fuse “soft” residuals across layers. It is worth noting that the proposed SpectralFormer is a highly flexible backbone network, applicable to both pixelwise and patchwise inputs. We evaluate the classification performance of the proposed SpectralFormer on three HS datasets through extensive experiments, showing its superiority over classic transformers and a significant improvement over state-of-the-art backbone networks. The code for this work will be made available at https://github.com/danfenghong/IEEE_TGRS_SpectralFormer for the sake of reproducibility.

209 citations
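
The groupwise spectral embedding mentioned above can be sketched in a few lines: instead of embedding each band independently, every token is built from a small window of neighboring bands. This is a hedged illustration; the group size, dimensions, and the `GroupwiseSpectralEmbedding` name are made up for the example.

```python
import torch
import torch.nn as nn

class GroupwiseSpectralEmbedding(nn.Module):
    def __init__(self, group=3, dim=64):
        super().__init__()
        self.group = group
        self.proj = nn.Linear(group, dim)   # one shared embedding per band group

    def forward(self, x):                   # x: (batch, n_bands) pixelwise spectra
        # unfold extracts overlapping windows of `group` neighboring bands
        windows = x.unfold(dimension=1, size=self.group, step=1)  # (B, N', group)
        return self.proj(windows)           # (B, N', dim) groupwise tokens

emb = GroupwiseSpectralEmbedding()
tokens = emb(torch.randn(4, 200))           # 200 spectral bands
print(tokens.shape)                          # torch.Size([4, 198, 64])
```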


Journal ArticleDOI
TL;DR: In this article, the authors give a brief overview of the You Only Look Once (YOLO) algorithm and its subsequent advanced versions; the results show the differences and similarities among the YOLO versions and between YOLO and other CNNs.

166 citations


Journal ArticleDOI
TL;DR: In this article, a multi-focus image fusion (MFIF) method is employed that generates the fused image by integrating fuzzy sets (FS) and a convolutional neural network (CNN) to detect the focused and unfocused parts of both source images.

131 citations


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed Sentic GCN, a graph convolutional network based on SenticNet that leverages the affective dependencies of a sentence with respect to a specific aspect.
Abstract: Aspect-based sentiment analysis is a fine-grained sentiment analysis task that needs to detect the sentiment polarity towards a given aspect. Recently, graph neural models over the dependency tree have been widely applied to aspect-based sentiment analysis. Most existing works, however, generally focus on learning the dependency information from contextual words to aspect words based on the dependency tree of the sentence, which neglects contextual affective knowledge with regard to the specific aspect. In this paper, we propose a graph convolutional network based on SenticNet to leverage the affective dependencies of the sentence according to the specific aspect, called Sentic GCN. Specifically, we explore a novel solution that constructs the graph neural networks by integrating affective knowledge from SenticNet to enhance the dependency graphs of sentences. On this basis, the novel affective enhanced graph model considers both the dependencies between contextual words and aspect words and the affective information between opinion words and the aspect. Experimental results on multiple public benchmark datasets show that our proposed model outperforms state-of-the-art methods.

127 citations
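
As a hedged sketch of the mechanism described in the abstract, the snippet below implements a generic GCN layer whose dependency-graph adjacency is re-weighted by per-word affective scores (a stand-in for SenticNet knowledge); the exact fusion used in Sentic GCN differs, and all names and scores here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffectiveGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj, affect):
        # adj: (N, N) dependency-graph adjacency; affect: (N,) per-word
        # affective intensities used to strengthen sentiment-bearing edges
        a = adj * (1.0 + affect.unsqueeze(0))    # strengthen edges toward affective words
        a = a / a.sum(dim=-1, keepdim=True).clamp(min=1e-6)  # row-normalize
        return F.relu(self.W(a @ h))             # aggregate neighbors, then project

n, d = 6, 32                           # a 6-word sentence, 32-dim word features
h = torch.randn(n, d)
adj = torch.eye(n) + torch.bernoulli(torch.full((n, n), 0.3))  # toy dependency graph
affect = torch.rand(n)                 # stand-in for SenticNet polarity scores
print(AffectiveGCNLayer(d, d)(h, adj, affect).shape)  # torch.Size([6, 32])
```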


Proceedings ArticleDOI
01 Jun 2022
TL;DR: In this paper, a split-attention module is proposed to provide a simple and modular computation block that can serve as a drop-in replacement for the popular residual block, while producing more diverse representations via cross-feature interactions.
Abstract: The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different network branches to leverage the complementary strengths of both feature-map attention and multi-path representation. Our proposed Split-Attention module provides a simple and modular computation block that can serve as a drop-in replacement for the popular residual block, while producing more diverse representations via cross-feature interactions. Adding a Split-Attention module into the architecture design space of RegNet-Y and FBNetV2 directly improves the performance of the resulting network. Replacing residual blocks with our Split-Attention module, we further design a new variant of the ResNet model, named ResNeSt, which outperforms EfficientNet in terms of the accuracy/latency trade-off.

117 citations
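
Below is a simplified, hedged sketch of a split-attention computation (radix splits only, without ResNeSt's cardinal groups): the input is split along channels, a gating vector is computed from the splits' sum via global pooling, and a softmax across splits re-weights each split before they are summed back. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class SplitAttention(nn.Module):
    def __init__(self, channels, radix=2, reduce=4):
        super().__init__()
        self.radix = radix
        inner = max(channels * radix // reduce, 32)
        self.fc1 = nn.Conv2d(channels, inner, 1)         # gate bottleneck
        self.fc2 = nn.Conv2d(inner, channels * radix, 1) # per-split gates
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                 # x: (B, radix * C, H, W)
        b, rc, h, w = x.shape
        c = rc // self.radix
        splits = x.view(b, self.radix, c, h, w)
        gap = splits.sum(dim=1).mean(dim=(2, 3), keepdim=True)  # (B, C, 1, 1)
        att = self.fc2(self.relu(self.fc1(gap)))                # (B, radix*C, 1, 1)
        att = att.view(b, self.radix, c, 1, 1).softmax(dim=1)   # softmax over splits
        return (att * splits).sum(dim=1)                        # (B, C, H, W)

blk = SplitAttention(channels=64, radix=2)
print(blk(torch.randn(2, 128, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```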


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a unified dynamic deep spatio-temporal neural network model based on convolutional neural networks and long short-term memory, termed DHSTNet, to simultaneously predict crowd flows in every region of a city.

95 citations
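
The TL;DR above describes the generic CNN + LSTM pattern; the sketch below illustrates that pattern, not the actual DHSTNet architecture: a small CNN encodes each timestep's city-wide flow grid and an LSTM models the temporal dynamics before a next-step grid is predicted. All shapes are assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMFlow(nn.Module):
    def __init__(self, grid=32, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),   # 2 channels: in/out flow
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten())       # per-timestep embedding
        self.lstm = nn.LSTM(16 * 8 * 8, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2 * grid * grid)   # next-step flow grid
        self.grid = grid

    def forward(self, x):                  # x: (B, T, 2, grid, grid)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # encode each timestep
        out, _ = self.lstm(feats)                         # temporal dynamics
        return self.head(out[:, -1]).view(b, 2, self.grid, self.grid)

model = CNNLSTMFlow()
pred = model(torch.randn(4, 12, 2, 32, 32))   # 12 past timesteps
print(pred.shape)                             # torch.Size([4, 2, 32, 32])
```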


Journal ArticleDOI
TL;DR: In this article, an unsupervised domain-share convolutional neural network (CNN) is proposed for efficient fault transfer diagnosis of machines from steady speeds to time-varying speeds.

94 citations


Journal ArticleDOI
TL;DR: In this article, the authors review transfer learning with convolutional neural networks and provide guidance for selecting a model and transfer learning approach for medical image classification tasks.
Abstract: Transfer learning (TL) with convolutional neural networks aims to improve performance on a new task by leveraging knowledge of similar tasks learned in advance. It has made a major contribution to medical image analysis, as it overcomes the data scarcity problem and saves time and hardware resources. However, transfer learning has been arbitrarily configured in the majority of studies. This review paper attempts to provide guidance for selecting a model and TL approach for the medical image classification task. A total of 425 peer-reviewed articles were retrieved from two databases, PubMed and Web of Science, published in English, up until December 31, 2020. Articles were assessed by two independent reviewers, with the aid of a third reviewer in the case of discrepancies. We followed the PRISMA guidelines for paper selection, and 121 studies were regarded as eligible for the scope of this review. We investigated articles focused on selecting backbone models and TL approaches, including feature extractor, feature extractor hybrid, fine-tuning, and fine-tuning from scratch. The majority of studies (n = 57) empirically evaluated multiple models, followed by deep models (n = 33) and shallow models (n = 24). Inception, one of the deep models, was the most employed in the literature (n = 26). With respect to TL, the majority of studies (n = 46) empirically benchmarked multiple approaches to identify the optimal configuration. The rest applied only a single approach, for which feature extractor (n = 38) and fine-tuning from scratch (n = 27) were the two most favored. Only a few studies applied feature extractor hybrid (n = 7) and fine-tuning (n = 3) with pretrained models. The investigated studies demonstrated the efficacy of transfer learning despite the data scarcity. We encourage data scientists and practitioners to use deep models (e.g. ResNet or Inception) as feature extractors, which can save computational costs and time without degrading predictive power.
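
For readers unfamiliar with the two most favored configurations the review reports, here is a hedged illustration using a torchvision ResNet-50 (our choice of backbone, not the review's): (a) a frozen feature extractor with a new trainable head versus (b) fine-tuning, where all layers are updated.

```python
import torch.nn as nn
from torchvision import models

num_classes = 3  # e.g. a 3-class medical image task (illustrative)

# (a) Feature extractor: freeze the pretrained backbone, train only the head.
fe = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in fe.parameters():
    p.requires_grad = False
fe.fc = nn.Linear(fe.fc.in_features, num_classes)  # new head stays trainable

# (b) Fine-tuning: start from the same weights but update every layer,
# typically with a smaller learning rate for the pretrained parameters.
ft = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
ft.fc = nn.Linear(ft.fc.in_features, num_classes)
```

Option (a) matches the review's recommendation to use deep models as feature extractors when data and compute are scarce.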

Journal ArticleDOI
21 Jan 2022-Sensors
TL;DR: A new framework for breast cancer classification from ultrasound images, employing deep learning and the fusion of the best selected features, is proposed and shown to outperform recent techniques.
Abstract: After lung cancer, breast cancer is the second leading cause of death in women. If breast cancer is detected early, mortality rates in women can be reduced. Because manual breast cancer diagnosis takes a long time, an automated system is required for early cancer detection. This paper proposes a new framework for breast cancer classification from ultrasound images that employs deep learning and the fusion of the best selected features. The proposed framework is divided into five major steps: (i) data augmentation is performed to increase the size of the original dataset for better learning of Convolutional Neural Network (CNN) models; (ii) a pre-trained DarkNet-53 model is considered and the output layer is modified based on the augmented dataset classes; (iii) the modified model is trained using transfer learning and features are extracted from the global average pooling layer; (iv) the best features are selected using two improved optimization algorithms known as reformed differential evaluation (RDE) and reformed gray wolf (RGW); and (v) the best selected features are fused using a new probability-based serial approach and classified using machine learning algorithms. The experiment was conducted on an augmented Breast Ultrasound Images (BUSI) dataset, and the best accuracy was 99.1%. When compared with recent techniques, the proposed framework outperforms them.
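
A hedged sketch of steps (iii) and (v) of the pipeline described above: extracting deep features from a global-average-pooling (GAP) layer and serially fusing two feature vectors. DarkNet-53 is not available in torchvision, so a ResNet-18 stands in, and the RDE/RGW selection algorithms are replaced by a trivial stand-in slice.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
gap_features = nn.Sequential(*list(backbone.children())[:-1])  # up to the GAP layer

x = torch.randn(8, 3, 224, 224)                 # a batch of (resized) ultrasound images
feats = gap_features(x).flatten(1)              # (8, 512) deep features

# "Serial" fusion concatenates two selected feature vectors; the paper's
# probability-based variant additionally weights them before concatenation.
selected_a, selected_b = feats[:, :200], feats[:, 200:400]  # stand-in selection
fused = torch.cat([selected_a, selected_b], dim=1)          # (8, 400)
print(fused.shape)
```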

Journal ArticleDOI
TL;DR: A comprehensive review of recent progress in applying deep learning techniques for spatio-temporal data mining can be found in this paper, where the authors categorize spatio-temporal data into five different types and briefly introduce the deep learning models that are widely used in STDM.
Abstract: With the fast development of various positioning techniques such as the Global Positioning System (GPS), mobile devices, and remote sensing, spatio-temporal data have become increasingly available. Mining valuable knowledge from spatio-temporal data is critically important to many real-world applications, including human mobility understanding, smart transportation, urban planning, public safety, health care, and environmental management. As the number, volume, and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics-based methods, are becoming overwhelmed. Recently, deep learning models such as the recurrent neural network (RNN) and convolutional neural network (CNN) have achieved remarkable success in many domains due to their powerful ability to learn feature representations automatically, and they are also widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, anomaly detection, and classification. In this paper, we provide a comprehensive review of recent progress in applying deep learning techniques to STDM. We first categorize spatio-temporal data into five different types and then briefly introduce the deep learning models that are widely used in STDM. Next, we classify the existing literature based on the types of spatio-temporal data, the data mining tasks, and the deep learning models, followed by the applications of deep learning for STDM in different domains, including transportation, on-demand service, climate & weather analysis, human mobility, location-based social networks, crime analysis, and neuroscience. Finally, we summarize the limitations of current research and point out future research directions.

Journal ArticleDOI
TL;DR: A brief overview is provided of some of the most significant deep learning schemes used in computer vision problems, that is, Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders.
Abstract: Over the last years deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, that is, Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein.

Journal ArticleDOI
Xiang Li
TL;DR: In this paper, the authors proposed a distributed parallelism strategy for convolutional neural networks (CNNs) for big data analysis (BDA) on the massive data generated in the smart city Internet of Things (IoT).

Journal ArticleDOI
TL;DR: DenseNet as mentioned in this paper connects each layer to every other layer in a feed-forward fashion: the feature-maps of all preceding layers are used as inputs to each layer, and its own feature-maps are used as inputs to all subsequent layers.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with $L$ layers have $L$ connections, one between each layer and its subsequent layer, our network has $\frac{L(L+1)}{2}$ direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, encourage feature reuse, and substantially improve parameter efficiency. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring fewer parameters and less computation to achieve high performance.
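
The connectivity pattern the abstract describes is compact enough to sketch directly: each layer receives the concatenation of all preceding feature-maps and contributes `growth` new channels to the running concatenation. This is a minimal illustration, not the full DenseNet (no bottleneck or transition layers); all sizes are illustrative.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    def __init__(self, in_ch=16, growth=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1)))
            ch += growth   # each layer sees all earlier feature-maps

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense connectivity
        return torch.cat(feats, dim=1)

blk = TinyDenseBlock()
print(blk(torch.randn(2, 16, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```

Because each layer only adds `growth` channels, the block stays parameter-efficient even though every layer is connected to every other.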

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed an improved SE-YOLOv5 network model for the recognition of tomato virus diseases, which uses a squeeze-and-excitation module to extract key features, drawing on the human visual attention mechanism.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a bidirectional convolutional recurrent neural network architecture that derives both past and future contexts by connecting two hidden layers running in opposite directions to the same output.
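
A minimal, hedged sketch of the bidirectional recurrence described above, using a plain LSTM (the paper combines this with convolutional layers): two hidden layers run over the sequence in opposite directions, so each timestep's output sees both past and future context.

```python
import torch
import torch.nn as nn

bilstm = nn.LSTM(input_size=40, hidden_size=64, batch_first=True,
                 bidirectional=True)
x = torch.randn(8, 50, 40)          # (batch, timesteps, features)
out, _ = bilstm(x)
print(out.shape)                    # torch.Size([8, 50, 128]) = forward + backward
```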

Journal ArticleDOI
TL;DR: This paper presents an intensive review of deep-learning-based object recognition for both surface and underwater targets and draws conclusions on the state of the art in marine object recognition using deep learning techniques.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a single-shot alignment network consisting of a feature alignment module (FAM) and an oriented detection module (ODM); the FAM generates high-quality anchors with an anchor refinement network and adaptively aligns the CNN features according to the anchor boxes with a novel alignment convolution.
Abstract: The past decade has witnessed significant progress on detecting objects in aerial images, which are often distributed with large-scale variations and arbitrary orientations. However, most existing methods rely on heuristically defined anchors with different scales, angles, and aspect ratios, and usually suffer from severe misalignment between anchor boxes (ABs) and axis-aligned convolutional features, which leads to the common inconsistency between the classification score and localization accuracy. To address this issue, we propose a single-shot alignment network (S²A-Net) consisting of two modules: a feature alignment module (FAM) and an oriented detection module (ODM). The FAM can generate high-quality anchors with an anchor refinement network and adaptively align the convolutional features according to the ABs with a novel alignment convolution. The ODM first adopts active rotating filters to encode the orientation information and then produces orientation-sensitive and orientation-invariant features to alleviate the inconsistency between classification score and localization accuracy. Besides, we further explore the approach to detect objects in large-size images, which leads to a better trade-off between speed and accuracy. Extensive experiments demonstrate that our method can achieve state-of-the-art performance on two commonly used aerial object datasets (i.e., DOTA and HRSC2016) while keeping high efficiency.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a spatial-spectral HSI classification algorithm, Local Similarity Projection Gabor Filtering (LSPGF), which combines an LSP-based reduced-dimension CNN with a 2-D Gabor filtering algorithm.
Abstract: Currently, different deep neural network (DNN) learning approaches have contributed much to the classification of hyperspectral images (HSIs), and most of them use the convolutional neural network (CNN). HSI data are characterized by multidimensionality, correlation, nonlinearity, and large volume. It is therefore particularly important to extract deeper features from HSIs through dimensionality reduction, which helps improve classification in both the spectral and spatial domains. In this article, we present a spatial–spectral HSI classification algorithm, local similarity projection Gabor filtering (LSPGF), which uses a local similarity projection (LSP)-based reduced-dimension CNN with a 2-D Gabor filtering algorithm. First, local similarity analysis is used to reduce the dimensionality of the hyperspectral data, and the reduced data are then filtered with a 2-D Gabor filter to generate spatial tunnel information. Second, a CNN is used to extract features from the original hyperspectral data to generate spectral tunnel information. Third, the spatial and spectral tunnel information are fused to form spatial–spectral feature information, which is input into a deep CNN to extract more effective features; finally, a dual optimization classifier classifies the final extracted features. We compare the performance of the proposed method with other algorithms on three public HSI databases and show that LSPGF achieves the best overall classification accuracy on all of them.
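
The 2-D Gabor filtering step can be sketched with OpenCV: each retained band of the dimensionality-reduced cube is filtered with a small bank of oriented Gabor kernels to produce spatial responses. Kernel parameters below are illustrative, not the values tuned in the paper.

```python
import numpy as np
import cv2

band = np.random.rand(145, 145).astype(np.float32)   # one reduced-dim HSI band

responses = []
for theta in np.arange(0, np.pi, np.pi / 4):          # 4 orientations
    kernel = cv2.getGaborKernel(ksize=(15, 15), sigma=3.0, theta=theta,
                                lambd=8.0, gamma=0.5, psi=0)
    responses.append(cv2.filter2D(band, cv2.CV_32F, kernel))

spatial_features = np.stack(responses, axis=-1)       # (145, 145, 4)
print(spatial_features.shape)
```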

Journal ArticleDOI
TL;DR: In this article, a CNN denoising block equipped with an element-wise subtraction structure is designed to exploit both the spatial features of the noisy channel matrices and the additive nature of the noise simultaneously.
Abstract: Channel estimation is one of the main tasks in realizing practical intelligent reflecting surface-assisted multi-user communication (IRS-MUC) systems. However, unlike traditional communication systems, an IRS-MUC system generally involves a cascaded channel with a sophisticated statistical distribution. In this case, the optimal minimum mean square error (MMSE) estimator requires the calculation of a multidimensional integral that is intractable to implement in practice. To further improve channel estimation performance, in this paper we model channel estimation as a denoising problem and adopt a deep residual learning (DReL) approach to implicitly learn the residual noise for recovering the channel coefficients from the noisy pilot-based observations. To this end, we first develop a versatile DReL-based channel estimation framework in which a deep residual network (DRN)-based MMSE estimator is derived in terms of Bayesian philosophy. As a realization of the developed DReL framework, a convolutional neural network (CNN)-based DRN (CDRN) is then proposed for channel estimation in IRS-MUC systems, in which a CNN denoising block equipped with an element-wise subtraction structure is specifically designed to exploit both the spatial features of the noisy channel matrices and the additive nature of the noise simultaneously. In particular, an explicit expression of the proposed CDRN is derived and analyzed in terms of Bayesian estimation to characterize its properties theoretically. Finally, simulation results demonstrate that the performance of the proposed method approaches that of the optimal MMSE estimator, which requires the availability of the prior probability density function of the channel.
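
The element-wise subtraction structure described above amounts to residual noise learning: the CNN predicts the noise and the estimate is the noisy observation minus that prediction. Below is a hedged, minimal sketch; layer sizes and the real/imaginary channel layout are assumptions, not the paper's exact CDRN.

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    def __init__(self, ch=2):                # 2 channels: real/imaginary parts
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, ch, 3, padding=1))  # predicts the residual noise

    def forward(self, y):
        return y - self.net(y)               # subtract the learned noise

noisy = torch.randn(4, 2, 16, 32)            # stand-in noisy channel matrices
print(DenoisingBlock()(noisy).shape)         # torch.Size([4, 2, 16, 32])
```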

Journal ArticleDOI
TL;DR: In this article, a new three-step method for detecting COVID-19 and pneumonia using chest X-ray images is proposed; it achieved its highest testing classification accuracy of 96.6% using the VGG-19 model combined with the binary robust invariant scalable key-points (BRISK) algorithm.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a multi-scale superpixel fusion network (MFGCN) for hyperspectral image (HSI) classification, in which two different convolutional networks are utilized in two separate branches.

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a novel face mask detection framework, FMD-Yolo, to monitor whether people in public are wearing masks correctly, an effective way to block virus transmission.

Journal ArticleDOI
TL;DR: A review of CNN implementations for civil-structure crack detection, covering image pre-processing techniques, processing hardware, software tools, datasets, network architectures, learning procedures, loss functions, and network performance.

Journal ArticleDOI
Zhou Deng
TL;DR: Zhang et al. as discussed by the authors proposed a feature adaptive transformer network (FAT-Net), which integrates an extra transformer branch to capture long-range dependencies and global context information, and employs a memory-efficient decoder and a feature adaptation module to enhance fusion between adjacent-level features by activating effective channels and suppressing irrelevant background noise.