Showing papers on "Convolutional neural network published in 2019"

PDF

Open Access

Posted Content•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

28 May 2019-arXiv: Learning

TL;DR: A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient and is demonstrated the effectiveness of this method on scaling up MobileNets and ResNet.

...read moreread less

Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. Based on this observation, we propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. We demonstrate the effectiveness of this method on scaling up MobileNets and ResNet. To go even further, we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Source code is at this https URL.

...read moreread less

6,222 citations

Journal Article•DOI•

A survey on Image Data Augmentation for Deep Learning

[...]

Connor Shorten¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

06 Jul 2019-Journal of Big Data

TL;DR: This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing DataAugmentation, a data-space solution to the problem of limited data.

...read moreread less

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data. Unfortunately, many application domains do not have access to big data, such as medical image analysis. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs are heavily covered in this survey. In addition to augmentation techniques, this paper will briefly discuss other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.

...read moreread less

5,782 citations

Proceedings Article•DOI•

ArcFace: Additive Angular Margin Loss for Deep Face Recognition

[...]

Jiankang Deng¹, Jia Guo, Niannan Xue¹, Stefanos Zafeiriou¹•Institutions (1)

Imperial College London¹

15 Jun 2019

TL;DR: This paper presents arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks, and shows that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead.

...read moreread less

Abstract: One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that can enhance the discriminative power. Centre loss penalises the distance between deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in the angular space and therefore penalises the angles between deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to geodesic distance on a hypersphere. We present arguably the most extensive experimental evaluation against all recent state-of-the-art face recognition methods on ten face recognition benchmarks which includes a new large-scale image database with trillions of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state of the art and can be easily implemented with negligible computational overhead. To facilitate future research, the code has been made available.

...read moreread less

4,312 citations

Journal Article•DOI•

Dynamic Graph CNN for Learning on Point Clouds

[...]

Yue Wang¹, Yongbin Sun¹, Ziwei Liu², Sanjay E. Sarma¹, Michael M. Bronstein³, Justin Solomon¹ - Show less +2 more•Institutions (3)

Massachusetts Institute of Technology¹, University of California, Berkeley², Imperial College London³

10 Oct 2019-ACM Transactions on Graphics

TL;DR: This work proposes a new neural network module suitable for CNN-based high-level tasks on point clouds, including classification and segmentation called EdgeConv, which acts on graphs dynamically computed in each layer of the network.

...read moreread less

Abstract: Point clouds provide a flexible geometric representation suitable for countless applications in computer graphics; they also comprise the raw output of most 3D data acquisition devices. While hand-designed features on point clouds have long been proposed in graphics and vision, however, the recent overwhelming success of convolutional neural networks (CNNs) for image analysis suggests the value of adapting insight from CNN to the point cloud world. Point clouds inherently lack topological information, so designing a model to recover topology can enrich the representation power of point clouds. To this end, we propose a new neural network module dubbed EdgeConv suitable for CNN-based high-level tasks on point clouds, including classification and segmentation. EdgeConv acts on graphs dynamically computed in each layer of the network. It is differentiable and can be plugged into existing architectures. Compared to existing modules operating in extrinsic space or treating each point independently, EdgeConv has several appealing properties: It incorporates local neighborhood information; it can be stacked applied to learn global shape properties; and in multi-layer systems affinity in feature space captures semantic characteristics over potentially long distances in the original embedding. We show the performance of our model on standard benchmarks, including ModelNet40, ShapeNetPart, and S3DIS.

...read moreread less

3,727 citations

Proceedings Article•

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

[...]

Mingxing Tan¹, Quoc V. Le¹•Institutions (1)

Google¹

24 May 2019

TL;DR: EfficientNet-B7 as discussed by the authors proposes a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, which achieves state-of-the-art accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference.

...read moreread less

3,445 citations

Journal Article•DOI•

Object Detection With Deep Learning: A Review

[...]

Zhong-Qiu Zhao¹, Peng Zheng¹, Shou-Tao Xu¹, Xindong Wu²•Institutions (2)

Hefei University of Technology¹, University of Louisiana at Lafayette²

28 Jan 2019-IEEE Transactions on Neural Networks

TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.

...read moreread less

Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

...read moreread less

3,097 citations

Proceedings Article•DOI•

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

[...]

Sangdoo Yun¹, Dongyoon Han¹, Sanghyuk Chun¹, Seong Joon Oh, Youngjoon Yoo¹, Junsuk Choe² - Show less +2 more•Institutions (2)

Naver Corporation¹, Yonsei University²

07 Aug 2019

TL;DR: CutMix as discussed by the authors augments the training data by cutting and pasting patches among training images, where the ground truth labels are also mixed proportionally to the area of the patches.

...read moreread less

Abstract: Regional dropout strategies have been proposed to enhance performance of convolutional neural network classifiers. They have proved to be effective for guiding the model to attend on less discriminative parts of objects (e.g. leg as opposed to head of a person), thereby letting the network generalize better and have better object localization capabilities. On the other hand, current methods for regional dropout removes informative pixels on training images by overlaying a patch of either black pixels or random noise. Such removal is not desirable because it suffers from information loss causing inefficiency in training. We therefore propose the CutMix augmentation strategy: patches are cut and pasted among training images where the ground truth labels are also mixed proportionally to the area of the patches. By making efficient use of training pixels and retaining the regularization effect of regional dropout, CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on ImageNet weakly-supervised localization task. Moreover, unlike previous augmentation methods, our CutMix-trained ImageNet classifier, when used as a pretrained model, results in consistent performance gain in Pascal detection and MS-COCO image captioning benchmarks. We also show that CutMix can improve the model robustness against input corruptions and its out-of distribution detection performance.

...read moreread less

3,013 citations

Proceedings Article•DOI•

MnasNet: Platform-Aware Neural Architecture Search for Mobile

[...]

Mingxing Tan¹, Bo Chen¹, Ruoming Pang¹, Vijay K. Vasudevan¹, Mark Sandler¹, Andrew Howard¹, Quoc V. Le¹ - Show less +3 more•Institutions (1)

Google¹

01 Jun 2019

TL;DR: In this article, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.

...read moreread less

Abstract: Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporate model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 with 0.5% higher accuracy and 2.3× faster than NASNet with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.

...read moreread less

1,841 citations

Journal Article•DOI•

Deep learning for time series classification: a review

[...]

Hassan Ismail Fawaz¹, Germain Forestier², Jonathan Weber¹, Lhassane Idoumghar¹, Pierre-Alain Muller¹ - Show less +1 more•Institutions (2)

University of Upper Alsace¹, Monash University²

01 Jul 2019-Data Mining and Knowledge Discovery

TL;DR: This article proposes the most exhaustive study of DNNs for TSC by training 8730 deep learning models on 97 time series datasets and provides an open source deep learning framework to the TSC community.

...read moreread less

Abstract: Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.

...read moreread less

1,833 citations

Journal Article•DOI•

Deep learning and its applications to machine health monitoring

[...]

Rui Zhao¹, Ruqiang Yan¹, Zhenghua Chen², Kezhi Mao², Peng Wang³, Robert X. Gao³ - Show less +2 more•Institutions (3)

Xi'an Jiaotong University¹, Nanyang Technological University², Case Western Reserve University³

15 Jan 2019-Mechanical Systems and Signal Processing

TL;DR: The applications of deep learning in machine health monitoring systems are reviewed mainly from the following aspects: Auto-encoder and its variants, Restricted Boltzmann Machines, Convolutional Neural Networks, and Recurrent Neural Networks.

...read moreread less

1,569 citations

Proceedings Article•DOI•

Selective Kernel Networks

[...]

Xiang Li¹, Wenhai Wang², Xiaolin Hu³, Jian Yang¹•Institutions (3)

Nanjing University of Science and Technology¹, Tsinghua University², Nanjing University³

01 Jun 2019

TL;DR: SKNet as discussed by the authors proposes a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information, which can capture target objects with different scales.

...read moreread less

Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked to a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.

...read moreread less

Proceedings Article•DOI•

PointConv: Deep Convolutional Networks on 3D Point Clouds

[...]

Wenxuan Wu¹, Zhongang Qi¹, Li Fuxin¹•Institutions (1)

Oregon State University¹

15 Jun 2019

TL;DR: The dynamic filter is extended to a new convolution operation, named PointConv, which can be applied on point clouds to build deep convolutional networks and is able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds.

...read moreread less

Abstract: Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.

...read moreread less

Proceedings Article•DOI•

Second-Order Attention Network for Single Image Super-Resolution

[...]

Tao Dai¹, Jianrui Cai², Yongbing Zhang¹, Shu-Tao Xia¹, Lei Zhang² - Show less +1 more•Institutions (2)

Tsinghua University¹, Hong Kong Polytechnic University²

15 Jun 2019

TL;DR: Experimental results demonstrate the superiority of the SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

...read moreread less

Abstract: Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing CNN-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs. To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel train- able second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

...read moreread less

Journal Article•DOI•

HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition

[...]

Rajeev Ranjan¹, Vishal M. Patel², Rama Chellappa¹•Institutions (2)

University of Maryland, College Park¹, Rutgers University²

01 Jan 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: HyperFace as discussed by the authors combines face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNNs) and achieves significant improvement in performance by fusing intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features.

...read moreread less

Abstract: We present an algorithm for simultaneous face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method called, HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks which boosts up their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace that uses a high recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and performs significantly better than many competitive algorithms for each of these four tasks.

...read moreread less

Journal Article•DOI•

Graph Convolutional Networks for Text Classification

[...]

Liang Yao¹, Chengsheng Mao¹, Yuan Luo¹•Institutions (1)

Northwestern University¹

17 Jul 2019

TL;DR: Zhang et al. as discussed by the authors proposed a Text Graph Convolutional Network (Text GCN) for text classification, which jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents.

...read moreread less

Abstract: Text classification is an important and classical problem in natural language processing. There have been a number of studies that applied convolutional neural networks (convolution on regular grid, e.g., sequence) to classification. However, only a limited number of studies have explored the more flexible graph convolutional neural networks (convolution on non-grid, e.g., arbitrary graph) for the task. In this work, we propose to use graph convolutional networks for text classification. We build a single text graph for a corpus based on word co-occurrence and document word relations, then learn a Text Graph Convolutional Network (Text GCN) for the corpus. Our Text GCN is initialized with one-hot representation for word and document, it then jointly learns the embeddings for both words and documents, as supervised by the known class labels for documents. Our experimental results on multiple benchmark datasets demonstrate that a vanilla Text GCN without any external word embeddings or knowledge outperforms state-of-the-art methods for text classification. On the other hand, Text GCN also learns predictive word and document embeddings. In addition, experimental results show that the improvement of Text GCN over state-of-the-art comparison methods become more prominent as we lower the percentage of training data, suggesting the robustness of Text GCN to less training data in text classification.

...read moreread less

Posted Content•

ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks

[...]

Qilong Wang¹, Banggu Wu¹, Pengfei Zhu¹, Peihua Li², Wangmeng Zuo³, Qinghua Hu¹ - Show less +2 more•Institutions (3)

Tianjin University¹, Dalian University of Technology², Harbin Institute of Technology³

08 Oct 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain, and develops a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction.

...read moreread less

Abstract: Recently, channel attention mechanism has demonstrated to offer great potential in improving the performance of deep convolutional neural networks (CNNs). However, most existing methods dedicate to developing more sophisticated attention modules for achieving better performance, which inevitably increase model complexity. To overcome the paradox of performance and complexity trade-off, this paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain. By dissecting the channel attention module in SENet, we empirically show avoiding dimensionality reduction is important for learning channel attention, and appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. Therefore, we propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via $1D$ convolution. Furthermore, we develop a method to adaptively select kernel size of $1D$ convolution, determining coverage of local cross-channel interaction. The proposed ECA module is efficient yet effective, e.g., the parameters and computations of our modules against backbone of ResNet50 are 80 vs. 24.37M and 4.7e-4 GFLOPs vs. 3.86 GFLOPs, respectively, and the performance boost is more than 2% in terms of Top-1 accuracy. We extensively evaluate our ECA module on image classification, object detection and instance segmentation with backbones of ResNets and MobileNetV2. The experimental results show our module is more efficient while performing favorably against its counterparts.

...read moreread less

Journal Article•DOI•

Deeply Supervised Salient Object Detection with Short Connections

[...]

Qibin Hou¹, Ming-Ming Cheng¹, Xiaowei Hu¹, Ali Borji², Zhuowen Tu³, Philip H. S. Torr⁴ - Show less +2 more•Institutions (4)

Nankai University¹, University of Central Florida², University of California, San Diego³, University of Oxford⁴

01 Apr 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A new saliency method is proposed by introducing short connections to the skip-layer structures within the HED architecture, which produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency, effectiveness, and simplicity over the existing algorithms.

...read moreread less

Abstract: Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. We provide a training set for future research and fair comparisons.

...read moreread less

Proceedings Article•DOI•

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

[...]

Tuan-Hung Vu¹, Himalaya Jain¹, Maxime Bucher¹, Matthieu Cord², Patrick Pérez¹ - Show less +1 more•Institutions (2)

Valeo¹, University of Paris²

15 Jun 2019

TL;DR: This work proposes two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively for unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions.

...read moreread less

Abstract: Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real-world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) entropy loss and (ii) adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging “synthetic-2-real” set-ups and show that the approach can also be used for detection.

...read moreread less

Proceedings Article•DOI•

Bag of Tricks for Image Classification with Convolutional Neural Networks

[...]

Tong He¹, Zhi Zhang¹, Hang Zhang¹, Zhongyue Zhang¹, Junyuan Xie¹, Mu Li¹ - Show less +2 more•Institutions (1)

Amazon.com¹

01 Jun 2019

TL;DR: This article examined a collection of such refinements and empirically evaluated their impact on the final model accuracy through ablation study, and showed that by combining these refinements together, they are able to improve various CNN models significantly.

...read moreread less

Abstract: Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes in data augmentations and optimization methods. In the literature, however, most refinements are either briefly mentioned as implementation details or only visible in source code. In this paper, we will examine a collection of such refinements and empirically evaluate their impact on the final model accuracy through ablation study. We will show that, by combining these refinements together, we are able to improve various CNN models significantly. For example, we raise ResNet-50's top-1 validation accuracy from 75.3% to 79.29% on ImageNet. We will also demonstrate that improvement on image classification accuracy leads to better transfer learning performance in other application domains such as object detection and semantic segmentation.

...read moreread less

Journal Article•DOI•

Attention gated networks: Learning to leverage salient regions in medical images.

[...]

Jo Schlemper¹, Ozan Oktay¹, Michiel Schaap, Mattias P. Heinrich, Bernhard Kainz¹, Ben Glocker¹, Daniel Rueckert¹ - Show less +3 more•Institutions (1)

Imperial College London¹

05 Feb 2019-Medical Image Analysis

TL;DR: Experimental results show that AG models consistently improve the prediction performance of the base architectures across different datasets and training sizes while preserving computational efficiency.

...read moreread less

Proceedings Article•DOI•

BASNet: Boundary-Aware Salient Object Detection

[...]

Xuebin Qin¹, Zichen Zhang¹, Chenyang Huang¹, Chao Gao¹, Masood Dehghan¹, Martin Jagersand¹ - Show less +2 more•Institutions (1)

University of Alberta¹

01 Jun 2019

TL;DR: Experimental results on six public datasets show that the proposed predict-refine architecture, BASNet, outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures.

...read moreread less

Abstract: Deep Convolutional Neural Networks have been adopted for salient object detection and achieved the state-of-the-art performance. Most of the previous works however focus on region accuracy but not on the boundary quality. In this paper, we propose a predict-refine architecture, BASNet, and a new hybrid loss for Boundary-Aware Salient object detection. Specifically, the architecture is composed of a densely supervised Encoder-Decoder network and a residual refinement module, which are respectively in charge of saliency prediction and saliency map refinement. The hybrid loss guides the network to learn the transformation between the input image and the ground truth in a three-level hierarchy -- pixel-, patch- and map- level -- by fusing Binary Cross Entropy (BCE), Structural SIMilarity (SSIM) and Intersection-over-Union (IoU) losses. Equipped with the hybrid loss, the proposed predict-refine architecture is able to effectively segment the salient object regions and accurately predict the fine structures with clear boundaries. Experimental results on six public datasets show that our method outperforms the state-of-the-art methods both in terms of regional and boundary evaluation measures. Our method runs at over 25 fps on a single GPU. The code is available at: https://github.com/NathanUA/BASNet.

...read moreread less

Journal Article•DOI•

A State-of-the-Art Survey on Deep Learning Theory and Architectures

[...]

Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Mahmudul Hasan, Brian Van Essen, Abdul A. S. Awwal, Vijayan K. Asari - Show less +6 more

05 Mar 2019-Electronics

TL;DR: This survey presents a brief survey on the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network and goes on to cover Convolutional Neural Network, Recurrent Neural Network (RNN), and Deep Reinforcement Learning (DRL).

...read moreread less

Abstract: In recent years, deep learning has garnered tremendous success in a variety of application domains. This new field of machine learning has been growing rapidly and has been applied to most traditional application domains, as well as some new areas that present more opportunities. Different methods have been proposed based on different categories of learning, including supervised, semi-supervised, and un-supervised learning. Experimental results show state-of-the-art performance using deep learning when compared to traditional machine learning approaches in the fields of image processing, computer vision, speech recognition, machine translation, art, medical imaging, medical information processing, robotics and control, bioinformatics, natural language processing, cybersecurity, and many others. This survey presents a brief survey on the advances that have occurred in the area of Deep Learning (DL), starting with the Deep Neural Network (DNN). The survey goes on to cover Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Network (DBN), Generative Adversarial Network (GAN), and Deep Reinforcement Learning (DRL). Additionally, we have discussed recent developments, such as advanced variant DL techniques based on these DL approaches. This work considers most of the papers published after 2012 from when the history of deep learning began. Furthermore, DL approaches that have been explored and evaluated in different application domains are also included in this survey. We also included recently developed frameworks, SDKs, and benchmark datasets that are used for implementing and evaluating deep learning approaches. There are some surveys that have been published on DL using neural networks and a survey on Reinforcement Learning (RL). However, those papers have not discussed individual advanced techniques for training large-scale deep learning models and the recently developed method of generative models.

...read moreread less

Journal Article•DOI•

Review of Deep Learning Algorithms and Architectures

[...]

Ajay Shrestha¹, Ausif Mahmood¹•Institutions (1)

University of Bridgeport¹

22 Apr 2019-IEEE Access

TL;DR: This paper reviews several optimization methods to improve the accuracy of the training and to reduce training time, and delve into the math behind training algorithms used in recent deep networks.

...read moreread less

Abstract: Deep learning (DL) is playing an increasingly important role in our lives. It has already made a huge impact in areas, such as cancer diagnosis, precision medicine, self-driving cars, predictive forecasting, and speech recognition. The painstakingly handcrafted feature extractors used in traditional learning, classification, and pattern recognition systems are not scalable for large-sized data sets. In many cases, depending on the problem complexity, DL can also overcome the limitations of earlier shallow networks that prevented efficient training and abstractions of hierarchical representations of multi-dimensional training data. Deep neural network (DNN) uses multiple (deep) layers of units with highly optimized algorithms and architectures. This paper reviews several optimization methods to improve the accuracy of the training and to reduce training time. We delve into the math behind training algorithms used in recent deep networks. We describe current shortcomings, enhancements, and implementations. The review also covers different types of deep architectures, such as deep convolution networks, deep residual networks, recurrent neural networks, reinforcement learning, variational autoencoders, and others.

...read moreread less

Journal Article•DOI•

CE-Net: Context Encoder Network for 2D Medical Image Segmentation

[...]

Zaiwang Gu¹, Jun Cheng², Huazhu Fu, Kang Zhou³, Huaying Hao², Yitian Zhao², Tianyang Zhang², Shenghua Gao³, Jiang Liu² - Show less +5 more•Institutions (3)

Shanghai University¹, Chinese Academy of Sciences², ShanghaiTech University³

07 Mar 2019-IEEE Transactions on Medical Imaging

TL;DR: Comprehensive results show that the proposed CE-Net method outperforms the original U- net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation , cell contour segmentation and retinal optical coherence tomography layer segmentation.

...read moreread less

Abstract: Medical image segmentation is an important step in medical image analysis. With the rapid development of a convolutional neural network in image processing, deep learning has been used for medical image segmentation, such as optic disc segmentation, blood vessel detection, lung segmentation, cell segmentation, and so on. Previously, U-net based approaches have been proposed. However, the consecutive pooling and strided convolutional operations led to the loss of some spatial information. In this paper, we propose a context encoder network (CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation. CE-Net mainly contains three major components: a feature encoder module, a context extractor, and a feature decoder module. We use the pretrained ResNet block as the fixed feature extractor. The context extractor module is formed by a newly proposed dense atrous convolution block and a residual multi-kernel pooling block. We applied the proposed CE-Net to different 2D medical image segmentation tasks. Comprehensive results show that the proposed method outperforms the original U-Net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation, cell contour segmentation, and retinal optical coherence tomography layer segmentation.

...read moreread less

Journal Article•DOI•

MoDL: Model-Based Deep Learning Architecture for Inverse Problems

[...]

Hemant Kumar Aggarwal¹, Merry Mani¹, Mathews Jacob¹•Institutions (1)

University of Iowa¹

01 Feb 2019-IEEE Transactions on Medical Imaging

TL;DR: In this article, a convolution neural network (CNN)-based regularization prior is proposed for inverse problems with the arbitrary structure, where the forward model is explicitly accounted for and a smaller network with fewer parameters is sufficient to capture the image information compared to direct inversion.

...read moreread less

Abstract: We introduce a model-based image reconstruction framework with a convolution neural network (CNN)-based regularization prior. The proposed formulation provides a systematic approach for deriving deep architectures for inverse problems with the arbitrary structure. Since the forward model is explicitly accounted for, a smaller network with fewer parameters is sufficient to capture the image information compared to direct inversion approaches. Thus, reducing the demand for training data and training time. Since we rely on end-to-end training with weight sharing across iterations, the CNN weights are customized to the forward model, thus offering improved performance over approaches that rely on pre-trained denoisers. Our experiments show that the decoupling of the number of iterations from the network complexity offered by this approach provides benefits, including lower demand for training data, reduced risk of overfitting, and implementations with significantly reduced memory footprint. We propose to enforce data-consistency by using numerical optimization blocks, such as conjugate gradients algorithm within the network. This approach offers faster convergence per iteration, compared to methods that rely on proximal gradients steps to enforce data consistency. Our experiments show that the faster convergence translates to improved performance, primarily when the available GPU memory restricts the number of iterations.

...read moreread less

Journal Article•DOI•

CE-Net: Context Encoder Network for 2D Medical Image Segmentation

[...]

Zaiwang Gu¹, Jun Cheng², Huazhu Fu, Kang Zhou³, Huaying Hao², Yitian Zhao², Tianyang Zhang², Shenghua Gao³, Jiang Liu² - Show less +5 more•Institutions (3)

Shanghai University¹, Chinese Academy of Sciences², ShanghaiTech University³

07 Mar 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: Li et al. as mentioned in this paper proposed a context encoder network (referred to as CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation, which mainly contains three major components: a feature encoder module, a context extractor and a feature decoder module.

...read moreread less

Abstract: Medical image segmentation is an important step in medical image analysis. With the rapid development of convolutional neural network in image processing, deep learning has been used for medical image segmentation, such as optic disc segmentation, blood vessel detection, lung segmentation, cell segmentation, etc. Previously, U-net based approaches have been proposed. However, the consecutive pooling and strided convolutional operations lead to the loss of some spatial information. In this paper, we propose a context encoder network (referred to as CE-Net) to capture more high-level information and preserve spatial information for 2D medical image segmentation. CE-Net mainly contains three major components: a feature encoder module, a context extractor and a feature decoder module. We use pretrained ResNet block as the fixed feature extractor. The context extractor module is formed by a newly proposed dense atrous convolution (DAC) block and residual multi-kernel pooling (RMP) block. We applied the proposed CE-Net to different 2D medical image segmentation tasks. Comprehensive results show that the proposed method outperforms the original U-Net method and other state-of-the-art methods for optic disc segmentation, vessel detection, lung segmentation, cell contour segmentation and retinal optical coherence tomography layer segmentation.

...read moreread less

Journal Article•DOI•

Deep learning for electroencephalogram (EEG) classification tasks: a review.

[...]

Alexander Craik¹, Yongtian He¹, Jose L. Contreras-Vidal¹•Institutions (1)

University of Houston¹

26 Feb 2019-Journal of Neural Engineering

TL;DR: Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.

...read moreread less

Abstract: Objective Electroencephalography (EEG) analysis has been an important tool in neuroscience with applications in neuroscience, neural engineering (e.g. Brain-computer interfaces, BCI's), and even commercial applications. Many of the analytical tools used in EEG studies have used machine learning to uncover relevant information for neural classification and neuroimaging. Recently, the availability of large EEG data sets and advances in machine learning have both led to the deployment of deep learning architectures, especially in the analysis of EEG signals and in understanding the information it may contain for brain functionality. The robust automatic classification of these signals is an important step towards making the use of EEG more practical in many applications and less reliant on trained professionals. Towards this goal, a systematic review of the literature on deep learning applications to EEG classification was performed to address the following critical questions: (1) Which EEG classification tasks have been explored with deep learning? (2) What input formulations have been used for training the deep networks? (3) Are there specific deep learning network structures suitable for specific types of tasks? Approach A systematic literature review of EEG classification using deep learning was performed on Web of Science and PubMed databases, resulting in 90 identified studies. Those studies were analyzed based on type of task, EEG preprocessing methods, input type, and deep learning architecture. Main results For EEG classification tasks, convolutional neural networks, recurrent neural networks, deep belief networks outperform stacked auto-encoders and multi-layer perceptron neural networks in classification accuracy. The tasks that used deep learning fell into five general groups: emotion recognition, motor imagery, mental workload, seizure detection, event related potential detection, and sleep scoring. For each type of task, we describe the specific input formulation, major characteristics, and end classifier recommendations found through this review. Significance This review summarizes the current practices and performance outcomes in the use of deep learning for EEG classification. Practical suggestions on the selection of many hyperparameters are provided in the hope that they will promote or guide the deployment of deep learning to EEG datasets in future research.

...read moreread less

Journal Article•DOI•

Deep Convolutional Transfer Learning Network: A New Method for Intelligent Fault Diagnosis of Machines With Unlabeled Data

[...]

Liang Guo¹, Yaguo Lei¹, Saibo Xing¹, Tao Yan¹, Naipeng Li¹ - Show less +1 more•Institutions (1)

Xi'an Jiaotong University¹

01 Sep 2019-IEEE Transactions on Industrial Electronics

TL;DR: A new intelligent method named deep convolutional transfer learning network (DCTLN) is proposed, which facilitates the 1-D CNN to learn domain-invariant features by maximizing domain recognition errors and minimizing the probability distribution distance.

...read moreread less

Abstract: The success of intelligent fault diagnosis of machines relies on the following two conditions: 1) labeled data with fault information are available; and 2) the training and testing data are drawn from the same probability distribution. However, for some machines, it is difficult to obtain massive labeled data. Moreover, even though labeled data can be obtained from some machines, the intelligent fault diagnosis method trained with such labeled data possibly fails in classifying unlabeled data acquired from the other machines due to data distribution discrepancy. These problems limit the successful applications of intelligent fault diagnosis of machines with unlabeled data. As a potential tool, transfer learning adapts a model trained in a source domain to its application in a target domain. Based on the transfer learning, we propose a new intelligent method named deep convolutional transfer learning network (DCTLN). A DCTLN consists of two modules: condition recognition and domain adaptation. The condition recognition module is constructed by a one-dimensional (1-D) convolutional neural network (CNN) to automatically learn features and recognize health conditions of machines. The domain adaptation module facilitates the 1-D CNN to learn domain-invariant features by maximizing domain recognition errors and minimizing the probability distribution distance. The effectiveness of the proposed method is verified using six transfer fault diagnosis experiments.

...read moreread less

Proceedings Article•DOI•

A Simple Pooling-Based Design for Real-Time Salient Object Detection

[...]

Jiang-Jiang Liu¹, Qibin Hou¹, Ming-Ming Cheng¹, Jiashi Feng², Jianmin Jiang³ - Show less +1 more•Institutions (3)

Nankai University¹, National University of Singapore², Shenzhen University³

15 Jun 2019

TL;DR: This work solves the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks by building a global guidance module (GGM) and designing a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down path- way.

...read moreread less

Abstract: We solve the problem of salient object detection by investigating how to expand the role of pooling in convolutional neural networks. Based on the U-shape architecture, we first build a global guidance module (GGM) upon the bottom-up pathway, aiming at providing layers at different feature levels the location information of potential salient objects. We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down path- way. By adding FAMs after the fusion operations in the top-down pathway, coarse-level features from the GGM can be seamlessly merged with features at various scales. These two pooling-based modules allow the high-level semantic features to be progressively refined, yielding detail enriched saliency maps. Experiment results show that our proposed approach can more accurately locate the salient objects with sharpened details and hence substantially improve the performance compared to the previous state-of-the-arts. Our approach is fast as well and can run at a speed of more than 30 FPS when processing a 300×400 image. Code can be found at http://mmcheng.net/poolnet/.

...read moreread less

Proceedings Article•DOI•

Cascaded Partial Decoder for Fast and Accurate Salient Object Detection

[...]

Zhe Wu¹, Li Su¹, Qingming Huang¹•Institutions (1)

Chinese Academy of Sciences¹

15 Jun 2019

TL;DR: A novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection and applies the proposed framework to optimize existing multi-level feature aggregation models and significantly improve their efficiency and accuracy.

...read moreread less

Abstract: Existing state-of-the-art salient object detection networks rely on aggregating multi-level features of pre-trained convolutional neural networks (CNNs). However, compared to high-level features, low-level features contribute less to performance. Meanwhile, they raise more computational cost because of their larger spatial resolutions. In this paper, we propose a novel Cascaded Partial Decoder (CPD) framework for fast and accurate salient object detection. On the one hand, the framework constructs partial decoder which discards larger resolution features of shallow layers for acceleration. On the other hand, we observe that integrating features of deep layers will obtain relatively precise saliency map. Therefore we directly utilize generated saliency map to recurrently optimize features of deep layers. This strategy efficiently suppresses distractors in the features and significantly improves their representation ability. Experiments conducted on five benchmark datasets exhibit that the proposed model not only achieves state-of-the-art but also runs much faster than existing models. Besides, we apply the proposed framework to optimize existing multi-level feature aggregation models and significantly improve their efficiency and accuracy.

...read moreread less

Collapse