Showing papers on "Deep learning" published in 2019

Open Access

Proceedings Article•DOI•

A Style-Based Generator Architecture for Generative Adversarial Networks

Tero Karras¹, Samuli Laine¹, Timo Aila¹•Institutions (1)

15 Jun 2019

TL;DR: This paper proposed an alternative generator architecture for GANs, borrowing from style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.

Abstract: We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.

6,564 citations

Journal Article•DOI•

A survey on Image Data Augmentation for Deep Learning

Connor Shorten¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

06 Jul 2019-Journal of Big Data

TL;DR: This survey presents existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation, a data-space solution to the problem of limited data.

Abstract: Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant on big data to avoid overfitting. Overfitting refers to the phenomenon when a network learns a function with very high variance such as to perfectly model the training data. Unfortunately, many application domains do not have access to big data, such as medical image analysis. This survey focuses on Data Augmentation, a data-space solution to the problem of limited data. Data Augmentation encompasses a suite of techniques that enhance the size and quality of training datasets such that better Deep Learning models can be built using them. The image augmentation algorithms discussed in this survey include geometric transformations, color space augmentations, kernel filters, mixing images, random erasing, feature space augmentation, adversarial training, generative adversarial networks, neural style transfer, and meta-learning. The application of augmentation methods based on GANs is heavily covered in this survey. In addition to augmentation techniques, this paper will briefly discuss other characteristics of Data Augmentation such as test-time augmentation, resolution impact, final dataset size, and curriculum learning. This survey will present existing methods for Data Augmentation, promising developments, and meta-level decisions for implementing Data Augmentation. Readers will understand how Data Augmentation can improve the performance of their models and expand limited datasets to take advantage of the capabilities of big data.
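
Two of the augmentation families named in this abstract, geometric transformations and random erasing, can be illustrated with a minimal sketch. This is not code from the survey: the function names, the nested-list image representation, and the patch size are assumptions chosen to keep the example dependency-free.

```python
import random


def horizontal_flip(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def random_erase(image, height, width, fill=0, rng=random):
    """Return a copy with one randomly placed height x width patch set to `fill`."""
    rows, cols = len(image), len(image[0])
    top = rng.randrange(rows - height + 1)
    left = rng.randrange(cols - width + 1)
    erased = [row[:] for row in image]  # copy so the input stays untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            erased[r][c] = fill
    return erased


# Hypothetical 3x3 single-channel image used only for illustration.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
flipped = horizontal_flip(image)
erased = random_erase(image, height=2, width=2, rng=random.Random(0))
```

Both functions return new images rather than modifying the input, so the original sample and its augmented variants can be kept side by side in a training set.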

5,782 citations

Journal Article•DOI•

Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations

Maziar Raissi¹, Paris Perdikaris², George Em Karniadakis¹•Institutions (2)

Brown University¹, University of Pennsylvania²

01 Feb 2019-Journal of Computational Physics

TL;DR: In this article, the authors introduce physics-informed neural networks, which are trained to solve supervised learning tasks while respecting any given laws of physics described by general nonlinear partial differential equations.

5,448 citations

Journal Article•DOI•

Object Detection With Deep Learning: A Review

Zhong-Qiu Zhao¹, Peng Zheng¹, Shou-Tao Xu¹, Xindong Wu²•Institutions (2)

Hefei University of Technology¹, University of Louisiana at Lafayette²

28 Jan 2019-IEEE Transactions on Neural Networks

TL;DR: In this article, a review of deep learning-based object detection frameworks is provided, focusing on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further.

Abstract: Due to object detection’s close relationship with video analysis and image understanding, it has attracted much research attention in recent years. Traditional object detection methods are built on handcrafted features and shallow trainable architectures. Their performance easily stagnates by constructing complex ensembles that combine multiple low-level image features with high-level context from object detectors and scene classifiers. With the rapid development in deep learning, more powerful tools, which are able to learn semantic, high-level, deeper features, are introduced to address the problems existing in traditional architectures. These models behave differently in network architecture, training strategy, and optimization function. In this paper, we provide a review of deep learning-based object detection frameworks. Our review begins with a brief introduction on the history of deep learning and its representative tool, namely, the convolutional neural network. Then, we focus on typical generic object detection architectures along with some modifications and useful tricks to improve detection performance further. As distinct specific detection tasks exhibit different characteristics, we also briefly survey several specific tasks, including salient object detection, face detection, and pedestrian detection. Experimental analyses are also provided to compare various methods and draw some meaningful conclusions. Finally, several promising directions and tasks are provided to serve as guidelines for future work in both object detection and relevant neural network-based learning systems.

3,097 citations

Posted Content•

Fast Graph Representation Learning with PyTorch Geometric

Matthias Fey, Jan Eric Lenssen

06 Mar 2019-arXiv: Learning

TL;DR: PyTorch Geometric is introduced, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch, and a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios is performed.

Abstract: We introduce PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it contains a variety of recently published methods from the domains of relational learning and 3D data processing. PyTorch Geometric achieves high data throughput by leveraging sparse GPU acceleration, by providing dedicated CUDA kernels and by introducing efficient mini-batch handling for input examples of different size. In this work, we present the library in detail and perform a comprehensive comparative study of the implemented methods in homogeneous evaluation scenarios.

2,308 citations

Proceedings Article•DOI•

Semantic Image Synthesis With Spatially-Adaptive Normalization

Taesung Park¹, Ming-Yu Liu², Ting-Chun Wang², Jun-Yan Zhu³•Institutions (3)

University of California, Berkeley¹, Nvidia², Massachusetts Institute of Technology³

18 Mar 2019

TL;DR: Spatially-adaptive normalization is proposed, a simple but effective layer for synthesizing photorealistic images given an input semantic layout that allows users to easily control the style and content of image synthesis results as well as create multi-modal results.

Abstract: We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the network, forcing the network to memorize the information throughout all the layers. Instead, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned affine transformation. Experiments on several challenging datasets demonstrate the superiority of our method compared to existing approaches, regarding both visual fidelity and alignment with input layouts. Finally, our model allows users to easily control the style and content of image synthesis results as well as create multi-modal results. Code is available upon publication.
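
The idea of modulating normalized activations with a spatially varying affine transformation can be sketched as follows. This is a simplified illustration, not the authors' layer: the per-label lookup tables `gamma` and `beta` are hypothetical stand-ins for the learned modulation, and the example handles a single channel with plain Python lists.

```python
import math


def spatially_adaptive_norm(activations, layout, gamma, beta, eps=1e-5):
    """Normalize one channel, then scale and shift each pixel by its label.

    activations: 2-D list of floats for a single channel.
    layout: 2-D list of integer class labels with the same shape.
    gamma, beta: dicts mapping label -> scale / shift (hypothetical lookup
    tables standing in for a learned, layout-dependent modulation).
    """
    values = [v for row in activations for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var + eps)
    return [
        [gamma[label] * (v - mean) / std + beta[label]
         for v, label in zip(act_row, lay_row)]
        for act_row, lay_row in zip(activations, layout)
    ]


# Hypothetical 2x2 channel whose top row is class 0 and bottom row is class 1.
activations = [[1.0, 3.0], [1.0, 3.0]]
layout = [[0, 0], [1, 1]]
out = spatially_adaptive_norm(activations, layout,
                              gamma={0: 1.0, 1: 2.0}, beta={0: 0.0, 1: 0.5})
```

Because the scale and shift depend on each pixel's label, the two rows end up with different statistics even though their raw activations are identical, which is how the layout information survives normalization in this sketch.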

2,159 citations

Journal Article•DOI•

Deep learning and process understanding for data-driven Earth system science

Markus Reichstein¹, Gustau Camps-Valls², Bjorn Stevens¹, Martin Jung¹, Joachim Denzler³, Nuno Carvalhais¹, Nuno Carvalhais⁴, Prabhat⁵•Institutions (5)

Max Planck Society¹, University of Valencia², University of Jena³, Universidade Nova de Lisboa⁴, Lawrence Berkeley National Laboratory⁵

13 Feb 2019-Nature

TL;DR: It is argued that contextual cues should be used as part of deep learning to gain further process understanding of Earth system science problems, improving the predictive ability of seasonal forecasting and modelling of long-range spatial connections across multiple timescales.

Abstract: Machine learning approaches are increasingly used to extract patterns and insights from the ever-increasing stream of geospatial data, but current approaches may not be optimal when system behaviour is dominated by spatial or temporal context. Here, rather than amending classical machine learning, we argue that these contextual cues should be used as part of deep learning (an approach that is able to extract spatio-temporal features automatically) to gain further process understanding of Earth system science problems, improving the predictive ability of seasonal forecasting and modelling of long-range spatial connections across multiple timescales, for example. The next step will be a hybrid modelling approach, coupling physical process models with the versatility of data-driven machine learning.

2,014 citations

Journal Article•DOI•

Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning

Takeru Miyato, Shin-ichi Maeda, Masanori Koyama¹, Shin Ishii²•Institutions (2)

Ritsumeikan University¹, Kyoto University²

01 Aug 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Virtual adversarial training (VAT) is a regularization method based on virtual adversarial loss, a measure of local smoothness of the conditional label distribution given the input.

Abstract: We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only “virtually” adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.

1,991 citations

Journal Article•DOI•

A guide to deep learning in healthcare.

Andre Esteva¹, Alexandre Robicquet¹, Bharath Ramsundar¹, Volodymyr Kuleshov¹, Mark A. DePristo², Katherine Chou², Claire Cui², Greg S. Corrado², Sebastian Thrun¹, Jeffrey Dean²•Institutions (2)

Stanford University¹, Google²

01 Jan 2019-Nature Medicine

TL;DR: This guide describes how deep-learning techniques can impact a few key areas of medicine and explores how to build end-to-end systems.

Abstract: Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.

1,843 citations

Proceedings Article•DOI•

MnasNet: Platform-Aware Neural Architecture Search for Mobile

Mingxing Tan¹, Bo Chen¹, Ruoming Pang¹, Vijay K. Vasudevan¹, Mark Sandler¹, Andrew Howard¹, Quoc V. Le¹•Institutions (1)

Google¹

01 Jun 2019

TL;DR: In this article, the authors propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.

Abstract: Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, it is very difficult to manually balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an automated mobile neural architecture search (MNAS) approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. Unlike previous work, where latency is considered via another, often inaccurate proxy (e.g., FLOPS), our approach directly measures real-world inference latency by executing the model on mobile phones. To further strike the right balance between flexibility and search space size, we propose a novel factorized hierarchical search space that encourages layer diversity throughout the network. Experimental results show that our approach consistently outperforms state-of-the-art mobile CNN models across multiple vision tasks. On the ImageNet classification task, our MnasNet achieves 75.2% top-1 accuracy with 78ms latency on a Pixel phone, which is 1.8× faster than MobileNetV2 with 0.5% higher accuracy and 2.3× faster than NASNet with 1.2% higher accuracy. Our MnasNet also achieves better mAP quality than MobileNets for COCO object detection. Code is at https://github.com/tensorflow/tpu/tree/master/models/official/mnasnet.
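
One way to fold measured latency into a single search objective is a weighted product of accuracy and a latency ratio. The sketch below assumes that form for illustration only; the abstract does not spell out the objective, and `target_ms`, `weight`, and the candidate models are hypothetical values, not results from the paper.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms=80.0, weight=-0.07):
    """Scale accuracy by (latency / target) ** weight.

    With a negative weight, models slower than the target are penalized and
    faster ones are mildly rewarded. target_ms and weight are illustrative
    assumptions, not values taken from the listing above.
    """
    return accuracy * (latency_ms / target_ms) ** weight


# Hypothetical (accuracy, measured latency in ms) pairs for three candidates.
candidates = {
    "a": (0.740, 60.0),
    "b": (0.752, 78.0),
    "c": (0.760, 160.0),
}
best = max(candidates, key=lambda name: latency_aware_reward(*candidates[name]))
```

A single scalar reward of this kind lets a search procedure compare candidates directly, instead of filtering on latency first and ranking by accuracy afterwards.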

1,841 citations

Journal Article•DOI•

Deep learning for time series classification: a review

Hassan Ismail Fawaz¹, Germain Forestier², Jonathan Weber¹, Lhassane Idoumghar¹, Pierre-Alain Muller¹•Institutions (2)

University of Upper Alsace¹, Monash University²

01 Jul 2019-Data Mining and Knowledge Discovery

TL;DR: This article presents the most exhaustive study of DNNs for TSC to date, training 8730 deep learning models on 97 time series datasets, and provides an open source deep learning framework to the TSC community.

Abstract: Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state-of-the-art performance for document classification and speech recognition. In this article, we study the current state-of-the-art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By training 8730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.

Journal Article•DOI•

Cardiologist-Level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms Using a Deep Neural Network

Awni Hannun¹, Pranav Rajpurkar¹, Masoumeh Haghpanahi, Geoffrey H. Tison², Codie Bourn, Mintu P. Turakhia¹, Mintu P. Turakhia³, Andrew Y. Ng¹•Institutions (3)

Stanford University¹, University of California, San Francisco², Veterans Health Administration³

07 Jan 2019-Nature Medicine

TL;DR: It is demonstrated that an end-to-end deep learning approach can classify a broad range of distinct arrhythmias from single-lead ECGs with high diagnostic performance similar to that of cardiologists.

Abstract: Computerized electrocardiogram (ECG) interpretation plays a critical role in the clinical ECG workflow. Widely available digital ECG data and the algorithmic paradigm of deep learning present an opportunity to substantially improve the accuracy and scalability of automated ECG analysis. However, a comprehensive evaluation of an end-to-end deep learning approach for ECG analysis across a wide variety of diagnostic classes has not been previously reported. Here, we develop a deep neural network (DNN) to classify 12 rhythm classes using 91,232 single-lead ECGs from 53,549 patients who used a single-lead ambulatory ECG monitoring device. When validated against an independent test dataset annotated by a consensus committee of board-certified practicing cardiologists, the DNN achieved an average area under the receiver operating characteristic curve (ROC) of 0.97. The average F1 score, which is the harmonic mean of the positive predictive value and sensitivity, for the DNN (0.837) exceeded that of average cardiologists (0.780). With specificity fixed at the average specificity achieved by cardiologists, the sensitivity of the DNN exceeded the average cardiologist sensitivity for all rhythm classes. These findings demonstrate that an end-to-end deep learning approach can classify a broad range of distinct arrhythmias from single-lead ECGs with high diagnostic performance similar to that of cardiologists. If confirmed in clinical settings, this approach could reduce the rate of misdiagnosed computerized ECG interpretations and improve the efficiency of expert human ECG interpretation by accurately triaging or prioritizing the most urgent conditions. Analysis of electrocardiograms using an end-to-end deep learning approach can detect and classify cardiac arrhythmia with high accuracy, similar to that of cardiologists.
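
The F1 score mentioned in this abstract, the harmonic mean of positive predictive value and sensitivity, can be computed from raw counts as in the sketch below. The function name and the counts are made up for illustration and are unrelated to the study's data.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of positive predictive value (PPV) and sensitivity."""
    ppv = true_positives / (true_positives + false_positives)
    sensitivity = true_positives / (true_positives + false_negatives)
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts for one rhythm class: PPV and sensitivity are both 0.8.
score = f1_score(true_positives=80, false_positives=20, false_negatives=20)

# An unbalanced case: PPV is 0.5 while sensitivity is 1.0.
unbalanced = f1_score(true_positives=50, false_positives=50, false_negatives=0)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the unbalanced case scores below the arithmetic mean of 0.75, which is why F1 discourages trading one metric heavily for the other. The sketch does not guard against a class with zero true positives, where both ratios are undefined.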

Posted Content•

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.

Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez¹, Javier Del Ser², Javier Del Ser³, Adrien Bennetot⁴, Adrien Bennetot¹, Siham Tabik⁵, Alberto Barbado⁶, Salvador García⁵, Sergio Gil-Lopez, Daniel Molina⁵, Richard Benjamins⁶, Raja Chatila⁴, Francisco Herrera⁵•Institutions (6)

French Institute for Research in Computer Science and Automation¹, University of the Basque Country², Basque Center for Applied Mathematics³, University of Paris⁴, University of Granada⁵, Telefónica⁶

22 Oct 2019-arXiv: Artificial Intelligence

TL;DR: Previous efforts to define explainability in Machine Learning are summarized, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought, and a taxonomy of recent contributions related to the explainability of different Machine Learning models is proposed.

Abstract: In the last years, Artificial Intelligence (AI) has achieved a notable momentum that may deliver the best of expectations over many application sectors across the field. For this to occur, the entire community stands in front of the barrier of explainability, an inherent problem of AI techniques brought by sub-symbolism (e.g. ensembles or Deep Neural Networks) that were not present in the last hype of AI. Paradigms underlying this problem fall within the so-called eXplainable AI (XAI) field, which is acknowledged as a crucial feature for the practical deployment of AI models. This overview examines the existing literature in the field of XAI, including a prospect toward what is yet to be reached. We summarize previous efforts to define explainability in Machine Learning, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought. We then propose and discuss about a taxonomy of recent contributions related to the explainability of different Machine Learning models, including those aimed at Deep Learning methods for which a second taxonomy is built. This literature analysis serves as the background for a series of challenges faced by XAI, such as the crossroads between data fusion and explainability. Our prospects lead toward the concept of Responsible Artificial Intelligence, namely, a methodology for the large-scale implementation of AI methods in real organizations with fairness, model explainability and accountability at its core. Our ultimate goal is to provide newcomers to XAI with a reference material in order to stimulate future research advances, but also to encourage experts and professionals from other disciplines to embrace the benefits of AI in their activity sectors, without any prior bias for its lack of interpretability.

Posted Content•

A Survey on Bias and Fairness in Machine Learning

Ninareh Mehrabi¹, Fred Morstatter¹, Nripsuta Saxena¹, Kristina Lerman¹, Aram Galstyan¹•Institutions (1)

Information Sciences Institute¹

23 Aug 2019-arXiv: Learning

TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.

Abstract: With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

Journal Article•DOI•

Deep learning and its applications to machine health monitoring

Rui Zhao¹, Ruqiang Yan¹, Zhenghua Chen², Kezhi Mao², Peng Wang³, Robert X. Gao³•Institutions (3)

Xi'an Jiaotong University¹, Nanyang Technological University², Case Western Reserve University³

15 Jan 2019-Mechanical Systems and Signal Processing

TL;DR: The applications of deep learning in machine health monitoring systems are reviewed mainly from the following aspects: Auto-encoder and its variants, Restricted Boltzmann Machines, Convolutional Neural Networks, and Recurrent Neural Networks.

Proceedings Article•DOI•

SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks

Bo Li¹, Wei Wu¹, Qiang Wang¹, Fangyi Zhang², Junliang Xing, Junjie Yan²•Institutions (2)

SenseTime¹, Chinese Academy of Sciences²

01 Jun 2019

TL;DR: This work shows that the core reason Siamese trackers still have an accuracy gap is the lack of strict translation invariance, and proposes a new model architecture that performs depth-wise and layer-wise aggregations, which not only improves accuracy but also reduces model size.

Abstract: Siamese network based trackers formulate tracking as convolutional feature cross-correlation between target template and searching region. However, Siamese trackers still have accuracy gap compared with state-of-the-art algorithms and they cannot take advantage of feature from deep networks, such as ResNet-50 or deeper. In this work we prove the core reason comes from the lack of strict translation invariance. By comprehensive theoretical analysis and experimental validations, we break this restriction through a simple yet effective spatial aware sampling strategy and successfully train a ResNet-driven Siamese tracker with significant performance gain. Moreover, we propose a new model architecture to perform depth-wise and layer-wise aggregations, which not only further improves the accuracy but also reduces the model size. We conduct extensive ablation studies to demonstrate the effectiveness of the proposed tracker, which obtains currently the best results on four large tracking benchmarks, including OTB2015, VOT2018, UAV123, and LaSOT. Our model will be released to facilitate further studies based on this problem.

Proceedings Article•DOI•

Heterogeneous Graph Attention Network

Xiao Wang¹, Houye Ji¹, Chuan Shi¹, Bai Wang¹, Yanfang Ye², Peng Cui³, Philip S. Yu⁴•Institutions (4)

Beijing University of Posts and Telecommunications¹, West Virginia University², Tsinghua University³, University of Illinois at Chicago⁴

13 May 2019

TL;DR: Wang et al. propose a heterogeneous graph neural network based on hierarchical attention, including node-level and semantic-level attention, which generates node embeddings by aggregating features from meta-path based neighbors in a hierarchical manner.

Abstract: Graph neural network, as a powerful graph representation technique based on deep learning, has shown superior performance and attracted considerable research interest. However, it has not been fully considered in graph neural network for heterogeneous graph which contains different types of nodes and links. The heterogeneity and rich semantic information bring great challenges for designing a graph neural network for heterogeneous graph. Recently, one of the most exciting advancements in deep learning is the attention mechanism, whose great potential has been well demonstrated in various areas. In this paper, we first propose a novel heterogeneous graph neural network based on the hierarchical attention, including node-level and semantic-level attentions. Specifically, the node-level attention aims to learn the importance between a node and its meta-path based neighbors, while the semantic-level attention is able to learn the importance of different meta-paths. With the learned importance from both node-level and semantic-level attention, the importance of node and meta-path can be fully considered. Then the proposed model can generate node embedding by aggregating features from meta-path based neighbors in a hierarchical manner. Extensive experimental results on three real-world heterogeneous graphs not only show the superior performance of our proposed model over the state-of-the-arts, but also demonstrate its potentially good interpretability for graph analysis.

Journal Article•DOI•

Short-Term Residential Load Forecasting Based on LSTM Recurrent Neural Network

Weicong Kong¹, Zhao Yang Dong¹, Youwei Jia², David J. Hill³, Yan Xu⁴, Yuan Zhang¹•Institutions (4)

University of New South Wales¹, Hong Kong Polytechnic University², University of Sydney³, Nanyang Technological University⁴

01 Jan 2019-IEEE Transactions on Smart Grid

TL;DR: The proposed LSTM approach is comprehensively compared to various benchmarks, including the state of the art in load forecasting, and outperforms the other listed algorithms in short-term load forecasting for individual residential households.

Abstract: As the power system is facing a transition toward a more intelligent, flexible, and interactive system with higher penetration of renewable energy generation, load forecasting, especially short-term load forecasting for individual electric customers plays an increasingly essential role in the future grid planning and operation. Other than aggregated residential load in a large scale, forecasting an electric load of a single energy user is fairly challenging due to the high volatility and uncertainty involved. In this paper, we propose a long short-term memory (LSTM) recurrent neural network-based framework, which is the latest and one of the most popular techniques of deep learning, to tackle this tricky issue. The proposed framework is tested on a publicly available set of real residential smart meter data, of which the performance is comprehensively compared to various benchmarks including the state-of-the-arts in the field of load forecasting. As a result, the proposed LSTM approach outperforms the other listed rival algorithms in the task of short-term load forecasting for individual residential households.

Proceedings Article•DOI•

Selective Kernel Networks

Xiang Li¹, Wenhai Wang², Xiaolin Hu³, Jian Yang¹•Institutions (3)

Nanjing University of Science and Technology¹, Tsinghua University², Nanjing University³

01 Jun 2019

TL;DR: SKNet as discussed by the authors proposes a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information, which can capture target objects with different scales.

Abstract: In standard Convolutional Neural Networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well-known in the neuroscience community that the receptive field size of visual cortical neurons are modulated by the stimulus, which has been rarely considered in constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention that is guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked to a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms the existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects with different scales, which verifies the capability of neurons for adaptively adjusting their receptive field sizes according to the input. The code and models are available at https://github.com/implus/SKNet.
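The fusion step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-channel scalar `weights` are a hypothetical stand-in for the learned layers that produce the attention logits, and feature maps are plain Python lists flattened over the spatial dimensions.

```python
import math

def softmax(values):
    # Numerically stable softmax over a list of floats.
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def sk_fuse(branches, weights):
    # branches[k][c] is the flattened feature map of channel c from branch k
    # (one branch per kernel size). weights[k][c] is a hypothetical scalar
    # that stands in for the learned layers producing attention logits.
    num_branches = len(branches)
    num_channels = len(branches[0])
    fused, attention = [], []
    for c in range(num_channels):
        maps = [branches[k][c] for k in range(num_branches)]
        # Fuse: element-wise sum over branches, then global average pooling.
        summed = [sum(vals) for vals in zip(*maps)]
        descriptor = sum(summed) / len(summed)
        # Select: softmax across branches, computed separately per channel.
        attn = softmax([weights[k][c] * descriptor for k in range(num_branches)])
        attention.append(attn)
        # Output: attention-weighted sum of the branch feature maps.
        fused.append([sum(a * m[i] for a, m in zip(attn, maps))
                      for i in range(len(summed))])
    return fused, attention
```

Because the attention weights for each channel sum to one, every output value is a convex combination of the branch values, which is how different attention patterns yield different effective receptive field sizes.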

Journal Article•DOI•

Survey on deep learning with class imbalance

Justin M. Johnson¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

01 Mar 2019-Journal of Big Data

TL;DR: Examination of existing deep learning techniques for addressing class imbalanced data finds that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered.

Abstract: The purpose of this study is to examine existing deep learning techniques for addressing class imbalanced data. Effective classification with imbalanced data is an important area of research, as high class imbalance is naturally inherent in many real-world applications, e.g., fraud detection and cancer detection. Moreover, highly imbalanced data poses added difficulty, as most learners will exhibit bias towards the majority class, and in extreme cases, may ignore the minority class altogether. Class imbalance has been studied thoroughly over the last two decades using traditional machine learning models, i.e. non-deep learning. Despite recent advances in deep learning, along with its increasing popularity, very little empirical work in the area of deep learning with class imbalance exists. Having achieved record-breaking performance results in several complex domains, investigating the use of deep neural networks for problems containing high levels of class imbalance is of great interest. Available studies regarding class imbalance and deep learning are surveyed in order to better understand the efficacy of deep learning when applied to class imbalanced data. This survey discusses the implementation details and experimental results for each study, and offers additional insight into their strengths and weaknesses. Several areas of focus include: data complexity, architectures tested, performance interpretation, ease of use, big data application, and generalization to other domains. We have found that research in this area is very limited, that most existing work focuses on computer vision tasks with convolutional neural networks, and that the effects of big data are rarely considered. Several traditional methods for class imbalance, e.g. data sampling and cost-sensitive learning, prove to be applicable in deep learning, while more advanced methods that exploit neural network feature learning abilities show promising results. 
The survey concludes with a discussion that highlights various gaps in deep learning from class imbalanced data for the purpose of guiding future research.
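The two traditional remedies named in this abstract, data sampling and cost-sensitive learning, can be sketched in a few lines. The function names below are hypothetical and the sketch is not taken from the survey: it shows random oversampling of minority classes and inverse-frequency class weights, which are only one common variant of each idea.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    # Data sampling: duplicate randomly chosen minority-class samples
    # until every class is as large as the largest class.
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels

def inverse_frequency_weights(labels):
    # Cost-sensitive learning: weight each class inversely to its
    # frequency so that minority-class errors cost more in the loss.
    counts = Counter(labels)
    total = len(labels)
    return {label: total / (len(counts) * count)
            for label, count in counts.items()}
```

With eight samples of class 0 and two of class 1, `random_oversample` returns eight samples per class, and `inverse_frequency_weights` assigns the minority class a weight four times that of the majority class.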

Posted Content•

Unsupervised Data Augmentation for Consistency Training

Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le

29 Apr 2019-arXiv: Learning

TL;DR: A new perspective on how to effectively noise unlabeled examples is presented and it is argued that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.

Abstract: Semi-supervised learning lately has shown much promise in improving deep learning models when labeled data is scarce. Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise. In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of noising, specifically those produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning. By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six language and three vision tasks under the same consistency training framework. On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20, outperforming the state-of-the-art model trained on 25,000 labeled examples. On a standard semi-supervised learning benchmark, CIFAR-10, our method outperforms all previous approaches and achieves an error rate of 5.43 with only 250 examples. Our method also combines well with transfer learning, e.g., when finetuning from BERT, and yields improvements in high-data regime, such as ImageNet, whether when there is only 10% labeled data or when a full labeled set with 1.3M extra unlabeled examples is used. Code is available at this https URL.
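Consistency training, as described in this abstract, combines a supervised loss on labeled data with a penalty that keeps predictions stable when an unlabeled input is noised or augmented. The sketch below is a simplified illustration, not the paper's code: it assumes the model's predicted class probabilities are already available as lists, uses a KL divergence as the consistency term, and the `weight` parameter and function names are hypothetical.

```python
import math

def cross_entropy(probs, label, eps=1e-12):
    # Supervised loss for one labeled example.
    return -math.log(probs[label] + eps)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) between two discrete distributions.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def semi_supervised_loss(labeled, unlabeled, weight=1.0):
    # labeled:   list of (predicted_probs, true_label) pairs.
    # unlabeled: list of (probs_on_original, probs_on_augmented) pairs.
    supervised = sum(cross_entropy(p, y) for p, y in labeled) / len(labeled)
    # Consistency term: penalize predictions that change under augmentation.
    consistency = sum(kl_divergence(p, q) for p, q in unlabeled) / len(unlabeled)
    return supervised + weight * consistency
```

The consistency term is zero when the prediction on the augmented input matches the prediction on the original, so only the choice of augmentation determines how much signal the unlabeled data contributes.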

Proceedings Article•DOI•

PointConv: Deep Convolutional Networks on 3D Point Clouds

Wenxuan Wu¹, Zhongang Qi¹, Li Fuxin¹•Institutions (1)

Oregon State University¹

15 Jun 2019

TL;DR: The dynamic filter is extended to a new convolution operation, named PointConv, which can be applied on point clouds to build deep convolutional networks and is able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds.

Abstract: Unlike images which are represented in regular dense grids, 3D point clouds are irregular and unordered, hence applying convolution on them can be difficult. In this paper, we extend the dynamic filter to a new convolution operation, named PointConv. PointConv can be applied on point clouds to build deep convolutional networks. We treat convolution kernels as nonlinear functions of the local coordinates of 3D points comprised of weight and density functions. With respect to a given point, the weight functions are learned with multi-layer perceptron networks and the density functions through kernel density estimation. A novel reformulation is proposed for efficiently computing the weight functions, which allowed us to dramatically scale up the network and significantly improve its performance. The learned convolution kernel can be used to compute translation-invariant and permutation-invariant convolution on any point set in the 3D space. Besides, PointConv can also be used as deconvolution operators to propagate features from a subsampled point cloud back to its original resolution. Experiments on ModelNet40, ShapeNet, and ScanNet show that deep convolutional neural networks built on PointConv are able to achieve state-of-the-art on challenging semantic segmentation benchmarks on 3D point clouds. Besides, our experiments converting CIFAR-10 into a point cloud showed that networks built on PointConv can match the performance of convolutional networks in 2D images of a similar structure.

Proceedings Article•DOI•

Learning Implicit Fields for Generative Shape Modeling

Zhiqin Chen¹, Hao Zhang¹•Institutions (1)

Simon Fraser University¹

15 Jun 2019

TL;DR: In this paper, an implicit field is used to assign a value to each point in 3D space, so that a shape can be extracted as an iso-surface, and a binary classifier is trained to perform this assignment.

Abstract: We advocate the use of implicit fields for learning generative models of shapes and introduce an implicit field decoder, called IM-NET, for shape generation, aimed at improving the visual quality of the generated shapes. An implicit field assigns a value to each point in 3D space, so that a shape can be extracted as an iso-surface. IM-NET is trained to perform this assignment by means of a binary classifier. Specifically, it takes a point coordinate, along with a feature vector encoding a shape, and outputs a value which indicates whether the point is outside the shape or not. By replacing conventional decoders by our implicit decoder for representation learning (via IM-AE) and shape generation (via IM-GAN), we demonstrate superior results for tasks such as generative shape modeling, interpolation, and single-view 3D reconstruction, particularly in terms of visual quality. Code and supplementary material are available at https://github.com/czq142857/implicit-decoder.

Proceedings Article•DOI•

Neural Graph Collaborative Filtering

Xiang Wang¹, Xiangnan He², Meng Wang³, Fuli Feng¹, Tat-Seng Chua¹•Institutions (3)

National University of Singapore¹, University of Science and Technology of China², Hefei University of Technology³

18 Jul 2019

TL;DR: Wang et al. as discussed by the authors proposed Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it, effectively injecting the collaborative signal into the embedding process in an explicit manner.

Abstract: Learning vector representations (aka. embeddings) of users and items lies at the core of modern recommender systems. Ranging from early matrix factorization to recently emerged deep learning based methods, existing efforts typically obtain a user's (or an item's) embedding by mapping from pre-existing features that describe the user (or the item), such as ID and attributes. We argue that an inherent drawback of such methods is that, the collaborative signal, which is latent in user-item interactions, is not encoded in the embedding process. As such, the resultant embeddings may not be sufficient to capture the collaborative filtering effect. In this work, we propose to integrate the user-item interactions - more specifically the bipartite graph structure - into the embedding process. We develop a new recommendation framework Neural Graph Collaborative Filtering (NGCF), which exploits the user-item graph structure by propagating embeddings on it. This leads to the expressive modeling of high-order connectivity in user-item graph, effectively injecting the collaborative signal into the embedding process in an explicit manner. We conduct extensive experiments on three public benchmarks, demonstrating significant improvements over several state-of-the-art models like HOP-Rec [39] and Collaborative Memory Network [5]. Further analysis verifies the importance of embedding propagation for learning better user and item representations, justifying the rationality and effectiveness of NGCF. Codes are available at https://github.com/xiangwang1223/neural_graph_collaborative_filtering.

Proceedings Article•DOI•

Second-Order Attention Network for Single Image Super-Resolution

Tao Dai¹, Jianrui Cai², Yongbing Zhang¹, Shu-Tao Xia¹, Lei Zhang²•Institutions (2)

Tsinghua University¹, Hong Kong Polytechnic University²

15 Jun 2019

TL;DR: Experimental results demonstrate the superiority of the SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

Abstract: Recently, deep convolutional neural networks (CNNs) have been widely explored in single image super-resolution (SISR) and obtained remarkable performance. However, most of the existing CNN-based SISR methods mainly focus on wider or deeper architecture design, neglecting to explore the feature correlations of intermediate layers, hence hindering the representational power of CNNs. To address this issue, in this paper, we propose a second-order attention network (SAN) for more powerful feature expression and feature correlation learning. Specifically, a novel trainable second-order channel attention (SOCA) module is developed to adaptively rescale the channel-wise features by using second-order feature statistics for more discriminative representations. Furthermore, we present a non-locally enhanced residual group (NLRG) structure, which not only incorporates non-local operations to capture long-distance spatial contextual information, but also contains repeated local-source residual attention groups (LSRAG) to learn increasingly abstract feature representations. Experimental results demonstrate the superiority of our SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.

Journal Article•DOI•

HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition

Rajeev Ranjan¹, Vishal M. Patel², Rama Chellappa¹•Institutions (2)

University of Maryland, College Park¹, Rutgers University²

01 Jan 2019-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: HyperFace as discussed by the authors combines face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNNs) and achieves significant improvement in performance by fusing intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features.

Abstract: We present an algorithm for simultaneous face detection, landmarks localization, pose estimation and gender recognition using deep convolutional neural networks (CNN). The proposed method, called HyperFace, fuses the intermediate layers of a deep CNN using a separate CNN followed by a multi-task learning algorithm that operates on the fused features. It exploits the synergy among the tasks, which boosts their individual performances. Additionally, we propose two variants of HyperFace: (1) HyperFace-ResNet that builds on the ResNet-101 model and achieves significant improvement in performance, and (2) Fast-HyperFace that uses a high recall fast face detector for generating region proposals to improve the speed of the algorithm. Extensive experiments show that the proposed models are able to capture both global and local information in faces and perform significantly better than many competitive algorithms for each of these four tasks.

Proceedings Article•DOI•

FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

Bichen Wu¹, Kurt Keutzer¹, Xiaoliang Dai², Peizhao Zhang³, Yanghan Wang³, Fei Sun³, Yiming Wu³, Yuandong Tian³, Peter Vajda³, Yangqing Jia³•Institutions (3)

University of California, Berkeley¹, Princeton University², Facebook³

15 Jun 2019

TL;DR: This work proposes a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods.

Abstract: Designing accurate and efficient ConvNets for mobile devices is challenging because the design space is combinatorially large. Due to this, previous neural architecture search (NAS) methods are computationally expensive. ConvNet architecture optimality depends on factors such as input resolution and target devices. However, existing approaches are too resource demanding for case-by-case redesigns. Also, previous work focuses primarily on reducing FLOPs, but FLOP count does not always reflect actual latency. To address these, we propose a differentiable neural architecture search (DNAS) framework that uses gradient-based methods to optimize ConvNet architectures, avoiding enumerating and training individual architectures separately as in previous methods. FBNets (Facebook-Berkeley-Nets), a family of models discovered by DNAS surpass state-of-the-art models both designed manually and generated automatically. FBNet-B achieves 74.1% top-1 accuracy on ImageNet with 295M FLOPs and 23.1 ms latency on a Samsung S8 phone, 2.4x smaller and 1.5x faster than MobileNetV2-1.3 with similar accuracy. Despite higher accuracy and lower latency than MnasNet, we estimate FBNet-B's search cost is 420x smaller than MnasNet's, at only 216 GPU-hours. Searched for different resolutions and channel sizes, FBNets achieve 1.5% to 6.4% higher accuracy than MobileNetV2. The smallest FBNet achieves 50.2% accuracy and 2.9 ms latency (345 frames per second) on a Samsung S8. Over a Samsung-optimized FBNet, the iPhone-X-optimized model achieves a 1.4x speedup on an iPhone X. FBNet models are open-sourced at https://github.com/facebookresearch/mobile-vision.

Proceedings Article•DOI•

Occupancy Networks: Learning 3D Reconstruction in Function Space

Lars Mescheder¹, Michael Oechsle¹, Michael Niemeyer¹, Sebastian Nowozin², Andreas Geiger¹•Institutions (2)

University of Tübingen¹, Google²

15 Jun 2019

TL;DR: In this paper, the authors propose Occupancy Networks, which implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier, which can be used for learning-based 3D reconstruction methods.

Abstract: With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
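The idea of representing a surface as a classifier's decision boundary can be made concrete with a small sketch. The `occupancy` function below is a hypothetical analytic stand-in for a trained network (a soft sphere), not the authors' model. The point of the example is that the same continuous function can be queried on a grid of any resolution, so the representation itself has no fixed resolution.

```python
import math

def occupancy(point, radius=0.5, sharpness=20.0):
    # Hypothetical stand-in for a trained classifier: returns the
    # probability that a 3D point lies inside a sphere of given radius.
    distance = math.sqrt(sum(c * c for c in point))
    return 1.0 / (1.0 + math.exp(-sharpness * (radius - distance)))

def voxelize(resolution, threshold=0.5):
    # Query the continuous function at the cell centres of a regular
    # grid covering [-1, 1]^3. The surface is the set of points where
    # the occupancy probability crosses the threshold.
    step = 2.0 / resolution
    coords = [-1.0 + step * (i + 0.5) for i in range(resolution)]
    return [[[occupancy((x, y, z)) > threshold for z in coords]
             for y in coords] for x in coords]

def occupied_fraction(grid):
    # Fraction of grid cells classified as inside the shape.
    cells = [v for plane in grid for row in plane for v in row]
    return sum(cells) / len(cells)
```

Calling `voxelize(8)` and `voxelize(32)` samples the same underlying shape at two resolutions without changing or retraining the function, whereas a voxel grid stored directly would need memory that grows cubically with resolution.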

Journal Article•DOI•

Deep learning in remote sensing applications: A meta-analysis and review

Lei Ma, Yu Liu¹, Xueliang Zhang², Yuanxin Ye³, Gaofei Yin³, Brian Alan Johnson•Institutions (3)

Hefei University of Technology¹, Nanjing University², Southwest Jiaotong University³

01 Jun 2019-ISPRS Journal of Photogrammetry and Remote Sensing

TL;DR: This review covers nearly every application and technology in the field of remote sensing, ranging from preprocessing to mapping, and presents a conclusion regarding the current state-of-the-art methods, a critical conclusion on open challenges, and directions for future research.

Abstract: Deep learning (DL) algorithms have seen a massive rise in popularity for remote-sensing image analysis over the past few years. In this study, the major DL concepts pertinent to remote-sensing are introduced, and more than 200 publications in this field, most of which were published during the last two years, are reviewed and analyzed. Initially, a meta-analysis was conducted to analyze the status of remote sensing DL studies in terms of the study targets, DL model(s) used, image spatial resolution(s), type of study area, and level of classification accuracy achieved. Subsequently, a detailed review is conducted to describe/discuss how DL has been applied for remote sensing image analysis tasks including image fusion, image registration, scene classification, object detection, land use and land cover (LULC) classification, segmentation, and object-based image analysis (OBIA). This review covers nearly every application and technology in the field of remote sensing, ranging from preprocessing to mapping. Finally, a conclusion regarding the current state-of-the-art methods, a critical conclusion on open challenges, and directions for future research are presented.

Proceedings Article•DOI•

Fast Online Object Tracking and Segmentation: A Unifying Approach

Qiang Wang, Li Zhang¹, Luca Bertinetto¹, Weiming Hu², Philip H. S. Torr¹•Institutions (2)

University of Oxford¹, Chinese Academy of Sciences²

01 Jun 2019

TL;DR: This method improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task, and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second.

Abstract: In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real-time, with a single simple approach. Our method, dubbed SiamMask, improves the offline training procedure of popular fully-convolutional Siamese approaches for object tracking by augmenting their loss with a binary segmentation task. Once trained, SiamMask solely relies on a single bounding box initialisation and operates online, producing class-agnostic object segmentation masks and rotated bounding boxes at 55 frames per second. Despite its simplicity, versatility and fast speed, our strategy allows us to establish a new state-of-the-art among real-time trackers on VOT-2018, while at the same time demonstrating competitive performance and the best speed for the semi-supervised video object segmentation task on DAVIS-2016 and DAVIS-2017.
