Showing papers on "Recurrent neural network" published in 2020

Posted Content•

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati¹, James Qin¹, Chung-Cheng Chiu¹, Niki Parmar¹, Yu Zhang¹, Jiahui Yu², Wei Han¹, Shibo Wang, Zhengdong Zhang¹, Yonghui Wu¹, Ruoming Pang¹•Institutions (2)

Google¹, Adobe Systems²

16 May 2020-arXiv: Audio and Speech Processing

TL;DR: This work proposes the convolution-augmented transformer for speech recognition, named Conformer, which significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies.

Abstract: Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this regard, we propose the convolution-augmented transformer for speech recognition, named Conformer. Conformer significantly outperforms the previous Transformer and CNN based models achieving state-of-the-art accuracies. On the widely used LibriSpeech benchmark, our model achieves WER of 2.1%/4.3% without using a language model and 1.9%/3.9% with an external language model on test/testother. We also observe competitive performance of 2.7%/6.3% with a small model of only 10M parameters.
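
The figures quoted in this abstract are word error rates (WER). As a rough illustration of how such a number is usually obtained, the sketch below uses the standard word-level edit-distance definition. It is not code from the paper, and the function name `word_error_rate` is chosen here for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the 2.1%/4.3% figures above.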

1,270 citations

Journal Article•DOI•

Deep Learning on Graphs: A Survey

Ziwei Zhang¹, Peng Cui¹, Wenwu Zhu¹•Institutions (1)

Tsinghua University¹

17 Mar 2020-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This survey reviews deep learning methods on graphs, dividing them into five categories based on model architecture and training strategy: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods.

Abstract: Deep learning has been shown to be successful in a number of domains, ranging from acoustics, images, to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions.

686 citations

Journal Article•DOI•

Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting

Zhiyong Cui¹, Kristian Henrickson, Ruimin Ke¹, Yinhai Wang¹•Institutions (1)

University of Washington¹

01 Nov 2020-IEEE Transactions on Intelligent Transportation Systems

TL;DR: This work proposes a novel deep learning framework, Traffic Graph Convolutional Long Short-Term Memory Neural Network (TGC-LSTM), that learns the interactions between roadways in the traffic network and forecasts the network-wide traffic state; the proposed model outperforms baseline methods on two real-world traffic state datasets.

Abstract: Traffic forecasting is a particularly challenging application of spatiotemporal forecasting, due to the time-varying traffic patterns and the complicated spatial dependencies on road networks. To address this challenge, we learn the traffic network as a graph and propose a novel deep learning framework, Traffic Graph Convolutional Long Short-Term Memory Neural Network (TGC-LSTM), to learn the interactions between roadways in the traffic network and forecast the network-wide traffic state. We define the traffic graph convolution based on the physical network topology. The relationship between the proposed traffic graph convolution and the spectral graph convolution is also discussed. An L1-norm on graph convolution weights and an L2-norm on graph convolution features are added to the model’s loss function to enhance the interpretability of the proposed model. Experimental results show that the proposed model outperforms baseline methods on two real-world traffic state datasets. The visualization of the graph convolution weights indicates that the proposed framework can recognize the most influential road segments in real-world traffic networks.
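
The regularized loss mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base loss is assumed to be mean squared error, the L2 term is taken as an unsquared norm, and the names `regularized_loss`, `lambda1`, and `lambda2` are hypothetical.

```python
def regularized_loss(predictions, targets, gc_weights, gc_features,
                     lambda1=0.01, lambda2=0.01):
    """Assumed MSE loss plus an L1 penalty on graph convolution weights
    and an L2 penalty on graph convolution features."""
    n = len(targets)
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n
    # L1-norm over all entries of the (nested-list) weight matrix.
    l1_weights = sum(abs(w) for row in gc_weights for w in row)
    # L2-norm over all entries of the (nested-list) feature matrix.
    l2_features = sum(f ** 2 for row in gc_features for f in row) ** 0.5
    return mse + lambda1 * l1_weights + lambda2 * l2_features
```

The L1 term pushes many graph convolution weights toward zero, which is consistent with the abstract's point that the remaining weights can be visualized to find influential road segments.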

611 citations

Proceedings Article•DOI•

Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati¹, James Qin¹, Chung-Cheng Chiu¹, Niki Parmar¹, Yu Zhang¹, Jiahui Yu², Wei Han¹, Shibo Wang, Zhengdong Zhang¹, Yonghui Wu¹, Ruoming Pang¹•Institutions (2)

Google¹, Adobe Systems²

16 May 2020

TL;DR: Conformer combines convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way, achieving state-of-the-art accuracies.

607 citations

Proceedings Article•DOI•

Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation

Yi Luo¹, Zhuo Chen², Takuya Yoshioka²•Institutions (2)

Columbia University¹, Microsoft²

04 May 2020

TL;DR: This paper proposes the dual-path recurrent neural network (DPRNN), a simple yet effective method for organizing RNN layers in a deep structure to model extremely long sequences, and reports a new state-of-the-art performance on WSJ0-2mix with a model 20 times smaller than the previous best system.

Abstract: Recent studies in deep learning-based speech separation have proven the superiority of time-domain approaches to conventional time-frequency-based methods. Unlike the time-frequency domain approaches, the time-domain separation systems often receive input sequences consisting of a huge number of time steps, which introduces challenges for modeling extremely long sequences. Conventional recurrent neural networks (RNNs) are not effective for modeling such long sequences due to optimization difficulties, while one-dimensional convolutional neural networks (1-D CNNs) cannot perform utterance-level sequence modeling when their receptive field is smaller than the sequence length. In this paper, we propose the dual-path recurrent neural network (DPRNN), a simple yet effective method for organizing RNN layers in a deep structure to model extremely long sequences. DPRNN splits the long sequential input into smaller chunks and applies intra- and inter-chunk operations iteratively, where the input length can be made proportional to the square root of the original sequence length in each operation. Experiments show that by replacing 1-D CNN with DPRNN and applying sample-level modeling in the time-domain audio separation network (TasNet), a new state-of-the-art performance on WSJ0-2mix is achieved with a 20 times smaller model than the previous best system.
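
The chunking idea described in this abstract can be illustrated with a small sketch. It only shows the reshaping, not the RNNs themselves, and it assumes non-overlapping chunks with zero padding as a simplification; the abstract does not specify these details, and all function names are hypothetical.

```python
import math

def dual_path_lengths(sequence_length):
    """Return (chunk_size, num_chunks) with chunk_size close to sqrt(length)."""
    chunk_size = max(1, math.isqrt(sequence_length))
    num_chunks = math.ceil(sequence_length / chunk_size)
    return chunk_size, num_chunks

def split_into_chunks(sequence, chunk_size):
    """Split a sequence into consecutive chunks, zero-padding the last one."""
    num_chunks = math.ceil(len(sequence) / chunk_size)
    padded = list(sequence) + [0] * (num_chunks * chunk_size - len(sequence))
    return [padded[i * chunk_size:(i + 1) * chunk_size] for i in range(num_chunks)]

def inter_chunk_view(chunks):
    """Transpose chunks so each row runs across chunks at a fixed position."""
    return [list(column) for column in zip(*chunks)]
```

An intra-chunk operation would run along each row of `split_into_chunks(...)`, and an inter-chunk operation along each row of `inter_chunk_view(...)`. With a chunk size near the square root of the sequence length, both passes see sequences of roughly that length, for example 100 steps each for a 10,000-step input.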

476 citations

Journal Article•DOI•

Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study

Mohamed Amine Ferrag, Leandros A. Maglaras¹, Sotiris Moschoyiannis², Helge Janicke¹•Institutions (2)

De Montfort University¹, University of Surrey²

01 Feb 2020

TL;DR: A survey of deep learning approaches for cyber security intrusion detection, the datasets used, and a comparative study to evaluate the efficiency of several methods are presented.

Abstract: In this paper, we present a survey of deep learning approaches for cybersecurity intrusion detection, the datasets used, and a comparative study. Specifically, we provide a review of intrusion detection systems based on deep learning approaches. The dataset plays an important role in intrusion detection, therefore we describe 35 well-known cyber datasets and provide a classification of these datasets into seven categories; namely, network traffic-based dataset, electrical network-based dataset, internet traffic-based dataset, virtual private network-based dataset, android apps-based dataset, IoT traffic-based dataset, and internet-connected devices-based dataset. We analyze seven deep learning models including recurrent neural networks, deep neural networks, restricted Boltzmann machines, deep belief networks, convolutional neural networks, deep Boltzmann machines, and deep autoencoders. For each model, we study the performance in two categories of classification (binary and multiclass) under two new real traffic datasets, namely, the CSE-CIC-IDS2018 dataset and the Bot-IoT dataset. In addition, we use the most important performance indicators, namely, accuracy, false alarm rate, and detection rate for evaluating the efficiency of several methods.
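
The three performance indicators named in this abstract are commonly derived from a binary confusion matrix. The sketch below uses the usual textbook definitions, which may differ in detail from those in the paper; the function name and argument names are illustrative, and nonzero denominators are assumed.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, detection rate, and false alarm rate from confusion counts.

    tp: attacks flagged as attacks     fn: attacks missed
    fp: benign traffic flagged         tn: benign traffic passed
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
    }
```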

464 citations

Journal Article•DOI•

A hybrid method of exponential smoothing and recurrent neural networks for time series forecasting

Slawek Smyl¹•Institutions (1)

Uber¹

01 Jan 2020-International Journal of Forecasting

TL;DR: The method uses a dynamic computational graph neural network system that allows a standard exponential smoothing model to be mixed with advanced long short-term memory networks in a common framework, yielding a hybrid and hierarchical forecasting method.

423 citations

Journal Article•DOI•

A review on the long short-term memory model

Greg Van Houdt¹, Carlos Mosquera², Gonzalo Nápoles¹,³•Institutions (3)

University of Hasselt¹, Vrije Universiteit Brussel², Tilburg University³

01 Dec 2020-Artificial Intelligence Review

TL;DR: A comprehensive review of LSTM’s formulation and training, relevant applications reported in the literature and code resources implementing this model for a toy example are presented.

Abstract: Long short-term memory (LSTM) has transformed both machine learning and neurocomputing fields. According to several online sources, this model has improved Google’s speech recognition, greatly improved machine translations on Google Translate, and the answers of Amazon’s Alexa. This neural system is also employed by Facebook, reaching over 4 billion LSTM-based translations per day as of 2017. Interestingly, recurrent neural networks had shown a rather discrete performance until LSTM showed up. One reason for the success of this recurrent network lies in its ability to handle the exploding/vanishing gradient problem, which stands as a difficult issue to be circumvented when training recurrent or very deep neural networks. In this paper, we present a comprehensive review that covers LSTM’s formulation and training, relevant applications reported in the literature and code resources implementing this model for a toy example.
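
For readers unfamiliar with the formulation this review covers, a single LSTM step can be sketched with scalars. This is the commonly taught variant with a forget gate, written here for illustration and not taken from the paper's code resources; the `params` layout is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "forget", "input", "output", "candidate"
    to a tuple (w_x, w_h, b) of input weight, recurrent weight, and bias.
    """
    def pre_activation(gate):
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))      # how much old cell state to keep
    i = sigmoid(pre_activation("input"))       # how much new candidate to write
    o = sigmoid(pre_activation("output"))      # how much cell state to expose
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    c = f * c_prev + i * g                     # additive cell-state update
    h = o * math.tanh(c)
    return h, c
```

The additive update of `c` is the part usually credited with easing the vanishing gradient problem mentioned in the abstract, since the gradient along the cell state is scaled by the forget gate rather than squashed by a nonlinearity at every step.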

412 citations

Proceedings Article•DOI•

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

Qian Zhang¹, Han Lu¹, Hasim Sak¹, Anshuman Tripathi¹, Erik McDermott¹, Stephen Koo¹, Shankar Kumar¹•Institutions (1)

Google¹

07 Feb 2020

TL;DR: This paper presents an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system, and shows that the full attention version of the model beats the state-of-the-art accuracy on the LibriSpeech benchmarks.

Abstract: In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences independently. The activations from both audio and label encoders are combined with a feed-forward layer to compute a probability distribution over the label space for every combination of acoustic frame position and label history. This is similar to the Recurrent Neural Network Transducer (RNN-T) model, which uses RNNs for information encoding instead of Transformer encoders. The model is trained with the RNN-T loss, which is well suited to streaming decoding. We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy. We also show that the full attention version of our model beats the state-of-the-art accuracy on the LibriSpeech benchmarks. Our results also show that we can bridge the gap between full attention and limited attention versions of our model by attending to a limited number of future frames.
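
The limited-context attention described in this abstract can be pictured as a boolean mask over frame pairs. The sketch below is an illustration only; the function name and the parameters `left_context` and `right_context` are hypothetical, and the paper's actual masking code is not shown in the source.

```python
def limited_context_mask(num_frames, left_context, right_context=0):
    """Build a mask where mask[t][s] is True when frame t may attend to frame s.

    Frame t sees at most left_context past frames, itself,
    and at most right_context future frames.
    """
    return [
        [t - left_context <= s <= t + right_context for s in range(num_frames)]
        for t in range(num_frames)
    ]
```

Setting `right_context=0` corresponds to attending only to the current and past frames, while a small positive value corresponds to the abstract's point about attending to a limited number of future frames.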

382 citations

Journal Article•DOI•

EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs

Aldo Pareja¹, Giacomo Domeniconi¹, Jie Chen¹, Tengfei Ma¹, Toyotaro Suzumura¹, Hiroki Kanezashi¹, Tim Kaler², Tao B. Schardl², Charles E. Leiserson²•Institutions (2)

IBM¹, Massachusetts Institute of Technology²

03 Apr 2020

TL;DR: EvolveGCN adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings, using an RNN to evolve the GCN parameters.

Abstract: Graph representation learning resurges as a trending research subject owing to the widespread use of deep learning for Euclidean data, which inspire various creative designs of neural networks in the non-Euclidean domain, particularly graphs. With the success of these graph neural networks (GNN) in the static setting, we approach further practical scenarios where the graph dynamically evolves. Existing approaches typically resort to node embeddings and use a recurrent neural network (RNN, broadly speaking) to regulate the embeddings and learn the temporal dynamics. These methods require the knowledge of a node in the full time span (including both training and testing) and are less applicable to the frequent change of the node set. In some extreme scenarios, the node sets at different time steps may completely differ. To resolve this challenge, we propose EvolveGCN, which adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings. The proposed approach captures the dynamism of the graph sequence through using an RNN to evolve the GCN parameters. Two architectures are considered for the parameter evolution. We evaluate the proposed approach on tasks including link prediction, edge classification, and node classification. The experimental results indicate a generally higher performance of EvolveGCN compared with related approaches. The code is available at https://github.com/IBM/EvolveGCN.

327 citations

Journal Article•DOI•

LSTM-CNN Architecture for Human Activity Recognition

Kun Xia¹, Huang Jianguang¹, Hanyu Wang¹•Institutions (1)

University of Shanghai for Science and Technology¹

20 Mar 2020-IEEE Access

TL;DR: A deep neural network combining convolutional layers with long short-term memory (LSTM) is proposed; it adaptively extracts activity features with fewer parameters and higher accuracy, and shows higher robustness and better activity detection capability than some of the reported results.

Abstract: In the past years, traditional pattern recognition methods have made great progress. However, these methods rely heavily on manual feature extraction, which may hinder the generalization model performance. With the increasing popularity and success of deep learning methods, using these techniques to recognize human actions in mobile and wearable computing scenarios has attracted widespread attention. In this paper, a deep neural network that combines convolutional layers with long short-term memory (LSTM) was proposed. This model could extract activity features automatically and classify them with a few model parameters. LSTM is a variant of the recurrent neural network (RNN), which is more suitable for processing temporal sequences. In the proposed architecture, the raw data collected by mobile sensors was fed into a two-layer LSTM followed by convolutional layers. In addition, a global average pooling layer (GAP) was applied to replace the fully connected layer after convolution for reducing model parameters. Moreover, a batch normalization layer (BN) was added after the GAP layer to speed up the convergence, and obvious results were achieved. The model performance was evaluated on three public datasets (UCI, WISDM, and OPPORTUNITY). Finally, the overall accuracy of the model in the UCI-HAR dataset is 95.78%, in the WISDM dataset is 95.85%, and in the OPPORTUNITY dataset is 92.63%. The results show that the proposed model has higher robustness and better activity detection capability than some of the reported results. It can not only adaptively extract activity features, but also has fewer parameters and higher accuracy.

Journal Article•DOI•

Nonlinear Dynamic Soft Sensor Modeling With Supervised Long Short-Term Memory Network

Xiaofeng Yuan¹, Lin Li¹, Yalin Wang¹•Institutions (1)

Central South University¹

01 May 2020-IEEE Transactions on Industrial Informatics

TL;DR: A supervised LSTM (SLSTM) network is proposed to learn quality-relevant hidden dynamics for soft sensor application, which is composed of basic SLSTM unit at each sampling instant.

Abstract: Soft sensor has been extensively utilized in industrial processes for prediction of key quality variables. To build an accurate virtual sensor model, it is very significant to model the dynamic and nonlinear behaviors of process sequential data properly. Recently, a long short-term memory (LSTM) network has shown great modeling ability on various time series, in which basic LSTM units can handle data nonlinearities and dynamics with a dynamic latent variable structure. However, the hidden variables in the basic LSTM unit mainly focus on describing the dynamics of input variables, which lack representation for the quality data. In this paper, a supervised LSTM (SLSTM) network is proposed to learn quality-relevant hidden dynamics for soft sensor application, which is composed of basic SLSTM unit at each sampling instant. In the basic SLSTM unit, the quality and input variables are simultaneously utilized to learn the dynamic hidden states, which are more relevant and useful for quality prediction. The effectiveness of the proposed SLSTM network is demonstrated on a penicillin fermentation process and an industrial debutanizer column.

Journal Article•DOI•

A solution to the learning dilemma for recurrent networks of spiking neurons

Guillaume Bellec¹, Franz Scherr¹, Anand Subramoney¹, Elias Hajek¹, Darjan Salaj¹, Robert Legenstein¹, Wolfgang Maass¹•Institutions (1)

Graz University of Technology¹

17 Jul 2020-Nature Communications

TL;DR: This learning method–called e-prop–approaches the performance of backpropagation through time (BPTT), the best-known method for training recurrent neural networks in machine learning and suggests a method for powerful on-chip learning in energy-efficient spike-based hardware for artificial intelligence.

Abstract: Recurrently connected networks of spiking neurons underlie the astounding information processing capabilities of the brain. Yet in spite of extensive research, how they can learn through synaptic plasticity to carry out complex network computations remains unclear. We argue that two pieces of this puzzle were provided by experimental data from neuroscience. A mathematical result tells us how these pieces need to be combined to enable biologically plausible online network learning through gradient descent, in particular deep reinforcement learning. This learning method–called e-prop–approaches the performance of backpropagation through time (BPTT), the best-known method for training recurrent neural networks in machine learning. In addition, it suggests a method for powerful on-chip learning in energy-efficient spike-based hardware for artificial intelligence. Bellec et al. present a mathematically founded approximation for gradient descent training of recurrent neural networks without backwards propagation in time. This enables biologically plausible training of spike-based neural network models with working memory and supports on-chip training of neuromorphic hardware.

Proceedings Article•DOI•

Traffic Flow Prediction via Spatial Temporal Graph Neural Network

Xiaoyang Wang¹, Yao Ma², Yiqi Wang², Wei Jin², Xin Wang, Jiliang Tang², Caiyan Jia¹, Jian Yu¹•Institutions (2)

Beijing Jiaotong University¹, Michigan State University²

20 Apr 2020

TL;DR: A novel spatial temporal graph neural network for traffic flow prediction, which can comprehensively capture spatial and temporal patterns and provides a sequential component to model the traffic flow dynamics which can exploit both local and global temporal dependencies.

Abstract: Traffic flow analysis, prediction and management are keystones for building smart cities in the new era. With the help of deep neural networks and big traffic data, we can better understand the latent patterns hidden in the complex transportation networks. The dynamic of the traffic flow on one road not only depends on the sequential patterns in the temporal dimension but also relies on other roads in the spatial dimension. Although there are existing works on predicting the future traffic flow, the majority of them have certain limitations on modeling spatial and temporal dependencies. In this paper, we propose a novel spatial temporal graph neural network for traffic flow prediction, which can comprehensively capture spatial and temporal patterns. In particular, the framework offers a learnable positional attention mechanism to effectively aggregate information from adjacent roads. Meanwhile, it provides a sequential component to model the traffic flow dynamics which can exploit both local and global temporal dependencies. Experimental results on various real traffic datasets demonstrate the effectiveness of the proposed framework.

Journal Article•DOI•

Backpropagation Algorithms and Reservoir Computing in Recurrent Neural Networks for the Forecasting of Complex Spatiotemporal Dynamics

Pantelis R. Vlachas¹, Jaideep Pathak², Brian R. Hunt², Themistoklis P. Sapsis³, Michelle Girvan², Edward Ott², Petros Koumoutsakos¹•Institutions (3)

ETH Zurich¹, University of Maryland, College Park², Massachusetts Institute of Technology³

01 Jun 2020-Neural Networks

TL;DR: This study establishes that RNNs are a potent computational framework for the learning and forecasting of complex spatiotemporal systems.

Journal Article•DOI•

Deep Learning for Spatio-Temporal Data Mining: A Survey

Senzhang Wang¹, Jiannong Cao², Philip S. Yu•Institutions (2)

Nanjing University of Aeronautics and Astronautics¹, Hong Kong Polytechnic University²

22 Sep 2020-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A comprehensive survey of recent progress in applying deep learning techniques for STDM is provided, and existing literature is classified based on the types of spatio-temporal data, the data mining tasks, and the deep learning models.

Abstract: With the fast development of various positioning techniques such as Global Position System (GPS), mobile devices and remote sensing, spatio-temporal data has become increasingly available nowadays. Mining valuable knowledge from spatio-temporal data is critically important to many real-world applications including human mobility understanding, smart transportation, urban planning, public safety, health care and environmental management. As the number, volume and resolution of spatio-temporal data increase rapidly, traditional data mining methods, especially statistics based methods for dealing with such data are becoming overwhelmed. Recently deep learning models such as recurrent neural network (RNN) and convolutional neural network (CNN) have achieved remarkable success in many domains, and are also widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, anomaly detection and classification. In this paper, we provide a comprehensive review of recent progress in applying deep learning techniques for STDM. We first categorize the spatio-temporal data into five different types, and then briefly introduce the deep learning models that are widely used in STDM. Next, we classify existing literature based on the types of spatio-temporal data, the data mining tasks, and the deep learning models, followed by the applications of deep learning for STDM in different domains.

Journal Article•DOI•

A CNN-RNN Framework for Crop Yield Prediction.

Saeed Khaki¹, Lizhi Wang¹, Sotirios V. Archontoulis¹•Institutions (1)

Iowa State University¹

24 Jan 2020-Frontiers in Plant Science

TL;DR: The proposed CNN-RNN model, along with other popular methods such as random forest, deep fully connected neural networks (DFNN), and LASSO, was used to forecast corn and soybean yield across the entire Corn Belt for years 2016, 2017, and 2018 using historical data.

Abstract: Crop yield prediction is extremely challenging due to its dependence on multiple factors such as crop genotype, environmental factors, management practices, and their interactions. This paper presents a deep learning framework using convolutional neural networks (CNNs) and recurrent neural networks (RNNs) for crop yield prediction based on environmental data and management practices. The proposed CNN-RNN model, along with other popular methods such as random forest (RF), deep fully connected neural networks (DFNN), and LASSO, was used to forecast corn and soybean yield across the entire Corn Belt (including 13 states) in the United States for years 2016, 2017, and 2018 using historical data. The new model achieved a root-mean-square-error (RMSE) 9% and 8% of their respective average yields, substantially outperforming all other methods that were tested. The CNN-RNN has three salient features that make it a potentially useful method for other crop yield prediction studies. (1) The CNN-RNN model was designed to capture the time dependencies of environmental factors and the genetic improvement of seeds over time without having their genotype information. (2) The model demonstrated the capability to generalize the yield prediction to untested environments without significant drop in the prediction accuracy. (3) Coupled with the backpropagation method, the model could reveal the extent to which weather conditions, accuracy of weather predictions, soil conditions, and management practices were able to explain the variation in the crop yields.
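
The abstract reports RMSE as a percentage of average yield. A minimal sketch of that arithmetic follows, assuming the percentage is taken relative to the mean of the observed values; the function name is hypothetical, and any numbers passed to it are for the reader to supply, not results from the paper.

```python
import math

def rmse_percent_of_mean(predicted, observed):
    """Root-mean-square error expressed as a percentage of the mean observed value."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    mean_observed = sum(observed) / n
    return 100.0 * rmse / mean_observed
```

Normalizing by the average yield makes errors comparable across crops whose typical yields differ, which is presumably why the abstract can quote 9% and 8% side by side for corn and soybean.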

Posted Content•

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Yanxin Hu¹, Yun Liu, Shubo Lv, Mengtao Xing¹, Shimin Zhang¹, Yihui Fu¹, Jian Wu², Bihong Zhang, Lei Xie¹•Institutions (2)

Northwestern Polytechnical University¹, Microsoft²

01 Aug 2020-arXiv: Audio and Speech Processing

TL;DR: This paper designs a new network structure simulating the complex-valued operation, called Deep Complex Convolution Recurrent Network (DCCRN), where both CNN and RNN structures can handle complex-valued operation.

Abstract: Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or speech spectrum, via a naive convolution neural network (CNN) or recurrent neural network (RNN). Some recent studies use complex-valued spectrogram as a training target but train in a real-valued network, predicting the magnitude and phase component or real and imaginary part, respectively. Particularly, convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has been proven to be helpful for complex targets. In order to train the complex target more effectively, in this paper, we design a new network structure simulating the complex-valued operation, called Deep Complex Convolution Recurrent Network (DCCRN), where both CNN and RNN structures can handle complex-valued operation. The proposed DCCRN models are very competitive over other previous networks, either on objective or subjective metric. With only 3.7M parameters, our DCCRN models submitted to the Interspeech 2020 Deep Noise Suppression (DNS) challenge ranked first for the real-time-track and second for the non-real-time track in terms of Mean Opinion Score (MOS).

Journal Article•DOI•

Direction-Aware Spatial Context Features for Shadow Detection and Removal

Xiaowei Hu¹, Chi-Wing Fu¹, Lei Zhu¹, Jing Qin², Pheng-Ann Heng¹•Institutions (2)

The Chinese University of Hong Kong¹, Hong Kong Polytechnic University²

01 Nov 2020-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper formulates a direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN, and uses it for shadow detection and removal.

Abstract: Shadow detection and shadow removal are fundamental and challenging tasks, requiring an understanding of the global image semantics. This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner. To achieve this, we first formulate the direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN. By learning these weights through training, we can recover direction-aware spatial context (DSC) for detecting and removing shadows. This design is developed into the DSC module and embedded in a convolutional neural network (CNN) to learn the DSC features at different levels. Moreover, we design a weighted cross entropy loss to make effective the training for shadow detection and further adopt the network for shadow removal by using a euclidean loss function and formulating a color transfer function to address the color and luminosity inconsistencies in the training pairs. We employed two shadow detection benchmark datasets and two shadow removal benchmark datasets, and performed various experiments to evaluate our method. Experimental results show that our method performs favorably against the state-of-the-art methods for both shadow detection and shadow removal.

Proceedings Article•DOI•

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Yanxin Hu¹, Yun Liu, Shubo Lv, Mengtao Xing¹, Shimin Zhang¹, Yihui Fu¹, Jian Wu², Bihong Zhang, Lei Xie¹•Institutions (2)

Northwestern Polytechnical University¹, Microsoft²

01 Aug 2020

TL;DR: Deep Complex Convolution Recurrent Network (DCCRN) is a new network structure that simulates complex-valued operations, in which both the convolutional encoder-decoder (CED) and long short-term memory (LSTM) structures can handle complex-valued operations.

Proceedings Article•DOI•

Transformer-Based Acoustic Modeling for Hybrid Speech Recognition

Yongqiang Wang¹, Abdelrahman Mohamed¹, Duc Le¹, Chunxi Liu¹, Alex Xiao¹, Jay Mahadeokar¹, Hongzhao Huang¹, Andros Tjandra², Xiaohui Zhang¹, Frank Zhang¹, Christian Fuegen¹, Geoffrey Zweig¹, Michael L. Seltzer¹•Institutions (2)

Facebook¹, Nara Institute of Science and Technology²

04 May 2020

TL;DR: This article proposed and evaluated transformer-based acoustic models (AMs) for hybrid speech recognition, including various positional embedding methods and an iterated loss to enable training deep transformers.

Abstract: We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including various positional embedding methods and an iterated loss to enable training deep transformers. We also present a preliminary study of using limited right context in transformer models, which makes it possible for streaming applications. We demonstrate that on the widely used Librispeech benchmark, our transformer-based AM outperforms the best published hybrid result by 19% to 26% relative when the standard n-gram language model (LM) is used. Combined with neural network LM for rescoring, our proposed approach achieves state-of-the-art results on Librispeech. Our findings are also confirmed on a much larger internal dataset.

Journal Article•DOI•

Fractional Neuro-Sequential ARFIMA-LSTM for Financial Market Forecasting

Ayaz Hussain Bukhari¹, Muhammad Asif Zahoor Raja², Muhammad Sulaiman¹, Saeed Islam¹, Muhammad Shoaib³, Poom Kumam⁴•Institutions (4)

Abdul Wali Khan University Mardan¹, National Yunlin University of Science and Technology², COMSATS Institute of Information Technology³, King Mongkut's University of Technology Thonburi⁴

06 Apr 2020-IEEE Access

TL;DR: A novel hybrid model that combines the strength of the fractional-order derivative with the dynamical features of deep learning long short-term memory (LSTM) networks is presented to predict the abrupt stochastic variation of the financial market.

Abstract: Forecasting fast-fluctuating, high-frequency financial data is always a challenging problem in the field of economics and modelling. In this study, a novel hybrid model that combines the strength of the fractional-order derivative with the dynamical features of deep learning long short-term memory (LSTM) networks is presented to predict the abrupt stochastic variation of the financial market. Stock market prices are dynamic, highly sensitive, nonlinear and chaotic. There are different techniques for forecasting prices in the time-variant domain, but due to the variability and uncertain behavior of stock prices, traditional methods, such as data mining, statistical approaches, and non-deep neural network models, are not suited for prediction and generalized forecasting of stock prices. The autoregressive fractionally integrated moving average (ARFIMA) model, in contrast, provides a flexible tool for classes of long-memory models. The advancement of machine learning-based deep non-linear modelling confirms that the hybrid model efficiently extracts profound features and models non-linear functions. LSTM networks are a special kind of recurrent neural network (RNN) that map sequences of input observations to output observations and can capture long-term dependencies. A novel ARFIMA-LSTM hybrid recurrent network is presented in which ARFIMA model-based filters capture the linear tendencies in the data better than the ARIMA model and pass the residual to the LSTM model, which captures nonlinearity in the residual values with the help of exogenous dependent variables. The model not only minimizes the volatility problem but also overcomes the overfitting problem of neural networks. The model is evaluated using PSX company data of the stock market based on RMSE, MSE and MAPE, along with a comparison against the ARIMA model, the LSTM model and the generalized regression radial basis neural network (GRNN) ensemble method independently. The forecasting performance indicates the effectiveness of the proposed ARFIMA-LSTM hybrid model, which improves accuracy by around 80% on RMSE compared to traditional forecasting counterparts.
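The fractional part of ARFIMA rests on the fractional differencing operator (1 - B)^d, where B is the backshift operator and d may be non-integer. The sketch below computes the operator's coefficients and applies a truncated version to a series. The function names and the truncation length `n_weights` are illustrative assumptions; the paper's actual filter, and the LSTM stage that would receive the residual, are not reproduced here.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of (1 - B)^d.

    Uses the recurrence w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k,
    which follows from the binomial expansion of (1 - B)^d.
    """
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d, n_weights=20):
    """Apply truncated fractional differencing to a list of floats."""
    w = frac_diff_weights(d, n_weights)
    out = []
    for t in range(len(series)):
        acc = 0.0
        # Only lags that exist in the series (and in the truncation) are used.
        for k in range(min(t + 1, n_weights)):
            acc += w[k] * series[t - k]
        out.append(acc)
    return out
```

For d = 1 the weights reduce to [1, -1, 0, ...], which is ordinary first differencing as used in ARIMA. A fractional d between 0 and 0.5 gives slowly decaying weights, which is what lets the model represent long memory.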

Journal Article•DOI•

A Review on Deep Learning Methods for ECG Arrhythmia Classification

Zahra Ebrahimi¹, Mohammad Loni², Masoud Daneshtalab², Arash Gharehbaghi²•Institutions (2)

University of Shahrood¹, Mälardalen University College²

01 Sep 2020-Expert Systems With Applications

TL;DR: This paper presents a comprehensive review of recent DL methods applied to the ECG signal for classification purposes; the reviewed methods showed high accuracy in classifying Atrial Fibrillation, Supraventricular Ectopic Beats, and Ventricular Ectopic Beats using GRU/LSTM, CNN, and LSTM, respectively.

Abstract: Deep Learning (DL) has recently become a topic of study in different applications including healthcare, in which timely detection of anomalies on the Electrocardiogram (ECG) can play a vital role in patient monitoring. This paper presents a comprehensive review of recent DL methods applied to the ECG signal for classification purposes. This study considers various types of DL methods such as Convolutional Neural Network (CNN), Deep Belief Network (DBN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Among the 75 studies reported in 2017 and 2018, CNN is the dominant technique for feature extraction, seen in 52% of the studies. DL methods showed high accuracy in correct classification of Atrial Fibrillation (AF) (100%), Supraventricular Ectopic Beats (SVEB) (99.8%), and Ventricular Ectopic Beats (VEB) (99.7%) using the GRU/LSTM, CNN, and LSTM, respectively.

Journal Article•DOI•

Machine learning based approaches for detecting COVID-19 using clinical text data.

Akib Mohi Ud Din Khanday¹, Syed Tanzeel Rabani¹, Qamar Rayees Khan¹, Nusrat Rouf¹, Masarat Mohi Ud Din²•Institutions (2)

Baba Ghulam Shah Badshah University¹, Government Medical College, Srinagar²

30 Jun 2020-International Journal of Information Technology

TL;DR: This paper classified textual clinical reports into four classes using classical and ensemble machine learning algorithms; logistic regression and multinomial Naïve Bayes showed better results than the other ML algorithms, with 96.2% testing accuracy.

Abstract: Technological advancements have a rapid effect on every field of life, be it the medical field or any other. Artificial intelligence has shown promising results in health care through decision making based on data analysis. COVID-19 has affected more than 100 countries in a very short time. People all over the world are vulnerable to its future consequences. It is imperative to develop a control system that will detect the coronavirus. One solution to control the current havoc can be the diagnosis of the disease with the help of various AI tools. In this paper, we classified textual clinical reports into four classes by using classical and ensemble machine learning algorithms. Feature engineering was performed using techniques like term frequency/inverse document frequency (TF/IDF), bag of words (BOW) and report length. These features were supplied to traditional and ensemble machine learning classifiers. Logistic regression and Multinomial Naive Bayes showed better results than other ML algorithms, with 96.2% testing accuracy. In future, recurrent neural networks can be used for better accuracy.
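The TF/IDF feature step named in the abstract can be illustrated with a small pure-Python sketch. It uses one common variant (relative term frequency times log(N / document frequency)); the paper's exact variant, tokenization, and classifier setup are not stated here, so the function name `tfidf`, the whitespace tokenizer, and the sample reports are all assumptions for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = (count of term in doc / doc length) * log(N / df(term)),
    where df(term) is the number of documents containing the term.
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical clinical snippets, not taken from the paper's dataset.
reports = ["fever cough", "fever fatigue"]
features = tfidf(reports)
```

A term that occurs in every report, such as "fever" above, receives weight 0, while terms that distinguish a report keep a positive weight. The resulting vectors, together with features such as report length, are the kind of input a logistic regression or Naive Bayes classifier would consume.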

Journal Article•DOI•

Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values

Zhiyong Cui¹, Ruimin Ke¹, Ziyuan Pu¹, Yinhai Wang¹•Institutions (1)

University of Washington¹

01 Sep 2020-Transportation Research Part C-emerging Technologies

TL;DR: Experimental results indicate that the proposed SBU-LSTM architecture, especially the two-layer BDLSTM network, can achieve superior performance for network-wide traffic prediction in both accuracy and robustness. Comprehensive comparison results also show that the proposed data imputation mechanism in the RNN-based models can achieve outstanding prediction performance.

Abstract: Short-term traffic forecasting based on deep learning methods, especially recurrent neural networks (RNN), has received much attention in recent years. However, the potential of RNN-based models in traffic forecasting has not yet been fully exploited in terms of the predictive power of spatial–temporal data and the capability of handling missing data. In this paper, we focus on RNN-based models and attempt to reformulate the way to incorporate RNN and its variants into traffic prediction models. A stacked bidirectional and unidirectional LSTM network architecture (SBU-LSTM) is proposed to assist the design of neural network structures for traffic state forecasting. As a key component of the architecture, the bidirectional LSTM (BDLSTM) is exploited to capture the forward and backward temporal dependencies in spatiotemporal data. To deal with missing values in spatial–temporal data, we also propose a data imputation mechanism in the LSTM structure (LSTM-I) by designing an imputation unit to infer missing values and assist traffic prediction. The bidirectional version of LSTM-I is incorporated in the SBU-LSTM architecture. Two real-world network-wide traffic state datasets are used to conduct experiments and published to facilitate further traffic prediction research. The prediction performance of multiple types of multi-layer LSTM or BDLSTM models is evaluated. Experimental results indicate that the proposed SBU-LSTM architecture, especially the two-layer BDLSTM network, can achieve superior performance for the network-wide traffic prediction in both accuracy and robustness. Further, comprehensive comparison results show that the proposed data imputation mechanism in the RNN-based models can achieve outstanding prediction performance when the model’s input data contains different patterns of missing values.

Journal Article•DOI•

A hybrid deep learning model for short-term PV power forecasting

Li Pengtao¹,², Kaile Zhou¹,², Xinhui Lu¹,², Shanlin Yang¹,²•Institutions (2)

Hefei University of Technology¹, Chinese Ministry of Education²

01 Feb 2020-Applied Energy

TL;DR: The values of three performance evaluation indicators, MBE, MAPE, and RMSE, show that the proposed hybrid deep learning model exhibits superior performance in both forecasting accuracy and stability.
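For reference, the three indicators named above can be computed as in the following sketch. These are the standard textbook definitions; the sign convention for MBE (forecast minus actual) and the percentage scaling of MAPE are assumptions, since the entry does not spell them out.

```python
import math


def mbe(actual, predicted):
    """Mean bias error: positive values mean over-forecasting on average."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Requires non-zero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

MBE can be zero while the other two are not, because positive and negative errors cancel. That is why bias is usually reported alongside MAPE or RMSE instead of on its own.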

Journal Article•DOI•

An optimized model using LSTM network for demand forecasting

Hossein Abbasimehr¹, Mostafa Shabani², Mohsen Yousefi•Institutions (2)

Azarbaijan Shahid Madani University¹, K.N.Toosi University of Technology²

01 May 2020-Computers & Industrial Engineering

TL;DR: The proposed method automatically selects the best forecasting model for a given time series by evaluating different combinations of LSTM hyperparameters with grid search; it can capture nonlinear patterns in time series data while considering the inherent characteristics of non-stationary time series data.

Journal Article•DOI•

Stock Market Prediction Using LSTM Recurrent Neural Network

Adil Moghar, Mhamed Hamiche

01 Jan 2020-Procedia Computer Science

TL;DR: This article aims to build a model using Recurrent Neural Networks (RNN) and especially Long-Short Term Memory model (LSTM) to predict future stock market values.

Journal Article•DOI•

Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach

Kasun Bandara¹, Christoph Bergmeir¹, Slawek Smyl²•Institutions (2)

Monash University¹, Uber²

01 Feb 2020-Expert Systems With Applications

TL;DR: This article presents a prediction model that can be used with different types of RNN models on subgroups of similar time series, which are identified by time series clustering techniques.

Abstract: With the advent of Big Data, nowadays in many applications databases containing large quantities of similar time series are available. Forecasting time series in these domains with traditional univariate forecasting procedures leaves great potentials for producing accurate forecasts untapped. Recurrent neural networks (RNNs), and in particular Long Short Term Memory (LSTM) networks, have proven recently that they are able to outperform state-of-the-art univariate time series forecasting methods in this context, when trained across all available time series. However, if the time series database is heterogeneous, accuracy may degenerate, so that on the way towards fully automatic forecasting methods in this space, a notion of similarity between the time series needs to be built into the methods. To this end, we present a prediction model that can be used with different types of RNN models on subgroups of similar time series, which are identified by time series clustering techniques. We assess our proposed methodology using LSTM networks, a widely popular RNN variant, together with various clustering algorithms, such as kMeans, DBScan, Partition Around Medoids (PAM), and Snob. Our method achieves competitive results on benchmarking datasets under competition evaluation procedures. In particular, in terms of mean sMAPE accuracy it consistently outperforms the baseline LSTM model, and outperforms all other methods on the CIF2016 forecasting competition dataset.
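To make the evaluation idea concrete, the sketch below computes sMAPE per series and averages it within each cluster of similar series. It uses one common sMAPE definition, mean of 2|y - f| / (|y| + |f|), reported as a fraction and not a percentage. The function names and the convention of scoring a 0/0 term as zero are assumptions, and the cluster labels are taken as given; the clustering algorithms themselves are not reproduced.

```python
from collections import defaultdict


def smape(actual, forecast):
    """Symmetric MAPE for one series, as a fraction in [0, 2]."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumed convention: a perfect forecast of zero contributes no error.
        terms.append(0.0 if denom == 0 else 2.0 * abs(a - f) / denom)
    return sum(terms) / len(terms)


def mean_smape_by_cluster(actuals, forecasts, cluster_ids):
    """Average the per-series sMAPE within each cluster label."""
    per_cluster = defaultdict(list)
    for a, f, c in zip(actuals, forecasts, cluster_ids):
        per_cluster[c].append(smape(a, f))
    return {c: sum(v) / len(v) for c, v in per_cluster.items()}
```

Breaking the score down by cluster shows whether a model trained on a subgroup actually helps that subgroup, which a single database-wide mean would hide.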

Journal Article•DOI•

Machine Learning Applied to Electrified Vehicle Battery State of Charge and State of Health Estimation: State-of-the-Art

Carlos Vidal¹, Pawel Malysz¹, Phillip J. Kollmeyer¹, Ali Emadi¹•Institutions (1)

McMaster University¹

16 Mar 2020-IEEE Access

TL;DR: A survey of battery state estimation methods based on ML approaches such as feedforward neural networks, recurrent neural networks (RNNs), support vector machines (SVM), radial basis functions (RBF), and Hamming networks is provided.

Abstract: The growing interest and recent breakthroughs in artificial intelligence and machine learning (ML) have actively contributed to an increase in research and development of new methods to estimate the states of electrified vehicle batteries. Data-driven approaches, such as ML, are becoming more popular for estimating the state of charge (SOC) and state of health (SOH) due to greater availability of battery data and improved computing power capabilities. This paper provides a survey of battery state estimation methods based on ML approaches such as feedforward neural networks (FNNs), recurrent neural networks (RNNs), support vector machines (SVM), radial basis functions (RBF), and Hamming networks. Comparisons between methods are shown in terms of data quality, inputs and outputs, test conditions, battery types, and stated accuracy to give readers a bigger picture view of the ML landscape for SOC and SOH estimation. Additionally, to provide insight into how to best approach with the comparison of different neural network structures, an FNN and long short-term memory (LSTM) RNN are trained fifty times each for 3000 epochs. The error is somewhat different for each training repetition due to the random initial values of the trainable parameters, demonstrating that it is important to train networks multiple times to achieve the best result. Furthermore, it is recommended that when performing a comparison among estimation techniques such as those presented in this review paper, the compared networks should have a similar number of learnable parameters and be trained and tested with identical data. Otherwise, it is difficult to make a general conclusion regarding the quality of a given estimation technique.
