Open Access · Posted Content
Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification.
Kyuyeon Hwang, Wonyong Sung, +1 more
TL;DR: An expectation-maximization (EM) based online CTC algorithm is introduced that enables unidirectional RNNs to learn sequences longer than the amount of unrolling, and allows the networks to process an infinitely long input sequence without pre-segmentation or external reset.
Abstract:
Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas, including end-to-end speech and handwritten character recognition. CTC training, however, requires unrolling (or unfolding) the RNN by the length of the input sequence. This unrolling consumes a lot of memory and hinders small-footprint implementations of online learning or adaptation. Furthermore, the lengths of training sequences are usually not uniform, which makes parallel training with multiple sequences inefficient on shared-memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. For evaluation, phoneme recognition and end-to-end speech recognition examples are presented on the TIMIT and Wall Street Journal (WSJ) corpora, respectively. Our online model achieves a 20.7% phoneme error rate (PER) on the very long input sequence generated by concatenating all 192 utterances in the TIMIT core test set. On WSJ, a network can be trained with an unroll amount of only 64 at the cost of a 4.5% relative increase in word error rate (WER).
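The unrolling requirement comes from the CTC forward-backward recursion, which spans the entire label sequence. As context for the abstract above, here is a minimal NumPy sketch of the standard full-sequence CTC objective (the forward/alpha recursion) that the paper's EM-based online variant approximates; the function name, shapes, and structure are illustrative assumptions, not the paper's code:

```python
import numpy as np

def ctc_forward_loss(log_probs, target, blank=0):
    """Negative log-likelihood of `target` under standard CTC,
    via the forward (alpha) recursion in log space.

    log_probs: (T, K) per-frame log-probabilities over K symbols.
    target:    list of label indices (no blanks).
    """
    # Extend the target with blanks: b, y1, b, y2, ..., yL, b
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), log_probs.shape[0]

    NEG_INF = -np.inf

    def logsumexp(*xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + np.log(sum(np.exp(x - m) for x in xs))

    alpha = np.full((T, S), NEG_INF)
    alpha[0, 0] = log_probs[0, blank]
    if S > 1:
        alpha[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]
            if s > 0:
                a = logsumexp(a, alpha[t - 1, s - 1])
            # Skip transitions are allowed between distinct non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logsumexp(a, alpha[t - 1, s - 2])
            alpha[t, s] = a + log_probs[t, ext[s]]

    # Valid paths end in the last label or the trailing blank.
    return -logsumexp(alpha[T - 1, S - 1], alpha[T - 1, S - 2])
```

Because `alpha` has one row per frame, memory grows linearly with sequence length, which is exactly the cost that truncated unrolling with the EM-based online algorithm avoids.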
Citations
Posted Content
Deep Lip Reading: a comparison of models and an online application
TL;DR: The best performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent.
Proceedings Article · DOI
Character-level incremental speech recognition with recurrent neural networks
Kyuyeon Hwang, Wonyong Sung, +1 more
TL;DR: This work proposes tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency; the system not only responds quickly to speech but can also dictate out-of-vocabulary (OOV) words according to their pronunciation.
Posted Content
Online Keyword Spotting with a Character-Level Recurrent Neural Network.
TL;DR: Experimental results show that the proposed keyword spotter significantly outperforms the deep neural network (DNN) and hidden Markov model (HMM) based keyword-filler model, even with less computation.
Journal Article · DOI
Influenza-like illness prediction using a long short-term memory deep learning model with multiple open data sources
TL;DR: This study collects influenza-like illness emergency department visit records from the Taiwan Centers for Disease Control and PM2.5 open data from the Taiwan Environmental Protection Administration's air quality monitoring network to predict whether there is an influenza outbreak in the region.
Proceedings Article · DOI
An End-to-end Framework for Audio-to-Score Music Transcription on Monophonic Excerpts
TL;DR: This is the first automatic music transcription approach that obtains a symbolic score directly from audio, instead of performing separate stages for piano-roll estimation, pitch detection and note tracking, meter detection, or key estimation.
References
Journal Article · DOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
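The "constant error carousel" in the TL;DR above refers to the additively updated cell state. As a rough NumPy sketch (shapes and names are illustrative; this is the now-common LSTM variant with a forget gate, a later addition to the original 1997 formulation):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell. The cell state c is the
    constant error carousel: it is updated additively, so gradients
    can flow across many time steps without vanishing.

    x: (D,) input;  h_prev, c_prev: (H,) previous hidden/cell state.
    W: (4H, D + H) stacked weights for input, forget, cell, output gates.
    b: (4H,) stacked biases.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = 1.0 / (1.0 + np.exp(-z[0:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))    # forget gate
    g = np.tanh(z[2 * H:3 * H])              # candidate cell update
    o = 1.0 / (1.0 + np.exp(-z[3 * H:4 * H]))  # output gate
    c = f * c_prev + i * g                   # additive carousel update
    h = o * np.tanh(c)
    return h, c
```

The key line is the update of `c`: because it is a gated sum rather than a squashed transformation, backpropagated error through `c` is not repeatedly multiplied by small derivatives.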
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
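The soft-search described in the TL;DR above is additive attention: each source position is scored against the current decoder state and the scores are normalized into alignment weights. A minimal NumPy illustration (all function and parameter names are assumptions for this sketch):

```python
import numpy as np

def additive_attention(query, keys, W_q, W_k, v):
    """Bahdanau-style additive attention.

    query: (H,) decoder state;  keys: (T, H) encoder states.
    W_q, W_k: (A, H) projections into the attention space;  v: (A,).
    Returns the (T,) alignment weights and the (H,) context vector.
    """
    # Score every source position against the query, then softmax.
    scores = np.tanh(keys @ W_k.T + W_q @ query) @ v   # (T,)
    w = np.exp(scores - scores.max())                  # stable softmax
    weights = w / w.sum()
    context = weights @ keys                           # weighted sum of keys
    return weights, context
```

The context vector replaces the single fixed-length sentence encoding, which is exactly the bottleneck the conjecture points at.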
Proceedings Article · DOI
Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation
Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Related Papers (5)
Sequence training of multiple deep neural networks for better performance and faster training speed
Pan Zhou, Li-Rong Dai, Hui Jiang, +2 more