Open Access · Posted Content

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification.

Kyuyeon Hwang, +1 more
21 Nov 2015
TLDR
An expectation-maximization (EM) based online CTC algorithm is introduced that enables unidirectional RNNs to learn sequences longer than the amount of unrolling and to process an infinitely long input sequence without pre-segmentation or external reset.
Abstract
Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas, including end-to-end speech and handwritten character recognition. For CTC training, however, the RNN must be unrolled (or unfolded) by the length of the input sequence. This unrolling requires a lot of memory and hinders a small-footprint implementation of online learning or adaptation. Furthermore, the length of training sequences is usually not uniform, which makes parallel training with multiple sequences inefficient on shared-memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. For evaluation, phoneme recognition and end-to-end speech recognition examples are presented on the TIMIT and Wall Street Journal (WSJ) corpora, respectively. Our online model achieves a 20.7% phoneme error rate (PER) on the very long input sequence generated by concatenating all 192 utterances in the TIMIT core test set. On WSJ, a network can be trained with an unroll amount of only 64 while sacrificing 4.5% relative word error rate (WER).
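To make the memory issue described in the abstract concrete, the sketch below shows a conventional full-unroll CTC training step for a unidirectional LSTM in PyTorch (an assumed framework choice; the layer sizes, sequence length, and label count are illustrative and not taken from the paper). Because backpropagation through time runs over the entire length-T unrolling, activations for all T frames must be kept in memory. This baseline does not implement the paper's EM-based online CTC algorithm; it only illustrates the cost that algorithm is designed to avoid.

```python
# Baseline sketch: standard (full-unroll) CTC training of a unidirectional LSTM.
# All sizes below are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn

num_features, hidden, num_labels = 40, 256, 61   # assumed feature/label dimensions

rnn = nn.LSTM(num_features, hidden, num_layers=2)     # unidirectional LSTM stack
proj = nn.Linear(hidden, num_labels + 1)               # +1 output for the CTC blank
ctc = nn.CTCLoss(blank=num_labels)                     # blank is the last index
optim = torch.optim.Adam(
    list(rnn.parameters()) + list(proj.parameters()), lr=1e-4
)

# One toy batch: x has shape (T, N, num_features); unrolling covers all T frames.
T, N, S = 1000, 4, 50
x = torch.randn(T, N, num_features)
targets = torch.randint(0, num_labels, (N, S))         # label indices, no blanks
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

h, _ = rnn(x)                                  # activations for every frame are stored
log_probs = proj(h).log_softmax(dim=-1)        # (T, N, num_labels + 1) log-probabilities
loss = ctc(log_probs, targets, input_lengths, target_lengths)

optim.zero_grad()
loss.backward()                                # BPTT over the full length-T unrolling
optim.step()
```

In this baseline, memory grows linearly with T, and batching sequences of different lengths forces padding to the longest one. The paper's approach instead works with a fixed, much smaller amount of unrolling (e.g., 64 on WSJ), which is what keeps the memory footprint bounded and makes parallel GPU training over multiple streams more uniform.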


Citations
Posted Content

Deep Lip Reading: a comparison of models and an online application

TL;DR: The best-performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent.
Proceedings ArticleDOI

Character-level incremental speech recognition with recurrent neural networks

TL;DR: This work proposes tree-based online beam search with additional depth-pruning, which enables the system to process infinitely long input speech with low latency; the system not only responds quickly to speech but can also dictate out-of-vocabulary (OOV) words according to their pronunciation.
Posted Content

Online Keyword Spotting with a Character-Level Recurrent Neural Network.

TL;DR: Experimental results show that the proposed keyword spotter significantly outperforms the deep neural network (DNN) and hidden Markov model (HMM) based keyword-filler model, even with less computation.
Journal ArticleDOI

Influenza-like illness prediction using a long short-term memory deep learning model with multiple open data sources

TL;DR: This study collects influenza-like illness emergency department visit records reported to the Taiwan Centers for Disease Control, together with open PM2.5 data from the Taiwan Environmental Protection Administration's air quality monitoring network, to predict whether there is an influenza outbreak in the region.
Proceedings ArticleDOI

An End-to-end Framework for Audio-to-Score Music Transcription on Monophonic Excerpts

TL;DR: This is the first automatic music transcription approach that obtains a symbolic score directly from audio, instead of performing separate stages for piano-roll estimation, pitch detection and note tracking, meter detection, or key estimation.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
Proceedings Article

Sequence to Sequence Learning with Neural Networks

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.