Proceedings ArticleDOI
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber
pp. 369–376
TLDR
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby removing both the need for pre-segmented training data and the need for post-processing.
Abstract
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
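As a rough illustration of the training setup described in the abstract (not the paper's own code), the sketch below connects a bidirectional LSTM to PyTorch's built-in CTC loss; the feature dimension, label inventory, and sequence lengths are illustrative assumptions.

```python
# Minimal sketch: a bidirectional LSTM trained with the CTC objective.
# Sizes (40-dim features, 62 labels plus the CTC "blank") are assumptions.
import torch
import torch.nn as nn

num_classes = 62 + 1          # assumed label set plus the CTC blank symbol
rnn = nn.LSTM(input_size=40, hidden_size=128, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * 128, num_classes)
ctc = nn.CTCLoss(blank=0)

x = torch.randn(4, 200, 40)                       # batch of unsegmented input frames
input_lengths = torch.full((4,), 200, dtype=torch.long)
targets = torch.randint(1, num_classes, (4, 30))  # label sequences, no frame alignment
target_lengths = torch.full((4,), 30, dtype=torch.long)

h, _ = rnn(x)                                     # (batch, time, 2 * hidden)
log_probs = proj(h).log_softmax(dim=-1)           # per-frame label posteriors
# CTCLoss expects (time, batch, classes)
loss = ctc(log_probs.transpose(0, 1), targets, input_lengths, target_lengths)
loss.backward()
```

The key property is that the targets carry no frame-level alignment; the CTC loss marginalizes over all possible alignments internally.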
Citations
Book
Deep Learning
TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI
Deep learning in neural networks
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning, and evolutionary computation, as well as indirect search for short programs encoding deep and large networks.
Proceedings Article
Sequence to Sequence Learning with Neural Networks
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Posted Content
Sequence to Sequence Learning with Neural Networks
TL;DR: This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences markedly improved the LSTM's performance, because doing so introduced many short-term dependencies between the source and target sentences, which made the optimization problem easier.
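Both entries above summarize the same encoder-decoder approach. A minimal sketch of the idea, including the source-sentence reversal highlighted in the TL;DR, might look as follows; the vocabulary sizes, dimensions, and random data are illustrative assumptions rather than the paper's settings.

```python
# Hedged sketch: a multilayer LSTM encodes the (reversed) source sequence into
# its final states, and a second LSTM decodes the target sequence from them.
import torch
import torch.nn as nn

src_vocab, tgt_vocab, emb, hid, layers = 1000, 1000, 256, 512, 2

src_embed = nn.Embedding(src_vocab, emb)
tgt_embed = nn.Embedding(tgt_vocab, emb)
encoder = nn.LSTM(emb, hid, num_layers=layers, batch_first=True)
decoder = nn.LSTM(emb, hid, num_layers=layers, batch_first=True)
readout = nn.Linear(hid, tgt_vocab)

src = torch.randint(0, src_vocab, (8, 15))       # batch of source token ids
tgt = torch.randint(0, tgt_vocab, (8, 12))       # batch of target token ids

# Reverse the source sentences, as the TL;DR reports this markedly helps optimization.
src_rev = torch.flip(src, dims=[1])

_, (h, c) = encoder(src_embed(src_rev))          # fixed-size summary of the source
dec_out, _ = decoder(tgt_embed(tgt[:, :-1]), (h, c))
logits = readout(dec_out)                        # predict each next target token
loss = nn.functional.cross_entropy(logits.reshape(-1, tgt_vocab), tgt[:, 1:].reshape(-1))
```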
Proceedings ArticleDOI
Speech recognition with deep recurrent neural networks
TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long-range context that empowers RNNs.
References
Book
Connectionist Speech Recognition: A Hybrid Approach
Hervé Bourlard, Nelson Morgan
TL;DR: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems based on Hidden Markov Models (HMMs) to improve their performance.
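A minimal sketch of the hybrid recipe this book describes, under the standard assumption that a network estimates state posteriors which are then divided by state priors to obtain scaled likelihoods for the HMM decoder; the numbers below are illustrative only.

```python
# Convert network posteriors P(state | x) into scaled likelihoods for HMM decoding.
import numpy as np

def scaled_log_likelihoods(log_posteriors: np.ndarray, log_priors: np.ndarray) -> np.ndarray:
    """log P(x | state) + const = log P(state | x) - log P(state)."""
    return log_posteriors - log_priors            # shape: (frames, states)

# Illustrative numbers only: 3 frames, 4 HMM states.
log_post = np.log(np.array([[0.7, 0.1, 0.1, 0.1],
                            [0.2, 0.6, 0.1, 0.1],
                            [0.1, 0.2, 0.6, 0.1]]))
log_prior = np.log(np.array([0.4, 0.3, 0.2, 0.1]))  # estimated from training alignments
print(scaled_log_likelihoods(log_post, log_prior))
```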
Book ChapterDOI
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
TL;DR: In this article, the outputs of the network are treated as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs, and two modifications are proposed: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic nonlinearity.
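Both proposals in this TL;DR fit in a few lines of code. The NumPy sketch below shows the normalised exponential (softmax) and probability scoring in its usual cross-entropy form; the logits are made up for illustration.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Normalised exponential over the last axis, shifted for numerical stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs: np.ndarray, target_index: int) -> float:
    """Negative log-probability of the correct class ("probability scoring")."""
    return float(-np.log(probs[target_index]))

logits = np.array([2.0, 0.5, -1.0])          # raw network outputs for 3 classes
p = softmax(logits)                          # interpretable as P(class | input)
print(p, cross_entropy(p, target_index=0))
```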
Book ChapterDOI
Bidirectional LSTM networks for improved phoneme classification and recognition
TL;DR: In this paper, two experiments on the TIMIT speech corpus compare bidirectional and unidirectional Long Short-Term Memory (LSTM) networks, and it is found that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system.
Journal ArticleDOI
An application of recurrent nets to phone probability estimation
TL;DR: Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.
Journal ArticleDOI
Fast curvature matrix-vector products for second-order gradient descent
TL;DR: A generic method is presented for iteratively approximating various second-order gradient steps (Newton, Gauss-Newton, Levenberg-Marquardt, and natural gradient) in linear time per iteration, using special curvature matrix-vector products that can be computed in O(n).
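One member of the family of fast curvature matrix-vector products described here, the Hessian-vector product computed by double backpropagation (Pearlmutter's trick), can be sketched with automatic differentiation; the small objective below is an arbitrary stand-in, not taken from the paper.

```python
# Hessian-vector product H v by differentiating a gradient-vector dot product.
# Costs on the order of one extra backward pass; the n-by-n Hessian is never formed.
import torch

w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum() + torch.tanh(w).sum()   # arbitrary stand-in objective

grad = torch.autograd.grad(loss, w, create_graph=True)[0]  # first-order gradient
v = torch.randn(5)                                         # direction vector
hv = torch.autograd.grad((grad * v).sum(), w)[0]           # Hessian-vector product H v

print(hv)
```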