Phoneme recognition using time-delay neural networks

doi:10.1109/29.21701

Journal Article

Alex Waibel, +4 more

01 Mar 1989

IEEE Transactions on Acoustics, Speech, ...

Vol. 37, Iss. 3, pp. 393-404

Abstract:

The authors present a time-delay neural network (TDNN) approach to phoneme recognition which is characterized by two important properties: (1) using a three-layer arrangement of simple computing units, a hierarchy can be constructed that allows for the formation of arbitrary nonlinear decision surfaces, which the TDNN learns automatically using error backpropagation; and (2) the time-delay arrangement enables the network to discover acoustic-phonetic features and the temporal relationships between them independently of position in time and therefore not blurred by temporal shifts in the input. As a recognition task, the speaker-dependent recognition of the phonemes B, D, and G in varying phonetic contexts was chosen. For comparison, several discrete hidden Markov models (HMM) were trained to perform the same task. Performance evaluation over 1946 testing tokens from three speakers showed that the TDNN achieves a recognition rate of 98.5% correct while the rate obtained by the best of the HMMs was only 93.7%.
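The second property in the abstract, features detected independently of their position in time, can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that a time-delay layer applies the same weights to every window of consecutive frames and that a final stage sums each unit's activation over time. The function names (`time_delay_layer`, `integrate_over_time`, `embed`), the layer sizes, and the random weights are all hypothetical and chosen only for illustration; no training by backpropagation is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def time_delay_layer(frames, weights, bias):
    """Slide one shared set of weights over every window of consecutive frames.

    frames:  list of T feature vectors, each of length F
    weights: weights[u][d][f] for unit u, delay d, feature f
    bias:    bias[u] for unit u
    Returns T - D + 1 activation vectors, where D is the number of delays.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        window = frames[t:t + n_delays]
        activations = []
        for unit_weights, unit_bias in zip(weights, bias):
            total = unit_bias
            for d in range(n_delays):
                for f in range(len(window[d])):
                    total += unit_weights[d][f] * window[d][f]
            activations.append(sigmoid(total))
        outputs.append(activations)
    return outputs


def integrate_over_time(activations):
    """Sum each unit's activation over all window positions."""
    return [sum(column) for column in zip(*activations)]


def embed(pattern, start, n_frames, n_features):
    """Place a short pattern at frame index `start` inside an all-zero sequence."""
    frames = [[0.0] * n_features for _ in range(n_frames)]
    for i, frame in enumerate(pattern):
        frames[start + i] = list(frame)
    return frames


# Illustrative sizes only; these are assumptions, not values from the paper.
N_FEATURES, N_DELAYS, N_UNITS, N_FRAMES = 3, 3, 2, 12

rng = random.Random(0)
weights = [[[rng.uniform(-1, 1) for _ in range(N_FEATURES)]
            for _ in range(N_DELAYS)]
           for _ in range(N_UNITS)]
bias = [rng.uniform(-1, 1) for _ in range(N_UNITS)]
pattern = [[rng.uniform(0, 1) for _ in range(N_FEATURES)] for _ in range(3)]

# The same pattern placed at two different positions in time.
early = integrate_over_time(
    time_delay_layer(embed(pattern, 3, N_FRAMES, N_FEATURES), weights, bias))
late = integrate_over_time(
    time_delay_layer(embed(pattern, 6, N_FRAMES, N_FEATURES), weights, bias))

# Compare the time-integrated responses, allowing for floating-point rounding.
shift_invariant = all(math.isclose(a, b, rel_tol=1e-9)
                      for a, b in zip(early, late))
print(shift_invariant)
```

Because every window is scored with the same weights, shifting the pattern only changes which windows respond, not how strongly they respond, so the time-integrated responses agree up to floating-point rounding. This holds in the sketch only while the shifted pattern stays far enough from the sequence edges that every window overlapping it is still evaluated.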
