Framewise phoneme classification with bidirectional LSTM and other neural network architectures

Open Access Proceedings Article

Alex Graves, +1 more

- Vol. 18, pp 602-610

TLDR

A modified, full gradient version of the LSTM learning algorithm is applied to framewise phoneme classification on the TIMIT database. The results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.

Abstract:

In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
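The bidirectional idea in the abstract can be illustrated with a minimal sketch: one LSTM reads the frames left to right, a second reads them right to left, and the two hidden states for each frame are concatenated and passed to a softmax over classes. This is not the paper's implementation. It shows the forward pass only, with untrained random weights, a common LSTM formulation with input, forget and output gates (no peephole connections), and toy dimensions. All function and parameter names (`lstm_step`, `run_lstm`, `blstm_framewise`, `init_lstm`) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def init_lstm(n_in, n_hid, rng):
    """Random weights for the four gate blocks (assumed layout: i, f, o, g)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)]
         for _ in range(4 * n_hid)]
    return {"W": W, "b": [0.0] * (4 * n_hid)}


def lstm_step(params, x, h, c):
    """One LSTM step from input frame x, previous output h and cell state c."""
    n = len(h)
    z = x + h  # list concatenation: every gate sees the frame and previous output
    pre = [a + b for a, b in zip(matvec(params["W"], z), params["b"])]
    i = [sigmoid(p) for p in pre[0:n]]             # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]         # forget gate
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]     # output gate
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]   # candidate cell input
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, frames, n_hid):
    """Run an LSTM over a sequence and return one hidden vector per frame."""
    h, c = [0.0] * n_hid, [0.0] * n_hid
    outputs = []
    for x in frames:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def blstm_framewise(fwd, bwd, W_out, frames, n_hid):
    """Per-frame class posteriors from forward and backward hidden states."""
    hf = run_lstm(fwd, frames, n_hid)
    # Process the reversed sequence, then flip back so index t matches frame t.
    hb = run_lstm(bwd, frames[::-1], n_hid)[::-1]
    return [softmax(matvec(W_out, a + b)) for a, b in zip(hf, hb)]


# Toy usage: 6 frames of 3 features, 4 hidden units per direction, 5 classes.
rng = random.Random(0)
n_in, n_hid, n_classes, T = 3, 4, 5, 6
frames = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(T)]
fwd = init_lstm(n_in, n_hid, rng)
bwd = init_lstm(n_in, n_hid, rng)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hid)]
         for _ in range(n_classes)]
posteriors = blstm_framewise(fwd, bwd, W_out, frames, n_hid)
labels = [max(range(n_classes), key=p.__getitem__) for p in posteriors]
```

In this sketch the prediction for the first frame depends on the last frame only through the backward pass; a unidirectional network's output at frame 0 cannot see any later frame.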

Citations

Posted Content

"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection

William Yang Wang

- 01 May 2017 -

arXiv: Computation and Language

TL;DR: Liar is a large dataset of 12.8k manually labeled short statements in various contexts, each accompanied by a detailed analysis report and links to source documents.

Journal Article

Speech synthesis from neural decoding of spoken sentences

Gopala Krishna Anumanchipalli, +2 more

- 01 Apr 2019 -

Nature

TL;DR: A neural decoder explicitly leverages kinematic and sound representations encoded in human cortical activity to synthesize audible speech that listeners readily identified and transcribed; the decoder could also synthesize speech when a participant silently mimed sentences.

Journal Article

Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection

Emre Cakir, +4 more

- 01 Jun 2017 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A convolutional recurrent neural network (CRNN) is proposed for the polyphonic sound event detection task and compared with CNN, RNN and other established methods; it shows a considerable improvement on four different datasets consisting of everyday sound events.

Proceedings Article

Learning The Difference That Makes A Difference With Counterfactually-Augmented Data

Divyansh Kaushik, +2 more

TL;DR: This paper focuses on natural language processing, introducing methods and resources for training models less sensitive to spurious patterns; humans are tasked with revising each document so that it accords with a counterfactual target label and retains internal coherence.

Posted Content

Jointly Modeling Embedding and Translation to Bridge Video and Language

Yingwei Pan, +4 more

- 07 May 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A novel unified framework, named Long Short-Term Memory with visual-semantic Embedding (LSTM-E), simultaneously explores the learning of LSTM and visual-semantic embedding, and outperforms several state-of-the-art techniques in predicting Subject-Verb-Object (SVO) triplets.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Book

Neural networks for pattern recognition

Christopher M. Bishop

TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.

Journal Article

Bidirectional recurrent neural networks

Mike Schuster, +1 more

- 01 Nov 1997 -

IEEE Transactions on Signal Processing

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
