Journal ArticleDOI

Recent advances in conversational speech recognition using convolutional and recurrent neural networks

George Saon, +1 more
01 Jul 2017
Vol. 61, Iss: 4
TLDR
A set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task (“Switchboard”) are described.
Abstract
Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task (“Switchboard”). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with max-out activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
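To make the two model families in the abstract concrete, here is a minimal sketch of a VGG-style convolutional acoustic model built from many small 3x3 kernels and a bidirectional LSTM acoustic model. All layer sizes, input shapes, and the output-state count (n_states) are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class VGGStyleAcousticModel(nn.Module):
    """Stacks of 3x3 kernels: large receptive field, more nonlinearity,
    fewer parameters than one big kernel. All sizes are illustrative."""
    def __init__(self, n_states=9000):           # n_states: assumed HMM target count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # for a (1, 40, 12) log-mel input patch the feature map is (128, 10, 3)
        self.classifier = nn.Linear(128 * 10 * 3, n_states)

    def forward(self, x):                         # x: (batch, 1, 40 mel bins, 12 frames)
        return self.classifier(self.features(x).flatten(1))

class BiLSTMAcousticModel(nn.Module):
    def __init__(self, n_mel=40, hidden=512, n_states=9000):
        super().__init__()
        self.lstm = nn.LSTM(n_mel, hidden, num_layers=4,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_states)

    def forward(self, x):                         # x: (batch, time, n_mel)
        out, _ = self.lstm(x)
        return self.proj(out)                     # per-frame state scores

In the paper's setup, models of both kinds are trained as frame-level acoustic models and their features are combined; that fusion step is omitted from this sketch.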


Citations
Journal ArticleDOI

A combined method for state-of-charge estimation for lithium-ion batteries using a long short-term memory network and an adaptive cubature Kalman filter

TL;DR: Experimental results reveal that the proposed method dramatically improves estimation accuracy compared with the solo LSTM method and the combined LSTM-CKF method, and that it exhibits excellent generalization across different datasets and converges even from erroneous initial estimates.
Journal ArticleDOI

Towards Robust Pattern Recognition: A Review

TL;DR: A comprehensive review of research toward robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models.
Journal ArticleDOI

Using long short-term memory networks for river flow prediction

TL;DR: LSTM networks are used to predict the 10-day average flow and the daily flow in the Upper Yangtze and Hun River basins, two basins with different characteristics.
Proceedings ArticleDOI

An Investigation of Mixup Training Strategies for Acoustic Models in ASR

TL;DR: This paper applies mixup to automatic speech recognition (ASR), treating it both as a data-augmentation and as a regularization method, and compares it with the widely used speed-perturbation and dropout techniques.
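Mixup itself is simple enough to state in a few lines: training pairs are blended convexly, with a mixing weight drawn from a Beta distribution, and the targets are blended the same way. A minimal sketch on acoustic feature batches follows; the shapes, the frame-level one-hot targets, and the alpha value are assumptions, and the paper's exact ASR recipe may differ.

import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """x: (batch, time, mel) features; y: (batch, time, classes) one-hot targets."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing weight lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])      # pair each utterance with a random partner
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]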
Journal ArticleDOI

Investigation on Works and Military Applications of Artificial Intelligence

TL;DR: This paper surveys military applications of artificial intelligence, listing the different intelligence levels through their corresponding applications, reviewing the technical classification of the related concepts, and discussing technical and practical difficulties.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
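The mechanism is easy to see in code: the cell state is updated additively, so error can flow through it largely unattenuated (the "constant error carousel"). Below is a minimal single-step sketch using the modern formulation with a forget gate, which postdates the original paper; the stacked weight shapes are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, D+H) stacked gate weights, b: (4H,)."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2*H])       # forget gate (added after the 1997 paper)
    g = np.tanh(z[2*H:3*H])     # candidate cell update
    o = sigmoid(z[3*H:])        # output gate
    c = f * c_prev + i * g      # additive update: the constant error carousel
    h = o * np.tanh(c)
    return h, c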
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
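The trade-off this design exploits, echoed in the abstract's "more nonlinearity and fewer parameters", is easy to verify: three stacked 3x3 convolutions cover the same 7x7 receptive field as a single 7x7 layer, but with three nonlinearities in between and roughly half the weights. A quick back-of-the-envelope check, assuming C channels in and out and ignoring biases:

def conv_weights(k, c_in, c_out):
    return k * k * c_in * c_out          # weights in one k x k convolution layer

C = 256
print(3 * conv_weights(3, C, C))         # three stacked 3x3 layers: 27*C^2 = 1769472
print(conv_weights(7, C, C))             # one 7x7 layer:            49*C^2 = 3211264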
Posted Content

Empirical evaluation of gated recurrent neural networks on sequence modeling

TL;DR: Recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks; the GRU is found to be comparable to the LSTM.
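For comparison with the LSTM step above, a minimal single-step GRU sketch (weight shapes again assumed): it merges the cell and hidden state and uses two gates instead of three.

import numpy as np

def gru_step(x, h_prev, W_zr, b_zr, W_h, b_h):
    """One GRU step. W_zr: (2H, D+H) update/reset weights; W_h: (H, D+H)."""
    H = h_prev.size
    zr = 1.0 / (1.0 + np.exp(-(W_zr @ np.concatenate([x, h_prev]) + b_zr)))
    z, r = zr[:H], zr[H:]                        # update and reset gates
    h_cand = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    return (1 - z) * h_prev + z * h_cand         # interpolate old and candidate state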
Proceedings ArticleDOI

Speech recognition with deep recurrent neural networks

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.