Journal ArticleDOI

Recent advances in conversational speech recognition using convolutional and recurrent neural networks

George Saon, +1 more
01 Jul 2017
Vol. 61, Iss: 4
TLDR
A set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task (“Switchboard”) are described.
Abstract
Deep learning methodologies have had a major impact on performance across a wide variety of machine learning tasks, and speech recognition is no exception. We describe a set of deep learning techniques that proved to be particularly successful in achieving performance gains in word error rate on a popular large vocabulary conversational speech recognition benchmark task (“Switchboard”). We found that the best performance is achieved by combining features from both recurrent and convolutional neural networks. We compare two recurrent architectures: partially unfolded nets with max-out activations and bidirectional long short-term memory nets. In addition, inspired by the success of convolutional networks for image classification, we designed a convolutional net with many convolutional layers and small kernels that create a receptive field with more nonlinearity and fewer parameters than standard configurations. When combined, these neural networks achieve a word error rate of 6.2% on this difficult task; this was the best reported rate at the time of this writing and is even more remarkable given that human performance itself is estimated to be 4% on this data.
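To make the two model families in the abstract concrete, here is a minimal sketch of a VGG-style convolutional acoustic model built from many small 3x3 kernels and a bidirectional LSTM acoustic model. All layer sizes, input shapes, and the output-state count (n_states) are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class VGGStyleAcousticModel(nn.Module):
    """Stacks of 3x3 kernels: large receptive field, more nonlinearity,
    fewer parameters than one big kernel. All sizes are illustrative."""
    def __init__(self, n_states=9000):           # n_states: assumed HMM target count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # for a (1, 40, 12) log-mel input patch the feature map is (128, 10, 3)
        self.classifier = nn.Linear(128 * 10 * 3, n_states)

    def forward(self, x):                         # x: (batch, 1, 40 mel bins, 12 frames)
        return self.classifier(self.features(x).flatten(1))

class BiLSTMAcousticModel(nn.Module):
    def __init__(self, n_mel=40, hidden=512, n_states=9000):
        super().__init__()
        self.lstm = nn.LSTM(n_mel, hidden, num_layers=4,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, n_states)

    def forward(self, x):                         # x: (batch, time, n_mel)
        out, _ = self.lstm(x)
        return self.proj(out)                     # per-frame state scores

In the paper's setup, models of both kinds are trained as frame-level acoustic models and their features are combined; that fusion step is omitted from this sketch.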


Citations
Journal ArticleDOI

A combined method for state-of-charge estimation for lithium-ion batteries using a long short-term memory network and an adaptive cubature Kalman filter

TL;DR: Experimental results reveal that the proposed method dramatically improves estimation accuracy compared with the solo LSTM method and the combined LSTM-CKF method, and that it exhibits excellent generalization across different datasets and converges even from erroneous initial estimates.
Journal ArticleDOI

Towards Robust Pattern Recognition: A Review

TL;DR: A comprehensive review of research toward robust pattern recognition from the perspective of breaking three basic and implicit assumptions: closed-world assumption, independent and identically distributed assumption, and clean and big data assumption, which form the foundation of most pattern recognition models.
Journal ArticleDOI

Using long short-term memory networks for river flow prediction

TL;DR: LSTM networks are used to predict the 10-day average flow and the daily flow in the Upper Yangtze and Hun River basins, two basins with different characteristics.
Proceedings ArticleDOI

An Investigation of Mixup Training Strategies for Acoustic Models in ASR

TL;DR: This paper applies mixup to automatic speech recognition (ASR), treating it both as a data-augmentation and as a regularization method, and compares it with the widely used speed-perturbation and dropout techniques.
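Mixup itself is simple enough to state in a few lines: training pairs are blended convexly, with a mixing weight drawn from a Beta distribution, and the targets are blended the same way. A minimal sketch on acoustic feature batches follows; the shapes, the frame-level one-hot targets, and the alpha value are assumptions, and the paper's exact ASR recipe may differ.

import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """x: (batch, time, mel) features; y: (batch, time, classes) one-hot targets."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing weight lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(x.shape[0])      # pair each utterance with a random partner
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]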
Journal ArticleDOI

Investigation on Works and Military Applications of Artificial Intelligence

TL;DR: This paper surveys military applications of artificial intelligence, listing the different intelligence levels through their corresponding applications, reviewing the technical classification of the related concepts, and discussing technical and practical difficulties.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
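The mechanism is easy to see in code: the cell state is updated additively, so error can flow through it largely unattenuated (the "constant error carousel"). Below is a minimal single-step sketch using the modern formulation with a forget gate, which postdates the original paper; the stacked weight shapes are assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, D+H) stacked gate weights, b: (4H,)."""
    H = h_prev.size
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[:H])          # input gate
    f = sigmoid(z[H:2*H])       # forget gate (added after the 1997 paper)
    g = np.tanh(z[2*H:3*H])     # candidate cell update
    o = sigmoid(z[3*H:])        # output gate
    c = f * c_prev + i * g      # additive update: the constant error carousel
    h = o * np.tanh(c)
    return h, c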
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
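The trade-off this design exploits, echoed in the abstract's "more nonlinearity and fewer parameters", is easy to verify: three stacked 3x3 convolutions cover the same 7x7 receptive field as a single 7x7 layer, but with three nonlinearities in between and roughly half the weights. A quick back-of-the-envelope check, assuming C channels in and out and ignoring biases:

def conv_weights(k, c_in, c_out):
    return k * k * c_in * c_out          # weights in one k x k convolution layer

C = 256
print(3 * conv_weights(3, C, C))         # three stacked 3x3 layers: 27*C^2 = 1769472
print(conv_weights(7, C, C))             # one 7x7 layer:            49*C^2 = 3211264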
Posted Content

Empirical evaluation of gated recurrent neural networks on sequence modeling

TL;DR: Recurrent units that implement a gating mechanism, such as the long short-term memory (LSTM) unit and the recently proposed gated recurrent unit (GRU), are evaluated on sequence modeling tasks; the GRU is found to be comparable to the LSTM.
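For comparison with the LSTM step above, a minimal single-step GRU sketch (weight shapes again assumed): it merges the cell and hidden state and uses two gates instead of three.

import numpy as np

def gru_step(x, h_prev, W_zr, b_zr, W_h, b_h):
    """One GRU step. W_zr: (2H, D+H) update/reset weights; W_h: (H, D+H)."""
    H = h_prev.size
    zr = 1.0 / (1.0 + np.exp(-(W_zr @ np.concatenate([x, h_prev]) + b_zr)))
    z, r = zr[:H], zr[H:]                        # update and reset gates
    h_cand = np.tanh(W_h @ np.concatenate([x, r * h_prev]) + b_h)
    return (1 - z) * h_prev + z * h_cand         # interpolate old and candidate state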
Proceedings ArticleDOI

Speech recognition with deep recurrent neural networks

TL;DR: This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.