Proceedings ArticleDOI

Recurrent convolutional neural network for speech processing

TLDR
A recently developed deep learning model, the recurrent convolutional neural network (RCNN), is proposed for speech processing; it inherits merits of recurrent neural networks (RNN) and convolutional neural networks (CNN) and is competitive with previous methods in terms of accuracy and efficiency.
Abstract
Different neural networks have exhibited excellent performance on various speech processing tasks, and they usually have specific advantages and disadvantages. We propose to use a recently developed deep learning model, the recurrent convolutional neural network (RCNN), for speech processing, which inherits merits of recurrent neural networks (RNN) and convolutional neural networks (CNN). The core module can be viewed as a convolutional layer embedded with an RNN, which enables the model to capture both temporal and frequency dependence in the spectrogram of the speech in an efficient way. The model is tested on the TIMIT corpus for phoneme recognition and on IEMOCAP for emotion recognition. Experimental results show that the model is competitive with previous methods in terms of accuracy and efficiency.
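The abstract describes the core module as a convolutional layer embedded with an RNN: a feed-forward convolution over the frequency axis of each spectrogram frame, plus a recurrent convolution of the previous hidden state over time. A minimal single-channel numpy sketch of this idea follows; the kernel names `w_ff` and `w_rec` and the exact update rule are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def conv1d(x, w):
    """'Same' 1-D convolution along the frequency axis (zero padding)."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(w)], w) for i in range(len(x))])

def rcl_step(frame, state, w_ff, w_rec):
    """One recurrent-convolutional step: feed-forward convolution of the
    current frame plus convolution of the previous hidden state, then ReLU.
    (Illustrative update rule, not the paper's exact formulation.)"""
    return np.maximum(0.0, conv1d(frame, w_ff) + conv1d(state, w_rec))

def rcnn_layer(spectrogram, w_ff, w_rec):
    """Run the recurrent convolution over the time axis of a
    (time, frequency) spectrogram."""
    state = np.zeros(spectrogram.shape[1])
    outputs = []
    for frame in spectrogram:                        # recurrence over time
        state = rcl_step(frame, state, w_ff, w_rec)  # convolution over frequency
        outputs.append(state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
spec = rng.standard_normal((20, 40))   # 20 time frames, 40 frequency bins
out = rcnn_layer(spec, rng.standard_normal(3) * 0.1, rng.standard_normal(3) * 0.1)
print(out.shape)  # (20, 40): one hidden vector per frame
```

Sharing one small recurrent kernel across all frequency bins is what keeps this efficient: the parameter count is independent of the number of bins, unlike a fully-connected RNN over the frame.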


Citations
Journal ArticleDOI

A Survey on Deep Learning: Algorithms, Techniques, and Applications

TL;DR: A comprehensive review of historical and recent state-of-the-art approaches in visual, audio, and text processing; social network analysis; and natural language processing is presented, followed by an in-depth analysis of pivotal and groundbreaking advances in deep learning applications.
Journal ArticleDOI

Speech Emotion Recognition Using Deep Learning Techniques: A Review

TL;DR: An overview of deep learning techniques is presented, and recent literature applying these methods to speech-based emotion recognition is discussed, covering the databases used, the emotions extracted, contributions made toward speech emotion recognition, and related limitations.
Proceedings Article

Gated recurrent convolution neural network for OCR

TL;DR: A new architecture named Gated RCNN (GRCNN), inspired by the recently proposed Recurrent Convolutional Neural Network for general image classification, is combined with a BLSTM to recognize text in natural images.
Journal ArticleDOI

Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning

TL;DR: A deep CNN-RNN model that classifies respiratory sounds based on Mel-spectrograms is proposed, achieving a state-of-the-art score on the ICBHI’17 dataset; deep learning models are shown to successfully learn domain-specific knowledge when pre-trained with breathing data, producing significantly superior performance compared to generalized models.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
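The TL;DR above hinges on stacking very small convolution filters to build depth. A quick numpy check of why that works: composing two convolutions is equivalent to convolving their kernels, so n stacked 3-tap filters cover the same receptive field as one (2n + 1)-tap filter while using fewer parameters and more non-linearities. The kernels below are arbitrary illustrative values.

```python
import numpy as np

# Two stacked 3-tap filters act like one 5-tap filter:
# the composite kernel is the convolution of the two kernels.
k1 = np.array([1.0, 2.0, 1.0])
k2 = np.array([1.0, 0.0, -1.0])
composite = np.convolve(k1, k2)
print(len(composite))  # 5 -> 5-tap receptive field from two 3-tap layers

# More generally, n stacked size-3 kernels give a (2n + 1)-tap receptive field.
rf = np.ones(1)
for _ in range(3):
    rf = np.convolve(rf, np.ones(3))
print(len(rf))  # 7 -> three 3-tap layers match one 7-tap filter
```

The same arithmetic in 2-D is why two stacked 3x3 layers match a 5x5 receptive field and three match 7x7, which is the design choice the 16-19 layer configurations exploit.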
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
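The batch normalization transform summarized above normalizes each feature over the mini-batch, then applies a learnable scale and shift. A minimal numpy sketch of the training-time forward pass (the function name and toy data are illustrative; running statistics for inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.
    x: (batch, features); gamma, beta: learnable per-feature parameters."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # shifted, scaled activations
y = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-6))  # True: zero mean per feature
```

With `gamma = 1` and `beta = 0` the output has roughly zero mean and unit variance per feature regardless of the input distribution, which is what reduces the internal covariate shift the title refers to.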