Proceedings Article
Performance Comparison of Phoneme Modeling using PNCC and RASTA-PLP Features with various Optimization Algorithms for Neural Network Architectures
TL;DR: This work focuses on phoneme modeling of the English language using Artificial Neural Networks (ANN) with RASTA-PLP features, and compares results for different input sizes, optimization (training) functions, numbers of hidden layers, and numbers of hidden nodes for the ANN.

Abstract: The human speech signal is rich with information such as the identity of the speaker, the spoken message, the emotional and physical state of the speaker, the spoken language, gender, and age. Automatic Speech Recognition (ASR) involves complex tasks aimed at the recognition and translation of human speech to text by computers. Phoneme recognition, a part of ASR, means recognizing the phonemes associated with a speech utterance. Developing a phonetic engine and enhancing its performance can lead to significant improvement in ASR. In this paper we propose Artificial Neural Network (ANN) based phoneme modeling. We compare the performance of speech features such as Inner Hair Cell Coefficients (IHCC) and Mel-Frequency Cepstral Coefficients (MFCC) across various neural network architectures, different optimization algorithms, changes in input data vector dimensionality (corresponding to different amounts of contextual information), and increasing numbers of epochs. Experiments were carried out on the TIMIT database. Our experimental results indicate that MFCC performs much better than the IHCC feature, and the best optimization algorithm is found to be SGD for IHCC and Adagrad for MFCC, determined by trying different input vector sizes, numbers of training iterations, and training algorithms across the neural network architectures.
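The abstract's optimizer comparison (SGD vs. Adagrad) comes down to two different parameter-update rules. The sketch below contrasts them on a toy one-parameter quadratic loss in pure Python; the loss function, learning rates, and step counts are illustrative assumptions, not the paper's actual training setup.

```python
# Minimal sketch of the two update rules compared in the paper.
# Toy problem: minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
import math

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: step against the gradient with a fixed learning rate."""
    return w - lr * grad

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Adagrad: scale the step by the root of the accumulated squared
    gradients, so frequently-large gradients get progressively smaller steps.
    Returns the updated parameter and the updated accumulator."""
    accum += grad * grad
    return w - lr * grad / (math.sqrt(accum) + eps), accum

w_sgd, w_ada, accum = 0.0, 0.0, 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3))
    w_ada, accum = adagrad_step(w_ada, 2 * (w_ada - 3), accum)
# Both optimizers move w from 0 toward the minimum at w = 3; Adagrad's
# per-parameter step shrinks as squared gradients accumulate.
```

On this convex toy problem plain SGD converges quickly, while Adagrad's shrinking effective learning rate makes it slower here; which rule wins on the actual phoneme-classification task (SGD for IHCC, Adagrad for MFCC, per the abstract) is an empirical finding of the paper, not something the sketch predicts.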
References
Book
Neural Networks: A Comprehensive Foundation
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury +10 more
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI
RASTA processing of speech
Hynek Hermansky, Nelson Morgan +1 more
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
Book
Automatic Speech Recognition: A Deep Learning Approach
TL;DR: This book summarizes the recent advancement in the field of automatic speech recognition with a focus on discriminative and hierarchical models and presents insights and theoretical foundations of a series of recent models such as conditional random field, semi-Markov and hidden conditional random field, deep neural network, deep belief network, and deep stacking models for sequential learning.
An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank
TL;DR: Previous work is extended by deriving an even more efficient implementation of the Gammatone filter bank, and by showing the MATLAB™ code to design and implement an ERB filter bank based on Gammatone filters.