Proceedings Article

Performance Comparison of Phoneme Modeling using PNCC and RASTA-PLP Features with various Optimization Algorithms for Neural Network Architectures

TL;DR
This work focuses on phoneme modeling of the English language using Artificial Neural Networks (ANN) with RASTA-PLP features, and compares results for different input sizes, optimization (training) functions, numbers of hidden layers, and numbers of hidden nodes in the ANN.
Abstract
The human speech signal is rich with information such as the identity of the speaker, the spoken message, the emotional and physical state of the speaker, the spoken language, gender, age and other information. Automatic Speech Recognition (ASR) involves complex tasks aimed at the recognition and translation of human speech to text by computers. Phoneme recognition means recognizing the phonemes associated with a speech utterance and is a part of ASR. Developing a phonetic engine and enhancing its performance can lead to significant improvement in ASR. In this paper we propose Artificial Neural Network (ANN) based phoneme modeling. We compare the performance of speech features such as Inner Hair Cell Coefficients (IHCC) and Mel-Frequency Cepstral Coefficients (MFCC) across various neural network architectures, different optimization algorithms, changes in input data vector dimensionality (corresponding to different amounts of contextual information) and increasing numbers of epochs. Experiments were carried out on the TIMIT database. Our experimental results indicate that MFCC performs much better than the IHCC features; by varying the size of the input vector, the number of training iterations and the training algorithm across the various neural network architectures, the best optimization algorithm is found to be SGD for IHCC and Adagrad for MFCC features.
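As a rough illustration of the experimental setup described in the abstract, the sketch below (not the authors' code) trains a small feedforward phoneme classifier on context windows of per-frame features and lets the optimizer be switched between SGD and Adagrad, the two best performers reported above. The phone-set size, context width, layer sizes and learning rate are illustrative assumptions, and random tensors stand in for TIMIT feature frames (in practice MFCC or IHCC vectors).

```python
# Minimal sketch, assuming a feedforward ANN over context windows of
# per-frame speech features (MFCC or IHCC); all sizes are illustrative.
import torch
import torch.nn as nn

N_PHONEMES = 39          # assumption: folded TIMIT phone set
FEATS_PER_FRAME = 13     # assumption: 13 cepstral coefficients per frame
CONTEXT = 5              # assumption: 5-frame context window -> 65-dim input

class PhonemeANN(nn.Module):
    def __init__(self, in_dim, hidden=256, layers=2, out_dim=N_PHONEMES):
        super().__init__()
        blocks, d = [], in_dim
        for _ in range(layers):                 # "number of hidden layers" axis
            blocks += [nn.Linear(d, hidden), nn.Sigmoid()]
            d = hidden
        blocks.append(nn.Linear(d, out_dim))    # softmax is applied inside the loss
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        return self.net(x)

def make_optimizer(name, params, lr=1e-2):
    # The two best-performing optimizers reported in the abstract.
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr)
    if name == "adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    raise ValueError(name)

# Toy stand-in for stacked MFCC/IHCC context windows and phone labels.
x = torch.randn(512, FEATS_PER_FRAME * CONTEXT)
y = torch.randint(0, N_PHONEMES, (512,))

model = PhonemeANN(in_dim=x.shape[1])
opt = make_optimizer("adagrad", model.parameters())
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):                         # "number of epochs" axis
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Sweeping CONTEXT, the number of hidden layers and nodes, the optimizer name and the epoch count reproduces the comparison axes mentioned in the abstract.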


References
Book

Neural Networks: A Comprehensive Foundation

Simon Haykin
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition

TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal Article

RASTA processing of speech

TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive and convolutional noise, and an application to speech enhancement is shown.
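To make the cited idea concrete, here is a minimal sketch (under common assumptions, not taken from the paper itself) of RASTA-style filtering: each log-spectral trajectory is band-pass filtered over time so that slowly varying convolutional distortions and very fast frame-to-frame fluctuations are suppressed. The coefficients below are the commonly quoted RASTA IIR filter; the exact pole value varies between implementations (roughly 0.94 to 0.98), so treat the numbers as assumptions.

```python
# Sketch of RASTA filtering applied to log critical-band trajectories.
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectra: np.ndarray) -> np.ndarray:
    """log_spectra: array of shape (n_frames, n_bands), log-domain energies."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # band-pass numerator
    a = np.array([1.0, -0.98])                        # single pole, ~0.94-0.98
    # Filter each band's trajectory along the time (frame) axis.
    return lfilter(b, a, log_spectra, axis=0)

# Example: 200 frames x 20 critical bands of synthetic log energies.
filtered = rasta_filter(np.random.randn(200, 20))
```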
Book

Automatic Speech Recognition: A Deep Learning Approach

Dong Yu and Li Deng
TL;DR: This book summarizes recent advances in the field of automatic speech recognition, with a focus on discriminative and hierarchical models, and presents insights into and the theoretical foundation of a series of recent models such as the conditional random field, semi-Markov and hidden conditional random field, deep neural network, deep belief network, and deep stacking models for sequential learning.

An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank

TL;DR: Previous work is extended by deriving an even more efficient implementation of the Gammatone filter bank, and by showing the MATLAB™ code to design and implement an ERB filter bank based on Gammatone filters.
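As background for the cited implementation, the short sketch below derives ERB-spaced center frequencies and bandwidths from the Glasberg and Moore ERB formulas that Gammatone/ERB filter banks of this kind are typically built on. The channel count and frequency range are arbitrary example values, and this is not the cited paper's own code.

```python
# Sketch: ERB-rate spacing of Gammatone center frequencies (Glasberg & Moore).
import numpy as np

def to_erb_rate(f_hz):
    # ERB-number scale: E(f) = 21.4 * log10(1 + 0.00437 f)
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def from_erb_rate(e):
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def erb_center_frequencies(n_channels=32, low_hz=100.0, high_hz=8000.0):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    erb_points = np.linspace(to_erb_rate(low_hz), to_erb_rate(high_hz), n_channels)
    return from_erb_rate(erb_points)

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth at center frequency f (in Hz)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

centers = erb_center_frequencies()
bandwidths = erb_bandwidth(centers)
```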