Proceedings Article
Performance Comparison of Phoneme Modeling using PNCC and RASTA-PLP Features with various Optimization Algorithms for Neural Network Architectures
TL;DR: This work focuses on phoneme modeling of the English language using Artificial Neural Networks (ANN) with RASTA-PLP features, and compares results for different input sizes, optimization (training) functions, numbers of hidden layers, and numbers of hidden nodes for the ANN.

Abstract: The human speech signal is rich with information such as the identity of the speaker, the spoken message, the emotional and physical state of the speaker, the spoken language, gender, and age. Automatic Speech Recognition (ASR) involves complex tasks aimed at the recognition and translation of human speech to text by computers. Phoneme recognition, a part of ASR, means recognizing the phonemes associated with a speech utterance. Developing a phonetic engine and enhancing its performance can lead to significant improvement in ASR. In this paper we propose Artificial Neural Network (ANN) based phoneme modeling. We compare the performance of speech features such as Inner Hair Cell Coefficients (IHCC) and Mel-Frequency Cepstral Coefficients (MFCC) across various neural network architectures, different optimization algorithms, changes in input data vector dimensionality (corresponding to different amounts of contextual information), and increasing numbers of epochs. Experiments were carried out on the TIMIT database. Our experimental results indicate that MFCC performs much better than the IHCC feature, and the best optimization algorithm is found to be SGD for IHCC and Adagrad for MFCC, determined by trying different input vector sizes, numbers of training iterations, and training algorithms across the neural network architectures.
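The abstract's optimizer comparison (SGD vs. Adagrad) comes down to two different parameter-update rules. The sketch below contrasts them on a toy one-parameter quadratic loss in pure Python; the loss function, learning rates, and step counts are illustrative assumptions, not the paper's actual training setup.

```python
# Minimal sketch of the two update rules compared in the paper.
# Toy problem: minimize loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
import math

def sgd_step(w, grad, lr=0.1):
    """Plain SGD: step against the gradient with a fixed learning rate."""
    return w - lr * grad

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """Adagrad: scale the step by the root of the accumulated squared
    gradients, so frequently-large gradients get progressively smaller steps.
    Returns the updated parameter and the updated accumulator."""
    accum += grad * grad
    return w - lr * grad / (math.sqrt(accum) + eps), accum

w_sgd, w_ada, accum = 0.0, 0.0, 0.0
for _ in range(100):
    w_sgd = sgd_step(w_sgd, 2 * (w_sgd - 3))
    w_ada, accum = adagrad_step(w_ada, 2 * (w_ada - 3), accum)
# Both optimizers move w from 0 toward the minimum at w = 3; Adagrad's
# per-parameter step shrinks as squared gradients accumulate.
```

On this convex toy problem plain SGD converges quickly, while Adagrad's shrinking effective learning rate makes it slower here; which rule wins on the actual phoneme-classification task (SGD for IHCC, Adagrad for MFCC, per the abstract) is an empirical finding of the paper, not something the sketch predicts.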
References
Book
Neural Networks: A Comprehensive Foundation
TL;DR: Thorough, well-organized, and completely up to date, this book examines all the important aspects of this emerging technology, including the learning process, back-propagation learning, radial-basis function networks, self-organizing systems, modular networks, temporal processing and neurodynamics, and VLSI implementation of neural networks.
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury +10 more
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI
RASTA processing of speech
Hynek Hermansky, Nelson Morgan +1 more
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
Book
Automatic Speech Recognition: A Deep Learning Approach
TL;DR: This book summarizes the recent advancement in the field of automatic speech recognition with a focus on discriminative and hierarchical models and presents insights and theoretical foundations of a series of recent models such as conditional random field, semi-Markov and hidden conditional random field, deep neural network, deep belief network, and deep stacking models for sequential learning.
An Efficient Implementation of the Patterson-Holdsworth Auditory Filter Bank
TL;DR: Previous work is extended by deriving an even more efficient implementation of the Gammatone filter bank, and by showing the MATLAB™ code to design and implement an ERB filter bank based on Gammatone filters.