Proceedings ArticleDOI

2-D psychoacoustic modeling for automatic speech recognition in noisy environment

TL;DR: In this paper, the auditory properties of the human hearing system are studied and modeled with a psychoacoustic filter, labeled the 2D P-filter because its parameter values are zero or positive.
Abstract: A powerful automatic speech recognition (ASR) system is a matter of commercial importance, as many leading companies are racing toward industry- and consumer-level production. One of the major causes of degraded speech quality is environmental noise: speech is obscured by loud background sound, which adversely affects the performance of an ASR system. The human auditory system is known to handle noise comparatively better than a machine. To improve ASR performance, the auditory properties of the human system are therefore studied and modeled with a psychoacoustic filter. The filter is labeled the 2D P-filter because its parameter values are zero or positive. To remove noise, the masking effect is also implemented: sounds falling under a predetermined masking threshold are modified. An enhanced set of features is then extracted by applying this filter to the Mel filter bank. The novelty of the paper is the use of different distance metrics for classification and for testing the performance of the ASR system. Experiments are carried out on a database of studio recordings of rhyming words spoken by children with articulatory disabilities. Results for noisy speech signals after the testing phase are expected to improve considerably.
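The abstract mentions classifying with different distance metrics but does not say which ones. As a minimal sketch, assuming nearest-template classification and three common metrics (Euclidean, Manhattan and cosine distance), the idea could look like the following. The feature vectors, the word labels and the function names are hypothetical placeholders, not values or names from the paper.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def classify(features, templates, metric):
    """Return the label of the template closest to `features` under `metric`."""
    return min(templates, key=lambda label: metric(features, templates[label]))

# Hypothetical reference vectors for two rhyming words (assumed values).
templates = {"cat": [1.0, 0.2, -0.5], "bat": [0.1, 0.9, 0.4]}
query = [0.9, 0.3, -0.4]

for metric in (euclidean, manhattan, cosine_distance):
    print(metric.__name__, classify(query, templates, metric))
```

Swapping the `metric` argument is the only change needed to compare recognition performance across metrics on the same features.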
References
Journal ArticleDOI
TL;DR: It is shown that DNNs can be used to boost the classification accuracy of basic speech units, such as phonetic attributes (phonological features) and phonemes, and results in improved word recognition accuracy, which is better than previously reported word lattice rescoring results.

114 citations

Journal ArticleDOI
TL;DR: The proposed hardware-software coprocessing speech recognizer is suitable for integration in various types of voice (speech)-controlled applications and reduces the average real-time factor to 0.54 with the word accuracy rate of 93.16%.
Abstract: We present a hardware-software coprocessing speech recognizer for real-time embedded applications. The system consists of a standard microprocessor and a hardware accelerator for Gaussian mixture model (GMM) emission probability calculation implemented on a field-programmable gate array. The GMM accelerator is optimized for timing performance by exploiting data parallelism. To avoid a large memory requirement, the accelerator adopts a double buffering scheme for accessing the acoustic parameters, with no assumption made on the access pattern of these parameters. Experiments on widely used benchmark data show that the real-time factor of the proposed system is 0.62, which is about three times faster than the pure software-based baseline system, while the word accuracy rate is preserved at 93.33%. As a part of the recognizer, a new adaptive beam-pruning algorithm is also proposed and implemented, which further reduces the average real-time factor to 0.54 with a word accuracy rate of 93.16%. The proposed speech recognizer is suitable for integration in various types of voice (speech)-controlled applications.

58 citations


"2-D psychoacoustic modeling for aut..." refers methods in this paper

  • ...Application of this P-filter to the Mel filter bank directly can give us enhanced MFCC features which are more reliable than the MFCC features alone....


01 Jan 2013
TL;DR: The ability of HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition is explored and the quality and testing of speaker recognition and gender recognition system is completed and analysed.
Abstract: Speaker recognition software using MFCC (Mel Frequency Cepstral Coefficients) and vector quantization has been designed, developed and tested satisfactorily for male and female voices. In this paper the ability of the HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition is explored. The HPS algorithm can be used to find the pitch of the speaker, which can in turn be used to determine the speaker's gender. The speech signals of male and female speakers were recorded in .wav files at an 8 kHz sampling rate and then modified. The modified .wav file was processed using MATLAB to compute and plot the autocorrelation of the speech signal. The software reliably computes the pitch of male and female voices. The MFCC and vector quantization algorithms are used for the speech recognition process. Using the autocorrelation technique and the FFT, the pitch of the signal is calculated and used to identify the true gender. In this paper the quality and testing of the speaker recognition and gender recognition system is completed and analysed.
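The Harmonic Product Spectrum idea can be illustrated with a short sketch: the magnitude spectrum is multiplied by copies of itself compressed by integer factors, so that the harmonics line up at the fundamental. The code below is an illustrative Python version, not the MATLAB software described in the abstract. The naive DFT, the number of harmonics and the synthetic test tone are all assumptions chosen to keep the example self-contained; only the 8 kHz sampling rate comes from the abstract.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Magnitudes of the first half of a naive DFT (fine for short frames)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(signal)))
        for k in range(n // 2)
    ]

def hps_pitch(signal, sample_rate, num_harmonics=3):
    """Estimate the fundamental frequency with the Harmonic Product Spectrum."""
    mags = magnitude_spectrum(signal)
    limit = len(mags) // num_harmonics  # keeps h * k inside the spectrum
    best_bin, best_val = 1, -1.0
    for k in range(1, limit):
        product = 1.0
        for h in range(1, num_harmonics + 1):
            product *= mags[h * k]      # spectrum compressed by factor h
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * sample_rate / len(signal)

# Synthetic voiced-like frame: a 200 Hz fundamental plus three harmonics
# (assumed test signal), sampled at 8 kHz.
sample_rate = 8000
frame_len = 400
f0 = 200.0
signal = [
    sum(math.sin(2 * math.pi * f0 * h * i / sample_rate) for h in range(1, 5))
    for i in range(frame_len)
]
print(hps_pitch(signal, sample_rate))
```

A gender decision would then compare the estimated pitch against a threshold; the abstract does not give one, so none is assumed here.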

39 citations


Additional excerpts

  • ...Prasad [5] explored the ability of HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition....


Journal ArticleDOI
TL;DR: A database and testing procedures were developed to evaluate two facets of recognizer performance integral to speech training: utterance identification and speech quality assessment. The recognizer based on hidden Markov models (HMMs) provided better identification scores for normal and disordered speech than the two template-based recognizers.
Abstract: The use of speech recognition technology for speech training represents an important and potentially very large application of speech technology. However, speech training places unique demands on recognizer performance that have not been well characterized. In this research, a database and testing procedures were developed to evaluate two facets of recognizer performance integral to speech training: utterance identification and speech quality assessment. Using these materials, three commercial speech recognizers that employ different types of recognition algorithms were evaluated. In general, the recognizer based on hidden Markov models (HMMs) provided better identification scores for normal and disordered speech than the two template-based recognizers. A recognizer's identification performance on normal speech often predicted its identification performance on disordered speech. For each recognizer, analysis using phonological features revealed classes of speech sounds that are poorly discriminated. Procedures were developed to provide human ratings of the quality of disordered speech for comparison to recognizer performance. Recognizers were compared to speech-language pathologists with respect to the ability to judge speech quality. In contrast with identification performance, the two speech recognizers based on template comparisons provided better measures of speech quality than the HMM-based recognizer.

19 citations


"2-D psychoacoustic modeling for aut..." refers methods in this paper

  • ...Sven Anderson and Diane Kewley-Port [3] studied and implemented HMM methodology....


Proceedings ArticleDOI
08 Dec 2009
TL;DR: A 2D psychoacoustic modeling algorithm which is integrated with a feature extraction front-end for hidden Markov model (HMM) and incorporates the properties of human auditory system and applies it to the speech recognition system to enhance its robustness.
Abstract: One of the weaknesses of speech recognition systems is their lack of robustness to background noise compared with human listeners under similar conditions. This paper proposes a 2D psychoacoustic modeling algorithm which is integrated with a feature extraction front-end for hidden Markov models (HMM). The proposed algorithm incorporates the properties of the human auditory system and applies them to the speech recognition system to enhance its robustness. It integrates forward masking, lateral inhibition and Cepstral Mean Normalization into the ordinary mel-frequency cepstral coefficients (MFCC) feature extraction algorithm. Experiments carried out on the AURORA2 database show that the word recognition rate can be improved significantly at low computational cost.
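Of the three components named in the abstract, Cepstral Mean Normalization is the simplest to show in isolation: the mean of each cepstral coefficient over all frames of an utterance is subtracted from that coefficient. The sketch below shows the standard per-utterance form, assuming MFCC frames are given as a list of equal-length lists. The function name and the sample values are hypothetical, and the paper's forward masking and lateral inhibition steps are not reproduced here.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    num_frames = len(frames)
    # zip(*frames) iterates over coefficient tracks (columns) across time.
    means = [sum(track) / num_frames for track in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]

# Two frames with two cepstral coefficients each (assumed toy values).
mfcc_frames = [[1.0, 2.0], [3.0, 6.0]]
normalized = cepstral_mean_normalize(mfcc_frames)
print(normalized)
```

After normalization each coefficient track has zero mean, which removes a constant offset in the cepstral domain such as one introduced by a fixed channel.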

11 citations