2-D psychoacoustic modeling for automatic speech recognition in noisy environment

doi:10.1109/CASP.2016.7746151

Proceedings Article

Sampreeta Desai¹, Prasad D. Khandekar¹, Ketan J. Raut¹

Vishwakarma Institute of Information Technology¹

09 Jun 2016, pp. 129-132

TL;DR: In this paper, auditory properties of the human auditory system are studied and modeled with the help of a psychoacoustic filter, which is labeled a 2D P-filter because its parameters take zero or positive values.

Abstract: A powerful automatic speech recognition (ASR) system is a matter of commercial importance, as many leading companies are racing toward industry- and consumer-level products. One of the major causes of degraded speech quality is environmental noise: speech is obscured by loud background sound, which adversely affects the performance of an ASR system. The human auditory system is known to cope with noise better than a machine does. To improve ASR performance, auditory properties of the human system are therefore studied and modeled with the help of a psychoacoustic filter. The filter is labeled a 2D P-filter because its parameters take zero or positive values. To remove noise, a masking effect is also implemented, in which sounds falling below a predetermined masking threshold are modified. An enhanced set of features is then extracted by applying this filter to the Mel filter bank. The novelty of the paper is the use of different distance metrics for classification and for testing the performance of the ASR system. Experiments are carried out on a database of studio recordings of rhyming words spoken by children with articulatory disabilities. Results for noisy speech signals are expected to improve considerably in the testing phase.
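
The abstract does not give the P-filter coefficients or the exact masking rule. The following Python sketch therefore only illustrates the two ideas it names: thresholding Mel filter-bank energies against a masking threshold, and nearest-template classification under interchangeable distance metrics. Several details are illustrative assumptions rather than the paper's method: the fixed -30 dB threshold relative to the strongest band, the flooring of masked bands, and the helper names (`apply_masking`, `classify`, and the three metric functions). A full psychoacoustic model would derive the threshold from a spreading function instead of a single offset.

```python
import math

# Hypothetical masking rule: bands more than `threshold_db` below the frame's
# strongest band are treated as masked and replaced by a small floor value.
def apply_masking(band_energies, threshold_db=-30.0, floor=1e-10):
    peak = max(band_energies)
    threshold = peak * 10 ** (threshold_db / 10.0)  # dB offset -> power ratio
    # The floor keeps later log compression finite for masked bands.
    return [e if e >= threshold else floor for e in band_energies]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


METRICS = {"euclidean": euclidean, "manhattan": manhattan, "cosine": cosine_distance}


def classify(features, templates, metric="euclidean"):
    """Return the label whose template vector is closest to `features`."""
    dist = METRICS[metric]
    return min(templates, key=lambda label: dist(features, templates[label]))


masked = apply_masking([1.0, 0.5, 1e-4, 1e-5])  # -> [1.0, 0.5, 1e-10, 1e-10]
templates = {"cat": [1.0, 2.0], "bat": [4.0, 0.5]}
print(classify([1.2, 1.8], templates, metric="cosine"))  # -> cat
```

Changing only the `metric` argument while keeping the features fixed is one simple way to compare how the choice of distance affects recognition.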

References

Journal Article

Exploiting deep neural networks for detection-based speech recognition

Sabato Marco Siniscalchi¹, Dong Yu², Li Deng², Chin-Hui Lee¹

Georgia Institute of Technology¹, Microsoft²

01 Apr 2013-Neurocomputing

TL;DR: It is shown that DNNs can be used to boost the classification accuracy of basic speech units, such as phonetic attributes (phonological features) and phonemes, and results in improved word recognition accuracy, which is better than previously reported word lattice rescoring results.

114 citations

Journal Article

Hardware–Software Codesign of Automatic Speech Recognition System for Embedded Real-Time Applications

Octavian Cheng¹, Waleed H. Abdulla¹, Zoran Salcic¹

University of Auckland¹

01 Mar 2011-IEEE Transactions on Industrial Electronics

TL;DR: The proposed hardware-software coprocessing speech recognizer is suitable for integration in various types of voice (speech)-controlled applications and reduces the average real-time factor to 0.54 with the word accuracy rate of 93.16%.

Abstract: We present a hardware-software coprocessing speech recognizer for real-time embedded applications. The system consists of a standard microprocessor and a hardware accelerator for Gaussian mixture model (GMM) emission probability calculation implemented on a field-programmable gate array. The GMM accelerator is optimized for timing performance by exploiting data parallelism. To avoid a large memory requirement, the accelerator adopts a double-buffering scheme for accessing the acoustic parameters, with no assumption made about the access pattern of these parameters. Experiments on widely used benchmark data show that the real-time factor of the proposed system is 0.62, about three times faster than the pure software-based baseline system, while the word accuracy rate is preserved at 93.33%. As part of the recognizer, a new adaptive beam-pruning algorithm is also proposed and implemented, which further reduces the average real-time factor to 0.54 with a word accuracy rate of 93.16%. The proposed speech recognizer is suitable for integration in various types of voice (speech)-controlled applications.
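
For reference, the quantity the GMM accelerator computes is the per-state emission log-likelihood. The sketch below is a minimal software version and assumes diagonal covariances. That is a common choice in HMM recognizers, but the abstract does not state it. Each mixture term is independent of the others, which is the kind of data parallelism a hardware implementation can exploit. The function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance `var`."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_emission(x, weights, means, variances):
    """log b(x) = log sum_k w_k * N(x; mean_k, diag(var_k)) for one HMM state."""
    # Each component term is independent, so they can be evaluated in parallel.
    terms = [
        math.log(w) + log_gaussian_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    return logsumexp(terms)
```

Working in the log domain with log-sum-exp avoids the underflow that multiplying many small probabilities would cause.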

58 citations

"2-D psychoacoustic modeling for aut..." refers methods in this paper

...Application of this P-filter to the Mel filter bank directly can give us enhanced MFCC features which are more reliable than the MFCC features alone....
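
The excerpt places the P-filter on the Mel filter bank, ahead of cepstral analysis. The sketch below shows the standard last two MFCC steps, log compression followed by a DCT-II of the Mel band energies, with an optional `p_filter` hook. The hook, its placement before the log, and the default of 13 coefficients are assumptions for illustration, not the paper's specification.

```python
import math


def dct_ii(values, num_coeffs):
    """Unnormalized DCT-II: c_n = sum_m values[m] * cos(pi * n * (m + 0.5) / M)."""
    M = len(values)
    return [
        sum(v * math.cos(math.pi * n * (m + 0.5) / M) for m, v in enumerate(values))
        for n in range(num_coeffs)
    ]


def mfcc_from_mel_energies(mel_energies, num_coeffs=13, p_filter=None):
    """MFCCs for one frame of Mel filter-bank energies.

    `p_filter` is a hypothetical hook: any function mapping the list of band
    energies to an enhanced list. Here it is applied before log compression.
    """
    if p_filter is not None:
        mel_energies = p_filter(mel_energies)
    # Floor avoids log(0) for silent or fully masked bands.
    log_energies = [math.log(max(e, 1e-10)) for e in mel_energies]
    return dct_ii(log_energies, num_coeffs)
```

Many libraries use an orthonormal DCT scaling instead. That changes only the scale of each coefficient, not which features are extracted.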

Speech Recognition and Verification Using MFCC & VQ

Kashyap Patel

01 Jan 2013

TL;DR: The ability of the HPS (Harmonic Product Spectrum) algorithm and MFCC to support gender and speaker recognition is explored, and the resulting speaker recognition and gender recognition systems are tested and analysed.

Abstract: Speaker recognition software using MFCC (Mel Frequency Cepstral Coefficients) and vector quantization has been designed, developed and tested satisfactorily for male and female voices. In this paper, the ability of the HPS (Harmonic Product Spectrum) algorithm and MFCC to support gender and speaker recognition is explored. The HPS algorithm can be used to find the pitch of the speaker, which in turn can be used to determine the speaker's gender. The speech signals of male and female speakers were recorded as .wav files at an 8 kHz sampling rate and then modified. The modified .wav files were processed in MATLAB to compute and plot the autocorrelation of the speech signal. The software reliably computes the pitch of male and female voices. The MFCC and vector quantization algorithms are used for the speech recognition process. The pitch of the signal, calculated using the autocorrelation technique and the FFT, is used to identify the speaker's gender. The quality of the speaker recognition and gender recognition systems is tested and analysed.
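
The autocorrelation route to pitch described in this abstract can be sketched as follows: find the lag with the largest autocorrelation inside a plausible pitch range, then convert that lag to a frequency. The 60 to 400 Hz search range, the 165 Hz male/female boundary in `guess_gender`, and all function names are illustrative assumptions, not values from the paper.

```python
import math


def autocorr_pitch(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate f0 (Hz) from the autocorrelation peak within [fmin, fmax]."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove DC so it does not bias the peak
    min_lag = int(fs / fmax)  # shortest period considered
    max_lag = min(int(fs / fmin), n - 1)  # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized sum: longer lags have fewer terms, which favours the
        # first period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def guess_gender(f0, threshold_hz=165.0):
    """Hypothetical rule of thumb: lower pitch -> 'male', higher -> 'female'."""
    return "male" if f0 < threshold_hz else "female"


fs = 8000  # the 8 kHz sampling rate mentioned in the abstract
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
print(autocorr_pitch(tone, fs))  # -> 200.0
```

Real speech would normally be analysed frame by frame, with a voicing decision, before the per-utterance pitch is used for a gender decision.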

39 citations

Additional excerpts

...Prasad [5] explored the ability of HPS (Harmonic Product Spectrum) algorithm and MFCC for gender and speaker recognition....
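
The Harmonic Product Spectrum idea can be illustrated with a short sketch. The magnitude spectrum is multiplied by copies of itself downsampled by 2, 3, and so on, so that only the fundamental lines up with every harmonic. Three choices are assumptions: the number of harmonics, the 50 Hz lower limit, and the naive O(N²) DFT, which keeps the example dependency-free. A practical implementation would use an FFT and a window function.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|DFT| for bins 0..N/2 via a naive O(N^2) sum (an FFT would be used in practice)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def hps_pitch(frame, fs, num_harmonics=3, fmin=50.0):
    """Harmonic Product Spectrum pitch estimate in Hz."""
    spec = magnitude_spectrum(frame)
    n = len(frame)
    max_bin = (len(spec) - 1) // num_harmonics  # keep k * h inside the spectrum
    min_bin = max(1, math.ceil(fmin * n / fs))  # skip DC and very low bins
    best_bin, best_val = min_bin, -1.0
    for k in range(min_bin, max_bin + 1):
        product = 1.0
        for h in range(1, num_harmonics + 1):  # spectrum downsampled by h, at index k
            product *= spec[k * h]
        if product > best_val:
            best_bin, best_val = k, product
    return best_bin * fs / n


fs, n = 8000, 512
frame = [math.sin(2 * math.pi * 187.5 * t / fs)
         + 0.8 * math.sin(2 * math.pi * 375.0 * t / fs)
         + 0.6 * math.sin(2 * math.pi * 562.5 * t / fs) for t in range(n)]
print(hps_pitch(frame, fs))  # -> 187.5
```

The frequency resolution of this estimate is fs / N (15.625 Hz here), so longer frames or interpolation around the peak give finer pitch values.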

Journal Article

Evaluation of speech recognizers for speech training applications

Sven Anderson¹, Diane Kewley-Port²

University of Chicago¹, Indiana University²

01 Jul 1995-IEEE Transactions on Speech and Audio Processing

TL;DR: A database and testing procedures were developed to evaluate two facets of recognizer performance integral to speech training: utterance identification and speech quality assessment, and the recognizer, based on hidden Markov models (HMM's), provided better identification scores for normal and disordered speech than the two template-based recognizers.

Abstract: The use of speech recognition technology for speech training represents an important and potentially very large application of speech technology. However, speech training places unique demands on recognizer performance that have not been well characterized. In this research, a database and testing procedures were developed to evaluate two facets of recognizer performance integral to speech training: utterance identification and speech quality assessment. Using these materials, three commercial speech recognizers that employ different types of recognition algorithms were evaluated. In general, the recognizer based on hidden Markov models (HMMs) provided better identification scores for normal and disordered speech than the two template-based recognizers. A recognizer's identification performance on normal speech often predicted its identification performance on disordered speech. For each recognizer, analysis using phonological features revealed classes of speech sounds that are poorly discriminated. Procedures were developed to provide human ratings of the quality of disordered speech for comparison with recognizer performance. Recognizers were compared with speech-language pathologists with respect to the ability to judge speech quality. In contrast with identification performance, the two speech recognizers based on template comparisons provided better measures of speech quality than the HMM-based recognizer.
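
The abstract does not say how the template-based recognizers compare an utterance with stored templates. Dynamic time warping (DTW) is the classic technique for that, so the sketch below assumes it. Each utterance is a sequence of feature vectors, and the template with the smallest warped distance wins. The function names are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of feature vectors.

    Local cost is the Euclidean distance between frames; each cell may be
    reached by a diagonal (match), vertical, or horizontal step.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


templates = {"up": [[0.0], [1.0], [2.0]], "down": [[2.0], [1.0], [0.0]]}
print(recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], templates))  # -> up
```

Because the warping path absorbs differences in speaking rate, a slowed-down copy of a template still scores a distance of zero. Practical systems often also normalize by path length so that long templates are not penalized.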

19 citations

"2-D psychoacoustic modeling for aut..." refers methods in this paper

...Sven Anderson and Diane Kewley-Port [3] studied and implemented HMM methodology....
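
The core computation behind HMM-based recognition is the forward algorithm, which scores an observation sequence against each word model. For readability, the sketch below assumes discrete observation symbols and plain probabilities. A real recognizer would use continuous (for example, GMM) emissions and log-space or scaled arithmetic to avoid underflow on long utterances. All names are illustrative.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-observation HMM via the forward algorithm.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][o]  -- probability that state i emits symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o] for j in range(n)]
    return sum(alpha)


def recognize(obs, models):
    """Pick the word model (label -> (start, trans, emit)) with the highest likelihood."""
    return max(models, key=lambda label: forward_likelihood(obs, *models[label]))


emit = [[0.9, 0.1], [0.2, 0.8]]
models = {
    "a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], emit),  # left-to-right model
    "b": ([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], emit),  # stays in state 1
}
print(recognize([0, 1], models))  # -> a
```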

Proceedings Article

2D psychoacoustic filtering for robust speech recognition

Peng Dai¹, Ing Yann Soon¹, Chai Kiat Yeo¹

Nanyang Technological University¹

08 Dec 2009

TL;DR: A 2D psychoacoustic modeling algorithm is proposed and integrated with a feature extraction front-end for hidden Markov model (HMM) recognition; it incorporates properties of the human auditory system into the speech recognition system to enhance its robustness.

Abstract: One of the weaknesses of speech recognition systems is their lack of robustness to background noise compared with human listeners under similar conditions. This paper proposes a 2D psychoacoustic modeling algorithm that is integrated with a feature extraction front-end for hidden Markov model (HMM) recognition. The proposed algorithm incorporates properties of the human auditory system into the speech recognition system to enhance its robustness. It integrates forward masking, lateral inhibition and cepstral mean normalization into the ordinary mel-frequency cepstral coefficient (MFCC) feature extraction algorithm. Experiments carried out on the AURORA2 database show that the word recognition rate can be improved significantly at low computational cost.
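
Two of the components named in this abstract are easy to sketch on plain lists. The first is lateral inhibition along the frequency axis of a spectral frame. The second is cepstral mean normalization across the frames of an utterance. The 3-tap kernel `[-strength, 1, -strength]` and its 0.25 default are hypothetical stand-ins, not the coefficients from the paper. Forward masking is omitted.

```python
def lateral_inhibition(frame, strength=0.25):
    """Apply a hypothetical 3-tap kernel [-strength, 1, -strength] across bands.

    Each band is reduced by its neighbours, which sharpens spectral peaks;
    at the edges the missing neighbour is replaced by the band itself.
    """
    n = len(frame)
    out = []
    for k in range(n):
        left = frame[k - 1] if k > 0 else frame[k]
        right = frame[k + 1] if k < n - 1 else frame[k]
        out.append(frame[k] - strength * (left + right))
    return out


def cepstral_mean_normalization(frames):
    """Subtract each coefficient's mean over the utterance from every frame."""
    num_frames, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


print(lateral_inhibition([0.0, 1.0, 0.0]))  # -> [-0.25, 1.0, -0.25]
print(cepstral_mean_normalization([[1.0, 2.0], [3.0, 4.0]]))  # -> [[-1.0, -1.0], [1.0, 1.0]]
```

CMN helps because a fixed linear channel, such as a microphone or transmission line, multiplies the spectrum. After the log and the DCT this becomes an additive constant in each cepstral coefficient, so subtracting the per-coefficient mean removes it.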

...read moreread less

11 citations