scispace - formally typeset
Author

P. Mahalakshmi

Bio: P. Mahalakshmi is an academic researcher at VIT University. The author has contributed to research on topics including PSQM and intelligibility (communication), has an h-index of 1, and has co-authored 2 publications receiving 11 citations.

Papers
Proceedings ArticleDOI
06 Apr 2016
TL;DR: In this article, a speech recognition system is proposed to distinguish between a normal speaker and a person with a speech disability, using Mel Frequency Cepstral Coefficients (MFCC) to measure the degree of speech disability.
Abstract: Speech disability concerns communication issues encompassing hearing, speech, language and fluency. A speech recognition system is proposed to distinguish between a normal speaker and a person with a speech disability using Mel Frequency Cepstral Coefficients (MFCC). The speech disability addressed in this work is stuttering; the experiments are carried out on normal female speakers and female speakers who stutter. This paper presents the capability of MFCC to extract features that can be used to measure the degree of speech disability. This method is a prerequisite for designing and producing standard telecommunications equipment and services that can alleviate the negative consequences of a disability.
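The MFCC pipeline this abstract relies on (pre-emphasis, framing, mel filterbank, log, DCT) can be sketched from scratch with NumPy and SciPy. The parameter values below (16 kHz sampling, 25 ms frames, 26 filters, 13 coefficients) are common defaults, not the paper's reported settings:

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=0.025, frame_step=0.01,
         n_filters=26, n_ceps=13):
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis boosts the high frequencies attenuated in voiced speech.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Split into short overlapping frames and apply a Hamming window.
    flen, fstep = int(frame_len * sr), int(frame_step * sr)
    n_frames = 1 + max(0, (len(emphasized) - flen) // fstep)
    frames = np.stack([emphasized[i * fstep:i * fstep + flen]
                       for i in range(n_frames)]) * np.hamming(flen)
    # Per-frame power spectrum.
    power = (np.abs(np.fft.rfft(frames, n_fft)) ** 2) / n_fft
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies, decorrelated by a DCT, give the cepstral coefficients.
    log_energies = np.log(power @ fbank.T + 1e-10)
    return dct(log_energies, type=2, axis=1, norm="ortho")[:, :n_ceps]
```

One second of 16 kHz audio yields 98 frames of 13 coefficients each; a classifier like the one described above would consume these vectors, or statistics computed over them.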

15 citations

Proceedings ArticleDOI
23 Mar 2016
TL;DR: This paper uses cepstrum analysis to distinguish between different vowels of American phonemes using a database of all vowels spoken by a subject, and uses spectrograms to show the clear differences among vowels and how to estimate them for speech analysis and applications.
Abstract: A phoneme is a single unit of sound essential for any language to function. American English originated from British colonization, but its pronunciation differs from British English. This paper focuses on MFCC (Mel Frequency Cepstral Coefficients) and methods such as spectrogram analysis and speech waveform analysis for studying phonemes, especially vowels, because they form the basis of every language. We use cepstrum analysis to distinguish between different vowels of American phonemes using a database of all vowels spoken by a subject. The database of vowels is recorded without noise to improve clarity and accuracy in determining the Mel Frequency Cepstral Coefficients. We also use spectrograms to show the clear differences among vowels and how to estimate them for speech analysis and applications. These features can be used to enhance speech recognition applications such as security systems, call detection and automated identification.
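The spectrogram comparison described here can be reproduced with SciPy's short-time Fourier transform. The synthetic 440 Hz tone below stands in for a recorded vowel, which is an assumption for illustration only:

```python
import numpy as np
from scipy.signal import spectrogram

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)  # stand-in for one recorded vowel

# 512-sample windows with 50% overlap give ~31 Hz frequency resolution.
freqs, times, Sxx = spectrogram(tone, fs=sr, nperseg=512, noverlap=256)

# The dominant band sits near the tone's 440 Hz fundamental; for real
# vowels, the formant peaks in this spectrum are what differ per vowel.
peak_hz = freqs[np.argmax(Sxx.mean(axis=1))]
```

For actual vowel comparison one would inspect the first two formant peaks rather than the single dominant band used in this sketch.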

Cited by
Proceedings ArticleDOI
11 Jul 2018
TL;DR: This paper presents the feature data reduction of MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) to improve the accuracy and increase the computational speed of the classification process by decreasing the dimensions of feature data.
Abstract: The development of pattern recognition systems has accelerated rapidly in this century, and many methods have been developed. Mel Frequency Cepstral Coefficients (MFCC) is a popular feature extraction method but still has disadvantages, especially regarding accuracy and the high dimensionality of the extracted features. This paper presents feature data reduction for MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). By combining MFCC with data reduction methods, the aim is to improve accuracy and increase the computational speed of the classification process by decreasing the dimensionality of the feature data. The extracted MFCC features plus the delta coefficients form the data matrix to which the reduction methods are applied. The data reduction process is designed in two versions, and the reduced data are then classified with the Support Vector Machine (SVM) method. The dataset is composed of 140 recorded speech samples from 28 speakers. The results show that MFCC + PCA version 2 and MFCC + SVD version 1 provide the maximum accuracy improvement, raising the accuracy of the conventional MFCC method from 83.57% to 90.71%. In addition, these methods accelerate the classification process in the speech recognition system from 7.819 seconds to about 7.6 seconds, by decreasing the feature dimension from 26 to 10 for MFCC + PCA version 2 and from 26 to 14 for MFCC + SVD version 1.
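The PCA step described here can be written as an SVD of the centred feature matrix. The 140x26 matrix below mimics the paper's dataset shape (140 utterances, 26 MFCC + delta features) but is synthetic random data, and the reduction to 10 dimensions follows the paper's PCA version 2 figure:

```python
import numpy as np

def pca_reduce(X, k):
    # Centre the features; the right singular vectors of the centred
    # matrix are the principal axes, so projecting onto the first k
    # of them gives the reduced representation.
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
X = rng.normal(size=(140, 26))   # synthetic stand-in for MFCC + delta features
Z = pca_reduce(X, 10)            # 26-dimensional features reduced to 10
```

The reduced columns are uncorrelated and ordered by decreasing variance, which is what allows the classifier downstream to work with fewer inputs at little cost in information.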

19 citations

Proceedings ArticleDOI
03 Apr 2018
TL;DR: Automatic Speech and Emotion Recognition is a widely researched topic that is a subset of Human Computer Interface (HCI) and has a range of applications.
Abstract: With the advent of digitization in every possible avenue, automatic speech and emotion recognition is a widely researched topic that is a subset of Human Computer Interface (HCI) and has a range of applications. With machines taking over many menial jobs, it has become important for the computer to understand us as we understand it. Features such as MFCC, pitch and amplitude are extracted from a given sample and compared against an existing and growing database of training samples. MFCC is used to detect the speaker and utterance, while an SVM is used to distinguish the emotion of the given sample. The SVM classifier differentiates between anger, happiness, fear and sadness, and updates the database as it goes.
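As an illustration of the SVM step, the sketch below trains a linear SVM by sub-gradient descent on the hinge loss over two synthetic feature clusters; the paper's actual classifier, features and four emotion classes are not reproduced here:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    # Minimise hinge loss + L2 penalty by sub-gradient descent.
    # Labels y must be in {-1, +1}.
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                  # points violating the margin
        if mask.any():
            grad_w = lam * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w, grad_b = lam * w, 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two synthetic "emotion" clusters in a 2-D feature space (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
```

A multi-class emotion classifier would typically combine several such binary boundaries (one-vs-one or one-vs-rest).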

18 citations

Journal ArticleDOI
01 Jan 2022-Sensors
TL;DR: The paper introduces the application of principal component analysis for dimensionality reduction of the variables describing a speech signal, and the applicability of the obtained results to the recognition of disturbed and fluent speech.
Abstract: The presented paper introduces the application of principal component analysis for dimensionality reduction of the variables describing a speech signal, and the applicability of the obtained results to the recognition of disturbed and fluent speech. A set of fluent speech signals and three speech disturbances (blocks before words starting with plosives, syllable repetitions, and sound-initial prolongations) was transformed using principal component analysis. The result was a model containing four principal components describing the analysed utterances. Distances between the standardised original variables and the elements of the observation matrix in the new coordinate system were calculated and then applied in the recognition process. A multilayer perceptron network was used as the classifying algorithm. The achieved results were compared with outcomes from previous experiments in which speech samples were parameterised using the Kohonen network. The classifying network achieved an overall accuracy of 76% (from 50% to 91%, depending on the dysfluency type).
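The multilayer perceptron used as the classifier can be sketched as a single hidden layer trained with plain gradient descent on binary cross-entropy. The two synthetic clusters below stand in for the PCA-derived features of fluent vs. disturbed samples, which is an assumption for illustration only:

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.5, epochs=500, seed=0):
    # One hidden layer of sigmoid units, sigmoid output, trained by
    # full-batch gradient descent on binary cross-entropy.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.5, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.5, hidden), 0.0
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        H = sig(X @ W1 + b1)              # hidden activations, shape (n, hidden)
        p = sig(H @ W2 + b2)              # P(class 1), shape (n,)
        err = p - y                       # gradient of BCE w.r.t. output logit
        gW2, gb2 = H.T @ err / n, err.mean()
        dH = np.outer(err, W2) * H * (1 - H)
        gW1, gb1 = X.T @ dH / n, dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return lambda Xn: sig(sig(Xn @ W1 + b1) @ W2 + b2)

# Synthetic "fluent" (0) vs "disturbed" (1) feature clusters (hypothetical data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (60, 4)), rng.normal(2, 1, (60, 4))])
y = np.array([0] * 60 + [1] * 60)
predict = train_mlp(X, y)
accuracy = ((predict(X) > 0.5) == y).mean()
```

The real experiment classifies four utterance types and reports per-type accuracy; this binary sketch only shows the training mechanics.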

13 citations

Proceedings ArticleDOI
01 Sep 2018
TL;DR: Amplitude thresholding through neural networks is developed to remove prolongations from the sample, and the output signal, free of all stutters, produces better speech recognition.
Abstract: The aim of this paper is to develop an algorithm to enhance speech recognition of stuttered speech. Stuttering is a disorder that affects the fluency of speech through involuntary repetition, prolongation of words or syllables, or involuntary silent intervals. Current speech recognition systems fail to recognize stuttered speech. Methods to detect stutter have been reported in the literature, but efficient techniques for stutter correction have not. This paper addresses this issue and proposes methods to detect and correct stutter within acceptable time limits. To remove prolongations from the sample, amplitude thresholding through neural networks is developed. Repetitions are removed through a string-repetition-removal algorithm using an existing Text-to-Speech (TTS) system. The output signal, free of all stutters, thus produces better speech recognition.
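A greatly simplified, fixed-threshold version of the prolongation-removal idea (the paper uses neural networks, not the heuristic below) can be sketched as: frame the signal, mark high-energy frames, and drop the tail of any high-energy run that exceeds a maximum duration:

```python
import numpy as np

def trim_prolongations(signal, sr=16000, frame=0.02, rel_thresh=0.5, max_run=0.3):
    # Frame-wise RMS energy; frames above `rel_thresh` of the peak RMS are
    # "active". Runs of active frames longer than `max_run` seconds are
    # treated as prolongations and truncated to that length.
    flen = int(frame * sr)
    n = len(signal) // flen
    frames = signal[: n * flen].reshape(n, flen)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    active = rms > rel_thresh * rms.max()
    limit = round(max_run / frame)
    kept, run = [], 0
    for i in range(n):
        run = run + 1 if active[i] else 0
        if run <= limit:                 # drop frames once the run is too long
            kept.append(frames[i])
    return np.concatenate(kept)

# A 1 s sustained tone followed by 0.5 s of silence: the tone is cut to 0.3 s.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
sample = np.concatenate([tone, np.zeros(sr // 2)])
trimmed = trim_prolongations(sample, sr)
```

Real prolongations are voiced and spectrally stable rather than merely loud, which is why the paper trains a network instead of relying on a single amplitude threshold.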

12 citations

Proceedings ArticleDOI
20 Apr 2018
TL;DR: This paper presents a survey of feature extraction and classification techniques applied to automatic speech recognition, along with a comparative analysis of stuttering-recognition techniques on the basis of accuracy, sensitivity, specificity, and dataset size.
Abstract: Disability in speech concerns many other communication problems, such as hearing and fluency. Stuttering is a neurodevelopmental disorder identified by the existence of dysfluencies during speech production. The disruptions of speech flow in stuttering have led a large body of researchers to examine the potential processes underlying speech-language production in people who stutter. This paper presents a survey of the feature extraction and classification techniques applied to automatic speech recognition, and also presents a comparative analysis of stuttering-recognition techniques on the basis of accuracy, sensitivity, specificity, and dataset size.
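The comparison metrics named in the survey (accuracy, sensitivity, specificity) all come straight from a binary confusion matrix. A minimal helper, using hypothetical stuttered (1) vs. fluent (0) labels:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    # Counts from the binary confusion matrix (1 = stuttered, 0 = fluent).
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # hypothetical ground truth
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])   # hypothetical classifier output
metrics = binary_metrics(y_true, y_pred)
```

Reporting sensitivity and specificity separately matters here because stutter datasets are often imbalanced, so accuracy alone can hide poor detection of the minority class.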

11 citations