Proceedings ArticleDOI

Speech disorder recognition using MFCC

06 Apr 2016-Vol. 2016, pp 0246-0250
TL;DR: In this article, a speech recognition system is proposed to distinguish between typical speakers and speakers with a speech disability, using Mel Frequency Cepstral Coefficients (MFCC) to measure the degree of speech disability.
Abstract: Speech disability concerns communication issues encompassing hearing, speech, language and fluency. A speech recognition system is proposed to distinguish between a typical speaker and a person with a speech disability using Mel Frequency Cepstral Coefficients. The speech disability addressed in this work is stuttering; experiments are carried out on fluent female speakers and female speakers who stutter. This paper presents the capability of MFCC to extract features that could be used to measure the degree of speech disability. This method is a prerequisite for designing and producing standard telecommunications equipment and services that can alleviate the negative consequences of a disability.
Citations
Proceedings ArticleDOI
11 Jul 2018
TL;DR: This paper presents the feature data reduction of MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) to improve the accuracy and increase the computational speed of the classification process by decreasing the dimensions of feature data.
Abstract: The development of pattern recognition systems has increased rapidly in this century, and many methods have been developed. Mel Frequency Cepstral Coefficients (MFCC) is a popular feature extraction method but still has disadvantages, especially regarding accuracy and the high dimensionality of the extracted features. This paper presents feature data reduction of MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). By combining MFCC with data reduction methods, the aim is to improve accuracy and increase the computational speed of the classification process by decreasing the dimensionality of the feature data. The extracted MFCC features plus delta coefficients form the data matrix to which the reduction methods are applied. The data reduction process is designed in two versions, and the reduced data are then classified with the Support Vector Machine (SVM) method. The dataset is composed of 140 recorded speech samples from 28 speakers. The results showed that MFCC + PCA version 2 and MFCC + SVD version 1 provided the largest accuracy improvement, raising the accuracy of the conventional MFCC method from 83.57% to 90.71%. In addition, these methods accelerate classification in the speech recognition system from 7.819 seconds to about 7.6 seconds, by decreasing the feature dimensionality from 26 to 10 for MFCC + PCA version 2 and from 26 to 14 for MFCC + SVD version 1.
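As an illustration of the PCA-based reduction described in the abstract above, here is a minimal NumPy sketch. This is not the authors' code; the shapes merely mirror the reported 26-to-10 reduction, and the random matrix stands in for real MFCC + delta features.

```python
import numpy as np

def pca_reduce(features, n_components=10):
    """Project feature rows onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
mfcc_delta = rng.normal(size=(140, 26))  # stand-in for 140 samples x 26 coefficients
reduced = pca_reduce(mfcc_delta, n_components=10)
print(reduced.shape)  # (140, 10)
```

The SVD route avoids forming the covariance matrix explicitly, which is why PCA and SVD reductions are closely related in practice.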

19 citations

Proceedings ArticleDOI
03 Apr 2018
TL;DR: Automatic Speech and Emotion Recognition is a widely researched topic that is a subset of Human Computer Interface (HCI) and has a range of applications.
Abstract: With the advent of digitization of every possible avenue, Automatic Speech and Emotion Recognition is a widely researched topic, a subset of Human Computer Interface (HCI), with a range of applications. With machines taking over many menial jobs, it has become important for the computer to understand us as we understand it. Features such as MFCC, pitch and amplitude are extracted from a given sample and compared against an existing and growing database of training samples. MFCC is used to identify the speaker and utterance, while an SVM is used to classify the emotion of the given sample. The SVM classifier differentiates between anger, happiness, fear and sadness, and updates the database as it goes.

18 citations

Journal ArticleDOI
01 Jan 2022-Sensors
TL;DR: The presented paper introduces principal component analysis application for dimensionality reduction of variables describing speech signal and applicability of obtained results for the disturbed and fluent speech recognition process.
Abstract: The presented paper introduces the application of principal component analysis for dimensionality reduction of variables describing the speech signal, and the applicability of the obtained results to the recognition of disturbed and fluent speech. A set of fluent speech signals and three speech disturbances (blocks before words starting with plosives, syllable repetitions, and sound-initial prolongations) was transformed using principal component analysis. The result was a model containing four principal components describing the analysed utterances. Distances between the standardised original variables and elements of the observation matrix in the new coordinate system were calculated and then applied in the recognition process. A multilayer perceptron network was used as the classifying algorithm. The achieved results were compared with outcomes from previous experiments in which speech samples were parameterised using a Kohonen network. The classifying network achieved an overall accuracy of 76% (from 50% to 91%, depending on the dysfluency type).

13 citations

Proceedings ArticleDOI
01 Sep 2018
TL;DR: To remove prolongation(s) from the sample, amplitude thresholding through neural networks is developed and the output signal, void of all stutters, produces better speech recognition.
Abstract: The aim of this paper is to develop an algorithm to enhance speech recognition of stuttered speech. Stuttering is a disorder that affects the fluency of speech through involuntary repetition, prolongation of words/syllables, or involuntary silent intervals. Current speech recognition systems fail to recognize stuttered speech. Methods to detect stutter have been reported in the literature, but efficient techniques for stutter correction have not. This paper addresses this issue and proposes methods to detect and correct stutter within acceptable time limits. To remove prolongation(s) from the sample, amplitude thresholding through neural networks is developed. Repetitions are removed through a string-repetition-removal algorithm using an existing Text-to-Speech (TTS) system. Thus, the output signal, free of all stutters, produces better speech recognition.
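The paper couples amplitude thresholding with neural networks; as a much-simplified sketch of the thresholding idea alone (all names, frame sizes and threshold values here are illustrative, not from the cited work), low-energy frames such as involuntary silent intervals can be dropped like this:

```python
import numpy as np

def remove_low_energy(signal, frame_len=256, threshold=0.02):
    """Drop frames whose RMS energy falls below a threshold:
    a crude stand-in for silent-interval / prolongation trimming."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return frames[rms >= threshold].ravel()

t = np.linspace(0, 1, 4096, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 200 * t)  # toy "voiced" signal
speech[1024:2048] = 0.001                   # an involuntary silent interval
cleaned = remove_low_energy(speech)
print(len(cleaned) < len(speech))  # True
```

A real system would need smoothing across frame boundaries and a learned rather than fixed threshold, which is presumably where the paper's neural network comes in.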

12 citations

Proceedings ArticleDOI
20 Apr 2018
TL;DR: This paper presented a survey on techniques of feature extraction and classification which are applied for automatic speech recognition, and also presented a comparative analysis of stuttering techniques on the basis of accuracy, sensitivity, specificity, and dataset size.
Abstract: Disability in speech involves other communication problems such as hearing and fluency. Stuttering is a neurodevelopmental disorder identified by the presence of dysfluencies during speech production. The disruptions of speech flow in stuttering have led a large body of researchers to examine the potential processes underlying speech-language production in people who stutter. This paper presents a survey of feature extraction and classification techniques applied to automatic speech recognition, along with a comparative analysis of stuttering-recognition techniques on the basis of accuracy, sensitivity, specificity, and dataset size.

11 citations

References
Book
01 Jan 2001
TL;DR: Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond to create the state of the art in spoken language technology.
Abstract: From the Publisher: New advances in spoken language processing: theory and practice. In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design, with many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs. Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents: essential background on speech production and perception, probability and information theory, and pattern recognition; extracting information from the speech signal, with useful representations and practical compression solutions; modern speech recognition techniques, including hidden Markov models, acoustic and language modeling, improving resistance to environmental noises, search algorithms, and large vocabulary speech recognition; text-to-speech, covering document analysis, pitch and duration controls, trainable synthesis, and more; and spoken language understanding, including dialog management, spoken language applications, and multimodal interfaces. To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art, from algorithms through business productivity.

1,795 citations

01 Jan 1968
Abstract: : Contents: Preprocessing of data; Digital filtering; Fourier series and Fourier transform computations; Correlation function computations; Spectral density function computations; Frequency response function and coherence function computations; Probability density function computations; Nonstationary processes; and Test case and examples.

65 citations

DOI
01 Mar 2008
TL;DR: This paper explores the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique, one of the most popular feature extraction techniques used in speech recognition, to extract features from Quranic verse recitation.
Abstract: Each person's voice is different; thus, Quran recitation tends to differ considerably from one recitor to another. Even when the sentences are taken from the same verse, the way a recitor delivers them may differ, producing different sounds for different recitors. The same combinations of letters may also be pronounced differently due to the use of harakat. This paper explores the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique to extract features from Quranic verse recitation. Feature extraction is crucial for preparing data for the classification process. MFCC is one of the most popular feature extraction techniques used in speech recognition; it is based on the mel frequency scale, which models the response of the human ear. The MFCC pipeline consists of preprocessing, framing, windowing, DFT, mel filterbank, logarithm, and inverse DFT.
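The pipeline listed in that last sentence can be sketched in NumPy. This is a generic textbook-style MFCC, not the cited paper's implementation; the sample rate, frame size, and filter counts are illustrative, and the final DCT plays the role of the "inverse DFT" step.

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Minimal MFCC: framing, Hamming window, DFT power spectrum,
    mel filterbank, logarithm, then a DCT to decorrelate."""
    # Framing
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i*hop : i*hop+frame_len] for i in range(n_frames)])
    # Windowing + power spectrum (512-point DFT -> 257 bins)
    power = np.abs(np.fft.rfft(frames * np.hamming(frame_len), n=512)) ** 2
    # Triangular filterbank with centers equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((512 + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, power.shape[1]))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_filters)))
    return log_energy @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s toy tone
coeffs = mfcc(sig)
print(coeffs.shape)  # (98, 13): frames x cepstral coefficients
```

Production systems usually add pre-emphasis, liftering, and delta coefficients on top of this core, but the seven stages named in the abstract are all present here.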

40 citations