Proceedings ArticleDOI

Speech disorder recognition using MFCC

06 Apr 2016-Vol. 2016, pp 0246-0250
TL;DR: In this article, a speech recognition system is proposed to distinguish between typical speakers and speakers with a speech disability, using Mel Frequency Cepstral Coefficients (MFCC) to measure the degree of speech disability.
Abstract: Speech disability concerns communication issues encompassing hearing, speech, language and fluency. A speech recognition system is proposed to distinguish between a typical speaker and a person with a speech disability using Mel Frequency Cepstral Coefficients. The speech disability addressed in this work is stuttering; experiments are carried out on fluent female speakers and female speakers who stutter. This paper presents the capability of MFCC to extract features that could be used to measure the degree of speech disability. This method is a prerequisite for designing and producing standard telecommunications equipment and services that can alleviate the negative consequences of a disability.
Citations
Proceedings ArticleDOI
11 Jul 2018
TL;DR: This paper presents the feature data reduction of MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD) to improve the accuracy and increase the computational speed of the classification process by decreasing the dimensions of feature data.
Abstract: The development of pattern recognition systems has increased rapidly in this century, and many methods have been developed. Mel Frequency Cepstral Coefficients (MFCC) is a popular feature extraction method but still has disadvantages, especially regarding accuracy and the high dimensionality of the extracted features. This paper presents feature data reduction of MFCC using Principal Component Analysis (PCA) and Singular Value Decomposition (SVD). By combining MFCC with data reduction methods, the aim is to improve accuracy and increase the computational speed of the classification process by decreasing the dimensionality of the feature data. The extracted MFCC features plus delta coefficients form the data matrix to which the reduction methods are applied. The data reduction process is designed in two versions, and the reduced data are then classified with the Support Vector Machine (SVM) method. The dataset is composed of 140 recorded speech samples from 28 speakers. The results showed that MFCC + PCA version 2 and MFCC + SVD version 1 provided the largest accuracy improvement, raising the accuracy of the conventional MFCC method from 83.57% to 90.71%. In addition, these methods accelerate classification in the speech recognition system from 7.819 seconds to about 7.6 seconds, by decreasing the feature dimensionality from 26 to 10 for MFCC + PCA version 2 and from 26 to 14 for MFCC + SVD version 1.
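As an illustration of the PCA-based reduction described in the abstract above, here is a minimal NumPy sketch. This is not the authors' code; the shapes merely mirror the reported 26-to-10 reduction, and the random matrix stands in for real MFCC + delta features.

```python
import numpy as np

def pca_reduce(features, n_components=10):
    """Project feature rows onto the top principal components via SVD."""
    centered = features - features.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
mfcc_delta = rng.normal(size=(140, 26))  # stand-in for 140 samples x 26 coefficients
reduced = pca_reduce(mfcc_delta, n_components=10)
print(reduced.shape)  # (140, 10)
```

The SVD route avoids forming the covariance matrix explicitly, which is why PCA and SVD reductions are closely related in practice.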

19 citations

Proceedings ArticleDOI
03 Apr 2018
TL;DR: Automatic Speech and Emotion Recognition is a widely researched topic that is a subset of Human Computer Interface (HCI) and has a range of applications.
Abstract: With the advent of digitization of every possible avenue, Automatic Speech and Emotion Recognition is a widely researched topic, a subset of Human Computer Interface (HCI), with a range of applications. With machines taking over many menial jobs, it has become important for the computer to understand us as we understand it. Features such as MFCC, pitch and amplitude are extracted from a given sample and compared against an existing and growing database of training samples. MFCC is used to identify the speaker and utterance, while an SVM is used to classify the emotion of the given sample. The SVM classifier differentiates between anger, happiness, fear and sadness, and updates the database as it goes.

18 citations

Journal ArticleDOI
01 Jan 2022-Sensors
TL;DR: The presented paper introduces principal component analysis application for dimensionality reduction of variables describing speech signal and applicability of obtained results for the disturbed and fluent speech recognition process.
Abstract: The presented paper introduces the application of principal component analysis for dimensionality reduction of variables describing the speech signal, and the applicability of the obtained results to the recognition of disturbed and fluent speech. A set of fluent speech signals and three speech disturbances (blocks before words starting with plosives, syllable repetitions, and sound-initial prolongations) was transformed using principal component analysis. The result was a model containing four principal components describing the analysed utterances. Distances between the standardised original variables and elements of the observation matrix in the new coordinate system were calculated and then applied in the recognition process. A multilayer perceptron network was used as the classifying algorithm. The achieved results were compared with outcomes from previous experiments in which speech samples were parameterised using a Kohonen network. The classifying network achieved an overall accuracy of 76% (from 50% to 91%, depending on the dysfluency type).

13 citations

Proceedings ArticleDOI
01 Sep 2018
TL;DR: To remove prolongation(s) from the sample, amplitude thresholding through neural networks is developed and the output signal, void of all stutters, produces better speech recognition.
Abstract: The aim of this paper is to develop an algorithm to enhance speech recognition of stuttered speech. Stuttering is a disorder that affects the fluency of speech through involuntary repetition, prolongation of words/syllables, or involuntary silent intervals. Current speech recognition systems fail to recognize stuttered speech. Methods to detect stutter have been reported in the literature, but efficient techniques for stutter correction have not. This paper addresses this issue and proposes methods to detect and correct stutter within acceptable time limits. To remove prolongation(s) from the sample, amplitude thresholding through neural networks is developed. Repetitions are removed through a string-repetition-removal algorithm using an existing Text-to-Speech (TTS) system. Thus, the output signal, free of all stutters, produces better speech recognition.
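The paper couples amplitude thresholding with neural networks; as a much-simplified sketch of the thresholding idea alone (all names, frame sizes and threshold values here are illustrative, not from the cited work), low-energy frames such as involuntary silent intervals can be dropped like this:

```python
import numpy as np

def remove_low_energy(signal, frame_len=256, threshold=0.02):
    """Drop frames whose RMS energy falls below a threshold:
    a crude stand-in for silent-interval / prolongation trimming."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return frames[rms >= threshold].ravel()

t = np.linspace(0, 1, 4096, endpoint=False)
speech = 0.5 * np.sin(2 * np.pi * 200 * t)  # toy "voiced" signal
speech[1024:2048] = 0.001                   # an involuntary silent interval
cleaned = remove_low_energy(speech)
print(len(cleaned) < len(speech))  # True
```

A real system would need smoothing across frame boundaries and a learned rather than fixed threshold, which is presumably where the paper's neural network comes in.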

12 citations

Proceedings ArticleDOI
20 Apr 2018
TL;DR: This paper presented a survey on techniques of feature extraction and classification which are applied for automatic speech recognition, and also presented a comparative analysis of stuttering techniques on the basis of accuracy, sensitivity, specificity, and dataset size.
Abstract: Disability in speech involves other communication problems such as hearing and fluency. Stuttering is a neurodevelopmental disorder identified by the presence of dysfluencies during speech production. The disruptions of speech flow in stuttering have led a large body of researchers to examine the potential processes underlying speech-language production in people who stutter. This paper presents a survey of feature extraction and classification techniques applied to automatic speech recognition, along with a comparative analysis of stuttering-recognition techniques on the basis of accuracy, sensitivity, specificity, and dataset size.

11 citations

References
Book
01 Jan 2001
TL;DR: Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond to create the state of the art in spoken language technology.
Abstract: From the Publisher: New advances in spoken language processing: theory and practice. In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design, with many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs. Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents: essential background on speech production and perception, probability and information theory, and pattern recognition; extracting information from the speech signal, with useful representations and practical compression solutions; modern speech recognition techniques, including hidden Markov models, acoustic and language modeling, improving resistance to environmental noises, search algorithms, and large vocabulary speech recognition; text-to-speech, covering document analysis, pitch and duration controls, trainable synthesis, and more; and spoken language understanding, including dialog management, spoken language applications, and multimodal interfaces. To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art, from algorithms through business productivity.

1,795 citations

01 Jan 1968
Abstract: : Contents: Preprocessing of data; Digital filtering; Fourier series and Fourier transform computations; Correlation function computations; Spectral density function computations; Frequency response function and coherence function computations; Probability density function computations; Nonstationary processes; and Test case and examples.

65 citations

DOI
01 Mar 2008
TL;DR: This paper explores the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique, one of the most popular feature extraction techniques used in speech recognition, to extract features from Quranic verse recitation.
Abstract: Each person's voice is different; thus, Quran recitation tends to differ considerably from one recitor to another. Even when the sentences are taken from the same verse, the way a recitor delivers them may differ, producing different sounds for different recitors. The same combinations of letters may also be pronounced differently due to the use of harakat. This paper explores the viability of the Mel-Frequency Cepstral Coefficient (MFCC) technique to extract features from Quranic verse recitation. Feature extraction is crucial for preparing data for the classification process. MFCC is one of the most popular feature extraction techniques used in speech recognition; it is based on the mel frequency scale, which models the response of the human ear. The MFCC pipeline consists of preprocessing, framing, windowing, DFT, mel filterbank, logarithm, and inverse DFT.
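The pipeline listed in that last sentence can be sketched in NumPy. This is a generic textbook-style MFCC, not the cited paper's implementation; the sample rate, frame size, and filter counts are illustrative, and the final DCT plays the role of the "inverse DFT" step.

```python
import numpy as np

def hz_to_mel(f):
    return 2595 * np.log10(1 + f / 700)

def mel_to_hz(m):
    return 700 * (10 ** (m / 2595) - 1)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    """Minimal MFCC: framing, Hamming window, DFT power spectrum,
    mel filterbank, logarithm, then a DCT to decorrelate."""
    # Framing
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i*hop : i*hop+frame_len] for i in range(n_frames)])
    # Windowing + power spectrum (512-point DFT -> 257 bins)
    power = np.abs(np.fft.rfft(frames * np.hamming(frame_len), n=512)) ** 2
    # Triangular filterbank with centers equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((512 + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, power.shape[1]))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_filters)))
    return log_energy @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s toy tone
coeffs = mfcc(sig)
print(coeffs.shape)  # (98, 13): frames x cepstral coefficients
```

Production systems usually add pre-emphasis, liftering, and delta coefficients on top of this core, but the seven stages named in the abstract are all present here.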

40 citations