Proceedings Article

Cubic SVM Classifier Based Feature Extraction and Emotion Detection from Speech Signals

TL;DR: Experimental results show that the proposed technique identifies emotions with better accuracy, and these results were compared with other existing methods of speech emotion detection.
Abstract: The detection of emotions from speech is one of the most stirring and intriguing research areas in the field of artificial intelligence. In this paper, emotion identification from speech in Hindi, a popular language of India, is carried out in a noisy environment, after which the various emotions are classified into 4 main groups of emotional states, namely happiness, sadness, anger and neutral. The proposed technique involves extraction of prosodic and spectral features of an acoustic signal, such as pitch, energy, formants, Mel-frequency Cepstrum Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), along with their classification using a cubic Support Vector Machine (SVM) classifier model. The system gave an overall accuracy of 98.75% on male actor utterances and 95% on female actor utterances. Experimental results show that the proposed technique identifies emotions with better accuracy, and these results were compared with other existing methods of speech emotion detection. Furthermore, the extracted features and different classifier models were contrasted in this paper for better evaluation.
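The abstract does not give the kernel or its parameters. As a rough illustration only: a "cubic" SVM is commonly understood as an SVM with a degree-3 polynomial kernel, and the sketch below computes such a kernel in plain Python. The function names, the `gamma` and `coef0` defaults, and the toy feature vectors are assumptions for illustration, not values from the paper.

```python
def cubic_kernel(x, y, gamma=1.0, coef0=1.0):
    """Degree-3 polynomial kernel: (gamma * <x, y> + coef0) ** 3."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** 3


def gram_matrix(samples, kernel=cubic_kernel):
    """Pairwise kernel values; an SVM solver works on this matrix."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical 2-dimensional feature vectors (e.g. scaled pitch and energy).
features = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.5]]
K = gram_matrix(features)
```

The Gram matrix is symmetric, and its entries grow with the cube of the shifted inner product, which is what lets the classifier draw curved decision boundaries in the original feature space.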
Citations
Journal Article
TL;DR: An automated EEG-based emotion recognition method with a novel fractal pattern feature extraction approach is presented and has been tested on emotional EEG signals with 14 channels using linear discriminant, k-nearest neighbor, and support vector machine (SVM) classifiers.
Abstract: Electroencephalogram (EEG) signal analysis is one of the most studied research areas in biomedical signal processing and machine learning. Emotion recognition through machine intelligence plays a critical role in understanding brain activities as well as in developing decision-making systems. In this research, an automated EEG-based emotion recognition method with a novel fractal pattern feature extraction approach is presented. The presented fractal pattern is inspired by the Firat University logo and named the fractal Firat pattern (FFP). By using FFP and the Tunable Q-factor Wavelet Transform (TQWT) signal decomposition technique, a multilevel feature generator is presented. In the feature selection phase, an improved iterative selector is utilized. Shallow classifiers have been used to demonstrate the success of the presented TQWT and FFP based feature generation. This model has been tested on emotional EEG signals with 14 channels using linear discriminant (LDA), k-nearest neighbor (k-NN), and support vector machine (SVM) classifiers. The proposed framework achieved 99.82% with the SVM classifier.

65 citations

Journal Article
01 Feb 2022-Toxics
TL;DR: In this article, the authors used in-situ physicochemical parameters to the limited data on heavy metal (HM) concentration in water resources in Marinduque Island Province in the Philippines, which experienced two mining disasters.
Abstract: Limited monitoring activities to assess data on heavy metal (HM) concentration contribute to worldwide concern for the environmental quality and the degree of toxicants in areas where there are elevated metals concentrations. Hence, this study used in-situ physicochemical parameters to the limited data on HM concentration in SW and GW. The site of the study was Marinduque Island Province in the Philippines, which experienced two mining disasters. Prediction model results showed that the SW models during the dry and wet seasons recorded a mean squared error (MSE) ranging from 6 × 10⁻⁷ to 0.070276. The GW models recorded a range from 5 × 10⁻⁸ to 0.045373, all of which were approaching the ideal MSE value of 0. Kling–Gupta efficiency values of developed models were all greater than 0.95. The developed neural network-particle swarm optimization (NN-PSO) models for SW and GW were compared to linear and support vector machine (SVM) models and previously published deterministic and artificial intelligence (AI) models. The findings indicated that the developed NN-PSO models are superior to the developed linear and SVM models, up to 1.60 and 1.40 times greater than the best model observed created by linear and SVM models for SW and GW, respectively. The developed models were also on par with previously published deterministic and AI-based models considering their prediction capability. Sensitivity analysis using Olden’s connection weights approach showed that pH influenced the concentration of HM significantly. Established on the research findings, it can be stated that the NN-PSO is an effective and practical approach in the prediction of HM concentration in water resources that contributes a solution to the limited HM concentration monitored data.
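For readers unfamiliar with the two scores quoted in this abstract, the following is a minimal sketch of how MSE and the Kling–Gupta efficiency are usually computed. It assumes the standard 2009 form of the KGE (correlation, variability ratio, and bias ratio); the abstract does not say which variant the authors used, and the function names and sample values are hypothetical.

```python
import math


def mean_squared_error(observed, predicted):
    """Average squared difference; 0 is the ideal value."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def kling_gupta_efficiency(observed, predicted):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2); 1 is ideal."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed) / n)
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    r = cov / (sd_o * sd_p)   # linear correlation
    alpha = sd_p / sd_o       # variability ratio
    beta = mean_p / mean_o    # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical concentrations, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_mse = mean_squared_error(observed, observed)
perfect_kge = kling_gupta_efficiency(observed, observed)
```

A perfect prediction gives an MSE of 0 and a KGE of 1, which is why the abstract treats MSE values near 0 and KGE values above 0.95 as signs of a good fit.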

14 citations

Proceedings Article
10 Jan 2021
TL;DR: In this article, an end-to-end neural embedding system based on triplet loss and residual learning has been proposed for speech emotion recognition, which learns the embeddings from the emotional information of the speech utterances.
Abstract: In this paper, an end-to-end neural embedding system based on triplet loss and residual learning has been proposed for speech emotion recognition. The proposed system learns the embeddings from the emotional information of the speech utterances. The learned embeddings are used to recognize the emotions portrayed by given speech samples of various lengths. The proposed system implements Residual Neural Network architecture. It is trained using softmax pretraining and triplet loss function. The weights between the fully connected and embedding layers of the trained network are used to calculate the embedding values. The embedding representations of various emotions are mapped onto a hyperplane, and the angles among them are computed using the cosine similarity. These angles are utilized to classify a new speech sample into its appropriate emotion class. The proposed system has demonstrated 91.67% and 64.44% accuracy while recognizing emotions for RAVDESS and IEMOCAP dataset, respectively.
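The final classification step described in this abstract, comparing a new embedding with per-emotion reference embeddings by cosine similarity, can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors, the emotion labels, and the function names are hypothetical, and the paper's network and training procedure are not reproduced.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify(embedding, class_embeddings):
    """Return the label whose reference embedding has the smallest angle."""
    return max(
        class_embeddings,
        key=lambda label: cosine_similarity(embedding, class_embeddings[label]),
    )


# Hypothetical reference embeddings, one per emotion class.
reference = {
    "happy": [1.0, 0.1, 0.0],
    "sad": [0.0, 1.0, 0.2],
    "angry": [0.1, 0.0, 1.0],
}
predicted = classify([0.9, 0.2, 0.1], reference)
```

Because cosine similarity ignores vector length, the comparison depends only on direction, which suits embeddings computed from utterances of different durations.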

14 citations

Proceedings Article
01 Oct 2019
TL;DR: A fractal-based method to detect normal, pre-ictal, and seizure from EEG signals from epilepsy patients using digital signal processing is proposed.
Abstract: Epilepsy is a brain disorder characterized by the occurrence of seizures. The International League Against Epilepsy describes that the minimum occurrence of seizures is twice in 24 hours. There are more than 50 million people in the world suffering from this disease. Recurring seizures in epilepsy can be seen from changes in a pattern in EEG signals. EEG is one of the common tools that can be used to record brain activity rather than MRI and CT Scan. EEG recording is conducted by placing electrodes on the scalp (sEEG) or inside the cranium (iEEG). EEG signals are non-linear and non-stationary, so it is difficult to interpret them manually. In order to minimize mistakes during manual interpretation, various methods have been developed to analyse EEG signals in epilepsy patients using digital signal processing. In this study, we proposed a fractal-based method to detect normal, pre-ictal, and seizure states from EEG signals. First, EEG signals were decomposed into five sub-bands: Alpha, Beta, Theta, Delta, and Gamma. The Katz fractal dimension (KFD) was calculated for each EEG signal sub-band to be used as features. A Support Vector Machine (SVM) with six different kernels was used as the classifier. The highest accuracy of 98.7% was achieved for three classes of EEG signal data.
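The Katz fractal dimension used as a feature here can be sketched in a few lines. This version assumes one common formulation, KFD = log10(n) / (log10(n) + log10(d / L)), with distances measured in the (sample index, amplitude) plane; the abstract does not say which distance convention the authors used, and the function name and test signals are hypothetical.

```python
import math


def katz_fractal_dimension(signal):
    """Katz fractal dimension of a 1-D signal sampled at unit time steps."""
    n = len(signal) - 1  # number of steps along the curve
    if n < 2:
        raise ValueError("need at least three samples")
    # L: total curve length, summed over successive points.
    length = sum(math.hypot(1.0, signal[i + 1] - signal[i]) for i in range(n))
    # d: largest distance from the first point to any later point.
    extent = max(math.hypot(i, signal[i] - signal[0]) for i in range(1, n + 1))
    return math.log10(n) / (math.log10(n) + math.log10(extent / length))


straight = katz_fractal_dimension([0.0, 1.0, 2.0, 3.0, 4.0])
zigzag = katz_fractal_dimension([0.0, 1.0] * 5)
```

A straight line yields a dimension of 1, while a rapidly oscillating signal yields a larger value, which is what makes the measure usable as a per-sub-band feature.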

11 citations


Cites background from "Cubic SVM Classifier Based Feature ..."

  • ...In multidimensional space, a hyperplane, which separates the classes into the best possible way [25], was found....

    [...]

Proceedings Article
01 Oct 2019
TL;DR: This study analyzed the use of the Higuchi and Katz fractal dimensions as a feature extraction method to detect interictal and ictal states in the EEG signal and found the proposed method able to distinguish the normal vs. ictal state with 100% accuracy.
Abstract: Epilepsy is a neurological disorder which may occur in every human being. The occurrence of seizures is the characteristic of this disorder. The International League Against Epilepsy (ILAE) mentioned that the diagnosis of epilepsy requires a minimum of two seizure events in 24 hours. A standard tool used by neurologists to diagnose epilepsy is the Electroencephalogram (EEG). Automatic seizure detection in the EEG signal may help them to identify the pattern of the ictal condition. Since the characteristics of the EEG signal are dynamic and nonstationary, it is very challenging to interpret the signal pattern. In this study, we analyzed the use of the Higuchi and Katz fractal dimensions as a feature extraction method to detect interictal and ictal states in the EEG signal. These two states are essential for seizure detection and prediction systems. The EEG signal was extracted into five frequency bands called delta, theta, alpha, beta, and gamma. Each frequency band shows a different characteristic of brain behavior in a specific condition. The extracted features were then fed into a support vector machine (SVM) to distinguish the normal state from the interictal and ictal states. The proposed method was able to distinguish the normal vs. ictal state with 100% accuracy. On the other hand, the best accuracy obtained for detecting the normal vs. interictal state was 99.5%.

11 citations


Cites methods from "Cubic SVM Classifier Based Feature ..."

  • ...By using the hyperplane, we able to separate the multidimensional space classes in the best possible way [32]....

    [...]

References
Journal Article
TL;DR: Findings on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions.
Abstract: Professional actors' portrayals of 14 emotions varying in intensity and valence were presented to judges. The results on decoding replicate earlier findings on the ability of judges to infer vocally expressed emotions with much-better-than-chance accuracy, including consistently found differences in the recognizability of different emotions. A total of 224 portrayals were subjected to digital acoustic analysis to obtain profiles of vocal parameters for different emotions. The data suggest that vocal parameters not only index the degree of intensity typical for different emotions but also differentiate valence or quality aspects. The data are also used to test theoretical predictions on vocal patterning based on the component process model of emotion (K.R. Scherer, 1986). Although most hypotheses are supported, some need to be revised on the basis of the empirical evidence. Discriminant analysis and jackknifing show remarkably high hit rates and patterns of confusion that closely mirror those found for listener-judges.

1,862 citations


"Cubic SVM Classifier Based Feature ..." refers background in this paper

  • ...The four basic classes of emotions analyzed in this paper are given as anger, happiness, sadness and neutral [7]....

    [...]

Proceedings Article
07 Nov 2005
TL;DR: Two classification methods, the hidden Markov model (HMM) and the support vector machine (SVM), are used to classify five emotional states: anger, happiness, sadness, surprise and a neutral state.
Abstract: Automatic emotion recognition in speech is a current research area with a wide range of applications in human-machine interactions. This paper uses two classification methods, the hidden Markov model (HMM) and the support vector machine (SVM), to classify five emotional states: anger, happiness, sadness, surprise and a neutral state. In the HMM method, 39 candidate instantaneous features were extracted, and the sequential forward selection (SFS) method was used to find the best feature subset. The classification performance of the selected feature subset was then compared with that of the Mel frequency cepstrum coefficients (MFCC). Within the method based on SVM, a new vector measuring the difference between Mel frequency scale sub-bands energies is proposed. The performance of the K-nearest neighbors (KNN) classifier using the proposed vector was also investigated. Both gender dependent and gender independent experiments were conducted on the Danish emotional speech (DES) database. The recognition rates by the HMM classifier were 98.9% for female subjects, 100% for male subjects, and 99.5% for gender independent cases. When the SVM classifier and the proposed feature vector were employed, correct classification rates of 89.4%, 93.6% and 88.9% were obtained for male, female and gender independent cases respectively.

215 citations


"Cubic SVM Classifier Based Feature ..." refers background in this paper

  • ...SVM finds a hyperplane in multidimensional space which divides the classes into best possible way [19]....

    [...]

Proceedings Article
15 Mar 1999
TL;DR: The proposed neural TRAPs are found to yield significant amount of complementary information to that of the conventional spectral feature based ASR system, which results in improved robustness to several types of additive and convolutive environmental degradations.
Abstract: We study a new approach to processing temporal information for automatic speech recognition (ASR). Specifically, we study the use of rather long-time temporal patterns (TRAPs) of spectral energies in place of the conventional spectral patterns for ASR. The proposed neural TRAPs are found to yield significant amount of complementary information to that of the conventional spectral feature based ASR system. A combination of these two ASR systems is shown to result in improved robustness to several types of additive and convolutive environmental degradations.

206 citations


"Cubic SVM Classifier Based Feature ..." refers background in this paper

  • ...The standard pitch value for females varies from 165Hz to 255Hz while in the case of male voice it varies from 80 to 180 Hz [3]....

    [...]

Journal Article
TL;DR: A feature selection method based on correlation analysis and Fisher is proposed, which can remove the redundant features that have close correlations with each other, which would make it possible to realize the interaction between speaker-independent and computer/robot in the future.

186 citations

Proceedings Article
22 Mar 2017
TL;DR: Mel Frequency Cepstral Coefficient (MFCC) technique is used to recognize emotion of a speaker from their voice and the efficiency was found to be about 80%.
Abstract: Speech is a complex signal consisting of various information, such as information about the message to be communicated, speaker, language, region, emotions etc. Speech Processing is one of the important branches of digital signal processing and finds applications in Human computer interfaces, Telecommunication, Assistive technologies, Audio mining, Security and so on. Speech emotion recognition is important to have a natural interaction between human being and machine. In speech emotion recognition, emotional state of a speaker is extracted from his or her speech. The acoustic characteristic of the speech signal is Feature. Feature extraction is the process that extracts a small amount of data from the speech signal that can later be used to represent each speaker. Many feature extraction methods are available and Mel Frequency Cepstral Coefficient (MFCC) is the commonly used method. In this paper, speaker emotions are recognized using the data extracted from the speaker voice signal. Mel Frequency Cepstral Coefficient (MFCC) technique is used to recognize emotion of a speaker from their voice. The designed system was validated for Happy, sad and anger emotions and the efficiency was found to be about 80%.
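MFCC features are built on the mel scale, which approximates how listeners perceive pitch. The abstract does not state which mel formula was used; the sketch below assumes the widely used 2595 * log10(1 + f / 700) form, and the function names are hypothetical.

```python
import math


def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels (2595 * log10(1 + f / 700) form)."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(low_hz, high_hz, count):
    """Frequencies (Hz) equally spaced on the mel scale, endpoints included."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (count - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(count)]


# Hypothetical filter-bank edges between 0 Hz and 4 kHz.
edges = mel_spaced_frequencies(0.0, 4000.0, 10)
```

Equal steps in mels translate to narrow spacing at low frequencies and wide spacing at high frequencies, which is the perceptual weighting an MFCC filter bank relies on.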

107 citations


"Cubic SVM Classifier Based Feature ..." refers background in this paper

  • ...Mel-frequency Cepstrum Coefficients (MFCC): MFCC gives high accuracy in the classification because this system considers human perception sensitivity regarding frequency for the recognition [15]....

    [...]