
Showing papers by "Samarendra Dandapat published in 2019"


Journal ArticleDOI
TL;DR: A novel method for classifying DME and two stages of AMD, namely drusen and choroidal neovascularization, from healthy optical coherence tomography (OCT) images using a convolutional neural network (CNN) for reliable diagnosis.

60 citations


Journal ArticleDOI
TL;DR: In terms of emotion classification rate, the proposed region switching based classification approach shows significant improvement over processing the entire active speech region, and it outperforms other state-of-the-art approaches on all three databases.
Abstract: In this work, a novel region switching based classification method is proposed for speech emotion classification using vowel-like regions (VLRs) and non-vowel-like regions (non-VLRs). In the literature, the entire active speech region is normally processed for emotion classification. A few studies have been performed on segmented sound units, such as syllables, phones, vowels, consonants and voiced regions, for speech emotion classification. This work presents a detailed analysis of the emotion information contained independently in segmented VLRs and non-VLRs. The proposed region switching based method is implemented by choosing the features of either VLRs or non-VLRs for each emotion. The VLRs are detected by identifying hypothesized VLR onset and end points. Segmentation of non-VLRs is done using the knowledge of VLRs and active speech regions. The performance is evaluated using the EMODB, IEMOCAP and FAU AIBO databases. Experimental results show that both the VLRs and non-VLRs contain emotion-specific information. In terms of emotion classification rate, the proposed region switching based classification approach shows significant improvement over processing the entire active speech region, and it outperforms other state-of-the-art approaches on all three databases.

46 citations


Journal ArticleDOI
TL;DR: A novel multiscale amplitude feature is proposed using multiresolution analysis (MRA) and the significance of the vocal tract is investigated for emotion classification from the speech signal and the proposed feature outperforms the other features.
Abstract: In this paper, a novel multiscale amplitude feature is proposed using multiresolution analysis (MRA), and the significance of the vocal tract is investigated for emotion classification from the speech signal. MRA decomposes the speech signal into a number of sub-band signals. The proposed feature is computed by applying a sinusoidal model to each sub-band signal. Different emotions have different impacts on the vocal tract; as a result, the vocal tract responds in a unique way to each emotion. The vocal tract information is enhanced using pre-emphasis, so that the emotion information manifested in the vocal tract can be well exploited. This may help in improving the performance of emotion classification. Emotion recognition is performed on the German emotional EMODB database, the interactive emotional dyadic motion capture (IEMOCAP) database, a simulated stressed speech database, and the FAU AIBO database, using both the speech signal and speech with enhanced vocal tract information (SEVTI). The performance of the proposed multiscale amplitude feature is compared with three different types of features: 1) the mel frequency cepstral coefficients; 2) the Teager energy operator (TEO) based feature (TEO-CB-Auto-Env); and 3) the breathiness feature. The proposed feature outperforms the other features. In terms of recognition rates, the features derived from the SEVTI signal give better performance than the features derived from the speech signal. Combining the features with the SEVTI signal gives an average recognition rate of 86.7% on the EMODB database.
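The pre-emphasis mentioned in this abstract is a standard first-order high-pass filter applied before analysis to boost the vocal tract (formant) information relative to the glottal source. A minimal sketch in Python, assuming the conventional coefficient of 0.97 (the paper's exact value is not given here):

```python
def pre_emphasis(x, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    Boosts high-frequency content, emphasizing the vocal tract (formant)
    structure. alpha = 0.97 is a conventional choice, not necessarily the
    paper's setting.
    """
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

# A constant (DC) signal is almost entirely suppressed after the first sample.
y = pre_emphasis([1.0, 1.0, 1.0, 1.0])
```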

40 citations


Proceedings ArticleDOI
01 Feb 2019
TL;DR: An automatic method for PMI detection using the 3-lead vectorcardiogram (3-lead VCG) signal is proposed, along with a cost-sensitive weighted support vector machine (WSVM) classifier to combat class imbalance, a common problem in real-world disease data classification.
Abstract: Myocardial infarction (MI), commonly known as heart attack, is a life-threatening condition that occurs due to insufficient oxygen supply to the heart tissues, resulting from the formation of clots in one or more coronary arteries. There is growing interest among researchers in the automatic detection of MI using computer algorithms. Based on the spatial location of the damaged tissues, MI is further categorized as anterior MI, septal MI, lateral MI, inferior MI and posterior MI. Among these, automatic detection of posterior MI (PMI) with the standard 12-lead electrocardiogram (12-lead ECG) is challenging, as none of its monitoring electrodes are placed on the posterior of the human body. In this paper, we propose an automatic method for PMI detection using the 3-lead vectorcardiogram (3-lead VCG) signal. The proposed approach exploits changes in the electrical conduction properties of heart tissues during cardiac activity for healthy control (HC) and PMI subjects in three-dimensional (3D) space. To quantify these changes, multiscale eigen features (MSEF) of subband matrices are used. Furthermore, we propose a cost-sensitive weighted support vector machine (WSVM) classifier to combat class imbalance, a common problem in real-world disease data classification. The publicly available PhysioNet/PTBDB diagnostic database has been used to validate the proposed method, using a total of 1463 HC and 148 PMI 4-s 3-lead VCG signals. The best test accuracy of 96.69%, sensitivity of 80%, and geometric mean of 88.72% are achieved by the WSVM classifier with a radial basis function (RBF) kernel.
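One common way to realize the cost-sensitive weighting in a WSVM is to scale each class's misclassification penalty inversely with its frequency, so the minority class (here PMI) carries a larger cost. A minimal sketch of that inverse-frequency ("balanced") scheme; the paper's exact weighting is an assumption and not reproduced here:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: w_c = n_samples / (n_classes * n_c).

    Rare classes receive proportionally larger misclassification costs,
    which is the general idea behind a cost-sensitive WSVM (the exact
    scheme used in the paper may differ).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

# Class ratio mimicking the paper's data: 1463 HC vs 148 PMI records.
weights = balanced_class_weights(["HC"] * 1463 + ["PMI"] * 148)
```

These weights would typically scale the SVM's per-class penalty (e.g. C_c = C * w_c), so here misclassifying a PMI record costs roughly ten times more than misclassifying an HC record.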

18 citations


Journal ArticleDOI
TL;DR: An extended logistic regression-HSMM algorithm using the proposed duration model is presented for heart sound segmentation, and a total variation filter is used to attenuate noise and emphasize the fundamental heart sounds.

15 citations


Journal ArticleDOI
TL;DR: The proposed diagnostic-information-based SR achieves computational time efficiency without compromising the high-resolution (HR) reconstruction accuracy of the fundus image zones.

15 citations


Journal ArticleDOI
TL;DR: The vocal tract constriction feature, peak to side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection.
Abstract: The presence of hypernasality in repaired cleft palate (CP) speech is a consequence of velopharyngeal insufficiency. The coupling of the nasal tract with the oral tract adds nasal formant and antiformant pairs to the hypernasal speech spectrum. This addition deviates the spectral and linear prediction (LP) residual characteristics of hypernasal speech from those of normal speech. In this work, the vocal tract constriction feature, the peak to side-lobe ratio feature, and spectral moment features augmented by low-order cepstral coefficients are used to capture the spectral and residual deviations for hypernasality detection. The first feature captures the prominence of low frequencies in speech due to the presence of nasal formants, the second captures the undesirable signal components in the residual signal due to the nasal antiformants, and the third captures information about the formants and antiformants in the spectrum along with the spectral envelope. The combination of the three features gives normal versus hypernasal speech detection accuracies of 87.76%, 91.13%, and 93.70% for the /a/, /i/, and /u/ vowels, respectively, and hypernasality severity detection accuracies of 80.13% and 81.25% for the /i/ and /u/ vowels, respectively. The speech data were collected from 30 normal control children and 30 children with repaired CP, aged between 7 and 12 years.
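Spectral moment features are conventionally the low-order statistical moments (centroid, spread, skewness, kurtosis) of the magnitude spectrum treated as a probability distribution over frequency. A minimal sketch under that conventional definition; frame settings and normalization details are assumptions:

```python
def spectral_moments(freqs, mags):
    """First four spectral moments of a magnitude spectrum.

    The magnitudes are normalized into a probability distribution over
    the frequency bins; the moments are then the distribution's mean
    (centroid), standard deviation (spread), skewness and kurtosis.
    """
    total = sum(mags)
    p = [m / total for m in mags]
    centroid = sum(f * pi for f, pi in zip(freqs, p))
    var = sum((f - centroid) ** 2 * pi for f, pi in zip(freqs, p))
    sd = var ** 0.5
    skew = sum((f - centroid) ** 3 * pi for f, pi in zip(freqs, p)) / sd ** 3
    kurt = sum((f - centroid) ** 4 * pi for f, pi in zip(freqs, p)) / sd ** 4
    return centroid, sd, skew, kurt

# A spectrum symmetric about 200 Hz has centroid 200 Hz and zero skewness.
c, sd, sk, ku = spectral_moments([100.0, 200.0, 300.0], [1.0, 2.0, 1.0])
```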

8 citations


Journal ArticleDOI
TL;DR: This work proposes an objective measure of sentence-level intelligibility by combining the information of articulation deficits and hypernasality, and shows a significant correlation between the predicted and perceptual intelligibility scores.
Abstract: Assessment of intelligibility is required to characterize the overall speech production capability and to measure the speech outcome of different interventions for individuals with cleft lip and palate (CLP). Researchers have found that articulation errors and hypernasality have a significant effect on the degradation of CLP speech intelligibility. Motivated by this finding, the present work proposes an objective measure of sentence-level intelligibility that combines information about articulation deficits and hypernasality. These two speech disorders represent different aspects of CLP speech; hence, a composite measure based on them is expected to utilize complementary clinical information. The objective scores of articulation and hypernasality are used as features to train a regression model, and the output of the model is taken as the predicted intelligibility score. Analysis based on Spearman's correlation coefficient shows a significant correlation between the predicted and perceptual intelligibility scores (ρ = 0.77, p < 0.001).
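The reported ρ is Spearman's rank correlation: Pearson's correlation computed on the ranks of the scores, which makes it insensitive to monotonic rescaling of either score. A self-contained sketch (the score values below are illustrative, not the paper's data):

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical predicted vs. perceptual intelligibility scores (illustrative only);
# a perfectly monotonic relationship gives rho = 1.
rho = spearman([0.2, 0.5, 0.7, 0.9], [1.0, 2.0, 3.0, 4.0])
```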

4 citations


Proceedings ArticleDOI
15 Sep 2019
TL;DR: The analysis of NAE distorted fricatives shows that the maximum spectral density is concentrated in the lower frequency range with steep positive skewness, and that the trajectories of the peak ERBN-number show more variation than in normal fricatives.
Abstract: Cleft lip and palate (CLP) is a congenital disorder of the orofacial region. Nasal air emission (NAE) in CLP speech occurs due to the presence of velopharyngeal dysfunction (VPD), and it mostly occurs in the production of fricative sounds. The objective of the present work is to study the acoustic characteristics of voiceless sibilant fricatives in Kannada distorted by NAE and to develop an SVM-based classifier to distinguish normal fricatives from NAE distorted fricatives. Static spectral measures, such as spectral moments, are used to analyze the deviant spectral distribution of NAE distorted fricatives. As the aerodynamic parameters are deviated due to VPD, the temporal variation of spectral characteristics might also be deviated in NAE distorted fricatives. This variation is studied using the peak equivalent rectangular bandwidth (ERBN) number, a psychoacoustic measure, to analyze the temporal variation in the spectral properties of fricatives. The analysis of NAE distorted fricatives shows that the maximum spectral density is concentrated in the lower frequency range with steep positive skewness, and that the trajectories of the peak ERBN-number show more variation than in normal fricatives. The proposed SVM-based classifier achieves good detection rates in discriminating NAE distorted fricatives from normal fricatives.
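The peak ERBN-number trajectory tracks, frame by frame, the position of the strongest spectral peak on the ERB-rate scale of Glasberg and Moore. A minimal sketch of the Hz-to-ERBN-number mapping and the per-frame peak picking (frame-level details such as windowing are assumptions):

```python
import math

def erbn_number(f_hz):
    """Glasberg & Moore ERB-rate scale: ERBN-number = 21.4 * log10(4.37 f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def peak_erbn(freqs, mags):
    """ERBN-number of the frequency bin with the largest magnitude in one frame."""
    i = max(range(len(mags)), key=lambda k: mags[k])
    return erbn_number(freqs[i])

# The scale is compressive: equal steps in Hz shrink in ERBN-number at high f.
e1k = erbn_number(1000.0)
```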

3 citations


Proceedings ArticleDOI
15 Sep 2019
TL;DR: In this work, detection of hypernasality severity in cleft palate speech is attempted using the constant Q cepstral coefficients (CQCC) feature, which gives overall classification accuracies of 83.33% and 78.47% for the /i/ and /u/ vowels, respectively, in classifying normal, mild and moderate-severe hypernasal speech using a multiclass support vector classifier.
Abstract: In this work, detection of hypernasality severity in cleft palate speech is attempted using the constant Q cepstral coefficients (CQCC) feature. The coupling of the nasal tract with the oral tract during the production of hypernasal speech adds nasal formants and anti-formants in the low frequency region of the vowel spectrum, mainly around the first formant. The strength and position of the nasal formants and anti-formants, along with the oral formants, change as the severity of nasality changes in hypernasal speech. The CQCC feature is extracted from the constant Q transform (CQT) spectrum, which employs geometrically spaced frequency bins and maintains a constant Q factor across the entire spectrum. This results in higher frequency resolution at lower frequencies and higher temporal resolution at higher frequencies. The CQT spectrum resolves the nasal and oral formants at low frequencies and captures the spectral changes due to changes in nasality severity. The CQCC feature gives overall classification accuracies of 83.33% and 78.47% for the /i/ and /u/ vowels, respectively, in classifying normal, mild and moderate-severe hypernasal speech using a multiclass support vector classifier.
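The constant-Q geometry described above can be made concrete: with B bins per octave, the centre frequencies are f_k = f_min * 2^(k/B), and every bin shares the Q factor 1/(2^(1/B) - 1). A minimal sketch; f_min and B here are illustrative choices, not the paper's settings:

```python
def cqt_frequencies(f_min, n_bins, bins_per_octave=12):
    """Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

def q_factor(bins_per_octave=12):
    """Constant ratio of centre frequency to bandwidth: Q = 1 / (2**(1/B) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

# 13 bins span exactly one octave from an illustrative f_min of 100 Hz.
freqs = cqt_frequencies(100.0, 13)
q = q_factor()
```

Because the bin spacing is geometric rather than linear, bins crowd together at low frequencies, which is what lets the CQT resolve closely spaced nasal and oral formants near the first formant.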

3 citations


Book ChapterDOI
01 Jan 2019
TL;DR: A device is designed to separate nasal murmur from oral speech when nasalised speech is spoken, and it is found that the nasal murmur produced is invariant across nasalised vowels, and so is the nasal tract.
Abstract: Nasalised speech is present in almost every language across the globe. Our work is motivated by the fact that nasalised speech detection can improve speech recognition systems. To analyse nasalised speech better, we have designed a device to separate nasal murmur from oral speech when nasalised speech is spoken. Speech data from different speakers are collected and analysed. Nasalised vowels are analysed first, and it is found that an additional formant is consistently introduced between 1000 and 1500 Hz. Using various signal processing techniques, we analysed different nasalised vowels and found that the nasal murmur produced is invariant across the nasalised vowels, and so is the nasal tract. Nasalisation is produced in speech by the coupling of the nasal tract with the oral tract. When the effect of this coupling was analysed experimentally, it was found to be additive.

Book ChapterDOI
17 Dec 2019
TL;DR: This work focuses on analysing the changes in speech source signals under physical exercise induced out-of-breath conditions and a new database is recorded for sustained vowel phonations (SVPs).
Abstract: This work focuses on analysing the changes in speech source signals under physical-exercise-induced out-of-breath conditions. A new database of sustained vowel phonations (SVPs) is recorded, with electroglottogram (EGG) and speech signals captured in two simultaneous channels. Morphological changes in the EGG signal are analysed using a set of five temporal features related to glottal opening and closing. Two source signals, the zero frequency filtered (ZFF) signal and the integrated linear prediction residual (ILPR), are estimated from the speech signal, and their respective spectra are analysed using a harmonic peak based feature. The EGG based features indicate changes in the vibration pattern of the vocal folds, while the analysis of the ZFF and ILPR signals shows that they too change from the normal condition. The classification performance of these features is evaluated using support vector machine (SVM) and k-nearest neighbour (KNN) classifiers. Accuracies of nearly 70% are obtained with both classifiers for the EGG and speech source signals.

Journal ArticleDOI
TL;DR: This work explores and evaluates a new technique for improving the spatial resolution of a low-resolution ECG by integrating sparse coding and a joint dictionary learning framework, and shows that the proposed model captures the diagnostic content of the spatially enhanced ECG more effectively than existing models.

Book ChapterDOI
17 Dec 2019
TL;DR: An inverse filtering based technique is used to develop a novel feature representing the amount of nasalization present in a vowel; the feature shows good separability between oral vowels and nasalized vowels.
Abstract: Vowel nasalization is present in almost every Indic language. Detection of vowel nasalization can enhance the accuracy of Automatic Speech Recognition (ASR) systems designed for Indian languages, and it also provides significant clinical information about the vocal tract. In pursuit of developing acoustic parameters for the detection of nasalized vowels, most researchers have extensively analyzed their spectral domain characteristics. In this work, we have used an inverse filtering based technique to develop a novel feature that represents the amount of nasalization present in a vowel. The invariability of the nasal filter across different nasalized vowels, and the addition of oral and nasal speech after radiation, have been exploited to derive this feature. As the feature gives information about the amount of nasalization, it can be used for the detection of vowel nasalization as well as for clinical purposes. Statistical analysis of the feature, carried out in this work, shows that it has good separability between oral vowels and nasalized vowels.
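The generic inverse filtering step behind such features is linear prediction (LP) analysis: estimate an all-pole vocal tract model from the autocorrelation via the Levinson-Durbin recursion, then pass the signal through the inverse (all-zero) filter to obtain the residual. A minimal sketch of that standard step only; the paper's nasalization feature itself is not reproduced here:

```python
def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag]."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin: LP coefficients a with x[n] ~ sum_k a[k] * x[n-1-k]."""
    a, err = [], r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a

def lp_residual(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(min(len(a), n)))
            for n in range(len(x))]

# An AR(1) decay x[n] = 0.5 * x[n-1] is whitened almost perfectly by order-1 LP:
# the estimated coefficient approaches 0.5 and the residual is near zero after n=0.
x = [0.5 ** n for n in range(200)]
a = levinson(autocorr(x, 1), 1)
e = lp_residual(x, a)
```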

Book ChapterDOI
01 Jan 2019
TL;DR: A new stressed speech database is recorded by considering emergency, breathy and pathological conditions and the results show that these recorded stress classes are effectively characterized by the features.
Abstract: Recently, man-machine interaction based on speech recognition has attracted increasing interest in the field of speech processing. The need for machines to understand human stress levels in a speaker-independent manner, in order to prioritize situations, has grown rapidly. A number of databases have been used for stressed speech recognition; the majority contain styled emotions and Lombard speech. No studies have been reported on stressed speech considering other stress conditions such as emergency, breathy, workload, sleep deprivation and pathological conditions. In this work, a new stressed speech database is recorded covering emergency, breathy and pathological conditions. The database is validated through statistical analysis using two features, the mel-frequency cepstral coefficients (MFCC) and the Fourier parameters (FP). The results show that the recorded stress classes are effectively characterized by these features. A fivefold cross-validation is carried out to verify that the statistical analysis results are independent of the dataset. A support vector machine (SVM) is used to classify the different stress classes.