
Showing papers on "Cepstrum published in 2014"


Journal ArticleDOI
TL;DR: In this paper, the squared envelope spectrum (SES) and the kurtosis of the corresponding band-pass filtered analytic signal were analyzed for the diagnostics of bearing failures.

187 citations


Journal ArticleDOI
TL;DR: In this article, a response-only structural health monitoring technique that utilizes cepstrum analysis and artificial neural networks for the identification of damage in civil engineering structures is presented. But the method is limited to a single excitation.
Abstract: This article presents a response-only structural health monitoring technique that utilises cepstrum analysis and artificial neural networks for the identification of damage in civil engineering structures. The method begins by applying cepstrum-based operational modal analysis, which separates source and transmission path effects to determine the structure’s frequency response functions from response measurements only. Principal component analysis is applied to the obtained frequency response functions to reduce the data size, and structural damage is then detected using a two-stage ensemble of artificial neural networks. The proposed method is verified both experimentally and numerically using a laboratory two-storey framed structure and a finite element representation, both subjected to a single excitation. The laboratory structure is tested on a large-scale shake table generating ambient loading of Gaussian distribution. In the numerical investigation, the same input is applied to the finite model, but...

64 citations


Journal ArticleDOI
David Siegel1, Wenyu Zhao1, Edzel Lapira1, Mohamed AbuAli1, Jay Lee1 
TL;DR: In this paper, a full-scale baseline wind turbine drive train and a drive train with several gear and bearing failures are tested at the National Renewable Energy Laboratory (NREL) dynamometer test cell during the NREL Gear Reliability Collaborative Round Robin study.
Abstract: The ability to detect and diagnose incipient gear and bearing degradation can offer substantial improvements in reliability and availability of the wind turbine asset. Considering the motivation for improved reliability of the wind turbine drive train, numerous research efforts have been conducted using a vast array of vibration-based algorithms. Despite these efforts, the techniques are often evaluated on smaller-scale test-beds, and existing studies do not provide a detailed comparison between the various vibration-based condition monitoring algorithms. This study evaluates a multitude of methods, including frequency domain and cepstrum analysis, time synchronous averaging narrowband and residual methods, bearing envelope analysis and spectral kurtosis-based methods. A full-scale baseline wind turbine drive train and a drive train with several gear and bearing failures are tested at the National Renewable Energy Laboratory (NREL) dynamometer test cell during the NREL Gear Reliability Collaborative Round Robin study. A tabular set of results is presented to highlight the ability of each algorithm to accurately detect the bearing and gear wheel component health. The results highlight that the cepstrum and the narrowband phase modulation signal were effective methods for diagnosing gear tooth problems, whereas bearing envelope analysis could confidently detect most of the bearing-related failures. Copyright © 2013 John Wiley & Sons, Ltd.

61 citations


Journal ArticleDOI
TL;DR: Speech processing has vast applications in voice dialing, telephone communication, call routing, domestic appliances control, Speech to Text conversion, Text to Speech conversion, lip synchronization, automation systems etc.
Abstract: The automatic recognition of speech means enabling a natural and easy mode of communication between human and machine. Speech processing has vast applications in voice dialing, telephone communication, call routing, domestic appliance control, Speech to Text conversion, Text to Speech conversion, lip synchronization, automation systems, etc. Here we discuss some widely used feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) analysis, Dynamic Time Warping (DTW), Relative Spectra Processing (RASTA) and Zero Crossings with Peak Amplitudes (ZCPA). Some techniques, like RASTA and MFCC, consider the nature of speech while extracting features, whereas LPC predicts future features based on previous ones.
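The MFCC pipeline mentioned above (windowed frame, FFT, mel filterbank, log, DCT) can be sketched in NumPy. This is a generic illustration, not any paper's exact implementation; the frame length, filterbank size and the 440 Hz test tone are arbitrary choices:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """MFCC of a single frame: window -> power spectrum -> mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters spaced uniformly on the mel scale between 0 Hz and Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    energies = np.maximum(fbank @ spectrum, 1e-10)            # avoid log(0)
    return dct(np.log(energies), type=2, norm='ortho')[:n_ceps]

# Example: a 25 ms frame of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(400) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
```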

53 citations


Journal ArticleDOI
TL;DR: A new constrained clustering algorithm is proposed that satisfies as many constraints as possible while optimizing the clustering objective, and is compared with other state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech.
Abstract: Multi-pitch analysis of concurrent sound sources is an important but challenging problem. It requires estimating pitch values of all harmonic sources in individual frames and streaming the pitch estimates into trajectories, each of which corresponds to a source. We address the streaming problem for monophonic sound sources. We take the original audio, plus frame-level pitch estimates from any multi-pitch estimation algorithm as inputs, and output a pitch trajectory for each source. Our approach does not require pre-training of source models from isolated recordings. Instead, it casts the problem as a constrained clustering problem, where each cluster corresponds to a source. The clustering objective is to minimize the timbre inconsistency within each cluster. We explore different timbre features for music and speech. For music, harmonic structure and a newly proposed feature called uniform discrete cepstrum (UDC) are found effective, while for speech, MFCC and UDC work well. We also show that timbre consistency is insufficient for effective streaming. Constraints are imposed on pairs of pitch estimates according to their time-frequency relationships. We propose a new constrained clustering algorithm that satisfies as many constraints as possible while optimizing the clustering objective. We compare the proposed approach with other state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech. Better or comparable results are shown.

47 citations



Proceedings ArticleDOI
10 Apr 2014
TL;DR: A mel-frequency cepstral coefficient based feature extraction scheme is proposed for the classification of electromyography (EMG) signals into normal and a neuromuscular disease, namely amyotrophic lateral sclerosis.
Abstract: In this paper, mel-frequency cepstral coefficient (MFCC) based feature extraction scheme is proposed for the classification of electromyography (EMG) signal into normal and a neuromuscular disease, namely the amyotrophic lateral sclerosis (ALS). Instead of employing the MFCC directly on EMG data, it is employed on the motor unit action potentials (MUAPs) extracted from the EMG signal via template matching based decomposition technique. Unlike conventional MUAP based methods, only one MUAP with maximum dynamic range is selected for MFCC based feature extraction. First few MFCCs corresponding to the selected MUAP are used as the desired feature, which not only reduces computational burden but also offers better feature quality with high within class compactness and between class separation. For the purpose of classification, the K-nearest neighborhood (KNN) classifier is employed. Extensive analysis is performed on clinical EMG database and it is found that the proposed method provides a very satisfactory performance in terms of specificity, sensitivity, and overall classification accuracy.

34 citations


Proceedings Article
01 Jan 2014
TL;DR: A comparative study on the performance of features extracted from the magnitude spectrum, cepstrum and phase derivatives such as group-delay function (GDF) and instantaneous frequency deviation (IFD) for classifying the playing techniques of electric guitar recordings shows that sparse coding is an effective means of mining useful patterns from the primitive time-frequency representations.
Abstract: Automatic recognition of guitar playing techniques is challenging as it is concerned with subtle nuances of guitar timbres. In this paper, we investigate this research problem by a comparative study on the performance of features extracted from the magnitude spectrum, cepstrum and phase derivatives such as group-delay function (GDF) and instantaneous frequency deviation (IFD) for classifying the playing techniques of electric guitar recordings. We consider up to 7 distinct playing techniques of electric guitar and create a new individual-note dataset comprising 7 types of guitar tones for each playing technique. The dataset contains 6,580 clips and 11,928 notes. Our evaluation shows that sparse coding is an effective means of mining useful patterns from the primitive time-frequency representations and that combining the sparse representations of logarithm cepstrum, GDF and IFD leads to the highest average F-score of 71.7%. Moreover, from analyzing the confusion matrices we find that cepstral and phase features are particularly important in discriminating highly similar techniques such as pull-off, hammer-on and bending. We also report a preliminary study that demonstrates the potential of the proposed methods in automatic transcription of real-world electric guitar solos.

32 citations


Proceedings ArticleDOI
27 Oct 2014
TL;DR: Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN, and fusion is implicitly accomplished by ensemble classification.
Abstract: Security (and cyber security) is an important issue in existing and developing technology. It is imperative that cyber security go beyond password based systems to avoid criminal activities. A human biometric and emotion based recognition framework implemented in parallel can enable applications to access personal or public information securely. The focus of this paper is on the study of speech based emotion recognition using a pattern recognition paradigm with spectral feature extraction and an ensemble of k nearest neighbor (kNN) classifiers. The five spectral features are the linear predictive cepstrum (CEP), mel frequency cepstrum (MFCC), line spectral frequencies (LSF), adaptive component weighted cepstrum (ACW) and the post-filter cepstrum (PFL). The bagging algorithm is used to train the ensemble of kNNs. Fusion is implicitly accomplished by ensemble classification. The LDC emotional prosody speech database is used in all the experiments. Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN.
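The bagging-of-kNNs fusion described above can be illustrated on toy data. This is a minimal hand-rolled sketch; the spectral features (CEP, MFCC, LSF, ACW, PFL) and the LDC emotional prosody corpus are not reproduced here, and the two-class Gaussian blobs are an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_predict(train_X, train_y, X, k=3):
    """Plain kNN: majority label among the k nearest training points."""
    preds = []
    for x in X:
        d = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(d)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def bagged_knn_predict(train_X, train_y, X, n_estimators=2, k=3):
    """Bagging: each kNN is trained on a bootstrap resample;
    fusion is implicit via majority vote over the ensemble."""
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(train_X), size=len(train_X))
        votes.append(knn_predict(train_X[idx], train_y[idx], X, k))
    votes = np.array(votes)
    return np.array([np.bincount(v).argmax() for v in votes.T])

# Toy two-class data: two well-separated Gaussian blobs
X0 = rng.normal(0.0, 0.5, (40, 2))
X1 = rng.normal(3.0, 0.5, (40, 2))
train_X = np.vstack([X0, X1])
train_y = np.array([0] * 40 + [1] * 40)
test_X = np.array([[0.0, 0.0], [3.0, 3.0]])
pred = bagged_knn_predict(train_X, train_y, test_X)
```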

32 citations


Journal ArticleDOI
TL;DR: A robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression that exhibits great robustness in various noise conditions is proposed.

30 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: Several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed to improve robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models.
Abstract: This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences in neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the part dedicated to speech recognition, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.

Proceedings ArticleDOI
06 Nov 2014
TL;DR: The results show that the proposed fusion method can achieve promising heart rate measurement accuracy and robustness against various sensor contact conditions.
Abstract: This paper presents a method of estimating heart rate from arrays of fiber Bragg grating (FBG) sensors embedded in a mat. A cepstral domain signal analysis technique is proposed to characterize Ballistocardiogram (BCG) signals. With this technique, the average heart beat intervals can be estimated by detecting the dominant peaks in the cepstrum, and the signals of multiple sensors can be fused together to obtain higher signal to noise ratio than each individual sensor. Experiments were conducted with 10 human subjects lying on 2 different postures on a bed. The estimated heart rate from BCG was compared with heart rate ground truth from ECG, and the mean error of estimation obtained is below 1 beat per minute (BPM). The results show that the proposed fusion method can achieve promising heart rate measurement accuracy and robustness against various sensor contact conditions.

Journal ArticleDOI
TL;DR: An approach based on the transformation of the Cepstral domain on Hidden Markov Model, which is employed for the automatic diagnosis of the Obstructive Sleep Apnea syndrome, which includes an Electrocardiogram artefacts removal and R wave detection stage is presented.

Journal ArticleDOI
TL;DR: A hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis is presented that achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms.
Abstract: Fundamental frequency (F0) is one of the essential features in many acoustic related applications. Although numerous F0 detection algorithms have been developed, the detection accuracy in noisy environments still needs improvement. We present a hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve 20% to 35% GPE rate for speech and 12% to 39% GPE rate for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform.
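The cepstral component of such an F0 detector reduces to peak picking in the quefrency domain. A minimal sketch follows (plain cepstral analysis only; BaNa's harmonic-ratio candidates and Viterbi tracking are not reproduced, and the frame length and F0 search range are arbitrary choices):

```python
import numpy as np

def cepstral_f0(x, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 as the dominant peak of the real cepstrum
    within the quefrency range corresponding to [fmin, fmax]."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
    q_lo = int(sr / fmax)   # shortest pitch period of interest (samples)
    q_hi = int(sr / fmin)   # longest pitch period of interest
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return sr / peak

# A synthetic harmonic-rich 200 Hz tone with 1/h amplitude roll-off
sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * (h + 1) * t) / (h + 1) for h in range(5))
f0 = cepstral_f0(x, sr)
```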

Journal ArticleDOI
01 Jan 2014-Optik
TL;DR: In this article, a modified cepstrum domain approach combined with bit-plane slicing method is proposed to estimate uniform motion blur parameters, which improves the accuracy without any manual intervention.

Proceedings ArticleDOI
22 Dec 2014
TL;DR: An artificial ear is proposed that has high resolution in the high-frequency region and low resolution where the frequency is low; it can virtually hear the effect of steganography and distinguish between stego and clean audio signals.
Abstract: Some of the previous audio steganalysis systems have suggested features based on human auditory system models. In contrast, this paper exploits the idea of maximum deviation from the human auditory system to suggest an efficient audio steganalysis scheme. Based on this idea, an artificial ear is considered that has high resolution in the high-frequency region and low resolution where the frequency is low. Simulation results show that this artificial ear can virtually hear the effect of steganography and distinguish between stego and clean audio signals. The proposed method achieves accuracies of 93% (StegHide@1.563% BPB) and 97% (Hide4Pgp@6.25% BPB), which are 16% and 12% higher than previous MFCC based methods. Keywords: Audio Steganalysis, Audio Steganography, Mel Cepstrum, Reversed Mel Cepstrum, Human Auditory System.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed the Minimum Variance Cepstrum (MVC) estimator to estimate the time difference of arrival (TDOA) of sound waves between microphones.

Book ChapterDOI
01 Jan 2014
TL;DR: The goal and novelty of this work was the analysis of applicability of the parameters selectively used to assess the pathology.
Abstract: Present development of digital registration and methods of recorded voice processing are useful in detection of most pathologies and diseases of a human vocal tract. The recognition of the voice condition requires the creation of a model which is comprised of different acoustic parameters of speech signal. In this study a vector consisting of 31 parameters for analysing the speech signal was created. The speech parameters were extracted from time, frequency and cepstral domains. Using Principal Components Analysis the number of the parameters was reduced to 17. In order to validate the detection of the pathological voice signal, a tenfold cross-validation and confusion matrix were used. The goal and novelty of this work was the analysis of applicability of the parameters selectively used to assess the pathology.

Proceedings ArticleDOI
04 May 2014
TL;DR: Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than and complementary to traditional mel-frequency cepstral coefficients (MFCC).
Abstract: This paper introduces the use of three physiologically-motivated features for speaker identification, Residual Phase Cepstrum Coefficients (RPCC), Glottal Flow Cepstrum Coefficients (GLFCC) and Teager Phase Cepstrum Coefficients (TPCC). These features capture speaker-discriminative characteristics from different aspects of glottal source excitation patterns. The proposed physiologically-driven features give better results with lower model complexities, and also provide complementary information that can improve overall system performance even for larger amounts of data. Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than and complementary to traditional mel-frequency cepstral coefficients (MFCC). In particular, the incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

Proceedings ArticleDOI
08 May 2014
TL;DR: Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
Abstract: A new pitch detection scheme has been proposed based on the short-time autocorrelation function (ACF) and average magnitude difference function (AMDF). The performance of the proposed scheme has been evaluated, through simulation, in a complete speech analysis-synthesis system. For detection of pitch, local maxima of ACF and local minima of AMDF values are computed. To reduce computational complexity, the original speech signal is converted into a three level signal before computing ACF and AMDF. Synthesized speech quality, computational complexity and time taken during simulation are the parameters that have been considered while comparing this system with the analysis-synthesis systems that use autocorrelation, cepstrum and wavelet based pitch detection methods. Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
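The core of this scheme — three-level clipping followed by ACF maxima and AMDF minima — can be sketched as follows. This is an illustrative simplification: the clipping ratio and the way ACF and AMDF are fused into one score here are our own assumptions, not the paper's exact formulation:

```python
import numpy as np

def three_level(x, ratio=0.3):
    """Center-clip to {-1, 0, +1}: cheap to correlate, keeps periodicity."""
    thr = ratio * np.max(np.abs(x))
    return np.where(x > thr, 1.0, np.where(x < -thr, -1.0, 0.0))

def pitch_acf_amdf(x, sr, fmin=60.0, fmax=400.0):
    c = three_level(x)
    lo, hi = int(sr / fmax), int(sr / fmin)
    lags = np.arange(lo, hi)
    acf = np.array([np.sum(c[:-l] * c[l:]) for l in lags])
    amdf = np.array([np.mean(np.abs(c[:-l] - c[l:])) for l in lags])
    # ACF peaks and AMDF dips should coincide at the true period;
    # a simple fusion divides the ACF by the (offset) AMDF.
    score = acf / (amdf + 1e-6)
    return sr / lags[np.argmax(score)]

# 160 Hz tone plus a weaker octave component, 8 kHz sampling
sr = 8000
t = np.arange(1600) / sr
x = np.sin(2 * np.pi * 160 * t) + 0.3 * np.sin(2 * np.pi * 320 * t)
f0 = pitch_acf_amdf(x, sr)
```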

Proceedings ArticleDOI
04 May 2014
TL;DR: This paper presents a novel feature representation called sparse cepstral codes for instrument identification using the use of sparse coding and power normalization to derive compact codes that better represent the information of the cepstrum.
Abstract: This paper presents a novel feature representation called sparse cepstral codes for instrument identification. We first motivate the approach by discussing why cepstrum is suitable for instrument identification. Then we propose the use of sparse coding and power normalization to derive compact codes that better represent the information of the cepstrum. Our evaluation on both uni-source and multi-source instrument identification tasks shows that the proposed feature leads to significantly better accuracy than existing methods. We further show that cepstrum obtained from a power-scaled spectrum can do better than typical cepstrum, especially on multi-source signals. The proposed system achieves 0.955 F-score on the uni-source dataset and 0.688 F-score on the multi-source dataset.

Journal ArticleDOI
TL;DR: The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximate the converted speech signal with better accuracy than the model based on the Mel cepStrum envelope based voice Conversion model with objective and subjective evaluations.
Abstract: The complex cepstrum vocoder is used to modify the speaker specific characteristics of the source speaker speech to that of the target speaker speech. The low time and high time liftering are used to split the calculated cepstrum into the vocal tract and the source excitation parameters. The obtained mixed phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. The radial basis function is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. Radial basis function is used to capture and formulate the nonlinear relations between the Mel cepstrum envelope of the source and target speakers. A mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. A comparison of the proposed complex cepstrum based model has been made with the state-of-the-art Mel Cepstrum Envelope based voice conversion model with objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal with better accuracy than the model based on the Mel cepstrum envelope.
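The low-time/high-time liftering step can be illustrated with the real cepstrum. This is a simplified sketch: the paper uses the complex cepstrum to preserve phase, which is not reproduced here, and the cutoff of 30 quefrency samples is an arbitrary choice:

```python
import numpy as np

def lifter_split(x, cutoff):
    """Split a frame's real cepstrum into low-time (vocal-tract envelope)
    and high-time (source excitation) parts with rectangular lifters."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    ceps = np.fft.irfft(np.log(spectrum + 1e-12))
    low = ceps.copy()
    low[cutoff:-cutoff] = 0.0     # keep low quefrency (both symmetric ends)
    high = ceps - low             # remainder: excitation-related rahmonics
    return low, high

# Harmonic-rich 150 Hz source, 64 ms frame at 16 kHz
sr = 16000
t = np.arange(1024) / sr
x = sum(np.sin(2 * np.pi * 150 * (h + 1) * t) for h in range(50))
low, high = lifter_split(x, cutoff=30)
# The high-time part should retain the rahmonic at the 150 Hz pitch period
peak_q = np.argmax(high[60:200]) + 60   # ~ sr/150 = 106.7 samples
```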

Proceedings ArticleDOI
09 Jan 2014
TL;DR: The proposed work presents automatic classification of Indian Classical instruments based on spectral and MFCC features using a well-trained back propagation neural network classifier, with Principal Component Analysis used to select the most important coefficients.
Abstract: In applications such as music information and database retrieval systems, classification of musical instruments plays an important role. The proposed work presents automatic classification of Indian Classical instruments based on spectral and MFCC features using a well-trained back propagation neural network classifier. Musical instruments such as Harmonium, Santoor and Tabla are considered for experimentation. Spectral features such as amplitude and spectral range, along with Mel Frequency Cepstrum Coefficients, are used as features. Since the features are not well distinguished, classification is done using non-parametric classifiers such as neural networks. Since the number of cepstrum coefficients is large, the important coefficients are selected using Principal Component Analysis. It has been observed that, using 42 samples for training and 18 for testing, the back propagation neural network provides an accuracy of 98%. The present work can be extended to more Hindustani and Carnatic classical musical instruments.

Book ChapterDOI
01 Jan 2014
TL;DR: In this article, the use of the cepstrum for removing components from a signal which manifest themselves as periodic spectral components has been described, including discrete frequency components with uniform spacing such as families of harmonics and modulation sidebands, but also narrow band noise peaks coming from slight random modulation of almost periodic signals.
Abstract: The use of the cepstrum for removing components from a signal which manifest themselves as periodic spectral components has previously been described. These include discrete frequency components with uniform spacing such as families of harmonics and modulation sidebands, but also narrow band noise peaks coming from slight random modulation of almost periodic signals, such as higher harmonics of blade pass frequencies. The removal is effected by applying a notch “lifter” to the real cepstrum of the signal, thus removing the targeted components from the log amplitude spectrum, and then combining the modified amplitude spectrum with the original phase spectrum. Not much attention was previously paid to the type of notch lifter, but two different situations occurring in conjunction with analysis of signals from wind turbines showed that different lifters have advantages in different situations. This chapter describes two different approaches, illustrating them with the two examples of application.
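The editing procedure described in this chapter can be sketched directly: notch the rahmonics out of the real cepstrum, then recombine the edited log-amplitude spectrum with the original phase. Below is a minimal rectangular-notch illustration; the notch width and the impulse-train test signal are arbitrary choices, and as the chapter notes, the choice of lifter matters in practice:

```python
import numpy as np

def notch_lifter_edit(x, period, width=2):
    """Remove a family of uniformly spaced spectral components by notching
    their rahmonics out of the real cepstrum, then recombining the edited
    log-amplitude spectrum with the original phase spectrum."""
    n = len(x)
    X = np.fft.fft(x)
    log_amp = np.log(np.abs(X) + 1e-12)
    phase = np.angle(X)
    ceps = np.fft.ifft(log_amp).real
    q = period
    while q < n // 2:
        lo, hi = q - width, q + width + 1
        ceps[lo:hi] = 0.0                       # notch the rahmonic
        ceps[n - hi + 1:n - lo + 1] = 0.0       # mirror notch (cepstrum symmetry)
        q += period
    edited_amp = np.exp(np.fft.fft(ceps).real)
    return np.fft.ifft(edited_amp * np.exp(1j * phase)).real

# Impulse train (period 64 samples -> harmonics every 16 bins) plus weak noise
rng = np.random.default_rng(1)
x = np.zeros(1024)
x[::64] = 1.0
x += 0.01 * rng.standard_normal(1024)
y = notch_lifter_edit(x, period=64)
```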


Proceedings ArticleDOI
04 May 2014
TL;DR: The proposed uniform discrete cepstrum (UDC) uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points, and significantly outperforms all the other cepstral representations.
Abstract: We propose a novel cepstral representation called the uniform discrete cepstrum (UDC) to represent the timbre of sound sources in a sound mixture. Different from ordinary cepstrum and MFCC which have to be calculated from the full magnitude spectrum of a source after source separation, UDC can be calculated directly from isolated spectral points that are likely to belong to the source in the mixture spectrum (e.g., non-overlapping harmonics of a harmonic source). Existing cepstral representations that have this property are the discrete cepstrum and the regularized discrete cepstrum; however, compared to the proposed UDC, they are not as effective and are more complex to compute. The key advantage of UDC is that it uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points. We derive the mathematical relations between these cepstral representations, and compare their timbre modeling performances in the task of instrument recognition in polyphonic audio mixtures. We show that UDC and its mel-scale variant MUDC significantly outperform all the other representations.
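The computational idea — obtaining a cepstrum-like feature directly from isolated spectral points, without the least-squares inversion that the discrete cepstrum requires — can be sketched as follows. This is our own simplified reading, not the paper's exact UDC definition; the cosine-basis transform, the normalization, and the synthetic harmonic example are all assumptions:

```python
import numpy as np

def udc_like(freqs, log_amps, n_ceps=20):
    """Cepstrum-like timbre feature from isolated spectral points.
    freqs: observed frequencies normalized to [0, 0.5] (fraction of sr).
    Sketch: evaluate a cosine basis at the observed points and apply its
    transpose directly (no matrix inversion, unlike the discrete cepstrum)."""
    k = np.arange(n_ceps)
    B = np.cos(2 * np.pi * np.outer(freqs, k))   # (n_points, n_ceps) basis
    return (B.T @ log_amps) / len(freqs)

# Observed: a few non-overlapping harmonics of a 200 Hz source at sr = 16 kHz
sr = 16000
harmonics = 200.0 * np.arange(1, 6)
freqs = harmonics / sr
log_amps = np.log(1.0 / np.arange(1, 6))         # 1/h amplitude roll-off
c = udc_like(freqs, log_amps)
```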

Proceedings ArticleDOI
13 Nov 2014
TL;DR: Experimental results depict that the RMFCC and low-variance spectrum-estimators-based robust feature extractors outperformed the MFCC, PNCC (power normalized cepstral coefficients), and ETSI-AFE features both in clean and multi-condition training conditions.
Abstract: This paper presents robust feature extractors for a continuous speech recognition task in matched and mismatched environments. The mismatched conditions may occur due to additive noise, different channel, and acoustic reverberation. In the conventional Mel-frequency cepstral coefficient (MFCC) feature extraction framework, a subband spectrum enhancement technique is incorporated to improve its robustness. We denote this front-end as robust MFCCs (RMFCC). Based on the gammatone and compressive gammachirp filter-banks, robust gammatone filterbank cepstral coefficients (RGFCC) and robust compressive gammachirp filterbank cepstral coefficients (RCGCC) are also presented for comparison. We also employ low-variance spectrum estimators such as multitaper and regularized minimum-variance distortionless response (RMVDR), instead of a discrete Fourier transform-based direct spectrum estimator, for improving robustness against mismatched environments. Speech recognition performances of the robust feature extractors are evaluated in clean as well as multi-style training conditions of the AURORA-4 continuous speech recognition task. Experimental results show that the RMFCC and low-variance spectrum-estimator-based robust feature extractors outperformed the MFCC, PNCC (power normalized cepstral coefficients), and ETSI-AFE features in both clean and multi-condition training conditions.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the information geometry of a minimum-phase linear system with a finite complex cepstrum norm corresponds to the Kahler potential, and that the Hermitian structure of the manifold is explicitly emergent if and only if the impulse response function of the highest degree in $z$ is constant in model parameters.
Abstract: We prove the correspondence between the information geometry of a signal filter and a Kahler manifold. The information geometry of a minimum-phase linear system with a finite complex cepstrum norm is a Kahler manifold. The square of the complex cepstrum norm of the signal filter corresponds to the Kahler potential. The Hermitian structure of the Kahler manifold is explicitly emergent if and only if the impulse response function of the highest degree in $z$ is constant in model parameters. The Kahlerian information geometry takes advantage of more efficient calculation steps for the metric tensor and the Ricci tensor. Moreover, $\alpha$-generalization on the geometric tensors is linear in $\alpha$. It is also robust to find Bayesian predictive priors, such as superharmonic priors, because Laplace-Beltrami operators on Kahler manifolds are in much simpler forms than those of the non-Kahler manifolds. Several time series models are studied in the Kahlerian information geometry.

Journal ArticleDOI
TL;DR: In this paper, a method of aero-engine rubbing positions identification based on cepstrum analysis is proposed, and the transfer path characteristics which reflect the transfer characteristics information from rubbing points to casing measuring points are separated from the vibration acceleration signals of casing by means of cepstrate analysis.
Abstract: A novel method of aero-engine rubbing position identification based on cepstrum analysis is proposed. The transfer path characteristics, which reflect the transfer characteristics from rubbing points to casing measuring points, are separated from the vibration acceleration signals of the casing by means of cepstrum analysis. Since different rubbing positions yield different transfer characteristics, twenty rubbing-position identification features are extracted from the cepstrum. A large number of rubbing experiments at different positions are simulated with an aero-engine rotor test rig, and the characteristic analysis of experimental samples at different rubbing positions is carried out; the results indicate the consistency of features for the same rubbing position and the difference of features across different rubbing positions. Finally, aero-engine rubbing position identification is carried out using the nearest neighbor classification method; the recognition rate reaches 100%, and the effectiveness of the method is fully verified.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: The paper compares the formant structures of speech and singing, revealing the well-known difference, the presence of an additional formant in singing, called the singing formant, at frequencies between 2500-3000 Hz.
Abstract: Formants are the frequency parts of speech and singing signal those closely describe the human vocal tract geometry. Considering the growing importance of the formants, they are considered the important subject of many work. In this sense, we present two techniques for the estimation of formants, one combining Wavelet with Linear Predictive Coding (LPC) and the other combining Wavelet with Cepstral analysis. The proposed approaches uses multi-resolution analysis of wavelet transform to accurately extract the formants. The proposed techniques were tested on corpus of [a],[e],[i],[o],[u] vowels to extract speech formants, whereas singing formants were analyzed using capella singing voice of trained singers. The paper compares the formant structures of speech and singing, revealing the well-known difference, the presence of an additional formant in singing, called the singing formant, at frequencies between 2500–3000 Hz. The experimental results show the superiority of the proposed techniques in extracting formants over the conventional methods like LPC and Cepstrum.