
Showing papers on "Cepstrum published in 2014"


Journal ArticleDOI
TL;DR: In this paper, the squared envelope spectrum (SES) and the kurtosis of the corresponding band-pass filtered analytic signal were analyzed for the diagnostics of bearing failures.

187 citations


Journal ArticleDOI
TL;DR: In this article, a response-only structural health monitoring technique that utilizes cepstrum analysis and artificial neural networks for the identification of damage in civil engineering structures is presented. But the method is limited to a single excitation.
Abstract: This article presents a response-only structural health monitoring technique that utilises cepstrum analysis and artificial neural networks for the identification of damage in civil engineering structures. The method begins by applying cepstrum-based operational modal analysis, which separates source and transmission path effects to determine the structure’s frequency response functions from response measurements only. Principal component analysis is applied to the obtained frequency response functions to reduce the data size, and structural damage is then detected using a two-stage ensemble of artificial neural networks. The proposed method is verified both experimentally and numerically using a laboratory two-storey framed structure and a finite element representation, both subjected to a single excitation. The laboratory structure is tested on a large-scale shake table generating ambient loading of Gaussian distribution. In the numerical investigation, the same input is applied to the finite model, but...

64 citations


Journal ArticleDOI
David Siegel1, Wenyu Zhao1, Edzel Lapira1, Mohamed AbuAli1, Jay Lee1 
TL;DR: In this paper, a full-scale baseline wind turbine drive train and a drive train with several gear and bearing failures are tested at the National Renewable Energy Laboratory (NREL) dynamometer test cell during the NREL Gear Reliability Collaborative Round Robin study.
Abstract: The ability to detect and diagnose incipient gear and bearing degradation can offer substantial improvements in reliability and availability of the wind turbine asset. Considering the motivation for improved reliability of the wind turbine drive train, numerous research efforts have been conducted using a vast array of vibration-based algorithms. Despite these efforts, the techniques are often evaluated on smaller-scale test-beds, and existing studies do not provide a detailed comparison between the various vibration-based condition monitoring algorithms. This study evaluates a multitude of methods, including frequency domain and cepstrum analysis, time synchronous averaging narrowband and residual methods, bearing envelope analysis and spectral kurtosis-based methods. A full-scale baseline wind turbine drive train and a drive train with several gear and bearing failures are tested at the National Renewable Energy Laboratory (NREL) dynamometer test cell during the NREL Gear Reliability Collaborative Round Robin study. A tabular set of results is presented to highlight the ability of each algorithm to accurately detect the bearing and gear wheel component health. The results highlight that the cepstrum and the narrowband phase modulation signal were effective methods for diagnosing gear tooth problems, whereas bearing envelope analysis could confidently detect most of the bearing-related failures. Copyright © 2013 John Wiley & Sons, Ltd.

61 citations


Journal ArticleDOI
TL;DR: Speech processing has vast applications in voice dialing, telephone communication, call routing, domestic appliances control, Speech to Text conversion, Text to Speech conversion, lip synchronization, automation systems etc.
Abstract: The automatic recognition of speech means enabling a natural and easy mode of communication between human and machine. Speech processing has vast applications in voice dialing, telephone communication, call routing, domestic appliance control, Speech to Text conversion, Text to Speech conversion, lip synchronization, automation systems, etc. Here we discuss some widely used feature extraction techniques: Mel Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding (LPC) analysis, Dynamic Time Warping (DTW), Relative Spectra Processing (RASTA) and Zero Crossings with Peak Amplitudes (ZCPA). Some techniques, like RASTA and MFCC, consider the nature of speech while extracting features, whereas LPC predicts future features based on previous ones.
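The MFCC pipeline mentioned above (windowed frame, FFT, mel filterbank, log, DCT) can be sketched in NumPy. This is a generic illustration, not any paper's exact implementation; the frame length, filterbank size and the 440 Hz test tone are arbitrary choices:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13):
    """MFCC of a single frame: window -> power spectrum -> mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    # Triangular filters spaced uniformly on the mel scale between 0 Hz and Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fbank[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising edge
        if r > c:
            fbank[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling edge
    energies = np.maximum(fbank @ spectrum, 1e-10)            # avoid log(0)
    return dct(np.log(energies), type=2, norm='ortho')[:n_ceps]

# Example: a 25 ms frame of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(400) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
```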

53 citations


Journal ArticleDOI
TL;DR: A new constrained clustering algorithm is proposed that satisfies as many constraints as possible while optimizing the clustering objective, and is compared with other state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech.
Abstract: Multi-pitch analysis of concurrent sound sources is an important but challenging problem. It requires estimating pitch values of all harmonic sources in individual frames and streaming the pitch estimates into trajectories, each of which corresponds to a source. We address the streaming problem for monophonic sound sources. We take the original audio, plus frame-level pitch estimates from any multi-pitch estimation algorithm as inputs, and output a pitch trajectory for each source. Our approach does not require pre-training of source models from isolated recordings. Instead, it casts the problem as a constrained clustering problem, where each cluster corresponds to a source. The clustering objective is to minimize the timbre inconsistency within each cluster. We explore different timbre features for music and speech. For music, harmonic structure and a newly proposed feature called uniform discrete cepstrum (UDC) are found effective, while for speech, MFCC and UDC work well. We also show that timbre consistency is insufficient for effective streaming. Constraints are imposed on pairs of pitch estimates according to their time-frequency relationships. We propose a new constrained clustering algorithm that satisfies as many constraints as possible while optimizing the clustering objective. We compare the proposed approach with other state-of-the-art supervised and unsupervised multi-pitch streaming approaches that are specifically designed for music or speech. Better or comparable results are shown.

47 citations



Proceedings ArticleDOI
10 Apr 2014
TL;DR: A mel-frequency cepstral coefficient based feature extraction scheme is proposed for the classification of electromyography (EMG) signals into normal and a neuromuscular disease, namely amyotrophic lateral sclerosis.
Abstract: In this paper, mel-frequency cepstral coefficient (MFCC) based feature extraction scheme is proposed for the classification of electromyography (EMG) signal into normal and a neuromuscular disease, namely the amyotrophic lateral sclerosis (ALS). Instead of employing the MFCC directly on EMG data, it is employed on the motor unit action potentials (MUAPs) extracted from the EMG signal via template matching based decomposition technique. Unlike conventional MUAP based methods, only one MUAP with maximum dynamic range is selected for MFCC based feature extraction. First few MFCCs corresponding to the selected MUAP are used as the desired feature, which not only reduces computational burden but also offers better feature quality with high within class compactness and between class separation. For the purpose of classification, the K-nearest neighborhood (KNN) classifier is employed. Extensive analysis is performed on clinical EMG database and it is found that the proposed method provides a very satisfactory performance in terms of specificity, sensitivity, and overall classification accuracy.

34 citations


Proceedings Article
01 Jan 2014
TL;DR: A comparative study on the performance of features extracted from the magnitude spectrum, cepstrum and phase derivatives such as group-delay function (GDF) and instantaneous frequency deviation (IFD) for classifying the playing techniques of electric guitar recordings shows that sparse coding is an effective means of mining useful patterns from the primitive time-frequency representations.
Abstract: Automatic recognition of guitar playing techniques is challenging as it is concerned with subtle nuances of guitar timbres. In this paper, we investigate this research problem by a comparative study on the performance of features extracted from the magnitude spectrum, cepstrum and phase derivatives such as group-delay function (GDF) and instantaneous frequency deviation (IFD) for classifying the playing techniques of electric guitar recordings. We consider up to 7 distinct playing techniques of electric guitar and create a new individual-note dataset comprising 7 types of guitar tones for each playing technique. The dataset contains 6,580 clips and 11,928 notes. Our evaluation shows that sparse coding is an effective means of mining useful patterns from the primitive time-frequency representations and that combining the sparse representations of logarithm cepstrum, GDF and IFD leads to the highest average F-score of 71.7%. Moreover, from analyzing the confusion matrices we find that cepstral and phase features are particularly important in discriminating highly similar techniques such as pull-off, hammer-on and bending. We also report a preliminary study that demonstrates the potential of the proposed methods in automatic transcription of real-world electric guitar solos.

32 citations


Proceedings ArticleDOI
27 Oct 2014
TL;DR: Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN, and fusion is implicitly accomplished by ensemble classification.
Abstract: Security (and cyber security) is an important issue in existing and developing technology. It is imperative that cyber security go beyond password based systems to avoid criminal activities. A human biometric and emotion based recognition framework implemented in parallel can enable applications to access personal or public information securely. The focus of this paper is on the study of speech based emotion recognition using a pattern recognition paradigm with spectral feature extraction and an ensemble of k nearest neighbor (kNN) classifiers. The five spectral features are the linear predictive cepstrum (CEP), mel frequency cepstrum (MFCC), line spectral frequencies (LSF), adaptive component weighted cepstrum (ACW) and the post-filter cepstrum (PFL). The bagging algorithm is used to train the ensemble of kNNs. Fusion is implicitly accomplished by ensemble classification. The LDC emotional prosody speech database is used in all the experiments. Results show that the maximum gain in performance is achieved by using two kNNs as opposed to using a single kNN.
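The bagging-of-kNNs fusion described above can be illustrated on toy data. This is a minimal hand-rolled sketch; the spectral features (CEP, MFCC, LSF, ACW, PFL) and the LDC emotional prosody corpus are not reproduced here, and the two-class Gaussian blobs are an arbitrary stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_predict(train_X, train_y, X, k=3):
    """Plain kNN: majority label among the k nearest training points."""
    preds = []
    for x in X:
        d = np.linalg.norm(train_X - x, axis=1)
        nearest = train_y[np.argsort(d)[:k]]
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

def bagged_knn_predict(train_X, train_y, X, n_estimators=2, k=3):
    """Bagging: each kNN is trained on a bootstrap resample;
    fusion is implicit via majority vote over the ensemble."""
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(train_X), size=len(train_X))
        votes.append(knn_predict(train_X[idx], train_y[idx], X, k))
    votes = np.array(votes)
    return np.array([np.bincount(v).argmax() for v in votes.T])

# Toy two-class data: two well-separated Gaussian blobs
X0 = rng.normal(0.0, 0.5, (40, 2))
X1 = rng.normal(3.0, 0.5, (40, 2))
train_X = np.vstack([X0, X1])
train_y = np.array([0] * 40 + [1] * 40)
test_X = np.array([[0.0, 0.0], [3.0, 3.0]])
pred = bagged_knn_predict(train_X, train_y, test_X)
```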

32 citations


Journal ArticleDOI
TL;DR: A robust speaker recognition method that employs a novel adaptive wavelet shrinkage method for noise suppression that exhibits great robustness in various noise conditions is proposed.

30 citations


Proceedings ArticleDOI
04 May 2014
TL;DR: Several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed to improve robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models.
Abstract: This study focuses on acoustic variations in speech introduced by whispering, and proposes several strategies to improve robustness of automatic speech recognition of whispered speech with neutral-trained acoustic models. In the analysis part, differences in neutral and whispered speech captured in the UT-Vocal Effort II corpus are studied in terms of energy, spectral slope, and formant center frequency and bandwidth distributions in silence, voiced, and unvoiced speech signal segments. In the part dedicated to speech recognition, several strategies involving front-end filter bank redistribution, cepstral dimensionality reduction, and lexicon expansion for alternative pronunciations are proposed. The proposed neutral-trained system employing redistributed filter bank and reduced features provides a 7.7% absolute WER reduction over the baseline system trained on neutral speech, and a 1.3% reduction over a baseline system with whisper-adapted acoustic models.

Proceedings ArticleDOI
06 Nov 2014
TL;DR: The results show that the proposed fusion method can achieve promising heart rate measurement accuracy and robustness against various sensor contact conditions.
Abstract: This paper presents a method of estimating heart rate from arrays of fiber Bragg grating (FBG) sensors embedded in a mat. A cepstral domain signal analysis technique is proposed to characterize Ballistocardiogram (BCG) signals. With this technique, the average heart beat intervals can be estimated by detecting the dominant peaks in the cepstrum, and the signals of multiple sensors can be fused together to obtain higher signal to noise ratio than each individual sensor. Experiments were conducted with 10 human subjects lying on 2 different postures on a bed. The estimated heart rate from BCG was compared with heart rate ground truth from ECG, and the mean error of estimation obtained is below 1 beat per minute (BPM). The results show that the proposed fusion method can achieve promising heart rate measurement accuracy and robustness against various sensor contact conditions.

Journal ArticleDOI
TL;DR: An approach based on the transformation of the Cepstral domain on Hidden Markov Model, which is employed for the automatic diagnosis of the Obstructive Sleep Apnea syndrome, which includes an Electrocardiogram artefacts removal and R wave detection stage is presented.

Journal ArticleDOI
TL;DR: A hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis is presented that achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms.
Abstract: Fundamental frequency (F0) is one of the essential features in many acoustic related applications. Although numerous F0 detection algorithms have been developed, the detection accuracy in noisy environments still needs improvement. We present a hybrid noise resilient F0 detection algorithm named BaNa that combines the approaches of harmonic ratios and Cepstrum analysis. A Viterbi algorithm with a cost function is used to identify the F0 value among several F0 candidates. Speech and music databases with eight different types of additive noise are used to evaluate the performance of the BaNa algorithm and several classic and state-of-the-art F0 detection algorithms. Results show that for almost all types of noise and signal-to-noise ratio (SNR) values investigated, BaNa achieves the lowest Gross Pitch Error (GPE) rate among all the algorithms. Moreover, for the 0 dB SNR scenarios, the BaNa algorithm is shown to achieve 20% to 35% GPE rate for speech and 12% to 39% GPE rate for music. We also describe implementation issues that must be addressed to run the BaNa algorithm as a real-time application on a smartphone platform.
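The cepstral component of such an F0 detector reduces to peak picking in the quefrency domain. A minimal sketch follows (plain cepstral analysis only; BaNa's harmonic-ratio candidates and Viterbi tracking are not reproduced, and the frame length and F0 search range are arbitrary choices):

```python
import numpy as np

def cepstral_f0(x, sr, fmin=60.0, fmax=400.0):
    """Estimate F0 as the dominant peak of the real cepstrum
    within the quefrency range corresponding to [fmin, fmax]."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    cepstrum = np.fft.irfft(np.log(spectrum + 1e-12))
    q_lo = int(sr / fmax)   # shortest pitch period of interest (samples)
    q_hi = int(sr / fmin)   # longest pitch period of interest
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return sr / peak

# A synthetic harmonic-rich 200 Hz tone with 1/h amplitude roll-off
sr = 16000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 200 * (h + 1) * t) / (h + 1) for h in range(5))
f0 = cepstral_f0(x, sr)
```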

Journal ArticleDOI
01 Jan 2014-Optik
TL;DR: In this article, a modified cepstrum domain approach combined with bit-plane slicing method is proposed to estimate uniform motion blur parameters, which improves the accuracy without any manual intervention.

Proceedings ArticleDOI
22 Dec 2014
TL;DR: An artificial ear is proposed that has high resolution in the high-frequency region and low resolution where the frequency is low; it can virtually hear the effect of steganography and distinguish between stego and clean audio signals.
Abstract: Some of the previous audio steganalysis systems have suggested features based on human auditory system models. In contrast, this paper exploits the idea of maximum deviation from the human auditory system to suggest an efficient audio steganalysis scheme. Based on this idea, an artificial ear is considered that has high resolution in the high-frequency region and low resolution where the frequency is low. Simulation results show that this artificial ear can virtually hear the effect of steganography and distinguish between stego and clean audio signals. The proposed method achieves accuracies of 93% (StegHide@1.563% BPB) and 97% (Hide4Pgp@6.25% BPB), which are 16% and 12% higher than previous MFCC based methods. Keywords: Audio Steganalysis, Audio Steganography, Mel Cepstrum, Reversed Mel Cepstrum, Human Auditory System.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed the Minimum Variance Cepstrum (MVC) estimator to estimate the time difference of arrival (TDOA) of sound waves between microphones.

Book ChapterDOI
01 Jan 2014
TL;DR: The goal and novelty of this work was the analysis of applicability of the parameters selectively used to assess the pathology.
Abstract: Present development of digital registration and methods of recorded voice processing are useful in detection of most pathologies and diseases of a human vocal tract. The recognition of the voice condition requires the creation of a model which is comprised of different acoustic parameters of speech signal. In this study a vector consisting of 31 parameters for analysing the speech signal was created. The speech parameters were extracted from time, frequency and cepstral domains. Using Principal Components Analysis the number of the parameters was reduced to 17. In order to validate the detection of the pathological voice signal, a tenfold cross-validation and confusion matrix were used. The goal and novelty of this work was the analysis of applicability of the parameters selectively used to assess the pathology.

Proceedings ArticleDOI
04 May 2014
TL;DR: Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than and complementary to traditional mel-frequency cepstral coefficients (MFCC).
Abstract: This paper introduces the use of three physiologically-motivated features for speaker identification, Residual Phase Cepstrum Coefficients (RPCC), Glottal Flow Cepstrum Coefficients (GLFCC) and Teager Phase Cepstrum Coefficients (TPCC). These features capture speaker-discriminative characteristics from different aspects of glottal source excitation patterns. The proposed physiologically-driven features give better results with lower model complexities, and also provide complementary information that can improve overall system performance even for larger amounts of data. Results on speaker identification using the YOHO corpus demonstrate that these physiologically-driven features are both more accurate than and complementary to traditional mel-frequency cepstral coefficients (MFCC). In particular, the incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.

Proceedings ArticleDOI
08 May 2014
TL;DR: Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
Abstract: A new pitch detection scheme has been proposed based on the short-time autocorrelation function (ACF) and average magnitude difference function (AMDF). The performance of the proposed scheme has been evaluated, through simulation, in a complete speech analysis-synthesis system. For detection of pitch, local maxima of ACF and local minima of AMDF values are computed. To reduce computational complexity, the original speech signal is converted into a three level signal before computing ACF and AMDF. Synthesized speech quality, computational complexity and time taken during simulation are the parameters that have been considered while comparing this system with the analysis-synthesis systems that use autocorrelation, cepstrum and wavelet based pitch detection methods. Results for different speech signals show that this new method of pitch detection is the best in terms of speech quality and computational complexity.
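The core of this scheme — three-level clipping followed by ACF maxima and AMDF minima — can be sketched as follows. This is an illustrative simplification: the clipping ratio and the way ACF and AMDF are fused into one score here are our own assumptions, not the paper's exact formulation:

```python
import numpy as np

def three_level(x, ratio=0.3):
    """Center-clip to {-1, 0, +1}: cheap to correlate, keeps periodicity."""
    thr = ratio * np.max(np.abs(x))
    return np.where(x > thr, 1.0, np.where(x < -thr, -1.0, 0.0))

def pitch_acf_amdf(x, sr, fmin=60.0, fmax=400.0):
    c = three_level(x)
    lo, hi = int(sr / fmax), int(sr / fmin)
    lags = np.arange(lo, hi)
    acf = np.array([np.sum(c[:-l] * c[l:]) for l in lags])
    amdf = np.array([np.mean(np.abs(c[:-l] - c[l:])) for l in lags])
    # ACF peaks and AMDF dips should coincide at the true period;
    # a simple fusion divides the ACF by the (offset) AMDF.
    score = acf / (amdf + 1e-6)
    return sr / lags[np.argmax(score)]

# 160 Hz tone plus a weaker octave component, 8 kHz sampling
sr = 8000
t = np.arange(1600) / sr
x = np.sin(2 * np.pi * 160 * t) + 0.3 * np.sin(2 * np.pi * 320 * t)
f0 = pitch_acf_amdf(x, sr)
```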

Proceedings ArticleDOI
04 May 2014
TL;DR: This paper presents a novel feature representation called sparse cepstral codes for instrument identification using the use of sparse coding and power normalization to derive compact codes that better represent the information of the cepstrum.
Abstract: This paper presents a novel feature representation called sparse cepstral codes for instrument identification. We first motivate the approach by discussing why cepstrum is suitable for instrument identification. Then we propose the use of sparse coding and power normalization to derive compact codes that better represent the information of the cepstrum. Our evaluation on both uni-source and multi-source instrument identification tasks shows that the proposed feature leads to significantly better accuracy than existing methods. We further show that cepstrum obtained from a power-scaled spectrum can do better than typical cepstrum, especially on multi-source signals. The proposed system achieves 0.955 F-score on the uni-source dataset and 0.688 F-score on the multi-source dataset.

Journal ArticleDOI
TL;DR: The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximate the converted speech signal with better accuracy than the model based on the Mel cepStrum envelope based voice Conversion model with objective and subjective evaluations.
Abstract: The complex cepstrum vocoder is used to modify the speaker specific characteristics of the source speaker speech to that of the target speaker speech. The low time and high time liftering are used to split the calculated cepstrum into the vocal tract and the source excitation parameters. The obtained mixed phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. The radial basis function is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. Radial basis function is used to capture and formulate the nonlinear relations between the Mel cepstrum envelope of the source and target speakers. A mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. A comparison of the proposed complex cepstrum based model has been made with the state-of-the-art Mel Cepstrum Envelope based voice conversion model with objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal with better accuracy than the model based on the Mel cepstrum envelope.
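The low-time/high-time liftering step can be illustrated with the real cepstrum. This is a simplified sketch: the paper uses the complex cepstrum to preserve phase, which is not reproduced here, and the cutoff of 30 quefrency samples is an arbitrary choice:

```python
import numpy as np

def lifter_split(x, cutoff):
    """Split a frame's real cepstrum into low-time (vocal-tract envelope)
    and high-time (source excitation) parts with rectangular lifters."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    ceps = np.fft.irfft(np.log(spectrum + 1e-12))
    low = ceps.copy()
    low[cutoff:-cutoff] = 0.0     # keep low quefrency (both symmetric ends)
    high = ceps - low             # remainder: excitation-related rahmonics
    return low, high

# Harmonic-rich 150 Hz source, 64 ms frame at 16 kHz
sr = 16000
t = np.arange(1024) / sr
x = sum(np.sin(2 * np.pi * 150 * (h + 1) * t) for h in range(50))
low, high = lifter_split(x, cutoff=30)
# The high-time part should retain the rahmonic at the 150 Hz pitch period
peak_q = np.argmax(high[60:200]) + 60   # ~ sr/150 = 106.7 samples
```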

Proceedings ArticleDOI
09 Jan 2014
TL;DR: The proposed work presents automatic classification of Indian Classical instruments based on spectral and MFCC features using a well-trained back propagation neural network classifier, with Principal Component Analysis used to select the most important coefficients.
Abstract: In applications such as music information and database retrieval systems, classification of musical instruments plays an important role. The proposed work presents automatic classification of Indian Classical instruments based on spectral and MFCC features using a well-trained back propagation neural network classifier. Musical instruments such as Harmonium, Santoor and Tabla are considered for experimentation. Spectral features such as amplitude and spectral range, along with Mel Frequency Cepstrum Coefficients, are used as features. Since the features are not well distinguished, classification is done using non-parametric classifiers such as neural networks. Since the number of cepstrum coefficients is large, the important coefficients are selected using Principal Component Analysis. It has been observed that, using 42 samples for training and 18 for testing, the back propagation neural network provides an accuracy of 98%. The present work can be extended to more Hindustani and Carnatic classical musical instruments.

Book ChapterDOI
01 Jan 2014
TL;DR: In this article, the use of the cepstrum for removing components from a signal which manifest themselves as periodic spectral components has been described, including discrete frequency components with uniform spacing such as families of harmonics and modulation sidebands, but also narrow band noise peaks coming from slight random modulation of almost periodic signals.
Abstract: The use of the cepstrum for removing components from a signal which manifest themselves as periodic spectral components has previously been described. These include discrete frequency components with uniform spacing such as families of harmonics and modulation sidebands, but also narrow band noise peaks coming from slight random modulation of almost periodic signals, such as higher harmonics of blade pass frequencies. The removal is effected by applying a notch “lifter” to the real cepstrum of the signal, thus removing the targeted components from the log amplitude spectrum, and then combining the modified amplitude spectrum with the original phase spectrum. Not much attention was previously paid to the type of notch lifter, but two different situations occurring in conjunction with analysis of signals from wind turbines showed that different lifters have advantages in different situations. This chapter describes two different approaches, illustrating them with the two examples of application.
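The editing procedure described in this chapter can be sketched directly: notch the rahmonics out of the real cepstrum, then recombine the edited log-amplitude spectrum with the original phase. Below is a minimal rectangular-notch illustration; the notch width and the impulse-train test signal are arbitrary choices, and as the chapter notes, the choice of lifter matters in practice:

```python
import numpy as np

def notch_lifter_edit(x, period, width=2):
    """Remove a family of uniformly spaced spectral components by notching
    their rahmonics out of the real cepstrum, then recombining the edited
    log-amplitude spectrum with the original phase spectrum."""
    n = len(x)
    X = np.fft.fft(x)
    log_amp = np.log(np.abs(X) + 1e-12)
    phase = np.angle(X)
    ceps = np.fft.ifft(log_amp).real
    q = period
    while q < n // 2:
        lo, hi = q - width, q + width + 1
        ceps[lo:hi] = 0.0                       # notch the rahmonic
        ceps[n - hi + 1:n - lo + 1] = 0.0       # mirror notch (cepstrum symmetry)
        q += period
    edited_amp = np.exp(np.fft.fft(ceps).real)
    return np.fft.ifft(edited_amp * np.exp(1j * phase)).real

# Impulse train (period 64 samples -> harmonics every 16 bins) plus weak noise
rng = np.random.default_rng(1)
x = np.zeros(1024)
x[::64] = 1.0
x += 0.01 * rng.standard_normal(1024)
y = notch_lifter_edit(x, period=64)
```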


Proceedings ArticleDOI
04 May 2014
TL;DR: The proposed uniform discrete cepstrum (UDC) uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points, and significantly outperforms all the other cepstral representations.
Abstract: We propose a novel cepstral representation called the uniform discrete cepstrum (UDC) to represent the timbre of sound sources in a sound mixture. Different from ordinary cepstrum and MFCC which have to be calculated from the full magnitude spectrum of a source after source separation, UDC can be calculated directly from isolated spectral points that are likely to belong to the source in the mixture spectrum (e.g., non-overlapping harmonics of a harmonic source). Existing cepstral representations that have this property are the discrete cepstrum and the regularized discrete cepstrum; however, compared to the proposed UDC, they are not as effective and are more complex to compute. The key advantage of UDC is that it uses a more natural and locally adaptive regularizer to prevent it from overfitting the isolated spectral points. We derive the mathematical relations between these cepstral representations, and compare their timbre modeling performances in the task of instrument recognition in polyphonic audio mixtures. We show that UDC and its mel-scale variant MUDC significantly outperform all the other representations.
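The computational idea — obtaining a cepstrum-like feature directly from isolated spectral points, without the least-squares inversion that the discrete cepstrum requires — can be sketched as follows. This is our own simplified reading, not the paper's exact UDC definition; the cosine-basis transform, the normalization, and the synthetic harmonic example are all assumptions:

```python
import numpy as np

def udc_like(freqs, log_amps, n_ceps=20):
    """Cepstrum-like timbre feature from isolated spectral points.
    freqs: observed frequencies normalized to [0, 0.5] (fraction of sr).
    Sketch: evaluate a cosine basis at the observed points and apply its
    transpose directly (no matrix inversion, unlike the discrete cepstrum)."""
    k = np.arange(n_ceps)
    B = np.cos(2 * np.pi * np.outer(freqs, k))   # (n_points, n_ceps) basis
    return (B.T @ log_amps) / len(freqs)

# Observed: a few non-overlapping harmonics of a 200 Hz source at sr = 16 kHz
sr = 16000
harmonics = 200.0 * np.arange(1, 6)
freqs = harmonics / sr
log_amps = np.log(1.0 / np.arange(1, 6))         # 1/h amplitude roll-off
c = udc_like(freqs, log_amps)
```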

Proceedings ArticleDOI
13 Nov 2014
TL;DR: Experimental results depict that the RMFCC and low-variance spectrum-estimators-based robust feature extractors outperformed the MFCC, PNCC (power normalized cepstral coefficients), and ETSI-AFE features both in clean and multi-condition training conditions.
Abstract: This paper presents robust feature extractors for a continuous speech recognition task in matched and mismatched environments. The mismatched conditions may occur due to additive noise, different channel, and acoustic reverberation. In the conventional Mel-frequency cepstral coefficient (MFCC) feature extraction framework, a subband spectrum enhancement technique is incorporated to improve its robustness. We denote this front-end as robust MFCCs (RMFCC). Based on the gammatone and compressive gammachirp filter-banks, robust gammatone filterbank cepstral coefficients (RGFCC) and robust compressive gammachirp filterbank cepstral coefficients (RCGCC) are also presented for comparison. We also employ low-variance spectrum estimators such as multitaper and regularized minimum-variance distortionless response (RMVDR), instead of a discrete Fourier transform-based direct spectrum estimator, for improving robustness against mismatched environments. Speech recognition performances of the robust feature extractors are evaluated in clean as well as multi-style training conditions of the AURORA-4 continuous speech recognition task. Experimental results show that the RMFCC and low-variance spectrum-estimator-based robust feature extractors outperformed the MFCC, PNCC (power normalized cepstral coefficients), and ETSI-AFE features in both clean and multi-condition training conditions.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the information geometry of a minimum-phase linear system with a finite complex cepstrum norm corresponds to the Kahler potential, and that the Hermitian structure of the manifold is explicitly emergent if and only if the impulse response function of the highest degree in $z$ is constant in model parameters.
Abstract: We prove the correspondence between the information geometry of a signal filter and a Kahler manifold. The information geometry of a minimum-phase linear system with a finite complex cepstrum norm is a Kahler manifold. The square of the complex cepstrum norm of the signal filter corresponds to the Kahler potential. The Hermitian structure of the Kahler manifold is explicitly emergent if and only if the impulse response function of the highest degree in $z$ is constant in model parameters. The Kahlerian information geometry takes advantage of more efficient calculation steps for the metric tensor and the Ricci tensor. Moreover, $\alpha$-generalization on the geometric tensors is linear in $\alpha$. It is also robust to find Bayesian predictive priors, such as superharmonic priors, because Laplace-Beltrami operators on Kahler manifolds are in much simpler forms than those of the non-Kahler manifolds. Several time series models are studied in the Kahlerian information geometry.

Journal ArticleDOI
TL;DR: In this paper, a method of aero-engine rubbing positions identification based on cepstrum analysis is proposed, and the transfer path characteristics which reflect the transfer characteristics information from rubbing points to casing measuring points are separated from the vibration acceleration signals of casing by means of cepstrate analysis.
Abstract: A novel method of aero-engine rubbing position identification based on cepstrum analysis is proposed. The transfer path characteristics, which reflect the transfer characteristics from rubbing points to casing measuring points, are separated from the vibration acceleration signals of the casing by means of cepstrum analysis. Since different rubbing positions yield different transfer characteristics, twenty rubbing-position identification features are extracted from the cepstrum. A large number of rubbing experiments at different positions are simulated with an aero-engine rotor test rig, and the characteristic analysis of experimental samples at different rubbing positions is carried out; the results indicate the consistency of features for the same rubbing position and the difference of features across different rubbing positions. Finally, aero-engine rubbing position identification is carried out using the nearest neighbor classification method; the recognition rate reaches 100%, and the effectiveness of the method is fully verified.

Proceedings ArticleDOI
01 Dec 2014
TL;DR: The paper compares the formant structures of speech and singing, revealing the well-known difference, the presence of an additional formant in singing, called the singing formant, at frequencies between 2500-3000 Hz.
Abstract: Formants are the frequency parts of speech and singing signal those closely describe the human vocal tract geometry. Considering the growing importance of the formants, they are considered the important subject of many work. In this sense, we present two techniques for the estimation of formants, one combining Wavelet with Linear Predictive Coding (LPC) and the other combining Wavelet with Cepstral analysis. The proposed approaches uses multi-resolution analysis of wavelet transform to accurately extract the formants. The proposed techniques were tested on corpus of [a],[e],[i],[o],[u] vowels to extract speech formants, whereas singing formants were analyzed using capella singing voice of trained singers. The paper compares the formant structures of speech and singing, revealing the well-known difference, the presence of an additional formant in singing, called the singing formant, at frequencies between 2500–3000 Hz. The experimental results show the superiority of the proposed techniques in extracting formants over the conventional methods like LPC and Cepstrum.