
Showing papers on "Cepstrum published in 2004"


Journal ArticleDOI
TL;DR: This article narrates the historical and mathematical background that led to the invention of the term cepstrum and describes how the term has survived and has become part of the digital signal processing lexicon.
Abstract: The idea of the log spectrum or cepstral averaging has been useful in many applications, such as audio processing, speech processing, speech recognition, and echo detection, for the estimation and compensation of convolutional distortions. To suggest what prompted the invention of the term cepstrum, this article narrates the historical and mathematical background that led to its discovery. Computations on early simple echo representations showed that the resulting representation domain belongs to neither the frequency domain nor the time domain; Bogert et al. (1963) chose to refer to it as the quefrency domain, and later termed the spectrum of the log of a time waveform the cepstrum. The article also recounts the analysis of Al Oppenheim in relation to the cepstrum. It was in his theory for nonlinear signal processing, referred to as homomorphic systems, that the characteristic system for homomorphic convolution was realized to be reminiscent of the cepstrum. To retain both the relationship to the work of Bogert et al. and the distinction from it, the term power cepstrum was eventually applied to the nonlinear mapping in homomorphic deconvolution. While most of the terms in the glossary have faded into the background, the term cepstrum has survived and has become part of the digital signal processing lexicon.

376 citations
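The echo-detection idea that motivated the term can be illustrated in a few lines (a sketch of the classic construction, not code from the article): a signal plus a scaled, delayed copy of itself produces a ripple in the log spectrum, which appears as a peak in the cepstrum at the echo delay.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real

# A signal plus a scaled, delayed copy of itself ("echo") produces
# a cepstral peak at the echo delay, measured in samples (quefrency).
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
delay = 100
y = x.copy()
y[delay:] += 0.5 * x[:-delay]

c = real_cepstrum(y)
peak = int(np.argmax(c[50:2048])) + 50  # search away from quefrency 0
```

The peak should land at quefrency 100, the echo delay, largely independent of the spectral content of the underlying signal, which is what made the cepstrum attractive for echo detection in the first place.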


Journal ArticleDOI
TL;DR: The learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.
Abstract: It is well known that vocal and voice diseases do not necessarily cause perceptible changes in the acoustic voice signal. Acoustic analysis is a useful tool for diagnosing voice diseases, and a complementary technique to methods based on direct observation of the vocal folds by laryngoscopy. This paper studies two neural-network-based classification approaches applied to the automatic detection of voice disorders. The structures studied are the multilayer perceptron and learning vector quantization, fed with short-term vectors computed according to the well-known Mel-frequency cepstral coefficient parameterization. The paper shows that these architectures allow the detection of voice disorders, including glottic cancer, under highly reliable conditions. Within this context, the learning vector quantization approach proved more reliable than the multilayer perceptron architecture, yielding 96% frame accuracy under similar working conditions.

250 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: Results show that the cepstral features derived from the power spectrum perform better than those from the MGDF, and the product-spectrum-based features provide the best performance.
Abstract: Mel-frequency cepstral coefficients (MFCCs) are the most widely used features for speech recognition. These are derived from the power spectrum of the speech signal. Recently, the cepstral features derived from the modified group delay function (MGDF) have been studied by Murthy and Gadde (Proc. ICASSP, vol.1, p.68-71, 2003) for speech recognition. In this paper, we propose to use the product of the power spectrum and the group delay function (GDF), and derive the MFCCs from the product spectrum. This spectrum combines the information from the magnitude spectrum as well as the phase spectrum. The MFCCs of the MGDF are also investigated in this paper. Results show that the cepstral features derived from the power spectrum perform better than those from the MGDF, and the product-spectrum-based features provide the best performance.

97 citations
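A sketch of the product-spectrum computation (my illustration of the usual formulation, in which the group delay function is computed from the FFTs of x[n] and n·x[n]; the |X|² denominator of the GDF cancels against the power spectrum):

```python
import numpy as np

def product_spectrum(frame):
    """Product of the power spectrum |X|^2 and the group delay
    function (Xr*Yr + Xi*Yi)/|X|^2, where Y is the FFT of n*x[n].
    The |X|^2 denominator cancels, leaving Xr*Yr + Xi*Yi."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    return X.real * Y.real + X.imag * Y.imag
```

MFCCs would then be obtained by applying the mel filterbank, log, and DCT to this spectrum in place of the power spectrum. As a sanity check, an impulse delayed by k samples has a unit power spectrum and constant group delay k, so its product spectrum is the constant k.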


Journal ArticleDOI
TL;DR: A new algorithm for statistical speech feature enhancement in the cepstral domain is presented, which demonstrates significant improvement in noise-robust recognition accuracy by incorporating the joint prior for both static and dynamic parameter distributions in the speech model.
Abstract: In this paper, we present a new algorithm for statistical speech feature enhancement in the cepstral domain. The algorithm exploits joint prior distributions (in the form of a Gaussian mixture) in the clean speech model, which incorporate both the static and frame-differential dynamic cepstral parameters. Full posterior probabilities for clean speech given the noisy observation are computed using a linearized version of a nonlinear acoustic distortion model, and, based on this linear approximation, the conditional minimum mean square error (MMSE) estimator for the clean speech feature is derived rigorously using the full posterior. The final form of the derived conditional MMSE estimator is shown to be a weighted sum of three separate terms, and the sum is weighted again by the posterior for each of the mixture components in the speech model. The first of the three terms is shown to arise naturally from the predictive mechanism embedded in the acoustic distortion model in the absence of any prior information. The remaining two terms result from the speech model using only the static prior and only the dynamic prior, respectively. Comprehensive experiments are carried out using the Aurora2 database to evaluate the new algorithm. The results demonstrate significant improvement in noise-robust recognition accuracy by incorporating the joint prior for both static and dynamic parameter distributions in the speech model, compared with using only the static or dynamic prior and with using no prior.

89 citations


Journal ArticleDOI
TL;DR: In this paper, a model of signals generated by an accelerometer sensor is established, from which the theoretical expression for the power cepstrum is partially calculated; this makes it possible to develop an indicator that is little affected by the signal amplitude, the signal-to-noise ratio, or the position of the sensor.

73 citations


Proceedings ArticleDOI
27 Sep 2004
TL;DR: In this article, the authors compared several features of vibration signals as indicators of broken rotor bar of a 35 kW induction motor with regular fast Fourier transform (FFT) based power spectrum density (PSD) estimation.
Abstract: Vibration monitoring is studied for fault diagnostics of an induction motor. Several features of vibration signals are compared as indicators of a broken rotor bar in a 35 kW induction motor. Regular fast Fourier transform (FFT) based power spectral density (PSD) estimation is compared to signal processing with higher order spectra (HOS), cepstrum analysis, and signal description with autoregressive (AR) modelling. The fault detection routine and feature comparison are carried out with support vector machine (SVM) based classification. The best method for feature extraction appears to be the use of AR coefficients. This result is obtained with real measurement data from several motor conditions and load situations.

60 citations


Journal ArticleDOI
TL;DR: A subband-based group delay approach to segment spontaneous speech into syllable-like units, using the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection.
Abstract: In the development of a syllable-centric automatic speech recognition (ASR) system, segmentation of the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it has to be processed before segment boundaries can be extracted. This paper presents a subband-based group delay approach to segment spontaneous speech into syllable-like units. This technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as a magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representative of the STE function for syllable boundary detection. Although the group delay function derived from the STE function of the speech signal contains segment boundaries, the boundaries are difficult to determine in the context of long silences, semivowels, and fricatives. In this paper, these issues are specifically addressed and algorithms are developed to improve the segmentation performance. The speech signal is first passed through a bank of three filters, corresponding to three different spectral bands. The STE functions of these signals are computed. Using these three STE functions, three minimum-phase group delay functions are derived. By combining the evidence derived from these group delay functions, the syllable boundaries are detected. Further, a multiresolution-based technique is presented to overcome the problem of shift in segment boundaries during smoothing. Experiments carried out on the Switchboard and OGI-MLTS corpora show that the error in segmentation is at most 25 milliseconds for 67% and 76.6% of the syllable segments, respectively.

43 citations
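The core smoothing step can be sketched as follows (an illustration of the minimum-phase group delay idea, not the authors' exact implementation): treat the STE contour as a magnitude spectrum, derive the corresponding minimum-phase signal through the cepstrum, and take that signal's group delay.

```python
import numpy as np

def minphase_group_delay(ste):
    """Smooth a short-term energy (STE) contour by treating it as the
    magnitude spectrum of an arbitrary signal, deriving the
    minimum-phase equivalent via the cepstrum, and returning the
    group delay of that signal."""
    mag = np.concatenate([ste, ste[::-1]])  # spectrum of a real signal
    n = len(mag)
    c = np.fft.ifft(np.log(mag + 1e-12)).real  # real cepstrum
    # Homomorphic window: doubling the causal part gives minimum phase
    w = np.zeros(n)
    w[0] = 1.0
    w[1:n // 2] = 2.0
    w[n // 2] = 1.0
    x_min = np.fft.ifft(np.exp(np.fft.fft(c * w))).real
    # Group delay tau = (Xr*Yr + Xi*Yi) / |X|^2 with Y = FFT(k * x)
    X = np.fft.fft(x_min)
    Y = np.fft.fft(np.arange(n) * x_min)
    tau = (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
    return tau[:len(ste)]
```

In the approach described above, peaks of the resulting smoothed contour indicate syllable nuclei and the valleys between them are candidate boundaries.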


Proceedings ArticleDOI
17 May 2004
TL;DR: A novel feature extraction method is presented that has the form of a typical conventional feature, Mel frequency cepstral coefficients, but uses flexible segmentation to reduce spectral mismatch between the training and testing processes.
Abstract: The paper presents a novel feature extraction method to improve the performance of speaker identification systems. The proposed feature has the form of a typical conventional feature, Mel frequency cepstral coefficients (MFCC), but uses flexible segmentation to reduce spectral mismatch between the training and testing processes. Specifically, the length and shift size of the analysis frame are determined by a pitch synchronous method, giving pitch synchronous MFCC (PSMFCC). To verify the performance of the new feature, we measure the cepstral distortion between training and testing and also perform closed-set speaker identification tests. In text-independent and text-dependent experiments, the proposed algorithm provides 44.3% and 26.7% relative improvement, respectively.

41 citations


Book ChapterDOI
TL;DR: Empirical experiments carried out on the NIST2001 database suggest that SSCs are somewhat more robust than conventional MFCC and LFCC features, as well as partially complementary to them.
Abstract: Most conventional features used in speaker authentication are based on estimation of spectral envelopes in one way or another, e.g., Mel-scale Filterbank Cepstrum Coefficients (MFCCs), Linear-scale Filterbank Cepstrum Coefficients (LFCCs) and Relative Spectral Perceptual Linear Prediction (RASTA-PLP). In this study, Spectral Subband Centroids (SSCs) are examined. Each SSC is the centroid frequency of a given subband; SSCs have properties similar to formant frequencies but are confined to their subbands. Empirical experiments carried out on the NIST2001 database using SSCs, MFCCs, LFCCs and their combinations by concatenation suggest that SSCs are somewhat more robust than conventional MFCC and LFCC features, as well as partially complementary to them.

35 citations


PatentDOI
TL;DR: In this paper, a method and apparatus estimate additive noise in a noisy signal using incremental Bayes learning, where a time-varying noise prior distribution is assumed and the hyperparameters (mean and variance) are updated recursively using an approximation of the posterior computed at the preceding time step.
Abstract: A method and apparatus estimate additive noise in a noisy signal using incremental Bayes learning, where a time-varying noise prior distribution is assumed and the hyperparameters (mean and variance) are updated recursively using an approximation of the posterior computed at the preceding time step. The additive noise, originally in the time domain, is represented in the log-spectrum or cepstrum domain before incremental Bayes learning is applied. The resulting mean and variance estimates of the noise for each frame are used to perform speech feature enhancement in the same log-spectrum or cepstrum domain.

29 citations
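The recursive hyperparameter update has the flavor of the following scalar Gaussian sketch (my simplification, omitting the patent's nonlinear acoustic distortion model): the posterior computed at frame t-1 becomes the prior at frame t.

```python
def incremental_bayes_step(mu, var, obs, obs_var):
    """One incremental Bayes update for a Gaussian noise prior
    N(mu, var) and a Gaussian observation model with variance
    obs_var: returns the posterior mean and variance, which serve
    as the prior at the next time step."""
    gain = var / (var + obs_var)
    mu_post = mu + gain * (obs - mu)
    var_post = (1.0 - gain) * var
    return mu_post, var_post

# Track a (log-spectral) noise level across frames
mu, var = 0.0, 1.0
for frame_obs in [1.0, 1.0, 0.8]:
    mu, var = incremental_bayes_step(mu, var, frame_obs, obs_var=1.0)
```

Each step shrinks the variance and pulls the mean toward the new observation, with a gain that reflects how confident the running prior already is.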


Proceedings Article
01 Jan 2004
TL;DR: A new signal processing technique, “specmurt anasylis,” is proposed that provides a piano-roll-like visual display of multi-tone signals (e.g., polyphonic music) using specmurt filreting instead of quefrency alanysis using cepstrum liftering.
Abstract: In this paper, we propose a new signal processing technique, “specmurt anasylis,” that provides a piano-roll-like visual display of multi-tone signals (e.g., polyphonic music). Specmurt is defined as the inverse Fourier transform of a linear spectrum with logarithmic frequency, unlike the familiar cepstrum, which is defined as the inverse Fourier transform of a logarithmic spectrum with linear frequency. We apply this technique to music signal analysis: “frencyque anasylis” using specmurt filreting instead of quefrency alanysis using cepstrum liftering. Suppose that each sound contained in the multi-pitch signal has exactly the same harmonic structure pattern (i.e., the same energy ratio among harmonic components); then, in the logarithmic frequency domain, the overall shape of the multi-pitch spectrum is a superposition of the common spectral pattern with different degrees of parallel shift. The overall shape can thus be expressed as a convolution of a fundamental frequency pattern (degrees of parallel shift and power) with the common harmonic structure pattern. The fundamental frequency pattern is restored by dividing the inverse Fourier transform of a given log-frequency spectrum, i.e., the specmurt, by that of the common harmonic structure pattern. The proposed method was successfully tested on several pieces of recorded music.
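The deconvolution at the heart of the method can be sketched in a few lines (a toy illustration on a circular log-frequency axis; the bin positions, amplitudes, and variable names are mine):

```python
import numpy as np

def specmurt_deconvolve(v, h, eps=1e-8):
    """Recover the fundamental-frequency pattern u from an observed
    log-frequency spectrum v = u (*) h (circular convolution with a
    common harmonic-structure pattern h), by dividing specmurts,
    i.e. inverse Fourier transforms taken along the log-frequency
    axis, and transforming back."""
    V = np.fft.ifft(v)          # specmurt of the observed spectrum
    H = np.fft.ifft(h)          # specmurt of the harmonic pattern
    u = np.fft.fft(V / (H + eps)).real
    return u / len(v)           # undo the convolution-theorem scaling

# Common harmonic pattern: harmonics at +0, +12, +19 log-frequency
# bins (octave and octave-plus-fifth on a semitone axis), with
# decaying energy
h = np.zeros(64)
h[0], h[12], h[19] = 1.0, 0.5, 0.3
# Two notes, at bins 5 and 20, with different powers
u_true = np.zeros(64)
u_true[5], u_true[20] = 1.0, 0.7
v = np.fft.ifft(np.fft.fft(u_true) * np.fft.fft(h)).real  # observed

u_est = specmurt_deconvolve(v, h)
```

Under the paper's assumption of a shared harmonic pattern, the recovered array is nonzero only at the note fundamentals, which is exactly the piano-roll-like display.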

Proceedings ArticleDOI
12 Jul 2004
TL;DR: The results show that the first MFCC degrades identification performance, while the statistical distribution parameters improve the training speed of the neural network.
Abstract: This paper presents a study on the effectiveness of mel-frequency cepstrum coefficients (MFCCs) and some of their statistical distribution properties (skewness, kurtosis, standard deviation) as features for text-dependent speaker identification. A multi-layer neural network with the backpropagation learning algorithm is used as the classification tool. The MFCCs representing the speaker characteristics of a speech segment are computed by nonlinear filterbank analysis and the discrete cosine transform. The speaker identification accuracy and the convergence speed of the neural network are investigated for different combinations of the proposed features. The results show that the first MFCC degrades identification performance, while the statistical distribution parameters improve the training speed of the neural network.

Journal ArticleDOI
TL;DR: APT-based speaker adaptation is demonstrated to achieve word error rate reductions superior to those obtained with other popular adaptation techniques, and moreover, reductions that are additive with those provided by VTLN.

Proceedings ArticleDOI
M. Graciarena1, H. Franco1, Jing Zheng1, D. Vergyri1, A. Stolcke1 
17 May 2004
TL;DR: This work augments the Mel cepstral feature representation with voicing features from an independent front end; the voicing features computed are the normalized autocorrelation peak and a newly proposed entropy of the high-order cepstrum, and several alternatives are explored for integrating them into SRI's DECIPHER system.
Abstract: We augment the Mel cepstral (MFCC) feature representation with voicing features from an independent front end. The voicing feature front end parameters are optimized for recognition accuracy. The voicing features computed are the normalized autocorrelation peak and a newly proposed entropy of the high-order cepstrum. We explored several alternatives to integrate the voicing features into SRI's DECIPHER system. Promising early results were obtained in a simple system concatenating the voicing features with MFCC features and optimizing the voicing feature window duration. Best results overall came from a more complex system combining a multiframe voicing feature window with the MFCC plus third differential features using linear discriminant analysis and optimizing the number of voicing feature frames. The best integration approach from the single-pass system experiments was implemented in a multi-pass system for large vocabulary testing on the Switchboard database. An average WER reduction of 2% relative was obtained on the NIST Hub-5 dev2001 and eval2002 databases.
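The first voicing feature is straightforward to sketch (an illustration of the standard normalized autocorrelation peak, not SRI's front end; the lag range below is an assumption):

```python
import numpy as np

def normalized_autocorr_peak(frame, min_lag=20, max_lag=160):
    """Voicing feature: maximum of the autocorrelation, normalized by
    the zero-lag value, over a plausible pitch-period lag range.
    Values near 1 indicate a strongly periodic (voiced) frame."""
    x = frame - frame.mean()
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    r = r / (r[0] + 1e-12)
    return float(r[min_lag:max_lag].max())

t = np.arange(400)
voiced = np.sin(2 * np.pi * t / 100)        # period of 100 samples
unvoiced = np.random.default_rng(1).standard_normal(400)
```

Periodic frames score high and noise frames score low; in the system described above, such scalars (over several frames) are concatenated with the MFCC vector.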

Journal ArticleDOI
TL;DR: Cepstrum analysis enables a more lucid interpretation of signals generated in the cavitation process than the universally used Fourier analysis, and reveals irregularities that are hardly visible in the classical Fourier spectrum.

Proceedings ArticleDOI
02 May 2004
TL;DR: An approach to speaker identification which jointly exploits vocal tract and glottis source information is proposed, in which the source information is modelled by a Gaussian mixture model (GMM) rather than the uniform probabilistic model.
Abstract: Recently, we proposed an approach to speaker identification which jointly exploits vocal tract and glottis source information; the approach synchronously takes into account the correlation between the two sources of information. The underlying theoretical model, which uses a joint law, is presented. Some restrictions and simplifications are adopted to show the significance of this approach in a practical way. The fundamental frequency and MFCCs (Mel frequency cepstrum coefficients) are used to represent the information of the source and the vocal tract, respectively. The probability density of the source, in particular, was previously assumed to obey a uniform law, and tests were carried out with only female speakers from a speech telephony database (SPIDRE) recorded from various telephone handsets. Here it is proposed to model the source information by a Gaussian mixture model (GMM) rather than the uniform probabilistic model, and tests are extended to all speakers of the SPIDRE database. Four systems are proposed and compared. The first is a baseline system based on the MFCCs that does not use any information from the source. The second examines only the voiced segments of the vocal signal. The last two correspond to the suggested approach under the two modelling techniques: the source information is found to follow a normal distribution in one technique and a log-normal distribution in the other. With the proposed approach, the gain in performance is 10.5% for women, 7% for men and 8% for all speakers.

Journal ArticleDOI
TL;DR: This work presents a new alternative based on time-domain phase analysis of the received signals, which works well with the saturated signals that result when high gain is applied to detect small flaws, and can easily be implemented in hardware for real-time processing.

Journal ArticleDOI
TL;DR: A hybrid approach based on the marginalisation and soft decision techniques is proposed that makes use of the Mel-frequency cepstral coefficients (MFCCs) instead of filter bank coefficients.
Abstract: Filter-bank coefficients are the most common features employed in research on marginalisation approaches for robust speech recognition, owing to the simplicity of detecting unreliable data in the frequency domain. In this paper, we propose a hybrid approach, based on the marginalisation and soft decision techniques, that makes use of the Mel-frequency cepstral coefficients (MFCCs) instead of filter bank coefficients. A new technique for estimating the reliability of each cepstral component is also presented. Experimental results show the effectiveness of the proposed approaches.

Book ChapterDOI
TL;DR: A new watermark embedding technique is introduced that combines frequency hopping spread spectrum (FHSS) and frequency masking (FM) techniques, and it is experimentally concluded that the proposed technique performs fairly well and is robust to MP3 compression.
Abstract: This study investigates the performances of a variety of audio watermarking techniques designed in time, frequency and cepstrum domains. A framework of comparison is performed with respect to bit error rate (BER), objective and subjective perceptual quality, computational complexity and robustness to signal processing such as low-pass filtering, requantization and MPEG Layer 3 (MP3) compression. It is observed that the cepstrum domain technique is superior to other techniques in terms of almost all criteria. Additionally, a new watermark embedding technique is introduced that combines frequency hopping spread spectrum (FHSS) and frequency masking (FM) techniques. It is experimentally concluded that the proposed technique performs fairly well and is robust to MP3 compression.

Proceedings ArticleDOI
17 May 2004
TL;DR: An algorithm was developed to separate pistachio nuts with closed shells from those with open shells by linearly combining Mel cepstrum and PCA feature vectors; classification accuracy for closed-shell nuts was more than 99% on the test set.
Abstract: An algorithm was developed to separate pistachio nuts with closed shells from those with open shells. It was observed that upon impact on a steel plate, nuts with closed shells emit different sounds than nuts with open shells. Two feature vectors extracted from the sound signals were Mel cepstrum coefficients and eigenvalues obtained from the principal component analysis of the autocorrelation matrix of the signals. Classification of a sound signal was done by linearly combining the Mel cepstrum and PCA feature vectors. An important property of the algorithm is that it is easily trainable. During the training phase, sounds of nuts with closed shells and open shells were used to obtain a representative vector of each class. The classification accuracy for closed-shell nuts was more than 99% on the test set.

Journal ArticleDOI
Y.J. Kim1, J.H. Chung1
TL;DR: In this article, a pitch synchronous cepstrum (PSC) was proposed for robust speaker recognition over telephone channels, which analyses consecutive pitch periods to compensate for spectrum distortion and broadens formants to minimize loss of speaker characteristics for channel normalisation.
Abstract: A new method to extract pitch synchronous cepstrum (PSC) for robust speaker recognition over telephone channels is proposed. The proposed method analyses consecutive pitch periods to compensate for spectrum distortion. It also broadens formants to minimise loss of speaker characteristics for channel normalisation. Compared to the conventional cepstrum, PSC shows an error reduction rate of up to 10.9%.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: A new approach to two-dimensional (2D) blind deconvolution of ultrasonic images that gives stable results of clearly higher spatial resolution and better defined tissue structures than the input images.
Abstract: The paper presents a new approach to two-dimensional (2D) blind deconvolution of ultrasonic images. Homomorphic deconvolution, so far the most successful method in this field, is based on the assumption that the point spread function (PSF) and the tissue signal lie in different bands of the cepstrum domain, which is not completely true. Furthermore, 2D phase unwrapping is necessary in 2D homomorphic mapping, which is an ill-posed and noise-sensitive problem. Here both limitations are avoided using blind iterative deconvolution, namely the Van Cittert algorithm with reblurring. Simplified homomorphic deconvolution is used only for the initial estimate. The algorithm is applied to the whole radiofrequency image, meaning that only the global, spatially invariant component of the PSF is removed. Tests on synthetic and clinical images have shown that the deconvolution gives stable results of clearly higher spatial resolution and better defined tissue structures than the input images.

Proceedings ArticleDOI
08 Dec 2004
TL;DR: This paper presents a technique for formant estimation using cepstral envelope analysis, which relies on decomposing the speech signal into two components and localizing the spectral maxima from the smoothed envelope.
Abstract: This paper presents a technique for formant estimation using cepstral envelope analysis. The method, which computes the cepstrum, was implemented in Matlab and applied to the problem of accurately measuring formant frequencies. The algorithm picks formant frequencies from the smoothed spectrum. The approach relies on decomposing the speech signal into two components: the first represents the excitation, while the second represents the vocal tract resonances. This decomposition is achieved by applying homomorphic deconvolution to the speech signal. The result, i.e., the cepstrum, is then used to estimate the smoothed spectrum, and formant picking is performed by localizing the spectral maxima of the smoothed envelope. Results showed a wide range in the estimated formant frequencies for male and female speakers; this evaluation confirms the limitations of this technique for estimating formant frequencies.
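The smooth-then-pick pipeline can be sketched as follows (a generic homomorphic implementation with a simple low-quefrency lifter; the cutoff and parameter values are illustrative, not the paper's):

```python
import numpy as np

def smoothed_log_spectrum(frame, n_lifter=30):
    """Cepstrally smoothed log spectrum: keep only the low-quefrency
    cepstral coefficients (vocal tract envelope), discard the rest
    (excitation), and transform back."""
    c = np.fft.ifft(np.log(np.abs(np.fft.fft(frame)) + 1e-12)).real
    c_lift = np.zeros_like(c)
    c_lift[:n_lifter] = c[:n_lifter]                 # low quefrencies
    c_lift[-(n_lifter - 1):] = c[-(n_lifter - 1):]   # symmetric part
    return np.fft.fft(c_lift).real

def pick_formants(envelope, sample_rate, n_formants=3):
    """Local maxima of the smoothed envelope, in Hz, lowest first."""
    half = len(envelope) // 2
    peaks = [k for k in range(1, half - 1)
             if envelope[k - 1] < envelope[k] > envelope[k + 1]]
    peaks.sort(key=lambda k: envelope[k], reverse=True)
    hz = [k * sample_rate / len(envelope) for k in peaks[:n_formants]]
    return sorted(hz)
```

The lifter length trades smoothness against resolution: too short and nearby formants merge, too long and pitch harmonics reappear as spurious peaks, which is one source of the estimation variability the paper reports.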

Journal ArticleDOI
TL;DR: A filtering method in the log-spectral domain corresponding to the cepstral liftering effect is derived, and it is shown that in noisy speech recognition the proposed method reduces the error rate by 52.7% relative to the conventional feature.
Abstract: We propose a novel feature processing technique which provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims at equalizing the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We therefore derive a filtering method in the log-spectral domain corresponding to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.
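The correspondence being exploited can be demonstrated directly (a sketch using an orthonormal DCT-II, under which liftering the cepstra and linearly filtering the log filter-bank energies are exactly interchangeable; the sinusoidal lifter below is the common HTK-style choice, not necessarily the authors'):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    m = np.cos(np.pi * (np.arange(n)[None, :] + 0.5) * k / n)
    m[0] /= np.sqrt(2.0)
    return m * np.sqrt(2.0 / n)

def lifter_via_log_spectral_filter(log_fbe, lifter):
    """Apply cepstral liftering as an equivalent linear operation on
    the log filter-bank energies: DCT to cepstra, weight, inverse DCT."""
    D = dct_matrix(len(log_fbe))
    return D.T @ (lifter * (D @ log_fbe))
```

Because D is orthogonal, the cepstra of the filtered energies equal the liftered cepstra exactly, which is the equivalence the proposed log-spectral filtering builds on.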

Journal ArticleDOI
TL;DR: It is shown in the paper that the proposed method has less variance than the maximum frequency method, and an upper bound is given for the variance reduction when practical criteria are applied for setting the cepstrum cut-off frequency.

01 Jun 2004
TL;DR: The experimental results show that the method estimates spectral envelopes with the highest accuracy when the cepstral order is 48-64, which suggests that higher-order coefficients are required to represent detailed envelopes reflecting the real vocal-tract responses.
Abstract: This paper introduces a novel articulatory-acoustic mapping in which detailed spectral envelopes are estimated based on the cepstrum, inclusive of the high-quefrency elements which are discarded in conventional speech synthesis to eliminate the pitch component of speech. For this estimation, the method deals with the harmonics of multiple voiced-speech spectra, so that several sets of harmonics obtained at various pitch frequencies together form a spectral envelope. The experimental results show that the method estimates spectral envelopes with the highest accuracy when the cepstral order is 48-64, which suggests that higher-order coefficients are required to represent detailed envelopes reflecting the real vocal-tract responses.

Proceedings ArticleDOI
04 Oct 2004
TL;DR: This study compared the classification of stops /p/, /t/, and /k/ based on spectral moments with classification based on an equal number of Bark cepstrum coefficients, and found that the best spectral-moments model used RMS amplitude plus all four bark-scaled spectral moment features at all four time intervals.
Abstract: Spectral moments analysis has been shown to be effective in deriving acoustic features for classifying voiceless stop release bursts [1], and is an analysis method commonly cited in the clinical phonetics literature dealing with children’s disordered speech. In this study, we compared the classification of stops /p/, /t/, and /k/ based on spectral moments with classification based on an equal number of Bark cepstrum coefficients. Utterance-initial /p/, /t/, and /k/ tokens (1338 samples in all) were collected from a database of children’s speech. Linear discriminant analysis (LDA) was used to classify the three stops based on four analysis frames from the initial 40 msec of each token. The best model based on spectral moments used RMS amplitude plus all four bark-scaled spectral moment features at all four time intervals and yielded 78.0% correct discrimination. The best model of similar rank based on Bark cepstrum features yielded 86.6% correct segment discrimination.

Proceedings ArticleDOI
22 Jun 2004
TL;DR: Cepstrum modification in regions of speech that are perceptually masked, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with low BER.
Abstract: A method of embedding data in an audio signal using cepstral domain modification is described. Based on successful embedding in the spectral points of perceptually masked regions in each frame of speech, the technique was first extended to embedding in the log spectral domain. This extension resulted in approximately 62 bits/s of embedding with a bit error rate (BER) of less than 2 percent for a clean cover speech (from the TIMIT database), and about 25 percent for a noisy speech (from an air traffic controller database), when all frames, including silence and transitions between voiced and unvoiced segments, were used. The bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices, corresponding to the vocal tract region, was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s, using one bit per frame regardless of frame energy or type of speech, initial results showed a BER of less than 15 percent for a payload of 208 embedded bits using the clean cover speech. A BER of less than 13 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, the BER increased to over 10 percent. With quantization causing no significant problem, the technique warrants further study with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with low BER.
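The mean-modification idea can be sketched as follows (a toy version with made-up index ranges and step size; the real scheme's ranges, embedding strength, and detection procedure differ):

```python
import numpy as np

def embed_bit(cepstrum, bit, r1=(10, 20), r2=(20, 30), delta=0.05):
    """Embed one bit per frame by nudging the mean cepstral values of
    two index ranges in opposite directions; the bit is encoded in
    the sign of the difference between the two range means."""
    c = cepstrum.copy()
    step = delta if bit else -delta
    c[r1[0]:r1[1]] += step
    c[r2[0]:r2[1]] -= step
    return c

def detect_bit(cepstrum, r1=(10, 20), r2=(20, 30)):
    """Recover the bit from the sign of the mean difference."""
    return cepstrum[r1[0]:r1[1]].mean() > cepstrum[r2[0]:r2[1]].mean()
```

Detection is reliable only when delta dominates the frame's natural mean difference in the chosen ranges, which is consistent with the paper's observation that BER depends strongly on which cepstral region is modified.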

Proceedings Article
01 Aug 2004
TL;DR: In this article, a BCH code-based audio watermarking approach performed in the cepstrum domain is proposed, which takes advantage of the attack-invariant feature of the CPE domain and the error-correction capability of BCH codes to increase the robustness of audio watermarks.
Abstract: In this article, a BCH code-based audio watermarking approach performed in the cepstrum domain is proposed. The technique takes advantage of the attack-invariant feature of the cepstrum domain and the error-correction capability of BCH codes to increase the robustness of audio watermarking. In addition, the watermarked audio has very high perceptual quality. A blind watermark detection technique is developed to identify the embedded watermark under various types of attacks. Experimental results demonstrate that the proposed technique outperforms existing audio watermarking techniques against most asynchronous attacks.

Book ChapterDOI
22 Nov 2004
TL;DR: A new feature set based on Teager Energy Operator and well-known Mel frequency cepstral coefficients (MFCC) is developed and the effectiveness of the newly derived feature set in identifying identical twins has been demonstrated for different Indian languages.
Abstract: Automatic Speaker Recognition (ASR) is an economical method of biometrics because of the availability of low-cost, powerful processors. An important question which must be answered for an ASR system is how well it resists the efforts of determined mimics, especially speakers with near-identical physiological characteristics such as identical twins or triplets. In this paper, a new feature set based on the Teager Energy Operator (TEO) and the well-known Mel frequency cepstral coefficients (MFCC) is developed. The effectiveness of the newly derived feature set in identifying identical twins has been demonstrated for different Indian languages. Polynomial classifiers of 2nd and 3rd order have been used. The results have been compared with other feature sets such as LPC coefficients, LPC cepstrum and baseline MFCC.