
Showing papers on "Cepstrum published in 1999"


Journal ArticleDOI
Sassan Ahmadi, Andreas Spanias
TL;DR: An improved cepstrum-based voicing detection and pitch determination algorithm is presented and shown to be robust to additive noise; performance analysis on a large database indicates considerable improvement relative to the conventional cepstrum method.
Abstract: An improved cepstrum-based voicing detection and pitch determination algorithm is presented. Voicing decisions are made using a multifeature voiced/unvoiced classification algorithm based on statistical analysis of cepstral peak, zero-crossing rate, and energy of short-time segments of the speech signal. Pitch frequency information is extracted by a modified cepstrum-based method and then carefully refined using pitch tracking, correction, and smoothing algorithms. Performance analysis on a large database indicates considerable improvement relative to the conventional cepstrum method. The proposed algorithm is also shown to be robust to additive noise.
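The pipeline described above can be illustrated with a minimal sketch: a real-cepstrum pitch candidate per frame plus a simple vote over cepstral peak, zero-crossing rate, and energy. The thresholds, frame handling, and search range below are illustrative assumptions, not the authors' classifier.

```python
import numpy as np

def frame_features(frame, fs, fmin=50.0, fmax=400.0):
    """Per-frame cepstral peak, zero-crossing rate, log energy, and F0 candidate."""
    w = frame * np.hamming(len(frame))
    spec = np.fft.rfft(w, n=2 * len(w))
    cep = np.fft.irfft(np.log(np.abs(spec) + 1e-12))     # real cepstrum
    lo, hi = int(fs / fmax), int(fs / fmin)               # quefrency search range
    peak_q = lo + np.argmax(cep[lo:hi])
    cep_peak = cep[peak_q]
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    energy = np.log(np.sum(frame ** 2) + 1e-12)
    return cep_peak, zcr, energy, fs / peak_q              # candidate F0 in Hz

def is_voiced(cep_peak, zcr, energy,
              cep_thr=0.08, zcr_thr=0.25, energy_thr=-8.0):
    # Placeholder thresholds; the paper derives its decision rule statistically.
    votes = int(cep_peak > cep_thr) + int(zcr < zcr_thr) + int(energy > energy_thr)
    return votes >= 2
```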

192 citations


Journal ArticleDOI
TL;DR: Two approaches are pursued, extracting features that are robust against channel variations and transforming the speaker models to compensate for channel effects, which together resulted in a 38% relative improvement on the closed-set 30-s training, 5-s testing condition of the NIST'95 Evaluation task.
Abstract: This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing data, and concentrates on two approaches: (1) extracting features that are robust against channel variations and (2) transforming the speaker models to compensate for channel effects. First, an experimental study shows that optimizing the front end processing of the speech signal can significantly improve speaker recognition performance. A new filterbank design is introduced to improve the robustness of the speech spectrum computation in the front-end unit. Next, a new feature based on spectral slopes is described. Its ability to discriminate between speakers is shown to be superior to that of the traditional cepstrum. This feature can be used alone or combined with the cepstrum. The second part of the paper presents two model transformation methods that further reduce channel effects. These methods make use of a locally collected stereo database to estimate a speaker-independent variance transformation for each speech feature used by the classifier. The transformations constructed on this stereo database can then be applied to speaker models derived from other databases. Combined, the methods developed in this paper resulted in a 38% relative improvement on the closed-set 30-s training 5-s testing condition of the NIST'95 Evaluation task, after cepstral mean removal.
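Of the techniques mentioned, cepstral mean removal is the simplest to make concrete; a hedged sketch over a per-utterance matrix of cepstral vectors follows (the array layout is an assumption).

```python
import numpy as np

def cepstral_mean_removal(cepstra):
    """cepstra: (n_frames, n_coeffs) cepstral vectors for one utterance.
    A stationary convolutional channel multiplies the spectrum, so it adds a
    constant vector in the cepstral domain; subtracting the mean removes it."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```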

97 citations


Journal ArticleDOI
TL;DR: A noise robust Bayesian phase unwrapping method using a noncausal Markov random chain model that improves radial and lateral blind deconvolution on six short ultrasound image sequences recorded in vitro or in vivo.
Abstract: Recently, several blind cepstral deconvolution methods for medical ultrasound images were compared experimentally. The results indicated that the generalized cepstrum or the complex cepstrum with phase unwrapping give the blind homomorphic deconvolution algorithms with the best performance. However, the frequency domain phase unwrapping for pulse estimation, which is an essential part of both methods, is sensitive to the sensor noise when the values of the spectrum are small due to the randomness of the tissue response. The noise introduces abrupt changes in the phase. The phase degradation due to the noise causes variable spatial and gray scale resolution in image sequences following deconvolution. This paper introduces a noise robust Bayesian phase unwrapping method using a noncausal Markov random chain model. The prior regularizing term accounts for the noise and smoothes the phase. The phase unwrapping is formulated as a least mean square optimization problem. The optimization is done noniteratively by solving a difference equation using the cosine transform. The resulting improvement in radial and lateral blind deconvolution is demonstrated on six short ultrasound image sequences recorded in vitro or in vivo.
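The non-iterative least-squares step can be sketched in one dimension with a cosine transform, in the spirit of classic unweighted least-squares phase unwrapping; the Bayesian noise prior described in the abstract is omitted here and the function names are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

def wrap(p):
    return (p + np.pi) % (2.0 * np.pi) - np.pi

def ls_unwrap_1d(psi):
    """psi: wrapped phase samples along the frequency axis (1-D array).
    Solves phi'' = rho with Neumann boundaries, non-iteratively, via the DCT."""
    n = len(psi)
    d = wrap(np.diff(psi))                        # wrapped phase differences
    rho = np.diff(d, prepend=0.0, append=0.0)     # driving term of the difference equation
    rhohat = dct(rho, type=2, norm='ortho')
    k = np.arange(n)
    denom = 2.0 * np.cos(np.pi * k / n) - 2.0     # Laplacian eigenvalues under DCT-II
    phihat = np.zeros(n)
    phihat[1:] = rhohat[1:] / denom[1:]           # k = 0 (overall offset) is free
    phi = idct(phihat, type=2, norm='ortho')
    return phi - phi[0] + psi[0]                  # anchor to the first wrapped sample
```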

73 citations


Journal ArticleDOI
TL;DR: A comparison of the two sets of features indicates that J1(t) can be used to model the hearing perception much like the mel cepstral coefficients.
Abstract: A compact representation of speech is possible using Bessel functions because of the similarity between voiced speech and the Bessel functions. Both voiced speech and the Bessel functions exhibit quasiperiodicity and decaying amplitude with time. This paper presents the results of speaker identification experiments using features obtained from (1) the Fourier-Bessel expansion and (2) the cepstral representation of speech frames. Identification scores of 65% and 76% were achieved using features based on J1(t) expansion of air-to-ground speech transmission databases of 143 and 1054 test utterances, respectively. The corresponding scores for the two databases using cepstral coefficients of a comparable size were 80% and 88%. A comparison of the two sets of features indicates that J1(t) can be used to model the hearing perception much like the mel cepstral coefficients.
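A hedged sketch of order-1 Fourier-Bessel coefficients used as frame features might look as follows; the frame length, number of coefficients, and rectangle-rule integration are assumptions for illustration.

```python
import numpy as np
from scipy.special import j1, jv, jn_zeros

def fourier_bessel_coeffs(x, n_coeffs=32):
    """x: one speech frame; returns C_m such that x(t) ~ sum_m C_m * J1(lam_m * t / T)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(1, n + 1, dtype=float)       # sample times 1..T (avoid t = 0)
    T = float(n)
    lam = jn_zeros(1, n_coeffs)                # first positive roots of J1
    coeffs = np.empty(n_coeffs)
    for m, lm in enumerate(lam):
        integrand = t * x * j1(lm * t / T)
        # rectangle-rule approximation of the orthogonality integral
        coeffs[m] = 2.0 * np.sum(integrand) / (T ** 2 * jv(2, lm) ** 2)
    return coeffs
```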

59 citations


Journal ArticleDOI
TL;DR: This work examines the performance of seven F0 algorithms, based on the average magnitude difference function (AMDF), the input autocorrelation function (AC), the autocorrelation function of the center-clipped signal (ACC), the autocorrelation function of the inverse filtered signal (IFAC), the signal cepstrum (CEP), the Harmonic Product Spectrum (HPS) of the signal, and the waveform matching function (WM), respectively.
Abstract: Perturbation analysis of sustained vowel waveforms is used routinely in the clinical evaluation of pathological voices and in monitoring patient progress during treatment. Accurate estimation of voice fundamental frequency (F0) is essential for accurate perturbation analysis. Several algorithms have been proposed for fundamental frequency extraction. To be appropriate for clinical use, a key consideration is that an F0 extraction algorithm be robust to such extraneous factors as the presence of noise and modulations in voice frequency and amplitude that are commonly associated with the voice pathologies under study. This work examines the performance of seven F0 algorithms, based on the average magnitude difference function (AMDF), the input autocorrelation function (AC), the autocorrelation function of the center-clipped signal (ACC), the autocorrelation function of the inverse filtered signal (IFAC), the signal cepstrum (CEP), the Harmonic Product Spectrum (HPS) of the signal, and the waveform matching function (WM) respectively. These algorithms were evaluated using sustained vowel samples collected from normal and pathological subjects. The effect of background noise and of frequency and amplitude modulations on these algorithms was also investigated, using synthetic vowel waveforms.
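Two of the seven estimators (AC and AMDF) are easy to sketch; the search ranges and framing below are illustrative defaults rather than the study's settings.

```python
import numpy as np

def f0_autocorr(frame, fs, fmin=60.0, fmax=500.0):
    """Pitch from the peak of the autocorrelation function within the F0 range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

def f0_amdf(frame, fs, fmin=60.0, fmax=500.0):
    """Pitch from the minimum of the average magnitude difference function."""
    n = len(frame)
    lo, hi = int(fs / fmax), int(fs / fmin)
    amdf = np.array([np.mean(np.abs(frame[k:] - frame[:n - k]))
                     for k in range(lo, hi)])
    lag = lo + np.argmin(amdf)                 # AMDF dips at the pitch period
    return fs / lag
```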

50 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: The APT is exploited to develop a speaker adaptation scheme in which the cepstral means of a speech recognition model are transformed to better match the speech of a given speaker.
Abstract: In previous work, a class of transforms was proposed which achieves a remapping of the frequency axis much like conventional vocal tract length normalization. These mappings, known collectively as all-pass transforms (APT), were shown to produce substantial improvements in the performance of a large vocabulary speech recognition system when used to normalize incoming speech prior to recognition. In this application, the most advantageous characteristic of the APT was its cepstral-domain linearity; this linearity makes speaker normalization simple to implement, and provides for the robust estimation of the parameters characterizing individual speakers. In the current work, we exploit the APT to develop a speaker adaptation scheme in which the cepstral means of a speech recognition model are transformed to better match the speech of a given speaker. In a set of speech recognition experiments conducted on the Switchboard corpus, we report reductions in word error rate of 3.7% absolute.
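The cepstral-domain linearity can be made concrete for the simplest member of the class, a first-order (bilinear) all-pass warp: the warped cepstra are a fixed matrix times the original cepstra. The numerical construction below is only a sketch of that property, not the paper's estimation or adaptation procedure.

```python
import numpy as np

def bilinear_warp_matrix(n_cep, alpha, n_grid=2048):
    """Returns W such that c_warped = W @ c for real cepstra c[0..n_cep-1]."""
    w = np.linspace(0.0, np.pi, n_grid)
    a = -alpha                                  # inverse warp has parameter -alpha
    # Phase of the first-order all-pass (e^{-jw} - a) / (1 - a e^{-jw})
    theta = (np.arctan2(np.sin(w), np.cos(w) - a)
             + np.arctan2(a * np.sin(w), 1.0 - a * np.cos(w)))
    cos_k = np.cos(np.outer(np.arange(n_cep), w))           # analysis cosines
    dw = w[1] - w[0]
    W = np.empty((n_cep, n_cep))
    for m in range(n_cep):
        weight = 1.0 if m == 0 else 2.0
        basis = weight * np.cos(m * theta)                   # warped log-spectrum basis
        W[:, m] = cos_k @ basis * dw / np.pi                 # crude quadrature
    return W

# Adaptation in the spirit of the abstract would map model cepstral means as
# mu_hat = W @ mu, with alpha estimated per speaker (estimation not shown).
```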

41 citations


Proceedings Article
Diemo Schwartz, Xavier Rodet
01 Jan 1999
TL;DR: The proposed high-level approach to spectral envelope handling is followed in software developed at IRCAM, which makes some important applications of spectral envelopes in the domain of additive analysis–synthesis possible.
Abstract: Spectral envelopes are very useful in sound analysis and synthesis because of their connection with production and perception models, and their ability to capture and to manipulate important properties of sound using easily understandable “musical” parameters. It is not easy, however, to estimate and represent them well, as several requirements must be fulfilled. We discuss the strengths and weaknesses of the estimation methods LPC, cepstrum, and discrete cepstrum, and evaluate the representations filter coefficients, sampled, break-point functions, splines, and formants. The proposed high-level approach to spectral envelope handling is followed in software developed at IRCAM, which makes some important applications of spectral envelopes in the domain of additive analysis–synthesis possible.
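Of the estimators compared, cepstral smoothing is the most compact to sketch: low-pass liftering of the real cepstrum yields a smooth log-magnitude envelope. The FFT size and lifter order below are assumptions.

```python
import numpy as np

def cepstral_envelope(frame, order=40, n_fft=2048):
    """Smoothed log-magnitude spectral envelope via low-quefrency liftering."""
    spec = np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)
    log_mag = np.log(np.abs(spec) + 1e-12)
    cep = np.fft.irfft(log_mag)                 # real cepstrum (symmetric)
    cep[order:-order] = 0.0                     # keep only low quefrencies
    envelope = np.fft.rfft(cep).real            # smoothed log magnitude
    return envelope
```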

40 citations


Patent
James P. Ashley
30 Nov 1999
TL;DR: In this article, a frequency domain comb-filtering (289) technique was used to supplement a traditional spectral noise suppression method, which prevented high frequency components from being unnecessarily attenuated, thereby reducing the muffling effects of prior-art comb filters.
Abstract: A noise suppression system implemented in a communication system provides an improved level of quality during severe signal-to-noise ratio (SNR) conditions. The noise suppression system, inter alia, incorporates a frequency domain comb-filtering (289) technique which supplements a traditional spectral noise suppression method. The invention includes a real cepstrum generator (285) for an input signal (285) G(k) to produce a likely voiced speech pitch lag component, converting the result to the frequency domain to obtain a comb-filter function (290) C(k), applying the input signal (291) G(k) to the comb-filter function (290) C(k), and equalizing the energies of the corresponding pre- and post-filtered subbands, to produce a signal (293) G″(k) to be used for noise suppression. This prevents high frequency components from being unnecessarily attenuated, thereby reducing the muffling effects of prior-art comb filters.

38 citations


Proceedings Article
01 Sep 1999
TL;DR: This paper proposes cepstrum length normalization, applied to the incoming testing utterances, which results in a 13% word error rate reduction on an independent evaluation corpus, additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation.
Abstract: The accuracy of a speech recognition (SR) system depends on many factors, such as the presence of background noise, mismatches in microphone and language models, variations in speaker, accent and even speaking rates. In addition to fast speakers, even normal speakers will tend to speak faster when using a speech recognition system in order to get higher throughput. Unfortunately, state-of-the-art SR systems perform significantly worse on fast speech. In this paper, we present our efforts in making our system more robust to fast speech. We propose cepstrum length normalization, applied to the incoming testing utterances, which results in a 13% word error rate reduction on an independent evaluation corpus. Moreover, this improvement is additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation. Together with MLLR, a 23% error rate reduction was achieved.

36 citations


Journal ArticleDOI
TL;DR: The results showed that adding the pitch gives a significant improvement only when the correlation between the pitch and cepstral coefficients is used; adding only the LPC residual also gives a significant improvement, but in contrast to the pitch, using its correlation with the cepstral coefficients does not have a big effect.
Abstract: In speaker recognition, when the cepstral coefficients are calculated from the LPC analysis parameters, the prediction error, or LPC residual signal, is usually ignored. However, there is evidence that it contains speaker-specific information. The fundamental frequency of the speech signal, or pitch, which is usually extracted from the LPC residual, has been used for speaker recognition purposes, but because of its high intraspeaker variability it is also often ignored. This paper describes our approach to integrating the pitch and LPC residual with the LPC cepstrum in a Gaussian mixture model (GMM) based speaker recognition system. The pitch and/or LPC residual are treated as additional features alongside the main LPC-derived cepstral coefficients and are represented as the logarithm of F0 and as a filter-bank mel-frequency cepstral (MFCC) vector, respectively. The second task of this research was to verify whether the correlation between the different information sources is useful for the speaker recognition task. For the experiments we used the NTT database, consisting of high-quality speech samples. The speaker recognition system was evaluated in three modes: integrating only the pitch, integrating only the LPC residual, and integrating both. The results showed that adding the pitch gives a significant improvement only when the correlation between the pitch and cepstral coefficients is used. Adding only the LPC residual also gives a significant improvement, but in contrast to the pitch, using its correlation with the cepstral coefficients does not have a big effect. The best results were achieved using both the pitch and LPC residual: a 98.5% speaker identification rate and a 0.21% speaker verification equal error rate, compared with 97.0% and 1.07% for the baseline system, respectively.
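At the classifier level, the integration scheme can be sketched with per-speaker GMMs over concatenated feature vectors; full covariances are one (assumed) way to let the model exploit correlation between the streams, and feature extraction is taken as given.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_per_speaker, n_components=32):
    """features_per_speaker: dict speaker_id -> (n_frames, dim) array, where each
    row is e.g. [LPC cepstrum | residual MFCC | log F0] for a voiced frame."""
    models = {}
    for spk, feats in features_per_speaker.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type='full')
        models[spk] = gmm.fit(feats)
    return models

def identify(models, test_feats):
    """Return the speaker whose GMM gives the highest average log-likelihood."""
    scores = {spk: gmm.score(test_feats) for spk, gmm in models.items()}
    return max(scores, key=scores.get)
```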

28 citations


Proceedings ArticleDOI
15 Mar 1999
TL;DR: The high correlation between the performance of the human listeners and that of the connected digit recognizer leads to some interesting conclusions, including that typical cepstral processing is insufficient to support speech information in noise.
Abstract: We consider the performance of speech recognition in noise and focus on its sensitivity to the acoustic feature set. In particular, we examine the perceived information reduction imposed on a speech signal using a feature extraction method commonly used for automatic speech recognition. We observe that the human recognition rates on noisy digit strings drop considerably as the speech signal undergoes the typical loss of phase and loss of frequency resolution. Steps are taken to ensure that human subjects are constrained in ways similar to that of an automatic recognizer. The high correlation between the performance of the human listeners and that of our connected digit recognizer leads us to some interesting conclusions, including that typical cepstral processing is insufficient to support speech information in noise.

Journal ArticleDOI
TL;DR: In this paper, a high fidelity ultrasonic pulse-echo signal processing technique for detecting delaminations in thin composite laminates was proposed, which can be used effectively for extracting exact time-of-flight information and then constructing accurate B-scan and three-dimensional images of defects in thin composites.
Abstract: Conventional ultrasonic pulse-echo imaging techniques, typically processed in the time domain, are generally limited by the large pulse widths, resulting in inaccurate and confusing B-scan images. This paper deals with a high fidelity ultrasonic pulse-echo signal processing technique suitable for detecting delaminations in thin composite laminates. In this processing scheme, broadband pulse-echo A-scan signals are reconstructed in the transform domain using complex cepstrum and homomorphic deconvolution techniques. The technique is implemented for 8-ply and 16-ply quasi-isotropic graphite/epoxy laminates with embedded Teflon film patches and damage induced by low velocity impact loading. The technique shows excellent improvement in the time resolution so that it can be used effectively for extracting exact time-of-flight information and then constructing accurate B-scan and three-dimensional images of defects in thin composites. Moreover, the approach automatically measures the time delay from the front surface, i.e., it follows the front face, making it useful for inspecting slightly warped laminates or the parts with irregular surfaces.
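A generic homomorphic-deconvolution sketch of the kind referred to above separates the smooth pulse (low quefrency) from the reflectivity (high quefrency) in the complex cepstrum; the FFT size, lifter cutoff, and linear-phase handling are assumptions, not the paper's processing chain.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    X = np.fft.fft(x, n_fft)
    log_mag = np.log(np.abs(X) + 1e-12)
    phase = np.unwrap(np.angle(X))
    # remove the linear-phase term (a pure delay) so the cepstrum decays
    nd = round(phase[n_fft // 2] / np.pi)
    phase = phase - np.pi * nd * np.arange(n_fft) / (n_fft // 2)
    return np.real(np.fft.ifft(log_mag + 1j * phase))

def homomorphic_separate(x, cutoff=20, n_fft=1024):
    """Split x (pulse convolved with reflectivity) by low/high-quefrency liftering."""
    chat = complex_cepstrum(x, n_fft)
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[-(cutoff - 1):] = 1.0               # low quefrencies, positive and negative

    def invert(c):
        return np.real(np.fft.ifft(np.exp(np.fft.fft(c))))

    pulse = invert(chat * lifter)              # smooth pulse estimate
    reflectivity = invert(chat * (1.0 - lifter))
    return pulse, reflectivity
```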

Proceedings ArticleDOI
22 Aug 1999
TL;DR: This paper presents a novel approach for dereverberation of speech signals, based on an iterative cepstral separation of the non-minimum phase room response that is superior to the standard homomorphic techniques, as it overcomes the loss in computational efficiency inflicted by the long time responses of the inverse filters.
Abstract: This paper presents a novel approach for dereverberation of speech signals, based on an iterative cepstral separation of the non-minimum phase room response. The algorithm performs an iterative flattening of the room transfer function (RTF) magnitude spectrum prior to suppression of phase distortion. For ill-conditioned inversion problems, the proposed method is superior to the standard homomorphic techniques, as it overcomes the loss in computational efficiency inflicted by the long time responses of the inverse filters. By using the fast-decaying sequences shorter than the length of a full inverse, the finite-point nature of a discrete Fourier transform (DFT) is made noncritical. Results with measured room responses conform with the theory presented in the paper.

Journal ArticleDOI
TL;DR: In this paper, a new cepstrum normalization method is proposed which can be used to compensate for distortion caused by additive noise, which is shown to give improved performance compared with that of conventional methods.
Abstract: A new cepstrum normalisation method is proposed which can be used to compensate for distortion caused by additive noise. Conventional methods only compensate for the deviation of the cepstral mean and/or variance. However, deviations of higher order moments also exist in noisy speech signals. The proposed method normalises the cepstrum up to its third-order moment, providing closer probability density functions between clean and noisy cepstra than is possible using conventional methods. From the speaker-independent isolated-word recognition experiments, it is shown that the proposed method gives improved performance compared with that of conventional methods, especially in heavy noise environments.

Proceedings ArticleDOI
22 Aug 1999
TL;DR: Intensive experimental results showed that the proposed technique outperforms both the autocorrelation and cepstral based techniques.
Abstract: In speech processing, methods based on glottal closure instants (GCI) for pitch period estimation have proven to give good results. We propose here a new method to estimate the GCIs based on the Teager energy function (TEF). The technique is simple and has a low computational load. The method is based on the peak detection of the TEF for each frame. A smoothing technique is then implemented to pick the right value for the pitch period. Intensive experimental results showed that the proposed technique outperforms both the autocorrelation and cepstral based techniques.
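The TEF (often written as the Teager energy operator) and the per-frame peak picking can be sketched as follows; the threshold and median-based period estimate are illustrative, and the cross-frame smoothing described above is omitted.

```python
import numpy as np

def teager(x):
    """Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    psi = np.zeros_like(x, dtype=float)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def pitch_from_teo_peaks(frame, fs):
    psi = teager(np.asarray(frame, dtype=float))
    thr = 0.5 * psi.max()
    is_peak = (psi[1:-1] > psi[:-2]) & (psi[1:-1] > psi[2:]) & (psi[1:-1] > thr)
    peaks = np.where(is_peak)[0] + 1
    if len(peaks) < 2:
        return 0.0                              # treat as unvoiced
    periods = np.diff(peaks)
    periods = periods[periods > fs / 500.0]     # drop spuriously close peaks
    return fs / np.median(periods) if len(periods) else 0.0
```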

Proceedings ArticleDOI
17 Oct 1999
TL;DR: A new method of obtaining features from speech signals for robust analysis and recognition-the non-uniform linear prediction (NLP) cepstrum is proposed, to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments.
Abstract: We propose a new method of obtaining features from speech signals for robust analysis and recognition-the non-uniform linear prediction (NLP) cepstrum. The objective is to derive a representation that suppresses speaker-dependent characteristics while preserving the linguistic quality of speech segments. The analysis is based on two principles. First, Bark frequency warping is performed on the LP spectrum to emulate the auditory spectrum. While widely used methods such as the mel-frequency and PLP analysis use the FFT spectrum as its basis for warping, the NLP analysis uses the LP-based vocal-tract spectrum with glottal effects removed. Second, all-pole modeling (LP) is used before and after the warping. The pre-warp LP is used to first obtain the vocal-tract spectrum, while the post-warp LP is performed to obtain a smoothed, two-peak model of the warped spectrum. Experiments were conducted to test the effectiveness of the proposed feature in the case of identification/discrimination of vowels uttered by multiple speakers using linear discriminant analysis (LDA), and frame-based vowel recognition with a statistical model. In both cases, the NLP analysis was shown to be an effective tool for speaker-independent speech analysis/recognition applications.

Journal ArticleDOI
TL;DR: Comparison between time and bispectrum averaging is performed using simulated data, proving the more efficient performance of the proposed method, especially in the case of noisy ENGs.

Journal ArticleDOI
TL;DR: In this article, a neural network with two hidden layers was trained using vectors of 10 mel-frequency cepstrum coefficients and the corresponding vocal tract lengths of these utterances, achieving an average error of less than 1% and a maximum error of 3.2% in estimating VT length from single test utterances.
Abstract: A new method of estimating the overall vocal‐tract (VT) length and the normalization of acoustic parameters of different speakers is reported in this paper for acoustic‐to‐articulatory mapping. The main goal of this work was a high accuracy of VT length estimation from a short speech utterance. An articulatory model, originally developed by Maeda, was used as a reference female VT. Linear scaling was used to synthesize training data for VT lengths between 100% and 125% of the reference VT length (14.96 cm). These data had 250 utterances, resulted from different VT lengths, each containing six vowels. A neural network with two hidden layers was trained using vectors of 10 mel‐frequency cepstrum coefficients and the corresponding VT lengths of these utterances. For the same VT length range, similar test data were synthesized using the training vowels but in different contexts. With the trained network, evaluation of this method on test data has shown an average error of less than 1% and a maximum error of 3.2% in estimating VT length from single test utterances. Frequency warping was used to normalize the cepstrum parameters according to estimated length factors ranging between 1.0 and 1.25. [This work was supported by NSERC.]

Journal ArticleDOI
TL;DR: The cepstrum coefficients of an ARMA process, which may easily be derived as functions of the z-plane poles and zeros, can be used to characterise and manipulate a spectrum.
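That relationship is explicit for a minimum-phase ARMA model: c(0) equals the log gain and c(n) = (sum_k p_k^n - sum_k z_k^n) / n for n >= 1, where p_k and z_k are the poles and zeros (all inside the unit circle). A small sketch, with the example pole pair chosen arbitrarily:

```python
import numpy as np

def arma_cepstrum(poles, zeros, gain=1.0, n_coeffs=64):
    """c[0] = log gain, c[n] = (sum_k p_k^n - sum_k z_k^n) / n for n >= 1,
    valid when all poles and zeros lie inside the unit circle."""
    c = np.zeros(n_coeffs)
    c[0] = np.log(gain)
    p = np.asarray(poles, dtype=complex)
    z = np.asarray(zeros, dtype=complex)
    for n in range(1, n_coeffs):
        c[n] = np.real(np.sum(p ** n) - np.sum(z ** n)) / n
    return c

# Example: a pole pair at radius 0.9, angle +/- 0.3*pi shapes a resonance whose
# cepstrum decays as 2 * 0.9**n * cos(0.3*pi*n) / n.
p = 0.9 * np.exp(1j * np.array([0.3 * np.pi, -0.3 * np.pi]))
c = arma_cepstrum(p, [], gain=1.0, n_coeffs=10)
```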

Proceedings ArticleDOI
05 Sep 1999
TL;DR: The study of the filter bank analysis suggests a new frequency scale, instead of the currently used mel scale, for extracting cepstrum coefficients from the speech signal, which results in better performance in speaker verification.
Abstract: The influence of cepstrum parameters on text-dependent speaker verification and speech recognition is investigated. Experiments are performed to establish the relevance of various resonant frequencies and frequency bands in terms of their speech and speaker recognition ability. A Romanian database of eighteen isolated words has been used. The study of the filter bank analysis suggests a new frequency scale, instead of the currently used mel scale, for extracting cepstrum coefficients from the speech signal. The proposed scale results in better performance in speaker verification. The processes of speech recognition and speaker verification are carried out using a neural network system comprising a self-organizing feature map (SOFM) and a multilayer perceptron (MLP).
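Leaving the frequency scale as a pluggable warping function makes the comparison concrete: the standard mel scale below could be swapped for the scale the paper proposes. Triangular filters and a DCT are common (assumed) choices for the rest of the chain.

```python
import numpy as np
from scipy.fft import dct

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_inv(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filterbank_cepstrum(power_spec, fs, n_filters=24, n_ceps=12,
                        scale=mel, scale_inv=mel_inv):
    """power_spec: one-sided power spectrum of a frame (length n_fft//2 + 1).
    `scale`/`scale_inv` define the frequency warping; swap in another scale here."""
    n_fft = 2 * (len(power_spec) - 1)
    edges = scale_inv(np.linspace(scale(0.0), scale(fs / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_filters, len(power_spec)))
    for i in range(n_filters):                          # triangular filters
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        fb[i, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[i, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    log_e = np.log(fb @ power_spec + 1e-12)
    return dct(log_e, type=2, norm='ortho')[:n_ceps]
```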

Proceedings Article
01 Oct 1999
TL;DR: A more high level approach to spectral envelopes is taken, which can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated by the same manipulations as the sinusoidal part by using the same representation.
Abstract: A spectral envelope is a curve in the frequency-magnitude plane which envelopes the short time spectrum of a signal, e.g. connecting the peaks which represent sinusoidal partials, or modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of timbre for instruments, and the type of vowel for speech. Because of the importance of using spectral envelopes for sound synthesis, a more high level approach to their handling is taken here. We present programs developed using spectral envelopes for analysis, representation, manipulation, and synthesis.

Spectral envelopes can be estimated by linear prediction, cepstrum or discrete cepstrum. The strong and weak points of each are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements of discrete cepstrum estimation (regularization, statistical smoothing, logarithmic frequency scale, adding control points) are presented. For speech signals, a composite envelope is shown to be advantageous. It is estimated from the sinusoidal partials and from the noise part above the maximum partial frequency.

The representation of spectral envelopes is the central point for their handling. A good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements are laid out, such as stability, locality, and flexibility. The representations (filter coefficients, sampled, break-point-functions, splines, formants) are then discussed relative to these requirements. The notion of fuzzy formants based on formant regions is introduced. Some general forms of manipulations and morphing are presented. For morphing between two or more spectral envelopes over time, linear interpolation, and formant shifting which preserves valid vocal tract characteristics, are considered.

For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are used for filtering the residual noise component. This is especially easy and efficient for both components in the FFT^-1 technique. Finally, in additive analysis, spectral envelopes can be generalized not only to apply to magnitude, but also to frequency and phase, while keeping the same representation. The frequency envelope expresses harmonicity of partials over frequency, the phase envelope expresses phase relations between harmonic partials.

With this high level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated by the same manipulations as the sinusoidal part by using the same representation. Also, high quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural sounding transitions with a precisely modeled sustained part. The abovementioned methods have been implemented in a C-library using the SDIF standard for sound description data as file format and are used in various real-time and non real-time programs on Unix and Macintosh.
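Discrete cepstrum estimation from sinusoidal partials, one of the estimators discussed, reduces to a small regularized least-squares problem; the ridge penalty below stands in for the regularization and smoothing improvements mentioned, and the order is an assumption.

```python
import numpy as np

def discrete_cepstrum(freqs_hz, amps, fs, order=30, ridge=1e-4):
    """Fit c so that log a_k ~ c_0 + 2 * sum_i c_i cos(2*pi*i*f_k), in least squares."""
    f = np.asarray(freqs_hz, dtype=float) / fs        # normalized frequencies in [0, 0.5]
    y = np.log(np.asarray(amps, dtype=float) + 1e-12)
    i = np.arange(order + 1)
    M = np.cos(2.0 * np.pi * np.outer(f, i))          # (n_partials, order+1) cosine basis
    M[:, 1:] *= 2.0
    A = M.T @ M + ridge * np.eye(order + 1)           # ridge-regularized normal equations
    return np.linalg.solve(A, M.T @ y)

def envelope_at(c, freqs_hz, fs):
    """Evaluate the log-amplitude envelope described by cepstral coefficients c."""
    f = np.asarray(freqs_hz, dtype=float) / fs
    i = np.arange(len(c))
    M = np.cos(2.0 * np.pi * np.outer(f, i))
    M[:, 1:] *= 2.0
    return M @ c
```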

Proceedings ArticleDOI
22 Aug 1999
TL;DR: In this article, a method for noise-proof detection of the fundamental frequency of the voice in a noisy environment is described, which uses the property of continuity in the fundamental frequencies and the power spectrum envelope (PSE) of the human voice.
Abstract: This paper describes a method for noise-proof detection of the fundamental frequency of the voice in a noisy environment. Noise reduction techniques are required in the development of hearing aids, because noise severely degrades the intelligibility of what is heard. In many noise reduction methods, the fundamental frequency is a significant parameter, but it is difficult to extract from a noisy voice. In order to utilize a comb filter method for noise reduction, a new method of detecting the fundamental frequency is developed using the property of continuity in the fundamental frequency and the power spectrum envelope (PSE) of the human voice. The continuity of the PSE is utilized for determining the most reliable frequency, and the gross pitch error (GPE) is reduced by this determination. In addition, the frequency used for the comb filter is obtained from a linearly predicted frequency and the latest fundamental frequency extracted from the noisy voice, so as to suppress fluctuations of the frequency that degrade the filtered voice. This procedure keeps the fine pitch error (FPE) within 5%. The results of the evaluation showed that the present method is superior to the traditional cepstrum method in both GPE and FPE. We conclude that the proposed frequency detection method is suitable for noise reduction with the comb filter method.

Proceedings Article
01 Jan 1999
TL;DR: This paper presents a feature parameter transformation method using ICA (independent component analysis) for text independent speaker identification of telephone speech assuming that the cepstrum vectors of the telephone speech collected from various kinds of channel conditions are linear combinations of some characteristic functions with random noise added.
Abstract: This paper presents a feature parameter transformation method using ICA (independent component analysis) for text-independent speaker identification of telephone speech. ICA is a signal processing technique which can separate linearly mixed signals into statistically independent signals. Assuming that the cepstrum vectors of telephone speech collected from various kinds of channel conditions are linear combinations of some characteristic functions with random noise added, the proposed method transforms them into new vectors using ICA. The performance of the proposed method was compared to that of the original cepstrum in an HMM-based speaker identification system. Experiments were done in equal and different channel conditions on SPIDRE, a real telephone speech database for text-independent speaker identification. The identification rates increased by about 1% to 13% in most cases, so it was confirmed that the proposed method is effective for speaker identification systems, and more effective in adverse environments.
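The transformation step can be sketched with FastICA as a stand-in ICA algorithm (the paper does not necessarily use this particular algorithm or library); the cepstral vectors are assumed to be pooled across channel conditions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def learn_ica_transform(cepstra, n_components=None):
    """cepstra: (n_frames, n_coeffs) cepstral vectors pooled over channel conditions."""
    ica = FastICA(n_components=n_components, random_state=0)
    ica.fit(cepstra)
    return ica

def transform_features(ica, cepstra):
    # Project cepstra onto the learned (approximately independent) components.
    return ica.transform(cepstra)
```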

Proceedings Article
01 Jan 1999
TL;DR: Two recently reported approaches which operate on the sequence of logarithmically compressed mel-scaled filter-bank energies are compared: the first approach - TIFFING (TIme and Frequency FilterING) - applies FIR filters to that 2-D sequence along both axes, while the second one - CTM (Cepstral Time Matrix) - uses the DCT to compute a set of parameters in the2-D transformed domain.
Abstract: In current speech recognition systems, speech is represented by a 2-D sequence of parameters that model the temporal evolution of the spectral envelope of speech. Linear transformation or filtering along both time and frequency axes of that 2-D sequence are used to enhance the discriminative ability and robustness of speech parameters in the HMM pattern-matching formalism. In this paper, we compared two recently reported approaches which operate on the sequence of logarithmically compressed mel-scaled filter-bank energies: the first approach - TIFFING (TIme and Frequency FilterING) - applies FIR filters to that 2-D sequence along both axes, while the second one - CTM (Cepstral Time Matrix) - uses the DCT to compute a set of parameters in the 2-D transformed domain. They are compared in several ways: (1) analytically, using Fourier transformation, (2) statistically and (3) performing recognition tests with clean and noisy speech.
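The CTM side of the comparison is particularly compact to sketch: a 2-D DCT over a block of log mel filter-bank energies, keeping only the low-order coefficients. The block and matrix sizes below are assumptions.

```python
import numpy as np
from scipy.fft import dctn

def cepstral_time_matrix(log_fbank_block, n_ceps=12, n_time=6):
    """log_fbank_block: (n_filters, n_frames) log mel energies for one block.
    Returns the low-order corner of its 2-D DCT (frequency x time)."""
    ctm = dctn(log_fbank_block, type=2, norm='ortho')
    return ctm[:n_ceps, :n_time]
```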

Patent
26 Feb 1999
TL;DR: In this paper, a technique for obtaining an intermediate set of frequency dependant features from a speech signal for use in speech processing and in obtaining estimates of speech pitch is presented, which utilizes multiple tapers derived from Slepian sequences to obtain a product of the speech signal and the Slepians functions.
Abstract: A technique for obtaining an intermediate set of frequency dependant features from a speech signal for use in speech processing and in obtaining estimates of speech pitch. The technique utilizes multiple tapers derived from Slepian sequences to obtain a product of the speech signal and the Slepian functions. Multiple tapered Fourier transforms are then obtained from the product, from which the set of frequency dependent features are calculated. In a preferred embodiment, a derivative of the cepstrum of the speech signal is used as an estimate of speech signal pitch. In another preferred embodiment, the F-spectrum is calculated from the product and the F-cepstrum is obtained therefrom by calculating the Fourier transform of the smoothed derivative of the log of the F-spectrum. The maximum of the F-cepstrum also provides a pitch estimation.
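The multitaper part of the idea can be sketched with Slepian (DPSS) tapers and a cepstral pitch pick; the F-spectrum/F-cepstrum machinery of the patent is not reproduced, and the taper parameters are assumptions.

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_cepstrum_pitch(frame, fs, nw=3.0, n_tapers=5,
                              fmin=60.0, fmax=400.0):
    tapers = dpss(len(frame), nw, Kmax=n_tapers)          # (n_tapers, N) Slepian tapers
    specs = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    mt_spectrum = specs.mean(axis=0)                      # multitaper spectrum estimate
    cep = np.fft.irfft(np.log(mt_spectrum + 1e-12))
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(cep[lo:hi])                      # cepstral pitch peak
    return fs / lag
```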

Proceedings Article
01 Jan 1999
TL;DR: A new cepstrum-based channel compensation method is proposed for speaker verification over the telephone network that reduces the verification error rate significantly and introduces a novel way of cepstral mean subtraction called differential-partial cepstral mean subtraction (DPCMS).
Abstract: A new cepstrum-based channel compensation method is proposed for speaker verification over the telephone network. The method consists of intra-frame and inter-frame cepstral processing. For the former, a pole-removed cepstrum is derived, where the LP poles with frequency higher than a certain threshold are removed. For the latter, we introduce a novel way of cepstral mean subtraction called differential-partial cepstral mean subtraction (DPCMS). The main idea is that the cepstral mean of clean speech is not necessarily zero and that the cepstral difference between clean and channel-corrupted speech is mainly contributed by the channel effects on LP poles within a certain frequency range. A speaker verification system based on radial basis function networks was used to evaluate the proposed approach. Clean speech was used to train the networks and telephone speech was used to evaluate their performance. Experimental results show that the proposed method reduces the verification error rate significantly.
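A hedged sketch of the intra-frame step: LP poles whose frequencies exceed a threshold are dropped and the cepstrum is rebuilt from the remaining poles via c(n) = sum_k p_k^n / n. The LP order, threshold, and cepstrum length are assumptions.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin; returns a[0..order], a[0] = 1."""
    r = np.correlate(frame, frame, mode='full')[len(frame) - 1:len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def pole_removed_cepstrum(frame, fs, order=12, f_thresh=3000.0, n_ceps=20):
    a = lpc(np.asarray(frame, dtype=float), order)
    poles = np.roots(a)
    kept = poles[np.abs(np.angle(poles)) <= 2.0 * np.pi * f_thresh / fs]
    # Cepstrum of the all-pole model built from the kept poles only; the result is
    # real because conjugate pole pairs are kept or removed together.
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps):
        c[n] = np.real(np.sum(kept ** n)) / n
    return c
```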

Proceedings Article
01 Jan 1999
TL;DR: A neural network is used to train and classify normal, benign and malignant states of speech, and a new parameter based on a cepstral analysis technique is proposed to discriminate each class.
Abstract: In this paper we propose a new method to classify pathological voices into normal, benign and malignant cases. A new parameter, based on a cepstral analysis technique, is proposed to discriminate each class. Pathological speech signals were collected at a hospital. Normal speech signals are contained in the same database and are analyzed as well. The results are then compared to find the differences between normal and pathological speech. Source components are separated using the cepstrum after obtaining the residual signal from the speech. The ratios between harmonic components and noise components are then obtained from the original signal and the residual signal. Finally, a neural network is used to train and classify the normal, benign and malignant states of speech.

Proceedings ArticleDOI
12 Oct 1999
TL;DR: This paper proposes an algorithm to optimize the filters using the simplex method, and shows that the recognition rate for Korean digit words can be improved by about 3-5%.
Abstract: In this paper we propose a method to optimize the performance of the mel-cepstrum that is widely used in speech recognition. Typically, the mel-cepstrum is obtained with critical-band filters; the characteristics of these filters therefore determine the mel-cepstrum and hence the resulting performance. By changing filter characteristics such as shape, center frequency, and bandwidth, we analyze the performance of the mel-cepstrum. We then propose an algorithm to optimize the filters using the simplex method. Experiments with Korean digit words show that the recognition rate can be improved by about 3-5%.
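The simplex step maps naturally onto a Nelder-Mead search over the filter parameters; the objective below is a placeholder for the recognizer evaluation and is not the authors' setup.

```python
import numpy as np
from scipy.optimize import minimize

def recognition_error(params):
    """Placeholder: build a filter bank from `params` (e.g. center frequencies and
    bandwidths), extract mel-cepstra, run the recognizer on a development set and
    return its error rate. Must be supplied by the user."""
    raise NotImplementedError

def optimize_filters(initial_params):
    result = minimize(recognition_error, np.asarray(initial_params, dtype=float),
                      method='Nelder-Mead',
                      options={'xatol': 1e-3, 'fatol': 1e-3, 'maxiter': 200})
    return result.x
```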

Journal Article
TL;DR: In this paper, a model of the signals generated by an accelerometric sensor is established, from which the theoretical expression of the energy cepstrum is partially calculated, making it possible to develop a detection tool that is affected neither by the signal amplitude, nor by the noise-to-signal ratio, nor by the position of the sensor.
Abstract: This article shows the possibilities offered by the use of the energy cepstrum for gearbox vibratory diagnosis. A model of the signals generated by an accelerometric sensor is established, from which the theoretical expression of the energy cepstrum is partially calculated. This makes it possible to develop a detection tool that is affected neither by the signal amplitude, nor by the noise-to-signal ratio, nor by the position of the sensor. It is shown that the use of an angularly sampled signal enables the cepstrum to preserve its full resolution and allows synchronous averaging to be performed to isolate each gear mesh. Some applications of the monitoring procedure are presented.
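A minimal sketch of energy-cepstrum monitoring on an angularly sampled vibration signal: modulation sidebands spaced at a shaft rotation frequency appear as a rahmonic peak at the corresponding quefrency, giving an indicator that is insensitive to overall amplitude. The signal layout and peak window are assumptions.

```python
import numpy as np

def energy_cepstrum(x):
    """Cepstrum of the power spectrum (energy cepstrum) of a vibration record."""
    spec = np.fft.rfft(x * np.hanning(len(x)))
    return np.fft.irfft(np.log(np.abs(spec) ** 2 + 1e-12))

def sideband_indicator(x, samples_per_rev, shaft_ratio=1.0):
    """Height of the cepstral peak at the quefrency of one (geared) shaft turn,
    assuming x is angularly sampled with `samples_per_rev` samples per revolution."""
    cep = energy_cepstrum(x)
    q = int(round(samples_per_rev / shaft_ratio))
    return cep[q - 2:q + 3].max()               # small window around the rahmonic
```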