
Showing papers on "Cepstrum published in 1998"


Journal ArticleDOI
TL;DR: An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.
Abstract: In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low complexity, maximum likelihood based frequency warping procedures have been applied to speaker normalization for a telephone based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results have shown that frequency warping is consistently able to reduce word error rate by 20% even for very short utterances.
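The filterbank modification the abstract describes can be sketched as a linear scaling of the frequency axis before mel analysis. This is a hypothetical illustration: the function names, the clipping at the upper band edge, and the handling of the warping factor `alpha` are assumptions, not the paper's implementation.

```python
import math

def hz_to_mel(f):
    """Standard mel-scale mapping."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warped_center_frequencies(n_filters, f_low, f_high, alpha):
    """Center frequencies (Hz) of a mel filterbank whose linear-frequency
    axis is scaled by a warping factor alpha; alpha > 1 shifts centers up.
    Clipping warped centers at f_high is a modeling assumption."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    centers = []
    for i in range(1, n_filters + 1):
        m = m_low + (m_high - m_low) * i / (n_filters + 1)
        centers.append(min(alpha * mel_to_hz(m), f_high))
    return centers
```

A per-speaker warping factor would then be chosen by maximum likelihood over a small grid of `alpha` values, as the abstract outlines.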

338 citations


Journal ArticleDOI
01 Dec 1998
TL;DR: An electromyographic (EMG) pattern recognition method to identify motion commands for the control of a prosthetic arm by evidence accumulation based on artificial intelligence with multiple parameters is presented.
Abstract: This paper presents an electromyographic (EMG) pattern recognition method to identify motion commands for the control of a prosthetic arm by evidence accumulation based on artificial intelligence with multiple parameters. The integral absolute value, variance, autoregressive (AR) model coefficients, linear cepstrum coefficients, and adaptive cepstrum vector are extracted as feature parameters from several time segments of EMG signals. Pattern recognition is carried out through the evidence accumulation procedure using the distances measured with reference parameters. A fuzzy mapping function is designed to transform the distances for the application of the evidence accumulation method. Results are presented to support the feasibility of the suggested approach for EMG pattern recognition.

188 citations


Journal ArticleDOI
TL;DR: The glottal to noise excitation ratio (GNE) is an acoustic measure designed to assess the amount of noise in a pulse train generated by the oscillation of the vocal folds that is found to be independent of variations of fundamental frequency and amplitude.
Abstract: The glottal to noise excitation ratio (GNE) is an acoustic measure designed to assess the amount of noise in a pulse train generated by the oscillation of the vocal folds. So far its properties have only been studied for synthesized signals, where it was found to be independent of variations of fundamental frequency (jitter) and amplitude (shimmer). On the other hand, other features designed for the same purpose like NNE (normalized noise energy) or CHNR (cepstrum based harmonics-to-noise ratio) did not show this independence. This advantage of the GNE over NNE and CHNR, as well as its general applicability in voice quality assessment, is now tested for real speech using a large group of pathologic voices (n=447). A set of four acoustic features is extracted from a total of 22 mostly well-known acoustic voice quality measures by correlation analysis, mutual information analysis, and principal components analysis. Three of these measures are chosen to assess primarily different aspects of signal aperiodici...

166 citations


Proceedings Article
01 Nov 1998
TL;DR: This work model the speaker’s f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour, and improves the verification performance of a cepstrum-based Gaussian mixture model system by 10%.
Abstract: Statistics of frame-level pitch have recently been used in speaker recognition systems with good results [1, 2, 3]. Although they convey useful long-term information about a speaker’s distribution of f0 values, such statistics fail to capture information about local dynamics in intonation that characterize an individual’s speaking style. In this work, we take a first step toward capturing such suprasegmental patterns for automatic speaker verification. Specifically, we model the speaker’s f0 movements by fitting a piecewise linear model to the f0 track to obtain a stylized f0 contour. Parameters of the model are then used as statistical features for speaker verification. We report results on the 1998 NIST speaker verification evaluation. Prosody modeling improves the verification performance of a cepstrum-based Gaussian mixture model system (as measured by a task-specific Bayes risk) by 10%.
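The piecewise linear stylization can be sketched as an independent least-squares line fit per segment of the f0 track. The fixed-length segmentation below is a simplification; the paper's actual segmentation is not specified here.

```python
def stylize_f0(f0, seg_len):
    """Fit one least-squares line per fixed-length segment of an f0 track.
    Returns a list of (slope, intercept) pairs, one per segment, which
    could serve as the stylized-contour features the abstract mentions."""
    params = []
    for start in range(0, len(f0) - seg_len + 1, seg_len):
        ys = f0[start:start + seg_len]
        xs = range(seg_len)
        n = float(seg_len)
        mx = sum(xs) / n
        my = sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        params.append((slope, my - slope * mx))  # line: y = slope*x + intercept
    return params
```

A real system would fit only voiced regions and pick segment boundaries from the data, but the (slope, intercept) parameterization is the essential idea.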

159 citations


Patent
TL;DR: In this paper, a method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency, and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency.
Abstract: A method and system for transforming a sampling rate in speech recognition systems, in accordance with the present invention, includes the steps of providing cepstral based data including utterances comprised of segments at a reference frequency, the segments being represented by cepstral vector coefficients, converting the cepstral vector coefficients to energy bands in logarithmic spectra, filtering the energy bands of the logarithmic spectra to remove energy bands having a frequency above a predetermined portion of a target frequency and converting the filtered logarithmic spectra to modified cepstral vector coefficients at the target frequency. Another method and system convert system prototypes for speech recognition systems from a reference frequency to a target frequency.
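The chain the patent describes (cepstra → log filterbank energies → band filtering → cepstra) can be sketched with a DCT-II / inverse-DCT pair, since cepstra are conventionally the DCT of log filterbank energies. The band count and the simple truncation rule below are assumptions, not the patent's exact procedure.

```python
import math

def dct2(x):
    """DCT-II: log filterbank energies -> cepstral coefficients."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n)
                for j in range(n)) for k in range(n)]

def idct2(c):
    """Inverse of dct2 above (DCT-III with matching normalization)."""
    n = len(c)
    return [c[0] / n + (2.0 / n) * sum(c[k] * math.cos(math.pi * k * (j + 0.5) / n)
                                       for k in range(1, n))
            for j in range(n)]

def downsample_cepstra(ceps, keep_bands):
    """Map cepstra back to log filterbank energies, drop the bands above
    the target Nyquist frequency, and re-derive cepstra for the target rate."""
    logspec = idct2(ceps)
    return dct2(logspec[:keep_bands])
```

For example, 8 kHz-reference cepstra could be converted toward a lower target rate by keeping only the filterbank bands below the new Nyquist frequency.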

78 citations


Journal ArticleDOI
TL;DR: A spectral parametric technique and the cepstrum approach are compared; the former is based on autoregressive models whose order is adaptively estimated on subsequent signal frames by means of a new method that allows the correct tracking of pitch and formant variations with time.

72 citations


Proceedings Article
01 Jan 1998
TL;DR: It is observed that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.
Abstract: Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing the speech waveform with the speech recognition accuracy obtained when using cepstral features derived from the coding parameters. We focus our efforts on speech that has been coded using the 13-kbps full-rate GSM codec, a Regular Pulse Excited Long Term Prediction (RPE-LTP) codec. The GSM codec develops separate representations for the linear prediction (LPC) filter and the residual signal components of the coded speech. We measure the effects of quantization and coding on the accuracy with which these parameters are represented, and present two different methods for recombining them for speech recognition purposes. We observe that by selectively combining the cepstral streams representing the LPC parameters and the residual signal it is possible to obtain recognition accuracy directly from the coded parameters that equals or exceeds the recognition accuracy obtained from the reconstructed waveforms.

56 citations


Patent
22 Oct 1998
TL;DR: In this paper, a speech recognition method for recognizing an input speech in a noisy environment by using a plurality of clean speech models is provided, where each clean speech model has a clean speech feature parameter S representing a cepstrum parameter of a clean speech.
Abstract: A speech recognition method of recognizing an input speech in a noisy environment by using a plurality of clean speech models is provided. Each of the clean speech models has a clean speech feature parameter S representing a cepstrum parameter of a clean speech thereof. The speech recognition method has the processes of: detecting a noise feature parameter N representing a cepstrum parameter of a noise in the noisy environment, immediately before the input speech is input; detecting an input speech feature parameter X representing a cepstrum parameter of the input speech in the noisy environment; calculating a modified clean speech feature parameter Y according to a following equation: Y = k · S + (1-k) · N (0 < k ≦ 1), where the "k" is a predetermined value corresponding to a signal-to-noise ratio in the noise environment; comparing the input speech feature parameter X with the modified clean speech feature parameter Y; and recognizing the input speech by repeatedly carrying out the calculating process and the comparing process with respect to the plurality of clean speech models.
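The claimed combination Y = k·S + (1-k)·N and the repeated compare step can be sketched directly. The Euclidean distance and the function names are illustrative assumptions; the patent does not fix a distance measure here.

```python
def modified_clean_feature(S, N, k):
    """Y = k*S + (1-k)*N with 0 < k <= 1, combining a clean-speech cepstrum S
    with the noise cepstrum N measured just before the utterance.
    How k is derived from the signal-to-noise ratio is not specified here."""
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1")
    return [k * s + (1.0 - k) * n for s, n in zip(S, N)]

def recognize(X, models, N, k):
    """Pick the clean-speech model whose noise-modified cepstrum Y is
    closest (squared Euclidean distance, an assumption) to the observed X."""
    def dist(Y):
        return sum((x - y) ** 2 for x, y in zip(X, Y))
    return min(models, key=lambda S: dist(modified_clean_feature(S, N, k)))
```

With k = 1 the comparison reduces to matching against the unmodified clean models; smaller k pulls each model toward the measured noise.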

54 citations


Journal ArticleDOI
TL;DR: This work proposes four additional new cepstral features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise and proposes an alternative way of doing adaptive component weighting called the ACW2 cepstrum.
Abstract: A common problem in speaker identification systems is that a mismatch in the training and testing conditions sacrifices much performance. We attempt to alleviate this problem by proposing new features that show less variation when speech is corrupted by convolutional noise (channel) and/or additive noise. The conventional feature used is the linear predictive (LP) cepstrum that is derived from an all-pole transfer function which, in turn, achieves a good approximation to the spectral envelope of the speech. A different cepstral feature based on a pole-zero function (called the adaptive component weighted or ACW cepstrum) was previously introduced. We propose four additional new cepstral features based on pole-zero transfer functions. One is an alternative way of doing adaptive component weighting and is called the ACW2 cepstrum. Two others (known as the PFL1 cepstrum and the PFL2 cepstrum) are based on a pole-zero postfilter used in speech enhancement. Finally, an autoregressive moving-average (ARMA) analysis of speech results in a pole-zero transfer function describing the spectral envelope. The cepstrum of this transfer function is the feature. Experiments involving a closed set, text-independent and vector quantizer based speaker identification system are done to compare the various features. The TIMIT and King databases are used. The ACW and PFL1 features are the preferred features, since they do as well or better than the LP cepstrum for all the test conditions. The corresponding spectra show a clear emphasis of the formants and no spectral tilt.

53 citations


Proceedings ArticleDOI
12 May 1998
TL;DR: Experimental results are presented which provide empirical proof of convergence, and demonstrate the effectiveness of the technique in achieving recognition performance advantages by including formant features rather than only using cepstrum features.
Abstract: A formant analyser is interpreted probabilistically via a noisy channel model. This leads to a robust method of incorporating formant features into hidden Markov models for automatic speech recognition. Recognition equations follow trivially, and Baum-Welch style re-estimation equations are derived. Experimental results are presented which provide empirical proof of convergence, and demonstrate the effectiveness of the technique in achieving recognition performance advantages by including formant features rather than only using cepstrum features.

44 citations


Proceedings Article
Laurent Mauuary1
01 Sep 1998
TL;DR: This paper presents a new implementation of this blind equalization scheme in the cepstral domain using a circular-convolution frequency domain adaptive filter that offers almost the same performance as the conventional cepstral subtraction technique (off-line approach).
Abstract: An adaptive filter in a blind equalization scheme has recently been proposed to reduce telephone line effects for speech recognizers. The implementation of the blind equalization scheme using a circular-convolution frequency domain adaptive filter has been described in a previous paper. This paper presents a new implementation of this blind equalization scheme in the cepstral domain. The property of a constant long-term cepstrum of speech helps to compute the gradient used for adapting the weights. The performances of the spectral domain and the cepstral domain implementations are then compared. These filters prove to be efficient for the channel equalization task. Furthermore, speech recognition experiments show that the cepstral domain on-line adaptive filter outperforms the cepstral trajectories high-pass filter (on-line approach). This technique offers almost the same performance as the conventional cepstral subtraction technique (off-line approach).

Proceedings ArticleDOI
12 May 1998
TL;DR: This study proposes a new set of feature parameters based on subband analysis of the speech signal for classification of speech under stress, which are scale energy (SE), autocorrelation-scale-energy (ACSE), subband based cepstral parameters (SC), and autocorrelation-SC (ACSC).
Abstract: This study proposes a new set of feature parameters based on subband analysis of the speech signal for classification of speech under stress. The new speech features are scale energy (SE), autocorrelation-scale-energy (ACSE), subband based cepstral parameters (SC), and autocorrelation-SC (ACSC). The parameters' ability to capture different stress types is compared to widely used mel-scale cepstrum based representations: mel-frequency cepstral coefficients (MFCC) and autocorrelation-mel-scale (AC-mel). Next, a feedforward neural network is formulated for speaker-dependent stress classification of 10 stress conditions: angry, clear, cond50/70, fast, loud, lombard, neutral, question, slow, and soft. The classification algorithm is evaluated using a previously established stressed speech database (SUSAS) (Hansen and Bou-Ghazale 1997). Subband based features are shown to achieve +7.3% and +9.1% increase in the classification rates over the MFCC based parameters for ungrouped and grouped stress closed vocabulary test scenarios respectively. Moreover the average scores across the simulations of new features are +8.6% and +13.6% higher than MFCC based features for the ungrouped and grouped stress test scenarios respectively.

Patent
Laurent Mauuary1, Jean Monne1
TL;DR: In this article, a blind equalization of the effects of a transmission channel on a speech signal is proposed, in which the speech signals are transformed into cepstral vectors which are representative of the speech signal over a given horizon, and each of the vectors is subjected to adaptive filtering by LMS on the basis of the reference cepstrum.
Abstract: A process and device for blind equalization of the effects of a transmission channel on a speech signal. The speech signal is transformed into cepstral vectors which are representative of the speech signal over a given horizon. A reference cepstrum consisting of a constant cepstrum signal representative of the long-term cepstrum of the speech signal is calculated for each cepstral vector. Each of the cepstral vectors is subjected to adaptive filtering by LMS on the basis of the reference cepstrum so as to generate a set of equalized cepstral vectors on the basis of the calculation of an error signal between the reference cepstrum and equalized cepstral vectors. The error signal is expressed as the difference between the reference cepstrum component of a given rank and the component of the same rank of the equalized cepstral vector.
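The LMS adaptation toward a constant reference cepstrum can be sketched as follows. The additive-bias channel model, the step size `mu`, and the function name are assumptions, not the patent's exact filter structure.

```python
def blind_equalize(cepstra, reference, mu=0.05):
    """Blind equalization sketch: adapt an additive cepstral correction by
    LMS so that equalized frames approach a constant long-term reference
    cepstrum. The error signal is the per-component difference between the
    reference and the equalized vector, as in the abstract."""
    bias = [0.0] * len(reference)
    out = []
    for frame in cepstra:
        eq = [c + b for c, b in zip(frame, bias)]       # equalized cepstral vector
        err = [r - e for r, e in zip(reference, eq)]    # error vs. reference
        bias = [b + mu * e for b, e in zip(bias, err)]  # LMS weight update
        out.append(eq)
    return out
```

A constant channel adds a fixed offset to every cepstral frame, so the learned correction converges to the negative of that offset, removing the transmission-channel effect on-line.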

Journal ArticleDOI
01 Mar 1998
TL;DR: Multilayer perceptron (MLP) type neural networks and dynamic feature extraction techniques, namely linear prediction coding (LPC) and LPC cepstrum, are used to classify leakage type and to predict leakage flowrate magnitude in an electrohydraulic cylinder drive.
Abstract: Multilayer perceptron (MLP) type neural networks and dynamic feature extraction techniques, namely linear prediction coding (LPC) and LPC cepstrum, are used to classify leakage type and to ...

Journal ArticleDOI
TL;DR: A new speech feature extracted from adaptive wavelet for speech recognition is described, which shows a slightly better recognition rate than the cepstrum for speaker independent speech recognition and shows a lower standard deviation between speakers than does the cepstrum.
Abstract: A new speech feature extracted from adaptive wavelet for speech recognition is described. The speech signal is decomposed through adapted local trigonometric transforms. The decomposed signal is classified by M uniform sub-bands for each subinterval. The energy of each sub-band is used as a speech feature. This feature is applied to vector quantisation and the hidden Markov model. The new speech feature shows a slightly better recognition rate than the cepstrum for speaker independent speech recognition. The new speech feature also shows a lower standard deviation between speakers than does the cepstrum.

Patent
Yariv Ephraim1, Mazin G. Rahim1
TL;DR: A method and apparatus for speech recognition using second-order statistics and linear estimation of cepstral coefficients, used together with, for example, a hidden Markov model.
Abstract: A method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients. In one embodiment, a speech input signal is received and cepstral features are extracted. An answer is generated using the extracted cepstral features and a fixed signal independent diagonal matrix as the covariance matrix for the cepstral components of the speech input signal and, for example, a hidden Markov model. In another embodiment, a noisy speech input signal is received and a cepstral vector representing a clean speech input signal is generated based on the noisy speech input signal and an explicit linear minimum mean square error cepstral estimator.

Journal ArticleDOI
TL;DR: It appeared that cepstrum mean subtraction and phase-corrected RASTA performed equally well for context-dependent and context-independent models when equal amounts of model parameters were used.

Proceedings Article
01 Jan 1998
TL;DR: This paper describes the attempt towards speech inverse mapping by using the mel-frequency cepstrum coefficients to represent the acoustic parameters of the speech signal by using an articulatory-acoustic codebook derived from Maeda's articulatory model.
Abstract: Recovering vocal tract shapes from the speech signal is a well known inversion problem of transformation from the articulatory system to speech acoustics. Most of the studies on this problem in the past have been focused on vowels. There have not been general methods effective for recovering the vocal tract shapes from the speech signal for all classes of speech sounds. In this paper we describe our attempt towards speech inverse mapping by using the mel-frequency cepstrum coefficients to represent the acoustic parameters of the speech signal. An inversion method is developed based on Kalman filtering and a dynamic-system model describing the articulatory motion. This method uses an articulatory-acoustic codebook derived from Maeda's articulatory model.

01 Jan 1998
TL;DR: The spectral envelope library, offering complete functionality of spectral envelope handling, was developed according to the principles of software engineering.
Abstract: In this project, Spectral Envelopes in Sound Analysis and Synthesis, various methods for estimation, representation, file storage, manipulation, and application of spectral envelopes to sound synthesis were evaluated, improved, and implemented. A prototyping and testing environment was developed, and a function library to handle spectral envelopes was designed and implemented. For the estimation of spectral envelopes, after defining the requirements, the methods LPC, cepstrum, and discrete cepstrum were examined, and also improvements of the discrete cepstrum method (regularization, stochastic (or probabilistic) smoothing, logarithmic frequency scaling, and adding control points). An evaluation with a large corpus of sound data showed the feasibility of discrete cepstrum spectral envelope estimation. After defining the requirements for the representation of spectral envelopes, filter coefficients, spectral representation, break-point functions, splines, formant representation, and high resolution matching pursuit were examined. A combined spectral representation with indication of the regions of formants (called fuzzy formants) was defined to allow for integration of spectral envelopes with precise formant descriptions. For file storage, new data types were defined for the Sound Description Interchange Format (SDIF) standard. Methods for manipulation were examined, especially interpolation between spectral envelopes, and between spectral envelopes and formants, and other manipulations, based on primitive operations on spectral envelopes. For sound synthesis, application of spectral envelopes to additive synthesis, and time-domain or frequency-domain filtering have been examined. For prototyping and testing of the algorithms, a spectral envelope viewing program was developed. Finally, the spectral envelope library, offering complete functionality of spectral envelope handling, was developed according to the principles of software engineering.

Proceedings ArticleDOI
24 Nov 1998
TL;DR: Simulation experiments suggest that the gross pitch error is lower for the proposed method for pitch extraction in noisy environments than for the conventional one.
Abstract: In this paper, we propose a new method for pitch extraction in noisy environments. The point of our method is to get a clear harmonics structure by removing unnecessary frequency components. Simulation experiments suggest that the gross pitch error is lower for the proposed method than for the conventional one.

Proceedings ArticleDOI
12 May 1998
TL;DR: A robust speech recognition method that can cope with additive noise and multiplicative distortions is described and E-CMN (exact cepstrum mean normalization) which is speaker dependent/environment-dependent CMN for speech/non-speech is proposed.
Abstract: A user-friendly speech interface in a car cabin is highly needed for safety reasons. This paper describes a robust speech recognition method that can cope with additive noise and multiplicative distortions. A known additive noise, a source signal of which is available, might be canceled by NLMS-VAD (normalized least mean squares with frame-wise voice activity detection). On the other hand, an unknown additive noise, a source signal of which is not available, is suppressed with CSS (continuous spectral subtraction). Furthermore, various multiplicative distortions are simultaneously compensated with E-CMN (exact cepstrum mean normalization) which is speaker dependent/environment-dependent CMN for speech/non-speech. Evaluation results of the proposed method for car cabin environments are finally described.

Patent
22 May 1998
TL;DR: In this article, a computer-based method and apparatus for classifying statement types using intonation analysis is presented, which identifies a user's potential query when the user responds to information during dialog with an automated dialog system.
Abstract: A computer-based method and apparatus for classifying statement types using intonation analysis. The method and apparatus identify a user's potential query when the user responds to information during dialog with an automated dialog system. Pitch information is extracted, via a cepstrum, from the speech signal. In one embodiment, the pitch intonation is processed to form a smoothed pitch or intonation contour. Then the smoothed pitch contour is processed by a set of shape detectors and this output, together with statistical information, is sent to a rule-based algorithm which attempts to classify the statement type. In another embodiment, the smoothed pitch contour is processed by a pattern recognition system such as a neural network trained with a back-propagation learning algorithm.
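Extracting pitch "via a cepstrum", as this patent does, follows the classic recipe: peak-pick the real cepstrum in a plausible quefrency range. A minimal sketch, where the direct O(n^2) DFT and the range limits are illustrative choices rather than the patent's implementation:

```python
import cmath
import math

def real_cepstrum(x):
    """Real cepstrum via a direct DFT: IDFT(log |DFT(x)|).
    Quadratic in the frame length, which is fine for a short frame."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    logmag = [math.log(abs(s) + 1e-12) for s in spec]  # floor avoids log(0)
    return [sum(logmag[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n)]

def pitch_period(x, q_min, q_max):
    """Pitch period in samples = quefrency of the largest cepstral peak
    inside a plausible search range (range limits are assumptions)."""
    c = real_cepstrum(x)
    return max(range(q_min, q_max), key=lambda q: c[q])
```

The per-frame periods would then be converted to f0 values and smoothed into the intonation contour that the classifier operates on.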

Journal ArticleDOI
01 Aug 1998
TL;DR: The differential cepstrum is an important variant of this class of signal transformations and has been defined in terms of the logarithmic derivative of the z transform of a given signal as mentioned in this paper.
Abstract: The use of cepstral parameters is gaining importance in many areas. However, their introduction is usually through an approach which often mars their simplicity and beauty. The differential cepstrum is an important variant of this class of signal transformations. It has been defined in terms of the logarithmic derivative of the z transform of a given signal. However, a more useful approach is through the Cauchy residue theorem, which yields additional insight and properties. The entire concept and additional properties may be developed in a way that leads naturally to the celebrated Newton identities. These identities are developed and elaborated in the paper. Furthermore, they are employed innovatively in signal-processing problems, including the determination of the minimum phase component of a signal, a stability test for linear systems and the detection of abrupt changes in a signal.
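Up to sign and indexing conventions, the definition via the logarithmic derivative and its link to the ordinary cepstrum can be stated as follows; this is a standard derivation under a minimum-phase assumption, not reproduced from the paper.

```latex
% Differential cepstrum as a contour integral of the logarithmic derivative
% (C a closed contour inside the region of convergence):
c_d[n] = \frac{1}{2\pi j}\oint_C \frac{X'(z)}{X(z)}\, z^{\,n-1}\, dz,
\qquad
\frac{X'(z)}{X(z)} = \frac{d}{dz}\log X(z).

% Writing \log X(z) = \sum_m c[m]\, z^{-m} and comparing Laurent
% coefficients gives the relation to the ordinary cepstrum:
c_d[n] = -(n-1)\, c[n-1].

% For a minimum-phase X(z) = \prod_i (1 - z_i z^{-1}),
% c[n] = -\tfrac{1}{n}\sum_i z_i^{\,n} \ (n \ge 1), so the power sums
% s_n = \sum_i z_i^{\,n} = -n\, c[n] of the zeros are recovered directly,
% and Newton's identities then connect them to the polynomial coefficients.
```

This is the route by which the residue-theorem view "leads naturally to the celebrated Newton identities" mentioned in the abstract.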

Proceedings Article
01 Sep 1998
TL;DR: Preliminary experimental results show that the hybrid speaker verification system performs better than either of the sub-systems in terms of the equal error rate (EER), and improves the performance of the cepstral-based HMM system by 78% on average.
Abstract: In this paper we report on a study of the variability of voice source parameters in the context of speaker characterisation, and we propose a speaker verification system which incorporates these parameters. The motivation for this approach is that, whilst we have conscious control over the action of our vocal tract articulators such as the tongue and jaw, we have only limited voluntary muscle control over the vocal cords. The conjecture is, therefore, that impostors are less likely to be able to mimic vocal cord effects than vocal tract effects. The hybrid speaker verification system that is proposed incorporates two sub-systems to improve the overall performance: (i) a cepstral-based HMM with cohort normalisation and (ii) voice source parameters derived from Multi-cycle Closed-phase Glottal Inverse Filtering (MCGIF). Preliminary experimental results show that the hybrid system performs better than either of the sub-systems in terms of the equal error rate (EER). Specifically, the hybrid system improved the performance of the cepstral-based HMM system by 78% on average, resulting in a mean EER of 0.42% for the specific tests conducted.

Proceedings ArticleDOI
01 Jan 1998
TL;DR: In this paper, spectral and correlation analysis of acceleration, displacement and operational process parameters (e.g., temperature, pressure, steam flow, etc) are used as evaluation tools.
Abstract: Due to safety and economical reasons, diagnostic and monitoring systems are of growing interest in all complex industrial production lines. Key components of power plants are rotating machineries like mills, blowers, feed water pumps and turbines. Diagnostic systems are requested which detect, diagnose and localize faulty operation conditions at an early stage in order to prevent severe failures. The knowledge of the vibrational machine signatures and their time dependent behavior is the basis of efficient condition monitoring of rotating machines. When only vibration thresholds given by norms and standards are used, alarms often occur without giving hints to the source of excitation. Therefore, modern measurement techniques in combination with advanced computerized data processing and acquisition show new ways in the field of machine surveillance by use of spectral and correlation analysis of acceleration, displacement and the operational process parameters (e.g., temperature, pressure, steam flow, etc.). Time domain analysis using characteristical values to determine changes by trend setting, spectrum analysis to determine trends of frequencies, amplitude and phase relations, correlation analysis to evaluate common sources of excitation by comparing different sensor signals, as well as cepstrum analysis to detect periodical components of spectra are used as evaluation tools.

01 Sep 1998
TL;DR: In this article, a method for updating modal models from response measurements is extended from the case of a single impulsive excitation to a more general broadband excitation in the presence of secondary excitations.
Abstract: A method for updating modal models from response measurements is extended from the case of a single impulsive excitation to a more general broadband excitation in the presence of secondary excitations. The original technique was based on analysis of the cepstrum of the response, as forcing function and transfer function effects are additive in the response cepstrum, and also separated if the force log spectrum is reasonably smooth and flat. Use is made of principal components analysis by singular value decomposition to separate the autospectrum of the response at each point to the dominant excitation, which is then curve fitted in the cepstral domain for its poles and zeros to give updated estimates of the FRFs. The resulting FRFs are scaled (because of including the information on zeros) and in the study gave reasonable estimates of the mode shapes when the dominant force was four times larger than the next largest.

Journal Article
Dejonckere Ph1
TL;DR: Factor analysis demonstrates that the cepstrum peak magnitude indeed is sensitive to aperiodicity of vocal fold vibration as well as to insufficient vocal fold closure and excessive turbulent noise escape, and that it may be considered as a relevant acoustic correlate for the G (Grade) parameter of the GRBAS scale.
Abstract: The cepstrum peak magnitude of a /a:/ becomes reduced when either low or high frequency noise increases. In 18 normal subjects and 68 dysphonic patients, perceptual rating according to the GRBAS system, cepstrum analysis, videostroboscopic vibration pattern quantification, phonation flow measurement, and multidimensional voice analysis was performed. Factor analysis demonstrates that the cepstrum peak magnitude indeed is sensitive to aperiodicity of vocal fold vibration as well as to insufficient vocal fold closure and excessive turbulent noise escape, and that it may be considered as a relevant acoustic correlate for the G (Grade) parameter of the GRBAS scale.

Proceedings ArticleDOI
12 May 1998
TL;DR: This work chooses to model speech by hidden Markov models (HMMs) in the cepstrum domain and the mismatch is reduced by a parametric function, and presents a frame synchronous estimation of these parameters.
Abstract: An acoustic mismatch between a given utterance and a model degrades the performance of the speech recognition process. We choose to model speech by hidden Markov models (HMMs) in the cepstrum domain and the mismatch by a parametric function. In order to reduce the mismatch, one has to estimate the parameters of this function. We present a frame synchronous estimation of these parameters. We show that the parameters can be computed recursively. Thanks to such methods, parameters variations can be tracked. We give general equations and study the particular case of an affine transform. Finally, we report recognition experiments carried out over both PSTN and cellular telephone network to show the efficiency of the method in a real context.

Proceedings ArticleDOI
12 May 1998
TL;DR: A novel kind of speech feature which is the modified Mellin transform of the log-spectrum of the speech signal (short for MMTLS) is presented, which is more appropriate for speaker-independent speech recognition than the popular used cepstrum.
Abstract: This paper presents a novel kind of speech feature which is the modified Mellin transform of the log-spectrum of the speech signal (short for MMTLS). Because of the scale invariance property of the modified Mellin transform, the new feature is insensitive to the variation of the vocal tract length among individual speakers, and thus it is more appropriate for speaker-independent speech recognition than the widely used cepstrum. The preliminary experiments show that the performance of the MMTLS-based method is much better in comparison with those of the LPC- and MFC-based methods. Moreover, the error rate of this method is very consistent for different outlier speakers.

Journal ArticleDOI
TL;DR: A low-power, low-voltage speech processing system is presented, intended to be used in remote speech recognition applications where feature extraction is performed on the terminal and high-complexity recognition tasks are moved to a remote server accessed through a radio link.
Abstract: In this paper, a low-power, low-voltage speech processing system is presented. The system is intended to be used in remote speech recognition applications where feature extraction is performed on the terminal, while high-complexity recognition tasks are moved to a remote server accessed through a radio link. The proposed system is based on a CMOS feature extraction chip for speech recognition that computes 15 cepstrum parameters every 8 ms and dissipates 30 /spl mu/W at 0.9-V supply. Single-cell battery operation is achieved. Processing relies on a novel feature extraction algorithm using 1-bit A/D conversion of the input speech signal. The chip has been implemented as a gate array in a standard 0.5-/spl mu/m, three-metal CMOS technology. The average energy required to process a single word of the TI46 speech corpus is 10 /spl mu/J. It achieves recognition rates over 98% in isolated-word speech recognition tasks.