Showing papers on "Cepstrum published in 1997"

PDF

Open Access

Journal Article•DOI•

On-line signature verification using LPC cepstrum and neural networks

[...]

Quen-Zong Wu¹, I-Chang Jou, Suh-Yin Lee•Institutions (1)

01 Feb 1997

TL;DR: An on-line signature verification scheme based on linear prediction coding (LPC) cepstrum and neural networks is proposed that can detect the genuineness of the input signatures from a test database with an error rate as low as 4%

...read moreread less

Abstract: An on-line signature verification scheme based on linear prediction coding (LPC) cepstrum and neural networks is proposed. Cepstral coefficients derived from linear predictor coefficients of the writing trajectories are calculated as the features of the signatures. These coefficients are used as inputs to the neural networks. A number of single-output multilayer perceptrons (MLPs), as many as the number of words in the signature, are equipped for each registered person to verify the input signature. If the summation of output values of all MLPs is larger than the verification threshold, the input signature is regarded as a genuine signature; otherwise, the input signature is a forgery. Simulations show that this scheme can detect the genuineness of the input signatures from a test database with an error rate as low as 4%.

...read moreread less

79 citations

Proceedings Article•

Using formant frequencies in speech recognition.

[...]

John Holmes, Wendy Holmes, Philip N. Garner

01 Jan 1997

TL;DR: A new method of formant analysis is described which includes techniques to overcome both of the above difficulties and shows that including formant features can offer increased accuracy over using cepstrum features only.

...read moreread less

Abstract: Formant frequencies have rarely been used as acoustic features for speech recognition, in spite of their phonetic significance For some speech sounds one or more of the formants may be so badly defined that it is not useful to attempt a frequency measurement Also, it is often difficult to decide which formant labels to attach to particular spectral peaks This paper describes a new method of formant analysis which includes techniques to overcome both of the above difficulties Using the same data and HMM model structure, results are compared between a recognizer using conventional cepstrum features and one using three formant frequencies, combined with fewer cepstrum features to represent general spectral trends For the same total number of features, results show that including formant features can offer increased accuracy over using cepstrum features only

...read moreread less

77 citations

Journal Article•DOI•

Comparison of cepstrum-based methods for radial blind deconvolution of ultrasound images

[...]

Torfinn Taxt¹•Institutions (1)

University of Bergen¹

01 May 1997-IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control

TL;DR: In this paper, the performance of seven different cepstrum-based methods for radial blind deconvolution of medical ultrasound images was compared, and the results showed that the generalized cepstrum method gave the best images closely followed by the complex cepstrate using phase unwrapping or polynomial rooting.

...read moreread less

Abstract: This paper compares the performance of seven different cepstrum-based methods for radial blind deconvolution of medical ultrasound images. The first is the generalized cepstrum method. The second is the spectral root cepstrum method. These methods have received little attention so far. The last five methods are all based on the complex cepstrum, but different computational techniques in the spatial and frequency domain are employed. Using in vivo radio frequency data from a clinical scanner, the generalized cepstrum method gave the best images closely followed by the complex cepstrum using phase unwrapping or polynomial rooting. The complex cepstrum method using higher-order statistics was ranked as low as number five. These results are an important guideline for selecting a specific cepstrum-based radial deconvolution method for implementation in ultrasound scanners.

...read moreread less

70 citations

Journal Article•DOI•

A new cepstral prefiltering technique for estimating time delay under reverberant conditions

[...]

Alex Stephenne¹, Benoit Champagne¹•Institutions (1)

Institut national de la recherche scientifique¹

01 Jun 1997-Signal Processing

TL;DR: This article presents and evaluates a new cepstral prefiltering technique which can be applied on the received signals before the actual TDE in order to obtain a more accurate estimate of the delay in a typical reverberant environment.

...read moreread less

63 citations

Journal Article•DOI•

Detection of corrugation and wheelflats of railway wheels using energy and cepstrum analysis of rail acceleration

[...]

A. Bracciali¹, Gaetano Cascini¹•Institutions (1)

University of Florence¹

01 Mar 1997

TL;DR: In this article, an experimental and numerical procedure for detection of wheel corrugation and wheelflats has been developed and validated; it processes rail acceleration signals collected by using combined energy and cepstrum analysis criteria.

...read moreread less

Abstract: Rolling stock and track damage due to localized (wheelflats) and global (corrugation) railway wheel tread defects is extremely serious, and several devices have been developed to detect these defects. An original experimental and numerical procedure for detection of wheel corrugation and wheelflats has been developed and validated; it processes rail acceleration signals collected by using combined energy and cepstrum analysis criteria. The use of cepstrum analysis proved to be particularly useful as it allows the discrimination of wheelflats independently from the presence of other defects, even when their effects are hidden in globally high acceleration levels due to heavy corrugation. A short survey of damage induced by wheelflats is presented; optimal measurement conditions and extensive examples are then detailed. The results are discussed with particular reference to the applicability of the technique developed to automated detection devices.

...read moreread less

37 citations

Patent•

Adaptive equalization of multipath signals

[...]

Per Enge, Dominic Farmer, John F. Schipper

06 May 1997

TL;DR: In this article, the authors used a plurality of correlators to improve the estimate of direct signal arrival time by identifying detailed features of a correlation function at and adjacent to the correlation peak.

...read moreread less

Abstract: Method and apparatus for using a plurality of correlators to improve the estimate of direct signal arrival time by identifying detailed features of a correlation function at and adjacent to the correlation peak. The errors in location of the center point of a correlation function R(τ), formed by the received signal and a stored copy of the expected signal, are assumed to be strongly correlated for narrow sample spacing and wide sample spacing of the correlation function. Alternatively, the multipath signal strengths and phases are estimated by a least mean squares analysis, using multiple sampling of a correlation function of an expected signal and an arriving composite signal that includes the direct signal and one or more multipath signals. Times of arrival or path delays of the direct signal and the multipath signals are determined separately. Path delays can be determined by at least three approaches: (1) identification of slope transition points in the correlation function R(τ); (2) Cepstrum processing of the received signal, using Fourier transform and inverse transform analysis; and (3) use of a grid of time shift points for the correlation function, and identification of time shift values, associated with certain solution parameters for a least mean squares analysis that have the largest absolute values, as times of arrival of the direct and multipath signals. Separate identification of path delays reduces the least mean squares analysis to a solvable linear problem. A modified received signal is constructed, with multipath signal(s) approximately removed.

...read moreread less

36 citations

DOI•

Temporal processing of speech in a time-feature space

[...]

Carlos Avendano, Hynek Hermansky

01 Jan 1997

TL;DR: This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances, and analytically derive and discusses some properties and merits of temporal processing for speech signals.

...read moreread less

Abstract: The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.

...read moreread less

34 citations

Patent•

Start/end point detection for word recognition

[...]

Thomas Hörmann¹, Gregor Rozinaj¹•Institutions (1)

Alcatel-Lucent¹

12 May 1997

TL;DR: In this paper, a check quantity or a maximum function of a distribution function is calculated, which detects the start/end points by comparison with a threshold, which can be used for speech recognition of words.

...read moreread less

Abstract: During speech recognition of words, a precise and strong detection of start/end points of the words must be ensured, even in very noisy surroundings. Use of a feature with noise-resistant properties is shown wherein for a feature vector, a function of the signal energy is formed as the first feature and a function of the quadratic difference of an LPC (Linear-Predictive-Coding) cepstrum coefficient as a second feature. A check quantity or a maximum function of a distribution function is calculated, which detects the start/end points by comparison with a threshold.

...read moreread less

32 citations

Book Chapter•DOI•

Speaker Identification Using Harmonic Structure of LP-residual Spectrum

[...]

Shoji Hayakawa¹, Kazuya Takeda¹, Fumitada Itakura¹•Institutions (1)

Nagoya University¹

12 Mar 1997

TL;DR: It is shown that PDSS can compensate for the LPC cepstrum and delta cepStrum for improving speaker identification performance and is proposed as a new feature parameter to extract information of the harmonic structure of the linear prediction residual spectrum.

...read moreread less

Abstract: The harmonic structure of LP-residual spectrum is different in speakers. Therefore the harmonic structure may be useful for speaker recognition. In order to prove this hypothesis, Power Difference of Spectra in Subband (PDSS) is proposed as a new feature parameter to extract information of the harmonic structure of the linear prediction residual spectrum. VQ-based text-independent speaker identification experiments for 25 male and 25 female speakers are conducted to investigate the speaker identification ability of PDSS. Experimental results show that PDSS alone provides 66.9% maximal identification. In addition, it was found that the LPC cepstrum combined with PDSS results in a 41.2% reduction in identification errors compared with using only the LPC cepstrum. Moreover, a 52.4% reduction of identification errors over using only LPC cepstrum is attained by combining the LPC cepstrum with both delta cepstrum and PDSS. It is shown that PDSS can compensate for the LPC cepstrum and delta cepstrum for improving speaker identification performance.

...read moreread less

28 citations

Journal Article•DOI•

Speech recognition in a noisy car environment based on LP of the one-sided autocorrelation sequence and robust similarity measuring techniques

[...]

Javier Hernando¹, Climent Nadeu¹, José B. Mariño¹•Institutions (1)

Polytechnic University of Catalonia¹

01 Feb 1997-Speech Communication

TL;DR: The aim of this work is to show that OSALPC also achieves a good performance in a case of real noisy speech (in a car environment), and to explore its combination with several robust similarity measuring techniques, showing that its performance improves by using cepstral liftering, dynamic features and multilabeling.

...read moreread less

21 citations

Journal Article•DOI•

A fast algorithm for finding the adaptive component weighted cepstrum for speaker recognition

[...]

M.S. Zilovic, Ravi P. Ramachandran, Richard J. Mammone

01 Jan 1997-IEEE Transactions on Speech and Audio Processing

TL;DR: This new method, which avoids root finding, reduces the computer time significantly and imposes negligible overhead when compared with the approach of finding the LP cepstrum.

...read moreread less

Abstract: In speaker recognition systems, the adaptive component weighted (ACW) cepstrum has been shown to be more robust than the conventional linear predictive (LP) cepstrum. The ACW cepstrum is derived from a pole-zero transfer function whose denominator is the pth-order LP polynomial A(z). The numerator is a (p-1)th-order polynomial that is up to now found as follows. The roots of A(z) are computed, and the corresponding residues obtained by a partial fraction expansion of 1/A(z) are set to unity. Therefore, the numerator is the sum of all the (p-1)th-order cofactors of A(z). We show that the numerator polynomial is merely the derivative of the denominator polynomial A(z). This greatly speeds up the computation of the numerator polynomial coefficients since it involves a simple scaling of the denominator polynomial coefficients. Root finding is completely eliminated. Since the denominator is guaranteed to be minimum phase and the numerator can be proven to be minimum phase, two separate recursions involving the polynomial coefficients establishes the ACW cepstrum. This new method, which avoids root finding, reduces the computer time significantly and imposes negligible overhead when compared with the approach of finding the LP cepstrum.

...read moreread less

Proceedings Article•DOI•

Cepstrum-based filter-bank design using discriminative feature extraction training at various levels

[...]

A. Biem, Shigeru Katagiri

21 Apr 1997

TL;DR: Application to vowel and noisy telephone speech recognition tasks shows that the DFE method realization of optimal filter bank-based cepstral parameters realizes a more robust classifier by appropriate feature extraction.

...read moreread less

Abstract: This paper investigates the realization of optimal filter bank-based cepstral parameters. The framework is the discriminative feature extraction method (DFE) which iteratively estimates the filter-bank parameters according to the errors that the system makes. Various parameters of the filter-bank, such as center frequency, bandwidth, and gain are optimized using a string-level optimization and a frame-level optimization scheme. Application to vowel and noisy telephone speech recognition tasks shows that the DFE method realizes a more robust classifier by appropriate feature extraction.

...read moreread less

Journal Article•

EMG Pattern Recognition based on Evidence Accumulation for Prosthesis Control

[...]

Seok-Pil Lee, Sang-Hui Park

01 Oct 1997-Journal of electrical engineering and information science

TL;DR: In this article, a method of electromyographic (EMG) pattern recognition to identify motion commands for the control of a prosthetic arm by evidence accumulation with multiple parameters is presented.

...read moreread less

Abstract: We present a method of electromyographic(EMG) pattern recognition to identify motion commands for the control of a prosthetic arm by evidence accumulation with multiple parameters. Integral absolute value, variance, autoregressive(AR) model coefficients, linear cepstrum coefficients, and adaptive cepstrum vector are extracted as feature parameters from several time segments of the EMG signals. Pattern recognition is carried out through the evidence accumulation procedure using the distances measured with reference parameters. A fuzzy mapping function is designed to transform the distances for the application of the evidence accumulation method. Results are presented to support the feasibility of the suggested approach for EMG pattern recognition.

...read moreread less

Proceedings Article•DOI•

Fast speech recognition algorithm under noisy environment using modified CMS-PMC and improved IDMM+SQ

[...]

Hiroki Yamamoto¹, Tetsuo Kosaka, Masayuki Yamada, Yasuhiro Komori, M. Fujita - Show less +1 more•Institutions (1)

Canon Inc.¹

21 Apr 1997

TL;DR: An integration of parallel model combination and modified cepstral mean subtraction (MCMS) which estimates the cepstrum mean by taking account of the additive noise and new techniques to create the noise-adapted scalar quantized codebook are proposed.

...read moreread less

Abstract: We describe a fast speech recognition algorithm under a noisy environment. To achieve accurate and fast speech recognition under a noisy environment, a very fast speech recognition algorithm with well-adapted model against the noisy environment is required. First, for the model adaptation, we propose the MCMS-PMC: an integration of parallel model combination (PMC) and modified cepstral mean subtraction (MCMS) which estimates the cepstrum mean by taking account of the additive noise. Then, for the fast speech recognition, we propose new techniques to create the noise-adapted scalar quantized codebook in order to introduce the MCMS-PMC into the IDMM+SQ, which we proposed previously as a fast speech recognition algorithm using the scalar quantization approach. Finally, an effect of the proposed method is shown through the speaker-independent telephone-bandwidth continuous speech recognition experiment.

...read moreread less

Proceedings Article•DOI•

A speech enhancement approach E-CMN/CSS for speech recognition in car environments

[...]

M. Shozakai¹, Satoshi Nakamura, Kiyohiro Shikano•Institutions (1)

Nara Institute of Science and Technology¹

14 Dec 1997

TL;DR: A high robustness of the proposed E-CMN/CSS (Exact Cepstrum Mean Normalization)/CSS (Continuous Spectral Subtraction) approach is clarified by comparative evaluation with alternative methods for speech recognition tasks in car environments.

...read moreread less

Abstract: This paper proposes the robust speech enhancement approach E-CMN (Exact Cepstrum Mean Normalization)/CSS (Continuous Spectral Subtraction). The E-CMN, which we proposed for compensation of multiplicative distortions (Shozakai et al., 1997), calculates two cepstrum mean vectors, one for speech for each speaker and the other for non-speech for each environment. The CSS subtracts continuously average spectra at every frame. A high robustness of the proposed method is clarified by comparative evaluation with alternative methods for speech recognition tasks in car environments.

...read moreread less

Proceedings Article•DOI•

Bispectrum features for robust speaker identification

[...]

S. Wenndt, S. Shamsunder

21 Apr 1997

TL;DR: Part of the bispectrum is used as a new feature and its usefulness in varying noise settings is demonstrated and it is shown how it can be used for robust speaker identification.

...read moreread less

Abstract: Along with the spoken message, speech contains information about the identity of the speaker. Thus, the goal of speaker identification is to develop features which unique to each speaker. This paper explores a new feature for speech and shows how it can be used for robust speaker identification. The results are compared to the cepstrum feature due to its widespread use and success in speaker identification applications. The cepstrum, however, has shown a lack of robustness in varying conditions, especially in a cross-condition environment where the classifier has been trained with clean data but then tested on corrupted data. Part of the bispectrum is used as a new feature and we demonstrate its usefulness in varying noise settings.

...read moreread less

Proceedings Article•DOI•

Linear dynamic segmental HMMs: variability representation and training procedure

[...]

W.J. Holmes, M.J. Russell

21 Apr 1997

TL;DR: Investigations into the use of linear dynamic segmental hidden Markov models for modelling speech feature-vector trajectories and their associated variability indicate that a linear trajectory is a reasonable approximation when using models with three states per phone.

...read moreread less

Abstract: This paper describes investigations into the use of linear dynamic segmental hidden Markov models (SHMMs) for modelling speech feature-vector trajectories and their associated variability. These models use linear trajectories to describe how features change over time, and distinguish between extra-segmental variability of different trajectories and intra-segmental variability of individual observations around any one trajectory. Analyses of mel cepstrum features have indicated that a linear trajectory is a reasonable approximation when using models with three states per phone. Good recognition performance has been demonstrated with linear SHMMs. This performance is, however, dependent on the model initialisation and training strategy, and on representing the distributions accurately according to the model assumptions.

...read moreread less

Proceedings Article•DOI•

Unsupervised speaker classification using self-organizing maps (SOM)

[...]

I. Voitovetsky¹, Hugo Guterman, Arnon D. Cohen•Institutions (1)

Ben-Gurion University of the Negev¹

24 Sep 1997

TL;DR: An algorithm for unsupervised speaker classification using Kohonen SOM is presented and correct classification of more than 90% was demonstrated.

...read moreread less

Abstract: An algorithm for unsupervised speaker classification using Kohonen SOM is presented. The system employs 6/spl times/10 SOM networks for each speaker and for non-speech segments. The algorithm was evaluated using high quality as well as telephone quality conversations between two speakers. Correct classification of more than 90% was demonstrated. High quality conversation between three speakers yielded 80% correct classification. The high quality speech required the use of 12/sup th/ order cepstral coefficients vector. In telephone quality speech, an additional 12 features of the difference of the cepstrum were required.

...read moreread less

Proceedings Article•DOI•

Efficient encoding of mel-generalized cepstrum for CELP coders

[...]

Kazuhito Koishida¹, Keiichi Tokuda², Takao Kobayashi¹, Satoshi Imai¹•Institutions (2)

Tokyo Institute of Technology¹, Nagoya Institute of Technology²

21 Apr 1997

TL;DR: A CELP coder is implemented in which mel-generalized cepstral coefficients are quantized using MA prediction, which has a higher objective quality than conventional C ELP.

...read moreread less

Abstract: The performance of several algorithms for the quantization of the mel-generalized cepstral coefficients is studied. First, the objective and subjective performance of two-stage vector quantization (VQ) is measured. It is shown that the subjective quality for the mel-generalized cepstral coefficients is higher than that for LSP. Secondly, interframe prediction is introduced in the encoding of mel-generalized cepstral coefficients. By utilizing interframe moving average (MA) prediction, the mel-generalized cepstral coefficients can be encoded more efficiently than LSP in terms of cepstral distortion. Finally, we implement a CELP coder based on mel-generalized cepstral analysis in which mel-generalized cepstral coefficients are quantized using MA prediction. This coder has a higher objective quality than conventional CELP.

...read moreread less

Proceedings Article•DOI•

A robust and fast endpoint detection algorithm for isolated word recognition

[...]

Yiying Zhang¹, Xiaoyan Zhu, Yu Hao, Yupin Luo•Institutions (1)

Tsinghua University¹

28 Oct 1997

TL;DR: A fast and robust algorithm for accurately locating the endpoints of isolated words is described in detail that utilizes energy and zero crossing parameters to acquire the reference endpoints and the principle of variable frame rate (VFR) is adopted and cepstrum is used to accurately define the boundaries of isolated Words.

...read moreread less

Abstract: The problem of automatic word boundary detection in a quiet environment and in the presence of noise is addressed. A fast and robust algorithm for accurately locating the endpoints of isolated words is described in detail. This algorithm utilizes energy and zero crossing parameters to acquire the reference endpoints, and then the principle of variable frame rate (VFR) is adopted and cepstrum is used to accurately define the boundaries of isolated words. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computation overload of this algorithm is low since the cepstrum parameters will be used in later recognition procedure.

...read moreread less

Proceedings Article•DOI•

A unified maximum likelihood approach to acoustic mismatch compensation: application to noisy Lombard speech recognition

[...]

Mohamed Afify¹, Yifan Gong, Jean-Paul Haton•Institutions (1)

French Institute for Research in Computer Science and Automation¹

21 Apr 1997

TL;DR: A maximum likelihood approach for joint estimation of both mel cepstral and linear spectral biases from the observed mismatched speech given only one set of clean speech models is presented, and significant improvement in the word recognition rate is achieved.

...read moreread less

Abstract: In the context of continuous density hidden Markov model (CDHMM) we present a unified maximum likelihood (ML) approach to acoustic mismatch compensation. This is achieved by introducing additive Gaussian biases at the state level in both the mel cepstral and linear spectral domains. Flexible modelling of different mismatch effects can be obtained through appropriate bias tying. A maximum likelihood approach for joint estimation of both mel cepstral and linear spectral biases from the observed mismatched speech given only one set of clean speech models is presented, where the obtained bias estimates are used for the compensation of clean speech models during decoding. The proposed approach is applied to the recognition of noisy Lombard speech, and significant improvement in the word recognition rate is achieved.

...read moreread less

Proceedings Article•DOI•

Prediction of speech quality using radial basis functions neural networks

[...]

M.M. Meky¹, T.N. Saadawi•Institutions (1)

City College of New York¹

01 Jul 1997

TL;DR: After extensive experimentation and validation of the proposed techniques, the results indicate that the proposed technique is shown to be effective for estimating the coded speech quality.

...read moreread less

Abstract: The goal of this paper is to propose a new perceptually-based objective technique that uses radial basis functions neural networks, instead of regression algorithms, to estimate the nonlinear mapping function that best represents the relationship among input (perceptual parameters) and output (speech quality) variables in a database. In the proposed technique, the perceptual parameters are obtained by: (1) emulating several known features of perceptual processing of speech sounds by the human ear (including critical-band masking, equal loudness, and the intensity-loudness power law operations) to map the speech power spectrum into the auditory power spectrum (bark domain), (2) deriving the perceptual LPC coefficients from the auditory spectrum that is used to calculate, for each frame, the cepstrum distance between the input and the output coded speech signals; (3) using the radial basis functions neural network to map the perceptual cepstrum distance per frame into the corresponding estimated speech quality. After extensive experimentation and validation of the proposed techniques, the results indicate that the proposed technique is shown to be effective for estimating the coded speech quality.

...read moreread less

Journal Article•DOI•

System reconstruction from higher order spectra slices

[...]

Athina P. Petropulu¹, Udantha R. Abeyratne•Institutions (1)

Drexel University¹

01 Sep 1997-IEEE Transactions on Signal Processing

TL;DR: It is established that the impulse response of a complex system can be reconstructed up to a scalar and a shift based on any pair of HOS slices, as long as the distance between the two slices satisfies a certain condition.

...read moreread less

Abstract: We consider the problem of system reconstruction from higher order spectra (HOS) slices. We establish that the impulse response of a complex system can be reconstructed up to a scalar and a shift based on any pair of HOS slices, as long as the distance between the two slices satisfies a certain condition. One slice is sufficient for the reconstruction in the case of a real system. We propose a cepstrum-based method for system reconstruction. We also propose a new method for the reconstruction of the system Fourier phase based on the phase of any odd-indexed bispectrum slice. Being able to choose the slices to be used in the reconstruction allows us to avoid bispectrum regions dominated by noise.

...read moreread less

Proceedings Article•

CDHMM speaker recognition by means of frequency filtering of filter-bank energies

[...]

Francisco Javier Hernando Pericás, Climent Nadeu Camprubí

01 Jan 1997

TL;DR: Frequency filtering approximately equalizes the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers.

...read moreread less

Abstract: Recently, the set of spectral parameters of every speech frame that result from filtering the frequency sequence of mel-scaled filter-bank energies with a simple first-order high-pass FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. In this paper, we apply the same technique to speaker recognition. Frequency filtering approximately equalizes the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers. In this way, even better speaker identification results than using conventional mel-cepstrum were observed in continuous observation Gaussian density HMM, especially in noisy conditions.

...read moreread less

Patent•

Speech recognition method and apparatus, a computer-readable storage medium, and a computer- readable program for obtaining the mean of the time of speech and non-speech portions of input speech in the cepstrum dimension

[...]

Tetsuo Kosaka¹, Yasunori Ohora¹•Institutions (1)

Canon Inc.¹

20 Mar 1997

TL;DR: In this paper, an apparatus and a step for speech recognition includes a device and step for obtaining a mean of the time of a speech portion in the Cepstrum dimension from the speech portion of the input speech.

...read moreread less

Abstract: An apparatus and method for speech recognition includes a device and a step for obtaining a mean of the time of a speech portion in the Cepstrum dimension from the speech portion of the input speech, a device and step for obtaining a mean of a time of the non-speech portion in the Cepstrum dimension from the non-speech portion of the input speech, a device and step for converting each mean time from a Cepstrum region to a linear region, and after that, subtracting it on a linear spectrum dimension, converting the subtracted mean into a Cepstrum dimension, subtracting a mean of a time of a speech portion in a Cepstrum dimension in a speech database for learning from the converted result, and adding the subtracted result to a speech model expressed by Cepstrum. By this arrangement, even when noise is large, the presumed precision of a line fluctuation is raised and the recognition rate can be improved.

...read moreread less

Journal Article•DOI•

Iterative noise and channel estimation under the stochastic matching algorithm framework

[...]

Olivier Siohan¹, Chin-Hui Lee•Institutions (1)

AT&T Labs¹

01 Nov 1997-IEEE Signal Processing Letters

TL;DR: An unsupervised iterative algorithm to adapt HMMs trained using clean speech in order to recognize speech corrupted by an additive and a convolutional noise is introduced.

...read moreread less

Abstract: In this letter, we introduce an unsupervised iterative algorithm to adapt HMMs trained using clean speech in order to recognize speech corrupted by an additive and a convolutional noise. Both types of noise are considered as stochastic processes that can be modeled using HMMs and can be estimated by applying Sankar's stochastic matching (SM) algorithm successively in the cepstral and in the linear spectral domain. These estimates are derived directly from the given test speech signal and the set of clean speech models, and lead to the estimation of a new set of HMMs that maximize the likelihood of the test signal.

...read moreread less

Proceedings Article•DOI•

Auditory masking based acoustic front-end for robust speech recognition

[...]

Kuldip K. Paliwal¹, B.T. Lilly•Institutions (1)

Griffith University¹

02 Dec 1997

TL;DR: This paper presents an acoustic front-end which uses the properties of auditory masking for extracting acoustic features from the speech signal using a masking threshold as a function of frequency for a given speech frame from its power spectrum.

...read moreread less

Abstract: This paper presents an acoustic front-end which uses the properties of auditory masking for extracting acoustic features from the speech signal. Using the properties of simultaneous masking found in the human auditory system, we compute a masking threshold as a function of frequency for a given speech frame from its power spectrum. All those portions of the power spectrum which are below the auditory threshold are not heard by the human auditory system due to masking effects and hence can be discarded. These portions are replaced by the corresponding portions in the masking threshold spectrum. This modified power spectrum is processed by the linear prediction analysis or homomorphic analysis procedure to derive cepstral features for each speech frame. We study the performance of this front-end for speech recognition under noisy environments. This front-end performs significantly better than the conventional linear prediction or homomorphic analysis based front-ends for noisy speech. In terms of signal-to-noise ratio, simultaneous masking offers an advantage of more than 5 dB over the LPCC front-end in isolated word recognition experiments and 3 dB in continuous speech recognition experiments.

...read moreread less

Proceedings Article•DOI•

Correlation based speech formant recovery

[...]

D. Nelson

21 Apr 1997

TL;DR: A new method for generating speech spectrograms based on an autocorrelation function whose parameters are chosen provide processing gain and formant resolution, while minimizing pitch artifacts in the spectrum.

...read moreread less

Abstract: A new method for generating speech spectrograms is presented. This algorithm is based on an autocorrelation function whose parameters are chosen provide processing gain and formant resolution, while minimizing pitch artifacts in the spectrum. Crisp formants are produced, and the power ratio of the formants can be adjusted by pre-filtering the data. The autocorrelation process is functionally equivalent to a time-smoothed, windowed Wigner distribution. The process is an improvement over the normal FFT implementation since it requires much less data to resolve the speech formants, and it is an improvement over the un-smoothed Wigner distribution since the cross-terms normally associated with the Wigner distribution are greatly attenuated by the smoothing operation.

...read moreread less

Proceedings Article•DOI•

Fast approximate DCT: basic-idea, error analysis, applications

[...]

Abdulnasir Hossen¹, Ulrich Heute¹•Institutions (1)

University of Kiel¹

21 Apr 1997

TL;DR: The basic idea of the SB-DCT is discussed which is based on subband decomposition of the input sequence and the complexity of this fast approximate method is examined in comparing it with a fast cosine-transform method in terms of program running-time.

...read moreread less

Abstract: The discrete cosine transform (DCT) has a variety of applications in image and speech processing. The idea of the subband-DFT (SB-DFT) is applied by Jung, Mitra and Mukherjee (see IEEE Trans. on Circuits and Systems for Video Technology, vol.6, no.3, 1996) to the DCT. In this paper the basic idea of the SB-DCT is discussed which is based on subband decomposition of the input sequence. Approximation is done by discarding the computations of bands of little energy. The complexity of this fast approximate method is examined in comparing it with a fast cosine-transform method in terms of program running-time. New accurate analysis of the errors due to the approximation is presented for any number of decomposition stages. New applications of the SB-DCT in the speech cepstrum analysis and in echo detection are also included by using the SB-DCT instead of the full-band FFT in calculating the real and complex cepstra.

...read moreread less

Book Chapter•DOI•

Quantitative Estimation of Scatterer Spacing from Backscattered Ultrasound Signals Using the Complex Cepstrum

[...]

Rashidus S. Mia¹, Murray H. Loew¹, Keith A. Wear², Robert F. Wagner²•Institutions (2)

George Washington University¹, Center for Devices and Radiological Health²

09 Jun 1997

TL;DR: A new method of estimating the distance between regularly-spaced coherent scatterers within soft tissue from backscattered radio-frequency (RF) signals is presented, using simulation data to show that periodic components in the RF signal manifest themselves as peaks in the quefrency (cepstral) domain.

...read moreread less

Abstract: This paper presents a new method of estimating the distance between regularly-spaced coherent scatterers within soft tissue from backscattered radio-frequency (RF) signals. Periodic components in the RF signal manifest themselves as peaks in the quefrency (cepstral) domain. Using simulation data, we show that these peaks are easier to detect using the complex cepstrum rather than the commonly used power cepstrum. Similar improvements are seen using phantom and in vivo liver data.

...read moreread less