
Showing papers on "Cepstrum published in 2005"


Journal ArticleDOI
TL;DR: In this paper, the Jacobian determinant of the transformation matrix is computed analytically for three typical warping functions, and it is shown that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.
Abstract: Vocal tract normalization (VTN) is a widely used speaker normalization technique which reduces the effect of different lengths of the human vocal tract and results in improved recognition accuracy of automatic speech recognition systems. We show that VTN results in a linear transformation in the cepstral domain, although frequency warping and linear cepstral transformations have so far been considered independent approaches to speaker normalization. We are now able to compute the Jacobian determinant of the transformation matrix, which allows the normalization of the probability distributions used in speaker-normalized automatic speech recognition. We show that VTN can be viewed as a special case of Maximum Likelihood Linear Regression (MLLR). Consequently, we can explain previous experimental results showing that the improvements obtained by VTN and subsequent MLLR are not additive in some cases. For three typical warping functions the transformation matrix is calculated analytically, and we show that the matrices are diagonally dominant and thus can be approximated by quindiagonal matrices.
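The claim that frequency warping acts as a linear map on the cepstral coefficients can be checked numerically. The sketch below (numpy; the piecewise-linear warping function and its breakpoint are illustrative assumptions, not the paper's exact formulas) builds the transformation matrix by projecting warped cosine basis functions back onto the cepstral cosine basis:

```python
import numpy as np

def _trapz(y, x):
    """Composite trapezoidal rule."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def vtn_cepstral_matrix(alpha, n_cep=8, n_grid=4096):
    """Matrix A with c_warped ~= A @ c for a piecewise-linear warping
    g(w) = alpha*w below a breakpoint w0, then linear up to g(pi) = pi.
    With L(w) = 2 * sum_n c_n cos(n*w), the matrix entries are
    A[k-1, n-1] = (2/pi) * integral_0^pi cos(n*g(w)) * cos(k*w) dw."""
    w = np.linspace(0.0, np.pi, n_grid)
    w0 = 0.875 * np.pi / max(alpha, 1.0)   # breakpoint: an assumed convention
    g = np.where(w <= w0, alpha * w,
                 alpha * w0 + (np.pi - alpha * w0) * (w - w0) / (np.pi - w0))
    A = np.empty((n_cep, n_cep))
    for n in range(1, n_cep + 1):
        for k in range(1, n_cep + 1):
            A[k - 1, n - 1] = (2.0 / np.pi) * _trapz(
                np.cos(n * g) * np.cos(k * w), w)
    return A
```

For alpha = 1 the warping is the identity and A reduces to the identity matrix; for mild warps the mass concentrates near the diagonal, which is what makes the quindiagonal approximation plausible.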

217 citations


Book
31 Jan 2005
TL;DR: This is a newly revised and greatly expanded edition of a classic Artech House book that offers practical guidance in electronic warfare target location and provides practitioners with critical information on a variety of geolocation algorithms and techniques.
Abstract: Introduction to Emitter Geolocation -Introduction. Gradient Descent Algorithm. Concluding Remarks. Triangulation -Introduction. Basic Concepts. Least-Squares Error Estimation. Total Least-Squares Estimation. Least-Squares Distance Error PF Algorithm. Minimum Mean-Squares Error Estimation. The Discrete Probability Density Method. Generalized Bearings. Maximum Likelihood PF Algorithm. Multiple Sample Correlation. Bearing-Only Target Motion Analysis. Sources of Error in Triangulation. Concluding Remarks. DF Techniques -Introduction. Array Processing Direction of Arrival Measurement Methods. Other Methods of Estimating the AOA. MSE Phase Interferometer. DF with a Butler Matrix. Phase Difference Estimation Using SAW Devices. Concluding Remarks. MUSIC -Introduction. MUSIC Overview. MUSIC. Performance of MUSIC in the Presence of Modeling Errors. Determining the Number of Wavefields. Effect of Phase Errors on the Accuracy of MUSIC. Other Superresolution Algorithms. Concluding Remarks. Quadratic Position-Fixing Methods -Introduction. TDOA Position-Fixing Techniques. Differential Doppler. Range Difference Methods. Concluding Remarks. Time Delay Estimation -Introduction. System Overview. Cross Correlation. Generalized Cross-Correlation. Estimating the Time Delay with the Generalized Correlation Method. Time Delay Estimation Using the Phase of the Cross-Spectral Density. Effects of Frequency and Phase Errors in EW TDOA Direction-Finding Systems. Concluding Remarks. Single-Site Location Techniques -Introduction. HF Signal Propagation. Single-Site Location. Passive SSL. Determining the Reflection Delay with the Cepstrum. MUSIC Cepstrum SSL. Earth Curvature. Skywave DF Errors. Ray Tracing. Accuracy Comparison of SSL and Triangulation for Ionospherically Propagated Signals. Concluding Remarks.

151 citations


01 Sep 2005
TL;DR: In this article, a cepstrum-based iterative true envelope estimator is proposed for pitch shifting with preservation of the spectral envelope in the phase vocoder, which can reduce the run time by a factor of 2.5-11.
Abstract: In this article the estimation of the spectral envelope of sound signals is addressed. The intended application for the developed algorithm is pitch shifting with preservation of the spectral envelope in the phase vocoder. As a first step, the different existing envelope estimation algorithms are investigated and their specific properties discussed. The cepstrum-based iterative true envelope estimator is selected as the most promising algorithm. By means of controlled sub-sampling of the log amplitude spectrum and a simple step size control for the iterative algorithm, the run time of the algorithm can be decreased by a factor of 2.5-11. As a remedy for the ringing effects in the spectral envelope that are due to the rectangular filter used for spectral smoothing, we propose the use of a Hamming window as the smoothing filter. The resulting implementation of the algorithm has slightly increased computational complexity compared to the standard LPC algorithm but offers significantly improved control over the envelope characteristics. The application of the true envelope estimator in a pitch shifting application is investigated. The main problems for pitch shifting with envelope preservation in a phase vocoder are identified and a simple yet efficient remedy is proposed.
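The iterative true-envelope idea can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's implementation: the function names, fixed iteration count, and rectangular lifter are assumptions, and the paper's sub-sampling and step-size accelerations are omitted.

```python
import numpy as np

def cepstral_smooth(log_mag, order):
    """Low-pass lifter the real cepstrum of a full, symmetric log spectrum."""
    c = np.fft.ifft(log_mag).real
    n = len(c)
    lifter = np.zeros(n)
    lifter[:order + 1] = 1.0     # keep quefrencies |q| <= order
    lifter[n - order:] = 1.0
    return np.fft.fft(c * lifter).real

def true_envelope(log_mag, order, n_iter=60):
    """Iterative true-envelope estimator: replace the working spectrum by
    the max of spectrum and current envelope, then re-smooth."""
    a = log_mag.copy()
    env = cepstral_smooth(a, order)
    for _ in range(n_iter):
        a = np.maximum(a, env)
        env = cepstral_smooth(a, order)
    return env
```

Each pass lifts the working spectrum up to the running envelope, so the smoothed curve converges onto the spectral peaks instead of averaging through them.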

146 citations


Journal ArticleDOI
TL;DR: In this paper, an indirect measurement of the cylinder pressure in diesel engines is demonstrated for a large two-stroke marine diesel engine and a small four-stroke diesel engine; the method involves reconstructing the cylinder pressure diagram in the crank-angle domain from the acoustic emission generated during the combustion phase.

83 citations


Journal ArticleDOI
TL;DR: In this article, modelling a one-stage spur gear transmission as a two-degrees-of-freedom system produces two modes, rigid body and elastic; the time-varying meshing stiffness is the main internal excitation source for the transmission and governs the behaviour of the elastic mode.
Abstract: Modelling a one-stage spur gear transmission as a two-degrees-of-freedom system produces two modes: rigid body and elastic. The time-varying meshing stiffness is the main internal excitation source for the transmission and governs the behaviour of the elastic mode. Deterioration of one or several teeth, which affects the gear mesh stiffness, is considered in this work. The onset of a crack or of spalling is modelled by a tooth having a localised or a distributed defect, respectively, and both are taken into account in the model. Simulation results are analysed by cepstrum and spectrum techniques. It is found that the cepstrum and spectrum techniques are very efficient for localised and distributed defects, respectively. A series of tests was made on the experimental setup. Spectrum and cepstrum analyses of the recorded responses, with and without defects, are compared with numerical results and confirm their usefulness in gear monitoring.

81 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is shown that features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to the MFCC and RASTA-PLP features.

Abstract: It is well known that the peaks in the spectrum of a log Mel-filter bank are important cues in characterizing speech sounds. However, low energy perturbations in the power spectrum may become numerically significant after the log compression. We show that even if the spectral peaks are kept constant, the low energy perturbations in the power spectrum can create huge variations in the cepstral coefficients. We show, both analytically and experimentally, that exponentiating the log Mel-filter bank spectrum before the cepstrum computation can significantly reduce the sensitivity of the cepstra to spurious low energy perturbations. The Mel-cepstrum modulation spectrum (Tyagi, V. et al., Proc. IEEE ASRU, 2003) is computed from the processed cepstra, which results in further noise robustness of the composite feature vector. In experiments with speech signals, it is shown that features based on the proposed technique yield a significant increase in speech recognition performance in non-stationary noise conditions when compared directly to the MFCC and RASTA-PLP features.
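The sensitivity argument is easy to reproduce: exponentiating the log Mel spectrum (i.e. applying a root compression S**gamma) shrinks the cepstral effect of a low-energy perturbation while leaving the peaks dominant. A toy numpy illustration follows; the five-bin "filter bank" and the gamma value are invented for the example.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II, the usual final step of cepstrum computation."""
    n = len(x)
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)
    return np.sqrt(2.0 / n) * (basis @ x)

# Toy "Mel filter bank" energies: strong peaks plus one near-zero bin.
mel = np.array([1.0, 0.8, 1e-6, 0.9, 1.0])
mel_pert = mel.copy()
mel_pert[2] = 1e-8                 # tiny power change; the peaks are untouched

gamma = 0.1                        # assumed root-compression exponent
d_log = np.linalg.norm(dct_ii(np.log(mel)) - dct_ii(np.log(mel_pert)))
d_root = np.linalg.norm(dct_ii(mel ** gamma) - dct_ii(mel_pert ** gamma))
# d_log is ~50x larger: log compression amplifies low-energy perturbations.
```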

79 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.
Abstract: In this paper, we consider the use of multiple acoustic features of the speech signal for robust speech recognition. We investigate the combination of various auditory based (mel frequency cepstrum coefficients, perceptual linear prediction, etc.) and articulatory based (voicedness) features. Features are combined by linear discriminant analysis and log-linear model combination based techniques. We describe the two feature combination techniques and compare the experimental results. Experiments performed on the large-vocabulary task VerbMobil II (German conversational speech) show that the accuracy of automatic speech recognition systems can be improved by the combination of different acoustic features.

78 citations


Proceedings ArticleDOI
04 Sep 2005
TL;DR: Error analysis and speech recognition experiments show that the TECCs and the mel frequency cepstrum coefficients (MFCCs) perform similarly for clean recording conditions; while the TECCs perform significantly better than the MFCCs for noisy recognition tasks.
Abstract: In this paper, a feature extraction algorithm for robust speech recognition is introduced. The feature extraction algorithm is motivated by the human auditory processing and the nonlinear Teager-Kaiser energy operator that estimates the true energy of the source of a resonance. The proposed features are labeled as Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense non constant-Q Gammatone filterbank and then by estimating the “true” energy of the signal’s source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that the TECCs and the mel frequency cepstrum coefficients (MFCCs) perform similarly for clean recording conditions; while the TECCs perform significantly better than the MFCCs for noisy recognition tasks. Specifically, relative word error rate improvement of 60% over the MFCC baseline is shown for the Aurora-3 database for the high-mismatch condition. Absolute error rate improvement ranging from 5% to 20% is shown for a phone recognition task in (various types of additive) noise.
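The Teager-Kaiser energy operator at the heart of the TECC front end is a three-sample nonlinear filter; a minimal numpy sketch (the Gammatone filterbank and short-time averaging stages of the full feature pipeline are omitted):

```python
import numpy as np

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator:
    psi[n] = x[n]**2 - x[n-1]*x[n+1] (valid for interior samples)."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

For a pure tone A*cos(W*n + phi) the operator returns exactly A**2 * sin(W)**2, so it tracks both the amplitude and the frequency of the resonance, unlike the plain squared amplitude.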

77 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: The paper shows mathematically that there exists an acoustic universal structure in speech, which can be interpreted as a physical implementation of structural phonology, and implies that there always exists a distortion-free communication channel between a speaker and a listener.
Abstract: The paper shows mathematically that there exists an acoustic universal structure in speech, which can be interpreted as a physical implementation of structural phonology. The structure is completely free of the multiplicative and linear transformational distortions that are inevitably involved in speech communication as differences of vocal tract shape, gender, age, microphone, room, line, hearing characteristics, and so on. A speech event, such as a phone, is probabilistically modeled as a distribution of parameters calculated by a linear transformation of a log spectrum, e.g., cepstrum. A set of events, such as a word, is relatively captured as a structure composed of the distributions. An n-point structure is uniquely determined by fixing the lengths of its n(n-1)/2 diagonal lines, namely, the distance matrix among the n points. The distance between two distributions is calculated as a Bhattacharyya distance. The resulting structure has very interesting characteristics: multiplicative and linear transformational distortions are geometrically interpreted as shift and rotation of the structure, respectively. This fact implies that there always exists a distortion-free communication channel between a speaker and a listener.
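The distance matrix underlying the structure uses the Bhattacharyya distance between Gaussian event distributions. A small numpy sketch of that formula (full-covariance Gaussians assumed; not the paper's code):

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    maha = 0.125 * diff @ np.linalg.solve(cov, diff)
    logdet = 0.5 * np.log(np.linalg.det(cov) /
                          np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return maha + logdet
```

A common shift of all means (which is how a multiplicative spectral distortion appears in the cepstral domain) leaves every pairwise distance unchanged, which is the shift-invariance the abstract describes.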

65 citations


Proceedings ArticleDOI
01 Oct 2005
TL;DR: Experimental results on various configurations of front-end techniques reported herein demonstrate that, besides providing robustness against channel mismatch and noise as found in existing literature, feature warping is useful more generally as a technique for pre-mapping data for improved compatibility with a GMM back-end.
Abstract: This paper proposes the novel use of feature warping for automatic language identification, in combination with the shifted delta cepstrum (SDC) and perceptual linear predictive coefficients in a Gaussian mixture model (GMM) based system. Experimental results on various configurations of front-end techniques reported herein demonstrate that, besides providing robustness against channel mismatch and noise as found in existing literature, feature warping is useful more generally as a technique for pre-mapping data for improved compatibility with a GMM back-end. The configuration reported in this paper provides a language identification performance of 76.4% using the OGI/NIST database, a 46.5% relative reduction in error rate when compared with a benchmark system employing Mel frequency cepstral coefficients and the SDC.
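The shifted delta cepstrum (SDC) stacks k delta-cepstra spaced P frames apart into one long vector per frame. A compact numpy sketch (parameter names follow the usual N-d-P-k convention; the default values are illustrative, not taken from the paper):

```python
import numpy as np

def sdc(cep, d=1, p=3, k=7):
    """Shifted delta cepstrum: for frame t, stack the k delta vectors
    cep[t + i*p + d] - cep[t + i*p - d], i = 0..k-1 (N-d-P-k convention).
    `cep` is a (frames, N) array of cepstral vectors."""
    t_max = len(cep) - ((k - 1) * p + d)
    rows = []
    for t in range(d, t_max):
        deltas = [cep[t + i * p + d] - cep[t + i * p - d] for i in range(k)]
        rows.append(np.concatenate(deltas))
    return np.array(rows)
```

The stacking gives each frame a long temporal context (here (k-1)*p + 2*d frames) while keeping the per-frame computation trivial, which is why SDC features pair well with a GMM back-end.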

54 citations


Proceedings ArticleDOI
18 Mar 2005
TL;DR: Results of embedding using a clean and a noisy host utterance show the embedded information is robust to additive noise and bandpass filtering.
Abstract: A method of embedding information in the cepstral domain of a cover audio signal is described for audio steganography applications. The proposed technique combines the commonly employed psychoacoustical masking property of the human auditory system with the decorrelation property of the speech cepstrum, and achieves imperceptible embedding, large payload, and accurate data retrieval. Results of embedding using a clean and a noisy host utterance show the embedded information is robust to additive noise and bandpass filtering.

Journal ArticleDOI
17 May 2005
TL;DR: Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and Cepstrum coefficients to model time varying software artifact histories to recover time variant information from software repositories.
Abstract: This paper presents an approach to recover time-variant information from software repositories. It is widely accepted that software evolves due to factors such as defect removal, market opportunity or adding new features. Software evolution details are stored in software repositories, which often contain the change history. On the other hand, there is a lack of approaches, technologies and methods to efficiently extract and represent time-dependent information. Disciplines such as signal and image processing or speech recognition adopt frequency domain representations to mitigate differences of signals evolving in time. Inspired by time-frequency duality, this paper proposes the use of Linear Predictive Coding (LPC) and Cepstrum coefficients to model time-varying software artifact histories. LPC and cepstrum coefficients yield very compact representations with linear complexity. These representations can be used to highlight components and artifacts that evolved in the same way or with very similar evolution patterns. To assess the proposed approach we applied LPC and cepstral analysis to 211 Linux kernel releases (i.e., from 1.0 to 1.3.100) to identify files with very similar size histories. The approach, the preliminary results and the lessons learned are presented in this paper.
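The LPC coefficients fitted to an artifact history (e.g. a file-size series) can be obtained with the standard autocorrelation method and the Levinson-Durbin recursion; a self-contained numpy sketch (a generic implementation, not the paper's):

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns predictor coefficients a (x[t] ~= sum_j a[j-1] * x[t-j])
    and the final prediction error energy."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)                # a[0] is a placeholder
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coeff.
        a_new = a.copy()
        a_new[i] = k
        a_new[1:i] = a[1:i] - k * a[i - 1:0:-1]
        a = a_new
        err *= (1.0 - k * k)
    return a[1:], err
```

A handful of coefficients summarizes a whole series, which is what makes comparing hundreds of release histories cheap.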

Proceedings ArticleDOI
04 Sep 2005
TL;DR: By exploiting previous research in mel cepstrum feature enhancement, a unified probabilistic framework is created under which the feature denoising and bandwidth extension processes are tightly integrated using a single shared statistical model.
Abstract: We present a new bandwidth extension algorithm for converting narrowband telephone speech into wideband speech using a transformation in the mel cepstral domain. Unlike previous approaches, the proposed method is designed specifically for bandwidth extension of narrowband speech that has been corrupted by environmental noise. We show that by exploiting previous research in mel cepstrum feature enhancement, we can create a unified probabilistic framework under which the feature denoising and bandwidth extension processes are tightly integrated using a single shared statistical model. By doing so, we are able to both denoise the observed narrowband speech and robustly extend its bandwidth in a jointly optimal manner. A series of experiments on clean and noise-corrupted narrowband speech is performed to validate our approach.

Journal ArticleDOI
12 Dec 2005
TL;DR: A hidden Markov model was constructed and conditions were investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system; it was found that a Mel cepstrum based model outperformed fast Fourier transform and linear prediction based models.
Abstract: In this study, a hidden Markov model was constructed and conditions were investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system. The speaker-dependent system was intended to act as an assistive/control tool. A small vocabulary spoken by three cerebral palsy subjects was chosen. Fast Fourier transform, linear predictive, and Mel frequency cepstral coefficients extracted from the data provided training input to several whole-word hidden Markov model configurations. The effects of model structure, number of states, and frame rate were also investigated. It was noted that a 10-state ergodic model using 15 msec frames was better than other configurations. Furthermore, it was found that a Mel cepstrum based model outperformed fast Fourier transform and linear prediction based models. The system offers effective and robust application as a rehabilitation and/or control tool to assist dysarthric motor-impaired individuals.

Proceedings ArticleDOI
18 Mar 2005
TL;DR: It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart, and a simple yet effective strategy of exponentially weighting the likelihoods that are contributed by the static and dynamic features during the decoding process is proposed.
Abstract: In this paper, we investigate the relative noise robustness of dynamic and static spectral features, using two speaker independent continuous digit databases in English (Aurora2) and Cantonese (CUDigit). It is found that the dynamic cepstrum is more robust to additive noise than its static counterpart. The results are consistent across different types of noise and under various SNRs. Optimal exponential weights for exploiting the unequal noise robustness of the two features are discriminatively trained on a development set. When tested under various noise conditions, the optimal weights yielded relative word error rate reductions of 36.6% and 41.9% for Aurora2 and CUDigit, respectively. The proposed weighting is attractive for many ASR applications in noise because: (1) no noise estimation is needed for feature compensation; (2) no adaptation of clean HMMs to a noisy environment is required; and (3) only a trivial change in the decoding process is needed, weighting the log likelihoods of static and dynamic components separately.

Proceedings Article
01 Sep 2005
TL;DR: The real cepstrum is used to design an arbitrary-length minimum-phase FIR filter from a mixed-phase sequence; the resulting magnitude response is exactly the same as that of the original sequence.
Abstract: The real cepstrum is used to design an arbitrary-length minimum-phase FIR filter from a mixed-phase sequence. There is no need to start with the odd-length equiripple linear-phase sequence first. Neither phase unwrapping nor root finding is needed. Only two FFTs and an iterative procedure are required to compute the filter impulse response from the real cepstrum; the resulting magnitude response is exactly the same as that of the original sequence.
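The basic construction (real cepstrum, cepstral folding, exponentiation, two FFT round trips) is a standard homomorphic recipe and fits in a few lines of numpy. This sketch omits the iterative refinement step mentioned in the abstract:

```python
import numpy as np

def minimum_phase(h, n_fft=4096):
    """Minimum-phase counterpart of h with the same magnitude response,
    obtained by folding the real cepstrum of |H|."""
    mag = np.abs(np.fft.fft(h, n_fft))
    c = np.fft.ifft(np.log(mag)).real          # real cepstrum of the magnitude
    fold = np.zeros(n_fft)
    fold[0] = c[0]                             # fold the anticausal half
    fold[1:n_fft // 2] = 2.0 * c[1:n_fft // 2] # onto the causal half
    fold[n_fft // 2] = c[n_fft // 2]
    h_min = np.fft.ifft(np.exp(np.fft.fft(fold))).real
    return h_min[:len(h)]
```

Folding doubles the causal cepstral coefficients, converting the even (magnitude-only) cepstrum into the complex cepstrum of a minimum-phase sequence; a large FFT size keeps cepstral aliasing negligible.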

Proceedings ArticleDOI
18 Mar 2005
TL;DR: A novel acoustic model of speech, based on statistical hidden trajectory modeling (HTM) with bi-directional vocal tract resonance (VTR) target filtering, with dramatic reduction of an upper error bound is achieved in the standard TIMIT phonetic recognition task using a large-scale N-best rescoring paradigm.
Abstract: We present a novel acoustic model of speech, based on statistical hidden trajectory modeling (HTM) with bi-directional vocal tract resonance (VTR) target filtering, for speech recognition. The HTM consists of two stages of the generative process of speech: from the phone sequence to VTR dynamics and then to the cepstrum-based acoustic observation. Two types of model implementation are detailed, one with straightforward two-stage cascading, and another which integrates over the statistical distribution of VTR in model construction and in computing acoustic likelihood. With the use of first-order Taylor series approximation to the nonlinearity in the VTR-to-cepstrum prediction component of HTM, the acoustic likelihood is established in an analytical form. It is a Gaussian with the time-varying mean that gives structured long-span context dependence over the entire utterance, and with the dynamically adjusted variance proportional to the squared "local slope" in the nonlinear mapping function from VTR to cepstrum. When the HTM parameters are trained via maximizing this "integrated" likelihood, dramatic reduction of an upper error bound is achieved in the standard TIMIT phonetic recognition task using a large-scale N-best rescoring paradigm.

Journal ArticleDOI
TL;DR: In this paper, the authors presented a detailed study on the suitability of homomorphic prediction as a formant tracking tool for high-pitched speech where linear prediction fails to obtain accurate estimation The formant frequencies estimated using the proposed method are found to be accurate by more than an order of magnitude compared to the conventional procedure.
Abstract: The conventional model of the linear prediction analysis suffers from difficulties in estimating vocal tract characteristics of high-pitched speakers This is because the autocorrelation function used by the autocorrelation method of linear prediction for estimating autoregressive coefficients is actually an aliased version of that of the vocal tract impulse response This aliasing occurs due to the periodic nature of voiced speech Generally it is accepted that homomorphic filtering can be used to obtain an estimate of vocal tract impulse response which is free from periodicity Thus linear prediction of the resulting vocal tract impulse response (referred to as homomorphic prediction) is expected to be free from variations of fundamental frequencies To our knowledge any experimental study, however, has not yet appeared on the suitability of this method for analyzing high-pitched speech This paper presents a detail study on the prospects of homomorphic prediction as a formant tracking tool especially for high-pitched speech where linear prediction fails to obtain accurate estimation The formant frequencies estimated using the proposed method are found to be accurate by more than an order of magnitude compared to the conventional procedure The accuracy of formant estimation is verified on synthetic vowels for a wide range of pitch periods covering typical male and high-pitched female speakers The validity of the proposed method is also examined by inspecting the spectral envelopes of natural speech spoken by high-pitched female speakers We noticed that almost all the previous methods dealing with this limitation of linear prediction are based on the covariance technique where the obtained AR filter can be unstable The solutions obtained by the current method are guaranteed to be stable which makes it superior for many speech analysis applications

Book ChapterDOI
TL;DR: A new approach is introduced and shown to provide accurate HNR measurements for synthesised glottal and voiced speech waveforms; the action of cepstral low-pass liftering and subsequent Fourier transformation is shown to be analogous to a moving average filter.
Abstract: The estimation of the harmonics-to-noise ratio (HNR) in voiced speech provides an indication of the ratio between the periodic to aperiodic components of the signal. Time-domain methods for HNR estimation are problematic because of the difficulty of estimating the period markers for (pathological) voiced speech. Frequency-domain methods encounter the problem of estimating the noise level at harmonic locations. Cepstral techniques have been introduced to supply noise estimates at all frequency locations in the spectrum. A detailed description of cepstral processing is provided in order to motivate its use as a HNR estimator. The action of cepstral low-pass liftering and subsequent Fourier transformation is shown to be analogous to the action of a moving average filter. Based on this description, shortcomings of two existing cepstral-based HNRs are illustrated and a new approach is introduced and shown to provide accurate HNR measurements for synthesised glottal and voiced speech waveforms.
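The equivalence between low-pass liftering and moving-average smoothing of the spectrum can be demonstrated directly: liftering in the cepstral domain is circular convolution of the log spectrum with a Dirichlet (periodic sinc) kernel. A numpy sketch (function names are ours):

```python
import numpy as np

def lifter_smooth(log_spec, q_max):
    """Keep cepstral bins |q| <= q_max, then transform back to the spectrum."""
    n = len(log_spec)
    c = np.fft.ifft(log_spec)
    lifter = np.zeros(n)
    lifter[:q_max + 1] = 1.0
    lifter[n - q_max:] = 1.0
    return np.fft.fft(c * lifter).real

def kernel_smooth(log_spec, q_max):
    """The same operation written as circular convolution with a Dirichlet
    kernel, i.e. a weighted moving average across the log spectrum."""
    n = len(log_spec)
    lifter = np.zeros(n)
    lifter[:q_max + 1] = 1.0
    lifter[n - q_max:] = 1.0
    kernel = np.fft.fft(lifter).real / n    # Dirichlet (periodic sinc) kernel
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return (log_spec[None, :] * kernel[idx]).sum(axis=1)
```

The two functions return identical results, which is the "liftering acts as a moving average filter" observation in concrete form: the smoothed curve passes between the harmonics and can serve as a noise baseline for HNR estimation.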

Book ChapterDOI
19 Apr 2005
TL;DR: An overview is given of advanced methods for inverse filtering: model based, adaptive iterative, higher order statistics and cepstral approaches are examined and the advantages and disadvantages of these methods are highlighted.
Abstract: Glottal inverse filtering is a technique used to derive the glottal waveform during voiced speech. Closed phase inverse filtering (CPIF) is a common approach for achieving this goal. During the closed phase there is no input to the vocal tract and hence the impulse response of the vocal tract can be determined through linear prediction. However, a number of problems are known to exist with the CPIF approach. This review paper briefly details the CPIF technique and highlights certain associated theoretical and methodological problems. An overview is then given of advanced methods for inverse filtering: model based, adaptive iterative, higher order statistics and cepstral approaches are examined. The advantages and disadvantages of these methods are highlighted. Outstanding issues and suggestions for further work are outlined.

Book ChapterDOI
19 Apr 2005
TL;DR: The present study highlights the cepstrum-based noise baseline estimation process; it is shown to be analogous to the action of a moving average filter applied to the power spectrum of voiced speech.
Abstract: Cepstral analysis is used to estimate the harmonics-to-noise ratio (HNR) in speech signals. The inverse Fourier transformed liftered cepstrum approximates a noise baseline from which the harmonics-to-noise ratio is estimated. The present study highlights the cepstrum-based noise baseline estimation process; it is shown to be analogous to the action of a moving average filter applied to the power spectrum of voiced speech. The noise baseline, which is taken to approximate the noise-excited vocal tract, is influenced by the window length and the shape of the glottal source spectrum. Two existing estimation techniques are tested systematically using synthetically generated glottal flow and voiced speech signals with a priori knowledge of the HNR. The source influence is removed using a novel harmonic pre-emphasis technique. The results indicate accurate HNR estimation using the present approach. A preliminary investigation of the method on a set of normal/pathological data is also presented.

01 Jan 2005
TL;DR: A hybrid neural network method is proposed for speech recognition that combines a Self-Organizing Map (SOM) and a Multilayer Perceptron (MLP) and improves recognition accuracy by up to about 4%.
Abstract: In this paper, a hybrid neural-network-based method is proposed for speech recognition. The proposed method combines a Self-Organizing Map (SOM), an unsupervised network, and a Multilayer Perceptron (MLP), a supervised network, for Malay speech recognition. After the acoustic preprocessing, where Linear Prediction Coding (LPC) is used to extract the acoustic information from the raw signal, a 2-dimensional (2D) self-organizing feature map is used as a feature extractor, acting as a sequential mapping function that transforms the acoustic vector sequences of the speech signal into trajectories. The SOM is used to produce the trajectory vector for classification: it converts the cepstrum vectors into a binary matrix with the same dimensions as the SOM. The idea behind this method is to accumulate all the winner nodes of a syllable utterance in a map of the same dimensions, where each winner node is scaled to the value "1" and all others to "0". This results in a binary pattern in the 2D map which represents the speech content. The transformation of the feature vectors by the SOM simplifies the classification task for the Multilayer Perceptron recognizer, which classifies the feature vectors corresponding to each utterance. Various experiments were conducted on 15 Malay syllables from one speaker (a speaker-dependent system) for the conventional technique (MLP only) and the proposed method (SOM and MLP). The proposed algorithm achieved better performance, improving recognition accuracy by up to about 4%.

Proceedings ArticleDOI
14 Nov 2005
TL;DR: Audio watermarking is classified into three categories: patchwork in the frequency domain, echo hiding in the time domain, and cepstrum-domain methods; experimental results show which scheme has good robustness against common signal processing manipulations.
Abstract: In this paper, we survey audio watermarking. The watermarking implementation techniques are briefly summarized and analyzed. Audio watermarking schemes are classified into three categories: patchwork in the frequency domain, echo hiding in the time domain, and cepstrum-domain methods. Experimental results show which scheme has good robustness against common signal processing manipulations.
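Of the three categories, echo hiding is the one the cepstrum serves most directly: the hidden delay reappears as a cepstral peak. A minimal numpy sketch (the delay, echo amplitude, and signal length are illustrative values, not taken from any surveyed scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
cover = rng.standard_normal(4096)

delay, alpha = 50, 0.8                     # illustrative echo parameters
stego = cover.copy()
stego[delay:] += alpha * cover[:-delay]    # echo hiding: y[n] = x[n] + a*x[n-d]

# Detection: the log spectrum of y gains a ripple log|1 + a*exp(-jwd)|,
# so the real cepstrum shows a peak at quefrency d.
spec = np.abs(np.fft.fft(stego)) + 1e-12
ceps = np.fft.ifft(np.log(spec)).real
detected = 20 + int(np.argmax(ceps[20:2048]))   # skip low quefrencies
```

In a real scheme the payload bits would toggle between two delays per block, and the decoder would compare the cepstral values at the two candidate quefrencies.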

Patent
19 May 2005
TL;DR: In this article, the authors proposed a speech recognition system for recognizing vowels and consonants according to myoelectric signals, which is based on the hidden Markov model.
Abstract: PROBLEM TO BE SOLVED: To provide a speech recognition device recognizing vowels and consonants according to myoelectric signals. SOLUTION: The speech recognition device 10 is equipped with: a myoelectric signal detecting section 101, which detects the myoelectric signals generated during an utterance action from a plurality of regions; an LPC analysis section 102, which separates the detected myoelectric signals into spectrum envelope portions and fine change portions based on linear prediction coefficient analysis; a feature extraction section 103, which calculates the linear prediction coefficient cepstrum from the separated spectrum envelope portions and, based on the results, calculates myoelectric signal feature values for each of the channels corresponding to the regions; a likelihood calculating section 106, which receives the calculated feature values of each channel as input vectors and calculates the likelihood based on a hidden Markov model; and a speech recognition section 107, which identifies the speech corresponding to the utterance action based on the calculated likelihood. COPYRIGHT: (C)2005,JPO&NCIPI

Journal ArticleDOI
TL;DR: In this article, the authors proposed multiple linear regression of the log spectra (MRLS) for estimating the log spectra of speech at a close-talking microphone, and extended the MRLS concept to nonlinear regressions.
Abstract: In this paper, we address issues in improving hands-free speech recognition performance in different car environments using multiple spatially distributed microphones. In previous work, we proposed multiple linear regression of the log spectra (MRLS) for estimating the log spectra of speech at a close-talking microphone. In this paper, the concept is extended to nonlinear regressions. Regressions in the cepstrum domain are also investigated. An effective algorithm is developed to adapt the regression weights automatically to different noise environments. Compared to the nearest distant microphone and an adaptive beamformer (Generalized Sidelobe Canceller), the proposed adaptive nonlinear regression approach yields average relative word error rate (WER) reductions of 58.5% and 10.3%, respectively, for isolated word recognition in 15 real car environments.
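A minimal sketch of the linear-regression idea (with synthetic data and hypothetical array shapes, not the paper's adaptation algorithm): estimate per-microphone weights that map distant-microphone log spectra onto the close-talking reference by least squares:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic log spectra: 3 distant microphones, 200 frames, 32 frequency bins
n_mics, n_frames, n_bins = 3, 200, 32
distant = rng.standard_normal((n_mics, n_frames, n_bins))

# pretend the close-talking log spectrum is a fixed mixture plus small noise
true_w = np.array([0.6, 0.3, 0.1])
close = np.tensordot(true_w, distant, axes=1) \
        + 0.01 * rng.standard_normal((n_frames, n_bins))

# least-squares estimate of the per-microphone regression weights
X = distant.reshape(n_mics, -1).T        # (frames * bins, mics)
y = close.reshape(-1)
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# regressed estimate of the close-talking log spectrum
estimate = np.tensordot(w_hat, distant, axes=1)
```

The paper's contribution goes beyond this: nonlinear and cepstrum-domain regressions, and automatic adaptation of the weights to unseen noise environments.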

Book ChapterDOI
31 Aug 2005
TL;DR: It is pointed out that the use of artificial reverberation leads to more robustness to noise in general and most TRAP-based features excel in phone recognition.
Abstract: In this paper we investigate the performance of TRAP features on clean and noisy data. Multiple feature sets are evaluated on a corpus recorded in clean and noisy environments. In addition, the clean version was reverberated artificially. The feature sets are assembled from selected energy bands: multiple recognizers are trained on different energy bands, and the outputs of all recognizers are combined with ROVER to obtain a single recognition result. This system is compared to a baseline recognizer that uses Mel-frequency cepstrum coefficients (MFCC). We show that the use of artificial reverberation leads to more robustness to noise in general. Furthermore, most TRAP-based features excel in phone recognition. While MFCC features prove to be better in a matched training/test situation, TRAP features clearly outperform them in a mismatched training/test situation: when we train on clean data and evaluate on noisy data, the word accuracy (WA) can be raised by 173% relative (from 12.0% to 32.8% WA).
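ROVER itself aligns the hypotheses into a word transition network before voting; as a toy sketch only, a majority vote over pre-aligned, equal-length hypotheses looks like this:

```python
from collections import Counter

def rover_vote(hypotheses):
    """Majority vote over pre-aligned, equal-length word hypotheses.
    (Real ROVER first builds a word transition network by dynamic-programming
    alignment, so hypotheses of different lengths can be combined.)"""
    result = []
    for slot in zip(*[h.split() for h in hypotheses]):
        result.append(Counter(slot).most_common(1)[0][0])
    return " ".join(result)

# three band-limited recognizers disagreeing on single words
combined = rover_vote(["the cat sat", "the bat sat", "the cat mat"])
```

With one recognizer per energy band, errors confined to a single band can be outvoted by the others, which is the intuition behind the multi-band combination described above.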

Journal ArticleDOI
TL;DR: The model suggests that the linear transformation can be acquired through learning from actual acoustic signals and is expected to provide a useful feature extraction method that has often been given by the cepstrum analysis.
Abstract: In this letter, we propose a noisy nonlinear version of independent component analysis (ICA). Assuming that the probability density function (p.d.f.) of sources is known, a learning rule is derived based on maximum likelihood estimation (MLE). Our model involves some algorithms of noisy linear ICA (e.g., Bermond & Cardoso, 1999) or noise-free nonlinear ICA (e.g., Lee, Koehler, & Orglmeister, 1997) as special cases. Especially when the nonlinear function is linear, the learning rule derived as a generalized expectation-maximization algorithm has a similar form to the noisy ICA algorithm previously presented by Douglas, Cichocki, and Amari (1998). Moreover, our learning rule becomes identical to the standard noise-free linear ICA algorithm in the noiseless limit, while existing MLE-based noisy ICA algorithms do not rigorously include the noise-free ICA. We trained our noisy nonlinear ICA by using acoustic signals such as speech and music. The model after learning successfully simulates virtual pitch phenomena, and the existence region of virtual pitch is qualitatively similar to that observed in a psychoacoustic experiment. Although a linear transformation hypothesized in the central auditory system can account for the pitch sensation, our model suggests that the linear transformation can be acquired through learning from actual acoustic signals. Since our model includes a cepstrum analysis in a special case, it is expected to provide a useful feature extraction method that has often been given by the cepstrum analysis.
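In the noiseless limit the letter's learning rule reduces to standard noise-free linear ICA. A minimal natural-gradient sketch with a tanh score function (illustrative only, with synthetic Laplacian sources, not the letter's noisy nonlinear model):

```python
import numpy as np

rng = np.random.default_rng(2)

# two independent super-Gaussian (Laplacian) sources, linearly mixed
s = rng.laplace(size=(2, 5000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# noise-free natural-gradient ICA update: W <- W + mu * (I - E[tanh(y) y^T]) W
W = np.eye(2)
mu = 0.05
for _ in range(500):
    y = W @ x
    W += mu * (np.eye(2) - np.tanh(y) @ y.T / y.shape[1]) @ W

y = W @ x
# each recovered component should match one source up to scale and sign
corr = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
```

The tanh score is appropriate for super-Gaussian sources such as speech; the letter's contribution is extending this family consistently to the noisy and nonlinear cases.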

Patent
13 Jun 2005
TL;DR: In this article, a post-CMN acoustic model is synthesized by obtaining an approximate cepstral mean (CM) of the training speech data and subtracting it from the mean parameter of each cepstrum distribution in an acoustic model created without CMN processing.
Abstract: PROBLEM TO BE SOLVED: To create a CMN acoustic model, it has been necessary to train on feature vectors obtained after CMN processing of a large amount of training speech data, which requires a great deal of time. SOLUTION: A post-CMN acoustic model is synthesized by obtaining an approximate cepstral mean (CM) of the training speech data and subtracting the obtained CM from the mean parameter of each cepstrum-related distribution in an acoustic model created without CMN processing, using either that model's parameters or the statistical information gathered while creating it. Speech recognition is then performed by computing likelihoods that collate this post-CMN acoustic model against feature vectors extracted by applying CMN processing to the speech signal to be recognized. COPYRIGHT: (C)2007,JPO&INPIT
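The core identity behind this synthesis is that shifting both the features and the Gaussian means by the same cepstral mean leaves the distances used in likelihood computation unchanged. A toy sketch (hypothetical shapes, single Gaussian mean):

```python
import numpy as np

rng = np.random.default_rng(3)

# cepstral features for one utterance and a Gaussian mean from a non-CMN model
features = rng.standard_normal((50, 13)) + 2.0    # (frames, cepstral dims)
model_mean = rng.standard_normal(13)

# approximate cepstral mean
cm = features.mean(axis=0)

# synthesize the post-CMN model: shift every Gaussian mean by the CM
cmn_model_mean = model_mean - cm

# apply CMN to the features themselves
cmn_features = features - cm

# squared distances entering the Gaussian likelihood are identical,
# since (x - cm) - (mu - cm) = x - mu
d_raw = np.sum((features - model_mean) ** 2, axis=1)
d_cmn = np.sum((cmn_features - cmn_model_mean) ** 2, axis=1)
```

When the approximate CM tracks the true utterance mean, the synthesized model therefore behaves like one trained directly on CMN-processed features, without the retraining cost.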

Proceedings Article
01 Jan 2005
TL;DR: The implementation presented reduces the run time required by the algorithm, depending on the cepstral order of the estimation, by a factor of 2 to 9, such that real-time processing becomes feasible.
Abstract: The following article presents a new real-time implementation of an iterative cepstrum-based spectral envelope estimation technique that was originally published under the name "true envelope". Because the original algorithm is hardly known outside Japan, we first describe the algorithm and compare it to the standard techniques, i.e., LPC and the discrete cepstrum. The estimation properties are compared, and it is shown that the true envelope estimator achieves convincing envelope estimates even for problematic, high-pitched signals. The algorithm is analyzed with the objective of finding an efficient implementation that reduces the computational complexity sufficiently for the algorithm to be used in real time within the phase vocoder. The implementation presented reduces the run time required by the algorithm, depending on the cepstral order of the estimation, by a factor of 2 to 9, such that real-time processing becomes feasible.
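The true-envelope iteration alternates cepstral smoothing with a pointwise maximum against the observed log spectrum, so the smoothed curve is driven up onto the spectral peaks. A compact sketch (synthetic comb spectrum, illustrative parameters, not the paper's optimized implementation):

```python
import numpy as np

def cepstral_smooth(log_spec, order):
    """Low-pass lifter: keep cepstral coefficients at quefrencies |q| < order."""
    c = np.fft.fft(log_spec)
    lifter = np.zeros(len(c))
    lifter[:order] = 1.0
    lifter[-(order - 1):] = 1.0          # symmetric counterpart of bins 1..order-1
    return np.fft.ifft(c * lifter).real

def true_envelope(log_spec, order, n_iter=30):
    """Iterate: smooth, then take the pointwise max of spectrum and envelope."""
    target = log_spec.copy()
    env = cepstral_smooth(target, order)
    for _ in range(n_iter):
        target = np.maximum(target, env)
        env = cepstral_smooth(target, order)
    return env

# synthetic harmonic log spectrum: comb of peaks over a flat floor
n = 256
log_spec = np.full(n, -2.0)
log_spec[::16] = 0.0                     # "harmonics" every 16 bins

env0 = cepstral_smooth(log_spec, 20)     # plain cepstral smoothing undershoots peaks
env = true_envelope(log_spec, 20)        # true-envelope refinement rides the peaks
```

Plain cepstral smoothing averages across peaks and valleys; the iteration removes the bias toward the valleys, which is what makes the estimator reliable for high-pitched signals with widely spaced harmonics.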

Journal ArticleDOI
TL;DR: Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction are attained when compared to hand-corrected reference fundamental frequency measurements.
Abstract: This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction are attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
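For a single joint Gaussian, the MAP estimate of the fundamental frequency given the cepstral features is simply the conditional mean. A scalar toy version of that prediction step (synthetic data, not the paper's full GMM/HMM setup):

```python
import numpy as np

rng = np.random.default_rng(4)

# toy joint data: scalar "cepstral" feature c and fundamental frequency f
c = rng.standard_normal(2000)
f = 40.0 * c + 120.0 + 0.1 * rng.standard_normal(2000)   # near-linear relation

# fit one joint Gaussian to (c, f)
data = np.vstack([c, f])
mu = data.mean(axis=1)
cov = np.cov(data)

def predict_f0(c_obs):
    """Conditional mean E[f | c] of the joint Gaussian (the MAP estimate,
    since the Gaussian conditional mode and mean coincide)."""
    return mu[1] + cov[1, 0] / cov[0, 0] * (c_obs - mu[0])

f0_hat = predict_f0(1.0)
```

A GMM generalizes this to a weighted sum of per-component conditional means, with weights given by each component's posterior responsibility for the observed MFCC vector; the paper's HMM scheme additionally localizes the mixtures per state.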