Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC

doi:10.1016/J.CSL.2008.02.003

Journal Article

Sankaran Panchapagesan, +1 more

- 01 Jan 2009 -

Computer Speech & Language

- Vol. 23, Iss: 1, pp 42-64

TLDR

The new linear transformation (LT) performed comparably to regular VTLN implemented by warping the Mel filterbank when the maximum likelihood score (MLS) criterion was used for frequency warping (FW) estimation, and the approximations involved are shown not to lead to any performance degradation.

About:

This article was published in Computer Speech & Language on 2009-01-01. It has received 46 citations to date.

Citations

Proceedings Article

Accent conversion through cross-speaker articulatory synthesis

Sandesh Aryal, +1 more

TL;DR: This work builds a cross-speaker forward mapping (CSFM) to generate L2 acoustic observations directly from L1 articulatory trajectories and evaluates the CSFM against a baseline articulatory synthesizer trained with L2 articulators.

Journal Article

Using Phonetic Posteriorgram Based Frame Pairing for Segmental Accent Conversion

Guanlong Zhao, +1 more

- 01 Oct 2019 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A new approach matches frames between the two speakers based on their phonetic (rather than acoustic) similarity; it outperforms the prior approach and can be applied to non-parallel training data while achieving the same accent conversion performance.

Patent

Method and system for cross-lingual voice conversion

Ioannis Agiomyrgiannakis

TL;DR: A method and system for cross-lingual voice conversion are described, presenting hidden Markov model (HMM) based speech modeling for both recognizing input speech and synthesizing output speech.

Journal Article

VTLN Using Analytically Determined Linear-Transformation on Conventional MFCC

D. R. Sanand, +1 more

- 01 Jul 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A method to analytically obtain a linear-transformation on the conventional Mel frequency cepstral coefficients (MFCC) features that corresponds to conventional vocal tract length normalization (VTLN)-warped MFCC features, thereby simplifying the VTLN processing.

Journal Article

Maximum Entropy-Based Reinforcement Learning Using a Confidence Measure in Speech Recognition for Telephone Speech

Carlos Molina, +4 more

- 01 Jul 2010 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: A two-step Viterbi decoding is presented which estimates a correction factor for the observation log-likelihoods that makes the recognized and neighboring HMMs more or less likely by using a confidence score.

References

Journal Article

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977 -

Journal of the Royal Statistical Society...

Journal Article

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

- 01 Aug 1980 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.

Journal Article

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

C. J. Leggetter, +1 more

- 01 Apr 1995 -

Computer Speech & Language

TL;DR: An important feature of the method is that arbitrary adaptation data can be used (no special enrolment sentences are needed), and that adaptation performance improves as more data is used.

Journal Article

Maximum likelihood linear transformations for HMM-based speech recognition

Mark J. F. Gales

- 01 Apr 1998 -

Computer Speech & Language

TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.

Journal Article

Minimum classification error rate methods for speech recognition

Biing-Hwang Juang, +2 more

- 01 May 1997 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: The issue of speech recognizer training is discussed from a broad perspective rooted in classical Bayes decision theory, and the superiority of the minimum classification error (MCE) method over the distribution estimation method is shown through the results of several key speech recognition experiments.
