A frequency warping approach to speaker normalization

doi:10.1109/89.650310

Open Access Journal Article

L. Lee, +1 more

- 01 Jan 1998 -

IEEE Transactions on Speech and Audio Pr...

- Vol. 6, Iss: 1, pp 49-60

TLDR

An efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis are presented.

Abstract:

In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means for estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filterbank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques to other well-known techniques for reducing variability is described. The results show that frequency warping is consistently able to reduce word error rate by 20%, even for very short utterances.
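
The two ideas in the abstract, warping by modifying the mel filterbank and choosing a linear warping factor by maximum likelihood, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the filterbank size, the 8 kHz sample rate, the direction of the warp (filter edges multiplied by `alpha`), and the candidate grid of factors are all assumptions made for illustration, and `score` stands in for whatever acoustic-model log-likelihood a real system would supply.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_filterbank(alpha, n_filters=8, n_fft=256, sample_rate=8000.0):
    """Triangular mel filterbank with linearly warped edge frequencies.

    Assumption: the warp is applied by scaling every filter edge by
    `alpha` and clipping at the Nyquist frequency. Returns a list of
    n_filters rows, each holding n_fft // 2 + 1 weights in [0, 1].
    """
    nyquist = sample_rate / 2.0
    mel_max = hz_to_mel(nyquist)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    # Linear frequency warp of the filter edges, clipped at Nyquist.
    edges = [min(alpha * f, nyquist) for f in edges]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def estimate_warp_factor(score, alphas=None):
    """Grid search: return the candidate alpha with the highest score.

    `score` is a hypothetical callable mapping a warp factor to the
    log-likelihood of the utterance's warped features under a model.
    The default grid (0.88 to 1.12 in steps of 0.02) is an assumption.
    """
    if alphas is None:
        alphas = [round(0.88 + 0.02 * i, 2) for i in range(13)]
    return max(alphas, key=score)
```

In use, `score` would extract cepstral features with `warped_filterbank(alpha)` and evaluate them against the recognizer's acoustic model. A grid search keeps the estimate cheap because only a small, fixed set of filterbanks has to be tried per speaker or utterance.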

Citations

Proceedings Article

Towards End-To-End Speech Recognition with Recurrent Neural Networks

Alex Graves, +1 more

TL;DR: A speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation is presented, based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.

Journal Article

Data augmentation for deep neural network acoustic modeling

Xiaodong Cui, +2 more

- 01 Sep 2015 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: Two data augmentation approaches based on label-preserving transformations, vocal tract length perturbation (VTLP) and stochastic feature mapping (SFM), are investigated for deep neural network acoustic modeling to deal with data sparsity.

Vocal Tract Length Perturbation (VTLP) improves speech recognition

Navdeep Jaitly, +1 more

TL;DR: Improvements in speech recognition are reported without increasing the number of training epochs, suggesting that data transformations should be an important component of training neural networks for speech, especially for data-limited projects.

Journal Article

An empirical survey of data augmentation for time series classification with neural networks.

Brian Kenji Iwana, +1 more

- 15 Jul 2021 -

PLOS ONE

TL;DR: A taxonomy of time series data augmentation is proposed, outlining four families (transformation-based methods, pattern mixing, generative models, and decomposition methods) and their application to time series classification with neural networks.

Journal Article

A review of recent advances in visual speech decoding

Ziheng Zhou, +3 more

- 01 Sep 2014 -

Image and Vision Computing

TL;DR: A detailed review of recent advances in visual speech decoding that focuses on the important questions asked by researchers, summarizes the recent studies that attempt to answer them, and provides details of audio-visual speech databases.

References

Journal Article

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977 -

Journal of the royal statistical society...

Journal Article

A tutorial on hidden Markov models and selected applications in speech recognition

Lawrence R. Rabiner

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.

Book

Fundamentals of speech recognition

Lawrence R. Rabiner, +1 more

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, and therefore time-consuming and expensive, process of manually modeling speech.

Journal Article

Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences

S. Davis, +1 more

- 01 Aug 1980 -

IEEE Transactions on Acoustics, Speech, ...

TL;DR: In this article, several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system, and the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations.

Journal Article

Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains

Jean-Luc Gauvain, +1 more

- 01 Apr 1994 -

IEEE Transactions on Speech and Audio Pr...

TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
