Proceedings ArticleDOI
BUT BABEL system for spontaneous Cantonese
Martin Karafiát, František Grézl, Mirko Hannemann, Karel Veselý, Jan Černocký +4 more
pp. 2589–2593
TLDR
The key points include feature extraction by a 6-layer Stacked Bottle-Neck neural network using fundamental frequency information at its input, and an efficient combination with PLP features using Region-Dependent Transforms.
Abstract:
This paper presents our work on speech recognition of Cantonese spontaneous telephone conversations. The key points include feature extraction by a 6-layer Stacked Bottle-Neck (SBN) neural network using fundamental frequency information at its input. We have also investigated the robustness of SBN training (silence, normalization) and shown an efficient combination with PLP features using Region-Dependent Transforms (RDT). A combination of RDT with another popular adaptation technique (SAT) was also shown to be beneficial. Results are reported on BABEL Cantonese data. Index Terms: speech recognition, discriminative training, bottle-neck neural networks, region-dependent transforms
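The abstract's core idea, extracting bottleneck activations from a neural network whose input includes fundamental frequency (F0), can be illustrated with a minimal numpy sketch. This is not the paper's actual SBN architecture (which stacks two 6-layer networks); the layer sizes, tanh nonlinearity, and random weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_features(frames, f0, weights):
    """Forward a batch of frames through an MLP and return the activations
    of the narrowest (bottleneck) layer as features.

    frames : (T, D) spectral features per frame (e.g. filterbank outputs)
    f0     : (T,)   fundamental-frequency values appended to the input
    weights: list of (W, b) layer parameters; the bottleneck is the
             narrowest hidden layer
    """
    x = np.hstack([frames, f0[:, None]])   # F0 joins the spectral input
    bottleneck, narrowest = None, np.inf
    for W, b in weights:
        x = np.tanh(x @ W + b)             # hidden-layer nonlinearity
        if x.shape[1] < narrowest:         # remember the narrowest layer
            narrowest, bottleneck = x.shape[1], x
    return bottleneck

# Toy network: 24-dim filterbank + 1 F0 value -> 64 -> 8 (bottleneck) -> 64
sizes = [25, 64, 8, 64]
weights = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
           for i, o in zip(sizes[:-1], sizes[1:])]

feats = bottleneck_features(rng.standard_normal((100, 24)),
                            rng.uniform(80.0, 300.0, 100), weights)
print(feats.shape)   # (100, 8): one 8-dim bottleneck vector per frame
```

In a real system the network is first trained to classify phoneme states and the bottleneck outputs then replace or augment conventional features for the recognizer.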
Citations
Proceedings ArticleDOI
Analysis of DNN approaches to speaker identification
Pavel Matějka, Ondřej Glembek, Ondřej Novotný, Oldřich Plchot, František Grézl, Lukáš Burget, Jan Černocký +6 more
TL;DR: This work studies the use of Deep Neural Network Bottleneck (BN) features together with traditional MFCC features for i-vector-based speaker recognition, and decouples sufficient-statistics extraction by using separate GMM models for frame alignment and for statistics normalization.
Proceedings ArticleDOI
Adaptation of multilingual stacked bottle-neck neural network structure for new language
TL;DR: The Stacked Bottle-Neck neural network structure is trained on multilingual data, investigating several training strategies while treating the target language as unseen; adaptation is shown to significantly improve system performance over both the multilingual network and a network trained only on target-language data.
Proceedings ArticleDOI
Score normalization and system combination for improved keyword spotting
Damianos Karakos, Richard Schwartz, Stavros Tsakalidis, Le Zhang, Shivesh Ranjan, Tim Ng, Roger Hsiao, Guruprasad Saikumar, Ivan Bulyko, Long Nguyen, John Makhoul, František Grézl, Mirko Hannemann, Martin Karafiát, Igor Szöke, Karel Veselý, Lori Lamel, Viet Bac Le +17 more
TL;DR: Two techniques are shown to improve Keyword Spotting (KWS) performance under the ATWV/MTWV measures; they produced the highest score in the official surprise-language evaluation of the IARPA-funded Babel project in April 2013.
Proceedings ArticleDOI
Using neural network front-ends on far field multiple microphones based speech recognition
TL;DR: Results presented in this paper indicate that channel concatenation gives similar or better results than beamforming, and that augmenting the standard DNN input with the bottleneck feature from a Speaker Aware Deep Neural Network (SADNN) shows a general advantage over the standard DNN-based recognition system and yields additional improvements for far-field speech recognition.
Journal ArticleDOI
Maxout neurons for deep convolutional and LSTM neural networks in speech recognition
TL;DR: This paper combines maxout neurons with two popular DNN structures for acoustic modeling, namely the convolutional neural network (CNN) and the long short-term memory (LSTM) recurrent neural network (RNN).
References
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury +10 more
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI
Maximum likelihood linear transformations for HMM-based speech recognition
TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Journal ArticleDOI
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
TL;DR: This article reports significant gains in recognition performance and model compactness as a result of MCE-based discriminative training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
Proceedings ArticleDOI
fMPE: discriminatively trained features for speech recognition
TL;DR: In fMPE, a matrix projecting the posteriors of Gaussians into the normal-size feature space is trained discriminatively, and the projected features are added to standard features such as PLP.
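The fMPE transform described above can be sketched in a few lines of numpy: compute per-frame Gaussian posteriors, project them with a matrix, and add the result to the original features. This is a minimal illustration assuming diagonal-covariance Gaussians; the matrix M is random here, whereas fMPE trains it discriminatively.

```python
import numpy as np

rng = np.random.default_rng(1)

def fmpe_offset(frames, means, variances, M):
    """Sketch of the fMPE feature transform: compute Gaussian posteriors
    for each frame, project them with a matrix M, and add the projected
    offset to the original features."""
    # Diagonal-covariance Gaussian log-likelihoods, shape (T, G)
    diff = frames[:, None, :] - means[None, :, :]
    loglik = -0.5 * np.sum(diff**2 / variances
                           + np.log(2 * np.pi * variances), axis=2)
    loglik -= loglik.max(axis=1, keepdims=True)   # stabilise the softmax
    post = np.exp(loglik)
    post /= post.sum(axis=1, keepdims=True)       # posteriors sum to 1
    return frames + post @ M                      # add the projected offset

T, D, G = 50, 13, 32                  # frames, feature dim (e.g. PLP), Gaussians
frames = rng.standard_normal((T, D))
means = rng.standard_normal((G, D))
variances = np.ones((G, D))
M = rng.standard_normal((G, D)) * 0.01   # fMPE trains M discriminatively
out = fmpe_offset(frames, means, variances, M)
print(out.shape)    # (50, 13): same dimensionality as the input features
```

Because the offset is additive and typically small, the transformed features stay compatible with any back-end trained on the original feature space.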
Journal ArticleDOI
The subspace Gaussian mixture model-A structured model for speech recognition
Daniel Povey, Lukáš Burget, Mohit Agarwal, Pinar Akyazi, Feng Kai, Arnab Ghoshal, Ondřej Glembek, Nagendra Kumar Goel, Martin Karafiát, Ariya Rastrow, Richard Rose, Petr Schwarz, Samuel Thomas +12 more
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.