Proceedings ArticleDOI

BUT BABEL system for spontaneous Cantonese

TLDR
The key points include feature extraction by a 6-layer Stacked Bottle-Neck neural network with fundamental frequency information at its input, and an efficient combination with PLP using Region-Dependent Transforms.
Abstract
This paper presents our work on speech recognition of Cantonese spontaneous telephone conversations. The key points include feature extraction by a 6-layer Stacked Bottle-Neck neural network using fundamental frequency information at its input. We have also investigated the robustness of SBN training (silence, normalization) and shown an efficient combination with PLP using Region-Dependent Transforms. A combination of RDT with another popular adaptation technique (SAT) was shown to be beneficial. The results are reported on BABEL Cantonese data. Index Terms: speech recognition, discriminative training, bottle-neck neural networks, region-dependent transforms
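To illustrate the kind of Stacked Bottle-Neck (SBN) feature extraction described in the abstract, here is a minimal sketch in PyTorch. The layer widths, context sizes, bottleneck dimension and number of training targets are illustrative assumptions, not the configuration used by the authors.

import torch
import torch.nn as nn

class BottleNeckNet(nn.Module):
    # A feed-forward network with a narrow linear bottleneck; the bottleneck
    # activations serve as features for the GMM-HMM recognizer.
    def __init__(self, input_dim, hidden_dim=1500, bn_dim=80, n_targets=3000):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.bottleneck = nn.Linear(hidden_dim, bn_dim)
        self.output = nn.Linear(bn_dim, n_targets)  # trained on phone-state targets

    def forward(self, x):
        bn = self.bottleneck(self.hidden(x))
        return bn, self.output(bn)

# First stage: spectral features spliced over a context window, with
# F0-related values appended per frame (all dimensions are assumptions).
first_stage = BottleNeckNet(input_dim=(24 + 2) * 11)
# Second stage: stacked bottleneck outputs of the first stage.
second_stage = BottleNeckNet(input_dim=80 * 5)

frames = torch.randn(16, (24 + 2) * 11)   # a batch of 16 spliced input frames
bn1, _ = first_stage(frames)
stacked = bn1.repeat(1, 5)                # stand-in for real temporal context stacking
sbn_features, _ = second_stage(stacked)
print(sbn_features.shape)                 # torch.Size([16, 80])

In SBN systems of this kind, only the second-stage bottleneck outputs are typically kept as features; the classification layers are used only during training.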

Citations
Proceedings ArticleDOI

Analysis of DNN approaches to speaker identification

TL;DR: This work studies the use of Deep Neural Network Bottleneck (BN) features together with traditional MFCC features in the task of i-vector-based speaker recognition, and decouples the sufficient-statistics extraction by using separate GMM models for frame alignment and for statistics normalization.
Proceedings ArticleDOI

Adaptation of multilingual stacked bottle-neck neural network structure for new language

TL;DR: The Stacked Bottle-Neck neural network structure is trained on multilingual data, investigating several training strategies while treating the target language as the unseen one; it is shown that the adaptation can significantly improve system performance over both the multilingual network and a network trained only on target data.
Proceedings ArticleDOI

Score normalization and system combination for improved keyword spotting

TL;DR: Two techniques are shown to yield improved keyword spotting (KWS) performance under the ATWV/MTWV performance measures; they resulted in the highest performance in the official surprise-language evaluation of the IARPA-funded Babel project in April 2013.
Proceedings ArticleDOI

Using neural network front-ends on far field multiple microphones based speech recognition

TL;DR: Results presented in this paper indicate that channel concatenation gives results similar to or better than beamforming, and that augmenting the standard DNN input with the bottleneck feature from a Speaker Aware Deep Neural Network (SADNN) shows a general advantage over the standard DNN based recognition system and yields additional improvements for far-field speech recognition.
Journal ArticleDOI

Maxout neurons for deep convolutional and LSTM neural networks in speech recognition

TL;DR: This paper combines maxout neurons with two popular DNN structures for acoustic modeling, namely the convolutional neural network (CNN) and the long short-term memory (LSTM) recurrent neural network (RNN).
References
Journal Article

Deep Neural Networks for Acoustic Modeling in Speech Recognition

TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI

Maximum likelihood linear transformations for HMM-based speech recognition

TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Journal ArticleDOI

Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error

TL;DR: This article reports significant gains in recognition performance and model compactness as a result of MCE-based discriminative training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
Proceedings ArticleDOI

fMPE: discriminatively trained features for speech recognition

TL;DR: In this paper, a matrix that projects posteriors of Gaussians into a regular-size feature space is trained discriminatively, and the projected features are then added to standard features such as PLP.
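The core operation summarized in the fMPE entry above, projecting per-frame Gaussian posteriors through a trained matrix and adding the result to standard features, can be sketched as follows. All dimensions and the random inputs are illustrative assumptions; in fMPE the projection matrix is trained discriminatively rather than drawn at random.

import numpy as np

n_gauss, feat_dim = 1024, 39          # assumed sizes, not taken from the paper
rng = np.random.default_rng(0)

M = rng.normal(scale=0.01, size=(feat_dim, n_gauss))  # stand-in for the trained fMPE matrix
posteriors = rng.dirichlet(np.ones(n_gauss))          # per-frame posteriors of the Gaussians
plp = rng.normal(size=feat_dim)                       # a standard PLP feature vector

fmpe_features = plp + M @ posteriors                  # projected posteriors added as an offset
print(fmpe_features.shape)                            # (39,)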