Proceedings ArticleDOI
BUT BABEL system for spontaneous Cantonese
Martin Karafiát, František Grézl, Mirko Hannemann, Karel Veselý, Jan Černocký +4 more
pp. 2589–2593
TLDR
The key points include feature extraction by a 6-layer Stacked Bottle-Neck neural network using fundamental frequency information at its input, and an efficient combination with PLP features using Region-Dependent Transforms.
Abstract:
This paper presents our work on speech recognition of Cantonese spontaneous telephone conversations. The key points include feature extraction by a 6-layer Stacked Bottle-Neck (SBN) neural network using fundamental frequency information at its input. We have also investigated the robustness of SBN training (silence, normalization) and shown an efficient combination with PLP features using Region-Dependent Transforms (RDT). A combination of RDT with another popular adaptation technique (SAT) was also shown to be beneficial. Results are reported on BABEL Cantonese data. Index Terms: speech recognition, discriminative training, bottle-neck neural networks, region-dependent transforms
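The abstract's core idea, extracting bottleneck activations from a neural network whose input includes fundamental frequency (F0), can be illustrated with a minimal numpy sketch. This is not the paper's actual SBN architecture (which stacks two 6-layer networks); the layer sizes, tanh nonlinearity, and random weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def bottleneck_features(frames, f0, weights):
    """Forward a batch of frames through an MLP and return the activations
    of the narrowest (bottleneck) layer as features.

    frames : (T, D) spectral features per frame (e.g. filterbank outputs)
    f0     : (T,)   fundamental-frequency values appended to the input
    weights: list of (W, b) layer parameters; the bottleneck is the
             narrowest hidden layer
    """
    x = np.hstack([frames, f0[:, None]])   # F0 joins the spectral input
    bottleneck, narrowest = None, np.inf
    for W, b in weights:
        x = np.tanh(x @ W + b)             # hidden-layer nonlinearity
        if x.shape[1] < narrowest:         # remember the narrowest layer
            narrowest, bottleneck = x.shape[1], x
    return bottleneck

# Toy network: 24-dim filterbank + 1 F0 value -> 64 -> 8 (bottleneck) -> 64
sizes = [25, 64, 8, 64]
weights = [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
           for i, o in zip(sizes[:-1], sizes[1:])]

feats = bottleneck_features(rng.standard_normal((100, 24)),
                            rng.uniform(80.0, 300.0, 100), weights)
print(feats.shape)   # (100, 8): one 8-dim bottleneck vector per frame
```

In a real system the network is first trained to classify phoneme states and the bottleneck outputs then replace or augment conventional features for the recognizer.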
Citations
Proceedings ArticleDOI
Analysis of DNN approaches to speaker identification
Pavel Matějka, Ondřej Glembek, Ondřej Novotný, Oldřich Plchot, František Grézl, Lukáš Burget, Jan Černocký +6 more
TL;DR: This work studies the use of Deep Neural Network Bottleneck (BN) features together with traditional MFCC features for i-vector-based speaker recognition, and decouples sufficient-statistics extraction by using separate GMM models for frame alignment and for statistics normalization.
Proceedings ArticleDOI
Adaptation of multilingual stacked bottle-neck neural network structure for new language
TL;DR: The Stacked Bottle-Neck neural network structure is trained on multilingual data, investigating several training strategies while treating the target language as unseen; adaptation is shown to significantly improve system performance over both the multilingual network and a network trained only on target-language data.
Proceedings ArticleDOI
Score normalization and system combination for improved keyword spotting
Damianos Karakos, Richard Schwartz, Stavros Tsakalidis, Le Zhang, Shivesh Ranjan, Tim Ng, Roger Hsiao, Guruprasad Saikumar, Ivan Bulyko, Long Nguyen, John Makhoul, František Grézl, Mirko Hannemann, Martin Karafiát, Igor Szöke, Karel Veselý, Lori Lamel, Viet Bac Le +17 more
TL;DR: Two techniques are shown to improve Keyword Spotting (KWS) performance under the ATWV/MTWV measures; they produced the highest score in the official surprise-language evaluation of the IARPA-funded Babel project in April 2013.
Proceedings ArticleDOI
Using neural network front-ends on far field multiple microphones based speech recognition
TL;DR: Results presented in this paper indicate that channel concatenation gives similar or better results than beamforming, and that augmenting the standard DNN input with the bottleneck feature from a Speaker Aware Deep Neural Network (SADNN) shows a general advantage over the standard DNN-based recognition system and yields additional improvements for far-field speech recognition.
Journal ArticleDOI
Maxout neurons for deep convolutional and LSTM neural networks in speech recognition
TL;DR: This paper combines maxout neurons with two popular DNN structures for acoustic modeling, namely the convolutional neural network (CNN) and the long short-term memory (LSTM) recurrent neural network (RNN).
References
Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury +10 more
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Journal ArticleDOI
Maximum likelihood linear transformations for HMM-based speech recognition
TL;DR: The paper compares the two possible forms of model-based transforms: unconstrained, where any combination of mean and variance transform may be used, and constrained, which requires the variance transform to have the same form as the mean transform.
Journal ArticleDOI
Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error
TL;DR: This article reports significant gains in recognition performance and model compactness as a result of MCE-based discriminative training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.
Proceedings ArticleDOI
fMPE: discriminatively trained features for speech recognition
TL;DR: In fMPE, a matrix projecting the posteriors of Gaussians into the normal-size feature space is trained discriminatively, and the projected features are added to standard features such as PLP.
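The fMPE transform described above can be sketched in a few lines of numpy: compute per-frame Gaussian posteriors, project them with a matrix, and add the result to the original features. This is a minimal illustration assuming diagonal-covariance Gaussians; the matrix M is random here, whereas fMPE trains it discriminatively.

```python
import numpy as np

rng = np.random.default_rng(1)

def fmpe_offset(frames, means, variances, M):
    """Sketch of the fMPE feature transform: compute Gaussian posteriors
    for each frame, project them with a matrix M, and add the projected
    offset to the original features."""
    # Diagonal-covariance Gaussian log-likelihoods, shape (T, G)
    diff = frames[:, None, :] - means[None, :, :]
    loglik = -0.5 * np.sum(diff**2 / variances
                           + np.log(2 * np.pi * variances), axis=2)
    loglik -= loglik.max(axis=1, keepdims=True)   # stabilise the softmax
    post = np.exp(loglik)
    post /= post.sum(axis=1, keepdims=True)       # posteriors sum to 1
    return frames + post @ M                      # add the projected offset

T, D, G = 50, 13, 32                  # frames, feature dim (e.g. PLP), Gaussians
frames = rng.standard_normal((T, D))
means = rng.standard_normal((G, D))
variances = np.ones((G, D))
M = rng.standard_normal((G, D)) * 0.01   # fMPE trains M discriminatively
out = fmpe_offset(frames, means, variances, M)
print(out.shape)    # (50, 13): same dimensionality as the input features
```

Because the offset is additive and typically small, the transformed features stay compatible with any back-end trained on the original feature space.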
Journal ArticleDOI
The subspace Gaussian mixture model-A structured model for speech recognition
Daniel Povey, Lukáš Burget, Mohit Agarwal, Pinar Akyazi, Feng Kai, Arnab Ghoshal, Ondřej Glembek, Nagendra Kumar Goel, Martin Karafiát, Ariya Rastrow, Richard Rose, Petr Schwarz, Samuel Thomas +12 more
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.