Petr Motlicek
Researcher at Idiap Research Institute
Publications - 209
Citations - 8177
Petr Motlicek is an academic researcher at Idiap Research Institute. The author has contributed to research in the topics of computer science and speaker recognition. The author has an h-index of 25 and has co-authored 179 publications receiving 7093 citations. Previous affiliations of Petr Motlicek include Oregon Health & Science University and DuPont.
Papers
Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely +12 more
TL;DR: This paper describes the design of Kaldi, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata, together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
Proceedings Article
Generating exact lattices in the WFST framework
Daniel Povey, Mirko Hannemann, Gilles Boulianne, Lukas Burget, Arnab Ghoshal, Milos Janda, Martin Karafiat, Stefan Kombrink, Petr Motlicek, Yanmin Qian, Korbinian Riedhammer, Karel Vesely, Ngoc Thang Vu +12 more
TL;DR: A lattice generation method that is exact, i.e. it satisfies all the natural properties the authors would want from a lattice of alternative transcriptions of an utterance, and does not introduce substantial overhead above one-best decoding.
Proceedings Article
Multilingual deep neural network based acoustic modeling for rapid language adaptation
TL;DR: The studies reveal that crosslingual acoustic model transfer through multilingual DNNs is superior to both unsupervised RBM pre-training and greedy layer-wise supervised training, and that KL-HMM based decoding consistently outperforms conventional hybrid decoding, especially in low-resource scenarios.
Proceedings Article
Deep Neural Networks for Multiple Speaker Detection and Localization
TL;DR: This paper proposes a likelihood-based encoding of the network output, which naturally allows the detection of an arbitrary number of sources, and investigates the use of sub-band cross-correlation information as features for better localization in sound mixtures.
Proceedings Article
End-to-End Accented Speech Recognition
TL;DR: This work explores the use of multi-task training and accent embedding in the context of end-to-end ASR trained with the connectionist temporal classification loss and shows relative improvement in word error rate.