Ondrej Klejch
Researcher at University of Edinburgh
Publications - 34
Citations - 490
Ondrej Klejch is an academic researcher at the University of Edinburgh. He has contributed to research topics including computer science and acoustic modelling, has an h-index of 11, and has co-authored 25 publications receiving 314 citations. His previous affiliations include Charles University in Prague and Google.
Papers
Posted Content
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth,Sourish Chaudhuri,Ondrej Klejch,Radhika Marvin,Andrew C. Gallagher,Liat Kaver,Sharadh Ramaswamy,Arkadiusz Stopczynski,Cordelia Schmid,Zhonghua Xi,Caroline Pantofaru +10 more
TL;DR: This paper presents the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), publicly released to facilitate algorithm development and comparison, and introduces a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection, comparing several variants.
Proceedings ArticleDOI
AVA Active Speaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth,Sourish Chaudhuri,Ondrej Klejch,Radhika Marvin,Andrew C. Gallagher,Liat Kaver,Sharadh Ramaswamy,Arkadiusz Stopczynski,Cordelia Schmid,Zhonghua Xi,Caroline Pantofaru +10 more
TL;DR: The AVA Active Speaker dataset (AVA-ActiveSpeaker) contains temporally labeled face tracks in videos, where each face instance is labeled as speaking or not, and whether the speech is audible.
Journal ArticleDOI
Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview
TL;DR: A meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions reported in the literature, characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation.
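The relative error rate reduction mentioned above is the standard quantity such meta-analyses aggregate. A minimal sketch of how it is computed (the function name and the numbers are illustrative, not from the paper):

```python
# Relative error rate reduction (RERR): fractional drop in word error
# rate (WER) relative to the unadapted baseline system.
def relative_error_rate_reduction(baseline_wer, adapted_wer):
    return (baseline_wer - adapted_wer) / baseline_wer

# Hypothetical numbers: a baseline WER of 20% reduced to 17% by
# adaptation is a 15% relative reduction.
rerr = relative_error_rate_reduction(0.20, 0.17)
print(f"{rerr:.0%}")  # 15%
```

Reporting relative rather than absolute reductions makes results comparable across test sets with very different baseline error rates.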
Proceedings ArticleDOI
Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features
TL;DR: An extension of a previously described neural machine translation based system for punctuated transcription replaces the traditional encoder in the encoder-decoder architecture with a hierarchical encoder, allowing the system to map from per-frame acoustic features to word-level representations.
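The core idea of mapping per-frame acoustic features to word-level representations can be illustrated with a much-simplified sketch. The paper's hierarchical encoder learns this mapping with a lower-level recurrent encoder per word; here plain mean pooling over (assumed known) word boundaries stands in for it, and all names are hypothetical:

```python
import numpy as np

# Simplified stand-in for a hierarchical encoder: collapse per-frame
# acoustic features (T frames x D dims) into one vector per word by
# mean-pooling over each word's frame span.
def frames_to_words(frames, word_boundaries):
    """frames: (T, D) array; word_boundaries: list of (start, end) spans."""
    return np.stack([frames[s:e].mean(axis=0) for s, e in word_boundaries])

T, D = 10, 4
frames = np.arange(T * D, dtype=float).reshape(T, D)
# Three words covering frames [0,3), [3,7), [7,10).
words = frames_to_words(frames, [(0, 3), (3, 7), (7, 10)])
print(words.shape)  # (3, 4)
```

The word-level vectors can then feed the attention-based decoder exactly as word embeddings would in a text-only encoder-decoder system.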
Proceedings ArticleDOI
Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches
TL;DR: Results indicate that using longer context improves the prediction of question marks and acoustic information improves the prediction of exclamation marks; however, even though the systems are complementary, their straightforward combination does not yield better F-measures than a single system using neural machine translation.
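The F-measure used to compare the punctuation systems above is the harmonic mean of precision and recall over predicted punctuation marks. A minimal sketch with illustrative counts (not figures from the paper):

```python
# F-measure (F1) over predicted punctuation marks: harmonic mean of
# precision (correct / predicted) and recall (correct / reference).
def f_measure(true_positives, false_positives, false_negatives):
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 40 correct marks, 10 spurious, 20 missed.
print(round(f_measure(40, 10, 20), 3))  # 0.727
```

Because punctuation marks are sparse relative to words, F-measure is preferred over plain accuracy, which a system predicting "no punctuation" everywhere would trivially inflate.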