Ondrej Klejch

Researcher at University of Edinburgh

Publications - 34
Citations - 490

Ondrej Klejch is an academic researcher from the University of Edinburgh. He has contributed to research in the topics of computer science and acoustic modelling. He has an h-index of 11 and has co-authored 25 publications receiving 314 citations. His previous affiliations include Charles University in Prague and Google.

Papers
Posted Content

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection

TL;DR: This paper presents the AVA Active Speaker detection dataset (AVA-ActiveSpeaker), which has been publicly released to facilitate algorithm development and comparison. It also introduces a state-of-the-art, jointly trained audio-visual model for real-time active speaker detection and compares several variants.
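As a rough illustration of the kind of jointly trained audio-visual model the summary describes, the sketch below fuses per-frame audio and face-crop representations for a binary speaking/not-speaking decision. This is a minimal sketch in PyTorch; the layer sizes, module names, and fusion strategy are illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of an audio-visual active speaker model (illustrative only;
# sizes and the concatenation-based fusion are assumptions, not the paper's design).
import torch
import torch.nn as nn

class AVActiveSpeaker(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=512, hidden=128):
        super().__init__()
        # Encode per-frame audio features (e.g. log-mel filterbanks).
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        # Encode per-frame face-crop embeddings from some visual backbone.
        self.visual_enc = nn.GRU(visual_dim, hidden, batch_first=True)
        # Joint classifier over the fused representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, audio, visual):
        # audio: (batch, frames, audio_dim); visual: (batch, frames, visual_dim)
        a, _ = self.audio_enc(audio)
        v, _ = self.visual_enc(visual)
        fused = torch.cat([a, v], dim=-1)
        # Per-frame logit: is this face speaking at this frame?
        return self.classifier(fused).squeeze(-1)

model = AVActiveSpeaker()
logits = model(torch.randn(2, 50, 40), torch.randn(2, 50, 512))
print(logits.shape)  # torch.Size([2, 50])
```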
Proceedings Article

AVA Active Speaker: An Audio-Visual Dataset for Active Speaker Detection

TL;DR: The AVA Active Speaker dataset (AVA-ActiveSpeaker) contains temporally labeled face tracks in videos, where each face instance is labeled as speaking or not speaking, and whether the speech is audible.
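To make the labeling scheme concrete, here is one plausible way to represent a single labeled frame of a face track; the field names are hypothetical and do not reflect the dataset's actual annotation schema.

```python
# Hypothetical record for one labeled frame of a face track; field names are
# illustrative, not the actual AVA-ActiveSpeaker annotation format.
from dataclasses import dataclass

@dataclass
class FaceTrackLabel:
    video_id: str      # source video
    track_id: str      # identity of the face track within the video
    timestamp: float   # frame time in seconds
    bbox: tuple        # (x1, y1, x2, y2) face bounding box
    speaking: bool     # is this face speaking at this frame?
    audible: bool      # if speaking, is the speech audible in the audio track?

example = FaceTrackLabel("vid001", "track_3", 12.4,
                         (0.31, 0.22, 0.48, 0.55), True, True)
print(example.speaking and example.audible)  # True
```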
Journal Article

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

TL;DR: A meta-analysis of the performance of speech recognition adaptation algorithms is presented, based on relative error rate reductions reported in the literature, and the algorithms are characterized as based on embeddings, model parameter adaptation, or data augmentation.
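The quantity the meta-analysis aggregates, relative error rate reduction, can be computed directly from baseline and adapted word error rates. A small helper, with made-up example numbers:

```python
# Relative error rate reduction (RERR). The example WERs below are made up.
def relative_error_rate_reduction(wer_baseline: float, wer_adapted: float) -> float:
    """Fractional reduction of the adapted system's WER over the baseline."""
    return (wer_baseline - wer_adapted) / wer_baseline

# E.g. adaptation that takes a system from 12.0% to 10.5% WER:
print(f"{relative_error_rate_reduction(12.0, 10.5):.1%}")  # 12.5%
```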
Proceedings Article

Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features

TL;DR: An extension of the previously described neural machine translation-based system for punctuated transcription allows the system to map from per-frame acoustic features to word-level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder.
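The key architectural idea, replacing the standard encoder with a hierarchical one that pools per-frame acoustic features into word-level representations, might look roughly like the sketch below. The pooling scheme (taking the frame-RNN state at each word's final frame) and all sizes are assumptions for illustration, not the system's exact design.

```python
# Sketch of a hierarchical encoder: a frame-level RNN runs over acoustic
# features, and its states are pooled at word boundaries into word-level
# representations for the decoder. Sizes and pooling are illustrative.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.frame_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.word_rnn = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, frames, word_ends):
        # frames: (1, n_frames, feat_dim); word_ends: index of the last
        # frame of each word (assumed given, e.g. by a forced alignment).
        frame_states, _ = self.frame_rnn(frames)
        # Pool per-frame states into one vector per word by taking the
        # state at each word's final frame.
        word_inputs = frame_states[:, word_ends, :]
        word_states, _ = self.word_rnn(word_inputs)
        return word_states  # (1, n_words, hidden) word-level representations

enc = HierarchicalEncoder()
out = enc(torch.randn(1, 200, 40), torch.tensor([30, 75, 140, 199]))
print(out.shape)  # torch.Size([1, 4, 128])
```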
Proceedings Article

Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches

TL;DR: The results indicate that longer context improves the prediction of question marks and that acoustic information improves the prediction of exclamation marks; even though the systems are complementary, their straightforward combination does not yield better F-measures than a single system based on neural machine translation.
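The "straightforward combination" that fails to beat the single NMT-based system is plausibly something like linear interpolation of the two systems' punctuation posteriors; a toy sketch of that idea, with made-up probabilities and an assumed equal weight:

```python
# Toy illustration of a straightforward combination: linearly interpolating
# the punctuation posteriors of a lexical and an acoustic system.
# The distributions and the 0.5 weight are made up for illustration.
PUNCT = [",", ".", "?", "!", "<none>"]

def combine(p_lexical, p_acoustic, weight=0.5):
    """Interpolate two posterior distributions over punctuation symbols."""
    return [weight * pl + (1 - weight) * pa
            for pl, pa in zip(p_lexical, p_acoustic)]

p_lex = [0.10, 0.60, 0.20, 0.05, 0.05]   # lexical (NMT) system
p_aco = [0.05, 0.30, 0.15, 0.40, 0.10]   # acoustic system
combined = combine(p_lex, p_aco)
print(PUNCT[max(range(len(PUNCT)), key=combined.__getitem__)])  # "."
```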