Open Access, Posted Content
Low Latency ASR for Simultaneous Speech Translation.
Thai-Son Nguyen, Jan Niehues, Eunah Cho, Thanh-Le Ha, Kevin Kilgour, Markus Müller, Matthias Sperber, Sebastian Stueker, Alex Waibel
TLDR
To minimize latency, the authors combine run-on decoding, a technique for identifying stable partial hypotheses during stream decoding, and a protocol for dynamic output update that allows the most recent parts of the transcription to be revised.
Abstract:
User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We have therefore worked on several techniques for reducing the latency of both components, the automatic speech recognition module and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvement. To minimize the latency, we combined run-on decoding with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output update that allows the most recent parts of the transcription to be revised. This combination reduces the word-level latency, measured at the point where a word is final and will never be updated again, from 18.1 s to 1.1 s without sacrificing performance in terms of word error rate.
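The abstract does not spell out how stable partial hypotheses are detected or how output updates are encoded, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that a word counts as stable once the last few partial hypotheses agree on it as part of a common prefix, that an update is expressed as "keep the first k displayed words and replace the rest", and that word latency is the delay between the end of a spoken word and the moment it becomes final. All function names and the `window` parameter are hypothetical.

```python
from typing import List, Tuple


def stable_prefix(hypotheses: List[List[str]], window: int = 3) -> List[str]:
    """Longest word prefix shared by the last `window` partial hypotheses.

    Assumption: agreement across consecutive partial hypotheses is used
    as a proxy for stability. Returns [] until `window` hypotheses exist.
    """
    recent = hypotheses[-window:]
    if len(recent) < window:
        return []
    prefix = []
    for words in zip(*recent):  # stops at the shortest hypothesis
        if all(w == words[0] for w in words):
            prefix.append(words[0])
        else:
            break
    return prefix


def update_message(displayed: List[str], new_hyp: List[str]) -> Tuple[int, List[str]]:
    """Hypothetical update format: keep the first `keep` displayed words
    and replace everything after them with `tail`."""
    keep = 0
    for old, new in zip(displayed, new_hyp):
        if old != new:
            break
        keep += 1
    return keep, new_hyp[keep:]


def mean_word_latency(spoken_end: List[float], finalized_at: List[float]) -> float:
    """Mean delay in seconds between the end of each spoken word and the
    time it became final (one plausible reading of word latency)."""
    delays = [f - s for s, f in zip(spoken_end, finalized_at)]
    return sum(delays) / len(delays) if delays else 0.0


if __name__ == "__main__":
    hyps = [["the", "cat"], ["the", "cat", "sat"], ["the", "cat", "sad", "on"]]
    print(stable_prefix(hyps))                        # words all three agree on
    print(update_message(hyps[1], hyps[2]))           # revise only the changed tail
    print(mean_word_latency([1.0, 2.0], [2.0, 3.5]))  # average delay in seconds
```

In this sketch only the words outside the stable prefix remain open to revision, which is the general idea behind updating just the most recent part of the transcription.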
Citations
Posted Content
Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection
TL;DR: This work proposes three latency reduction techniques for chunk-based incremental inference, evaluates their efficiency in terms of the accuracy-latency trade-off, and shows that the approach is also applicable to low-latency speech translation.
Posted Content
Super-Human Performance in Online Low-latency Recognition of Conversational Speech
TL;DR: Results are presented for a system that can achieve super-human performance (at a WER of 5.0%, over the Switchboard conversational benchmark) at a word based latency of only 1 second behind a speaker's speech.
Journal Article
Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation
Alejandro Martín, Israel González-Carrasco, Victor Rodriguez-Fernandez, Monica Souto‐Rico, David Camacho, Belén Ruiz-Mezcua
TL;DR: Deep-Sync, a tool for aligning subtitles with audio-visual content, integrates a deep language representation model and real-time voice recognition software to build a semantic-aware alignment tool that successfully aligns most of the subtitles even when there is no direct correspondence between the re-speaker and the audio content.
Proceedings Article
ELITR Non-Native Speech Translation at IWSLT 2020
Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao
TL;DR: This paper describes the ELITR system submission for the non-native speech translation task at IWSLT 2020, for which the authors develop a new end-to-end general ASR system and a hybrid ASR system trained on non-native speech.
Proceedings Article
High Performance Sequence-to-Sequence Model for Streaming Speech Recognition
TL;DR: An additional loss function controlling the uncertainty of the attention mechanism, a modified beam search identifying partial, stable hypotheses, ways of working with BLSTM in the encoder, and the use of chunked BLSTM are introduced.
References
Book Chapter
“Your Word is my Command”: Google Search by Voice: A Case Study
Johan Schalkwyk, Doug Beeferman, Francoise Beaufays, Bill Byrne, Ciprian Chelba, Michael M. Cohen, Maryam Kamvar, Brian Strope
TL;DR: An important goal at Google is to make spoken access ubiquitously available, with performance that works so well that the modality adds no friction to the interaction.
Journal Article
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera
Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato
TL;DR: The techniques and the attempt to achieve the low-latency monitoring of meetings are described, the experimental results for real-time meeting transcription are shown, and the goal is to recognize automatically “who is speaking what” in an online manner for meeting assistance.
Proceedings Article
Stability and Accuracy in Incremental Speech Recognition
TL;DR: This paper presents a method that increases the stability and accuracy of ISR output, without adding delay, and next presents a pair of methods that give ISR more utility for real spoken dialogue systems.
Proceedings Article
Towards automatic closed captioning: low latency real time broadcast news transcription.
TL;DR: This paper describes a low latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy, along with recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real-time.
Dissertation
A System for Simultaneous Translation of Lectures and Speeches
TL;DR: This thesis realizes the first existing automatic system for simultaneous speech-to-speech translation from English to Spanish and the different aspects described in this thesis will be helpful for developing simultaneous translation systems for other domains or languages.