Open Access, Posted Content

Low Latency ASR for Simultaneous Speech Translation.

TLDR
To minimize latency, run-on decoding is combined with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output updates that allows the most recent parts of the transcription to be revised.
Abstract
User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We have therefore worked on several techniques for reducing the latency of both components, the automatic speech recognition module and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvement. To minimize the latency, we combined run-on decoding with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output updates that allows the most recent parts of the transcription to be revised. This combination reduces the word-level latency, measured at the point where words are final and will never be updated again, from 18.1 s to 1.1 s without sacrificing performance in terms of word error rate.
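
The interplay between stable partial hypotheses, dynamic output update, and word latency can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the stability criterion, so the code assumes a word counts as stable once two consecutive partial hypotheses agree on it and on every word before it. The names `OutputBuffer`, `common_prefix_len`, and `mean_word_latency` are hypothetical, and word latency is taken here to be the time a word becomes final minus the time it ended in the audio.

```python
from dataclasses import dataclass, field


def common_prefix_len(a, b):
    """Return how many leading words two hypotheses share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class OutputBuffer:
    """Shown output = final words + tentative tail; only the tail may change."""
    final: list = field(default_factory=list)       # never revised again
    tentative: list = field(default_factory=list)   # most recent, revisable part
    final_time: list = field(default_factory=list)  # when each word became final
    _prev: list = field(default_factory=list)       # previous partial hypothesis

    def update(self, hypothesis, now):
        """Consume a new partial hypothesis (full word list) at time `now`."""
        # Assumed stability rule: agreement of two consecutive hypotheses.
        stable = common_prefix_len(self._prev, hypothesis)
        if stable > len(self.final):
            newly_final = hypothesis[len(self.final):stable]
            self.final.extend(newly_final)
            self.final_time.extend([now] * len(newly_final))
        # Everything after the final part stays open to revision.
        self.tentative = list(hypothesis[len(self.final):])
        self._prev = list(hypothesis)


def mean_word_latency(final_times, spoken_end_times):
    """Average delay between a word being spoken and becoming final."""
    delays = [f - s for f, s in zip(final_times, spoken_end_times)]
    return sum(delays) / len(delays)
```

Feeding the buffer a stream such as `["the"]`, `["the", "cat"]`, `["the", "cap", "sat"]`, `["the", "cat", "sat"]` shows the tentative tail being revised while the final part only grows. A stricter stability rule would finalize words later and raise the word latency; a looser one would lower it at the risk of committing errors.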

Citations
Posted Content

Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

TL;DR: This work proposes three latency reduction techniques for chunk-based incremental inference, evaluates their efficiency in terms of the accuracy-latency trade-off, and shows that the approach is also applicable to low-latency speech translation.
Posted Content

Super-Human Performance in Online Low-latency Recognition of Conversational Speech

TL;DR: Results are presented for a system that can achieve super-human performance (at a WER of 5.0%, over the Switchboard conversational benchmark) at a word based latency of only 1 second behind a speaker's speech.
Journal Article, DOI

Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation

TL;DR: Deep-Sync, a tool for the alignment of subtitles with the audio-visual content, integrates a deep language representation model and a real-time voice recognition software to build a semantic-aware alignment tool that successfully aligns most of the subtitles even when there is no direct correspondence between the re-speaker and the audio content.
Proceedings Article, DOI

ELITR Non-Native Speech Translation at IWSLT 2020

TL;DR: This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020 and develops a new end-to-end general ASR system and a hybrid ASR trained on non-native speech.
Proceedings Article, DOI

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

TL;DR: An additional loss function controlling the uncertainty of the attention mechanism, a modified beam search identifying partial, stable hypotheses, ways of working with BLSTM in the encoder, and the use of chunked BLSTM are introduced.
References
Book Chapter, DOI

“Your Word is my Command”: Google Search by Voice: A Case Study

TL;DR: An important goal at Google is to make spoken access ubiquitously available and performance works so well that the modality adds no friction to the interaction.
Journal Article, DOI

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

TL;DR: The techniques and the attempt to achieve the low-latency monitoring of meetings are described, the experimental results for real-time meeting transcription are shown, and the goal is to recognize automatically “who is speaking what” in an online manner for meeting assistance.
Proceedings Article

Stability and Accuracy in Incremental Speech Recognition

TL;DR: This paper presents a method that increases the stability and accuracy of ISR output, without adding delay, and next presents a pair of methods that give ISR more utility for real spoken dialogue systems.
Proceedings Article

Towards automatic closed captioning: low latency real time broadcast news transcription.

TL;DR: A low latency real-time Broadcast News recognition system capable of transcribing live television newscasts with reasonable accuracy is described, along with recent modeling and efficiency improvements that yield a 22% word error rate on the Hub4e98 test set while running faster than real-time.
Dissertation, DOI

A System for Simultaneous Translation of Lectures and Speeches

TL;DR: This thesis realizes the first existing automatic system for simultaneous speech-to-speech translation from English to Spanish and the different aspects described in this thesis will be helpful for developing simultaneous translation systems for other domains or languages.