Low Latency ASR for Simultaneous Speech Translation.

Open Access, Posted Content

- 22 Mar 2020 -

TLDR

To minimize latency, run-on decoding is combined with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output update that allows the most recent parts of the transcription to be revised.

Abstract:

User studies have shown that reducing the latency of our simultaneous lecture translation system should be the most important goal. We have therefore worked on several techniques for reducing the latency of both components, the automatic speech recognition module and the speech translation module. Since the commonly used commitment latency is not appropriate in our case of continuous stream decoding, we focused on word latency. We used it to analyze the performance of our current system and to identify opportunities for improvement. To minimize the latency, we combined run-on decoding with a technique for identifying stable partial hypotheses during stream decoding and a protocol for dynamic output update that allows the most recent parts of the transcription to be revised. This combination reduces word-level latency, measured at the point where a word is final and will never be updated again, from 18.1 s to 1.1 s without sacrificing performance in terms of word error rate.
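The abstract does not spell out how stable partial hypotheses are identified or how the update protocol is defined, so the following is only a minimal sketch under stated assumptions. It assumes that a word counts as stable once two consecutive partial hypotheses agree on it as part of their common prefix, and that word latency is the delay between the time a word was spoken and the time it became final. The names `DynamicTranscript`, `common_prefix_len`, and `mean_word_latency` are hypothetical and do not come from the paper.

```python
from typing import List


def common_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading words on which two hypotheses agree."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class DynamicTranscript:
    """Transcript split into final words and a revisable tail.

    Assumption (not from the paper): a word becomes final once it lies in
    the common prefix of two consecutive partial hypotheses.
    """

    def __init__(self) -> None:
        self.final: List[str] = []         # words that will never change again
        self.tail: List[str] = []          # most recent words, still revisable
        self.finalized_at: List[float] = []  # time each final word was committed
        self._prev: List[str] = []         # previous partial hypothesis

    def update(self, hyp: List[str], now: float) -> List[str]:
        """Consume a new partial hypothesis for the whole stream so far."""
        agreed = common_prefix_len(self._prev, hyp)
        if agreed > len(self.final):
            newly_final = hyp[len(self.final):agreed]
            self.final.extend(newly_final)
            self.finalized_at.extend([now] * len(newly_final))
        # Everything after the final part may still be revised by later updates.
        self.tail = list(hyp[len(self.final):])
        self._prev = list(hyp)
        return self.final + self.tail


def mean_word_latency(spoken_at: List[float], finalized_at: List[float]) -> float:
    """Average delay between a word being spoken and becoming final."""
    delays = [f - s for s, f in zip(spoken_at, finalized_at)]
    return sum(delays) / len(delays)
```

Feeding the hypotheses `["low"]`, `["low", "late"]`, `["low", "latency", "a"]`, `["low", "latency", "asr"]` at times 1.0 to 4.0 leaves `["low", "latency"]` final and `["asr"]` in the revisable tail; the display can show the tail immediately while only the final words count toward word latency.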

Citations

Posted Content

Low-Latency Sequence-to-Sequence Speech Recognition and Translation by Partial Hypothesis Selection

Danni Liu, +2 more

- 22 May 2020 -

arXiv: Computation and Language

TL;DR: This work proposes three latency reduction techniques for chunk-based incremental inference, evaluates their efficiency in terms of the accuracy-latency trade-off, and shows that the approach is also applicable to low-latency speech translation.

Posted Content

Super-Human Performance in Online Low-latency Recognition of Conversational Speech

Thai-Son Nguyen, +2 more

- 07 Oct 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Results are presented for a system that can achieve super-human performance (at a WER of 5.0%, over the Switchboard conversational benchmark) at a word based latency of only 1 second behind a speaker's speech.

Journal Article

Deep-Sync: A novel deep learning-based tool for semantic-aware subtitling synchronisation

Alejandro Martín, +5 more

- 08 Feb 2021 -

Neural Computing and Applications

TL;DR: Deep-Sync, a tool for the alignment of subtitles with the audio-visual content, integrates a deep language representation model and a real-time voice recognition software to build a semantic-aware alignment tool that successfully aligns most of the subtitles even when there is no direct correspondence between the re-speaker and the audio content.

Proceedings Article

ELITR Non-Native Speech Translation at IWSLT 2020

Dominik Macháček, +8 more

TL;DR: This paper is an ELITR system submission for the non-native speech translation task at IWSLT 2020; it develops a new end-to-end general ASR system and a hybrid ASR system trained on non-native speech.

Proceedings Article

High Performance Sequence-to-Sequence Model for Streaming Speech Recognition

Thai-Son Nguyen, +3 more

TL;DR: An additional loss function controlling the uncertainty of the attention mechanism, a modified beam search identifying partial, stable hypotheses, ways of working with BLSTM in the encoder, and the use of chunked BLSTM are introduced.

References

Book Chapter

“Your Word is my Command”: Google Search by Voice: A Case Study

Johan Schalkwyk, +7 more

TL;DR: An important goal at Google is to make spoken access ubiquitously available, with performance good enough that the modality adds no friction to the interaction.

Journal Article

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera

Takaaki Hori, +12 more

- 01 Feb 2012 -

IEEE Transactions on Audio, Speech, and ...

TL;DR: The techniques and the attempt to achieve the low-latency monitoring of meetings are described, the experimental results for real-time meeting transcription are shown, and the goal is to recognize automatically “who is speaking what” in an online manner for meeting assistance.

Proceedings Article

Stability and Accuracy in Incremental Speech Recognition

Ethan Selfridge, +3 more

TL;DR: This paper presents a method that increases the stability and accuracy of ISR output, without adding delay, and next presents a pair of methods that give ISR more utility for real spoken dialogue systems.

Proceedings Article

Towards automatic closed captioning: low latency real time broadcast news transcription.

Murat Saraclar, +3 more

TL;DR: A low-latency real-time Broadcast News recognition system is described that can transcribe live television newscasts with reasonable accuracy; recent modeling and efficiency improvements yield a 22% word error rate on the Hub4e98 test set while running faster than real time.

Dissertation

A System for Simultaneous Translation of Lectures and Speeches

Christian Fügen

TL;DR: This thesis realizes the first existing automatic system for simultaneous speech-to-speech translation from English to Spanish and the different aspects described in this thesis will be helpful for developing simultaneous translation systems for other domains or languages.

arXiv: Computation and Language

Related Papers (5)

Reducing speech recognition latency

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition

Attention is All you Need

Improving Sequence-To-Sequence Speech Recognition Training with On-The-Fly Data Augmentation

Scaling Up Online Speech Recognition Using ConvNets