Open Access Proceedings Article
Aligning Subtitles in Sign Language Videos
Hannah Bull, Triantafyllos Afouras, Gül Varol, Samuel Albanie, Liliane Momeni, Andrew Zisserman
pp. 11552–11561
TL;DR: A Transformer architecture is proposed to temporally align asynchronous subtitles in sign language videos, trained on manually annotated alignments covering over 15,000 subtitles that span 17.7 hours of video.

Abstract: The goal of this work is to temporally align asynchronous subtitles in sign language videos. In particular, we focus on sign-language interpreted TV broadcast data comprising (i) a video of continuous signing, and (ii) subtitles corresponding to the audio content. Previous work exploiting such weakly-aligned data only considered finding keyword-sign correspondences, whereas we aim to localise a complete subtitle text in continuous signing. We propose a Transformer architecture tailored for this task, which we train on manually annotated alignments covering over 15K subtitles that span 17.7 hours of video. We use BERT subtitle embeddings and CNN video representations learned for sign recognition to encode the two signals, which interact through a series of attention layers. Our model outputs frame-level predictions, i.e., for each video frame, whether it belongs to the queried subtitle or not. Through extensive evaluations, we show substantial improvements over existing alignment baselines that do not make use of subtitle text embeddings for learning. Our automatic alignment model opens up possibilities for advancing machine translation of sign languages via providing continuously synchronized video-text data.
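The abstract describes a model in which BERT subtitle embeddings and CNN video features interact through attention layers to produce a per-frame binary prediction (does this frame belong to the queried subtitle?). The following is a minimal sketch of that idea, not the paper's actual architecture: the class name, dimensions, and single cross-attention layer are illustrative assumptions, standing in for the paper's full stack of attention layers.

```python
import torch
import torch.nn as nn

class SubtitleAligner(nn.Module):
    """Hypothetical simplification of a subtitle-to-video alignment model.

    Video frames (queries) attend to subtitle token embeddings
    (keys/values), and a linear head predicts, for every frame, the
    probability that it belongs to the queried subtitle. All dimensions
    and the single-layer design are illustrative, not the paper's.
    """

    def __init__(self, text_dim=768, video_dim=512, model_dim=256, num_heads=4):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, model_dim)    # project BERT-like tokens
        self.video_proj = nn.Linear(video_dim, model_dim)  # project CNN frame features
        self.cross_attn = nn.MultiheadAttention(model_dim, num_heads, batch_first=True)
        self.frame_head = nn.Linear(model_dim, 1)          # per-frame in/out logit

    def forward(self, subtitle_tokens, video_frames):
        # subtitle_tokens: (B, T_text, text_dim); video_frames: (B, T_video, video_dim)
        text = self.text_proj(subtitle_tokens)
        video = self.video_proj(video_frames)
        # Each video frame attends over the subtitle tokens.
        attended, _ = self.cross_attn(query=video, key=text, value=text)
        logits = self.frame_head(attended).squeeze(-1)     # (B, T_video)
        return torch.sigmoid(logits)                       # frame-level membership probability

model = SubtitleAligner()
probs = model(torch.randn(1, 12, 768), torch.randn(1, 100, 512))
print(probs.shape)  # one probability per video frame: torch.Size([1, 100])
```

In this simplified form, thresholding the per-frame probabilities yields a temporal segment for the subtitle; the paper's evaluation compares such frame-level predictions against manually annotated alignments.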
Citations
Posted Content
BBC-Oxford British Sign Language Dataset
Samuel Albanie, Gül Varol, Liliane Momeni, Hannah Bull, Triantafyllos Afouras, Himel Chowdhury, Neil Fox, Bencie Woll, Rob Cooper, Andrew McParland, Andrew Zisserman
TL;DR: The BBC-Oxford British Sign Language (BOBSL) dataset is a large-scale video collection of British Sign Language (BSL), released publicly as an extended version of the BSL-1K dataset.