Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs

doi:10.1109/CVPR.2017.364

Proceedings Article

Oscar Koller, +2 more

- pp 3416-3424

TLDR

This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on the fly in a weakly supervised fashion. Embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.

Abstract:

This work presents an iterative re-alignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. However, in recent data sets these labels often tend to be noisy, which is commonly overlooked. We propose an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on the fly in a weakly supervised fashion. Given a series of frames and sequence-level labels, a deep recurrent CNN-BLSTM network is trained end-to-end. Embedded into an HMM, the resulting deep model corrects the frame labels and continuously improves its performance over several re-alignments. We evaluate on two challenging publicly available sign recognition benchmark data sets featuring over 1000 classes. We outperform the state of the art by up to 10% absolute and 30% relative.
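The re-alignment idea in the abstract can be illustrated with a minimal sketch of a single alignment step. This is not the authors' implementation: it assumes one state per label, no transition penalties and no class priors, and the names `forced_align`, `log_scores` and `demo_scores` are hypothetical. Given per-frame log scores from some frame classifier and the ordered sequence-level labels, a Viterbi-style dynamic program finds the best monotonic assignment of frames to labels. In an iterative scheme, the resulting frame labels would serve as training targets for the next round.

```python
import math


def forced_align(log_scores, labels):
    """Return one label per frame, monotonic in the order of `labels`.

    log_scores[t][c] is a hypothetical per-frame log score for class c
    (for example from a frame classifier); `labels` is the ordered
    sequence-level annotation. Simplifying assumption: one state per
    label and no transition penalties.
    """
    T, N = len(log_scores), len(labels)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per label")
    NEG = -math.inf
    # best[t][n]: best score for frames 0..t when frame t carries labels[n].
    best = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    best[0][0] = log_scores[0][labels[0]]
    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = best[t - 1][n]                       # keep the same label
            move = best[t - 1][n - 1] if n > 0 else NEG  # advance to next label
            if move > stay:
                best[t][n], back[t][n] = move, n - 1
            else:
                best[t][n], back[t][n] = stay, n
            best[t][n] += log_scores[t][labels[n]]
    # Trace back from the last frame, which must carry the last label.
    n = N - 1
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = labels[n]
        n = back[t][n]
    return path


# Made-up scores for 5 frames and 3 classes (illustration only).
demo_scores = [
    [math.log(p) for p in row]
    for row in [
        [0.1, 0.1, 0.8],
        [0.2, 0.1, 0.7],
        [0.6, 0.1, 0.3],
        [0.7, 0.2, 0.1],
        [0.8, 0.1, 0.1],
    ]
]
print(forced_align(demo_scores, [2, 0]))  # → [2, 2, 0, 0, 0]
```

A full re-alignment loop would alternate between training the classifier on the current frame labels and recomputing those labels with a step like this one.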

Citations

Posted Content

Visual Alignment Constraint for Continuous Sign Language Recognition

Yuecong Min, +3 more

- 06 Apr 2021 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes a Visual Alignment Constraint (VAC) to enhance the feature extractor with more alignment supervision. It is composed of two auxiliary losses: one makes predictions based on visual features only, and the other aligns short-term visual and long-term contextual features.

Proceedings Article

Translation of Sign Language Glosses to Text Using Sequence-to-Sequence Attention Models

Nikolaos Arvanitis, +2 more

TL;DR: This is the first work to use sequence-to-sequence models with an attention mechanism to translate American Sign Language gloss sentences to English; the authors report promising results that could be useful for implementing a full sign language recognition system.

Posted Content

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues.

Samuel Albanie, +6 more

- 23 Jul 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work introduces BSL-1K, a collection of British Sign Language (BSL) signs of unprecedented scale, which can be used to train strong sign recognition models for co-articulated signs in BSL and additionally forms excellent pretraining for other sign languages.

Proceedings Article

Fingerspelling Detection in American Sign Language

Bowen Shi, +3 more

TL;DR: In this article, a multi-task learning model was proposed to detect fingerspelling in raw, untrimmed sign language videos, incorporating pose estimation and fingerspelling recognition along with detection.

Journal Article

Graph-Based Multimodal Sequential Embedding for Sign Language Translation

- 01 Jan 2022 -

IEEE Transactions on Multimedia

TL;DR: In this paper, a graph-based multimodal sequential embedding network (MSeqGraph) is proposed, in which multiple sequential modalities are densely correlated. A graph embedding unit (GEU) is designed to realize the intra-modal and inter-modal correlations, and a hierarchical GEU stacker with a pooling-based skip connection is proposed.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: This work trains a deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, that achieved state-of-the-art performance on ImageNet classification.

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Journal Article