Proceedings Article

Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs

TL;DR
This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on-the-fly in a weakly supervised fashion. Embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.
Abstract
This work presents an iterative re-alignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. However, in recent data sets these labels often tend to be noisy, a problem that is commonly overlooked. We propose an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on-the-fly in a weakly supervised fashion. Given a series of frames and sequence-level labels, a deep recurrent CNN-BLSTM network is trained end-to-end. Embedded into an HMM, the resulting deep model corrects the frame labels and continuously improves its performance over several re-alignments. We evaluate on two challenging publicly available sign recognition benchmark data sets featuring over 1000 classes. We outperform the state of the art by up to 10% absolute and 30% relative.
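The re-alignment step described in the abstract can be illustrated with a minimal sketch of Viterbi forced alignment. This is not the authors' implementation: it assumes one HMM state per label, uniform transition scores, no language model, and finite frame-level log-probabilities. The function name `force_align` and the toy probabilities are hypothetical, chosen only for illustration.

```python
import math


def force_align(log_probs, labels):
    """Viterbi forced alignment of frames to an ordered label sequence.

    log_probs[t][c]: assumed frame-level log-probability of class c at frame t
                     (for example, the log of a network's softmax output).
    labels:          ordered sequence-level labels, given as class indices.
    Returns one class index per frame. Label order is preserved and every
    label covers at least one frame.
    """
    T, N = len(log_probs), len(labels)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per label")

    NEG = -math.inf
    # score[t][n]: best log-score of any path that is in label position n at frame t
    score = [[NEG] * N for _ in range(T)]
    # back[t][n]: label position at frame t-1 on that best path
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][labels[0]]

    for t in range(1, T):
        for n in range(min(t + 1, N)):  # position n is unreachable before frame n
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else NEG
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > NEG:
                score[t][n] = best + log_probs[t][labels[n]]

    # Backtrace from the last label position at the last frame.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = labels[n]
        n = back[t][n]
    return path


# Toy example with 4 frames and 3 classes (made-up probabilities).
frame_probs = [
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.6, 0.1, 0.3],
    [0.9, 0.05, 0.05],
]
log_probs = [[math.log(p) for p in row] for row in frame_probs]
print(force_align(log_probs, [2, 0]))  # [2, 2, 0, 0]
```

In an iterative scheme of the kind the abstract outlines, the frame labels returned by such an alignment would serve as training targets for the next round of network training, after which the alignment is recomputed with the updated model.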


Citations
Proceedings Article

Neural Sign Language Translation

TL;DR: This work formalizes SLT in the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge), which allows the spatial representations, the underlying language model, and the mapping between sign and spoken language to be learned jointly.
Proceedings Article

Sign Language Recognition, Generation, and Translation: An Interdisciplinary Perspective

TL;DR: The results of an interdisciplinary workshop are presented, providing key background that is often overlooked by computer scientists, a review of the state-of-the-art, a set of pressing challenges, and a call to action for the research community.
Journal Article

A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training

TL;DR: This work develops a continuous sign language (SL) recognition framework with deep neural networks, which directly transcribes videos of SL sentences to sequences of ordered gloss labels. The proposed architecture adopts deep convolutional neural networks with stacked temporal fusion layers as the feature extraction module.
Journal Article

Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

TL;DR: This work applies the approach to the domain of sign language recognition, exploiting sequential parallelism to learn sign language, mouth shape and hand shape classifiers. It clearly outperforms the state of the art on all data sets and converges significantly faster with the parallel alignment approach.
Posted Content

Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

TL;DR: A novel transformer-based architecture is introduced that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. A Connectionist Temporal Classification (CTC) loss binds the recognition and translation problems into a single unified architecture.
References
Proceedings Article

LSTM Neural Networks for Language Modeling.

TL;DR: This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.
Journal Article

A Novel Connectionist System for Unconstrained Handwriting Recognition

TL;DR: This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies, significantly outperforming a state-of-the-art HMM-based system.
Proceedings Article

Hierarchical recurrent neural network for skeleton based action recognition

TL;DR: This paper proposes an end-to-end hierarchical RNN for skeleton based action recognition, and demonstrates that the model achieves the state-of-the-art performance with high computational efficiency.
Book

Connectionist Speech Recognition: A Hybrid Approach

TL;DR: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems based on Hidden Markov Models (HMMs) to improve their performance.
Proceedings Article

Describing Videos by Exploiting Temporal Structure

TL;DR: In this paper, a spatio-temporal 3-D convolutional neural network (3-D CNN) representation of short temporal dynamics is used for video description. The network is trained on video action recognition tasks so as to produce a representation that is tuned to human motion and behavior.