Proceedings ArticleDOI

Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs

TLDR
This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on-the-fly in a weakly supervised fashion. Embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.
Abstract
This work presents an iterative re-alignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. However, in recent data sets these labels are often noisy, a problem that is commonly overlooked. We propose an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on-the-fly in a weakly supervised fashion. Given a series of frames and sequence-level labels, a deep recurrent CNN-BLSTM network is trained end-to-end. Embedded into an HMM, the resulting deep model corrects the frame labels and continuously improves its performance over several re-alignments. We evaluate on two challenging publicly available sign recognition benchmark data sets featuring over 1000 classes. We outperform the state of the art by up to 10% absolute and 30% relative.
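The re-alignment step described in the abstract can be illustrated with a minimal sketch. It assumes a much simpler setup than the paper's HMM: one state per label, no skips or repetitions, and frame-wise log-scores that would in practice come from the trained network. The function name `forced_align` and its arguments are hypothetical and not taken from the paper. Given the ordered sequence-level labels, a Viterbi pass assigns each frame to one label so that the total score is maximal; alternating this step with retraining on the corrected frame labels is the general idea behind iterative re-alignment.

```python
import numpy as np


def forced_align(log_probs, labels):
    """Viterbi forced alignment of frames to an ordered label sequence.

    log_probs: array of shape (T, C) with frame-wise log-scores per class
               (assumed to come from some classifier).
    labels:    ordered list of N class indices, with 1 <= N <= T.
    Returns a list of T class indices that follows the label order
    and gives every label at least one frame.
    """
    T = log_probs.shape[0]
    N = len(labels)
    if N == 0 or N > T:
        raise ValueError("need 1 <= len(labels) <= number of frames")

    # score[t, n]: best score of any alignment ending at frame t in label n
    score = np.full((T, N), -np.inf)
    # back[t, n]: 1 if the best path advanced from label n-1, 0 if it stayed
    back = np.zeros((T, N), dtype=int)
    score[0, 0] = log_probs[0, labels[0]]

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            if move > stay:
                score[t, n] = move + log_probs[t, labels[n]]
                back[t, n] = 1
            else:
                score[t, n] = stay + log_probs[t, labels[n]]

    # Backtrace from the last label at the last frame.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = labels[n]
        n -= back[t, n]
    return path
```

For example, with five frames whose scores favour class 2 in the first two frames and class 0 afterwards, calling `forced_align(log_probs, [2, 0])` places the label boundary after the second frame. A real system would add multiple HMM states per label, transition penalties and priors, none of which are modelled here.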


Citations
Journal ArticleDOI

Neural Sign Language Translation Based on Human Keypoint Estimation

TL;DR: This article proposes a sign language translation system based on human keypoint estimation, in which the extracted keypoint vector is normalized by the mean and standard deviation of the keypoints and used as input to a translation model based on the sequence-to-sequence architecture.
Journal ArticleDOI

Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition

TL;DR: This paper presents attention-based 3D-convolutional neural networks (3D-CNNs) for SLR, a framework that learns spatio-temporal features from raw video without prior knowledge and uses attention to select informative cues.
Book ChapterDOI

BSL-1K: Scaling Up Co-articulated Sign Language Recognition Using Mouthing Cues

TL;DR: A new scalable approach to data collection for sign recognition in continuous videos is introduced, and it is shown that BSL-1K can be used to train strong sign recognition models for co-articulated signs in BSL and that these models additionally form excellent pretraining for other sign languages and benchmarks.
Proceedings ArticleDOI

Unsupervised Learning of Action Classes With Continuous Temporal Embedding

TL;DR: This work uses a continuous temporal embedding of framewise features to benefit from the sequential nature of activities and identifies clusters of temporal segments across all videos that correspond to semantically meaningful action classes.
Posted Content

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

TL;DR: This work proposes a novel learning algorithm with a Viterbi-based loss that allows for online and incremental learning of weakly annotated video data and shows that explicit context and length modeling leads to huge improvements in video segmentation and labeling tasks.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network achieved state-of-the-art ImageNet classification performance. It consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.