Proceedings ArticleDOI

Recurrent Convolutional Neural Networks for Continuous Sign Language Recognition by Staged Optimization

Abstract
This work presents a weakly supervised framework with deep neural networks for vision-based continuous sign language recognition, where only the ordered gloss labels, but no exact temporal locations, are available for each sign-sentence video, and the amount of labeled sentences for training is limited. Our approach addresses the mapping of video segments to glosses by introducing a recurrent convolutional neural network for spatio-temporal feature extraction and sequence learning. We design a three-stage optimization process for our architecture. First, we develop an end-to-end sequence learning scheme and employ connectionist temporal classification (CTC) as the objective function for alignment proposal. Second, we take the alignment proposal as stronger supervision to tune our feature extractor. Finally, we optimize the sequence learning model with the improved feature representations, and design a weakly supervised detection network for regularization. We apply the proposed approach to a real-world continuous sign language recognition benchmark, and our method, with no extra supervision, achieves results comparable to the state of the art.
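
To make the first optimization stage concrete, below is a minimal sketch (not the authors' implementation) of CTC supervision over a recurrent network applied to per-segment features, as used when only ordered gloss labels without temporal locations are available; the encoder, gloss vocabulary size, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch of CTC-based, alignment-free supervision (PyTorch).
# The encoder, gloss vocabulary size, and shapes are illustrative assumptions,
# not the paper's actual configuration.
import torch
import torch.nn as nn

class RecurrentConvSLR(nn.Module):
    def __init__(self, num_glosses, feat_dim=512, hidden=256):
        super().__init__()
        # Stand-in for the spatio-temporal CNN feature extractor.
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.rnn = nn.LSTM(hidden, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden, num_glosses + 1)  # +1 for CTC blank

    def forward(self, segment_feats):                 # (B, T, feat_dim)
        h = self.encoder(segment_feats)
        h, _ = self.rnn(h)
        return self.classifier(h).log_softmax(-1)     # (B, T, num_glosses + 1)

model = RecurrentConvSLR(num_glosses=1200)
ctc = nn.CTCLoss(blank=1200, zero_infinity=True)

feats = torch.randn(2, 80, 512)                  # 2 videos, 80 segments each
log_probs = model(feats).permute(1, 0, 2)        # CTC expects (T, B, C)
glosses = torch.randint(0, 1200, (2, 10))        # ordered labels, no timing info
loss = ctc(log_probs, glosses,
           input_lengths=torch.full((2,), 80),
           target_lengths=torch.full((2,), 10))
loss.backward()                                  # gradients for stage-one training
```

The frame-to-gloss alignment obtained from this stage can then serve as the stronger supervision used in the second stage to tune the feature extractor, as described in the abstract.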


Citations
Proceedings ArticleDOI

Neural Sign Language Translation

TL;DR: This work formalizes SLT within the framework of Neural Machine Translation (NMT) for both end-to-end and pretrained settings (using expert knowledge), jointly learning the spatial representations, the underlying language model, and the mapping between sign and spoken language.
Journal ArticleDOI

A Deep Neural Framework for Continuous Sign Language Recognition by Iterative Training

TL;DR: This work develops a continuous sign language (SL) recognition framework with deep neural networks that directly transcribes videos of SL sentences into sequences of ordered gloss labels; the proposed architecture adopts deep convolutional neural networks with stacked temporal fusion layers as the feature extraction module.
Journal ArticleDOI

Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos

TL;DR: This work applies the approach to sign language recognition, exploiting sequential parallelism to learn sign language, mouth shape, and hand shape classifiers; it clearly outperforms the state of the art on all data sets and converges significantly faster with the parallel alignment approach.
Posted Content

Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

TL;DR: A novel transformer-based architecture is introduced that jointly learns continuous sign language recognition and translation and is trainable end-to-end, using a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture.
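
As a rough illustration of binding the two tasks, the following hedged sketch (not the paper's code) sums a CTC recognition loss and a cross-entropy translation loss into a single training objective; the loss weights and tensor shapes are assumptions.

```python
# Hypothetical joint objective: CTC over gloss predictions plus cross-entropy
# over translated words; weights w_rec / w_trans are illustrative assumptions.
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
xent = nn.CrossEntropyLoss()

def joint_loss(gloss_log_probs, glosses, in_len, gloss_len,
               word_logits, words, w_rec=1.0, w_trans=1.0):
    rec = ctc_loss(gloss_log_probs, glosses, in_len, gloss_len)   # recognition
    trans = xent(word_logits.flatten(0, 1), words.flatten())      # translation
    return w_rec * rec + w_trans * trans
```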
Proceedings ArticleDOI

Iterative Alignment Network for Continuous Sign Language Recognition

TL;DR: The framework consists of a 3D convolutional residual network for feature learning and an encoder-decoder network with connectionist temporal classification (CTC) for sequence modelling, the two being optimized in an alternating manner for weakly supervised continuous sign language recognition.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
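
For reference, a minimal sketch of the Adam update it describes, with bias-corrected estimates of the first and second moments of the gradient; the hyperparameter defaults below are the commonly cited ones.

```python
# One Adam step: adaptive estimates of lower-order moments with bias correction.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad           # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)              # bias-corrected estimates (t >= 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```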
Proceedings ArticleDOI

Going deeper with convolutions

TL;DR: This paper introduces Inception, a deep convolutional neural network architecture that achieves a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Journal ArticleDOI

ImageNet Large Scale Visual Recognition Challenge

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection spanning hundreds of object categories and millions of images; it has been run annually from 2010 to the present, attracting participation from more than fifty institutions.
Journal ArticleDOI

Mean shift: a robust approach toward feature space analysis

TL;DR: The convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function is proved, establishing its utility in detecting the modes of the density.
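
A minimal sketch of that recursive procedure with a flat kernel of assumed radius h: each point is repeatedly shifted to the mean of its neighbours until the shift vanishes, so it settles on a stationary point (mode) of the estimated density.

```python
# Mean shift with a flat (uniform) kernel; radius h and tolerances are assumptions.
import numpy as np

def mean_shift(points, h=1.0, tol=1e-5, max_iter=100):
    modes = points.copy()
    for _ in range(max_iter):
        shifted = np.empty_like(modes)
        for i, x in enumerate(modes):
            neighbours = points[np.linalg.norm(points - x, axis=1) < h]
            # Move to the local mean; stay put if no points fall in the window.
            shifted[i] = neighbours.mean(axis=0) if len(neighbours) else x
        if np.linalg.norm(shifted - modes) < tol:   # shifts have vanished
            break
        modes = shifted
    return modes            # one (approximate) mode per input point
```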
Proceedings ArticleDOI

Learning Spatiotemporal Features with 3D Convolutional Networks

TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
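
A minimal sketch of the 3D convolution underlying C3D, applied to a short clip; the channel counts, clip length, and pooling sizes are illustrative assumptions (the early spatial-only pooling mirrors the common C3D convention).

```python
# A single 3D convolution + pooling block over a video clip (PyTorch).
import torch
import torch.nn as nn

clip = torch.randn(1, 3, 16, 112, 112)       # (batch, RGB, frames, height, width)
conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)
pool = nn.MaxPool3d(kernel_size=(1, 2, 2))   # pool spatially, keep temporal length
features = pool(torch.relu(conv3d(clip)))    # -> (1, 64, 16, 56, 56)
```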