Proceedings Article
Re-Sign: Re-Aligned End-to-End Sequence Modelling with Deep Recurrent CNN-HMMs
Oscar Koller, Sepehr Zargaran, Hermann Ney +2 more
- pp 3416-3424
TLDR
This work proposes an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on the fly in a weakly supervised fashion. Embedded into an HMM, the resulting deep model continuously improves its performance over several re-alignments.
Abstract:
This work presents an iterative re-alignment approach applicable to visual sequence labelling tasks such as gesture recognition, activity recognition and continuous sign language recognition. Previous methods dealing with video data usually rely on given frame labels to train their classifiers. However, in recent data sets these labels often tend to be noisy, which is commonly overlooked. We propose an algorithm that treats the provided training labels as weak labels and refines the label-to-image alignment on the fly in a weakly supervised fashion. Given a series of frames and sequence-level labels, a deep recurrent CNN-BLSTM network is trained end-to-end. Embedded into an HMM, the resulting deep model corrects the frame labels and continuously improves its performance over several re-alignments. We evaluate on two challenging publicly available sign recognition benchmark data sets featuring over 1000 classes. We outperform the state of the art by up to 10% absolute and 30% relative.
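The re-alignment step described in the abstract can be illustrated with a small forced-alignment sketch. This is not the authors' implementation: it assumes one HMM state per label, no transition penalties or priors, and per-frame class scores that in the paper would come from the CNN-BLSTM but here are made-up numbers. The function name `force_align` and the toy scores are hypothetical.

```python
import math


def force_align(log_probs, labels):
    """Viterbi forced alignment (simplified sketch).

    log_probs: list of T frames, each a list of per-class log scores.
    labels:    ordered sequence-level labels (class indices).
    Returns one label per frame, monotonic in the label sequence, with
    every label covering at least one frame.
    """
    T, N = len(log_probs), len(labels)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per label")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = log_probs[0][labels[0]]
    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]                       # remain in the same label
            move = score[t - 1][n - 1] if n > 0 else NEG  # advance to the next label
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > NEG:
                score[t][n] = best + log_probs[t][labels[n]]
    # Backtrace from the last label at the last frame.
    n = N - 1
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = labels[n]
        n = back[t][n]
    return path


# Toy example: 5 frames, 3 classes, sequence-level labels [0, 2].
# The scores stand in for network outputs and are invented for illustration.
probs = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.20, 0.10, 0.70],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]
noisy_frame_labels = [0, 0, 0, 0, 2]       # a poor initial alignment
realigned = force_align(log_probs, [0, 2])
print(realigned)                            # [0, 0, 2, 2, 2]
```

In an iterative scheme of the kind the abstract outlines, the corrected frame labels would then serve as training targets for the next round, after which the alignment is recomputed with the updated model.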
Citations
Proceedings Article
Unsupervised Key Hand Shape Discovery of Sign Language Videos with Correspondence Sparse Autoencoders
TL;DR: This paper assigns labels to an isolated Sign Language (SL) dataset using end-to-end neural network architectures that have proven successful in the unsupervised discovery of sub-word acoustic units in speech processing, and observes that key hand shapes (KHS), which are meaningful basic visual parts of signs in an SL dataset, can be detected using unsupervised clustering techniques.
Proceedings Article
Sign Language Gesture Classification using Neural Networks.
TL;DR: The efficiency of the LeNet convolutional neural network for isolated word sign language recognition is demonstrated and several techniques to obtain the same dimension for the input that contains gesture information are applied.
Journal Article
Fine-tuning of sign language recognition models: a technical report
TL;DR: In this article, a skeleton-aware multi-modal sign language recognition model was proposed for real-time gesture recognition in three different languages (Arabic, Turkish and AUTSL) and three different sign language datasets (WLASL, Turkish, AUTSL and RSL).
Book Chapter
Unsupervised Discovery of Sign Terms by K-Nearest Neighbours Approach
Korhan Polat, Murat Saraclar +1 more
TL;DR: In this article, the authors used visual features extracted from RGB videos to find repeating terms in continuous sign videos without any supervision, showing that a k-nearest neighbours based discovery algorithm designed for speech can also discover sign terms.
Posted Content
Better Sign Language Translation with STMC-Transformer
Kayo Yin, Jesse Read +1 more
TL;DR: This article showed that glosses are an inefficient representation of sign language, and suggested an end-to-end training of the recognition and translation models, or using a different sign language annotation scheme.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors train a deep convolutional neural network that achieves state-of-the-art performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal Article
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman +1 more
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.