Journal Article

Isolated Sign Language Recognition with Grassmann Covariance Matrices

TLDR
This article proposes a covariance matrix-based representation that naturally fuses information from multimodal sources to exploit long-term dynamics over an isolated sign sequence, and demonstrates that the proposed method outperforms state-of-the-art methods in both accuracy and computational cost.
Abstract
In this article, to exploit long-term dynamics over an isolated sign sequence, we propose a covariance matrix-based representation that naturally fuses information from multimodal sources. To avoid the drawbacks of the commonly used Riemannian metric, the proximity of covariance matrices is measured on the Grassmann manifold. However, the inherent Grassmann metric cannot be applied directly to a covariance matrix, since points on the Grassmann manifold are linear subspaces rather than symmetric positive definite matrices. We solve this problem by evaluating and selecting the most significant singular vectors of the covariance matrix of each sign sequence; the resulting compact representation is called the Grassmann covariance matrix. Finally, the Grassmann metric is used as a kernel for a support vector machine (SVM), which enables the signs to be learned discriminatively. To validate the proposed method, we collect three challenging sign language datasets, on which comprehensive evaluations show that the proposed method outperforms state-of-the-art methods in both accuracy and computational cost.
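The abstract describes a three-step pipeline: fuse multimodal per-frame features into one covariance matrix, keep its most significant singular vectors as a point on the Grassmann manifold, and compare such points with a Grassmann kernel inside an SVM. The sketch below illustrates that pipeline under several assumptions that the abstract does not state. Per-frame features from each modality are simply concatenated before the covariance is computed. A fixed number `k` of leading eigenvectors is kept (for a symmetric positive semidefinite covariance matrix these coincide with its leading singular vectors), whereas the paper evaluates and selects the significant singular vectors by its own criterion. The Grassmann projection kernel, the squared Frobenius norm of U^T V, stands in for the paper's Grassmann metric. All function names and the toy signals are hypothetical, and a cyclic Jacobi eigensolver is swapped in only to keep the example dependency-free.

```python
import math


def fuse_modalities(*streams):
    """ASSUMPTION (hypothetical fusion scheme): concatenate frame t of every
    modality stream into a single fused feature vector f_t."""
    return [sum((list(frame) for frame in frames), []) for frames in zip(*streams)]


def covariance(frames):
    """Sample covariance (d x d) of a sequence of T fused frame vectors."""
    t, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / t for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        dev = [f[j] - mean[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j]
    return [[x / (t - 1) for x in row] for row in cov]


def jacobi_eigh(a, tol=1e-24, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); column i of V is the eigenvector of eigenvalue i.
    For a symmetric PSD covariance matrix these are also its singular vectors.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row)  # Frobenius norm^2, rotation-invariant
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def grassmann_basis(cov, k):
    """ASSUMPTION: keep a fixed number k of leading eigenvectors, giving an
    orthonormal n x k basis (a point on the Grassmann manifold)."""
    vals, vecs = jacobi_eigh(cov)
    top = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)[:k]
    return [[row[i] for i in top] for row in vecs]


def projection_kernel(u, v):
    """ASSUMPTION: Grassmann projection kernel ||U^T V||_F^2, standing in for
    the paper's Grassmann metric. Equals k when U and V span the same subspace."""
    n = len(u)
    return sum(
        sum(u[r][i] * v[r][j] for r in range(n)) ** 2
        for i in range(len(u[0]))
        for j in range(len(v[0]))
    )


def gram_matrix(bases):
    """Precomputed kernel matrix for an SVM: entry (i, j) = k(U_i, U_j)."""
    return [[projection_kernel(a, b) for b in bases] for a in bases]


if __name__ == "__main__":
    T = 40
    # Two hypothetical signs, each with a 2-D hand trajectory and a 2-D hand-shape stream.
    sign_a = fuse_modalities(
        [[math.cos(0.2 * t), math.sin(0.2 * t)] for t in range(T)],
        [[math.sin(0.05 * t), math.cos(0.3 * t)] for t in range(T)],
    )
    sign_b = fuse_modalities(
        [[math.cos(0.2 * t), math.sin(0.4 * t)] for t in range(T)],
        [[0.01 * t, math.sin(0.3 * t)] for t in range(T)],
    )
    bases = [grassmann_basis(covariance(s), k=2) for s in (sign_a, sign_b)]
    for row in gram_matrix(bases):
        print([round(x, 3) for x in row])
```

The projection kernel is positive semidefinite, so the Gram matrix can be passed to any SVM implementation that accepts a precomputed kernel, for example LIBSVM's precomputed-kernel mode (`-t 4`) or scikit-learn's `SVC(kernel="precomputed")`. With NumPy available, `numpy.linalg.eigh` or `numpy.linalg.svd` would replace the hand-written Jacobi solver.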

Citations
Proceedings Article

SubUNets: End-to-End Hand Shape and Continuous Sign Language Recognition

TL;DR: A novel deep learning approach is proposed to solve simultaneous alignment and recognition problems (referred to as “sequence-to-sequence” learning); it decomposes the problem into a series of specialised expert systems, referred to as SubUNets, which significantly improves the performance of the overarching recognition system.
Posted Content

Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation

TL;DR: A novel transformer-based architecture is introduced that jointly learns Continuous Sign Language Recognition and Translation while being trainable end-to-end; it uses a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture.
Proceedings Article

Hierarchical LSTM for Sign Language Translation

TL;DR: A hierarchical LSTM (HLSTM) encoder-decoder model with visual content and word embedding for SLT exhibits promising performance on the signer-independent test with seen sentences and also outperforms the comparison algorithms on unseen sentences.
Proceedings Article

Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

TL;DR: Sign Language Transformers use a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture, which leads to significant performance gains.
Journal Article

Dynamic Sign Language Recognition Based on Video Sequence With BLSTM-3D Residual Networks

TL;DR: A multimodal dynamic sign language recognition method based on a deep 3-dimensional residual ConvNet and bi-directional LSTM networks, named BLSTM-3D residual network (B3D ResNet), that can achieve state-of-the-art recognition accuracy.