Journal of Machine Learning Research 13 (2012) 2589-2615 Submitted 11/11; Revised 5/12; Published 9/12
Finding Recurrent Patterns from Continuous Sign Language
Sentences for Automated Extraction of Signs
Sunita Nayak SNAYAK@TAAZ.COM
Taaz Inc.
4250 Executive Square, Suite 420
La Jolla, CA 92037 USA
Kester Duncan KKDUNCAN@CSE.USF.EDU
Sudeep Sarkar SARKAR@CSE.USF.EDU
Department of Computer Science & Engineering
University of South Florida
Tampa, FL 33620, USA
Barbara Loeding BARBARA@USF.EDU
Department of Special Education
University of South Florida
Lakeland, FL 33803, USA
Editor: Isabelle Guyon
Abstract
We present a probabilistic framework to automatically learn models of recurring signs from mul-
tiple sign language video sequences containing the vocabulary of interest. We extract the parts of
the signs that are present in most occurrences of the sign in context and are robust to the variations
produced by adjacent signs. Each sentence video is first transformed into a multidimensional time
series representation, capturing the motion and shape aspects of the sign. Skin color blobs are ex-
tracted from frames of color video sequences, and a probabilistic relational distribution is formed
for each frame using the contour and edge pixels from the skin blobs. Each sentence is represented
as a trajectory in a low dimensional space called the space of relational distributions. Given these
time series trajectories, we extract signemes from multiple sentences concurrently using iterated
conditional modes (ICM). We show results by learning single signs from a collection of sentences
with one common pervading sign, multiple signs from a collection of sentences with more than
one common sign, and single signs from a mixed collection of sentences. The extracted signemes
demonstrate that our approach is robust to some extent to the variations produced within a sign due
to different contexts. We also show results whereby these learned sign models are used for spotting
signs in test sequences.
Keywords: pattern extraction, sign language recognition, signeme extraction, sign modeling,
iterated conditional modes
1. Introduction
Sign language research in the computer vision community has primarily focused on improving
recognition rates of signs either by improving the motion representation and similarity measures
(Yang et al., 2002; Al-Jarrah and Halawani, 2001; Athitsos et al., 2004; Cui and Weng, 2000; Wang
et al., 2007; Bauer and Hienz, 2000) or by adding linguistic clues during the recognition process
(Bowden et al., 2004; Derpanis et al., 2004). Ong and Ranganath (2005) presented a review of automated sign language research and highlighted an important issue in continuous sign language recognition: while signing a sentence, the hands make transitional movements between two consecutive signs that belong to neither sign. This is called movement epenthesis (Liddell and Johnson, 1989), and it needs to be addressed before any other phonological issues in sign language (Ong and Ranganath, 2005). Most of the existing work in sign language assumes
that the training signs are already available; often the signs used in the training set are isolated signs with the boundaries chopped off, or manually selected frames from continuous sentences.
The ability to recognize isolated signs does not guarantee the recognition of signs in continuous
sentences. Unlike isolated signs, a sign in a continuous sentence is strongly affected by its context
in the sentence. Figure 1 shows two sentences ‘I BUY TICKET WHERE?’ and ‘YOU CAN BUY
THIS FOR HER’ with a common sign ‘BUY’ between them. The frames representing the sign
‘BUY’ and the neighboring signs are marked. The unmarked frames between the signs indicate
the frames corresponding to movement epenthesis. It can be observed that the same sign ‘BUY’ is preceded and followed by movement epenthesis that depends on the end of the preceding sign and the start of the succeeding sign, respectively. The movement epenthesis also affects how the sign itself is signed.
This effect makes the automated extraction, modeling and recognition of signs from continuous
sentences more difficult when compared to just plain gestures, isolated signs, or finger spelling.
In this paper, we address the problem of automatically extracting the part of a sign that is most
common in all occurrences of the sign, and hence expected to be robust with respect to the variation
of adjacent signs. These common parts can be used for spotting or recognition of signs in continuous
sign language sentences. They can also be used by sign language experts for teaching or studying
variations between instances of signs in continuous sign language sentences, or in automated sign
language tutoring systems. Furthermore, they can be used even in the process of translating sign
language videos directly to spoken words.
In related work inspired by the success of phonemes in speech recognition, Bauer and Kraiss (2002) sought to extract common parts in different instances of a sign and thus arrive at a phoneme-analogue for signs. But unlike speech, sign language does not have a
completely defined set of phonemes. Hence, we consider extracting commonalities at the sentence
and sub-sentence level.
A different but closely related problem is the extraction of common subsequences, also called
motifs, from very long multiple gene sequences in biology (Bailey and Elkan, 1995; Lawrence et al.,
1993; Pevzner and Sze, 2000; Rigoutsos and Floratos, 1998). Lawrence et al. (1993) used a Gibbs
sampling approach based on discrete matches or mismatches of subsequences that were strings of
symbols of gene sequences. Bailey and Elkan (1995) used expectation maximization to find com-
mon subsequences in univariate biopolymer sequences. In biology, researchers deal with univariate
discrete sequences, and hence their algorithms are not always directly applicable to multivariate continuous time series domains such as speech or sign language. Some researchers tried to
symbolize a continuous time series into discrete sequences and used existing algorithms from bioin-
formatics. For example, Chiu et al. (2003) symbolized the time series into a sequence of symbols
using local approximations and used random projections to extract common subsequences in noisy
data. Tanaka et al. (2005) extended this work by performing principal component analysis on the multivariate time series data, projecting it onto a single dimension, and symbolizing the result into a discrete sequence. However, it is not always possible to capture all the important information in
(a) Continuous Sentence ‘I BUY TICKET WHERE?’
(b) Continuous Sentence ‘YOU CAN BUY THIS FOR HER’
Figure 1: Movement epenthesis in sign language sentences. Frames corresponding to the common
sign ‘BUY’ are marked in red. Signs adjacent to BUY are marked in magenta. Frames
between marked frames represent movement epenthesis, that is, the transition between
signs. Note that the sign itself is also affected by having different signs preceding or
following it.
the first principal component alone. Further extending this work, Duchêne et al. (2007) find recurrent patterns from multivariate discrete data using time series random projections.
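To make the symbolization step concrete, the following is a minimal sketch of the kind of discretization this line of work builds on: piecewise aggregate approximation followed by mapping segment means to an alphabet under a Gaussian assumption. The segment count, alphabet, and breakpoints are illustrative choices, not values taken from any of the cited papers.

```python
import numpy as np
from scipy.stats import norm

def symbolize(series, n_segments=8, alphabet="abcd"):
    """SAX-style discretization of a 1-D time series: z-normalize,
    reduce with piecewise aggregate approximation (PAA), then map
    segment means to symbols via equiprobable Gaussian breakpoints.
    All parameter values here are illustrative."""
    x = (series - series.mean()) / (series.std() + 1e-12)
    # PAA: mean of each of n_segments roughly equal-length chunks
    paa = np.array([c.mean() for c in np.array_split(x, n_segments)])
    # Breakpoints splitting a standard normal into equal-mass bins
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])
    return "".join(alphabet[i] for i in np.searchsorted(breakpoints, paa))

# Example: a noisy sinusoid reduces to a short symbol string
t = np.linspace(0, 2 * np.pi, 200)
print(symbolize(np.sin(t) + 0.1 * np.random.randn(200)))
```

String algorithms such as random projections can then search the symbol sequences for approximately repeated substrings, which is exactly the step that can lose information when a multivariate series is first collapsed onto a single principal component.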
Due to the inherent continuous nature of many time series data like gesture and speech, new
methods were developed that do not require approximating the data to a sequence of discrete sym-
bols. Denton (2005) used a continuous random-walk noise model to cluster similar substrings.
Nayak et al. (2005) and Minnen et al. (2007) used continuous multivariate sequences and dynamic time warping to find distances between the substrings. Oates (2002), Nayak et al. (2005), and Nayak et al. (2009a) are among the few works on finding recurrent patterns that address non-uniform sampling of time series. The recurrent pattern extraction approach proposed in this paper is based
on multivariate continuous time series, uses dynamic time warping to find distances between sub-
strings, and handles length variations of common patterns.
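As a rough illustration of the matching machinery, here is a minimal dynamic time warping sketch for multivariate sequences. It is a textbook formulation, not the authors' implementation, and the final length normalization is an illustrative choice for comparing candidate substrings of different widths.

```python
import numpy as np

def dtw_distance(A, B):
    """Dynamic time warping distance between multivariate sequences
    A (m x d) and B (n x d) using Euclidean frame-to-frame costs.
    A textbook formulation, not the paper's exact variant."""
    m, n = len(A), len(B)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            # Extend the cheapest of the three admissible warping steps
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Normalize by combined length so substrings of different widths compare
    return D[m, n] / (m + n)
```

Because the warping path may stretch or compress either sequence, the same score function tolerates the length variations of a sign across different sentence contexts.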
Following their success in speech recognition, Hidden Markov Models (HMMs) were used by sign language researchers (Vogler and Metaxas, 1999; Starner and Pentland, 1997; Bowden et al., 2004; Bauer and Hienz, 2000; Starner et al., 1998) for representing and recognizing signs.
However, HMMs require a large amount of training data, and data from native signers is not as easily available as speech data. Hence, non-HMM-based approaches have been used
(Farhadi et al., 2007; Nayak et al., 2009a; Yang et al., 2010; Buehler et al., 2009; Nayak et al.,
2009b; Oszust and Wysocki, 2010; Han et al., 2009). In this paper, we use a continuous trajectory
representation of signs in a multidimensional space and use dynamic time warping to match sub-
sequences. The relative configuration of the two hands and face in each frame is represented by a
relational distribution (Vega and Sarkar, 2003; Nayak et al., 2005), which in itself is a probability
density function. The motion dynamics of the signer are captured as changes in the relational distributions. This representation also allows us to interpolate motion, if required, for data sets with lower frame capture
rates. It should also be noted that, unlike many of the previous works in sign language that perform
tracking of the hands using 3D magnetic trackers or color gloves (Fang et al., 2004; Vogler and
Metaxas, 2001; Wang et al., 2002; Ma et al., 2000; Cooper and Bowden, 2009), our representation
does not require tracking and relies on skin segmentation.
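To give a feel for the representation, here is a minimal sketch of a relational distribution for one frame, built as a normalized 2-D histogram of offsets between pairs of skin-blob edge pixels. The bin count, spatial span, and pair sampling are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def relational_distribution(edge_pixels, n_bins=32, span=200,
                            n_pairs=20000, seed=0):
    """Approximate relational distribution for one frame: a normalized
    2-D histogram over (dx, dy) offsets between randomly sampled pairs
    of edge-pixel coordinates (an (N, 2) array). Bin count, span, and
    pair sampling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(edge_pixels), size=(n_pairs, 2))
    d = edge_pixels[idx[:, 0]] - edge_pixels[idx[:, 1]]  # pairwise offsets
    hist, _, _ = np.histogram2d(d[:, 0], d[:, 1], bins=n_bins,
                                range=[[-span, span], [-span, span]])
    return hist / hist.sum()  # probability mass function over offsets
```

Flattened frame by frame, such histograms become the feature vectors on which the PCA projection described in the next paragraph operates.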
We present a Bayesian framework to extract the common subsequences or signemes from all
the given sentences simultaneously. Figure 2 depicts the overview of our approach. With this
framework, we can extract the first most common sign, the second most common sign, the third
most common sign and so on. We represent each sentence as a trajectory in a multi-dimensional
space that implicitly captures the shape and motion in the video. Skin color blobs are extracted
from frames of color video, and a relational distribution is formed for each frame using the edge
pixels in the skin blobs. Each sentence is then represented as a trajectory in a low dimensional space
called the space of relational distributions, which is arrived at by performing principal component
analysis (PCA) on the relational distributions. There are other alternatives to PCA that are possible, as discussed in Nayak et al. (2009b); the other choices do not change the nature of the signeme finding approach, they only affect the quality of the features. The starting locations (a_1, ..., a_n) and widths (w_1, ..., w_n) of the candidate signemes in all the n sentences are together represented by a
parameter vector. The starting locations are initialized by uniform random sampling from each sentence, and the initial width values are randomly selected from a given range of values. The parameter vector is updated sequentially by sampling the starting
point and width of the possible signeme in each sentence from a joint conditional distribution that is
based on the locations and widths of the target possible signeme in all other sentences. The process
is iterated until the parameter values converge to a stable solution. Monte Carlo approaches like Gibbs sampling (Robert and Casella, 2004; Gilks et al., 1998; Casella and George, 1992), which is a special case of the Metropolis-Hastings algorithm (Chib and Greenberg, 1995), can be used for global optimization while updating the parameter vector by performing importance sampling on the conditional probability distribution. However, such approaches have a high burn-in period.
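The paragraph above describes projecting per-frame relational distributions with PCA to obtain each sentence's trajectory. Below is a minimal sketch of that projection, assuming the flattened histograms of all frames (from all sentences) are stacked row-wise; the number of retained components is an illustrative choice.

```python
import numpy as np

def fit_space_of_relational_distributions(all_frames, n_components=10):
    """PCA via SVD on flattened relational distributions stacked from
    all sentences (rows = frames). Returns the mean and the principal
    directions; n_components is an illustrative choice."""
    mean = all_frames.mean(axis=0)
    _, _, Vt = np.linalg.svd(all_frames - mean, full_matrices=False)
    return mean, Vt[:n_components]

def sentence_trajectory(frames, mean, components):
    """Project one sentence's frames into the learned low-dimensional
    space, yielding its multivariate time series trajectory."""
    return (frames - mean) @ components.T
```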
In this paper, we adopt a greedy approach based on the use of iterated conditional modes (ICM)
(Besag, 1986). ICM converges much faster than a Gibbs sampler, but is known to be largely de-
pendent on the initialization. We overcome this limitation by performing ICM a number of times
equal to the average length of the n sentences, with different initializations. The most frequently
occurring solution from all the ICM runs is considered as the final solution.
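A minimal sketch of this ICM-with-restarts strategy follows. Here `score` is a stand-in for the conditional density over one sentence's (start, width) given the candidate signemes in the other sentences, and the number of restarts is an illustrative parameter (the paper ties it to the average sentence length).

```python
import numpy as np
from collections import Counter

def icm_signemes(sentences, widths, score, n_runs=20, max_iters=50, seed=0):
    """ICM with random restarts, sketched. For each sentence i, holding
    the candidate signemes of all other sentences fixed, pick the
    (start, width) maximizing score(i, start, width, params), a
    stand-in for the paper's conditional density. The most frequent
    converged solution across restarts is returned."""
    rng = np.random.default_rng(seed)
    solutions = Counter()
    for _ in range(n_runs):
        params = []
        for s in sentences:
            w = int(rng.choice(widths))               # random initial width
            a = int(rng.integers(0, len(s) - w + 1))  # valid random start
            params.append((a, w))
        for _ in range(max_iters):
            changed = False
            for i, s in enumerate(sentences):
                # Conditional mode for sentence i given all other sentences
                best = max(((a, w) for w in widths
                            for a in range(len(s) - w + 1)),
                           key=lambda p: score(i, p[0], p[1], params))
                if best != params[i]:
                    params[i], changed = best, True
            if not changed:  # fixed point: no sentence's mode moved
                break
        solutions[tuple(params)] += 1
    return list(solutions.most_common(1)[0][0])
```

The restarts compensate for ICM's sensitivity to initialization: each run converges greedily to a nearby fixed point, and voting across runs selects the solution that dominates the basin structure.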
Figure 2: Overview of our approach. Each of the n sentences is represented as a sequence in the
Space of Relational Distributions, and common patterns are extracted using iterated con-
ditional modes (ICM). The parameter set {a_1, w_1, ..., a_n, w_n} is initialized using uniform
random sampling and the conditional density corresponding to each sentence is updated
in a sequential manner.
The work in this paper builds on the work of Nayak et al. (2009a) and is different in multiple
respects. We propose a system that is generalized to extract more than one common sign from a
collection of sentences (first most common sign, second most common sign and so on), whereas
References

S. C. W. Ong and S. Ranganath. Automatic sign language analysis: a survey and the future beyond lexical meaning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(6):873-891, 2005.

M.-H. Yang, N. Ahuja, and M. Tabb. Extraction of 2D motion trajectories and its application to hand gesture recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(8):1061-1074, 2002.

C. Vogler and D. Metaxas. A framework for recognizing the simultaneous aspects of American Sign Language. Computer Vision and Image Understanding, 81(3):358-384, 2001.

C. Vogler and D. Metaxas. Parallel hidden Markov models for American Sign Language recognition. In Proceedings of the IEEE International Conference on Computer Vision, 1999.

Y. Tanaka, K. Iwamoto, and K. Uehara. Discovery of time-series motif from multi-dimensional data based on MDL principle. Machine Learning, 58(2-3):269-300, 2005.
Frequently Asked Questions (2)
Q1. What are the contributions in "Finding recurrent patterns from continuous sign language sentences for automated extraction of signs"?

The authors present a probabilistic framework to automatically learn models of recurring signs from multiple sign language video sequences containing the vocabulary of interest. The authors extract the parts of the signs that are present in most occurrences of the sign in context and are robust to the variations produced by adjacent signs. Given these time series trajectories, the authors extract signemes from multiple sentences concurrently using iterated conditional modes (ICM). The authors show results by learning single signs from a collection of sentences with one common pervading sign, multiple signs from a collection of sentences with more than one common sign, and single signs from a mixed collection of sentences. The extracted signemes demonstrate that their approach is robust to some extent to the variations produced within a sign due to different contexts. The authors also show results whereby these learned sign models are used for spotting signs in test sequences.

Q2. What future work do the authors propose?

The authors plan to extend their work to address the challenge of handling the large variations encountered when automatically recognizing signemes across different signers. They also plan to work on a variation of dynamic time warping that is robust to amplitude differences between various instances of signs.