Author

Alexandra Stefan

Other affiliations: Boston University
Bio: Alexandra Stefan is an academic researcher from the University of Texas at Arlington. The author has contributed to research in the topics of gesture recognition and gesture. The author has an h-index of 8 and has co-authored 12 publications receiving 472 citations. Previous affiliations of Alexandra Stefan include Boston University.

Papers
Journal ArticleDOI
TL;DR: A novel metric for time series, called Move-Split-Merge (MSM), is proposed. It uses three fundamental operations as building blocks: Move, Split, and Merge, which can be applied in sequence to transform any time series into any other time series.
Abstract: A novel metric for time series, called Move-Split-Merge (MSM), is proposed. This metric uses as building blocks three fundamental operations: Move, Split, and Merge, which can be applied in sequence to transform any time series into any other time series. A Move operation changes the value of a single element, a Split operation converts a single element into two consecutive elements, and a Merge operation merges two consecutive elements into one. Each operation has an associated cost, and the MSM distance between two time series is defined to be the cost of the cheapest sequence of operations that transforms the first time series into the second one. An efficient, quadratic-time algorithm is provided for computing the MSM distance. MSM has the desirable properties of being a metric, in contrast to the Dynamic Time Warping (DTW) distance, and of being invariant to the choice of origin, in contrast to the Edit Distance with Real Penalty (ERP) metric. At the same time, experiments with public time series data sets demonstrate that MSM is a meaningful distance measure that often leads to lower nearest-neighbor classification error rates than DTW and ERP.
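As a concrete illustration, here is a minimal Python sketch of the quadratic-time dynamic program for MSM as described above. The Split/Merge cost constant c and the helper names are illustrative choices, not taken from the paper's reference implementation:

```python
# Minimal sketch of the MSM dynamic program; `c` is the Split/Merge
# cost constant, a free parameter of the metric.

def _split_merge_cost(new_point, x, y, c):
    """Cost of a Split/Merge involving `new_point` between neighbors x and y."""
    if min(x, y) <= new_point <= max(x, y):
        return c
    return c + min(abs(new_point - x), abs(new_point - y))

def msm_distance(a, b, c=0.5):
    """MSM distance between non-empty 1-D sequences a and b; O(len(a)*len(b))."""
    m, n = len(a), len(b)
    d = [[0.0] * n for _ in range(m)]
    d[0][0] = abs(a[0] - b[0])
    for i in range(1, m):  # first column: only Split/Merge operations apply
        d[i][0] = d[i - 1][0] + _split_merge_cost(a[i], a[i - 1], b[0], c)
    for j in range(1, n):  # first row, symmetrically
        d[0][j] = d[0][j - 1] + _split_merge_cost(b[j], a[0], b[j - 1], c)
    for i in range(1, m):
        for j in range(1, n):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(a[i] - b[j]),                        # Move
                d[i - 1][j] + _split_merge_cost(a[i], a[i - 1], b[j], c),  # Split/Merge in a
                d[i][j - 1] + _split_merge_cost(b[j], a[i], b[j - 1], c),  # Split/Merge in b
            )
    return d[m - 1][n - 1]

print(msm_distance([1, 2, 3, 4], [1, 3, 4]))
```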

136 citations

Proceedings ArticleDOI
23 Jun 2008
TL;DR: The ASL lexicon video dataset is introduced, a large and expanding public dataset containing video sequences of thousands of distinct ASL signs, as well as annotations of those sequences, including start/end frames and class label of every sign.
Abstract: The lack of a written representation for American Sign Language (ASL) makes it difficult to do something as commonplace as looking up an unknown word in a dictionary. The majority of printed dictionaries organize ASL signs (represented in drawings or pictures) based on their nearest English translation; so unless one already knows the meaning of a sign, dictionary look-up is not a simple proposition. In this paper we introduce the ASL lexicon video dataset, a large and expanding public dataset containing video sequences of thousands of distinct ASL signs, as well as annotations of those sequences, including start/end frames and the class label of every sign. This dataset is being created as part of a project to develop a computer vision system that allows users to look up the meaning of an ASL sign. At the same time, the dataset can be useful for benchmarking a variety of computer vision and machine learning methods designed for learning and/or indexing a large number of visual classes, and especially approaches for analyzing gestures and human communication.
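Since the annotations pair each video sequence with start/end frames and a class label, one plausible record layout looks like the sketch below; the field names are hypothetical and are not the dataset's actual schema:

```python
# Hypothetical annotation record for one sign occurrence; the field
# names are illustrative, not the dataset's published format.
from dataclasses import dataclass

@dataclass
class SignAnnotation:
    video_file: str   # source video sequence
    gloss: str        # class label of the sign (its English gloss)
    start_frame: int  # first frame of the sign
    end_frame: int    # last frame of the sign

ann = SignAnnotation("session01.mp4", "BOOK", start_frame=120, end_frame=164)
print(ann.gloss, ann.end_frame - ann.start_frame + 1, "frames")
```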

128 citations

Proceedings ArticleDOI
25 May 2011
TL;DR: This paper proposes a method that accommodates such challenging conditions by detecting the hands using scene depth information from the Kinect and then recognizing gestures with Dynamic Time Warping (DTW); the framework can be generalized to recognize a wider range of gestures.
Abstract: In human-computer interaction applications, gesture recognition has the potential to provide a natural way of communication between humans and machines. The technology is becoming mature enough to be widely available to the public, and real-world computer vision applications are starting to emerge. A typical example of this trend is the gaming industry and the launch of Microsoft's new camera: the Kinect. Other domains where gesture recognition is needed include, but are not limited to: sign language recognition, virtual reality environments and smart homes. A key challenge for such real-world applications is that they need to operate in complex scenes with cluttered backgrounds, various moving objects and possibly challenging illumination conditions. In this paper we propose a method that accommodates such challenging conditions by detecting the hands using scene depth information from the Kinect. On top of our detector we employ a dynamic programming method for recognizing gestures, namely Dynamic Time Warping (DTW). Our method is translation and scale invariant, which is a desirable property for many HCI systems. We have tested the performance of our approach on a digit recognition system. All experimental datasets include hand-signed digit gestures, but our framework can be generalized to recognize a wider range of gestures.
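A minimal sketch of the recognition stage follows, assuming the depth-based hand detector has already produced 2-D hand trajectories. The normalization shown is one simple way to obtain translation and scale invariance; the paper's exact normalization may differ:

```python
import math

def normalize(traj):
    """Center a 2-D hand trajectory on its mean and divide by its spread,
    giving translation and scale invariance (an illustrative scheme)."""
    xs = [p[0] for p in traj]
    ys = [p[1] for p in traj]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    spread = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / spread, (y - cy) / spread) for x, y in traj]

def dtw(a, b):
    """Classic O(len(a)*len(b)) DTW with Euclidean local cost."""
    INF = float("inf")
    d = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def classify(query, templates):
    """Nearest-template gesture label under DTW; `templates` is a list
    of (label, trajectory) pairs."""
    q = normalize(query)
    return min(templates, key=lambda t: dtw(q, normalize(t[1])))[0]
```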

120 citations

Book ChapterDOI
10 Sep 2010
TL;DR: A method is presented to help users look up the meaning of an unknown sign from American Sign Language (ASL), where the user submits a video of the unknown sign as a query, and the system retrieves the most similar signs from a database of sign videos.
Abstract: A method is presented to help users look up the meaning of an unknown sign from American Sign Language (ASL). The user submits a video of the unknown sign as a query, and the system retrieves the most similar signs from a database of sign videos. The user then reviews the retrieved videos to identify the video displaying the sign of interest. Hands are detected in a semi-automatic way: the system performs some hand detection and tracking, and the user has the option to verify and correct the detected hand locations. Features are extracted based on hand motion and hand appearance. Similarity between signs is measured by combining dynamic time warping (DTW) scores, which are based on hand motion, with a simple similarity measure based on hand appearance. In user-independent experiments, with a system vocabulary of 1,113 signs, the correct sign was included in the top 10 matches for 78% of the test queries.
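The retrieval step can be sketched as follows, assuming the DTW motion scores and appearance similarities have already been computed for each database sign. The linear weighting and the assumption that both score types are comparably scaled are illustrative simplifications, not the paper's exact combination rule:

```python
# Rank database signs by a combined motion/appearance score and return
# the top-k candidates for the user to review. Lower is better: DTW is
# a distance, so the appearance similarity is negated.

def combined_score(dtw_score, appearance_sim, alpha=0.5):
    """Weighted combination; alpha trades motion against appearance
    and is a hypothetical parameter, assuming pre-normalized scores."""
    return alpha * dtw_score - (1 - alpha) * appearance_sim

def top_k(candidates, k=10):
    """candidates: list of (sign_label, dtw_score, appearance_sim)."""
    ranked = sorted(candidates, key=lambda s: combined_score(s[1], s[2]))
    return [label for label, _, _ in ranked[:k]]

# Example: retrieve the most similar signs for review by the user.
db = [("BOOK", 3.2, 0.8), ("HOUSE", 1.1, 0.6), ("CAT", 2.4, 0.9)]
print(top_k(db, k=2))
```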

48 citations

Proceedings ArticleDOI
16 Jul 2008
TL;DR: The key idea is to integrate a face detection module into the gesture recognition system, and use the face location and size to make gesture recognition invariant to scale and translation.
Abstract: Gestures are a natural means of communication between humans, and also a natural modality for human-computer interaction. Automatic recognition of gestures using computer vision is an important task in many real-world applications, such as sign language recognition, computer game control, virtual reality, intelligent homes, and assistive environments. In order for a gesture recognition system to be robust and deployable in non-laboratory settings, the system needs to be able to operate in complex scenes, with complicated backgrounds and multiple moving and skin-colored objects. In this paper we propose an approach for improving gesture recognition performance in such complex environments. The key idea is to integrate a face detection module into the gesture recognition system, and use the face location and size to make gesture recognition invariant to scale and translation. Our experiments demonstrate the significant advantages of the proposed method over alternative computer vision methods for gesture recognition.
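A minimal sketch of the key idea, with OpenCV's Haar cascade standing in for the paper's face detection module (an assumption; the paper does not specify this particular detector):

```python
# Express hand coordinates relative to the detected face, so gesture
# features become invariant to scale and translation.
import cv2

def normalize_hand_positions(frame_gray, hand_points):
    """Return hand coordinates relative to the face center, in units of
    face width/height; hand_points is a list of (x, y) pixel positions."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(frame_gray, 1.1, 5)
    if len(faces) == 0:
        return None  # no face found: skip the frame or fall back
    x, y, w, h = faces[0]
    cx, cy = x + w / 2.0, y + h / 2.0
    return [((px - cx) / w, (py - cy) / h) for px, py in hand_points]
```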

36 citations


Cited by
Journal ArticleDOI
TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.
Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

1,513 citations

Book
02 Jan 1991

1,377 citations

Journal ArticleDOI
TL;DR: This work implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets; the results indicate that only nine of these algorithms are significantly more accurate than both benchmarks.
Abstract: In the last 5 years there have been a large number of new time series classification algorithms proposed in the literature. These algorithms have been evaluated on subsets of the 47 data sets in the University of California, Riverside time series classification archive. The archive has recently been expanded to 85 data sets, over half of which have been donated by researchers at the University of East Anglia. Aspects of previous evaluations have made comparisons between algorithms difficult. For example, several different programming languages have been used, experiments involved a single train/test split and some used normalised data whilst others did not. The relaunch of the archive provides a timely opportunity to thoroughly evaluate algorithms on a larger number of datasets. We have implemented 18 recently proposed algorithms in a common Java framework and compared them against two standard benchmark classifiers (and each other) by performing 100 resampling experiments on each of the 85 datasets. We use these results to test several hypotheses relating to whether the algorithms are significantly more accurate than the benchmarks and each other. Our results indicate that only nine of these algorithms are significantly more accurate than both benchmarks and that one classifier, the collective of transformation ensembles, is significantly more accurate than all of the others. All of our experiments and results are reproducible: we release all of our code, results and experimental details and we hope these experiments form the basis for more robust testing of new algorithms in the future.
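The resampling protocol is easy to sketch. The following uses scikit-learn and a stand-in dataset purely for illustration; the study itself was run in a common Java framework on the 85 time series datasets:

```python
# Sketch of the resampling protocol: repeatedly re-split a dataset into
# train/test and record accuracy, so classifiers can be compared on mean
# accuracy over (here) 100 resamples per dataset.
from sklearn.datasets import load_iris  # stand-in for a time series dataset
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
accs = []
for seed in range(100):  # one stratified resample per seed
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    accs.append(clf.score(X_te, y_te))
print("mean accuracy over 100 resamples:", sum(accs) / len(accs))
```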

1,070 citations

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This work uses a state-of-the-art big and deep neural network combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures that humans wearing colored gloves give to mobile robots.
Abstract: Automatic recognition of gestures using computer vision is important for many real-world applications such as sign language recognition and human-robot interaction (HRI). Our goal is a real-time hand gesture-based HRI interface for mobile robots. We use a state-of-the-art big and deep neural network (NN) combining convolution and max-pooling (MPCNN) for supervised feature learning and classification of hand gestures given by humans to mobile robots using colored gloves. The hand contour is retrieved by color segmentation, then smoothed by morphological image processing, which eliminates noisy edges. Our big and deep MPCNN classifies 6 gesture classes with 96% accuracy, nearly three times better than the nearest competitor. Experiments with mobile robots using an ARM 11 533 MHz processor achieve real-time gesture recognition performance.
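A minimal PyTorch sketch of a convolution plus max-pooling classifier for 6 gesture classes; the layer sizes and input resolution are illustrative, not the paper's architecture:

```python
# Convolution + max-pooling ("MPCNN"-style) classifier sketch for
# segmented hand images; all sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class MPCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        # 64x64 input -> conv5 -> pool2 -> conv5 -> pool2 gives 13x13 maps
        self.classifier = nn.Linear(32 * 13 * 13, num_classes)

    def forward(self, x):  # x: (batch, 1, 64, 64) hand contour images
        return self.classifier(self.features(x).flatten(1))

logits = MPCNN()(torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 6])
```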

555 citations

Journal ArticleDOI
TL;DR: The authors believe their ensemble is the first classifier ever to significantly outperform DTW, raising the bar for future work in this area, and demonstrate that the ensemble is more accurate than approaches not based in the time domain.
Abstract: Several alternative distance measures for comparing time series have recently been proposed and evaluated on time series classification (TSC) problems. These include variants of dynamic time warping (DTW), such as weighted and derivative DTW, and edit distance-based measures, including longest common subsequence, edit distance with real penalty, time warp with edit, and move-split-merge. These measures have the common characteristic that they operate in the time domain and compensate for potential localised misalignment through some elastic adjustment. Our aim is to experimentally test two hypotheses related to these distance measures. Firstly, we test whether there is any significant difference in accuracy for TSC problems between nearest neighbour classifiers using these distance measures. Secondly, we test whether combining these elastic distance measures through simple ensemble schemes gives significantly better accuracy. We test these hypotheses by carrying out one of the largest experimental studies ever conducted into time series classification. Our first key finding is that there is no significant difference between the elastic distance measures in terms of classification accuracy on our data sets. Our second finding, and the major contribution of this work, is to define an ensemble classifier that significantly outperforms the individual classifiers. We also demonstrate that the ensemble is more accurate than approaches not based in the time domain. Nearly all TSC papers in the data mining literature cite DTW (with warping window set through cross validation) as the benchmark for comparison. We believe that our ensemble is the first ever classifier to significantly outperform DTW and as such raises the bar for future work in this area.
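The ensemble idea can be sketched as a vote over 1-NN classifiers, one per elastic distance measure. The equal-weight majority vote below is a simplification; the paper's scheme also weights ensemble members (e.g., by cross-validated accuracy):

```python
# Simple ensemble of 1-NN classifiers, one per elastic distance measure.
from collections import Counter

def nn1(query, train, dist):
    """1-nearest-neighbor label under a given distance function;
    `train` is a list of (label, series) pairs."""
    return min(train, key=lambda ex: dist(query, ex[1]))[0]

def ensemble_predict(query, train, distances):
    """Majority vote over one 1-NN classifier per distance measure."""
    votes = [nn1(query, train, d) for d in distances]
    return Counter(votes).most_common(1)[0][0]

# Example with two toy distances standing in for DTW, ERP, MSM, etc.
manhattan = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
squared = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
train = [("up", [1, 2, 3]), ("down", [3, 2, 1])]
print(ensemble_predict([1, 2, 4], train, [manhattan, squared]))
```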

443 citations