
Papers by Jacob Eisenstein published in 2006


Proceedings ArticleDOI
30 Jul 2006
TL;DR: An empirical study is described showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures, and an automatic gesture classification system is presented based solely on an n-gram model of linguistic context.
Abstract: Classification of natural hand gestures is usually approached by applying pattern recognition to the movements of the hand. However, the gesture categories most frequently cited in the psychology literature are fundamentally multimodal; the definitions make reference to the surrounding linguistic context. We address the question of whether gestures are naturally multimodal, or whether they can be classified from hand-movement data alone. First, we describe an empirical study showing that the removal of auditory information significantly impairs the ability of human raters to classify gestures. Then we present an automatic gesture classification system based solely on an n-gram model of linguistic context; the system is intended to supplement a visual classifier, but achieves 66% accuracy on a three-class classification problem on its own. This represents higher accuracy than human raters achieve when presented with the same information.
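As a rough illustration of the kind of model the abstract describes (an n-gram model of the speech surrounding each gesture), the sketch below trains a bag-of-n-grams classifier over gesture-aligned transcript windows. The class labels, example data, and choice of multinomial Naive Bayes are assumptions for illustration, not the paper's exact system.

```python
# Minimal sketch of an n-gram-over-speech gesture classifier (assumed design,
# not the paper's exact system). Each example is the transcript window around
# one gesture; the labels below are hypothetical placeholder classes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical (surrounding speech, gesture class) training pairs.
train = [
    ("this piece goes right here on the corner", "class_a"),
    ("and then it spins around like this", "class_b"),
    ("um so basically yeah that is the idea", "class_c"),
]
texts, labels = zip(*train)

# Unigram and bigram counts of the surrounding speech, multinomial Naive Bayes on top.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["it goes over here next to that one"]))
```

In the paper's setting, such a linguistic-context model is intended to supplement a visual classifier rather than replace it.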

23 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: This work explores features of hand gesture that are correlated with coreference and concludes that combining these features with a traditional textual model yields a statistically significant improvement in overall performance.
Abstract: Coreference resolution, like many problems in natural language processing, has most often been explored using datasets of written text. While spontaneous spoken language poses well-known challenges, it also offers additional modalities that may help disambiguate some of the inherent disfluency. We explore features of hand gesture that are correlated with coreference. Combining these features with a traditional textual model yields a statistically significant improvement in overall performance.
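The following is a minimal sketch of how gestural features might be combined with a textual model in a pairwise coreference classifier. The specific features (head-word match, mention distance, hand-position distance), the toy data, and the logistic-regression model are illustrative assumptions rather than the paper's formulation.

```python
# Sketch of a pairwise coreference classifier whose feature vector concatenates
# textual and gestural cues. Features, data, and model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(m1, m2):
    """Textual + gestural features for a candidate coreferent mention pair."""
    return [
        float(m1["head"] == m2["head"]),                    # textual: head-word match
        float(abs(m1["index"] - m2["index"])),              # textual: mention distance
        float(np.linalg.norm(np.subtract(m1["hand_xy"],     # gestural: distance between
                                         m2["hand_xy"]))),  # hand positions during mentions
    ]

# Hypothetical mentions: head word, order of occurrence, and the tracked hand
# position (normalized image coordinates) while each mention was spoken.
mentions = [
    {"head": "piece", "index": 0, "hand_xy": (0.20, 0.35)},
    {"head": "it",    "index": 1, "hand_xy": (0.22, 0.33)},
    {"head": "wheel", "index": 2, "hand_xy": (0.80, 0.10)},
]
X = [pair_features(mentions[i], mentions[j])
     for i in range(len(mentions)) for j in range(i + 1, len(mentions))]
y = [1, 0, 0]  # hypothetical gold labels: only (piece, it) corefer

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X)[:, 1])  # coreference probability for each pair
```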

18 citations


Proceedings ArticleDOI
30 Jul 2006
TL;DR: This study describes how the presence of a display can affect the use of gesture and identifies common gesturing behaviors that, if accommodated, may improve the naturalness and usability of gestural user interfaces.
Abstract: Gesture plays a prominent role in human-human interaction, and it offers promise as a new modality for human-computer interaction. However, our understanding of gesture is still at an early stage. This study explores gesture in natural interaction and describes how the presence of a display can affect the use of gesture. We identify common gesturing behaviors that, if accommodated, may improve the naturalness and usability of gestural user interfaces.

16 citations


Proceedings ArticleDOI
22 Apr 2006
TL;DR: This study compares two computer-vision-based interaction techniques, motion sensing with EyeToy®-like feedback and object tracking, as they would be used in real-world applications with integrated user feedback, allowing interface designers to choose the technique that best suits the requirements of their particular application.
Abstract: Communication appliances, intended for home settings, require intuitive forms of interaction. Computer vision offers a potential solution, but is not yet sufficiently accurate. As interaction designers, we need to know more than the absolute accuracy of such techniques: we must also be able to compare how they will work in our design settings, especially if we allow users to collaborate in the interpretation of their actions. We conducted a 2x4 within-subjects experiment to compare two interaction techniques based on computer vision: motion sensing, with EyeToy®-like feedback, and object tracking. Both techniques were 100% accurate with 2 or 5 choices. With 21 choices, object tracking had significantly fewer errors and took less time for an accurate selection. Participants' subjective preferences were divided equally between the two techniques. This study compares these techniques as they would be used in real-world applications, with integrated user feedback, allowing interface designers to choose the one that best suits the specific user requirements for their particular application.
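For readers unfamiliar with the two techniques being compared, the sketch below implements the motion-sensing style of selection (EyeToy®-like): the camera image is split into one region per on-screen choice, and the region with the most frame-to-frame motion is selected. The region layout and thresholds are assumptions; this is not the software used in the study.

```python
# Rough sketch of motion-sensing selection: pick the choice region with the most
# frame-to-frame motion. Region layout and thresholds are assumptions.
import cv2
import numpy as np

N_CHOICES = 5           # number of selectable regions (assumed layout: vertical strips)
PIXEL_THRESHOLD = 25    # per-pixel difference threshold (assumed)
ACTIVATION = 500        # minimum number of moving pixels to trigger a selection (assumed)

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if not ok:
    raise RuntimeError("no camera available")
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    moving = cv2.absdiff(gray, prev) > PIXEL_THRESHOLD
    prev = gray

    # Count moving pixels in each vertical strip (one strip per choice).
    scores = [strip.sum() for strip in np.array_split(moving, N_CHOICES, axis=1)]
    if max(scores) > ACTIVATION:
        print("selected choice", int(np.argmax(scores)))

cap.release()
```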

15 citations


Book ChapterDOI
01 May 2006
TL;DR: A new, small-scale video corpus of spontaneous spoken-language dialogues is introduced, from which a set of gesture features is automatically derived, enabling a quantitative analysis of the relationship between gesture and semantics without explicitly formalizing semantics through an ontology.
Abstract: If gesture communicates semantics, as argued by many psychologists, then it should be relevant to bridging the gap between syntax and semantics in natural language processing. One benchmark problem for computational semantics is coreference resolution: determining whether two noun phrases refer to the same semantic entity. Focusing on coreference allows us to conduct a quantitative analysis of the relationship between gesture and semantics, without having to explicitly formalize semantics through an ontology. We introduce a new, small-scale video corpus of spontaneous spoken-language dialogues, from which we have used computer vision to automatically derive a set of gesture features. The relevance of these features to coreference resolution is then discussed. An analysis of the timing of these features also enables us to present new findings on gesture-speech synchronization.
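As one example of the kind of gesture feature that can be derived automatically from tracked hand positions and a time-aligned transcript, the sketch below tests whether the hand is held nearly still while a mention is spoken. The frame rate, speed threshold, and data are assumptions for illustration and are not necessarily among the paper's features.

```python
# Illustration of one automatically derivable gesture feature: whether the hand
# is held nearly still ("hold") during a spoken mention. Values are assumptions.
import numpy as np

FPS = 15           # assumed video frame rate
HOLD_SPEED = 0.01  # assumed speed threshold, in normalized image units per frame

def hand_speed(track):
    """Per-frame speed from an (n_frames, 2) array of hand positions."""
    return np.linalg.norm(np.diff(track, axis=0), axis=1)

def hold_during(track, start_s, end_s):
    """True if the hand stays below the speed threshold for the whole interval."""
    a, b = int(start_s * FPS), int(end_s * FPS)
    return bool(np.all(hand_speed(track[a:b + 1]) < HOLD_SPEED))

# Hypothetical hand track (10 seconds of slow drift) and one mention's timing.
track = np.cumsum(np.full((150, 2), 0.001), axis=0)
print(hold_during(track, start_s=2.0, end_s=3.0))  # True: hand nearly still
```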

5 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: This work argues that, because spontaneous spoken language is more disfluent and less structured than written text, identifying features from additional modalities may be critical for language understanding in face-to-face settings such as meetings, lectures, and presentations.
Abstract: Although the natural-language processing community has dedicated much of its focus to text, face-to-face spoken language is ubiquitous, and offers the potential for breakthrough applications in domains such as meetings, lectures, and presentations. Because spontaneous spoken language is typically more disfluent and less structured than written text, it may be critical to identify features from additional modalities that can aid in language understanding. However, due to the long-standing emphasis on text datasets, there has been relatively little work on nontextual features in unconstrained natural language (prosody being the most studied non-textual modality, e.g. (Shriberg et al., 2000)).