Showing papers by "Jacob Eisenstein" published in 2007


Proceedings Article
22 Jul 2007
TL;DR: The model predicts gesture salience as a hidden variable in a conditional framework, with observable features from both the visual and textual modalities; it significantly outperforms competitive baselines that do not use gesture information.
Abstract: Creating video recordings of events such as lectures or meetings is increasingly inexpensive and easy. However, reviewing the content of such video may be time-consuming and difficult. Our goal is to produce a "comic book" summary, in which a transcript is augmented with keyframes that disambiguate and clarify the accompanying text. Unlike most previous keyframe extraction systems, which rely primarily on visual cues, we present a linguistically-motivated approach that selects keyframes containing salient gestures. Rather than learning gesture salience directly, we estimate it by measuring the contribution of gesture to understanding other discourse phenomena. More specifically, we bootstrap from multimodal coreference resolution to identify gestures that improve performance. We then select keyframes that capture these gestures. Our model predicts gesture salience as a hidden variable in a conditional framework, with observable features from both the visual and textual modalities. This approach significantly outperforms competitive baselines that do not use gesture information.

10 citations
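
To illustrate the final keyframe-selection step described in the abstract above, here is a minimal sketch, not the paper's implementation: it assumes hypothetical per-frame gesture-salience scores (e.g. the model's posterior that the gesture at that frame is informative) and transcript segments aligned to frame spans, and simply picks the most salient frame inside each segment. The names `Segment` and `select_keyframes` are illustrative assumptions.

```python
# Minimal sketch of keyframe selection from per-frame gesture-salience scores.
# Not the paper's code; segment alignment and salience scores are assumed given.
from dataclasses import dataclass


@dataclass
class Segment:
    text: str          # transcript sentence to be illustrated
    start_frame: int   # first video frame aligned to this sentence
    end_frame: int     # last video frame aligned to this sentence (inclusive)


def select_keyframes(segments, salience):
    """Return (segment text, chosen frame index) pairs for a comic-book summary.

    `salience` maps frame index -> estimated gesture salience for that frame.
    For each transcript segment, the most salient frame in its span is chosen.
    """
    keyframes = []
    for seg in segments:
        frames = range(seg.start_frame, seg.end_frame + 1)
        best = max(frames, key=lambda f: salience[f])
        keyframes.append((seg.text, best))
    return keyframes


if __name__ == "__main__":
    # Toy example: two transcript segments over an eight-frame clip.
    salience = [0.1, 0.2, 0.9, 0.3, 0.05, 0.7, 0.4, 0.6]
    segments = [Segment("so you rotate this piece here", 0, 3),
                Segment("and then it locks into place", 4, 7)]
    print(select_keyframes(segments, salience))
```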


Proceedings Article
01 Jun 2007
TL;DR: Conditional modality fusion formalizes the intuition that gesture should be attended to only when it is informative, treating the informativeness of gesture as a hidden variable learned jointly with the class label; this improves semantic tasks such as coreference resolution.
Abstract: Non-verbal modalities such as gesture can improve processing of spontaneous spoken language. For example, similar hand gestures tend to predict semantic similarity, so features that quantify gestural similarity can improve semantic tasks such as coreference resolution. However, not all hand movements are informative gestures; psychological research has shown that speakers are more likely to gesture meaningfully when their speech is ambiguous. Ideally, one would attend to gesture only in such circumstances, and ignore other hand movements. We present conditional modality fusion, which formalizes this intuition by treating the informativeness of gesture as a hidden variable to be learned jointly with the class label. Applied to coreference resolution, conditional modality fusion significantly outperforms both early and late modality fusion, which are current techniques for modality combination.

10 citations
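
The abstract above describes conditional modality fusion as a latent-variable conditional model. The following is a minimal sketch under stated assumptions, not the authors' implementation: a log-linear model over a binary label y and a hidden binary variable h ("gesture is informative"), in which gestural features vote on the label only when h = 1, separate cue features vote on h itself, and h is marginalized out at prediction time. Feature groups, dimensionalities, and the exact parameterization are illustrative assumptions.

```python
# Sketch of conditional modality fusion as a latent-variable log-linear model.
# Not the authors' code: the parameterization below is one plausible instance.
import numpy as np


def scores(x_verbal, x_gesture, x_cue, w):
    """Compatibility score psi(y, h | x) for every (y, h) pair.

    Verbal features always vote on the label; gestural features vote only
    when h = 1; the cue features (signals that the speaker is gesturing
    meaningfully) vote on the hidden variable h itself.
    """
    psi = np.empty((2, 2))  # indexed [y, h]
    for y in (0, 1):
        sign_y = 1.0 if y == 1 else -1.0
        for h in (0, 1):
            sign_h = 1.0 if h == 1 else -1.0
            psi[y, h] = (
                sign_y * (w["verbal"] @ x_verbal)
                + sign_y * h * (w["gesture"] @ x_gesture)
                + sign_h * (w["cue"] @ x_cue)
            )
    return psi


def predict_proba(x_verbal, x_gesture, x_cue, w):
    """P(y | x), marginalizing out the hidden informativeness variable h."""
    psi = scores(x_verbal, x_gesture, x_cue, w)
    joint = np.exp(psi - psi.max())          # unnormalized P(y, h | x)
    return joint.sum(axis=1) / joint.sum()   # sum over h, then normalize


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = {"verbal": rng.normal(size=4),
         "gesture": rng.normal(size=3),
         "cue": rng.normal(size=2)}
    p = predict_proba(rng.normal(size=4), rng.normal(size=3),
                      rng.normal(size=2), w)
    print("P(coreferent) =", p[1])
```

In a full system the weights would be learned by maximizing the conditional likelihood of the labels, with gradients computed through the marginalization over h; the random weights here only demonstrate the inference path.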