YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition

doi:10.1109/ICCV.2013.337

Proceedings Article


Sergio Guadarrama¹, Niveda Krishnamoorthy², Girish Malkarnenkar², Subhashini Venugopalan², Raymond J. Mooney², Trevor Darrell¹, Kate Saenko³

University of California, Berkeley¹, University of Texas at Austin², University of Massachusetts Lowell³

01 Dec 2013, pp. 2712-2719

TL;DR: This paper presents a solution that takes a short video clip and outputs a brief sentence summarizing the main activity in the video (the actor, the action, and its object), and uses a Web-scale language model to "fill in" novel verbs.


Abstract: Despite a recent push towards large-scale object recognition, activity recognition remains limited to narrow domains and small vocabularies of actions. In this paper, we tackle the challenge of recognizing and describing activities "in the wild". We present a solution that takes a short video clip and outputs a brief sentence that sums up the main activity in the video, such as the actor, the action and its object. Unlike previous work, our approach works on out-of-domain actions: it does not require training videos of the exact activity. If it cannot find an accurate prediction from a pre-trained model, it finds a less specific answer that is also plausible from a pragmatic standpoint. We use semantic hierarchies learned from the data to help choose an appropriate level of generalization, and priors learned from Web-scale natural language corpora to penalize unlikely combinations of actors/actions/objects. We also use a Web-scale language model to "fill in" novel verbs, i.e., verbs that do not appear in the training set. We evaluate our method on a large YouTube corpus and demonstrate that it generates short sentence descriptions of video clips better than baseline approaches.
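The idea of backing off to "a less specific answer" can be illustrated with a toy example. The following Python sketch is not the paper's algorithm: the verb hierarchy, the classifier scores, the threshold, and the function names are all hypothetical, chosen only to show how a semantic hierarchy could be used to pick a level of generalization. It sums the probability mass of all leaf labels under each ancestor and returns the most specific node whose mass reaches the threshold.

```python
# Toy illustration only: hierarchy, scores, and threshold are assumptions,
# not values or procedures taken from the paper.

def ancestors(label, parent):
    """Return the chain [label, parent, grandparent, ...] up to the root."""
    chain = [label]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

def node_mass(node, leaf_probs, parent):
    """Sum the probabilities of all leaves that have `node` on their chain."""
    return sum(p for leaf, p in leaf_probs.items()
               if node in ancestors(leaf, parent))

def back_off(leaf_probs, parent, threshold=0.6):
    """Walk up from the top-scoring leaf to the first confident node."""
    best_leaf = max(leaf_probs, key=leaf_probs.get)
    chain = ancestors(best_leaf, parent)
    for node in chain:
        if node_mass(node, leaf_probs, parent) >= threshold:
            return node
    return chain[-1]  # nothing is confident enough: fall back to the root

# Hypothetical child -> parent verb hierarchy.
PARENT = {"slice": "cut", "chop": "cut", "cut": "prepare", "stir": "prepare"}

# Hypothetical classifier output: uncertain between "slice" and "chop".
scores = {"slice": 0.45, "chop": 0.40, "stir": 0.15}
print(back_off(scores, PARENT))
```

With these made-up scores, neither "slice" nor "chop" is confident on its own, but their shared parent "cut" accumulates enough mass to be returned. Raising the threshold pushes the answer further up the hierarchy, trading specificity for confidence.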
