Proceedings ArticleDOI

Video genre classification using dynamics

07 May 2001 - Vol. 3, pp 1557-1560
TL;DR: Two methods of extracting motion from a video sequence are presented: foreground object motion and background camera motion. These dynamics are extracted, processed and applied to classify three broad classes: sports, cartoons and news.
Abstract: The problem addressed here is the classification of videos at the highest level into pre-defined genre. The approach adopted is based on the dynamic content of short sequences (~30 secs). This paper presents two methods of extracting motion from a video sequence: foreground object motion and background camera motion. These dynamics are extracted, processed and applied to classify 3 broad classes: sports, cartoons and news. Experimental results for this 3 class problem give error rates of 17%, 8% and 6% for camera motion, object motion and both combined respectively, on ~30 second sequences.
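As a rough illustration of the combined-dynamics idea described above, the sketch below concatenates a camera-motion feature vector and an object-motion feature vector per clip and trains a standard classifier on the three genres. The feature layout, the SVM choice and all data are placeholders for illustration, not the authors' actual pipeline.

```python
# Sketch: combining camera-motion and object-motion features for 3-genre
# classification (sports / cartoons / news). Features and classifier choice
# are illustrative placeholders, not the method from the paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
GENRES = ["sports", "cartoons", "news"]

def fake_clip_features(genre: str) -> np.ndarray:
    """Stand-in for features extracted from a ~30 s clip."""
    camera = rng.normal(loc=GENRES.index(genre), scale=1.0, size=8)   # e.g. pan/tilt/zoom stats
    objects = rng.normal(loc=GENRES.index(genre), scale=1.5, size=8)  # e.g. foreground motion stats
    return np.concatenate([camera, objects])  # combined dynamics -> one vector per clip

X = np.array([fake_clip_features(g) for g in GENRES for _ in range(100)])
y = np.array([g for g in GENRES for _ in range(100)])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("error rate: %.2f" % (1.0 - clf.score(X_te, y_te)))
```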
Citations
Journal ArticleDOI
01 Nov 2011
TL;DR: Methods for video structure analysis (shot boundary detection, key frame extraction and scene segmentation), feature extraction (static key frame features, object features and motion features), video data mining, video annotation, and video retrieval including query interfaces are analyzed.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.
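One of the building blocks listed above, shot boundary detection, is commonly done by thresholding frame-to-frame colour-histogram differences, with a key frame then picked from each shot. The snippet below is a minimal, generic sketch of that idea on synthetic frames; it is not taken from any particular system in the survey.

```python
# Sketch: hard-cut detection via histogram differences, then key frame
# selection (middle frame of each shot). Generic illustration only.
import numpy as np

def grey_histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / hist.sum()

def detect_cuts(frames, threshold: float = 0.5):
    """Return indices where the histogram difference exceeds the threshold."""
    cuts = []
    prev = grey_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = grey_histogram(frames[i])
        if np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts

def key_frames(frames, cuts):
    """Middle frame index of every shot delimited by the detected cuts."""
    bounds = [0] + cuts + [len(frames)]
    return [(a + b) // 2 for a, b in zip(bounds[:-1], bounds[1:])]

# Two synthetic "shots": dark frames followed by bright frames.
frames = [np.full((64, 64), 40, np.uint8)] * 30 + [np.full((64, 64), 200, np.uint8)] * 30
cuts = detect_cuts(frames)
print("cuts at:", cuts, "key frames:", key_frames(frames, cuts))
```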

606 citations

Journal ArticleDOI
01 May 2008
TL;DR: This paper surveys the video classification literature and finds that features are drawn from three modalities - text, audio, and visual - and that a large variety of combinations of features and classification have been explored.
Abstract: There is much video available today. To help viewers find video of interest, work has begun on methods of automatic video classification. In this paper, we survey the video classification literature. We find that features are drawn from three modalities - text, audio, and visual - and that a large variety of combinations of features and classification have been explored. We describe the general features chosen and summarize the research in this area. We conclude with ideas for further research.

329 citations


Additional excerpts

  • ...[76] perform classification using only dynamics,...


Proceedings ArticleDOI
23 Oct 2017
TL;DR: This paper proposes an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance and demonstrates the effectiveness of the model by analyzing the refined attention weights during the question answering procedure.
Abstract: Recently image question answering (ImageQA) has gained lots of attention in the research community. However, as its natural extension, video question answering (VideoQA) is less explored. Although both tasks look similar, VideoQA is more challenging mainly because of the complexity and diversity of videos. As such, simply extending the ImageQA methods to videos is insufficient and suboptimal. Particularly, working with the video needs to model its inherent temporal structure and analyze the diverse information it contains. In this paper, we consider exploiting the appearance and motion information residing in the video with a novel attention mechanism. More specifically, we propose an end-to-end model which gradually refines its attention over the appearance and motion features of the video using the question as guidance. The question is processed word by word until the model generates the final optimized attention. The weighted representation of the video, as well as other contextual information, is used to generate the answer. Extensive experiments show the advantages of our model compared to other baseline models. We also demonstrate the effectiveness of our model by analyzing the refined attention weights during the question answering procedure.
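The core idea, attention over per-frame appearance and motion features that is refined as the question is read word by word, can be sketched very roughly as below. The dimensions, update rule and toy data are assumptions for illustration; the model in the paper learns these interactions end to end.

```python
# Rough sketch: question-guided attention over per-frame video features,
# refined word by word. Toy, fixed-weight version of the idea only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
T, D = 20, 16                            # frames, feature dimension
appearance = rng.normal(size=(T, D))     # per-frame appearance features (e.g. pooled CNN)
motion = rng.normal(size=(T, D))         # per-frame motion features (e.g. flow encoding)
question_words = rng.normal(size=(5, D)) # word embeddings of the question

state = np.zeros(D)                      # running question/context state
for word in question_words:              # refine attention as each word is read
    state = 0.5 * state + 0.5 * word
    att_app = softmax(appearance @ state)   # attention over the appearance stream
    att_mot = softmax(motion @ state)       # attention over the motion stream

video_repr = att_app @ appearance + att_mot @ motion  # weighted video representation
print("attended video representation:", video_repr.shape)
```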

292 citations


Cites methods from "Video genre classification using dynamics"

  • ...To measure the understanding ability of models on videos, different intermediate tasks are proposed such as video classification [12, 19, 35, 41] and action recognition [15, 20, 22, 24, 46]....


Patent
01 Oct 2009
TL;DR: In this patent, a computer-implemented method is proposed to determine a social network graph for at least a portion of the social network, the graph including a plurality of nodes connected by links, each node corresponding to a user that has a profile page on the social network.
Abstract: In one implementation, a computer-implemented method includes receiving information related to users of a social network site, and determining a social network graph for at least a portion of the social network, the graph including a plurality of nodes connected by links, each node corresponding to a user that has a profile page on the social network. The method can also include identifying first nodes from the plurality of nodes as including content associated with a particular subject of interest, and seeding the identified first nodes with first scores. The method can additionally include determining second scores for second nodes based on propagation of the first scores from the first nodes to the second nodes using the links of the social network graph; and providing the determined second scores for the second nodes.
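The seeding-and-propagation step described in the abstract can be illustrated with a small label-propagation-style loop: a few nodes are seeded with scores and those scores spread along the links for a fixed number of rounds. The graph, damping value and iteration count below are placeholders, not the procedure claimed in the patent.

```python
# Sketch: seed scores on a few nodes of a toy graph and propagate them along
# links for a fixed number of rounds. Illustrative values only.
adjacency = {                      # toy social graph: node -> linked nodes
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}
scores = {node: 0.0 for node in adjacency}
scores["alice"] = 1.0              # seed: 'alice' posts about the subject of interest

DAMPING, ROUNDS = 0.5, 3
for _ in range(ROUNDS):
    new_scores = {}
    for node, neighbours in adjacency.items():
        received = sum(scores[n] / len(adjacency[n]) for n in neighbours)
        new_scores[node] = scores[node] + DAMPING * received
    scores = new_scores

print({node: round(s, 3) for node, s in scores.items()})
```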

196 citations

Patent
01 May 2007

122 citations

References
Journal ArticleDOI
Shih-Fu Chang, William Chen, Horace J. Meng, Hari Sundaram, Di Zhong
TL;DR: The resulting system, called VideoQ, is the first on-line video search engine supporting automatic object-based indexing and spatiotemporal queries, and performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease.
Abstract: The rapidity with which digital information, particularly video, is being generated has necessitated the development of tools for efficient search of these media. Content-based visual queries have been primarily focused on still image retrieval. In this paper, we propose a novel, interactive system on the Web, based on the visual paradigm, with spatiotemporal attributes playing a key role in video retrieval. We have developed innovative algorithms for automated video object segmentation and tracking, and use real-time video editing techniques while responding to user queries. The resulting system, called VideoQ , is the first on-line video search engine supporting automatic object-based indexing and spatiotemporal queries. The system performs well, with the user being able to retrieve complex video clips such as those of skiers and baseball players with ease.
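A very reduced sketch of what an object-based spatiotemporal query can look like is given below: each stored object is represented by a motion trajectory, and a sketched query trajectory is matched by trajectory distance. The representation and distance measure are assumptions for illustration, not VideoQ's actual indexing or matching scheme.

```python
# Sketch: matching a sketched query trajectory against stored object
# trajectories by mean point-wise distance after resampling.
import numpy as np

def resample(traj: np.ndarray, n: int = 16) -> np.ndarray:
    """Resample an (m, 2) trajectory to n evenly spaced points."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n)
    return np.stack([np.interp(t_new, t_old, traj[:, d]) for d in range(2)], axis=1)

def trajectory_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(resample(a) - resample(b), axis=1).mean())

# Stored object trajectories (e.g. a skier moving diagonally, a ball arcing).
database = {
    "skier": np.column_stack([np.linspace(0, 1, 30), np.linspace(1, 0, 30)]),
    "ball": np.column_stack([np.linspace(0, 1, 30), np.sin(np.linspace(0, np.pi, 30))]),
}
query = np.column_stack([np.linspace(0, 1, 10), np.linspace(1, 0.1, 10)])  # user sketch

best = min(database, key=lambda name: trajectory_distance(query, database[name]))
print("best match:", best)
```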

431 citations


"Video genre classification using dy..." refers methods in this paper

  • ...An automatic approach by Chang et al supporting spatio-temporal queries is presented in [4]....


Dissertation
01 Aug 1992

287 citations


"Video genre classification using dy..." refers methods in this paper

  • ...The feature vectors are then used to train a Gaussian Mixture Model (GMM) based classifier [14]....

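The excerpt above mentions training a Gaussian Mixture Model (GMM) based classifier on the extracted feature vectors. A minimal per-class GMM classifier along those lines could look like the sketch below; the feature vectors, dimensions and component counts are placeholders, and the use of scikit-learn is an assumption for illustration only.

```python
# Sketch: GMM-based classification -- fit one Gaussian mixture per genre on
# that genre's feature vectors, then label a new clip by the mixture giving
# the highest log-likelihood. Data and component counts are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = {  # genre -> (n_clips, n_features) dynamic feature vectors
    "sports": rng.normal(3.0, 1.0, size=(200, 6)),
    "cartoons": rng.normal(0.0, 1.0, size=(200, 6)),
    "news": rng.normal(-3.0, 1.0, size=(200, 6)),
}

models = {
    genre: GaussianMixture(n_components=4, random_state=0).fit(X)
    for genre, X in train.items()
}

def classify(features: np.ndarray) -> str:
    """Pick the genre whose mixture assigns the clip the highest likelihood."""
    x = features.reshape(1, -1)
    return max(models, key=lambda g: models[g].score_samples(x)[0])

print(classify(rng.normal(3.0, 1.0, size=6)))   # expected: sports
```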

Proceedings ArticleDOI
01 Jan 1995
TL;DR: The goal is to automatically classify the large body of existing video for easier access in digital video-on-demand databases.
Abstract: Film genres in digital video can be detected automatically. In a three-step approach we analyze first the syntactic properties of digital films: color statistics, cut detection, camera motion, object motion and audio. In a second step we use these statistics to derive at a more abstract level film style attributes such as camera panning and zooming, speech and music. These are distinguishing properties for film genres, e.g. newscasts vs. sports vs. commercials. In the third and final step we map the detected style attributes to film genres. Algorithms for the three steps are presented in detail, and we report on initial experience with real videos. It is our goal to automatically classify the large body of existing video for easier access in digital video-on-demand databases.
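The three-step structure described above (low-level statistics, then abstract style attributes, then a genre decision) can be mirrored in a few lines of code. The thresholds, attribute names and genre rules below are invented for illustration and are not the values used in the cited work.

```python
# Sketch of the three-step idea: low-level clip statistics -> abstract style
# attributes -> genre label. All thresholds and rules are invented examples.
def style_attributes(stats: dict) -> set:
    """Step 2: map raw statistics to abstract style attributes."""
    attrs = set()
    if stats["cuts_per_minute"] > 20:
        attrs.add("fast cutting")
    if stats["camera_pan_ratio"] > 0.3:
        attrs.add("frequent panning")
    if stats["speech_ratio"] > 0.6:
        attrs.add("speech dominant")
    if stats["music_ratio"] > 0.6:
        attrs.add("music dominant")
    return attrs

def genre(attrs: set) -> str:
    """Step 3: map style attributes to a genre."""
    if "speech dominant" in attrs and "fast cutting" not in attrs:
        return "newscast"
    if "frequent panning" in attrs and "fast cutting" in attrs:
        return "sports"
    if "music dominant" in attrs:
        return "commercial"
    return "unknown"

# Step 1 (measuring the statistics from the video) is assumed to have happened.
clip_stats = {"cuts_per_minute": 4, "camera_pan_ratio": 0.1,
              "speech_ratio": 0.8, "music_ratio": 0.1}
print(genre(style_attributes(clip_stats)))   # -> newscast
```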

191 citations


"Video genre classification using dy..." refers background or methods in this paper

  • ...Although this third form of dynamic has been reported in [6, 7] to have discriminatory properties within the application to video classification, it has not been considered here although it is the subject of further work....


  • ...A good example of this is presented by Fischer et al [7] using image based and audio based features applied to video genre classification....


  • ...The results show that the dynamic feature extraction methods reported have good discriminatory properties and justify being part of an overall classification system possibly including static and audio features, such as in [6, 7]....


Journal ArticleDOI
TL;DR: To assist human analysis of video data, a technique has been developed to perform automatic, content-based video indexing from object motion to analyse the semantic content of the video.

188 citations


"Video genre classification using dy..." refers background in this paper

  • ...Event classifiers usually have complicated low-level motion measures, as described by Courtney [8]....


  • ...[8] J. D. Courtney, “Automatic video indexing via object motion analysis,” Pattern Recognition, vol. 30, no. 4, pp. 607–625, 1997....


Proceedings ArticleDOI
01 Jan 2000
TL;DR: This research goes beyond the existing work with a systematic analysis of trends exhibited by each of the features in genres such as cartoons, commercials, music, news, and sports, and it enables an understanding of the similarities, dissimilarities, and also likely confusion between genres.
Abstract: Presents a set of computational features originating from our study of editing effects, motion, and color used in videos, for the task of automatic video categorization. These features, besides representing human understanding of typical attributes of different video genres, are also inspired by the techniques and rules used by many directors to endow specific characteristics to a genre-program which lead to certain emotional impact on viewers. We propose new features whilst also employing traditionally used ones for classification. This research goes beyond the existing work with a systematic analysis of trends exhibited by each of our features in genres such as cartoons, commercials, music, news, and sports, and it enables an understanding of the similarities, dissimilarities, and also likely confusion between genres. Classification results from our experiments on several hours of video establish the usefulness of this feature set. We also explore the issue of video clip duration required to achieve reliable genre identification and demonstrate its impact on classification accuracy.
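The final point, how much video is needed for reliable genre identification, suggests a simple experiment: average per-second features over windows of increasing duration and observe how a basic classifier's accuracy changes. The sketch below does this on synthetic data; its numbers say nothing about the cited study's actual findings.

```python
# Sketch: how clip duration can affect genre classification accuracy.
# Per-second features are averaged over windows of different lengths and fed
# to a nearest-class-mean classifier. Synthetic data, illustrative only.
import numpy as np

rng = np.random.default_rng(0)
GENRES = ["cartoons", "commercials", "music", "news", "sports"]

def clip_feature(genre: str, seconds: int) -> np.ndarray:
    """Average noisy per-second features over a clip of the given duration."""
    per_second = rng.normal(loc=GENRES.index(genre), scale=3.0, size=(seconds, 4))
    return per_second.mean(axis=0)   # longer clips -> less noisy average

class_means = {g: np.full(4, GENRES.index(g), dtype=float) for g in GENRES}

def classify(x: np.ndarray) -> str:
    return min(class_means, key=lambda g: np.linalg.norm(x - class_means[g]))

for seconds in (5, 15, 30, 60):
    trials = [(g, classify(clip_feature(g, seconds))) for g in GENRES for _ in range(200)]
    accuracy = sum(truth == pred for truth, pred in trials) / len(trials)
    print(f"{seconds:>2d} s clips: accuracy {accuracy:.2f}")
```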

156 citations


"Video genre classification using dy..." refers background or methods or result in this paper

  • ...Although this third form of dynamic has been reported in [6, 7] to have discriminatory properties within the application to video classification, it has not been considered here although it is the subject of further work....


  • ...An approach that includes all these categories plus music videos is presented by Dorai et al in [6]....


  • ...Here we investigate the performance of a purely dynamics-based approach [10, 11] applied to video classification, as in the works of [9], and comparable to [6]....


  • ...The results show that the dynamic feature extraction methods reported have good discriminatory properties and justify being part of an overall classification system possibly including static and audio features, such as in [6, 7]....


  • ...This performance is comparable with that of Dorai et al [6] who report approximately 90% accuracy using both static and dynamic features to classify sport, cartoon, news, commercials and music videos, obtaining the best results on 60 seconds of video on a 5 class problem....
