scispace - formally typeset
Journal ArticleDOI

Multimodal Video Indexing: A Review of the State-of-the-art

Reads0
Chats0
TLDR
A unifying and multimodal framework is put forward, which views a video document from the perspective of its author, which forms the guiding principle for identifying index types, for which automatic methods are found in literature.
Abstract
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time and resource consuming process. Good reviews on single modality based video indexing have appeared in literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in collaborative fashion. Therefore, instead of separately treating the different information sources involved, and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying and multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in literature. It furthermore forms the basis for categorizing these different methods.

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

Image retrieval: Ideas, influences, and trends of the new age

TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, and the spawning of related subfields are discussed, to discuss the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.
Journal ArticleDOI

Multimodal Machine Learning: A Survey and Taxonomy

TL;DR: This paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
Proceedings ArticleDOI

A new approach to cross-modal multimedia retrieval

TL;DR: It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy and are shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
Journal ArticleDOI

Multimodal fusion for multimedia analysis: a survey

TL;DR: This survey aims at providing multimedia researchers with a state-of-the-art overview of fusion strategies, which are used for combining multiple modalities in order to accomplish various multimedia analysis tasks.
Journal ArticleDOI

A Survey on Visual Content-Based Video Indexing and Retrieval

TL;DR: Methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, and video retrieval including query interfaces are analyzed.
References
More filters
Journal ArticleDOI

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Book

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

TL;DR: Probabilistic Reasoning in Intelligent Systems as mentioned in this paper is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Journal ArticleDOI

Eigenfaces vs. Fisherfaces: recognition using class specific linear projection

TL;DR: A face recognition algorithm which is insensitive to large variation in lighting direction and facial expression is developed, based on Fisher's linear discriminant and produces well separated classes in a low-dimensional subspace, even under severe variations in lighting and facial expressions.
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Related Papers (5)