Journal Article

A Survey on Visual Content-Based Video Indexing and Retrieval

01 Nov 2011 - Vol. 41, Iss. 6, pp. 797-819
TL;DR: This survey analyzes methods for video structure analysis (shot boundary detection, key frame extraction, and scene segmentation), feature extraction (static key frame features, object features, and motion features), video data mining, video annotation, and video retrieval, including query interfaces.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.
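As an illustration of the shot boundary detection step named in the abstract, the following minimal Python sketch declares a cut wherever the gray-level histograms of consecutive frames differ sharply. Histogram differencing is only one family of approaches, and it is not claimed here to be the method the survey recommends; the function names, the 16-bin histogram, and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]  # 2-D grid of gray levels in 0..255


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return the normalized gray-level histogram of one frame."""
    hist = [0.0] * bins
    count = 0
    for row in frame:
        for value in row:
            hist[min(value * bins // 256, bins - 1)] += 1
            count += 1
    return [h / count for h in hist] if count else hist


def detect_cuts(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return every index i where a cut is declared between frame i-1 and frame i."""
    hists = [gray_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # Halved L1 distance between normalized histograms, so it lies in [0, 1].
        distance = 0.5 * sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if distance > threshold:  # threshold is an assumed, hand-picked value
            cuts.append(i)
    return cuts


# Toy input: three dark frames followed by two bright frames.
dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
frames = [dark, dark, dark, bright, bright]
print(detect_cuts(frames))  # prints [3]
```

A fixed global threshold like this handles abrupt cuts only; gradual transitions such as fades and dissolves would need a different criterion.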
Citations
Journal Article
TL;DR: The need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats is highlighted and the need to devise new tools for predictive analytics for structured big data is reinforced.
Abstract: We define what is meant by big data. We review analytics techniques for text, audio, video, and social media data. We make the case for new statistical techniques for big data. We highlight the expected future developments in big data analytics. Size is the first, and at times, the only dimension that leaps out at the mention of big data. This paper attempts to offer a broader definition of big data that captures its other unique and defining characteristics. The rapid evolution and adoption of big data by industry has leapfrogged the discourse to popular outlets, forcing the academic press to catch up. Academic journals in numerous disciplines, which will benefit from a relevant discussion of big data, have yet to cover the topic. This paper presents a consolidated description of big data by integrating definitions from practitioners and academics. The paper's primary focus is on the analytic methods used for big data. A particular distinguishing feature of this paper is its focus on analytics related to unstructured data, which constitute 95% of big data. This paper highlights the need to develop appropriate and efficient analytical methods to leverage massive volumes of heterogeneous data in unstructured text, audio, and video formats. This paper also reinforces the need to devise new tools for predictive analytics for structured big data. The statistical methods in practice were devised to infer from sample data. The heterogeneity, noise, and massive size of structured big data call for developing computationally efficient algorithms that may avoid big data pitfalls, such as spurious correlation.

2,962 citations

Journal Article
TL;DR: This paper reviews the background and state of the art of big data, related technologies, and representative applications, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide a comprehensive overview and big picture to readers of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations


Cites methods from "A Survey on Visual Content-Based Vi..."

  • ...Generally, multimedia indexing and retrieval include five procedures: structural analysis, feature extraction, data mining, classification and annotation, query and retrieval [131]....

    [...]

Journal Article
TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.
Abstract: Recent technological advancements have led to a deluge of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The term big data was coined to capture the meaning of this emerging trend. In addition to its sheer volume, big data also exhibits other unique characteristics as compared with traditional data. For instance, big data is commonly unstructured and requires more real-time analysis. This development calls for new system architectures for data acquisition, transmission, storage, and large-scale data processing mechanisms. In this paper, we present a literature survey and system tutorial for big data analytics platforms, aiming to provide an overall picture for nonexpert readers and instill a do-it-yourself spirit for advanced audiences to customize their own big-data solutions. First, we present the definition of big data and discuss big data challenges. Next, we present a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics. These four modules form a big data value chain. Following that, we present a detailed survey of numerous approaches and mechanisms from research and industry communities. In addition, we present the prevalent Hadoop framework for addressing big data challenges. Finally, we outline several evaluation benchmarks and potential research directions for big data systems.

1,002 citations


Cites background from "A Survey on Visual Content-Based Vi..."

  • ...Multimedia indexing and retrieval concerns the description, storage, and organization of multimedia information to help people find multimedia resources conveniently and quickly [218]....

    [...]

Journal Article
TL;DR: This paper surveys optical flow estimation, classifying the main principles elaborated during the field's evolution, with particular attention to recent developments.
Abstract: We propose a survey of optical flow estimation focusing on recent developments. We adopt a classification approach organizing methods in a comprehensive framework. The paper is conceived as a tutorial introducing and explaining the main concepts. Optical flow estimation is one of the oldest and still most active research domains in computer vision. In 35 years, many methodological concepts have been introduced and have progressively improved performances, while opening the way to new challenges. In the last decade, the growing interest in evaluation benchmarks has stimulated a great amount of work. In this paper, we propose a survey of optical flow estimation classifying the main principles elaborated during this evolution, with a particular concern given to recent developments. It is conceived as a tutorial organizing in a comprehensive framework current approaches and practices. We give insights on the motivations, interests and limitations of modeling and optimization techniques, and we highlight similarities between methods to allow for a clear understanding of their behavior.

368 citations


Cites background from "A Survey on Visual Content-Based Vi..."

  • ...Optical flow can also be a determinant feature in video indexing and retrieval [123,202,230]....

    [...]

Journal Article
TL;DR: This paper proposes PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks, which not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space.
Abstract: Compared with person reidentification, which has attracted concentrated attention, vehicle reidentification is an important yet frontier problem in video surveillance and has been neglected by the multimedia and vision communities. Since most existing approaches mainly consider the general vehicle appearance for reidentification while overlooking the distinct vehicle identifier, such as the license plate number, they attain suboptimal performance. In this paper, we propose PROVID, a PROgressive Vehicle re-IDentification framework based on deep neural networks. In particular, our framework not only utilizes the multimodality data in large-scale video surveillance, such as visual features, license plates, camera locations, and contextual information, but also considers vehicle reidentification in two progressive procedures: coarse-to-fine search in the feature domain, and near-to-distant search in the physical space. Furthermore, to evaluate our progressive search framework and facilitate related research, we construct the VeRi dataset, which is the most comprehensive dataset from real-world surveillance videos. It not only provides large numbers of vehicles with varied labels and sufficient cross-camera recurrences but also contains license plate numbers and contextual information. Extensive experiments on the VeRi dataset demonstrate both the accuracy and efficiency of our progressive vehicle reidentification framework.

339 citations


Additional excerpts

  • ...image retrieval [8], [9], content-based video search [10], and object instance search [11], vehicle Re-Id is to find the vehi-...

    [...]

References
Proceedings Article
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.

4,453 citations

Proceedings Article
23 Jun 2008
TL;DR: A new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs, is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Abstract: The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.

3,833 citations

Journal Article
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations

Journal Article
TL;DR: The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection.
Abstract: We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game; ii) all goals in a game; iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g., goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured in different countries and under different conditions.

943 citations

Journal Article
TL;DR: The purpose of this article is to provide a systematic classification of various ideas and techniques proposed towards the effective abstraction of video contents, and identify and detail, for each approach, the underlying components and how they are addressed in specific works.
Abstract: The demand for various multimedia applications is rapidly increasing due to the recent advance in the computing and network infrastructure, together with the widespread use of digital video technology. Among the key elements for the success of these applications is how to effectively and efficiently manage and store a huge amount of audio visual information, while at the same time providing user-friendly access to the stored data. This has fueled a quickly evolving research area known as video abstraction. As the name implies, video abstraction is a mechanism for generating a short summary of a video, which can either be a sequence of stationary images (keyframes) or moving images (video skims). In terms of browsing and navigation, a good video abstract will enable the user to gain maximum information about the target video sequence in a specified time constraint or sufficient information in the minimum time. Over past years, various ideas and techniques have been proposed towards the effective abstraction of video contents. The purpose of this article is to provide a systematic classification of these works. We identify and detail, for each approach, the underlying components and how they are addressed in specific works.

879 citations


"A Survey on Visual Content-Based Vi..." refers background or methods in this paper

  • ...In general, a scene is a group of contiguous shots that are coherent with a certain subject or theme....

    [...]

  • ...The static key frame features useful for video indexing and retrieval are mainly classified as color-based, texture-based, and shape-based....

    [...]

  • ...Then in each block, color histogram and color moments are extracted for video retrieval....

    [...]
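
The block-wise extraction described in the last excerpt can be sketched as follows. This is an illustration under stated assumptions, not the cited paper's implementation: the RGB tuple representation, the grid size, the 2-bins-per-channel histogram, the use of mean, standard deviation, and cube-rooted third central moment as the three color moments, and all function names are hypothetical choices made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]       # (R, G, B), each in 0..255
Image = Sequence[Sequence[Pixel]]  # rows of pixels


def split_blocks(image: Image, rows: int, cols: int) -> List[List[Pixel]]:
    """Partition the image into a rows x cols grid and return each block's pixels.

    Assumes the image has at least `rows` pixel rows and `cols` pixel columns.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            blocks.append([image[y][x] for y in range(y0, y1) for x in range(x0, x1)])
    return blocks


def color_moments(pixels: List[Pixel], channel: int) -> Tuple[float, float, float]:
    """Mean, standard deviation, and signed cube root of the third central moment."""
    values = [p[channel] for p in pixels]
    n = len(values)
    mean = sum(values) / n
    second = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return mean, math.sqrt(second), skew


def color_histogram(pixels: List[Pixel], bins: int = 2) -> List[float]:
    """Normalized joint RGB histogram with `bins` bins per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        rb, gb, bb = (v * bins // 256 for v in (r, g, b))
        hist[(rb * bins + gb) * bins + bb] += 1
    return [h / len(pixels) for h in hist]


def block_features(image: Image, rows: int = 1, cols: int = 2) -> List[Dict[str, object]]:
    """Compute a histogram and per-channel moments for every block of the grid."""
    return [
        {
            "histogram": color_histogram(block),
            "moments": [color_moments(block, ch) for ch in range(3)],
        }
        for block in split_blocks(image, rows, cols)
    ]


# Toy key frame: left half pure red, right half pure blue.
red, blue = (255, 0, 0), (0, 0, 255)
image = [[red, red, blue, blue] for _ in range(4)]
features = block_features(image)
```

Concatenating the per-block vectors keeps coarse spatial layout, which a single whole-frame histogram discards; here the two blocks yield clearly different features even though the frame is half red and half blue overall.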