scispace - formally typeset
Journal ArticleDOI

Content based video retrieval systems

TL;DR: This survey reviews the interesting features that can be extracted from video data for indexing and retrieval along with similarity measurement methods, and identifies present research issues in the area of content based video retrieval systems.
Abstract: With the development of multimedia data types and available bandwidth, there is a huge demand for video retrieval systems, as users shift from text based retrieval systems to content based retrieval systems. The selection of extracted features plays an important role in content based video retrieval, regardless of the video attributes under consideration. These features are intended for selecting, indexing and ranking according to their potential interest to the user. Good feature selection also allows the time and space costs of the retrieval process to be reduced. This survey reviews the interesting features that can be extracted from video data for indexing and retrieval, along with similarity measurement methods. We also identify present research issues in the area of content based video retrieval systems.


Citations
Journal ArticleDOI
TL;DR: Evaluating the performance of the KNN using a large number of distance measures, tested on a number of real-world datasets, with and without adding different levels of noise, found that a recently proposed non-convex distance performed the best on most datasets compared with the other tested distances.
Abstract: The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested examples and the training examples. This raises a major question: which of the many available distance and similarity measures should be used for the KNN classifier? This review attempts to answer this question by evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real-world datasets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best on most datasets compared to the other tested distances. In addition, the performance of the KNN with this top-performing distance degraded by only about $20\%$ as the noise level reached $90\%$; this held for most of the other distances as well. This means that the KNN classifier using any of the top $10$ distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
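The evaluation above hinges on KNN's pluggable distance function. As a minimal sketch (toy data, pure Python, majority vote over the k nearest neighbors; all names and values are illustrative, not from the paper), the classifier can be parameterized by any distance measure:

```python
from collections import Counter

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train, labels, query, k=3, dist=euclidean):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(range(len(train)), key=lambda i: dist(train[i], query))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data: two small clusters
train = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.1, 0.9)]
labels = ["a", "a", "b", "b"]
print(knn_predict(train, labels, (1.0, 1.0), k=3))  # b
```

Swapping `dist=manhattan` (or any of the surveyed measures) changes the neighbor ranking, which is exactly the sensitivity the review measures.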

170 citations


Cites methods from "Content based video retrieval syste..."

  • ...mentioned distance measures. 7.1 Vicis-Wave Hedges distance (VWHD): The so-called "Wave-Hedges distance" has been applied to compressed image retrieval [36], content based video retrieval [58], time series classification [25], image fidelity [53], fingerprint recognition [7], etc. Interestingly, the source of the "Wave-Hedges" metric has not been correctly cited, and some of the pr...

    [...]
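The excerpt names the Wave-Hedges distance without stating its form. One commonly cited formulation (the excerpt itself notes the metric's original source is disputed) is d(P, Q) = Σᵢ |Pᵢ − Qᵢ| / max(Pᵢ, Qᵢ); a minimal sketch with a guard for empty bins:

```python
def wave_hedges(p, q, eps=1e-12):
    """Wave-Hedges distance between two non-negative feature vectors:
    sum_i |p_i - q_i| / max(p_i, q_i), with 0/0 treated as 0."""
    total = 0.0
    for a, b in zip(p, q):
        m = max(a, b)
        if m > eps:  # skip bins that are empty in both vectors
            total += abs(a - b) / m
    return total

print(wave_hedges([1.0, 2.0, 0.0], [2.0, 2.0, 0.0]))  # 0.5
```

Each term lies in [0, 1], so the distance is bounded by the vector length, which makes it convenient for comparing normalized feature histograms.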

Posted Content
TL;DR: A number of the most commonly-used performance fitness and error metrics for regression and classification algorithms, with emphasis on engineering applications, are examined.
Abstract: Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis. Since ML is a data-driven approach, it seemingly fits into our daily lives and operations as well as complex and interdisciplinary fields. With the rise of commercial, open-source and user-catered ML tools, a key question often arises whenever ML is applied to explore a phenomenon or a scenario: what constitutes a good ML model? Keeping in mind that a proper answer to this question depends on a variety of factors, this work presumes that a good ML model is one that optimally performs and best describes the phenomenon on hand. From this perspective, identifying proper assessment metrics to evaluate performance of ML models is not only necessary but is also warranted. As such, this paper examines a number of the most commonly-used performance fitness and error metrics for regression and classification algorithms, with emphasis on engineering applications.
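As a hedged illustration of the kinds of metrics the paper surveys, here is a minimal pure-Python computation of accuracy, precision, recall, and mean squared error on toy labels (names and data are illustrative):

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

def mse(y_true, y_pred):
    """Mean squared error, the canonical regression metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))              # 0.6
print(round(precision(y_true, y_pred), 3))   # 0.667
print(round(recall(y_true, y_pred), 3))      # 0.667
```

Precision and recall disagree with accuracy here precisely because the errors fall unevenly on the positive class, which is why the paper stresses choosing the metric to match the application.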

56 citations

Journal ArticleDOI
TL;DR: A new framework based on dynamic mode decomposition (DMD) for shot boundary detection, which has a high detection accuracy even when the color changes are not obvious, the illumination changes slowly, or the foreground objects overlap.
Abstract: Shot detection is widely used in video semantic analysis, video scene segmentation, and video retrieval. However, this is still a challenging task, due to weak boundaries and sudden changes in brightness or foreground objects. In this paper, we propose a new framework based on dynamic mode decomposition (DMD) for shot boundary detection. Because the DMD can extract several temporal foreground modes and one temporal background mode from video data, shot boundaries can be detected when the amplitude changes sharply. Here, the amplitude is the DMD coefficient used to restore the video. The main idea behind the shot boundary detection is to find the amplitude change of the background mode. We can reduce detection errors when the illumination changes sharply or the foreground object (or camera) moves very quickly. At the same time, our algorithm has a high detection accuracy even when the color changes are not obvious, the illumination changes slowly, or the foreground objects overlap. Meanwhile, a color space for DMD is selected to reduce false detections. Finally, the effectiveness of our method is demonstrated by detecting the shot boundaries of videos with various content types.
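A full DMD pipeline is beyond a short sketch, but the boundary-flagging step the abstract describes — report a shot boundary where the background-mode amplitude jumps sharply — can be illustrated as follows; the amplitude series, the median-based threshold, and the factor 2.0 are all assumptions for illustration, not the paper's method:

```python
def detect_boundaries(amplitudes, threshold=2.0):
    """Flag indices where a per-window background-mode amplitude changes
    sharply: a jump larger than `threshold` times the median absolute
    change is reported as a candidate shot boundary."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    if not diffs:
        return []
    med = sorted(diffs)[len(diffs) // 2] or 1e-12  # robust scale estimate
    return [i + 1 for i, d in enumerate(diffs) if d > threshold * med]

# Amplitude stays near 1.0, then jumps to near 5.0: one boundary
amps = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9]
print(detect_boundaries(amps))  # [4]
```

Using a median-relative threshold rather than a fixed one is what lets slow illumination drift pass while abrupt background changes are flagged.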

55 citations


Cites methods from "Content based video retrieval syste..."

  • ...Other algorithms are based on video content for shot detection, including texture [27], color [28] and shape [29]....

    [...]
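The color-based shot detection the excerpt cites is commonly built on histogram differences between consecutive frames. A minimal sketch (frames as flat lists of 8-bit intensities; the bin count and threshold are illustrative, not from the paper):

```python
def gray_histogram(frame, bins=8):
    """Coarse normalized intensity histogram of a frame given as a
    flat list of 0-255 pixel values."""
    hist = [0] * bins
    for v in frame:
        hist[min(v * bins // 256, bins - 1)] += 1
    n = len(frame)
    return [c / n for c in hist]

def hist_distance(h1, h2):
    """L1 distance between two normalized histograms (range 0..2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frames, threshold=0.5, bins=8):
    """Report frame indices where the histogram changes abruptly."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [i + 1 for i in range(len(hists) - 1)
            if hist_distance(hists[i], hists[i + 1]) > threshold]

dark = [10] * 100
bright = [240] * 100
print(detect_cuts([dark, dark, bright, bright]))  # [2]
```

Texture- and shape-based variants mentioned in the same sentence follow the same template with a different per-frame descriptor.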

Journal ArticleDOI
TL;DR: A new algorithm for recognizing surgical tasks in real-time in a video stream based on adaptive spatiotemporal polynomials is introduced, particularly suited to characterize deformable moving objects with fuzzy borders, which are typically found in surgical videos.
Abstract: This paper introduces a new algorithm for recognizing surgical tasks in real-time in a video stream. The goal is to communicate information to the surgeon in due time during a video-monitored surgery. The proposed algorithm is applied to cataract surgery, which is the most common eye surgery. To compensate for eye motion and zoom level variations, cataract surgery videos are first normalized. Then, the motion content of short video subsequences is characterized with spatiotemporal polynomials: a multiscale motion characterization based on adaptive spatiotemporal polynomials is presented. The proposed solution is particularly suited to characterize deformable moving objects with fuzzy borders, which are typically found in surgical videos. Given a target surgical task, the system is trained to identify which spatiotemporal polynomials are usually extracted from videos when and only when this task is being performed. These key spatiotemporal polynomials are then searched in new videos to recognize the target surgical task. For improved performances, the system jointly adapts the spatiotemporal polynomial basis and identifies the key spatiotemporal polynomials using the multiple-instance learning paradigm. The proposed system runs in real-time and outperforms the previous solution from our group, both for surgical task recognition ( $A_z = 0.851$ on average, as opposed to $A_z = 0.794$ previously) and for the joint segmentation and recognition of surgical tasks ( $A_z = 0.856$ on average, as opposed to $A_z = 0.832$ previously).

50 citations


Cites background from "Content based video retrieval syste..."

  • ...In automatic video analysis systems, the visual content of a video is usually characterized by feature vectors that represent the shape, the texture, the color and, more importantly, the motion content of the video at different time instants [10], [11]....

    [...]

Journal ArticleDOI
01 Sep 2016
TL;DR: This article focuses on the hybrid soft computing techniques in practice for content-based image and video retrieval, which serve to enhance the overall performance and robustness of the system with reduced human interference.
Abstract: There has been an unrestrained growth of videos on the Internet due to the proliferation of multimedia devices. These videos are mostly stored in unstructured repositories, which pose enormous challenges for the task of both image and video retrieval. Users aim to retrieve videos of interest having content which is relevant to their need. Traditionally, low-level visual features have been used for content based video retrieval (CBVR). Consequently, a gap existed between these low-level features and the high-level semantic content. The semantic differential was partially bridged by the proliferation of research on interest point detectors and descriptors, which represented mid-level features of the content. The computational time and human interaction involved in the classical approaches for CBVR are quite cumbersome. In order to increase the accuracy, efficiency and effectiveness of the retrieval process, researchers resorted to soft computing paradigms. The entire retrieval task was automated to a great extent using individual soft computing components. Due to voluminous growth in the size of multimedia databases, augmented by an exponential rise in the number of users, integration of two or more soft computing techniques was desirable for enhanced efficiency and accuracy of the retrieval process. The hybrid approaches serve to enhance the overall performance and robustness of the system with reduced human interference. This article focuses on the hybrid soft computing techniques in practice for content-based image and video retrieval.

41 citations

References
Proceedings ArticleDOI
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
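At the heart of a ranking SVM is a pairwise reduction: a clicked (preferred) result should score above a skipped one. The sketch below uses a perceptron-style update on violated pairs rather than the paper's hinge-loss SVM optimization, so it only illustrates the reduction; the features and data are toy values:

```python
def pairwise_rank_train(feature_pairs, epochs=100, lr=0.1):
    """Learn a linear scoring function w.x from preference pairs
    (x_preferred, x_other): whenever a pair is ranked the wrong way,
    nudge w toward the pair's feature difference. A ranking SVM
    optimizes the same pairwise constraints with a hinge loss and margin."""
    dim = len(feature_pairs[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in feature_pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:  # preference violated
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

# Clicked result preferred over the skipped one above it (clickthrough pair)
pairs = [([1.0, 0.2], [0.2, 0.9]), ([0.9, 0.1], [0.3, 0.8])]
w = pairwise_rank_train(pairs)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(score([1.0, 0.2]) > score([0.2, 0.9]))  # True
```

This is why clickthrough data suffices: each click yields relative preferences (pairs), not the absolute relevance labels expert judgments would provide.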

4,453 citations

Proceedings ArticleDOI
23 Jun 2008
TL;DR: A new method for video classification that builds upon and extends several recent ideas including local space-time features,space-time pyramids and multi-channel non-linear SVMs is presented and shown to improve state-of-the-art results on the standard KTH action dataset.
Abstract: The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.

3,833 citations

Journal ArticleDOI
TL;DR: Almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation are surveyed, and the spawning of related subfields are discussed, to discuss the adaptation of existing image retrieval techniques to build systems that can be useful in the real world.
Abstract: We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.

3,433 citations


"Content based video retrieval syste..." refers background in this paper

  • ...Although the term "search engine" is often used indiscriminately to describe crawler-based search engines, human-powered directories, and everything in between, they are not all the same....

    [...]

Journal ArticleDOI
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations

Journal ArticleDOI
TL;DR: The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection,and penalty-box detection.
Abstract: We propose a fully automatic and computationally efficient framework for analysis and summarization of soccer videos using cinematic and object-based features. The proposed framework includes some novel low-level processing algorithms, such as dominant color region detection, robust shot boundary detection, and shot classification, as well as some higher-level algorithms for goal detection, referee detection, and penalty-box detection. The system can output three types of summaries: i) all slow-motion segments in a game; ii) all goals in a game; iii) slow-motion segments classified according to object-based features. The first two types of summaries are based on cinematic features only for speedy processing, while the summaries of the last type contain higher-level semantics. The proposed framework is efficient, effective, and robust. It is efficient in the sense that there is no need to compute object-based features when cinematic features are sufficient for the detection of certain events, e.g., goals in soccer. It is effective in the sense that the framework can also employ object-based features when needed to increase accuracy (at the expense of more computation). The efficiency, effectiveness, and robustness of the proposed framework are demonstrated over a large data set, consisting of more than 13 hours of soccer video, captured in different countries and under different conditions.
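The dominant color region detection the framework relies on can be caricatured as thresholding the fraction of pixels falling in the pitch's hue range; the hue bounds and threshold below are illustrative assumptions, not the paper's values:

```python
def dominant_color_ratio(frame_hues, lo=60, hi=180):
    """Fraction of pixels whose hue lies in an assumed green-pitch range.
    Frames dominated by the pitch color are likely long/wide shots."""
    inside = sum(lo <= h <= hi for h in frame_hues)
    return inside / len(frame_hues)

def classify_shot(frame_hues, field_threshold=0.6):
    """Crude shot classification from the dominant-color ratio alone."""
    if dominant_color_ratio(frame_hues) >= field_threshold:
        return "long"
    return "close-up"

pitch = [100] * 80 + [20] * 20    # mostly pitch-colored hues
closeup = [20] * 70 + [100] * 30  # mostly skin/jersey hues
print(classify_shot(pitch), classify_shot(closeup))  # long close-up
```

A cheap cue like this is what lets the framework defer expensive object-based features until cinematic features alone are insufficient.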

943 citations