Author

Huanbo Luan

Bio: Huanbo Luan is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research in the topics of Visual Word and TRECVID, has an h-index of 5, and has co-authored 7 publications receiving 81 citations.

Papers
Journal IssueDOI
01 Aug 2008
TL;DR: This article shows three exemplar systems that demonstrate the state of the art in interactive, content-based retrieval of video shots; these are just three of the more than 20 systems developed for the 2007 iteration of the annual TRECVid benchmarking activity.
Abstract: The growth in video material available over the Internet is generally accompanied by user-assigned tags or content descriptions, which are the mechanism by which we then access such video. However, user-assigned tags have limitations for retrieval, and often we want access where the content of the video itself is directly matched against a user's query rather than against some manually assigned surrogate tag. Content-based video retrieval techniques are not yet scalable enough to allow interactive searching at Internet scale, but the techniques are proving robust and effective for smaller collections. In this article, we show three exemplar systems which demonstrate the state of the art in interactive, content-based retrieval of video shots; these are just three of the more than 20 systems developed for the 2007 iteration of the annual TRECVid benchmarking activity. The contribution of our article is to show that retrieving from video using content-based methods is now viable, that it works, and that many systems now do this, such as the three outlined herein. These systems, and others, can provide effective search over hundreds of hours of video content and are samples of the kind of content-based search functionality we can expect to see on larger video archives when issues of scale are addressed. © 2008 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 18, 195–201, 2008

23 citations

Proceedings ArticleDOI
29 Sep 2007
TL;DR: This paper segregates the process of relevance feedback into two distinct facets: (a) recall-directed feedback and (b) precision-directed feedback; the recall-directed facet employs general features such as text and high-level features to maximize efficiency and recall during feedback, making it well suited to large corpora.
Abstract: Existing video retrieval research incorporates relevance feedback based on user-dependent interpretations to improve retrieval results. In this paper, we segregate the process of relevance feedback into two distinct facets: (a) recall-directed feedback and (b) precision-directed feedback. The recall-directed facet employs general features such as text and high-level features (HLFs) to maximize efficiency and recall during feedback, making it well suited to large corpora. The precision-directed facet, on the other hand, uses many other multimodal features in an active learning environment for improved accuracy. Combined with a performance-based adaptive sampling strategy, this process continuously re-ranks a subset of instances as the user annotates. Experiments on the TRECVID 2006 dataset show that our approach is efficient and effective.
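The two-facet loop the abstract describes can be sketched roughly as follows. This is a toy illustration with made-up tags and features, and a simple linear re-ranker standing in for the paper's active learner and adaptive sampling strategy:

```python
# Toy sketch of recall-directed vs. precision-directed feedback.
# The shots, tags, features, and learning rate are all hypothetical.

def recall_feedback(corpus, query_terms):
    """Recall-directed pass: cheap text/tag matching over the whole corpus."""
    return sorted(corpus, key=lambda s: -len(query_terms & s["tags"]))

def precision_feedback(candidates, labeled, weights, lr=0.5):
    """Precision-directed pass: update per-feature weights from user labels
    and re-rank the candidate subset (a stand-in for active learning)."""
    for shot, is_relevant in labeled:
        sign = 1.0 if is_relevant else -1.0
        for feat, val in shot["features"].items():
            weights[feat] = weights.get(feat, 0.0) + lr * sign * val
    score = lambda s: sum(weights.get(f, 0.0) * v for f, v in s["features"].items())
    return sorted(candidates, key=score, reverse=True)

corpus = [
    {"id": "a", "tags": {"soccer", "field"}, "features": {"green": 0.9, "crowd": 0.2}},
    {"id": "b", "tags": {"news"},            "features": {"green": 0.1, "crowd": 0.8}},
    {"id": "c", "tags": {"soccer"},          "features": {"green": 0.8, "crowd": 0.3}},
]
candidates = recall_feedback(corpus, {"soccer"})
# User marks shot "a" relevant and shot "b" irrelevant; the subset is re-ranked.
ranked = precision_feedback(candidates, [(corpus[0], True), (corpus[1], False)], {})
```

In this reading, the recall pass narrows a large corpus cheaply, and each annotation round re-runs only the precision pass over the surviving candidates.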

22 citations

Proceedings ArticleDOI
07 Jul 2008
TL;DR: In this article, the authors propose adaptive multiple feedback strategies for interactive video retrieval, which enable expert searchers to flexibly decide on the types of feedback they want to employ under different situations.
Abstract: In this paper, we propose adaptive multiple feedback strategies for interactive video retrieval. We first segregate interactive feedback into three distinct types (recall-driven relevance feedback, precision-driven active learning, and locality-driven relevance feedback) so that a more flexible, generic interaction mechanism can cover different search queries and different video corpora. Our system lets expert searchers flexibly decide which types of feedback to employ under different situations. To cater to the large number of novice (non-expert) users, a built-in adaptive option learns expert user behavior so as to recommend the next feedback strategy, leading to a more precise and personalized search for novice users. Experimental results on the TRECVID news video corpus demonstrate that our proposed adaptive multiple feedback strategies are effective.
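One simple reading of the built-in adaptive option is a first-order transition model over the three feedback types: record which strategy experts pick next given the current one, then recommend the most frequent follow-up to novices. The sketch below is a hypothetical simplification, not the paper's actual learner:

```python
from collections import Counter, defaultdict

STRATEGIES = ("recall_rf", "precision_al", "locality_rf")

def learn_transitions(expert_sessions):
    """Count which feedback strategy experts chose next, given the current
    one (first-order model; illustrative stand-in for the paper's learner)."""
    trans = defaultdict(Counter)
    for session in expert_sessions:
        for cur, nxt in zip(session, session[1:]):
            trans[cur][nxt] += 1
    return trans

def recommend(trans, current):
    """Recommend the next strategy for a novice user."""
    if current in trans and trans[current]:
        return trans[current].most_common(1)[0][0]
    return "recall_rf"  # default: start broad, recall first

# Hypothetical logged expert sessions, each a sequence of strategy choices.
sessions = [
    ["recall_rf", "precision_al", "locality_rf"],
    ["recall_rf", "precision_al", "precision_al"],
]
model = learn_transitions(sessions)
```

Under this model, a novice currently using recall-driven relevance feedback would be nudged toward precision-driven active learning, since that is what the logged experts did most often.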

12 citations

Proceedings ArticleDOI
07 Jul 2008
TL;DR: The system VisionGo is described, which provides an interactive platform for video retrieval; it is fitted with an intuitive interface and an automated backend recommender that suggests the optimal feedback technique to users during retrieval.
Abstract: This paper describes our system VisionGo, which provides an interactive platform for video retrieval. The system is fitted with an intuitive interface and an automated backend recommender that suggests the optimal feedback technique to users during retrieval.

10 citations

Patent
02 Aug 2006
TL;DR: An evaluation method and system for measuring the performance of a telephone-based continuous speech recognition system, in which the system includes a recording module, the recognition system under test, a grammar library, a grammar-expansion module, a phone-information selection module, a slot-resolution module, and an automatic evaluation module.
Abstract: This invention discloses an evaluation method and system for measuring the performance of a telephone-based continuous speech recognition system. The system comprises a recording module, the recognition system under test, a grammar library, a grammar-expansion module, a phone-information selection module, a slot-resolution module, and an automatic evaluation module. Several slots are designed according to the grammar definitions of the fields users query by phone; the grammar is expanded to generate candidate sentences, from which evaluation sentences are selected; test speech is recorded and fed into the recognition system under evaluation; the recognized sentences are analyzed into the slots they contain; and the recognized result is compared against the standard solution to compute slot-level recognition correctness, yielding an objective measure of system performance.
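The final comparison step, slot-level correctness against the standard solution, amounts to a simple accuracy calculation. The sketch below is an illustrative reading of the patent's automatic evaluation module, with hypothetical slot names and values:

```python
def slot_accuracy(recognized, reference):
    """Compare slot values extracted from the recognizer output against the
    standard solution; return the fraction of reference slots filled
    correctly (illustrative simplification of the evaluation module)."""
    if not reference:
        return 1.0
    correct = sum(1 for slot, value in reference.items()
                  if recognized.get(slot) == value)
    return correct / len(reference)

# Hypothetical standard solution and recognizer output for one test utterance.
ref = {"city": "beijing", "date": "aug 2", "service": "weather"}
hyp = {"city": "beijing", "date": "aug 3", "service": "weather"}
```

Here the recognizer got two of the three reference slots right, so the utterance scores 2/3; averaging over all evaluation sentences would give the system-level figure.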

8 citations


Cited by
Journal ArticleDOI
01 Nov 2011
TL;DR: This survey analyzes methods for video structure analysis (shot boundary detection, key frame extraction, and scene segmentation); extraction of features, including static key frame features, object features, and motion features; video data mining; video annotation; and video retrieval, including query interfaces.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.
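Of the structure-analysis steps this survey covers, shot boundary detection has a classic baseline: threshold the distance between consecutive frame color histograms and flag a cut where it spikes. A minimal sketch, with toy histograms and a threshold chosen purely for illustration:

```python
def shot_boundaries(histograms, threshold=0.5):
    """Flag a shot boundary wherever the L1 distance between consecutive
    frame color histograms exceeds a threshold -- the classic baseline for
    abrupt-cut detection (illustrative only; real systems also handle
    gradual transitions such as fades and dissolves)."""
    cuts = []
    for i in range(1, len(histograms)):
        dist = sum(abs(a - b) for a, b in zip(histograms[i - 1], histograms[i]))
        if dist > threshold:
            cuts.append(i)
    return cuts

# Hypothetical 3-bin color histograms for four consecutive frames.
frames = [
    [0.70, 0.20, 0.10],  # shot 1
    [0.68, 0.22, 0.10],  # small change within the shot
    [0.10, 0.10, 0.80],  # abrupt cut -> shot 2
    [0.12, 0.10, 0.78],
]
```

With these toy numbers, the only inter-frame distance above the threshold is between frames 1 and 2, so a single boundary is reported at frame index 2.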

606 citations

Book
26 May 2009
TL;DR: This paper presents a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction and lays down the anatomy of a concept-based video search engine.
Abstract: In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives which are in majority concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human–computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities.

416 citations

Journal ArticleDOI
TL;DR: This work presents the authors' experience with a class of interactive video retrieval scenarios and their methodology for stimulating the evolution of new interactive video retrieval approaches, focusing on the years 2015–2017.
Abstract: The last decade has seen innovations that make video recording, manipulation, storage, and sharing easier than ever before, thus impacting many areas of life. New video retrieval scenarios emerged as well, which challenge the state-of-the-art video retrieval approaches. Despite recent advances in content analysis, video retrieval can still benefit from involving the human user in the loop. We present our experience with a class of interactive video retrieval scenarios and our methodology to stimulate the evolution of new interactive video retrieval approaches. More specifically, the video browser showdown evaluation campaign is thoroughly analyzed, focusing on the years 2015–2017. Evaluation scenarios, objectives, and metrics are presented, complemented by the results of the annual evaluations. The results reveal promising interactive video retrieval techniques adopted by the most successful tools and confirm assumptions about the different complexity of various types of interactive retrieval scenarios. A comparison of the interactive retrieval tools with automatic approaches (including fully automatic and manual query formulation) participating in the TRECVID 2016 ad hoc video search task is discussed. Finally, based on the results of data analysis, a substantial revision of the evaluation methodology for the following years of the video browser showdown is provided.

100 citations

Journal ArticleDOI
TL;DR: A comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications, classifying them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates.
Abstract: We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data-which, if presented in its raw format, is rather unwieldy and costly-have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other.

84 citations

01 Jan 2008
TL;DR: Zhang et al. propose a novel method based on Latent Dirichlet Allocation (LDA), LDA-based multiple-SVM (LDASVM), to improve training efficiency and to explore the relationships between concepts or hidden sub-domains more easily and efficiently.
Abstract: For TRECVID 2008 concept detection task, we principally focus on: (1) Early fusion of texture, edge and color features TECM, abbreviation of the combined TF*IDF weights based on SIFT features, Edge Histogram, and Color Moments. (2) To improve the training efficiency and explore the knowledge between concepts or hidden sub-domains more easily and efficiently, we propose a novel method based on Latent Dirichlet Allocation (LDA): LDA-based multiple-SVM (LDASVM). We first use LDA to cluster all the keyframes into topics according to the maximum element of the topic-simplex representation vector (TRV) of each keyframe. Then, we train the annotated data in each topic for each concept. During training, unlike multi-bag SVM, we only use positive samples in current topic for the sake of retaining sample’s separability, instead of all positive samples among the whole training set, and ignore the topics with too few positive samples. While testing a keyframe for a given concept, we adopt TRV as the weight vector, instead of equal weighting strategy, to combine the SVM outputs of topic-models. (3) Introduction of Pseudo Relevance Feedback (PRF) into our concept detection system for the purpose of making re-trained models more adaptive to the test data: unlike existing PRF techniques in text and video retrieval, we propose a preliminary strategy to explore the visual features of positive training samples to improve the quality of pseudo positive samples. Experimental results demonstrate that our proposed LDASVM approach is both effective and efficient.
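The test-time combination in point (2), weighting each topic model's SVM output by the keyframe's topic-simplex representation vector (TRV) instead of weighting all topics equally, reduces to a dot product. The numbers below are illustrative only; in the actual system each per-topic score would come from an SVM trained on that topic's annotated keyframes:

```python
def ldasvm_score(trv, topic_scores):
    """Combine per-topic SVM decision values for a test keyframe, weighting
    each topic model by the keyframe's topic-simplex representation vector
    (TRV), as the abstract describes (toy numbers, not trained models)."""
    return sum(w * s for w, s in zip(trv, topic_scores))

trv = [0.7, 0.2, 0.1]            # keyframe belongs mostly to topic 0
topic_scores = [0.9, -0.3, 0.1]  # per-topic SVM outputs for one concept
concept_score = ldasvm_score(trv, topic_scores)
```

Because the keyframe sits mostly in topic 0, that topic's confident positive output dominates the combined score, which is the intended effect of TRV weighting over equal weighting.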

63 citations