Journal ArticleDOI

Video Interaction Tools: A Survey of Recent Work

TL;DR: This article surveys literature at the intersection of Human-Computer Interaction and Multimedia, integrating literature from video browsing and navigation, direct video manipulation, video content visualization, as well as interactive video summarization and interactive video retrieval.
Abstract: Digital video enables manifold ways of multimedia content interaction. Over the last decade, many proposals for improving and enhancing video content interaction have been published. More recent work particularly leverages highly capable devices such as smartphones and tablets that embrace novel interaction paradigms, for example, touch, gesture-based, or physical content interaction. In this article, we survey literature at the intersection of Human-Computer Interaction and Multimedia. We integrate literature from video browsing and navigation, direct video manipulation, video content visualization, as well as interactive video summarization and interactive video retrieval. We classify the reviewed works by the underlying interaction method and discuss the improvements achieved so far. We also depict a set of open problems that the video interaction community should address in the future.
Citations
Journal ArticleDOI
TL;DR: This work presents the authors' experience with a class of interactive video retrieval scenarios and their methodology for stimulating the evolution of new interactive video retrieval approaches, focusing on the years 2015–2017.
Abstract: The last decade has seen innovations that make video recording, manipulation, storage, and sharing easier than ever before, thus impacting many areas of life. New video retrieval scenarios emerged as well, which challenge the state-of-the-art video retrieval approaches. Despite recent advances in content analysis, video retrieval can still benefit from involving the human user in the loop. We present our experience with a class of interactive video retrieval scenarios and our methodology to stimulate the evolution of new interactive video retrieval approaches. More specifically, the Video Browser Showdown evaluation campaign is thoroughly analyzed, focusing on the years 2015–2017. Evaluation scenarios, objectives, and metrics are presented, complemented by the results of the annual evaluations. The results reveal promising interactive video retrieval techniques adopted by the most successful tools and confirm assumptions about the different complexity of various types of interactive retrieval scenarios. A comparison of the interactive retrieval tools with automatic approaches (including fully automatic and manual query formulation) participating in the TRECVID 2016 ad hoc video search task is discussed. Finally, based on the results of data analysis, a substantial revision of the evaluation methodology for the following years of the Video Browser Showdown is provided.

100 citations


Cites background from "Video Interaction Tools: A Survey of Recent Work"

  • ...Interactive video retrieval [18] represents a promising solution to break this cycle because it benefits from human-machine cooperation....


  • ..., “query by text and browse the first few results”), it addresses highly interactive search systems [18], which focus on the human in the loop [19] and which are able to reduce shortcomings of automatic visual content retrieval due to many flexible search features [20]....


Journal ArticleDOI
TL;DR: The Video Browser Showdown 2015, held in conjunction with the International Conference on MultiMedia Modeling 2015, is presented; the competition has the stated aim of pushing for a better integration of the user into the search process.
Abstract: Interactive video retrieval tools developed over the past few years are emerging as powerful alternatives to automatic retrieval approaches by giving the user more control as well as more responsibilities. Current research tries to identify the best combinations of image, audio, and text features that, combined with innovative UI design, maximize the tools' performance. We present the latest installment of the Video Browser Showdown, which was held in conjunction with the International Conference on MultiMedia Modeling 2015 (MMM 2015) and has the stated aim of pushing for a better integration of the user into the search process. The setup of the competition, including the dataset used and the presented tasks, as well as the participating tools, will be introduced. The performance of those tools will be thoroughly presented and analyzed. Interesting highlights will be marked and some predictions regarding the research focus within the field for the near future will be made.

69 citations

Journal ArticleDOI
TL;DR: The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches, and reveal that user-centric interfaces are still required to mediate access to specific content.
Abstract: This work summarizes the findings of the 7th iteration of the Video Browser Showdown (VBS) competition organized as a workshop at the 24th International Conference on Multimedia Modeling in Bangkok. The competition focuses on video retrieval scenarios in which the searched scenes were either previously observed or described by another person (i.e., an example shot is not available). During the event, nine teams competed with their video retrieval tools in providing access to a shared video collection with 600 hours of video content. Evaluation objectives, rules, scoring, tasks, and all participating tools are described in the article. In addition, we provide some insights into how the different teams interacted with their video browsers, which was made possible by a novel interaction logging mechanism introduced for this iteration of the VBS. The results collected at the VBS evaluation server confirm that searching for one particular scene in the collection when given a limited time is still a challenging task for many of the approaches that were showcased during the event. Given only a short textual description, finding the correct scene is even harder. In ad hoc search with multiple relevant scenes, the tools were mostly able to find at least one scene, whereas recall was the issue for many teams. The logs also reveal that even though recent exciting advances in machine learning narrow the classical semantic gap problem, user-centric interfaces are still required to mediate access to specific content. Finally, open challenges and lessons learned are presented for future VBS events.

56 citations


Cites background from "Video Interaction Tools: A Survey of Recent Work"

  • ...The general idea of this system is to provide many different content search features that support several different search scenarios (query-by-browsing, text, filtering, example) [36, 37]....


  • ..., automatic detection of relevant semantics), indexing relevant content segments for efficient retrieval [42, 47], and providing a powerful user interface [37] that enables different user groups—both experts and non-experts—to perform content search in a simple, efficient, and effective way [46]....


Proceedings ArticleDOI
02 May 2017
TL;DR: This work demonstrates an application that enables a user to directly edit spherical video while fully immersed in a VR headset; the system is built upon a familiar timeline design, enhanced with custom widgets that enable intuitive editing of spherical video inside the headset.
Abstract: Creative professionals are creating Virtual Reality (VR) experiences today by capturing spherical videos, but video editing is still done primarily in traditional 2D desktop GUI applications such as Premiere. These interfaces provide limited capabilities for previewing content in a VR headset or for directly manipulating the spherical video in an intuitive way. As a result, editors must alternate between editing on the desktop and previewing in the headset, which is tedious and interrupts the creative process. We demonstrate an application that enables a user to directly edit spherical video while fully immersed in a VR headset. We first interviewed professional VR filmmakers to understand current practice and derived a suitable workflow for in-headset VR video editing. We then developed a prototype system implementing this new workflow. Our system is built upon a familiar timeline design, but is enhanced with custom widgets to enable intuitive editing of spherical video inside the headset. We conducted an expert review study and found that with our prototype, experts were able to edit videos entirely within the headset. Experts also found our interface and widgets useful, providing intuitive controls for their editing needs.

51 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
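
A minimal sketch of the feature extraction and nearest-neighbor matching stages described above, using OpenCV's SIFT implementation and Lowe's ratio test; the Hough clustering and least-squares pose verification stages of the full recognition pipeline are omitted, and the image paths are placeholders.

```python
# Minimal sketch: SIFT keypoint matching with Lowe's ratio test (OpenCV >= 4.4).
import cv2

img1 = cv2.imread("query.png", cv2.IMREAD_GRAYSCALE)   # placeholder path
img2 = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # placeholder path

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)  # keypoints + 128-d descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matcher; for each query descriptor take the two nearest neighbors.
matcher = cv2.BFMatcher(cv2.NORM_L2)
knn = matcher.knnMatch(des1, des2, k=2)

# Lowe's ratio test: keep a match only if it is clearly better than the runner-up.
good = []
for pair in knn:
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} putative matches survive the ratio test")
```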

46,906 citations

Proceedings ArticleDOI
21 Jun 1994
TL;DR: A feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world are proposed.
Abstract: No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.
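
A minimal sketch of the selection criterion this paper proposes, as exposed by OpenCV: cornerMinEigenVal computes the smaller eigenvalue of the local gradient matrix, which goodFeaturesToTrack then thresholds. The frame path is a placeholder.

```python
# Minimal sketch: the Shi-Tomasi selection criterion is the smaller eigenvalue
# of the 2x2 gradient (structure) matrix over a window; OpenCV computes it directly.
import cv2

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

# Per-pixel minimum eigenvalue of the local gradient matrix.
min_eig = cv2.cornerMinEigenVal(frame, blockSize=3)

# Features are "good to track" where the response clears a threshold;
# cv2.goodFeaturesToTrack wraps this test plus non-maximum suppression.
threshold = 0.01 * min_eig.max()
print(f"{(min_eig > threshold).sum()} candidate pixels pass the criterion")
```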

8,432 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: It is discovered that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques, and while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions.
Abstract: The accuracy of optical flow estimation algorithms has been improving steadily as evidenced by results on the Middlebury optical flow benchmark. The typical formulation, however, has changed little since the work of Horn and Schunck. We attempt to uncover what has made recent advances possible through a thorough analysis of how the objective function, the optimization method, and modern implementation practices influence accuracy. We discover that “classical” flow formulations perform surprisingly well when combined with modern optimization and implementation techniques. Moreover, we find that while median filtering of intermediate flow fields during optimization is a key to recent performance gains, it leads to higher energy solutions. To understand the principles behind this phenomenon, we derive a new objective that formalizes the median filtering heuristic. This objective includes a nonlocal term that robustly integrates flow estimates over large spatial neighborhoods. By modifying this new term to include information about flow and image boundaries we develop a method that ranks at the top of the Middlebury benchmark.
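
For context, a reconstruction of the "classical" Horn-Schunck-style objective this analysis starts from, in the notation of Sun et al.'s paper (a standard form of the formulation, not quoted from the abstract), where $\rho_D$ and $\rho_S$ are the data and spatial penalty functions and $\lambda$ is the regularization weight:

$$
E(\mathbf{u},\mathbf{v}) = \sum_{i,j} \Big\{ \rho_D\big(I_1(i,j) - I_2(i + u_{i,j},\, j + v_{i,j})\big)
+ \lambda \big[ \rho_S(u_{i,j} - u_{i+1,j}) + \rho_S(u_{i,j} - u_{i,j+1})
+ \rho_S(v_{i,j} - v_{i+1,j}) + \rho_S(v_{i,j} - v_{i,j+1}) \big] \Big\}
$$

Choosing quadratic penalties recovers Horn and Schunck, while robust penalties give the "classical" variants the paper evaluates.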

1,529 citations


"Video Interaction Tools: A Survey o..." refers methods in this paper

  • ...Their system uses optical flow estimation [Sun et al. 2010] and feature tracking [Shi and Tomasi 1994] in order to compute motion trajectories of moving objects in the video....

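
The pipeline quoted in the excerpt above (optical flow estimation plus feature tracking to recover motion trajectories) can be sketched roughly as follows. This is not the surveyed system: it substitutes OpenCV's Shi-Tomasi feature selection and pyramidal Lucas-Kanade tracker for the cited methods, and "input.mp4" is a placeholder.

```python
# Rough sketch of the quoted idea: select Shi-Tomasi features and track them
# with pyramidal Lucas-Kanade to accumulate per-feature motion trajectories.
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")  # placeholder path
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                              qualityLevel=0.01, minDistance=8)
trajectories = [[tuple(p.ravel())] for p in pts]

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track every feature from the previous frame into the current one.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    survivors = [(p, t) for p, s, t in zip(nxt, status.ravel(), trajectories) if s]
    if not survivors:
        break
    pts = np.array([p for p, _ in survivors], dtype=np.float32)
    trajectories = [t + [tuple(p.ravel())] for p, t in survivors]
    prev_gray = gray

# Each trajectory is a list of (x, y) positions, one per frame tracked.
print(f"kept {len(trajectories)} trajectories")
```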

Proceedings ArticleDOI
26 Oct 2006
TL;DR: An introduction to information retrieval (IR) evaluation from both a user and a system perspective is given, highlighting that system evaluation is by far the most prevalent type of evaluation carried out.
Abstract: The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity to encourage research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video corpus, automatic detection of a variety of semantic and low-level video features, shot boundary detection and the detection of story boundaries in broadcast TV news. This paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether such campaigns are a good thing or a bad thing. There are arguments for and against these campaigns and we present some of them in the paper concluding that on balance they have had a very positive impact on research progress.

1,395 citations


"Video Interaction Tools: A Survey o..." refers methods in this paper

  • ...However, in contrast to the KIS task at TRECVID, participating teams are required to perform the search on site (VBS is an event in conjunction with the International Conference on MultiMedia Modeling), and also for queries with visual cues only....


  • ...Many recent works were evaluated in the interactive search task of TRECVID [Over et al. 2013; Smeaton et al. 2006] and/or in the Video Browser Showdown (VBS) competition [Schoeffmann and Bailer 2012; Schoeffmann 2014], where Known-Item-Search (KIS) tasks have to be performed in a competitive situation....


  • ...If users are sure that the video is the right one they can add its ID to the oracle cue, which holds all IDs that are sent to the server for verification of the search task (which was provided by the organizers of the TRECVID task [Smeaton et al. 2006])....


  • ...In the KIS task of TRECVID, which ran from 2010 to 2012, the participating systems were challenged with the goal of finding known scenes, which are described by textual queries and clues....


  • ...The KIS task at TRECVID, however, was discontinued in 2012 after three iterations due to insignificant progress....


Proceedings ArticleDOI
26 Sep 2010
TL;DR: The video recommendation system in use at YouTube, the world's most popular online video community, is discussed, with details on the experimentation and evaluation framework used to test and tune new algorithms.
Abstract: We discuss the video recommendation system in use at YouTube, the world's most popular online video community. The system recommends personalized sets of videos to users based on their activity on the site. We discuss some of the unique challenges that the system faces and how we address them. In addition, we provide details on the experimentation and evaluation framework used to test and tune new algorithms. We also present some of the findings from these experiments.

1,069 citations


"Video Interaction Tools: A Survey o..." refers background in this paper


  • ...In addition to implicit analysis of the watching [Chen et al. 2012; Davidson et al. 2010] and sharing behavior [Ma et al. 2014], many systems provide the possibility to explicitly rate content or to manage lists of favored and unfavored videos [Cui et al. 2014; Mei et al. 2011]....
