Showing papers on "Video browsing" published in 2011


Journal ArticleDOI
01 Nov 2011
TL;DR: Methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, and video retrieval including query interfaces are analyzed.
Abstract: Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction and scene segmentation, extraction of features including static key frame features, object features and motion features, video data mining, video annotation, video retrieval including query interfaces, similarity measure and relevance feedback, and video browsing. Finally, we analyze future research directions.

606 citations


Journal ArticleDOI
TL;DR: By developing a quadrangle candidate generation algorithm and refining the model fitting score, this paper ameliorates the court-based camera calibration technique to be applicable to broadcast basketball videos and shows the robustness of the proposed calibration and tracking algorithms.
Abstract: With the growth of fandom population, a considerable amount of broadcast sports videos have been recorded, and a lot of research has focused on automatically detecting semantic events in the recorded video to develop an efficient video browsing tool for a general viewer. However, a professional sportsman or coach wonders about high level semantics in a different perspective, such as the offensive or defensive strategy performed by the players. Analyzing tactics is much more challenging in a broadcast basketball video than in other kinds of sports videos due to its complicated scenes and varied camera movements. In this paper, by developing a quadrangle candidate generation algorithm and refining the model fitting score, we ameliorate the court-based camera calibration technique to be applicable to broadcast basketball videos. Player trajectories are extracted from the video by a CamShift-based tracking method and mapped to the real-world court coordinates according to the calibrated results. The player position/trajectory information in the court coordinates can be further analyzed for professional-oriented applications such as detecting wide open event, retrieving target video clips based on trajectories, and inferring implicit/explicit tactics. Experimental results show the robustness of the proposed calibration and tracking algorithms, and three practicable applications are introduced to address the applicability of our system.

67 citations
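
The mapping from image positions to court coordinates mentioned in the abstract above can be illustrated with a planar homography. This is a minimal sketch, not the authors' implementation: it assumes the calibration step yields a 3x3 matrix, and the names `image_to_court`, `map_trajectory`, and `H_EXAMPLE` (with its values) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def image_to_court(H: Matrix3, point: Point) -> Point:
    """Map an image pixel (x, y) to court coordinates with a 3x3 homography H."""
    x, y = point
    # Homogeneous multiplication: [u, v, w]^T = H * [x, y, 1]^T
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under H")
    # Divide by w to return to inhomogeneous coordinates
    return (u / w, v / w)


def map_trajectory(H: Matrix3, trajectory: Sequence[Point]) -> List[Point]:
    """Map every tracked image position of one player to court coordinates."""
    return [image_to_court(H, p) for p in trajectory]


# Hypothetical calibration result: scale by 0.5, then translate by (10, 20).
H_EXAMPLE = [
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
]
```

In a real broadcast the matrix would change whenever the camera pans or zooms, so a per-frame (or per-segment) matrix would presumably be needed.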


Journal ArticleDOI
TL;DR: This paper presents a method that adapts the playback velocity of a video to its temporal information density, so that users can explore the video under controlled cognitive load; experiments and a qualitative user study show its advantages over motion-based measures.
Abstract: Automated video analysis lacks reliability when searching for unknown events in video data. The practical approach is to watch all the recorded video data, if applicable in fast-forward mode. In this paper we present a method to adapt the playback velocity of the video to the temporal information density, so that the users can explore the video under controlled cognitive load. The proposed approach can cope with static changes and is robust to video noise. First, we formulate temporal information as symmetrized Renyi divergence, deriving this measure from signal coding theory. Further, we discuss the animated visualization of accelerated video sequences and propose a physiologically motivated blending approach to cope with arbitrary playback velocities. Finally, we compare the proposed method with the current approaches in this field by experiments and a qualitative user study, and show its advantages over motion-based measures.

46 citations
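
As an illustration of the measure named in the abstract above, the sketch below computes a symmetrized Rényi divergence between two frame histograms, using the standard definition D_α(P‖Q) = ln(Σ p_i^α q_i^(1−α)) / (α − 1). Several parts are assumptions rather than the paper's method: symmetrizing by summing both directions, the epsilon smoothing, the choice α = 0.5, and the hypothetical `playback_speed` mapping from divergence to speed.

```python
import math
from typing import List, Sequence


def _normalize(hist: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Turn raw bin counts into probabilities; eps avoids zero bins (assumption)."""
    smoothed = [h + eps for h in hist]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def renyi_divergence(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """Renyi divergence of order alpha between two probability vectors."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)


def symmetrized_renyi(hist_a: Sequence[float], hist_b: Sequence[float],
                      alpha: float = 0.5) -> float:
    """Sum of both directed divergences (one common way to symmetrize)."""
    p, q = _normalize(hist_a), _normalize(hist_b)
    return renyi_divergence(p, q, alpha) + renyi_divergence(q, p, alpha)


def playback_speed(divergence: float, max_speed: float = 8.0,
                   gain: float = 10.0) -> float:
    """Hypothetical mapping: more information between frames, slower playback."""
    return max(1.0, max_speed / (1.0 + gain * divergence))
```

With this mapping, a static scene (divergence near zero) plays at `max_speed`, and playback approaches normal speed as consecutive frames differ more.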


Patent
17 Nov 2011
TL;DR: A method and system for contextual browsing of videos relevant to a current video, in which a user interface presents browsing controls with labels and previews; the labels represent sets of videos organized according to their relevancy to the current video.
Abstract: A method and system for contextual browsing of videos that are relevant to a current video. Browsing controls that include labels and previews are presented in a user interface. The labels represent sets of videos that are organized according to their relevancy to a current video. The previews represent videos from a set of videos that is currently in focus. If the user switches focus from one set of videos to another set of videos, the previews are updated to correspond to the videos in the second set of videos. The user can also browse through the previews in order to select another video for playback.

40 citations


Book ChapterDOI
05 Jan 2011
TL;DR: This paper investigates how thumbnail number, size, and motion influence the performance of humans in common recognition tasks and shows that users are quite able to handle and assess multiple small thumbnails at the same time, especially when they show moving images.
Abstract: Various interfaces for video browsing and retrieval have been proposed that provide improved usability, better retrieval performance, and richer user experience compared to simple result lists that are just sorted by relevance. These browsing interfaces take advantage of the rather large screen real estate on desktop and laptop PCs to visualize advanced configurations of thumbnails summarizing the video content. Naturally, the usefulness of such screen-intensive visual browsers can be called into question when applied on small mobile handheld devices, such as smart phones. In this paper, we address the usefulness of thumbnail images for mobile video retrieval interfaces. In particular, we investigate how thumbnail number, size, and motion influence the performance of humans in common recognition tasks. Contrary to the widespread belief that screens of handheld devices are unsuited for visualizing multiple (small) thumbnails simultaneously, our study shows that users are quite able to handle and assess multiple small thumbnails at the same time, especially when they show moving images. Our results give suggestions for appropriate video retrieval interface designs on handheld devices.

26 citations


Proceedings ArticleDOI
11 Jul 2011
TL;DR: A novel boundary evaluation criterion is proposed, including the multiple normalized min-max cut scores, which consider not only neighboring but also non-neighboring scene similarities with a memory-fading model, and the maximal cross-boundary strict shot similarity, which considers both color and structure similarities.
Abstract: Video scene segmentation is a fundamental step for video summarization and browsing, which is a very promising application of multimedia analysis. There are two key elements, namely, boundary evaluation and boundary searching, in a scene segmentation algorithm. In this paper, we propose a novel boundary evaluation criterion, including the multiple normalized min-max cut scores, which consider not only neighboring but also non-neighboring scene similarities with a memory-fading model, and the maximal cross-boundary strict shot similarity, which considers both color and structure similarities. Dynamic programming with a heuristic search scheme is adopted to quickly find the global optimal scene boundary sequence. Moreover, a Monte Carlo method is adopted to improve the stability of the searching process. Experimental results on a dataset of 40 diversified videos have proven the algorithm efficient, robust, and superior to the existing methods.

26 citations


Proceedings ArticleDOI
18 Apr 2011
TL;DR: A novel interface for browsing through a video keyframe hierarchy to find frames or clips is created and the interface is shown to be more efficient than scrolling linearly through all keyframes.
Abstract: Although people are capturing more video with their mobile phones, digital cameras, and other devices, they rarely watch all that video. More commonly, users extract a still image from the video to print or a short clip to share with others. We created a novel interface for browsing through a video keyframe hierarchy to find frames or clips. The interface is shown to be more efficient than scrolling linearly through all keyframes. We developed algorithms for selecting quality keyframes and for clustering keyframes hierarchically. At each level of the hierarchy, a single representative keyframe from each cluster is shown. Users can drill down into the most promising cluster and view representative keyframes for the sub-clusters. Our clustering algorithms optimize for short navigation paths to the desired keyframe. A single keyframe is located using a non-temporal clustering algorithm. A video clip is located using one of two temporal clustering algorithms. We evaluated the clustering algorithms using a simulated search task. User feedback provided us with valuable suggestions for improvements to our system.

23 citations


Proceedings ArticleDOI
28 Nov 2011
TL;DR: This paper investigates the synchronization of multiple media content in the physical form of hyperlinking them to develop browsing systems that author search results with rich media information mined from various knowledge sources.
Abstract: With the rapid growth of social media, there are plenty of information sources freely available online for use. Nevertheless, how to synchronize and leverage these diverse forms of information for multimedia applications remains a problem yet to be seriously studied. This paper investigates the synchronization of multiple media content in the physical form of hyperlinking them. The ultimate goal is to develop browsing systems that author search results with rich media information mined from various knowledge sources. The authoring enables the vivid visualization and exploration of different information landscapes inherent in search results. Several key techniques are studied in this paper for developing these browsing features. These techniques include content mining and selection from web videos, space-time alignment of multiple media, and augmenting of search result with when and what information. We conduct both quantitative and user studies on a large video dataset for performance evaluation. Comparison with traditional techniques including storyboard summarization and video skimming are also presented.

16 citations


Patent
04 May 2011
TL;DR: A video monitoring platform receives a video browsing request carrying a transcoding identification parameter, locates the pre-transcoding code stream of the front-end acquisition equipment identified in the request, transcodes it according to the parameter, and transmits the result to the client unit, so that transcoding is performed uniformly for all client units.
Abstract: The embodiment of the invention provides a video monitoring method, a video monitoring system and video monitoring equipment. The method comprises the following steps that: a video monitoring platform receives a video browsing request from a client unit, wherein the video browsing request carries a transcoding identification parameter; according to identification information of front-end acquisition equipment carried in the video browsing request, a code stream before transcoding acquired by the front-end acquisition equipment is searched for; the video monitoring platform transcodes the code stream before transcoding to form the code stream after transcoding according to the transcoding identification parameter; and the video monitoring platform transmits the code stream after transcoding to the client unit, so that the client unit can receive and play the code stream after transcoding corresponding to the transcoding identification parameter. In the embodiment of the invention, the video monitoring platform is used for performing uniform transcoding processing on each client unit, so that the normality of transcoding processing is realized.

16 citations


Patent
11 May 2011
TL;DR: A video preprocessing and playing method and system that extracts the video segments users are interested in, organizes them in chronological order into a logical video file, and switches playing control smoothly between the video segments and the original video file, thereby reducing the number of video frames that must be watched, allowing users to concentrate on important frames, and improving video browsing efficiency.
Abstract: The invention relates to the technical field of multimedia, in particular to a video preprocessing and playing method and system. The method comprises the following steps of: (1) extracting video segments including contents that users are interested in through video segmentation by using a public intelligent video technology; (2) storing information of the video segments into a database or an accompanying document; and (3) organizing the video segments in chronological order into a logical video file. The video preprocessing and playing method and system achieve the function of conventional playing control of the logical video file, and can be used for smoothly switching the playing control between the video segments and an original video file, thereby reducing the number of video frames required for watching, allowing the users to concentrate on important frames, and improving the video browsing efficiency.

14 citations


Proceedings ArticleDOI
28 Nov 2011
TL;DR: A video browsing tool that combines advantages of the hierarchical browsing concept with 3D projection and multi-threaded programming in order to provide a convenient and efficient interface.
Abstract: We present a video browsing tool that combines advantages of the hierarchical browsing concept with 3D projection and multi-threaded programming in order to provide a convenient and efficient interface. The tool allows for instantaneous hierarchical browsing of video and uses a dynamic approach (i.e. tree of playable video segments instead of static key frames) that also supports parallel playback.

Proceedings ArticleDOI
13 Jun 2011
TL;DR: The proposed mechanisms aim to provide efficient and reusable techniques for browsing and retrieval, trying to minimize the computational and storage cost of the approach while offering novel functionalities such as personalized/real-time video summarization.
Abstract: In this paper we describe the video browsing and retrieval techniques included within the ASSETS project system, focused on providing enhanced access to video repositories. The proposed mechanisms aim to provide efficient and reusable techniques for browsing and retrieval, trying to minimize the computational and storage cost of the approach while offering novel functionalities such as personalized/real-time video summarization. The system is under design and development within the ASSETS project, which deals with advanced tools for accessing cultural content.

Journal ArticleDOI
TL;DR: A novel approach for the interactive search that displays the result set in a flexible manner is presented, based on a simple and fast algorithm to build video stories and on an effective visual structure to arrange the storyboards, called Clustering Set.
Abstract: The fast evolution of technology has led to a growing demand for video data, increasing the amount of research into efficient systems to manage those materials. Making efficient use of video information requires that data be accessed in a user-friendly way. Ideally, one would like to perform video search using an intuitive tool. Most existing browsers for the interactive search of video sequences, however, have employed an overly rigid layout to arrange the results, restricting users to exploring the results using list- or grid-based layouts. This paper presents a novel approach for the interactive search that displays the result set in a flexible manner. The proposed method is based on a simple and fast algorithm to build video stories and on an effective visual structure to arrange the storyboards, called Clustering Set. It is able to group together videos with similar content and to organize the result set in a well-defined tree. Results from a rigorous empirical comparison with a subjective evaluation show that such a strategy makes the navigation more coherent and engaging to users.

Proceedings ArticleDOI
18 Apr 2011
TL;DR: An interactive 3D storyboard that takes advantage of 3D graphics in order to overcome certain limitations of conventional 2D storyboards when used for the task of image and video browsing is demonstrated.
Abstract: We demonstrate an interactive 3D storyboard that takes advantage of 3D graphics in order to overcome certain limitations of conventional 2D storyboards when used for the task of image and video browsing.

Journal ArticleDOI
TL;DR: Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of the proposed key frame extraction method in terms of the percentage of the extracted key frames and the retrieval precision.
Abstract: In consequence of the popularity of family video recorders and the surge of Web 2.0, increasing amounts of videos have made the management and integration of the information in videos an urgent and important issue in video retrieval. Key frames, as a high-quality summary of videos, play an important role in the areas of video browsing, searching, categorisation, and indexing. An effective set of key frames should include major objects and events of the video sequence, and should contain minimum content redundancies. In this paper, an innovative key frame extraction method is proposed to select representative key frames for a video. By analysing the differences between frames and utilising the clustering technique, a set of key frame candidates (KFCs) is first selected at the shot level, and then the information within a video shot and between video shots is used to filter the candidate set to generate the final set of key frames. Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the percentage of the extracted key frames and the retrieval precision.
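
The first stage described in the abstract above (frame differences plus clustering to pick key frame candidates within a shot) can be sketched generically as follows. This is not the paper's algorithm: the sequential clustering rule, the L1 distance, the threshold, and the function names are all assumptions chosen for illustration, and the second filtering stage is omitted.

```python
from typing import List, Sequence

Feature = Sequence[float]


def l1_distance(a: Feature, b: Feature) -> float:
    """Sum of absolute differences between two frame feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(frames: Sequence[Feature], members: Sequence[int]) -> List[float]:
    """Mean feature vector of the frames whose indices are in members."""
    dims = len(frames[members[0]])
    return [sum(frames[j][k] for j in members) / len(members) for k in range(dims)]


def key_frame_candidates(frames: Sequence[Feature], threshold: float) -> List[int]:
    """Return one candidate frame index per cluster for a single shot.

    Frames are scanned in temporal order. A frame joins the current cluster
    if it is within `threshold` of that cluster's centroid; otherwise it
    starts a new cluster. The frame nearest each centroid is the candidate.
    """
    if not frames:
        return []
    clusters: List[List[int]] = [[0]]
    for i in range(1, len(frames)):
        current = clusters[-1]
        if l1_distance(frames[i], centroid(frames, current)) <= threshold:
            current.append(i)
        else:
            clusters.append([i])
    candidates = []
    for members in clusters:
        center = centroid(frames, members)
        candidates.append(min(members, key=lambda j: l1_distance(frames[j], center)))
    return candidates
```

Here each frame would be represented by a feature vector such as a normalized color histogram; a lower threshold yields more clusters and therefore more candidates.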

Proceedings ArticleDOI
11 Jul 2011
TL;DR: This work takes a new aspect view to profile a video volume to a visual track that serves as a digest of the video for preview; the projected profile contains spatial and temporal information in a 2D image scroll that is continuous, scalable, and indexed to frames.
Abstract: Video indexing is important to video browsing, editing, retrieval, and summarization. This work takes a new aspect view to profile a video volume to a visual track and create a digest of video for preview. Our projected profile of video contains spatial and temporal information inclusively in a 2D image scroll that is continuous, scalable, and indexing to frames. We analyze the camera kinematics including zoom, translation and rotation, and categorize camera works for profiling various types of video. The key idea is to use a sampling line to sweep the video volume across the major optical flow so as to obtain an intrinsic scene space that is less influenced by the camera motion. We also use motion blur technique to render dynamic targets in the profile. The resulting video track can provide a video preview for guiding the access to the frames. It will facilitate video surveillance, visual archiving of environment, video retrieval, and video editing.

Proceedings ArticleDOI
28 Nov 2011
TL;DR: This work takes a new aspect view to profile a video volume to a video track as a digest for video preview, investigates the global flow field under all camera actions, and proposes a unified scheme that uses a sampling line to cut the video volume across the major optical flow field.
Abstract: This work takes a new aspect view to profile a video volume to a video track as a digest for video preview. Our projected video profile contains both spatial and temporal information in a 2D image scroll that is continuous, compact, scalable, and indexed to each frame. To profile various types of video clips, we investigate the global flow field under all camera actions, and propose a unified scheme that uses a sampling line to cut the video volume across the major optical flow field. The resulting profile obtains an intrinsic scene space less influenced by the camera actions, and can be displayed in a video track to guide the access to video frames, and facilitate video browsing, editing, and retrieval.

Journal ArticleDOI
TL;DR: An integrated music video browsing system for personalized digital television that has the functions of automatic music emotion classification, automatic theme-based music classification, salient region detection, and shot classification that can be adopted in any digital television for providing personalized services.
Abstract: In this paper, we propose an integrated music video browsing system for personalized digital television. The system has the functions of automatic music emotion classification, automatic theme-based music classification, salient region detection, and shot classification. From audio (music) tracks, highlight detection and emotion classification are performed on the basis of information on temporal energy, timbre and tempo. For video tracks, shot detection is fulfilled to classify shots into face shots and color-based shots. Lastly automatic grouping of themes is executed on music titles and their lyrics. With a database of international music videos, we evaluate the performance of each function implemented in this paper. The experimental results show that the music browsing system achieves remarkable performances. Thus, our system can be adopted in any digital television for providing personalized services.

Proceedings ArticleDOI
28 Nov 2011
TL;DR: The results show that user requirements for video quality are related to personal preference, technology background and video viewing experience, and the preferred quality-delivery mode and interactive mode are diverse.
Abstract: The increase of powerful mobile devices has accelerated the demand for mobile videos. Previous studies in mobile video have focused on understanding of mobile video usage, improvement of video quality, and user interface design in video browsing. However, research focusing on a deep understanding of users' needs for a pleasing quality delivery of mobile video is lacking. In particular, what quality-delivery mode users prefer and what information relevant to video quality they need requires attention. This paper presents a qualitative interview study with 38 participants to gain an insight into three aspects: influencing factors of user-desired video quality, user-preferred quality-delivery modes, and user-required interaction information of mobile video. The results show that user requirements for video quality are related to personal preference, technology background and video viewing experience, and the preferred quality-delivery mode and interactive mode are diverse. These complex user requirements call for flexible and personalised quality delivery and interaction of mobile video.

Proceedings ArticleDOI
28 Nov 2011
TL;DR: This paper reports a system developed for video browsing based on multimodal analysis that performs audio transcription for shot categorization (sports, weather, politics and economy) combining audio and visual information for theme categorization.
Abstract: This paper reports a system developed for video browsing based on multimodal analysis. Our multimodal approach performs audio transcription for shot categorization (sports, weather, politics and economy) combining audio and visual information for theme categorization. Its main features include static and dynamic summaries, segmentation using face detection, classification into Indoor/Outdoor scenes based on Support Vector Machine (SVM) and audio transcription for theme keyword search. Keywords are selected to represent the subjects, followed by a simple text search. We conduct a set of experiments for evaluating the effectiveness of the shot subject categorization using audio transcription information.

Proceedings ArticleDOI
14 Jun 2011
TL;DR: This study brought to light a simple yet efficient fingerprinting technique allowing short video sequences to be tracked, and established that the average error probabilities for both missed detection and false alarm are lower than 0.0007.
Abstract: Uniquely identifying visual content remains a challenging issue for a wide variety of current applications, such as video browsing, database search, and multimedia security. In this respect, our study brought to light a simple yet efficient fingerprinting technique allowing short video sequences to be tracked. Three corpora, each containing 3780 video excerpts, with different excerpt lengths (20 seconds, 40 seconds and 60 seconds) were considered in the experiments. The quantitative results established that the average error probabilities for both missed detection and false alarm are lower than 0.0007. These good practical results derive from the very fine mathematical properties of stationarity governing the DWT coefficients representing the fingerprint.

27 Sep 2011
TL;DR: This work presents a new interface for browsing or searching news broadcasts (video/audio) that exploits new language processing tools to provide immediate access to topical passages within news broadcasts, and performs cross lingual search of news broadcasts.
Abstract: One important class of online videos is that of news broadcasts. Most news organisations provide near-immediate access to topical news broadcasts over the Internet, through RSS streams or podcasts. Until lately, technology has not made it possible for a user to automatically go to the smaller parts, within a longer broadcast, that might interest them. Recent advances in both speech recognition systems and natural language processing have led to a number of robust tools that allow us to provide users with quicker, more focussed access to relevant segments of one or more news broadcast videos. Here we present our new interface for browsing or searching news broadcasts (video/audio) that exploits these new language processing tools to (i) provide immediate access to topical passages within news broadcasts, (ii) browse news broadcasts by events as well as by people, places and organisations, (iii) perform cross lingual search of news broadcasts, (iv) search for news through a map interface, (v) browse news by trending topics, and (vi) see automatically-generated textual clues for news segments, before listening. Our publicly searchable demonstrator currently indexes daily broadcast news content from 50 sources in English, French, Chinese, Arabic, Spanish, Dutch and Russian.

Patent
07 Sep 2011
TL;DR: An intelligent video monitoring system based on a digital television set top box that enables users to change the existing video browsing into an interactive mode and make an intelligent setup, so that the digital television is more intelligent and can provide richer digital television applications.
Abstract: The utility model discloses an intelligent video monitoring system based on a digital television set top box. A front collecting device comprises a video camera and a camera, and a client terminal device comprises the digital television set top box, a television, and a remote control for controlling the digital television set top box and the television through infrared signals. The output end of the video camera is connected with the input end of a video server through a video cable or an IP (internet protocol) network, the output end of the camera is connected with the input end of the video server through the IP network, the output end of the video server is connected with the input end of the digital television set top box through a digital television bidirectional interactive network, and the output end of the digital television set top box is connected with the television through the video cable. The intelligent video monitoring system based on the digital television set top box realizes intelligent digital television operations and enables users to change the existing video browsing into an interactive mode and make an intelligent setup, so that the digital television is more intelligent and can provide richer digital television applications.

Proceedings ArticleDOI
28 Nov 2011
TL;DR: The proposed interface Galaxy Browser adopts the recent advances in near-duplicate detection and then synchronizes the detected near-duplicate information with comprehensive background knowledge derived from online external resources to create a topic structure on which users can easily browse and explore.
Abstract: Most search engines return a ranked list of items in response to a query. The list, however, tells very little about the relationship among items. For videos especially, users often need to spend a significant amount of time navigating the search result. Exploratory search presents a new paradigm for browsing where the browser takes up the role of information exploring and presents a well-organized browsing structure for users to navigate. The proposed interface Galaxy Browser adopts the recent advances in near-duplicate detection and then synchronizes the detected near-duplicate information with comprehensive background knowledge derived from online external resources. The result is a topic structure on which users can easily browse and explore.

Proceedings ArticleDOI
29 Nov 2011
TL;DR: Comp2Watch (pronounced like "come to watch") is a system that composes video frames into a collage and compresses the watching time, taking ROI factors into consideration in order to help users take a quick glance at videos.
Abstract: Mobile devices have become widespread and are frequently used in daily life, and watching videos on these devices has become an increasingly popular activity. However, several challenges (e.g., small mobile screen size, low bandwidth, fragmented watching time) hinder mobile video watching: they either interrupt the watching process or prevent users from browsing much content at the same time. Traditional video summarization techniques suffer from the small-screen issue. Therefore, we propose a system, Comp2Watch, which is pronounced like "come to watch". The name implies "composing the frames into a collage" and "compressing the watching time". The system takes ROI factors into consideration in order to help users take a quick glance at videos. We also modify the cost function to incorporate templates with variable aspect ratios, and we address the monotone layout problem caused by the limited space. The experimental results show that users can see the subject more clearly without losing much context.

Patent
10 Aug 2011
TL;DR: The utility model discloses a video networking management system comprising at least one client, an internet protocol (IP) network, digital video recorder (DVR) equipment, and a video networking management server.
Abstract: The utility model discloses a video networking management system, which comprises at least one client, an internet protocol (IP) network, digital video recorder (DVR) equipment and a video networking management server, wherein the video networking management server is used by an administrator for adding, modifying and deleting DVR equipment and granting authorities to users aiming at added DVR equipment to endow the users with the authorities of video browsing, video playback, holder control and remote setting, and the client, the DVR equipment and the video networking management server are communicatively connected through the IP network. The video networking management system has the advantages that a great amount of DVR equipment can be uniformly managed, the uniform management of the authorities is realized through a uniform user management platform, and finally the users can conveniently and rapidly manage the entire video monitoring network.

Journal Article
TL;DR: A quick surveillance video browsing system that collects all moving objects, which carry the most significant information in surveillance videos, and constructs a corresponding compact video by tuning the positions of these moving objects.
Abstract: Surveillance cameras have been widely installed in large cities to monitor and record human activities for different applications. Since surveillance cameras often record all events 24 hours a day, searching surveillance videos for specific targets takes a huge workforce, so a system that helps the user quickly look for targets of interest is in high demand. To this end, we propose a quick surveillance video browsing system. Our basic idea is to collect all moving objects, which carry the most significant information in surveillance videos, and to construct a corresponding compact video by tuning the positions of these moving objects. The compact video rearranges the spatiotemporal coordinates of moving objects to enhance the compression, but the temporal relationships among moving objects are still kept. The compact video can preserve the essential activities involved in the original surveillance video. This paper presents the details of our browsing system and the approach to producing the compact video from a source surveillance video.

Book ChapterDOI
05 Jan 2011
TL;DR: This work proposes extending an existing video browsing tool in order to support clustering of objects with similar motion and visualization of the objects' positions and trajectories, which requires the automatic extraction of moving objects and estimation of their trajectories.
Abstract: Video browsing methods are complementary to search and retrieval approaches, as they allow for exploration of unknown content sets. Objects and their motion convey important semantics of video content, which is relevant information for video browsing. We propose extending an existing video browsing tool in order to support clustering of objects with similar motion and visualization of the objects' positions and trajectories. This requires the automatic extraction of moving objects and estimation of their trajectories, as well as the ability to group objects with similar trajectories. For the first issue we describe the application of a recently proposed motion trajectory clustering algorithm; for the second we use k-medoids clustering and the dynamic time warping distance. We present evaluation results of both steps on real world traffic sequences from the Hopkins155 data set. Finally we describe the description of analysis results using MPEG-7 and the integration into the video browsing tool.
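
As a rough illustration of the grouping step mentioned in this abstract (k-medoids clustering with the dynamic time warping distance), the following is a minimal sketch and not the authors' implementation. The function names `dtw_distance` and `k_medoids`, the Euclidean point cost, the simple alternating update, and the choice of the first `k` items as initial medoids are all assumptions made for the example.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories.

    Each trajectory is a list of (x, y) points; the per-point cost is the
    Euclidean distance (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def assign(dist, medoids):
    """Label every item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix.

    Initial medoids are simply the first k items (hypothetical choice).
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster becomes empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


# Toy trajectories: two along y ~ 0, two along x ~ 0 further away,
# sampled with different numbers of points.
trajectories = [
    [(0, 0), (1, 0), (2, 0), (3, 0)],
    [(0, 5), (0, 6), (0, 7), (0, 8)],
    [(0, 0.1), (1.5, 0.1), (3, 0.1)],
    [(0.1, 5), (0.1, 6.5), (0.1, 8)],
]
dist = [[dtw_distance(p, q) for q in trajectories] for p in trajectories]
medoids, labels = k_medoids(dist, k=2)
```

Because k-medoids only needs pairwise distances, it works directly with DTW, which compares trajectories of different lengths and does not yield a meaningful "mean" trajectory of the kind k-means would require.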

Book ChapterDOI
09 Nov 2011
TL;DR: This paper outlines some of the more promising content analysis techniques currently being researched in multimedia and computer vision and discusses how these could be used to develop visually-oriented end-user interfaces that support searching, browsing and summarization of the media contents in various usage contexts.
Abstract: Automatic media content analysis in multimedia is a very promising field of research bringing in various possibilities for enhancing visual informatics. By computationally analysing the quantitative data contained in text, audio, image and video media, more semantically meaningful and useful information on the media contents can be derived, extracted and visualised, informing human users of facts and patterns initially hidden in the bit streams of data. Insights into how to transform the emerging technological possibilities from these media analysis tools into usable visual interfaces to help people see visual information in novel ways will be an important contribution to visual informatics. In this paper, we outline some of the more promising content analysis techniques currently being researched in multimedia and computer vision and discuss how these could be used to develop visually-oriented end-user interfaces that support searching, browsing and summarization of the media contents in various usage contexts. We illustrate this with a few example applications that we have developed over the years, all of which are designed to take advantage of the automatic content analysis and to discover and create novel usage scenarios of consuming visually-oriented media contents.

Journal Article
TL;DR: A content-sensitive approach for video browsing and retrieval in the context of video delivery, the VBaR framework, is presented.
Abstract: Content-sensitive Approach for Video Browsing and Retrieval in the Context of Video Delivery: VBaR Framework