
Showing papers in "ACM Transactions on Multimedia Computing, Communications, and Applications in 2006"


Journal ArticleDOI
TL;DR: This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques.
Abstract: Extending beyond the boundaries of science, art, and culture, content-based multimedia information retrieval provides new paradigms and methods for searching through the myriad variety of media all over the world. This survey reviews 100+ recent articles on content-based multimedia information retrieval and discusses their role in current research directions which include browsing and search paradigms, user studies, affective computing, learning, semantic queries, new features and media types, high performance indexing, and evaluation techniques. Based on the current state of the art, we discuss the major challenges for the future.

1,652 citations


Journal ArticleDOI
TL;DR: This article presents a comparative analysis of a few different solutions for description and retrieval by similarity of 3D models that are representative of the principal classes of approaches proposed and develops an experimental analysis by comparing these methods according to their robustness to deformations, the ability to capture an object's structural complexity, and the resolution at which models are considered.
Abstract: In the past few years, there has been an increasing availability of technologies for the acquisition of digital 3D models of real objects and the consequent use of these models in a variety of applications, in medicine, engineering, and cultural heritage. In this framework, content-based retrieval of 3D objects is becoming an important subject of research, and finding adequate descriptors to capture global or local characteristics of the shape has become one of the main investigation goals. In this article, we present a comparative analysis of a few different solutions for description and retrieval by similarity of 3D models that are representative of the principal classes of approaches proposed. We have developed an experimental analysis by comparing these methods according to their robustness to deformations, the ability to capture an object's structural complexity, and the resolution at which models are considered.

167 citations


Journal ArticleDOI
TL;DR: An unsupervised approach to automated story picturing in which semantic keywords are extracted from the story, an annotated image database is searched, and a novel image ranking scheme automatically determines the importance of each image.
Abstract: We present an unsupervised approach to automated story picturing. Semantic keywords are extracted from the story and used to search an annotated image database. Thereafter, a novel image ranking scheme automatically determines the importance of each image. Both lexical annotations and visual content play a role in determining the ranks. Annotations are processed using WordNet. A mutual reinforcement-based rank is calculated for each image. We have implemented the methods in our Story Picturing Engine (SPE) system. Experiments on large-scale image databases are reported. A user study has been performed and a statistical analysis of the results is presented.

141 citations
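The mutual reinforcement-based rank mentioned above can be pictured as an eigenvector-style score over a combined lexical/visual image similarity matrix. The following Python sketch, which assumes a precomputed symmetric similarity matrix and uses plain power iteration, only illustrates the general idea and is not the authors' implementation.

```python
import numpy as np

def mutual_reinforcement_rank(similarity, tol=1e-8, max_iter=200):
    """Score images by iterating r <- S r (power iteration) until convergence.

    `similarity` is an n x n nonnegative matrix combining lexical and visual
    similarity between candidate images (an assumed input; the paper's exact
    similarity measure is not reproduced here).
    """
    n = similarity.shape[0]
    r = np.ones(n) / n
    for _ in range(max_iter):
        r_new = similarity @ r
        r_new /= np.linalg.norm(r_new, 1)        # keep scores on a fixed scale
        if np.linalg.norm(r_new - r, 1) < tol:
            return r_new
        r = r_new
    return r                                     # higher score = more central image

# Toy usage: three candidate images with symmetric combined similarities.
S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
print(mutual_reinforcement_rank(S))
```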


Journal ArticleDOI
TL;DR: A model that assesses quality variation from three distinct levels: the network, the media and the content levels; and from two views: the technical and the user perspective shows that a significant reduction in frame rate does not proportionally reduce the user's understanding of the presentation independent of technical parameters.
Abstract: This article presents the results of a study that explored the human side of the multimedia experience. We propose a model that assesses quality variation from three distinct levels: the network, the media and the content levels; and from two views: the technical and the user perspective. By facilitating parameter variation at each of the quality levels and from each of the perspectives, we were able to examine their impact on user quality perception. Results show that a significant reduction in frame rate does not proportionally reduce the user's understanding of the presentation independent of technical parameters, that multimedia content type significantly impacts user information assimilation, user level of enjoyment, and user perception of quality, and that the device display type impacts user information assimilation and user perception of quality. Finally, to ensure the transfer of information, low-level abstraction (network-level) parameters, such as delay and jitter, should be adapted; to maintain the user's level of enjoyment, high-level abstraction quality parameters (content-level), such as the appropriate use of display screens, should be adapted.

95 citations


Journal ArticleDOI
TL;DR: A framework that utilizes both internal AV features and various types of external information sources for event detection in team sports video is proposed and shown to be effective on soccer and American football.
Abstract: The use of AV features alone is insufficient to induce high-level semantics. This article proposes a framework that utilizes both internal AV features and various types of external information sources for event detection in team sports video. Three schemes are also proposed to tackle the asynchronism between AV and external information during fusion. The framework is extensible, as it can provide increasing functionality given more detailed external information and domain knowledge. Having demonstrated its effectiveness on soccer and American football, we believe that, with the availability of appropriate domain knowledge, the framework is applicable to other team sports.

53 citations
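One simple way to picture the fusion of AV-detected candidate events with externally sourced event reports, and the clock skew between the two, is a tolerance-window match. The sketch below uses invented labels and a fixed skew tolerance and does not reproduce the paper's three alignment schemes.

```python
from dataclasses import dataclass

@dataclass
class Event:
    label: str
    time_s: float  # seconds on its own clock

def fuse_events(av_events, external_events, max_skew_s=30.0):
    """Naive fusion: confirm an AV-detected candidate event if an external
    source reports the same label within a tolerance window, compensating
    for the (unknown) clock skew between the broadcast video and the
    external text source. max_skew_s is a hypothetical tolerance.
    """
    confirmed = []
    for av in av_events:
        for ext in external_events:
            if av.label == ext.label and abs(av.time_s - ext.time_s) <= max_skew_s:
                confirmed.append(av)
                break
    return confirmed

av = [Event("goal", 512.0), Event("goal", 1804.0), Event("corner", 300.0)]
ext = [Event("goal", 530.0)]           # e.g. from a live text commentary feed
print(fuse_events(av, ext))            # only the first goal is confirmed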


Journal ArticleDOI
TL;DR: A generic and robust framework for news video indexing which is founded on a broadcast news production model and demonstrates that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state-of-the-art.
Abstract: We propose a generic and robust framework for news video indexing which we founded on a broadcast news production model. We identify within this model four production phases, each providing useful metadata for annotation. In contrast to semiautomatic indexing approaches which exploit this information at production time, we adhere to an automatic data-driven approach. To that end, we analyze a digital news video using a separate set of multimodal detectors for each production phase. By combining the resulting production-derived features into a statistical classifier ensemble, the framework facilitates robust classification of several rich semantic concepts in news video; rich meaning that concepts share many similarities in their production process. Experiments on an archive of 120 hours of news video from the 2003 TRECVID benchmark show that a combined analysis of production phases yields the best results. In addition, we demonstrate that the accuracy of the proposed style analysis framework for classification of several rich semantic concepts is state-of-the-art.

45 citations
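The combination of production-derived features into a classifier ensemble can be pictured as a late fusion of per-phase concept scores. The phase names, weights, and the weighted average below are assumptions for illustration only; the paper uses a statistical classifier ensemble rather than a fixed table.

```python
import numpy as np

def late_fusion(phase_scores, weights=None):
    """Combine per-production-phase concept scores into one posterior.

    phase_scores: dict mapping a (hypothetical) phase name to the score in
    [0, 1] produced by that phase's detector stack. A weighted average is
    only an illustrative stand-in for a trained ensemble.
    """
    phases = sorted(phase_scores)
    s = np.array([phase_scores[p] for p in phases], dtype=float)
    w = np.ones(len(s)) / len(s) if weights is None else np.asarray(weights, float)
    return float(np.dot(w, s) / w.sum())

scores = {"phase-1": 0.7, "phase-2": 0.9, "phase-3": 0.4, "phase-4": 0.6}
print(late_fusion(scores))  # fused confidence that the concept is present
```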


Journal ArticleDOI
TL;DR: Extensive simulations with TCP/UDP and SCTP show that the proposed 3D middleware can achieve the dual objectives of maintaining low transmission delay and small distortion, and thus supporting high quality 3D streaming with high flexibility.
Abstract: Streaming 3D graphics have been widely used in multimedia applications such as online gaming and virtual reality. However, a gap exists between the zero-loss tolerance of existing compression schemes and lossy network transmission. In this article, we propose a generic 3D middleware between the 3D application layer and the transport layer for the transmission of triangle-based progressively compressed 3D models. Significant features of the proposed middleware include: (1) handling 3D compressed data streams from multiple progressive compression techniques; (2) considering end-user hardware capabilities to effectively reduce the data size for network delivery; and (3) a minimum-cost dynamic reliable set selector that chooses the transport protocol for each sublayer based on real-time network traffic. Extensive simulations with TCP/UDP and SCTP show that the proposed 3D middleware achieves the dual objectives of maintaining low transmission delay and small distortion, thus supporting high-quality 3D streaming with high flexibility.

39 citations
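The idea of a per-sublayer transport choice can be illustrated with a toy delay-versus-distortion trade-off: protect the sublayers whose loss would hurt reconstruction most, as long as the delay budget allows. All thresholds and delays below are invented and stand in for, rather than reproduce, the paper's minimum-cost reliable set selector.

```python
def select_transports(sublayers, loss_rate, delay_budget_ms,
                      reliable_delay_ms=80.0, unreliable_delay_ms=20.0):
    """Pick a transport per progressive sublayer: send a layer reliably only
    while the delay budget allows it, starting from the layers whose loss
    would hurt reconstruction quality most.

    `sublayers` is a list of (name, importance) pairs sorted base-to-detail;
    all numbers here are illustrative, not the paper's cost model.
    """
    plan, spent = {}, 0.0
    for name, importance in sublayers:
        expected_loss_cost = loss_rate * importance
        if expected_loss_cost > 0.1 and spent + reliable_delay_ms <= delay_budget_ms:
            plan[name] = "reliable"      # e.g. TCP or SCTP ordered-reliable
            spent += reliable_delay_ms
        else:
            plan[name] = "unreliable"    # e.g. UDP / SCTP partial reliability
            spent += unreliable_delay_ms
    return plan

layers = [("base", 1.0), ("refine-1", 0.5), ("refine-2", 0.2)]
print(select_transports(layers, loss_rate=0.2, delay_budget_ms=200))
```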


Journal ArticleDOI
TL;DR: The proposed MX-Onto ontology enables capturing the subjective interpretation of music genres by defining multiple membership relations between a music resource and the corresponding music genres, thus supporting context-based and proximity-based search of music resources.
Abstract: In this article, we describe the MX-Onto ontology for providing a Semantic Web compatible representation of music resources based on their context. The context representation is realized by means of an OWL ontology that describes music information and that defines rules and classes for a flexible genre classification. By flexible classification we mean that the proposed approach enables capturing the subjective interpretation of music genres by defining multiple membership relations between a music resource and the corresponding music genres, thus supporting context-based and proximity-based search of music resources.

38 citations
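Outside of OWL, the notion of multiple graded genre memberships can be sketched with an ordinary mapping from tracks to weighted genres, which already supports threshold-style queries. Track identifiers, genres, and weights below are invented for illustration; the paper encodes this information in an OWL ontology with rules.

```python
# Each track is linked to several genres with a degree in [0, 1], so that a
# search can match tracks whose membership in a genre is strong enough
# instead of relying on a single exact label.
tracks = {
    "track-001": {"jazz": 0.8, "funk": 0.5},
    "track-002": {"funk": 0.9, "soul": 0.6},
    "track-003": {"classical": 1.0},
}

def search_by_genre(query_genre, min_degree=0.4):
    """Return tracks whose membership in the queried genre meets a threshold."""
    return sorted(
        ((tid, genres[query_genre]) for tid, genres in tracks.items()
         if genres.get(query_genre, 0.0) >= min_degree),
        key=lambda pair: pair[1], reverse=True)

print(search_by_genre("funk"))  # [('track-002', 0.9), ('track-001', 0.5)]
```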


Journal ArticleDOI
TL;DR: A new metric is suggested to identify interactive processes by explicitly measuring interactions with the user, and it is used to design and implement a process scheduler that is very effective in distinguishing between competing interactive and noninteractive processes.
Abstract: Desktop operating systems such as Windows and Linux base scheduling decisions on CPU consumption; processes that consume fewer CPU cycles are prioritized, assuming that interactive processes gain from this since they spend most of their time waiting for user input. However, this doesn't work for modern multimedia applications which require significant CPU resources. We therefore suggest a new metric to identify interactive processes by explicitly measuring interactions with the user, and we use it to design and implement a process scheduler. Measurements using a variety of applications indicate that this scheduler is very effective in distinguishing between competing interactive and noninteractive processes.

28 citations
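The core idea, measuring interactivity directly from user-input events rather than inferring it from low CPU usage, can be sketched in a few lines. The sliding window and the boost formula below are assumptions; this is not the kernel scheduler implemented in the paper.

```python
import time
from collections import defaultdict

class InteractionMonitor:
    """Toy interaction metric: count user-input events (key presses, mouse
    clicks, ...) routed to each process over a sliding window and derive a
    priority boost from that count.
    """
    def __init__(self, window_s=5.0):
        self.window_s = window_s
        self.events = defaultdict(list)   # pid -> timestamps of input events

    def record_input(self, pid):
        self.events[pid].append(time.monotonic())

    def interactivity(self, pid):
        now = time.monotonic()
        recent = [t for t in self.events[pid] if now - t <= self.window_s]
        self.events[pid] = recent
        return len(recent) / self.window_s   # input events per second

    def priority_boost(self, pid, max_boost=10):
        return min(max_boost, int(self.interactivity(pid)))
```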


Journal ArticleDOI
TL;DR: It is shown by analysis that VTP preserves most of the convergence properties of AIMD and converges to its fair share fast and excels in all tested scenarios in terms of smoothness, fairness, and opportunistic friendliness.
Abstract: In this article we study a smooth and efficient transport protocol for real-time video over wireless networks. The proposed scheme, named the video transport protocol (VTP), has a new and unique end-to-end rate control mechanism that aims to avoid drastic rate fluctuations while maintaining friendliness to legacy protocols. VTP is also equipped with an achieved rate estimation scheme and a loss discrimination algorithm, both end-to-end, to cope with random errors in wireless networks efficiently. We show by analysis that VTP preserves most of the convergence properties of AIMD and converges to its fair share fast. VTP is compared to two recent TCP friendly rate control (TFRC) extensions, namely TFRC Wireless and MULTFRC, in wired-cum-wireless scenarios in Ns-2. Results show that VTP excels in all tested scenarios in terms of smoothness, fairness, and opportunistic friendliness. VTP is also implemented to work with a video camera and an H.263 video codec as part of our hybrid testbed, where its good performance as a transport layer protocol is confirmed by measurement results.

28 citations
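The flavour of the rate control described above can be conveyed by a single sender-side update step: additive increase when there is no congestion, a gentle decrease toward the measured achieved rate on congestion loss, and no decrease for losses classified as wireless random errors. The constants below are placeholders, not VTP's actual parameters.

```python
def update_rate(rate_kbps, achieved_kbps, loss_detected, loss_is_congestion,
                add_step_kbps=20.0, beta=0.875):
    """One step of an AIMD-flavoured sender rate update with loss
    discrimination: random wireless losses do not trigger a decrease, and a
    congestion loss pulls the rate toward the recently achieved rate instead
    of a hard halving.
    """
    if loss_detected and loss_is_congestion:
        return max(achieved_kbps * beta, 1.0)   # gentle multiplicative decrease
    return rate_kbps + add_step_kbps            # additive increase otherwise

rate = 500.0
rate = update_rate(rate, achieved_kbps=480.0, loss_detected=True,
                   loss_is_congestion=False)    # wireless error: keep increasing
print(rate)  # 520.0
```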


Journal ArticleDOI
TL;DR: Evaluation of the quality of music summaries and the effectiveness of the proposed summarization approach indicates that summaries generated using the proposed method are effective in helping realize users' expectations.
Abstract: In this article, we propose a novel approach for automatic music video summarization. The proposed summarization scheme differs from current methods used for video summarization. The music video is separated into the music track and the video track. For the music track, a music summary is created by analyzing the music content using music features, an adaptive clustering algorithm, and music domain knowledge. Then, shots in the video track are detected and clustered. Finally, the music video summary is created by aligning the music summary and the clustered video shots. Subjective studies by experienced users have been conducted to evaluate the quality of the music summaries and the effectiveness of the proposed summarization approach. Experiments are performed on different genres of music videos, and comparisons are made with summaries generated from the music track alone, from the video track alone, and manually. The evaluation results indicate that summaries generated using the proposed method are effective in helping realize users' expectations.
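The final alignment step, pairing each selected music segment with a representative clustered video shot, can be sketched as a greedy overlap match that avoids repeating the previous shot cluster. The data and the greedy rule below are illustrative assumptions, not the authors' algorithm.

```python
def align_summary(music_segments, shots):
    """For each selected music segment, pick the video shot whose time span
    overlaps it most, avoiding reusing the same shot cluster twice in a row.

    music_segments: list of (start_s, end_s)
    shots: list of (start_s, end_s, cluster_id)
    """
    summary, last_cluster = [], None
    for m_start, m_end in music_segments:
        best, best_overlap = None, 0.0
        for s_start, s_end, cluster in shots:
            overlap = min(m_end, s_end) - max(m_start, s_start)
            if overlap > best_overlap and cluster != last_cluster:
                best, best_overlap = (s_start, s_end, cluster), overlap
        if best:
            summary.append(((m_start, m_end), best))
            last_cluster = best[2]
    return summary

music = [(10.0, 25.0), (60.0, 80.0)]
video = [(8.0, 30.0, 0), (55.0, 70.0, 1), (70.0, 90.0, 2)]
print(align_summary(music, video))
```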

Journal ArticleDOI
TL;DR: A new multidimensional model is proposed that integrates functional dimension versions, allowing the descriptors of the multidimensional data to be computed by different functions; the model is used to develop an OLAP application for navigation in a hypercube integrating various functional dimension versions for the computation of descriptors in a medical use case.
Abstract: Data warehouses are dedicated to collecting heterogeneous and distributed data in order to perform decision analysis. Based on the multidimensional model, commercial OLAP environments, as currently designed for traditional applications, provide means for the analysis of facts that are depicted by numeric data (e.g., sales depicted by amount or quantity sold). However, in numerous fields, such as medicine or bioinformatics, multimedia data are used as valuable information in the decisional process. One of the problems when integrating multimedia data as facts in a multidimensional model is dealing with dimensions built on descriptors that can be obtained by various computation modes on raw multimedia data. Taking these computation modes into account makes it possible to characterize the data from various points of view depending on the user's profile, best practices, level of expertise, and so on. We propose a new multidimensional model that integrates functional dimension versions, allowing the descriptors of the multidimensional data to be computed by different functions. With this approach, the user is able to obtain and choose multiple points of view on the data being analyzed. This model is used to develop an OLAP application for navigation in a hypercube integrating various functional dimension versions for the computation of descriptors in a medical use case.
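The notion of functional dimension versions can be illustrated by letting one multimedia-derived descriptor be computed by several interchangeable functions, with the analyst choosing the version that fits his or her practice. The medical example, function names, and numbers below are invented and only sketch the idea.

```python
import statistics

# The same descriptor of a multimedia fact (here, a size measured on an
# image series) can be computed by different functions, one per "version".
descriptor_versions = {
    "max-diameter": max,
    "mean-diameter": statistics.mean,
    "median-diameter": statistics.median,
}

def descriptor(measurements, version):
    """Compute the dimension member for one fact using the chosen version."""
    return descriptor_versions[version](measurements)

series = [12.0, 14.5, 13.2]           # e.g. diameters (mm) over one exam series
for v in descriptor_versions:
    print(v, descriptor(series, v))
```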

Journal ArticleDOI
TL;DR: Initial results show significant improvement in accuracy and efficiency of haptic perception in augmented reality environments when compared to conventional approaches that do not model context in haptic rendering.
Abstract: Haptic perception refers to the ability of human beings to perceive spatial properties through touch-based sensations. In haptics, contextual clues about material, shape, size, texture, and weight configurations of an object are perceived by individuals, leading to recognition of the object and its spatial features. In this paper, we present strategies and algorithms to model context in haptic applications that allow users to haptically explore objects in virtual reality/augmented reality environments. Initial results show significant improvement in accuracy and efficiency of haptic perception in augmented reality environments when compared to conventional approaches that do not model context in haptic rendering.
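One way to picture context-dependent haptic rendering is to let a contextual model (here reduced to a material lookup) select the parameters of a penalty-based force computation. The materials, coefficients, and force model below are illustrative assumptions, not the strategies proposed in the paper.

```python
from dataclasses import dataclass

@dataclass
class MaterialContext:
    stiffness: float   # N/mm
    friction: float    # Coulomb friction coefficient (unitless)

# Hypothetical contextual knowledge about materials.
CONTEXT = {"rubber": MaterialContext(0.3, 0.9),
           "steel":  MaterialContext(5.0, 0.2)}

def contact_force(material, penetration_mm, tangential_speed):
    """Penalty-based rendering step whose parameters come from the context."""
    ctx = CONTEXT[material]
    normal = ctx.stiffness * penetration_mm          # spring-like normal force
    friction = ctx.friction * normal if tangential_speed > 0 else 0.0
    return normal, friction

print(contact_force("rubber", penetration_mm=2.0, tangential_speed=1.0))
```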

Journal ArticleDOI
TL;DR: The concept of automatic content-based editing of preexisting semantic home video metadata is introduced and a formal representation and implementation techniques for reusing and repurposing semantic video metadata in concordance with the actual video editing operations are proposed.
Abstract: This article addresses the problem of processing the annotations of preexisting video productions to enable reuse and repurposing of metadata. We introduce the concept of automatic content-based editing of preexisting semantic home video metadata. We propose a formal representation and implementation techniques for reusing and repurposing semantic video metadata in concordance with the actual video editing operations. A novel representation for metadata editing is proposed and an implementation framework for editing the metadata in accordance with the video editing operations is demonstrated. Conflict resolution and regularization operations are defined and implemented in the context of the video metadata editing operations.
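A concrete instance of editing metadata in concordance with a video edit is propagating time-interval annotations through a cut: annotations inside the removed span are dropped, straddling ones are trimmed, and later ones are shifted. The sketch below is a simplified illustration, not the paper's formal representation or conflict resolution operations.

```python
def cut_segment(annotations, cut_start, cut_end):
    """Rewrite (label, start, end) annotations after a cut edit removes the
    interval [cut_start, cut_end) from the video.
    """
    removed = cut_end - cut_start
    updated = []
    for label, start, end in annotations:
        if end <= cut_start:                      # entirely before the cut
            updated.append((label, start, end))
        elif start >= cut_end:                    # entirely after: shift left
            updated.append((label, start - removed, end - removed))
        elif start >= cut_start and end <= cut_end:
            continue                              # swallowed by the cut: drop
        else:                                     # straddles the cut: trim
            new_start = min(start, cut_start)
            new_end = max(cut_start, end - removed)
            updated.append((label, new_start, new_end))
    return updated

meta = [("beach", 0, 10), ("birthday-cake", 8, 20), ("grandma", 25, 40)]
print(cut_segment(meta, 10, 22))
```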

Journal ArticleDOI
TL;DR: A Delay Distribution Measurement (DDM) based admission control algorithm is presented, the first measurement-based approach that effectively exploits statistical multiplexing along the delay dimension.
Abstract: Growth of performance-sensitive applications, such as voice and multimedia, has led to widespread adoption of resource virtualization by a variety of service providers (xSPs). For instance, Internet Service Providers (ISPs) increasingly differentiate their offerings by means of customized services, such as virtual private networks (VPN) with Quality of Service (QoS) guarantees or QVPNs. Similarly, Storage Service Providers (SSPs) use storage area network (SAN)/network attached storage (NAS) technology to provision virtual disks with QoS guarantees or QVDs. The key challenge faced by these xSPs is to maximize the number of virtual resource units they can support by exploiting the statistical multiplexing nature of the customers' input request load. While a number of measurement-based admission control algorithms utilize statistical multiplexing along the bandwidth dimension, they do not satisfactorily exploit statistical multiplexing along the delay dimension to guarantee distinct per-virtual-unit delay bounds. This article presents the Delay Distribution Measurement (DDM) based admission control algorithm, the first measurement-based approach that effectively exploits statistical multiplexing along the delay dimension. In other words, DDM exploits the well-known fact that the actual delay experienced by most service requests (packets or disk I/O requests) for a virtual unit is usually far smaller than its worst-case delay bound requirement because multiple virtual units rarely send request bursts at the same time. Additionally, DDM supports virtual units with distinct probabilistic delay guarantees: virtual units that can tolerate more delay violations can reserve fewer resources than those that tolerate less, even though they require the same delay bound. Comprehensive trace-driven performance evaluation of QVPNs (using Voice over IP traces) and QVDs (using video stream, TPC-C, and Web search I/O traces) shows that, when compared to deterministic admission control, DDM can potentially increase the number of admitted virtual units (and resource utilization) by up to a factor of 3.
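A schematic version of admission control along the delay dimension is to test whether the measured delay distribution keeps violations of the requested bound below the tolerated probability. The synthetic delays and the empirical check below only illustrate the flavour of such a test; they are not the DDM algorithm.

```python
import numpy as np

def admit(measured_delays_ms, delay_bound_ms, max_violation_prob):
    """Admit a virtual unit if the empirical delay distribution keeps the
    fraction of requests exceeding the bound below the tolerated violation
    probability (a schematic measurement-based check).
    """
    delays = np.asarray(measured_delays_ms, dtype=float)
    violation_prob = np.mean(delays > delay_bound_ms)
    return violation_prob <= max_violation_prob

# Synthetic delays standing in for measurements taken with the prospective
# unit's load included.
observed = np.random.default_rng(0).exponential(scale=5.0, size=10_000)
print(admit(observed, delay_bound_ms=20.0, max_violation_prob=0.05))
```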

Journal ArticleDOI
TL;DR: This article presents a graphics software architecture for next-generation digital television receivers that should include a standardised Java-based procedural environment capable of rendering 2D/3D graphics and video, and a declarative environment supporting W3C recommendations such as SMIL and XForms.
Abstract: This article presents a graphics software architecture for next-generation digital television receivers. We propose that such receivers should include a standardised Java-based procedural environment capable of rendering 2D/3D graphics and video, and a declarative environment supporting W3C recommendations such as SMIL and XForms. We also introduce a graphics architecture model that meets such requirements. As a proof-of-concept, a prototype implementation of the model is presented. This implementation enhances television content by allowing the user to play 3D graphics games, to run Java applications, and to browse XML-based documents while meeting current hardware restrictions.

Journal ArticleDOI
TL;DR: This article presents techniques for QoS-aware composition of applications for real-time video content analysis, based on dynamic Bayesian networks and demonstrates that increasing QoS requirements can be met by allocating additional CPUs for parallel processing, with only minor overhead.
Abstract: Real-time content-based access to live video data requires content analysis applications that are able to process video streams in real-time and with an acceptable error rate. Statements such as this express quality of service (QoS) requirements. In general, control of the QoS provided can be achieved by sacrificing application quality in one QoS dimension for better quality in another, or by controlling the allocation of processing resources to the application. However, controlling QoS in video content analysis is particularly difficult, not only because main QoS dimensions like accuracy are nonadditive, but also because both the communication- and the processing-resource requirements are challenging. This article presents techniques for QoS-aware composition of applications for real-time video content analysis, based on dynamic Bayesian networks. The aim of QoS-aware composition is to determine application deployment configurations which satisfy a given set of QoS requirements. Our approach consists of: (1) an algorithm for QoS-aware selection of configurations of feature extractor and classification algorithms which balances requirements for timeliness and accuracy against available processing resources, (2) a distributed content-based publish/subscribe system which provides application scalability at multiple logical levels of distribution, and (3) scalable solutions for video streaming, filtering/transformation, feature extraction, and classification. We evaluate our approach based on experiments with an implementation of a real-time motion-vector-based object-tracking application. The evaluation shows that the application largely behaves as expected when resource availability and selections of configurations of feature extractor and classification algorithms vary. The evaluation also shows that increasing QoS requirements can be met by allocating additional CPUs for parallel processing, with only minor overhead.
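QoS-aware selection of a configuration can be pictured as searching the space of feature-extractor/classifier combinations for the most accurate one that still fits the processing budget and the deadline. The candidate algorithms, costs, and accuracies below are invented, and the exhaustive search stands in for the paper's dynamic-Bayesian-network-based approach.

```python
from itertools import product

# Hypothetical per-frame cost (ms) and accuracy estimates per algorithm.
EXTRACTORS = {"coarse-motion": (5.0, 0.80), "fine-motion": (12.0, 0.92)}
CLASSIFIERS = {"fast-model": (3.0, 0.85), "full-model": (9.0, 0.95)}

def select_configuration(cpu_budget_ms, deadline_ms):
    """Pick the extractor/classifier pair with the highest combined accuracy
    whose total per-frame processing time fits both the CPU budget and the
    latency deadline (a tiny exhaustive stand-in for QoS-aware composition).
    """
    best, best_acc = None, 0.0
    for (e, (e_ms, e_acc)), (c, (c_ms, c_acc)) in product(
            EXTRACTORS.items(), CLASSIFIERS.items()):
        total_ms, acc = e_ms + c_ms, e_acc * c_acc
        # Allocating more CPUs for parallel processing would relax cpu_budget_ms.
        if total_ms <= min(cpu_budget_ms, deadline_ms) and acc > best_acc:
            best, best_acc = (e, c), acc
    return best, best_acc

print(select_configuration(cpu_budget_ms=15.0, deadline_ms=33.0))
```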

Journal Article
TL;DR: This paper experimentally shows data dependence between training and test set and proposes how this bias can be prevented using episode-constrained cross-validation, and shows that a 17% higher classifier performance can be achieved by using episode-constrained cross-validation for classifier parameter tuning.
Abstract: Digital video is sequential in nature. When video data is used in a semantic concept classification task, the episodes are usually summarized with shots. The shots are annotated as containing, or not containing, a certain concept, resulting in a labeled dataset. These labeled shots can subsequently be used by supervised learning methods (classifiers), which are trained to predict the absence or presence of the concept in unseen shots and episodes. The performance of such automatic classification systems is usually estimated with cross-validation. When random samples are taken from the dataset for training and testing, part of the shots from an episode end up in the training set and another part from the same episode in the test set. Accordingly, data dependence between training and test set is introduced, resulting in overly optimistic performance estimates. In this paper, we experimentally show this bias and propose how it can be prevented using episode-constrained cross-validation. Moreover, we show that a 17% higher classifier performance can be achieved by using episode-constrained cross-validation for classifier parameter tuning.
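Episode-constrained cross-validation amounts to keeping all shots of an episode in the same fold. One straightforward way to obtain such folds (not the authors' code) is scikit-learn's GroupKFold with the episode identifier as the group, as in the toy example below.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy shot-level features X, concept labels y, and the episode each shot
# belongs to. Keeping all shots of an episode in the same fold prevents the
# train/test dependence described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 4))                      # 12 shots, 4 features each
y = rng.integers(0, 2, size=12)                   # concept present / absent
episodes = np.repeat(["ep1", "ep2", "ep3", "ep4"], 3)

for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=episodes):
    # No episode appears on both sides of the split.
    assert set(episodes[train_idx]).isdisjoint(episodes[test_idx])
    print("test episodes:", sorted(set(episodes[test_idx])))
```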

Journal ArticleDOI
TL;DR: This special issue contains extended versions of the three best papers presented at the 11th International Workshop on Multimedia Information Systems (MIS) held between 19 and 21 September 2005 in Sorrento, Italy.
Abstract: This special issue contains extended versions of the three best papers presented at the 11th International Workshop on Multimedia Information Systems (MIS) held between 19 and 21 September 2005 in Sorrento, Italy. The primary goal of the MIS workshops is to report on cutting edge research across all areas of multimedia systems so that researchers in different areas of multimedia information systems get to know, and are influenced by, other research directions. MIS workshops bring together investigators from academia, industry, and government labs to address research problems in various areas related to multimedia information systems. The areas of focus of MIS include databases, knowledgebases, operating systems, and networking to support development of effective and efficient multimedia systems. Despite the diversity of technical challenges underlying multimedia system design and development, we note that a common requirement in any multimedia information system is to be able to select, direct, and focus its processing appropriately (i.e., as necessitated by the task and context) to transform raw media data into useful information. Current information systems make significant use of multimedia data. A key issue with multimedia information systems is that their behavior cannot be generic, but must be tailored to several parameters that describe the user situation in its many facets, and that collectively represent what is called a context. To obtain a context-aware behavior, context-oriented design is required. Context has been studied in application development and user computer interaction. Initially linked to the concepts of space and time of action, then extended to include adaptation of the information presentation to a variety of devices, it has evolved to include any information about the environment in which a system acts that can influence its operation [Dey 2001; Schmidt et al. 1999]. In information systems the context is used to supplement the user operations with implicit information about the user state. In multimedia information systems this role is even greater, due to the many facets of multimedia data, which require a system to behave differently not only in the way the information is presented, but also in selecting the right information to deliver. Context is therefore a “hidden partner” of the user, silently feeding part of the information, or silently tuning some parameters that allow the user to receive the needed information and service in the right place, in the right way, and with an adequate interaction effort [Dey 2001; Schmidt et al. 1999; Weiser and Brown 1997]. Contextual adaptation is necessary at multiple levels: first of all, at the lowest level, multimedia data can be stored and transmitted at different precision scales depending on the available resources and application semantics. Secondly, multimedia data can be indexed and retrieved with varying accuracy