
Showing papers in "ACM Transactions on Multimedia Computing, Communications, and Applications in 2005"


Journal Article
TL;DR: This work presents several variants of an automatic unsupervised algorithm to partition a collection of digital photographs based either on temporal similarity alone, or on temporal and content-based similarity.
Abstract: Organizing digital photograph collections according to events such as holiday gatherings or vacations is a common practice among photographers. To support photographers in this task, we present similarity-based methods to cluster digital photos by time and image content. The approach is general and unsupervised, and makes minimal assumptions regarding the structure or statistics of the photo collection. We present several variants of an automatic unsupervised algorithm to partition a collection of digital photographs based either on temporal similarity alone, or on temporal and content-based similarity. First, interphoto similarity is quantified at multiple temporal scales to identify likely event clusters. Second, the final clusters are determined according to one of three clustering goodness criteria. The clustering criteria trade off computational complexity and performance. We also describe a supervised clustering method based on learning vector quantization. Finally, we review the results of an experimental evaluation of the proposed algorithms and existing approaches on two test collections.

222 citations
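The abstract above does not spell out the clustering criterion, so the sketch below is only a minimal stand-in: it declares an event boundary wherever the time gap between consecutive photos is unusually large compared with recent gaps. The window size and threshold constant `k` are hypothetical, and the paper's multi-scale similarity analysis and content-based features are not modeled.

```python
from datetime import datetime
from math import log
from typing import List

def cluster_by_time(timestamps: List[datetime], window: int = 10, k: float = 4.0) -> List[List[datetime]]:
    """Split a chronologically sorted photo stream into event clusters.

    A boundary is declared where the log of the gap to the previous photo
    exceeds the mean log-gap over a recent window by the constant `k`,
    i.e. the gap is unusually long for the local shooting rhythm.
    """
    if not timestamps:
        return []
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() + 1.0 for a, b in zip(ts, ts[1:])]  # +1s avoids log(0)
    clusters, current = [], [ts[0]]
    for i, gap in enumerate(gaps):
        recent = gaps[max(0, i - window):i] or [gap]
        local_mean_log = sum(log(g) for g in recent) / len(recent)
        if log(gap) > local_mean_log + k:   # unusually long gap -> new event
            clusters.append(current)
            current = []
        current.append(ts[i + 1])
    clusters.append(current)
    return clusters
```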


Journal Article
TL;DR: In this paper, the authors describe two video sensor platforms that can deliver high-quality video over 802.11 networks with a power requirement of less than 5 watts, and describe the streaming and prioritization mechanisms that they have designed to allow the sensors to survive long periods of disconnected operation.
Abstract: Video-based sensor networks can provide important visual information in a number of applications, including environmental monitoring, health care, emergency response, and video security. This article describes the Panoptes video-based sensor networking architecture, including its design, implementation, and performance. We describe two video sensor platforms that can deliver high-quality video over 802.11 networks with a power requirement of less than 5 watts. In addition, we describe the streaming and prioritization mechanisms that we have designed to allow the sensors to survive long periods of disconnected operation. Finally, we describe a sample application and bitmapping algorithm that we have implemented to show the usefulness of our platform. Our experiments include an in-depth analysis of the bottlenecks within the system as well as power measurements for the various components of the system.

211 citations
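The prioritization mechanism itself is not detailed in the abstract; the following sketch only illustrates the general idea of priority-aware buffering during disconnected operation: when local storage fills, the lowest-priority frames are evicted first so that important video survives until connectivity returns. The class name, priority scheme, and capacity handling are assumptions, not the Panoptes implementation.

```python
import heapq

class PriorityFrameBuffer:
    """Toy buffer for disconnected operation: keep at most `capacity` frames,
    evicting the lowest-priority (oldest among equals) frame when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []      # (priority, seq, frame) min-heap: worst entry on top
        self._seq = 0

    def add(self, frame: bytes, priority: int) -> None:
        entry = (priority, self._seq, frame)
        self._seq += 1
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:          # incoming frame outranks the current worst
            heapq.heapreplace(self._heap, entry)
        # otherwise the incoming low-priority frame is dropped

    def drain_for_upload(self):
        """On reconnection, return buffered frames, highest priority first."""
        return [frame for _, _, frame in sorted(self._heap, reverse=True)]
```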


Journal Article
TL;DR: A new scheduling algorithm, SCAN-EDF, is proposed that combines the features of a SCAN-type seek-optimizing algorithm with an Earliest Deadline First (EDF) real-time scheduling algorithm.
Abstract: This article provides a retrospective of our original paper by the same title in the Proceedings of the First ACM Conference on Multimedia, published in 1993. This article examines the problem of disk scheduling in a multimedia I/O system. In a multimedia server, the disk requests may have constant data rate requirements and need guaranteed service. We propose a new scheduling algorithm, SCAN-EDF, that combines the features of a SCAN-type seek-optimizing algorithm with an Earliest Deadline First (EDF) real-time scheduling algorithm. We compare SCAN-EDF with other scheduling strategies and show that SCAN-EDF combines the best features of both SCAN and EDF. We also investigate the impact of buffer space on the maximum number of video streams that can be supported. We show that by making the deadlines larger than the request periods, a larger number of streams can be supported. We also describe how we extended the SCAN-EDF algorithm in the PRISM multimedia architecture. PRISM is an integrated multimedia server, designed to satisfy the QoS requirements of multiple classes of requests. Our experience in implementing the extended SCAN-EDF algorithm in a generic operating system is discussed, and performance metrics and results are presented to illustrate how the SCAN-EDF extensions and implementation strategies have succeeded in meeting the QoS requirements of different classes of requests.

207 citations
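As a minimal illustration of the SCAN-EDF idea described above, requests can be ordered primarily by deadline (EDF) and, among requests sharing a deadline, by track position so the head sweeps in one direction (SCAN). The batching of requests into common deadlines, deadline extension, and the PRISM multi-class extensions are not modeled; this is a sketch, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DiskRequest:
    deadline: float   # absolute deadline of the request
    track: int        # target track / cylinder

def scan_edf_order(requests: List[DiskRequest], head_moving_up: bool = True) -> List[DiskRequest]:
    """Order requests by deadline; break deadline ties with SCAN (track) order.

    In practice SCAN-EDF batches requests into rounds so that many requests
    share a deadline and the seek optimization has something to work with;
    that batching step is omitted here.
    """
    key = lambda r: (r.deadline, r.track if head_moving_up else -r.track)
    return sorted(requests, key=key)

# Example: requests due at the same time are served in track order,
# while an earlier deadline always goes first.
reqs = [DiskRequest(20.0, 900), DiskRequest(20.0, 100), DiskRequest(10.0, 500), DiskRequest(20.0, 400)]
print([(r.deadline, r.track) for r in scan_edf_order(reqs)])
# -> [(10.0, 500), (20.0, 100), (20.0, 400), (20.0, 900)]
```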


Journal Article
TL;DR: In this paper, a selection of multimedia authoring environments within four different authoring paradigms: structured, timeline, graph, and scripting is presented, and the authors argue that the structured paradigm provides the most useful framework for presentation authoring.
Abstract: Authoring context sensitive, interactive multimedia presentations is much more complex than authoring either purely audiovisual applications or text. Interactions among media objects need to be described as a set of spatio-temporal relationships that account for synchronous and asynchronous interactions, as well as on-demand linking behavior. This article considers the issues that need to be addressed by an authoring environment. We begin with a partitioning of concerns based on seven classes of authoring problems. We then describe a selection of multimedia authoring environments within four different authoring paradigms: structured, timeline, graph, and scripting. We next provide observations and insights into the authoring process and argue that the structured paradigm provides the most useful framework for presentation authoring. We close with an example application of the structured multimedia authoring paradigm in the context of our own structure-based system GRiNS.

132 citations


Journal Article
TL;DR: The retreat suggested that the community focus on solving three grand challenges, among them making authoring complex multimedia titles as easy as using a word processor or drawing program and making capturing, storing, finding, and using digital media an everyday occurrence in the authors' computing environment.
Abstract: The ACM Multimedia Special Interest Group was created ten years ago. Since that time, researchers have solved a number of important problems related to media processing, multimedia databases, and distributed multimedia applications. A strategic retreat was organized as part of ACM Multimedia 2003 to assess the current state of multimedia research and suggest directions for future research. This report presents the recommendations developed during the retreat. The major observation is that research in the past decade has significantly advanced hardware and software support for distributed multimedia applications and that future research should focus on identifying and delivering applications that impact users in the real world. The retreat suggested that the community focus on solving three grand challenges: (1) make authoring complex multimedia titles as easy as using a word processor or drawing program, (2) make interactions with remote people and environments nearly the same as interactions with local people and environments, and (3) make capturing, storing, finding, and using digital media an everyday occurrence in our computing environment. The focus of multimedia researchers should be on applications that incorporate correlated media, fuse data from different sources, and use context to improve application performance.

120 citations


Journal Article
TL;DR: Coliseum is a multiuser immersive remote teleconferencing system designed to provide collaborative workers the experience of face-to-face meetings from their desktops, and issues related to its performance are summarized.
Abstract: Coliseum is a multiuser immersive remote teleconferencing system designed to provide collaborative workers the experience of face-to-face meetings from their desktops. Five cameras are attached to each PC display and directed at the participant. From these video streams, view synthesis methods produce arbitrary-perspective renderings of the participant and transmit them to others at interactive rates, currently about 15 frames per second. Combining these renderings in a shared synthetic environment gives the appearance of having all participants interacting in a common space. In this way, Coliseum enables users to share a virtual world, with acquired-image renderings of their appearance replacing the synthetic representations provided by more conventional avatar-populated virtual worlds. The system supports virtual mobility---participants may move around the shared space---and reciprocal gaze, and has been demonstrated in collaborative sessions of up to ten Coliseum workstations, and sessions spanning two continents. Coliseum is a complex software system which pushes commodity computing resources to the limit. We set out to measure the different aspects of resource usage (network, CPU, memory, and disk) to uncover the bottlenecks and guide enhancement and control of system performance. Latency is a key component of Quality of Experience for video conferencing. We present how each aspect of the system---cameras, image processing, networking, and display---contributes to total latency. Performance measurement is as complex as the system to which it is applied. We describe several techniques to estimate performance through direct light-weight instrumentation as well as use of realistic end-to-end measures that mimic actual user experience. We describe the various techniques and how they can be used to improve system performance for Coliseum and other network applications. This article summarizes the Coliseum technology and reports on issues related to its performance---its measurement, enhancement, and control.

69 citations


Journal Article
TL;DR: This article presents an analytical framework to quantitatively study the features of a hybrid media streaming model, and derives an equation to describe the capacity growth of a single-file streaming system and proposes a failure model under arbitrarily distributed peer lifespan.
Abstract: Recent research efforts have demonstrated the great potential of building cost-effective media streaming systems on top of peer-to-peer (P2P) networks. A P2P media streaming architecture can reach a large streaming capacity that is difficult to achieve in conventional server-based streaming services. Hybrid streaming systems that combine the use of dedicated streaming servers and P2P networks were proposed to build on the advantages of both paradigms. However, the dynamics of such systems and the impact of various factors on system behavior are not totally clear. In this article, we present an analytical framework to quantitatively study the features of a hybrid media streaming model. Based on this framework, we derive an equation to describe the capacity growth of a single-file streaming system. We then extend the analysis to multi-file scenarios. We also show how the system achieves optimal allocation of server bandwidth among different media objects. The unpredictable departure/failure of peers is a critical factor that affects the performance of P2P systems. We utilize the concept of peer lifespan to model peer failures. The original capacity growth equation is enhanced with coefficients generated from peer lifespans that follow an exponential distribution. We also propose a failure model under arbitrarily distributed peer lifespan. Results from large-scale simulations support our analysis.

68 citations
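The abstract refers to a capacity-growth equation without reproducing it, so the toy simulation below only illustrates the qualitative behavior of a hybrid system: dedicated server bandwidth bootstraps the first peers, and each peer that finishes downloading contributes upload bandwidth, so admission capacity compounds over time. Peer departures and multi-file bandwidth allocation are ignored, and all parameters are arbitrary.

```python
def hybrid_capacity(server_streams: int, peer_upload_streams: float,
                    download_time: float, horizon: float, step: float = 1.0):
    """Discrete-time sketch of streaming-capacity growth in a hybrid
    server + P2P system with no peer departures.

    At each step, total serving capacity = server capacity plus the upload
    capacity of every peer that finished downloading at least
    `download_time` ago (a "seeding" peer).
    """
    admitted = []               # admission times of peers in the system
    capacity_over_time = []
    t = 0.0
    while t <= horizon:
        seeders = sum(1 for a in admitted if t - a >= download_time)
        capacity = server_streams + peer_upload_streams * seeders
        # admit as many new peers as current capacity allows (demand assumed unlimited)
        in_service = sum(1 for a in admitted if t - a < download_time)
        free = max(0, int(capacity) - in_service)
        admitted.extend([t] * free)
        capacity_over_time.append((t, capacity))
        t += step
    return capacity_over_time

for t, c in hybrid_capacity(server_streams=10, peer_upload_streams=0.5,
                            download_time=5.0, horizon=20.0, step=5.0):
    print(f"t={t:4.1f}  capacity={c:6.1f}")
```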


Journal Article
TL;DR: Analytic experiments over a range of network and application conditions indicate that adjustable FEC with temporal scaling can provide a significant performance improvement over current approaches, and can be effective as part of a streaming protocol that chooses FEC and temporal scaling patterns that meet dynamically-changing application and network conditions.
Abstract: New TCP-friendly constraints require multimedia flows to reduce their data rates under packet loss to that of a conformant TCP flow. To reduce data rates while preserving real-time playout, temporal scaling can be used to discard the encoded multimedia frames that have the least impact on perceived video quality. To limit the impact of lost packets, Forward Error Correction (FEC) can be used to repair frames damaged by packet loss. However, adding FEC requires further reduction of multimedia data, making the decision of how much FEC to use of critical importance. Current approaches use either inflexible FEC patterns or adapt to packet loss on the network without regard to TCP-friendly data rate constraints. In this article, we analytically model the playable frame rate of a TCP-friendly MPEG stream with FEC and temporal scaling, capturing the impact of distributing FEC within MPEG frame types with interframe dependencies. For a given network condition and MPEG video encoding, we use our model to exhaustively search for the optimal combination of FEC and temporal scaling that yields the highest playable frame rate within TCP-friendly constraints. Analytic experiments over a range of network and application conditions indicate that adjustable FEC with temporal scaling can provide a significant performance improvement over current approaches. Extensive simulation experiments based on Internet traces show that our model can be effective as part of a streaming protocol that chooses FEC and temporal scaling patterns that meet dynamically-changing application and network conditions.

37 citations
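A rough sketch of the kind of exhaustive search the paper describes, under a deliberately simplified model: one I frame followed by a chain of P frames per GOP, independent packet losses, and block FEC per frame. Temporal scaling is reduced to choosing how many P frames to keep, and a fixed packet budget stands in for the TCP-friendly rate constraint. The frame sizes, budget, and loss rate below are hypothetical, and B frames and per-frame FEC patterns are omitted.

```python
from itertools import product
from math import comb

def frame_decodable_prob(data_pkts: int, fec_pkts: int, loss: float) -> float:
    """P(frame decodable) with block FEC: the frame survives up to fec_pkts losses."""
    n = data_pkts + fec_pkts
    return sum(comb(n, k) * loss**k * (1 - loss)**(n - k) for k in range(fec_pkts + 1))

def best_fec_and_scaling(budget_pkts: int, loss: float,
                         i_pkts: int = 8, p_pkts: int = 4,
                         max_p_frames: int = 8, max_fec: int = 6):
    """Exhaustively search (#P frames kept, FEC on I, FEC on P) within a packet
    budget, maximizing the expected number of playable frames per GOP."""
    best_rate, best_choice = 0.0, None
    for n_p, fec_i, fec_p in product(range(max_p_frames + 1),
                                     range(max_fec + 1), range(max_fec + 1)):
        cost = (i_pkts + fec_i) + n_p * (p_pkts + fec_p)
        if cost > budget_pkts:
            continue
        q_i = frame_decodable_prob(i_pkts, fec_i, loss)
        q_p = frame_decodable_prob(p_pkts, fec_p, loss)
        # the I frame plays if decodable; P frame j also needs all earlier frames
        expected_playable = q_i * sum(q_p**j for j in range(n_p + 1))
        if expected_playable > best_rate:
            best_rate, best_choice = expected_playable, (n_p, fec_i, fec_p)
    return best_rate, best_choice

print(best_fec_and_scaling(budget_pkts=48, loss=0.05))
```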


Journal Article
TL;DR: An integrated media creation environment is discussed, its efficacy is demonstrated in the generation of two simple home movies, and content repurposing is enabled by well-formed media transformations combined with the rich semantic information maintained at each phase of the process.
Abstract: We discuss the design goals for an integrated media creation environment (IMCE) aimed at enabling the average user to create media artifacts with professional qualities. The resulting requirements are implemented, and we demonstrate the efficacy of the system with the generation of two simple home movies. The significance for the average user seeking to create home movies lies in the flexible and automatic application of film principles to the task, the removal of tedious low-level editing by means of well-formed media transformations expressed in terms of high-level film constructs (e.g., tempo), and content repurposing powered by those same transformations combined with the rich semantic information maintained at each phase of the process.

29 citations


Journal Article
TL;DR: This article proposes an optimal solution using dynamic programming to compute the optimal locations for caching multiple versions of the same multimedia object in transcoding proxies for tree networks, and shows that it significantly outperforms existing models that consider Web caching in transcoding proxies either on a single path or at individual nodes.
Abstract: Transcoding is a promising technology that allows systems to effect a quality-versus-size tradeoff on multimedia objects. As audio and video applications have proliferated on the Internet, caching in transcoding proxies has become an important technique for improving network performance, especially in mobile networks. This article addresses the problem of coordinated en-route multimedia object caching in transcoding proxies for tree networks. We formulate this problem as an optimization problem based on our proposed model, in which multimedia object caching decisions are made on all en-route caches along the routing path by integrating both object placement and replacement policies, and cache status information along the routing path of a request is used to determine the optimal locations for caching multiple versions of the same multimedia object. We propose an optimal solution using dynamic programming to compute the optimal locations. We also extend this solution to solve the same problem for several constrained cases, including constraints on the cost gain per node and on the number of versions to be placed. Our model is evaluated on different performance metrics through extensive simulation experiments. The simulation results show that our model significantly outperforms existing models that consider Web caching in transcoding proxies either on a single path or at individual nodes.

28 citations
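The paper's dynamic program handles tree networks, multiple transcoded versions, and replacement policies; the sketch below reduces the problem to a single routing path with one object version and unit hop costs, purely to illustrate the flavor of an en-route placement DP. All rates and costs in the example are made up.

```python
def optimal_path_caching(rates, cache_costs):
    """DP sketch for en-route caching on a single path (client side -> server).

    rates[k]       : request rate of proxy k (0-indexed, client side first)
    cache_costs[k] : cost of caching a copy at proxy k
    The origin server sits one hop past the last proxy and always has the object.
    A proxy's access cost is its rate times the hop distance to the nearest copy
    at or above it. Returns (minimum total cost, list of proxies to cache at).
    """
    n = len(rates)
    # positions 1..n are proxies, position n+1 is the server (zero caching cost)
    cost = [0.0] + list(cache_costs) + [0.0]
    rate = [0.0] + list(rates)
    INF = float("inf")
    f = [INF] * (n + 2)        # f[j]: best cost for nodes 1..j given a copy at j
    choice = [0] * (n + 2)
    f[0] = 0.0
    for j in range(1, n + 2):
        for i in range(j):     # i = previous copy position (0 means "none below j")
            between = sum(rate[k] * (j - k) for k in range(i + 1, j))
            cand = f[i] + cost[j] + between
            if cand < f[j]:
                f[j], choice[j] = cand, i
    # recover the placement by walking predecessor links back from the server
    placement, j = [], n + 1
    while j > 0:
        if j <= n:
            placement.append(j - 1)   # back to 0-indexed proxy id
        j = choice[j]
    return f[n + 1], sorted(placement)

print(optimal_path_caching(rates=[5, 1, 4, 2], cache_costs=[6, 9, 3, 8]))
```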


Journal Article
TL;DR: This paper proposes a confidence-based dynamic ensemble (CDE) to overcome the shortcomings of the traditional static classifiers, and demonstrates that CDE is effective in annotating large-scale, real-world image datasets.
Abstract: Providing accurate and scalable solutions to map low-level perceptual features to high-level semantics is essential for multimedia information organization and retrieval. In this paper, we propose a confidence-based dynamic ensemble (CDE) to overcome the shortcomings of the traditional static classifiers. In contrast to the traditional models, CDE can make dynamic adjustments to accommodate new semantics, to assist the discovery of useful low-level features, and to improve class-prediction accuracy. We depict two key components of CDE: a multi-level function that asserts class-prediction confidence, and the dynamic ensemble method based upon the confidence function. Through theoretical analysis and empirical study, we demonstrate that CDE is effective in annotating large-scale, real-world image datasets.
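The confidence function and ensemble rule below are illustrative stand-ins, not the paper's multi-level confidence function: each base classifier's confidence is taken as the margin between its top two class probabilities, and only sufficiently confident members vote, weighted by that margin. The threshold value is arbitrary.

```python
import numpy as np

def dynamic_ensemble_predict(probas, threshold=0.2):
    """Sketch of confidence-gated ensemble prediction.

    probas    : list of (n_classes,) probability vectors, one per base classifier
    threshold : minimum top-1 vs top-2 margin for a classifier's vote to count
    Falls back to plain averaging if no member is confident enough.
    """
    probas = [np.asarray(p, dtype=float) for p in probas]
    weighted = np.zeros_like(probas[0])
    total_weight = 0.0
    for p in probas:
        top2 = np.sort(p)[-2:]
        margin = top2[1] - top2[0]          # confidence of this member
        if margin >= threshold:
            weighted += margin * p
            total_weight += margin
    if total_weight == 0.0:                 # no confident member: fall back
        return int(np.argmax(np.mean(probas, axis=0)))
    return int(np.argmax(weighted / total_weight))

# Example: the second classifier is confident and dominates the decision.
print(dynamic_ensemble_predict([[0.4, 0.35, 0.25], [0.1, 0.8, 0.1]]))
```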

Journal Article
TL;DR: A compression algorithm is presented which scales comparably to the acquisition and reconstruction, reduces network transmission bandwidth, and reduces the rendering requirement for real-time performance in 3D tele-immersion environments.
Abstract: The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator's environment remotely and sends it over the network where it is rendered in the user's environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real-time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering since all of the data need to be combined into a single display. In this article, we present an algorithm to compress data from such 3D environments in real-time to solve this imbalance. We present a compression algorithm which scales comparably to the acquisition and reconstruction, reduces network transmission bandwidth, and reduces the rendering requirement for real-time performance. This is achieved by exploiting the coherence in the 3D environment data and removing the resulting redundancy in real-time. We have tested the algorithm using a static office data set as well as a dynamic scene, the results of which are presented in the article.
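As a loose illustration of coherence-based compression (not the paper's algorithm), one can delta-encode each depth frame against the previous one and transmit only the samples that changed beyond a tolerance; the receiver patches its copy of the prior frame. The array layout and tolerance below are assumptions.

```python
import numpy as np

def delta_encode_depth(prev_depth, curr_depth, tol=0.01):
    """Send only depth samples that changed by more than `tol` since the last frame,
    as (flat index, new value) pairs."""
    changed = np.abs(curr_depth - prev_depth) > tol
    idx = np.flatnonzero(changed)
    return idx.astype(np.uint32), curr_depth.ravel()[idx]

def delta_decode_depth(prev_depth, idx, values):
    """Patch the receiver's copy of the previous frame with the transmitted samples."""
    out = prev_depth.copy().ravel()
    out[idx] = values
    return out.reshape(prev_depth.shape)

prev = np.zeros((480, 640), dtype=np.float32)
curr = prev.copy()
curr[100:110, 200:210] += 0.5                       # a small moving region
idx, vals = delta_encode_depth(prev, curr)
assert np.allclose(delta_decode_depth(prev, idx, vals), curr)
print(f"sent {idx.size} of {curr.size} depth samples")
```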

Journal Article
TL;DR: It is argued that multimedia document systems should provide mechanisms for automatically producing temporal layouts for documents and the Firefly multimedia document system is described, which was developed in 1992 to test the potential of automatic temporal formatting.
Abstract: A traditional static document has a spatial layout that specifies where objects in the document appear. Because multimedia documents incorporate time, they also require a temporal layout, or schedule, that specifies when events in the document occur. This article argues that multimedia document systems should provide mechanisms for automatically producing temporal layouts for documents. The major advantage of this approach is that it makes it easier for authors to create and modify multimedia documents. This article revisits our 1993 framework for understanding automatic temporal formatters and explores the basic issues surrounding them. It also describes the Firefly multimedia document system, which was developed in 1992 to test the potential of automatic temporal formatting. Using our original framework, the paper reviews a representative sample of recent automatic document formatters. This analysis validates the basic framework and demonstrates the progress of the field in the intervening decade. A discussion of potential extensions to the framework is included.

Journal Article
TL;DR: The original performance results are compared with experiments run on a modern processor to demonstrate the gains in processing power in the past ten years relative to this specific application, and the history of MPEG-1 video software decoding and of the Berkeley MPEG research group is discussed.
Abstract: This article reprises the description of the Berkeley software-only MPEG-1 video decoder originally published in the proceedings of the 1st International ACM Conference on Multimedia in 1993. The software subsequently became widely used in a variety of research systems and commercial products. Its main impact was to provide a platform for experimenting with streaming compressed video and to expose the strengths and weaknesses of software-only video decoding using general purpose computing architectures. This article compares the original performance results with experiments run on a modern processor to demonstrate the gains of processing power in the past ten years relative to this specific application and discusses the history of MPEG-1 video software decoding and the Berkeley MPEG research group.

Journal Article
TL;DR: Experimental results show that the proposed joint L-ULP and pre-interleaving scheme achieves performance as good as that of ULP while its complexity is much lower.
Abstract: Most existing unequal loss protection (ULP) schemes do not consider the minimum quality requirement and usually have high computational complexity. In this research, we propose a layered ULP (L-ULP) scheme to solve these problems. In particular, we use the rate-based optimal solution with a local search to find the average forward error correction (FEC) allocation and use the gradient search to find the FEC solution for each layer. Experimental results show that the execution time of L-ULP is much shorter than that of the traditional ULP scheme, but the average distortion is worse. Therefore, we further propose to combine L-ULP with pre-interleaving to obtain an improved L-ULP (IL-ULP) system. By using pre-interleaving, we are able to delay the occurrence of the first unrecoverable loss in the source bitstream and thus improve the loss resilience performance. With the better loss resilience in the source bitstream, our proposed IL-ULP scheme can use weaker FEC protection and allocate more bits to source coding, which improves overall performance. Experimental results show that our proposed IL-ULP scheme even outperforms the global optimal result obtained by any traditional ULP scheme, while the complexity of IL-ULP is almost the same as that of L-ULP.
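The L-ULP/IL-ULP optimization itself is not reproduced here; the sketch below only shows the generic unequal-loss-protection idea of spending a fixed FEC budget preferentially on earlier, more important layers, using a simple greedy rule: one FEC packet at a time goes to whichever layer most raises the expected number of consecutively decodable layers. Layer sizes, budget, and loss rate are hypothetical, and pre-interleaving is not modeled.

```python
from math import comb

def recover_prob(data, fec, loss):
    """P(layer recovered): at most `fec` of the data+fec packets are lost."""
    n = data + fec
    return sum(comb(n, k) * loss**k * (1 - loss)**(n - k) for k in range(fec + 1))

def expected_layers(fec_alloc, data_pkts, loss):
    """Expected number of consecutively decodable layers (layer l needs all layers <= l)."""
    total, prefix = 0.0, 1.0
    for fec in fec_alloc:
        prefix *= recover_prob(data_pkts, fec, loss)
        total += prefix
    return total

def greedy_ulp(n_layers=4, fec_budget=8, data_pkts=10, loss=0.1):
    """Greedy ULP: allocate FEC packets one at a time to the most beneficial layer."""
    alloc = [0] * n_layers
    for _ in range(fec_budget):
        gains = []
        for layer in range(n_layers):
            trial = alloc.copy()
            trial[layer] += 1
            gains.append(expected_layers(trial, data_pkts, loss))
        alloc[max(range(n_layers), key=lambda l: gains[l])] += 1
    return alloc, expected_layers(alloc, data_pkts, loss)

print(greedy_ulp())   # earlier layers typically end up with more FEC
```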

Journal Article
TL;DR: SIGMM concluded that ACM should create its own archival journal in multimedia; the proposal to create the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) was speedily approved in December 2003.
Abstract: Multimedia is now a mature field, having evolved over approximately 20 years. The term "media" traditionally referred to entities such as audio, video, text, images, graphics, and animation. New media, such as virtual reality, will be added in the future. The term "multimedia" has now been accepted to mean documents composed of at least two correlated media. The correlation could be temporal, spatial, or semantic. Applications now appear in many fields. The first IEEE workshop that had that term in its title was held in 1987 in Osaka, Japan. The ACM SIG Multimedia (SIGMM) held its first multimedia conference in San Diego in 1993. Subsequently, the IEEE Multimedia magazine appeared and then the IEEE Transactions on Multimedia. A Springer publication was established in the early 1990s and has, for some years now, been referred to as the ACM/Springer Multimedia Systems Journal (MMSJ). Despite its name, the publication was fully owned by Springer, and ACM had not obtained the right to include this journal in its digital library. Subscribers to the library also had to subscribe to MMSJ separately in order to access journal articles. Although they supported MMSJ, the SIGMM leadership had some difficulties with the management of the journal. They believed that papers were kept in the publications queue because of successive changes to the position of managing editor. Finally, MMSJ is back on track, thanks to the leadership of its current managing editor, Professor Klara Nahrstedt of the University of Illinois at Urbana-Champaign. However, it is still not an ACM journal. Since the early days, SIGMM wanted to create an ACM archival journal of its own in the multimedia field. In November 2003, SIGMM held a retreat in Berkeley, California, to discuss where the field of multimedia is going. The results are reported in this issue in the article by Larry Rowe and Ramesh Jain. Given recent developments, resulting in the merger of Springer with Kluwer under new UK-based ownership, and the uncertainty of the future of the Springer (MMSJ) and Kluwer (MTAP: Multimedia Tools and Applications) journals, the SIGMM leading researchers felt that this was the right time for ACM to create its own archival journal in multimedia. A proposal was made to the ACM Publications Board in November 2003 for the creation of the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP). The proposal was speedily approved in December 2003. The journal …

Journal Article
TL;DR: A general operating system mechanism that may be used to implement a wide variety of online quality management functions, based on the notions of events, event channels, and event handlers is described.
Abstract: To meet end users' quality-of-service (QoS) requirements, online quality management for multimedia applications must include appropriate allocation of the underlying computing platform's resources. Previous work has developed novel operating system (OS) functionality for dynamic QoS management, including multimedia or real-time CPU schedulers and OS extensions for online performance monitoring and for adaptations, as well as QoS-aware applications that adapt their behavior to gain additional benefits from such functionality. This article describes a general OS mechanism that may be used to implement a wide variety of online quality management functions. ECalls is a communication mechanism that implements multiple cross-domain calling conventions that can be customized to the quality management needs of applications. The ECalls mechanism is based on the notions of events, event channels, and event handlers. Using events, applications can share relevant QoS attributes with OS services, and OS-level resource management services can efficiently provide monitoring data to target applications or application managers. Dynamically generated event handlers can be used to customize event delivery to meet diverse application needs, for example, to achieve high scalability for Web servers or small jitter for real-time data delivery.
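ECalls is a kernel-level communication mechanism, so the snippet below is only a user-level toy analogue of its event / event-channel / event-handler structure: handlers register on named channels, and raising an event delivers the payload to every registered handler. The channel names and payloads are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Callable

class EventChannel:
    """Toy analogue of an event-channel mechanism: applications and resource
    managers register handlers on named channels, and raising an event runs
    every handler registered for that channel."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, channel: str, handler: Callable[[Any], None]) -> None:
        self._handlers[channel].append(handler)

    def raise_event(self, channel: str, payload: Any) -> None:
        for handler in self._handlers[channel]:
            handler(payload)

# Example: a video player exports its QoS target; a monitor reports CPU overload.
channels = EventChannel()
channels.register("qos.update", lambda q: print(f"scheduler sees new QoS target: {q}"))
channels.register("cpu.overload", lambda info: print(f"application adapts: drop to {info['fps']} fps"))

channels.raise_event("qos.update", {"period_ms": 33, "deadline_ms": 33})
channels.raise_event("cpu.overload", {"fps": 15})
```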

Journal Article
TL;DR: The readers of ACM TOMCCAP will be interested to learn about the impact of the most influential ACM MM'93 papers, and the authors of five papers have been able to produce a revised version in time.
Abstract: first issue of the journal is publishing revised versions of the most influential papers from the 54 papers presented at ACM Multimedia 1993. Ten years, actually 11.5 years by the time this issue is published, are quite a long time with respect to the rapid developments in the field of multimedia. Therefore, we are convinced that the readers of ACM TOMCCAP will be interested to learn about the impact of the most influential ACM MM'93 papers. Have the results been integrated in public domain software or commercial products? Are they having an impact on a large user population? What are the experiences? Did the work have an impact onto standardization? Which follow-up problems have been addressed later on and are the major problems from 1993 solved in the meantime or still open research problems? Obviously, the authors of the most influential papers are good candidates to answer these questions. Therefore, we explicitly asked them to address these questions in the revised version of their paper. However, before asking the authors, it was necessary to identify the most influential papers. Our personal judgments about the relevance and impact of the ACM MM'93 papers are certainly subjective, even if we aimed to make objective evaluations. We considered the number of citations listed in the citation indexes provided by the ACM Digital Library and CiteSeer. The outcome of this process was a set of eight articles. We mentioned earlier that ten years is a long time and, during this time, some of the authors have experienced major changes in their professional lives, which made it impossible for them to produce a revised version of their original ACM MM'93 paper for this issue. Therefore, we are glad that the authors of five papers have been able to produce a revised version in time, and each of them makes a very interesting reading! Laura Teodosio and Walter Bender discuss in \" Salient Stills \" follow-up work of their original research and the commercialization of their results. A. L. Narasimha Reddy and Jim Wyllie present in \" Scheduling in a Multimedia I/O System \" the implementation and evaluation of their originally proposed SCAN-EDF scheduling policy in the PRISM system. In \" Automatic Temporal Layout Mechanisms Revisited \" , Cecilia Buchanan and Polle T. Zellweger expand the analysis of other pre-1993 automatic temporal formatters and use the framework to review a representative sample of more recent automatic …