Showing papers in "IEEE MultiMedia in 2007"


Journal Article
William I. Grosky, Aman Kansal, Suman Nath, Jie Liu, Feng Zhao
TL;DR: The SenseWeb allows applications to initiate and access sensor data streams from shared sensors across the entire Internet and helps ensure optimal sensor selection for each application and efficient sharing of sensor streams among multiple applications.
Abstract: Peer-produced systems can achieve what might be infeasible for stand-alone systems developed by a single entity. The SenseWeb's goal is to enable these kinds of capabilities. Using SenseWeb, applications can initiate and access sensor data streams from shared sensors across the entire Internet. The SenseWeb infrastructure helps ensure optimal sensor selection for each application and efficient sharing of sensor streams among multiple applications.

357 citations


Journal Article
TL;DR: A common event model for multimedia could serve as a unifying foundation for all of the next-generation multimedia applications such as eChronicles, life logs, or the Event Web.
Abstract: Although events are ubiquitous in multimedia, no common notion of events has emerged. Events appear in multimedia presentation formats, programming frameworks, and databases, as well as in next-generation multimedia applications such as eChronicles, life logs, or the Event Web. A common event model for multimedia could serve as a unifying foundation for all of these applications.

226 citations


Journal Article
TL;DR: Knight, an automated surveillance system deployed in a variety of real-world scenarios ranging from railway security to law enforcement, is presented; the authors discuss the challenges of developing surveillance systems and present some solutions implemented in Knight that overcome these challenges.
Abstract: In this article, we present Knight, an automated surveillance system deployed in a variety of real-world scenarios ranging from railway security to law enforcement. We also discuss the challenges of developing surveillance systems, present some solutions implemented in Knight that overcome these challenges, and evaluate Knight's performance in unconstrained environments.

180 citations


Journal Article
TL;DR: A museum guidance system called PhoneGuide is presented that uses widespread camera-equipped mobile phones for on-device object recognition in combination with pervasive tracking and provides location- and object-aware multimedia content to museum visitors, and is scalable to cover a large number of museum objects.
Abstract: We present a museum guidance system called PhoneGuide that uses widespread camera-equipped mobile phones for on-device object recognition in combination with pervasive tracking. It also provides location- and object-aware multimedia content to museum visitors, and is scalable to cover a large number of museum objects.

153 citations


Journal Article
TL;DR: This work developed two general reranking methods that explore the recurrent visual patterns in many contexts, such as the returned images or video shots from initial text queries, and video stories from multiple channels.
Abstract: Most semantic video search methods use text-keyword queries or example video clips and images. But such methods have limitations. To address the problems of example-based video search approaches and avoid the use of specialized models, we conduct semantic video searches using a reranking method that automatically reorders the initial text search results based on visual cues and associated context. We developed two general reranking methods that explore the recurrent visual patterns in many contexts, such as the returned images or video shots from initial text queries, and video stories from multiple channels.

91 citations


Journal Article
TL;DR: From a multimedia perspective the question arises what multimedia and Web 2.0 have in common, where the two fields meet, and how they can benefit each other.
Abstract: Web 2.0 is an area that's gained much attention recently, especially with Google's acquisition of YouTube. Given the strong focus on media in many Web 2.0 applications, from a multimedia perspective the question arises what multimedia (research) and Web 2.0 have in common, where the two fields meet, and how they can benefit each other.

83 citations


Journal Article
TL;DR: By using end hosts' huge bandwidth and computational capacity, peer-to-peer technologies shed new light on media streaming applications' development; yet locating supplying peers and maintaining content delivery paths are two major challenges.
Abstract: Large-scale multimedia streaming over the Internet requires an enormous amount of server and network resources. Traditional client-server approaches allocate a dedicated stream from the server for each client request, which is expensive and doesn't scale well. By using end hosts' huge bandwidth and computational capacity, peer-to-peer technologies shed new light on media streaming applications' development. Yet locating supplying peers and maintaining content delivery paths are two major challenges in this area.

76 citations


Journal Article
TL;DR: Research addressing the next generation of mobility technology, which will deliver "the right experience in the right moment," is described; it focuses on rich interactive mobile experiences triggered by context information available from the users, their environment, and a wealth of context-enabled content.
Abstract: Here we describe research addressing the next generation of mobility technology, which will deliver "the right experience in the right moment." The maturing field of pervasive computing yields the technology and the challenges described here. We focus on rich interactive mobile experiences triggered by context information available from the users, their environment, and a wealth of context-enabled content. We call such applications mediascapes.

70 citations


Journal Article
TL;DR: A force-field haptic rendering method for converting visual data to haptic data and how visually impaired people can learn to navigate with these maps, using off-the-shelf haptic algorithms.
Abstract: We introduce a force-field haptic rendering method for converting visual data to haptic data. The method includes a novel framework to convert specialized 3D map models into force fields. We generate the final force fields by using structure from motion and implicit surface approximation algorithms. Visually impaired people then can learn to navigate with these maps, using off-the-shelf haptic

62 citations


Journal Article
TL;DR: By combining novel statistical modeling techniques and the WordNet ontology, this work offers a promising new approach to image search that uses automatic image tagging directly to perform retrieval.
Abstract: By combining novel statistical modeling techniques and the WordNet ontology, we offer a promising new approach to image search that uses automatic image tagging directly to perform retrieval.

62 citations


Journal Article
TL;DR: Parallel-Horus, a support tool for applications in multimedia grid computing, lets users implement multimedia applications as sequential programs for efficient execution on clusters and grids, based on wide-area multimedia services.
Abstract: As the world uses more digital video that requires greater storage space, grid computing is becoming indispensable for urgent problems in multimedia content analysis. Parallel-Horus, a support tool for applications in multimedia grid computing, lets users implement multimedia applications as sequential programs for efficient execution on clusters and grids, based on wide-area multimedia services.

Journal Article
TL;DR: A user interface to music repositories called nepTune creates a virtual landscape for an arbitrary collection of digital music files, letting users freely navigate the collection; features are automatically extracted from the audio signal and used to cluster the music pieces, and the clustering helps generate a 3D island landscape.
Abstract: A user interface to music repositories called nepTune creates a virtual landscape for an arbitrary collection of digital music files, letting users freely navigate the collection. Automatically extracting features from the audio signal and clustering the music pieces accomplish this. The clustering helps generate a 3D island landscape. The rapidly growing research field of music information retrieval is developing the technological foundations for a new generation of more intelligent music devices and services. Researchers are creating algorithms for audio and music analysis, studying methods for retrieving music-related information from the Internet, and investigating scenarios for using music-related information for novel types of computer-based music services. The range of applications for such technologies is broad - from automatic music recommendation services through personalized, adaptive radio stations, to novel types of intelligent, reactive musical devices and environments.

Journal Article
TL;DR: The system enables an Internet-based interactive media service for e-learning and e-entertainment, allowing an explicitly formed group of clients to view and cooperatively control a shared remote media playback.
Abstract: We propose a system called Comodin, based on a content distribution network (CDN), that provides collaborative media playback services. Specifically, the system enables an Internet-based interactive media service for e-learning and e-entertainment, allowing an explicitly formed group of clients to view and cooperatively control a shared remote media playback.

Journal Article
TL;DR: The VC-1 standard offers a competitive quality-complexity tradeoff compared to H.264, especially for high-definition services, and VC-1 can be expected to co-exist with H.264 in the next generation of broadband and broadcast video services.
Abstract: VC-1 is a new video coding standard developed by Microsoft and standardized by the SMPTE. VC-1 is one of the three video compression algorithms standardized for high definition DVD. With high definition DVD players expected to support MPEG-2, H.264, and VC-1, end users do not have to be concerned about the coding formats. The VC-1 standard offers a competitive quality-complexity tradeoff compared to H.264, especially for high-definition services. With a diverse digital video market, we can expect to see VC-1 co-existing with H.264 in the next generation of broadband and broadcast video services.

Journal Article
TL;DR: A stereoscopic system based on a multispectral camera and an LCD projector uses multispectral information for 3D object reconstruction to give a physical representation of the matter independent from illuminant, observer, and acquisition devices.
Abstract: A stereoscopic system based on a multispectral camera and an LCD projector uses multispectral information for 3D object reconstruction. By linking 3D points to a curve representing the spectral reflectance, the system gives a physical representation of the matter that's independent from illuminant, observer, and acquisition devices.

Journal Article
TL;DR: A new way of enhancing the commercial value of sports video Webcasts is proposed, in which automatic machine detection techniques facilitate advertising content insertion while the amount of advertising exposure is managed so that viewers do not perceive it as unnecessary clutter in the video.
Abstract: Sports content continues to generate a global appeal that transcends national, cultural, religious, and gender boundaries. Historically, TV technology's success has been intertwined with the development of televised sports. In the famous words of pioneering TV sports director Harry Coyle, "television got off the ground because of sports." Sports showcases offer a splendid platform to promote new media technologies. During the 2006 FIFA (Federation Internationale de Football Association) World Cup tournament in Germany, major mobile companies launched a plethora of mobile TV services with such offerings as video streaming, text-based services, ring-tone downloads, and mobile blogging. This article proposes a new way of enhancing the commercial value of sports video Webcasts. Clearly, the amount of advertising exposures must be managed to avoid viewers' perceiving them as unnecessary clutter in the video. As with legacy video broadcasting systems staffed by human operators, this check can be easily managed manually. Automatic machine detection techniques, however, can further facilitate advertising content insertion.

Journal Article
TL;DR: A progressive-cut algorithm for background and foreground segmentation is presented, which outperforms existing graph-cut methods in both accuracy and speed, and it effectively removes the fluctuation effect, making results more controllable with fewer strokes.
Abstract: In this article, we presented a progressive-cut algorithm for background and foreground segmentation. We first analyzed the user intention behind the additional user-specified stroke, and then incorporated the user intention into the graph-cut framework: we derived an eroded graph to prevent overshrinkage, and added a user attention term to the energy function to compress overexpansion in low-interest areas. Experiments showed that the new algorithm outperforms existing graph-cut methods in both accuracy and speed, and it effectively removes the fluctuation effect, making results more controllable with fewer strokes.

Journal Article
TL;DR: A multicamera system that uses a new calibration-free behavior recognition method for monitoring human activity at a subway station is developed, along with a method of attention control that greatly reduced computation and increased classification accuracy.
Abstract: We have developed a multicamera system, Digital City Surveillance, which uses a new calibration-free behavior recognition method for monitoring human activity at a subway station. We trained nine support vector machines from operator-classified data to recognize 512 combinations of events. Our method of attention control greatly reduced computation and increased classification accuracy.

Journal Article
TL;DR: A mixed-reality platform helps train surgeons in minimally invasive surgery and objectively assesses their performance, using multicamera stereo inside a patient manikin to measure the 3D positions of unmodified surgical instruments.
Abstract: Our mixed-reality platform helps train surgeons in minimally invasive surgery and objectively assesses their performance. The platform uses multicamera stereo inside a patient manikin to measure the 3D positions of unmodified surgical instruments. It uses this information to drive a mixed-reality, computer-mediated learning system and provide objective measures of surgical skill.

Journal Article
TL;DR: The article describes the development of Jingle, a set of multimedia extensions to Jabber, which is an open-standards-based approach for IM.
Abstract: Instant messaging is a powerful tool for real-time communication and online collaboration. IM's early focus was text communication, but with the ubiquity of multimedia-enabled devices, there has been great interest in extending IM to support multimedia interactions. Jabber is an open-standards-based approach for IM. The article describes the development of new enhancements to Jabber that provide a set of multimedia extensions called Jingle.

Journal Article
TL;DR: Biofeedback as a form of physiological human-computer interaction and interactive art offers a wide scope for future creative development, with obvious potential for applications beyond the art gallery, including therapeutic and health-promotion applications.
Abstract: Biofeedback as a form of physiological human-computer interaction and interactive art offers a wide scope for future creative development, with obvious potential for applications beyond the art gallery, including therapeutic and health-promotion applications. Participation in this arena requires an appreciation of the full interactive experience, since these systems engage the body not simply as a data set but as a living subject, experiencing and responding to the world according to the person's felt sense, motivations, and curiosities.

Journal Article
TL;DR: A natural language processing-based interface for the BilVideo video database system is presented that lets users formulate queries in English, and the advantage of using such an interface is discussed.
Abstract: The authors developed a video database system called BilVideo that provides integrated support for spatiotemporal, semantic, and low-level feature queries. As a further development for this system, the authors present a natural language processing-based interface that lets users formulate queries in English and discuss the advantage of using such an interface.

Journal Article
TL;DR: The Tiling Slideshow system automatically organizes consumer photos and provides a novel audiovisual presentation with tighter audiovisual coordination, offering a more lively viewing experience.
Abstract: The Tiling Slideshow system automatically organizes consumer photos and provides a novel audiovisual presentation. Displaying at the same pace as user-selected music, photos are elaborately manipulated and displayed to mold a novel browsing experience. In contrast to conventional photo slideshows, the proposed presentation provides tighter audiovisual coordination, and offers a more lively viewing experience.

Journal Article
TL;DR: This paper presents a recently developed fourth digital terrestrial television broadcasting (DTTB) standard that meets China's unique needs; the authors are willing to collaborate with people from academic, engineering, and industrial groups to improve the DTTB standard in the future.
Abstract: This paper presents a recently developed fourth standard that meets China's unique needs. In China, the proportion of television viewers using terrestrial reception is quite high. Other characteristics of China's TV audience made it necessary to develop a new digital terrestrial television broadcasting (DTTB) standard that provides high-quality multimedia service as well. Field trials to date have demonstrated DTTB's good performance, providing a solid foundation for building on these applications. We're very willing to collaborate with people from academic, engineering, and industrial groups to improve the DTTB standard in the future.

Journal Article
TL;DR: The channel set adaptation (CSA) framework lets clients request custom data flows for interactive applications using standard broadcast or multicast join and leave operations, and scales to support large user groups while providing interactive data access to clients.
Abstract: Streaming linear media objects has become ubiquitous on today's Internet. Interactive, nonlinear media objects, such as large 3D models and visualization databases, have proven difficult to stream. The channel set adaptation (CSA) framework lets clients request custom data flows for interactive applications using standard broadcast or multicast join and leave operations. CSA scales to support large user groups while providing interactive data access to clients.

Journal Article
TL;DR: BSDL abstracts the minutiae of bitstream parsing out of software code, into an interoperable data file (the BSDL schema), allowing developers to concentrate on the functionality of their particular application.
Abstract: Developing the code to parse and generate multimedia bitstreams has traditionally been a repetitive and error-prone task. It has also been an area of application development that defied the goal of software reuse. In contrast, BSDL abstracts the minutiae of bitstream parsing out of software code, into an interoperable data file (the BSDL schema), allowing developers to concentrate on the functionality of their particular application. BSDL's approach has demonstrated applications at numerous points in the multimedia delivery chain. In the future, this approach may be extended to still other processing tasks, such as transcoding and transmoding, or to types of binary data other than multimedia.

Journal Article
TL;DR: This work presents conTACT, a new concept for assisting in surgical interventions via 3D multimodal computer-navigated surgery that explicitly processes and uses information about the surgeon to augment human-machine interaction.
Abstract: We present conTACT, a new concept for assisting in surgical interventions via 3D multimodal computer-navigated surgery. In contrast to conventional computer-navigated surgery, our system explicitly processes and uses information about the surgeon to augment human-machine interaction. We also evaluate humans' neuroscientific behavior and show how we can transfer 3D navigation information via tactile signals delivered to the hand surface.

Journal Article
TL;DR: This work proposes a scalable forward-error-correction-based, multiple-description coding packetization scheme to achieve optimal performance for each client in an overlay network with an application-layer multicast.
Abstract: We consider the problem of distributing video data from one sender to a population of interested clients. Deploying an overlay network with an application-layer multicast, we propose a scalable forward-error-correction-based, multiple-description coding packetization scheme to achieve optimal performance for each client.

Journal Article
TL;DR: The study and findings on psychophysical experiments regarding human abilities to perceive a digital watermark, or hidden signal, through a haptic interface are presented.
Abstract: The growing interest in haptic applications suggests that haptic digital media will soon become widely available, and the need will arise to protect digital haptic data from misuse. In this article, we present our study and findings on psychophysical experiments regarding human abilities to perceive a digital watermark, or hidden signal, through a haptic interface.

Journal Article
TL;DR: This three-part series describes how a variety of methods adapted from computer vision, image analysis, and pattern recognition can be applied to visual arts and help answer questions in art history.
Abstract: This three-part series describes how a variety of methods adapted from computer vision, image analysis, and pattern recognition can be applied to visual arts and help answer questions in art history. In this final installment, David Stork discusses how shapes can be described. He outlines the challenges in quantifying shape and form analysis, and describes how techniques to compare shapes may be used to compare paintings to, for example, determine how a copy was made.