Showing papers in &quot;IEEE Transactions on Multimedia in 2003&quot;

Structural digital signature for image authentication: an incidental distortion resistant scheme

TL;DR: Circular interpretation of bijective transformations is proposed to implement a method that fulfills all quality and functionality requirements of lossless watermarking.

...read moreread less

Abstract: The need for reversible or lossless watermarking methods has been highlighted in the literature to associate subliminal management information with losslessly processed media and to enable their authentication. The paper first analyzes the specificity and the application scope of lossless watermarking methods. It explains why early attempts to achieve reversibility are not satisfactory. They are restricted to well-chosen images, strictly lossless context and/or suffer from annoying visual artifacts. Circular interpretation of bijective transformations is proposed to implement a method that fulfills all quality and functionality requirements of lossless watermarking. Results of several bench tests demonstrate the validity of the approach.

...read moreread less

438 citations

Journal Article•DOI•

[...]

Chun-Shien Lu, Hong-Yuan Mark Liao

Efficient frequency domain selective scrambling of digital video

TL;DR: A new digital signature scheme which makes use of an image's contents (in the wavelet transform domain) to construct a structural digital signature (SDS) for image authentication, which can tolerate content-preserving modifications while detecting content-changing modifications.

...read moreread less

Abstract: The existing digital data verification methods are able to detect regions that have been tampered with, but are too fragile to resist incidental manipulations. This paper proposes a new digital signature scheme which makes use of an image's contents (in the wavelet transform domain) to construct a structural digital signature (SDS) for image authentication. The characteristic of the SDS is that it can tolerate content-preserving modifications while detecting content-changing modifications. Many incidental manipulations, which were detected as malicious modifications in the previous digital signature verification or fragile watermarking schemes, can be bypassed in the proposed scheme. Performance analysis is conducted and experimental results show that the new scheme is indeed superb for image authentication.

...read moreread less

387 citations

Journal Article•DOI•

[...]

Wenjun Zeng, S. Lei

DCT-based image watermarking using subsampling

TL;DR: A joint encryption and compression framework in which video data are scrambled efficiently in the frequency domain by employing selective bit scrambling, block shuffling and block rotation of the transform coefficients and motion vectors is presented.

...read moreread less

Abstract: Multimedia data security is very important for multimedia commerce on the Internet such as video-on-demand and real-time video multicast. Traditional cryptographic algorithms/systems for data security are often not fast enough to process the vast amount of data generated by multimedia applications to meet real-time constraints. This paper presents a joint encryption and compression framework in which video data are scrambled efficiently in the frequency domain by employing selective bit scrambling, block shuffling and block rotation of the transform coefficients and motion vectors. The new approach is very simple to implement, yet provides considerable levels of security and different levels of transparency, and has a very limited adverse impact on compression efficiency and no adverse impact on error resiliency. Furthermore, it allows transcodability/scalability, and other content processing functionalities without having to access the cryptographic key and perform decryption and re-encryption.

...read moreread less

375 citations

Journal Article•DOI•

[...]

W.C. Chu¹•Institutions (1)

NTT DoCoMo¹

Adaptive playout scheduling and loss concealment for voice communication over IP networks

TL;DR: A DCT-based image watermarking algorithm is described, where the original image is not required for watermark recovery, and is achieved by inserting the watermark in subimages obtained through subsampling.

...read moreread less

Abstract: A DCT-based image watermarking algorithm is described, where the original image is not required for watermark recovery, and is achieved by inserting the watermark in subimages obtained through subsampling.

...read moreread less

303 citations

Journal Article•DOI•

[...]

Yi Liang¹, N. Farber¹, Bernd Girod¹•Institutions (1)

Stanford University¹

A quick search method for audio and video signals based on histogram pruning

TL;DR: A new receiver-based playout scheduling scheme is proposed to improve the tradeoff between buffering delay and late loss for real-time voice communication over IP networks and the overall audio quality is investigated based on subjective listening tests.

...read moreread less

Abstract: The quality of service limitation of today's Internet is a major challenge for real-time voice communications. Excessive delay, packet loss, and high delay jitter all impair the communication quality. A new receiver-based playout scheduling scheme is proposed to improve the tradeoff between buffering delay and late loss for real-time voice communication over IP networks. In this scheme the network delay is estimated from past statistics and the playout time of the voice packets is adaptively adjusted. In contrast to previous work, the adjustment is not only performed between talkspurts, but also within talkspurts in a highly dynamic way. Proper reconstruction of continuous playout speech is achieved by scaling individual voice packets using a time-scale modification technique based on the Waveform Similarity Overlap-Add (WSOLA) algorithm. Results of subjective listening tests show that this operation does not impair audio quality, since the adaptation process requires infrequent scaling of the voice packets and low playout jitter is perceptually tolerable. The same time-scale modification technique is also used to conceal packet loss at very low delay, i.e., one packet time. Simulation results based on Internet measurements show that the tradeoff between buffering delay and late loss can be improved significantly. The overall audio quality is investigated based on subjective listening tests, showing typical gains of 1 on a 5-point scale of the Mean Opinion Score.

...read moreread less

184 citations

Journal Article•DOI•

[...]

Kunio Kashino¹, T. Kurozumi¹, Hiroshi Murase•Institutions (1)

NTT Communications Corp¹

Automated event clustering and quality screening of consumer pictures for digital albuming

TL;DR: The proposed algorithm, which is referred to as time-series active search, offers significantly faster search with sufficient accuracy and the key to the acceleration is an effective pruning algorithm introduced in the histogram matching stage.

...read moreread less

Abstract: This paper proposes a quick method of similarity-based signal searching to detect and locate a specific audio or video signal given as a query in a stored long audio or video signal. With existing techniques, similarity-based searching may become impractical in terms of computing time in the case of searching through long-running (several-days' worth of) signals. The proposed algorithm, which is referred to as time-series active search, offers significantly faster search with sufficient accuracy. The key to the acceleration is an effective pruning algorithm introduced in the histogram matching stage. Through the pruning, the actual number of matching calculations can be reduced by 200 to 500 times compared with exhaustive search while guaranteeing exactly the same search result. Experiments show that the proposed method can correctly detect and locate a 15-s signal in a 48-h recording of TV broadcasts within 1 s, once the feature vectors are calculated and quantized. As extentions of the basic algorithm, efficient AND/OR search methods for searching for multiple query signals and a feature dithering method for coping with signal distortion are also discussed.

...read moreread less

178 citations

Journal Article•DOI•

[...]

Alexander C. Loui¹, Andreas Savakis²•Institutions (2)

Eastman Kodak Company¹, Rochester Institute of Technology²

Statistical sequential analysis for real-time video scene change detection on compressed multimedia bitstream

TL;DR: In this article, two core algorithms namely event clustering and screening of low-quality images are introduced and their performance is evaluated, and two algorithms for automatic albuming of consumer photographs are described.

...read moreread less

Abstract: In this paper, algorithms for automatic albuming of consumer photographs are described. Specifically, two core algorithms namely event clustering and screening of low-quality images, are introduced and their performance is evaluated. Event clustering and image quality screening have many applications including albuming services, image management and organization, and digital photofinishing. These are difficult tasks because there is, in general, none (or very limited) contextual information about picture content, and the final interpretation could be subjective. A novel event-clustering algorithm is created to automatically segment pictures into events and subevents for albuming, based on date/time metadata information, as well as color content of the pictures. A block-based color histogram correlation technique is developed for image content comparison of general consumer pictures. A new quality-screening algorithm is developed based on object quality measures, to detect problematic images caused by underexposure, low contrast, and camera defocus or movement.

...read moreread less

143 citations

Journal Article•DOI•

[...]

D. Lelescu¹, Dan Schonfeld²•Institutions (2)

NTT DoCoMo¹, University of Illinois at Chicago²

CyberWalk: a web-based distributed virtual walkthrough environment

TL;DR: A novel one-pass, real-time approach to video scene change detection based on statistical sequential analysis and operating on a compressed multimedia bitstream is proposed.

...read moreread less

Abstract: The increased availability and usage of multimedia information have created a critical need for efficient multimedia processing algorithms. These algorithms must offer capabilities related to browsing, indexing, and retrieval of relevant data. A crucial step in multimedia processing is that of reliable video segmentation into visually coherent video shots through scene change detection. Video segmentation enables subsequent processing operations on video shots, such as video indexing, semantic representation, or tracking of selected video information. Since video sequences generally contain both abrupt and gradual scene changes, video segmentation algorithms must be able to detect a large variety of changes. While existing algorithms perform relatively well for detecting abrupt transitions (video cuts), reliable detection of gradual changes is much more difficult. A novel one-pass, real-time approach to video scene change detection based on statistical sequential analysis and operating on a compressed multimedia bitstream is proposed. Our approach models video sequences as stochastic processes, with scene changes being reflected by changes in the characteristics (parameters) of the process. Statistical sequential analysis is used to provide an unified framework for the detection of both abrupt and gradual scene changes.

...read moreread less

115 citations

Journal Article•DOI•

[...]

J. Chim, Rynson W. H. Lau¹, Hong Va Leong², Antonio Si³•Institutions (3)

City University of Hong Kong¹, Hong Kong Polytechnic University², Oracle Corporation³

Two-stage hierarchical video summary extraction to match low-level user browsing preferences

TL;DR: The effectiveness of the multiresolution caching mechanism of CyberWalk in supporting virtual walkthrough applications in the Internet environment is investigated through numerous experiments, both on the simulation system and on the prototype system.

...read moreread less

Abstract: A distributed virtual walkthrough environment allows users connected to the geometry server to walk through a specific place of interest, without having to travel physically. This place of interest may be a virtual museum, virtual library or virtual university. There are two basic approaches to distribute the virtual environment from the geometry server to the clients, complete replication and on-demand transmission. Although the on-demand transmission approach saves waiting time and optimizes network usage, many technical issues need to be addressed in order for the system to be interactive. CyberWalk is a web-based distributed virtual walkthrough system developed based on the on-demand transmission approach. It achieves the necessary performance with a multiresolution caching mechanism. First, it reduces the model transmission and rendering times by employing a progressive multiresolution modeling technique. Second, it reduces the Internet response time by providing a caching and prefetching mechanism. Third, it allows a client to continue to operate, at least partially, when the Internet is disconnected. The caching mechanism of CyberWalk tries to maintain at least a minimum resolution of the object models in order to provide at least a coarse view of the objects to the viewer. All these features allow CyberWalk to provide sufficient interactivity to the user for virtual walkthrough over the Internet environment. In this paper, we demonstrate the design and implementation of CyberWalk. We investigate the effectiveness of the multiresolution caching mechanism of CyberWalk in supporting virtual walkthrough applications in the Internet environment through numerous experiments, both on the simulation system and on the prototype system.

...read moreread less

113 citations

Journal Article•DOI•

[...]

Ahmet Mufit Ferman, A.M. Tekalp

Classifying color edges in video into shadow-geometry, highlight, or material transitions

TL;DR: A two-stage framework to generate MPEG-7-compliant hierarchical key frame summaries of video sequences by reducing the number of key frames to match the low-level browsing preferences of a user is proposed.

...read moreread less

Abstract: A compact summary of video that conveys visual content at various levels of detail enhances user interaction significantly. In this paper, we propose a two-stage framework to generate MPEG-7-compliant hierarchical key frame summaries of video sequences. At the first stage, which is carried out off-line at the time of content production, fuzzy clustering and data pruning methods are applied to given video segments to obtain a nonredundant set of key frames that comprise the finest level of the hierarchical summary. The number of key frames allocated to each shot or segment is determined dynamically and without user supervision through the use of cluster validation techniques. A coarser summary is generated on-demand in the second stage by reducing the number of key frames to match the low-level browsing preferences of a user. The proposed method has been validated by experimental results on a collection of video programs.

...read moreread less

107 citations

Journal Article•DOI•

[...]

Theo Gevers¹, H. Stokman¹•Institutions (1)

University of Amsterdam¹

09 Jul 2003-IEEE Transactions on Multimedia

TL;DR: A novel approach to color edge detection by automatic noise-adaptive thresholding derived from sensor noise analysis is proposed and a taxonomy on color edge types is presented, obtaining a parameter-free edge classifier.

...read moreread less

Abstract: We aim at using color information to classify the physical nature of edges in video. To achieve physics-based edge classification, we first propose a novel approach to color edge detection by automatic noise-adaptive thresholding derived from sensor noise analysis. Then, we present a taxonomy on color edge types. As a result, a parameter-free edge classifier is obtained labeling color transitions into one of the following types: 1) shadow-geometry, 2) highlight edges, and 3) material edges. The proposed method is empirically verified on images showing complex real world scenes.

...read moreread less

Journal Article•DOI•

Grounded spoken language acquisition: experiments in word learning

[...]

Deb Roy¹•Institutions (1)

Massachusetts Institute of Technology¹

Spatial color descriptor for image retrieval and video segmentation

TL;DR: Inspired by theories of infant cognition, this work presents a computational model which learns words from untranscribed acoustic and video input which is implemented in a real-time robotic system which performs interactive language learning and understanding.

...read moreread less

Abstract: Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.

...read moreread less

Journal Article•DOI•

[...]

Ho-Young Lee¹, Ho Keun Lee¹, Yeong-Ho Ha¹•Institutions (1)

Kyungpook National University¹

Key management and distribution for secure multimedia multicast

TL;DR: Experimental results show that the proposed color descriptor could produce a high image retrieval rate and accurately detect abrupt scene-cuts in a video analysis and the storage space required for the image histogram values can be effectively reduced.

...read moreread less

Abstract: An important problem in color-based image retrieval and video segmentation is to lack information about how color is spatially distributed. To solve this problem and enhance the performance of image and video analyses, a spatial color descriptor is proposed involving a color adjacency histogram and color vector angle histogram. The color adjacency histogram represents the spatial distribution of color pairs at color edges in an image, thereby incorporating spatial information into the proposed color descriptor. Meanwhile, the color vector angle histogram represents the global color distribution of smooth pixels in an image. Since the proposed color descriptor includes spatial adjacency information between colors, it can robustly reduce the effect of a significant change in appearance and shape in image and video analyses. Moreover, since the color adjacency histogram is simply represented by binary streams, the storage space required for the image histogram values can be effectively reduced. Experimental results show that even with significant appearance changes, the proposed color descriptor could produce a high image retrieval rate and accurately detect abrupt scene-cuts in a video analysis.

...read moreread less

Journal Article•DOI•

[...]

Wade Trappe¹, Jie Song², Raadhakrishnan Poovendran³, K.J.R. Liu⁴•Institutions (4)

Rutgers University¹, Agere Systems², University of Washington³, University of Maryland, College Park⁴

Joint semantics and feature based image retrieval using relevance feedback

TL;DR: A new message format that is suitable for multicast key management schemes, and the feasibility of embedding rekeying messages using a data embedding method that has been recently proposed for fractional-pel video coding standards such as H.263 and MPEG-2.

...read moreread less

Abstract: The problem of controlling access to multimedia multicasts requires the distribution and maintenance of keying information. Typically, the problem of key management is considered separately from the problem of distributing the rekeying messages. Multimedia sources provide two approaches to distributing the rekeying messages associated with securing group communication. The first, and more conventional approach employs the use of a media-independent channel to convey rekeying messages. We propose, however, a second approach that involves the use of a media-dependent channel, and is achieved for multimedia by using data embedding techniques. Compared to a media-independent channel, the use of data embedding to convey rekeying messages provides enhanced security by masking the presence of rekeying operations. This covert communication makes it difficult for an adversary to gather information regarding the group membership and its dynamics. In addition to proposing a new mode of conveyance for the rekeying messages, we introduce a new message format that is suitable for multicast key management schemes. This new message format uses one-way functions to securely distribute new key material to subgroups of users. An advantage of this approach over the traditional message format is that no additional messages must be sent to flag the users which portion of the message is intended for them, thereby reducing communication overhead. We then show how to map the message to a tree structure in order to achieve desirable scalability in communication and computational overhead. Next, as an example of the interplay between the key management scheme and the mode of conveyance, we study the feasibility of embedding rekeying messages using a data embedding method that has been recently proposed for fractional-pel video coding standards such as H.263 and MPEG-2. Finally, since multimedia services will involve multiple layers or objects, we extend the tree-based key management schemes to include new operations needed to handle multilayer multimedia applications where group members may subscribe or cancel membership to some layers while maintaining membership to other layers.

...read moreread less

Journal Article•DOI•

[...]

Ye Lu¹, Hong-Jiang Zhang², Liu Wenyin³, Chunhui Hu²•Institutions (3)

Simon Fraser University¹, Microsoft², City University of Hong Kong³

A new approach to automatic speech summarization

TL;DR: By forming a semantic network on top of the keyword association on the images, this work is able to accurately deduce and utilize the images' semantic contents for retrieval purposes and proposes a ranking measure that is suitable for this framework.

...read moreread less

Abstract: Relevance feedback is a powerful technique for image retrieval and has been an active research direction for the past few years. Various ad hoc parameter estimation techniques have been proposed for relevance feedback. In addition, methods that perform optimization on multilevel image content model have been formulated. However, these methods only perform relevance feedback on low-level image features and fail to address the images' semantic content. In this paper, we propose a relevance feedback framework to take advantage of the semantic contents of images in addition to low-level features. By forming a semantic network on top of the keyword association on the images, we are able to accurately deduce and utilize the images' semantic contents for retrieval purposes. We also propose a ranking measure that is suitable for our framework. The accuracy and effectiveness of our method is demonstrated with experimental results on real-world image collections.

...read moreread less

Journal Article•DOI•

[...]

Chiori Hori¹, Sadaoki Furui¹•Institutions (1)

Tokyo Institute of Technology¹

Weighted walkthroughs between extended entities for retrieval by spatial arrangement

TL;DR: Experimental results show that the proposed method effectively extracts relatively important information by removing redundant and irrelevant information.

...read moreread less

Abstract: This paper proposes a new automatic speech summarization method. In this method, a set of words maximizing a summarization score is extracted from automatically transcribed speech. This extraction is performed according to a target compression ratio using a dynamic programming (DP) technique. The extracted set of words is then connected to build a summarization sentence. The summarization score consists of a word significance measure, a confidence measure, linguistic likelihood, and a word concatenation probability. The word concatenation score is determined by a dependency structure in the original speech given by stochastic dependency context free grammar (SDCFG). Japanese broadcast news speech transcribed using a large-vocabulary continuous-speech recognition (LVCSR) system is summarized using our proposed method and compared with manual summarization by human subjects. The manual summarization results are combined to build a word network. This word network is used to calculate the word accuracy of each automatic summarization result using the most similar word string in the network. Experimental results show that the proposed method effectively extracts relatively important information by removing redundant and irrelevant information.

...read moreread less

Journal Article•DOI•

[...]

Stefano Berretti, A. Del Bimbo, Enrico Vicario

An omnidirectional stroll-based virtual reality interface and its application on overhead crane training

TL;DR: An original framework is presented which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities and properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships.

...read moreread less

Abstract: In the access to image databases, queries based on the appearing visual features of searched data reduce the gap between the user and the engineering representation. To support this access modality, image content can be modeled in terms of different types of features such as shape, texture, color, and spatial arrangement. An original framework is presented which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities. Properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships. Representation and comparison of binary relationships between entities is then embedded into a graph-theoretical framework supporting representation and comparison of the spatial arrangements of a picture. Two prototype applications are described.

...read moreread less

Journal Article•DOI•

[...]

Jiung-Yao Huang¹•Institutions (1)

Tamkang University¹

Appearance-based virtual view generation from multicamera videos captured in the 3-D room

TL;DR: A locomotion mechanism called omni-directional ball-bearing disc platform (OBDP), which allows the user to walk naturally on it and thus to navigate the virtual environment and the gait sensing algorithm that simulates the user's posture based upon his footstep data collected from the OBDP is presented.

...read moreread less

Abstract: Locomotion is a virtual reality interface that enables the user to walk inside the virtual environment in any direction over a long distance without actually leaving the physical device. In order to enable the user to freely navigate the virtual world and get fully immersed into the virtual environment accordingly, a locomotion device must fulfill the following two distinct requirements. First, it should allow the user to navigate an infinite distance within a limited area. Secondly, the user should not need to wear any tracking devices to detect his motion. The paper presents a locomotion mechanism called omni-directional ball-bearing disc platform (OBDP), which allows the user to walk naturally on it and thus to navigate the virtual environment. The gait sensing algorithm that simulates the user's posture based upon his footstep data collected from the OBDP is then elaborated. Followed with an omnidirectional stroll-based virtual reality system to integrate the OBDP with the gait sensing algorithm. Significantly, instead of using the three-dimensional (3-D) tracker, the OBDP adopts arrays of ball-bearing sensors on a disc to detect the pace. No other sensor, except the head tracker to detect the user's head rotation, is required on the user's body. Finally, a prototype of an overhead crane training simulator that fully explores the advantage of the OBDP is presented along with the verification of the effectiveness of the presented gait sensing algorithm.

...read moreread less

Journal Article•DOI•

[...]

Hideo Saito¹, S. Baba, Takeo Kanade²•Institutions (2)

Keio University¹, Carnegie Mellon University²

Routing multimedia traffic with QoS guarantees

TL;DR: An appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene by interpolating two original camera-views near the given viewpoint and extracting much more reliable and comprehensive 3D geometry of the scene as a 3D model.

...read moreread less

Abstract: We present an appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene. The scene is captured by multiple synchronized cameras. Arbitrary views are generated by interpolating two original camera-views near the given viewpoint. The quality of the generated synthetic view is determined by the precision, consistency and density of correspondences between the two images. All or most of previous work that uses interpolation extracts the correspondences from these two images. However, not only is it difficult to do so reliably (the task requires a good stereo algorithm), but also the two images alone sometimes do not have enough information, due to problems such as occlusion. Instead, we take advantage of the fact that we have many views, from which we can extract much more reliable and comprehensive 3D geometry of the scene as a 3D model. Dense and precise correspondences between the two images, to be used for interpolation, are obtained using this constructed 3D model.

...read moreread less

Journal Article•DOI•

[...]

Turgay Korkmaz¹, Marwan Krunz²•Institutions (2)

University of Texas at San Antonio¹, University of Arizona²

VSculpt : a distributed virtual sculpting environment for collaborative design

TL;DR: The efficiency of H/spl I.bar/MCOP over its (less general) contenders in terms of finding feasible paths and minimizing their costs under the same level of computational complexity is shown.

...read moreread less

Abstract: One of the challenging issues in exchanging multimedia information over a network is how to determine a feasible path that satisfies all the quality-of-service (QoS) requirements of multimedia applications while maintaining high utilization of network resources. The latter objective implies the need to impose an additional optimality requirement on the feasibility problem. This can be done through a primary cost function (e.g., administrative weight, hop-count) according to which the selected feasible path is optimal. In general, multiconstrained path selection, with or without optimization, is an NP-complete problem that cannot be exactly solved in polynomial time. Heuristics and approximation algorithms with polynomial- and pseudo-polynomial-time complexities are often used to deal with this problem. However, existing solutions suffer either from excessive computational complexities that cannot be used for online network operation or from low performance. Moreover, they only deal with special cases of the problem (e.g., two constraints without optimization, one constraint with optimization, etc.). For the feasibility problem under multiple constraints, some researchers have recently proposed a nonlinear cost function whose minimization provides a continuous spectrum of solutions ranging from a generalized linear approximation (GLA) to an asymptotically exact solution. In this paper, we propose an efficient heuristic algorithm for the most general form of the problem. We first formalize the theoretical properties of the above nonlinear cost function. We then introduce our heuristic algorithm (H/spl I.bar/MCOP), which attempts to minimize both the nonlinear cost function (for the feasibility part) and the primary cost function (for the optimality part). We prove that H/spl I.bar/MCOP guarantees at least the performance of GLA and often improves upon it. H/spl I.bar/MCOP has the same order of complexity as Dijkstra's algorithm. Using extensive simulations on random graphs and realistic network topologies with correlated and uncorrelated link weights from several distributions including uniform, normal, and exponential, we show the efficiency of H/spl I.bar/MCOP over its (less general) contenders in terms of finding feasible paths and minimizing their costs under the same level of computational complexity.

...read moreread less

Journal Article•DOI•

[...]

Frederick W. B. Li¹, Rynson W. H. Lau¹, F.F.C. Ng¹•Institutions (1)

City University of Hong Kong¹

Decision trees for error concealment in video decoding

TL;DR: This work proposes a collaborative virtual sculpting framework, called VSculpt, that provides a real-time intuitive environment for collaborative design and addresses issues on efficient rendering and transmission of deformable objects, intuitive object deformation using the CyberGlove and concurrent object deformed by multiple clients.

...read moreread less

Abstract: A collaborative virtual sculpting system supports a team of geographically separated designers/engineers connected by networks to participate in designing three-dimensional (3D) virtual engineering tools or sculptures. It encourages international collaboration at a minimal cost. However, in order for the system to be useful, two factors need to be addressed: intuitiveness and real-time interaction. Although a lot of effort has been put into developing virtual sculpting environments, only limited work addresses collaborative virtual sculpting. This is because in order to support real-time collaborative virtual sculpting, many challenging issues need to be addressed. We propose a collaborative virtual sculpting framework, called VSculpt. Through adapting some techniques we developed earlier and integrating them with some techniques developed here, the proposed framework provides a real-time intuitive environment for collaborative design. In particular, it addresses issues on efficient rendering and transmission of deformable objects, intuitive object deformation using the CyberGlove and concurrent object deformation by multiple clients. We demonstrate and evaluate the performance of the proposed framework through a number of experiments.

...read moreread less

Journal Article•DOI•

[...]

Song Cen¹, Pamela C. Cosman¹•Institutions (1)

University of California, Los Angeles¹

On interactive browsing of large images

TL;DR: It is shown how the use of a decision tree that can adaptively choose among several different error concealment methods can outperform each single method.

...read moreread less

Abstract: When macro-blocks are lost in a video decoder such as MPEG-2, the decoder can try to conceal the error by estimating or interpolating the missing area. Many different methods for this type of post-processing concealment have been proposed, operating in the spatial, frequency, or temporal domains, or some hybrid combination of them. In this paper, we show how the use of a decision tree that can adaptively choose among several different error concealment methods can outperform each single method. We also propose two promising new methods for temporal error concealment.

...read moreread less

Journal Article•DOI•

[...]

Jin Li, Hong-Hui Sun¹•Institutions (1)

Microsoft¹

A resource management strategy in wireless multimedia communications-total power saving in mobile terminals with a guaranteed QoS

TL;DR: A new effective mechanism is proposed for the browsing of large compressed images over the Internet where the user specifies a region of interest (ROI) with certain spatial and resolution constraint and the browser only downloads the portion of the compressed bitstream that covers the current ROI.

...read moreread less

Abstract: A new effective mechanism is proposed for the browsing of large compressed images over the Internet. The image is compressed with the JPEG 2000 into one single bitstream and put on the server. During the browsing process, the user specifies a region of interest (ROI) with certain spatial and resolution constraint. The browser only downloads the portion of the compressed bitstream that covers the current ROI, and the download is performed in a progressive fashion so that a coarse view of the ROI can be rendered very quickly and then gradually refined as more and more bitstream arrives. In the case of the switch of ROI, e.g., zooming in/out or panning around, the browser uses existing compressed bitstream in cache to quickly render a coarse view of the new ROI, and in the same time, request a new set of compressed bitstream corresponding to the updated view. The system greatly improves the experience of browsing large images over the slow networks.

...read moreread less

Journal Article•DOI•

[...]

Tse-Hua Lan¹, A.H. Tewfik¹•Institutions (1)

University of Minnesota¹

Content-trajectory approach for searching video databases

TL;DR: Two technologies consist of an energy-efficient communication protocol for the uplink channel and a low-complexity multirate transmission scheme that can make a wireless multimedia communication system more energy- efficient while ensuring QoS.

...read moreread less

Abstract: The integration of multimedia services into wireless communication networks is a major source of future technological advances. One of the main challenging issues in this endeavor is the resource optimization strategy. This paper addresses this issue from the perspective of minimizing the total power consumption of a mobile terminal while maintaining a guaranteed quality-of-service (QoS). For many years, the management strategy has dealt primarily with bandwidth allocation, network capacity, and QoS. However, due to the integration of multimedia services, the increasing energy consumption of a mobile unit is also becoming a dominant factor in the design of communication systems. In this paper, we describe two technologies that can make a wireless multimedia communication system more energy-efficient while ensuring QoS. These technologies consist of an energy-efficient communication protocol for the uplink channel and a low-complexity multirate transmission scheme. We also provide a video transmission example using the H.263 standard in the proposed system to demonstrate the importance of our total power optimization strategy. The simulation results show that a savings of 10-32% is achieved in the total energy consumption of the mobile unit.

...read moreread less

Journal Article•DOI•

[...]

Zaher Al Aghbari¹, Kunihiko Kaneko², Akifumi Makinouchi²•Institutions (2)

University of Sharjah¹, Kyushu University²

Model-based segmentation and tracking of head-and-shoulder video objects for real time multimedia services

TL;DR: A hierarchal approach to model videos at three levels, object level (OL), frame level (FL), and shot level (SL), and a novel query interface that allows users to describe the time-varying contents of complex video shots by sketch and feature specification is presented.

...read moreread less

Abstract: In the past few years, modeling and querying video databases have been a subject of extensive research to develop tools for effective search of videos. In this paper, we present a hierarchal approach to model videos at three levels, object level (OL), frame level (FL), and shot level (SL). The model captures the visual features of individual objects at OL, visual-spatio-temporal (VST) relationships between objects at FL, and time-varying visual features and time-varying VST relationships at SL. We call the combination of the time-varying visual features and the time-varying VST relationships a Content trajectory which is used to represent and index a shot. A novel query interface that allows users to describe the time-varying contents of complex video shots such as those of skiers, soccer players, etc., by sketch and feature specification is presented. Our experimental results prove the effectiveness of modeling and querying shots using the content trajectory approach.

...read moreread less

Journal Article•DOI•

[...]

Huitao Luo¹, A. Eleftheriadis•Institutions (1)

Hewlett-Packard¹

HMM delay prediction technique for VoIP

TL;DR: A statistical model-based video segmentation algorithm is presented for head-and-shoulder type video that runs in real time over a QCIF size video, and segments it into background, head and shoulder three video objects on average Pentium PC platforms.

...read moreread less

Abstract: A statistical model-based video segmentation algorithm is presented for head-and-shoulder type video. This algorithm uses domain knowledge by abstracting the head-and-shoulder object with a blob-based statistical region model and a shape model. The object segmentation problem is then converted into a model detection and tracking problem. At the system level, a hierarchical structure is designed and spatial and temporal filters are used to improve segmentation quality. This algorithm runs in real time over a QCIF size video, and segments it into background, head and shoulder three video objects on average Pentium PC platforms. Due to its real time feature, this algorithm is appropriate for real time multimedia services such as videophone and web chat. Simulation results are offered to compare MPEG-4 performance with H.263 on segmented video objects with respects to compression efficiency, bit rate adaptation and functionality.

...read moreread less

Journal Article•DOI•

[...]

T. Yensen¹, J.P. Lariviere¹, Ioannis Lambadaris¹, Rafik Goubran¹•Institutions (1)

Carleton University¹

Design of an interactive video-on-demand system

TL;DR: This paper will show that the proposed HMM technique makes a good compromise between the mean end-to- end delay, end- to-end delay standard deviation and average packet loss rate.

...read moreread less

Abstract: This paper proposes a new algorithm for predicting audio packet playout delay for voice conferencing applications that use silence suppression. The proposed algorithm uses a hidden Markov model (HMM) to predict the playout delay. Several existing algorithms are reviewed to show that the HMM technique is based on a combination of various desirable features of other algorithms. Voice over Internet protocol (VoIP) applications produce packets at a deterministic rate but various queuing delays are added to the packets by the network causing packet interarrival jitter. Playout delay prediction techniques schedule audio packets for playout and attempt to make a reasonable compromise between the number of lost packets, the one-way delay and the delay variation since these criteria cannot be optimized simultaneously. In particular, this paper will show that the proposed HMM technique makes a good compromise between the mean end-to-end delay, end-to-end delay standard deviation and average packet loss rate.

...read moreread less

Journal Article•DOI•

[...]

Yiu-Wing Leung¹, Tony K. C. Chan¹•Institutions (1)

Hong Kong Baptist University¹

Improving the transcoding capability of speech coders

TL;DR: The proposed VOD system is suitable for large scale applications with many customers, and has several desirable features: it can be scaled up to serve more concurrent customers and provide more video programs, it provides interactive operations, and it requires a small buffer size for each video stream.

...read moreread less

Abstract: We design an interactive video-on-demand (VOD) system using both the client-server paradigm and broadcast delivery paradigm. Between the VOD warehouse and customers, we adopt a client-server paradigm to provide an interactive service. Within the VOD warehouse, we adopt a broadcast delivery paradigm to support many concurrent customers. In particular, we exploit the enormous bandwidth of optical fibers for broadcast delivery, so that the system can provide many video programs and maintain a small access delay. In addition, we design and adopt an interleaved broadcast delivery scheme, so that every video stream only requires a small buffer size for temporary storage. A simple proxy is allocated to each ongoing customer, and it retrieves video from optical channels and delivers video to the customer through an information network. The proposed VOD system is suitable for large scale applications with many customers, and has several desirable features: 1) it can be scaled up to serve more concurrent customers and provide more video programs, 2) it provides interactive operations, 3) it only requires point-to-point communication between the VOD warehouse and the customer and involves no network control, 4) it has a small access delay, and 5) it requires a small buffer size for each video stream.

...read moreread less

Journal Article•DOI•

[...]

Hong-Goo Kang¹, Hong Kook Kim², R.V. Cox²•Institutions (2)

Yonsei University¹, AT&T²

Loss concealment using B-pictures motion information

TL;DR: Informal listening tests and the perceptual subjective quality measure (PSQM) scores show that the proposed bitstream mapping method has better quality than the cross tandem method, while it has at least 5 ms less delay and six times less computation.

...read moreread less

Abstract: With the trend of merging various communication networks, a need arises to provide transcoding between different speech coding formats. Presently this means a cross tandem between the two coders in each case. This results in both quality loss and extra delay. A possible alternative is using a bitstream mapping approach that directly converts parameter values. For several standard coders having a similar coding structure, it should be possible to generate comparable or better quality without adding much delay or complexity. This paper proposes a bitstream mapping method between ITU-T Recommendation G.729 and TIA IS-641. Informal listening tests and the perceptual subjective quality measure (PSQM) scores show that the proposed method has better quality than the cross tandem method, while it has at least 5 ms less delay and six times less computation.

...read moreread less

Journal Article•DOI•

[...]

Tamer Shanableh¹, Mohammed Ghanbari¹•Institutions (1)

University of Essex¹