
Showing papers in "IEEE Transactions on Multimedia in 2003"


Journal ArticleDOI
TL;DR: Circular interpretation of bijective transformations is proposed to implement a method that fulfills all quality and functionality requirements of lossless watermarking.
Abstract: The need for reversible or lossless watermarking methods has been highlighted in the literature to associate subliminal management information with losslessly processed media and to enable their authentication. The paper first analyzes the specificity and the application scope of lossless watermarking methods. It explains why early attempts to achieve reversibility are not satisfactory: they are restricted to well-chosen images, assume a strictly lossless context, and/or suffer from annoying visual artifacts. Circular interpretation of bijective transformations is proposed to implement a method that fulfills all quality and functionality requirements of lossless watermarking. Results of several benchmark tests demonstrate the validity of the approach.
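
A minimal sketch of the circular idea in Python (illustrative only; the pattern and the authors' exact bijective transformation are not specified here):

    import numpy as np

    def embed(image, pattern):
        # Pixel-wise addition modulo 256 is a bijection on [0, 255]: values
        # wrap around instead of clipping, so reversibility is never lost.
        return (image.astype(np.int32) + pattern) % 256

    def restore(marked, pattern):
        # Exact inverse: restore(embed(img, w), w) returns img bit-for-bit.
        return (marked.astype(np.int32) - pattern) % 256

The wrap-around is what early additive schemes lacked: ordinary saturating addition destroys information at 0 and 255, which is why they only worked on well-chosen images.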

438 citations


Journal ArticleDOI
TL;DR: A new digital signature scheme which makes use of an image's contents (in the wavelet transform domain) to construct a structural digital signature (SDS) for image authentication, which can tolerate content-preserving modifications while detecting content-changing modifications.
Abstract: The existing digital data verification methods are able to detect regions that have been tampered with, but are too fragile to resist incidental manipulations. This paper proposes a new digital signature scheme which makes use of an image's contents (in the wavelet transform domain) to construct a structural digital signature (SDS) for image authentication. The characteristic of the SDS is that it can tolerate content-preserving modifications while detecting content-changing modifications. Many incidental manipulations, which were detected as malicious modifications in previous digital signature verification or fragile watermarking schemes, can be bypassed in the proposed scheme. Performance analysis is conducted, and experimental results show that the new scheme performs well for image authentication.
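
A rough sketch of an interscale signature in Python (assumes the PyWavelets package and image dimensions divisible by 4; the paper's actual selection of significant parent-child pairs is more elaborate):

    import numpy as np
    import pywt  # assumption: PyWavelets is available

    def structural_signature(img):
        # Two-level Haar decomposition: each level-2 detail coefficient is the
        # parent of a 2x2 block of level-1 children. One bit per cell records
        # whether the parent magnitude dominates its children.
        _, lvl2, lvl1 = pywt.wavedec2(np.asarray(img, float), 'haar', level=2)
        bits = []
        for parent, child in zip(lvl2, lvl1):
            m, n = parent.shape
            kids = np.abs(child[:2 * m, :2 * n]).reshape(m, 2, n, 2).mean(axis=(1, 3))
            bits.append((np.abs(parent) >= kids).ravel())
        return np.concatenate(bits)

    def similarity(sig_a, sig_b):
        # Content-preserving edits flip few interscale relations; content
        # changes flip many, so the agreement ratio can be thresholded.
        return float(np.mean(sig_a == sig_b))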

387 citations


Journal ArticleDOI
TL;DR: A joint encryption and compression framework in which video data are scrambled efficiently in the frequency domain by employing selective bit scrambling, block shuffling and block rotation of the transform coefficients and motion vectors is presented.
Abstract: Multimedia data security is very important for multimedia commerce on the Internet such as video-on-demand and real-time video multicast. Traditional cryptographic algorithms/systems for data security are often not fast enough to process the vast amount of data generated by multimedia applications to meet real-time constraints. This paper presents a joint encryption and compression framework in which video data are scrambled efficiently in the frequency domain by employing selective bit scrambling, block shuffling and block rotation of the transform coefficients and motion vectors. The new approach is very simple to implement, yet provides considerable levels of security and different levels of transparency, and has a very limited adverse impact on compression efficiency and no adverse impact on error resiliency. Furthermore, it allows transcodability/scalability, and other content processing functionalities without having to access the cryptographic key and perform decryption and re-encryption.
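
A toy version of keyed coefficient scrambling in Python (hypothetical block interface; the paper also shuffles motion vectors and selects which bits to scramble):

    import numpy as np

    def scramble(blocks, key):
        # blocks: list of equal-shaped coefficient arrays. A keyed PRNG drives
        # block shuffling and coefficient sign flips; both are cheap and leave
        # the coefficient statistics that compression relies on largely intact.
        rng = np.random.default_rng(key)
        perm = rng.permutation(len(blocks))
        signs = rng.choice((-1, 1), size=blocks[0].shape)
        return [blocks[p] * signs for p in perm]

    def unscramble(scrambled, key):
        # Regenerate the same permutation and signs from the key, then invert.
        rng = np.random.default_rng(key)
        perm = rng.permutation(len(scrambled))
        signs = rng.choice((-1, 1), size=scrambled[0].shape)
        out = [None] * len(scrambled)
        for dst, src in enumerate(perm):
            out[src] = scrambled[dst] * signs  # the sign flip is self-inverse
        return out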

375 citations


Journal ArticleDOI
W.C. Chu
TL;DR: A DCT-based image watermarking algorithm is described in which the original image is not required for watermark recovery; blind recovery is achieved by inserting the watermark into subimages obtained through subsampling.
Abstract: A DCT-based image watermarking algorithm is described in which the original image is not required for watermark recovery. Blind recovery is achieved by inserting the watermark into subimages obtained through subsampling.
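
One plausible reading of the subsampling idea, sketched in Python with SciPy (the coefficient position k and margin delta are arbitrary choices, not Chu's):

    import numpy as np
    from scipy.fft import dctn, idctn  # assumption: SciPy is available

    def embed_bit(img, bit, k=(3, 2), delta=2.0):
        # Interleaved subimages of a natural image are nearly identical, so
        # their DCTs are too; ordering a chosen coefficient pair across the
        # two subimages encodes one bit that is readable without the original.
        a, b = img[0::2, :].astype(float), img[1::2, :].astype(float)
        A, B = dctn(a, norm='ortho'), dctn(b, norm='ortho')
        lo, hi = sorted((A[k], B[k]))
        A[k], B[k] = (hi + delta, lo) if bit else (lo, hi + delta)
        out = np.empty(img.shape)
        out[0::2, :], out[1::2, :] = idctn(A, norm='ortho'), idctn(B, norm='ortho')
        return out

    def extract_bit(img, k=(3, 2)):
        A = dctn(img[0::2, :].astype(float), norm='ortho')
        B = dctn(img[1::2, :].astype(float), norm='ortho')
        return int(A[k] > B[k])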

303 citations


Journal ArticleDOI
TL;DR: A new receiver-based playout scheduling scheme is proposed to improve the tradeoff between buffering delay and late loss for real-time voice communication over IP networks and the overall audio quality is investigated based on subjective listening tests.
Abstract: The quality of service limitation of today's Internet is a major challenge for real-time voice communications. Excessive delay, packet loss, and high delay jitter all impair the communication quality. A new receiver-based playout scheduling scheme is proposed to improve the tradeoff between buffering delay and late loss for real-time voice communication over IP networks. In this scheme, the network delay is estimated from past statistics and the playout time of the voice packets is adaptively adjusted. In contrast to previous work, the adjustment is not only performed between talkspurts, but also within talkspurts in a highly dynamic way. Proper reconstruction of continuous playout speech is achieved by scaling individual voice packets using a time-scale modification technique based on the Waveform Similarity Overlap-Add (WSOLA) algorithm. Results of subjective listening tests show that this operation does not impair audio quality, since the adaptation process requires infrequent scaling of the voice packets and low playout jitter is perceptually tolerable. The same time-scale modification technique is also used to conceal packet loss at very low delay, i.e., one packet time. Simulation results based on Internet measurements show that the tradeoff between buffering delay and late loss can be improved significantly. The overall audio quality is investigated based on subjective listening tests, showing typical gains of about 1 point on the 5-point Mean Opinion Score scale.
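
The receiver-side logic can be caricatured in a few lines of Python (constants are illustrative; the paper drives the target from past delay statistics and stretches or compresses packets with WSOLA):

    class PlayoutScheduler:
        # Track smoothed network delay and jitter, aim each packet's playout
        # at d_hat + beta * v_hat, and time-scale packets to close the gap.
        def __init__(self, alpha=0.998, beta=4.0):
            self.alpha, self.beta = alpha, beta
            self.d_hat, self.v_hat = None, 0.0

        def target_delay(self, network_delay):
            if self.d_hat is None:
                self.d_hat = network_delay
            a = self.alpha
            self.d_hat = a * self.d_hat + (1 - a) * network_delay
            self.v_hat = a * self.v_hat + (1 - a) * abs(network_delay - self.d_hat)
            return self.d_hat + self.beta * self.v_hat

        def scale_factor(self, buffered_delay, target, packet_ms=20.0):
            # >1 stretches the next packet (adds buffer), <1 compresses it;
            # within-talkspurt adaptation is what sets this scheme apart.
            return max(0.5, min(2.0, 1.0 + (target - buffered_delay) / packet_ms))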

184 citations


Journal ArticleDOI
TL;DR: The proposed algorithm, which is referred to as time-series active search, offers significantly faster search with sufficient accuracy and the key to the acceleration is an effective pruning algorithm introduced in the histogram matching stage.
Abstract: This paper proposes a quick method of similarity-based signal searching to detect and locate a specific audio or video signal given as a query in a stored long audio or video signal. With existing techniques, similarity-based searching may become impractical in terms of computing time in the case of searching through long-running (several-days' worth of) signals. The proposed algorithm, which is referred to as time-series active search, offers significantly faster search with sufficient accuracy. The key to the acceleration is an effective pruning algorithm introduced in the histogram matching stage. Through the pruning, the actual number of matching calculations can be reduced by a factor of 200 to 500 compared with exhaustive search while guaranteeing exactly the same search result. Experiments show that the proposed method can correctly detect and locate a 15-s signal in a 48-h recording of TV broadcasts within 1 s, once the feature vectors are calculated and quantized. As extensions of the basic algorithm, efficient AND/OR search methods for searching for multiple query signals and a feature dithering method for coping with signal distortion are also discussed.
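
The pruning step admits a compact Python sketch (assumes per-frame features already quantized to codebook indices and a query histogram normalized to sum to 1):

    import numpy as np

    def active_search(query_hist, codes, L, theta=0.8):
        # codes: one codebook index per frame; the window covers L frames.
        # Shifting the window one frame moves at most 1/L of histogram mass,
        # so intersection can rise by at most 1/L per shift: a low score lets
        # us skip ceil((theta - score) * L) frames without missing a match.
        counts = np.bincount(codes[:L], minlength=query_hist.size)
        matches, t = [], 0
        while t + L <= len(codes):
            score = np.minimum(query_hist, counts / L).sum()
            if score >= theta:
                matches.append((t, score))
            skip = max(1, int(np.ceil((theta - score) * L)))
            for s in range(skip):                 # incremental histogram update
                if t + L + s >= len(codes):
                    break
                counts[codes[t + s]] -= 1
                counts[codes[t + L + s]] += 1
            t += skip
        return matches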

178 citations


Journal ArticleDOI
TL;DR: Two core algorithms for automatic albuming of consumer photographs, event clustering and screening of low-quality images, are introduced and their performance is evaluated.
Abstract: In this paper, algorithms for automatic albuming of consumer photographs are described. Specifically, two core algorithms, event clustering and screening of low-quality images, are introduced and their performance is evaluated. Event clustering and image quality screening have many applications, including albuming services, image management and organization, and digital photofinishing. These are difficult tasks because there is, in general, no (or very limited) contextual information about picture content, and the final interpretation can be subjective. A novel event-clustering algorithm is created to automatically segment pictures into events and subevents for albuming, based on date/time metadata as well as the color content of the pictures. A block-based color histogram correlation technique is developed for image content comparison of general consumer pictures. A new quality-screening algorithm is developed based on object quality measures to detect problematic images caused by underexposure, low contrast, and camera defocus or movement.
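
The date/time half of event clustering reduces to a gap test; a minimal Python sketch (the fixed 3-hour gap is an assumption, and the paper refines boundaries with the block-based color correlation):

    from datetime import timedelta

    def cluster_events(photos, gap=timedelta(hours=3)):
        # photos: list of (timestamp, image) sorted by time. A long pause
        # between consecutive shots starts a new event; color similarity can
        # then merge or split events, per the two-cue design described above.
        events, current = [], [photos[0]]
        for prev, cur in zip(photos, photos[1:]):
            if cur[0] - prev[0] > gap:
                events.append(current)
                current = []
            current.append(cur)
        events.append(current)
        return events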

143 citations


Journal ArticleDOI
TL;DR: A novel one-pass, real-time approach to video scene change detection based on statistical sequential analysis and operating on a compressed multimedia bitstream is proposed.
Abstract: The increased availability and usage of multimedia information have created a critical need for efficient multimedia processing algorithms. These algorithms must offer capabilities related to browsing, indexing, and retrieval of relevant data. A crucial step in multimedia processing is that of reliable video segmentation into visually coherent video shots through scene change detection. Video segmentation enables subsequent processing operations on video shots, such as video indexing, semantic representation, or tracking of selected video information. Since video sequences generally contain both abrupt and gradual scene changes, video segmentation algorithms must be able to detect a large variety of changes. While existing algorithms perform relatively well for detecting abrupt transitions (video cuts), reliable detection of gradual changes is much more difficult. A novel one-pass, real-time approach to video scene change detection based on statistical sequential analysis and operating on a compressed multimedia bitstream is proposed. Our approach models video sequences as stochastic processes, with scene changes being reflected by changes in the characteristics (parameters) of the process. Statistical sequential analysis is used to provide a unified framework for the detection of both abrupt and gradual scene changes.
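
A minimal stand-in for the sequential test in Python (a CUSUM-style detector over a per-frame dissimilarity metric; the paper's model-based test statistic is more principled):

    def sequential_detector(metrics, drift=0.5, threshold=8.0):
        # One pass, no look-ahead: an abrupt cut contributes one large jump,
        # a gradual transition accumulates many small ones, and the same
        # threshold crossing flags both kinds of scene change.
        g, cuts = 0.0, []
        for t, x in enumerate(metrics):
            g = max(0.0, g + x - drift)
            if g > threshold:
                cuts.append(t)
                g = 0.0  # restart the statistic after each detection
        return cuts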

115 citations


Journal ArticleDOI
TL;DR: The effectiveness of the multiresolution caching mechanism of CyberWalk in supporting virtual walkthrough applications in the Internet environment is investigated through numerous experiments, both on the simulation system and on the prototype system.
Abstract: A distributed virtual walkthrough environment allows users connected to the geometry server to walk through a specific place of interest, without having to travel physically. This place of interest may be a virtual museum, virtual library or virtual university. There are two basic approaches to distribute the virtual environment from the geometry server to the clients, complete replication and on-demand transmission. Although the on-demand transmission approach saves waiting time and optimizes network usage, many technical issues need to be addressed in order for the system to be interactive. CyberWalk is a web-based distributed virtual walkthrough system developed based on the on-demand transmission approach. It achieves the necessary performance with a multiresolution caching mechanism. First, it reduces the model transmission and rendering times by employing a progressive multiresolution modeling technique. Second, it reduces the Internet response time by providing a caching and prefetching mechanism. Third, it allows a client to continue to operate, at least partially, when the Internet is disconnected. The caching mechanism of CyberWalk tries to maintain at least a minimum resolution of the object models in order to provide at least a coarse view of the objects to the viewer. All these features allow CyberWalk to provide sufficient interactivity to the user for virtual walkthrough over the Internet environment. In this paper, we demonstrate the design and implementation of CyberWalk. We investigate the effectiveness of the multiresolution caching mechanism of CyberWalk in supporting virtual walkthrough applications in the Internet environment through numerous experiments, both on the simulation system and on the prototype system.
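
The "never below a coarse view" policy can be sketched as a cache that evicts detail levels rather than objects (hypothetical interface in Python; CyberWalk's actual cache also prefetches along the predicted viewer path):

    from collections import OrderedDict

    class MultiResCache:
        # Each object holds progressive-mesh levels 0..k, fetched coarse to
        # fine. Under pressure, drop the finest level of the least recently
        # used object first, so a coarse view of everything survives.
        def __init__(self, budget=1000):
            self.levels = OrderedDict()   # obj_id -> [level0, level1, ...]
            self.budget = budget          # total levels we may hold

        def refine(self, obj_id, mesh_level):
            self.levels.setdefault(obj_id, []).append(mesh_level)
            self.levels.move_to_end(obj_id)
            while sum(len(v) for v in self.levels.values()) > self.budget:
                for victim in self.levels:          # LRU order, oldest first
                    if len(self.levels[victim]) > 1:
                        self.levels[victim].pop()   # shed detail, keep level 0
                        break
                else:
                    break  # everything is already at its coarsest level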

113 citations


Journal ArticleDOI
TL;DR: A two-stage framework to generate MPEG-7-compliant hierarchical key frame summaries of video sequences by reducing the number of key frames to match the low-level browsing preferences of a user is proposed.
Abstract: A compact summary of video that conveys visual content at various levels of detail enhances user interaction significantly. In this paper, we propose a two-stage framework to generate MPEG-7-compliant hierarchical key frame summaries of video sequences. At the first stage, which is carried out off-line at the time of content production, fuzzy clustering and data pruning methods are applied to given video segments to obtain a nonredundant set of key frames that comprise the finest level of the hierarchical summary. The number of key frames allocated to each shot or segment is determined dynamically and without user supervision through the use of cluster validation techniques. A coarser summary is generated on-demand in the second stage by reducing the number of key frames to match the low-level browsing preferences of a user. The proposed method has been validated by experimental results on a collection of video programs.
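
The first stage can be approximated with a small fuzzy c-means routine in Python (the cluster count c would come from the paper's cluster validation step; here it is passed in):

    import numpy as np

    def fuzzy_keyframes(feats, c, m=2.0, iters=50, seed=0):
        # feats: (n_frames, d) feature vectors. Returns one key frame per
        # cluster (the frame nearest each center): the finest summary level.
        rng = np.random.default_rng(seed)
        centers = feats[rng.choice(len(feats), c, replace=False)]
        for _ in range(iters):
            d = np.linalg.norm(feats[:, None, :] - centers[None], axis=2) + 1e-9
            u = 1.0 / d ** (2.0 / (m - 1.0))
            u /= u.sum(axis=1, keepdims=True)        # fuzzy memberships
            w = u ** m
            centers = (w.T @ feats) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(feats[:, None, :] - centers[None], axis=2)
        return sorted(set(int(i) for i in d.argmin(axis=0)))

The coarser on-demand summary then just reduces this set further, e.g. by re-clustering the key frames with a smaller c.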

107 citations


Journal ArticleDOI
TL;DR: A novel approach to color edge detection by automatic noise-adaptive thresholding derived from sensor noise analysis is proposed, and a taxonomy of color edge types is presented, yielding a parameter-free edge classifier.
Abstract: We aim at using color information to classify the physical nature of edges in video. To achieve physics-based edge classification, we first propose a novel approach to color edge detection using automatic noise-adaptive thresholding derived from sensor noise analysis. Then, we present a taxonomy of color edge types. As a result, a parameter-free edge classifier is obtained that labels each color transition as one of the following types: 1) shadow-geometry edges, 2) highlight edges, or 3) material edges. The proposed method is empirically verified on images showing complex real-world scenes.
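
The noise-adaptive threshold can be illustrated in Python (the MAD-based noise estimate and the propagation constant are assumptions standing in for the paper's sensor noise analysis):

    import numpy as np

    def noise_adaptive_edges(rgb, k=3.0):
        # Estimate per-channel noise robustly from horizontal pixel
        # differences, propagate it to the gradient magnitude, and threshold
        # at k sigma; no manually tuned edge threshold remains.
        img = np.asarray(rgb, float)
        d = np.diff(img, axis=1)
        sigma_diff = 1.4826 * np.median(np.abs(d - np.median(d)))
        sigma = sigma_diff / np.sqrt(2.0)      # difference of two noisy pixels
        gy, gx = np.gradient(img.mean(axis=2))
        grad = np.hypot(gx, gy)
        sigma_grad = sigma / np.sqrt(6.0)      # 3-channel mean, central diffs
        return grad > k * sigma_grad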

Journal ArticleDOI
TL;DR: Inspired by theories of infant cognition, this work presents a computational model which learns words from untranscribed acoustic and video input which is implemented in a real-time robotic system which performs interactive language learning and understanding.
Abstract: Language is grounded in sensory-motor experience. Grounding connects concepts to the physical world enabling humans to acquire and use words and sentences in context. Currently most machines which process language are not grounded. Instead, semantic representations are abstract, pre-specified, and have meaning only when interpreted by humans. We are interested in developing computational systems which represent words, utterances, and underlying concepts in terms of sensory-motor experiences leading to richer levels of machine understanding. A key element of this work is the development of effective architectures for processing multisensory data. Inspired by theories of infant cognition, we present a computational model which learns words from untranscribed acoustic and video input. Channels of input derived from different sensors are integrated in an information-theoretic framework. Acquired words are represented in terms of associations between acoustic and visual sensory experience. The model has been implemented in a real-time robotic system which performs interactive language learning and understanding. Successful learning has also been demonstrated using infant-directed speech and images.
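
The cross-channel association at the heart of such models is often scored with mutual information; a small Python sketch over co-occurrence counts (a simplification of the paper's information-theoretic integration):

    import numpy as np

    def association_scores(cooccur):
        # cooccur[a, v]: how often audio prototype a and visual prototype v
        # occur together. Pointwise mutual information highlights candidate
        # word-object pairings that co-occur more often than chance predicts.
        p = cooccur / cooccur.sum()
        pa = p.sum(axis=1, keepdims=True)
        pv = p.sum(axis=0, keepdims=True)
        with np.errstate(divide='ignore', invalid='ignore'):
            pmi = np.log2(p / (pa * pv))
        return np.nan_to_num(pmi, nan=0.0, neginf=0.0)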

Journal ArticleDOI
TL;DR: Experimental results show that the proposed color descriptor could produce a high image retrieval rate and accurately detect abrupt scene-cuts in a video analysis and the storage space required for the image histogram values can be effectively reduced.
Abstract: An important problem in color-based image retrieval and video segmentation is the lack of information about how color is spatially distributed. To solve this problem and enhance the performance of image and video analyses, a spatial color descriptor is proposed involving a color adjacency histogram and a color vector angle histogram. The color adjacency histogram represents the spatial distribution of color pairs at color edges in an image, thereby incorporating spatial information into the proposed color descriptor. Meanwhile, the color vector angle histogram represents the global color distribution of smooth pixels in an image. Since the proposed color descriptor includes spatial adjacency information between colors, it can robustly reduce the effect of a significant change in appearance and shape in image and video analyses. Moreover, since the color adjacency histogram is simply represented by binary streams, the storage space required for the image histogram values can be effectively reduced. Experimental results show that even with significant appearance changes, the proposed color descriptor could produce a high image retrieval rate and accurately detect abrupt scene-cuts in a video analysis.
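
Building the color adjacency histogram is straightforward; a Python sketch (assumes the image is already quantized to color indices and an edge map is given):

    import numpy as np

    def color_adjacency_histogram(quantized, edges, n_colors):
        # Count color pairs across horizontal/vertical neighbors at edge
        # pixels, then binarize: one bit per color pair is what makes the
        # descriptor so compact to store.
        H = np.zeros((n_colors, n_colors), dtype=np.int64)
        for dy, dx in ((0, 1), (1, 0)):
            a = quantized[:quantized.shape[0] - dy, :quantized.shape[1] - dx]
            b = quantized[dy:, dx:]
            mask = edges[:a.shape[0], :a.shape[1]] & (a != b)
            np.add.at(H, (a[mask], b[mask]), 1)
        H = H + H.T                      # adjacency is unordered
        return (H > 0).astype(np.uint8)  # binary signature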

Journal ArticleDOI
TL;DR: A new message format suitable for multicast key management schemes is introduced, and the feasibility of embedding rekeying messages is studied using a data embedding method recently proposed for fractional-pel video coding standards such as H.263 and MPEG-2.
Abstract: The problem of controlling access to multimedia multicasts requires the distribution and maintenance of keying information. Typically, the problem of key management is considered separately from the problem of distributing the rekeying messages. Multimedia sources provide two approaches to distributing the rekeying messages associated with securing group communication. The first, and more conventional approach employs the use of a media-independent channel to convey rekeying messages. We propose, however, a second approach that involves the use of a media-dependent channel, and is achieved for multimedia by using data embedding techniques. Compared to a media-independent channel, the use of data embedding to convey rekeying messages provides enhanced security by masking the presence of rekeying operations. This covert communication makes it difficult for an adversary to gather information regarding the group membership and its dynamics. In addition to proposing a new mode of conveyance for the rekeying messages, we introduce a new message format that is suitable for multicast key management schemes. This new message format uses one-way functions to securely distribute new key material to subgroups of users. An advantage of this approach over the traditional message format is that no additional messages are needed to indicate to users which portion of the message is intended for them, thereby reducing communication overhead. We then show how to map the message to a tree structure in order to achieve desirable scalability in communication and computational overhead. Next, as an example of the interplay between the key management scheme and the mode of conveyance, we study the feasibility of embedding rekeying messages using a data embedding method that has been recently proposed for fractional-pel video coding standards such as H.263 and MPEG-2. Finally, since multimedia services will involve multiple layers or objects, we extend the tree-based key management schemes to include new operations needed to handle multilayer multimedia applications where group members may subscribe or cancel membership to some layers while maintaining membership to other layers.
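
The one-way-function message format can be caricatured in Python (SHA-256 stands in for whatever one-way function the scheme actually specifies):

    import hashlib

    def oneway(key: bytes, seed: bytes) -> bytes:
        # Members holding `key` derive the new key from the broadcast seed;
        # the seed alone reveals nothing about either key.
        return hashlib.sha256(key + seed).digest()

    def rekey_subgroup(member_keys, seed):
        # Every remaining member computes the new group key locally from one
        # broadcast seed, so no per-user flags are needed in the message:
        # the overhead reduction claimed above.
        return {uid: oneway(k, seed) for uid, k in member_keys.items()}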

Journal ArticleDOI
TL;DR: By forming a semantic network on top of the keyword association on the images, this work is able to accurately deduce and utilize the images' semantic contents for retrieval purposes and proposes a ranking measure that is suitable for this framework.
Abstract: Relevance feedback is a powerful technique for image retrieval and has been an active research direction for the past few years. Various ad hoc parameter estimation techniques have been proposed for relevance feedback. In addition, methods that perform optimization on multilevel image content models have been formulated. However, these methods only perform relevance feedback on low-level image features and fail to address the images' semantic content. In this paper, we propose a relevance feedback framework to take advantage of the semantic contents of images in addition to low-level features. By forming a semantic network on top of the keyword association on the images, we are able to accurately deduce and utilize the images' semantic contents for retrieval purposes. We also propose a ranking measure that is suitable for our framework. The accuracy and effectiveness of our method are demonstrated with experimental results on real-world image collections.
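
The semantic-network side of such a framework can be sketched as a simple weight update in Python (the learning rate and update rule are hypothetical, not the paper's):

    def update_semantic_links(links, query_keywords, relevant_ids, lr=0.2):
        # links: {(keyword, image_id): weight in [0, 1]}. Positive feedback
        # strengthens keyword-image associations, so semantics learned in one
        # session carry over to later queries on top of low-level features.
        for kw in query_keywords:
            for img in relevant_ids:
                w = links.get((kw, img), 0.0)
                links[(kw, img)] = w + lr * (1.0 - w)
        return links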

Journal ArticleDOI
TL;DR: Experimental results show that the proposed method effectively extracts relatively important information by removing redundant and irrelevant information.
Abstract: This paper proposes a new automatic speech summarization method. In this method, a set of words maximizing a summarization score is extracted from automatically transcribed speech. This extraction is performed according to a target compression ratio using a dynamic programming (DP) technique. The extracted set of words is then connected to build a summarization sentence. The summarization score consists of a word significance measure, a confidence measure, linguistic likelihood, and a word concatenation probability. The word concatenation score is determined by a dependency structure in the original speech given by stochastic dependency context free grammar (SDCFG). Japanese broadcast news speech transcribed using a large-vocabulary continuous-speech recognition (LVCSR) system is summarized using our proposed method and compared with manual summarization by human subjects. The manual summarization results are combined to build a word network. This word network is used to calculate the word accuracy of each automatic summarization result using the most similar word string in the network. Experimental results show that the proposed method effectively extracts relatively important information by removing redundant and irrelevant information.
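
The extraction step is a classic dynamic program; a simplified Python sketch (per-word scores and pairwise concatenation scores are assumed given, folding together the significance, confidence, linguistic, and SDCFG terms):

    import numpy as np

    def summarize(scores, concat, m):
        # best[j][k]: best score of a k-word summary ending at word j;
        # concat[i][j]: score for placing word j directly after word i (i < j).
        n = len(scores)
        best = np.full((n, m + 1), -np.inf)
        back = np.zeros((n, m + 1), dtype=int)
        best[:, 1] = scores
        for k in range(2, m + 1):
            for j in range(n):
                for i in range(j):
                    cand = best[i, k - 1] + concat[i][j] + scores[j]
                    if cand > best[j, k]:
                        best[j, k], back[j, k] = cand, i
        j = int(np.argmax(best[:, m]))
        chosen = [j]
        for k in range(m, 1, -1):          # backtrack through predecessors
            j = back[j, k]
            chosen.append(j)
        return chosen[::-1]                # word indices in original order

The target compression ratio fixes m relative to the transcript length.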

Journal ArticleDOI
TL;DR: An original framework is presented which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities and properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships.
Abstract: In the access to image databases, queries based on the appearing visual features of searched data reduce the gap between the user and the engineering representation. To support this access modality, image content can be modeled in terms of different types of features such as shape, texture, color, and spatial arrangement. An original framework is presented which supports quantitative nonsymbolic representation and comparison of the mutual positioning of extended nonrectangular spatial entities. Properties of the model are expounded to develop an efficient computation technique and to motivate and assess a metric of similarity for quantitative comparison of spatial relationships. Representation and comparison of binary relationships between entities is then embedded into a graph-theoretical framework supporting representation and comparison of the spatial arrangements of a picture. Two prototype applications are described.
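
One quantitative, non-symbolic encoding of mutual positioning can be sampled directly in Python (a crude stand-in for the paper's model, which is exact rather than sampled):

    import numpy as np

    def directional_signature(mask_a, mask_b, samples=2000, seed=0):
        # Fraction of point pairs (one pixel from each region) falling in
        # each directional quadrant: a graded "left/right/above/below" for
        # arbitrarily shaped, extended entities.
        rng = np.random.default_rng(seed)
        pa, pb = np.argwhere(mask_a), np.argwhere(mask_b)
        a = pa[rng.integers(len(pa), size=samples)]
        b = pb[rng.integers(len(pb), size=samples)]
        d = b - a                                   # (dy, dx) per pair
        return np.array([np.mean((d[:, 0] < 0) & (d[:, 1] >= 0)),   # above-right
                         np.mean((d[:, 0] >= 0) & (d[:, 1] >= 0)),  # below-right
                         np.mean((d[:, 0] < 0) & (d[:, 1] < 0)),    # above-left
                         np.mean((d[:, 0] >= 0) & (d[:, 1] < 0))])  # below-left

Comparing two arrangements then reduces to a distance between signatures, which is the kind of similarity metric the framework motivates.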

Journal ArticleDOI
TL;DR: A locomotion mechanism called omni-directional ball-bearing disc platform (OBDP), which allows the user to walk naturally on it and thus to navigate the virtual environment and the gait sensing algorithm that simulates the user's posture based upon his footstep data collected from the OBDP is presented.
Abstract: Locomotion is a virtual reality interface that enables the user to walk inside the virtual environment in any direction over a long distance without actually leaving the physical device. In order to enable the user to freely navigate the virtual world and become fully immersed in the virtual environment, a locomotion device must fulfill two distinct requirements. First, it should allow the user to navigate an infinite distance within a limited area. Second, the user should not need to wear any tracking devices to detect his motion. The paper presents a locomotion mechanism called the omni-directional ball-bearing disc platform (OBDP), which allows the user to walk naturally on it and thus navigate the virtual environment. The gait sensing algorithm that infers the user's posture from footstep data collected by the OBDP is then elaborated, followed by an omnidirectional stroll-based virtual reality system that integrates the OBDP with the gait sensing algorithm. Significantly, instead of using a three-dimensional (3-D) tracker, the OBDP adopts arrays of ball-bearing sensors on a disc to detect the pace. No other sensor, except the head tracker that detects the user's head rotation, is required on the user's body. Finally, a prototype of an overhead crane training simulator that fully explores the advantages of the OBDP is presented, along with verification of the effectiveness of the presented gait sensing algorithm.

Journal ArticleDOI
TL;DR: An appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene by interpolating two original camera-views near the given viewpoint and extracting much more reliable and comprehensive 3D geometry of the scene as a 3D model.
Abstract: We present an appearance-based virtual view generation method that allows viewers to fly through a real dynamic scene. The scene is captured by multiple synchronized cameras. Arbitrary views are generated by interpolating two original camera views near the given viewpoint. The quality of the generated synthetic view is determined by the precision, consistency, and density of correspondences between the two images. Most previous work that uses interpolation extracts the correspondences from these two images alone. However, not only is it difficult to do so reliably (the task requires a good stereo algorithm), but the two images alone sometimes do not have enough information due to problems such as occlusion. Instead, we take advantage of the fact that we have many views, from which we can extract much more reliable and comprehensive 3D geometry of the scene as a 3D model. Dense and precise correspondences between the two images, to be used for interpolation, are obtained using this constructed 3D model.
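
With correspondences in hand, the interpolation itself is simple; a crude Python splat (hole filling and visibility handling, which the full method needs, are omitted):

    import numpy as np

    def interpolate_view(pts1, pts2, col1, col2, t, shape):
        # pts1[i] <-> pts2[i]: matched pixel positions in the two source
        # views, as projected from the reconstructed 3D model; blend both
        # position and color linearly for an in-between viewpoint t in [0, 1].
        out = np.zeros(shape + (3,))
        pos = np.rint((1 - t) * pts1 + t * pts2).astype(int)
        ok = ((pos[:, 0] >= 0) & (pos[:, 0] < shape[0]) &
              (pos[:, 1] >= 0) & (pos[:, 1] < shape[1]))
        out[pos[ok, 0], pos[ok, 1]] = (1 - t) * col1[ok] + t * col2[ok]
        return out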

Journal ArticleDOI
TL;DR: The efficiency of H_MCOP over its (less general) contenders in terms of finding feasible paths and minimizing their costs under the same level of computational complexity is shown.
Abstract: One of the challenging issues in exchanging multimedia information over a network is how to determine a feasible path that satisfies all the quality-of-service (QoS) requirements of multimedia applications while maintaining high utilization of network resources. The latter objective implies the need to impose an additional optimality requirement on the feasibility problem. This can be done through a primary cost function (e.g., administrative weight, hop-count) according to which the selected feasible path is optimal. In general, multiconstrained path selection, with or without optimization, is an NP-complete problem that cannot be exactly solved in polynomial time. Heuristics and approximation algorithms with polynomial- and pseudo-polynomial-time complexities are often used to deal with this problem. However, existing solutions suffer either from excessive computational complexities that cannot be used for online network operation or from low performance. Moreover, they only deal with special cases of the problem (e.g., two constraints without optimization, one constraint with optimization, etc.). For the feasibility problem under multiple constraints, some researchers have recently proposed a nonlinear cost function whose minimization provides a continuous spectrum of solutions ranging from a generalized linear approximation (GLA) to an asymptotically exact solution. In this paper, we propose an efficient heuristic algorithm for the most general form of the problem. We first formalize the theoretical properties of the above nonlinear cost function. We then introduce our heuristic algorithm (H_MCOP), which attempts to minimize both the nonlinear cost function (for the feasibility part) and the primary cost function (for the optimality part). We prove that H_MCOP guarantees at least the performance of GLA and often improves upon it. H_MCOP has the same order of complexity as Dijkstra's algorithm. Using extensive simulations on random graphs and realistic network topologies with correlated and uncorrelated link weights from several distributions including uniform, normal, and exponential, we show the efficiency of H_MCOP over its (less general) contenders in terms of finding feasible paths and minimizing their costs under the same level of computational complexity.
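
The feasibility-seeking half of the idea fits in a short Python sketch (a single-pass simplification: the real H_MCOP also runs a reverse Dijkstra and folds in the primary cost; lambda here is an arbitrary choice):

    import heapq

    def feasible_path(graph, src, dst, limits, lam=8.0):
        # graph: {u: [(v, (w1, w2, ...)), ...]}; limits: constraint bounds.
        # Minimizing sum_i (w_i(p)/c_i)^lam pushes every accumulated weight
        # under its bound at once as lam grows.
        def cost(acc):
            return sum((a / c) ** lam for a, c in zip(acc, limits))

        pq = [(0.0, src, (0.0,) * len(limits), [src])]
        best = {}
        while pq:
            c, u, acc, path = heapq.heappop(pq)
            if u == dst:
                return path if all(a <= b for a, b in zip(acc, limits)) else None
            if best.get(u, float('inf')) <= c:
                continue
            best[u] = c
            for v, w in graph.get(u, ()):
                nacc = tuple(a + wi for a, wi in zip(acc, w))
                heapq.heappush(pq, (cost(nacc), v, nacc, path + [v]))
        return None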

Journal ArticleDOI
TL;DR: This work proposes a collaborative virtual sculpting framework, called VSculpt, that provides a real-time intuitive environment for collaborative design and addresses issues on efficient rendering and transmission of deformable objects, intuitive object deformation using the CyberGlove and concurrent object deformed by multiple clients.
Abstract: A collaborative virtual sculpting system supports a team of geographically separated designers/engineers connected by networks to participate in designing three-dimensional (3D) virtual engineering tools or sculptures. It encourages international collaboration at minimal cost. However, for the system to be useful, two factors need to be addressed: intuitiveness and real-time interaction. Although a lot of effort has been put into developing virtual sculpting environments, only limited work addresses collaborative virtual sculpting, because supporting it in real time raises many challenging issues. We propose a collaborative virtual sculpting framework, called VSculpt. By adapting techniques we developed earlier and integrating them with techniques developed here, the proposed framework provides a real-time intuitive environment for collaborative design. In particular, it addresses issues of efficient rendering and transmission of deformable objects, intuitive object deformation using the CyberGlove, and concurrent object deformation by multiple clients. We demonstrate and evaluate the performance of the proposed framework through a number of experiments.

Journal ArticleDOI
TL;DR: It is shown how the use of a decision tree that can adaptively choose among several different error concealment methods can outperform each single method.
Abstract: When macro-blocks are lost in a video decoder such as MPEG-2, the decoder can try to conceal the error by estimating or interpolating the missing area. Many different methods for this type of post-processing concealment have been proposed, operating in the spatial, frequency, or temporal domains, or some hybrid combination of them. In this paper, we show how the use of a decision tree that can adaptively choose among several different error concealment methods can outperform each single method. We also propose two promising new methods for temporal error concealment.
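
The adaptive selection can be pictured as a tiny hand-built tree in Python (thresholds and features are placeholders; the paper learns the tree from data):

    def choose_concealment(motion, texture):
        # Route each lost macroblock to the method most likely to succeed:
        # static areas copy the co-located block, smooth areas interpolate
        # spatially, and the rest borrow motion vectors from neighbors.
        if motion < 1.0:
            return 'temporal_copy'
        if texture < 10.0:
            return 'spatial_interpolation'
        return 'motion_compensated'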

Journal ArticleDOI
Jin Li, Hong-Hui Sun
TL;DR: A new effective mechanism is proposed for the browsing of large compressed images over the Internet where the user specifies a region of interest (ROI) with certain spatial and resolution constraint and the browser only downloads the portion of the compressed bitstream that covers the current ROI.
Abstract: A new effective mechanism is proposed for the browsing of large compressed images over the Internet. The image is compressed with JPEG 2000 into one single bitstream and put on the server. During the browsing process, the user specifies a region of interest (ROI) with certain spatial and resolution constraints. The browser only downloads the portion of the compressed bitstream that covers the current ROI, and the download is performed in a progressive fashion so that a coarse view of the ROI can be rendered very quickly and then gradually refined as more and more of the bitstream arrives. When the ROI switches, e.g., on zooming in/out or panning around, the browser uses the compressed bitstream already in its cache to quickly render a coarse view of the new ROI and, at the same time, requests a new set of compressed bitstream corresponding to the updated view. The system greatly improves the experience of browsing large images over slow networks.
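
Mapping an ROI to the byte ranges worth fetching reduces to grid arithmetic; a Python sketch (the 256-pixel precinct size is an assumption; JPEG 2000 halves image dimensions at each lower resolution level):

    def precincts_for_roi(roi, level, precinct=256):
        # roi = (x, y, w, h) in full-resolution pixels. Scale it down to the
        # requested resolution level, then list the precinct cells it touches;
        # only those portions of the bitstream need to be downloaded.
        x, y, w, h = (v >> level for v in roi)
        xs = range(x // precinct, (x + w) // precinct + 1)
        ys = range(y // precinct, (y + h) // precinct + 1)
        return [(level, px, py) for py in ys for px in xs]

On a zoom or pan, the browser renders from whatever cached cells overlap the new ROI and requests only the missing ones.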

Journal ArticleDOI
TL;DR: Two technologies, an energy-efficient communication protocol for the uplink channel and a low-complexity multirate transmission scheme, can make a wireless multimedia communication system more energy-efficient while ensuring QoS.
Abstract: The integration of multimedia services into wireless communication networks is a major source of future technological advances. One of the main challenging issues in this endeavor is the resource optimization strategy. This paper addresses this issue from the perspective of minimizing the total power consumption of a mobile terminal while maintaining a guaranteed quality-of-service (QoS). For many years, the management strategy has dealt primarily with bandwidth allocation, network capacity, and QoS. However, due to the integration of multimedia services, the increasing energy consumption of a mobile unit is also becoming a dominant factor in the design of communication systems. In this paper, we describe two technologies that can make a wireless multimedia communication system more energy-efficient while ensuring QoS. These technologies consist of an energy-efficient communication protocol for the uplink channel and a low-complexity multirate transmission scheme. We also provide a video transmission example using the H.263 standard in the proposed system to demonstrate the importance of our total power optimization strategy. The simulation results show that a savings of 10-32% is achieved in the total energy consumption of the mobile unit.

Journal ArticleDOI
TL;DR: A hierarchical approach to model videos at three levels, object level (OL), frame level (FL), and shot level (SL), and a novel query interface that allows users to describe the time-varying contents of complex video shots by sketch and feature specification is presented.
Abstract: In the past few years, modeling and querying video databases have been a subject of extensive research to develop tools for effective search of videos. In this paper, we present a hierarchical approach to model videos at three levels, object level (OL), frame level (FL), and shot level (SL). The model captures the visual features of individual objects at OL, visual-spatio-temporal (VST) relationships between objects at FL, and time-varying visual features and time-varying VST relationships at SL. We call the combination of the time-varying visual features and the time-varying VST relationships a content trajectory, which is used to represent and index a shot. A novel query interface that allows users to describe the time-varying contents of complex video shots such as those of skiers, soccer players, etc., by sketch and feature specification is presented. Our experimental results prove the effectiveness of modeling and querying shots using the content trajectory approach.

Journal ArticleDOI
TL;DR: A statistical model-based video segmentation algorithm is presented for head-and-shoulder type video that runs in real time on QCIF-size video, segmenting it into three video objects (background, head, and shoulder) on average Pentium PC platforms.
Abstract: A statistical model-based video segmentation algorithm is presented for head-and-shoulder type video. This algorithm uses domain knowledge by abstracting the head-and-shoulder object with a blob-based statistical region model and a shape model. The object segmentation problem is then converted into a model detection and tracking problem. At the system level, a hierarchical structure is designed and spatial and temporal filters are used to improve segmentation quality. This algorithm runs in real time on QCIF-size video and segments it into three video objects (background, head, and shoulder) on average Pentium PC platforms. Due to its real-time performance, this algorithm is appropriate for real-time multimedia services such as videophone and web chat. Simulation results are offered to compare MPEG-4 performance with H.263 on segmented video objects with respect to compression efficiency, bit rate adaptation, and functionality.

Journal ArticleDOI
TL;DR: This paper will show that the proposed HMM technique makes a good compromise between the mean end-to-end delay, end-to-end delay standard deviation, and average packet loss rate.
Abstract: This paper proposes a new algorithm for predicting audio packet playout delay for voice conferencing applications that use silence suppression. The proposed algorithm uses a hidden Markov model (HMM) to predict the playout delay. Several existing algorithms are reviewed to show that the HMM technique is based on a combination of various desirable features of other algorithms. Voice over Internet protocol (VoIP) applications produce packets at a deterministic rate but various queuing delays are added to the packets by the network causing packet interarrival jitter. Playout delay prediction techniques schedule audio packets for playout and attempt to make a reasonable compromise between the number of lost packets, the one-way delay and the delay variation since these criteria cannot be optimized simultaneously. In particular, this paper will show that the proposed HMM technique makes a good compromise between the mean end-to-end delay, end-to-end delay standard deviation and average packet loss rate.
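
A stripped-down Markov-chain stand-in for the predictor, in Python (the paper's HMM has hidden states and a proper observation model; this sketch just learns state transitions online):

    import numpy as np

    class DelayPredictor:
        def __init__(self, centers):
            # centers: representative network delays (ms), one per state.
            self.centers = np.asarray(centers, dtype=float)
            self.trans = np.ones((len(centers),) * 2)  # smoothed counts
            self.state = 0

        def observe(self, delay_ms):
            nxt = int(np.argmin(np.abs(self.centers - delay_ms)))
            self.trans[self.state, nxt] += 1
            self.state = nxt

        def predict(self):
            # Expected delay of the successor state; the playout deadline is
            # then set against this prediction plus a safety margin.
            row = self.trans[self.state]
            return float(row @ self.centers / row.sum())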

Journal ArticleDOI
TL;DR: The proposed VOD system is suitable for large-scale applications with many customers, and has several desirable features: it can be scaled up to serve more concurrent customers and provide more video programs, it provides interactive operations, and it requires a small buffer size for each video stream.
Abstract: We design an interactive video-on-demand (VOD) system using both the client-server paradigm and broadcast delivery paradigm. Between the VOD warehouse and customers, we adopt a client-server paradigm to provide an interactive service. Within the VOD warehouse, we adopt a broadcast delivery paradigm to support many concurrent customers. In particular, we exploit the enormous bandwidth of optical fibers for broadcast delivery, so that the system can provide many video programs and maintain a small access delay. In addition, we design and adopt an interleaved broadcast delivery scheme, so that every video stream only requires a small buffer size for temporary storage. A simple proxy is allocated to each ongoing customer, and it retrieves video from optical channels and delivers video to the customer through an information network. The proposed VOD system is suitable for large-scale applications with many customers, and has several desirable features: 1) it can be scaled up to serve more concurrent customers and provide more video programs, 2) it provides interactive operations, 3) it only requires point-to-point communication between the VOD warehouse and the customer and involves no network control, 4) it has a small access delay, and 5) it requires a small buffer size for each video stream.

Journal ArticleDOI
TL;DR: Informal listening tests and the perceptual subjective quality measure (PSQM) scores show that the proposed bitstream mapping method has better quality than the cross tandem method, while it has at least 5 ms less delay and six times less computation.
Abstract: With the trend of merging various communication networks, a need arises to provide transcoding between different speech coding formats. Presently this means a cross tandem between the two coders in each case. This results in both quality loss and extra delay. A possible alternative is using a bitstream mapping approach that directly converts parameter values. For several standard coders having a similar coding structure, it should be possible to generate comparable or better quality without adding much delay or complexity. This paper proposes a bitstream mapping method between ITU-T Recommendation G.729 and TIA IS-641. Informal listening tests and the perceptual subjective quality measure (PSQM) scores show that the proposed method has better quality than the cross tandem method, while it has at least 5 ms less delay and six times less computation.

Journal ArticleDOI
TL;DR: While the composed motion vectors improve the quality of concealment over the conventional methods by more than 3-4 dB, another 2 dB improvement can be achieved by constraining the generation of the bidirectional motion vectors.
Abstract: In this paper, the motion parameters of the bidirectionally predicted pictures (B-pictures) of MPEG-1,2 are exploited for concealment of large portions of corrupted anchor pictures (and vice versa) that might arise due to channel errors or packet losses. To further enhance the quality of the concealed pictures, we propose two methods of constraining the motion vectors of the B-pictures that strengthen the tie between them and those of the anchor pictures in the same picture subgroup. In one method, the macroblock decisions on the last B-picture in each subgroup are constrained to be bidirectional if those of the other B-pictures are not, such that the derived motion vectors for the concealment of the anchor picture are always composed from the forward and backward motion vectors of the bidirectional motions. Second, the bidirectional motion vectors of the B-pictures in each subgroup are constrained such that the vectorial sum of their forward and backward motion vectors results in an accurate motion prediction of the anchor picture. The experimental results show that while the composed motion vectors improve the quality of concealment over the conventional methods by more than 3-4 dB, another 2 dB improvement can be achieved by constraining the generation of the bidirectional motion vectors.
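
The vector composition behind the concealment is a one-liner; in Python (signs follow one common convention: the forward vector points to the previous anchor and the backward vector to the next, both measured from the B-picture):

    def composed_anchor_mv(mv_fwd, mv_bwd):
        # For a bidirectionally predicted macroblock, forward minus backward
        # approximates the motion from the next anchor back to the previous
        # one, which is exactly what concealing a lost anchor picture needs.
        return (mv_fwd[0] - mv_bwd[0], mv_fwd[1] - mv_bwd[1])

Roughly speaking, the paper's second constraint makes this composition accurate at encode time, which is where the extra 2 dB comes from.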