
Showing papers in "IEEE Transactions on Multimedia in 2004"


Journal Article•DOI•
Min Wu1, Bede Liu1•
TL;DR: The proposed data embedding method can be used to detect unauthorized use of a digitized signature and to annotate or authenticate binary documents; the paper also presents analysis and discussion of robustness and security issues.
Abstract: This paper proposes a new method to embed data in binary images, including scanned text, figures, and signatures. The method manipulates "flippable" pixels to enforce a specific block-based relationship in order to embed a significant amount of data without causing noticeable artifacts. Shuffling is applied before embedding to equalize the uneven embedding capacity from region to region. The hidden data can be extracted without using the original image, and can also be accurately extracted after high-quality printing and scanning with the help of a few registration marks. The proposed data embedding method can be used to detect unauthorized use of a digitized signature, and to annotate or authenticate binary documents. The paper also presents analysis and discussions on robustness and security issues.
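
As a rough illustration of the block-parity idea (with an arbitrary flip choice standing in for the paper's perceptual "flippability" criterion, and a keyed permutation for the shuffling step), a minimal sketch:

```python
import numpy as np

def embed_bits(img, bits, key, block=64):
    """Hide one bit per block by forcing the parity of black pixels.

    img  : 2-D uint8 array with values {0, 1} (1 = black).
    bits : iterable of {0, 1}.
    key  : seed for the shuffling permutation (shared with the extractor).

    Toy sketch: it flips an arbitrary pixel per block, whereas the paper
    flips only perceptually 'flippable' pixels.
    """
    flat = img.flatten()
    perm = np.random.default_rng(key).permutation(flat.size)  # shuffle
    shuffled = flat[perm]
    for i, bit in enumerate(bits):
        blk = shuffled[i * block:(i + 1) * block]
        if blk.sum() % 2 != bit:          # enforce the parity relationship
            blk[0] ^= 1                   # flip one pixel (toy choice)
    out = np.empty_like(flat)
    out[perm] = shuffled                  # undo the shuffle
    return out.reshape(img.shape)

def extract_bits(img, n_bits, key, block=64):
    """Recover the bits without the original image, as in the paper."""
    shuffled = img.flatten()[np.random.default_rng(key).permutation(img.size)]
    return [int(shuffled[i * block:(i + 1) * block].sum() % 2)
            for i in range(n_bits)]
```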

395 citations


Journal Article•DOI•
TL;DR: This paper investigates how consistency can be established for replicated applications changing their state in reaction to user-initiated operations as well as the passing of time and presents a timewarp scheme that complements local-lag by guaranteeing consistency and correctness for replicated continuous applications.
Abstract: In this paper, we investigate how consistency can be established for replicated applications changing their state in reaction to user-initiated operations as well as the passing of time. Typical examples of these applications are networked computer games and distributed virtual environments. We give a formal definition of the terms consistency and correctness for this application class. Based on these definitions, it is shown that an important tradeoff relationship exists between the responsiveness of the application and the appearance of short-term inconsistencies. We propose to exploit the knowledge of this tradeoff by voluntarily decreasing the responsiveness of the application in order to eliminate short-term inconsistencies. This concept is called local-lag. Furthermore, a timewarp scheme is presented that complements local-lag by guaranteeing consistency and correctness for replicated continuous applications. The computational complexity of the timewarp algorithm is determined in theory and practice by examining a simple networked computer game. The timewarp scheme is then compared to the well-known dead-reckoning approach. It is shown that the choice between the two schemes is application-dependent.
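
A minimal sketch of the local-lag/timewarp interplay, assuming trivial key-value operations and a single base snapshot (real systems keep periodic snapshots):

```python
import copy

class TimewarpGame:
    """Toy local-lag/timewarp loop (illustrative, not the paper's spec)."""

    def __init__(self, state, lag=0.1):
        self.lag = lag                        # local lag added to every op
        self.base_time, self.base = 0.0, copy.deepcopy(state)
        self.history = []                     # all ops, any arrival order
        self.state, self.now = state, 0.0

    def apply(self, op):
        self.state[op['key']] = op['value']   # trivial example operation

    def issue(self, op, t_issue):
        """Schedule op at t_issue + lag; timewarp if it is already late."""
        self.history.append(dict(op, t=t_issue + self.lag))
        if self.history[-1]['t'] < self.now:
            self._replay(self.now)            # roll back and re-execute

    def advance(self, t):
        for o in sorted(self.history, key=lambda o: o['t']):
            if self.now < o['t'] <= t:
                self.apply(o)
        self.now = t

    def _replay(self, t):
        self.state = copy.deepcopy(self.base) # rollback to base snapshot
        self.now = self.base_time
        self.advance(t)                       # deterministic re-execution
```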

252 citations


Journal Article•DOI•
TL;DR: A novel error protection method that can provide adaptive quality-of-service (QoS) to layered coded video by utilizing priority queueing at the network layer and retry-limit adaptation at the link layer is proposed.
Abstract: Robust streaming of video over 802.11 wireless LANs (WLANs) poses many challenges, including coping with packet losses caused by network buffer overflow or link erasures. In this paper, we propose a novel error protection method that can provide adaptive quality-of-service (QoS) to layered coded video by utilizing priority queueing at the network layer and retry-limit adaptation at the link layer. The design of our method is motivated by the observation that the retry limit settings of the MAC layer can be optimized in such a way that the overall packet losses that are caused by either link erasure or buffer overflow are minimized. We developed a real-time retry limit adaptation algorithm to trace the optimal retry limit for both the single-queue (or single-layer) and multiqueue (or multilayer) cases. The video layers are unequally protected over the wireless link by the MAC with different retry limits. In our proposed transmission framework, these retry limits are dynamically adapted depending on the wireless channel conditions and traffic characteristics. Furthermore, the proposed priority queueing discipline is enhanced with packet filtering and purging functionalities that can significantly save bandwidth by discarding obsolete or un-decodable packets from the buffer. Simulations show that the proposed cross-layer protection mechanism can significantly improve the received video quality.
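
The retry-limit tradeoff can be illustrated with a toy loss model: raising the limit drives down link-erasure loss but inflates effective load and hence overflow loss. The M/M/1-style overflow term below is an assumed stand-in, not the paper's queueing model:

```python
def best_retry_limit(p_err, load, max_retry=10):
    """Toy search for the retry limit minimizing total packet loss.

    p_err : per-attempt link erasure probability.
    load  : offered load as a fraction of link capacity (0..1).
    """
    def total_loss(R):
        link_loss = p_err ** (R + 1)          # all R+1 attempts fail
        # expected transmissions per packet, truncated at R+1 attempts
        e_tx = (1 - p_err ** (R + 1)) / (1 - p_err)
        rho = min(load * e_tx, 0.999)         # effective utilization
        overflow = rho ** 50                  # ~P(queue > 50) for M/M/1
        return link_loss + (1 - link_loss) * overflow
    return min(range(max_retry + 1), key=total_loss)

print(best_retry_limit(p_err=0.2, load=0.6))
```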

245 citations


Journal Article•DOI•
TL;DR: A novel robust watermarking approach called FuseMark is presented based on the principles of image fusion for copy protection or robust tagging applications to decrease the problem of false negative detection without increasing the false positive detection rate.
Abstract: This paper presents a novel robust watermarking approach called FuseMark based on the principles of image fusion for copy protection or robust tagging applications. We consider the problem of logo watermarking in still images and employ multiresolution data fusion principles for watermark embedding and extraction. A human visual system model based on contrast sensitivity is incorporated to hide a higher energy hidden logo in salient image components. Watermark extraction involves both characterization of attacks and logo estimation using a rake-like receiver. Statistical analysis demonstrates how our extraction approach can be used for watermark detection applications to decrease the problem of false negative detection without increasing the false positive detection rate. Simulation results verify theoretical observations and demonstrate the practical performance of FuseMark.

219 citations


Journal Article•DOI•
TL;DR: A modification of the three-step search algorithm is proposed that employs a small diamond pattern in the first step, with an unrestricted search step used to search the center area; it shows better results in terms of MSE and requires up to 15% less computation on average.
Abstract: The three-step search algorithm has been widely used in block matching motion estimation due to its simplicity and effectiveness. The sparsely distributed checking-point pattern in the first step is very suitable for searching large motion. However, for stationary or quasi-stationary blocks it will easily lead the search to be trapped in a local minimum. In this paper, we propose a modification of the three-step search algorithm which employs a small diamond pattern in the first step, and an unrestricted search step is used to search the center area. Experimental results show that the new efficient three-step search performs better than the new three-step search in terms of MSE and requires up to 15% less computation on average.
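
A sketch of the search-pattern logic under stated simplifications (a full 3x3 refinement around a winning diamond point stands in for the paper's exact unrestricted center-area search):

```python
import numpy as np

def sad(cur, ref, bx, by, dx, dy, B=16):
    """Sum of absolute differences for a BxB block at (bx, by), displaced by (dx, dy)."""
    h, w = ref.shape
    x, y = bx + dx, by + dy
    if x < 0 or y < 0 or x + B > w or y + B > h:
        return np.inf
    return np.abs(cur[by:by+B, bx:bx+B].astype(int)
                  - ref[y:y+B, x:x+B].astype(int)).sum()

def netss(cur, ref, bx, by, B=16):
    """Simplified new efficient three-step search.

    First step: the nine TSS points at step size 4 plus a small diamond
    (step 1) around the center. If the minimum falls on the diamond, only
    the center area is refined; otherwise fall back to the TSS steps.
    """
    tss9 = [(dx, dy) for dx in (-4, 0, 4) for dy in (-4, 0, 4)]
    diamond = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    best = min(tss9 + diamond, key=lambda d: sad(cur, ref, bx, by, *d, B))
    if best == (0, 0):
        return best                              # stationary block
    if best in diamond:                          # center-area refinement
        near = [(best[0]+dx, best[1]+dy)
                for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        return min(near, key=lambda d: sad(cur, ref, bx, by, *d, B))
    cx, cy = best                                # remaining TSS steps
    for step in (2, 1):
        pts = [(cx+dx, cy+dy) for dx in (-step, 0, step) for dy in (-step, 0, step)]
        cx, cy = min(pts, key=lambda d: sad(cur, ref, bx, by, *d, B))
    return (cx, cy)
```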

219 citations


Journal Article•DOI•
TL;DR: This work uses a novel way of personalizing the head related transfer functions (HRTFs) from a database, based on anatomical measurements, to create virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects.
Abstract: High-quality virtual audio scene rendering is required for emerging virtual and augmented reality applications, perceptual user interfaces, and sonification of data. We describe algorithms for creation of virtual auditory spaces by rendering cues that arise from anatomical scattering, environmental scattering, and dynamical effects. We use a novel way of personalizing the head related transfer functions (HRTFs) from a database, based on anatomical measurements. Details of algorithms for HRTF interpolation, room impulse response creation, HRTF selection from a database, and audio scene presentation are presented. Our system runs in real time on an office PC without specialized DSP hardware.
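
A sketch of the database-selection and interpolation steps, with assumed data layouts (db_anthro as a subjects-by-measurements matrix, hrtfs as per-azimuth magnitude responses); plain nearest-neighbor matching stands in for the paper's selection criterion:

```python
import numpy as np

def pick_subject(my_anthro, db_anthro):
    """Choose the database subject whose (normalized) anatomical
    measurements are nearest to the listener's."""
    z = (db_anthro - db_anthro.mean(0)) / db_anthro.std(0)
    me = (my_anthro - db_anthro.mean(0)) / db_anthro.std(0)
    return np.argmin(((z - me) ** 2).sum(1))

def interp_hrtf(hrtfs, azimuths, az):
    """Linear interpolation of HRTF magnitude responses between the two
    measured azimuths bracketing `az` (wrap-around handled crudely)."""
    i = np.searchsorted(azimuths, az) % len(azimuths)
    a0, a1 = azimuths[i - 1], azimuths[i]
    w = (az - a0) / (a1 - a0)
    return (1 - w) * hrtfs[i - 1] + w * hrtfs[i]

# usage: subject = pick_subject(my_measurements, db_measurements)
#        h = interp_hrtf(db_hrtfs[subject], db_azimuths, az=37.5)
```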

188 citations


Journal Article•DOI•
TL;DR: Experiments show image retrieval mechanisms based on a combination of texture and color features to be as effective as other methods while computationally more tractable.
Abstract: In this paper, we explore image retrieval mechanisms based on a combination of texture and color features. Texture features are extracted using Discrete Wavelet Frames (DWF) analysis, an over-complete decomposition in scale and orientation. Two-dimensional (2-D) or one-dimensional (1-D) histograms of the CIE Lab chromaticity coordinates are used as color features. The 1-D histograms of the a, b coordinates were modeled according to the generalized Gaussian distribution. The similarity measure defined on the feature distribution is based on the Bhattacharya distance. Retrieval benchmarking is performed over the Brodatz album and on images from natural scenes, obtained from the VisTex database of MIT Media Laboratory and from the Corel Photo Gallery. As a performance indicator, recall (the relative number of correct images retrieved) is measured on texture and color both separately and in combination. Experiments show this approach to be as effective as other methods while computationally more tractable.
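
The distance computation can be sketched directly; the histogram names in combined_distance are assumptions for illustration:

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two normalized histograms."""
    p = p / p.sum()
    q = q / q.sum()
    bc = np.sqrt(p * q).sum()            # Bhattacharyya coefficient
    return -np.log(max(bc, 1e-12))

def combined_distance(img_a, img_b, w_texture=0.5):
    """Hypothetical fusion of DWF texture and Lab-chromaticity color cues;
    each `img` is a dict of precomputed histograms (names assumed here)."""
    d_tex = bhattacharyya(img_a['dwf_hist'], img_b['dwf_hist'])
    d_col = bhattacharyya(img_a['ab_hist'], img_b['ab_hist'])
    return w_texture * d_tex + (1 - w_texture) * d_col
```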

174 citations


Journal Article•DOI•
TL;DR: The results of these studies found that human subjects integrate audio and video quality together using a multiplicative rule.
Abstract: This paper describes two experiments designed to develop a basic multimedia predictive quality metric. In Experiment 1, two head and shoulder audio-video sequences were used for test material. Experiment 2 used one of the head and shoulder sequences from Experiment 1 together with a different, high-motion sequence. In both experiments, subjects assessed the audio quality first, followed by the video quality and finally a third test evaluated multimedia quality. The results of these studies found that human subjects integrate audio and video quality together using a multiplicative rule. A regression analysis using the subjective quality test data from each experiment found that: 1) for head and shoulder content, both modalities contribute significantly to the predictive power of the resultant model, although audio quality is weighted slightly higher than video quality and 2) for high-motion content, video quality is weighted significantly higher than audio quality.
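
A multiplicative integration model of this kind can be fit by ordinary least squares on the product of the audio and video scores; the numbers below are made up for illustration, not the paper's data:

```python
import numpy as np

# Hypothetical subjective scores on a 1..5 scale (not the paper's data):
audio = np.array([4.2, 3.1, 2.0, 4.8, 1.5, 3.7])
video = np.array([3.9, 2.5, 4.1, 4.6, 2.2, 1.8])
mm    = np.array([4.0, 2.2, 2.4, 4.7, 1.2, 1.9])   # overall quality

# Multiplicative rule: MM ~ b0 + b1 * (A * V)
X = np.column_stack([np.ones_like(audio), audio * video])
b0, b1 = np.linalg.lstsq(X, mm, rcond=None)[0]
pred = b0 + b1 * audio * video
print(f"MM = {b0:.2f} + {b1:.2f} * A*V, RMSE = "
      f"{np.sqrt(np.mean((pred - mm) ** 2)):.2f}")
```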

169 citations


Journal Article•DOI•
Matija Marolt1•
TL;DR: A connectionist approach to automatic transcription of polyphonic piano music with a new partial tracking technique, based on a combination of an auditory model and adaptive oscillator networks, and shows how synchronization of adaptive oscillators can be exploited to track partials in a musical signal.
Abstract: In this paper, we present a connectionist approach to automatic transcription of polyphonic piano music. We first compare the performance of several neural network models on the task of recognizing tones from time-frequency representation of a musical signal. We then propose a new partial tracking technique, based on a combination of an auditory model and adaptive oscillator networks. We show how synchronization of adaptive oscillators can be exploited to track partials in a musical signal. We also present an extension of our technique for tracking individual partials to a method for tracking groups of partials by joining adaptive oscillators into networks. We show that oscillator networks improve the accuracy of transcription with neural networks. We also provide a short overview of our entire transcription system and present its performance on transcriptions of several synthesized and real piano recordings. Results show that our approach represents a viable alternative to existing transcription systems.

168 citations


Journal Article•DOI•
TL;DR: This paper has proposed a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access, and proposes a hierarchical semantics-sensitive video classifier to shorten the semantic gap.
Abstract: Recent advances in digital video compression and networks have made video more accessible than ever. However, existing content-based video retrieval systems still suffer from the following problems: 1) the semantics-sensitive video classification problem, caused by the semantic gap between low-level visual features and high-level semantic visual concepts; and 2) the integrated video access problem, caused by the lack of efficient video database indexing, automatic video annotation, and concept-oriented summary organization techniques. In this paper, we propose a novel framework, called ClassView, to make some advances toward more efficient video database indexing and access. 1) A hierarchical semantics-sensitive video classifier is proposed to shorten the semantic gap. The hierarchical tree structure of the semantics-sensitive video classifier is derived from the domain-dependent concept hierarchy of video contents in a database. Relevance analysis is used to select the discriminating visual features with suitable importance weights. The Expectation-Maximization (EM) algorithm is also used to determine the classification rule for each visual concept node in the classifier. 2) A hierarchical video database indexing and summary presentation technique is proposed to support more effective video access over a large-scale database. The hierarchical tree structure of our video database indexing scheme is determined by the domain-dependent concept hierarchy, which is also used for video classification. The presentation of the visual summary is also integrated with the inherent hierarchical video database indexing tree structure. Integrating video access with an efficient database indexing tree structure provides a great opportunity for supporting more powerful video search engines.

163 citations


Journal Article•DOI•
TL;DR: This work proposes a receiver-driven protocol for simultaneous video streaming from multiple senders to a single receiver in order to achieve higher throughput, and to increase tolerance to packet loss and delay due to network congestion.
Abstract: With the explosive growth of video applications over the Internet, many approaches have been proposed to stream video effectively over packet switched, best-effort networks. We propose a receiver-driven protocol for simultaneous video streaming from multiple senders to a single receiver in order to achieve higher throughput, and to increase tolerance to packet loss and delay due to network congestion. Our receiver-driven protocol employs a novel rate allocation algorithm (RAA) and a packet partition algorithm (PPA). The RAA, run at the receiver, determines the sending rate for each sender by taking into account available network bandwidth, channel characteristics, and a prespecified, fixed level of forward error correction, in such a way as to minimize the probability of packet loss. The PPA, run at the senders based on a set of parameters estimated by the receiver, ensures that every packet is sent by one and only one sender, and at the same time, minimizes the startup delay. Using both simulations and Internet experiments, we demonstrate the effectiveness of our protocol in reducing packet loss.
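
The one-and-only-one-sender constraint of the PPA can be illustrated with a toy assignment rule (completion-time greedy); the real PPA additionally uses receiver-estimated channel delay and loss:

```python
def partition_packets(n_packets, pkt_size, rates):
    """Toy packet partition: assign each packet to the sender that would
    finish transmitting it first, given its allocated rate (bytes/s) and
    backlog. Every packet is sent by exactly one sender.
    """
    free = [0.0] * len(rates)          # when each sender's link is idle
    assignment = []
    for _ in range(n_packets):
        s = min(range(len(rates)),
                key=lambda s: free[s] + pkt_size / rates[s])
        free[s] += pkt_size / rates[s]
        assignment.append(s)
    return assignment

print(partition_packets(10, 1500, [250e3, 125e3]))  # 2:1 rate split
```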

Journal Article•DOI•
TL;DR: Three methods are presented to estimate mean squared error (MSE) due to packet losses directly from the video bitstream, ranging from one that uses only network-level measurements to one that extracts sequence-specific information, including spatio-temporal activity and the effects of error propagation.
Abstract: We consider monitoring the quality of compressed video transmitted over a packet network from the perspective of a network service provider. Our focus is on no-reference methods, which do not access the original signal, and on evaluating the impact of packet losses on quality. We present three methods to estimate mean squared error (MSE) due to packet losses directly from the video bitstream. NoParse uses only network-level measurements (like packet loss rate), QuickParse extracts the spatio-temporal extent of the impact of the loss, and FullParse extracts sequence-specific information including spatio-temporal activity and the effects of error propagation. Our simulation results with MPEG-2 video subjected to transport packet losses illustrate the performance possible using the three methods.
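
The lightweight (NoParse-style) end of the spectrum might look like the following back-of-the-envelope estimate; all constants are illustrative assumptions, not values from the paper:

```python
def noparse_mse(loss_rate, pkts_per_frame, gop_len,
                err_power=400.0, decay=0.85):
    """Back-of-the-envelope MSE estimate from network measurements only.

    Each lost packet injects `err_power` MSE into its frame; the error
    propagates through inter-coded frames, attenuating by `decay` per
    frame, and is cleared at the next intra frame (GOP boundary).
    """
    expected_losses_per_frame = loss_rate * pkts_per_frame
    # average propagation: a loss at GOP position k persists gop_len-k frames
    spread = sum(sum(decay ** j for j in range(gop_len - k))
                 for k in range(gop_len)) / gop_len
    return expected_losses_per_frame * err_power * spread

print(f"estimated MSE/frame: {noparse_mse(0.01, 8, 15):.1f}")
```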

Journal Article•DOI•
TL;DR: The problem of efficiently streaming a set of heterogeneous videos from a remote server through a proxy to multiple asynchronous clients so that they can experience playback with low startup delays is addressed.
Abstract: We address the problem of efficiently streaming a set of heterogeneous videos from a remote server through a proxy to multiple asynchronous clients so that they can experience playback with low startup delays. We determine the optimal proxy prefix cache allocation to the videos that minimizes the aggregate network bandwidth cost. We integrate proxy caching with traditional server-based reactive transmission schemes such as batching, patching and stream merging to develop a set of proxy-assisted delivery schemes. We quantitatively explore the impact of the choice of transmission scheme, cache allocation policy, proxy cache size, and availability of unicast versus multicast capability, on the resulting transmission cost. Our evaluations show that even a relatively small prefix cache (10%-20% of the video repository) is sufficient to realize substantial savings in transmission cost. We find that carefully designed proxy-assisted reactive transmission schemes can produce significant cost savings even in a predominantly unicast environment such as the Internet.
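
The flavor of the cache-allocation problem can be sketched with a greedy that repeatedly extends the prefix with the best marginal bandwidth saving; the constant-saving model below is a simplification (under patching, the true marginal saving shrinks as the prefix grows):

```python
def allocate_prefix_cache(cache_size, videos, step=60):
    """Greedy proxy prefix allocation (toy model, not the paper's optimizer).

    videos: list of dicts with request rate `lam` (req/s), `duration` (s),
    and bit `rate` (bytes/s). Caching one more second of a video's prefix
    saves roughly lam * rate bytes/s of server and network transmission.
    A real implementation would recompute the diminishing marginal saving
    of each video's prefix under patching or stream merging.
    """
    prefix = [0.0] * len(videos)
    while cache_size > 0:
        best, gain = None, 0.0
        for i, v in enumerate(videos):
            if prefix[i] < v['duration'] and step * v['rate'] <= cache_size:
                g = v['lam'] * v['rate']        # marginal saving rate
                if g > gain:
                    best, gain = i, g
        if best is None:
            break
        prefix[best] += step
        cache_size -= step * videos[best]['rate']
    return prefix
```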

Journal Article•DOI•
Bo Shen1, Sung-Ju Lee1, Sujoy Basu1•
TL;DR: The results indicate that compared with the traditional network caches, with marginal transcoding load, TeC improves the cache effectiveness, decreases the user-perceived latency, and reduces the traffic between the proxy and the content origin server.
Abstract: With the wide availability of high-speed network access, we are experiencing high-quality streaming media delivery over the Internet. The emergence of ubiquitous computing enables mobile users to access the Internet with their laptops, PDAs, or even cell phones. When nomadic users connect to the network via wireless links or phone lines, high-quality video transfer can be problematic due to long delay or a size mismatch between the application display and the screen. Our proposed solution to this problem is to enable network proxies with transcoding capability, and hence provide different, appropriate video quality to different network environments. The proxies in our transcoding-enabled caching (TeC) system perform transcoding as well as caching for efficient rich media delivery to heterogeneous network users. This design choice allows us to perform content adaptation at the network edges. We propose three different TeC caching strategies. We describe each algorithm and discuss its merits and shortcomings. We also study how the user access pattern affects the performance of TeC caching algorithms and compare them with other approaches. We evaluate TeC performance by conducting two types of simulation. Our first experiment uses synthesized traces while the other uses real traces derived from enterprise media server logs. The results indicate that, compared with traditional network caches, with marginal transcoding load, TeC improves the cache effectiveness, decreases the user-perceived latency, and reduces the traffic between the proxy and the content origin server.

Journal Article•DOI•
TL;DR: An invisible spatial domain watermark insertion algorithm is presented for which it is shown that the watermark can be recovered, even if the attacker tries to manipulate the watermark with knowledge of the watermarking process.
Abstract: Most existing watermarking processes become vulnerable when the attacker knows the watermark insertion algorithm. This paper presents an invisible spatial domain watermark insertion algorithm for which we show that the watermark can be recovered even if the attacker tries to manipulate the watermark with knowledge of the watermarking process. The process incorporates buyer-specific watermarks within a single multimedia object, and the same multimedia object carries watermarks that differ from owner to owner. Therefore, recovery of this watermark not only authenticates the particular owner of the multimedia object but can also be used to identify the buyer involved in the forging process. This is achieved by spatially dividing the multimedia signal randomly into a set of disjoint subsets (referred to as the image key) and then manipulating the intensity of these subsets differently depending on a buyer-specific key. These buyer-specific keys are generated using a secret permutation of error correcting codes so that the exact keys are not known even with knowledge of the error correcting scheme. During the recovery process, a manipulated buyer key (due to attack) is extracted from the knowledge of the image key. The recovered buyer key is matched against the exact buyer key in the database using the principles of error correction. The survival of the watermark is demonstrated for a wide range of transformations and forging attempts on multimedia objects in both the spatial and frequency domains. We show quantitatively that our watermarking survives a rewatermarking attack mounted with knowledge of the watermarking process more effectively than a spread-spectrum-based technique. The efficacy of the process increases in scenarios where fewer buyer keys exist for a specific multimedia object. We have also shown that a minor variation of the watermark insertion process can survive a "Stirmark" attack. By making the image key and the intensity manipulation process specific to a buyer, and with proper selection of error correcting codes, certain categories of collusion attacks can also be precluded.

Journal Article•DOI•
TL;DR: This paper proposes a video encoding algorithm that prevents the indefinite propagation of errors in predictively encoded video-a problem that has received considerable attention over the last decade, and demonstrates the efficacy of the proposed approach through experimental evaluation.
Abstract: This paper addresses the problem of video coding in a joint source-channel setting. In particular, we propose a video encoding algorithm that prevents the indefinite propagation of errors in predictively encoded video-a problem that has received considerable attention over the last decade. This is accomplished by periodically transmitting a small amount of additional information, termed coset information, to the decoder, as opposed to the popular approach of periodic insertion of intra-coded frames. Perhaps surprisingly, the coset information is capable of correcting for errors, without the encoder having a precise knowledge of the lost packets that resulted in the errors. In the context of real-time transmission, the proposed approach entails a minimal loss in performance over conventional encoding in the absence of channel losses, while simultaneously allowing error recovery in the event of channel losses. We demonstrate the efficacy of the proposed approach through experimental evaluation. In particular, the performance of the proposed framework is 3-4 dB superior to the conventional approach of periodic insertion of intra-coded frames, and 1.5-2 dB away from an ideal system, with infinite decoding delay, operating at Shannon capacity.

Journal Article•DOI•
TL;DR: This work shows that the separation between a delay jitter buffer and a decoder buffer is in general suboptimal for VBR video transmitted over VBR channels and specifies the minimum initial delay and the minimum required buffer for a given video stream and a deterministic VBR channel.
Abstract: We consider streaming of video sequences over both constant and variable bit-rate (VBR) channels. Our goal is to enable decoding of each video unit before exceeding its displaying deadline and, hence, to guarantee successful sequence presentation even if the media rate does not match the channel rate. In this work, we show that the separation between a delay jitter buffer and a decoder buffer is in general suboptimal for VBR video transmitted over VBR channels. We specify the minimum initial delay and the minimum required buffer for a given video stream and a deterministic VBR channel. In addition, we provide some probabilistic statements in case that we observe a random behavior of the channel bit rate. A specific example tailored to wireless video streaming is discussed in greater detail and bounds are derived which allow guaranteeing a certain quality-of-service even for random VBR channels in a wireless environment. Simulation results validate the findings.
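
The minimum-delay/minimum-buffer computation reduces to comparing cumulative arrival and consumption curves; a sketch under the assumption of discrete frame slots:

```python
import numpy as np

def min_delay_and_buffer(frame_bits, channel_bits):
    """Minimum startup delay (in frame slots) and receiver buffer size.

    frame_bits[t]   : bits the decoder consumes to decode frame t.
    channel_bits[t] : bits arriving in slot t (may vary: VBR channel).
    Returns the smallest integer delay d such that, if playback starts
    d slots after streaming begins, the decoder never underflows, plus
    the peak number of bits buffered under that schedule.
    """
    A = np.cumsum(channel_bits)                  # cumulative arrivals
    C = np.cumsum(frame_bits)                    # cumulative consumption
    for d in range(len(channel_bits)):
        if len(A) - d < len(C):
            break                                # stream cannot finish
        # frame t is decoded at slot t + d; need A[t + d] >= C[t]
        if np.all(A[d:d + len(C)] >= C):
            backlog = A[d:d + len(C)] - np.concatenate(([0], C[:-1]))
            return d, int(backlog.max())
    return None, None
```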

Journal Article•DOI•
Shao-Yi Chien1, Yu-Wen Huang1, Bing-Yu Hsieh1, Shyh-Yih Ma, Liang-Gee Chen1 •
TL;DR: A fast video segmentation algorithm for MPEG-4 camera systems with change detection and background registration techniques, which can give satisfying segmentation results with low computation load.
Abstract: Automatic video segmentation plays an important role in real-time MPEG-4 encoding systems. Several video segmentation algorithms have been proposed; however, most of them are not suitable for real-time applications because of their high computation load and the many parameters that need to be set in advance. This paper presents a fast video segmentation algorithm for MPEG-4 camera systems. With change detection and background registration techniques, this algorithm can give satisfying segmentation results with low computation load. A processing speed of 40 QCIF frames per second can be achieved on a personal computer with an 800 MHz Pentium-III processor. In addition, it has a shadow cancellation mode, which can deal with lighting changes and shadow effects. A fast global motion compensation algorithm is also included to make the algorithm applicable in situations with slight camera motion. Furthermore, the required parameters can be decided automatically, giving the proposed algorithm adaptive threshold capability. It can be integrated into MPEG-4 videophone systems and digital cameras.
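
A stripped-down version of the change-detection/background-registration loop (fixed thresholds stand in for the paper's automatic parameter decisions; shadow cancellation and global motion compensation are omitted):

```python
import numpy as np

def segment(frames, diff_thr=15, still_frames=10):
    """Simplified change detection + background registration.

    frames: list of 2-D grayscale arrays. Pixels that stay unchanged for
    `still_frames` consecutive frames are registered into the background;
    the object mask uses background difference where the background is
    known, and frame difference elsewhere.
    """
    prev = frames[0].astype(int)
    still = np.zeros(prev.shape, int)          # consecutive-still counter
    bg = np.full(prev.shape, -1, int)          # -1 = not yet registered
    masks = []
    for f in frames[1:]:
        cur = f.astype(int)
        changed = np.abs(cur - prev) > diff_thr
        still = np.where(changed, 0, still + 1)
        register = still >= still_frames       # stationary long enough
        bg = np.where(register, cur, bg)
        bg_known = bg >= 0
        obj = np.where(bg_known, np.abs(cur - bg) > diff_thr, changed)
        masks.append(obj)
        prev = cur
    return masks
```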

Journal Article•DOI•
Zhe Xiang1, Qian Zhang1, Wenwu Zhu1, Zhensheng Zhang, Ya-Qin Zhang •
TL;DR: A novel framework for multimedia distribution service based on peer-to-peer (P2P) networks in which hosts self-organize into groups based on a topology-aware overlay, aimed at improving media delivery quality and providing high service availability.
Abstract: Recently, there has been much research interest in providing efficient and scalable multimedia distribution services. However, stringent quality-of-service (QoS) requirements for media distribution, as well as the dynamically changing and heterogeneous network capacity of today's best-effort Internet, bring many challenges. In this paper, we introduce a novel framework for multimedia distribution service based on peer-to-peer (P2P) networks. A topology-aware overlay is proposed in which hosts self-organize into groups. End hosts within the same group have similar network conditions and can easily collaborate with each other to achieve QoS awareness. In order to improve media delivery quality and provide high service availability, we further propose two distributed heuristic replication strategies, intergroup replication and intragroup replication, based on this topology-aware overlay. Specifically, intergroup replication aims to improve the efficiency of media content delivery between the group where a request is issued and the group where the content is stored, while intragroup replication is targeted at improving the availability of the content. Extensive simulation results show that the latency in our proposed architecture is 20% less than that of FreeNet and 50% less than that of a random replication system. Simulation results also show that the video quality in our system is much better than that in the other two systems. Our P2P-based approach is also distributed, scalable, cost-effective, and performance-aware.

Journal Article•DOI•
TL;DR: A probabilistic multimodal generation model is introduced and used to derive an information theoretic measure of cross-modal correspondence and nonparametric statistical density modeling techniques can characterize the mutual information between signals from different domains.
Abstract: Audio and visual signals arriving from a common source are detected using a signal-level fusion technique. A probabilistic multimodal generation model is introduced and used to derive an information theoretic measure of cross-modal correspondence. Nonparametric statistical density modeling techniques can characterize the mutual information between signals from different domains. By comparing the mutual information between different pairs of signals, it is possible to identify which person is speaking a given utterance and discount errant motion or audio from other utterances or nonspeech events.
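
A crude histogram-based stand-in for the paper's nonparametric density modeling, enough to show how mutual information ranks candidate regions:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) in nats for two 1-D feature tracks."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(1, keepdims=True)
    py = pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# usage sketch: compare candidate face regions against the audio track.
# audio_energy: per-frame RMS; motion[r]: per-frame motion in region r
# speaker = max(regions,
#               key=lambda r: mutual_information(audio_energy, motion[r]))
```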

Journal Article•DOI•
TL;DR: This paper proposes a method of generating a personalized abstract of broadcast American football video by first detecting significant events in the video stream by matching textual overlays appearing in an image frame with the descriptions in gamestats, in which highlights of the game are described.
Abstract: Video abstraction is defined as creating shorter video clips or video posters from an original video stream. In this paper, we propose a method of generating a personalized abstract of broadcast American football video. We first detect significant events in the video stream by matching textual overlays appearing in an image frame with the descriptions in gamestats, in which highlights of the game are described. Then, we select the highlight shots to be included in the video abstract from those detected events, based on their degree of significance and personal preferences, and generate a video clip by connecting the shots augmented with related audio and text. An hour-length video can be compressed into a minute-length personalized abstract. We experimentally verified the effectiveness of this method by comparison with man-made video abstracts.

Journal Article•DOI•
TL;DR: A stricter scene definition for narrative films is introduced and ShotWeave, a novel technique for clustering relevant shots into a scene using the stricter definition is presented, which outperforms two recent techniques utilizing global visual features in terms of segmentation accuracy and time.
Abstract: Automatic video segmentation is the first and necessary step for organizing a long video file into several smaller units. The smallest basic unit is a shot. Relevant shots are typically grouped into a high-level unit called a scene. Each scene is part of a story. Browsing these scenes unfolds the entire story of a film, enabling users to locate their desired video segments quickly and efficiently. Existing scene definitions are rather broad, making it difficult to compare the performance of existing techniques and to develop a better one. This paper introduces a stricter scene definition for narrative films and presents ShotWeave, a novel technique for clustering relevant shots into a scene using the stricter definition. The crux of ShotWeave is its feature extraction and comparison. Visual features are extracted from selected regions of representative frames of shots. These regions capture essential information needed to maintain viewers' train of thought in the presence of shot breaks. The new feature comparison is developed based on common continuity-editing techniques used in film making. Experiments were performed on full-length films with a wide range of camera motions and a complex composition of shots. The experimental results show that ShotWeave outperforms two recent techniques utilizing global visual features in terms of segmentation accuracy and time.

Journal Article•DOI•
TL;DR: This paper proposes the use of the isolated regions coding tool that jointly limits in-picture prediction and interprediction on a region-of-interest basis and can be applied as an error-robust macroblock mode decision method and used in combination with unequal error protection.
Abstract: Different types of prediction are applied in modern video coding. While predictive coding improves compression efficiency, the propagation of transmission errors becomes more likely. In addition, predictive coding brings difficulties to other aspects of video coding, including random access, parallel processing, and scalability. In order to combat the negative effects, video coding schemes introduce mechanisms such as slices and intracoding, to limit and break the prediction. This paper proposes the use of the isolated regions coding tool that jointly limits in-picture prediction and interprediction on a region-of-interest basis. The tool can be used to provide random access points from non-intrapictures and to respond to intrapicture update requests. Furthermore, it can be applied as an error-robust macroblock mode decision method and can be used in combination with unequal error protection. Finally, it enables mixing of scenes, which is useful in coding of masked scene transitions.

Journal Article•DOI•
TL;DR: The robustness of current visible watermarking schemes for digital images is doubtful and needs to be improved, so a general attacking scheme based on the contradictory requirements of current visible watermarking techniques is worked out.
Abstract: Visible watermarking schemes are important intellectual property rights (IPR) protection mechanisms for digital images and videos that have to be released for certain purposes while illegal reproduction of them is prohibited. Visible watermarking techniques protect digital content in a more active manner, quite different from invisible watermarking techniques. Digital data embedded with visible watermarks contain recognizable but unobtrusive copyright patterns, while the details of the host data should still remain. The embedded pattern of a useful visible watermarking scheme should be difficult or even impossible to remove unless intensive and expensive human labor is involved. In this paper, we propose an attacking scheme against current visible image watermarking techniques. After the watermarked areas are manually selected, only a few human interventions are required. For watermarks purely composed of thin patterns, basic image recovery techniques can completely remove the embedded patterns. For more general watermarks consisting of thick patterns, information both in surrounding unmarked areas and within the watermarked areas is utilized to correctly recover the host image. Although the proposed scheme does not guarantee that the recovered images will be exactly identical to the unmarked originals, the structure of the embedded pattern will be seriously destroyed and a perceptually satisfying recovered image can be obtained. In other words, a general attacking scheme based on the contradictory requirements of current visible watermarking techniques is worked out. Thus, the robustness of current visible watermarking schemes for digital images is doubtful and needs to be improved.

Journal Article•DOI•
TL;DR: ViBE (video indexing and browsing environment) is a browseable/searchable paradigm for organizing video data containing a large number of sequences; its performance is demonstrated on a database of MPEG sequences.
Abstract: In this paper, we describe a unique new paradigm for video database management known as ViBE (video indexing and browsing environment). ViBE is a browseable/searchable paradigm for organizing video data containing a large number of sequences. The system first segments video sequences into shots by using a new feature vector known as the Generalized Trace obtained from the DC-sequence of the compressed data. Each video shot is then represented by a hierarchical structure known as the shot tree. The shots are then classified into pseudo-semantic classes that describe the shot content. Finally, the results are presented to the user in an active browsing environment using a similarity pyramid data structure. The similarity pyramid allows the user to view the video database at various levels of detail. The user can also define semantic classes and reorganize the browsing environment based on relevance feedback. We describe how ViBE performs on a database of MPEG sequences.

Journal Article•DOI•
Xiaoyan Sun1, Feng Wu, Shipeng Li, Wen Gao, Ya-Qin Zhang •
TL;DR: Experimental results clearly show that the proposed novel seamless switching scheme brings higher efficiency and more flexibility in video streaming.
Abstract: Efficient adaptation to channel bandwidth is broadly required for effective streaming video over the Internet. To address this requirement, a novel seamless switching scheme among scalable video bitstreams is proposed in this paper. It can significantly improve the performance of video streaming over a broad range of bit rates by fully taking advantage of both the high coding efficiency of nonscalable bitstreams and the flexibility of scalable bitstreams, where small channel bandwidth fluctuations are accommodated by the scalability of a single scalable bitstream, whereas large channel bandwidth fluctuations are tolerated by flexible switching between different scalable bitstreams. Two main techniques for switching between video bitstreams are proposed. Firstly, a novel coding scheme is proposed to enable drift-free switching at any frame from the current scalable bitstream to one operated at lower rates without sending any overhead bits. Secondly, a switching-frame coding scheme is proposed to greatly reduce the number of extra bits needed for switching from the current scalable bitstream to one operated at higher rates. Compared with existing approaches, such as switching between nonscalable bitstreams and streaming with a single scalable bitstream, our experimental results clearly show that the proposed scheme brings higher efficiency and more flexibility in video streaming.

Journal Article•DOI•
TL;DR: This work applies a radial-basis function (RBF) network for implementing an adaptive metric which progressively models the notion of image similarity through continual relevance feedback from users, and shows that the proposed methods not only outperform conventional CBIR systems in terms of both accuracy and robustness, but also previously proposed interactive systems.
Abstract: An important requirement for constructing effective content-based image retrieval (CBIR) systems is accurate characterization of visual information. Conventional nonadaptive models, which are usually adopted for this task in simple CBIR systems, do not adequately capture all aspects of the characteristics of the human visual system. An effective way of addressing this problem is to adopt a "human-computer" interactive approach, where the users directly teach the system about what they regard as being significant image features and their own notions of image similarity. We propose a machine learning approach for this task, which allows users to directly modify query characteristics by specifying their attributes in the form of training examples. Specifically, we apply a radial-basis function (RBF) network for implementing an adaptive metric which progressively models the notion of image similarity through continual relevance feedback from users. Experimental results show that the proposed methods outperform not only conventional CBIR systems, in terms of both accuracy and robustness, but also previously proposed interactive systems.
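
A sketch of the RBF-metric idea, with assumed update rules (centers from positive examples, widths from their per-feature spread), not the paper's exact learning procedure:

```python
import numpy as np

class RBFRetrieval:
    """Toy adaptive similarity metric driven by relevance feedback."""

    def __init__(self, n_dims):
        self.centers = np.empty((0, n_dims))
        self.width = np.ones(n_dims)

    def feedback(self, positives):
        """positives: (k, n_dims) features the user marked relevant
        (two or more are needed for a meaningful spread estimate)."""
        self.centers = np.vstack([self.centers, positives])
        # features on which positives agree get small widths => high weight
        self.width = self.centers.std(0) + 1e-3

    def score(self, feats):
        """feats: (n_images, n_dims); higher score = more similar."""
        d2 = ((feats[:, None, :] - self.centers[None, :, :])
              / self.width) ** 2
        return np.exp(-0.5 * d2.sum(-1)).sum(1)   # sum of RBF responses

# usage: rbf = RBFRetrieval(n_dims=64); rbf.feedback(marked_relevant)
#        ranking = np.argsort(-rbf.score(database_features))
```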

Journal Article•DOI•
TL;DR: A family of new algorithms for rate-fidelity optimal packetization of scalable source bit streams with uneven error protection does away with the expediency of fractional bit allocation, a limitation of some existing algorithms.
Abstract: In this paper, we present a family of new algorithms for rate-fidelity optimal packetization of scalable source bit streams with uneven error protection. In the most general setting, where no assumption is made on the probability function of packet loss or on the rate-fidelity function of the scalable code stream, one of our algorithms can find the globally optimal solution to the problem in O(N^2 L^2) time, compared to a previously obtained O(N^3 L^2) complexity, where N is the number of packets and L is the packet payload size. If the rate-fidelity function of the input is convex, the time complexity can be reduced to O(NL^2) for a class of erasure channels, including channels for which the probability function of losing n packets is monotonically decreasing in n, and independent erasure channels with packet erasure rate no larger than N/(2(N + 1)). Furthermore, our O(NL^2) algorithm for the convex case can be modified to find an approximate solution for the general case. All of our algorithms do away with the expediency of fractional bit allocation, a limitation of some existing algorithms.

Journal Article•DOI•
TL;DR: An asymmetric image steganographic method based on a chaotic dynamic system and the Euler theorem is proposed that possesses security, imperceptibility and survivability.
Abstract: Steganography has been proposed as a methodology for transmitting messages through innocuous covers to conceal their existence. This work proposes an asymmetric image steganographic method based on a chaotic dynamic system and the Euler theorem. The hidden message can be recovered using orbits different from the embedding orbits, and the original image is not required to extract the hidden message. Experimental results and discussions reveal that the proposed scheme possesses security, imperceptibility and survivability.
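
The chaotic-orbit ingredient can be sketched with a logistic map keyed by a shared initial condition; note the paper's actual scheme is asymmetric and based on the Euler theorem, which this toy does not reproduce:

```python
import numpy as np

def logistic_positions(x0, n, size, r=3.99):
    """Derive n distinct pixel indices from a logistic-map orbit keyed by
    x0 (the shared secret initial condition)."""
    x, seen, out = x0, set(), []
    while len(out) < n:
        x = r * x * (1 - x)                 # chaotic iteration
        idx = min(int(x * size), size - 1)
        if idx not in seen:
            seen.add(idx)
            out.append(idx)
    return out

def embed(img, bits, x0):
    """LSB-substitute the message bits at orbit-selected positions."""
    flat = img.flatten()
    for pos, b in zip(logistic_positions(x0, len(bits), flat.size), bits):
        flat[pos] = (flat[pos] & 0xFE) | b  # clear LSB, then set it to b
    return flat.reshape(img.shape)

def extract(img, n_bits, x0):
    """Recover the bits without the original image."""
    flat = img.flatten()
    return [int(flat[p] & 1)
            for p in logistic_positions(x0, n_bits, flat.size)]
```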

Journal Article•DOI•
TL;DR: The complex replica placement and routing optimization problem, in its essential form, can be expressed fairly simply, and can be solved for example client populations and realistic network topologies.
Abstract: Recent scalable multicast streaming protocols for on-demand delivery of media content offer the promise of greatly reduced server and network bandwidth. However, a key unresolved issue is how to design scalable content distribution systems that place replica servers closer to various client populations and route client requests and response streams so as to minimize the total server and network delivery cost. This issue is significantly more complex than the design of distribution systems for traditional Web files or unicast on-demand streaming, for two reasons. First, closest server and shortest path routing does not minimize network bandwidth usage; instead, the optimal routing of client requests and server multicasts is complex and interdependent. Second, the server bandwidth usage increases with the number of replicas. Nevertheless, this paper shows that the complex replica placement and routing optimization problem, in its essential form, can be expressed fairly simply, and can be solved for example client populations and realistic network topologies. The solutions show that the optimal scalable system can differ significantly from the optimal system for conventional delivery. Furthermore, simple canonical networks are analyzed to develop insights into effective heuristics for near-optimal placement and routing. The proposed new heuristics can be used for designing large and heterogeneous systems that are of practical interest. For a number of example networks, the best heuristics produce systems with total delivery cost that is within 16% of optimality.